Hi everyone, again a question with dagster-pyspark...
# ask-community
Hi everyone, another dagster-pyspark question. I looked through previous conversations but didn’t find a solution. I am trying to use emr_pyspark_step_launcher with my EMR cluster, and to pass the pipeline path I am setting
"local_pipeline_package_path": str(Path(__file__).parents[1])
so that code.zip is built from the parent of the parent directory, since I have some utility files there that I need when running with PySpark. However, my run fails with
Does anyone have any idea, or has anyone faced this issue? Thanks!
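For readers following along, here is a minimal sketch of what the path expression in the question evaluates to. The file location is hypothetical (the thread does not show the actual repo layout), and `PurePosixPath` is used only so the example does not depend on the local filesystem.

```python
from pathlib import PurePosixPath

# Hypothetical location of the file that defines the pipeline (an assumption).
pipeline_file = PurePosixPath("/home/me/my_repo/pipelines/emr_pipeline.py")

# parents[0] is the directory that contains the file.
file_dir = pipeline_file.parents[0]

# parents[1] is one level above that directory.
one_above = pipeline_file.parents[1]

# The indexed form and the chained .parent form name the same directories.
assert file_dir == pipeline_file.parent
assert one_above == pipeline_file.parent.parent

print(file_dir)
print(one_above)
```

With this layout, `parents[1]` is the directory one level above the pipeline file's own directory, so anything that must be shipped has to live somewhere under it.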
Where does the fail happen? If you’re using the EMR launcher it could fail either while trying to launch the job from the dagster runner or in the EMR step/job itself.
It fails in the EMR step.
What’s the exact module not found error? When I got the EMR step launcher set up I had to install many dependencies
sudo yum install -y python3-devel
sudo python3 -m pip install dagster dagster-aws boto3 dagster-spark
sudo python3 -m pip install dagster-pyspark --no-deps
Here’s a snippet from the bootstrap script that I use with EMR. I’m not 100% sure this is what is happening to you, but I hope it helps 🙂
I have installed all the dependencies. This error is for my utility method, which I am importing from a local path.
Do you think I should also add the util files to the EMR cluster?
Yep, I was just trying to find how I set mine up.
I don’t have the “best setup” for it ATM, but it essentially posts the entire repo into EMR for me
And the repo is like:
Eventually I’ll need to change this to only be:
Yeah, I want to post the entire repo. Right now it is only uploading the pipeline file. It would be great to learn how to achieve this.
Do I need to change anything for this argument?
Whenever you have some time to check your setup, please let me know. Thank you for your help, I really appreciate it!
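To illustrate why the choice of root directory matters when a directory is zipped and shipped, here is a rough sketch using only the standard library. It is not dagster's own packaging code. The repo layout, the file names, and the `zip_directory` helper are all assumptions made for the example.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_directory(root: Path, zip_path: Path) -> list:
    """Zip every .py file under root, storing paths relative to root.

    Hypothetical helper for illustration only.
    """
    names = []
    with zipfile.ZipFile(zip_path, "w") as zf:
        for file in sorted(root.rglob("*.py")):
            arcname = file.relative_to(root).as_posix()
            zf.write(file, arcname)
            names.append(arcname)
    return names


with tempfile.TemporaryDirectory() as tmp:
    # Assumed layout: my_repo/pipelines/emr_pipeline.py and my_repo/utils/helpers.py
    repo = Path(tmp) / "my_repo"
    (repo / "pipelines").mkdir(parents=True)
    (repo / "utils").mkdir()
    (repo / "pipelines" / "emr_pipeline.py").write_text("# pipeline\n")
    (repo / "utils" / "helpers.py").write_text("# helpers\n")

    out_dir = Path(tmp) / "out"
    out_dir.mkdir()

    # Zipping only the pipeline's directory leaves the utility module out.
    from_package = zip_directory(repo / "pipelines", out_dir / "package_only.zip")
    # Zipping from the repo root includes it.
    from_repo = zip_directory(repo, out_dir / "whole_repo.zip")

print(from_package)
print(from_repo)
```

In this sketch the utility module only ends up in the archive when the zip is built from a directory that contains it, which matches the kind of module-not-found failure described above.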
I have mine set up to point to
I thought -
will do the same thing but I will try parent.parent also
do you also set -
I think those settings aren’t compatible with one another, I do have
set to
I’m not sure what the exact outcome would be TBH, and it probably depends heavily on how you have your repo structured. One thing that’s pretty great about dagster is that you can run it locally: you can start dagit under the VS Code debugger and set breakpoints with pdb to see exactly which values are being passed, among other things.
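One low-tech way to follow that advice is to compute the config value in a small function and pause there. The sketch below assumes a hypothetical helper, `launcher_package_config`, and a hypothetical file location. Only the `local_pipeline_package_path` key comes from the thread.

```python
from pathlib import PurePosixPath


def launcher_package_config(pipeline_file: str, levels_up: int = 1) -> dict:
    """Build the path-related part of the step launcher config (hypothetical helper)."""
    package_path = PurePosixPath(pipeline_file).parents[levels_up]
    config = {"local_pipeline_package_path": str(package_path)}
    # breakpoint()  # uncomment to pause in pdb here and inspect `config`
    return config


# Hypothetical file location, used only for illustration.
config = launcher_package_config("/home/me/my_repo/pipelines/emr_pipeline.py")
print(config)
```

Printing or breaking on the value before launching makes it easy to confirm which directory is actually being handed to the launcher.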
That’s helpful! I will try that. Thank you so much for your time and help 🙌
No worries! Hope you’re able to get it working shortly 🙂