# ask-community
s
Hey everyone, we are experiencing an issue with the emr_pyspark_step_launcher in a local Dagster deployment. When we build a job using this resource in a standalone repository, the job executes successfully through Dagit. However, when we include this new job that leverages the emr_pyspark_step_launcher in our normal, larger code repository, we get an unusual error from the resource saying that two settings are both defined when we have only defined one of them. Attached are screenshots of our resource settings as well as the error that pops up.
r
Could you use `deploy_local_pipeline_package` instead of `deploy_local_job_package`? Looks like this is a bug in our current implementation
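For illustration, a minimal sketch of what that suggested change might look like in the launcher config, assuming the standard emr_pyspark_step_launcher fields; the cluster, region, and bucket values below are placeholders, not taken from the screenshots in this thread:

```python
from dagster_aws.emr import emr_pyspark_step_launcher

# Sketch only: all concrete values are placeholders.
emr_launcher = emr_pyspark_step_launcher.configured(
    {
        "cluster_id": "j-XXXXXXXXXXXXX",        # placeholder EMR cluster id
        "region_name": "us-east-1",             # placeholder region
        "staging_bucket": "my-staging-bucket",  # placeholder S3 staging bucket
        # Use the pipeline-package variant of the flag, per the suggestion above,
        # and leave deploy_local_job_package and the s3_*_package_path fields
        # unset so only one packaging option is configured.
        "deploy_local_pipeline_package": True,
    }
)
```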
s
Hi Rex, thanks for the response. We are trying to connect to an EMR PySpark cluster, and the pyspark step launcher only copies the folder where the job file lives. Instead, we want to copy the entire parent folder along with the other folders under it, like 'hooks'. Is there a way to copy all the folders under the parent folder instead of just the 'job' folder where the job file is located? Basically we need to copy code.zip to the PySpark EMR cluster. Is there a setting to copy the 'hooks' folder and the other folders under the same parent as well?
r
The entire package is synced if you specified the package path correctly. Do you have an `__init__.py` file in your hooks directory?
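As an illustration of what that reply describes (not code from the thread, and all names below are hypothetical), the package path would point at the parent package, with every subdirectory importable via an `__init__.py`, so sibling folders such as hooks/ get zipped and synced to EMR along with the job code:

```python
from pathlib import Path

from dagster_aws.emr import emr_pyspark_step_launcher

# Assumed layout (hypothetical names):
#
#   my_project/
#       __init__.py
#       jobs/
#           __init__.py
#           emr_job.py      <- this module
#       hooks/
#           __init__.py
#
# Pointing local_pipeline_package_path at my_project/ (the parent package)
# rather than my_project/jobs/ means hooks/ and the other subpackages are
# included in the package that gets shipped to the cluster.
PARENT_PACKAGE_PATH = str(Path(__file__).parent.parent)

emr_launcher = emr_pyspark_step_launcher.configured(
    {
        "cluster_id": "j-XXXXXXXXXXXXX",        # placeholder
        "region_name": "us-east-1",             # placeholder
        "staging_bucket": "my-staging-bucket",  # placeholder
        "deploy_local_pipeline_package": True,
        "local_pipeline_package_path": PARENT_PACKAGE_PATH,
    }
)
```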
s
Thanks.
Yes, we do have `__init__.py` files in all the folders under the parent folder.
The standalone repo works fine, but when we try to run this as one of the jobs alongside other jobs, we get these issues.
Also, we tried the `s3_pipeline_package_path` parameter and it gives the following error during execution. What has s3_path got to do with local_job_path?