# ask-community
s
Hey everyone, we are experiencing an issue with the emr_pyspark_step_launcher in a local Dagster deployment. When we build a job using this resource in a standalone repository, the job executes successfully through Dagit. However, when we include this new job that leverages the emr_pyspark_step_launcher in our normal, larger code repository, we get an unusual error from the resource saying that two settings are both defined when we have only defined one of them. Attached are screenshots of our resource settings as well as the error that pops up.
r
Could you use `deploy_local_pipeline_package` instead of `deploy_local_job_package`? Looks like this is a bug in our current implementation
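For illustration, a minimal sketch of what that suggested change might look like in the launcher config, assuming the standard emr_pyspark_step_launcher fields; the cluster, region, and bucket values below are placeholders, not taken from the screenshots in this thread:

```python
from dagster_aws.emr import emr_pyspark_step_launcher

# Sketch only: all concrete values are placeholders.
emr_launcher = emr_pyspark_step_launcher.configured(
    {
        "cluster_id": "j-XXXXXXXXXXXXX",        # placeholder EMR cluster id
        "region_name": "us-east-1",             # placeholder region
        "staging_bucket": "my-staging-bucket",  # placeholder S3 staging bucket
        # Use the pipeline-package variant of the flag, per the suggestion above,
        # and leave deploy_local_job_package and the s3_*_package_path fields
        # unset so only one packaging option is configured.
        "deploy_local_pipeline_package": True,
    }
)
```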
s
Hi Rex, thanks for the response. We are trying to connect to an EMR PySpark cluster, and the pyspark step launcher only copies the folder where the job file lives. Instead, we want to copy the entire parent folder along with the other folders under it, like 'hooks'. Is there a way to copy all the folders under the parent folder instead of just the 'job' folder where the job file is located? Basically we need to copy code.zip to the PySpark EMR cluster. Is there a setting to copy the 'hooks' folder and the other folders under the same parent as well?
r
The entire package is synced if you specified the package path correctly. Do you have an `__init__.py` file in your hooks directory?
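As an illustration of what that reply describes (not code from the thread, and all names below are hypothetical), the package path would point at the parent package, with every subdirectory importable via an `__init__.py`, so sibling folders such as hooks/ get zipped and synced to EMR along with the job code:

```python
from pathlib import Path

from dagster_aws.emr import emr_pyspark_step_launcher

# Assumed layout (hypothetical names):
#
#   my_project/
#       __init__.py
#       jobs/
#           __init__.py
#           emr_job.py      <- this module
#       hooks/
#           __init__.py
#
# Pointing local_pipeline_package_path at my_project/ (the parent package)
# rather than my_project/jobs/ means hooks/ and the other subpackages are
# included in the package that gets shipped to the cluster.
PARENT_PACKAGE_PATH = str(Path(__file__).parent.parent)

emr_launcher = emr_pyspark_step_launcher.configured(
    {
        "cluster_id": "j-XXXXXXXXXXXXX",        # placeholder
        "region_name": "us-east-1",             # placeholder
        "staging_bucket": "my-staging-bucket",  # placeholder
        "deploy_local_pipeline_package": True,
        "local_pipeline_package_path": PARENT_PACKAGE_PATH,
    }
)
```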
s
Thanks.
Yes, we do have `__init__.py` files in all the folders under the parent folder.
The standalone repo works fine, but when we try to run this as one of the jobs alongside other jobs, we get these issues.
Also, we tried the `s3_pipeline_package_path` parameter and it gives the following error during execution. What has s3_path got to do with local_job_path?