Hello, I am trying to set up a data pipeline on AWS ...
# ask-community
Hello, I am trying to set up a data pipeline on an AWS EMR cluster using Dagster. I am using `with_pyspark_emr` to do this, but when I launch Dagit and run the job I run into module import errors:
```
Traceback (most recent call last):
  File "/home/hadoop/.local/lib/python3.7/site-packages/dagster/_core/code_pointer.py", line 138, in load_python_module
    return importlib.import_module(module_name)
  File "/usr/lib64/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'with_pyspark_emr'
```
`dagster._core.errors.DagsterImportError: Encountered ImportError: No module named 'with_pyspark_emr' while importing module with_pyspark_emr. Local modules were resolved using the working directory /home/ec2-user/dagster/my-dagster-project. If another working directory should be used, please explicitly specify the appropriate path using the -d or --working-directory for CLI based targets or the working_directory configuration option for workspace targets.`

I have set up Dagster on a dev EC2 machine and am trying to run the job on EMR. The module `with_pyspark_emr` is present in /home/ec2-user/dagster/my-dagster-project, so I am not able to figure out what the issue is. Can someone please help?
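For reference, this is roughly the shape of `workspace.yaml` that the error's `working_directory` suggestion seems to point at. It is only a sketch based on the module name and path from the error above, not a confirmed fix for my setup:

```yaml
# workspace.yaml used by the Dagit process on the dev EC2 machine.
# Loads the job definitions from the with_pyspark_emr module and tells
# Dagster which directory to resolve local imports from.
load_from:
  - python_module:
      module_name: with_pyspark_emr
      working_directory: /home/ec2-user/dagster/my-dagster-project
```

As I understand it, the `-d` / `--working-directory` flag mentioned in the error does the same thing for CLI targets, e.g. `dagit -m with_pyspark_emr -d /home/ec2-user/dagster/my-dagster-project`.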