Hello, I am trying to set up a data pipeline on AWS ...
# ask-community
Hello, I am trying to set up a data pipeline on an AWS EMR cluster using Dagster. I am using `with_pyspark_emr` to do this, but when I launch Dagit and run the job I run into module import errors:
```
Traceback (most recent call last):
  File "/home/hadoop/.local/lib/python3.7/site-packages/dagster/_core/code_pointer.py", line 138, in load_python_module
    return importlib.import_module(module_name)
  File "/usr/lib64/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'with_pyspark_emr'
```
`dagster._core.errors.DagsterImportError: Encountered ImportError: No module named 'with_pyspark_emr' while importing module with_pyspark_emr. Local modules were resolved using the working directory /home/ec2-user/dagster/my-dagster-project. If another working directory should be used, please explicitly specify the appropriate path using the -d or --working-directory for CLI based targets or the working_directory configuration option for workspace targets.`

I have set up Dagster on a dev EC2 machine and am trying to run the job on EMR. The module `with_pyspark_emr` is present in /home/ec2-user/dagster/my-dagster-project, so I am not able to figure out what the issue is. Can someone please help?
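For reference, this is roughly the shape of `workspace.yaml` that the error's `working_directory` suggestion seems to point at. It is only a sketch based on the module name and path from the error above, not a confirmed fix for my setup:

```yaml
# workspace.yaml used by the Dagit process on the dev EC2 machine.
# Loads the job definitions from the with_pyspark_emr module and tells
# Dagster which directory to resolve local imports from.
load_from:
  - python_module:
      module_name: with_pyspark_emr
      working_directory: /home/ec2-user/dagster/my-dagster-project
```

As I understand it, the `-d` / `--working-directory` flag mentioned in the error does the same thing for CLI targets, e.g. `dagit -m with_pyspark_emr -d /home/ec2-user/dagster/my-dagster-project`.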