Somasundaram Sekar
06/23/2021, 7:46 PM
ModuleNotFoundError: No module named 'boto3'
Below is the mode definition:
import os

from dagster import ModeDefinition
from dagster_aws.emr import emr_pyspark_step_launcher
from dagster_aws.s3 import s3_intermediate_storage, s3_resource
from dagster_pyspark import pyspark_resource

emr_mode = ModeDefinition(
    name="emr",
    resource_defs={
        # Launches each solid as a step on the EMR cluster identified by EMR_CLUSTER_ID.
        "pyspark_step_launcher": emr_pyspark_step_launcher.configured(
            {
                "cluster_id": {"env": "EMR_CLUSTER_ID"},
                "local_pipeline_package_path": os.path.dirname(os.path.realpath(__file__)),
                "deploy_local_pipeline_package": True,
                "region_name": "eu-central-1",
                "staging_bucket": "dagster-scratch-xxxxxxx",
            }
        ),
        "pyspark": pyspark_resource,
        "s3": s3_resource,
    },
    intermediate_storage_defs=[
        s3_intermediate_storage.configured(
            {"s3_bucket": "dagster-scratch-xxxxxxx", "s3_prefix": "simple-pyspark"}
        )
    ],
)
and the full stack trace from the stdout of the EMR step:
Traceback (most recent call last):
File "/mnt/tmp/spark-47f207e0-9fdc-4577-9701-e276c50f9c5a/emr_step_main.py", line 9, in <module>
import boto3
ModuleNotFoundError: No module named 'boto3'
jordan
06/23/2021, 7:54 PM
Somasundaram Sekar
06/24/2021, 6:03 PM
jordan
06/24/2021, 6:06 PM
Whatever libraries the emr_step_main.py script needs. So presumably dagster (and perhaps also dagster-aws), boto3, etc. I'd recommend installing all of them when you bootstrap the cluster, or even building the cluster from an AMI that already has your libraries installed.
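For reference, a minimal sketch of the bootstrap-action approach jordan describes, using boto3's EMR client. The bucket path, script name, instance types, and release label below are hypothetical placeholders, not details from this thread; the bootstrap script itself would simply pip-install dagster, dagster-aws, and boto3 on every node.

import boto3

# Region matches the mode definition above; everything else (names, paths,
# instance types, release label) is a placeholder for illustration.
emr = boto3.client("emr", region_name="eu-central-1")

response = emr.run_job_flow(
    Name="dagster-pyspark",
    ReleaseLabel="emr-6.2.0",
    Applications=[{"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    # Bootstrap actions run on every node before any step executes, so the
    # libraries that emr_step_main.py imports are already installed when a
    # Dagster step is launched.
    BootstrapActions=[
        {
            "Name": "install-python-deps",
            "ScriptBootstrapAction": {
                # Hypothetical script containing something like:
                #   sudo python3 -m pip install dagster dagster-aws boto3
                "Path": "s3://dagster-scratch-xxxxxxx/bootstrap/install_deps.sh",
            },
        }
    ],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)

print("Started cluster:", response["JobFlowId"])

The returned cluster ID is what EMR_CLUSTER_ID in the mode definition would point at. Alternatively, as jordan notes, the same pip install can be baked into a custom AMI so no bootstrap step is needed at all.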