
Somasundaram Sekar

06/23/2021, 7:46 PM
EMR Step launcher throwing No module named 'boto3'
Hi, I'm trying to run a solid with PySpark code (took inspiration from here). However, the step fails with
ModuleNotFoundError: No module named 'boto3'
Below is the mode definition:
emr_mode = ModeDefinition(
    name="emr",
    resource_defs={
        "pyspark_step_launcher": emr_pyspark_step_launcher.configured(
            {
                "cluster_id": {"env": "EMR_CLUSTER_ID"},
                "local_pipeline_package_path": os.path.dirname(os.path.realpath(__file__)),
                "deploy_local_pipeline_package": True,
                "region_name": "eu-central-1",
                "staging_bucket": "dagster-scratch-xxxxxxx",
            }
        ),
        "pyspark": pyspark_resource,
        "s3": s3_resource,
    },
    intermediate_storage_defs=[
        s3_intermediate_storage.configured(
            {"s3_bucket": "dagster-scratch-xxxxxxx", "s3_prefix": "simple-pyspark"}
        )
    ],
)
and the full stack trace from the stdout of the EMR step:
Traceback (most recent call last):
  File "/mnt/tmp/spark-47f207e0-9fdc-4577-9701-e276c50f9c5a/emr_step_main.py", line 9, in <module>
    import boto3
ModuleNotFoundError: No module named 'boto3'

jordan

06/23/2021, 7:54 PM
This error is on your EMR node, correct? Have you installed boto3 on your EMR cluster? https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-bootstrap.html
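For reference, a minimal sketch of what a bootstrap action entry could look like if the cluster is created through boto3's `run_job_flow`. The helper name, action name, bucket, and script path are all hypothetical, and the shell script itself (which would pip-install the listed packages) is assumed to already exist in S3. Bootstrap actions only run at cluster creation, so an already-running cluster would need the packages installed some other way.

```python
def build_bootstrap_actions(script_s3_path, packages):
    """Build a BootstrapActions list in the shape boto3's EMR run_job_flow expects.

    script_s3_path: S3 URI of a shell script that installs its arguments (assumed to exist).
    packages: package names passed to that script as arguments.
    """
    return [
        {
            "Name": "install-python-deps",  # hypothetical action name
            "ScriptBootstrapAction": {
                "Path": script_s3_path,
                "Args": list(packages),
            },
        }
    ]


actions = build_bootstrap_actions(
    "s3://my-bucket/bootstrap/install_deps.sh",  # hypothetical bucket and script
    ["boto3", "dagster", "dagster-aws"],
)
```

The resulting list would be passed as the `BootstrapActions` argument when the cluster is launched.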

Somasundaram Sekar

06/24/2021, 6:03 PM
@jordan Indeed, that was the problem. My apologies.
But it then extended to needing the dagster libraries on EMR. Is that a requirement?

jordan

06/24/2021, 6:06 PM
Yeah - you’ll need whatever your emr_step_main.py script needs. So presumably dagster (and perhaps also dagster-aws), boto3, etc. I’d recommend installing all of them when you bootstrap the cluster or even building the cluster from an AMI that already has your libraries installed.
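One way to confirm the node has everything before submitting a step is a small preflight check run with the same Python interpreter Spark uses. This is only a sketch: the `REQUIRED` list is an assumption based on the packages named above, and `missing_modules` is a hypothetical helper, not part of dagster.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Assumed list of import names; adjust to whatever emr_step_main.py and your solids import.
REQUIRED = ["boto3", "dagster", "dagster_aws"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Note that the import name (dagster_aws) differs from the pip package name (dagster-aws).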