https://dagster.io/ logo
#ask-community
Title
# ask-community
m

Martin Remy

05/31/2022, 1:37 PM
Hello Dagster users and team, We are currently trying to onboard on Dagster with our new Pyspark pipelines running on AWS EMR. I am currently doing a PoC of your solution and trying to demonstrate that we can execute an
op
on our EMR cluster. I use the
emr_pyspark_step_launcher
demonstrated in your documentation here. I strictly copied this code and simply replaced values to match my current EMR cluster. Then I created a test case using Pytest to try and launch it. My current version is :
Copy code
import pytest

from dagster import execute_pipeline, reconstructable
from pipeline.data.jobs.pyspark_dagster import make_and_filter_data_emr

def test_emr():
    result = execute_pipeline(reconstructable(make_and_filter_data_emr))
    assert result.success
I've tried both with
execute_pipeline
and
execute_in_process
, I always have different errors raised by Dagster. Is that documentation up to date ? Is it suppose to work out of the gate ? Do you have other documentation I could look into or a solution ?
d

daniel

05/31/2022, 2:59 PM
Hi Martin - is it possible to post the text and stack trace of the error that you ran into?
m

Martin Remy

05/31/2022, 3:01 PM
Hi Daniel ! Thanks for taking the time ! With the version of the test class posted above, the error is the following :
o

owen

06/01/2022, 5:43 PM
hi @Martin Remy -- just picking up on this thread, I believe you should be able to fix this with something like:
Copy code
def get_test_job():
    return make_and_filter_data_emr

def test_emr():
    result = execute_pipeline(reconstructable(get_test_job))
    assert result.success