# announcements

benyuel

07/09/2020, 8:29 PM
👋 Is it possible to have one solid return a value that is to be used as a resource config value for another solid downstream?

alex

07/09/2020, 8:31 PM
solid outputs can only be consumed as inputs, so if i am interpreting your question correctly the answer is no
what is the problem you are trying to solve?

benyuel

07/09/2020, 8:34 PM
long story but I was going to see if I could have
• solid a -> launch emr cluster and return cluster id
• solid b -> run pyspark code using `emr_step_launcher` resource with cluster id from solid a
• solid c -> terminate emr cluster
The `emr_step_launcher` resource assumes you have a cluster id already, so solid a would orchestrate creating that cluster

alex

07/09/2020, 8:36 PM
hmm i see, ya we’ve talked about this sort of thing
what you can do is write a new `@resource` that returns the `EmrPySparkStepLauncher` and does what you describe
you can have context-manager-style `@resource` functions that `yield` their resource and then execute the code after the yield during pipeline shutdown, allowing you to tear down your cluster
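The pattern alex describes can be sketched in plain Python (no dagster imports here; `create_cluster`, `terminate_cluster`, and `make_launcher` are hypothetical stand-ins for the EMR calls and the `EmrPySparkStepLauncher` construction — in real code the generator body would live inside a dagster `@resource` function):

```python
events = []

def create_cluster():
    # hypothetical stand-in for launching an EMR cluster (e.g. via boto3)
    events.append("create")
    return "j-FAKE123"

def terminate_cluster(cluster_id):
    # hypothetical stand-in for terminating the cluster
    events.append("terminate " + cluster_id)

def make_launcher(cluster_id):
    # hypothetical stand-in for constructing an EmrPySparkStepLauncher
    return {"cluster_id": cluster_id}

def emr_cluster_resource():
    # context-manager-style resource: the framework runs the code after
    # `yield` during pipeline shutdown, so teardown happens exactly once
    cluster_id = create_cluster()
    try:
        yield make_launcher(cluster_id)
    finally:
        terminate_cluster(cluster_id)

# simulate what the framework does with a generator-style resource
gen = emr_cluster_resource()
launcher = next(gen)   # setup runs; the launcher is handed to solids
gen.close()            # pipeline shutdown: the code after the yield runs
```

With the in-process executor, every solid in the run would see the same launcher, so one cluster serves the whole pipeline.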

benyuel

07/09/2020, 8:39 PM
I think I’ve tried that, but then only one solid executes on each cluster if I have more than one pyspark solid, i.e. each pyspark solid gets a unique emr cluster. If I’m understanding you right

alex

07/09/2020, 8:39 PM
ah shoot, ya this is only really valid with the `in_process` executor

benyuel

07/09/2020, 8:40 PM
I suppose I could maybe create the cluster prior to the pipeline definition and set the cluster id as a global var that the pipeline definition would then use?

alex

07/09/2020, 8:41 PM
ya doing it as solids is a bit risky since failures will cause downstream solids to skip

benyuel

07/09/2020, 8:42 PM
ok so you’d recommend any pipeline with more than one pyspark solid to have the cluster created outside of the pipeline then?

alex

07/09/2020, 8:42 PM
probably for now, this is a problem we know we want to solve but i think we don’t have the tools needed in the framework yet
which executor are you using?

benyuel

07/09/2020, 8:45 PM
> which executor are you using?
not sure I follow, still new to dagster

alex

07/09/2020, 8:46 PM
the `Executor` decides how to federate out each step; it’s controlled by the `execution` key in your run config yaml / dict
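For example, a run config fragment of roughly this shape (a sketch against the 0.8-era dagster config schema; the `max_concurrent` value is just for illustration) selects the multiprocess executor, while omitting the `execution` key entirely falls back to in process:

```yaml
execution:
  multiprocess:
    config:
      max_concurrent: 4
```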

benyuel

07/09/2020, 8:49 PM
Where would I find that? I’m not explicitly setting it anywhere. In the logs?

alex

07/09/2020, 8:50 PM
ya you may see some `EngineEvent`s report the name
if you are not setting it, it defaults to in process, which would work with the solution i described above. if you are doing multiprocess, there may be a way to do the context-manager-style `@resource` in conjunction with using the file system / an env var for coordination to have all the solids use the same cluster id
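A hypothetical sketch of that coordination idea (`get_or_create_cluster` and the `EMR_CLUSTER_ID` env var name are made up for illustration; an env var set in the parent only reaches child processes spawned after it is set, so a shared file would be the more robust variant):

```python
import os

def get_or_create_cluster(create_cluster, env_var="EMR_CLUSTER_ID"):
    """Reuse a published cluster id if one exists, else create and publish one."""
    cluster_id = os.environ.get(env_var)
    if cluster_id:
        # another resource instance already launched the cluster;
        # reuse it and leave teardown to the creator
        return cluster_id, False
    cluster_id = create_cluster()      # first caller launches the cluster
    os.environ[env_var] = cluster_id   # publish the id for sibling resources
    return cluster_id, True            # creator is responsible for teardown
```

Each per-process resource would call this on setup and only tear the cluster down when the second return value says it owns it.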

benyuel

07/09/2020, 8:53 PM
ok, yeah I think I am using in process
I will take another look though when I retry to confirm
thanks for the help
1 follow-up question: based on the above, is the behavior for the `in_process` executor that a resource is only instantiated once even if multiple solids require the same resource?

alex

07/10/2020, 2:57 PM
Yea for in process that should be true