
Fran Sanchez

05/29/2020, 3:12 PM
I would like to better understand the execution model of dagster: RunLauncher, ExecutionPlan, Executor, etc. Where should I start?

alex

05/29/2020, 3:16 PM
the link in the docs is busted but here's a picture from the repo

https://github.com/dagster-io/dagster/blob/master/docs/sections/api/apidocs/internal/execution_flow.png

that was a snapshot of how things were layered as of March 2020
I would use this as a map, then cross-reference the API docs https://docs.dagster.io/docs/apidocs/ or just the source code itself

Fran Sanchez

05/29/2020, 3:18 PM
Thanks, I will also try to have a look at the libraries
Is there any library currently implementing the RunStep?

alex

05/29/2020, 3:18 PM
RunStep?

alex

05/29/2020, 3:19 PM
pyspark uses the StepLauncher, if that's what you mean

Fran Sanchez

05/29/2020, 3:20 PM
Maybe it's just my misunderstanding of it, but it seems that you can create runners at different levels, like step or plan?
I'm not familiar enough with it yet... need to read a bit more code to get familiar

alex

05/29/2020, 3:22 PM
ya, I think part of what you are referring to is the recently added StepLauncher: https://dagster.phacility.com/D2688

Fran Sanchez

05/29/2020, 3:22 PM
What should I implement if I want to create a custom runner? For example, I want to write something that will take a whole compiled pipeline (is this what is called an ExecutionPlan?) and then execute it using its own internal mechanism.

alex

05/29/2020, 3:23 PM
likely an Engine, so the examples to reference would be the celery and dask engines

Fran Sanchez

05/29/2020, 3:23 PM
How is that different from the k8s one?

alex

05/29/2020, 3:25 PM
they are all similar in nature - they are effectively handling how to execute each individual step, usually the key aspect being federating out the work somewhere and managing that
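A minimal sketch of what "handling how to execute each individual step" can look like, assuming a plan is just a set of steps with upstream dependencies. Step and execute_plan are hypothetical names, not dagster's API, and a ThreadPoolExecutor stands in for the celery, dask, or k8s backend a real engine would federate the work out to.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    # Hypothetical stand-in for one step of an execution plan (not dagster's class).
    name: str
    fn: Callable[..., object]
    deps: List[str] = field(default_factory=list)


def execute_plan(steps: List[Step], max_workers: int = 4) -> Dict[str, object]:
    """Submit each step to a worker pool once all of its upstream steps finished.

    The pool stands in for a remote backend: this "executor" only decides what
    to submit and when, then tracks the results that come back.
    """
    pending = {s.name: s for s in steps}
    results: Dict[str, object] = {}
    running: Dict[Future, str] = {}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending or running:
            # Federate out every step whose dependencies have produced results.
            ready = [s for s in pending.values() if all(d in results for d in s.deps)]
            for step in ready:
                inputs = [results[d] for d in step.deps]
                running[pool.submit(step.fn, *inputs)] = step.name
                del pending[step.name]
            if not running:
                raise ValueError("unsatisfiable dependencies: " + ", ".join(pending))
            # Manage the in-flight work: wait for at least one step, record its output.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                results[running.pop(fut)] = fut.result()
    return results


if __name__ == "__main__":
    plan = [
        Step("load", lambda: 2),
        Step("double", lambda x: x * 2, deps=["load"]),
        Step("square", lambda x: x * x, deps=["load"]),
        Step("sum", lambda a, b: a + b, deps=["double", "square"]),
    ]
    print(execute_plan(plan)["sum"])  # prints 8
```

Swapping the thread pool for a task queue or a job-submission client is roughly where the celery, dask, and k8s variants would differ; the dependency bookkeeping stays the same.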
f

Fran Sanchez

05/29/2020, 3:26 PM
So, is it fair to assume that every Launcher will execute something out of process?
Regardless of whether it's a StepLauncher or a RunLauncher?

alex

05/29/2020, 3:26 PM
so our k8s deployment example uses a celery_k8s_job_executor, which submits tasks to a celery queue for each step that will in turn submit k8s jobs

Fran Sanchez

05/29/2020, 3:28 PM
Ok, I think I understand better... but at the same time, does the k8s deployment launch a dagster job to orchestrate the pipeline execution, or is it done from the original process?
a

alex

05/29/2020, 3:29 PM
so the RunLauncher determines where the Executor (I called it Engine above by mistake), or run master, is operating; then the Executor decides how to handle each step, and a StepLauncher is a way to special-case steps from the default Executor behavior. The current version does this to ship pyspark solids to a spark cluster
the k8s deployment uses the K8sRunLauncher to launch the run master as its own k8s job
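The three layers described above can be sketched in plain Python to make the division of responsibilities concrete. Every class here (Step, InProcessExecutor, FakeClusterStepLauncher, LocalRunLauncher) is hypothetical and only mirrors the roles, not dagster's actual interfaces: the run launcher picks where the executor loop runs, the executor handles each step by default, and a step launcher overrides that default for specific steps.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


class FakeClusterStepLauncher:
    """Hypothetical step launcher: pretends to ship one step to a remote cluster."""

    def __init__(self) -> None:
        self.shipped: List[str] = []

    def launch_step(self, step_name: str, fn: Callable[[], object]) -> object:
        # A real one would submit the step to e.g. a spark cluster; here we just record it.
        self.shipped.append(step_name)
        return fn()


@dataclass
class Step:
    name: str
    fn: Callable[[], object]
    # Set only for steps that should bypass the executor's default behavior.
    step_launcher: Optional[FakeClusterStepLauncher] = None


class InProcessExecutor:
    """Hypothetical executor: decides how each step runs (the "run master" loop)."""

    def execute(self, steps: List[Step]) -> Dict[str, object]:
        results: Dict[str, object] = {}
        for step in steps:
            if step.step_launcher is not None:
                results[step.name] = step.step_launcher.launch_step(step.name, step.fn)
            else:
                results[step.name] = step.fn()  # default: run the step in-process
        return results


class LocalRunLauncher:
    """Hypothetical run launcher: decides *where* the executor loop operates.

    This one runs it in the current process; a k8s-style launcher would instead
    start the run master as its own job.
    """

    def launch_run(self, executor: InProcessExecutor, steps: List[Step]) -> Dict[str, object]:
        return executor.execute(steps)


if __name__ == "__main__":
    spark = FakeClusterStepLauncher()
    steps = [Step("extract", lambda: 1), Step("pyspark_transform", lambda: 2, step_launcher=spark)]
    print(LocalRunLauncher().launch_run(InProcessExecutor(), steps))  # {'extract': 1, 'pyspark_transform': 2}
    print(spark.shipped)  # ['pyspark_transform']
```

Keeping the three roles separate means each can be swapped independently: moving the run master into a k8s job does not change how individual steps are dispatched, and special-casing one step does not change the executor.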
f

Fran Sanchez

05/29/2020, 3:31 PM
I think I get it, that's a very helpful starter! 👍
Thank you, Alex
Once I fully understand it, I might try to write it down and open an MR to the docs

alex

05/29/2020, 3:33 PM
no problem - good luck!

Travis Cline

06/03/2020, 7:29 PM
it seems like this diagram link is 404ing now -- did that image move?

alex

06/03/2020, 7:33 PM

https://github.com/dagster-io/dagster/blob/master/docs/next/public/assets/images/apidocs/internal/execution_flow.png

ya it got moved to fix the broken link in the docs site

Travis Cline

06/04/2020, 2:08 AM
thanks