# ask-community
m
Silly question. What's the difference between a Dagster StepLauncher and Executor?
> StepLauncher is responsible for executing steps, either in-process or in an external process.

vs

> For the given context and execution plan, orchestrate a series of sub plan executions in a way that satisfies the whole plan being executed.
Does an `Executor` ultimately end up invoking a `StepLauncher`?
I'm currently most familiar with `k8s_job_executor`, which is a `StepDelegatingExecutor`. That has its own `StepHandler` that launches steps. For k8s, that's the `K8sStepHandler`.
But the `K8sStepHandler` just directly invokes the K8s API. No `StepLauncher` to be seen.
My confusion is based on this github discussion about the EMR step launcher.
It makes it sound like it could be a `StepHandler` or indeed even an `Executor`.
For example, the `emr_pyspark_step_launcher` has the op run inside an EMR cluster, instead of the process that the executor would normally execute it inside.
It sounds like the `StepLauncher` executes first, and for the EMR launcher, it delegates to EMR instead of invoking the `Executor`?
as far as I can tell there are only really EMR and Databricks step launchers.
`StepExecutionContext` has a `step_launcher` in it... whereas `PlanOrchestrationContext` and `StepOrchestrationContext` have an `executor`.
j
Yep you’ve caught some overlapping concepts. The current reality of how these are used:
- `executor` is responsible for launching all steps (step = an execution of an asset or op)
- the `StepDelegatingExecutor` is a specific executor which takes `StepHandler`s and uses them to launch the steps. Ideally all executors would use the `StepDelegatingExecutor` framework to dedupe more logic. We’re getting around to that slowly
- `StepLauncher`s are fairly distinct and have a confusing name given the above. They came before `StepDelegatingExecutor` or they’d be called something different. They are a way to override execution for a particular step by shipping it off to EMR or Databricks
🌈 1
❤️ 1
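To make the distinction concrete, here's a rough plain-Python sketch of the three concepts above. These classes are simplified stand-ins for illustration only, not Dagster's real classes or signatures:

```python
# Conceptual sketch only -- simplified stand-ins, not Dagster's real APIs.

class Step:
    """A single step: one execution of an op/asset."""
    def __init__(self, key, fn):
        self.key = key
        self.fn = fn

class StepHandler:
    """Knows how to launch one step somewhere (a k8s pod, a docker
    container, ...). K8sStepHandler would call the k8s API here."""
    def launch_step(self, step):
        return step.fn()

class StepDelegatingExecutor:
    """An executor that launches every step through a pluggable StepHandler."""
    def __init__(self, step_handler):
        self.step_handler = step_handler

    def execute_plan(self, steps):
        # Real executors also handle ordering, retries, health checks, etc.
        return {s.key: self.step_handler.launch_step(s) for s in steps}

# A StepLauncher, by contrast, attaches to a *particular* step and overrides
# where that step's user code actually runs (e.g. EMR or Databricks).
class StepLauncher:
    def launch_step(self, step):
        # e.g. submit step.fn to an EMR cluster and poll until it finishes
        return step.fn()

executor = StepDelegatingExecutor(StepHandler())
results = executor.execute_plan([Step("add_one", lambda: 1 + 1)])
print(results)  # {'add_one': 2}
```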
m
> have a confusing name given the above
haha, you're telling me 😛
interesting. would it be fair to say that, if we want to customize step "launching" behavior, we should look into Executors these days, rather than StepLaunchers? and specifically we should look into `StepHandler`?
j
A lovely endstate would be: all executors use the `StepDelegatingExecutor` with different default `StepHandler`s. Instead of `StepLauncher`s, you can override the `StepHandler` for a given step
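That hypothetical endstate could be sketched like this. Everything here is made up to illustrate the idea of a per-step handler override; none of these names are real Dagster APIs:

```python
# Purely hypothetical sketch of the "endstate" described above: one
# delegating executor, with a default handler that individual steps can
# override. Illustrative names only -- not real Dagster APIs.

def execute_plan(steps, default_handler, overrides=None):
    """Launch each step with its override handler if present, else default."""
    overrides = overrides or {}
    return {
        key: overrides.get(key, default_handler)(fn)
        for key, fn in steps.items()
    }

in_process = lambda fn: fn()            # default: run the op locally
emr_handler = lambda fn: f"emr:{fn()}"  # pretend: ship the op to EMR

results = execute_plan(
    {"local_step": lambda: 1, "spark_step": lambda: 2},
    default_handler=in_process,
    overrides={"spark_step": emr_handler},
)
print(results)  # {'local_step': 1, 'spark_step': 'emr:2'}
```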
m
jinx!
We're looking into setting up a Kubernetes Spark (as opposed to EMR, Databricks) step launcher thingy. And all the examples I had seen for Spark were `StepLauncher`s, although all my experience has been with `k8s_job_executor`.
It's a little unfortunate that all the specific Spark interop examples are in an interface I can't / shouldn't use, but I think this clarifies the direction for me a bit.
j
Got it. For this use case I think I’d actually recommend a StepLauncher for now, since it sticks with the pattern we currently have going
👍 1
m
yeah, that definitely makes sense
j
The `StepHandler`s so far have only been used for K8s and Docker, and I’d be worried about hitting some limitation with the current health monitoring API or something like that, versus the step launcher pattern, which is more tried and tested.
👍 1
m
got it. thank you!
j
Np!
m
I guess a related question: do you imagine an `Executor` and a `StepLauncher` would play nice within the same job?
j
Yes
m
perfect 🙂
j
All jobs have an executor. Some ops have a step launcher
👍 1
Under the hood, when you’re using a step launcher:
- the executor still spins up a process/container/pod for the given step
- that process/container/pod makes the call out to Databricks/EMR, instead of invoking the Python code locally. It polls until whatever it launched completes
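The submit-then-poll flow described above might look roughly like this. The external-system names (`fake_external_runs`, the run states, the run id) are invented for the sketch, not real EMR/Databricks or Dagster APIs:

```python
# Hedged sketch of what the step process does under a step launcher:
# instead of running the op's Python locally, it submits the work to an
# external system and polls until that remote run finishes.
import time

def fake_external_runs():
    """Stand-in for EMR/Databricks: each poll advances the remote run."""
    states = iter(["PENDING", "RUNNING", "RUNNING", "COMPLETED"])
    return lambda run_id: next(states)

def run_step_via_launcher(poll_interval=0.0):
    get_run_state = fake_external_runs()
    run_id = "external-run-123"  # would come back from the submit call
    while True:
        state = get_run_state(run_id)
        if state in ("COMPLETED", "FAILED"):
            return state
        time.sleep(poll_interval)  # the step process just waits and polls

print(run_step_via_launcher())  # COMPLETED
```

This is the overhead mentioned below: the step process does no real work itself, but keeping it alive gives the executor something local to monitor.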
m
Ahhh interesting. So there is still a step running in the background.
j
Correct
It adds a bit of overhead but makes monitoring the launched thing easier
m
We're interested in removing that step pod, because it has a habit of getting killed and our jobs lose all Spark progress. This is just an artifact of our K8s clusters aggressively cycling nodes. Run pods are stable, but step pods are treated as fault tolerant / can be descheduled. I guess we could just use the Child Process Executor for Spark steps, so the step would run on the run pod.
OK, well either way, this gives me some very useful context to start figuring out our strategy.
👍 1
j
If you absolutely needed to remove the step pod and didn’t want to do the multiprocess executor, then yeah I’d recommend trying out a step handler. We currently only support one `StepHandler` per job though, so the job would need to be purely Spark.