# ask-community
a
Hey! I wonder if I can combine/nest jobs running on multiple pieces of hardware, in multiple parts? What I’ve got working right now is:
1. Job `A` starts on our Dagster cluster.
2. Job `A` creates an external environment.
3. Job `A` launches execution of job `B` in the external environment.
My question is whether there’s a recommended way to tie `A` and `B` together, since functionally job `A` is only complete when job `B` is done, because in an ideal world `A` would destroy the external environment upon completion.
m
Hi Arturs -- I think there are a bunch of ways you could accomplish this -- does job `A` really need to be a separate job, or could you write a job factory that embedded the graph for job `B` into the graph for job `A`? So that the setup ops from `A` ran ahead of the business logic for `B` and then the teardown ops ran after it was complete?
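A minimal sketch of the job-factory pattern described above, assuming `B`'s graph can accept the cluster id as an input and returns a single output; all op/graph names here are illustrative placeholders, not anything from this thread:

```python
from dagster import graph, job, op


@op
def provision_cluster() -> str:
    # Hypothetical: survey the environment, size the Spark cluster, and create it.
    return "cluster-123"  # placeholder for the real provisioning call


@op
def teardown_cluster(result) -> None:
    # Hypothetical: destroy the cluster once the business logic has finished.
    pass


@op
def spark_workload(cluster_id: str) -> str:
    # Stand-in for job B's real ops.
    return f"ran on {cluster_id}"


@graph
def business_logic(cluster_id: str):
    # Placeholder graph standing in for job B's graph.
    return spark_workload(cluster_id)


def make_combined_job(business_graph):
    """Job factory: embed a business-logic graph between A's setup and teardown ops."""

    @job(name=f"{business_graph.name}_with_cluster")
    def combined():
        teardown_cluster(business_graph(provision_cluster()))

    return combined


combined_job = make_combined_job(business_logic)
```

With this shape, the setup, the embedded graph, and the teardown all appear as steps of one run in Dagit.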
a
@max `B` is a reasonably complex Dagster job (or graph, I guess) that dynamically assembles itself into anywhere between hundreds and thousands of ops, depending on execution-time circumstances. For that reason, I would prefer to be able to actually review individual steps in Dagit if necessary. I’m not sure what a job factory would be in this case, but basically what I want to do is tie together the construction/deconstruction job `A` with the business-logic job `B` so that the entire process can be visually reviewed in a single flamegraph.
m
is the reason that they aren't a single job that job `A` is re-used elsewhere?
a
If that’s impossible, it’s not a blocking problem - it would just make for a really annoying Dagit user experience, where a reviewer will open failed `A` job `a1b2c3` and will then need to figure out that it is connected to the failed `B` run `x4y5z6`.
m
i guess i'm just asking why the jobs are separate? if you want elements of job `A` to run after job `B` is complete
a
They’re not a single job because the environment for `B` is ephemeral, and `B` must be executed inside a specific environment that is at all times external to both Dagit and the Dagster Daemon. `B` happens on a Spark cluster.
m
so the ops in job `B` instigate compute on the Spark cluster?
a
The job of `A` is to survey the environment, determine the spec of the Spark cluster necessary, provision the cluster, submit the job to the cluster, and close the cluster once the job has completed. `B` is a PySpark project, yes.
m
it seems to me like logically this is a single job (unless the right way to think of `A` is as a context manager/environment provisioner that gets reused for jobs `B`, `C`, ...)
if that's the case, i'm having a hard time understanding what technical limitations require it to be split into two jobs -- there's no reason that ops can't launch Spark jobs on clusters created by upstream ops (and then cleaned up by downstream ops)
a
That’s technologically impossible - in this situation there’s a hard requirement that I run the equivalent of `spark-submit b.py` locally, from the Spark cluster’s perspective.
My question is basically whether I can connect a graph running in a 3rd-party environment to an ongoing job and insert it between the op that caused it to be executed and the op that is scheduled next.
m
you can certainly have an op which kicks off the dependent job
that op can emit structured metadata which can provide links to the dependent job
depending on exactly what you're doing, you may also be able to use functionality like asset sensors to introduce loose coupling between your various jobs
you might also be able to use the resource system to provision your ephemeral cluster (there are models in the codebase of how to do this with EMR)
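A rough sketch of that last suggestion - provisioning the ephemeral cluster via a yield-style resource so setup happens before the ops that need it and teardown happens when the run finishes; `create_cluster`/`destroy_cluster` and the resource key are placeholder names, not a real Dagster or EMR API:

```python
from dagster import job, op, resource


def create_cluster():
    # Placeholder for the real provisioning call (EMR, Dataproc, etc.).
    return {"id": "cluster-123"}


def destroy_cluster(cluster):
    # Placeholder for the real teardown call.
    pass


@resource
def ephemeral_spark_cluster(init_context):
    # Yield-style resources behave like context managers: setup runs before the
    # ops that require the resource, teardown runs once the run is finished.
    cluster = create_cluster()
    try:
        yield cluster
    finally:
        destroy_cluster(cluster)


@op(required_resource_keys={"spark_cluster"})
def submit_spark_job(context):
    context.log.info(f"submitting to {context.resources.spark_cluster['id']}")


@job(resource_defs={"spark_cluster": ephemeral_spark_cluster})
def job_a():
    submit_spark_job()
```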
maybe i should have asked this earlier -- is `B` a Dagster job or a Spark job?
a
It’s both. It is a Dagster job that dynamically generates a set of Spark operations at runtime. It’s several nested layers of dynamic mapping ops.
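For context, the dynamic-mapping pattern described above looks roughly like this in Dagster (illustrative names; the real job fans out into hundreds or thousands of mapped ops):

```python
from dagster import DynamicOut, DynamicOutput, job, op


@op(out=DynamicOut())
def discover_workloads():
    # Hypothetical: inspect runtime inputs and fan out one output per workload.
    for idx in range(3):
        yield DynamicOutput(idx, mapping_key=f"workload_{idx}")


@op
def run_spark_workload(workload):
    # Hypothetical Spark step for a single workload.
    return workload


@op
def summarize(results):
    return len(results)


@job
def job_b():
    results = discover_workloads().map(run_spark_workload)
    summarize(results.collect())
```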
Is there a documentation page describing dependent jobs functionality? Changing our provisioning logic for the sake of this is not a feasible option, I think.
m
all i mean is that an op can call the python or graphql apis to start another job
but again i'm having trouble understanding why that's necessary in your case -- an op that generates a set of Spark operations feels like it should be able to run within any Dagster graph -- so hard to see why the jobs need to be separate
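A hedged sketch of what such an op might look like, using the GraphQL client to kick off the dependent job and recording a link to it as structured metadata; the host, port, repository coordinates, and run URL format are assumptions about the deployment, not values from this thread:

```python
from dagster import MetadataValue, op
from dagster_graphql import DagsterGraphQLClient


@op
def launch_job_b(context):
    # Assumed Dagit host/port and repository coordinates -- adjust to the deployment.
    client = DagsterGraphQLClient("dagit.internal", port_number=3000)
    run_id = client.submit_job_execution(
        "job_b",
        repository_location_name="my_location",
        repository_name="my_repo",
        run_config={},
    )
    # Surface a link to the dependent run as structured metadata on this op's output
    # (the URL format below is illustrative).
    context.add_output_metadata(
        {"job_b_run": MetadataValue.url(f"http://dagit.internal:3000/runs/{run_id}")}
    )
    return run_id
```

The metadata entry then shows up in the op's structured event log in Dagit, giving a reviewer a direct link from `A`'s run to `B`'s run.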
a
Because it cannot execute on the same hardware - that’s technologically impossible in our hardware configuration.
m
gotcha, so you can't remotely execute your spark-submit call
a
I can and I do, that’s the problem.
When you perform a spark-submit call, your Spark cluster creates a new set of Python processes, marshalled by it and local to its workers.
So in that process I have a Dagster job executing that feeds workloads to Spark.
m
so you are running your Dagster job (`B`) within a Spark job?
a
Due to the way our infrastructure is set up, it is not possible for me to feed individual workloads to our Spark cluster.
Yes, correct, I should’ve expressed myself more clearly about that when you asked before.
m
i see
and is the primary concern to find a way to link the two jobs in the UI, or is it more around control flow -- like ensuring that job B has completed before running the cleanup step from job A
a
Just the user experience. I can handle the cleanup step via environment configuration, by setting up self-destruction triggered by time spent idle - so long as the job was actually visible to Dagit while executing, and Dagit has the logs from the job if it failed.
Although I guess logs also aren’t that critical - I don’t log locally to the cluster anyway. So yeah, the primary concern is to have them linked in the UI.
m
You can attach arbitrary tags to job runs -- so for instance if you wanted to tag job `A`'s run id on the run of job `B`
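A minimal sketch of how `B`'s run could pick up such a tag, assuming `B` is executed by the entry point that `A` spark-submits, that the Spark driver shares a `DagsterInstance` with Dagit (same `DAGSTER_HOME` storage), and that `A` passes its run id via an environment variable -- all assumptions, not details from this thread:

```python
# b.py -- entry point executed on the Spark driver via spark-submit
import os

from dagster import DagsterInstance, execute_job, reconstructable

from my_project.jobs import job_b  # hypothetical module containing job B


if __name__ == "__main__":
    # Run id of job A, passed along by the op that performed the spark-submit.
    parent_run_id = os.environ.get("PARENT_DAGSTER_RUN_ID", "unknown")

    execute_job(
        reconstructable(job_b),
        instance=DagsterInstance.get(),
        # Arbitrary tag tying B's run back to A's run, searchable in Dagit.
        tags={"parent_run_id": parent_run_id},
    )
```

The tag then appears on `B`'s run in Dagit and can be used to filter the runs page for everything triggered by a given `A` run.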
a
That will do, cheers. Any documentation pointers if you have them handy?
m
@sandy