Arturs Stramkals
11/25/2021, 10:58 AM
1. Job A starts on our Dagster cluster,
2. Job A creates an external environment,
3. Job A launches execution of job B in the external environment.
My question is whether there's a recommended way to tie A and B together, since functionally job A is complete only when job B is done, because A
would, in an ideal world, destroy the external environment upon completion.
max
11/29/2021, 5:02 PM
Does A really need to be a separate job, or could you write a job factory that embedded the graph for job B into the graph for job A? So that the setup ops from A ran ahead of the business logic for B
and then the teardown ops ran after it was complete?
Arturs Stramkals
12/02/2021, 5:08 PM
B is a reasonably complex Dagster job (or graph, I guess) that dynamically assembles itself into anywhere between hundreds and thousands of ops, depending on execution-time circumstances. For that reason, I would prefer to be able to review individual steps in Dagit if necessary. I'm not sure what a job factory would be in this case, but basically what I want to do is tie together the construction-deconstruction job A with the business logic job B
so that the entire process can be visually reviewed in a single flamegraph.
max
12/02/2021, 5:10 PM
Is A re-used elsewhere?

Arturs Stramkals
12/02/2021, 5:11 PM
I would see A job a1b2c3 and would then need to figure out that it is connected to the failed B run x4y5z6.

max
12/02/2021, 5:12 PM
You could set up A to run after job B is complete.
Arturs Stramkals
12/02/2021, 5:12 PM
B is ephemeral, and B must be executed inside a specific environment that is at all times external to both Dagit and the Dagster daemon. B happens on a Spark cluster.

max
12/02/2021, 5:13 PM
Does B instigate compute on the Spark cluster?

Arturs Stramkals
12/02/2021, 5:13 PM
A's role is to survey the environment, determine the spec of the Spark cluster necessary, provision the cluster, submit the job to the cluster, and close the cluster once the job has completed. B is a PySpark project, yes.

max
12/02/2021, 5:14 PM
(So A is a context manager/environment provisioner that gets reused for jobs B, C, ...)
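That framing can be sketched in plain Python as a context manager that provisions, yields, and always tears down. All names here are invented stand-ins, not real provisioning code:

```python
from contextlib import contextmanager

@contextmanager
def spark_environment(spec):
    # Provision an environment sized to `spec` (stand-in logic).
    cluster = {"id": "cluster-123", "spec": spec, "closed": False}
    try:
        yield cluster
    finally:
        # Teardown always runs, even if the job inside fails.
        cluster["closed"] = True

def run_job(cluster, script):
    # Stand-in for `spark-submit <script>` against the cluster.
    return f"{script} ran on {cluster['id']}"

with spark_environment({"workers": 4}) as cluster:
    result = run_job(cluster, "b.py")
```

The try/finally guarantees the "close the cluster once the job has completed" step, which is exactly the property Arturs wants from A.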
Arturs Stramkals
12/02/2021, 5:15 PM
spark-submit b.py runs locally, from the Spark cluster's perspective.

max
12/02/2021, 5:17 PM
Is B a Dagster job or a Spark job?

Arturs Stramkals
12/02/2021, 5:20 PM

max
12/02/2021, 5:22 PM

Arturs Stramkals
12/02/2021, 5:23 PM

max
12/02/2021, 5:24 PM

Arturs Stramkals
12/02/2021, 5:24 PM

max
12/02/2021, 5:26 PM
(B) within a Spark job?

Arturs Stramkals
12/02/2021, 5:27 PM

max
12/02/2021, 5:27 PM

Arturs Stramkals
12/02/2021, 5:30 PM

max
12/02/2021, 6:10 PM
B

Arturs Stramkals
12/02/2021, 6:32 PM

max
12/02/2021, 6:53 PM