dagster #ask-community

Hi friends! I have 3 jobs, A, B, and C. They each...

Moody Edghaim

02/27/2023, 3:35 PM

Hi friends! I have 3 jobs, A, B, and C. They each resemble:

from dagster import graph, job

@graph
def job_A_graph():
   some_op()  # some_op is an @op defined elsewhere

@job
def A():
   job_A_graph()

Jobs A, B, and C are strung together to form a fourth "mega" job, using the graph from each of A, B, and C. It's important that I keep the ability to run these separately, but also through the mega job if the situation arises.

I would like these graphs to run sequentially, but job B doesn't have any dependencies, so it runs concurrently with job A. Creating data dependencies in the "mega" job is tricky, because it uses the graph from every other job. Graphs seem to be a lot more strict than ops (each input must map to an op and be "used", return types cannot be None), so it's quite difficult to tie these graphs together without introducing extra boilerplate. The abstraction for creating a nothing dependency between ops/graphs leaves a bit to be desired (it's hard to read and reason about, imo - see here).

A few questions:
• Is there a reason we can't use the None return of a graph to create a data dependency on another graph? This is possible with ops. I'm trying to understand the context as to why.
• Is there a "cleaner" way to create dependencies between graphs? In my example, there would be quite a lot of extra boilerplate to:
  ◦ create a graph "wrapper" around graphs which need a dependency
  ◦ have the previous graph pass some "unused" input into the graph wrapper
  ◦ have the graph wrapper introduce a new op just to consume the unused input.

👍 1

sandy

02/27/2023, 7:48 PM

Hi @Moody Edghaim - I wrote up a GitHub Discussion on the current state of this here: https://github.com/dagster-io/dagster/discussions/9930. This is definitely not the smoothest part of Dagster right now.

🙏 1

Moody Edghaim

02/27/2023, 9:01 PM

I appreciate the write-up and example solution (and answering in GH discussions 😍)! Unfortunately, this essentially nullifies the modularity of graphs A, B, and C, as I want to be able to run them either strung together in the "mega-job" which uses all graphs, or as stand-alone jobs, where inter-graph dependency doesn't matter since there's only 1 graph. I can (and probably will) introduce graph wrappers, but it's a lot of extra boilerplate I was hoping to avoid.

Also, a quick follow-up question: is it possible to do this with a graph which ends with conditional branching?

@graph
def graph_A(foo):
    run, skip = should_run(foo)

    find_bar(run)
    skip_bar(skip)

sandy

02/28/2023, 12:10 AM

as long as one branch returns a value, the downstream ops should run

Piotr Danielczyk

05/19/2023, 1:38 PM

@sandy is the status on running graphs sequentially still the same? Is there any plan to implement it in the future? Thank you

Piotr Danielczyk

05/19/2023, 1:48 PM

And @Moody Edghaim, how did you end up solving this issue? Thanks

sandy

05/31/2023, 11:57 PM

@Piotr Danielczyk there haven't been changes on this since this discussion. We don't have near-term plans for it at the moment, although we are interested in addressing it eventually.
