Hi Team I was trying to implement a specific use case that I dagster #ask-community

Madhu D

08/30/2023, 6:56 AM

Hi Team, I was trying to implement a specific use case and have run into a blocker. I would appreciate it if anyone could point me in the right direction, with the relevant documentation links.

The details of the use case are as follows. We have a sample project structure as below:

    Project_main_dir
    |
    |--ops
    |   |-op1.py
    |   |-op2.py
    |
    |--graphs
    |   |-graph1.py
    |   |-graph2.py
    |
    |--nestedgraphs
    |   |-nestedgraph1.py
    |   |-nestedgraph2.py
    |
    |--jobs
        |-job1.py

Setup: graph1.py and graph2.py tie the ops together according to the dependencies we need, so that the ops execute in a specific order. graph1 and graph2 are invoked inside the graph in nestedgraph1.py, and each of them returns an output to a variable, say graph1_op and graph2_op. job1.py in turn invokes the graphs from nestedgraph1.py and nestedgraph2.py, each of which returns its outputs to a variable, say nestedgraph1_op and nestedgraph2_op.

Expectation: I want graph1_op and graph2_op to be returned together from nestedgraph1.py to job1. When job1 runs, a dependency should be created between the nestedgraph invocations so that the ops of nestedgraph2 start executing only after the ops that generate nestedgraph1_op (that is, both graph1_op and graph2_op) have finished.

The pseudocode for the above scenario is something like the following. Please note that the nestedgraph and job files are generated programmatically on the basis of some input JSON.

op1.py:

    imports ....

    @op
    def op1():
        ...op1 logic
        return op1_op

op2.py:

    imports ....

    @op
    def op2():
        ...op2 logic
        return op2_op

graph1.py:

    imports ....

    @graph
    def graph1():
        op1_op = op1()
        return op1_op

graph2.py:

    imports ....

    @graph
    def graph2():
        op2_op = op2()
        return op2_op

nestedgraph1.py:

    imports ....

    @graph
    def nestedgraph1():
        graph1_op = graph1()
        graph2_op = graph2()
        return graph1_op, graph2_op

job1.py:

    imports ...

    @job
    def job1():
        x, y = nestedgraph1()
        nestedgraph2([x, y])

Problem: I am unable to establish a dependency on both x and y for the nestedgraph2 invocation. When I explore the job in the Dagit UI, the dependency currently maps to either graph1_op or graph2_op. I assume this is because that particular graph output was generated first and Dagster mapped it automatically.

Also, in my use case nestedgraph1 can return more than two outputs. In that case I want nestedgraph2 to depend on all of the outputs of nestedgraph1 before it is invoked inside job1.

Thanks for the help in advance!
