# ask-community
s
Hey all, I was curious about the best practices around creating jobs that trigger runs of other jobs, or whether this is even a practice within Dagster. I haven't been able to find any documentation on this, but I know it's a pattern used in other orchestration tooling.
j
Hi @Scott Hood you might be able to use sensors for your use case. In particular, run status sensors would be able to watch a job for a particular status (passed, failed, etc) and then launch a new job accordingly https://docs.dagster.io/concepts/partitions-schedules-sensors/sensors#run-status-sensors
s
Hey @jamie, interesting. I was actually thinking it would be possible to simply call `RunRequest` (or something like that) from within the job itself. One issue with using a sensor is that you lose the flow visualization of "this job calls job X to do work". If this were a popular job called by many parties, that could make troubleshooting difficult, since you'd have to track down which run was related to your job.
j
Ok, that's good context. Another way to think about this would be to use the graph abstraction. Let's say you have job A and you want it to call job B, but maybe there's also a job C that calls job B. If you instead use the graph abstraction, you can create a job that contains graph A and graph B, and another job that contains graph C and graph B.
s
Cool, I didn't know that was a feature. I knew you could share ops and use resources to share functionality between jobs, but I didn't know you could combine graphs. One thing, though: say you wanted job-like monitoring around graph B. The graph abstraction would treat the A/B and C/B combinations as two jobs rather than three, correct? So although you'd gain the ability to share that functionality, you couldn't easily track which parties were using the graph without looking at the code itself?
Not quite logging, but I did notice that in the ops view, each op has an "All invocations" tab at the bottom that links to every job using that op.
j
Yeah, job A/B and job C/B would be considered two separate jobs. For example, this file would create 3 different jobs (the third job is just `graphA`):
```python
from dagster import graph, job

# op1–op6 are assumed to be ops defined elsewhere

@graph
def graphA():
    op1()
    op2()
    ...

@graph
def graphB():
    op3()
    op4()
    ...

@graph
def graphC():
    op5()
    op6()
    ...

@job
def job1():
    graphA()
    graphB()

@job
def job2():
    graphC()
    graphB()

job3 = graphA.to_job(...)
```
s
@jamie regarding the run_status_sensor: does that only work if it's in the same repository as the jobs you want to listen to? Or could you have a "shared repository" set one up and listen to all other repositories' jobs?
j
I think the way it works is that a run status sensor set up in a repository will listen to all of the jobs in that repository. But if you had two repositories, you'd need to include the run status sensor in both (not 100% sure that's possible, I'd have to test it out), or have two run status sensors, one in each repository.
s
That's my interpretation as well. Still weighing the pros and cons of multiple repositories for our organization.