Nicolas Gaillard

12/16/2020, 10:46 AM
Hello guys, I'm new to Dagster (using 0.9.18 and running inside a container) and I was wondering whether it is possible to chain several pipelines? For example, I want to start my pipeline_2 as soon as pipeline_1 is completed. Thank you in advance and have a nice day!

Noah K

12/16/2020, 10:49 AM
You could make a solid that hits the GraphQL API
Though I think a single pipeline with a composite solid would be the more Correct™ way to handle it

Nicolas Gaillard

12/16/2020, 11:07 AM
I don't quite understand. Would that mean that my "global" pipeline contains a composite solid, which in turn contains two solids corresponding to my two pipelines?

sandy

12/16/2020, 4:23 PM
@Nicolas Gaillard that's exactly right. Out of curiosity, why separate them into two separate pipelines? Are they in different logical domains, so it's helpful to organize them that way? Do they have different sets of dependencies? Do you want to be able to run them at different cadences?

Nicolas Gaillard

12/16/2020, 4:37 PM
The objective is to process data from the same business domain (collection, transformation and exposure). In the case where the provisioning doesn't work, it forces me to restart the entire pipeline. In addition, I expose the data to different services and at different frequencies. Finally, in order to avoid a huge pipeline with a multitude of solids (even when performing abstractions using composite solids), I find it reasonable to separate the pipelines. What do you think?

sandy

12/16/2020, 5:26 PM
Got it. Will get back to you with a code example soon.
"In the case where the provisioning doesn't work, it forces me to restart the entire pipeline."
Would you mind expanding on this a little bit? The issue is that you experience failures while provisioning for individual solids within a pipeline?

Nicolas Gaillard

12/16/2020, 5:35 PM
Awesome, can't wait to read it!
Let's imagine that I have a daily pipeline that collects CSV data in one solid, performs transformations with Pandas in a second, stores the data in a database in a third, and generates a view (using this data) in a fourth. For some reason, my fourth solid may fail occasionally (other data not ready, for example). It seems to me that I can't easily restart only this solid via the UI, and therefore I have to restart the whole pipeline.

sandy

12/16/2020, 5:39 PM
Do you know what intermediate storage you're using? The default is in-memory, which means that any outputs produced by the earlier steps will disappear between runs
I wrote up a code snippet here: https://github.com/dagster-io/dagster/discussions/3436. Let me know if you have any questions. It's a little bit involved - we're interested in making this simpler in the future. Note that it depends on some internal APIs, so it might get broken in a future release.

Noah K

12/16/2020, 11:49 PM
Having a library like dagster-graphql-solids might be nice
Was kind of surprised there wasn't already one 🙂

sandy

12/16/2020, 11:51 PM
Yeah - this has been one of the most upvoted issues for a while: https://github.com/dagster-io/dagster/issues/2674

Noah K

12/16/2020, 11:53 PM
Though counterpoint, it's very easy to write one myself 😄
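As a rough illustration of the "solid that hits the GraphQL API" idea, the sketch below uses only the Python standard library to build and send a launch request to Dagit. The mutation name, the `ExecutionParams` fields, and the `LaunchPipelineRunSuccess` result type are assumptions based on the 0.9.x GraphQL schema and should be checked against the schema served by your own Dagit instance. The helper names and the URL are hypothetical.

```python
import json
import urllib.request

# Assumed mutation shape for Dagster 0.9.x; verify it against your Dagit schema.
LAUNCH_MUTATION = """
mutation Launch($executionParams: ExecutionParams!) {
  launchPipelineExecution(executionParams: $executionParams) {
    __typename
    ... on LaunchPipelineRunSuccess {
      run { runId }
    }
    ... on PythonError {
      message
    }
  }
}
"""


def build_launch_request(dagit_url, location, repository, pipeline_name,
                         run_config=None, mode="default"):
    """Build (but do not send) the HTTP request that asks Dagit to launch a run."""
    payload = {
        "query": LAUNCH_MUTATION,
        "variables": {
            "executionParams": {
                "selector": {
                    "repositoryLocationName": location,
                    "repositoryName": repository,
                    "pipelineName": pipeline_name,
                },
                "runConfigData": run_config or {},
                "mode": mode,
            }
        },
    }
    return urllib.request.Request(
        dagit_url.rstrip("/") + "/graphql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def launch_pipeline(dagit_url, location, repository, pipeline_name, run_config=None):
    """Send the launch request and return the decoded JSON response."""
    request = build_launch_request(
        dagit_url, location, repository, pipeline_name, run_config
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `launch_pipeline` from the body of the last solid in pipeline_1 would start pipeline_2 once the upstream steps have succeeded. Note that it only launches the run and does not wait for it to finish.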

Nicolas Gaillard

12/17/2020, 8:30 AM
Thank you for your messages and yes, I do use in-memory intermediate storage. Thanks again for the snippet, it looks great. I'll try to integrate it and will get back to you if I have any questions or to give you feedback!

sandy

12/17/2020, 3:46 PM
If you switch to a different intermediate storage, like the filesystem one, you'll be able to kick off pipelines from the middle
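A small sketch of what switching storage could look like in run config, expressed as a Python dict. The `intermediate_storage` / `filesystem` / `base_dir` keys are assumptions about the 0.9.x config schema (earlier releases used a top-level `storage` key), so check them against the config editor in Dagit. The helper name is hypothetical.

```python
def with_filesystem_storage(run_config, base_dir=None):
    """Return a copy of run_config that persists intermediates on disk.

    Assumed 0.9.x schema: {"intermediate_storage": {"filesystem": {...}}}.
    """
    merged = dict(run_config)
    filesystem = {"config": {"base_dir": base_dir}} if base_dir else {}
    merged["intermediate_storage"] = {"filesystem": filesystem}
    return merged


# Hypothetical usage: pass the result as run_config when launching the pipeline,
# e.g. execute_pipeline(my_pipeline, run_config=RUN_CONFIG).
RUN_CONFIG = with_filesystem_storage({}, base_dir="/tmp/dagster_intermediates")
```

With outputs persisted on disk, a failed run can be re-executed from the failing solid, because the upstream outputs are still available to load.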