Oren Lederman

02/18/2023, 6:50 PM
Is there a way to "group" several ops together so that if one of them fails, all three of them will need to rerun? In my example, I have a model training pipeline that (among other ops) includes an op that downloads a large dataset to local storage, an op that calls a shell command to train a model, and an op that writes the results to a model store. The problem here is the local storage, of course. I can't rely on local storage to persist between ops (for example if one fails). I could combine all three steps into one op (but then they are hard to test), and I could use a persistent volume in k8s (but it adds some complexity). So I was thinking that if I can somehow group them together (if one fails, all fail and need to be part of a rerun) the data I need is going to be available in local storage. Any thoughts? Is it too much of an anti-pattern?
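One way to soften the "hard to test" downside of the single-op option is to keep each step as a plain Python function and have one thin wrapper run all three inside a temporary directory. The sketch below uses only the standard library. The function names, the placeholder dataset, the stand-in training command, and the dict used as a "model store" are all hypothetical; in Dagster the body of `train_pipeline_step` would presumably live inside one `@op`, so a failure anywhere reruns the whole unit.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def download_dataset(workdir: Path) -> Path:
    """Hypothetical stand-in for the large download: writes a tiny file."""
    dataset = workdir / "dataset.txt"
    dataset.write_text("1\n2\n3\n")
    return dataset


def train_model(dataset: Path, workdir: Path) -> Path:
    """Hypothetical stand-in for the shell training command.

    Runs a child process that sums the numbers in the dataset and
    writes the total to a "model" file in the same working directory.
    """
    model = workdir / "model.txt"
    script = (
        "import sys; "
        "total = sum(int(line) for line in open(sys.argv[1])); "
        "open(sys.argv[2], 'w').write(str(total))"
    )
    # check=True raises CalledProcessError if the command fails.
    subprocess.run(
        [sys.executable, "-c", script, str(dataset), str(model)],
        check=True,
    )
    return model


def publish_model(model: Path, model_store: dict) -> None:
    """Hypothetical model store: a dict standing in for a remote service."""
    model_store["latest"] = model.read_text()


def train_pipeline_step(model_store: dict) -> None:
    """All three steps share one temporary directory.

    If any step raises, the directory is cleaned up and the whole
    function has to be rerun, which is the "grouped" behavior asked about.
    """
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        dataset = download_dataset(workdir)
        model = train_model(dataset, workdir)
        publish_model(model, model_store)
```

Each helper can still be unit tested on its own by passing it a temporary directory, while the orchestrator only sees a single step.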

Eduardo Muñoz

02/19/2023, 7:55 AM
I also have a similar problem with downloading a large file. I need to download it in chunks and then decompress it, also in chunks. With vanilla Python this is simple: I would save the file and decompress it in the /tmp folder. I would like to know the Dagster way of doing it. DynamicOutputs seems to be the best fit; however, it feels like reinventing the wheel for a simple problem that is already solved.
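For reference, the "vanilla Python" approach described here might look roughly like the sketch below. It assumes the file is gzip-compressed (the message does not say which format is used), and the function name and default chunk size are hypothetical. The source can be any binary file-like object, for example an open file or an HTTP response body.

```python
import gzip
import io
import zlib
from typing import BinaryIO


def decompress_in_chunks(
    src: BinaryIO, dst: BinaryIO, chunk_size: int = 1024 * 1024
) -> int:
    """Stream-decompress gzip data from src to dst; return bytes written.

    Assumption: the payload is a single gzip stream. Only one compressed
    chunk is held in memory at a time.
    """
    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
    written = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        data = decompressor.decompress(chunk)
        dst.write(data)
        written += len(data)
    # Emit anything still buffered inside the decompressor.
    tail = decompressor.flush()
    dst.write(tail)
    return written + len(tail)


if __name__ == "__main__":
    # In-memory demo so the sketch runs without network access.
    payload = b"example line\n" * 1000
    source = io.BytesIO(gzip.compress(payload))
    target = io.BytesIO()
    decompress_in_chunks(source, target, chunk_size=64)
    print(target.getvalue() == payload)
```

A function like this could be called from inside a single op, which keeps the temporary file handling local to that op.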

Oren Lederman

02/19/2023, 7:11 PM
Thanks Eduardo, I was wondering if your question was related to mine 🙂 The link doesn’t help much though - I understand how IO Managers work (more or less). It’s more about how to optimize for something that is a bit of an edge case for Dagster


02/21/2023, 6:49 PM
Hi Oren, the ideal way I can imagine this working is to wrap the three ops you want to retry together in a graph, and then specify a retry policy on the graph. But unfortunately, we don't currently support retry policies on graphs. Would you mind filing an issue for this?

Oren Lederman

02/21/2023, 8:53 PM