Is there a way to "group" several ops together so that if one of them fails, all of them have to rerun?
In my example, I have a model training pipeline that (among other ops) includes an op that downloads a large dataset to local storage, an op that calls a shell command to train a model, and an op that writes the results to a model store.
The problem here is the local storage, of course. I can't rely on local storage to persist between ops (for example, if one fails and is rerun on its own).
I could combine all three steps into one op, but then they are hard to test individually. I could also use a persistent volume in k8s, but that adds some complexity. So I was thinking that if I could somehow group the ops together (if one fails, they all fail and all become part of the rerun), the data I need would still be available in local storage.
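To make the single-op option concrete, here is a minimal sketch of how it might look in plain Python. Everything in it is hypothetical: the function names, the dict standing in for the model store, and the placeholder bodies for the download and the training command. The idea is that the three steps stay as ordinary functions (so each could still be unit tested), while one wrapper runs them against the same temporary directory. In Dagster, that wrapper would presumably become the body of a single `@op`, so a failure anywhere reruns all three steps.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def download_dataset(workdir: Path) -> Path:
    """Placeholder for the real download: writes a tiny CSV to local storage."""
    path = workdir / "dataset.csv"
    path.write_text("x,y\n1,2\n3,4\n")
    return path


def train_model(dataset: Path, workdir: Path) -> Path:
    """Placeholder for the shell command that trains the model.

    The child process just counts the data rows and writes the count out.
    """
    model = workdir / "model.txt"
    script = (
        "import sys; rows = open(sys.argv[1]).read().splitlines()[1:]; "
        "open(sys.argv[2], 'w').write(str(len(rows)))"
    )
    # check=True raises CalledProcessError if the command fails,
    # which in turn fails the whole wrapper below.
    subprocess.run(
        [sys.executable, "-c", script, str(dataset), str(model)], check=True
    )
    return model


def publish_model(model: Path, store: dict) -> None:
    """Placeholder for writing to a model store (here just a dict)."""
    store["latest"] = model.read_text()


def train_pipeline_step(store: dict) -> None:
    """Runs all three steps against one temporary directory.

    Assumption: in Dagster this would be the body of a single op, so the
    steps always execute together and share the same local storage.
    """
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        dataset = download_dataset(workdir)
        model = train_model(dataset, workdir)
        publish_model(model, store)
```

Calling `train_pipeline_step({})` runs the whole chain. Each helper can also be called on its own with a temporary directory in a test, which takes some of the sting out of the "hard to test" concern, though it does give up per-step visibility and retries in the orchestrator.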
Any thoughts? Is it too much of an anti-pattern?