Hi folks, I have a potentially silly question that I'm a litt... | dagster #announcements

Ben Torvaney

03/08/2021, 5:37 PM

Hi folks, I have a potentially silly question that I’m a little confused about. If I have some output that I want to persist to storage (e.g. filesystem, S3, etc.), and I’m not concerned with loading that output into another downstream step, should I still use an IO manager? The documentation and the deprecation of file handlers seem to push in this direction, but for simple use cases it seems like additional complexity (compared to a resource) for limited payoff?

antonl

03/08/2021, 5:42 PM

You could model that as an asset materialization instead. Logically, that means that your "output" is actually a side effect of the computation, and your solid would return nothing.

Ben Torvaney

03/08/2021, 5:43 PM

Yes, that was what I was thinking. Are there any downsides to that approach that I might be missing?

antonl

03/08/2021, 5:46 PM

I haven't explored that space too much, tbh. One potential downside is that if you change your mind and want to do some additional processing, you'll have to change the solid. If you work with a particular runtime type, it's likely you'd want to do something else with it in the future. It's not too hard to switch back and forth though.

✅ 1

yuhan

03/08/2021, 5:52 PM

Hi Ben, @antonl is totally right. You can write a solid that returns nothing, persist the data inside the body of your solid, and yield an AssetMaterialization to log the IO. The downside to handling IO inside the body of a solid is that the Dagster machinery could possibly lose track of your asset lineage. We are rolling out an experimental lineage feature soon, which models asset lineage based on inputs and outputs (cc @owen).
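(Editor's sketch of the pattern described above, in plain Python so it runs without Dagster installed. Everything here is an assumption for illustration: `Materialization` is a simplified stand-in for Dagster's `AssetMaterialization` event, `LocalStorage` is a hypothetical user-defined resource, and `run_step` is a toy replacement for the framework draining the solid's generator. It is not Dagster's API, just the shape of "persist as a side effect, yield an event, return nothing".)

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class Materialization:
    """Simplified stand-in for an asset materialization event (hypothetical)."""
    asset_key: str
    path: str


class LocalStorage:
    """Hypothetical resource that persists text files under a base directory."""

    def __init__(self, base_dir):
        self.base_dir = base_dir

    def write(self, name, text):
        path = os.path.join(self.base_dir, name)
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        return path


def persist_report(storage, rows):
    """Step body: the write is a side effect and no output value is produced."""
    path = storage.write("report.txt", "\n".join(rows))
    # Record what was written and where, instead of returning the data.
    yield Materialization(asset_key="report", path=path)


def run_step(step, *args):
    """Toy stand-in for the framework: drain the generator, collect events."""
    return list(step(*args))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        events = run_step(persist_report, LocalStorage(tmp), ["a", "b"])
        print(events[0].asset_key, os.path.basename(events[0].path))
```

In a real pipeline the storage object would come from the solid's resources and the event would be Dagster's own `AssetMaterialization`. The point of the sketch is only that nothing downstream depends on the step's return value.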

Ben Torvaney

03/08/2021, 5:55 PM

Thanks @antonl & @yuhan - so you’re saying there’s no shame in using/defining a resource for this IO and using AssetMaterialization?

👍 1

sandy

03/08/2021, 5:56 PM

There's no shame in that!

owen

03/08/2021, 5:57 PM

if you're not using that data anywhere else in the pipeline, you're also not losing any lineage information (in the future 🙂)

✅ 1

Ben Torvaney

03/08/2021, 5:58 PM

Easy to say when you haven’t seen my code, @sandy 😬 Thanks everyone for your help! Have a great day!
