# announcements
b
Hi folks, I have a potentially silly question that I’m a little confused about. If I have some output that I want to persist to storage (e.g. filesystem, s3, etc), and I’m not concerned with loading that output into another downstream step, should I still use an IO manager? The documentation & deprecation of file-handlers seems to push in this direction, but for simple use-cases it seems like additional complexity (compared to a resource) for limited payoff?
a
You could model that as an asset materialization instead. Logically, that means that your "output" is actually a side effect of the computation, and your solid would return nothing.
b
Yes, that was what I was thinking. Are there any downsides to that approach that I might be missing?
a
I haven't explored that space too much, tbh. One potential downside is that if you change your mind and want to do some additional processing, you'll have to change the solid. If you work with a particular runtime type, it's likely you'd want to do something else with it in the future. It's not too hard to switch back and forth though.
y
Hi Ben, @antonl is totally right. You can write a solid that returns nothing, persist the data inside the body of the solid, and yield an AssetMaterialization to log the IO. The downside of handling IO inside the body of a solid is that the Dagster machinery could lose track of your asset lineage. We are rolling out an experimental lineage feature soon, which models asset lineage based on inputs and outputs (cc @owen)
b
Thanks @antonl & @yuhan - so you’re saying there’s no shame in using/defining a resource for this IO and using AssetMaterialization?
s
There's no shame in that!
o
if you're not using that data anywhere else in the pipeline, you're also not losing any lineage information (in the future 🙂)
b
Easy to say when you haven’t seen my code, @sandy 😬 Thanks everyone for your help! Have a great day!