# ask-community
c
How do I get Dagster's IO managers to play nicely with Polars, and especially the lazy API? Conceptually what I want to do is simple: I don't want to write the assets to disk, because it's impossible to pickle them. As far as I know each node in the DAG is a different process, so an in-memory IO manager leads to a `KeyError`.
EDIT: I could wrap the part of the pipeline that uses the lazy API and run it synchronously using `execute_in_process` with an `InMemoryIOManager`. Is this the idiomatic way to do this?
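Roughly what I have in mind, as a toy sketch (the asset names and the query are made up; `dg.materialize` runs everything in a single process, much like `execute_in_process` does for a job):

```python
import dagster as dg
import polars as pl

@dg.asset
def raw_events() -> pl.LazyFrame:
    # A LazyFrame is a query plan, not data, so the default
    # pickle-to-disk IO manager can't meaningfully persist it.
    return pl.DataFrame({"day": [1, 1, 2], "amount": [10.0, 5.0, 7.0]}).lazy()

@dg.asset
def daily_totals(raw_events: pl.LazyFrame) -> pl.DataFrame:
    # collect() executes the whole lazy plan in one go.
    return raw_events.group_by("day").agg(pl.col("amount").sum()).collect()

if __name__ == "__main__":
    # In-memory IO manager: the LazyFrame is handed between assets
    # as a live Python object instead of being pickled to disk.
    result = dg.materialize(
        [raw_events, daily_totals],
        resources={"io_manager": dg.InMemoryIOManager()},
    )
    assert result.success
```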
z
Yeah, you run into the same thing with Spark. It's a symptom of multiprocessing in Python: it's really difficult to share objects across process boundaries. If you don't want to serialize to disk at the end of each step, you need the in-process executor and an `InMemoryIOManager`. It doesn't necessarily have to be via `execute_in_process`, though; I think you can configure the in-process executor on jobs and execute them through the launchpad like you normally would.
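Something like this, I think (the asset bodies are placeholders carried over from the sketch above; the relevant parts are `in_process_executor` on the job and `mem_io_manager` as the default IO manager):

```python
import dagster as dg
import polars as pl

@dg.asset
def raw_events() -> pl.LazyFrame:
    return pl.DataFrame({"day": [1, 1, 2], "amount": [10.0, 5.0, 7.0]}).lazy()

@dg.asset
def daily_totals(raw_events: pl.LazyFrame) -> pl.DataFrame:
    return raw_events.group_by("day").agg(pl.col("amount").sum()).collect()

# in_process_executor keeps every step in one process, and mem_io_manager
# passes outputs between steps as live objects, so nothing is pickled.
lazy_job = dg.define_asset_job(
    "lazy_polars_job",
    selection=dg.AssetSelection.all(),
    executor_def=dg.in_process_executor,
)

# Loading these Definitions in the UI lets you launch lazy_job from the
# launchpad like any other job.
defs = dg.Definitions(
    assets=[raw_events, daily_totals],
    jobs=[lazy_job],
    resources={"io_manager": dg.mem_io_manager},
)
```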
c
Thanks for the response. So in principle I was on the right track? I load in about 10 assets; these can be serialized. Afterwards I "lazify" them and do most of my transforms on them, producing one final table which I `collect`. This part should be one job that runs in memory in a single process. The last step can be persisted. Finally I write to the DB, which can be done without an IO manager as well. In short, there should be 3 jobs.
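Concretely, I'm picturing jobs 2 and 3 something like this (names and the toy query are made up; the detail that matters is that the intermediate lazy asset uses an in-memory `io_manager_key` while the final table keeps the default, persistent IO manager):

```python
import dagster as dg
import polars as pl

# Job 2: lazy transforms in one process; only the intermediate lazy
# asset is kept in memory, via a dedicated "mem_io" IO manager key.
@dg.asset(group_name="lazy_transforms", io_manager_key="mem_io")
def lazified() -> pl.LazyFrame:
    # Stand-in for lazifying the ~10 persisted source assets.
    return pl.DataFrame({"day": [1, 1, 2], "amount": [10.0, -3.0, 7.0]}).lazy()

@dg.asset(group_name="lazy_transforms")  # default IO manager -> persisted
def final_table(lazified: pl.LazyFrame) -> pl.DataFrame:
    return (
        lazified.filter(pl.col("amount") > 0)
        .group_by("day")
        .agg(pl.col("amount").sum())
        .collect()
    )

# Job 3: the asset writes to the DB itself and returns nothing, so no
# IO manager output handling is involved (hypothetical write).
@dg.asset(deps=[final_table], group_name="publish")
def publish_to_db() -> None:
    ...

transform_job = dg.define_asset_job(
    "transform_job",
    selection=dg.AssetSelection.groups("lazy_transforms"),
    executor_def=dg.in_process_executor,
)

publish_job = dg.define_asset_job(
    "publish_job",
    selection=dg.AssetSelection.groups("publish"),
)

defs = dg.Definitions(
    assets=[lazified, final_table, publish_to_db],
    jobs=[transform_job, publish_job],
    resources={"mem_io": dg.mem_io_manager},
)
```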