
Bennett Kanuka

04/26/2021, 5:40 PM
Hi all, I'm just at the phase of evaluating whether dagster is a good fit for our project. 90%+ of our processing is on image files, which need to be saved to persistent storage during (and after) a run. Then likely URLs/filepaths would be passed between solids rather than the images themselves. Is this a well-defined pattern in dagster? From what I can tell, I should use a `FileManager`, but I can't get any further than that. Could someone explain how I would do this or point to an example of a pipeline that passes file handles rather than pickled data?
Edit: this would be running on GCP in production, so I would be using the `dagster_gcp.gcs_file_manager`, but I don't know how to use it.

alex

04/26/2021, 5:56 PM
I believe the `IOManager` abstraction should help with this use case: https://docs.dagster.io/concepts/io-management/io-managers#io-managers

Bennett Kanuka

04/26/2021, 5:58 PM
IOManager seems to deal with automatic serialisation of Python types... right? I'm struggling with putting this into practice.

alex

04/26/2021, 6:21 PM
Yeah, it's a bit subtle and we can do better at how we explain all of this. So `FileManager` is a resource, see https://docs.dagster.io/concepts/modes-resources
The idea is that you can have different implementations of that resource, so that you can use the file system locally and then GCS in prod. With the file manager setup, interactions with that resource happen in the body of your solids, for reading/writing FileHandles <-> file data.
`IOManager` is another resource, but it's a special one in that Dagster will take care of invoking it behind the scenes, so that your `@solid` code can be oriented around just working with the data and separate out completely how it's stored.
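
To make the distinction above concrete, here is a minimal plain-Python sketch of the two styles. It deliberately does not import Dagster, and none of these names are Dagster's API: `FileStore`, `LocalFileStore`, `load_image`, `invert_image` and `run_with_managed_io` are all hypothetical, and the "image" is just placeholder bytes. It assumes a handle is simply a path string. Style 1 mirrors the file-manager idea (the step body reads and writes through the resource and passes handles along), and style 2 mirrors the IO-manager idea (the steps only see data and a runner does the storage between them).

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class FileStore(ABC):
    """Hypothetical stand-in for a file-manager-style resource (not Dagster's API)."""

    @abstractmethod
    def write(self, key: str, data: bytes) -> str:
        """Persist data and return a handle (here: a path string)."""

    @abstractmethod
    def read(self, handle: str) -> bytes:
        """Load the bytes that a handle points to."""


class LocalFileStore(FileStore):
    """Local-filesystem implementation; a GCS-backed class could share the interface."""

    def __init__(self, base_dir: str):
        self.base_dir = Path(base_dir)

    def write(self, key: str, data: bytes) -> str:
        path = self.base_dir / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        return str(path)

    def read(self, handle: str) -> bytes:
        return Path(handle).read_bytes()


# Style 1: the step body talks to the store and passes handles downstream.
def load_image(store: FileStore) -> str:
    fake_image = b"\x89PNG placeholder bytes"  # assumption: stands in for a real image
    return store.write("raw/image.png", fake_image)


def invert_image(store: FileStore, handle: str) -> str:
    data = store.read(handle)
    inverted = bytes(255 - b for b in data)
    return store.write("processed/image.png", inverted)


# Style 2: steps only see data; the runner persists and reloads between steps.
def run_with_managed_io(store: FileStore, steps, initial: bytes) -> str:
    handle = store.write("step_0.bin", initial)
    for i, step in enumerate(steps, start=1):
        result = step(store.read(handle))
        handle = store.write(f"step_{i}.bin", result)
    return handle  # handle of the final persisted output
```

Swapping `LocalFileStore` for a cloud-backed class with the same two methods is what lets the step code stay unchanged between local runs and prod, which is the point of treating storage as a resource.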