Tobias Macey

05/05/2020, 8:36 PM
What's the "best practice" for managing output files? I was starting to write a resource module for creating a directory on disk, but wanted to consider the case where I'm running on a distributed execution engine à la Dask. Does the dagster-aws S3 integration allow for that kind of situation out of the box?

max

05/05/2020, 11:28 PM
the s3 integration would def work for this
you might also be interested in exploring the FileManager system
which, tbh, is a little rough because it's still under-exercised
but the context object you get in the body of a solid compute function has a file_manager member
that you can use to manage files in a storage backend-independent way
so that you could switch from using a local filesystem in test to s3 in prod, etc
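
To make the backend-independent idea concrete, here is a minimal standalone sketch in plain Python. It does not use Dagster itself: the classes FileManager, LocalFileManager, and InMemoryFileManager and the function save_report are hypothetical stand-ins, and the method names write_data and read_data are assumed for illustration rather than taken from the Dagster API reference. The in-memory class only plays the role of a remote store such as S3.

```python
import os
import tempfile
import uuid
from abc import ABC, abstractmethod


class FileManager(ABC):
    """Hypothetical interface: the calling code depends only on these methods."""

    @abstractmethod
    def write_data(self, data: bytes) -> str:
        """Store the bytes and return an opaque handle."""

    @abstractmethod
    def read_data(self, handle: str) -> bytes:
        """Return the bytes stored under the handle."""


class LocalFileManager(FileManager):
    """Writes each blob to a uniquely named file under base_dir."""

    def __init__(self, base_dir: str):
        self.base_dir = base_dir
        os.makedirs(base_dir, exist_ok=True)

    def write_data(self, data: bytes) -> str:
        path = os.path.join(self.base_dir, uuid.uuid4().hex)
        with open(path, "wb") as f:
            f.write(data)
        return path

    def read_data(self, handle: str) -> bytes:
        with open(handle, "rb") as f:
            return f.read()


class InMemoryFileManager(FileManager):
    """Stand-in for a remote backend (e.g. an object store); keeps blobs in a dict."""

    def __init__(self):
        self._objects = {}

    def write_data(self, data: bytes) -> str:
        key = "mem://" + uuid.uuid4().hex
        self._objects[key] = data
        return key

    def read_data(self, handle: str) -> bytes:
        return self._objects[handle]


def save_report(file_manager: FileManager, rows: list) -> str:
    """Pipeline-style step: it never needs to know which backend is in use."""
    payload = "\n".join(rows).encode("utf-8")
    return file_manager.write_data(payload)


if __name__ == "__main__":
    # Swap the backend without touching save_report.
    for fm in (LocalFileManager(tempfile.mkdtemp()), InMemoryFileManager()):
        handle = save_report(fm, ["id,value", "1,42"])
        print(type(fm).__name__, len(fm.read_data(handle)), "bytes")
```

The step only ever sees the handle it gets back, so moving from local disk in test to a remote store in prod becomes a configuration choice rather than a code change.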

Tobias Macey

05/06/2020, 12:51 AM
Good to know, thanks!