# ask-community
a
Hi y'all. Have an IO manager best practice question. I'm sourcing a handful of datasets with Dagster-defined assets and would like to write the extracts to their respective directories on a local NAS, based on source (likely derived from my group_name or key_prefix). Would the recommended practice here be:
1. Hardcode the paths in the asset definitions
2. Create individual FilesystemIOManagers for each
3. Create a ConfigurableIOManager that can take the group_name / key_prefix as a variable to add to the base path that all of those directories would share
4. Something else?
I'd say my understanding of IO managers is still maturing. Pretty loose at the moment. Thanks in advance!
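A minimal sketch of the setup in question, with hypothetical names (source_a, raw_orders): key_prefix and group_name both tag an asset with its source, and an IO manager can read either off the context to build a path.

```python
from dagster import asset


# Hypothetical asset: key_prefix puts it under the "source_a" namespace,
# so its asset key becomes ["source_a", "raw_orders"]; group_name groups
# it in the UI. An IO manager can derive a directory from either.
@asset(key_prefix="source_a", group_name="source_a")
def raw_orders() -> list[dict]:
    return [{"order_id": 1}, {"order_id": 2}]
```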
j
Unless each asset needs to be stored in a unique way, I wouldn't make a FilesystemIOManager for each; at that point you might as well write the loading and storing code in the assets themselves. Option 3 is a good one, and it's what we do in most of our IO managers.
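A minimal sketch of option 3, assuming pickled outputs and a hypothetical NASIOManager with a base_path config field; the asset key's path (which includes any key_prefix) is appended to the shared base path:

```python
import os
import pickle

from dagster import ConfigurableIOManager, InputContext, OutputContext


class NASIOManager(ConfigurableIOManager):
    """Hypothetical IO manager: writes each asset under a shared base path."""

    base_path: str  # e.g. "/mnt/nas/extracts"

    def _get_path(self, context) -> str:
        # asset_key.path is a list like ["source_a", "raw_orders"], so any
        # key_prefix becomes a subdirectory under base_path
        return os.path.join(self.base_path, *context.asset_key.path)

    def handle_output(self, context: OutputContext, obj) -> None:
        path = self._get_path(context)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            pickle.dump(obj, f)

    def load_input(self, context: InputContext):
        with open(self._get_path(context), "rb") as f:
            return pickle.load(f)
```

Wired up, the raw_orders asset sketched earlier would land at /mnt/nas/extracts/source_a/raw_orders:

```python
from dagster import Definitions

defs = Definitions(
    assets=[raw_orders],  # the hypothetical asset sketched above
    resources={"io_manager": NASIOManager(base_path="/mnt/nas/extracts")},
)
```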
a
Thanks @jamie, makes sense! I'll have a look at the ConfigurableIOManager documentation and proceed with #3. Do you by chance have a top-of-mind example of this kind of IO manager that I could use as a base to get started? No worries if not.
j
Most of our file system IO managers use a base class called UPathIOManager, which handles most of the file path creation logic and partition handling. Here's an example of the AWS S3 IO manager that uses UPathIOManager: https://github.com/dagster-io/dagster/blob/master/python_modules/libraries/dagster-aws/dagster_aws/s3/io_manager.py#L25 And here's another one from an example that doesn't use the UPathIOManager: https://github.com/dagster-io/dagster/blob/master/examples/project_fully_featured/project_fully_featured/resources/parquet_io_manager.py
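An untested sketch of the UPathIOManager route, assuming pandas DataFrames stored as Parquet; dump_to_path and load_from_path are the two methods the base class expects a subclass to implement, while the base class itself derives the full path (base path + asset key, including any key_prefix) and handles partitioned assets:

```python
import pandas as pd
from upath import UPath

from dagster import InputContext, OutputContext, UPathIOManager


class PandasParquetIOManager(UPathIOManager):
    """Hypothetical UPathIOManager storing DataFrames as Parquet files."""

    extension: str = ".parquet"

    def dump_to_path(self, context: OutputContext, obj: pd.DataFrame, path: UPath) -> None:
        # Write the DataFrame to the path the base class computed for us
        with path.open("wb") as f:
            obj.to_parquet(f)

    def load_from_path(self, context: InputContext, path: UPath) -> pd.DataFrame:
        with path.open("rb") as f:
            return pd.read_parquet(f)
```

In the linked S3 example, a thin ConfigurableIOManager wrapper exposes the config fields and delegates to the UPathIOManager subclass; the same wrapping pattern would apply to a NAS-backed version like this.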
a
Awesome, thank you!