#ask-community

martin o leary

08/10/2023, 6:07 PM
Hey team, what is the recommended way to save a file to an S3 bucket from inside my own custom IO manager?

Zach

08/10/2023, 6:30 PM
I don't think it really matters from Dagster's point of view. You can just call `put_object` on a boto3 S3 client, or use a boto3 bucket resource, or S3FS if you'd like.

martin o leary

08/10/2023, 6:34 PM
Yeah, sorry, I should have been a bit clearer. Trying to herd cats here as well 😃 I could set the s3_resource as a dependency of my io_manager and use that to get a client, right?

Zach

08/10/2023, 6:35 PM
Yes you could, that would be the ideal way to inject your S3 dependency
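A minimal sketch of that injection pattern, runnable without Dagster or AWS. The class names `InMemoryS3Client` and `JsonS3IOManager`, the bucket name, and the key layout are hypothetical. The stub only mimics the `put_object` and `get_object` calls of a boto3 S3 client. In a real Dagster IO manager, `handle_output` and `load_input` receive an `OutputContext` / `InputContext` rather than a plain name, and the client would come from the S3 resource.

```python
import io
import json
from typing import Any, Dict, Tuple


class InMemoryS3Client:
    """Hypothetical stand-in mimicking two boto3 S3 client calls."""

    def __init__(self) -> None:
        self._objects: Dict[Tuple[str, str], bytes] = {}

    def put_object(self, Bucket: str, Key: str, Body: bytes) -> None:
        self._objects[(Bucket, Key)] = Body

    def get_object(self, Bucket: str, Key: str) -> Dict[str, Any]:
        # boto3 returns a dict whose "Body" is a readable stream.
        return {"Body": io.BytesIO(self._objects[(Bucket, Key)])}


class JsonS3IOManager:
    """Sketch: the S3 client is injected instead of created inside the manager."""

    def __init__(self, s3_client: Any, bucket: str, prefix: str = "") -> None:
        self.s3_client = s3_client
        self.bucket = bucket
        self.prefix = prefix

    def _key(self, name: str) -> str:
        # Assumed key layout: <prefix>/<name>.json
        return f"{self.prefix}/{name}.json" if self.prefix else f"{name}.json"

    def handle_output(self, name: str, obj: Any) -> None:
        body = json.dumps(obj).encode("utf-8")
        self.s3_client.put_object(Bucket=self.bucket, Key=self._key(name), Body=body)

    def load_input(self, name: str) -> Any:
        response = self.s3_client.get_object(Bucket=self.bucket, Key=self._key(name))
        return json.loads(response["Body"].read())


manager = JsonS3IOManager(InMemoryS3Client(), bucket="example-bucket", prefix="dagster")
manager.handle_output("my_asset", {"rows": 3})
loaded = manager.load_input("my_asset")
```

Because the client is passed in, the same manager can be pointed at a real boto3 client in production and at a stub like this one in tests.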

martin o leary

08/10/2023, 6:36 PM
Thanks! I know that sounds straightforward to you but I'm after getting tied up in knots reading the docs on file IO managers 🙂

Zach

08/10/2023, 6:37 PM
It's all good, totally fair question!

Daniel Gafni

08/11/2023, 7:51 AM
The default IOManager works with S3 out of the box. You can also extend `UPathIOManager` to easily implement your own filesystem-based IOManager with custom logic. It would work with S3 out of the box too.

martin o leary

08/11/2023, 10:18 AM
Is there an example of that anywhere @Daniel Gafni?
A more modern example without the ConfigurableIOManagerFactory:
import json
from typing import Any, Optional

import dagster._check as check
from dagster import ConfigurableIOManager, InitResourceContext, InputContext, OutputContext, UPathIOManager
from pydantic import Field, PrivateAttr
from upath import UPath


class JSONIOManager(ConfigurableIOManager, UPathIOManager):
    base_dir: Optional[str] = Field(default=None, description="Base directory for storing files.")

    _base_path: UPath = PrivateAttr()

    def setup_for_execution(self, context: InitResourceContext) -> None:
        self._base_path = (
            UPath(self.base_dir)
            if self.base_dir is not None
            else UPath(check.not_none(context.instance).storage_directory())
        )

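    def load_from_path(self, context: InputContext, path: UPath) -> Any:
        # json.load reads from a file object; json.loads expects a str/bytes payload.
        with path.open("r") as file:
            return json.load(file)

    def dump_to_path(self, context: OutputContext, obj: Any, path: UPath) -> None:
        # json.dump writes text, so the file must be opened in text mode.
        with path.open("w") as file:
            json.dump(obj, file)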
@sandy perhaps we can update the `UPathIOManager` usage examples in the docs?
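The two file-handling methods can be tried out without Dagster or S3. A minimal sketch, assuming `pathlib.Path` as a local stand-in for `UPath` (which follows the same pathlib-style interface); the standalone functions below only borrow the method names and drop the `context` argument. Note that `json.dump` and `json.load` are the variants that take a file object.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def dump_to_path(obj: Any, path: Path) -> None:
    # Create parent directories so the write does not fail on a fresh location.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as file:
        json.dump(obj, file)


def load_from_path(path: Path) -> Any:
    with path.open("r") as file:
        return json.load(file)


# Round trip through a temporary directory (a local stand-in for the base path).
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "nested" / "my_asset.json"
    dump_to_path({"rows": [1, 2, 3]}, target)
    round_tripped = load_from_path(target)
```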

martin o leary

08/11/2023, 10:30 AM
Thanks for this @Daniel Gafni. Where I got confused when I first read that in the documentation was that I couldn't see how `UPath` "just works" with S3.
Ok - now I see in the docs: `s3:` and `s3a:` AWS S3 (requires `s3fs` to be installed)

Daniel Gafni

08/11/2023, 10:31 AM
It's instantiating the `S3FileSystem` from `s3fs` internally, which is used for all the FS operations.
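The idea of picking a filesystem from the URL scheme can be illustrated roughly as follows. This is a simplified, hypothetical sketch of scheme-based dispatch, not the actual `UPath` or `fsspec` code: the `SCHEME_TO_FILESYSTEM` table and `filesystem_for` function are made up for illustration, and the values are just the dotted names of the classes that would handle each scheme.

```python
from urllib.parse import urlsplit

# Hypothetical, simplified registry: URL scheme -> dotted name of the filesystem class.
SCHEME_TO_FILESYSTEM = {
    "s3": "s3fs.S3FileSystem",
    "s3a": "s3fs.S3FileSystem",
    "file": "fsspec.implementations.local.LocalFileSystem",
    "": "fsspec.implementations.local.LocalFileSystem",  # plain local paths
}


def filesystem_for(url: str) -> str:
    """Return the name of the filesystem class that would serve this URL."""
    scheme = urlsplit(url).scheme
    try:
        return SCHEME_TO_FILESYSTEM[scheme]
    except KeyError:
        raise ValueError(f"No filesystem registered for scheme {scheme!r}") from None


s3_backend = filesystem_for("s3://example-bucket/data/out.json")
local_backend = filesystem_for("/tmp/data/out.json")
```

So a base directory such as `s3://example-bucket/data` is enough to route every read and write through S3, as long as `s3fs` is installed.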