# ask-community
l
Hi Guys, thanks for bringing Dagster to the world! I have a newbie question: why are there two different resources for S3?
• `S3FileManagerResource`
• `S3Resource`
They seem to be doing almost the same thing, and it is not entirely clear to me which one to use. Is there a difference? Which one should one use, and in which case? Thanks in advance, Lucas
n
• `S3Resource`: this gives you an s3 client, basically an interface to your bucket for doing i/o in your ops (it's a boto s3 client underneath, to be specific, but it's in convenient and familiar dagster language so it can be used and configured like your other dagster resources). see the sketch after this list.
• `S3FileManagerResource`: this is also an interface to your s3 storage, so you could use it to do the same kind of i/o as `S3Resource` (above), but it also conforms to dagster's representation of a file manager, which has a bunch of filesystem-agnostic methods available. this would be a useful way to connect to s3 if you wanted to write the same code for different kinds of storage locations, so you could have file-related operations that work with s3 storage, a local filesystem, or somewhere else in the cloud.
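for instance, here's a minimal sketch of the `S3Resource` pattern (the asset name, bucket, and prefix are made up):
```python
from dagster import asset
from dagster_aws.s3 import S3Resource

@asset
def raw_listing(s3: S3Resource) -> list[str]:
    # get_client() hands back a plain boto3 S3 client,
    # so any boto3 S3 call works; bucket/prefix are placeholders
    client = s3.get_client()
    response = client.list_objects_v2(Bucket="my-bucket", Prefix="raw/")
    return [obj["Key"] for obj in response.get("Contents", [])]
```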
i put `S3Resource` first because it's probably the way to go most of the time. easy access to any of your buckets from an op! ...but if you have a variety of s3 and non-s3 storage locations that need to be interchangeable, i could see why you would want the level of abstraction the `S3FileManagerResource` provides (sketch below). personally i am not using any dagster `FileManager`s, but the regular `S3Resource` is very flexible and useful all over the place.
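to make the interchangeability point concrete, here's a sketch of storage-agnostic code written against the `FileManager` interface (the helper name and payload are made up):
```python
def archive_report(file_manager, payload: bytes) -> str:
    # file_manager can be any dagster FileManager implementation
    # (s3, local filesystem, ...) and this code doesn't change
    handle = file_manager.write_data(payload)  # returns a FileHandle
    return handle.path_desc  # e.g. an s3 URI or a local path
```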
s
@Navah Farahat nailed it.
l
Thank you very much for your answer !
j
@Navah Farahat I came across your earlier answer above on the `S3Resource` class and was hoping you could help clarify how it instantiates in the project's `__init__.py` vs. a submodule. Briefly, the block below runs without error in my project's `__init__.py`, but I'm then unable to access that `S3Resource` instance in my asset that is trying to write to S3.
```python
from dagster import Definitions, EnvVar, load_assets_from_package_module
from dagster_aws.s3 import S3Resource
from dagster_snowflake_pandas import SnowflakePandasIOManager

import wxelt  # the project's asset package
# daily_refresh_schedule is defined elsewhere in the project

defs = Definitions(
    assets=load_assets_from_package_module(wxelt),
    resources={
        "snowflake_io_manager": SnowflakePandasIOManager(
            account=EnvVar("SNOWFLAKE_ACCOUNT"),
            user=EnvVar("SNOWFLAKE_USER"),
            password=EnvVar("SNOWFLAKE_PASSWORD"),
            role=EnvVar("SNOWFLAKE_ROLE"),
            warehouse=EnvVar("SNOWFLAKE_WAREHOUSE"),
            database=EnvVar("SNOWFLAKE_DATABASE"),
            schema=EnvVar("SNOWFLAKE_SCHEMA"),
        ),
        "s3": S3Resource(
            aws_access_key_id=EnvVar("AWS_KEY"),
            aws_secret_access_key=EnvVar("AWS_SECRET_KEY"),
            region_name="us-east-1",
        ),
    },
    schedules=[daily_refresh_schedule],
)
```
When I move the instantiation of `S3Resource` to the relevant asset definition, Dagster no longer seems able to retrieve environment variables through `EnvVar`, and I get the following error when attempting to deploy:
```
The above exception was caused by the following exception:
botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The AWS Access Key Id you provided does not exist in our records.
```
There seems to be some context-specific handling of credentials under the hood that I can't figure out. Are you able to provide any clarity?
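(For reference, the pattern that usually avoids this is to keep the resource in `Definitions` as above and request it from the asset as a typed parameter; as far as I know, `EnvVar` is resolved by Dagster when it initializes the resource for a run, not when you construct the object in plain Python. A minimal sketch, with a hypothetical asset name, bucket, and key:)
```python
from dagster import asset
from dagster_aws.s3 import S3Resource

@asset
def s3_upload_asset(s3: S3Resource):
    # the parameter name "s3" binds to the resource key in Definitions;
    # Dagster initializes the resource (resolving EnvVar) before this runs
    s3.get_client().put_object(
        Bucket="my-bucket",  # hypothetical bucket and key
        Key="exports/data.parquet",
        Body=b"...",
    )
```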