Hi all! I'm rather new to dagster and am trying t...
# ask-community
n
Hi all! I'm rather new to Dagster and am trying to get an asset to use a custom BigQuery I/O manager. I got the I/O manager to work with my creds by forking the quickstart repo for GCP. My company uses a repository instead of the definitions that the quickstart repo uses. The asset I'm trying to run is simply:
from dagster import asset
import pandas as pd

@asset(
    group_name="new_group",
    required_resource_keys={"bq_io_manager"}
)
def totango_accounts() -> pd.DataFrame:
    data = {
        'name': ['name', 'name_2'],
        'number': ['+123', '+456']
    }

    return pd.DataFrame(data)
And repository.py looks like:
resource_defs = {
        "io_manager": s3_pickle_io_manager.configured(
            {"s3_bucket": "<bucket>", "s3_prefix": s3_prefix}
        ),
        "s3": s3_resource,
        "bq_io_manager": bigquery_pandas_io_manager.configured(
            {
                "credentials": {"env": "BIGQUERY_JSON_PATH"},
                "project_id": {"env": "BIGQUERY_PROJECT_ID"},
                "dataset_id": '<dataset_name>'
            }
        )
    }
    
    repository_assets = load_assets_from_modules(
        modules=[
            some_s3_assets,
            dagster_quickstart_assets,
            my_new_assets  #assets from the script above
        ]) \
        + dbt_assets


    return [
        *with_resources(
            definitions=repository_assets, resource_defs=resource_defs
        ),
        define_asset_job(name="all_assets_job"),
    ]
Even though I pass the I/O manager that I copied from the quickstart repo to the relevant asset, it somehow uses the s3 resource instead. It runs successfully, but I'm at a loss as to why it uses a completely different resource from the one I explicitly give the asset. Does anyone have an idea what might be occurring?
🙃 Forgot to use io_manager_key instead of required_resource_keys. Obviously user error but is it expected behavior that if an invalid required resource key that it would fall back to another resource that handles the same types?
s
Glad you were able to figure this out. I'm not 100% following what you mean by "but is it expected behavior that if an invalid required resource key that it would fall back to another resource that handles the same types?"
n
Hi Sandy, yeah, that was not stated clearly. When I used required_resource_keys instead of io_manager_key, it saved the data using an S3 I/O manager even though I never made any mention of S3 in the asset. I had the impression that maybe it falls back on an I/O manager that handles the type being output (a pandas DataFrame)? But that seemed like odd and unhelpful behavior compared to throwing an error.
s
Ah I see - all assets have "io_manager" as their default IO manager, so if you don't specify an IO manager, this default is still used. Something that we could consider doing is issuing an error or warning if someone provides an IO manager to "required_resource_keys". However, there are some situations where that can still be a legitimate thing to do, so it's tough.
n
Makes sense! Appreciate the answer. I think renaming the I/O manager that is currently named io_manager would be enough for us to get errors when we "should" in the future. Thanks!