Noah Ford
01/08/2023, 10:39 PM

```python
from dagster import asset
import pandas as pd


@asset(
    group_name="new_group",
    required_resource_keys={"bq_io_manager"},
)
def totango_accounts() -> pd.DataFrame:
    data = {
        'name': ['name', 'name_2'],
        'number': ['+123', '+456'],
    }
    return pd.DataFrame(data)
```
And repository.py looks like:
```python
resource_defs = {
    "io_manager": s3_pickle_io_manager.configured(
        {"s3_bucket": "<bucket>", "s3_prefix": s3_prefix}
    ),
    "s3": s3_resource,
    "bq_io_manager": bigquery_pandas_io_manager.configured(
        {
            "credentials": {"env": "BIGQUERY_JSON_PATH"},
            "project_id": {"env": "BIGQUERY_PROJECT_ID"},
            "dataset_id": "<dataset_name>",
        }
    ),
}

repository_assets = load_assets_from_modules(
    modules=[
        some_s3_assets,
        dagster_quickstart_assets,
        my_new_assets,  # assets from the script above
    ]
) + dbt_assets

return [
    *with_resources(
        definitions=repository_assets, resource_defs=resource_defs
    ),
    define_asset_job(name="all_assets_job"),
]
```
Even though I pass the I/O manager that I copied from the quickstart repo to the relevant asset, it somehow uses the S3 resource instead. The run succeeds, but I'm at a loss for why it uses a completely different resource than the one I explicitly give the asset. Does anyone have an idea what might be occurring?

Noah Ford
01/08/2023, 11:22 PM

Turns out I should be using `io_manager_key` instead of `required_resource_keys`. Obviously user error, but is it expected behavior that an invalid required resource key falls back to another resource that handles the same types?
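(For later readers: a minimal sketch of the fix described above, assuming the same `bq_io_manager` key from repository.py.)

```python
from dagster import asset
import pandas as pd


# io_manager_key routes the asset's return value to the named I/O manager;
# required_resource_keys only exposes a resource on the execution context.
@asset(
    group_name="new_group",
    io_manager_key="bq_io_manager",
)
def totango_accounts() -> pd.DataFrame:
    data = {
        'name': ['name', 'name_2'],
        'number': ['+123', '+456'],
    }
    return pd.DataFrame(data)
```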
sandy
01/09/2023, 5:48 PM

Noah Ford
01/09/2023, 6:04 PM

When I used `required_resource_keys` instead of `io_manager_key`, it saved the data using an S3 I/O manager even though I never made any mention of S3 in the asset. I had the impression that maybe it falls back on an I/O manager that handles the type being output (a pandas DataFrame)? But that seemed like odd and unhelpful behavior instead of throwing an error.
sandy
01/11/2023, 4:23 PM

Noah Ford
01/11/2023, 4:49 PM

`io_manager` currently would be enough to get errors when we "should" in the future. Thanks!