Noah Ford
01/08/2023, 10:39 PM

```python
from dagster import asset
import pandas as pd


@asset(
    group_name="new_group",
    required_resource_keys={"bq_io_manager"},
)
def totango_accounts() -> pd.DataFrame:
    data = {
        'name': ['name', 'name_2'],
        'number': ['+123', '+456'],
    }
    return pd.DataFrame(data)
```
And repository.py looks like:
```python
resource_defs = {
    "io_manager": s3_pickle_io_manager.configured(
        {"s3_bucket": "<bucket>", "s3_prefix": s3_prefix}
    ),
    "s3": s3_resource,
    "bq_io_manager": bigquery_pandas_io_manager.configured(
        {
            "credentials": {"env": "BIGQUERY_JSON_PATH"},
            "project_id": {"env": "BIGQUERY_PROJECT_ID"},
            "dataset_id": "<dataset_name>",
        }
    ),
}

repository_assets = load_assets_from_modules(
    modules=[
        some_s3_assets,
        dagster_quickstart_assets,
        my_new_assets,  # assets from the script above
    ]
) + dbt_assets

return [
    *with_resources(
        definitions=repository_assets, resource_defs=resource_defs
    ),
    define_asset_job(name="all_assets_job"),
]
```
Even though I pass the I/O manager that I copied from the quickstart repo to the relevant asset, it somehow uses the S3 resource instead. The run succeeds, but I'm at a loss for why it uses a completely different resource than the one I explicitly give the asset. Does anyone have an idea what might be occurring?

Noah Ford
01/08/2023, 11:22 PM

Turns out I should be using `io_manager_key` instead of `required_resource_keys`. Obviously user error, but is it expected behavior that an invalid required resource key falls back to another resource that handles the same types?
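(For later readers: a minimal sketch of the fix described above, assuming the same `bq_io_manager` key from repository.py.)

```python
from dagster import asset
import pandas as pd


# io_manager_key routes the asset's return value to the named I/O manager;
# required_resource_keys only exposes a resource on the execution context.
@asset(
    group_name="new_group",
    io_manager_key="bq_io_manager",
)
def totango_accounts() -> pd.DataFrame:
    data = {
        'name': ['name', 'name_2'],
        'number': ['+123', '+456'],
    }
    return pd.DataFrame(data)
```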
sandy
01/09/2023, 5:48 PM

Noah Ford
01/09/2023, 6:04 PM

When I used `required_resource_keys` instead of `io_manager_key`, it saved the data using an S3 I/O manager even though I never made any mention of S3 in the asset. I had the impression that maybe it falls back on an I/O manager that handles the type being output (a pandas DataFrame)? But that seemed like odd and unhelpful behavior instead of throwing an error.
sandy
01/11/2023, 4:23 PM

Noah Ford
01/11/2023, 4:49 PM

`io_manager` currently would be enough to get errors when we "should" in the future. Thanks!