#dagster-support

MikeVL

02/06/2023, 9:32 PM
Hello, I'm trying to use bigquery_pandas_io_manager, but I can't find a way to pass in "dataset_id" when @asset() is defined, instead of using the default.
@io_manager(
    config_schema={
        "credentials": StringSource,
        "project_id": StringSource,
        "dataset_id": Field(
            str, default_value="my_dataset", description="Dataset ID. Defaults to 'my_dataset'"
        ),
    }
)
def bigquery_pandas_io_manager(init_context: InitResourceContext) -> BigQueryDataframeIOManager:
    return BigQueryDataframeIOManager(
        credentials=json.loads(init_context.resource_config["credentials"]),
        project_id=init_context.resource_config["project_id"],
        dataset_id=init_context.resource_config["dataset_id"],
    )
From: https://github.com/dagster-io/dagster/blob/78df951f1f0d3934a5ca3ea34a84036f31d3525c/examples/quickstart_gcp/quickstart_gcp/io_managers.py
Hi @Sean Lopp, I was looking at your snowreport project because you are using BigQuery, and I was wondering if you had any ideas on how to pass in the dataset name as a parameter for the asset. I see that you have it set to "snowreport" for prod and "snowreport_branch" for branch deployments in repository.py. I'm hoping to find a way to specify it here somewhere, if that is possible, so you're not locked into one dataset:
@asset(
    io_manager_key="bq_io_manager",
    required_resource_keys={"bq_auth"},
    ins = {key: AssetIn(key) for key in asset_keys},
    partitions_def=DailyPartitionsDefinition(start_date="2022-10-05"),
    key_prefix="snocountry"
)
def resort_raw(context, **resort_assets) -> pd.DataFrame:
    """Insert clean resort records to BQ"""

Sean Lopp

02/07/2023, 4:07 PM
@MikeVL I think you'll need to adjust the IO manager so that it reads the dataset from part of the asset key. That example project is a little wonky because it uses the GCP clients directly; in more recent examples I've been using pandas_gbq. Here is another example: https://github.com/slopp/dagster-conditional-etl-gcp-demo/blob/main/dagster_project/resources.py#L80 In either case, you could do something like this to have the asset key_prefix specify the dataset name:
dataset_table = f"{context.asset_key.path[-2]}.{context.asset_key.path[-1]}"
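To spell that out a bit, here is a minimal sketch of the same idea as a standalone helper. The function name `dataset_table_from_key` and the `DEFAULT_DATASET` fallback are hypothetical (not part of Dagster or the linked projects). The fallback assumes you still want the configured default when an asset has no key_prefix.

```python
from typing import Sequence

# Assumption: mirrors the "my_dataset" default from the IO manager config earlier in the thread.
DEFAULT_DATASET = "my_dataset"


def dataset_table_from_key(
    path: Sequence[str], default_dataset: str = DEFAULT_DATASET
) -> str:
    """Build a BigQuery "dataset.table" string from an asset key path.

    The last path element is used as the table name and the element
    before it (the key_prefix) as the dataset. Without a prefix, the
    default dataset is used instead.
    """
    if not path:
        raise ValueError("asset key path must contain at least the table name")
    table = path[-1]
    dataset = path[-2] if len(path) >= 2 else default_dataset
    return f"{dataset}.{table}"
```

Inside the IO manager's handle_output you would call it as `dataset_table_from_key(context.asset_key.path)`. For the resort_raw asset above with key_prefix="snocountry", the key path should be ["snocountry", "resort_raw"], so the write would target the snocountry dataset.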

MikeVL

02/07/2023, 4:16 PM
Ah ok that makes sense. Thanks for the assistance!