# integration-bigquery
b
Hi, how can I configure the dataset and location to read from? It is an asset produced by dbt. I read https://docs.dagster.io/_apidocs/libraries/dagster-gcp-pandas#dagster_gcp_pandas.bigquery_pandas_io_manager. Currently it is looking in `marts`, but the correct dataset is prefixed by dbt: `abc_marts`. It is also looking in the US location, which is not where the dataset is located. I tried setting the default location to `"location": "europe-west3"` in my definitions, which seems to have no effect. Is it possible to specify the dataset location and dataset name per asset or asset group as well?
```python
from dagster import asset, AssetIn
import pandas as pd

@asset(
    ins={"dim_ga4__users": AssetIn(
        key_prefix=["dbt_models", "marts"]
    )},
    group_name="marts"
)
def show_users_head(dim_ga4__users) -> pd.DataFrame:
    print(dim_ga4__users.head())
    return dim_ga4__users
```
j
hey @Benedikt Buchert if the dbt dataset is in the `dataset.table` `abc_marts.dim_ga4__users`, then you will want your AssetIn to be
```python
ins={"dim_ga4__users": AssetIn(
    key_prefix=["dbt_models", "abc_marts"]
)},
```
I believe that should correspond to the full asset key of the dbt dataset as loaded by Dagster, but if that's not the case, let me know and I can take a closer look. Checking on the location stuff now.
Yeah, that's an oversight on my part: the location didn't get fully propagated to the query. Putting up a PR now, and it should get into this week's release.
b
Thank you @jamie for creating that pull request. The key prefix that is automatically pulled from dbt is `marts`, even though the dataset is `abc_marts`. This is because dbt concatenates the custom schema to the target schema: https://docs.getdbt.com/docs/build/custom-schemas#why-does-dbt-concatenate-the-custom-schema-to-the-target-schema. I guess I can change that on the dbt side, but it would be nice to have the ability to adjust this; probably it is more of a dbt integration issue, though. Currently, if I use `key_prefix=["dbt_models", "abc_marts"]`, it does not match anymore.
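For context, the dbt behaviour referenced above can be sketched in plain Python. This mirrors dbt's default `generate_schema_name` macro as described in the linked docs; it is an illustrative stand-in, not dbt code:

```python
def generate_schema_name(target_schema, custom_schema=None):
    """Mimic dbt's default schema-naming rule (per the linked dbt docs):
    with no custom schema, use the target schema as-is; otherwise the
    custom schema is concatenated onto the target schema."""
    if custom_schema is None:
        return target_schema
    return f"{target_schema}_{custom_schema}"

# target schema "abc" from profiles.yml, custom schema "marts" from dbt_project.yml
print(generate_schema_name("abc", "marts"))  # -> abc_marts
print(generate_schema_name("abc"))           # -> abc
```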
j
I see.
I'll be the first to admit my dbt knowledge isn't very strong, so correct me if I'm wrong about any of this. Based on my understanding, you have a schema specified in your profiles.yml. What schema is that: `abc_marts` or just `marts`? It would also be helpful if you could share the code snippet of how you're loading the dbt models as assets. Feel free to DM that to me if you don't want to share publicly.
b
In profiles.yml it is defined as `abc`. Then in dbt_project.yml it is set to `marts` for all models living in the marts folder. This leads to the dataset being named `abc_marts`. But the default behaviour is to take `AssetKey([model_name])`. So I guess what I need to do is use the `node_info_to_asset_key` function to adjust the behaviour and prefix everything with `abc`, or whatever I have defined in my profiles.yml.
```python
dbt_assets = load_assets_from_dbt_project(
    project_dir=DBT_PROJECT_PATH,
    profiles_dir=DBT_PROFILES,
    key_prefix=["dbt_models"],
    source_key_prefix=["dbt_source"]
)
```
https://docs.dagster.io/_apidocs/libraries/dagster-dbt#assets-dbt-core Right? At least the last asset key prefix, `marts`, needs to be adjusted.
j
Yeah, basically the last key prefix before the asset name needs to match the dataset name. I think this is also a really good argument for having the `dataset` config on the io manager override the key prefix; that would allow you to set the dataset on the io manager itself, and then it would ignore key prefixes.
b
If I did it in the io manager, would I still be able to adjust that dynamically per model, so that it knows the correct dataset per model?
j
Basically, the two approaches you can take right now are:
1. Set `dataset` on the io manager. Then every asset using this io manager will be stored in and loaded from that specified dataset.
2. Set the dataset for each asset via the `key_prefix`, and the io manager will store and load each asset from the dataset specified via the key prefix.
Right now these are mutually exclusive (i.e. if you have key prefixes AND set `dataset` config on the io manager, we throw an error), but we likely could/should relax that a bit. The issue is determining which approach to prefer if a user specifies both.
b
```python
from typing import Any, Mapping

from dagster import AssetKey

def node_info_to_asset_key(node_info: Mapping[str, Any]) -> AssetKey:
    # Key each model by its dbt schema (e.g. "abc_marts") plus its name
    asset_array = [
        node_info["schema"],
        node_info["name"]
    ]
    return AssetKey(asset_array)
```
This fixes the issue and also simplifies the mapping for Fivetran imports for BigQuery.
```python
dbt_assets = load_assets_from_dbt_project(
    project_dir=DBT_PROJECT_PATH,
    profiles_dir=DBT_PROFILES,
    key_prefix=["dbt_models"],
    source_key_prefix=["dbt_source"],
    node_info_to_asset_key=node_info_to_asset_key
)
```
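To sanity-check the override, the key construction can be traced with plain lists as a stand-in for `AssetKey` (the sample node dict is an illustrative subset of a dbt manifest entry, and this assumes `key_prefix` is still prepended to the keys the override produces):

```python
def node_info_to_key_path(node_info):
    # Same logic as node_info_to_asset_key above, returning a plain list
    return [node_info["schema"], node_info["name"]]

# Illustrative subset of what dbt's manifest carries for one model
node_info = {"schema": "abc_marts", "name": "dim_ga4__users"}

# load_assets_from_dbt_project then prepends key_prefix, giving the full
# key path the AssetIn in the earlier snippet needs to match
full_key_path = ["dbt_models"] + node_info_to_key_path(node_info)
print(full_key_path)  # -> ['dbt_models', 'abc_marts', 'dim_ga4__users']
```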