Benedikt Buchert
02/27/2023, 9:53 PM
marts but the correct dataset is prefixed by dbt: abc_marts. Also, it is looking inside the US location, which is not where the dataset is located. I tried setting the default location with "location": "europe-west3" in my definitions, which seems to have no effect. Is it possible to specify the dataset location and dataset name per asset or asset group as well?
from dagster import asset, AssetIn
import pandas as pd

@asset(
    ins={"dim_ga4__users": AssetIn(
        key_prefix=["dbt_models", "marts"]
    )},
    group_name="marts"
)
def show_users_head(dim_ga4__users) -> pd.DataFrame:
    # head is a method, so call it to print the first rows
    print(dim_ga4__users.head())
    return dim_ga4__users
jamie
02/28/2023, 5:19 PM
dataset.table abc_marts.dim_ga4__users then you will want your AssetIn to be
ins={"dim_ga4__users": AssetIn(
    key_prefix=["dbt_models", "abc_marts"]
)},
I believe that should correspond to the full asset key of the dbt dataset as loaded by dagster, but if that’s not the case, let me know and i can take a closer look.
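To make the mapping above concrete, here is a minimal plain-Python sketch (no Dagster imports) of how an AssetIn key_prefix plus the input name forms the upstream asset key, and how that key would translate into a dataset and table. The helper names upstream_key_path and dataset_and_table are hypothetical, and the rule "last prefix component is the dataset" is an assumption based on the behaviour described in this thread, not a copy of the IO manager's code.

```python
from typing import List, Tuple


def upstream_key_path(key_prefix: List[str], input_name: str) -> List[str]:
    # With only key_prefix set, the upstream key is the prefix plus the input name.
    return [*key_prefix, input_name]


def dataset_and_table(key_path: List[str]) -> Tuple[str, str]:
    # Assumption: the last prefix component names the dataset,
    # and the final key component names the table.
    if len(key_path) < 2:
        raise ValueError("need at least a dataset and a table component")
    return key_path[-2], key_path[-1]


path = upstream_key_path(["dbt_models", "abc_marts"], "dim_ga4__users")
print(dataset_and_table(path))
```

With key_prefix=["dbt_models", "marts"] the same sketch resolves to the marts dataset, which matches the lookup problem reported at the top of the thread.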
checking on the location stuff now
jamie
02/28/2023, 5:22 PM
Benedikt Buchert
02/28/2023, 8:14 PM
jamie
02/28/2023, 8:28 PM
jamie
02/28/2023, 8:34 PM
abc_marts or just marts? it would also be helpful if you could share the code snippet of how you’re loading the dbt models as assets. feel free to dm that to me if you dont want to share publicly
Benedikt Buchert
02/28/2023, 8:45 PM
abc. Then in the dbt_project.yml it is set to marts for all models that are living in the marts folder. This leads to the dataset being named abc_marts. But the default behaviour is to take AssetKey([model_name]). So I guess what I need to do is to use the node_info_to_asset_key function to adjust the behaviour and prefix everything with abc or whatever I have defined in my profiles.yml.
dbt_assets = load_assets_from_dbt_project(
    project_dir=DBT_PROJECT_PATH,
    profiles_dir=DBT_PROFILES,
    key_prefix=["dbt_models"],
    source_key_prefix=["dbt_source"]
)
https://docs.dagster.io/_apidocs/libraries/dagster-dbt#assets-dbt-core
Right?
Benedikt Buchert
02/28/2023, 8:48 PM
marts I need to adjust it.
jamie
02/28/2023, 8:51 PM
dataset config for the io manager override the key prefix, that would allow you to set the dataset on the io manager itself and then it would ignore key prefixes
Benedikt Buchert
02/28/2023, 8:57 PM
jamie
02/28/2023, 9:54 PM
dataset on the io manager. then every asset using this io manager will be stored in + loaded from that specified dataset
2. set the dataset for each asset via the key_prefix and the io manager will store + load each asset from the dataset specified via key prefix
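The two options can be sketched as a small resolution function. This is an illustration of the behaviour described in this thread, not the IO manager's actual source: resolve_dataset is a hypothetical helper, the error for "both set" mirrors what the thread reports, and raising when neither is set is a simplification made only for this sketch.

```python
from typing import List, Optional


def resolve_dataset(key_prefix: List[str], io_manager_dataset: Optional[str]) -> str:
    # Per the thread, setting both is currently rejected rather than prioritised.
    if key_prefix and io_manager_dataset is not None:
        raise ValueError("dataset is set on the io manager and via key_prefix")
    # Option 1: one dataset configured on the io manager for every asset.
    if io_manager_dataset is not None:
        return io_manager_dataset
    # Option 2: the last key_prefix component names the dataset per asset.
    if key_prefix:
        return key_prefix[-1]
    # Simplification for this sketch only: no fallback dataset is modelled.
    raise ValueError("no dataset specified")


print(resolve_dataset([], "abc_marts"))
print(resolve_dataset(["dbt_models", "abc_marts"], None))
```

Both calls resolve to abc_marts; passing a key prefix and an io manager dataset together raises the error.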
right now these are mutually exclusive (ie if you have key prefixes AND set dataset config on the io manager, we throw an error), but we likely could/should relax that a bit. the issue is determining which approach to prefer if a user specifies both ways
Benedikt Buchert
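03/04/2023, 6:22 PM
from typing import Any, Mapping
from dagster import AssetKey

def node_info_to_asset_key(node_info: Mapping[str, Any]) -> AssetKey:
    # Key each dbt model by its schema (the BigQuery dataset) and its model name
    asset_array = [
        node_info['schema'],
        node_info['name']
    ]
    return AssetKey(asset_array)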
This fixes the issue and also simplifies the mapping for Fivetran imports for BigQuery.
dbt_assets = load_assets_from_dbt_project(
    project_dir=DBT_PROJECT_PATH,
    profiles_dir=DBT_PROFILES,
    key_prefix=["dbt_models"],
    source_key_prefix=["dbt_source"],
    node_info_to_asset_key=node_info_to_asset_key
)
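As a quick sanity check of the fix, this plain-Python sketch (no Dagster imports) shows the key path a model would end up with, and that it lines up with the AssetIn suggested earlier in the thread. The sample_node dict is a made-up excerpt of dbt node info, model_key_path is a hypothetical helper, and it assumes key_prefix is prepended to whatever node_info_to_asset_key returns.

```python
from typing import Any, List, Mapping


def model_key_path(node_info: Mapping[str, Any], key_prefix: List[str]) -> List[str]:
    # Same mapping as the custom function: schema (dataset) then model name,
    # with the loader's key_prefix assumed to be prepended.
    return [*key_prefix, node_info["schema"], node_info["name"]]


# Hypothetical node info; real dbt manifests carry many more fields.
sample_node = {"schema": "abc_marts", "name": "dim_ga4__users"}

model_key = model_key_path(sample_node, ["dbt_models"])
# Key the downstream asset asks for: AssetIn key_prefix plus the input name.
asset_in_key = ["dbt_models", "abc_marts", "dim_ga4__users"]

print(model_key == asset_in_key)
```

If the two paths differ, the downstream asset will not be wired to the dbt model, which is the symptom this thread started with.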