Lorenzo
01/17/2023, 10:37 AM
"dbt ls --output json ... --select model: xyz"
and it does this for each job, starting over again every time. It seems to be checking which models in my project have already been materialized and which have never been materialized. How can I avoid this time-consuming behaviour?
Thanks in advance! 👀
Jonathan Neo
01/17/2023, 11:48 AM
dbt_assets = load_assets_from_dbt_project(
    project_dir=DBT_PROJECT_PATH, profiles_dir=DBT_PROFILES, key_prefix=["jaffle_shop"]
)
Lorenzo
01/17/2023, 12:05 PM
TMP_UM_CS_S_CHANNEL_P_AU_NEW_RECORDS_asset = load_assets_from_dbt_project(project_dir=DBT_PROJECT_DIR, select="TMP_UM_CS_S_CHANNEL_P_AU_NEW_RECORDS")
TMP_UM_PO_B_PO_HEADER_NEW_RECORDS_asset = load_assets_from_dbt_project(project_dir=DBT_PROJECT_DIR, select="TMP_UM_PO_B_PO_HEADER_NEW_RECORDS")
TMP_UM_PO_B_PURCHASE_ORDER_DETAIL_NEW_RECORDS_asset = load_assets_from_dbt_project(project_dir=DBT_PROJECT_DIR, select="TMP_UM_PO_B_PURCHASE_ORDER_DETAIL_NEW_RECORDS")
TMP_UM_QU_B_QUESTIONNAIRE_DETAIL_P_AU_NEW_RECORDS_asset = load_assets_from_dbt_project(project_dir=DBT_PROJECT_DIR, select="TMP_UM_QU_B_QUESTIONNAIRE_DETAIL_P_AU_NEW_RECORDS")
TMP_UM_QU_B_QUESTIONNAIRE_HEADER_P_AU_NEW_RECORDS_asset = load_assets_from_dbt_project(project_dir=DBT_PROJECT_DIR, select="TMP_UM_QU_B_QUESTIONNAIRE_HEADER_P_AU_NEW_RECORDS")
TMP_UM_QU_S_QUESTIONNAIRE_ANSWER_P_AU_NEW_RECORDS_asset = load_assets_from_dbt_project(project_dir=DBT_PROJECT_DIR, select="TMP_UM_QU_S_QUESTIONNAIRE_ANSWER_P_AU_NEW_RECORDS")
TMP_UM_SH_B_SHOP_GOLIVES_P_AU_NEW_RECORDS_asset = load_assets_from_dbt_project(project_dir=DBT_PROJECT_DIR, select="TMP_UM_SH_B_SHOP_GOLIVES_P_AU_NEW_RECORDS")
Jonathan Neo
01/17/2023, 12:13 PM
load_assets_from_dbt_project to load all my dbt assets.
If you want to specify certain dbt models only, you could do:
load_assets_from_dbt_project(project_dir=DBT_PROJECT_DIR, select="model_1 model_2 model_3 model_4")
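A minimal sketch of collapsing several per-model loader calls into a single one, assuming `select` accepts a space-separated dbt selection string as in the call above. The helpers `build_select` and `load_selected_dbt_assets` are hypothetical names introduced for illustration, and the model list is just a sample taken from this thread. The `dagster_dbt` import is deferred so the string-building part can be checked without Dagster installed.

```python
from typing import Sequence

# Sample model names copied from the per-model calls earlier in this thread.
MODEL_NAMES = [
    "TMP_UM_CS_S_CHANNEL_P_AU_NEW_RECORDS",
    "TMP_UM_PO_B_PO_HEADER_NEW_RECORDS",
    "TMP_UM_PO_B_PURCHASE_ORDER_DETAIL_NEW_RECORDS",
]


def build_select(model_names: Sequence[str]) -> str:
    """Join model names into one space-separated dbt selection string."""
    return " ".join(name.strip() for name in model_names if name.strip())


def load_selected_dbt_assets(project_dir: str, model_names: Sequence[str]):
    """Hypothetical helper: load all listed models with a single loader call."""
    # Imported lazily so build_select works without dagster-dbt installed.
    # Assumes a dagster-dbt version that still ships this loader.
    from dagster_dbt import load_assets_from_dbt_project

    return load_assets_from_dbt_project(
        project_dir=project_dir,
        select=build_select(model_names),
    )
```

Usage would look like `dbt_assets = load_selected_dbt_assets(DBT_PROJECT_DIR, MODEL_NAMES)`, replacing the separate per-model assignments.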
Having multiple load_assets_from_dbt_project calls would trigger multiple dbt ls commands, and would therefore take a long time to execute.
Lorenzo
01/17/2023, 12:18 PM
Jonathan Neo
01/17/2023, 12:55 PM
build_asset_reconciliation_sensor. What the reconciliation sensor does is materialize only model_4 once model_3 is fixed.
I have an example here in a toy project: https://github.com/jonathanneo/my-dbt-dagster/blob/578ff10b9c1a4478f5e5462e7aa5d3ff2a4e07e7/stargazer/assets_modern_data_stack/my_asset.py#L97-L99
Lorenzo
01/17/2023, 1:37 PM
Adam Bloom
01/17/2023, 3:16 PM
load_assets_from_dbt_manifest loader instead of the one you’re currently using: https://docs.dagster.io/_apidocs/libraries/dagster-dbt#dagster_dbt.load_assets_from_dbt_manifest
This requires you to run dbt ls yourself (i.e. during your user code deployment container build) and then reuses the output for every dbt asset.
Lorenzo
01/18/2023, 1:17 PM
/usr/bin/python3 /home/lorenzo/.local/bin/dbt --no-use-color --log-format json ls --project-dir /home/lorenzo/Documents/GitHub/dagster-dbt-test/dbt_python_assets/dbt_python_assets/../UM_FOX_AU-dbt/dbt --profiles-dir /home/lorenzo/Documents/GitHub/dagster-dbt-test/dbt_python_assets/dbt_python_assets/../UM_FOX_AU-dbt/dbt/config --select TMP_UM_SH_B_SHOP_HIERARCHY_P_AU_UPDATE --output json
for each and every asset during my run. Keep in mind that I imported every model as a separate asset so that I can restart the DAGs with maximum granularity. It looks a bit strange, because it runs this command for each asset when the code is imported, and then repeats the same thing for each asset (again) when I run a DAG.
Thank you! yay
Qwame
01/18/2023, 5:52 PM
dbt ls command for any asset that I materialize, even if it's not a dbt asset.
Adam Bloom
01/18/2023, 5:57 PM
load_assets_from_dbt_manifest rather than load_assets_from_dbt_project - see my comment above
Qwame
01/18/2023, 5:59 PM
dbt ls on each asset materialization.
Adam Bloom
01/18/2023, 6:00 PM
load_assets_from_dbt_project is invoked. You won't see it happening on each startup with load_assets_from_dbt_manifest.
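A minimal sketch of the manifest-based approach suggested above, assuming dbt has already written a manifest.json at build time (for example by running dbt ls in the container build) and that the installed dagster-dbt version still provides `load_assets_from_dbt_manifest` taking the parsed manifest as its first argument. The helper names `read_manifest` and `load_dbt_assets_from_manifest` are hypothetical, and the manifest path is whatever your build produces.

```python
import json
from pathlib import Path
from typing import Any, Dict


def read_manifest(manifest_path: str) -> Dict[str, Any]:
    """Parse a dbt manifest file that was generated ahead of time."""
    return json.loads(Path(manifest_path).read_text(encoding="utf-8"))


def load_dbt_assets_from_manifest(manifest_path: str):
    """Hypothetical helper: build dbt assets from a pre-generated manifest."""
    # Imported lazily so read_manifest works without dagster-dbt installed.
    # Assumption: this loader exists in your dagster-dbt version and accepts
    # the parsed manifest dictionary as its first positional argument.
    from dagster_dbt import load_assets_from_dbt_manifest

    return load_assets_from_dbt_manifest(read_manifest(manifest_path))
```

With this shape, loading the code location only reads a JSON file. The trade-off is that the manifest has to be regenerated whenever the dbt project changes.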
Qwame
01/18/2023, 6:02 PM
owen
01/18/2023, 6:05 PM
load_assets_from_dbt* will allow you to execute any subset of dbt models, so loading each model as a separate call is not recommended and doesn't have a real benefit.
load_assets_from_dbt_project, that means that in order to load your repository code, dagster will need to run dbt ls (there's no way to load just the subset of the repository that is unrelated to dbt). I'd definitely endorse @Adam Bloom’s suggestion of using load_assets_from_dbt_manifest for this case.
Lorenzo
01/19/2023, 8:50 AM