Kirk Stennett
12/06/2022, 4:17 PM
rex
12/06/2022, 4:21 PM
select argument where you can put in your tag selector.
For cloud, you can just add the --select argument in your dbt Cloud job. Then, when we create software-defined assets from your dbt Cloud job, we will respect this selector.
Kirk Stennett
12/06/2022, 4:23 PM
AssetSelection? I'm on 1.0.6 / 0.16.6 if that changes anything
rex
12/06/2022, 4:23 PM
load_assets_from_dbt_project or load_assets_from_dbt_manifest
Kirk Stennett
12/06/2022, 4:25 PM
AssetSelection
And given the latter, is it problematic if I redeclare assets given my solution?
rex
12/06/2022, 4:29 PM
Kirk Stennett
12/06/2022, 4:30 PM
rex
12/06/2022, 4:34 PM
load_assets_from_dbt_project or load_assets_from_dbt_manifest
• 1b: accomplished with the node_info_to_group_fn argument: you can use the node information (which contains the dbt tags!) and map the tag name to a group name
• 2a: when using define_asset_job, use the selection argument, which can take in an AssetSelection, specifically AssetSelection.groups. Then you can use the group name from 1b here
Kirk Stennett
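12/06/2022, 4:39 PM
tags=['tables', 'daily'] could I have those be separate groups? From what I can tell, given a node it only returns a single str.

import os
from typing import Sequence

from dagster import (
    AssetSelection,
    AssetsDefinition,
    ScheduleDefinition,
    define_asset_job,
    with_resources,
)
from dagster_dbt import dbt_cli_resource, load_assets_from_dbt_project


def create_arbitrary_dbt_run_job(dbt_models="tag:daily"):
    # Load only the dbt models matched by the selector and bind the dbt CLI resource.
    assets: Sequence[AssetsDefinition] = with_resources(
        load_assets_from_dbt_project(
            project_dir="project",
            profiles_dir=os.getenv("DBT_PROFILES_DIR"),
            select=dbt_models,
        ),
        {
            "dbt": dbt_cli_resource.configured(
                {
                    "project_dir": "project",
                    "profiles_dir": os.getenv("DBT_PROFILES_DIR"),
                }
            )
        },
    )
    # Target exactly the asset keys loaded above.
    job = define_asset_job(
        name="arbitrary_dbt_test",
        selection=AssetSelection.keys(*assets[0].asset_keys),
    )
    return ScheduleDefinition(job=job, cron_schedule="@daily")


view_job = create_arbitrary_dbt_run_job()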
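A minimal sketch of the tag-to-group mapping discussed in this thread, not the participants' own code. Since node_info_to_group_fn returns a single group name per node, a model tagged with both tables and daily can land in only one group; the helper below resolves that with a priority list. TAG_PRIORITY and tag_to_group are hypothetical names, and the node dict is assumed to carry its dbt tags under a "tags" key.

```python
from typing import Any, Mapping, Optional, Sequence

# Hypothetical priority order: earlier tags win when a model has several.
TAG_PRIORITY: Sequence[str] = ("daily", "tables")


def tag_to_group(node_info: Mapping[str, Any]) -> Optional[str]:
    """Return one group name for a dbt node, based on its tags."""
    tags = set(node_info.get("tags", []))
    for tag in TAG_PRIORITY:
        if tag in tags:
            return tag
    return None  # nodes without a known tag get no explicit group


print(tag_to_group({"name": "orders", "tags": ["tables", "daily"]}))  # daily
print(tag_to_group({"name": "users", "tags": ["tables"]}))  # tables
```

A function like this could be passed as node_info_to_group_fn, and the resulting group name used with AssetSelection.groups in define_asset_job. Because each asset belongs to exactly one group, overlapping tags cannot be modeled as overlapping groups this way.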
rex
12/06/2022, 4:59 PM
tag:tables and tag:daily have overlapping models, yet you want to materialize them in separate runs?
Kirk Stennett
12/06/2022, 5:07 PM
owen
12/06/2022, 9:22 PM
AssetSelection, which is resolved by shelling out to dbt. In fact, I might just model it that way explicitly (as in a get_asset_selection_for_dbt_selection() function, which takes in a dbt string and returns an AssetSelection.keys()).
The main issue here is actually performance, as load_assets_from_dbt_project requires compiling the project (which can be quite slow, and would need to be done in every subprocess that's executing Dagster code, which can add up if you're calling this multiple times). You could use load_assets_from_dbt_manifest instead, which should be way faster.
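A rough sketch of the get_asset_selection_for_dbt_selection() idea, under loud assumptions. Instead of shelling out to dbt, it reads an already-parsed manifest.json dict, handles only selectors of the form tag:<name>, and assumes asset keys match dbt model names. The function name model_names_for_tag and the sample manifest are hypothetical.

```python
from typing import Any, List, Mapping


def model_names_for_tag(manifest: Mapping[str, Any], dbt_select: str) -> List[str]:
    """Resolve a 'tag:<name>' selector against a parsed dbt manifest."""
    prefix = "tag:"
    if not dbt_select.startswith(prefix):
        raise ValueError("this sketch only supports 'tag:<name>' selectors")
    tag = dbt_select[len(prefix):]
    # Keep models only; tests and other node types are skipped.
    return sorted(
        node["name"]
        for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model" and tag in node.get("tags", [])
    )


# Tiny stand-in for a manifest loaded from target/manifest.json.
sample_manifest = {
    "nodes": {
        "model.project.orders": {
            "resource_type": "model",
            "name": "orders",
            "tags": ["tables", "daily"],
        },
        "model.project.users": {
            "resource_type": "model",
            "name": "users",
            "tags": ["tables"],
        },
        "test.project.not_null": {
            "resource_type": "test",
            "name": "not_null",
            "tags": ["daily"],
        },
    }
}

print(model_names_for_tag(sample_manifest, "tag:daily"))  # ['orders']
print(model_names_for_tag(sample_manifest, "tag:tables"))  # ['orders', 'users']
```

The returned names could then be wrapped in AssetSelection.keys(...) and passed as the selection argument of define_asset_job. Full dbt selector syntax (graph operators, intersections, and so on) is out of scope for this sketch.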