# integration-dbt
k
Hey all, is there a way with dagster_dbt to materialize the assets that match a dbt tag? From what I've seen, you have to use the pre-defined asset name that's given (with some customization, like a prefix). Or, if I want to do something like this, am I better off building something custom that runs the job from the CLI resource and determines which assets were materialized? The goal is to be able to do arbitrary dbt runs and have Dagster show the corresponding assets as materialized.
r
Are you using core or cloud? For core, the asset command takes in a `select` argument where you can put in your tag selector. For cloud, you can just add the `--select` argument in your dbt Cloud job. Then, when we create software-defined assets from your dbt Cloud job, we will respect this selector.
k
This is for core, and I'm not sure I follow. Is this something done through `AssetSelection`? I'm on 1.0.6 / 0.16.6 if that changes anything.
r
Ah, I am talking about the interface that generates software-defined assets from dbt in `dagster-dbt`: `load_assets_from_dbt_project` or `load_assets_from_dbt_manifest`.
k
Ah, so then the flow would be: create an op that takes an arbitrary selector, loads those assets, and then creates an asset job around those? Before, I was just loading all assets from the dbt project by default, and then creating asset jobs that ran some subset using `AssetSelection`.
And given the latter, is it problematic if I redeclare assets given my solution?
r
Oh I see what you're trying to do. Is this the workflow you are expecting?
1. Load all the models from a dbt project as assets into Dagster.
   a. I should only have to load this once.
   b. I have tagged my models accordingly using dbt tags.
2. Given the models from (1), I want to be able to select arbitrary subsets of them to materialize.
   a. Ideally, using the tags that I've created.
k
Yep! That's exactly what I'm trying to do
I'm not tied to that of course, it's just how I thought you had to load everything. So if the solution is to load them via jobs, then that's fine with me
r
• 1a: accomplished with either `load_assets_from_dbt_project` or `load_assets_from_dbt_manifest`
• 1b: accomplished with the `node_info_to_group_fn` argument: you can use the node information (which contains the dbt tags!) and map the tag name to a group name
• 2a: when using `define_asset_job`, use the `selection` argument, which can take in an `AssetSelection`, specifically `AssetSelection.groups`. Then you can use the group name from 1b here
cc @owen if I missed anything
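The three bullets above could be wired together roughly as follows. This is a minimal sketch, not a confirmed recipe: `group_for_node`, `TAG_PRIORITY`, `build_daily_job`, the job name, and the manifest path are hypothetical names, and it assumes the node info passed to `node_info_to_group_fn` is the manifest entry for a node with a `tags` list, and that returning `None` falls back to the default group. Since each node maps to one group, a model with several tags gets the first match in priority order.

```python
import json
from typing import Any, Mapping, Optional

# Hypothetical priority order: a dbt node can carry several tags, but this
# mapping returns a single group name per node, so the first matching tag wins.
TAG_PRIORITY = ("daily", "tables")


def group_for_node(node_info: Mapping[str, Any]) -> Optional[str]:
    """Map a dbt node's tags to one group name (None when no tag matches)."""
    tags = node_info.get("tags") or []
    for tag in TAG_PRIORITY:
        if tag in tags:
            return tag
    return None


def build_daily_job(manifest_path: str):
    """Load every dbt model once, grouped by tag, and define a job for one group."""
    # Imported here so group_for_node can be used and tested without Dagster.
    from dagster import AssetSelection, define_asset_job
    from dagster_dbt import load_assets_from_dbt_manifest

    with open(manifest_path) as f:
        manifest_json = json.load(f)

    # 1a + 1b: load all models once and assign each one a group from its tags.
    dbt_assets = load_assets_from_dbt_manifest(
        manifest_json, node_info_to_group_fn=group_for_node
    )
    # 2a: select the subset to materialize by group name.
    daily_job = define_asset_job(
        name="daily_dbt_models", selection=AssetSelection.groups("daily")
    )
    return dbt_assets, daily_job
```

With this ordering, a model tagged `['tables', 'daily']` lands in the `daily` group only, so it would not be picked up by a job selecting `AssetSelection.groups("tables")`.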
k
Is there a way to add multiple groups? For instance, if I had a tag that looked like `tags=['tables', 'daily']`, could I have those be separate groups? From what I can tell, given a node it only returns a single str.
Or do you see any problem with this? From what I can tell it has the behavior I'm looking for:
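import os
from typing import Sequence

from dagster import (
    AssetsDefinition,
    AssetSelection,
    ScheduleDefinition,
    define_asset_job,
    with_resources,
)
from dagster_dbt import dbt_cli_resource, load_assets_from_dbt_project


def create_arbitrary_dbt_run_job(dbt_models="tag:daily"):
    # Load only the dbt models matching the selector and bind the dbt CLI resource.
    assets: Sequence[AssetsDefinition] = with_resources(
        load_assets_from_dbt_project(
            project_dir="project",
            profiles_dir=os.getenv("DBT_PROFILES_DIR"),
            select=dbt_models,
        ),
        {
            "dbt": dbt_cli_resource.configured(
                {
                    "project_dir": "project",
                    "profiles_dir": os.getenv("DBT_PROFILES_DIR"),
                }
            )
        },
    )
    # Collect the keys of every loaded definition, not just the first one.
    asset_keys = [key for assets_def in assets for key in assets_def.asset_keys]
    # AssetSelection.keys is the public way to build a key-based selection.
    job = define_asset_job(name="arbitrary_dbt_test", selection=AssetSelection.keys(*asset_keys))
    return ScheduleDefinition(job=job, cron_schedule="@daily")


# Note: this is a ScheduleDefinition wrapping the job, not the job itself.
view_job = create_arbitrary_dbt_run_job()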
r
I think the problem that I see here is that the models could potentially be loaded multiple times, right? Say `tag:tables` and `tag:daily` have overlapping models, yet you want to materialize them in separate runs?
k
Ah yeah, that's true. For the time being I'm less worried about the clashing. I'm trying to provide an interface and abstraction so that analysts can run jobs via Dagster somewhat similarly to dbt. From a Dagster standpoint, do you see a problem with the sample? Or is it mostly just problematic from the dbt side?
I guess the more I think about it the more it makes sense to have pre-defined paths for builds
Either way, thanks for the help getting this going. I appreciate it!
o
hi @Kirk Stennett! That's a really clever setup actually, and I think it would work perfectly fine. In essence, I view what you're doing as creating a custom type of `AssetSelection`, which is resolved by shelling out to `dbt`. In fact, I might just model it that way explicitly (as in a `get_asset_selection_for_dbt_selection()` function, which takes in a dbt string and returns an `AssetSelection.keys()`). The main issue here is actually performance, as `load_assets_from_dbt_project` requires compiling the project (which can be quite slow, and would need to be done in every subprocess that's executing Dagster code, which can add up if you're calling this multiple times). You could use `load_assets_from_dbt_manifest` instead, which should be way faster.
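A rough sketch of what such a function might look like when built on the manifest. Note the swap: instead of shelling out to dbt, it reads an already-compiled `manifest.json` and understands only simple `tag:<name>` selectors. `model_names_for_tag` is a hypothetical helper, and the sketch assumes each model's asset key is just its dbt model name, which will not hold if you use a key prefix, a custom key function, or schema-based keys.

```python
import json
from typing import Any, List, Mapping


def model_names_for_tag(manifest: Mapping[str, Any], dbt_selection: str) -> List[str]:
    """Return the names of dbt models matching a simple 'tag:<name>' selector."""
    if not dbt_selection.startswith("tag:"):
        raise ValueError("this sketch only understands 'tag:<name>' selectors")
    tag = dbt_selection[len("tag:"):]
    return sorted(
        node["name"]
        for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model" and tag in node.get("tags", [])
    )


def get_asset_selection_for_dbt_selection(manifest_path: str, dbt_selection: str):
    """Resolve a dbt tag selector to a key-based AssetSelection."""
    # Imported here so the manifest lookup above works without Dagster installed.
    from dagster import AssetKey, AssetSelection

    with open(manifest_path) as f:
        manifest = json.load(f)

    # Assumption: the asset key for a model is its dbt model name.
    names = model_names_for_tag(manifest, dbt_selection)
    return AssetSelection.keys(*(AssetKey(name) for name in names))
```

Because the assets themselves are loaded only once elsewhere, overlapping tags are harmless here: a model tagged both `tables` and `daily` simply shows up in both selections.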