I had a simple DBT project ```models foo bar baz``` And Dags dagster #integration-dbt

geoHeil

02/28/2023, 7:58 AM

I had a simple DBT project:

models/
	- foo
	- bar
	- baz

And Dagster was nicely extracting foo, bar, baz into asset groups. However, I now need to refactor this into:

models/
	- snowflake
		- foo
		- bar
		- baz
	- postgres
		- p1
		- p2
		- p3

As you can see, one additional level is used here. How can I parse the DBT project such that the first layer (snowflake/postgres) becomes the `compute_kind` label and the second layer (foo, bar, baz, p1, p2, p3) is parsed into asset groups?

geoHeil

02/28/2023, 8:08 AM

Notice: These folders are only asset groups - not parts of the asset key. I.e. the physical location is inside the same schema

geoHeil

02/28/2023, 8:11 AM

Secondly - I also need to tell DBT to tag all secondary items with the first level - in order to allow dagster to apply a system specific IO manager.

Stephen Bailey

02/28/2023, 11:15 AM

1. You can provide a function for assigning group_names that parses the manifest / node metadata and returns what you want (which I think is `node['fqn'][1]`) when you run `load_assets...`

def extract_group_name_from_node(node: dict) -> str:
    "Used for parsing asset metadata from dbt project"
    if len(node["fqn"]) > 2:
        return f"dbt__{node['fqn'][1]}__{node['fqn'][2]}"
    return f"dbt__{node['fqn'][0]}"

2. What you're really asking with the two separate io_managers is how to load two separate dbt projects using only one project directory (because dbt itself doesn't have a concept of multi-compute models within a single project initialization, unless I'm mistaken). What comes to mind is that you create two separate `dbt_project.yml` files and have them select different paths: `model-paths: ["models/snowflake"]` and `model-paths: ["models/postgres"]`. Then you call the load function two separate times. You'll also need to configure a different `profiles.yml` adapter for each and pass that in at load time.
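To make both points concrete, here is a minimal sketch of how the two folder levels could be read from a node's `fqn`. It assumes each model file lives inside the second-level folder (for example `models/snowflake/foo/orders.sql`), so the `fqn` looks like `[project, system, group, model]`. The helper names, the project name `my_project`, and the model names `orders` and `users` are hypothetical. The sketch works on plain dicts and does not call Dagster or dbt, so how the functions get wired into the load call depends on your dagster-dbt version.

```python
from collections import defaultdict
from typing import Any, Dict, List, Mapping


def system_from_node(node: Mapping[str, Any]) -> str:
    """First folder under models/ (e.g. 'snowflake'); falls back to the project name."""
    fqn = node["fqn"]
    return fqn[1] if len(fqn) > 2 else fqn[0]


def group_from_node(node: Mapping[str, Any]) -> str:
    """Second folder under models/ (e.g. 'foo'); falls back to the system name."""
    fqn = node["fqn"]
    return fqn[2] if len(fqn) > 3 else system_from_node(node)


# Hypothetical manifest nodes, reduced to the one field used here.
nodes = [
    {"fqn": ["my_project", "snowflake", "foo", "orders"]},
    {"fqn": ["my_project", "postgres", "p1", "users"]},
]

# Partition group names by system, mirroring the "load each system separately" idea.
groups_by_system: Dict[str, List[str]] = defaultdict(list)
for node in nodes:
    groups_by_system[system_from_node(node)].append(group_from_node(node))

for system, groups in groups_by_system.items():
    print(system, groups)
```

The same `system_from_node` value could serve as the compute kind label and as the key for choosing a system-specific IO manager, while `group_from_node` plays the role of the group-name function from point 1.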

geoHeil

02/28/2023, 1:36 PM

thanks!

geoHeil

02/28/2023, 1:37 PM

I think it is then better to make a second top-level folder, since similar switches would also be required for sqlfluff.
