# ask-community
m
Hi Folks - working more on my `dagster-dbt` use case. A few more questions:
1. I've split up my job by making multiple calls to `load_assets_from_dbt_project`, using the `select` parameter to select models upstream and downstream of a Python asset. Is there a way to give the resulting ops more meaningful names than `run_dbt_<project_name>_<hash>`?
2. I have variables that I use in my dbt project - is there a pattern I can use to require the user to enter these as config when running a job via dagit? I know I can pass them to `dbt_cli_resource.configured`, but I want the user to be required to enter them (or see defaults), like with an op. I also would like to not have to specify the project and profile parameters this way; I want those to either be defaults that nobody needs to touch, or added via non-runtime config.
o
hi @Martin O'Leary! right now, there's no way to manually specify a name for the underlying op (although this is a very reasonable request). however, I do want to point out that software-defined assets (SDAs) can do this upstream/downstream selection for you automatically. If you load all of the dbt models as SDAs, and there's a non-dbt asset that is downstream of some of those dbt models and upstream of others, Dagster will automatically chop up the dbt execution into separate ops. Depending on your needs, this automatic splitting may make your implementation a bit simpler
👍 1
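A minimal sketch of the pattern described above, assuming a hypothetical project layout and a hypothetical Python asset (`enriched_scores`) that reads from a dbt model (`raw_scores`) and feeds a downstream dbt model that declares `enriched_scores` as a source:

```python
from dagster import asset
from dagster_dbt import load_assets_from_dbt_project

# Load every model in the project as software-defined assets.
# Paths are hypothetical -- point these at your own project.
dbt_assets = load_assets_from_dbt_project(
    project_dir="path/to/dbt_project",
    profiles_dir="path/to/profiles",
)

# A non-dbt asset in the middle of the dbt graph: the argument name
# matches the asset key of the upstream dbt model, and any dbt model
# that declares `enriched_scores` as a source sits downstream of it.
# Dagster then splits the dbt execution into one op that runs before
# this asset and one that runs after it.
@asset
def enriched_scores(raw_scores):
    # `raw_scores` is whatever your IO manager loads for that model,
    # e.g. a pandas DataFrame.
    return raw_scores
```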
for the config bit, instead of passing a dictionary of config into `.configured`, you can pass in a config function, which basically defines a new config schema (in your case, it seems like you'd want that schema to be `{"vars": Permissive()}` or something like that). Then, you can hard-code stuff that you wouldn't want to change (such as project dir / profiles dir)
🙌 1
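A minimal sketch of that config-function approach, with hypothetical project/profiles paths; the `config_schema` argument to `.configured` exposes only `vars` to the user, while everything else stays hard-coded:

```python
from dagster import Permissive
from dagster_dbt import dbt_cli_resource

# Hypothetical paths -- substitute your own.
PROJECT_DIR = "path/to/dbt_project"
PROFILES_DIR = "path/to/profiles"

def _dbt_config(user_config):
    # Receives config validated against config_schema below and
    # returns the full config for the underlying dbt CLI resource.
    return {
        "project_dir": PROJECT_DIR,    # hard-coded, not user-editable
        "profiles_dir": PROFILES_DIR,  # hard-coded, not user-editable
        "vars": user_config["vars"],   # must be supplied at launch time
    }

configured_dbt = dbt_cli_resource.configured(
    _dbt_config,
    config_schema={"vars": Permissive()},
)
```

With this in place, dagit will prompt for `vars` when launching a run, while the project and profiles directories never appear in the launchpad.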
m
Hi @owen - I was OOO for a few days and just got back to this yesterday. On the asset loading ... I previously had 3 calls to `load_assets_from_dbt_project` and used the `select` arg to specify which models should be included in each group of assets. As you suggested, I let dagster/dbt do the thinking for me and replaced them with one call to `load_assets_from_dbt_project` with `select="*"`. The issue I now have is that one operation is disconnected from the rest, and it seems to run in parallel (and disconnected) even though other models have its assets as dependencies. Here I've shown the operation graph, the flat chart of the execution, and the asset graph that highlights the issue. The asset graph highlights the 3 seeds and 2 models which have downstream dependencies and are part of the single (disconnected) operation in the other 2 graphs. 🤷‍♂️
Also, here is the lineage graph from dbt, which shows the dependencies are picked up by dbt. Does dbt check if the seeds are already loaded into the DB, I wonder? Maybe it checks whether they are already there and runs the downstream models without having to load the seeds 🤷‍♂️
oh, and when I load the assets in multiple calls to `load_assets_from_dbt_project`, the dependency seems to hold (operations execute in the right order)
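For reference, the multi-call pattern described above might look roughly like this (the model names in the dbt selectors are hypothetical):

```python
from dagster_dbt import load_assets_from_dbt_project

# One call per slice of the dbt graph, split around the Python asset.
# Selector syntax is standard dbt: "+model" = model and its ancestors,
# "model+" = model and its descendants.
upstream_assets = load_assets_from_dbt_project(
    project_dir="path/to/dbt_project",
    select="+stg_scores",
)
downstream_assets = load_assets_from_dbt_project(
    project_dir="path/to/dbt_project",
    select="mart_scores+",
)
```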
o
hmm interesting -- to be clear, Dagster's representation of the asset graph is correct, it's just the operational graph that looks wrong?
👍 1
we've had a report of the operational graph showing up incorrectly in dagit under certain conditions, and I'm wondering if this is related -- it's possible that the generated job actually does have that connection between `run_dbt..._2` and the downstream step, but that this connection is not being represented in dagit
I would expect it to run in parallel with `run_dbt_...3`, but that the final dbt step would wait for both (3) and (2) to complete
do you mind sharing the timeline view as well for this run? (should look like this):
curious whether, if you click on that same step, it shows up as connected to any other steps
m
Here are 2 examples of the same full job run, funnily enough - in the first image it looks like `...dbt_2` finishes before `...dbt` (which depends on it). As they execute "live", though, it always shows `...dbt_2` finishing last 🤷‍♂️ It could be a dagit-only thing, which is fine, because the asset graph shows the correct connections and the dependencies are all connected
o
ah interesting, it looks like it's not a dagit-only thing based on those screenshots. can you confirm which asset keys are generated in the `_2` step?
it's possible that there's a bug in the subsetting logic, just trying to wrap my head around exactly what's happening
m
Sure - the asset keys generated in the `...2` step are a bunch of seed files and the single downstream models that operate on them before they are fed into the models in the `...dbt` step
Another bit of information - when I load everything in one call to `load_assets_from_dbt_project`, the 3 separate dbt operations all show the same inputs (9 sources) and the same outputs (31 models) in the information panel on the right of the screen when I click on them
o
hm, so it's just those 5? Based on the screenshot, it looks like those assets map to the `run_dbt_mca_backtesting_dbt` step (the `-o-` icon in the middle of the asset), rather than the `..._2` step. this could be a dagit issue, but I want to double-check
the outputs / inputs issue is a known limitation of the current way we do this split operation (but it shouldn't impact execution)
👍 1
hm but even if they were mapped to that step, the execution diagram wouldn't make a ton of sense (there'd be no reason for that step to be sequenced after anything else)
m
The models / assets downstream of the `...2` operation are these (that's changed slightly because I was missing the Python-computed asset in the previous screenshot, as I'm changing my setup on the fly here 🙂), and they are all materialized as part of the `...dbt` operation
This isn't a blocker for me - I can try to create a reproducible example tomorrow, and either a) you'll have a good example to go off of, or b) I'll see that I'm doing something wrong and correct it 🙂
o
perfect, that sounds great! 🙂