# ask-community
m
Hi Folks - working more on my `dagster-dbt` use case. A few more questions:
1. I've split up my job by making multiple calls to `load_assets_from_dbt_project`, using the `select` parameter to select models upstream and downstream of a Python asset. Is there a way to give the resulting ops more meaningful names than `run_dbt_<project_name>_<hash>`?
2. I have variables that I use in my dbt project - is there a pattern I can use to require the user to enter these as config when running a job via dagit? I know I can pass them to `dbt_cli_resource.configured`, but I want the user to be required to enter them (or see defaults), like with an op. I also would like to not have to specify the project and profile parameters this way; I want those to either be defaults that nobody needs to touch, or added via non-runtime config.
o
hi @Martin O'Leary! right now, there's no way to manually specify a name for the underlying op (although this is a very reasonable request). however, I do want to point out that software-defined assets (SDAs) can do this upstream/downstream selection for you automatically. If you load all of the dbt models as SDAs, and there's a non-dbt asset that is downstream of some of those dbt models and upstream of others, Dagster will automatically chop up the dbt execution into separate ops. Depending on your needs, this automatic splitting may make your implementation a bit simpler
👍 1
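A minimal sketch of the pattern described above, assuming a hypothetical project layout and a hypothetical Python asset (`enriched_scores`) that reads from a dbt model (`raw_scores`) and feeds a downstream dbt model that declares `enriched_scores` as a source:

```python
from dagster import asset
from dagster_dbt import load_assets_from_dbt_project

# Load every model in the project as software-defined assets.
# Paths are hypothetical -- point these at your own project.
dbt_assets = load_assets_from_dbt_project(
    project_dir="path/to/dbt_project",
    profiles_dir="path/to/profiles",
)

# A non-dbt asset in the middle of the dbt graph: the argument name
# matches the asset key of the upstream dbt model, and any dbt model
# that declares `enriched_scores` as a source sits downstream of it.
# Dagster then splits the dbt execution into one op that runs before
# this asset and one that runs after it.
@asset
def enriched_scores(raw_scores):
    # `raw_scores` is whatever your IO manager loads for that model,
    # e.g. a pandas DataFrame.
    return raw_scores
```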
for the config bit, instead of passing a dictionary of config into `.configured`, you can pass in a config function, which basically defines a new config schema (in your case, it seems like you'd want that schema to be `{"vars": Permissive()}` or something like that). Then, you can hard-code stuff that you wouldn't want to change (such as project dir / profiles dir)
🙌 1
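A minimal sketch of that config-function approach, with hypothetical project/profiles paths; the `config_schema` argument to `.configured` exposes only `vars` to the user, while everything else stays hard-coded:

```python
from dagster import Permissive
from dagster_dbt import dbt_cli_resource

# Hypothetical paths -- substitute your own.
PROJECT_DIR = "path/to/dbt_project"
PROFILES_DIR = "path/to/profiles"

def _dbt_config(user_config):
    # Receives config validated against config_schema below and
    # returns the full config for the underlying dbt CLI resource.
    return {
        "project_dir": PROJECT_DIR,    # hard-coded, not user-editable
        "profiles_dir": PROFILES_DIR,  # hard-coded, not user-editable
        "vars": user_config["vars"],   # must be supplied at launch time
    }

configured_dbt = dbt_cli_resource.configured(
    _dbt_config,
    config_schema={"vars": Permissive()},
)
```

With this in place, dagit will prompt for `vars` when launching a run, while the project and profiles directories never appear in the launchpad.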
m
Hi @owen - I was OOO for a few days and just got back to this yesterday. On the asset loading ... I previously had 3 calls to `load_assets_from_dbt_project` and used the `select` arg to specify which models should be included in each group of assets. As you suggested, I let dagster/dbt do the thinking for me and replaced them with one call to `load_assets_from_dbt_project` with `select="*"`. The issue I now have is that one operation is disconnected from the rest, and it seems to run in parallel (and disconnected) even though other models have its assets as dependencies. Here I've shown the operation graph, the flat chart of the execution, and the asset graph that highlights the issue. The asset graph highlights the 3 seeds and 2 models which have downstream dependencies and are part of the single (disconnected) operation in the other 2 graphs. 🤷‍♂️
Also, here is the lineage graph from dbt, which shows the dependencies are picked up by dbt. Does dbt check if the seeds are already loaded into the DB, I wonder? Maybe it checks whether they are already there and runs the downstream models without having to load the seeds 🤷‍♂️
oh, and when I load the assets in multiple calls to `load_assets_from_dbt_project`, the dependency seems to hold (operations execute in the right order)
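For reference, the multi-call pattern described above might look roughly like this (the model names in the dbt selectors are hypothetical):

```python
from dagster_dbt import load_assets_from_dbt_project

# One call per slice of the dbt graph, split around the Python asset.
# Selector syntax is standard dbt: "+model" = model and its ancestors,
# "model+" = model and its descendants.
upstream_assets = load_assets_from_dbt_project(
    project_dir="path/to/dbt_project",
    select="+stg_scores",
)
downstream_assets = load_assets_from_dbt_project(
    project_dir="path/to/dbt_project",
    select="mart_scores+",
)
```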
o
hmm interesting -- to be clear, Dagster's representation of the asset graph is correct, it's just the operational graph that looks wrong?
👍 1
we've had a report of the operational graph showing up incorrectly in dagit under certain conditions, and I'm wondering if this is related -- it's possible that the generated job actually does have that connection between `run_dbt..._2` and the downstream step, but that this connection is not being represented in dagit
I would expect it to run in parallel with `run_dbt_...3`, but that the final dbt step would wait for both (3) and (2) to complete
do you mind sharing the timeline view as well for this run? (should look like this):
curious whether, if you click on that same step, it shows up as connected to any other steps
m
Here are 2 examples of the same full job run, funnily enough - in the first image it looks like `...dbt_2` finishes before `...dbt` (which depends on it). As they execute "live", though, it always shows `...dbt_2` finishing last 🤷‍♂️ It could be a dagit-only thing, which is fine, because the asset graph shows the correct connections and the dependencies are all connected
o
ah interesting, it looks like it's not a dagit-only thing based on those screenshots. can you confirm which asset keys are generated in the `_2` step?
it's possible that there's a bug in the subsetting logic, just trying to wrap my head around exactly what's happening
m
Sure - the asset keys generated in the `...2` step are a bunch of seed files and the single downstream models that operate on them before they are fed into the models in the `...dbt` step
Another bit of information - when I load everything in one call to `load_assets_from_dbt_project`, the 3 separate dbt operations all show the same inputs (9 sources) and the same outputs (31 models) in the information panel on the right of the screen when I click on them
o
hm, so it's just those 5? Based on the screenshot, it looks like those assets map to the `run_dbt_mca_backtesting_dbt` step (the `-o-` icon in the middle of the asset), rather than the `..._2` step. this could be a dagit issue, but I want to double-check
the outputs / inputs issue is a known limitation of the current way we do this split operation (but it shouldn't impact execution)
👍 1
hm but even if they were mapped to that step, the execution diagram wouldn't make a ton of sense (there'd be no reason for that step to be sequenced after anything else)
m
The models / assets downstream of the `...2` operation are these (that's changed slightly because I was missing the Python-computed asset in the previous screenshot, as I'm changing my setup on the fly here 🙂), and they are all materialized as part of the `...dbt` operation
This isn't a blocker for me - I can try to create a reproducible example tomorrow, and either a) you'll have a good example to go off of, or b) I'll see that I'm doing something wrong and correct it 🙂
o
perfect, that sounds great! 🙂