# integration-dbt
j
Is there a future plan to support multiple dbt cmd statements in a job? I'm referring to the `dbt Cloud` resource and to assets. E.g., a `dbt run-operation` before or after `dbt build` in a job?
a
You should be able to do that today. You'll just need to build your own op to do so. https://docs.dagster.io/_apidocs/libraries/dagster-dbt#dagster_dbt.DbtCliResource.run_operation
j
Thanks @Adam Bloom Sorry I meant with “dbt Cloud” as software defined assets, should have mentioned that in the original msg 🙂
r
yeah this should be doable in the future. basically we can just autodiscover your `dbt run` or `dbt build` for you in your job. this constraint was added just to make the implementation simple.
a question back at you though:
1. What’s your use case for running commands before your `dbt run`/`dbt build`? What specific commands are you running, and to do what?
2. Are you also expecting support for multiple `dbt run` and `dbt build` commands in a single job? And if so, what’s your use case (i.e. why can’t it be just one)?
Feel free to file a feature request if you want to track this issue 🙂 I can assign it to myself
f
I'm also curious about this feature. My use case is that I need to run `dbt snapshot` on source tables, independently of the `dbt build`, since the results of `dbt snapshot` (the snapshot tables) are used as `source`s.
j
Hey Rex, my use case: after running our main `dbt build` cmd, we then do a `dbt run-operation (get artifacts)` and a `dbt build -s (artifacts models)`, which basically builds the artifact assets we use to track performance of models and runs. Now obviously we could split this up and use some sort of asset/job status dependency to trigger a new dbt Cloud job (and there are other options as well), but it's our current process, and having this ability in the Dagster library would allow for a smooth transition.
We also have a `dbt test` job which we run, but I believe currently only `dbt run [build]` is supported, so again, workarounds exist but they're extra work.
r
re: your `dbt test` job - doesn’t `dbt build` also run tests? Or are you saying that you have a dbt Cloud job that runs only `dbt test`? And if so, are you expecting to retrieve software-defined assets from it? I ask this because we expect at least `dbt run [build]` to be present in the job, as that is what is materializing/generating the software-defined assets.
If one of you could make an issue around this that would be great!
a
I think there is a valid use case for separating `dbt run` and `dbt test` (and therefore not using `dbt build`) - I'm doing that (but with dbt CLI) as well. Reasoning: you want to know if tests fail, but you don't want test failures to tear down downstream assets (i.e. views).
r
but you don’t want test failures to tear down downstream assets (i.e. views).
Could you clarify this a bit further? I might just be unaware of this interaction in dbt. So are you saying, `dbt build` is more of an atomic transaction w.r.t. materializing models AND running tests? Whereas separating `run` and `test` has the ability to have separate transactions for each step?
a
from https://docs.getdbt.com/reference/commands/build
The `dbt build` command will:
• run models
• test tests
• snapshot snapshots
• seed seeds
In DAG order, for selected resources or an entire project.
So, if a test fails on a resource upstream in the DAG, then the downstream models aren't run - they'll be skipped due to the test failures (they're treated as if the model run failed with `dbt run`). Sometimes useful, but not really for my use cases.
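The difference described above can be sketched with a toy simulation. This is only an illustration under stated assumptions, not dbt's implementation: the DAG is a hypothetical linear chain, each model has at most one test, and the model names are made up.

```python
# Toy model only: a linear DAG where a model may have one failing test.
# Not dbt's implementation; all names here are hypothetical.

DAG = ["stg_orders", "orders", "orders_view"]  # upstream -> downstream
FAILING_TESTS = {"stg_orders"}  # pretend the test on this model fails


def simulate_build(dag, failing_tests):
    """Interleave run and test per node; a failed test skips everything downstream."""
    status = {}
    blocked = False
    for model in dag:
        if blocked:
            status[model] = "skipped"
            continue
        status[model] = "built"
        if model in failing_tests:
            blocked = True
    return status


def simulate_run_then_test(dag, failing_tests):
    """Run every model first, then report test failures without skipping anything."""
    status = {model: "built" for model in dag}
    failures = [model for model in dag if model in failing_tests]
    return status, failures


build_status = simulate_build(DAG, FAILING_TESTS)
run_status, test_failures = simulate_run_then_test(DAG, FAILING_TESTS)
print(build_status)
print(run_status, test_failures)
```

In the `build`-style simulation the downstream view is skipped once the upstream test fails; in the `run`-then-`test` simulation everything is built and the failure is only reported afterwards.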
r
ahh i see i see, thanks for the clarification
yeah let me try to get this in for this week’s release 🫡
a
If there's a better way to handle this with dbt cli assets, I'd be interested too. I'm currently handling this with a job that is triggered by a sensor, which does the trick.
r
I am going to build support for multiple commands, but will still retain the constraint that the dbt Cloud job must have one of `dbt run` or `dbt build`. So:
• `["dbt build"]`
• `["dbt run"]`
• `["dbt run", "dbt test"]`
• `["dbt test", "dbt run"]`
• `["dbt snapshot", "dbt build"]`
• `["dbt build", "dbt run-operation"]`
• `["dbt build", "dbt run-operation", "dbt build"]`
• `["dbt build", "dbt run"]`
• `["dbt deps"]`
• `["dbt test"]`
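One way to read the list above is to check each combination against the constraint. The sketch below assumes the constraint means exactly one `dbt run` or `dbt build` per job (the wording of the error message later in this thread points that way); the helper names are hypothetical and this is not the dagster-dbt implementation.

```python
import shlex

# Assumption: "run" and "build" are the only subcommands that materialize assets.
MATERIALIZING = {"run", "build"}


def subcommand(command):
    """Return the dbt subcommand (second token), or None if this is not a dbt command."""
    tokens = shlex.split(command)
    if len(tokens) < 2 or tokens[0] != "dbt":
        return None
    return tokens[1]


def satisfies_constraint(commands):
    """True when exactly one command in the job is `dbt run` or `dbt build`."""
    count = sum(1 for command in commands if subcommand(command) in MATERIALIZING)
    return count == 1


CANDIDATES = [
    ["dbt build"],
    ["dbt run"],
    ["dbt run", "dbt test"],
    ["dbt test", "dbt run"],
    ["dbt snapshot", "dbt build"],
    ["dbt build", "dbt run-operation"],
    ["dbt build", "dbt run-operation", "dbt build"],
    ["dbt build", "dbt run"],
    ["dbt deps"],
    ["dbt test"],
]

results = {tuple(commands): satisfies_constraint(commands) for commands in CANDIDATES}
for commands, ok in results.items():
    print("ok     " if ok else "reject ", list(commands))
```

Under that assumption, the first six combinations pass and the last four are rejected, either for having two materializing commands or for having none.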
@Jason re:
After running our main `dbt build` cmd we then do a `dbt run-operation (get artifacts)` and a `dbt build -s (artifacts models)` which basically handles the builds artifact assets which we use to track performance of models and runs.
Now obviously we could split this up, do some sort of asset/job status dependency to trigger a new dbt cloud job (and there are other options as well) but it’s our current process and having this ability in the dagster library would allow for a smooth transition
I will have to think about this use case a bit more. At first pass, I don’t want our library to be in the business of merging multiple run artifacts manually - i would just like to retrieve a single artifact that describes the entire dbt Cloud job, and process that… i’ll post in the dbt slack for some guidance in that regard
j
@rex that makes sense - I can set up `dbt build` & `dbt run-operation` in one job, then in another job `dbt build -s artifact models`.
For the cmds above, are they reversible?
r
would not `dbt build` and `dbt build -s artifact models` have overlapping models?
yeah it’s reversible
👍🏾 1
j
In dbt we have them separated by tags.
r
your initial `dbt build` would select all the models though, right? So even if you split it up into different dbt Cloud jobs, `dbt build -s (artifacts models)` will be a subset of the initial `dbt build`. So you would technically be creating duplicate software-defined assets for the subset of models selected.
j
Sorry, I didn’t actually provide you with the entire cmd. The dbt job (cmd) uses include and exclude arguments for tags. So the initial one would look something like `dbt build --exclude tag:my_model`, so everything but my_model is created and hence no asset for it. Then in another job I’d use `dbt build -m tag:my_model` and only run those models. I’m thinking each will have its own asset group and job. Then I make the 2nd asset group trigger when the first materializes successfully. BUT now that I’m thinking about it, how would Dagster handle the `run-operation` or other non-build cmds? Will it always run the cmd when the job is launched?
r
gotcha, makes sense. Dagster will just trigger the dbt Cloud job, and the dbt Cloud job will run the commands
j
That’s what I thought, but I can also select individual assets - so in that case you only overwrite the build cmd, I assume
r
yeah we will only overwrite the build command
j
Awesome. Sounds good to me
r
(in the subsetting case)
sweet
thank you all for the input!
👍🏾 1
j
no problem. Thanks for the quick response on this
r
👍🏾 1
j
@rex I just upgraded to `1.1.6` but I'm getting this message when testing. (plus got to run but be back a bit later to follow up)
dagster_dbt.errors.DagsterDbtCloudJobInvariantViolationError: The dbt Cloud job 'Dagster - Daily' (****) must have a single `dbt run` or `dbt build` in its commands. Received commands: ['dbt build --exclude tag:dbt_artifacts+', "dbt run-operation upload_dbt_artifacts --args '{filenames: [manifest, run_results]}'"].
r
sorry about this - we’re doing a string check to see if the command is “dbt run” or “dbt build”, and here it’s treating “dbt run-operation” as if it were a “dbt run”, since they share the same prefix
… as a workaround, you could try using `dbt  run-operation upload_dbt_artifacts --args '{filenames: [manifest, run_results]}'` (notice the two spaces between `dbt` and `run-operation`)
👍🏾 1
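To illustrate the prefix collision and why the two-space workaround sidesteps it, here is a minimal sketch. The library's actual check is not shown in this thread, so `naive_is_run` is only an assumption about what a prefix-based string check might look like, and `token_is_run` is a hypothetical alternative, not the fix that was shipped.

```python
import shlex


def naive_is_run(command):
    """Assumed shape of a prefix check: also matches `dbt run-operation`."""
    return command.startswith("dbt run")


def token_is_run(command):
    """Compare the subcommand token exactly, so `run-operation` is not `run`."""
    tokens = shlex.split(command)
    return len(tokens) >= 2 and tokens[0] == "dbt" and tokens[1] == "run"


args = "upload_dbt_artifacts --args '{filenames: [manifest, run_results]}'"
one_space = "dbt run-operation " + args
two_spaces = "dbt  run-operation " + args  # the workaround from the thread

checks = {
    "naive, one space": naive_is_run(one_space),
    "naive, two spaces": naive_is_run(two_spaces),
    "token, one space": token_is_run(one_space),
    "token, plain dbt run": token_is_run("dbt run"),
}
for label, value in checks.items():
    print(label, value)
```

With the prefix check, the one-space command is misread as `dbt run`, while the two-space variant no longer begins with the string `dbt run` and slips past it. Comparing whole tokens avoids the collision without relying on spacing.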