Jason
12/05/2022, 11:22 PM
dbt Cloud resource and as Assets.
For e.g. a `dbt run-operation` before or after `dbt build` in a job?
Adam Bloom
12/05/2022, 11:24 PM
Jason
12/05/2022, 11:29 PM
rex
12/05/2022, 11:40 PM
`dbt run` or `dbt build` for you in your job. This constraint was added just to make the implementation simple.
`dbt run` and `dbt build` commands in a single job? And if so, what’s your use case (i.e. why can’t it be just one)?
Félix Tremblay
12/06/2022, 1:46 PM
`dbt snapshot` on source tables, independently of the `dbt build`, since the results of `dbt snapshot` (the snapshot tables) are used as `source`s
Jason
12/06/2022, 2:41 PM
After running our main `dbt build` cmd we then do a `dbt run-operation (get artifacts)` and a `dbt build -s (artifacts models)` which basically handles the builds artifact assets which we use to track performance of models and runs.
Now obviously we could split this up, do some sort of asset/job status dependency to trigger a new dbt Cloud job (and there are other options as well), but it's our current process and having this ability in the dagster library would allow for a smooth transition.
`dbt test` job which we run - but I believe currently only `dbt run [build]` is supported, so again, workarounds exist but they're extra work.
rex
12/06/2022, 4:37 PM
`dbt test` job - doesn’t `dbt build` also run tests?
Or are you saying that you have a dbt Cloud job that runs only `dbt test`? And if so, are you expecting to retrieve software-defined assets from it?
I ask this because we expect at least `dbt run [build]` to be present in the job, as that is what is materializing/generating the software-defined assets.
Adam Bloom
12/06/2022, 4:40 PM
`dbt run` and `dbt test` (and therefore not using `dbt build`) - I'm doing that (but with dbt CLI) as well. Reasoning: you want to know if tests fail, but you don't want test failures to tear down downstream assets (i.e. views).
rex
12/06/2022, 4:57 PM
"but you don’t want test failures to tear down downstream assets (i.e. views)."
Could you clarify this a bit further? I might just be unaware of this interaction in dbt. So are you saying `dbt build` is more of an atomic transaction w.r.t. materializing models AND running tests?
Whereas separating `run` and `test` has the ability to have separate transactions for each step?
Adam Bloom
12/06/2022, 5:00 PM
The `dbt build` command will:
• run models
• test tests
• snapshot snapshots
• seed seeds
In DAG order, for selected resources or an entire project.
So, if a test fails on a resource upstream in the DAG, then the downstream models aren't run - they'll be skipped due to the test failures (they're treated as if the model run failed with `dbt run`). Sometimes useful, but not really for my use cases.
rex
12/06/2022, 5:01 PM
Adam Bloom
12/06/2022, 5:04 PM
rex
12/06/2022, 10:08 PM
`dbt run` or `dbt build`.
So:
• ✅ ["dbt build"]
• ✅ ["dbt run"]
• ✅ ["dbt run", "dbt test"]
• ✅ ["dbt test", "dbt run"]
• ✅ ["dbt snapshot", "dbt build"]
• ✅ ["dbt build", "dbt run-operation"]
• ❌ ["dbt build", "dbt run-operation", "dbt build"]
• ❌ ["dbt build", "dbt run"]
• ❌ ["dbt deps"]
• ❌ ["dbt test"]
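The rule behind the list above (exactly one `dbt run` or `dbt build` per job) can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this thread, not the actual dagster-dbt implementation; the names `is_materializing` and `has_single_materialization` are hypothetical, and the sketch assumes the subcommand comes directly after `dbt` (no global flags in between).

```python
import shlex

# Subcommands that materialize models, per the rule described in this thread.
MATERIALIZING_SUBCOMMANDS = {"run", "build"}


def is_materializing(command: str) -> bool:
    """Return True if the command is `dbt run ...` or `dbt build ...`."""
    # shlex.split tolerates repeated spaces and keeps quoted arguments together.
    tokens = shlex.split(command)
    return (
        len(tokens) >= 2
        and tokens[0] == "dbt"
        and tokens[1] in MATERIALIZING_SUBCOMMANDS
    )


def has_single_materialization(commands: list[str]) -> bool:
    """Return True if exactly one command in the job is `dbt run` or `dbt build`."""
    return sum(is_materializing(c) for c in commands) == 1
```

Because whole tokens are compared, `dbt run-operation` is not mistaken for `dbt run`, and extra whitespace between `dbt` and the subcommand does not change the result.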
"After running our main `dbt build` cmd we then do a `dbt run-operation (get artifacts)` and a `dbt build -s (artifacts models)` which basically handles the builds artifact assets which we use to track performance of models and runs.
Now obviously we could split this up, do some sort of asset/job status dependency to trigger a new dbt cloud job (and there are other options as well) but it’s our current process and having this ability in the dagster library would allow for a smooth transition"
I will have to think about this use case a bit more. At first pass, I don’t want our library to be in the business of merging multiple run artifacts manually - I would just like to retrieve a single artifact that describes the entire dbt Cloud job, and process that… I’ll post in the dbt Slack for some guidance in that regard.
Jason
12/06/2022, 10:17 PM
`dbt build` & `dbt run-operation` in one job, then in another job `dbt build -s artifact models`
rex
12/06/2022, 10:18 PM
`dbt build` and `dbt build -s artifact models` have overlapping models?
Jason
12/06/2022, 10:20 PM
rex
12/06/2022, 10:23 PM
`dbt build` would select all the models though, right?
So even if you split it up into different dbt Cloud jobs, `dbt build -s (artifacts models)` will be a subset of the initial `dbt build`. So you would technically be creating duplicate software-defined assets for the subset of models selected.
Jason
12/06/2022, 10:30 PM
`dbt build --exclude tag:my_model` so everything but my_model is created and hence no asset.
Then in another job I’d use `dbt build -m tag:my_model` and only run those models.
I’m thinking each will be their own asset_groups and jobs. Then I make the 2nd asset group trigger when the first materializes successfully.
BUT now that I’m thinking about it, how would Dagster handle the `run-operation` or other non-build cmds? Will it always run the cmd when the job is launched?
rex
12/06/2022, 10:32 PM
Jason
12/06/2022, 10:34 PM
rex
12/06/2022, 10:34 PM
Jason
12/06/2022, 10:34 PM
rex
12/06/2022, 10:34 PM
Jason
12/06/2022, 10:35 PM
rex
12/07/2022, 5:36 AM
Jason
12/08/2022, 10:46 PM
`1.1.6` but I'm getting this message when testing. (Plus, got to run, but I'll be back a bit later to follow up.)
dagster_dbt.errors.DagsterDbtCloudJobInvariantViolationError: The dbt Cloud job 'Dagster - Daily' (****) must have a single `dbt run` or `dbt build` in its commands. Received commands: ['dbt build --exclude tag:dbt_artifacts+', "dbt run-operation upload_dbt_artifacts --args '{filenames: [manifest, run_results]}'"].
rex
12/08/2022, 11:03 PM
`dbt  run-operation upload_dbt_artifacts --args '{filenames: [manifest, run_results]}'` (notice the two spaces between `dbt` and `run-operation`)