# dagster-feedback
**e:**
Really would like to have `dbt tests` as bona-fide Dagster Assets. We've thought about it and they seem ideal as assets, with a catch: I want them to materialize the test result on failure (`ignore_handled_error` in `dbt_resource` works for this). But currently tests are hard-coded in `dagster_dbt` not to be assets, and my hacky workarounds have shortcomings 😛. It is really nice to be able to put tests, which are really just SQL like dbt models, into an asset lineage graph, see their dependencies, assign freshness policies, and have a reconciliation sensor handle it all, plus the occasional asset job for batch tests.
**s:**
I think currently they are logged as `AssetObservation` events, which seems to me the right approach conceptually. The problem you get with tests is that you usually have more than one test per model, which means you'll quickly obliterate your asset graph's intelligibility. Have you tried using `dbt build` instead of `dbt run` to materialize your models? Then you are basically lumping all your model SQL (tests + model) into the same asset, which sounds like what you are aiming for.
**e:**
I do use that instead, but there are a few disadvantages, let me know if I am off:
1. RunFailure sensors (or similar) would either report the asset itself as failed to materialize when a test fails, or would not trigger at all, depending on the settings in the dbt resource. But the asset is in the database; the tests are just not passing.
2. There is no easy way to see which tests failed in the UI, to visualize the dependencies of tests that depend on multiple dbt models, or to trigger tests that depend on multiple models.
3. If I could set the tests as downstream assets, I could choose when to run them, set a freshness policy, etc., and have it all managed by my reconciliation sensor. (This means users can set the freshness policy on the test themselves; I don't have to do anything in Dagster.) See the sketch just below this list.
4. It helps with visibility and governance, and the Dagster UI can be very nice for this!
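For point 3, this is roughly the kind of hand-rolled workaround I mean: wrap one dbt test selection in its own software-defined asset, sitting downstream of the models it checks and carrying its own freshness policy. Everything here is illustrative: the model names, the test selector, and shelling out to dbt directly; exact decorator arguments also vary by Dagster version.

```python
import subprocess

from dagster import AssetKey, FreshnessPolicy, asset


@asset(
    # The dbt models this test reads from (hypothetical names). Newer Dagster
    # versions spell this `deps=[...]` instead of `non_argument_deps`.
    non_argument_deps={AssetKey("orders"), AssetKey("customers")},
    freshness_policy=FreshnessPolicy(maximum_lag_minutes=120),
)
def orders_customers_relationship_test() -> None:
    # Run only this test; with check=True a failing test fails this asset's run,
    # which is what lets failure sensors and the UI surface it. (One could instead
    # capture the results and emit them as metadata, akin to ignore_handled_error,
    # but that is beyond this sketch.)
    subprocess.run(
        ["dbt", "test", "--select", "relationships_orders_customer_id__id"],  # hypothetical selector
        check=True,
    )
```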
**s:**
Yeah, you're right. What we actually do in practice, and what I meant to mention, is that we basically have two levels of assets in our project:
• table-level assets are what gets populated by `load_assets_from_dbt_project`
• job-level assets are jobs of `dbt run --select x.y.z` and `dbt test --select x.y.z`
So we have schedules/sensors that run all the job-level assets on their expected cadence, and we build the dependency graph on that. Our dbt project is layered, so it basically amounts to:
```
run_layer_1_job  = `dbt run --select layer_1`
test_layer_1_job = `dbt test --select layer_1`  # downstream of layer 1
run_layer_2_job  = `dbt run --select layer_2`   # downstream of layer 1
test_layer_2_job = `dbt test --select layer_2`  # downstream of layer 2
run_layer_3_job  = `dbt run --select layer_3`   # downstream of layer 2
...
```
This means that we never use reconciliation sensors or freshness policies for dbt.
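If it helps, here is one way to express a single layer of that pattern as plain Dagster jobs; the op bodies just shell out to dbt, and the layer name, selector, and cron are illustrative rather than our actual setup:

```python
import subprocess

from dagster import In, Nothing, ScheduleDefinition, job, op


def dbt_command(*args: str) -> None:
    # Shell out to the dbt CLI; a non-zero exit code fails the op.
    subprocess.run(["dbt", *args], check=True)


@op
def run_layer_1() -> None:
    dbt_command("run", "--select", "layer_1")


@op(ins={"start": In(Nothing)})  # ordering-only dependency on run_layer_1
def test_layer_1() -> None:
    dbt_command("test", "--select", "layer_1")


@job
def layer_1_job():
    test_layer_1(start=run_layer_1())


# Run the layer on its expected cadence; downstream layers hang off similar
# jobs instead of reconciliation sensors or freshness policies.
layer_1_schedule = ScheduleDefinition(job=layer_1_job, cron_schedule="0 * * * *")
```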
**e:**
Thanks for the knowledge sharing, really appreciated. We are most likely going to do something similar for now.
I nevertheless made a draft PR real quick and would appreciate any feedback. The benefits of tests as real assets are many: I could use the full capabilities of Dagster's software-defined assets, including freshness policies and downstream/upstream logic (think of downstream assets that rely on a specific test), and have all of this run automatically with retries and user-defined parameters instead of via explicitly defined jobs and schedules. And at least we don't mind having the tests visible in the UI; we categorize things into asset groups anyway, so we would be happy to just hide the irrelevant groups. https://github.com/dagster-io/dagster/pull/13324