# integration-dbt
Hello all, I am looking at moving above the software-defined asset layer for dbt executions. I just don't see the SDA approach scaling: we have Python scripts which ingest hundreds of tables in a single execution, and multiple dbt projects downstream of that. Even if I go to the trouble of refactoring our dbt projects for key linking between them, I don't see how a single ingestion script can generate the requisite hundreds of source-table SDAs for the dbt assets to link to. Ideally, I would like to define an op as a run/build/test execution of a dbt project through the dbt CLI. I believe I can then build a graph of these executions, link them to the source-data ingestion step (also as an op), define a graph, and execute the graph in a single job. Does anyone have an example of a dbt CLI execution within an op? dbt 1.5 is now out with a Python entry point for executing dbt from within Python, which could be done within an op, but there are not many code examples to go on yet. Any input, guidance, or sanity check is appreciated.
Hi @Eric Coleman, as mentioned in the thread above, you can achieve that using @multi_asset and, if necessary, the asset factory pattern. I don't think you would need to refactor your dbt projects (we didn't). As long as you can define logic based on your dbt model definitions (node info), you can link your assets in Dagster. In our project, we use a mix of @dbt_assets and op / job using DbtCliResource yielding asset materializations. We used the op / job approach to leverage dbt state selection (see this discussion on GitHub). Hope this helps.
Yes, I think the op yielding an asset materialization is what I was looking for. Thank you! I think long term we can refactor into the SDA architecture, but we have some aggressive timelines and I am looking for an on-ramp to get us there. This might do it.
👌 1