# integration-dbt
m
Hey folks, sharing this here in case anyone else runs into it soon. Might stem from `dagster-dbt` v0.22.4. Issue: cannot reference the same Dagster asset key in two dbt sources. My guess is we're not the only ones using this pattern: having one asset materialized as two or more tables, which are then used as dbt sources.
a
The best practice pattern is one asset per table. You can have a single op materialize multiple assets, though.
m
Thanks @Adam Bloom for the feedback! 🙏 I'll have to think about that. (Disclosure: I do not have expertise in data engineering -- yet -- but I'm trying to learn as I go.) Hmm, I'd agree it's a practice that makes things easier to orchestrate. However, it feels overly limiting. My understanding is that an `IOManager` lets you abstract away from the asset how the asset is stored. Hence, I should be able to choose whether I want to use:
• a single denormalized table, or
• a normalized data model, or
• any other approach.
If I stick to "one asset per table", I'm coupling my data model to my Dagster code. Granted, in dbt I work directly with tables, so there's a baked-in coupling there, but it doesn't have to be that way with Dagster too, I guess.
r
I responded here: https://github.com/dagster-io/dagster/issues/19701#issuecomment-1936361435. As Adam said, the practice is one asset per table.

> If I stick to "one asset per table", I'm coupling my data model to my dagster code.

As covered in our documentation, we provide a method `get_asset_keys_by_output_name_for_source` so you don't have to statically define your asset keys when you define your computations. You can just retrieve them from your dbt project: https://docs.dagster.io/integrations/dbt/reference#upstream-dependencies
i
@rex I had this issue. I used `multi_asset` as suggested, but when the dbt asset runs, it runs two dbt CLI commands instead of one.
I've sent you the logs in your DM, so you can see for yourself.
Since the two CLI commands run within a single run, the database throws an exception.
I've been facing this the entire day and couldn't find a fix for it, right as I was migrating my entire project to dagster-dbt 😞
r
Let's use your original thread so we don't ping these other folks -- I can follow up with you.
m
Thanks @rex for the quick response!
b
I'd like to challenge the "best practice" of one asset per table. I don't believe it's true in all situations. Bulk table loads where it's necessary to materialize all tables (e.g., replicating an entire database en masse) are common, and often serve multiple purposes. I may require the entire database to be replicated into my target while I only care to further orchestrate or monitor a handful of the tables. If all I need to track is that the entire replication has occurred, it makes no sense to split the tables I want from the tables I don't.

The asset abstraction I use should be determined by the computation and use case pertinent to that asset alone, not by what fits the paradigm of a downstream external tool. Moreover, I shouldn't be forced into a "best practice" or specific abstraction for my assets. Having to register each and every table of the database as an asset, when the appropriate level of abstraction for me is the all-or-nothing replication asset, makes very little sense.

It does make sense for all dbt-generated models to be their own Dagster assets, since that's exactly how they're defined. Sources are different: they're not defined in dbt, they're referenced. An asset does not need to be a source model, and a source model does not have to be an asset, so why should I have to force that?

One asset to one table may often be the right way to go. But not always. And being forced into it leaves a bad taste in my mouth.