#integration-dbt

Tobias Macey

01/18/2023, 3:43 PM
I just want to double-check my understanding. I've got a definition that loads a set of assets from an Airbyte instance (using `load_assets_from_airbyte_instance`), which creates the raw data tables that my dbt project relies on. I also have it loading the dbt project using `load_assets_from_dbt_project`, which loads all of those assets. Unfortunately, the staging dbt models are disconnected from the Airbyte streams in the lineage view. If I follow things correctly, it seems that I need to define an IO manager that maps those two sets of assets together? Or is there another way to tell Dagster that the two groups of assets are related, without having to write a custom IO manager that communicates with AWS Glue?

Jonathan Neo

01/18/2023, 3:49 PM
Hey @Tobias Macey, does your dbt project use the dbt `sources.yml` to define the Airbyte sources? That might be what's missing. Here's a toy project that I put together with Airbyte and dbt: https://github.com/jonathanneo/data-aware-orchestration
Dagster uses the sources file to create the global DAG between different asset classes (e.g. Airbyte, dbt).

Tobias Macey

01/18/2023, 3:52 PM
So, the challenge is that I'm using Airbyte to load data into S3 and populate table definitions in AWS Glue, which are then processed via Trino. dbt has a sources file that points to the tables in Trino, but there's not a continuous thread between them.
I'll take a look at your example though.

Jonathan Neo

01/18/2023, 3:54 PM
Ah I see. My toy repo is a vanilla use-case and probably won't give you the answer you're looking for.
You might be able to trick Dagster into thinking that the Airbyte and Trino assets are the same by using the dbt `sources.yml` file:
version: 2
sources:
- name: trino
  database: trino
  schema: public
  tables:
    - name: airbyte_asset_name # this is what (1) dagster will use to create the global DAG, and (2) what dbt source() macro will use
      identifier: trino_table_name # this is what dbt will physically use to run the model
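
To make the split between `name` and `identifier` concrete, here is a minimal sketch in plain Python. It assumes Dagster's default keying for dbt sources is `[source name, table name]`, and it relies on dbt's rule that `identifier` falls back to `name` when omitted. The helper names `dagster_source_key` and `physical_relation` are hypothetical and are not part of either library.

```python
from typing import Dict, List


def dagster_source_key(source_name: str, table: Dict[str, str]) -> List[str]:
    """Logical key (assumed default): source name followed by the table's `name`."""
    return [source_name, table["name"]]


def physical_relation(database: str, schema: str, table: Dict[str, str]) -> str:
    """Relation dbt queries: `identifier` if present, otherwise `name`."""
    identifier = table.get("identifier") or table["name"]
    return f"{database}.{schema}.{identifier}"


# Values copied from the sources.yml snippet above.
table = {"name": "airbyte_asset_name", "identifier": "trino_table_name"}

key = dagster_source_key("trino", table)
relation = physical_relation("trino", "public", table)

print(key)       # logical key used for lineage
print(relation)  # physical table dbt reads from
```

The point of the sketch is that lineage only depends on the logical key, so `name` can follow the Airbyte side while `identifier` keeps pointing at the real Trino table.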

Tobias Macey

01/18/2023, 4:21 PM
I figured it out. The missing piece was the `key_prefix` in the Airbyte asset loader, so that it was scoped to the full asset key that dbt was looking at. Thanks!
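
For anyone landing here later, a rough sketch of why the prefix matters, in plain Python rather than real Dagster calls. It assumes asset keys behave like lists of path segments that must match exactly for lineage to connect, and that the Airbyte loader prepends `key_prefix` to each table name. The table name `users`, the `trino` prefix, and the helper `airbyte_asset_key` are illustrative assumptions, not values from this thread.

```python
from typing import List, Optional, Sequence


def airbyte_asset_key(
    table_name: str, key_prefix: Optional[Sequence[str]] = None
) -> List[str]:
    """Model of a loader that prepends an optional prefix to the table name."""
    return [*(key_prefix or []), table_name]


# Hypothetical key that the dbt side expects for its source table.
dbt_source_key = ["trino", "users"]

unprefixed = airbyte_asset_key("users")
prefixed = airbyte_asset_key("users", key_prefix=["trino"])

# Lineage connects only when both sides produce the identical key.
print(unprefixed == dbt_source_key)
print(prefixed == dbt_source_key)
```

In an actual project the prefix would be passed as the `key_prefix` argument of `load_assets_from_airbyte_instance`, with segments chosen to match whatever key the dbt source resolves to.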