# integration-dbt
I’m looking for suggestions and best practices for working with an external dbt repo and Dagster, and how to integrate them with GitHub Actions. Currently, I have a GitHub repo with only the existing dbt project, and the Dagster logic lives in a separate one. The dbt repo has a workflow triggered by a push event or a cron job; the workflow is responsible for seeding, snapshotting, running, and testing. If it is triggered by a push event, only the impacted models are run and tested; otherwise the whole graph is considered. Ideally, I would like the dbt workflow to be responsible only for validating the changed models, and let Dagster run the periodic cron jobs. I was curious to know how you’ve designed the communication and workflows between the two repos in a CI/CD mindset.
Is your goal to split the dbt workflow into two: one for CI/CD purposes (`state:modified+ --full-refresh`, etc.) and another workflow for updating your data on a schedule? Having the workflows in separate locations (your CI/CD tool or Dagster) causes an issue: Dagster won't know your data is up to date if a separate dbt run is triggered externally on CD. For Dagster to know the state of your data and update your dbt models on CD, have you thought of triggering a run via the GraphQL API? If you can run `dbt ls --select state:modified+`, then you can get a list of the models/assets that need to be updated, and then pass those assets into a GraphQL query like this:
mutation LaunchAdHocRunMutation {
    repositoryName: $repositoryName
    assetSelection: "changed_source*"
}
That's not a working GraphQL query, but it gets the message across. Here are some resources on the GraphQL API and the asset selection syntax.
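To make the idea above concrete, here is a minimal sketch of the glue step: parse the output of `dbt ls --select state:modified+` into asset keys and assemble a launch-run request body for Dagster's GraphQL endpoint. The job name, repository location/name, mutation shape, and the model-name-to-asset-key mapping are all assumptions here — check them against your own Dagster deployment and dagster-dbt configuration before using anything like this.

```python
import json

# Hypothetical mutation shape -- verify against your Dagster version's
# GraphQL schema (e.g. via the GraphiQL playground in the Dagster UI).
LAUNCH_RUN_MUTATION = """
mutation LaunchRunMutation($executionParams: ExecutionParams!) {
  launchRun(executionParams: $executionParams) {
    __typename
  }
}
"""


def dbt_models_to_asset_keys(dbt_ls_output: str) -> list[list[str]]:
    # `dbt ls` prints one fully qualified model per line, e.g.
    # "my_project.staging.stg_orders". Assuming the default dagster-dbt
    # mapping of model name -> asset key, we keep only the last segment.
    return [
        [line.strip().split(".")[-1]]
        for line in dbt_ls_output.splitlines()
        if line.strip()
    ]


def build_launch_payload(
    asset_keys: list[list[str]],
    job_name: str = "dbt_job",          # assumed job name
    repo_location: str = "my_location",  # assumed code location
    repo_name: str = "my_repo",          # assumed repository name
) -> dict:
    # Assemble the GraphQL request body; `assetSelection` restricts the
    # run to just the changed models from the CI diff.
    return {
        "query": LAUNCH_RUN_MUTATION,
        "variables": {
            "executionParams": {
                "selector": {
                    "repositoryLocationName": repo_location,
                    "repositoryName": repo_name,
                    "jobName": job_name,
                    "assetSelection": [{"path": path} for path in asset_keys],
                }
            }
        },
    }


if __name__ == "__main__":
    # In CI you would capture this from `dbt ls --select state:modified+`.
    dbt_ls_output = "my_project.staging.stg_orders\nmy_project.marts.orders\n"
    payload = build_launch_payload(dbt_models_to_asset_keys(dbt_ls_output))
    print(json.dumps(payload["variables"], indent=2))
    # Then POST it to your instance, e.g.:
    # requests.post("http://localhost:3000/graphql", json=payload)
```

The key design point is that the CI workflow in the dbt repo only computes *which* models changed; the actual run is launched (and therefore tracked) by Dagster, so its view of data freshness stays correct.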