# announcements
m
Hi all! We are using DBT standalone (cli) right now. I’ve been doing some research on an orchestration tool that can handle retries and versioning, improve logging, etc. At first I thought of Jenkins, but then Airflow was recommended. After even more reading, Dagster was one of two choices recommended to me if starting from scratch (the other being Prefect). I watched Nick’s release video on YouTube from 9 months ago and understood his “hello world” examples, but I’m having trouble finding further documentation to answer my questions: Does Dagster permanently replace Airflow, or does it simply work alongside/integrate with it? Is it intended to replace DBT as well? I saw that there are integrations with both (dagster-dbt, dagster-airflow), but I wasn’t sure if those are meant as part of a “temporary migration” away from Airflow or meant to be long term. Thank you!
a
Does Dagster permanently replace Airflow, or does it simply work alongside/integrate with it?
At this point we don't recommend using the Airflow integration unless you are attached to an existing Airflow setup. The features you need should all be available in Dagster itself, and we plan to keep adding more.
Is it intended to also replace DBT as well?
We’re big fans of DBT and do not see Dagster as a replacement. We have found that Dagster is a great complement to DBT once you start needing to manage the data before it gets to DBT or after it’s ready.
I’m having trouble finding further documentation to answer my questions
Definitely lacking. Communicating this better is something we’re currently working on.
m
Thanks Alex! Is there good integration with the DBT process itself? For example, being able to restart a failure within DBT itself, or executing only part of the DBT pipeline? Right now we are using singer pre-DBT to extract the data. The pain point we are trying to solve for now is scheduling DBT pipelines to run (not in Cron) and being able to have better control and visuals into specific failures within DBT itself.
My worry is we aren’t ready for a Dagster, etc. as we are only using DBT now (after singer extracts the data) but we also want to get ourselves out of using Cron, hence doing the research.
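For reference, the two things asked about here (retrying a failed DBT run and executing only part of the DBT project) can be sketched in plain Python around the DBT CLI, with no orchestrator involved. This is only a minimal illustration, not how dagster-dbt works: `build_dbt_command`, `run_with_retries`, and the `orders+` selection are hypothetical names made up for this example, and it assumes the `--models` selection flag from DBT releases of this period (newer releases spell it `--select`).

```python
import subprocess
import time
from typing import Callable, List, Optional


def build_dbt_command(models: Optional[str] = None) -> List[str]:
    """Build a `dbt run` command, optionally limited to a model selection."""
    cmd = ["dbt", "run"]
    if models:
        # Assumption: `--models` is the selection flag; newer dbt uses `--select`.
        cmd += ["--models", models]
    return cmd


def default_runner(cmd: List[str]) -> int:
    """Run the command as a subprocess and return its exit code."""
    return subprocess.run(cmd).returncode


def run_with_retries(
    cmd: List[str],
    runner: Callable[[List[str]], int] = default_runner,
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> int:
    """Run `cmd` until it exits with 0; return the attempt number that succeeded."""
    for attempt in range(1, max_attempts + 1):
        if runner(cmd) == 0:
            return attempt
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    raise RuntimeError(f"{' '.join(cmd)} failed after {max_attempts} attempts")


if __name__ == "__main__":
    # Hypothetical selection: the `orders` model plus everything downstream of it.
    run_with_retries(build_dbt_command("orders+"), delay_seconds=30.0)
```

A wrapper like this retries the whole command and gives no per-model visibility, which is the gap an orchestrator is meant to fill.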
a
The support we have right now is pretty small and new: https://github.com/dagster-io/dagster/blob/master/python_modules/libraries/dagster-dbt/dagster_dbt/__init__.py. We do have a few users who are actively using DBT w/ Dagster, though, so I expect it to continue improving.
m
That’s great, thank you! I will review that code and that should answer a bunch of my questions.
s
@Matt Juszczak we also have another user who is using dagster + dbt to great effect. we're looking to upstream his work eventually once it is rock solid. i would classify the current dbt library as proof-of-concept/prototype quality
In terms of Airflow, we envision Dagster as a wholesale replacement for greenfield implementations. However, lots of people who use Airflow are interested in adopting Dagster incrementally while not thrashing their entire Airflow installation. We expect lots of systems to end up with a long-running Dagster/Airflow stack because of the stickiness of these systems.
m
Got it. Thank you! That’s super helpful. @schrockn
👍 1
s
basically we ❤️ dbt and think it is the right tool for pure-sql transformations that happen exclusively within the data warehouse. in that context dagster is about orchestrating dbt with other tools, and using tools like dagit to monitor things operationally (e.g. keeping track of all the materializations that result from a dbt transform)
❤️ 1
t
@Matt Juszczak glad you found your way here 🙂
❤️ 1
m
Haha, hi @Tobias Macey
r
Same use-case. We’re looking for a replacement for Matillion, followed by DBT (in BigQuery) and Looker.