# announcements
Does anyone know of a good comparison of Dagster, Airflow (+ Cloud Composer, Astronomer), Prefect, and Kedro? It's hard to figure out which framework to use, esp. with Airflow v2 coming out. I don't need anything super involved, just a DAG tool (preferably open source) in place of a bash driver running a chain of Py and SQL scripts. The jobs are chained such that the exit status of one determines what happens down the line.
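For reference, the setup described above (Py and SQL scripts chained so that one step's exit status decides what runs next) can be sketched in plain Python before reaching for any framework. This is a minimal sketch, not any of these tools' APIs: the step names, commands and dependency map are hypothetical placeholders, and it assumes a step should be skipped whenever any upstream step failed or was itself skipped.

```python
import subprocess
import sys
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical steps: name -> command. Swap in your real Py/SQL invocations.
STEPS = {
    "extract": [sys.executable, "-c", "print('extract ok')"],
    "load": [sys.executable, "-c", "import sys; sys.exit(1)"],  # simulated failure
    "report": [sys.executable, "-c", "print('report ok')"],
}

# Hypothetical dependencies: step -> set of upstream steps it needs.
DEPS = {
    "extract": set(),
    "load": {"extract"},
    "report": {"load"},
}


def run_dag(steps, deps):
    """Run steps in dependency order; skip a step if any upstream step did not succeed."""
    status = {}
    for name in TopologicalSorter(deps).static_order():
        # Upstream steps are always visited first, so their status is known here.
        if any(status[up] != "success" for up in deps.get(name, ())):
            status[name] = "skipped"
            continue
        result = subprocess.run(steps[name])
        # A zero exit status counts as success, anything else as failure.
        status[name] = "success" if result.returncode == 0 else "failed"
    return status


if __name__ == "__main__":
    print(run_dag(STEPS, DEPS))
```

With these placeholder commands, `extract` succeeds, `load` exits with status 1, and `report` is skipped. Scheduling, retries, logging and a UI are roughly what the frameworks being compared would add on top of this kind of logic.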
This article touches on a few of them; I found it helpful for thinking about the differing goals of the tools: https://medium.com/@will_flwrs/python-data-engineering-tools-the-next-generation-354e00f2f060
I'm one of the maintainers of Kedro. @Ryan Carlson was right to share that article; we have a completely different aim. We focus on workflow standardisation: creating data science code that is maintainable because you've thought about what software engineering conventions for DS code look like. Our users dig us if they've had issues taking code into production because:
1. They were too heavily reliant on Jupyter notebooks.
2. They mish-mashed tons of scripts and created their own CLIs and project structures that are difficult to maintain.
I suspect you understand problem 2 very well, given how you described what you're looking for. Kedro also determines the running order for your pipeline, so that's one worry off your shoulders. We are not an orchestrator, though, which is why we believe in a workflow that starts in Kedro but ends with any tool that helps you schedule and orchestrate your pipeline runs; we've found that Dagster, Prefect and Airflow are perfect for that. We've even started creating a series of deployment docs in our latest sprint; the Prefect one has been completed as an example. The Astronomer team is also picking up the Kedro-Airflow plugin. Hope this helps; shout if you have more questions.
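The idea of a tool determining the running order for you can be illustrated without Kedro itself: if each step declares the named datasets it consumes and produces, the execution order follows from a topological sort of the resulting graph. This is a hypothetical, standard-library-only sketch, not Kedro's API or implementation; the node names and dataset names are made up for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical nodes: name -> (input dataset names, output dataset names).
# Declared out of order on purpose; the order is inferred from the data flow.
NODES = {
    "train_model": (["features"], ["model"]),
    "clean_data": (["raw_data"], ["clean"]),
    "build_features": (["clean"], ["features"]),
}


def resolve_order(nodes):
    """Return node names so that every node runs after the nodes producing its inputs."""
    producer = {}
    for name, (_, outputs) in nodes.items():
        for ds in outputs:
            if ds in producer:
                raise ValueError(f"dataset {ds!r} is produced by more than one node")
            producer[ds] = name
    # A node depends on whichever node produces each of its inputs;
    # inputs with no producer (e.g. raw_data) are treated as external sources.
    graph = {
        name: {producer[ds] for ds in inputs if ds in producer}
        for name, (inputs, _) in nodes.items()
    }
    # static_order() raises graphlib.CycleError if the data flow is circular.
    return list(TopologicalSorter(graph).static_order())


print(resolve_order(NODES))
```

Here the chain is linear, so the only valid order is `clean_data`, `build_features`, `train_model`, regardless of the order in which the nodes were declared.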
@David Krysl Curious to hear what you ended up going with, having been through the same debate myself. I'm the sole data engineer in what is otherwise an ML research team, and Dagster seemed like the most lightweight upgrade from cronjobs with better support for local development, so I've started with Dagster, but I really struggled with the choice. It's hard to do any quick POCs that yield really useful conclusions.