Hans Peter Hagblom

01/17/2023, 1:34 PM
Hi! I have a question about Dagster's performance. We are currently using Airflow and have found its performance quite problematic (we were running on a single EC2 instance), especially when we had DAGs with a task for each table in a source system (up to, say, 100 tasks in a DAG). I wonder how Dagster compares to Airflow performance-wise when there are a lot of executions to schedule. Airflow sometimes needs a lot of hand-holding, especially when a backlog of task executions builds up because a pipeline stopped on an error, leading to starvation of free workers, etc. How does Dagster compare in this area, in your experience?


01/19/2023, 2:42 AM
hey @Hans Peter Hagblom this could be a very expansive topic, and most of the performance of Airflow/Dagster is determined by how you have each one configured to run. Additionally, most of the scalability features in Dagster are geared towards running in containerized application clusters like k8s/ECS. I think if you compare the two running on bare EC2 instances, there won't be much difference in performance. However, Dagster isolates your code in code locations, served by gRPC servers that are separate from the daemon (scheduler) that kicks off runs. That avoids a common performance gotcha with Airflow: on each tick of the scheduler, it may have to re-evaluate all of your DAG code. If performance is an issue, I'd really recommend trying out Dagster Cloud. The Serverless/Hybrid deployment options should run circles around Airflow running on a single EC2 instance. The Helm chart for deploying Dagster OSS on k8s should also do the same.
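To make the "re-evaluate all your DAG code on every tick" gotcha concrete, here is a toy model that counts how often user pipeline code gets evaluated under two designs. It is a conceptual sketch only, assuming a scheduler that either re-imports definitions on every tick or asks a long-lived code server that loaded them once. The names `load_definitions`, `CodeServer`, `reparse_every_tick` and `query_code_server_every_tick` are hypothetical and are not part of the Airflow or Dagster APIs; neither tool is implemented this way internally.

```python
# Toy model only: NOT how Airflow or Dagster are actually implemented.
# It counts how many times "user pipeline code" is evaluated under two
# hypothetical scheduler designs.
from typing import Dict, List


def load_definitions(counter: Dict[str, int], n_tables: int = 100) -> List[str]:
    """Hypothetical stand-in for importing user code that builds one task per table."""
    counter["loads"] += 1
    return [f"load_table_{i}" for i in range(n_tables)]


def reparse_every_tick(ticks: int) -> int:
    """Hypothetical scheduler that re-evaluates user code on every tick."""
    counter = {"loads": 0}
    for _ in range(ticks):
        definitions = load_definitions(counter)
        _ = len(definitions)  # stand-in for deciding what to launch
    return counter["loads"]


class CodeServer:
    """Hypothetical long-lived process that loads user code once and serves it."""

    def __init__(self, counter: Dict[str, int]) -> None:
        self._definitions = load_definitions(counter)

    def get_definitions(self) -> List[str]:
        return self._definitions


def query_code_server_every_tick(ticks: int) -> int:
    """Hypothetical scheduler that asks a separate code server on every tick."""
    counter = {"loads": 0}
    server = CodeServer(counter)
    for _ in range(ticks):
        definitions = server.get_definitions()
        _ = len(definitions)  # stand-in for deciding what to launch
    return counter["loads"]


if __name__ == "__main__":
    print(reparse_every_tick(60))            # user code evaluated 60 times
    print(query_code_server_every_tick(60))  # user code evaluated once
```

The point of the model is only that, with isolation, the cost of evaluating heavy pipeline code (e.g. generating 100 per-table tasks) is paid when the code server starts or reloads rather than on every scheduler tick.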