The cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability.

dagster

Hi all, I just posted a new article, <https://airbyte.com/blog/modern-open-data-stack-four-core-tools|The Open (aka Modern) Data Stack Distilled into Four Core Tools>, in which Dagster is one of the four tools.

The goal of the open data stack is to let companies reuse existing battle-tested solutions and build on top of them, instead of reinventing the wheel by re-implementing key parts of the Data Engineering Lifecycle for each component of the data stack.
In the past, without these tools available, the story usually went something like this:

- Extracting: "Write some script to extract data from X."
- Visualizing: "Let's buy an all-in-one BI tool."
- Scheduling: "Now we need a daily cron."
- Monitoring: "Why didn't we know the script broke?"
- Configuration: "We need to reuse this code but slightly differently."
- Incremental Sync: "We only need the new data."
- Schema Change: "Now we have to rewrite this."
- Adding new sources: "OK, new script..."
- Testing + Auth + Pagination: "Why didn't we know the script broke?"
- Scaling: "How do we scale up and down this workload?"

Hope that is interesting to you.

great write up - very much agree that the airbyte+dbt+dagster stack is the best of breed at the moment (I haven't tried metabase though)

o.0 so many scripts running on <http://airbyte.com|airbyte.com>

I'd say lightdash would fit in better but I guess that's a matter of taste.