Hi all, I just posted a new article about The Open (aka Modern) Data Stack Distilled into Four Core Tools, where Dagster is one part of it.
The goal of the open data stack is that companies can reuse existing battle-tested solutions and build on top of them, instead of reinventing the wheel by re-implementing each stage of the Data Engineering Lifecycle for every component of the data stack.
In the past, without these tools available, the story usually went something like this:
- Extracting: “Write some script to extract data from X.”
- Visualizing: “Let’s buy an all-in-one BI tool.”
- Scheduling: "Now we need a daily cron."
- Monitoring: "Why didn't we know the script broke?"
- Configuration: "We need to reuse this code but slightly differently."
- Incremental Sync: "We only need the new data."
- Schema Change: "Now we have to rewrite this."
- Adding new sources: "OK, new script..."
- Testing + Auth + Pagination: "Why didn't we know the script broke?"
- Scaling: "How do we scale up and down this workload?"
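To make the "Incremental Sync" pain point above concrete, here is a minimal, hypothetical sketch of the cursor-based state tracking you end up hand-rolling without a tool like Airbyte (the record shape, `updated_at` field, and function names are illustrative assumptions, not any tool's actual API):

```python
# Hand-rolled incremental sync: keep a cursor (here, the last-seen
# "updated_at" value) and only pick up records newer than it.
# Tools like Airbyte manage this connector state for you.

def incremental_sync(records, cursor):
    """Return records newer than `cursor`, plus the advanced cursor."""
    new_records = [r for r in records if r["updated_at"] > cursor]
    new_cursor = max((r["updated_at"] for r in new_records), default=cursor)
    return new_records, new_cursor

source = [
    {"id": 1, "updated_at": "2023-01-01"},
    {"id": 2, "updated_at": "2023-01-03"},
    {"id": 3, "updated_at": "2023-01-05"},
]

# First run: no cursor yet, so everything is "new".
batch, cursor = incremental_sync(source, "")
# Second run: nothing changed upstream, so nothing is re-synced.
batch2, cursor2 = incremental_sync(source, cursor)
```

And this is just the happy path: add schema changes, pagination, auth, and retries on top, and the case for reusing a battle-tested connector instead makes itself.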
Hope you find it interesting.
01/05/2023, 3:10 PM
great write-up - very much agree that the Airbyte + dbt + Dagster stack is the best of breed at the moment (I haven't tried Metabase though)