Brad

11/22/2019, 11:46 PM
I was just listening to Nick on the Software Engineering Daily podcast. https://podcasts.google.com/?feed=aHR0cHM6Ly9zb2Z0d2FyZWVuZ2luZWVyaW5nZGFpbHkuY29tL2ZlZWQvcG9kY2FzdC8&episode=aHR0cDovL3NvZnR3YXJlZW5naW5lZXJpbmdkYWlseS5jb20vP3A9ODI3Ng He acknowledged that the current iteration of Dagster is only a piece of the vision. Curious if there's any kind of product outline for what a 1.0 release might look like?
Interested to get a feel for what's coming down the pipe. We're in the exploratory phase of assessing whether the technology is appropriate to adopt for a new project we're working on.

schrockn

11/23/2019, 12:44 AM
Hey Brad, to your first question: the reality is that there is no real standard in the broader open source community. For example, pandas is arguably the reason Python started to become popular in the data community, and it still does not have a 1.0 release. We/I don't have particularly strong opinions around when to mark this project as 1.0. It will be a combination of confidence that we have achieved a yet-to-be-defined level of maturity, probably combined with some breaking API changes based on lessons learned along the way, and also a quantum leap in tooling based on those APIs. But to be honest, w/r/t 1.0, ¯\_(ツ)_/¯ is the reality.
Your other question, in terms of what is coming down the pipe, we have better answers for. We are still cohering our concrete roadmap for the next major release (tentatively codenamed Santa Baby 🎅🎅🎅), but we have a few themes:
1. Cloud-native execution: So far we have focused on our programming model, with only modest support for DevOps. This will change very soon. We will have an out-of-the-box k8s deployment ready to go, along with on-demand ephemeral compute for runs. We are still working on the exact shape of this.
2. Ability to queue runs for cloud execution: We will have new GraphQL APIs and easy-to-use tools for enqueuing runs. This is enabled by number 1 and enables number 3.
3. Explicit support for backfills: This is a critical activity for managing data asset generation. Now that we have native scheduling abstractions, we will be building what we think is a clean backfill abstraction, with a more generic/abstract partitioning mechanism that we think will be useful in other contexts as well (see the rough sketch below).
4. The ability to schedule ad-hoc computations in the cloud is also very useful for ML workflows.
So at the end of this burst of work we think we can support end-to-end data engineering and data science workflows, because you can express your DAGs and dependencies in a single unified system, rather than having to artificially silo them because of limitations or overspecialization of current tools. Hope that helps!
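To make the backfill/partitioning idea concrete, here's a rough sketch in plain Python of what "enumerate partitions, enqueue one run per partition" means. This is an illustration only: `date_partitions`, `enqueue_run`, and `backfill` are hypothetical names, not the eventual Dagster API, whose exact shape is still being designed.
```python
from datetime import date, timedelta


def date_partitions(start, end):
    """Yield one partition key per day in [start, end)."""
    current = start
    while current < end:
        yield current.isoformat()
        current += timedelta(days=1)


def enqueue_run(pipeline_name, partition_key):
    """Stand-in for the future "queue a run for cloud execution" step
    (e.g. a GraphQL mutation); here it just prints what would be submitted."""
    print("enqueue {} for partition {}".format(pipeline_name, partition_key))


def backfill(pipeline_name, start, end):
    """A backfill is just: enumerate partitions, enqueue one run per partition."""
    for key in date_partitions(start, end):
        enqueue_run(pipeline_name, key)


backfill("my_pipeline", date(2019, 11, 1), date(2019, 11, 8))
```
The point is only that "partition" and "backfill" are separable concepts: anything that can enumerate partition keys can drive a backfill, regardless of how individual runs get executed.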

Brad

11/23/2019, 1:19 AM
Wow. Sounds like some really solid functionality coming. Looking forward to seeing how the backfill support turns out. Really appreciate the thorough response.
👍 2

schrockn

11/23/2019, 1:58 AM
Thanks Brad! In the meantime would love for you to try it out and give us feedback.

Brad

11/23/2019, 12:32 PM
For sure. I'm actually going through and writing a library of generic solids for my team now. The goal is to get 80% of the steps we normally use standardized into a shared library of solids. In addition, I'm putting together blueprints to serve as best practices for different types of work (like mapping a solid over multiple outputs of a previous solid, once I figure out a clear way to do that).
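A minimal sketch of what that shared-library pattern and the multi-output fan-out could look like, assuming the 0.6-era `@solid`/`@pipeline` API. `load_raw`, `split`, and `summarize` are made-up example solids, and keyword names such as `output_defs` vary between Dagster releases.
```python
from dagster import Output, OutputDefinition, execute_pipeline, pipeline, solid


@solid
def load_raw(context):
    """A generic 'extract' step that could live in the shared library."""
    return {"users": ["alice", "bob"], "orders": [1, 2, 3]}


# NOTE: the keyword for declaring outputs differs across versions
# (e.g. `output_defs` vs. `outputs`); adjust for the release you are on.
@solid(output_defs=[OutputDefinition(name="users"), OutputDefinition(name="orders")])
def split(context, raw):
    """Fan one upstream value out into two named outputs."""
    yield Output(raw["users"], "users")
    yield Output(raw["orders"], "orders")


@solid
def summarize(context, records):
    """A generic downstream step that works on any list of records."""
    context.log.info("got {n} records".format(n=len(records)))
    return len(records)


@pipeline
def shared_library_pipeline():
    users, orders = split(load_raw())
    # Aliases let the same library solid be invoked more than once.
    summarize.alias("summarize_users")(users)
    summarize.alias("summarize_orders")(orders)


if __name__ == "__main__":
    execute_pipeline(shared_library_pipeline)
```
The idea is that each solid stays generic and reusable, while the pipeline definition itself becomes the "blueprint" for how they compose.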

Eric

11/23/2019, 5:47 PM
Just want to throw out that there are also some great talks on YouTube that Nick gave, as well as another Software Engineering Daily episode on Dagster, all of which I enjoyed listening to. https://softwareengineeringdaily.com/2019/11/15/dagster-with-nick-schrock/