Muhammad Jarir Kanji

11/19/2022, 1:31 PM
Re: software-defined assets: they're a nice, higher-level abstraction that's easier to digest (especially for team members who aren't data engineers), but beyond that, does this abstraction actually let us do something new that can't easily be done with tasks? Does it enable some significant new functionality? For example, the Dagster vs. Airflow page lists these questions as reasons why thinking in terms of assets rather than tasks is better:
• Is this asset up-to-date?
• What do I need to run to refresh this asset?
• When will this asset be updated next?
• What code and data were used to generate this asset?
• After pushing a change, what assets need to be updated?
And I agree, but it's not really a new feature or capability (or at least, not a significant enough one in my eyes). As long as you know which task/op is responsible for producing the data asset, you can answer all of those questions. It's nice not to have to make that link yourself (and maybe it hides some of the underlying complexity), but it's not a new capability. I'd love to hear the Dagster team's thoughts on what functionality they think focusing on data assets enables, or what they hope it will enable in the future.
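As a minimal sketch of the point being debated, assume each asset simply declares the upstream assets it reads. Two of the listed questions ("What do I need to run to refresh this asset?" and "After pushing a change, what assets need to be updated?") then become plain graph queries. A task-based setup can answer the same questions once the task-to-asset mapping is written down somewhere, which is the poster's argument. This is plain Python using `graphlib`, not Dagster's API. The asset names and the `upstream_of` and `downstream_of` helpers are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical asset graph: each asset maps to the upstream assets it reads.
ASSET_DEPS: dict[str, set[str]] = {
    "raw_orders": set(),
    "raw_customers": set(),
    "clean_orders": {"raw_orders"},
    "customer_ltv": {"clean_orders", "raw_customers"},
    "ltv_dashboard": {"customer_ltv"},
}


def upstream_of(asset: str, deps: dict[str, set[str]]) -> set[str]:
    """Every asset that must exist before `asset` can be refreshed."""
    seen: set[str] = set()
    stack = list(deps[asset])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(deps[node])
    return seen


def downstream_of(changed: str, deps: dict[str, set[str]]) -> list[str]:
    """Assets affected by a change to `changed`, in an order where parents run first."""
    affected = {a for a in deps if changed in upstream_of(a, deps)}
    affected.add(changed)
    # Keep only edges between affected assets, then sort topologically.
    sub = {a: deps[a] & affected for a in affected}
    return list(TopologicalSorter(sub).static_order())
```

For example, `downstream_of("clean_orders", ASSET_DEPS)` yields `clean_orders`, then `customer_ltv`, then `ltv_dashboard`, while `upstream_of("ltv_dashboard", ASSET_DEPS)` lists everything it transitively depends on.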


11/21/2022, 3:55 PM
Hi Muhammad, I'd recommend looking at the blog post if you haven't already:
Additionally, Dagster 1.1, which we released on Friday and will be publicizing more in a couple of weeks, includes several features that take advantage of software-defined assets for automatic memoization and declarative scheduling.
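For a rough idea of what asset-based memoization can mean, here is a toy Python sketch, assuming an asset is recomputed only when its declared code version or the versions of its inputs change. It is not Dagster 1.1's implementation or API; `materialize_if_stale`, `memo_key`, and the in-memory `_store` are hypothetical.

```python
import hashlib
from typing import Callable

# Hypothetical store: asset name -> (memo key, value) from its last materialization.
_store: dict[str, tuple[str, object]] = {}


def memo_key(code_version: str, input_versions: list[str]) -> str:
    """Hash the asset's code version together with the versions of its inputs."""
    payload = "|".join([code_version, *sorted(input_versions)])
    return hashlib.sha256(payload.encode()).hexdigest()


def materialize_if_stale(
    name: str,
    code_version: str,
    input_versions: list[str],
    compute: Callable[[], object],
) -> tuple[object, bool]:
    """Recompute `name` only if its code or inputs changed; return (value, recomputed)."""
    key = memo_key(code_version, input_versions)
    cached = _store.get(name)
    if cached is not None and cached[0] == key:
        return cached[1], False  # nothing changed: reuse the stored value
    value = compute()
    _store[name] = (key, value)
    return value, True
```

Calling it twice with the same code version and input versions returns the stored value without recomputing; bumping the code version (say from `"v1"` to `"v2"`) or an input version forces a recompute.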