I just watched the presentations (I didn't get a chance to join live). Regarding asset lineage, I'm curious whether there's been any investigation into structuring or presenting that information in a way that would be compatible with the OpenLineage specification that Julien Le Dem et al. are putting forward: https://datakin.com/introducing-openlineage/
03/12/2021, 11:33 PM
First of all, we think it makes sense to have integrated data observability capabilities in a data-aware orchestrator. You've already encoded all your dependencies, so why not make it as easy as possible to get some amount of data lineage without having to integrate another tool?
For teams that use Dagster as their main orchestrator, we think Dagster's Asset Catalog will end up being an extremely useful tool for viewing metadata about assets. For example, because Dagster is responsible for orchestration in addition to monitoring, it has the opportunity to answer questions like "When will this table next be regenerated?", "Is this table orphaned, or are there pipelines responsible for updating it?", and "Which tables are scheduled with sensors that are waiting on updates to this table?"
Of course, teams, especially those using multiple orchestrators, have good reasons to use OpenLineage and Marquez. We anticipate an integration emerging as the OpenLineage ecosystem grows, and we're supportive of that. We've looked at OpenLineage and don't think we've done anything to make the two systems fundamentally incompatible.
We don’t have plans to build this integration ourselves right now, but it’s definitely possible to imagine emitting metadata from Dagster in an OpenLineage format. We would certainly consider building an integration if it’s important to a large set of our users. We would also be up for supporting a team building that integration for the community.
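As a rough illustration of what "emitting metadata from Dagster in an OpenLineage format" might involve, here is a minimal sketch that builds a COMPLETE run event as plain JSON using only the Python standard library. The field names (`eventType`, `eventTime`, `run.runId`, `job.namespace`/`job.name`, `inputs`/`outputs`, `producer`) follow the general shape of an OpenLineage run event as we understand it, but the sketch omits fields a real event would carry (such as the schema URL). The helper `make_run_event`, the `"dagster"` job namespace, the producer URL, and the job and dataset names are all hypothetical, and nothing here uses a Dagster or OpenLineage client API.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical producer identifier; a real integration would use its own URL.
PRODUCER = "https://example.com/hypothetical-dagster-openlineage-emitter"


def make_run_event(job_name, inputs, outputs, event_type="COMPLETE", run_id=None):
    """Build an OpenLineage-style run event as a plain dict (sketch only).

    `inputs` and `outputs` are lists of (namespace, name) tuples describing the
    assets a pipeline step read from and wrote to (hypothetical mapping).
    """
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # Reuse a caller-supplied run id so START/COMPLETE events can be paired.
        "run": {"runId": run_id or str(uuid.uuid4())},
        # "dagster" as the job namespace is an assumption for illustration.
        "job": {"namespace": "dagster", "name": job_name},
        "inputs": [{"namespace": ns, "name": n} for ns, n in inputs],
        "outputs": [{"namespace": ns, "name": n} for ns, n in outputs],
        "producer": PRODUCER,
    }


if __name__ == "__main__":
    event = make_run_event(
        job_name="daily_orders_pipeline.build_orders_table",
        inputs=[("postgres://warehouse", "raw.orders")],
        outputs=[("postgres://warehouse", "analytics.orders")],
    )
    print(json.dumps(event, indent=2))
```

Sending such an event to a lineage backend like Marquez would be a separate transport concern and is left out of this sketch.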
03/13/2021, 2:36 AM
Thanks for the context, I think that all makes sense and is a reasonable approach, particularly given the early stage of the effort.