Prratek Ramchandani

04/25/2023, 1:00 PM
Are you all thinking about what role Dagster should play in defining/managing streaming data workflows? I would think it’s not uncommon for people to have streaming workflows mixed in with batch ones across their stack - maybe some data ingestion is streaming and those assets are consumed by assets Dagster knows about and materializes. Or alternatively you could have a materialized view downstream of assets refreshed by Dagster. The former at least could be modeled as an observable source asset, but that feels incomplete in the sense that Dagster knows nothing about the definition of the asset. I like the idea of the orchestration tool as a control plane and would like that to extend to streaming data workflows as well. Maybe Dagster isn’t responsible for keeping those assets updated, but it should still know about their definition, freshness, lineage, etc. in the same way.
👀 2

Dagster Jarred

04/25/2023, 4:00 PM
Hey @Prratek Ramchandani, we are actively thinking about this. I think you pointed out a couple of pathways that we’d consider: 1. representing its lineage in our graphs, 2. adding metadata about status. We’re trying to form up our ideas and are starting to talk to people who have previously expressed interest in the topic. Would love to follow up with you separately to have a live conversation if you’ve got time in the next week or so. I’ll DM you.

Prratek Ramchandani

04/26/2023, 8:20 PM
nice! i'd be happy to chat