Craig Glennie

03/15/2023, 6:54 PM
Hi there, I am in the early stages of evaluating workflow systems, and I'm trying to get a feel for the kind of throughput that Dagster can handle. I haven't had much luck finding numbers through search. We are building an enrichment pipeline for data that could theoretically (though I doubt it will) reach around 10,000 documents per second. Each document would be enriched by various machine learning models (each probably with its own Seldon wrapper to expose it as an API) and would ultimately land in Elasticsearch and some other databases. I suspect it's a pretty standard thing we're trying to do, and I know that Flink, for example, could handle whatever volume we might throw at it. Is Dagster a good choice here?