I’m currently exploring how to set up a dynamically partitioned dataset (~20-50 partitions) with a backfill running every minute. Is there a write-up for such a scenario? Things I’ve already considered:

- using in_process execution,
- how to detect and create new partitions,
- how to run a backfill on every tick (rough sketch at the end of this post).

I’m still not sure whether such a scenario would be too fine-grained, and which options I have to avoid latency/overhead issues.
Dagster’s connection pooling issues might be a blocker at the moment(?). What else might need to be considered (e.g. log purging)? Could anyone share their experiences?
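To make the question concrete, here is roughly the shape I had in mind — just a sketch, not taken from any write-up: a `DynamicPartitionsDefinition` plus a sensor that registers newly discovered partitions and then requests a run for every partition on each (best-effort) 60-second tick, combined with the in_process executor. The names (`regions`, `regional_report`, `discover_regions`) are placeholders I made up:

```python
import dagster as dg

# Hypothetical partition set; real keys would come from the upstream source.
region_partitions = dg.DynamicPartitionsDefinition(name="regions")


@dg.asset(partitions_def=region_partitions)
def regional_report(context: dg.AssetExecutionContext) -> None:
    # One materialization per partition key.
    context.log.info(f"refreshing partition {context.partition_key}")


def discover_regions() -> list[str]:
    # Placeholder for querying the source system for the current partition set.
    return ["eu", "us", "apac"]


@dg.sensor(asset_selection=[regional_report], minimum_interval_seconds=60)
def region_backfill_sensor(
    context: dg.SensorEvaluationContext,
) -> dg.SensorResult:
    current = discover_regions()
    known = context.instance.get_dynamic_partitions("regions")
    new_keys = [k for k in current if k not in known]
    return dg.SensorResult(
        # Register any partitions that appeared since the last tick ...
        dynamic_partitions_requests=(
            [region_partitions.build_add_request(new_keys)] if new_keys else []
        ),
        # ... and request a run for every current partition
        # ("backfill on every tick").
        run_requests=[dg.RunRequest(partition_key=k) for k in current],
    )


defs = dg.Definitions(
    assets=[regional_report],
    sensors=[region_backfill_sensor],
    # in_process executor: steps run serially in the run's own process
    # instead of per-step subprocesses.
    executor=dg.in_process_executor,
)
```

The in_process executor is the overhead trade-off I had in mind: no per-step subprocess spawning, but also no step-level parallelism — and run launching itself still has its own cost, which I suspect is where the latency question really lives with 20-50 run requests per minute.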