# ask-community
I’m currently exploring how to set up a dynamically partitioned dataset (~20-50 partitions) with a backfill running every minute. Is there a write-up for such a scenario? Things I’ve considered already: using in_process execution, how to detect and create new partitions, and how to run a backfill on every tick. I’m still not sure whether such a scenario would be too fine-grained, and which options I have to avoid latency/overhead issues. Dagster’s connection pooling issues might be a blocker at the moment(?). What else might need to be considered (e.g. log purging)? Could anyone share their experiences?
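Roughly the shape I have in mind, as a sketch (all names are my own placeholders, not from any write-up; `in_process_executor` is one option I imagine for cutting per-run overhead at this tick frequency):

```python
from dagster import (
    Definitions,
    DynamicPartitionsDefinition,
    asset,
    define_asset_job,
    in_process_executor,
)

# Hypothetical names throughout; ~20-50 partition keys expected.
events_partitions = DynamicPartitionsDefinition(name="events")

@asset(partitions_def=events_partitions)
def events_asset(context) -> None:
    # One small unit of work per partition key, roughly every minute.
    context.log.info(f"processing partition {context.partition_key}")

# in_process_executor runs all steps in the run's own process instead of
# one subprocess per step (the multiprocess default), reducing overhead.
events_job = define_asset_job(
    "events_job",
    selection=[events_asset],
    partitions_def=events_partitions,
    executor_def=in_process_executor,
)

defs = Definitions(assets=[events_asset], jobs=[events_job])
```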
I have a setup where new dynamic partitions are detected and added by a sensor that runs frequently and starts jobs just for those new partitions. Separately, a schedule runs the job for all known partitions (i.e. a backfill) at a different frequency.
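A rough sketch of that pattern, reusing the hypothetical `events_partitions` / `events_job` names from the question above; `list_upstream_keys()` is a stand-in for however you discover partition keys in your source system:

```python
from dagster import RunRequest, SensorResult, schedule, sensor

def list_upstream_keys() -> list[str]:
    # Hypothetical: query whatever system defines the partitions.
    ...

# Sensor: register newly observed partition keys and kick off runs
# just for those new partitions. Dagster applies the
# dynamic_partitions_requests before launching the run_requests.
@sensor(job=events_job, minimum_interval_seconds=30)
def new_partitions_sensor(context):
    known = events_partitions.get_partition_keys(
        dynamic_partitions_store=context.instance
    )
    new_keys = [k for k in list_upstream_keys() if k not in known]
    return SensorResult(
        dynamic_partitions_requests=(
            [events_partitions.build_add_request(new_keys)] if new_keys else []
        ),
        run_requests=[RunRequest(partition_key=k) for k in new_keys],
    )

# Schedule: fan out one run per known partition on each tick,
# i.e. a scheduled "backfill" over all partitions.
@schedule(job=events_job, cron_schedule="* * * * *")
def all_partitions_schedule(context):
    keys = events_partitions.get_partition_keys(
        dynamic_partitions_store=context.instance
    )
    stamp = context.scheduled_execution_time.isoformat()
    return [
        RunRequest(partition_key=k, run_key=f"{stamp}-{k}")
        for k in keys
    ]
```

Keeping the two concerns separate like this means the sensor stays cheap (it only touches new keys), while the schedule's cadence controls how much load the full fan-out puts on the run queue.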