I’m currently exploring how to set up a dynamically partitioned dataset (~20-50 partitions) with a backfill running every minute. Is there a write-up for such a scenario? Things I’ve already considered:

- using in_process execution,
- how to detect and create new partitions,
- how to run a backfill on every tick (rough sketch at the end of this post).

I’m still not sure whether such a scenario would be too fine-grained, and which options I have to avoid latency/overhead issues.
Dagster’s connection pooling issues might be a blocker at the moment(?). What else might need to be considered (e.g. log purging)? Could anyone share their experiences?
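To make the question concrete, here is roughly the shape I had in mind — just a sketch, not taken from any write-up: a `DynamicPartitionsDefinition` plus a sensor that registers newly discovered partitions and then requests a run for every partition on each (best-effort) 60-second tick, combined with the in_process executor. The names (`regions`, `regional_report`, `discover_regions`) are placeholders I made up:

```python
import dagster as dg

# Hypothetical partition set; real keys would come from the upstream source.
region_partitions = dg.DynamicPartitionsDefinition(name="regions")


@dg.asset(partitions_def=region_partitions)
def regional_report(context: dg.AssetExecutionContext) -> None:
    # One materialization per partition key.
    context.log.info(f"refreshing partition {context.partition_key}")


def discover_regions() -> list[str]:
    # Placeholder for querying the source system for the current partition set.
    return ["eu", "us", "apac"]


@dg.sensor(asset_selection=[regional_report], minimum_interval_seconds=60)
def region_backfill_sensor(
    context: dg.SensorEvaluationContext,
) -> dg.SensorResult:
    current = discover_regions()
    known = context.instance.get_dynamic_partitions("regions")
    new_keys = [k for k in current if k not in known]
    return dg.SensorResult(
        # Register any partitions that appeared since the last tick ...
        dynamic_partitions_requests=(
            [region_partitions.build_add_request(new_keys)] if new_keys else []
        ),
        # ... and request a run for every current partition
        # ("backfill on every tick").
        run_requests=[dg.RunRequest(partition_key=k) for k in current],
    )


defs = dg.Definitions(
    assets=[regional_report],
    sensors=[region_backfill_sensor],
    # in_process executor: steps run serially in the run's own process
    # instead of per-step subprocesses.
    executor=dg.in_process_executor,
)
```

The in_process executor is the overhead trade-off I had in mind: no per-step subprocess spawning, but also no step-level parallelism — and run launching itself still has its own cost, which I suspect is where the latency question really lives with 20-50 run requests per minute.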