# ask-community
Hi, we're using v1.3.9 and we're keen to stick to a pure asset graph for a new use case, but we have a few major constraints:
• We have a large asset of >400K records that needs to be refreshed each day. We're using an IO Manager that writes these to a Databricks DeltaLake table, because the data needs to be accessible to another application that will surface it.
• There are assets that depend on this large asset. We can't have that 400K dataset being persisted via the aforementioned IO Manager, then rehydrated as an input to the downstream asset.

I've looked at possible patterns that might help us deal with this, e.g.
1. Using DynamicOut. Breaking up the outputs along a particular dimension results in about 120 outputs, but this seems to be available only to ops. We would therefore need to save the data in the op using an external resource. The downstream asset would also need to be an op that uses each upstream output as an input.
2. Using dynamic partitioning, in which case I could stick with assets and have an IO Manager that looks at the partition key to insert/select the records based on that key from the same table. But what I don't understand is:
   a. How do I use a dynamic partition set to automatically materialise the asset for every key each time it runs?
   b. The examples around dynamic partitions seem to be designed for running backfills against a particular set of partition keys. It really doesn't seem intended for this problem.

Can someone suggest patterns that work for our case?
I think this is a case for dynamic computation rather than dynamic partitioning, as I understand it. Assuming that you want the compute to happen inside Dagster rather than calling out to some external service, you could use a graph-backed asset to wrap a set of dynamic ops that break up your table.