Caleb Parnell Lampen 03/28/2023, 3:59 PM
to generate the partitions at runtime, based on the files available on the system it's running on. The problem is that there seems to be no way to map dynamic partitions to each other (see feature request https://github.com/dagster-io/dagster/issues/13139). So with dynamic partitions I'm seemingly locked out of "fan-out" type operations, and I can't use the mapping strategy to preprocess the calibration only once per set. The second option is to use static partitions and do the mapping on those. My main concern is that static partitions apparently have to be instantiated before being passed to the
decorator. So we'd have to edit the code upstream of the asset every time we want to run on new files. That's possible, but it makes the code less portable. I'd prefer to separate the asset logic from the data it runs on. It would be better if we could define the static partitions downstream of the asset definitions, so they could be reused against different data locations; for example, by defining them in a per-deployment script that calls
. Even better would be if the static partitions could be loaded from a config file or environment variables, but I'm guessing at that point we're back to dynamic partitions. A third option is to define partitions and assets only at the set level, but that isn't really the natural partitioning scheme for the downstream things we want to do, and since most of the operations are highly parallelizable by file, we'd have to find a new parallelization solution nested inside Dagster (maybe Dask). Any advice on how to handle this situation?
sean 03/29/2023, 3:48 PM
by reading a file, so you shouldn't have to edit any code. You will of course need to reload the code location to register an update to the data file defining the partitions.
Caleb Parnell Lampen 03/29/2023, 4:27 PM
sean 03/29/2023, 7:11 PM
Harrison Conlin 03/29/2023, 11:31 PM
Caleb Parnell Lampen 04/04/2023, 8:57 PM