About the feature "Pass partition ranges to single...
# dagster-feedback
About the feature "Pass partition ranges to single run": it would be really neat if it didn't require rewriting my partitioned assets to explicitly tell Dagster how to loop over a range of partitions. The basic logic seems unambiguous. Would there be unexpected side effects in simply letting Dagster execute the DAG for each partition in a single run by itself, without my having to code it? I think that would already provide benefits by reducing the time spent loading code. I think this could be the default behavior of the feature, and if people want to customize how the looping happens, they can specify it by calling the relevant context arguments.
This depends on what you choose as your storage engine.
E.g., Spark partitions might have a different level of parallelism compared to for-looping inside Spark.
The default behavior would not consider if the resources have parallelism or not, it would just execute DAGs independently for each partition, one after the other, potentially in parallel according to the concurrency configuration.
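To make the proposed default concrete, here is a minimal sketch in plain Python. It does not use any Dagster API: `ASSET_ORDER`, `run_dag_for_partition`, and `run_partitions_in_single_run` are hypothetical names, and a thread pool stands in for the concurrency configuration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins: none of these names are Dagster APIs.
ASSET_ORDER = ["raw", "cleaned", "report"]  # a linear three-asset DAG


def run_dag_for_partition(partition_key: str) -> list[str]:
    """Materialize every asset of the DAG, in order, for one partition."""
    return [f"{asset}[{partition_key}]" for asset in ASSET_ORDER]


def run_partitions_in_single_run(
    partition_keys: list[str], max_concurrency: int = 2
) -> dict[str, list[str]]:
    """Execute each per-partition DAG independently inside one 'run',
    with at most max_concurrency partitions in flight at a time."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map yields results in the same order as partition_keys.
        results = pool.map(run_dag_for_partition, partition_keys)
        return dict(zip(partition_keys, results))


results = run_partitions_in_single_run(
    ["2023-01-01", "2023-01-02", "2023-01-03"]
)
```

The point of the sketch is only that no per-asset looping code is needed: the same DAG function is applied unchanged to each partition key.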
Awakened my Gimp-fu. Here's a visual representation of 3 partition DAGs running under a single run instead of 3 separate runs, while respecting a concurrency of 2.
But isn't this already pretty much the same for internal resources, where no external resource such as a Databricks (or EMR) cluster, which might take a while to initialize, is spun up?
As far as I understand, the main point of this feature is to limit the instantiation of such resources during backfills and to focus on the actual operations, which can then be more efficient.
My hypothesis is that having a single run could allow doing only once some of the loading that Dagster does to prepare a run. For example, parsing the definitions and loading them in memory and maybe other things.
hi @Nicolas Parot Alvarez! this is definitely an interesting idea, although @geoHeil is correct that this feature is intended for cases where you can group the execution of multiple partitions into a single operation, rather than multiple operations in the same run. at the moment, there's no way for dagster to know if an asset supports this behavior or not, so that button in the UI is the sort of implicit "all of my assets can be executed like this" button. but in a future world where there's some way (in code) to specify if execution can be grouped or not, I can imagine that something like what you're describing could be possible.
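A rough plain-Python sketch of the distinction above, with no Dagster API involved. `load_rows` is a hypothetical stand-in for one call to an external system (a warehouse query, a Spark job), and the `calls` list just counts how many such operations each approach issues.

```python
from datetime import date, timedelta


def daily_keys(start: date, end: date) -> list[str]:
    """Inclusive list of daily partition keys between start and end."""
    days = (end - start).days
    return [(start + timedelta(days=i)).isoformat() for i in range(days + 1)]


calls: list[str] = []  # records each simulated call to the external system


def load_rows(first_key: str, last_key: str) -> None:
    """Hypothetical stand-in for one query covering first_key..last_key."""
    calls.append(f"{first_key}..{last_key}")


def materialize_grouped(keys: list[str]) -> None:
    # Grouped: the asset body handles the whole range in a single operation.
    load_rows(keys[0], keys[-1])


def materialize_looped(keys: list[str]) -> None:
    # Looped: still one run, but one operation per partition.
    for key in keys:
        load_rows(key, key)


keys = daily_keys(date(2023, 1, 1), date(2023, 1, 3))

materialize_grouped(keys)
grouped_calls = len(calls)

calls.clear()
materialize_looped(keys)
looped_calls = len(calls)
```

The grouped version only works if the asset's own logic can accept a range, which is why the asset code has to opt in; the looped version needs no such support but issues one operation per partition.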
Thanks for your answer @owen. I think it would be most beneficial if it reduced Dagster's loading time, which I guess is what the light blue bars represent; they take up most of the time in my DAG executions above. Would it be possible to compress those times if Dagster knew that it has to run the exact same code for each partition in a single run?