Nicolas Parot Alvarez

11/24/2022, 3:09 PM
Jenkins has a neat little feature in its cron syntax: the
value, that would be nice to have for Dagster schedule cron definitions.
To allow periodically scheduled tasks to produce even load on the system, the symbol
(for “hash”) should be used wherever possible. For example, using
0 0 * * *
for a dozen daily jobs will cause a large spike at midnight. In contrast, using
H H * * *
would still execute each job once a day, but not all at the same time, better using limited resources.
symbol can be used with a range. For example,
H H(0-7) * * *
means some time between 12:00 AM (midnight) to 7:59 AM. You can also use step intervals with
, with or without ranges.
symbol can be thought of as a random value over a range, but it actually is a hash of the job name, not a random function, so that the value remains stable for any given project.
The idea of using a hash of the job name is very simple and ingenuous, but not necessarily great at spreading the load. In a second stage, maybe Dagster could have something more advanced that would actually look for the times of least activity inside the allowed time range. This ideal schedule time could be set by the first run and kept for subsequent runs, or it could be recomputed every time.
❤️ 1