Additionally is there a convenient way to change the latency dagster #ask-community

Félix Tremblay

04/21/2023, 1:14 PM

Additionally, is there a convenient way to change the latency of time-based partitions, after a lot of partitions have already been materialized? For example, if an asset partitioned by day has been materialized for more than a year, and then your SLA requirements change and you want to change the partition definition to 1h, while keeping the same asset definition, the same storage location, and without starting over the materializations? Thanks!

owen

04/21/2023, 8:48 PM

hi @Félix Tremblay! unfortunately, this is not possible at the moment, and the general advice (although not ideal) is to write a one-off job to emit a bunch of AssetMaterialization events representing the partitions that have already been filled in. definitely something that we hope to have a better story for in the future
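A minimal sketch of the key-mapping half of such a one-off job, in plain Python: it expands each already-materialized daily partition key into the 24 hourly keys it covers. The key formats (`%Y-%m-%d` for daily, `%Y-%m-%d-%H:%M` for hourly) are assumed to match the partition definitions in use, the helper names are hypothetical, and time zones and DST shifts are ignored. Inside a Dagster op, each resulting key would then be reported with an `AssetMaterialization` event whose `partition` is set to that key.

```python
from datetime import datetime, timedelta
from typing import Iterable, List

# Assumed key formats; adjust to match the actual partition definitions.
DAILY_FMT = "%Y-%m-%d"
HOURLY_FMT = "%Y-%m-%d-%H:%M"


def hourly_keys_for_day(daily_key: str) -> List[str]:
    """Return the 24 hourly partition keys covered by one daily key.

    Naive datetimes are used, so DST transitions are not handled.
    """
    day_start = datetime.strptime(daily_key, DAILY_FMT)
    return [
        (day_start + timedelta(hours=h)).strftime(HOURLY_FMT)
        for h in range(24)
    ]


def hourly_keys_for_days(daily_keys: Iterable[str]) -> List[str]:
    """Expand every materialized daily key into its hourly keys, in order."""
    keys: List[str] = []
    for daily_key in daily_keys:
        keys.extend(hourly_keys_for_day(daily_key))
    return keys


if __name__ == "__main__":
    # Hypothetical input: the daily partitions that were already materialized.
    for key in hourly_keys_for_days(["2023-04-20", "2023-04-21"]):
        # In the one-off job, this is where an AssetMaterialization event
        # for the asset would be emitted with partition=key.
        print(key)
```

The list of daily keys is hard-coded here for illustration; in practice it would come from whatever record exists of the partitions that were actually filled in.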

Philippe Laflamme

04/22/2023, 8:41 PM

Is there a github issue / discussion we can subscribe to for following this topic? I have the same problem: hourly assets become pretty painful to work with after multiple years...

Félix Tremblay

04/22/2023, 9:04 PM

Hello Philippe, I'm curious to know more about what becomes painful about hourly assets after multiple years. Is it because the amount of partitions becomes very large? (In this case maybe some UI improvement could help?) Or is it that you are stuck with hourly partitions and would like to change the granularity?

Félix Tremblay

04/22/2023, 9:24 PM

@owen I think Dagster could provide an even more useful and ergonomic pattern for general time-based incremental loading/processing use cases than the current time-based partitions (e.g., Hourly, Daily) strategy. You could have a flexible, dynamic solution for time-based incremental loading/processing. The asset context would provide a time range (start and end) rather than a single partition key. The "frequency" of the partitions would be flexible and could be changed at any time (as easily as we can change the frequency of schedules and sensors).

Philippe Laflamme

04/22/2023, 10:03 PM

Things are better now to be honest, but I've had issues with performance of the interface with an hourly asset. Perhaps ui improvements can help, but regardless, it's a common pattern to "roll up" assets into a less granular partitioning scheme for reducing costs or improving analysis performance or similar...

owen

04/24/2023, 9:48 PM

@Félix Tremblay totally agree -- @claire and I were just talking about this! there are definitely plenty of challenges involved in the implementation (as this differs fairly significantly from existing partition definitions that we support), but as an end state it's very compelling. we're going to be taking a deeper look into how feasible this sort of thing would be

Félix Tremblay

04/25/2023, 3:10 AM

Thank you @owen! For now, a workaround would be to use Dynamic Partitions and stuff the range info (start, end, include_start, include_end) into the partition_key. A sensor and its cursor would manage creating the RunRequests. I don't particularly like the idea of overloading the partition_key to hold multiple values. I created a GitHub discussion about this.
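A rough sketch of that workaround in plain Python, with no Dagster calls: it packs a time range into a single string that could serve as a dynamic partition key, unpacks it again, and computes the ranges a sensor would still need to request given its cursor. The `TimeRange` type, the `_` separator, the key layout, and the function names are all assumptions made for illustration. Whether a particular key string is acceptable to Dagster should be checked against its partition key rules.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple

SEP = "_"  # assumed separator; it never appears in ISO 8601 timestamps


class TimeRange(NamedTuple):
    start: datetime
    end: datetime
    include_start: bool = True
    include_end: bool = False


def encode_key(r: TimeRange) -> str:
    """Pack a time range into one partition-key string."""
    return SEP.join([
        r.start.isoformat(),
        r.end.isoformat(),
        str(int(r.include_start)),
        str(int(r.include_end)),
    ])


def decode_key(key: str) -> TimeRange:
    """Recover the time range from a key produced by encode_key."""
    start, end, inc_start, inc_end = key.split(SEP)
    return TimeRange(
        datetime.fromisoformat(start),
        datetime.fromisoformat(end),
        inc_start == "1",
        inc_end == "1",
    )


def next_ranges(cursor: datetime, now: datetime, step: timedelta) -> List[TimeRange]:
    """List the complete ranges of length `step` between cursor and now.

    A sensor would request one run per range and then advance its cursor
    to the end of the last range returned.
    """
    ranges: List[TimeRange] = []
    start = cursor
    while start + step <= now:
        ranges.append(TimeRange(start, start + step))
        start += step
    return ranges


if __name__ == "__main__":
    pending = next_ranges(
        cursor=datetime(2023, 4, 25, 0),
        now=datetime(2023, 4, 25, 2, 30),
        step=timedelta(hours=1),
    )
    for r in pending:
        print(encode_key(r))
```

Because `step` is only an argument to `next_ranges`, the granularity can change between sensor ticks without touching keys that were already created, which is the flexibility the thread is after. The cost is the one noted above: the key is opaque to Dagster and has to be parsed inside the asset.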
