Benjamin Weise

04/17/2023, 12:56 AM
Hi all, I'm really interested in the work that you've done on versioning. For my use case, I have a slowly changing asset: we receive updates once per day, but the asset itself might only change once per month. For most downstream calculations we use the latest version, but for historical analysis we might need to materialize older asset versions. Would Dagster versions be useful for this? I assume that currently only the current version is stored - is there anything on the roadmap to add more functionality to versioning? Or would I be better off using partitions for this?

Tim Castillo

04/17/2023, 3:37 PM
Hmmm, I can't think of a pattern that would enable this. Personally, I wouldn't use a partition for this either. When are these historical analyses done? Is it an ad hoc thing or a regular process you'll run as an audit, e.g. once a quarter? Where are you storing your asset? Most storage systems have some form of time travel, and I would lean on that for something like this.
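
For readers unfamiliar with the term, "time travel" here means asking the storage layer for the data as it was at some earlier point. A minimal sketch of that idea in plain Python, assuming the snapshots are kept in a date-sorted list (the `version_as_of` helper and the sample data are hypothetical, and this is not a Dagster or storage-system API):

```python
from bisect import bisect_right
from datetime import date
from typing import List, Optional, Tuple


def version_as_of(
    snapshots: List[Tuple[date, str]], as_of: date
) -> Optional[str]:
    """Return the payload of the latest snapshot effective on or before `as_of`.

    `snapshots` must be sorted by effective date (ascending).
    Returns None if `as_of` is earlier than the first snapshot.
    """
    dates = [effective for effective, _ in snapshots]
    # Index of the first snapshot strictly after `as_of`.
    idx = bisect_right(dates, as_of)
    if idx == 0:
        return None
    return snapshots[idx - 1][1]


# Hypothetical history: the asset changed roughly once per month.
snapshots = [
    (date(2023, 1, 5), "v1"),
    (date(2023, 2, 9), "v2"),
    (date(2023, 3, 14), "v3"),
]

latest = version_as_of(snapshots, date.today())
historical = version_as_of(snapshots, date(2023, 2, 20))
```

Storage systems that support time travel expose the same kind of "as of" lookup natively, so a helper like this would only be needed where the storage layer has no such feature.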

Benjamin Weise

04/18/2023, 1:39 AM
Thanks Tim, that's very helpful to think about. To answer your questions: currently I'm using the standard File System IOManager, and for these assets we are reloading from external sources that don't have version management. We could look at doing something here to track versioning, but haven't thought deeply about it. The historical analyses are generally just run as Python scripts or notebooks on an ad hoc basis. This is a pretty immature process at the moment, so we are looking at how much of it could be integrated into Dagster.
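
One way to track versions on a plain file system when the source has no version management is to store each daily load under a hash of its content, so unchanged loads create no new copy. A rough sketch under those assumptions follows; the `store_if_changed` and `load_version` helpers, the directory layout, and the file names are all hypothetical, and this is plain Python rather than anything built into Dagster or its IO managers:

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Tuple


def store_if_changed(data: bytes, root: Path) -> Tuple[str, bool]:
    """Store `data` under its content hash; return (version, is_new).

    A new file is written only when this exact content has not been seen
    before, so a daily load that did not change adds nothing to disk.
    """
    version = hashlib.sha256(data).hexdigest()[:12]
    root.mkdir(parents=True, exist_ok=True)
    target = root / f"{version}.bin"
    is_new = not target.exists()
    if is_new:
        target.write_bytes(data)
        # Append-only log of when each distinct version was first seen.
        entry = {
            "version": version,
            "first_seen": datetime.now(timezone.utc).isoformat(),
        }
        with (root / "history.jsonl").open("a") as log:
            log.write(json.dumps(entry) + "\n")
    # Always record which version is current, even if content reverted.
    (root / "latest.txt").write_text(version)
    return version, is_new


def load_version(version: str, root: Path) -> bytes:
    """Read back a previously stored version by its hash."""
    return (root / f"{version}.bin").read_bytes()
```

Downstream calculations would read the version named in `latest.txt`, while an ad hoc notebook could pick an older hash from `history.jsonl` and pass it to `load_version`.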