I'm trying to design an asset that holds the inference results of an ML model for "events". Each event produces one inference result. Events have string ids. We'd also like to experiment with different ML model versions and compare them, so we need the inference results for the same event across multiple versions, built on demand. What would you recommend to represent this structure in Dagster? Is it 2-d partitioning? Can we use configuration to distinguish between model versions (I assume we can't, as the results would overwrite each other)?
Alternatively, should we use the model id as the AssetKey prefix instead? That seems a bit weird. It would be a better fit as a suffix; not sure how important that is down the road.
03/11/2023, 5:42 PM
Hey Alex - one approach would be to use dynamic partitions to represent model versions. I would generally recommend against having a partition for every single event record if you have more than, say, a couple thousand event records. If events have timestamps, you might consider partitions that correspond to time ranges of events.
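To make the time-range idea concrete, here is a rough plain-Python sketch (it does not use the Dagster API). The event ids, timestamps, and helper names are all made up for illustration, and daily buckets are just one possible range size.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event records: (event_id, start time). Not from this thread.
events = [
    ("evt-a", datetime(2023, 3, 10, 9, 30)),
    ("evt-b", datetime(2023, 3, 10, 18, 0)),
    ("evt-c", datetime(2023, 3, 11, 12, 15)),
]


def daily_partition_key(ts: datetime) -> str:
    """Map a timestamp to the key of the day-sized partition it falls in."""
    return ts.strftime("%Y-%m-%d")


def group_by_partition(records):
    """Bucket event ids by daily partition key, preserving input order."""
    buckets = defaultdict(list)
    for event_id, ts in records:
        buckets[daily_partition_key(ts)].append(event_id)
    return dict(buckets)


# One partition per day instead of one partition per event.
partitions = group_by_partition(events)
```

The number of partitions then grows with the number of days covered rather than with the number of events.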
03/11/2023, 5:45 PM
Thanks! Sorry, I might've used the wrong word here: each "event" is itself a set of data points with timestamps, about 10k data points each.
Internally each event is a 30-60 min video or audio file. Think of events as meetups or concerts.
So 2-d partitions are the best approach then?
03/11/2023, 8:59 PM
Ah - yes, that sounds like the right approach. 2-d partitions don't yet work with dynamic partitions, but that will change with our release on Thursday of this week.
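For anyone reading along, here is a rough plain-Python sketch of why two dimensions avoid the overwrite problem raised in the question. It deliberately does not use the Dagster API; in Dagster the two axes would presumably become the dimensions of a `MultiPartitionsDefinition`. The in-memory store, `run_inference`, and the ids below are hypothetical stand-ins.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical in-memory store standing in for wherever results are persisted.
ResultKey = Tuple[str, str]  # (event_id, model_version)
results: Dict[ResultKey, str] = {}


def run_inference(event_id: str, model_version: str) -> str:
    """Placeholder for the real model call; returns a fake result string."""
    return f"result[{model_version}] for {event_id}"


def materialize(event_id: str, model_version: str) -> None:
    """Build one cell of the (event, model version) grid on demand."""
    results[(event_id, model_version)] = run_inference(event_id, model_version)


def compare(event_id: str, versions: Iterable[str]) -> Dict[str, str]:
    """Collect the results for one event across several model versions."""
    return {v: results[(event_id, v)] for v in versions}


materialize("concert-42", "v1")
materialize("concert-42", "v2")  # same event, new version: a separate cell

comparison = compare("concert-42", ["v1", "v2"])
```

Because the model version is part of the key, building `v2` leaves the `v1` result in place. Keying only on the event id and passing the version as configuration would have replaced it.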