I'm trying to design an asset that is the inference results dagster #ask-community


Alex Shtuchkin

03/10/2023, 11:32 PM

I'm trying to design an asset that holds the inference results of an ML model for "events". Each event produces one inference result, and events have string ids. We'd also like to experiment with different ML model versions and compare them, so we need the inference results for the same event across multiple versions, built on demand. What would you recommend to represent this structure in dagster? Is it 2-d partitioning? Can we use configuration to distinguish between model versions? (I assume not, since the results would overwrite each other.)

Alex Shtuchkin

03/10/2023, 11:54 PM

Alternatively, should we use the model id as the AssetKey prefix instead? That seems a bit weird. It would be a better fit as a suffix, though I'm not sure how important that is down the road.

sandy

03/11/2023, 5:42 PM

Hey Alex - one approach would be to use dynamic partitions to represent model versions. I would generally recommend against having a partition for every single event record, if you have more than - say - a couple thousand event records. If events have timestamps, you might consider partitions that correspond to time ranges of events.

Alex Shtuchkin

03/11/2023, 5:45 PM

Thanks! Sorry I might've used the wrong word here, each "event" is by itself a set of data points with timestamps, about 10k data points each.

Alex Shtuchkin

03/11/2023, 5:48 PM

Internally each event is a 30-60 min video or audio file. Think of events as meetups or concerts.

Alex Shtuchkin

03/11/2023, 5:48 PM

So 2-d partitions are the best approach then?

sandy

03/11/2023, 8:59 PM

Ah - yes, that sounds like the right approach. 2-d partitions don't yet work with dynamic partitions, but that will change with our release on Thursday of this week.
