I'd like to write a software-defined asset that compares the latest materialization of asset X with the previous one. I'm not sure if this functionality is supported in Dagster...
I could build this from scratch, but it'd be good to follow best practices for this if they exist.
03/08/2023, 7:36 PM
What sort of comparison are you looking for? From a very basic perspective you could use the metadata features to automatically create graphs of an asset over time (e.g. row count, average of some column, execution time).
03/08/2023, 9:37 PM
What are you looking to do with that comparison as well? Wondering if there’s a better abstraction for what you’re trying to do
03/09/2023, 1:27 PM
Potentially; I have some complex logic to generate the comparison - say you have a machine learning model and you use it to generate some predictions on a test set. Then you have a new materialization of the model and it generates different predictions. I'd like to generate a report (e.g. an HTML file or table) that compares both sets of predictions.
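To make the kind of report described above concrete, here is a minimal sketch in plain Python. It is not a Dagster API: `compare_predictions` and `render_html_report` are hypothetical helper names, and it assumes each set of predictions is a dict keyed by test-example id. How the previous predictions are loaded is left out.

```python
from html import escape


def compare_predictions(previous, current):
    """Return one row per example whose prediction differs between two runs."""
    rows = []
    # Walk the union of ids so examples missing from one run still show up.
    for example_id in sorted(set(previous) | set(current)):
        old = previous.get(example_id)
        new = current.get(example_id)
        if old != new:
            rows.append((example_id, old, new))
    return rows


def render_html_report(rows):
    """Render the changed rows as a small HTML table."""
    header = "<tr><th>example</th><th>previous</th><th>current</th></tr>"
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table>{header}{body}</table>"


# Hypothetical predictions from the previous and the latest model.
previous_predictions = {"a": 0, "b": 1, "c": 1}
current_predictions = {"a": 0, "b": 0, "c": 1, "d": 1}

changed = compare_predictions(previous_predictions, current_predictions)
report = render_html_report(changed)
```

Only the rows that changed end up in the table, which keeps the report short when most predictions are stable.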
03/11/2023, 12:47 AM
IIRC there was some talk of asset validation in Dagster, but I'm not sure of the status of that.
It will depend a little bit on what you want to actually do. You could make an op that runs against an asset, does this sort of comparison, and keeps track of the results.
That being said, it sounds like you may be looking for something along the lines of MLflow or similar, which IIRC Dagster has an integration for, but I'm not sure of the status of it.
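As a rough illustration of the "keeps track of the results" idea, here is a framework-agnostic sketch in plain Python. It assumes the results are JSON-serializable and are kept in a single file; `record_and_get_previous` and the file name are hypothetical, and nothing here is a Dagster or MLflow API.

```python
import json
import tempfile
from pathlib import Path


def record_and_get_previous(store_path, new_result):
    """Save new_result as the latest snapshot and return the one it replaces.

    Returns None on the first call, when no earlier snapshot exists.
    """
    path = Path(store_path)
    previous = json.loads(path.read_text()) if path.exists() else None
    path.write_text(json.dumps(new_result))
    return previous


# Simulate two consecutive runs against the same (temporary) store.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "latest_predictions.json"
    first = record_and_get_previous(store, {"a": 0, "b": 1})
    second = record_and_get_previous(store, {"a": 0, "b": 0})
```

On each run the caller gets back the previous snapshot, which can then be compared with the new one.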
03/11/2023, 12:49 AM
yea I think that what you’re trying to do might be a bit annoying in vanilla dagster - mlflow integration is definitely a thing. We also have a wandb integration now