# ask-community
m
Hi all, How do you version your assets? I have the following use case:
from dagster import asset

@asset
def training_data():
    ...

@asset
def ml_model(training_data):
    ...

@asset
def model_report(ml_model):
    ...
This works well, but it overwrites the earlier models and model reports. Is there a way to version the assets without creating copies? Something to replace this:
@asset
def model_report_jan(ml_model_jan):
    ...

@asset
def model_report_feb(ml_model_feb):
    ...

@asset
def model_report_mar(ml_model_mar):
    ...
I think what I'm looking for is a way to tell dagster to load not the latest, but a previous materialization of an asset.
I'm thinking maybe a use of SourceAssets is best...
a
"This works well, but it overwrites the earlier models and model reports."
Static (or even dynamic) partitions might help with this. You could define your own io manager to handle partitions to avoid overwriting.
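To make the "avoid overwriting" idea concrete, here is a minimal plain-Python sketch of the path logic a partition-aware IO manager might use. It does not call any Dagster API: `partition_path`, `write_output`, and `read_input` are hypothetical helper names, the `"2023-01"`-style keys are illustrative, and in a real IO manager the partition key would be supplied by the framework rather than passed by hand.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for real persistent storage (assumption: a local temp directory).
ROOT = Path(tempfile.mkdtemp())


def partition_path(asset_name: str, partition_key: str) -> Path:
    """One file per (asset, partition), so partitions never overwrite each other."""
    return ROOT / asset_name / f"{partition_key}.json"


def write_output(asset_name: str, partition_key: str, value) -> None:
    """Persist a value under its asset name and partition key."""
    path = partition_path(asset_name, partition_key)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(value))


def read_input(asset_name: str, partition_key: str):
    """Load the value stored for one specific partition."""
    return json.loads(partition_path(asset_name, partition_key).read_text())


# Two monthly partitions of the same asset live side by side.
write_output("ml_model", "2023-01", {"trained_on": "2023-01"})
write_output("ml_model", "2023-02", {"trained_on": "2023-02"})

jan_model = read_input("ml_model", "2023-01")
feb_model = read_input("ml_model", "2023-02")
```

Because the partition key is part of the path, materializing February leaves the January model untouched, and a downstream step can ask for either one by key.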
m
I have an io-manager that writes two versions of each asset to persistent storage: {asset_name} and {asset_name.datetime.runid}. As such, I always have a copy of every asset ever materialized. As I reflect on the question, it's mainly about how I can have downstream assets read a materialization that is not the most recent.
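A rough sketch of that dual-write layout, plus the missing piece (loading a specific older version), could look like the following. This is plain Python under stated assumptions, not the poster's actual IO manager: `VersionedStore` and its methods are hypothetical names, pickle files in a temp directory stand in for persistent storage, and the version string format is only one way to combine a timestamp and a run ID.

```python
import pickle
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


class VersionedStore:
    """Writes each object twice: a 'latest' copy and an immutable versioned copy."""

    def __init__(self, root: Path):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, asset_name: str, obj, run_id: str) -> str:
        """Store obj and return the version string identifying this write."""
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
        version = f"{stamp}.{run_id}"
        versioned_path = self.root / f"{asset_name}.{version}"
        with open(versioned_path, "wb") as f:
            pickle.dump(obj, f)
        # The unversioned file always mirrors the most recent write.
        shutil.copyfile(versioned_path, self.root / asset_name)
        return version

    def load(self, asset_name: str, version: Optional[str] = None):
        """Load the latest copy, or a specific version if one is given."""
        name = asset_name if version is None else f"{asset_name}.{version}"
        with open(self.root / name, "rb") as f:
            return pickle.load(f)


store = VersionedStore(Path(tempfile.mkdtemp()))
jan = store.save("ml_model", {"trained_on": "jan"}, run_id="run1")
feb = store.save("ml_model", {"trained_on": "feb"}, run_id="run2")

latest = store.load("ml_model")                     # the February model
rolled_back = store.load("ml_model", version=jan)   # the January model
```

The open question in the thread is then how the version argument reaches the loader, for example through configuration or a partition key.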
Thinking that SourceAssets might help... but not 100% sure
Something like
jan_model = SourceAsset(...) # Hard coded?

@asset
def model_report_jan(jan_model):
    ...
v
This use case seems like a good fit for partitions; even a “simple” MonthlyPartitionsDefinition looks like it would solve it. If you need heavier partitioning logic, as @Andras Somi said, I’d go for dynamic partitions. See here.
m
Thank you
v
Alternatively check this github discussion that explains the philosophy behind using partitions https://github.com/dagster-io/dagster/discussions/12061
m
That looks interesting, I'll check it out. Yet another way to phrase my ask: what if I want to re-run part of my DAG using an old materialization of an asset, as a form of rollback?
r
I've been thinking about this as well, and I suspect versioning is something an artifact management system such as wandb/mlflow/dvc should do.
Since an IO manager has access to run IDs, step IDs, etc., you could register an artifact from it.
I've been thinking of partitions more along the lines of inference on new data...
m
The trigger for this today was:
Hmm... the new model is predicting strange things on new data, I want to apply the old model to the new data to compare
I have an asset
@asset
def y_pred(model, X):
    ...
and I wanted to run it with a previous materialization of the model (which I have saved somewhere)
a
You could probably pass the model version as a config param and load the appropriate model inside the asset function?
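One way to picture this suggestion is the plain-Python sketch below, where a small config object selects which model version the prediction step loads. Everything here is an assumption for illustration: `PredictConfig`, `load_model`, `MODEL_REGISTRY`, and the toy "models" (simple functions) are hypothetical, and in Dagster the config would be supplied through the framework's run config rather than constructed by hand.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical in-memory registry; in practice this would read saved models from storage.
MODEL_REGISTRY: Dict[str, Callable[[float], float]] = {
    "v1": lambda x: 2 * x,       # the "old" model
    "v2": lambda x: 2 * x + 1,   # the "new" model
}
LATEST_VERSION = "v2"


@dataclass
class PredictConfig:
    # None means "use the latest model".
    model_version: Optional[str] = None


def load_model(version: Optional[str]) -> Callable[[float], float]:
    """Return the requested model, falling back to the latest version."""
    return MODEL_REGISTRY[version or LATEST_VERSION]


def y_pred(config: PredictConfig, X: List[float]) -> List[float]:
    """Predict with whichever model version the config selects."""
    model = load_model(config.model_version)
    return [model(x) for x in X]


X_new = [1.0, 2.0, 3.0]
new_preds = y_pred(PredictConfig(), X_new)                    # latest model
old_preds = y_pred(PredictConfig(model_version="v1"), X_new)  # pinned old model
```

Running both calls on the same `X_new` gives the side-by-side comparison described above, without defining a separate asset per model version.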
r
Ideally, if one could call the Dagster API from the model management UI/CLI, one could probably trigger a sensor that runs inference with the older model...?
m
Interesting idea
r
I looked but could not find any hooks in mlflow for this (for example). My guess is that the most general approach would be to set up a dagster sensor that checks whether a particular mlflow model has been promoted to a "production" stage or tagged specifically (using the mlflow api), and then runs that model after fetching it (this part is easy with the mlflow api).
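The polling logic behind such a sensor might be sketched as follows. This deliberately calls neither the Dagster nor the MLflow API: `check_for_promotion` is a hypothetical helper, `fetch_production_version` stands in for whatever registry lookup is used, and the `cursor` simply remembers the last version already handled so that the same promotion does not trigger twice.

```python
from typing import Callable, List, Optional, Tuple


def check_for_promotion(
    fetch_production_version: Callable[[], Optional[str]],
    cursor: Optional[str],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (version_to_run, new_cursor).

    version_to_run is None when nothing new has been promoted since the last check.
    """
    current = fetch_production_version()
    if current is None or current == cursor:
        return None, cursor
    return current, current


# Simulate three polling ticks against a fake registry (assumed version labels).
registry_states = iter(["3", "3", "4"])
cursor: Optional[str] = None
triggered: List[str] = []
for _ in range(3):
    version, cursor = check_for_promotion(lambda: next(registry_states), cursor)
    if version is not None:
        # A real sensor would request an inference run for this version here.
        triggered.append(version)
```

In this simulation the second tick is skipped because version "3" was already handled, so only versions "3" and "4" trigger a run.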
m
I'm updating the model via Dagster itself, so that's not a big issue in and of itself.
c
I built the IO Manager for W&B that got released recently. I would love to hear your thoughts if you ever use it 🙂