Is there a common machine learning workflow in Dagster (I mean going from training a model to using the model for prediction)? Like a Dagster best practice?
I have a pipeline where I load and transform data and train a model. After that, I would like to start using this pipeline with the trained model in prod. What's the best way to do that? Or shouldn't I use Dagster for pipeline serving at all?
06/08/2020, 3:30 PM
Hey Andrey - are you planning to do batch serving or to serve your model behind a web service? If you're doing batch serving, the best way to accomplish this in Dagster is probably with separate pipelines: one for training the model and one for batch inference.
For the feature engineering steps that need to happen for both training and inference, you can define solids that are used by both pipelines.
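A minimal sketch of that split, written as plain Python functions so it runs without Dagster installed. In a Dagster pipeline each function would be wrapped as a solid. The record fields (`size`, `price`), the least-squares model, the pickle-file handoff and all function names are hypothetical, chosen only for illustration. The point is that `engineer_features` is the single step both pipelines call, and the only thing passed from training to inference is the saved model file.

```python
import os
import pickle
import tempfile
from pathlib import Path


# Shared feature-engineering step (hypothetical fields "size"/"price").
# In Dagster this would be one solid reused by both pipelines, so training
# and inference can never compute features differently.
def engineer_features(rows):
    """Map raw records to (feature, target) pairs; target is None at inference."""
    return [(r["size"] / 100.0, r.get("price")) for r in rows]


# Training-only step: fit y = slope * x + intercept by ordinary least squares.
def fit_linear(features):
    xs = [x for x, _ in features]
    ys = [y for _, y in features]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


# Pipeline 1: features -> train -> persist the model to a file.
def training_pipeline(raw_rows, model_path):
    model = fit_linear(engineer_features(raw_rows))
    Path(model_path).write_bytes(pickle.dumps(model))
    return model


# Pipeline 2: the SAME feature step -> load the persisted model -> predict.
def batch_inference_pipeline(raw_rows, model_path):
    model = pickle.loads(Path(model_path).read_bytes())
    return [model["slope"] * x + model["intercept"]
            for x, _ in engineer_features(raw_rows)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.pkl")
        # Training run: writes the model file.
        training_pipeline([{"size": 0, "price": 1.0},
                           {"size": 100, "price": 3.0}], path)
        # Separate batch inference run: needs only new rows and the model file.
        print(batch_inference_pipeline([{"size": 50}], path))
```

Because the two pipelines share nothing but the feature step and the model file, each can be scheduled or rerun independently.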
06/08/2020, 4:54 PM
So, as I understand it, I should load the model from a file for each batch, right?
I wanted to serve the whole pipeline as a web service to do batch serving.
06/08/2020, 5:09 PM
Yeah - Dagster has no persistent process for storing models across pipeline runs, so you'll need to reload the model for each run.
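To make "reload the model for each run" concrete, here is a sketch in plain Python (not Dagster's API) where every batch run starts by reading the model from disk. The `model_path` config key, the pickle format and the function names are assumptions for illustration; in Dagster the path would likely be supplied through per-run config.

```python
import os
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Persist a model; a training run would do this as its last step."""
    Path(path).write_bytes(pickle.dumps(model))


def load_model_for_run(run_config):
    """Reload the model at the start of every run.

    No process survives between runs, so nothing from an earlier run is in
    memory. `run_config` is a plain dict standing in for per-run config
    (hypothetical shape: {"model_path": "..."}).
    """
    return pickle.loads(Path(run_config["model_path"]).read_bytes())


def run_batch(run_config, batch):
    """One batch inference run: load the model, then score every input."""
    model = load_model_for_run(run_config)
    return [model["slope"] * x + model["intercept"] for x in batch]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        config = {"model_path": os.path.join(tmp, "model.pkl")}
        save_model({"slope": 2.0, "intercept": 1.0}, config["model_path"])
        print(run_batch(config, [0.0, 1.0]))  # scored with the first model
        # Retraining overwrites the file; the next run picks it up
        # without restarting any long-lived service.
        save_model({"slope": 3.0, "intercept": 0.0}, config["model_path"])
        print(run_batch(config, [0.0, 1.0]))
```

A side effect of reloading per run is that a newly trained model is used by the very next batch run, with no service to restart.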
06/09/2020, 6:06 AM
What would you recommend if we want to monitor model creation with a framework such as MLflow? How could Dagster fit into the process/pipeline?
06/09/2020, 3:39 PM
Hi @sephi - if you want to use MLflow to track model creation, I think the easiest approach would probably be to call MLflow's tracking APIs from inside the solid that trains the models. If you're open to a less fully featured but pure-Dagster solution, you could instead yield Dagster Materializations that contain model metrics, which you can then visualize in Dagit.
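One way to call a tracker from inside the training step while keeping that step testable is to pass the logging functions in. `mlflow.log_param(key, value)` and `mlflow.log_metric(key, value)` are real MLflow tracking calls with that shape; to keep this sketch runnable without MLflow installed, it uses a hypothetical in-memory tracker instead. The model, the parameter names and the `train_mse` metric are illustrative only.

```python
from typing import Callable, Dict, List, Tuple

# Same (key, value) shape as mlflow.log_param / mlflow.log_metric.
LogFn = Callable[[str, object], None]


def train_with_tracking(pairs: List[Tuple[float, float]],
                        log_param: LogFn, log_metric: LogFn) -> Dict[str, float]:
    """Fit y = slope * x + intercept and report params/metrics to a tracker."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    n = len(pairs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x

    # What the model was trained with.
    log_param("model_type", "ols_linear")
    log_param("n_samples", n)
    # How well it fits the training data (mean squared error).
    mse = sum((slope * x + intercept - y) ** 2 for x, y in pairs) / n
    log_metric("train_mse", mse)
    return {"slope": slope, "intercept": intercept}


class InMemoryTracker:
    """Hypothetical stand-in tracker so the sketch runs without MLflow."""

    def __init__(self) -> None:
        self.params: Dict[str, object] = {}
        self.metrics: Dict[str, object] = {}

    def log_param(self, key: str, value: object) -> None:
        self.params[key] = value

    def log_metric(self, key: str, value: object) -> None:
        self.metrics[key] = value


if __name__ == "__main__":
    tracker = InMemoryTracker()
    model = train_with_tracking([(0.0, 1.0), (1.0, 3.0)],
                                tracker.log_param, tracker.log_metric)
    print(model, tracker.params, tracker.metrics)
```

With MLflow installed, the same function could presumably be called inside a run as `with mlflow.start_run(): train_with_tracking(pairs, mlflow.log_param, mlflow.log_metric)`. If you prefer the pure-Dagster route, the same metric values could instead be attached to a Materialization yielded from the training solid.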
06/09/2020, 3:58 PM
I think MLflow is currently better suited to recording the various assets/metrics of model creation, so I'll try the former suggestion.