Hi, guys!
Is there a common machine learning workflow in Dagster (I mean going from training a model to using the model for prediction)? Like a Dagster best practice?
I have a pipeline where I load and transform data and train a model. After that I would like to start using this pipeline with the trained model in prod. What's the best way to do that? Or shouldn't I use Dagster for pipeline serving at all?
sandy
06/08/2020, 3:30 PM
Hey Andrey - are you planning to do batch serving or to serve your model behind a web service? If you're doing batch serving, the best way to accomplish this in Dagster is probably with separate pipelines: one for training the model and one for batch inference
sandy
06/08/2020, 3:31 PM
For the feature engineering steps that need to happen for both training and inference, you can define solids that get used by both pipelines
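A minimal sketch of that two-pipeline layout. In Dagster (circa 0.8) each function below would be wrapped with `@solid` and the two pipelines would be `@pipeline` definitions; plain functions are used here so the sketch is self-contained, and the model, features, and numbers are all hypothetical.

```python
# Shared feature engineering reused by a training pipeline and a
# batch-inference pipeline, so train/serve skew is avoided.
# (In Dagster ~0.8 these would be @solid functions composed by @pipeline.)

def featurize(raw_rows):
    # Same transformation at training and inference time.
    return [[x * 2.0] for x in raw_rows]

def train_model(features, labels):
    # Hypothetical "model": predicts the mean label regardless of input.
    return {"mean": sum(labels) / len(labels)}

def predict(model, features):
    return [model["mean"] for _ in features]

def train_pipeline(raw_rows, labels):
    # Pipeline 1: featurize, then train.
    return train_model(featurize(raw_rows), labels)

def batch_inference_pipeline(model, raw_rows):
    # Pipeline 2: featurize with the *same* solid, then predict.
    return predict(model, featurize(raw_rows))

model = train_pipeline([1.0, 2.0, 3.0], labels=[10.0, 20.0, 30.0])
preds = batch_inference_pipeline(model, [4.0, 5.0])
```

Because `featurize` is a single shared definition, any change to the features automatically applies to both pipelines.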
Andrey Alekseev
06/08/2020, 4:54 PM
And I should load the model from a file for each batch, as I understand it, right?
I wanted to serve the whole pipeline as a web service to do batch serving.
sandy
06/08/2020, 5:09 PM
Yeah - Dagster has no persistent process to store models across pipeline runs, so you'll need to reload the model for each run
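Concretely, the first solid of the batch-inference pipeline can reload the trained model from storage at the start of every run. The path and the pickle format below are assumptions for the sketch; any model store (object storage, a registry, etc.) works the same way.

```python
import os
import pickle
import tempfile

# Dagster keeps no persistent process between runs, so the inference
# pipeline reloads the model from disk on every run.

def save_model(model, path):
    # Run once at the end of the training pipeline.
    with open(path, "wb") as f:
        pickle.dump(model, f)

def load_model(path):
    # Run at the start of each batch-inference run.
    with open(path, "rb") as f:
        return pickle.load(f)

model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
save_model({"coef": 1.5}, model_path)   # hypothetical trained model
reloaded = load_model(model_path)       # what each inference run starts with
```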
sephi
06/09/2020, 6:06 AM
Hi @sandy
What would your recommendation be if we wanted to monitor the model creation with a framework such as MLflow? How could Dagster fit into the process/pipeline?
sandy
06/09/2020, 3:39 PM
Hi @sephi - if you want to use MLflow to track model creation, I think the easiest approach would probably be to call MLflow's tracking APIs from inside the solid that trains the models. If you're open to a less fully-featured but pure-Dagster solution, you could alternatively yield Dagster Materializations that contain model metrics, which you can then visualize in Dagit.
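A sketch of the pure-Dagster option's control flow. In Dagster (circa 0.8) a solid can `yield` a `Materialization` event carrying metric metadata before yielding its `Output`, and Dagit renders those metrics per run. Dagster itself isn't imported here, so the two event types are stood in by tagged tuples, and the metric value and model are hypothetical.

```python
# Yield-events pattern: emit a metrics event, then the solid's real output.
# (With Dagster installed these would be Materialization and Output objects.)

def train_solid(features, labels):
    model = {"weights": [0.1, 0.2]}  # hypothetical trained model
    accuracy = 0.93                  # hypothetical metric from an eval step
    # Metrics event first, so the run's metadata is recorded even if a
    # downstream consumer only cares about the output.
    yield ("materialization", {"accuracy": accuracy})
    yield ("output", model)

events = list(train_solid(None, None))
metrics = [payload for kind, payload in events if kind == "materialization"]
```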
sephi
06/09/2020, 3:58 PM
interesting ...
I think MLflow is currently better suited to recording the various assets/metrics of model creation - so I'll try to work with the former suggestion.