Hi, guys!
Is there a common machine learning workflow in Dagster (I mean going from training a model to using the model for prediction)? Like a Dagster best practice?
I have a pipeline where I load and transform data and train a model. After that I would like to start using this pipeline with the trained model in prod. What's the best way to do that? Or shouldn't I use Dagster for pipeline serving at all?
sandy
06/08/2020, 3:30 PM
Hey Andrey - are you planning to do batch serving or to serve your model behind a web service? If you're doing batch serving, the best way to accomplish this in Dagster is probably with separate pipelines: one for training the model and one for batch inference
sandy
06/08/2020, 3:31 PM
For the feature engineering steps that need to happen for both training and inference, you can define solids that get used by both pipelines
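A minimal sketch of that two-pipeline layout. In Dagster (circa 0.8) each function below would be wrapped with `@solid` and the two pipelines would be `@pipeline` definitions; plain functions are used here so the sketch is self-contained, and the model, features, and numbers are all hypothetical.

```python
# Shared feature engineering reused by a training pipeline and a
# batch-inference pipeline, so train/serve skew is avoided.
# (In Dagster ~0.8 these would be @solid functions composed by @pipeline.)

def featurize(raw_rows):
    # Same transformation at training and inference time.
    return [[x * 2.0] for x in raw_rows]

def train_model(features, labels):
    # Hypothetical "model": predicts the mean label regardless of input.
    return {"mean": sum(labels) / len(labels)}

def predict(model, features):
    return [model["mean"] for _ in features]

def train_pipeline(raw_rows, labels):
    # Pipeline 1: featurize, then train.
    return train_model(featurize(raw_rows), labels)

def batch_inference_pipeline(model, raw_rows):
    # Pipeline 2: featurize with the *same* solid, then predict.
    return predict(model, featurize(raw_rows))

model = train_pipeline([1.0, 2.0, 3.0], labels=[10.0, 20.0, 30.0])
preds = batch_inference_pipeline(model, [4.0, 5.0])
```

Because `featurize` is a single shared definition, any change to the features automatically applies to both pipelines.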
Andrey Alekseev
06/08/2020, 4:54 PM
And I should load the model from a file for each batch, as I understand it, right?
I wanted to serve the whole pipeline as a web service to do batch serving.
sandy
06/08/2020, 5:09 PM
Yeah - Dagster has no persistent process to store models across pipeline runs, so you'll need to reload the model for each run
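Concretely, the first solid of the batch-inference pipeline can reload the trained model from storage at the start of every run. The path and the pickle format below are assumptions for the sketch; any model store (object storage, a registry, etc.) works the same way.

```python
import os
import pickle
import tempfile

# Dagster keeps no persistent process between runs, so the inference
# pipeline reloads the model from disk on every run.

def save_model(model, path):
    # Run once at the end of the training pipeline.
    with open(path, "wb") as f:
        pickle.dump(model, f)

def load_model(path):
    # Run at the start of each batch-inference run.
    with open(path, "rb") as f:
        return pickle.load(f)

model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
save_model({"coef": 1.5}, model_path)   # hypothetical trained model
reloaded = load_model(model_path)       # what each inference run starts with
```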
sephi
06/09/2020, 6:06 AM
Hi @sandy
What would your recommendation be if we wanted to monitor the model creation with a framework such as MLflow? How could Dagster fit into the process/pipeline?
sandy
06/09/2020, 3:39 PM
Hi @sephi - if you want to use MLflow to track model creation, I think the easiest approach would probably be to call MLflow's tracking APIs from inside the solid that trains the models. If you're open to a less fully-featured but pure-Dagster solution, you could alternatively yield Dagster Materializations that contain model metrics, which you can then visualize in Dagit.
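A sketch of the pure-Dagster option's control flow. In Dagster (circa 0.8) a solid can `yield` a `Materialization` event carrying metric metadata before yielding its `Output`, and Dagit renders those metrics per run. Dagster itself isn't imported here, so the two event types are stood in by tagged tuples, and the metric value and model are hypothetical.

```python
# Yield-events pattern: emit a metrics event, then the solid's real output.
# (With Dagster installed these would be Materialization and Output objects.)

def train_solid(features, labels):
    model = {"weights": [0.1, 0.2]}  # hypothetical trained model
    accuracy = 0.93                  # hypothetical metric from an eval step
    # Metrics event first, so the run's metadata is recorded even if a
    # downstream consumer only cares about the output.
    yield ("materialization", {"accuracy": accuracy})
    yield ("output", model)

events = list(train_solid(None, None))
metrics = [payload for kind, payload in events if kind == "materialization"]
```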
sephi
06/09/2020, 3:58 PM
interesting ...
I think MLflow is currently better suited to recording the various assets/metrics of model creation - so I'll try to work with the former suggestion.