Hello Dagsters,
I’m very new to Dagster, the tool seems amazing and I’m looking forward to using it.
I want to use Dagster in a Machine Learning setup in a Kubernetes cluster.
I want to be able to train a model or predict using an existing model. The two tasks would coexist in a single pipeline so they can reuse some solids (like input preprocessing) and give the same visibility over training and serving.
That seems pretty easy to do.
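To make the idea concrete, here is a minimal plain-Python sketch of that structure: one shared preprocessing step feeding either a training branch or a prediction branch. All names are illustrative (in Dagster, each function would become a solid and the mode would come from run config); the toy "model" is just a mean, not a real ML model.

```python
# Hypothetical sketch: a single pipeline whose preprocessing step is
# shared by a training branch and a serving branch.

def preprocess(raw):
    """Shared input preprocessing, reused by both branches."""
    return [float(x) for x in raw]

def train(features):
    """Training branch: fit a toy model (here: just the mean)."""
    return {"mean": sum(features) / len(features)}

def predict(model, features):
    """Serving branch: score new inputs with an existing model."""
    return [x - model["mean"] for x in features]

def run(mode, raw, model=None):
    """Select the branch; preprocessing happens in both modes."""
    features = preprocess(raw)
    if mode == "train":
        return train(features)
    return predict(model, features)
```

Usage would be `model = run("train", raw_training_data)` once, then `run("predict", raw_request_data, model)` per request.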
What I would like on the serving side is something close to a serverless approach. When storing a model, I'd also store the Docker image that produced it (i.e. storing the whole Python environment and pipeline definition, possibly as a Dagster repo).
Ideally, serving the model would go like this:
• receive a prediction request in a Python script
• launch the Docker Image / Dagster repo
• execute the predict part of the pipeline and get the result
• stop the Docker Image / Dagster repo
• return the prediction as the response to the original request
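The cycle above could be sketched like this with `docker run --rm`, which stops and removes the container as soon as the predict step exits. The image name, the `predict` entrypoint argument, and the JSON-over-stdout contract are all assumptions for illustration; on Kubernetes, you'd more likely launch a Job/Pod through the Kubernetes API than shell out to Docker.

```python
import json
import subprocess

def build_docker_cmd(image, payload):
    """Build a one-shot `docker run` command for a prediction.

    --rm removes the container once it exits, so the environment only
    lives for the duration of the request (serverless-style).
    """
    return [
        "docker", "run", "--rm", image,
        "predict",            # assumed entrypoint argument for the image
        json.dumps(payload),  # request forwarded as a JSON string
    ]

def handle_request(image, payload):
    """Receive a request, run the containerized predict step, return the result."""
    cmd = build_docker_cmd(image, payload)
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # Assumes the container prints its prediction as JSON on stdout.
    return json.loads(out.stdout)
```

The design choice here is that the container is the unit of deployment: model, environment, and pipeline definition travel together in the image, and nothing stays resident between requests.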
Is it possible to dynamically launch and stop Dagster repos?
Should I just leave a number of repos running (I'm guessing that doesn't scale easily)?
Is Dagster the right tool for that or should I look at alternatives like Kubeflow?