Hello Dagsters,
I’m very new to Dagster, the tool seems amazing and I’m looking forward to using it.
I want to use Dagster in a Machine Learning setup in a Kubernetes cluster.
I want to be able to train a model or predict using an existing model. The two tasks would coexist in a single pipeline so they can reuse some solids (like input preprocessing) and give the same visibility over training and serving.
That seems pretty easy to do.
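To make the idea concrete, here is a minimal plain-Python sketch of that structure: one shared preprocessing step feeding either a training branch or a prediction branch. All names are illustrative (in Dagster, each function would become a solid and the mode would come from run config); the toy "model" is just a mean, not a real ML model.

```python
# Hypothetical sketch: a single pipeline whose preprocessing step is
# shared by a training branch and a serving branch.

def preprocess(raw):
    """Shared input preprocessing, reused by both branches."""
    return [float(x) for x in raw]

def train(features):
    """Training branch: fit a toy model (here: just the mean)."""
    return {"mean": sum(features) / len(features)}

def predict(model, features):
    """Serving branch: score new inputs with an existing model."""
    return [x - model["mean"] for x in features]

def run(mode, raw, model=None):
    """Select the branch; preprocessing happens in both modes."""
    features = preprocess(raw)
    if mode == "train":
        return train(features)
    return predict(model, features)
```

Usage would be `model = run("train", raw_training_data)` once, then `run("predict", raw_request_data, model)` per request.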
What I would like on the serving side is something close to a serverless approach. When storing a model, I'd also store the Docker image that produced it (i.e. storing the whole Python environment and pipeline definition, possibly as a Dagster repo).
Ideally, serving the model would go like this:
• receive a prediction request in a Python script
• launch the Docker Image / Dagster repo
• execute the predict part of the pipeline and get the result
• stop the Docker Image / Dagster repo
• return the prediction as the response to the original request
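The cycle above could be sketched like this with `docker run --rm`, which stops and removes the container as soon as the predict step exits. The image name, the `predict` entrypoint argument, and the JSON-over-stdout contract are all assumptions for illustration; on Kubernetes, you'd more likely launch a Job/Pod through the Kubernetes API than shell out to Docker.

```python
import json
import subprocess

def build_docker_cmd(image, payload):
    """Build a one-shot `docker run` command for a prediction.

    --rm removes the container once it exits, so the environment only
    lives for the duration of the request (serverless-style).
    """
    return [
        "docker", "run", "--rm", image,
        "predict",            # assumed entrypoint argument for the image
        json.dumps(payload),  # request forwarded as a JSON string
    ]

def handle_request(image, payload):
    """Receive a request, run the containerized predict step, return the result."""
    cmd = build_docker_cmd(image, payload)
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # Assumes the container prints its prediction as JSON on stdout.
    return json.loads(out.stdout)
```

The design choice here is that the container is the unit of deployment: model, environment, and pipeline definition travel together in the image, and nothing stays resident between requests.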
Is it possible to dynamically launch and stop Dagster repos?
Should I just leave a number of repos running (I'm guessing that doesn't scale easily)?
Is Dagster the right tool for that or should I look at alternatives like Kubeflow?