# announcements
t
Fun weekend with dagster and mlflow. I am trying to integrate mlflow with dagster pipelines, and I will publish an architecture of it. I am mixing:
• mlflow + minio (S3) + postgres for registering the model
• dagster with a daemon launching a specific mlflow image, plus a scheduler and amazing sensors (simple, but the idea is there).
If you have any improvements, tell me 🙂 https://github.com/slamer59/dagster-mlflow.git
💯 7
What I miss the most: I need to rebuild my mlflow image each time I change or add something in repo.py. I want to use a volume so I don't have to rebuild the image (because of the COPY statement in the Dockerfile). But I still don't know how to have just one image into which I can load any repo.py. Ongoing work 🙂
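One way to picture the "one image, any repo.py" idea: the image ships only a small loader, and repo.py is read at container start from a path that a volume mount provides. The following is a minimal stdlib-only sketch under that assumption; `load_repo_module` and the module name `user_repo` are hypothetical and are not part of dagster or of the linked repository.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType


def load_repo_module(path: str, module_name: str = "user_repo") -> ModuleType:
    """Import a Python file from an arbitrary path at runtime.

    Hypothetical helper: the file is expected to arrive through a volume
    mount instead of being baked into the image with COPY.
    """
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(f"No repository file at {file_path}")

    # Build an import spec that points directly at the file on disk.
    spec = importlib.util.spec_from_file_location(module_name, str(file_path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot build an import spec for {file_path}")

    module = importlib.util.module_from_spec(spec)
    # Register the module before executing it so that imports of
    # module_name made while the file runs resolve to this object.
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module
```

With something like this in the image, changing repo.py only means editing the mounted file and restarting the container. Dagster's own workspace file can also point at a Python file by path, so a bind mount plus that setting may already be enough without a custom loader.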
s
I don't think you would need MLFlow, given the AssetMaterialization feature Dagster provides. Here is why:
• A Dagster pipeline is the same in principle as an experiment in MLFlow.
• Dagster runs are the same as MLFlow runs.
• Dagster assets are the same as MLFlow artifacts.
In principle, to reproduce a model run, all you have to do is rerun a pipeline in Dagster with the appropriate asset materializations.
👍 1
t
Thanks for your feedback.
• Pipelines in dagster are more than mlflow experiments. Dagster gives deeper reusability IMHO.
• Mlflow has a good abstraction for many frameworks, deployment/serving, etc.
There are similarities but also differences. Dagster is more focused on the data lifecycle, whereas mlflow is focused on the model lifecycle. So I agree that some features are similar.
👍🏻 1
s
I'm not trying to suggest that they are completely similar. It's just that all the features related to artifact versioning (which covers models, plots, and data) that MLFlow currently provides are things Dagster can do through asset materializations. MLFlow model artifact versioning can be completely replaced by Dagster AssetMaterialization with no loss in functionality.
👍 1
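The versioning argument in this thread rests on one idea: an append-only log of materialization events, keyed by asset and by run, is enough to recover every version of an artifact. Below is a toy stdlib-only sketch of that idea. It is neither Dagster's nor MLflow's API; `Materialization`, `MaterializationLog`, the run ids, and the paths are all hypothetical placeholders, and it says nothing about the serving and deployment side raised earlier in the thread.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Materialization:
    """One record of 'run X produced asset Y' (hypothetical toy type)."""
    run_id: str
    asset_key: str
    metadata: Dict[str, Any]


@dataclass
class MaterializationLog:
    """Append-only event log; insertion order doubles as version order."""
    events: List[Materialization] = field(default_factory=list)

    def record(self, run_id: str, asset_key: str, **metadata: Any) -> Materialization:
        event = Materialization(run_id, asset_key, dict(metadata))
        self.events.append(event)
        return event

    def history(self, asset_key: str) -> List[Materialization]:
        # Every version of one asset, oldest first.
        return [e for e in self.events if e.asset_key == asset_key]

    def latest(self, asset_key: str) -> Optional[Materialization]:
        versions = self.history(asset_key)
        return versions[-1] if versions else None


# Two runs of the same pipeline each materialize the "model" asset.
log = MaterializationLog()
log.record("run-1", "model", path="s3://example-bucket/model/run-1.pkl")
log.record("run-1", "confusion_plot", path="s3://example-bucket/plots/run-1.png")
log.record("run-2", "model", path="s3://example-bucket/model/run-2.pkl")
```

Here `log.history("model")` returns the two model versions in run order, and `log.latest("model")` returns the one produced by `run-2`, which is roughly the lookup a model registry performs.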