#announcements

Amber Papillon

03/30/2021, 2:42 PM
MLflow Tracking has some nice features that seem to be lacking in Dagster (a record of the current git commit, ML metrics, etc.). It also integrates nicely with Databricks, which makes cluster deployment easy. However, it falls short when it comes to building complicated pipelines, which is where Dagster excels. Is there a way to get "the best of all worlds"? That is, integrate Dagster with MLflow and thus have it run on Databricks? Or is there a good alternative?

sandy

03/30/2021, 2:58 PM
Hi Amber, Dagster and MLflow are complementary. We have users who orchestrate machine learning pipelines with Dagster and use MLflow to track the results: they invoke the MLflow Python APIs from inside their Dagster pipelines to write results to MLflow. Would that work for you?
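A minimal sketch of the pattern described above, under stated assumptions: plain Python functions stand in for Dagster steps (ops in current Dagster releases, solids in the 2021-era releases this thread refers to), and `InMemoryTracker` is a hypothetical in-memory stand-in for MLflow whose `log_param`/`log_metric` method names mirror the real `mlflow.log_param`/`mlflow.log_metric` functions. In a real pipeline each step would call the MLflow API directly, typically inside a `with mlflow.start_run():` block.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InMemoryTracker:
    """Hypothetical stand-in for MLflow; nothing is sent to a tracking server."""

    params: Dict[str, object] = field(default_factory=dict)
    metrics: Dict[str, List[float]] = field(default_factory=dict)

    def log_param(self, key: str, value: object) -> None:
        # Mirrors the name of mlflow.log_param.
        self.params[key] = value

    def log_metric(self, key: str, value: float) -> None:
        # Mirrors the name of mlflow.log_metric; MLflow keeps a history per key,
        # so values are appended to a list here.
        self.metrics.setdefault(key, []).append(value)


# Module-level tracker, loosely like MLflow's fluent API and its active run.
tracker = InMemoryTracker()


# Hypothetical pipeline steps (these would be Dagster ops/solids in a real pipeline).
def load_data() -> List[float]:
    return [1.0, 2.0, 3.0, 4.0]


def train(data: List[float], learning_rate: float) -> float:
    tracker.log_param("learning_rate", learning_rate)
    # Toy "model": the scaled mean of the data.
    return learning_rate * sum(data) / len(data)


def evaluate(model: float, data: List[float]) -> float:
    # Toy metric: mean absolute error of the constant prediction.
    mae = sum(abs(x - model) for x in data) / len(data)
    tracker.log_metric("mae", mae)
    return mae


data = load_data()
model = train(data, learning_rate=1.0)
mae = evaluate(model, data)
```

Each step only knows about the tracker, not about the orchestrator, which is what keeps the two tools complementary.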

Amber Papillon

03/30/2021, 3:04 PM
Yes, definitely. I just can't seem to find a demo or tutorial for this kind of integration.

sandy

03/30/2021, 4:56 PM
I filed a GitHub issue to track adding a guide on this: https://github.com/dagster-io/dagster/issues/3970

Thomas

03/30/2021, 5:04 PM
Hello, I tried coupling them. I don't know if it's the best approach, but... it works.

sandy

03/30/2021, 5:09 PM
This is awesome, @Thomas! Were there any particular points of friction you ran into?

Thomas

03/30/2021, 6:45 PM
Only when I was trying to understand Dagster 😄 This was a way to play with it.
You can see my to-do list, which is not related to MLflow at all.
In the end I wanted to add pipelines that work with the MLflow model lifecycle, but I don't have a simple example where the model improves.
If the first pipeline gives a better result -> test it on a "production" pipeline -> if that succeeds -> deploy the MLflow model
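The promotion flow in the last message can be sketched as a small gating function. Everything here is hypothetical: `Candidate`, `promote_if_better`, and the `validate_in_production`/`deploy` callbacks are illustrative names, and "a higher score is better" is an assumption. In an MLflow setup, `deploy` might, for example, register the model in the MLflow Model Registry.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    name: str
    score: float  # assumption: higher is better


def promote_if_better(
    candidate: Candidate,
    production_score: float,
    validate_in_production: Callable[[Candidate], bool],
    deploy: Callable[[Candidate], None],
) -> str:
    # Gate 1: the candidate must beat the model currently in production.
    if candidate.score <= production_score:
        return "rejected: not better than production"
    # Gate 2: re-test the candidate on the "production" pipeline.
    if not validate_in_production(candidate):
        return "rejected: failed production validation"
    # Both gates passed: hand the candidate to the (hypothetical) deploy step.
    deploy(candidate)
    return "deployed"


deployed: List[Candidate] = []
status = promote_if_better(
    Candidate("model-v2", 0.91),
    production_score=0.88,
    validate_in_production=lambda c: c.score >= 0.9,
    deploy=deployed.append,
)
```

Passing validation and deployment in as callbacks keeps the decision logic testable without a tracking server; in Dagster each gate could also be its own step.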

Hugo Pedroso de Lima

03/30/2021, 8:29 PM
Hi Amber, I have actually implemented a Dagster/MLflow integration at my company, essentially implementing MLflow as a Dagster resource (I'm hoping they will give me the green light to contribute it to Dagster, if the Dagster folks are interested). The implementation also makes sure everything gets logged to the same MLflow run when you use executors with multiple processes. If you want to start trying it out yourself, I think implementing it as a Dagster resource is a neat way to do it.
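A minimal sketch of the "same run across processes" idea, assuming a file-backed store. `FileRunTracker` is a hypothetical resource, not Hugo's implementation and not the MLflow API. The run is created once and only its ID is handed to each step; each step builds its own tracker instance (as it would inside a separate worker process) and attaches to that ID instead of starting a new run. With MLflow itself, the analogous call is `mlflow.start_run(run_id=...)`, which resumes an existing run.

```python
import json
import os
import tempfile
import uuid
from typing import Dict, List, Optional


class FileRunTracker:
    """Hypothetical tracking resource: instances sharing a run_id share one run file."""

    def __init__(self, root: str, run_id: Optional[str] = None) -> None:
        self.root = root
        # Start a new run only if no run ID was handed down by the orchestrator.
        self.run_id = run_id or uuid.uuid4().hex
        self.path = os.path.join(root, f"{self.run_id}.jsonl")

    def log_metric(self, key: str, value: float) -> None:
        # Append mode: each writer adds lines without rewriting the others' data.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")

    def metrics(self) -> Dict[str, List[float]]:
        out: Dict[str, List[float]] = {}
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                out.setdefault(record["key"], []).append(record["value"])
        return out


root = tempfile.mkdtemp()

# The run is created once, up front; only its ID travels to the steps.
parent = FileRunTracker(root)
shared_run_id = parent.run_id


# Each step builds its own instance, as it would in a separate worker process,
# but attaches to the shared run. (Here the steps simply run one after another.)
def train_step(run_id: str) -> None:
    FileRunTracker(root, run_id).log_metric("train_loss", 0.25)


def eval_step(run_id: str) -> None:
    FileRunTracker(root, run_id).log_metric("val_accuracy", 0.75)


train_step(shared_run_id)
eval_step(shared_run_id)
```

The design choice is to share the run ID rather than the tracker object: objects do not survive the hop to another process, but a string does.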

Amber Papillon

03/30/2021, 11:17 PM
That sounds very nice. Looking forward to seeing it.

cat

04/13/2021, 5:58 AM
@Hugo Pedroso de Lima that's awesome to hear! We (Dagster) would love to showcase your integration if your company is OK with it.

Amber Papillon

05/02/2021, 2:54 PM
@Hugo Pedroso de Lima any news on that?

Hugo Pedroso de Lima

05/04/2021, 1:40 PM
Hi, we wanted to refactor a couple of things and finish adding unit tests before submitting a PR to Dagster. We have now done that, but I'm still waiting for approval from higher up. My direct manager has approved it, but we need approval from the manager above him before we can submit the PR. He's on holiday for a few more days, so I will raise this with him when he's back. Hopefully we can open the PR within the next few weeks!
Hi, I now have an unofficial OK from the manager above mine, but I need to wait for the OK from our legal department before I can go ahead and create the PR. I expect they will approve it, since it doesn't include any of our ML code, so hopefully it won't be much longer now.

cat

05/13/2021, 9:15 PM
Awesome, thanks for the update 🙏

Hugo Pedroso de Lima

05/26/2021, 1:10 AM
Hi again, we finally got the go-ahead from the legal team, so I created a pull request here: https://github.com/dagster-io/dagster/pull/4213