Hi if I have one of data science ml workloads which just nee dagster #ask-community


Samuel Stütz

03/26/2022, 11:50 AM

Hi, if I have one-off data science/ML workloads which just need more compute and some logging, how would I best run these workflows so I can monitor them in my k8s-deployed dagit? I'd like asset events to propagate even if the pipeline code will likely never make it far beyond local experiments. I am not entirely clear on that.
• I can run my local dagit and configure similar backends in dagster.yaml, but would that work to create AssetMaterialization events in the k8s dagit? (One problem is that getting access to postgres is rather more tricky than reaching dagit.)
• The grpc server in workspace.yaml concerns my code as I understand it, but can I expose ingress or port-forward dagit to run against it instead of the local one? The docs aren't clear.
• Or do I only need to configure the Runk8sExecutor locally correctly?
The use case is pipelines that run locally, where the code may not make it into production but the asset events and data should register for everyone to see.

daniel

03/28/2022, 2:09 PM

Hi Samuel - it's possible to run two dagits, one running locally on your machine and the other running in k8s, using the same storage. They would just need to share the same storage config in their dagster.yaml (they don't need to share the same workspace.yaml), but could use different run launchers to control where runs get launched. For this to work, though, you would need to be able to access your production postgres from your local machine.
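As a rough sketch of what "the same storage config" could look like, assuming the dagster-postgres package is installed on both sides: the block below would appear identically in the local and the k8s dagster.yaml. The environment variable names are hypothetical placeholders, and the exact keys may differ between Dagster versions, so check them against the docs for your release.

```yaml
# dagster.yaml (storage section shared by both dagit instances)
# DAGSTER_PG_* are hypothetical env var names; set them to your own values.
run_storage:
  module: dagster_postgres.run_storage
  class: PostgresRunStorage
  config:
    postgres_db:
      username: {env: DAGSTER_PG_USERNAME}
      password: {env: DAGSTER_PG_PASSWORD}
      hostname: {env: DAGSTER_PG_HOST}
      db_name: {env: DAGSTER_PG_DB}
      port: 5432

event_log_storage:
  module: dagster_postgres.event_log
  class: PostgresEventLogStorage
  config:
    postgres_db:
      username: {env: DAGSTER_PG_USERNAME}
      password: {env: DAGSTER_PG_PASSWORD}
      hostname: {env: DAGSTER_PG_HOST}
      db_name: {env: DAGSTER_PG_DB}
      port: 5432

# The run_launcher section is deliberately left out here: it is the part
# that can differ between the local and the k8s instance.
```

Because AssetMaterialization events are written to the event log storage, any dagit pointed at the same database should show them, regardless of which instance launched the run.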

Samuel Stütz

03/28/2022, 2:31 PM

So the idea would then be to run my local dagit for data science notebooks, and as long as the event log is configured in dagster.yaml to use the same backend (as in postgres access) it should work, since dagit itself holds no state, I guess. I feel like the best solution would be a gitops-style code deployment, where any data scientist commits on their own branch (the deployment pulls the code) and then triggers runs via GraphQL. Less convenient, but handing out postgres access has more damage potential.
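A minimal sketch of the "trigger runs via GraphQL" step, using only the Python standard library. The mutation follows the shape of the `launchRun` example in the Dagster GraphQL docs, but field names have changed across versions (older releases use `pipelineName` and `mode`), so verify it against your dagit's schema. The dagit URL and the location, repository and job names are hypothetical.

```python
import json
import urllib.request

# launchRun mutation, modeled on the Dagster GraphQL docs; check your schema.
LAUNCH_RUN_MUTATION = """
mutation LaunchRunMutation(
  $repositoryLocationName: String!
  $repositoryName: String!
  $jobName: String!
  $runConfigData: RunConfigData!
) {
  launchRun(
    executionParams: {
      selector: {
        repositoryLocationName: $repositoryLocationName
        repositoryName: $repositoryName
        jobName: $jobName
      }
      runConfigData: $runConfigData
    }
  ) {
    __typename
    ... on LaunchRunSuccess { run { runId } }
    ... on PythonError { message }
  }
}
"""


def build_launch_payload(location, repository, job, run_config=None):
    """Build the JSON body for a launchRun request (no network access)."""
    return {
        "query": LAUNCH_RUN_MUTATION,
        "variables": {
            "repositoryLocationName": location,
            "repositoryName": repository,
            "jobName": job,
            "runConfigData": run_config or {},
        },
    }


def launch_run(dagit_url, payload, timeout=30):
    """POST the payload to dagit's /graphql endpoint and return the parsed JSON."""
    request = urllib.request.Request(
        dagit_url.rstrip("/") + "/graphql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical names: replace with your own code location, repo and job.
    payload = build_launch_payload("ds-branch-location", "experiments", "train_model")
    print(launch_run("http://localhost:3000", payload))
```

With this approach only dagit needs to be reachable (for example through ingress or a port-forward), not postgres. The dagster-graphql package also ships a Python client that wraps the same endpoint, which may be more convenient than hand-written requests.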

Samuel Stütz

03/28/2022, 2:34 PM

I will see where I go with this. Thanks.
