Cross-posting here: https://github.com/dagster-io... (dagster #deployment-kubernetes)

Steven Murphy

02/12/2024, 1:51 PM

Cross-posting here: https://github.com/dagster-io/dagster/discussions/19734. Has anyone had experience monitoring K8s code locations and detecting when they fail to start? I had a transient issue that left a code location not deployed successfully, and I only noticed it when I logged in today. I'd like to be more proactive.

Andrea Giardini

02/14/2024, 11:25 AM

Are you using liveness/readiness probes in Kubernetes?

Steven Murphy

02/14/2024, 11:28 AM

Not at the moment, though it's food for thought. Do you mean setting them here, within the `dagster-user-deployments` section of the `values.yaml`?

Steven Murphy

02/14/2024, 11:28 AM

^^ That's the default one I'm looking at. Edit: the 1.5.6 one, anyway; I haven't pulled the latest yet.

Andrea Giardini

02/14/2024, 11:35 AM

yeah. liveness and readiness probes are useful tools in k8s to make sure a pod stays healthy and that no broken pod gets deployed

Andrea Giardini

02/14/2024, 11:35 AM

(nothing to do with dagster, pure k8s)

Andrea Giardini

02/14/2024, 11:39 AM

liveness probes would restart the user-code container in case of failures
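For reference, here is a minimal sketch of what those probes could look like. It is written in Python and emits JSON, which is valid YAML, so the output can be passed to `helm upgrade -f`. It assumes your Dagster chart version accepts `readinessProbe` and `livenessProbe` on each `dagster-user-deployments` entry, so check this against your chart's `values.yaml` (e.g. the 1.5.6 one). It also uses the `dagster api grpc-health-check` command as the probe. The deployment name, image, module, port, and threshold values are all hypothetical.

```python
import json

# Assumption: your Dagster Helm chart version accepts readinessProbe and
# livenessProbe on each dagster-user-deployments entry (check its values.yaml).
GRPC_PORT = 3030  # hypothetical code-server port; match your deployment's "port"


def grpc_health_probe(period_seconds, failure_threshold, initial_delay_seconds=0):
    """Exec probe that asks the code server's gRPC endpoint whether it is healthy."""
    return {
        "exec": {
            "command": ["dagster", "api", "grpc-health-check", "-p", str(GRPC_PORT)]
        },
        "initialDelaySeconds": initial_delay_seconds,
        "periodSeconds": period_seconds,
        "timeoutSeconds": 10,
        "failureThreshold": failure_threshold,
    }


def user_deployment_values(name, image_repo, image_tag, module):
    """Values override for one code location (all arguments are hypothetical)."""
    return {
        "dagster-user-deployments": {
            "deployments": [
                {
                    "name": name,
                    "image": {"repository": image_repo, "tag": image_tag, "pullPolicy": "Always"},
                    "dagsterApiGrpcArgs": ["--module-name", module],
                    "port": GRPC_PORT,
                    # Readiness: the pod gets no Service traffic, and a rolling
                    # update does not count it as available, until this passes.
                    "readinessProbe": grpc_health_probe(period_seconds=20, failure_threshold=3),
                    # Liveness: kubelet restarts the container after 5 consecutive
                    # failed checks; the initial delay gives definitions time to load.
                    "livenessProbe": grpc_health_probe(
                        period_seconds=30, failure_threshold=5, initial_delay_seconds=60
                    ),
                }
            ]
        }
    }


if __name__ == "__main__":
    values = user_deployment_values(
        "my-code-location", "registry.example.com/my-code", "1.0.0", "my_project.definitions"
    )
    # JSON is valid YAML, so this can be saved and passed to `helm upgrade -f`.
    print(json.dumps(values, indent=2))
```

The liveness probe gets a head start because a code server that is still loading definitions would otherwise fail its checks and be restarted in a loop. If your chart version exposes a startup probe, that is the cleaner way to handle slow starts.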

Steven Murphy

02/14/2024, 11:42 AM

Cool, I'll give that a shot. As for monitoring/alerting: if there happens to be a problem that's not transient, presumably that would require a monitoring solution outside of Dagster? I can chat with our K8s infra guys. The reason I ask is that the Dagster UI shows a little warning triangle next to 'Deployments' along the top, and I was wondering if there's an event that accompanies it.
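If the infra team wants something quick before wiring up proper monitoring, a cron job or CI step could poll the code-server pods directly. This is a minimal sketch that assumes `kubectl` access to the cluster. The namespace and label selector you pass in are up to you (`kubectl get pods --show-labels` shows what your user-code pods actually carry), and the set of "stuck" waiting reasons is a judgment call, not something Dagster defines. The sketch only sees what Kubernetes sees, so a code server that is up but showing a load error in the Dagster UI may not be flagged here.

```python
import json
import subprocess

# Waiting reasons that usually mean a code server will not recover by itself
# (a judgment call; extend as needed).
STUCK_REASONS = {
    "CrashLoopBackOff",
    "ImagePullBackOff",
    "ErrImagePull",
    "CreateContainerConfigError",
}


def unhealthy_pods(pod_list):
    """Return (pod name, reason) pairs from `kubectl get pods -o json` output."""
    problems = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        # Prefer the most specific signal: a container stuck in a known-bad waiting state.
        for container in status.get("containerStatuses", []):
            waiting = container.get("state", {}).get("waiting") or {}
            if waiting.get("reason") in STUCK_REASONS:
                problems.append((name, waiting["reason"]))
                break
        else:
            # No stuck container found, so fall back to the pod's Ready condition.
            ready = next(
                (c for c in status.get("conditions", []) if c.get("type") == "Ready"),
                None,
            )
            if ready is None or ready.get("status") != "True":
                problems.append((name, "NotReady"))
    return problems


def fetch_pods(namespace, label_selector):
    """Shell out to kubectl (must be installed and pointed at the right cluster)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-l", label_selector, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

A scheduled job could call `unhealthy_pods(fetch_pods(namespace, selector))` and forward each `(name, reason)` pair to whatever alerting channel you already use.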

Andrea Giardini

02/14/2024, 11:43 AM

Kubernetes should suffice. With the proper probes, Kubernetes will not roll out a new version of the user code unless all probes are successful.
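That behaviour can also be turned into an alert. With the default RollingUpdate strategy and a single replica, the old code-server pod keeps serving until the new one passes its readiness probe. If that never happens within the Deployment's `progressDeadlineSeconds` (600 by default), Kubernetes sets the `Progressing` condition's reason to `ProgressDeadlineExceeded`. This is a minimal sketch that assumes you feed it the output of `kubectl get deployments -n <namespace> -o json`. The "in_progress"/"complete" branches are a rough simplification of what `kubectl rollout status` waits for, not a port of it.

```python
def rollout_state(deployment):
    """Classify one Deployment object from `kubectl get deployments -o json`.

    Roughly mirrors what `kubectl rollout status` waits for; not an exact port.
    """
    spec_replicas = deployment.get("spec", {}).get("replicas", 1)
    status = deployment.get("status", {})
    # Set once progressDeadlineSeconds passes without progress, e.g. because
    # the new pods never passed their readiness probe.
    for cond in status.get("conditions", []):
        if cond.get("type") == "Progressing" and cond.get("reason") == "ProgressDeadlineExceeded":
            return "failed"
    # The controller has not observed the latest spec change yet.
    if status.get("observedGeneration", 0) < deployment.get("metadata", {}).get("generation", 0):
        return "in_progress"
    updated = status.get("updatedReplicas", 0)
    if updated < spec_replicas:
        return "in_progress"  # new pods still being created
    if status.get("replicas", 0) > updated:
        return "in_progress"  # old pods not yet terminated
    if status.get("availableReplicas", 0) < updated:
        return "in_progress"  # new pods exist but are not available yet
    return "complete"


def failed_rollouts(deployment_list):
    """Names of Deployments whose latest rollout stalled past its deadline."""
    return [
        d["metadata"]["name"]
        for d in deployment_list.get("items", [])
        if rollout_state(d) == "failed"
    ]
```

For a one-off check in CI, `kubectl rollout status deployment/<name> --timeout=...` covers the same ground: it exits non-zero if the rollout does not complete in time.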

Steven Murphy

02/14/2024, 11:44 AM

Appreciate the insight, thank you

Andrea Giardini

02/14/2024, 11:44 AM

Happy to help 🙌
