# deployment-kubernetes
s
Cross-posting here: https://github.com/dagster-io/dagster/discussions/19734
Has anyone had experience with monitoring K8s code locations and detecting when they fail to start? I had a transient issue which resulted in a code location not deploying successfully, and I only noticed it when I logged in today. I'd like to be more proactive.
a
Are you using liveness/readiness probes in Kubernetes?
s
Not at the moment, though it's food for thought. Do you mean setting them here within the dagster-user-deployments of the values.yaml?
^^ That's the default one I'm looking at. Edit: 1.5.6 anyway, not pulled the latest yet.
a
yeah. liveness and readiness probes are useful tools in k8s to make sure a pod stays healthy and that no broken pod gets deployed
(nothing to do with dagster, pure k8s)
liveness probes would restart the user-code in case of failures
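For reference, a minimal sketch of what such probes might look like under dagster-user-deployments in the values.yaml. The probe fields (exec, periodSeconds, timeoutSeconds, failureThreshold, initialDelaySeconds) are standard Kubernetes. The deployment name, the port, and the exact placement of the keys are assumptions and may differ between chart versions, so check the values.yaml shipped with your release before copying anything.

```yaml
# Sketch only: key placement under dagster-user-deployments is an assumption;
# verify it against the values.yaml of your chart version.
dagster-user-deployments:
  deployments:
    - name: "my-code-location"   # hypothetical deployment name
      port: 3030                 # assumed gRPC port of the code server
      # Readiness: the pod only counts as ready (and a rollout only proceeds)
      # once the gRPC server answers the health check.
      readinessProbe:
        exec:
          command: ["dagster", "api", "grpc-health-check", "-p", "3030"]
        periodSeconds: 20
        timeoutSeconds: 10
        failureThreshold: 3
      # Liveness: the kubelet restarts the container after repeated failures.
      livenessProbe:
        exec:
          command: ["dagster", "api", "grpc-health-check", "-p", "3030"]
        initialDelaySeconds: 60  # give the code location time to load
        periodSeconds: 30
        timeoutSeconds: 10
        failureThreshold: 3
```

The thresholds here are illustrative. A code location that takes a long time to import would need a larger initialDelaySeconds or failureThreshold, otherwise the liveness probe could restart it before it finishes loading.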
s
Cool, I'll give that a shot. As for monitoring/alerting: if there happens to be a problem that's not transient, presumably that would require some monitoring solution outwith Dagster? I can chat with our K8s infra guys. The reason I ask is that the Dagster UI shows a little warning triangle next to 'Deployments' along the top, and I was wondering if there's an event that accompanies it.
a
Kubernetes should suffice. With the proper probes, Kubernetes will not roll out a new version of the user-code unless all probes are successful.
s
Appreciate the insight, thank you
a
Happy to help 🙌