# ask-community
a
Hi team, could someone help me understand what the `/dagit_info` endpoint in dagit's liveness probe does and why it can become slow? We have recently been seeing a lot of dagit restarts due to timeouts on this probe; however, dagit itself works fine overall. The probe timeout is already set to 10s and we are on `0.12.12`.
r
Hey Arun - we actually removed liveness and startup probes in 0.14.0: https://docs.dagster.io/changelog#0140-never-felt-like-this-before. We suggest that you just remove the configuration entirely if you’re still on an older version. Something like this in your `values.yaml` should do the trick:
livenessProbe: {}
a
Hi @rex, I am working on removing them. However, it looks like some other endpoints are also taking a hit. Before a restart, I see that some pages hang.
Will remove the probes and get more data.
@rex Even after making the above change (for both liveness and startup) in `values.yaml`, I still see the liveness and startup probes being set on the dagit pods. Do you know why this could happen?
r
Hmm, this might be because of https://github.com/helm/helm/issues/5407. Unfortunately, the workaround specified here (to specify the field as `null` instead of `{}`) will not work because of our Helm schema. For the `startupProbe`, there is an `enabled` flag that you can set instead. This will disable the startup probe https://artifacthub.io/packages/helm/dagster/dagster/0.12.12?modal=values&path=dagit.startupProbe.enabled. For the `livenessProbe`, it looks like we didn’t add a similar flag in this old version. So you have a couple of options here:
1. Manually remove the liveness probe from your deployments using `kubectl edit` or something similar.
2. Upgrade to 0.14.0, where this is disabled by default.
3. Override the livenessProbe so that it’s still enabled, but make the liveness check always return success. This way, it will never fail and you should not see restarts for your dagit.
I tried out (3), rendering the template locally by running this command:
helm template dagster/dagster -g -s templates/deployment-dagit.yaml --version 0.12.12 --values ./values.yaml
with these values:
dagit:
  livenessProbe:
    httpGet: ~
    exec:
      command:
      - true
  startupProbe:
    enabled: false
I get this Kubernetes deployment template for dagit, which has the `startupProbe` removed, and the livenessProbe should always return success. Here’s a truncated snippet of that:
...
          volumeMounts:
            - name: dagster-instance
              mountPath: "/opt/dagster/dagster_home/dagster.yaml"
              subPath: dagster.yaml
            - name: dagster-workspace-yaml
              mountPath: "/dagster-workspace/workspace.yaml"
              subPath: workspace.yaml
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
          resources:
            {}
          livenessProbe:
            exec:
              command:
              - true
            failureThreshold: 3
            periodSeconds: 20
            successThreshold: 1
            timeoutSeconds: 3

      volumes:
        - name: dagster-instance
          configMap:
            name: RELEASE-NAME-dagster-instance
        - name: dagster-workspace-yaml
          configMap:
a
Thanks Rex. I will try option 3.
@rex I just tried option 3. It looks like it violates the schema constraints. I am seeing the following errors for both httpGet and exec.command:
helmVersion=v3 error="dry-run upgrade for comparison failed: values don't meet the specifications of the schema(s) in the following chart(s):\n
dagster:dagit.livenessProbe.exec.command.0: Invalid type. Expected: string, given: boolean\n- dagit.livenessProbe.httpGet: Invalid type. Expected: object, given: null\n" phase=dry-run-compare
r
Try putting quotes around `true` -> `"true"`
a
Went on a vacation, just seeing this today. @rex how about the `httpGet`? It looks like `~` does not work. Sorry, I am not very familiar with Helm; do you recommend some other value?
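For reference, a minimal sketch of where the values stand with the quoting fix suggested above applied. It only addresses the `exec.command` type error. `httpGet` is left out rather than set to `~`, because the schema rejects null; whether the rendered probe is then valid is an assumption that is not confirmed in this thread, so it is worth rendering it locally with `helm template` before applying.

```yaml
# Sketch only: same keys as the values.yaml earlier in the thread.
dagit:
  livenessProbe:
    # httpGet is deliberately omitted here; the schema rejects `httpGet: ~`.
    exec:
      command:
        - "true"   # quoted so YAML parses a string, not a boolean
  startupProbe:
    enabled: false
```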