Hi, our code deployment suddenly failed and started giving th... (dagster #dagster-plus)

Abhishek Agrawal

03/07/2023, 4:52 AM

Hi, our code deployment suddenly failed and started raising DagsterUserCodeUnreachableError. It was working fine until an hour ago, and there have been no updates to the Docker image. The agent pod is showing this error in its logs.

Tobias Pankrath

03/07/2023, 7:56 AM

Might be the same as my problem from the support channel?

Abhishek Agrawal

03/07/2023, 8:11 AM

Maybe. Mine was working, and then all of a sudden this error started appearing.

daniel

03/07/2023, 1:25 PM

If you run `kubectl describe` on the pod with 'spina' in its name (the one the agent is trying to connect to), is there any indication of why it failed to start up?

Abhishek Agrawal

03/07/2023, 1:53 PM

The logs don't show any issues at all, just normal entries. I'll run describe and let you know here.

daniel

03/07/2023, 1:55 PM

Is the pod possibly taking more than 3 minutes to start up? The timeout it's using there is configurable in the agent Helm chart.

daniel

03/07/2023, 1:56 PM

If your code is expected to take more than 3 minutes to load after the pod starts up, you can increase the serverProcessStartupTimeout key in the Helm chart to a value greater than the default of 180 (seconds): https://artifacthub.io/packages/helm/dagster-cloud/dagster-cloud-agent?modal=values-schema&path=workspace.serverProcessStartupTimeout

Abhishek Agrawal

03/07/2023, 2:13 PM

Does loading code mean pulling the Docker image? It doesn't take that long to pull the image, maybe 30-40 seconds.

daniel

03/07/2023, 2:13 PM

I was referring to importing your Python code and loading your Dagster definitions.
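To get a rough sense of whether loading your definitions comes anywhere near the 180-second default of serverProcessStartupTimeout, one option is to time the import of the module that builds your Dagster definitions in a fresh Python process. This is a minimal sketch using only the standard library. It assumes import time is a reasonable proxy for what the code server does at startup, which it only approximates. The module name `my_dagster_project` is hypothetical.

```python
import importlib
import time

# Default serverProcessStartupTimeout in the agent Helm chart, in seconds (180 = 3 minutes).
DEFAULT_STARTUP_TIMEOUT_S = 180


def time_import(module_name: str) -> float:
    """Return how many seconds it takes to import `module_name`.

    Run this in a fresh interpreter: if the module is already in sys.modules,
    the import is served from the cache and the measurement will be near zero.
    """
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


def fits_within_timeout(seconds: float, timeout: float = DEFAULT_STARTUP_TIMEOUT_S) -> bool:
    """Return True if a measured load time is under the timeout (in seconds)."""
    return seconds < timeout
```

For example, in a new `python` session you might call `fits_within_timeout(time_import("my_dagster_project"))`, where the module name is a placeholder for your own code location module. Since the real code server also has to start the process and do its own setup, a result close to 180 seconds is already worth treating as a warning sign.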

daniel

03/07/2023, 2:14 PM

If you have logs or `kubectl describe` output that might give a sense of what was happening during those 3 minutes before it timed out, we can take a look.

Abhishek Agrawal

03/07/2023, 2:18 PM

Got it, I think I understand now, and that might be the reason: we are running a loop during code loading that takes a really long time. I'll try pushing some code changes and see. Thanks!
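One common way to handle a slow loop like this is to keep it out of module import, so the code server can load definitions quickly, and to run it only when it is actually needed, for example inside an op or asset body at run time. This is a minimal, standard-library-only sketch. `slow_loop`, `get_lookup_table`, and `compute` are hypothetical stand-ins for the loop described above, and `functools.lru_cache` ensures the loop runs at most once per process.

```python
import functools

# Counts how many times the expensive loop actually runs (for illustration only).
CALLS = {"slow_loop": 0}


def slow_loop(n: int = 1_000) -> dict:
    """Hypothetical stand-in for a long-running loop (e.g. building a lookup table)."""
    CALLS["slow_loop"] += 1
    return {i: i * i for i in range(n)}


# Anti-pattern: running the loop at module level means every import of this
# module (and therefore every code server startup) pays its full cost:
#   LOOKUP_TABLE = slow_loop()


@functools.lru_cache(maxsize=None)
def get_lookup_table() -> dict:
    """Run the slow loop lazily on first use and cache the result."""
    return slow_loop()


def compute(key: int) -> int:
    # Intended to be called from inside an op/asset body at run time, not at import time.
    return get_lookup_table()[key]
```

Importing this module no longer triggers the loop. The first call to `compute` pays the cost once, and later calls reuse the cached table.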
