#dagster-cloud

Zach

03/07/2024, 4:57 PM
I'm getting
dagster._core.errors.DagsterUserCodeUnreachableError: Timed out waiting for call to user code GET_EXTERNAL_EXECUTION_PLAN [497304d0-36be-4b85-b91e-70e078eb1e00]
New or dormant Branch Deployments can take time to become ready, try again in a little bit.
on a branch deployment that I ran a job on about an hour ago. Any suggestions on how to troubleshoot? It's unclear to me whether there's any way to figure out which ECS service / task is currently serving a specific branch deployment, since they're all named using UUIDs and we have 9 different branch deployments being served right now (it'd be nice if they were at least tagged with the branch or something like that; maybe I'll make a PR for that). I'm not seeing any new tasks trying to spin up in our branch deployment ECS cluster when I try to launch a job.

daniel

03/07/2024, 5:02 PM
Hey Zach - the UUID for the branch deployment should be in the URL in the Dagster Cloud UI. That should be in the name of the relevant ECS service (and is also in the dagster/deployment_name tag in the Tags tab in the ECS console for that service)
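For anyone who wants to script that lookup instead of clicking through the ECS console, here is a minimal sketch. It assumes the services carry a dagster/deployment_name tag whose value is the deployment UUID, as described above, and that boto3 credentials are configured. The function names and the placeholder cluster name and UUID are hypothetical, not part of Dagster.

```python
DEPLOYMENT_TAG = "dagster/deployment_name"  # tag key mentioned in this thread


def services_for_deployment(services, deployment_name):
    """Return names of ECS services that appear to serve `deployment_name`.

    `services` is a list of dicts shaped like the "services" entries returned
    by ECS DescribeServices when tags are included. A service matches if the
    tag value equals the deployment name, or (as a fallback) if the deployment
    name appears in the service name.
    """
    matches = []
    for svc in services:
        # ECS returns tags as a list of {"key": ..., "value": ...} dicts.
        tags = {t["key"]: t["value"] for t in svc.get("tags", [])}
        name = svc.get("serviceName", "")
        if tags.get(DEPLOYMENT_TAG) == deployment_name or deployment_name in name:
            matches.append(name)
    return matches


def describe_cluster_services(cluster):
    """Fetch every service in `cluster`, with tags, using boto3."""
    import boto3  # imported here so the filter above works without boto3

    ecs = boto3.client("ecs")
    arns = []
    for page in ecs.get_paginator("list_services").paginate(cluster=cluster):
        arns.extend(page["serviceArns"])

    described = []
    # DescribeServices accepts at most 10 services per call.
    for i in range(0, len(arns), 10):
        resp = ecs.describe_services(
            cluster=cluster, services=arns[i : i + 10], include=["TAGS"]
        )
        described.extend(resp["services"])
    return described


# Hypothetical usage; the cluster name and UUID are placeholders:
# found = services_for_deployment(
#     describe_cluster_services("my-branch-deployment-cluster"),
#     "00000000-0000-0000-0000-000000000000",
# )
```

The filtering is kept separate from the AWS calls so it can be checked against sample data without touching a real cluster.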

Zach

03/07/2024, 5:03 PM
Ah okay that makes sense!
Hmm, yeah, it seems like no new tasks for runs are being spun up from any of our branch deployments

daniel

03/07/2024, 5:06 PM
The services are up and running though?

Zach

03/07/2024, 5:06 PM
Yes

daniel

03/07/2024, 5:06 PM
How's CPU/memory looking on the Health tab?
Any chance they are overloaded?

Zach

03/07/2024, 5:07 PM
I tried redeploying a code location and it seemed to go okay. <1% CPU, ~25% memory usage

daniel

03/07/2024, 5:07 PM
And this was working fine until recently?

Zach

03/07/2024, 5:08 PM
Yeah, I ran one about an hour ago just fine
Hmm, the agent stopped reporting

daniel

03/07/2024, 5:08 PM
Any logs or errors from the agent?

Zach

03/07/2024, 5:09 PM
Hmm, interesting: CPU utilization plummeted and memory usage increased slightly about an hour ago, after being completely stable until then
No logs in the agent for the last 90 minutes
I think I'll try just redeploying the agent
Weird, the agent on our prod deployment also stopped reporting about an hour ago. I did some CloudFormation stuff around then, so I'm starting to suspect I messed something up there

daniel

03/07/2024, 5:35 PM
Same agent serving both prod and branch deployments? Or two different agents?

Zach

03/07/2024, 6:17 PM
Two different agents

daniel

03/07/2024, 6:17 PM
Two going down at once is certainly unusual

Zach

03/07/2024, 6:19 PM
Yeah, it must be some dependency that got removed when I deleted a CloudFormation stack. Just weird because all our deployments have their own stacks, but maybe some resources got changed outside of IaC... Still learning some discipline there
It's interesting that the agents aren't producing any logs though