Daniel Chalef

02/24/2023, 4:35 PM
Hi, I have Dagster running in a GCP Cloud Run container and am attempting to schedule jobs on a GKE cluster, but I'm getting a "No agent available" error from the cluster. The IAM service account that the Dagster container executes as is bound to a Kubernetes service account that has CREATE, GET, LIST, DELETE, etc. privileges for the pods, jobs, pods/log, and pods/exec resources. I can schedule jobs and the pods start running, but the Dagster op fails with the following error. Any advice?
kubernetes.client.exceptions.ApiException: (500)
Reason: Internal Server Error
HTTP response headers: HTTPHeaderDict({'Audit-Id': '623296dd-b3ef-4b5f-b283-06e0fe4bc4c7', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Fri, 24 Feb 2023 16:12:13 GMT', 'Content-Length': '228'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Get \\"<>\\": No agent available","code":500}\n'
This looks more like a symptom of the pod failing, though I think Dagster should deal with it more gracefully.
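
As a rough illustration of what "more gracefully" could mean, here is a minimal Python sketch that classifies this failure from the status code and response body shown in the trace, so calling code could retry or skip the request instead of failing the op. The helper names (status_message, is_no_agent_error) are hypothetical and not part of Dagster or the Kubernetes client; the sample body is copied from the error above.

```python
import json
from typing import Optional, Union


def status_message(body: Union[bytes, str, None]) -> Optional[str]:
    """Return the `message` field of a Kubernetes Status body, or None.

    Hypothetical helper: returns None when the body is empty, is not
    valid JSON, or is not a Status object.
    """
    if not body:
        return None
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        return None
    if isinstance(payload, dict) and payload.get("kind") == "Status":
        return payload.get("message")
    return None


def is_no_agent_error(status: int, body: Union[bytes, str, None]) -> bool:
    """True if the response looks like the 500 'No agent available' failure."""
    message = status_message(body)
    return status == 500 and message is not None and "No agent available" in message


# Response body copied from the trace above (the URL is elided there as "<>").
SAMPLE_BODY = (
    b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure",'
    b'"message":"Get \\"<>\\": No agent available","code":500}\n'
)

if __name__ == "__main__":
    print(is_no_agent_error(500, SAMPLE_BODY))
```

With the Kubernetes Python client, the inputs would presumably come from a caught kubernetes.client.exceptions.ApiException, passing its status and body attributes. What to do on a match (retry with backoff, skip the call, or surface a clearer error) is left to the caller.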