#deployment-kubernetes

Alex Remedios

07/07/2022, 11:20 AM
Hi folks, I’m running 128 parallel tasks on Dagster 0.15.0 with the k8s_executor and am seeing sporadic groups of failing ops, with this k8s event on the failing pods:
Error: failed to reserve container name "dagster_dagster-step-7ac915e19de074261268d861f51d1504-lh5gh_x-dagster_931e866f-92f0-4af8-b087-875b78dd1128_0": name "dagster_dagster-step-7ac915e19de074261268d861f51d1504-lh5gh_x-dagster_931e866f-92f0-4af8-b087-875b78dd1128_0" is reserved for "f4795a5d5e4e9c42a46bf59b3a98d1401fc871a03226a71479c8a65c4c15a21c"
Seems like this could be retry-related, but would be keen to find anyone else who has seen this.
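
When comparing events like this across failing pods, it can help to pull the pieces out of the message. The sketch below is illustrative only: the `parse_reserve_error` helper is hypothetical, and it assumes the reserved name follows the `<container>_<pod>_<namespace>_<pod UID>_<attempt>` layout that the name in the event above appears to use.

```python
import re

# The event text quoted above, verbatim.
EVENT = (
    'Error: failed to reserve container name '
    '"dagster_dagster-step-7ac915e19de074261268d861f51d1504-lh5gh_x-dagster_931e866f-92f0-4af8-b087-875b78dd1128_0": '
    'name "dagster_dagster-step-7ac915e19de074261268d861f51d1504-lh5gh_x-dagster_931e866f-92f0-4af8-b087-875b78dd1128_0" '
    'is reserved for "f4795a5d5e4e9c42a46bf59b3a98d1401fc871a03226a71479c8a65c4c15a21c"'
)

PATTERN = re.compile(
    r'failed to reserve container name "(?P<name>[^"]+)": '
    r'name "[^"]+" is reserved for "(?P<holder>[0-9a-f]+)"'
)


def parse_reserve_error(message):
    """Split a 'failed to reserve container name' event into its parts.

    Hypothetical helper. Assumes the reserved name is five fields joined by
    underscores: container, pod, namespace, pod UID, attempt.
    Returns None if the message does not match that shape.
    """
    match = PATTERN.search(message)
    if match is None:
        return None
    parts = match.group("name").split("_")
    if len(parts) != 5:
        return None
    container, pod, namespace, pod_uid, attempt = parts
    return {
        "container": container,
        "pod": pod,
        "namespace": namespace,
        "pod_uid": pod_uid,
        "attempt": int(attempt),
        "held_by": match.group("holder"),
    }


info = parse_reserve_error(EVENT)
print(info)
```

Grouping the parsed events by `pod` and `held_by` would show whether each failing pod is blocked by a single earlier reservation or by several.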

johann

07/07/2022, 12:13 PM
Hmm I haven’t seen this, are you on GKE?

Alex Remedios

07/07/2022, 12:14 PM
Thanks, I had a glance at this, but I’m using EKS. I’ve generally found that running over 100 ops as K8s jobs via Dagster produces a long tail of errors like this. Most likely I’ll try to reframe our use case.

johann

07/07/2022, 12:29 PM
Hmm, sorry you’re running into stuff like this.
Seems like the common thread on this error at least is clusters being under heavy load https://github.com/elastic/cloud-on-k8s/issues/2632
I’m a bit curious what container f4795a5d5e4e9c42a46bf59b3a98d1401fc871a03226a71479c8a65c4c15a21c is in your example. Are 2 containers trying to be created for the same pod?

Alex Remedios

07/07/2022, 1:06 PM
f4795a5d5e4e9c42a46bf59b3a98d1401fc871a03226a71479c8a65c4c15a21c
I believe that’s the Dagster step container. It’s a standard setup that works fine 99% of the time, so it’s evidently some race condition triggered by transient errors during a scale-up.

Roei Jacobovich

07/16/2022, 6:31 AM
@Alex Remedios we’re using EKS as well, and it happens to us mostly during peak load. Did you solve it somehow? Thanks.

Alex Remedios

07/17/2022, 10:56 AM
Hi Roei, I’ve resolved to use the Dagster multiprocess executor for orchestration and then submit tasks to a Ray cluster, which may be better suited for high-dimensional homogeneous compute.