
I'm seeing some k8s pods fail today (around 8 AM and 2 PM EST) with a `Temporary failure in name resolution` error when calling the Dagster GraphQL API. Full stack trace and pod log attached. Could this be caused by anything on the Dagster side, or is it a network issue in my k8s cluster? I'm not aware of any changes to my deployment today.

Hey Mark - I'll ask around on the team and check our logs, but I haven't seen any other reports of this particular error. My suspicion would be a network error in your k8s cluster, but I'll see if I can find anything to confirm or rule that out.

<@U02P92K2FRQ> 👆 I saw a DNS error vaguely similar to yours last week. We never found the cause, but it hasn't come back. My best guess is that the pod running DNS in our EKS cluster got overloaded, or was making too many upstream network requests.
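One way to test the "overloaded cluster DNS" theory is to run a small probe from inside a pod in the same namespace and count how often lookups fail transiently. A minimal Python sketch, assuming a glibc-based image, where `Temporary failure in name resolution` is the message for the `EAI_AGAIN` resolver error; the function name `probe_dns` and its parameters are hypothetical:

```python
import socket
import time
from collections import Counter


def probe_dns(host, attempts=100, interval=0.1, resolver=socket.getaddrinfo):
    """Resolve `host` repeatedly and tally the outcomes.

    "ok"        -> lookup succeeded
    "transient" -> EAI_AGAIN ("Temporary failure in name resolution" on glibc)
    "other"     -> any other resolver error (e.g. EAI_NONAME, unknown host)
    """
    tally = Counter()
    for i in range(attempts):
        try:
            resolver(host, None)
            tally["ok"] += 1
        except socket.gaierror as exc:
            # exc.errno carries the EAI_* code returned by getaddrinfo
            if exc.errno == socket.EAI_AGAIN:
                tally["transient"] += 1
            else:
                tally["other"] += 1
        if i < attempts - 1:
            time.sleep(interval)  # spread lookups out over time
    return tally
```

For example, `probe_dns("kubernetes.default.svc.cluster.local", attempts=50)` run in a pod exercises in-cluster DNS; a non-zero `transient` count for a name that normally resolves points at the cluster resolver rather than at Dagster.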

<@U02HUT2T17Z> what Dagster version and package versions are you using? We're experiencing the same issue running our jobs in EKS on Dagster 1.3.5 with 0.19.5 packages. Some jobs fail when connecting to RDS, which requires DNS resolution. We've noticed it in two situations:
• our op that connects to RDS
• background dagster-postgres processes (not related to our code)
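If the failures are short blips, retrying with backoff can keep an op like the RDS one alive while the cluster DNS issue is investigated. A hedged Python sketch: `call_with_retry` and `is_transient_dns_error` are hypothetical helpers, and matching on the `Temporary failure in name resolution` text is a heuristic, assuming wrapping libraries (such as database drivers) include the resolver message in their own exception text:

```python
import socket
import time

# glibc's message for EAI_AGAIN; matching on it for wrapped errors is a heuristic.
DNS_TRANSIENT_MSG = "Temporary failure in name resolution"


def is_transient_dns_error(exc):
    """Return True if `exc` looks like a transient DNS lookup failure."""
    if isinstance(exc, socket.gaierror):
        return exc.errno == socket.EAI_AGAIN
    return DNS_TRANSIENT_MSG in str(exc)


def call_with_retry(fn, attempts=4, base_delay=0.5,
                    is_transient=is_transient_dns_error, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff on transient DNS errors.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    Non-transient errors are re-raised immediately; the last transient
    error is re-raised once all attempts are used.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            if not is_transient(exc) or attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

For example, `conn = call_with_retry(lambda: psycopg2.connect(dsn))` inside the op, with `dsn` standing in for your connection string. This only covers your own op code; the background dagster-postgres connections would still see the raw failure.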

We haven't seen this come up again. We're now on 1.3.6; I think we were on 1.1.20 when it happened.