# deployment-ecs
a
Hello all! Has anyone encountered this issue on AWS ECS Fargate as well:
Cannotpullcontainererror: check schema1 manifest size has been retried 1 time(s): pulling from host registry-1.docker.io failed with status code [manifests 1.0]: 429 Too Many Requests
This happened when I activated a schedule that fired off 1500 jobs; some of the jobs classified as "failed to start" had the error message above as their stopped reason. The weird thing is that it happens on the ResolvConf_InitContainer, which is a sidecar constructed by docker-compose (correct me if I'm wrong), and not on the "normal" container the job runs in. Is there a way to fix this issue? Or a simple explanation of why this error occurs would already be very helpful! Thank you in advance 🙏
There was a different error message as well in some cases:
Cannotpullcontainererror: ref pull has been retried 1 time(s): failed to copy: httpReadSeeker: failed open: unexpected status code https://registry-1.docker.io/v2/docker/ecs-searchdomain-sidecar/manifests/sha256:d7fb297faf83229eb460a595d9fa316899cb6c09564927ca2be827ec153f736c: 429 Too Many Requests
tldr: the Docker Hub registry (docker.io) rate-limits anonymous pulls and won't let you hammer it. You could push a copy of this image to a private ECR repo and pull it as much as you want from there, or you could pay for a Docker Hub account and use those credentials to avoid being rate limited.
this isn't really a Dagster issue per se
as for ECS, it's too bad there's no image caching or layer-torrenting type thing, so hammering the registry 1500 times all in one go wouldn't be necessary...
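The ECR mirroring suggestion above could be sketched roughly like this. This is only an illustration, not the documented fix: the repository name is just an example, and ACCOUNT_ID and REGION are placeholders you'd substitute with your own values. The image path docker/ecs-searchdomain-sidecar comes from the error message above.

```shell
# Mirror the compose-generated sidecar image into a private ECR repo
# so Fargate pulls hit ECR instead of Docker Hub's rate limiter.
# ACCOUNT_ID and REGION are placeholders -- fill in your own values.

# Create the repository once (name here is just an example):
aws ecr create-repository --repository-name ecs-searchdomain-sidecar --region REGION

# Log Docker in to your ECR registry:
aws ecr get-login-password --region REGION | \
  docker login --username AWS --password-stdin ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com

# Pull once from Docker Hub, retag, and push to ECR:
docker pull docker/ecs-searchdomain-sidecar:latest
docker tag docker/ecs-searchdomain-sidecar:latest \
  ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com/ecs-searchdomain-sidecar:latest
docker push ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com/ecs-searchdomain-sidecar:latest
```

You'd then point the sidecar's image field in the ECS task definition at the ECR URI, so the 1500 simultaneous tasks all pull from ECR.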
m
I ran into something vaguely similar with EKS / Fargate. I had hoped that Fargate would make auto-scaling easy, but each Fargate node pulls a fresh Docker image. In our case, the image is 1.6GB, and our Dagster steps are pretty quick, so the image pull substantially increased the per-step time. (We're just using a fixed-size EKS cluster for now.) https://github.com/aws/containers-roadmap/issues/649 and https://github.com/aws/containers-roadmap/issues/696#issuecomment-996917490 are related tickets.
a
Hi Mike, thanks for the fast reply! Which image exactly does Dagster grab from Docker Hub? As far as I know, we already have the daemon, Dagit, and gRPC images stored in ECR. Only the ResolvConf_InitContainer I cannot place, as this one is created automatically when deploying the Dagster infrastructure on AWS. Correct me if I am wrong! :)
@Mark Fickett yes, I've been waiting quite some time already for the image-caching functionality on Fargate; it would really speed up the process...
This link covers the issue pretty well I think. The sidecar image uses a Docker pull: https://github.com/docker/compose-cli/issues/2190
m
Whether each run should use the same sidecars as the task that launches it. Defaults to False.
Not sure if you actually need that ecs-searchdomain-sidecar, so maybe that would be all you needed.
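For reference, the include_sidecars setting quoted above lives on the ECS run launcher in dagster.yaml. A minimal sketch, assuming the dagster_aws EcsRunLauncher is what's deployed here:

```yaml
# dagster.yaml (sketch) -- run launcher config for ECS
run_launcher:
  module: dagster_aws.ecs
  class: EcsRunLauncher
  config:
    # When False (the default), launched runs do not inherit the
    # launching task's sidecars, such as ecs-searchdomain-sidecar.
    include_sidecars: false
```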
a
No! I do not specify include_sidecars. I tried to play with that parameter, but it did not solve the issue unfortunately.