# dagster-plus
n
Hi. 👋 I'm switching from Serverless to Hybrid mode using ECS, and I'm running into issues at the CloudFormation step in AWS. While creating the task, after about 1 hour, the stack fails with the following reason (when creating the agentService):
Resource handler returned message: "Error occurred during operation 'ECS Deployment Circuit Breaker was triggered'." (RequestToken: xxxx-xxx-xxx, HandlerErrorCode: GeneralServiceException)
Has anyone faced a similar issue before? Thanks.
d
Hey Nicolas - if you go into the ECS console are there any more logs from the service it tried to create that might help explain what's going on?
You might need to do the CloudFormation deploy again, but during that period try clicking through into the ECS console under "Services" and "Tasks". If a service is having trouble spinning up, there will typically be some logs explaining why the service or task failed to start up
n
I will rerun it here and keep an eye on the ECS console to see if I can get a more detailed log. Will let you know later. Thanks.
🙏 1
@daniel This is what I got there:
CannotPullContainerError: pull image manifest has been retried 5 time(s): failed to resolve ref docker.io/dagster/dagster-cloud-agent:1.1.20: failed to do request: Head "https://registry-1.docker.io/v2/dagster/dagster-cloud-agent/manifests/1.1.20": dial tcp xx.xxx.xx.xxx:yyy i/o timeout
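An i/o timeout on the registry request usually points at networking rather than at the image itself. A minimal sketch of a reachability check, assuming it is run from a host or container in the same subnets as the ECS task (the function name `can_reach` is made up for illustration):

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the hostname and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling `can_reach("registry-1.docker.io")` from inside the VPC and getting `False` would be consistent with the timeout in the error above. It only tests TCP reachability, not registry authentication or the image tag.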
d
Are you using the CloudFormation template that creates a new VPC or the one that installs it in an existing VPC?
n
The one for an existing VPC. I've now tried it with a new VPC and it worked... but I'm not sure why it didn't work with the existing one!
d
Here's a list of networking requirements when running the Dagster Cloud agent in an existing VPC and how to check for them - the problem here likely stems from the networking configuration and/or security group rules in your cluster/VPC:
- The VPC needs to use Route 53 for DNS
    - You can verify this by looking at the DHCP option set on the VPC
- The VPC needs to have assign_hostnames enabled
- The "default" security group in the VPC needs the following rules
    - An ingress rule that allows traffic from other addresses within the default security group. This allows the agent and gRPC server to communicate with each other
    - Open egress from addresses in the security group to the internet. This allows the agent to communicate with Dagster Cloud
- (If using private subnets) The network ACL should allow the same traffic as the security group: egress to the public internet and ingress from other hosts in the private subnet

How to check things:
- For the VPC DNS, go to the VPC console, find the VPC you want to use, and click on its DHCP option set
- For the security group, go to the security groups section in the VPC console, filter for your VPC, and find the one named "default"
- For the network ACLs, you'll first need to find the subnet, which you can also find from the VPC console, and then click on the tab for network ACLs
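
The two security group rules above can also be checked in code. This is a rough sketch, not an official check: it assumes the group is a dict in the shape returned by boto3's `describe_security_groups`, and the function names and the sample group ID are made up for illustration. It treats "open egress" as an all-protocol rule to 0.0.0.0/0, which is stricter than what the agent may actually need.

```python
def allows_self_ingress(group: dict) -> bool:
    """True if some ingress rule admits traffic from the security group itself."""
    return any(
        pair.get("GroupId") == group["GroupId"]
        for rule in group.get("IpPermissions", [])
        for pair in rule.get("UserIdGroupPairs", [])
    )


def allows_open_egress(group: dict) -> bool:
    """True if some egress rule allows all protocols ("-1") to 0.0.0.0/0."""
    return any(
        rule.get("IpProtocol") == "-1"
        and any(r.get("CidrIp") == "0.0.0.0/0" for r in rule.get("IpRanges", []))
        for rule in group.get("IpPermissionsEgress", [])
    )


# Hypothetical sample in the describe_security_groups shape. With boto3 installed
# and credentials configured, a real group could be fetched with something like:
#   boto3.client("ec2").describe_security_groups(Filters=[
#       {"Name": "vpc-id", "Values": [vpc_id]},
#       {"Name": "group-name", "Values": ["default"]},
#   ])["SecurityGroups"][0]
example_group = {
    "GroupId": "sg-0123456789abcdef0",
    "IpPermissions": [
        {"IpProtocol": "-1", "UserIdGroupPairs": [{"GroupId": "sg-0123456789abcdef0"}]}
    ],
    "IpPermissionsEgress": [
        {"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}
    ],
}

print("self ingress:", allows_self_ingress(example_group))
print("open egress:", allows_open_egress(example_group))
```

The DNS and network ACL items are not covered here; those still need the console checks listed above.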
💎 1
n
awesome, thank you!
@daniel I've switched from Serverless to Hybrid with a new VPC and it worked. Now I want to deploy the code location. I followed the instructions to use these GitHub workflows, created an image, pushed it to ECR, and referenced it in dagster_cloud.yaml. But when the GitHub Action runs on my PR, I'm getting the following error under the code location:
Exception: Invalid image ****.dkr.ecr.***.amazonaws.com/***:****. Only images managed by Dagster Cloud can be used in Serverless deployments.
I got confused because I switched to Hybrid, but the error talks about Serverless. Could you help me understand why, please? (Let me know if I should ask this in a different place/channel.) Thanks
d
Did you switch to the hybrid github action instead of the serverless one?
Looks like it from your link
What does your dagster_cloud.yaml say now?
You'll want it to look like the example here: https://github.com/dagster-io/dagster-cloud-hybrid-quickstart/blob/main/dagster_cloud.yaml (i.e. it should reference your ECR registry so that it can build and push to it as part of the GitHub Action)
n
This is my dagster_cloud.yaml:
locations:
  - location_name: dagster_pipelines
    code_source:
      package_name: package_name
    build:
      directory: ./
      registry: ***.dkr.ecr.***.amazonaws.com/***
d
Is it possible to share your GitHub Action YAML?
The deploy.yml or branch_deployments.yml, whichever one is firing when you hit the error
n
for sure!
branch_deployments.yml
d
Where exactly are you seeing the error about the image? Do you have a stack trace?
n
It's in the GitHub Actions run,
and then, when I go to Dagster Cloud and click on "View error", I see this message:
Exception: Invalid image ****.dkr.ecr.***.amazonaws.com/***:****. Only images managed by Dagster Cloud can be used in Serverless deployments.
d
Oh sorry, this is a branch deployment, I see the problem. One moment and I'll fix this on our side, sorry for the trouble
🙌 1
n
no worries!
d
While I fix this, the other thing that you'll need to do is set up a hybrid agent that serves your branch deployments as well as your prod deployments. There should have been an "EnableBranchDeployments" option when you ran the CloudFormation template. You'll want that to be true for this to work
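If you would rather flip that option without re-entering every other value, here is a minimal sketch that builds the `Parameters` list for a CloudFormation stack update. It assumes `current` is the `Parameters` list from boto3's `describe_stacks` output; the helper name is made up, and whether the template expects the string "true" is an assumption, so check the allowed values in your template.

```python
def parameters_with_override(current: list, key: str, value: str) -> list:
    """Build an update_stack Parameters list that changes one value and keeps the rest."""
    if not any(p["ParameterKey"] == key for p in current):
        raise KeyError(f"stack has no parameter named {key!r}")
    return [
        {"ParameterKey": key, "ParameterValue": value}
        if p["ParameterKey"] == key
        # UsePreviousValue tells CloudFormation to keep the existing value.
        else {"ParameterKey": p["ParameterKey"], "UsePreviousValue": True}
        for p in current
    ]


# Hypothetical current parameters; "SomeOtherParameter" is a placeholder name.
current_params = [
    {"ParameterKey": "EnableBranchDeployments", "ParameterValue": "false"},
    {"ParameterKey": "SomeOtherParameter", "ParameterValue": "unchanged"},
]
new_params = parameters_with_override(current_params, "EnableBranchDeployments", "true")
print(new_params)
```

The resulting list could then be passed as `Parameters=` to boto3's `update_stack` together with `UsePreviousTemplate=True` and whatever capabilities the stack was created with. Changing the parameter in the CloudFormation console does the same thing.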
👍 1
n
Ohh, got it! It's set to false right now! I'm going to change it!
Now it worked! I think the flag being false was the problem. Thank you, I appreciate it. 🙌
Quick question regarding environment variables: should I set them through the environment variables in Dagster Cloud, or by adding env_vars in the branch_deployments.yml file? With the Serverless approach I was using the latter.
d
Here's a doc that goes through the different options for setting environment variables in Cloud: https://docs.dagster.io/dagster-cloud/developing-testing/environment-variables-and-secrets#dagster-cloud-environment-variables-and-secrets
🙏 1