# deployment-ecs

Tiri Georgiou

05/21/2021, 1:06 PM
Hey, curious if anyone in this channel could point me in the right direction. I've tried rewriting https://github.com/dagster-io/dagster/blob/0.11.10/examples/deploy_docker/docker-compose.yml as an ECS task using Terraform. I've set up my `dagster.yaml` to use the credentials for a separate RDS Postgres database (also provisioned with Terraform). When I provision the cluster and its resources, the task seems to be stuck in `pending` and never actually runs. I can share my ECS task definition if it helps, but could anyone point me towards some resources on deploying to ECS? (Happy to share more info on this.)

jordan

05/21/2021, 1:32 PM
Are there any CloudWatch logs you can share?
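If the containers aren't sending logs to CloudWatch yet, each container definition needs a `logConfiguration` using the `awslogs` driver. A minimal sketch, assuming a hypothetical log group `/ecs/dagster`, a hypothetical `var.aws_region`, and that the execution role (or container instance role) allows `logs:CreateLogStream` and `logs:PutLogEvents`, which the AWS-managed `AmazonECSTaskExecutionRolePolicy` grants. A task stuck in `pending` never starts its containers, so placement problems tend to show up in the ECS service's events rather than in these logs.

```hcl
# Hypothetical log group; the name and retention are placeholders.
resource "aws_cloudwatch_log_group" "dagster" {
  name              = "/ecs/dagster"
  retention_in_days = 7
}

locals {
  # One awslogs configuration per container, differing only in stream prefix.
  log_configurations = {
    for c in ["dagit", "daemon", "uptime"] : c => {
      logDriver = "awslogs"
      options = {
        "awslogs-group"         = aws_cloudwatch_log_group.dagster.name
        "awslogs-region"        = var.aws_region # hypothetical variable
        "awslogs-stream-prefix" = c
      }
    }
  }
}
```

Each object in `container_definitions` would then get, for example, `logConfiguration = local.log_configurations["dagit"]`.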

Tiri Georgiou

05/21/2021, 1:40 PM
It comes up with an alarm for `CapacityProviderReservation`. It's really just a test right now, and it shouldn't consume much memory/CPU, since it's literally a hello-world example.
This is my `aws_ecs_task_definition`:
```hcl
resource "aws_ecs_task_definition" "dagster_task" {
  family             = var.ecs_dagster_cluster
  execution_role_arn = aws_iam_role.ecs_dagster.arn
  task_role_arn      = aws_iam_role.ecs_dagster.arn // task to have permissions
  network_mode       = "bridge"
  container_definitions = jsonencode([
    {
      name      = "dagit"
      image     = "${data.aws_ecr_repository.daemon_dagit.repository_url}:latest"
      cpu       = 250
      memory    = 256
      essential = false
      hostname  = "docker-dagit"
      portMappings = [
        {
          protocol      = "tcp"
          containerPort = 3000
          hostPort      = 3000
        }
      ]
      environment = [
        {
          name  = "DAGSTER_HOSTNAME"
          value = aws_db_instance.pg.address
        },
        {
          name  = "DAGSTER_POSTGRES_USER"
          value = "pod"
        },
        {
          name  = "DAGSTER_POSTGRES_PASSWORD"
          value = local.secret
        },
        {
          name  = "DAGSTER_POSTGRES_DB"
          value = "podpoint"
        }
      ]
      // Exec form: wrapped in ["sh", "-c", ...], only "dagit" would run and the flags would be dropped
      entryPoint = ["dagit", "-h", "0.0.0.0", "-p", "3000", "-w", "workspace.yaml"]
      // ECS container definitions use camelCase keys (mountPoints, not mount_points)
      mountPoints = [
        {
          containerPath = "/var/run/docker.sock"
          sourceVolume  = "docker_sock"
          readOnly      = true
        }
      ]
    },
    {
      name      = "daemon"
      image     = "${data.aws_ecr_repository.daemon_dagit.repository_url}:latest"
      cpu       = 250
      memory    = 256
      essential = false
      hostname  = "docker-daemon"
      environment = [
        {
          name  = "DAGSTER_HOSTNAME"
          value = aws_db_instance.pg.address
        },
        {
          name  = "DAGSTER_POSTGRES_USER"
          value = "pod"
        },
        {
          name  = "DAGSTER_POSTGRES_PASSWORD"
          value = local.secret // secret value defined in postgres.tf
        },
        {
          name  = "DAGSTER_POSTGRES_DB"
          value = "podpoint"
        }
      ]
      // Exec form: with ["sh", "-c", "dagster-daemon", "run"], "run" becomes $0 instead of an argument
      entryPoint = ["dagster-daemon", "run"]
      mountPoints = [
        {
          containerPath = "/var/run/docker.sock"
          sourceVolume  = "docker_sock"
          readOnly      = true
        }
      ]
    },
    {
      name      = "uptime"
      image     = "${data.aws_ecr_repository.uptime.repository_url}:latest"
      cpu       = 250
      memory    = 256
      essential = true
      hostname  = "docker-uptime" // Same name in workspace.yml
      environment = [
        {
          name  = "DAGSTER_HOSTNAME"
          value = aws_db_instance.pg.address
        },
        {
          name  = "DAGSTER_POSTGRES_USER"
          value = "pod"
        },
        {
          name  = "DAGSTER_POSTGRES_PASSWORD"
          value = local.secret
        },
        {
          name  = "DAGSTER_POSTGRES_DB"
          value = "podpoint"
        },
        {
          name  = "DAGSTER_CURRENT_IMAGE"
          value = "${data.aws_ecr_repository.uptime.repository_url}:latest"
        }
      ]
      entryPoint = ["dagster", "api", "grpc", "-h", "0.0.0.0", "-p", "4000", "-f", "uptime_dags/repository.py"]
    }
  ])
}
```
I have it in an autoscaling group with a launch config using a `t2.small`.
I suppose the `entryPoint` in the `uptime` container isn't needed, because the image already has a `CMD` that runs the same command. I just wrote it there as an experiment.
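The `dagit` and `daemon` containers mount a volume named `docker_sock`, and ECS expects every `sourceVolume` in a container's mount points to match a `volume` declared on the task definition itself. The paste may simply be truncated, but if that declaration is missing, a minimal sketch of it looks like this, assuming the EC2 launch type (host bind mounts like this are not available on Fargate):

```hcl
# Goes inside resource "aws_ecs_task_definition" "dagster_task" { ... },
# next to container_definitions.
volume {
  # Must match the sourceVolume used by the dagit and daemon containers.
  name = "docker_sock"

  # Bind-mounts the EC2 host's Docker socket into those containers.
  host_path = "/var/run/docker.sock"
}
```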

jordan

05/21/2021, 1:53 PM
Out of curiosity, what happens if you give it a little more memory? I don't know offhand what the Dagster daemon's typical memory footprint looks like.
I'm just trying to reproduce this using your Terraform. Are those images the same ones that we publish to Docker Hub/ECR, or are they your own?
Hm. I was able to get both running using your task definitions as is (obviously subbing in our own IAM roles and whatnot). Are you sure your cluster is provisioned correctly?

Tiri Georgiou

05/21/2021, 3:10 PM
For dagit and the daemon, yes, they're the same.
Did you use the same memory and CPU?
The uptime container is pretty much a standard hello world (the one generated when you set up a Dagster project).
I'm starting to think the cluster provisioning might be the issue. Could you point me to the setup you used as a starting point?
The tf files don't contain any sensitive data, so I can share them with you.

jordan

05/21/2021, 3:21 PM
I just clicked through the ECS creation wizard in the AWS UI and used its defaults, but I can take a look at the tf files you just shared.
And yeah, I used the same memory and CPU as you.

Tiri Georgiou

05/21/2021, 3:22 PM
Okay, I didn't think it was down to memory/CPU.

jordan

05/21/2021, 3:22 PM
Running it locally, it looks like it takes about 70 MB of memory, so there should be plenty of headroom.

Tiri Georgiou

05/21/2021, 3:23 PM
👌
Is there a specific role that needs to be set? I used:
```hcl
data "aws_iam_policy_document" "assume_role_policy_dagster" {
  statement {
    sid     = "3"
    actions = ["sts:AssumeRole"]

    principals {
      type = "Service"
      // ecs-tasks.amazonaws.com gives cluster permission to run tasks
      identifiers = ["ec2.amazonaws.com", "ecs-tasks.amazonaws.com"]
    }
  }
}
```
^^ with ecs-tasks

jordan

05/21/2021, 3:49 PM
For `resource.aws_ecs_task_definition.dagster_task.execution_role_arn`, I used a role that has the AWS-managed `AmazonECSTaskExecutionRolePolicy` attached to it. I see you're using a different managed policy; perhaps that's related?
Actually, disregard that: the permissions in the one I use look like a subset of the permissions in the one you used.
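For reference, a minimal sketch of an execution role with the AWS-managed `AmazonECSTaskExecutionRolePolicy` attached. The resource names here are hypothetical, and keeping the execution role separate from the task role and the EC2 instance role is a design choice for illustration, not something taken from either setup in this thread.

```hcl
# Trust policy: only ECS tasks may assume this role.
data "aws_iam_policy_document" "ecs_tasks_assume" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

# Hypothetical execution role: lets ECS pull images from ECR and
# write CloudWatch logs on the task's behalf.
resource "aws_iam_role" "dagster_execution" {
  name               = "dagster-ecs-execution"
  assume_role_policy = data.aws_iam_policy_document.ecs_tasks_assume.json
}

resource "aws_iam_role_policy_attachment" "dagster_execution" {
  role       = aws_iam_role.dagster_execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
```

The task definition would then reference it with `execution_role_arn = aws_iam_role.dagster_execution.arn`.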

Tiri Georgiou

05/21/2021, 3:53 PM
Yeah, I think that's covered by the policy above.
ohh
```hcl
// ---- ECS SERVICES ----

resource "aws_ecs_service" "dagster" {
  name            = var.ecs_dagster_cluster
  cluster         = aws_ecs_cluster.dagster.id
  task_definition = aws_ecs_task_definition.dagster_task.arn
  iam_role        = aws_iam_role.ecs_dagster.arn // <<--- HERE? DIDN'T ADD THIS?
  desired_count   = 1
  depends_on = [
    aws_db_instance.pg
  ]
  capacity_provider_strategy {
    capacity_provider = aws_ecs_capacity_provider.dagster_cp.name
    weight            = 100
  }
}
```
^^ maybe this?
I didn't think I needed to attach the IAM role to the service itself, since the task definition already has a role attached.

jordan

05/21/2021, 4:01 PM
Hm, not sure. Worth a try. Is there anything special about the AMI you included in the files you gave me, or can I just sub in whatever the latest Amazon Linux AMI is?

Tiri Georgiou

05/21/2021, 4:02 PM
Nothing fancy. There are specific AMIs for container applications, but for a simple application it doesn't make much of a difference.
Ah, this (`iam_role` on the service) is only for load balancers.
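On the AMI question: EC2 instances only join an ECS cluster if they run the ECS container agent and know which cluster to register with. The ECS-optimized Amazon Linux 2 AMI ships with the agent (a plain Amazon Linux AMI does not), and AWS publishes its ID as a public SSM parameter. A minimal launch configuration sketch, assuming hypothetical resource names and an instance profile whose role has the AWS-managed `AmazonEC2ContainerServiceforEC2Role` policy attached. AWS now steers new setups towards launch templates; the same `image_id` and `user_data` ideas carry over there.

```hcl
# Latest ECS-optimized Amazon Linux 2 AMI ID, published by AWS in SSM.
data "aws_ssm_parameter" "ecs_ami" {
  name = "/aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id"
}

resource "aws_launch_configuration" "dagster" {
  name_prefix          = "dagster-ecs-"
  image_id             = data.aws_ssm_parameter.ecs_ami.value
  instance_type        = "t2.small"
  iam_instance_profile = aws_iam_instance_profile.ecs_instance.name # hypothetical

  # Tell the ECS agent which cluster to join; without this it
  # registers with the cluster named "default".
  user_data = <<-EOF
    #!/bin/bash
    echo ECS_CLUSTER=${aws_ecs_cluster.dagster.name} >> /etc/ecs/ecs.config
  EOF

  lifecycle {
    create_before_destroy = true
  }
}
```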

jordan

05/21/2021, 7:13 PM
https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/ecs_service#capacity_provider_strategy Have you tried adding your capacity provider to your service? When I spun up your stack with the Terraform files you sent, I was seeing that it couldn't place any tasks.
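A sketch of wiring a capacity provider into the cluster, assuming a hypothetical autoscaling group `aws_autoscaling_group.dagster` and an AWS provider version recent enough to have `aws_ecs_cluster_capacity_providers`. A service's `capacity_provider_strategy` can only use a provider that is associated with the cluster. The `CapacityProviderReservation` metric mentioned earlier is what ECS managed scaling tracks when it resizes the autoscaling group.

```hcl
resource "aws_ecs_capacity_provider" "dagster_cp" {
  name = "dagster-cp"

  auto_scaling_group_provider {
    auto_scaling_group_arn         = aws_autoscaling_group.dagster.arn # hypothetical ASG
    managed_termination_protection = "DISABLED"

    # Let ECS scale the ASG based on the CapacityProviderReservation metric.
    managed_scaling {
      status          = "ENABLED"
      target_capacity = 100
    }
  }
}

# Associate the provider with the cluster and make it the default strategy,
# so services without an explicit strategy are placed on it too.
resource "aws_ecs_cluster_capacity_providers" "dagster" {
  cluster_name       = aws_ecs_cluster.dagster.name
  capacity_providers = [aws_ecs_capacity_provider.dagster_cp.name]

  default_capacity_provider_strategy {
    capacity_provider = aws_ecs_capacity_provider.dagster_cp.name
    weight            = 100
    base              = 1
  }
}
```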

Tiri Georgiou

05/24/2021, 8:48 AM
@jordan Yes, I managed to get it working, and the tasks finally run 👍
The only issue is that I have an error in my gRPC server. In the task for the server, I did change the hostname to match the one in `workspace.yaml`.
My guess is that we somehow need to pass in a variable that's generated when the container is created (the endpoint?).
^^ Is there a way to pass an `env` variable in `workspace.yaml` like you can in `dagster.yaml`, i.e.:
```yaml
load_from:
  - grpc_server:
      host:
        env: HOSTNAME
      port: 4000
      location_name: "server_uptime"
```

daniel

05/24/2021, 2:28 PM
@Tiri Georgiou what you wrote there should work exactly as written I think!
(dagster.yaml and workspace.yaml use the same config parsing functionality)
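On the ECS side, a sketch of how the gRPC server's address could reach dagit, assuming the `bridge` network mode used in the task definition above. In `bridge` mode the `hostname` field typically only sets a container's own hostname; other containers in the task resolve it through `links`, which take the form `name:internalName`. The variable name `DAGSTER_GRPC_HOST` is hypothetical and would be read in `workspace.yaml` via `host: env: DAGSTER_GRPC_HOST`. A dedicated name also sidesteps `HOSTNAME`, which Docker usually sets inside each container to that container's own hostname.

```hcl
# Fragment of the "dagit" object in container_definitions (bridge mode);
# the daemon container would get the same two additions.
{
  name = "dagit"
  # ... image, cpu, memory, portMappings, mountPoints as before ...

  # Adds an /etc/hosts entry so "docker-uptime" resolves to the
  # "uptime" container in the same task.
  links = ["uptime:docker-uptime"]

  environment = [
    # ... existing DAGSTER_POSTGRES_* variables ...
    {
      # Hypothetical variable consumed by workspace.yaml through `env:`.
      name  = "DAGSTER_GRPC_HOST"
      value = "docker-uptime"
    }
  ]
}
```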