# deployment-kubernetes
Hi, everyone, good morning! Has anyone here ever deployed Dagster on AWS using EKS? If so, is it pretty straightforward, or are there some gotchas I should pay attention to? And how did you configure the CI/CD to update the code deployment instead of recreating a new instance of Dagster every time?
We're running a hybrid Dagster deployment on EKS, so we're running the agent (and the code servers it spawns) but not Dagit / event storage. We're managing everything through Pulumi, and although applying changes is not managed through CI, it could be in theory, since Pulumi detects what needs to change. In terms of Dagster, it was fairly straightforward. It was my first time using EKS, so it took a while to figure out what to use (no Fargate because of node startup time, no autoscaler yet). For CI to update code locations, we're just calling the CLI: we set the token in an env var and then call
```
dagster-cloud workspace update-code-location
```
(or whatever it actually is), which will update an existing code location.
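For reference, a CI step along these lines might look like the sketch below. This is a hypothetical GitHub Actions fragment: the job name, secret name, image variables, and the exact `dagster-cloud` subcommand and flags are all assumptions (they vary by CLI version), so check `dagster-cloud workspace --help` before using it.

```yaml
# Hypothetical CI job (GitHub Actions syntax shown; adapt to your CI system).
update-code-location:
  runs-on: ubuntu-latest
  env:
    # Agent token stored as a CI secret, exposed as the env var the CLI reads.
    DAGSTER_CLOUD_API_TOKEN: ${{ secrets.DAGSTER_CLOUD_TOKEN }}
  steps:
    - run: pip install dagster-cloud
    - run: |
        # Subcommand/flags as recalled in the message above; verify against
        # your CLI version before relying on this.
        dagster-cloud workspace update-location my_location \
          --image "$ECR_REPO:$GITHUB_SHA"
```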
❤️ 2
We’ve been up and running for over 6 months now, with no gotchas that I remember. We’re running a full deployment. The database is RDS. We used Flux to manage our Helm installs. This does mean we separate the Dagster daemon and user code Helm charts, since we upgrade them separately. (Does that count as a gotcha?)
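As a sketch of that split, two separate Flux `HelmRelease` objects could look like the following. Release names, namespace, versions, and the `HelmRepository` name are assumptions; the point is that the core chart disables its bundled user code subchart, and the user code chart is a release of its own that can be upgraded independently.

```yaml
# Hypothetical Flux v2 HelmReleases: core Dagster and user code as separate releases.
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: dagster
  namespace: dagster
spec:
  interval: 10m
  chart:
    spec:
      chart: dagster
      version: "1.x"          # pin to your chart version
      sourceRef:
        kind: HelmRepository
        name: dagster
  values:
    dagster-user-deployments:
      enabled: true
      enableSubchart: false    # user code is managed by the release below
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: dagster-user-deployments
  namespace: dagster
spec:
  interval: 10m
  chart:
    spec:
      chart: dagster-user-deployments
      version: "1.x"
      sourceRef:
        kind: HelmRepository
        name: dagster
```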
❤️ 1
👀 1
We have Dagster deployed via Terraform on AWS using EKS. The only gotchas I can recall have been alleviated thanks to issues opened and support in this channel.
❤️ 1
As far as how we update the user code deployment via CI/CD: we set a Terraform environment variable in GitLab CI after the job builds/tags/pushes the image, which is then used as the value of the
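A minimal sketch of that flow, assuming a `TF_VAR_`-prefixed variable (which Terraform picks up automatically as an input variable); stage names, `$ECR_REPO`, and the variable name are hypothetical:

```yaml
# Hypothetical .gitlab-ci.yml fragment: build/tag/push the user code image,
# then hand the tag to Terraform via a TF_VAR_ environment variable.
build-user-code:
  stage: build
  script:
    - docker build -t "$ECR_REPO:$CI_COMMIT_SHORT_SHA" .
    - docker push "$ECR_REPO:$CI_COMMIT_SHORT_SHA"

deploy:
  stage: deploy
  variables:
    # Terraform reads this as var.user_code_image_tag (name assumed).
    TF_VAR_user_code_image_tag: "$CI_COMMIT_SHORT_SHA"
  script:
    - terraform apply -auto-approve
```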
❤️ 1
@Dusty Shapiro, would you mind sharing the Terraform script template you're using? Only if it's not a lot of work to strip out all the private configuration, of course.
@Adam Bloom, thanks for the insight. I'll give the Flux documentation a read to see how it works; my plan is to do the same kind of deployment, with user code separated from the rest.
@Mark Fickett, thanks for the explanation. It could be useful depending on how I end up building my own deployment 🙂
We manage the main Dagster deployment with Terraform + Helm as well. It's pretty simple:
```hcl
resource "helm_release" "main" {
  name       = "dagster"
  chart      = "dagster"
  repository = "https://dagster-io.github.io/helm"
  namespace  = local.namespace
  version    = var.helm_version

  values = [
    templatefile("${path.module}/templates/values.yaml", {
      user_code_image   = aws_ecr_repository.user_code.repository_url
      rds_endpoint      = var.rds_endpoint
      rds_username      = local.rds_username
      rds_password      = random_password.dagster.result
      rds_database      = local.rds_database
      log_bucket_name   = var.log_bucket_name
      log_bucket_domain = var.log_bucket_domain
    })
  ]

  depends_on = [
    # (dependency list elided in the original message)
  ]
}
```
In your values.yaml, there's some special config needed to separate the user code deployments:
```yaml
# Top-level key inferred from the Dagster chart; the snippet was posted
# without its parent key.
dagster-user-deployments:
  enabled: true
  enableSubchart: false
```
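With the subchart disabled, the user code is typically installed from the separate `dagster-user-deployments` chart. A minimal values sketch for that chart, with the deployment name, image repository, and entrypoint file all assumed for illustration:

```yaml
# Hypothetical values for the standalone dagster-user-deployments chart.
deployments:
  - name: user-code                # assumed deployment name
    image:
      repository: 123456789.dkr.ecr.us-east-1.amazonaws.com/user-code  # assumed ECR repo
      tag: latest
      pullPolicy: Always
    dagsterApiGrpcArgs:
      - "--python-file"
      - "/app/repo.py"             # assumed entrypoint inside the image
    port: 3030
```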
❤️ 1
@Mark Fickett We’re starting to try out a hybrid deployment and are also thinking about using Pulumi to spin everything up. Do you have any templates for this you’d be willing to share?
I left a chunk of what I'm using for our EKS cluster as an answer on this SO question: https://stackoverflow.com/questions/75023584/how-can-you-make-kubernetes-gracefully-handle-excessive-memory-usage-instead-of . I don't think I have the bandwidth to pull out other examples, unfortunately (it takes some time to separate what has internal details from what's generic).
❤️ 1
That’s a great starting point, thanks!