# ask-community
Dagster CI/CD question. We are looking for a way to kick off a dagster job from CI and observe if a job completes or fails as a test. Specific scenario: CI job deploys a repo from PR branch to k8s, starts job X from PR repo, waits for job X to complete, pass the build if job completed successfully, fail otherwise. Thanks!
Hi Serj - would you want to have a separate dagster k8s cluster for these CI repos? I could imagine not wanting to pollute the production run list, etc. with these more temporary CI runs
we have a dedicated dev cluster, and the plan is to deploy PR repos to it (with some naming tweaks to avoid name collisions between pipelines) and then wipe them automatically after some time
Got it - so I think one missing piece that would make this a lot easier to set up is this feature request for letting k8s dynamically spin up new repositories via service discovery: https://github.com/dagster-io/dagster/issues/6295 Solving that will let us address the fact that, with the current k8s setup, the workspace.yaml / set of repositories in the cluster is static and solely determined by the deployments listed in the Helm chart. Until that's available, I think it would be tricky for a setup like this to support multiple PRs being tested in parallel within the same cluster. Once it is available, this could work really nicely - the CI would spin up a new repo, wait for it to be ready, launch a run via the GraphQL API, and then fetch the results from the GraphQL API too
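The launch-and-wait part of that flow can be sketched with Dagster's Python GraphQL client (the `dagster-graphql` package exposes `DagsterGraphQLClient` with `submit_job_execution` and `get_run_status`). The hostname, job, location, and repository names below are placeholders for this hypothetical CI setup, and the client is injected as a parameter so the polling logic stays testable:

```python
# Sketch of the CI check: launch a run over GraphQL, poll until it
# reaches a terminal status, and report pass/fail. `client` would be
# something like DagsterGraphQLClient("dagit.dev-cluster.internal")
# (hostname is a placeholder).
import time

TERMINAL_STATUSES = {"SUCCESS", "FAILURE", "CANCELED"}

def wait_for_run(get_status, timeout_s=1800, poll_s=10):
    """Poll get_status() (a zero-arg callable returning a status string)
    until a terminal status is returned or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_s)
    raise TimeoutError("run did not reach a terminal status in time")

def run_ci_check(client, job_name, location_name, repo_name, poll_s=10):
    """Launch job_name and return True iff the run succeeded."""
    run_id = client.submit_job_execution(
        job_name,
        repository_location_name=location_name,
        repository_name=repo_name,
    )
    # get_run_status returns a DagsterRunStatus enum; .value is the
    # status string ("SUCCESS", "FAILURE", ...).
    final = wait_for_run(
        lambda: client.get_run_status(run_id).value, poll_s=poll_s
    )
    return final == "SUCCESS"
```

In CI the boolean would become the build's exit code, e.g. `sys.exit(0 if ok else 1)`.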
I didn't think about this part hard enough, and my intuition was to just update helm chart with every PR automatically and run deployments. That would also take care of cleanup. You know, when you think there is only one way to do something, you tend to look past whether it's a good way of doing it at all = )
since we have chart config per env, this would have no impact on prod
Updating the helm chart would work too (that's what we do for our own CI actually) - but we have to set a concurrency rule where only one test can happen at once because of the above limitations
Looking forward to being able to drop that restriction
The set of steps in your post above is more or less exactly what we do - run a helm upgrade, then use the Dagster GraphQL client to launch a run against dagit and check that it finished. But that's a test we run on master rather than on each PR
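One step in between - confirming dagit actually picked up the new code after the helm upgrade - can be sketched with the client's `reload_repository_location` call, which reports whether the location loaded. Assumptions here: the retry bounds are arbitrary, and the status comparison mirrors the client's `ReloadRepositoryLocationStatus` values:

```python
# After a helm upgrade, retry reloading the repository location until
# dagit reports it loaded successfully (the new user-code server may
# take a while to come up). `client` would be a DagsterGraphQLClient.
import time

def wait_until_location_ready(client, location_name, attempts=30, delay_s=5):
    """Return True once the location reloads cleanly, False if it never does."""
    for _ in range(attempts):
        info = client.reload_repository_location(location_name)
        if info.status.value == "SUCCESS":
            return True
        time.sleep(delay_s)
    return False
```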
can you elaborate on concurrency issue? I thought the only problem deploying the same repo multiple times would be a pipeline name collision, what am I not thinking about?
to quote Tolstoy: "All good CI systems are the same, all bad ones are bad in their unique way"
Sure. The way that we have it set up is that our helm chart looks like:

```yaml
dagster-user-deployments:
  enabled: true
  deployments:
    - name: "example-repo"
      image:
        repository: "our-internal-registry/user-code-example"
        tag: latest
        pullPolicy: Always
      dagsterApiGrpcArgs:
        - "--python-file"
        - "/example_project/example_repo/repo.py"
      port: 3030
```
and then our CI system builds a new user-code-example image and upgrades the helm chart, templating in a new tag. That makes the dev cluster redeploy the "example-repo" pod (each deployment in the Dagster helm chart runs as its own server, serving metadata about its jobs/sensors/etc.). If two tests did this at the same time, the deployments would step on top of each other
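The "templating in a new tag" step could look like the following sketch - the release name, chart reference, and the assumption that the target deployment is at index 0 are all placeholders for this setup:

```python
# Hypothetical helper: build the `helm upgrade` argv that retags the
# user-code deployment. Helm's --set flag supports list indexing, so
# deployments[0].image.tag overrides the tag from values.yaml.
def helm_upgrade_cmd(tag, release="dagster", chart="dagster/dagster",
                     values="values.yaml"):
    return [
        "helm", "upgrade", release, chart,
        "-f", values,
        "--set", f"dagster-user-deployments.deployments[0].image.tag={tag}",
    ]

# In CI (assuming `git_sha` holds the freshly built image tag):
#   subprocess.run(helm_upgrade_cmd(git_sha), check=True)
```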
the underlying problem is that the set of code that's running in the cluster is uniquely determined by a centralized "dagster-user-deployments" list of deployments - you can't have one test adding one entry and another test adding a different entry at the same time
ah, I see, there is a race to update
I was thinking of actually adding and removing entries with per test-instance names
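For illustration, a per-test-instance entry could be generated by templating the deployment block from the chart snippet above with a unique name - everything here (the helper name, registry, paths) is hypothetical:

```python
# Hypothetical helper: render one dagster-user-deployments entry with a
# per-PR name and image tag, to be appended under `deployments:` in the
# dev cluster's values.yaml. All names and paths are placeholders.
def render_deployment_entry(pr_number, image_tag):
    name = f"example-repo-pr-{pr_number}"
    return f"""\
    - name: "{name}"
      image:
        repository: "our-internal-registry/user-code-example"
        tag: "{image_tag}"
        pullPolicy: Always
      dagsterApiGrpcArgs:
        - "--python-file"
        - "/example_project/example_repo/repo.py"
      port: 3030
"""
```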
That could work, but I think you still want to make sure that two things aren't running helm upgrade at the same time
not positive how well helm handles concurrency
so we have a setup where the upgrade runs only from the job that watches the helm config repo
GitOps style, all builds only update the config map by making git commits, and then a trigger is watching the config
Ah nice, that could work!
there would be a ton of redeployments, and still a race between commits made from the same base
so that is not a perfect solution for this problem, the jobs that update values.yaml would still need some kind of semaphore
but even if there is a race, all I would need to do is rerun the failed job, because the values.yaml would remain consistent simply thanks to git
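One minimal sketch of such a semaphore, assuming the competing jobs run on the same host: a POSIX advisory file lock held around the values.yaml update. Across hosts you would need a CI-level concurrency group or a proper distributed lock instead; the lock path is a placeholder:

```python
# Host-local semaphore using POSIX advisory file locks: whoever holds
# the lock may edit values.yaml / run the upgrade. flock locks are per
# open file description, so concurrent jobs on the same machine are
# serialized; this does NOT coordinate across machines.
import contextlib
import fcntl

@contextlib.contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive flock on lock_path for the duration of the block."""
    f = open(lock_path, "w")
    try:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until the lock is free
        yield
    finally:
        fcntl.flock(f, fcntl.LOCK_UN)
        f.close()

# Usage (path is a placeholder):
#   with exclusive_lock("/tmp/helm-values.lock"):
#       ...update values.yaml and commit...
```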
yeah I think the ideal end state here is to have this source of truth for 'what code to spin up' in a dagster database instead of trapped in a helm values.yaml, it will give us a lot more flexibility to support use cases like this
for sure
just as I was saying, when you think there is only one way of doing something.... = )
@Charles Leung, PTAL ^^