# ask-community
Dagster CI/CD question. We are looking for a way to kick off a dagster job from CI and observe if a job completes or fails as a test. Specific scenario: CI job deploys a repo from PR branch to k8s, starts job X from PR repo, waits for job X to complete, pass the build if job completed successfully, fail otherwise. Thanks!
Hi Serj - would you want to have a separate dagster k8s cluster for these CI repos? I could imagine not wanting to pollute the production run list, etc. with these more temporary CI runs
we have a dedicated dev cluster, and the plan is to deploy PR repos to it (with some naming tweaks to avoid name collisions between pipelines) and then wipe them automatically after some time
Got it - so I think one missing piece that would make this a lot easier to set up is this feature request for letting k8s dynamically spin up new repositories via service discovery: https://github.com/dagster-io/dagster/issues/6295 Solving that will let us address the fact that, with the current k8s setup, the workspace.yaml / set of repositories in the cluster is static and solely determined by the deployments listed in the Helm chart. Until that's available, I think it would be tricky for a setup like this to support multiple PRs being tested in parallel within the same cluster. Once it is available, this could work really nicely - the CI would spin up a new repo, wait for it to be ready, launch a run via the GraphQL API, and then fetch the results from the GraphQL API too
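The launch-and-wait part of that flow can be sketched with Dagster's Python GraphQL client (the `dagster-graphql` package exposes `DagsterGraphQLClient` with `submit_job_execution` and `get_run_status`). The hostname, job, location, and repository names below are placeholders for this hypothetical CI setup, and the client is injected as a parameter so the polling logic stays testable:

```python
# Sketch of the CI check: launch a run over GraphQL, poll until it
# reaches a terminal status, and report pass/fail. `client` would be
# something like DagsterGraphQLClient("dagit.dev-cluster.internal")
# (hostname is a placeholder).
import time

TERMINAL_STATUSES = {"SUCCESS", "FAILURE", "CANCELED"}

def wait_for_run(get_status, timeout_s=1800, poll_s=10):
    """Poll get_status() (a zero-arg callable returning a status string)
    until a terminal status is returned or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_s)
    raise TimeoutError("run did not reach a terminal status in time")

def run_ci_check(client, job_name, location_name, repo_name, poll_s=10):
    """Launch job_name and return True iff the run succeeded."""
    run_id = client.submit_job_execution(
        job_name,
        repository_location_name=location_name,
        repository_name=repo_name,
    )
    # get_run_status returns a DagsterRunStatus enum; .value is the
    # status string ("SUCCESS", "FAILURE", ...).
    final = wait_for_run(
        lambda: client.get_run_status(run_id).value, poll_s=poll_s
    )
    return final == "SUCCESS"
```

In CI the boolean would become the build's exit code, e.g. `sys.exit(0 if ok else 1)`.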
I didn't think about this part hard enough, and my intuition was to just update helm chart with every PR automatically and run deployments. That would also take care of cleanup. You know, when you think there is only one way to do something, you tend to look past whether it's a good way of doing it at all = )
since we have chart config per env, this would have no impact on prod
Updating the helm chart would work too (that's what we do for our own CI actually) - but we have to set a concurrency rule where only one test can happen at once because of the above limitations
Looking forward to being able to drop that restriction
The set of steps in your post above is more or less exactly what we do - run a helm upgrade, then use the Dagster GraphQL client to launch a run against dagit and check that it finished. But that's a test we run on master rather than on each PR
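One step in between - confirming dagit actually picked up the new code after the helm upgrade - can be sketched with the client's `reload_repository_location` call, which reports whether the location loaded. Assumptions here: the retry bounds are arbitrary, and the status comparison mirrors the client's `ReloadRepositoryLocationStatus` values:

```python
# After a helm upgrade, retry reloading the repository location until
# dagit reports it loaded successfully (the new user-code server may
# take a while to come up). `client` would be a DagsterGraphQLClient.
import time

def wait_until_location_ready(client, location_name, attempts=30, delay_s=5):
    """Return True once the location reloads cleanly, False if it never does."""
    for _ in range(attempts):
        info = client.reload_repository_location(location_name)
        if info.status.value == "SUCCESS":
            return True
        time.sleep(delay_s)
    return False
```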
can you elaborate on concurrency issue? I thought the only problem deploying the same repo multiple times would be a pipeline name collision, what am I not thinking about?
to quote Tolstoy: "All good CI systems are the same, all bad ones are bad in their unique way"
Sure. The way that we have it set up is that our helm chart looks like:

```yaml
dagster-user-deployments:
  enabled: true
  deployments:
    - name: "example-repo"
      image:
        repository: "our-internal-registry/user-code-example"
        tag: latest
        pullPolicy: Always
      dagsterApiGrpcArgs:
        - "--python-file"
        - "/example_project/example_repo/repo.py"
      port: 3030
```
and then our CI system builds a new user-code-example image and upgrades the helm chart, templating in a new tag. That makes the dev cluster redeploy the "example-repo" pod (each deployment in the Dagster helm chart runs as its own server, serving metadata about its jobs/sensors/etc.). If two tests did this at the same time, the deployments would step on top of each other
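The "templating in a new tag" step could look like the following sketch - the release name, chart reference, and the assumption that the target deployment is at index 0 are all placeholders for this setup:

```python
# Hypothetical helper: build the `helm upgrade` argv that retags the
# user-code deployment. Helm's --set flag supports list indexing, so
# deployments[0].image.tag overrides the tag from values.yaml.
def helm_upgrade_cmd(tag, release="dagster", chart="dagster/dagster",
                     values="values.yaml"):
    return [
        "helm", "upgrade", release, chart,
        "-f", values,
        "--set", f"dagster-user-deployments.deployments[0].image.tag={tag}",
    ]

# In CI (assuming `git_sha` holds the freshly built image tag):
#   subprocess.run(helm_upgrade_cmd(git_sha), check=True)
```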
the underlying problem is that the set of code that's running in the cluster is uniquely determined by a centralized "dagster-user-deployments" list of deployments - you can't have one test adding one entry and another test adding a different entry at the same time
ah, I see, there is a race to update
I was thinking of actually adding and removing entries with per test-instance names
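For illustration, a per-test-instance entry could be generated by templating the deployment block from the chart snippet above with a unique name - everything here (the helper name, registry, paths) is hypothetical:

```python
# Hypothetical helper: render one dagster-user-deployments entry with a
# per-PR name and image tag, to be appended under `deployments:` in the
# dev cluster's values.yaml. All names and paths are placeholders.
def render_deployment_entry(pr_number, image_tag):
    name = f"example-repo-pr-{pr_number}"
    return f"""\
    - name: "{name}"
      image:
        repository: "our-internal-registry/user-code-example"
        tag: "{image_tag}"
        pullPolicy: Always
      dagsterApiGrpcArgs:
        - "--python-file"
        - "/example_project/example_repo/repo.py"
      port: 3030
"""
```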
That could work, but I think you still want to make sure that two things aren't running helm upgrade at the same time
not positive how well helm handles concurrency
so we have a setup where the upgrade runs only from the job that watches the helm config repo
GitOps style, all builds only update the config map by making git commits, and then a trigger is watching the config
Ah nice, that could work!
there would be a ton of redeployments, and still a race between commits made from the same base
so that is not a perfect solution for this problem, the jobs that update values.yaml would still need some kind of semaphore
but even if there is a race, all I would need to do is rerun the failed job, because the values.yaml would remain consistent simply thanks to git
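One minimal sketch of such a semaphore, assuming the competing jobs run on the same host: a POSIX advisory file lock held around the values.yaml update. Across hosts you would need a CI-level concurrency group or a proper distributed lock instead; the lock path is a placeholder:

```python
# Host-local semaphore using POSIX advisory file locks: whoever holds
# the lock may edit values.yaml / run the upgrade. flock locks are per
# open file description, so concurrent jobs on the same machine are
# serialized; this does NOT coordinate across machines.
import contextlib
import fcntl

@contextlib.contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive flock on lock_path for the duration of the block."""
    f = open(lock_path, "w")
    try:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until the lock is free
        yield
    finally:
        fcntl.flock(f, fcntl.LOCK_UN)
        f.close()

# Usage (path is a placeholder):
#   with exclusive_lock("/tmp/helm-values.lock"):
#       ...update values.yaml and commit...
```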
yeah I think the ideal end state here is to have this source of truth for 'what code to spin up' in a dagster database instead of trapped in a helm values.yaml, it will give us a lot more flexibility to support use cases like this
for sure
just as I was saying, when you think there is only one way of doing something.... = )
@Charles Leung, PTAL ^^