Hey dagster people I m currently running Dagster on EKS w th dagster #deployment-kubernetes

Michel Rouly

05/10/2023, 4:18 AM

Hey :dagster: people! I'm currently running Dagster on EKS w/ the K8sRunLauncher. Is it possible to configure the K8sRunLauncher to set certain specific metadata on run pods only, but not on step pods?

Michel Rouly

05/10/2023, 4:19 AM

Some context. I'd like to schedule run pods on EKS Fargate to take advantage of isolation and stability for these long-lived pods.

Michel Rouly

05/10/2023, 4:20 AM

If I set fargate=true on the Dagster user code location server, every pod it schedules (run and step) will get fargate=true because of includeConfigInLaunchedRuns

Michel Rouly

05/10/2023, 4:21 AM

if I set the label in the Dagster Helm chart's k8sRunLauncher config, it also seems to set the label on both runs and steps. I've tried both of these options below.

k8sRunLauncher: {
// labels: {
//   fargate: 'true',
// },
// runK8sConfig: {
//   podTemplateSpecMetadata: {
//     labels: {
//       fargate: 'true',
//     }
//   }
// },

Michel Rouly

05/10/2023, 4:31 AM

I'm thinking this might be something to do with the k8s_job_executor inheriting some configurations from the K8sRunLauncher possibly? since it's technically the executor that's responsible for scheduling step pods

Michel Rouly

05/10/2023, 4:32 AM

there's this kind of poorly documented pipelineRun value in the Helm chart that might be relevant but I can't quite figure it out yet

Michel Rouly

05/10/2023, 4:44 AM

Yeah so it's definitely the executor doing the label inheriting. https://github.com/dagster-io/dagster/blob/b204c2784929e0aefdeda1c5131d296a1395fee1/python_modules/libraries/dagster-k8s/dagster_k8s/executor.py#L91

Michel Rouly

05/10/2023, 4:47 AM

Alright. I think I'm answering my own question here. I'll probably need to override the label specifically in executor configs, since the defaulting behavior is pretty hardcoded

daniel

05/10/2023, 2:44 PM

This should probably be easier; in general, configuration is passed through. One way to set config on the run pod that won't apply to the step pods is to put a tag on the job: https://docs.dagster.io/deployment/guides/kubernetes/customizing-your-deployment#per-job-or-per-op-kubernetes-configuration. That's kind of annoying because you have to tag every job individually. Another way is to override the labels on the op (kind of like your conclusion, but you'd have to put it in a tag on the op; we don't yet support raw k8s configuration on the executor that applies to every step pod). That's annoying for similar reasons, since you need to tag every op. You can use a factory method that applies the tags to every job / op though, which can help a bit
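
A minimal sketch of the factory idea from the message above, assuming the `dagster-k8s/config` job tag and its `pod_template_spec_metadata` field described on the linked docs page. The helper name `run_pod_label_tags` is hypothetical and not part of Dagster; it only builds the tag dictionary, so it runs without Dagster installed.

```python
from typing import Dict


def run_pod_label_tags(labels: Dict[str, str]) -> Dict[str, dict]:
    """Build job tags that request the given labels on the run pod.

    Hypothetical helper (not a Dagster API). It copies the input so that
    later changes to the caller's dict do not leak into the tags.
    """
    return {
        "dagster-k8s/config": {
            "pod_template_spec_metadata": {"labels": dict(labels)},
        }
    }


# One shared tag set that a job factory could apply to every job.
FARGATE_RUN_TAGS = run_pod_label_tags({"fargate": "true"})

# Assumed usage inside a code location (requires dagster):
#   @job(tags=FARGATE_RUN_TAGS)
#   def my_job(): ...
```

Keeping the tag construction in one place means each job only needs `tags=FARGATE_RUN_TAGS`, which softens the "tag every job individually" annoyance somewhat.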

Michel Rouly

05/10/2023, 3:09 PM

Awesome, thanks for confirming @daniel. I was thinking along those lines as well. Sounds like I'm not missing anything obvious here, so I'll keep driving along in this direction.

Michel Rouly

05/10/2023, 10:53 PM

So I thought I was being clever, and I had subclassed K8sRunLauncher with this:

from dagster import DagsterRun
from dagster_k8s import K8sRunLauncher
from dagster_k8s.container_context import K8sContainerContext


class EksFargateRunLauncher(K8sRunLauncher):
    def get_container_context_for_run(self, dagster_run: DagsterRun) -> K8sContainerContext:
        container_context = super().get_container_context_for_run(dagster_run)
        container_context.labels["fargate"] = "true"
        return container_context

and configured it appropriately as the custom run launcher for Dagit/Daemon via the Dagster Helm chart. Packaged it up in a custom Docker image and everything. It even appropriately schedules runs on Fargate (yay!!).

Michel Rouly

05/10/2023, 10:54 PM

....however, turns out that run pods, which inherit their image from user code deployments, still need to be able to resolve the run launcher (???)

Michel Rouly

05/10/2023, 10:56 PM

dagster._check.CheckError: Failure condition: Couldn't import module lib_dagster.aws.eks when attempting to load the configurable class lib_dagster.aws.eks.EksFargateRunLauncher

this gets thrown after the run pod is scheduled and started up. I'm fairly certain this means Dagit/Daemon are working OK since the run pod even got scheduled. And that it's the run pod which, for whatever reason, needs to have a copy of the run launcher?

Michel Rouly

05/10/2023, 11:04 PM

I can just upgrade the user code deployments to include my new class. Not an issue....

Michel Rouly

05/10/2023, 11:04 PM

but that does make me realize the run launcher is only one half of the problem. k8s_job_executor also inherits k8s configs for steps from the user code location server. meaning, if I want to run my code location servers on Fargate (I do) I will also need to exclude the fargate label from them

Michel Rouly

05/10/2023, 11:07 PM

so I think I may be better served
• using the default K8sRunLauncher configured to pass fargate=true for run pods
• running user code locations on Fargate
• subclassing the k8s_job_executor instead to ignore fargate=true

unfortunately this means I'll need to use a custom executor in my downstream Dagster code locations which I think is probably lower impact to users than mucking with job tags. since there's only one Definitions per location, but there are potentially many jobs per location.

Michel Rouly

05/11/2023, 12:12 AM

Yep. Turns out this works perfectly as a drop in replacement.

from dagster import configured
from dagster_k8s import k8s_job_executor as dagster_k8s_job_executor


@configured(dagster_k8s_job_executor, config_schema=dagster_k8s_job_executor.config_schema)
def k8s_job_executor(config: dict) -> dict:
    return {
        **config,
        "labels": {
            # Disable Fargate for step pods unless requested. Otherwise, the K8s executor will cheerfully inherit
            # Fargate settings from the user code deployment and/or the run launcher.
            "fargate": "false",
            # Labels set explicitly in the executor config come last, so they override the default above.
            **config.get("labels", {}),
        },
    }
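
The override above relies on dictionary unpacking order: the default `"fargate": "false"` is written first, and any labels already present in the executor config are unpacked after it, so they win. A small Dagster-free sketch of just that merge, where the function name `with_default_labels` and the sample config values are illustrative assumptions:

```python
from typing import Any, Dict


def with_default_labels(config: Dict[str, Any]) -> Dict[str, Any]:
    """Same merge as the executor wrapper, without the Dagster decorator."""
    return {
        **config,
        "labels": {
            "fargate": "false",          # default for step pods
            **config.get("labels", {}),  # explicit labels override the default
        },
    }


# No labels configured: the step pod default applies.
no_labels = with_default_labels({"job_namespace": "dagster"})

# Labels configured explicitly: the caller's value is kept.
opted_in = with_default_labels({"labels": {"fargate": "true", "team": "data"}})
```

Here `no_labels["labels"]` contains only the `"fargate": "false"` default, while `opted_in["labels"]` keeps the caller's `"fargate": "true"` alongside `"team": "data"`.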
