#deployment-kubernetes

Michel Rouly

05/10/2023, 4:18 AM
Hey D people! I'm currently running Dagster on EKS w/ the K8sRunLauncher. Is it possible to configure the K8sRunLauncher to set certain specific metadata on run pods only, but not on step pods?
Some context. I'd like to schedule run pods on EKS Fargate to take advantage of isolation and stability for these long-lived pods.
If I set fargate=true on the Dagster user code location server, every pod it schedules (run and step) will get fargate=true because of includeConfigInLaunchedRuns. If I set the label in the Dagster Helm chart's k8sRunLauncher config, it also seems to set the label on both runs and steps. I've tried both of these options below.
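k8sRunLauncher: {
  // Option 1: label set directly on the run launcher
  // labels: {
  //   fargate: 'true',
  // },
  // Option 2: label set through the raw pod template metadata
  // runK8sConfig: {
  //   podTemplateSpecMetadata: {
  //     labels: {
  //       fargate: 'true',
  //     }
  //   }
  // },
},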
I'm thinking this might be something to do with the k8s_job_executor inheriting some configuration from the K8sRunLauncher, possibly? Since it's technically the executor that's responsible for scheduling step pods.
There's this kind of poorly documented pipelineRun value in the Helm chart that might be relevant, but I can't quite figure it out yet.
Alright. I think I'm answering my own question here. I'll probably need to override the label specifically in the executor configs, since the defaulting behavior is pretty hardcoded.

daniel

05/10/2023, 2:44 PM
This should probably be easier. In general, configuration is passed through from the run to its steps. But one way to set config on the run pod that won't apply to the step pod is to put a tag on the job: https://docs.dagster.io/deployment/guides/kubernetes/customizing-your-deployment#per-job-or-per-op-kubernetes-configuration . That's kind of annoying because you have to tag every job individually. Another way is to override the labels on the op (kind of like your conclusion, but you'd have to put it in a tag on the op; we don't yet support raw k8s configuration on the executor that applies to every step pod). That's annoying for similar reasons, since you need to tag every op. You can use a factory method that applies the tags on every job / op though, which can help a bit.
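A minimal sketch of the tag factory idea described above, written in plain Python so it can be read without a Dagster install. The helper name `with_pod_labels` and the `team` tag are hypothetical; the `dagster-k8s/config` tag key and its `pod_template_spec_metadata` field are the ones described in the linked docs page. Whether a `fargate: "false"` label actually keeps a pod off Fargate depends on how the cluster's Fargate profile selectors are set up, which is an assumption here.

```python
from typing import Any, Dict, Optional

# Tag key that dagster-k8s reads for per-job / per-op Kubernetes overrides.
K8S_CONFIG_TAG = "dagster-k8s/config"


def with_pod_labels(
    labels: Dict[str, str], tags: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Return a copy of `tags` with `labels` merged into the pod template metadata.

    Hypothetical helper: it only builds the tag dictionary and never
    mutates the dictionary that was passed in.
    """
    merged = dict(tags or {})
    k8s_config = dict(merged.get(K8S_CONFIG_TAG, {}))
    metadata = dict(k8s_config.get("pod_template_spec_metadata", {}))
    metadata["labels"] = {**metadata.get("labels", {}), **labels}
    k8s_config["pod_template_spec_metadata"] = metadata
    merged[K8S_CONFIG_TAG] = k8s_config
    return merged


# Tags intended for a job (applies to the run pod).
run_pod_tags = with_pod_labels({"fargate": "true"})

# Tags intended for an op (applies to that op's step pod).
step_pod_tags = with_pod_labels({"fargate": "false"}, {"team": "data"})
```

The resulting dictionaries would then be passed as `@job(tags=run_pod_tags)` or `@op(tags=step_pod_tags)`. Wrapping those decorators in a shared factory keeps the tagging in one place, though every job or op still has to go through it.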

Michel Rouly

05/10/2023, 3:09 PM
Awesome, thanks for confirming @daniel. I was thinking along those lines as well. Sounds like I'm not missing anything obvious here, so I'll keep driving along in this direction.
So I thought I was being clever, and I had subclassed K8sRunLauncher with this:
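from dagster import DagsterRun
from dagster_k8s import K8sRunLauncher
from dagster_k8s.container_context import K8sContainerContext


class EksFargateRunLauncher(K8sRunLauncher):
    def get_container_context_for_run(self, dagster_run: DagsterRun) -> K8sContainerContext:
        container_context = super().get_container_context_for_run(dagster_run)
        # Label the run pod so it gets scheduled on Fargate.
        container_context.labels["fargate"] = "true"
        return container_context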
and configured it appropriately as the custom run launcher for Dagit/Daemon via the Dagster Helm chart. Packaged it up in a custom Docker image and everything. It even appropriately schedules runs on Fargate (yay!!).
....however, turns out that run pods, which inherit their image from user code deployments, still need to be able to resolve the run launcher (???)
dagster._check.CheckError: Failure condition: Couldn't import module lib_dagster.aws.eks when attempting to load the configurable class lib_dagster.aws.eks.EksFargateRunLauncher
this gets thrown after the run pod is scheduled and started up. I'm fairly certain this means Dagit/Daemon are working OK since the run pod even got scheduled. And that it's the run pod which, for whatever reason, needs to have a copy of the run launcher?
I can just upgrade the user code deployments to include my new class. Not an issue....
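A quick way to confirm this diagnosis is to check whether the module is importable from inside the user code image. The sketch below uses only the standard library; `can_import` is a hypothetical helper name, and `lib_dagster.aws.eks` is the module path from the error above.

```python
import importlib.util


def can_import(module_name: str) -> bool:
    """Report whether `module_name` can be located on the current Python path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        return False


if __name__ == "__main__":
    # Run inside the user code image; False means the run pod will hit the CheckError.
    print(can_import("lib_dagster.aws.eks"))
```

Running it in both the Dagit/Daemon image and the user code image should show which one is missing the custom run launcher.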
but that does make me realize the run launcher is only one half of the problem. k8s_job_executor also inherits k8s configs for steps from the user code location server. Meaning, if I want to run my code location servers on Fargate (I do), I will also need to exclude the fargate label from the step pods.
so I think I may be better served by:
• using the default K8sRunLauncher configured to pass fargate=true for run pods
• running user code locations on Fargate
• subclassing the k8s_job_executor instead to ignore fargate=true
Unfortunately this means I'll need to use a custom executor in my downstream Dagster code locations, which I think is probably lower impact to users than mucking with job tags, since there's only one Definitions per location but potentially many jobs per location.
Yep. Turns out this works perfectly as a drop in replacement.
from dagster import configured
from dagster_k8s import k8s_job_executor as dagster_k8s_job_executor


@configured(dagster_k8s_job_executor, config_schema=dagster_k8s_job_executor.config_schema)
def k8s_job_executor(config: dict) -> dict:
    return {
        **config,
        "labels": {
            # Disable Fargate for step pods unless requested. Otherwise, the K8s executor will cheerfully inherit
            # Fargate settings from the user code deployment and/or the run launcher.
            "fargate": "false",
            # Labels supplied in the executor config come last, so they override the default above.
            **config.get("labels", {}),
        },
    }
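The behavior of that wrapper comes down to dictionary merge order, which can be checked without Dagster. This is a sketch of just the merge step; `default_step_labels` is a hypothetical stand-in for the body of the configured function, and the `team` label is made up for illustration.

```python
from typing import Any, Dict


def default_step_labels(config: Dict[str, Any]) -> Dict[str, Any]:
    """Mirror the merge above: default fargate=false, explicit labels win."""
    return {
        **config,
        "labels": {
            "fargate": "false",
            # Later keys override earlier ones, so user labels take precedence.
            **config.get("labels", {}),
        },
    }


# No labels supplied: the step pod is labeled fargate=false.
no_labels = default_step_labels({"job_namespace": "dagster"})

# Labels supplied: the explicit fargate=true survives and other labels are kept.
explicit = default_step_labels({"labels": {"fargate": "true", "team": "data"}})
```

Putting the default first is what makes the executor opt-out by default while still letting an individual code location request Fargate for its steps.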