# deployment-kubernetes
Hi Sundara - can you post the output of 'kubectl describe <pod name>' for a dagster-step pod with this config for one of your steps? I'm not totally following your comment about the multiprocess executor since that wouldn't be putting each op in its own k8s pod. There would just be one pod for the whole run, and that would pull any config from the job, not from anything on your ops.
the main thing i would want to check from kubectl describe is whether the tolerations that you configured are actually being applied - are you seeing what you expect when you inspect the pod?
So to confirm - the issue now is that setting tolerations via run_k8s_config is having no effect in the run pod? If you kubectl describe it, you don't see any tolerations being applied? Do other things that you set via run_k8s_config work?
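For reference, here's a minimal sketch of how tolerations might be set under run_k8s_config in the Helm chart's values (field names and toleration values here are illustrative assumptions, not taken from your setup):

```yaml
# Hedged sketch: tolerations via run_k8s_config in the Dagster Helm chart.
# Keys and values are assumed for illustration - adjust to your values file.
runLauncher:
  type: K8sRunLauncher
  config:
    k8sRunLauncher:
      runK8sConfig:
        podSpecConfig:
          tolerations:
            - key: "dedicated"
              operator: "Equal"
              value: "dagster"
              effect: "NoSchedule"
```

If something shaped like this is in place, `kubectl describe` on the run pod should show the toleration under the pod's Tolerations section.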
What version of Dagster are you using?
Earlier you said "But if i change it to multiprocess executor, the pod's are getting deployed on the correct node" - I'm having trouble matching that with the most recent comment that the tolerations aren't being applied
Is it all run_k8s_config fields that aren’t being applied, or just tolerations?
Are you certain that the image that’s launching the run is also on dagster 1.1.14 - this is using the Dagster helm chart? Asking because run_k8s_config was added fairly recently and wouldn’t be applied on older versions
Can you give a set of steps to reproduce the problem using your setup? I’m pretty confident that run_k8s_config works in general when using the helm chart, so there would have to be something unique about how you’re launching runs using it
I see - my understanding is that execute_job doesn't go through the run launcher or create a new pod. It executes the job synchronously in the same process that calls the execute_job function.
so i'm not totally clear which pod you would be inspecting for the run-level tolerations in that case
the executor won't change whether or not execute_job uses the run launcher - it will change the behavior for how each op within the run is executed
i.e. it would allow you to configure the step pods but not the run pod (since there will never be a run pod)
I’m not totally sure what “a k8s-job.yaml” refers to exactly, can you elaborate?
I see - so you’ve kind of implemented your own run launcher here it sounds like
You can use the k8s job executor while running execute_job and it will spin up pods for each step, yes
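As a hedged sketch, run config for a job that binds the k8s job executor might look something like this - the `job_namespace` field is assumed for illustration, so check it against your dagster-k8s version:

```yaml
# Hedged sketch: run config for a job using the k8s job executor,
# so each step runs in its own Kubernetes Job/pod.
execution:
  config:
    job_namespace: dagster  # assumed namespace - substitute your own
```

With this, even when the run itself is driven by execute_job in your own process, each step would still get its own pod.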
I would have expected the tags in your original post to work - can you post the full op decorator with the tags?
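For comparison, here's a hedged sketch of the shape I'd expect those tags to have - the toleration values and the TTL are illustrative assumptions, not your actual config. Note that `ttl_seconds_after_finished` belongs under `job_spec_config` (it's a property of the Kubernetes Job), while tolerations go under `pod_spec_config`:

```python
# Hedged sketch of per-op Kubernetes config passed via the
# "dagster-k8s/config" tag; all concrete values are assumptions.
K8S_CONFIG_TAG = "dagster-k8s/config"

op_tags = {
    K8S_CONFIG_TAG: {
        "pod_spec_config": {
            "tolerations": [
                {
                    "key": "dedicated",
                    "operator": "Equal",
                    "value": "dagster",
                    "effect": "NoSchedule",
                }
            ],
        },
        "job_spec_config": {
            # TTL applies to the Kubernetes Job object, so it lives
            # under job_spec_config rather than pod_spec_config.
            "ttl_seconds_after_finished": 300,
        },
    }
}
```

With dagster installed, a dict shaped like this would be passed as `@op(tags=op_tags)` so the step pods pick it up.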
similar question to before - do all of the pod_spec_config / dagster-k8s/config fields fail to apply, or just the tolerations? Do you see the ttl_seconds_after_finished that you also set being applied to the k8s job that's created for each step?
Can you pass along a full set of code that reproduces the problem?