# deployment-kubernetes
n
Hello - I have dagster deployed in k8s with helm using separate dagster and user-code deployments. When I launch a job from dagit using the launchpad, the job fails when trying to call an executable (e.g. the aws cli) using subprocess. As I was debugging this, I noticed that the Args listed in the dagster-run description seem to be generated dynamically. Can you help me understand how the Args list is constructed?
For example, I have an entrypoint in my Dockerfile that sources a virtualenv and then runs the dagster api, but what I see in the pod description is
Copy code
Args:
      /opt/maze-env/bin/python
      -m
      dagster
      api
      execute_run
      {"__class__": "..."}
It seems that the first argument is being constructed using sys.executable.
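Put another way, the run pod effectively executes something like the following (a rough shell equivalent of the Args shown above; the interpreter path comes from sys.executable on the user-code gRPC server rather than from the image's own CMD):
Copy code
# rough shell equivalent of the Args listed in the pod description --
# the first element is the gRPC server's sys.executable, not whatever
# the image's CMD would have supplied
/opt/maze-env/bin/python -m dagster api execute_run '{"__class__": "..."}'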
n
Thanks! I think our issue is that we are relying on a conda environment to manage executables that are called by an op using subprocess, so the environment is not being activated when the job pod is run. We hacked around this issue in the user-code deployment by wrapping the dagster command with the following
Copy code
#!/bin/bash
# need to activate environment and then use dagster, see Dockerfile CMD
source /opt/maze-env/bin/activate
exec /opt/maze-env/bin/dagster "$@"
Are there any approaches that would allow us to activate a virtual environment in the job pod?
r
I believe this should have been resolved with https://github.com/dagster-io/dagster/pull/5415/files
cc @daniel can you specify the executable path in the helm chart?
j
Actually, I think that is precisely what is undermining our hack
The issue isn't the executable path, that is being picked up fine.
what you seemed to have before was just:
Copy code
command = ["dagster", "api", "execute_run", input_json]
And that would have actually worked for us. The replacement of:
Copy code
command = args.get_command_args()
is what broke things for us. Admittedly it was an ugly solution on our end. Using sys.executable with -m and dagster is probably the correct way to go about this for the launcher.
d
What version of dagster are you on currently Juan?
j
0.14.13 I believe
And sorry, I believe I'm not following what was going on in that second link, where docker-compose is being referenced. If we could override that somehow we could solve this
d
I think you can override the ENTRYPOINT in your container and it will still be applied - it's the CMD that we replace
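For example, if the image's ENTRYPOINT is a small wrapper along these lines (a sketch assuming the same /opt/maze-env virtualenv as above), the environment gets activated regardless of what args the launcher injects:
Copy code
#!/bin/bash
# sketch of an ENTRYPOINT script: activate the virtualenv, then exec
# whatever command/args Kubernetes passes in -- Dagster only replaces the
# CMD/args, so this wrapper still runs first
source /opt/maze-env/bin/activate
exec "$@"
It would then be wired up with something like ENTRYPOINT ["/entrypoint.sh"] in the Dockerfile (script path assumed).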
j
I'm not sure I follow. As I understand it, those args will still be passed to an ENTRYPOINT, no?
d
Ah yes I came here to post that link 🙂
j
ideally we would want the opposite of the behavior that --use-python-environment-entry-point induces.
d
I'm not totally sure what you mean by the opposite
j
but this job is being run by the k8s launcher
so, "If this flag is set, the server will signal to clients that they should launch dagster commands using <this server’s python executable> -m dagster, instead of the default dagster entry point. "
d
The default behavior is for it to just run 'dagster api execute_run' (no python)
So I'm surprised to see that your arguments are
Copy code
/opt/maze-env/bin/python
      -m
      dagster
d
that injects an entry point, but the default entry point is still just 'dagster'
j
d
"When running your own gRPC server to serve Dagster code, jobs that launch in a container using code from that server will now default to using dagster as the entry point. Previously, the jobs would run using PYTHON_EXECUTABLE -m dagster, where PYTHON_EXECUTABLE was the value of sys.executable on the gRPC server. For the vast majority of Dagster jobs, these entry points will be equivalent. To keep the old behavior (for example, if you have multiple Python virtualenvs in your image and want to ensure that runs also launch in a certain virtualenv), you can launch the gRPC server using the new ----use-python-environment-entry-point command-line arg."
that's why I asked what version you're on earlier - are you sure your code is past that version?
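For reference, the changelog above implies the old behavior can be kept by passing that flag when starting the user-code gRPC server; a hypothetical invocation (host, port, and module name are placeholders) might look like:
Copy code
# hypothetical gRPC server launch that keeps the python-executable entry
# point, per the 0.14.0 changelog quoted above
dagster api grpc --host 0.0.0.0 --port 3030 \
  --module-name my_repo \
  --use-python-environment-entry-point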
n
from the user-code deployment
Copy code
(maze-env) root@user-code-dagster-user-deployments-maze-etl-747c8fb8fd-mnldq:/app# dagster --version
dagster, version 0.14.16
d
And the helm chart / dagster system components are also on 0.14?
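One way to check that (deployment names assumed here; they vary with the helm release name):
Copy code
# check the image tags the system components are running
kubectl -n dagster get deploy dagster-dagit dagster-daemon \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.containers[0].image}{"\n"}{end}'
# or ask the daemon pod directly
kubectl -n dagster exec deploy/dagster-daemon -- dagster --version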
j
Sorry, I'm probably just confused, but that PR that I linked to, https://github.com/dagster-io/dagster/pull/5415 seems to do the opposite of what that change log is saying
d
Yeah, that PR predates that release
I'll double-check that it still works this way though
j
ah I see, sorry!
Oh jeez! It seems the dagster daemon is on 0.13.9!
d
Aha!
j
oof well that would explain it...
I really appreciate y'all taking the time to respond to this, btw.
d
No problem - yeah this was a bit of a journey. First people wanted multiple python environments in one image, but then our solution for that affected setups like yours
j
Yeah. My experience has been that conda environments at least don't play super well with containers
and our approach has been to wrap entrypoints with a script that first activates the environment. Hacky for sure.
n
OK, we updated the daemon and now seeing an error about our workspace not being found. Looking into it, but any chance you know what changed that would cause this, @daniel?
Copy code
Error: No arguments given and workspace.yaml not found.
r
daniel is in CST so he’s off right now
how did you upgrade your daemon?
if it was through the helm chart, the workspace.yaml should be mounted as a volume and DAGSTER_HOME would have already been set to find the workspace.
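A quick way to sanity-check that from inside the daemon (deployment name assumed; the exact mount location depends on the chart version):
Copy code
# confirm DAGSTER_HOME is set and see what the chart mounted there
kubectl -n dagster exec deploy/dagster-daemon -- \
  sh -c 'echo "$DAGSTER_HOME"; ls -l "$DAGSTER_HOME"'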
n
Ya I just switched the version from ~ to 0.14.6 in the helm chart and did an upgrade
r
could you do a kubectl describe on the daemon pod?
what helm chart version are you running as well?
n
Will do and pick it up tomorrow, thanks for the help!
d
You'll want to upgrade the helm chart itself to the same version too - the helm chart version should stay the same as the version being used by dagit and the daemon
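To compare what the release is currently on against what's available from the repo (release and namespace names assumed):
Copy code
# chart version currently installed for the release
helm list -n dagster
# chart versions available from the configured repo
helm search repo dagster/dagster --versions | head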
n
I'm probably doing this wrong, but when I try to upgrade the helm chart to a new version it fails to find it when I pass --version as described in the docs
Copy code
helm upgrade --install dagster -n dagster dagster/dagster -f dagster-values.yaml --version 0.14.17
Error: failed to download "dagster/dagster" at version "0.14.17"
d
hm you might need to run helm repo update first?
n
ok, got it - ya just needed to run helm repo update and now everything is working as expected 🎉
Thanks again for the help!
j
Just migrated to a new k8s cluster and ran into this issue. helm repo update fixed it 🥳