# deployment-kubernetes
n
Hello - I have dagster deployed in k8s with helm using separate dagster and user-code deployments. When I launch a job from dagit using the launchpad, the job fails when trying to call an executable (e.g. the aws cli) using subprocess. As I was debugging this, I noticed that the Args listed in the dagster-run description seem to be generated dynamically. Can you help me understand how the Args list is constructed?
For example, I have an entrypoint in my Dockerfile that sources a virtualenv and then runs the dagster api, but what I see in the pod description is
Copy code
Args:
      /opt/maze-env/bin/python
      -m
      dagster
      api
      execute_run
      {"__class__": "..."}
It seems that the first argument is being constructed using sys.executable.
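Put another way, the run pod effectively executes something like the following (a rough shell equivalent of the Args shown above; the interpreter path comes from sys.executable on the user-code gRPC server rather than from the image's own CMD):
Copy code
# rough shell equivalent of the Args listed in the pod description --
# the first element is the gRPC server's sys.executable, not whatever
# the image's CMD would have supplied
/opt/maze-env/bin/python -m dagster api execute_run '{"__class__": "..."}'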
n
Thanks! I think our issue is that we are relying on a conda environment to manage executables that are called by an op using subprocess, so the environment is not being activated when the job pod is run. We hacked around this issue in the user-code deployment by wrapping the dagster command with the following
Copy code
#!/bin/bash
# need to activate environment and then use dagster, see Dockerfile CMD
source /opt/maze-env/bin/activate
exec /opt/maze-env/bin/dagster "$@"
Are there any approaches that would allow us to activate a virtual environment in the job pod?
r
I believe this should have been resolved with https://github.com/dagster-io/dagster/pull/5415/files
cc @daniel can you specify the executable path in the helm chart?
j
Actually, I think that is precisely what is undermining our hack
The issue isn't the executable path, that is being picked up fine.
what you seemed to have before was just:
Copy code
command = ["dagster", "api", "execute_run", input_json]
And that would have actually worked for us. The replacement of:
Copy code
command = args.get_command_args()
is what broke things for us. Admittedly it was an ugly solution on our end. Using sys.executable with -m and dagster is probably the correct way to go about this for the launcher.
d
What version of dagster are you on currently Juan?
j
0.14.13 I believe
And sorry, I believe I'm not following what was going on in that second link, where docker-compose is being referenced. If we could override that somehow we could solve this
d
I think you can override the ENTRYPOINT in your container and it will still be applied - it's the CMD that we replace
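For example, if the image's ENTRYPOINT is a small wrapper along these lines (a sketch assuming the same /opt/maze-env virtualenv as above), the environment gets activated regardless of what args the launcher injects:
Copy code
#!/bin/bash
# sketch of an ENTRYPOINT script: activate the virtualenv, then exec
# whatever command/args Kubernetes passes in -- Dagster only replaces the
# CMD/args, so this wrapper still runs first
source /opt/maze-env/bin/activate
exec "$@"
It would then be wired up with something like ENTRYPOINT ["/entrypoint.sh"] in the Dockerfile (script path assumed).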
j
I'm not sure I follow. As I understand it, those args will still be passed to an ENTRYPOINT, no?
d
Ah yes I came here to post that link 🙂
j
ideally we would want the opposite of the behavior that --use-python-environment-entry-point induces.
d
I'm not totally sure what you mean by the opposite
j
but this job is being run by the k8s launcher
so, "If this flag is set, the server will signal to clients that they should launch dagster commands using <this server’s python executable> -m dagster, instead of the default dagster entry point. "
d
The default behavior is for it to just run 'dagster api execute_run' (no python)
So I'm surprised to see that your arguments are
Copy code
/opt/maze-env/bin/python
      -m
      dagster
d
that injects an entry point, but the default entry point is still just 'dagster'
j
d
"When running your own gRPC server to serve Dagster code, jobs that launch in a container using code from that server will now default to using dagster as the entry point. Previously, the jobs would run using PYTHON_EXECUTABLE -m dagster, where PYTHON_EXECUTABLE was the value of sys.executable on the gRPC server. For the vast majority of Dagster jobs, these entry points will be equivalent. To keep the old behavior (for example, if you have multiple Python virtualenvs in your image and want to ensure that runs also launch in a certain virtualenv), you can launch the gRPC server using the new ----use-python-environment-entry-point command-line arg."
that's why I asked what version you're on earlier - are you sure your code is past that version?
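For reference, the changelog above implies the old behavior can be kept by passing that flag when starting the user-code gRPC server; a hypothetical invocation (host, port, and module name are placeholders) might look like:
Copy code
# hypothetical gRPC server launch that keeps the python-executable entry
# point, per the 0.14.0 changelog quoted above
dagster api grpc --host 0.0.0.0 --port 3030 \
  --module-name my_repo \
  --use-python-environment-entry-point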
n
from the user-code deployment
Copy code
(maze-env) root@user-code-dagster-user-deployments-maze-etl-747c8fb8fd-mnldq:/app# dagster --version
dagster, version 0.14.16
d
And the helm chart / dagster system components are also on 0.14?
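One way to check that (deployment names assumed here; they vary with the helm release name):
Copy code
# check the image tags the system components are running
kubectl -n dagster get deploy dagster-dagit dagster-daemon \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.containers[0].image}{"\n"}{end}'
# or ask the daemon pod directly
kubectl -n dagster exec deploy/dagster-daemon -- dagster --version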
j
Sorry, I'm probably just confused, but that PR that I linked to, https://github.com/dagster-io/dagster/pull/5415 seems to do the opposite of what that change log is saying
d
Yeah, that PR predates that release
I'll double-check that it still works this way though
j
ah I see, sorry!
Oh jeez! It seems the dagster daemon is on 0.13.9!
d
Aha!
j
oof well that would explain it...
I really appreciate y'all taking the time to respond to this, btw.
d
No problem - yeah this was a bit of a journey. First people wanted multiple python environments in one image, but then our solution for that affected setups like yours
j
Yeah. My experience has been that conda environments at least don't play super well with containers
and our approach has been to wrap entrypoints with a script that first activates the environment. Hacky for sure.
n
OK, we updated the daemon and now seeing an error about our workspace not being found. Looking into it, but any chance you know what changed that would cause this, @daniel?
Copy code
Error: No arguments given and workspace.yaml not found.
r
daniel is in CST so he’s off right now
how did you upgrade your daemon?
if it was through the helm chart, the workspace.yaml should be mounted as a volume and DAGSTER_HOME would have already been set to find the workspace.
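A quick way to sanity-check that from inside the daemon (deployment name assumed; the exact mount location depends on the chart version):
Copy code
# confirm DAGSTER_HOME is set and see what the chart mounted there
kubectl -n dagster exec deploy/dagster-daemon -- \
  sh -c 'echo "$DAGSTER_HOME"; ls -l "$DAGSTER_HOME"'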
n
Ya I just switched the version from ~ to 0.14.6 in the helm chart and did an upgrade
r
could you do a kubectl describe on the daemon pod?
what helm chart version are you running as well?
n
Will do and pick it up tomorrow, thanks for the help!
d
You'll want to upgrade the helm chart itself to the same version too - the helm chart version should stay the same as the version being used by dagit and the daemon
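To compare what the release is currently on against what's available from the repo (release and namespace names assumed):
Copy code
# chart version currently installed for the release
helm list -n dagster
# chart versions available from the configured repo
helm search repo dagster/dagster --versions | head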
n
I'm probably doing this wrong, but when I try to upgrade the helm chart to a new version it fails to find it when I pass --version as described in the docs
Copy code
helm upgrade --install dagster -n dagster dagster/dagster -f dagster-values.yaml --version 0.14.17
Error: failed to download "dagster/dagster" at version "0.14.17"
d
hm you might need to run helm repo update first?
n
ok, got it - ya just needed to run helm repo update and now everything is working as expected 🎉
Thanks again for the help!
j
Just migrated to a new k8s cluster and ran into this issue. helm repo update fixed it 🥳