OK this is doing my head in now: I have a sensor t...
# ask-community
b
OK this is doing my head in now: I have a sensor that, as part of its operation, reads some files in from GCS. It's been working fine for weeks now. A few days ago, I upgraded to 0.15.2 but then, due to some (unrelated) issues, I rolled back to 0.14.15. From the time I upgraded onwards, this sensor has stopped working because of an error from GCS saying the service account doesn't have the right privileges. The job that the sensor invokes (as well as a stack of other jobs in the same repo) uses the same service account to access GCS resources, and they all still work fine. It's literally just this sensor that's complaining. The sensor uses
build_resources()
to pass in the GCS resource but I've not changed any of that config. Anyone got any ideas as to what could be occurring?
p
Hi Ben. Are you using the default run launcher or a custom run launcher? I’m wondering if the creds are not getting passed in the same way… if you are using a custom run launcher, your ops / resources might be running in a different execution environment than your sensors.
b
Thanks, Phil. I'm using the default
K8sRunLauncher
and yeah, it definitely seems like it's a cred/env problem. Like I said, the jobs in that repo all use the same GCS resource and seem to work fine - it's just the sensor that's struggling. I know there have been some changes in this area leading up to 0.15 but I don't understand how the rollback hasn't restored operation at this point.
I'm using the dagster helm chart that I've customised a bit and separate user code deployments. Where precisely does the daemon (which runs sensors right?) derive its env from in the charts?
p
The daemon at startup spins up a grpc server for the configured repository location. The sensors will then call out to the user code on this grpc server to perform the actual sensor evaluations. I believe the env / secrets are all within the
dagsterDaemon
section of the helm chart (cc @rex)
b
Yeah that's what I thought originally (when I first deployed this cluster) so I tried to add the GCS secret volume to the daemon config but you actually can't:
Error: UPGRADE FAILED: values don't meet the specifications of the schema(s) in the following chart(s):
dagster:
- dagsterDaemon: Additional property volumes is not allowed
- dagsterDaemon: Additional property volumeMounts is not allowed
r
I want to clarify a few things: 1. As Phil said, the sensor daemon doesn’t actually evaluate your code. It calls out to the code server, which loads your user code and processes the sensor evaluation request. 2. Because of (1), the sensor daemon doesn’t need to have the env/secrets set up to load the user code. The env and secrets should be set up in the code server with the sensor code. In Helm, you can configure this at
.Values.dagster-user-deployments.deployments
. Here, you can configure the volumes/volumeMounts you need for the code server.
b
Yeah so my (separate chart) user code deployment has all the secret volumes/env required by both the sensor and the jobs (they are the same - a GCP service account key file mounted via a secret volume)
I have
enableSubchart
set to false in my main chart
Which I thought meant that all the stuff in
deployments
got ignored. Certainly I don't manage my user code in that section at all
r
yeah all the stuff in
deployments
will be ignored if you don’t manage your user code in that section
I was assuming that you were managing it all in one helm chart
b
Yeah all good - no I've got separate charts for each one of my ...erm "departmental" areas I guess you'd call them (actually I maintain one for each sport that we process)
r
when you rolled back to 0.14.15, did you roll back all your components? i.e. dagit/daemon, the helm chart, the user code?
b
Yes afaik (not ruling out I could have messed it up)
My user code deployments are all under CI/CD and the dagster version is just a variable that gets passed into that process
Is there a way to verify what release the user code deployments are built against?
r
could exec into the user code deployments and run
dagster --version
b
Ah yeah ok
Wait, the version is correct but I can't see my secret volume mounted in there
r
ahh i hope that’s the issue
b
How is anything in that deployment working then? Almost all the jobs in there use a GCS resource for something and they're all using the GCS IO manager 😕
r
well, Dagster jobs are started in their own Kubernetes jobs. If you’ve configured the launcher to include those volumes on the jobs, they are individually attached to each job that is created
b
Yeah ok but not for sensor evals I guess
r
right, that’s correct
b
ok so yeah that would explain it, maybe I messed up something in that chart on the rollback to stop it working. Alright - I've got some investigation to do. Thanks for your help! I'll let you know if I get stuck again
Yup ok something got messed up in that chart somehow. A manual deploy of it sorted it out though. Thanks for the pointer!
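For anyone who lands here with the same symptom: since sensor evaluations run inside the user code deployment rather than in the per-run Kubernetes jobs, a quick check is to exec into the code server pod and confirm the key file is actually visible there. Below is a minimal sketch of such a check, not anything Dagster ships. It assumes the key file path is exposed through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, which is the variable Google's client libraries read by default; the thread doesn't say how the path is wired up in this deployment, so adjust accordingly. The function name `check_key_file` is made up for illustration.

```python
import json
import os
from pathlib import Path


def check_key_file(env=None):
    """Return (ok, message) saying whether a service account key file is visible.

    Assumption: the key path is passed via GOOGLE_APPLICATION_CREDENTIALS.
    """
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return False, "GOOGLE_APPLICATION_CREDENTIALS is not set"

    key_file = Path(path)
    if not key_file.is_file():
        # Typical symptom of a secret volume that is not mounted in this pod.
        return False, f"{path} does not exist; is the secret volume mounted?"

    try:
        key = json.loads(key_file.read_text())
    except (OSError, ValueError) as exc:
        return False, f"{path} could not be read as JSON: {exc}"

    if not isinstance(key, dict):
        return False, f"{path} does not contain a JSON object"

    # Service account key files carry the account name in "client_email".
    return True, f"key file found for {key.get('client_email', '<unknown>')}"


if __name__ == "__main__":
    ok, message = check_key_file()
    print("OK" if ok else "MISSING", "-", message)
```

Running this in both the user code deployment pod and one of the run pods should show whether the two environments actually differ, which is what turned out to be the case here.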