Any plans for a GCP agent? <https://docs.dagster.i...
# ask-community
s
s
What GCP service are you imagining? You can run the Kubernetes agent on GKE (even GKE autopilot if you want something closer to serverless)
s
Just an agent that runs on a GCP VM, not using kubernetes
d
The Docker agent mentioned on that page would work well for this I think (it should probably say “on a single machine” rather than “on your computer”)
s
interesting, thanks
Started up a docker agent, but getting errors from the docker logs
ERROR - Unable to update prod:example_location. Updating location with error data: docker.errors.APIError: 500 Server Error for <http+docker://localhost/v1.41/images/create?tag=1.1.5&fromImage=657821118200.dkr.ecr.us-west-2.amazonaws.com%2Fdagster-cloud-serverless-base-py3.8>: Internal Server Error ("Head <https://657821118200.dkr.ecr.us-west-2.amazonaws.com/v2/dagster-cloud-serverless-base-py3.8/manifests/1.1.5>: no basic auth credentials")
d
Did you possibly switch from a serverless deployment to a hybrid deployment?
s
it seems like it’s having trouble getting an image from ECR
Yes, trying out hybrid - originally was on serverless
d
What’s your organization name?
Did you tell it in the UI that you wanted to switch to hybrid?
s
I think the org name is scott-testing
I don’t see an option to switch to hybrid
d
You should be able to switch it to hybrid from the Agents tab in dagster cloud - I’ll be able to post a screenshot in a couple of hours. Then you’ll want to delete your serverless code location from the code locations tab
s
Ah, there it is
d
Once your hybrid agent is running you can follow this guide to deploy code to a hybrid agent https://docs.dagster.io/dagster-cloud/getting-started/getting-started-with-hybrid-deployment#step-3-deploy-your-code
s
hmm, got the docker network running, but with an example project https://github.com/dagster-io/dagster-cloud-hybrid-quickstart I am running into an error I can’t figure out. I have gcr connected, and I am seeing images show up in gcr, but I’m still getting errors in the logs:
docker.errors.APIError: 500 Server Error for <http+docker://localhost/v1.41/images/create?tag=sometag&fromImage=gcr.io%2Fmy-project%2Ftestdaghybrid>: Internal Server Error ("unauthorized: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps in: <https://cloud.google.com/container-registry/docs/advanced-authentication>")
- The instructions at that link leave a lot of possible places where I went wrong, but it’s hard to figure out which one. Any tips?
d
Typically that would involve commenting out this step in the GitHub action https://github.com/dagster-io/dagster-cloud-hybrid-quickstart/blob/main/.github/workflows/deploy.yml#L68 And adding the corresponding GCR_JSON_KEY secret to the GitHub repository
s
Thanks! Yeah, I had already done that. Any other tips?
d
You could verify that the key that you specified as a secret actually has the permissions that it needs to access GCR
Which step of the action is failing exactly?
s
It’s failing on the step Build and deploy to Dagster Cloud hybrid
yeah, that’d be good to verify the key has the needed permissions, it’s just weird that images are created on GCR - but then I still get this unauthorized error - so it seems like authorization worked, but still getting the error
d
Oh I see, this is the agent not having the credentials it needs to pull the image, not the GitHub action
s
Great! Glad that narrows it down to the agent. I am loading that json file with the docker command, but it must be missing something. Or, we do have to run docker on our machine with sudo, so perhaps something about sudo is involved here
d
What I'd suggest doing to debug is running a shell in the same container that's running the agent
docker exec -it <your agent container name here> /bin/bash
then inside the shell:
docker pull <your image here>
If your credentials are set up correctly, the docker pull command there should be able to pull the image in your GCR repo with your code in it
s
thanks! I’ll try that
Tried that. After docker exec ..., docker does not exist in the container. Looking for docs on the docker agent …
d
When you say "docker does not exist" - what error message are you seeing exactly?
your earlier error message showed an error from the docker python library, so it would be surprising if the docker library wasn't available
s
bash: dagster-docker: command not found
woops
bash: docker: command not found
d
hm, maybe the python docker library is available but the docker CLI is not installed.
s
in the container i see python 3.8.15, which I think was the same as i saw in the error logs
Within a python repl within the container, if I run the below I do get the same error I see in the agent logs
import docker
client = docker.from_env()
client.images.pull('gcr.io/foo/bar')
d
Ah OK, great - bit more roundabout, but that's the permissions problem to solve
s
Right
s
@Scott Chamberlain I was able to get this to work in GCP by giving my default Compute Engine service account the Artifact Registry Administrator and Artifact Registry Repository Administrator roles. (This was with the agent running in a GCP VM.)
s
Okay, we’re getting somewhere. I had the host and container mixed up in the line in the docker run command where you specify the credentials file. That’s fixed. But now I’m getting an error:
docker.credentials.errors.InitializationError: docker-credential-gcloud not installed or not available in PATH
That’s great @Sean Lopp thanks for verifying that. - So this all should work if I can sort out my mistakes 😬
Since this docker-credential-gcloud error is in the container, I'm surprised, because it just worked for Sean
s
Ah actually, I misremembered: I was using GKE, not a VM. The default GKE cluster must already have a lot of container utilities installed. It looks like you may need to extend the agent docker image to install the necessary gcloud components for authenticating against the registry
s
Ah, ok
d
In theory this credential helper is already installed in the image: https://github.com/GoogleCloudPlatform/docker-credential-gcr - but it's possible some more configuration is needed in order for it to work out of the box
s
It’s okay to extend the agent image and then host that on our own private GCR registry to use?
d
Absolutely - you can extend it or build your own, either is fine
j
FWIW I don't think docker-credential-gcr and docker-credential-gcloud are the same thing. We only install the former, although we could probably install the latter too: https://cloud.google.com/container-registry/docs/advanced-authentication
s
Thanks @jordan - I'm not sure how one even installs docker-credential-gcloud. I haven't been able to find install instructions for it yet
s
Re ~/.docker/config.json: I'm reading some Stack Overflow answers that recommend deleting that file, or looking for a credHelpers entry and deleting it if it refers to gcloud instead of gcr. That all sounds vaguely familiar from something I had to do when I first got my GCP stack working a few months ago
s
Ha, I saw that too and disregarded it b/c it didn’t feel right, but worth a try …
j
Some folks on Stack Overflow also seem to suggest that -gcloud might be deprecated and -gcr is the one to use?
s
I did try that: replaced every gcloud in the credHelpers block with gcr, then spun up the agent again, and got the same Unauthorized errors as before.
I haven’t figured this out, but thanks for all your help on this!
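For anyone following along, here is a minimal sketch of how to see which credential helper binaries a given config.json expects, and whether they are on PATH in the environment where you run it (for example, inside the agent container). It relies on Docker's naming convention that a helper called gcloud maps to a binary named docker-credential-gcloud, which matches the error message above. The function names and the example config are hypothetical, not part of the agent.

```python
import json
import shutil
from pathlib import Path


def required_helper_binaries(config: dict) -> dict:
    """Map each registry (or "<default>") to the helper binary Docker looks for."""
    helpers = {}
    # Per-registry helpers: {"gcr.io": "gcloud"} means docker-credential-gcloud.
    for registry, name in config.get("credHelpers", {}).items():
        helpers[registry] = f"docker-credential-{name}"
    # Optional default store, used for registries not listed in credHelpers.
    if config.get("credsStore"):
        helpers["<default>"] = f"docker-credential-{config['credsStore']}"
    return helpers


def report(config_path: str) -> None:
    """Print each expected helper binary and whether it is on PATH here."""
    config = json.loads(Path(config_path).read_text())
    for registry, binary in required_helper_binaries(config).items():
        status = "found" if shutil.which(binary) else "MISSING from PATH"
        print(f"{registry}: {binary} -> {status}")


if __name__ == "__main__":
    # Hypothetical example config, not taken from the thread.
    example = {"credHelpers": {"gcr.io": "gcloud", "us.gcr.io": "gcr"}}
    print(required_helper_binaries(example))
```

Running report("/root/.docker/config.json") from a Python REPL inside the container should show whether the mounted config points at a helper that the image does not ship.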
s
I am going to try to replicate this evening. One last question - you are building and pushing the image from this same VM where the agent is running? Or is the push from GHA? I ask because there is sometimes an additional access control level on GCP VMs that is separate from the IAM stuff. If you look at the VM in gcp console you can see this under access scopes
s
Thanks Sean.
The problem is with the agent. We figured that out in the thread above. Images are being created in our GCR, so we’re pretty certain auth is working on Github Actions, but is not working with the agent. The failing step is pushing image from the VM where the agent is running
We do have full access to all Cloud APIs selected for the VM i’m running on
s
Right - I just want to be sure the VM can access the registry outside the agent, which I don’t think we have tested? (The image is built and pushed from the GitHub action not the VM IIUC)
s
Ah, interesting, so we shouldn’t have issues with the agent related to auth for the container registry
s
Well, the agent does auth to the registry. This is what GCP refers to as client auth, and the dagster agent code tries to do so using the approaches mentioned in the thread. But in GCP I've seen cases where the infrastructure access policies get in the way of client auth even if the client auth is all correct, so it's worth ruling out
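A rough sketch of how one might rule out the access-scope layer from inside the VM, assuming the standard Compute Engine metadata endpoint for the default service account. The set of scopes treated as "good enough to pull" is my assumption (gcr.io images are stored in Cloud Storage, so storage read scopes or the broad cloud-platform scope should cover pulls); the function names are hypothetical.

```python
import urllib.request

# Metadata server endpoint listing the access scopes of the VM's default service account.
METADATA_SCOPES_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/scopes"
)

# Assumption: any of these scopes includes read access to the storage behind gcr.io.
PULL_CAPABLE_SCOPES = {
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/devstorage.read_only",
    "https://www.googleapis.com/auth/devstorage.read_write",
    "https://www.googleapis.com/auth/devstorage.full_control",
}


def parse_scopes(body: str) -> set:
    """The endpoint returns one scope URL per line."""
    return {line.strip() for line in body.splitlines() if line.strip()}


def can_pull(scopes: set) -> bool:
    """True if at least one scope should allow pulling images."""
    return bool(scopes & PULL_CAPABLE_SCOPES)


def fetch_scopes(timeout: float = 2.0) -> set:
    """Query the metadata server; this only works from inside a GCP VM."""
    request = urllib.request.Request(
        METADATA_SCOPES_URL, headers={"Metadata-Flavor": "Google"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_scopes(response.read().decode())


if __name__ == "__main__":
    scopes = fetch_scopes()
    print("\n".join(sorted(scopes)))
    print("pull-capable scope present:", can_pull(scopes))
```

Scopes only cap what the VM's service account may do; the IAM roles on that account still need to allow reading the registry.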
s
Okay, thanks for clarifying
@Sean Lopp did you happen to try to replicate last night? no worries if not
s
hey scott - I just spent some time on this and was able to overcome the error. I think the issue is related to how you are running docker as root. So originally you were running this command:
sudo docker run \
    --network=dagster_cloud_agent \
    --volume $PWD/dagster.yaml:/opt/dagster/app/dagster.yaml:ro \
    --volume /var/run/docker.sock:/var/run/docker.sock \
    --volume ~/.docker/config.json:/root/.docker/config.json:ro \
    --restart on-failure \
    -it docker.io/dagster/dagster-cloud-agent:latest \
    dagster-cloud agent run /opt/dagster/app
That was trying to mount the docker config from your local user, ~/.docker/config.json, to be used within the agent. But since you are running docker with sudo, the actual config you want is in /root/.docker/config.json
So the following sequence of steps worked for me:
1. In the GCP VM, set up docker to authenticate and ensure the image can be pulled correctly
gcloud auth login
gcloud auth print-access-token | sudo docker login -u oauth2accesstoken --password-stdin us-central1-docker.pkg.dev

# verify pull 
sudo docker pull us-central1-docker.pkg.dev/myhybrid-200215/dagit/loppster:latest
2. Then when you launch the agent, mount the root docker config volume
sudo docker run --network=dagster_cloud_agent --volume $PWD/dagster.yaml:/opt/dagster/app/dagster.yaml:ro --volume /var/run/docker.sock:/var/run/docker.sock --volume /root/.docker/config.json:/root/.docker/config.json:ro -it docker.io/dagster/dagster-cloud-agent:latest dagster-cloud agent run /opt/dagster/app
That command leaves out the --restart on-failure bit that you might want to add back in
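To make the home-directory mix-up described above concrete, here is a small sketch that builds the same docker run arguments with the config path passed in explicitly, so it is obvious which host file ends up mounted into the agent. It assumes root's home is /root and that logging in via sudo wrote credentials there, as described in this thread; agent_run_args is a hypothetical helper, not part of dagster-cloud.

```python
import os
import shlex


def agent_run_args(docker_config: str, dagster_yaml: str) -> list:
    """Build the docker run argv from the thread with an explicit config path."""
    return [
        "docker", "run",
        "--network=dagster_cloud_agent",
        "--volume", f"{dagster_yaml}:/opt/dagster/app/dagster.yaml:ro",
        "--volume", "/var/run/docker.sock:/var/run/docker.sock",
        # Host path on the left, path inside the agent container on the right.
        "--volume", f"{docker_config}:/root/.docker/config.json:ro",
        "--restart", "on-failure",
        "-it", "docker.io/dagster/dagster-cloud-agent:latest",
        "dagster-cloud", "agent", "run", "/opt/dagster/app",
    ]


if __name__ == "__main__":
    # "~" expands to the home of whoever runs this script (or types the command)...
    user_config = os.path.expanduser("~/.docker/config.json")
    # ...while the thread's fix uses root's config. Assumption: root's home is /root.
    root_config = "/root/.docker/config.json"
    print("user config:", user_config)
    print("root config:", root_config)
    args = agent_run_args(root_config, os.path.abspath("dagster.yaml"))
    print(shlex.join(["sudo"] + args))
```

The script only prints the command; it does not start the agent.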
s
thanks @Sean Lopp! I did try a number of times with /root/.docker/config.json without success. I took a look at your Step 1 above with logging in and verifying the pull, but did you mean for me to replace the us-central1-docker.pkg.dev with a GCP equivalent, or to try the example as is?
s
yea you should replace that with your GCR registry. us-central1-docker.pkg.dev is a GCP Artifact Registry host, so it should be very similar
s
ok, will do. on the machine ive been testing dagster on, we are using a docker config that uses credHelpers only (which I don’t want to mess with as we have various infra pieces using it), and AFAIK auths is ignored if credHelpers is present. but i’ll test this on a separate VM where I can mess with the docker config
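On the credHelpers vs auths question, a simplified sketch of the lookup order as I understand it from Docker's documentation: a credHelpers entry applies only to the registry it names, then credsStore acts as the default store, and auths is consulted when neither applies. This is a model for reasoning about a config, not Docker's actual code, and credential_source is a hypothetical name.

```python
def credential_source(config: dict, registry: str) -> str:
    """Say which part of a docker config.json would supply credentials for a registry."""
    # 1. A helper registered for this exact registry wins.
    helper = config.get("credHelpers", {}).get(registry)
    if helper:
        return f"credHelpers -> docker-credential-{helper}"
    # 2. Otherwise a default credential store, if one is configured.
    store = config.get("credsStore")
    if store:
        return f"credsStore -> docker-credential-{store}"
    # 3. Otherwise a plain auths entry written by docker login.
    if registry in config.get("auths", {}):
        return "auths entry in config.json"
    return "no credentials configured"


if __name__ == "__main__":
    # Hypothetical config mixing a helper for gcr.io with a login-based auths entry.
    config = {
        "credHelpers": {"gcr.io": "gcloud"},
        "auths": {"us-central1-docker.pkg.dev": {"auth": "<redacted>"}},
    }
    for registry in ("gcr.io", "us-central1-docker.pkg.dev", "docker.io"):
        print(registry, "=>", credential_source(config, registry))
```

If that model holds, an auths entry for a registry that has no credHelpers entry would still be used, so a docker login against a host not listed in credHelpers may work without touching the existing helpers. Worth confirming on the separate VM.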