
geoHeil

06/03/2022, 6:05 PM
I am in the process of evaluating Dagster Cloud. I have an existing setup (docker-compose-based, launching runs in Docker) and I am trying to migrate it to the managed edition of Dagster Cloud. However, I have some questions regarding https://docs.dagster.cloud/guides/continuous-integration. My setup is similar to https://github.com/geoHeil/dagster-ssh-demo/blob/master/docker-compose.yml, i.e. a docker-compose-based one with the queued run coordinator. The desired final result is something like:
1. Testing pipeline for each commit:
   - check out the code
   - check compliance; if non-compliant, auto-format and push the formatted version back to GitHub
   - reformatting, linting, type checking
   - unit tests
   - pushing a preview to Dagit via Dagster Cloud
2. Manual review of the deploy preview in Dagster Cloud:
   - manual review
   - when merging the MR to master, continue with (3)
3. Main/master pipeline:
   - only runs on the main branch
   - all the tests are run again
   - the version number is incremented, at least via semver, ideally via something like zest.releaser (https://github.com/zestsoftware/zest.releaser), where it is derived from a semantic changelog
   - the Docker image is deployed to the registry
   - Dagster updates the images and deploys the new version of the code
Ideally, I do not need to build the container twice. Rather, I would build it once (i.e. create my conda environment with all the dependencies only once), then forward/increment the version number and tag the image once all the tests have passed.
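A minimal sketch of the semver part of step 3, using only the Python standard library: given the current version and which part to bump, it returns the next version. It handles plain MAJOR.MINOR.PATCH strings only; pre-release suffixes and zest.releaser's changelog-driven logic are out of scope, and bump_version is a hypothetical helper name.

```python
import re

# Plain MAJOR.MINOR.PATCH only; pre-release/build metadata is not handled here.
_SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def bump_version(version: str, part: str) -> str:
    """Return `version` with the given part ("major", "minor", "patch") incremented."""
    match = _SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a plain MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(g) for g in match.groups())
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


if __name__ == "__main__":
    print(bump_version("1.4.2", "minor"))  # 1.5.0
```

Lower parts reset to zero on a bump, as semver prescribes, so 1.4.2 becomes 2.0.0 on a major bump.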
I already have something that performs the testing, formatting, and linting in conda.
When trying to push to the GitHub container registry (ghcr), I get a permission-denied error: error: denied: requested access to the resource is denied
By now I can get a preview of the image to be pushed to the ghcr registry, tagged with a commit SHA.
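The denied error itself most likely comes down to missing registry credentials. A separate gotcha when building ghcr references by hand: Docker requires repository names to be lowercase, so an owner such as geoHeil has to appear as geoheil. A sketch of building ghcr.io/<owner>/<repo>:<short sha>; the helper name ghcr_reference and the 7-character short SHA default are assumptions.

```python
import re

# Docker reference rules: repository path components must be lowercase,
# while tags may use [A-Za-z0-9_.-] and must not start with '.' or '-'.
_TAG = re.compile(r"^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$")


def ghcr_reference(owner: str, repo: str, commit_sha: str, length: int = 7) -> str:
    """Build ghcr.io/<owner>/<repo>:<short sha>, lowercasing owner and repo."""
    tag = commit_sha[:length]
    if not _TAG.match(tag):
        raise ValueError(f"invalid image tag: {tag!r}")
    return f"ghcr.io/{owner.lower()}/{repo.lower()}:{tag}"


if __name__ == "__main__":
    print(ghcr_reference("geoHeil", "dagster-ssh-demo", "337bd4e9a1c2"))
    # ghcr.io/geoheil/dagster-ssh-demo:337bd4e
```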

daniel

06/03/2022, 6:48 PM
Hi Georg, just checking something first: do you have a Dagster Cloud account set up? Right now we're in early access, so we'd need to work with you to set up a trial if you want to actually see your code in Dagster Cloud once you have the CI/CD set up. For the manual review step you mentioned, you might find this feature helpful: https://docs.dagster.cloud/guides/code-previews, which lets you view a read-only preview of your jobs in Dagster Cloud. The Docker registry issue usually points to some kind of missing secret that the GitHub action needs in order to log in to the Docker registry. I would probably need to see the exact details of your GitHub action YAML to give more specific guidance on how to proceed.

geoHeil

06/03/2022, 6:48 PM
yes that one is set up and the docker-based agent is provisioned
Exactly, but I am still struggling to get this feature to work.
Besides the CI pipeline, I face three additional problems:
1) Various environment variables and volumes were previously mapped in the OSS Docker run launcher (https://github.com/geoHeil/dagster-ssh-demo/blob/master/dagster.yaml#L13). How can I get them back in Cloud?
2) How can I select the desired environment? Previously, https://github.com/geoHeil/dagster-ssh-demo/blob/master/Dockerfile#L47 would run
```dockerfile
CMD ["dagster", "api", "grpc", "-h", "0.0.0.0", "-p", "4000", "-f", "MWF/repository.py", "--attribute", "prod"]
```
to select the desired (dev/staging/prod) environment.
3) I am still very unfamiliar with the locations.yaml configuration file, and it is unclear to me whether (1) and (2) fit into it.
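The Dockerfile CMD above hard-codes prod. A sketch of how the same dagster api grpc invocation could be parameterized per environment, reusing exactly the flags from that CMD; the assumption that MWF/repository.py exposes one attribute per environment named dev, staging and prod is hypothetical. This only matters for the OSS setup, since in Dagster Cloud the agent starts the gRPC servers and the environment is picked via python_file/attribute in locations.yaml.

```python
from typing import List

# Hypothetical: MWF/repository.py defines one repository attribute per environment.
ENVIRONMENTS = ("dev", "staging", "prod")


def grpc_command(environment: str, python_file: str = "MWF/repository.py",
                 port: int = 4000) -> List[str]:
    """Return the `dagster api grpc` argv from the Dockerfile CMD for one environment."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    return [
        "dagster", "api", "grpc",
        "-h", "0.0.0.0",
        "-p", str(port),
        "-f", python_file,
        "--attribute", environment,
    ]


if __name__ == "__main__":
    print(grpc_command("staging"))
```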

daniel

06/03/2022, 6:56 PM
Yeah, the locations.yaml is where you can configure python_file and attribute. There's also a container_context field in locations.yaml which is where things like volumes will go.
Here is an example locations.yaml file that uses container_context with docker: https://github.com/dagster-io/dagster-cloud-cicd-action/blob/main/example/container-context/locations.yaml
but I think it's going to be difficult to verify that it's working the way that you expect until you have a Dagster Cloud account set up?

geoHeil

06/03/2022, 6:59 PM
I already have one set up, and the agent is running.

daniel

06/03/2022, 6:59 PM
ah sorry, I missed that
So you can put that same config in your dagster.yaml for cloud as well (and you don't need the postgres env vars anymore) - so it could look like:
```yaml
user_code_launcher:
  module: dagster_cloud.workspace.docker
  class: DockerUserCodeLauncher
  config:
    network: dagster_network
    container_kwargs:
      auto_remove: true
      volumes:
        - /Users/geoheil/Downloads/fooo/dagster-ssh-demo/warehouse_location_dagster:/opt/dagster/dagster_home/src/warehouse_location
```
and that will now apply both to your gRPC servers and to any launched runs
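The volume entry above hard-codes an absolute path under /Users/geoheil. Docker expects the host side of a bind mount to be an absolute path, so a sketch that derives the "host:container" string from a directory relative to wherever the repository is checked out on the agent host might look like this; bind_volume and the repo_root argument are hypothetical.

```python
import os


def bind_volume(host_dir: str, container_dir: str, repo_root: str) -> str:
    """Return a Docker 'host:container' bind spec with an absolute host path.

    A relative `host_dir` is resolved against `repo_root`; an absolute one is kept.
    """
    if not os.path.isabs(container_dir):
        raise ValueError("container path must be absolute")
    host_path = os.path.abspath(os.path.join(repo_root, host_dir))
    return f"{host_path}:{container_dir}"


if __name__ == "__main__":
    print(bind_volume("warehouse_location_dagster",
                      "/opt/dagster/dagster_home/src/warehouse_location",
                      os.getcwd()))
```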
And your locations.yaml could now look like:
```yaml
locations:
  your_location_name_here:
    build: .
    registry: your_docker_registry_here
    python_file: MWF/repository.py
    attribute: prod
```
and unlike in OSS, you don't need to worry about spinning up gRPC servers anymore; the agent handles that for you now
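If one code location per environment is wanted, a locations.yaml like the one above could be generated instead of hand-edited. A minimal standard-library sketch that emits the same keys (build, registry, python_file, attribute); the location names (mwf_dev, mwf_prod) and the registry value are hypothetical, and a real setup would more likely use a YAML library.

```python
from typing import Dict, Mapping


def render_locations(locations: Mapping[str, Mapping[str, str]]) -> str:
    """Render a flat locations.yaml: a top-level `locations` key, one mapping per location.

    Only handles plain string values that need no YAML quoting.
    """
    lines = ["locations:"]
    for name, fields in locations.items():
        lines.append(f"  {name}:")
        for key, value in fields.items():
            lines.append(f"    {key}: {value}")
    return "\n".join(lines) + "\n"


def location_for(environment: str, registry: str) -> Dict[str, str]:
    # Same image build for every environment; only `attribute` differs.
    return {
        "build": ".",
        "registry": registry,
        "python_file": "MWF/repository.py",
        "attribute": environment,
    }


if __name__ == "__main__":
    registry = "ghcr.io/example-org/example-repo"  # hypothetical
    print(render_locations({f"mwf_{env}": location_for(env, registry)
                            for env in ("dev", "prod")}))
```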

geoHeil

06/03/2022, 7:12 PM
The additional environment variables would also go there, like in dagster.yaml, right?

daniel

06/03/2022, 7:12 PM
That's right, I just omitted them since in your OSS config they are all Postgres-related.
And with Cloud, the days when you need to care about Postgres are behind you.

geoHeil

06/03/2022, 7:13 PM
great.
Am I reading the other examples correctly that you assume setup.py-based package management for the automatic deployment to work? As I am using conda, I guess I will need to use https://dagster.slack.com/archives/D03J8B4D9LJ/p1654283686964189 for now.

daniel

06/03/2022, 7:22 PM
You can also provide your own Dockerfile and build it however you like
as long as it has a 'dagster' entrypoint and has dagster and dagster-cloud installed, everything should still work
the build directory has to have either a Dockerfile or a requirements.txt
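The rule above (the build directory needs either a Dockerfile or a requirements.txt) is easy to check before the action runs. A sketch; build_inputs is a hypothetical helper that only reports which of the two files is present and makes no assumption about which one the action prefers if both exist.

```python
import tempfile
from pathlib import Path
from typing import List

BUILD_FILES = ("Dockerfile", "requirements.txt")


def build_inputs(build_dir: str) -> List[str]:
    """Return which of Dockerfile / requirements.txt exist in `build_dir`.

    Raises if neither is present, since the build would have nothing to use.
    """
    root = Path(build_dir)
    found = [name for name in BUILD_FILES if (root / name).is_file()]
    if not found:
        raise FileNotFoundError(f"{build_dir}: needs a Dockerfile or requirements.txt")
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "Dockerfile").write_text("FROM python:3.9-slim\n")
        print(build_inputs(tmp))  # ['Dockerfile']
```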

geoHeil

06/03/2022, 7:24 PM
ok
If the image has several build stages, like https://github.com/geoHeil/dagster-ssh-demo/blob/master/Dockerfile#L43, how can I specify the desired stage?
Something like 'target: target-stage'?
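With plain Docker, a stage of a multi-stage Dockerfile is selected with docker build --target <stage>. Whether the Dagster Cloud action accepts a target key like the one suggested is not confirmed in this thread, so this sketch only shows the underlying docker invocation, e.g. for a custom build step; the stage and image names are hypothetical.

```python
from typing import List


def docker_build_command(image: str, target: str, context: str = ".",
                         dockerfile: str = "Dockerfile") -> List[str]:
    """argv for building one stage of a multi-stage Dockerfile and tagging the result."""
    return [
        "docker", "build",
        "--file", dockerfile,
        "--target", target,  # the name given via `FROM ... AS <target>`
        "--tag", image,
        context,
    ]


if __name__ == "__main__":
    # Hypothetical names; run with subprocess.run(cmd, check=True).
    print(docker_build_command("ghcr.io/example-org/example-repo:dev", "target-stage"))
```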

geoHeil

06/03/2022, 7:50 PM
One question whilst the image is building: is it possible to also push the image to ghcr? I see in the logs that the image is created but not pushed

daniel

06/03/2022, 7:54 PM
Is this the deploy action? That is supposed to run docker build, then docker push,
so if you have it configured to log in to ghcr, my expectation is that it would push it there too.

geoHeil

06/03/2022, 7:54 PM
No, it was the preview.
Understood, this sounds good,
i.e. if I can get the preview to work without polluting the ghcr 😉

daniel

06/03/2022, 7:54 PM
oh, preview doesn't currently push

geoHeil

06/03/2022, 7:55 PM
This means, however, that I need to build the conda environment twice: once for the unit tests, and a second time when the dagster-cloud preview task takes over (after all the tests are green).
Is there a way to only have to resolve these once?

daniel

06/03/2022, 7:56 PM
having a version of the preview action that doesn't build is a reasonable request, I can file an issue for that
similar to the update-only action, but for previews

geoHeil

06/03/2022, 7:57 PM
you mean straight from the conda env?
that would sound really convenient

daniel

06/03/2022, 7:57 PM
i was thinking you would build it, then use it for the preview, then keep that cached build for the deploy

geoHeil

06/03/2022, 7:58 PM
Hm, as written above, ideally yes. But so far I have not figured out how to do this. Keeping it would mean pushing it to the registry, though, I guess?

daniel

06/03/2022, 7:58 PM
Probably, yeah
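One way to avoid resolving the conda environment twice, assuming the built image is kept in the registry as discussed: derive a tag from a hash of the dependency files, so CI can look for an existing image with that tag and only rebuild when the dependencies change. A sketch; dependency_tag, the "deps-" prefix and the file names are hypothetical, and the registry lookup itself is left out.

```python
import hashlib
from typing import Mapping


def dependency_tag(files: Mapping[str, bytes], length: int = 12) -> str:
    """Content-addressed tag: identical dependency files give an identical tag.

    `files` maps file name to contents. Names and contents are length-prefixed
    before hashing so that different inputs cannot produce the same byte stream.
    """
    digest = hashlib.sha256()
    for name in sorted(files):
        encoded = name.encode("utf-8")
        data = files[name]
        digest.update(len(encoded).to_bytes(8, "big"))
        digest.update(encoded)
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return "deps-" + digest.hexdigest()[:length]


if __name__ == "__main__":
    env = b"name: mwf\ndependencies:\n  - python=3.9\n"  # hypothetical environment.yml
    print(dependency_tag({"environment.yml": env}))
```

Sorting the file names makes the tag independent of the order in which the files are passed in.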

geoHeil

06/03/2022, 8:01 PM
Do you have an example of how to enable caching for ghcr?
When I tried
docker/build-push-action@v3.0.0
with cache-from: type=gha and cache-to: type=gha,mode=max, it failed with: buildx failed with: error: cache export feature is currently not supported for docker driver. For ECR you are using https://github.com/dagster-io/dagster-cloud-cicd-action/blob/main/.github/workflows/example-minimal.yml#L25; how could this be adapted for ghcr?

daniel

06/03/2022, 8:04 PM
I don't have a caching example at my fingertips unfortunately
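Not an official Dagster example: the error means buildx is using its default docker driver, which cannot export caches, whereas a builder using the docker-container driver can (in GitHub Actions, docker/setup-buildx-action sets up such a builder before build-push-action runs). A sketch of the equivalent plain buildx commands with a registry cache stored next to the image on ghcr; the image and cache references are hypothetical, and docker login ghcr.io is assumed to have happened already.

```python
from typing import List


def buildx_commands(image: str, cache_ref: str, context: str = ".") -> List[List[str]]:
    """argv lists: create a docker-container builder, then build and push with a registry cache."""
    # The default `docker` driver cannot export caches, hence the extra builder.
    create_builder = ["docker", "buildx", "create", "--driver", "docker-container", "--use"]
    build = [
        "docker", "buildx", "build",
        "--cache-from", f"type=registry,ref={cache_ref}",
        "--cache-to", f"type=registry,ref={cache_ref},mode=max",
        "--tag", image,
        "--push",
        context,
    ]
    return [create_builder, build]


if __name__ == "__main__":
    # Hypothetical names; run each list with subprocess.run(cmd, check=True).
    for cmd in buildx_commands("ghcr.io/example-org/example-repo:337bd4e",
                               "ghcr.io/example-org/example-repo:buildcache"):
        print(" ".join(cmd))
```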

geoHeil

06/03/2022, 8:07 PM
No problem; perhaps you could add one to the examples section in the coming weeks?
I am seeing one strange behavior: after getting an initial (standalone) preview deploy working, and now integrating it into the bigger CI pipeline (i.e. tests first, then preview), the preview step starts to fail with: Cannot read property 'head' of undefined
d

daniel

06/03/2022, 8:26 PM
do you have a full stack trace?

geoHeil

06/03/2022, 10:09 PM
```
preview
failed 4 minutes ago in 4s
Run dagster-io/dagster-cloud-cicd-action/preview@v0.2.4
  with:
    github-token: ***
    location-file: ./locations.yaml
    dagit-url: https://***.dagster.cloud/***
    api-token: ***
    parallel: true
Error: Cannot read property 'head' of undefined
```
This is all I can see in the GitHub Actions UI.
Furthermore, @daniel, I get an unauthorized error from the Dagster agent when trying to pull from GHCR. Do you know what I must change there?

daniel

06/03/2022, 10:29 PM
you'll need to be logged in using the right docker credentials in the docker context that your agent is using
via docker login

geoHeil

06/03/2022, 10:30 PM
Also, the Dagster Cloud deploy action is indeed pushing the image to GHCR, but it is not tagging it with the current git tag / version from setup.py.
OK, let me try that.

daniel

06/03/2022, 10:31 PM
Re the "head" issue - I think the preview action may currently assume that it's being run from a pull request
Are you running it from a different context? Like a push?

geoHeil

06/03/2022, 10:34 PM
I was running preview from a PR
i.e. a push to an existing PR

daniel

06/03/2022, 10:35 PM
that's strange - I'm pretty sure the line that's failing is this:
```javascript
const commitSha = github.context.payload.pull_request.head.sha;
```
is there any reason you can think of that github wouldn't think you have a current pull request?
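That line reads payload.pull_request.head.sha, which only exists in pull_request event payloads; on a push event, payload.pull_request is undefined, which matches the 'head' error. A Python sketch of a more tolerant lookup for a custom CI step (not the action's actual code): it reads the event JSON that GitHub Actions writes to the file named by GITHUB_EVENT_PATH and falls back to GITHUB_SHA when there is no pull request. The function names are hypothetical.

```python
import json
import os
from typing import Any, Mapping, Optional


def commit_sha(payload: Mapping[str, Any], fallback: Optional[str] = None) -> str:
    """Head commit of the PR for pull_request events, otherwise `fallback`."""
    pull_request = payload.get("pull_request")
    if pull_request is not None:
        return pull_request["head"]["sha"]
    if fallback:
        return fallback
    raise KeyError("no pull_request in payload and no fallback SHA given")


def commit_sha_from_env() -> str:
    # Both variables are set by the GitHub Actions runner.
    with open(os.environ["GITHUB_EVENT_PATH"], encoding="utf-8") as fh:
        payload = json.load(fh)
    return commit_sha(payload, os.environ.get("GITHUB_SHA"))
```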

geoHeil

06/03/2022, 10:35 PM
Also regarding the login:
```shell
docker pull ghcr.io/complexity-science-hub/migration-world-formula:337bd4
```
works fine (locally) but not from the agent.
Let me try to restart the agent after enabling the login - perhaps this helps

daniel

06/03/2022, 10:37 PM
You may need to add one more line when restarting the agent depending on your environment:
```shell
--volume ~/.docker:/root/.docker \
```
to make sure that it can access your stored docker credentials when pulling the image
(assuming ~/.docker is where your local docker config/creds are)
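To see what the agent will find in a mounted ~/.docker, a diagnostic sketch that inspects config.json for ghcr.io. docker login stores either an inline "auth" entry under "auths" or delegates to a credential helper ("credsStore" or a per-registry "credHelpers" entry). In the helper case, the config file alone may not be enough: the docker-credential-<name> program has to be runnable wherever the config is read. ghcr_auth_info is a hypothetical helper and makes no claim about the agent's internals.

```python
import json
from pathlib import Path
from typing import Any, Mapping, Optional, Tuple


def ghcr_auth_info(config: Mapping[str, Any],
                   registry: str = "ghcr.io") -> Tuple[bool, Optional[str]]:
    """Return (has_inline_auth, credential_helper_name) for `registry` in a docker config.json."""
    entry = config.get("auths", {}).get(registry, {})
    has_inline = bool(entry.get("auth"))
    # A per-registry helper is reported first, else the global credential store.
    helper = config.get("credHelpers", {}).get(registry) or config.get("credsStore")
    return has_inline, helper


if __name__ == "__main__":
    path = Path("~/.docker/config.json").expanduser()
    if path.is_file():
        print(ghcr_auth_info(json.loads(path.read_text(encoding="utf-8"))))
```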

geoHeil

06/03/2022, 10:40 PM
I think this was the issue. Great, one thing is fixed.
This means two items are still open: 1) the preview failing with the missing 'head', and 2) the deploy not using the version from the git tag/setup.py to tag the image.

daniel

06/03/2022, 10:43 PM
Can you elaborate on 2)? It usually uses the git hash as the image tag, I believe.
For 1) I might need to see your GitHub action yaml

geoHeil

06/03/2022, 10:44 PM
Indeed, it does use the hash, but I would prefer it to use the tag.

daniel

06/03/2022, 10:44 PM
Which tag exactly?

geoHeil

06/03/2022, 10:45 PM
sure here you go:
```yaml
name: 2Previews
on: [push]
jobs:
  called_testing:
    # performs linting, type checking, formatting, ...
    uses: ./.github/workflows/1testing.yml
  preview:
    runs-on: ubuntu-latest
    needs:
      - called_testing
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3

      - name: Create Dagster Cloud Code Preview
        uses: dagster-io/dagster-cloud-cicd-action/preview@v0.2.4
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          location-file: ./locations.yaml
          dagit-url: https://${{ secrets.DAGSTER_CLOUD_ORGANIZATION }}.dagster.cloud/${{ secrets.DAGSTER_CLOUD_DEPLOYMENT }}
          api-token: ${{ secrets.DAGSTER_CLOUD_AGENT_TOKEN }}
```
As for which tag: the one of the current CI pipeline run, i.e. the latest/largest git tag (most of the time), or, perhaps better, the value of version from the Python package of the Dagster pipeline being deployed.
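Two ways to obtain such a version-based tag in CI, as a sketch: strip a leading "v" from a git tag such as v1.2.0, or read the version from setup.py. The setup.py variant is a regex that assumes version is passed as a plain string literal; both helpers are hypothetical and not part of the Dagster Cloud action.

```python
import re
from typing import Optional

# Matches version="..." or version='...' (but not python_version=... or __version__).
_SETUP_VERSION = re.compile(r"""\bversion\s*=\s*["']([^"']+)["']""")


def version_from_git_tag(tag: str) -> str:
    """'v1.2.0' -> '1.2.0'; tags without the prefix are returned unchanged."""
    return tag[1:] if tag.startswith("v") else tag


def version_from_setup_py(source: str) -> Optional[str]:
    """First version="..." literal in setup.py source, or None if there is none."""
    match = _SETUP_VERSION.search(source)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(version_from_git_tag("v1.2.0"))  # 1.2.0
    print(version_from_setup_py('setup(name="MWF", version="0.3.1")'))  # 0.3.1
```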

daniel

06/03/2022, 10:47 PM
If you change it to on: [pull_request], I think it may still trigger when you push to a branch that has an active pull request?

geoHeil

06/03/2022, 11:03 PM
Indeed, this seems to work.
This means the tagging topic is still open. However, I observed that the environment variables are somehow not yet read correctly, and resources such as a MinIO S3 client fail to instantiate.
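For environment variables that do not seem to arrive, a quick check run inside the launched container (or in the agent) can narrow down where they get lost. A sketch; missing_env_vars is a hypothetical helper, and the variable names in the usage part are stand-ins for whatever the MinIO/S3 resource actually reads.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env_vars(names: Iterable[str],
                     environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Names that are unset or empty in `environ` (defaults to os.environ)."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    # Hypothetical names for an S3-compatible (MinIO) resource.
    required = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "MINIO_ENDPOINT_URL"]
    print("missing:", missing_env_vars(required) or "none")
```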

daniel

06/04/2022, 6:23 PM
On the tagging topic: the update-only part of the GitHub action is a pretty thin layer on top of a 'dagster-cloud add-location' CLI call, described here: https://docs.dagster.cloud/guides/adding-code#adding-a-location. If you want custom tagging behavior, it could make sense to make that CLI call directly within your CI/CD, since what you're describing seems 100% reasonable, but not something I would necessarily expect users to want in general.
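Combined with the build-once goal from the start of the thread, one way to do custom tagging in your own CI step is to retag the image that was already built, tested and pushed under its commit SHA instead of rebuilding it. A sketch of the docker commands; retag_commands, the image names and the version are hypothetical, and the subsequent dagster-cloud CLI call is left out because its exact arguments should come from the linked docs.

```python
from typing import List


def retag_commands(repository: str, commit_tag: str, version: str) -> List[List[str]]:
    """argv lists that publish an existing `repository:commit_tag` image as `repository:version`.

    No rebuild happens: the image built for the commit is pulled, given a second
    tag and pushed again, so the registry already has all of its layers.
    """
    source = f"{repository}:{commit_tag}"
    target = f"{repository}:{version}"
    return [
        ["docker", "pull", source],
        ["docker", "tag", source, target],
        ["docker", "push", target],
    ]


if __name__ == "__main__":
    for cmd in retag_commands("ghcr.io/example-org/example-repo", "337bd4e", "1.5.0"):
        print(" ".join(cmd))
```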