# dagster-plus
g
I have a question around GitHub Actions interacting with Dagster Cloud. Currently the runs take way too long for me because the base image is not cached. How can I enable caching? The pipeline looks like this:
- on a PR
  - install all Python dependencies (first conda, then pip inside the conda env) - the whole conda part is cached!
  - testing:
    - validation of code formatting
    - flake8 linting
    - mypy type checks
    - unit tests
  - dagster cloud deploy preview (dagster-io/dagster-cloud-cicd-action/preview@v0.2.5)
- on merge/commit to master
  - auto-increment the version number, update the changelog for the release & push a version tag
  - deploy to Dagster Cloud using dagster-io/dagster-cloud-cicd-action/deploy@v0.2.5

The Dagster step builds a Docker image from scratch and therefore it is not cached.
- How can I enable caching (just like it also works nicely with mamba/conda) so the build pipeline gets faster?
- It seems strange that I first install the dependencies and then the Dagster util builds another container. How can I somehow work straight inside the container and:
  - if every test passes, publish/forward that same image (already faster than re-building)
  - enable caching to speed up the builds
s
Not sure how you currently have it set up, but you can customize your build/push step, then run an update-only apply with branch deployments and regular updates. We do this so that we can install some private repositories using the `ssh` flag and docker buildx:
steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1

      - name: Build and push Docker image to ECR
        uses: docker/build-push-action@v3
        with:
          push: true
          tags: "${{ my_tag }}"
          cache-from: type=gha
          cache-to: type=gha,mode=max
          ssh: default

      - name: Prod Deployment to Dagster Cloud
        uses: dagster-io/dagster-cloud-cicd-action/update-only@v0.2.6
        with:
          location-file: "locations.yaml"
          dagit-url: https://whatnot.dagster.cloud/prod
          api-token: ${{ secrets.DAGSTER_PROD_AGENT_TOKEN }}
          image-tag: ${{ my_tag }}
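In that snippet `${{ my_tag }}` is just a placeholder, not real expression syntax, so substitute your own image reference. A minimal sketch of one way to define it, assuming a hypothetical ECR repository and tagging by commit SHA:

env:
  # Hypothetical registry/repository; the commit SHA keeps every build's tag unique.
  IMAGE_TAG: "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-dagster-repo:${{ github.sha }}"

and then use `${{ env.IMAGE_TAG }}` both in the `tags:` input of the build step and in the `image-tag:` input of the update-only step.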
g
firstly, I natively run the testing steps:
jobs:
  testing:
    defaults:
      run:
        shell: bash -l {0}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Install Conda environment from environment.yml
        uses: mamba-org/provision-with-micromamba@v12
        with:
          environment-file: environment.yml
          environment-name: base
          cache-downloads: true
          cache-env: true

      - name: Install non-conda dependencies
        run: |
          pip install -e .

      - name: 'Yamllint'
        uses: karancode/yamllint-github-action@v2.0.0
        with:
          yamllint_file_or_dir: 'yamllint_config.yaml'
          yamllint_strict: true
          yamllint_comment: true
        env:
          GITHUB_ACCESS_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: Checking formatting
        run: black --check MY_PROJECT MY_PROJECT_tests && isort --check-only MY_PROJECT MY_PROJECT_tests
      - name: If needed, commit black & isort changes to the pull request
        if: failure()
        run: |
          black MY_PROJECT MY_PROJECT_tests && isort MY_PROJECT MY_PROJECT_tests
          git config --global user.name 'autoblack'
          git config --global user.email 'autoblack_bot@corp.com'
          git remote set-url origin https://x-access-token:${{ secrets.GITHUB_TOKEN }}@github.com/$GITHUB_REPOSITORY
          git checkout $GITHUB_HEAD_REF
          git commit -am "fixup: Format Python code with Black"
          git push

      - name: Linting flake8
        run: flake8 MY_PROJECT MY_PROJECT_tests
      - name: Checking types
        run: mypy MY_PROJECT MY_PROJECT_tests
      - name: setup of DBT dependencies
        run: cd MY_PROJECT_dbt && dbt deps
      - name: Unit-tests
        run: pytest --ignore=MY_PROJECT_dbt/dbt_packages .
and then, secondly, I reach out to the deployment (or preview) workflow (and create a tag on master):
branches:
      - main

jobs:
  called_testing:
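    # Note: for `uses: ./.github/workflows/1testing.yml` to work as a reusable
    # workflow, the testing workflow above presumably also declares an
    # `on: workflow_call:` trigger in addition to its pull-request trigger.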
    uses: ./.github/workflows/1testing.yml
  release:
    needs: called_testing
    runs-on: ubuntu-latest
    steps:
      - name: install zest
        run: pip install zest.releaser==6.22.2
      - name: checkout
        uses: actions/checkout@v3
        with:
          token: ${{ secrets.BUILD_SVC_PAT }}
      - name: make release
        run: |
          git config --global user.name 'autorelease'
          git config --global user.email 'autorelease_bot@corp.com'
          git remote set-url origin https://x-access-token:${{ secrets.BUILD_SVC_PAT }}@github.com/$GITHUB_REPOSITORY
          fullrelease --no-input

      - name: Login to GitHub Container Registry
        uses: docker/login-action@v1
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Run Dagster Cloud CI/CD action
        uses: dagster-io/dagster-cloud-cicd-action/deploy@v0.2.5
        with:
          location-file: locations.yaml
          dagit-url: https://${{ secrets.DAGSTER_CLOUD_ORGANIZATION }}.dagster.cloud/${{ secrets.DAGSTER_CLOUD_DEPLOYMENT }}
          api-token: ${{ secrets.DAGSTER_CLOUD_AGENT_TOKEN }}
where `dagster-io/dagster-cloud-cicd-action/deploy` builds the whole Dockerfile from scratch, without caching (i.e. not re-using the already installed dependencies from before).
s
Yeah, what I'm suggesting is that you use the `update-only` version of the cloud CI/CD action, so that you can customize the caching with the buildx action to something you prefer. If you want to do just one install of the dependencies, you'd probably need to build your updated Docker image first, then run your tests in that container, then push that image if it works.
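A rough sketch of that flow, reusing the `${{ my_tag }}` placeholder from above and assuming the tests can be started with a plain `pytest` inside the image (adjust the commands and the update-only inputs to your setup):

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Build the image into the local Docker daemon (not pushed yet)
        uses: docker/build-push-action@v3
        with:
          load: true                  # makes the image available to `docker run` below
          push: false
          tags: "${{ my_tag }}"
          cache-from: type=gha
          cache-to: type=gha,mode=max

      - name: Run the test suite inside the freshly built image
        run: docker run --rm "${{ my_tag }}" pytest

      - name: Push the same image if the tests passed
        uses: docker/build-push-action@v3
        with:
          push: true                  # layers come from the cache, so this is mostly an upload
          tags: "${{ my_tag }}"
          cache-from: type=gha
          cache-to: type=gha,mode=max

      - name: Point Dagster Cloud at the new image
        uses: dagster-io/dagster-cloud-cicd-action/update-only@v0.2.6
        with:
          location-file: locations.yaml
          dagit-url: https://my-org.dagster.cloud/prod   # hypothetical organization URL
          api-token: ${{ secrets.DAGSTER_CLOUD_AGENT_TOKEN }}
          image-tag: ${{ my_tag }}

(Registry login, e.g. via docker/login-action, is assumed to have happened in an earlier step.)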
g
This sounds very close to what I want. However, so far I have not worked with buildkit (only basic docker). Is there a more complete example (perhaps including caching) available anywhere?
s
yeah, the GH action page has a lot of examples on it. I think the other change is that you could also consider putting your format/lint/test commands into a Makefile, so that you can run `docker run my_image format`, `docker run my_image lint`, etc.
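As CI steps that pattern would look roughly like this (a sketch, assuming the image's entrypoint dispatches to hypothetical Makefile targets named `format`, `lint` and `test`, with `my_image` being the image built in the previous step):

      - name: Check formatting inside the built image
        run: docker run --rm my_image format

      - name: Lint inside the built image
        run: docker run --rm my_image lint

      - name: Run unit tests inside the built image
        run: docker run --rm my_image test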
g
I have created these makefile targets:
make fmt-docker
make lint-docker
make test-myrepository
to translate the steps directly into the container.
- The things around caching are still unclear to me.
- How do I handle side effects (inside/outside the container)?
  - The autoformatter was actually trying to auto-format (and then commit the changes).
  - The release incrementer was changing the tag and pushing.
@Stephen Bailey how can I enable the cache-from/cache-to when using the make commands? These targets internally call out to docker-compose, like https://github.com/dehume/big-data-madison-dagster/blob/main/docker-compose.yml#L93, in a multi-stage build.
What I mean is: before calling out to `docker/build-push-action@v3`, I would think it makes sense to do the linting/testing (to only publish the image in case the tests succeed).
In the docker-compose file I have added the `cache_from`/`cache_to` references. However, in the logs I can read:

importing cache manifest from type=gha #10 ERROR: invalid reference format

so it is somehow not happy.
I also face a 2nd problem: dbt dependencies need to be installed outside the image in CI first. This feels strange.
@Stephen Bailey I think I almost got it. But I do not want to push the cache-only image (used for linting and testing) over to the registry: https://stackoverflow.com/questions/73484162/docker-run-image-from-gha-or-local-cache. Do you know how I could get it to execute (the linting & testing) and only push the final app / Dagster workspace image over?
s
Can you just set `with: {push: false}` in the `cache_base_builder` step?
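For a multi-stage Dockerfile, and if you drive the build with docker/build-push-action instead of docker-compose, that could look roughly like this (a sketch; `base_builder` and `app` are hypothetical stage names, and `${{ my_tag }}` is the same placeholder as above):

      - name: Build the base/test stage (cached but never pushed)
        uses: docker/build-push-action@v3
        with:
          target: base_builder        # hypothetical stage used for linting/testing
          push: false
          load: true                  # so the tests can `docker run` it locally
          tags: "my-image:test"
          cache-from: type=gha
          cache-to: type=gha,mode=max

      - name: Build and push the final app stage (re-uses the cached layers)
        uses: docker/build-push-action@v3
        with:
          target: app                 # hypothetical final Dagster workspace stage
          push: true
          tags: "${{ my_tag }}"
          cache-from: type=gha
          cache-to: type=gha,mode=max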
👍 1
g
I find it strange that the `--load` (output to docker) step still takes almost 3 minutes, but it is much quicker / better cached now.