# dagster-plus
g
I have a question around GitHub Actions interacting with Dagster Cloud. Currently the runs take way too long for me because the base image is not cached. How can I enable caching? The pipeline looks like this:
- on a PR
  - install all Python dependencies (first conda, then pip inside the conda env) - the whole conda part is cached!
  - testing:
    - validation of code formatting
    - flake8 linting
    - mypy type checks
    - unit tests
  - dagster cloud deploy preview (dagster-io/dagster-cloud-cicd-action/preview@v0.2.5)
- on merge/commit to master
  - auto-increment the version number, update the changelog for the release & push a version tag
  - deploy to Dagster Cloud using dagster-io/dagster-cloud-cicd-action/deploy@v0.2.5

The Dagster step builds a Docker image from scratch and therefore it is not cached.
- How can I enable caching (just like it also works nicely with mamba/conda) so the build pipeline gets faster?
- It seems strange that I first install the dependencies and then the Dagster util builds another container. How can I somehow work straight inside the container and:
  - if every test passes, publish/forward that same image (already faster than re-building)
  - enable caching to speed up the builds
s
Not sure how you currently have it set up, but you can customize your build/push step, then run an update-only apply with branch deployments and regular updates. We do this so that we can install some private repositories using the `ssh` flag and docker buildx:
steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1

      - name: Build and push Docker image to ECR
        uses: docker/build-push-action@v3
        with:
          push: true
          tags: "${{ my_tag }}"
          cache-from: type=gha
          cache-to: type=gha,mode=max
          ssh: default

      - name: Prod Deployment to Dagster Cloud
        uses: dagster-io/dagster-cloud-cicd-action/update-only@v0.2.6
        with:
          location-file: "locations.yaml"
          dagit-url: https://whatnot.dagster.cloud/prod
          api-token: ${{ secrets.DAGSTER_PROD_AGENT_TOKEN }}
          image-tag: ${{ my_tag }}
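In that snippet `${{ my_tag }}` is just a placeholder, not real expression syntax, so substitute your own image reference. A minimal sketch of one way to define it, assuming a hypothetical ECR repository and tagging by commit SHA:

env:
  # Hypothetical registry/repository; the commit SHA keeps every build's tag unique.
  IMAGE_TAG: "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-dagster-repo:${{ github.sha }}"

and then use `${{ env.IMAGE_TAG }}` both in the `tags:` input of the build step and in the `image-tag:` input of the update-only step.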
g
firstly, I natively run the testing steps:
jobs:
  testing:
    defaults:
      run:
        shell: bash -l {0}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Install Conda environment from environment.yml
        uses: mamba-org/provision-with-micromamba@v12
        with:
          environment-file: environment.yml
          environment-name: base
          cache-downloads: true
          cache-env: true

      - name: Install non-conda dependencies
        run: |
          pip install -e .

      - name: 'Yamllint'
        uses: karancode/yamllint-github-action@v2.0.0
        with:
          yamllint_file_or_dir: 'yamllint_config.yaml'
          yamllint_strict: true
          yamllint_comment: true
        env:
          GITHUB_ACCESS_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: Checking formatting
        run: black --check MY_PROJECT MY_PROJECT_tests && isort --check-only MY_PROJECT MY_PROJECT_tests
      - name: If needed, commit black & isort changes to the pull request
        if: failure()
        run: |
          black MY_PROJECT MY_PROJECT_tests && isort MY_PROJECT MY_PROJECT_tests
          git config --global user.name 'autoblack'
          git config --global user.email 'autoblack_bot@corp.com'
          git remote set-url origin https://x-access-token:${{ secrets.GITHUB_TOKEN }}@github.com/$GITHUB_REPOSITORY
          git checkout $GITHUB_HEAD_REF
          git commit -am "fixup: Format Python code with Black"
          git push

      - name: Linting flake8
        run: flake8 MY_PROJECT MY_PROJECT_tests
      - name: Checking types
        run: mypy MY_PROJECT MY_PROJECT_tests
      - name: setup of DBT dependencies
        run: cd MY_PROJECT_dbt && dbt deps
      - name: Unit-tests
        run: pytest --ignore=MY_PROJECT_dbt/dbt_packages .
and then, secondly, I reach out to the deployment (or preview) workflow (and create a tag on master):
branches:
      - main

jobs:
  called_testing:
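    # Note: for `uses: ./.github/workflows/1testing.yml` to work as a reusable
    # workflow, the testing workflow above presumably also declares an
    # `on: workflow_call:` trigger in addition to its pull-request trigger.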
    uses: ./.github/workflows/1testing.yml
  release:
    needs: called_testing
    runs-on: ubuntu-latest
    steps:
      - name: install zest
        run: pip install zest.releaser==6.22.2
      - name: checkout
        uses: actions/checkout@v3
        with:
          token: ${{ secrets.BUILD_SVC_PAT }}
      - name: make release
        run: |
          git config --global user.name 'autorelease'
          git config --global user.email 'autorelease_bot@corp.com'
          git remote set-url origin https://x-access-token:${{ secrets.BUILD_SVC_PAT }}@github.com/$GITHUB_REPOSITORY
          fullrelease --no-input

      - name: Login to GitHub Container Registry
        uses: docker/login-action@v1
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Run Dagster Cloud CI/CD action
        uses: dagster-io/dagster-cloud-cicd-action/deploy@v0.2.5
        with:
          location-file: locations.yaml
          dagit-url: https://${{ secrets.DAGSTER_CLOUD_ORGANIZATION }}.dagster.cloud/${{ secrets.DAGSTER_CLOUD_DEPLOYMENT }}
          api-token: ${{ secrets.DAGSTER_CLOUD_AGENT_TOKEN }}
where `dagster-io/dagster-cloud-cicd-action/deploy` builds the whole Dockerfile from scratch, without caching (i.e. not re-using the already installed dependencies from before).
s
Yeah, what I'm suggesting is that you use the `update-only` version of the cloud CI/CD action, so that you can customize the caching with the buildx action to something you prefer. If you want to do just one install of the dependencies, you'd probably need to build your updated Docker image first, then run your tests in that container, then push that image if it works.
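A rough sketch of that flow, reusing the `${{ my_tag }}` placeholder from above and assuming the tests can be started with a plain `pytest` inside the image (adjust the commands and the update-only inputs to your setup):

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Build the image into the local Docker daemon (not pushed yet)
        uses: docker/build-push-action@v3
        with:
          load: true                  # makes the image available to `docker run` below
          push: false
          tags: "${{ my_tag }}"
          cache-from: type=gha
          cache-to: type=gha,mode=max

      - name: Run the test suite inside the freshly built image
        run: docker run --rm "${{ my_tag }}" pytest

      - name: Push the same image if the tests passed
        uses: docker/build-push-action@v3
        with:
          push: true                  # layers come from the cache, so this is mostly an upload
          tags: "${{ my_tag }}"
          cache-from: type=gha
          cache-to: type=gha,mode=max

      - name: Point Dagster Cloud at the new image
        uses: dagster-io/dagster-cloud-cicd-action/update-only@v0.2.6
        with:
          location-file: locations.yaml
          dagit-url: https://my-org.dagster.cloud/prod   # hypothetical organization URL
          api-token: ${{ secrets.DAGSTER_CLOUD_AGENT_TOKEN }}
          image-tag: ${{ my_tag }}

(Registry login, e.g. via docker/login-action, is assumed to have happened in an earlier step.)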
g
This sounds very close to what I want. However, so far I have not worked with buildkit (only basic docker). Is there a more complete example (perhaps including caching) available anywhere?
s
yeah, the GH action page has a lot of examples on it. I think the other change is that you could also consider putting your format/lint/test commands into a Makefile, so that you can run `docker run my_image format`, `docker run my_image lint`, etc.
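As CI steps that pattern would look roughly like this (a sketch, assuming the image's entrypoint dispatches to hypothetical Makefile targets named `format`, `lint` and `test`, with `my_image` being the image built in the previous step):

      - name: Check formatting inside the built image
        run: docker run --rm my_image format

      - name: Lint inside the built image
        run: docker run --rm my_image lint

      - name: Run unit tests inside the built image
        run: docker run --rm my_image test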
g
I have created these makefile targets:
make fmt-docker
make lint-docker
make test-myrepository
to translate the steps directly into the container.
- The things around caching are still unclear to me.
- How do I handle side effects (inside/outside the container)?
  - The autoformatter was actually trying to auto-format (and then commit the changes).
  - The release incrementer was changing the tag and pushing.
@Stephen Bailey how can I enable the cache-from/cache-to when using the make commands? These targets internally call out to docker-compose, like https://github.com/dehume/big-data-madison-dagster/blob/main/docker-compose.yml#L93, in a multi-stage build.
What I mean is: before calling out to `docker/build-push-action@v3`, I would think it makes sense to do the linting/testing (to only publish the image in case the tests succeed).
In the docker-compose file I have added the `cache_from`/`cache_to` references. However, in the logs I can read:

importing cache manifest from type=gha #10 ERROR: invalid reference format

so it is somehow not happy.
I also face a 2nd problem: dbt dependencies need to be installed outside the image in CI first. This feels strange.
@Stephen Bailey I think I almost got it. But I do not want to push the cache-only image (used for linting and testing) over to the registry: https://stackoverflow.com/questions/73484162/docker-run-image-from-gha-or-local-cache. Do you know how I could get it to execute (the linting & testing) and only push the final app / Dagster workspace image over?
s
Can you just set `with: {push: false}` in the `cache_base_builder` step?
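For a multi-stage Dockerfile, and if you drive the build with docker/build-push-action instead of docker-compose, that could look roughly like this (a sketch; `base_builder` and `app` are hypothetical stage names, and `${{ my_tag }}` is the same placeholder as above):

      - name: Build the base/test stage (cached but never pushed)
        uses: docker/build-push-action@v3
        with:
          target: base_builder        # hypothetical stage used for linting/testing
          push: false
          load: true                  # so the tests can `docker run` it locally
          tags: "my-image:test"
          cache-from: type=gha
          cache-to: type=gha,mode=max

      - name: Build and push the final app stage (re-uses the cached layers)
        uses: docker/build-push-action@v3
        with:
          target: app                 # hypothetical final Dagster workspace stage
          push: true
          tags: "${{ my_tag }}"
          cache-from: type=gha
          cache-to: type=gha,mode=max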
👍 1
g
I find it strange that the `--load` (output to docker) step still takes almost 3 minutes, but it is much quicker / better cached now.