geoHeil

07/27/2022, 7:08 PM
I have a DBT project which is sourcing its connection details from the environment:
user: "{{ env_var('WAREHOUSE_POSTGRES_USER') }}"
The `dagster.yaml` file of the dagster-cloud runner provides these env vars:
run_launcher:
  module: dagster_docker
  class: DockerRunLauncher
  config:
    env_vars:
      - WAREHOUSE_POSTGRES_DB=secret_value
      - WAREHOUSE_POSTGRES_USER=secret_value
      - WAREHOUSE_POSTGRES_PASSWORD=secret_value
However, this GitHub Actions workflow:
name: myname
on: [pull_request]
jobs:
  preview:
    env:
      WAREHOUSE_POSTGRES_DB: secret_value
      WAREHOUSE_POSTGRES_PASSWORD: secret_value
      WAREHOUSE_POSTGRES_USER: secret_value
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3

      - name: Create Dagster Cloud Code Preview
        uses: dagster-io/dagster-cloud-cicd-action/preview@v0.2.5
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          location-file: ./locations.yaml
          dagit-url: https://${{ secrets.DAGSTER_CLOUD_ORGANIZATION }}.dagster.cloud/${{ secrets.DAGSTER_CLOUD_DEPLOYMENT }}
          api-token: ${{ secrets.DAGSTER_CLOUD_AGENT_TOKEN }}
fails with:
/usr/bin/docker run ghcr.io/my_org/my_project:git_hash dagster-cloud workspace snapshot ***_cloud hash_token --url https://***.dagster.cloud/*** --api-token *** --image ghcr.io/my_org/my_project:git_hash --python-file MWF/repository.py --attribute ***_cloud
  Encountered an error:
  Parsing Error
    Env var required but not provided: 'WAREHOUSE_POSTGRES_USER'
Where do I need to specify the env variables so they can be found when creating the preview? dagster-dbt and the dbt CLI need them to import the DBT-based assets; otherwise the import fails with an error code.
By the way, such an error should fail the check, but it generates valid output and the build shows green,
even though the generated preview is garbage (it only shows an error message).
I fear that the env variables are not passed into the container. How can this be changed?

Dagster Jarred

07/27/2022, 8:15 PM
hey @geoHeil - I’ve got a cool feature in a pre-release state that I think would be great for this use case. While Code Previews won’t run in the agent, our new feature, Branch Deployments, creates temporary deployments for each new branch. This will also resolve your env vars issue. If you’ve got 30-40m for a call in the next couple of days, we can help you get onboarded
here’s more info if you’re interested - https://docs.dagster.cloud/guides/branch-deployments

geoHeil

07/28/2022, 3:34 AM
How does a branch deployment tie the branch to the materialized resources, i.e. in the case of DBT perhaps a branch-specific schema?
For now, I want to get `Fatal error in the dbt CLI (return code 2)` solved. After hardcoding the env vars inside the Docker image, the previous error in the logs vanished, but the error in the preview does not go away.
Before looking into branch deployments: do you think you could help me get the preview to work?
Otherwise, this afternoon or tomorrow morning would be great for a call.
Do I see correctly that the Preview (when DBT is used) can never work, as it is not running on the agent and DBT's DB connectivity cannot be mocked away? I.e. I would need to point it at a real database, and only then could it draft the schema (a DBT compile usually does not need the DB, yet it only starts when the DB is available). Or would it be possible to parse the DAG somehow without requiring full DB connectivity?

daniel

07/28/2022, 2:53 PM
Hi georg - that's right. This is one of the main benefits of branch deployments - they operate on the agent in a much more similar environment. You would need to include similar env vars in the github action for previews to work

geoHeil

07/28/2022, 2:58 PM
But this is what I have done: I included the env vars and it still fails. They are set 1) in the action YAML that specifies the job and 2) in the Dockerfile (but with dummy values). DBT should not need an active connection to compile the SQL, only to run it. Is it somehow possible for dagster to parse the DBT project without an active DB?
Furthermore: is it possible to keep the deploy previews read-only, i.e. not to start sensors or materialize jobs, to prevent accidental overwrites?

daniel

07/28/2022, 2:58 PM
I think it's not only possible but required - they have to be read only
I'm not sure about the dbt-specific parts of this question. @owen do you know, or know the right person to ask, whether dagster-dbt can parse dbt without an active connection?

geoHeil

07/28/2022, 3:01 PM
This is still unclear to me from reading the docs that @Dagster Jarred shared. The read-only question is not specific to DBT; it is about how to handle asset materializations to arbitrary systems / IO managers. E.g. dbt Cloud creates a schema per PR (fine). That is easy, but here all the IO managers would need to be adapted to a) support the branch-deploy feature and b) provide a cleanup method to remove branch deploys that are no longer needed.

daniel

07/28/2022, 3:01 PM
With branch deployments, you'll be able to tell within your code whether you're in a branch deployment and adjust accordingly. I'll have to check about the cleanup part
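As a rough illustration of "adjust accordingly", the target schema could be derived from the environment. This is only a sketch: the variable names `DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT` and `DAGSTER_CLOUD_GIT_BRANCH` and the helper `target_schema` are assumptions not confirmed anywhere in this thread, so they are passed in as overridable parameters.

```python
import os
import re


def target_schema(
    base_schema,
    environ=None,
    flag_var="DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT",  # assumed name, not from the thread
    branch_var="DAGSTER_CLOUD_GIT_BRANCH",  # assumed name, not from the thread
):
    """Return base_schema, or a branch-suffixed schema inside a branch deployment."""
    environ = os.environ if environ is None else environ
    if environ.get(flag_var) != "1":
        # Not a branch deployment: keep writing to the regular schema.
        return base_schema
    branch = environ.get(branch_var, "branch")
    # Reduce the branch name to characters that are safe in a schema identifier.
    suffix = re.sub(r"[^a-z0-9]+", "_", branch.lower()).strip("_") or "branch"
    return f"{base_schema}_{suffix}"
```

The result could then be handed to dbt or to an IO manager as the schema name. Cleaning up schemas of closed branches is not covered by this sketch.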
Backing up to a previous question: I would expect setting env vars in the docker image to make them available when generating the preview snapshot. The action runs a docker run command within the image to generate the snapshot: https://github.com/dagster-io/dagster-cloud-cicd-action/blob/main/src/preview-action.js#L68-L83
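For context on why the `env:` block of the workflow job may not be enough: variables set there live on the runner host, and `docker run` only hands them to the container when they are forwarded explicitly (for example with `-e NAME`). The sketch below builds such a command. `docker_run_command` is a hypothetical helper, and whether the preview action offers a way to add these flags is not established in this thread.

```python
import os


def docker_run_command(image, command, env_names, environ=None):
    """Build a `docker run` argument list that forwards selected env vars.

    `environ` is only used to check that the variables exist; it defaults to
    the real host environment, which is where Docker reads the values from.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in env_names if name not in environ]
    if missing:
        raise KeyError("not set on the host: " + ", ".join(missing))
    args = ["docker", "run"]
    for name in env_names:
        # `-e NAME` without a value copies NAME from the host environment.
        args += ["-e", name]
    return args + [image] + list(command)
```

Baking dummy values into the image with `ENV` in the Dockerfile, as tried above, is the alternative that needs no extra flags.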

geoHeil

07/28/2022, 3:07 PM
I hardcoded these directly into the Dockerfile with dummy values (not the production password and server). This made the missing-env-var error (shared above) go away. DBT still fails with error code 2 (as also shared). And as discussed, parsing the DBT project should not require an active DB connection.

daniel

07/28/2022, 3:07 PM
Ok - sounds like we're on the same page, will check about the answer to the dbt question
❤️ 1

owen

07/28/2022, 4:20 PM
my understanding is that in the general case, the dbt project can be parsed without actually connecting to the database. however, it does need the environment variables in order to compile the project. Something I've done myself is just change `user: "{{ env_var('WAREHOUSE_POSTGRES_USER') }}"` to `user: "{{ env_var('WAREHOUSE_POSTGRES_USER', '') }}"`, so that if the environment variable is not present, an empty string will be supplied (rather than producing an error). Another option that would skirt around this issue would be to use `load_assets_from_dbt_manifest` instead of `load_assets_from_dbt_project`, as this way the project would already be compiled ahead of time, so dagster would not need to do that on the fly.
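To find which `env_var()` calls would still raise "Env var required but not provided", a small scan of the profile or model text can help. This is a minimal sketch with a deliberately simple regular expression; the helper names are hypothetical and it is not part of dbt or dagster-dbt.

```python
import re

# Matches env_var('NAME') and env_var("NAME", default).
# Group 2 is set only when a comma (and therefore a default) follows the name.
ENV_VAR_CALL = re.compile(r"""env_var\(\s*['"]([^'"]+)['"]\s*(,)?""")


def env_vars_without_default(text):
    """Return names used in env_var() calls that have no default value."""
    names = []
    for match in ENV_VAR_CALL.finditer(text):
        name, has_default = match.group(1), match.group(2)
        if not has_default and name not in names:
            names.append(name)
    return names


def missing_env_vars(text, environ):
    """Return the default-less names that are absent from the given environment."""
    return [name for name in env_vars_without_default(text) if name not in environ]
```

Running `missing_env_vars` over the contents of `profiles.yml` with `os.environ` inside the container would list the variables that need either a value or a default.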

geoHeil

07/28/2022, 4:59 PM
Can you help me debug this @owen? I did set the vars, and it still failed. I am currently using `load_assets_from_dbt_project` and would rather not move to the manifest, as a clean checkout of the project / docker container should not include the compiled DBT definitions (at least I would not consider that good practice).

owen

07/28/2022, 5:55 PM
hi @geoHeil are you able to locate a dbt error message (and if so, is it complaining specifically about an environment variable, or is it a different type of error?).
re: the docker container, one strategy we've seen is to put the dbt compilation step inside the Dockerfile (so that a manifest.json file will be available in the built image). but I also understand not wanting to switch up the workflow, and realistically it should be possible to get the `...from_dbt_project()` working
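If the manifest route is taken, the loading side might look roughly like the following. It assumes the image build ran `dbt compile` so that a manifest exists at a known path (dbt writes it to `target/manifest.json` by default), and that the installed dagster-dbt version accepts the parsed manifest as the first argument of `load_assets_from_dbt_manifest`. The helper names are hypothetical.

```python
import json
from pathlib import Path


def parse_manifest(text):
    """Parse manifest JSON text and check that it looks like a dbt manifest."""
    manifest = json.loads(text)
    if "nodes" not in manifest:
        raise ValueError("no 'nodes' key found; is this a dbt manifest.json?")
    return manifest


def build_dbt_assets(manifest_path="target/manifest.json"):
    """Load dbt assets from a manifest compiled at image build time."""
    # Imported lazily so parse_manifest can be used without dagster-dbt installed.
    from dagster_dbt import load_assets_from_dbt_manifest

    manifest = parse_manifest(Path(manifest_path).read_text())
    return load_assets_from_dbt_manifest(manifest)
```

With this approach the compiled manifest exists only in the built image, not in the repository checkout.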

geoHeil

07/28/2022, 6:00 PM
Once I provide the env vars, I no longer get a complaint naming env vars as the cause: after setting dummy env vars in the Dockerfile, the CI pipeline no longer complains about missing env vars. BUT: it looks like the old preview now loads (it failed previously). What has changed, though: meanwhile I have merged to master without the preview working and rolled out the changes.
Previously the workspace failed to load with DBT error code 2, unfortunately without any better logs.
👍 1

owen

07/28/2022, 6:07 PM
Got it -- I'm currently looking into surfacing better logs from dbt when that sort of error arises. I'm by no means an expert on how environment variables are set in different deployments, but my hunch is that adding a default value to the `env_var()` calls in the dbt project would prevent this issue from cropping up regardless of the root problem (as it would make the question of "is x env var set in y place" irrelevant until `dbt run` was called).

geoHeil

07/28/2022, 6:10 PM
But why does it render correctly after the fact (once DBT has run at least once on the docker agent)?
Keep in mind that setting the dummy values for the variables did not fix the issue; it only made the helpful error message go away, leaving only error code 2.
FYI: I have created a playground PR, and there (after having deployed DBT to master at least once) the preview seems to work again.

owen

07/28/2022, 6:24 PM
interesting — to be honest I don't see a good reason for that to be happening, but I'll poke around and see if I can replicate.
❤️ 1
:rainbow-daggy: 1

geoHeil

07/28/2022, 6:25 PM
Do not use dbt-duckdb to replicate it, though 😉 but postgres or anything more real
👍 1