#ask-community

Brian Pohl

02/24/2023, 9:39 PM
hey Dagster team, i think i've found a bug related to dbt and virtual environments/code locations. i'll give a short version and a super duper long version. short version: my theory is that, because dbt gets called by passing the CLI command (e.g.
dbt run --models ...
) to
subprocess.Popen
, the subprocess is not aware of the virtual environment, and therefore seeks a version of dbt that is installed to the default Python interpreter. if you have not installed dbt outside of any virtual environment, you get errors. long version: i have been running dbt just fine in my Dagster jobs for a while in my one single code location (i use Dagster Hybrid, btw). i have the dbt resource and call it using
context.resources.dbt.run
. recently i decided to create a second code location. the two code locations both use the same Docker image, which has two virtual environments installed on it. the first code location/virtual environment is my default one, and the second location/environment is specifically for triggering things in Databricks. so in my
dagster_cloud.yaml
file, the two locations have all the same attributes, except they use a different repository in
python_file
and a different virtual env in
executable_path
. my Dockerfile used to look like this when i had one location:
COPY dagster/requirements.txt /requirements.txt
RUN pip install -r /requirements.txt
but now it looks like this to support my two locations:
COPY dagster/requirements.txt /requirements.txt
COPY dagster/requirements_databricks.txt /requirements_databricks.txt

# Different Python environments for different code locations
RUN python -m venv base_env
RUN /base_env/bin/pip install -r /requirements.txt

RUN python -m venv databricks_env
RUN /databricks_env/bin/pip install -r /requirements_databricks.txt
note that the only
pip install
commands are in the virtual environments - no
pip install
on the default python interpreter. suddenly, all my dbt ops in every job give me this error:
FileNotFoundError: [Errno 2] No such file or directory: 'dbt'
. The error is from within my virtual environment, or so i thought, because you can see in these error messages that the python running is from
base_env
:
File "/opt/dagster/app/dagster_repo/export_parcel_details.py", line 13, in dbt_parcel_details
    context.resources.dbt.run(models=['+parcel_details_export'], exclude='parcel_id_entity_resolution')
  File "/base_env/lib/python3.10/site-packages/dagster_dbt/cli/resources.py", line 145, in run
    return self.cli("run", models=models, exclude=exclude, select=select, **kwargs)
  File "/base_env/lib/python3.10/site-packages/dagster_dbt/cli/resources.py", line 90, in cli
    return execute_cli(
  File "/base_env/lib/python3.10/site-packages/dagster_dbt/cli/utils.py", line 101, in execute_cli
    process = subprocess.Popen(
  File "/usr/local/lib/python3.10/subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/local/lib/python3.10/subprocess.py", line 1847, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
i figured out that, if i also install dbt to the default interpreter, so there are 3
pip install
commands in my Dockerfile, then everything works. interestingly, i tried installing only
dbt-core
to the default interpreter (we use snowflake, so to make anything work, we need both
dbt-core
and
dbt-snowflake
), like so:
COPY dagster/requirements.txt /requirements.txt
COPY dagster/requirements_databricks.txt /requirements_databricks.txt

# install only dbt core to default interpreter
RUN pip install dbt-core==1.2.0

# Different Python environments for different code locations
RUN python -m venv base_env
RUN /base_env/bin/pip install -r /requirements.txt

RUN python -m venv databricks_env
RUN /databricks_env/bin/pip install -r /requirements_databricks.txt
see the image below for the errors this gives me. dbt kind of runs, but says i'm missing the snowflake adapter. and the errors dbt is giving me are from
/usr/local/lib/python
- the default interpreter. so it seems that the subprocess is executing a command,
dbt run
, without the context of being in the virtual environment. i also built my Docker image locally - with no dbt on the default interpreter - and confirmed that, if you activate the virtual environment by running
source /base_env/bin/activate
, then you can run dbt just fine with
dbt run
.
i submitted this as an issue on GitHub: https://github.com/dagster-io/dagster/issues/12549
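a minimal sketch of the theory in this message, using only the Python standard library. running a virtual environment's interpreter directly (e.g. via
executable_path
) does not put that environment's bin directory on
PATH
the way
source /base_env/bin/activate
does, so a bare
dbt
command is resolved against whatever PATH the process inherited. the helper names below (venv_bin_dir, env_with_venv_on_path, find_executable) are hypothetical and written for illustration only; they are not part of dagster or dagster-dbt, and whether prepending to PATH is the right fix for the library is an assumption.

```python
import os
import shutil
import sys


def venv_bin_dir() -> str:
    """Directory that holds the running interpreter (e.g. /base_env/bin)."""
    # Symlinks are deliberately not resolved: a venv's python is usually a
    # symlink to the base interpreter, and we want the venv's own bin dir.
    return os.path.dirname(os.path.abspath(sys.executable))


def env_with_venv_on_path(base_env=None) -> dict:
    """Copy of an environment with the interpreter's bin dir first on PATH."""
    env = dict(os.environ if base_env is None else base_env)
    env["PATH"] = venv_bin_dir() + os.pathsep + env.get("PATH", "")
    return env


def find_executable(name: str, env: dict):
    """Resolve `name` against env's PATH, roughly as a bare command lookup would."""
    return shutil.which(name, path=env.get("PATH", ""))


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    # Without activation this can be None even though dbt is installed in the venv.
    print("dbt on inherited PATH:", find_executable("dbt", dict(os.environ)))
    # Prepending the interpreter's bin dir mimics what `activate` does to PATH.
    print("dbt with venv bin first:", find_executable("dbt", env_with_venv_on_path()))
```

run with /base_env/bin/python inside the image, the first lookup would be expected to come back empty when dbt is only installed in the virtual environment, and the second to point into /base_env/bin. setting os.environ["PATH"] this way early in the code location's process is one possible workaround, since child processes inherit it, but that has not been verified against dagster-dbt here.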