Brian Pohl
02/24/2023, 9:39 PM
dbt run --models ...) to subprocess.Popen, the subprocess is not aware of the virtual environment, and therefore seeks a version of dbt that is installed to the default Python interpreter. if you have not installed dbt outside of any virtual environment, you get errors.
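a minimal diagnostic sketch of the lookup described above: a bare command name such as dbt is resolved by searching PATH, and starting a virtual environment's python by its full path does not by itself add that environment's bin directory to PATH. the helper name describe_lookup is hypothetical, and shutil.which is used here only as a stand-in for the PATH search that happens when the subprocess is launched.

```python
import os
import shutil
import sys
from pathlib import Path


def describe_lookup(command: str) -> dict:
    """Report where a bare command name would be resolved from (hypothetical helper)."""
    # Directory holding the running interpreter, e.g. /base_env/bin inside a venv.
    bin_dir = str(Path(sys.executable).parent)
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    return {
        "interpreter": sys.executable,
        "interpreter_bin_dir": bin_dir,
        # True only if the venv was activated or PATH was adjusted some other way.
        "bin_dir_on_path": bin_dir in path_dirs,
        # What a PATH search finds for the command; None if nothing is found.
        "resolved": shutil.which(command),
    }


if __name__ == "__main__":
    for key, value in describe_lookup("dbt").items():
        print(f"{key}: {value}")
```

run from inside an op, this should show whether the interpreter's bin directory is on PATH and which dbt executable, if any, a PATH search would pick up.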
long version: i have been running dbt just fine in my Dagster jobs for a while in my one single code location (i use Dagster Hybrid, btw). i have the dbt resource and call it using context.resources.dbt.run. recently i decided to create a second code location. the two code locations both use the same Docker image, which has two virtual environments installed on it. the first code location/virtual environment is my default one, and the second location/environment is specifically for triggering things in Databricks. so in my dagster_cloud.yaml file, the two locations have all the same attributes, except they use a different repository in python_file and a different virtual env in executable_path.
my Dockerfile used to look like this when i had one location:
COPY dagster/requirements.txt /requirements.txt
RUN pip install -r /requirements.txt
but now it looks like this to support my two locations:
COPY dagster/requirements.txt /requirements.txt
COPY dagster/requirements_databricks.txt /requirements_databricks.txt
# Different Python environments for different code locations
RUN python -m venv base_env
RUN /base_env/bin/pip install -r /requirements.txt
RUN python -m venv databricks_env
RUN /databricks_env/bin/pip install -r /requirements_databricks.txt
note that the only pip install commands are in the virtual environments - no pip install on the default python interpreter. suddenly, all my dbt ops in every job give me this error: FileNotFoundError: [Errno 2] No such file or directory: 'dbt'. the error is from within my virtual environment, or so i thought, because you can see in these error messages that the python running is from base_env:
  File "/opt/dagster/app/dagster_repo/export_parcel_details.py", line 13, in dbt_parcel_details
    context.resources.dbt.run(models=['+parcel_details_export'], exclude='parcel_id_entity_resolution')
  File "/base_env/lib/python3.10/site-packages/dagster_dbt/cli/resources.py", line 145, in run
    return self.cli("run", models=models, exclude=exclude, select=select, **kwargs)
  File "/base_env/lib/python3.10/site-packages/dagster_dbt/cli/resources.py", line 90, in cli
    return execute_cli(
  File "/base_env/lib/python3.10/site-packages/dagster_dbt/cli/utils.py", line 101, in execute_cli
    process = subprocess.Popen(
  File "/usr/local/lib/python3.10/subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/local/lib/python3.10/subprocess.py", line 1847, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
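a small sketch that reproduces the same kind of error outside Dagster, to show where it comes from: when subprocess.Popen is given a bare command name that cannot be found, the FileNotFoundError surfaces in the calling process and the requested program never starts. that is consistent with the frames above coming from base_env. the /usr/local/lib/python3.10/subprocess.py frames are also expected, since a virtual environment reuses the standard library of the interpreter it was created from. the helper name try_launch is hypothetical.

```python
import subprocess


def try_launch(command: str):
    """Try to start a bare command; return the error if it cannot be found."""
    try:
        process = subprocess.Popen(
            [command],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError as exc:
        # The exception surfaces in the calling process; the program never starts.
        return exc
    process.wait()
    return None


if __name__ == "__main__":
    # A deliberately made-up command name, so the lookup fails.
    error = try_launch("surely-not-a-real-command-12345")
    print(type(error).__name__, error)
```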
i figured out that, if i also install dbt to the default interpreter, so there are 3 pip install commands in my Dockerfile, then everything works. interestingly, i tried installing only dbt-core to the default interpreter (we use snowflake, so to make anything work, we need both dbt-core and dbt-snowflake), like so:
COPY dagster/requirements.txt /requirements.txt
COPY dagster/requirements_databricks.txt /requirements_databricks.txt
# install only dbt core to default interpreter
RUN pip install dbt-core==1.2.0
# Different Python environments for different code locations
RUN python -m venv base_env
RUN /base_env/bin/pip install -r /requirements.txt
RUN python -m venv databricks_env
RUN /databricks_env/bin/pip install -r /requirements_databricks.txt
see the image below for the errors this gives me. dbt kind of runs, but says i'm missing the snowflake adapter. and the errors dbt is giving me are from /usr/local/lib/python - the default interpreter.
so it seems that the subprocess is executing a command, dbt run, without the context of being in the virtual environment. i also built my Docker image locally - with no dbt on the default interpreter - and confirmed that, if you activate the virtual environment by running source /base_env/bin/activate, then you can run dbt just fine with dbt run.
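one possible workaround, sketched under assumptions: the activate script works by prepending the environment's bin directory to PATH, so doing the same from Python before the dbt op runs may let the bare dbt command resolve to the copy inside the virtual environment. this assumes the subprocess inherits the parent process's environment, which has not been checked against dagster_dbt here. the function name prepend_interpreter_bin_to_path is hypothetical, and where to call it (for example at the top of the repository module) is also an assumption.

```python
import os
import sys
from pathlib import Path


def prepend_interpreter_bin_to_path(environ=None) -> str:
    """Put the running interpreter's bin directory first on PATH (hypothetical helper)."""
    # Default to the real process environment; a plain dict can be passed for testing.
    environ = os.environ if environ is None else environ
    # In a venv started via /base_env/bin/python this is /base_env/bin.
    bin_dir = str(Path(sys.executable).parent)
    # Drop empty entries and any existing copy so repeated calls do not grow PATH.
    others = [
        d for d in environ.get("PATH", "").split(os.pathsep)
        if d and d != bin_dir
    ]
    environ["PATH"] = os.pathsep.join([bin_dir, *others])
    return environ["PATH"]


if __name__ == "__main__":
    print(prepend_interpreter_bin_to_path())
```

since each code location starts its own interpreter via executable_path, this would pick up /base_env/bin or /databricks_env/bin per location without hardcoding either path.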
Brian Pohl
02/24/2023, 10:25 PM