# ask-community
h
Hey team! Does anyone here have experience with using Dagster with dbt? The two tools seem like a match made in heaven, and I’d love to learn more about the ecosystem, specifically running custom `dbt run` commands and passing command-line arguments/variables. Please see the comments for more info.
I’ve found that the Dagster equivalent of the `dbt run` command is as follows:
```python
from dagster import job
from dagster_dbt import dbt_cli_resource, dbt_run_op

my_dbt_resource = dbt_cli_resource.configured({"project_dir": "/path/to/my/working/directory"})


@job(resource_defs={"dbt": my_dbt_resource})
def my_dbt_job():
    dbt_run_op()
```
However, I’m looking to run the following dbt command in Dagster:
```shell
dbt run --vars '{"my_variable": "0.2"}' -m +final_model_in_dbt_dag
```
It runs the dbt model named `final_model_in_dbt_dag` and all of its parent nodes in the DAG while passing in a command-line variable `my_variable` (essentially running the DAG from start to end). Does anyone know how to structure the op/job so that it accepts the dbt `--vars` argument via Dagster?
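For context on the selector: in dbt's graph syntax, the leading `+` in `-m +final_model_in_dbt_dag` selects the named model plus all of its upstream ancestors. The sketch below reproduces that selection over a hand-written dependency map. It illustrates the idea only and is not dbt's own code; every model name except `final_model_in_dbt_dag` is hypothetical.

```python
from typing import Dict, List, Set


def select_with_ancestors(model: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return `model` plus all of its upstream ancestors (like dbt's `+model`).

    `parents` maps each model name to the models it depends on directly.
    """
    selected: Set[str] = set()
    stack = [model]
    while stack:
        node = stack.pop()
        if node in selected:
            continue  # already visited via another path (diamond dependencies)
        selected.add(node)
        stack.extend(parents.get(node, []))
    return selected


# Hypothetical project: two staging models feed an intermediate model,
# which feeds the final model. `unrelated_model` is outside the selection.
parents = {
    "final_model_in_dbt_dag": ["int_model"],
    "int_model": ["stg_a", "stg_b"],
    "stg_a": [],
    "stg_b": [],
    "unrelated_model": ["stg_a"],
}

print(sorted(select_with_ancestors("final_model_in_dbt_dag", parents)))
# ['final_model_in_dbt_dag', 'int_model', 'stg_a', 'stg_b']
```

Downstream models such as `unrelated_model` are not picked up; dbt uses a trailing `+` (`model+`) for descendants.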
o
hi @Harpal! The `dbt_cli_resource` can be configured with flags that will be passed into all the underlying commands, so in your example, you would have:
```python
from dagster_dbt import dbt_cli_resource

my_dbt_resource = dbt_cli_resource.configured(
    {
        "project_dir": "/path/to/my/working/directory",
        "vars": {"my_variable": 0.2},
        "m": "+final_model_in_dbt_dag",
    }
)
```
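For comparison, the plain-CLI command from the question can be assembled as a Python argument list. `json.dumps` produces the string that follows `--vars` (dbt parses that value as YAML, and JSON is valid YAML), and passing a list rather than a shell string avoids the single-quote escaping needed in a terminal. `build_dbt_run_args` is a hypothetical helper, and this is not how `dagster_dbt` builds its commands internally.

```python
import json
from typing import Any, Dict, List


def build_dbt_run_args(select: str, dbt_vars: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: argv for `dbt run --vars '<json>' -m <select>`."""
    # json.dumps turns the dict into the string dbt expects after --vars.
    return ["dbt", "run", "--vars", json.dumps(dbt_vars), "-m", select]


args = build_dbt_run_args("+final_model_in_dbt_dag", {"my_variable": "0.2"})
print(args)
# To actually run it (requires dbt on PATH and a configured project), e.g.:
# subprocess.run(args, check=True, cwd="/path/to/my/working/directory")
```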
h
🎉
Thank you @owen! That did the trick
Dagster lists the 8 “assets” at the top of the screenshot but doesn’t show any of the transformations/assets in the DAG. This is strange, as all of my other DAGs that are pure Python show a full graph of each @op… Have I done something wrong, or is this normal behaviour for dbt transformations in Dagster?
o
hi @Harpal -- this is a great question, and something that the new software-defined asset APIs aim to address. The reason you only see a single operation is that Dagster only runs a single operation (`dbt run`) to create all of those assets. If you scroll down a little in that link, you'll see how to model this with software-defined assets (which let you visualize those dependencies on top of the underlying operation). That would be using the `load_assets_from_dbt_project` function.
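Conceptually, drawing one asset per dbt model requires the model-to-model edges that dbt records in its `manifest.json` (the traceback later in this thread shows `load_assets_from_dbt_project` loading a manifest for the project). A minimal sketch of pulling those edges out of a manifest-shaped dict follows. The manifest here is hypothetical and heavily trimmed, and this is not Dagster's implementation.

```python
from typing import Any, Dict, List


def model_dependencies(manifest: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each dbt model's unique_id to the models it depends on directly."""
    deps: Dict[str, List[str]] = {}
    for unique_id, node in manifest["nodes"].items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, etc.
        upstream = node.get("depends_on", {}).get("nodes", [])
        # Keep only model-to-model edges; sources and seeds are dropped here.
        deps[unique_id] = [u for u in upstream if u.startswith("model.")]
    return deps


# Hypothetical, heavily trimmed manifest: two models and one test.
manifest = {
    "nodes": {
        "model.my_project.stg_orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["source.my_project.raw.orders"]},
        },
        "model.my_project.final_model_in_dbt_dag": {
            "resource_type": "model",
            "depends_on": {"nodes": ["model.my_project.stg_orders"]},
        },
        "test.my_project.not_null_stg_orders_id": {
            "resource_type": "test",
            "depends_on": {"nodes": ["model.my_project.stg_orders"]},
        },
    }
}

for model, upstream in model_dependencies(manifest).items():
    print(model, "<-", upstream)
```

With a single `dbt run` op, none of these edges are visible to Dagster; they only appear once each model is represented separately.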
h
Oh wow. This is precisely what I need 😮 I’ve followed your video and the docs, and it looks like I need to add the following code snippet to my code base:
```python
from dagster_dbt import dbt_cli_resource, load_assets_from_dbt_project

DBT_PROJECT_DIR = "/path/to/my/dbt/project/dir"

dbt_assets = load_assets_from_dbt_project(DBT_PROJECT_DIR)

analytics_assets = AssetGroup(
    dbt_assets,
    resource_defs={
        "dbt": dbt_cli_resource.configured(DBT_CONFIG),
    },
)
```
I’ve put the code snippet in its own `assets.py` file (as you did in the video) in the same directory as the previously mentioned `dbt_job.py` file, but there were no observable changes to the Dagit UI. Have I missed anything here? The Assets page of Dagit lists all 8 tables by name, but the materializations page is bare (see screenshot below).
o
when you're loading up Dagit, you'll want to point it at the `assets.py` file (instead of `dbt_job.py`), but otherwise that code looks right to me
h
Understood, thanks again! The problem is likely related to the fact that I don’t see anything related to `assets.py` in the Jobs menu on the left-hand side of the Dagit UI. Good to know that I’m on the right track, though! I’ll keep digging.
o
ah yeah, there's no job associated with an asset group by default. A quick trick to bring you back into more familiar waters would be to add a `.build_job()` call at the end of your asset group:
```python
analytics_assets = AssetGroup(
    dbt_assets,
    resource_defs={
        "dbt": dbt_cli_resource.configured(DBT_CONFIG),
    },
).build_job("my_job")
```
This will create a job that materializes all the assets in the group, and it will show up in the sidebar.
h
Yes, that makes sense. It’s so strange, because this still doesn’t make a new job called `my_job` show up 😕 Not even an error message. Must be a problem with my config:
```python
from dagster_dbt import dbt_cli_resource, load_assets_from_dbt_project

DBT_PROJECT_DIR = "/path/to/my/dbt/project/dir"

dbt_assets = load_assets_from_dbt_project(DBT_PROJECT_DIR)

analytics_assets = AssetGroup(
    dbt_assets,
    resource_defs={
        "dbt": dbt_cli_resource.configured(DBT_CONFIG),
    },
).build_job("my_job")
```
o
hm... mind sharing what your workspace page looks like? It's the button in the top right of Dagit
h
Ooooh, I think I’ve almost got this. What should be the value of `DBT_CONFIG`?
o
ah, the minimal thing to put there would be `{"project_dir": "path to your dbt project"}`, although it's a little silly that we make you specify it both in config and in the invocation of `load_assets_from_dbt_project` 🤔
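Until that duplication is removed, one way to keep the two settings in sync is to define the path once and reuse the constant in both places. A minimal sketch, assuming the same placeholder path as the snippets above; the Dagster calls are left as comments so the fragment runs without Dagster installed.

```python
# Define the dbt project path once so the asset loader and the resource
# config cannot drift apart.
DBT_PROJECT_DIR = "/path/to/my/dbt/project/dir"
DBT_CONFIG = {"project_dir": DBT_PROJECT_DIR}

# Elsewhere in assets.py (requires dagster / dagster_dbt):
# dbt_assets = load_assets_from_dbt_project(DBT_PROJECT_DIR)
# ... "dbt": dbt_cli_resource.configured(DBT_CONFIG) ...
```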
h
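Still no luck, unfortunately 😞
1. I’ve added this line to my `repo.py` file: `from path.to.assets import analytics_assets`
2. I added the new `analytics_assets` job to the bottom of this list:
```python
from dagster import repository
from path.to.assets import analytics_assets


@repository
def repo():
    return [
        a_list,  # placeholders for the existing jobs, sensors, schedules, etc.
        of_all,
        jobs,
        analytics_assets,
    ]
```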
Then I’m greeted with this behemoth of an error/warning message… (Removing the added lines from `repo.py` stops the error messages, but the new job still doesn’t show up.)
```
2022-03-14 20:35:08 +0000 - dagster.daemon.SchedulerDaemon - WARNING - Could not load location repo.py to check for schedules due to the following error: dagster_dbt.errors.DagsterDbtCliFatalRuntimeError: Fatal error in the dbt CLI (return code 2)

Stack Trace:
  File "/Users/user/.pyenv/versions/3.9.8/envs/machine-learning-3.9.8/lib/python3.9/site-packages/dagster/grpc/server.py", line 212, in __init__
    self._loaded_repositories = LoadedRepositories(
  File "/Users/user/.pyenv/versions/3.9.8/envs/machine-learning-3.9.8/lib/python3.9/site-packages/dagster/grpc/server.py", line 97, in __init__
    loadable_targets = get_loadable_targets(
  File "/Users/user/.pyenv/versions/3.9.8/envs/machine-learning-3.9.8/lib/python3.9/site-packages/dagster/grpc/utils.py", line 27, in get_loadable_targets
    else loadable_targets_from_python_file(python_file, working_directory)
  File "/Users/user/.pyenv/versions/3.9.8/envs/machine-learning-3.9.8/lib/python3.9/site-packages/dagster/core/workspace/autodiscovery.py", line 18, in loadable_targets_from_python_file
    loaded_module = load_python_file(python_file, working_directory)
  File "/Users/user/.pyenv/versions/3.9.8/envs/machine-learning-3.9.8/lib/python3.9/site-packages/dagster/core/code_pointer.py", line 79, in load_python_file
    return import_module_from_path(module_name, python_file)
  File "/Users/user/.pyenv/versions/3.9.8/envs/machine-learning-3.9.8/lib/python3.9/site-packages/dagster/seven/__init__.py", line 47, in import_module_from_path
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "dagster/repo.py", line 25, in <module>
    from dagster.sector_classification.jobs.assets import analytics_assets
  File "/Users/hdot/vs_code/machine-learning/dagster/sector_classification/jobs/assets.py", line 5, in <module>
    dbt_assets = load_assets_from_dbt_project(DBT_PROJECT_DIR)
  File "/Users/hdot/.pyenv/versions/3.9.8/envs/machine-learning-3.9.8/lib/python3.9/site-packages/dagster_dbt/asset_defs.py", line 172, in load_assets_from_dbt_project
    manifest_json, cli_output = _load_manifest_for_project(
  File "/Users/hdot/.pyenv/versions/3.9.8/envs/machine-learning-3.9.8/lib/python3.9/site-packages/dagster_dbt/asset_defs.py", line 18, in _load_manifest_for_project
    cli_output = execute_cli(
  File "/Users/hdot/.pyenv/versions/3.9.8/envs/machine-learning-3.9.8/lib/python3.9/site-packages/dagster_dbt/cli/utils.py", line 90, in execute_cli
    raise DagsterDbtCliFatalRuntimeError(logs=logs, raw_output=raw_output)

2022-03-14 20:36:10,931  35121   utils                Executing command: dbt --log-format json ls --project-dir /Users/user/machine-learning/dbt --profiles-dir /Users/user/machine-learning/dbt/config --select * --resource-type model --output json
2022-03-14 20:36:12 +0000 - dagster.builtin - ERROR - Encountered an error while reading the project:
2022-03-14 20:36:12,146  35121   utils                Encountered an error while reading the project:
2022-03-14 20:36:12 +0000 - dagster.builtin - ERROR -   ERROR: Runtime Error
  Could not find profile named 'moonfire_dbt'
2022-03-14 20:36:12,150  35121   utils                  ERROR: Runtime Error
  Could not find profile named 'moonfire_dbt'
2022-03-14 20:36:12 +0000 - dagster.builtin - ERROR - Encountered an error:
Runtime Error
  Could not run dbt
2022-03-14 20:36:12,159  35121   utils                Encountered an error:
Runtime Error
  Could not run dbt
2022-03-14 20:36:12,240  35121   utils                dbt exited with return code 2
2022-03-14 20:36:12,243  35121   api                  Started Dagster code server for file dagster/repo.py in process 35121
```
Workspace looks like this after removing the added content from `repo.py`.
o
your "behemoth" description is all too accurate 😞 -- but the relevant bit is:
```
2022-03-14 20:36:10,931  35121   utils                Executing command: dbt --log-format json ls --project-dir /Users/hdot/vs_code/machine-learning/dbt --profiles-dir /Users/hdot/vs_code/machine-learning/dbt/config --select * --resource-type model --output json
2022-03-14 20:36:12 +0000 - dagster.builtin - ERROR - Encountered an error while reading the project:
2022-03-14 20:36:12,146  35121   utils                Encountered an error while reading the project:
2022-03-14 20:36:12 +0000 - dagster.builtin - ERROR -   ERROR: Runtime Error
  Could not find profile named 'moonfire_dbt'
2022-03-14 20:36:12,150  35121   utils                  ERROR: Runtime Error
  Could not find profile named 'moonfire_dbt'
```
looks like the `moonfire_dbt` profile is not available in `...machine-learning/dbt/config`. Basically,
```shell
dbt ls --project-dir /Users/hdot/vs_code/machine-learning/dbt --profiles-dir /Users/hdot/vs_code/machine-learning/dbt/config --select * --resource-type model --output json
```
is the command that needs to succeed.
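A pre-flight check along those lines can run the same `dbt ls` command from Python before pointing Dagit at the code. This is a sketch that assumes `dbt` is installed and on `PATH`; `dbt_ls_args` and `dbt_ls_succeeds` are hypothetical helpers. Passing the arguments as a list also means no shell gets a chance to glob-expand the bare `*` after `--select`.

```python
import subprocess
from typing import List


def dbt_ls_args(project_dir: str, profiles_dir: str) -> List[str]:
    """Build the argument list for the `dbt ls` command quoted in the thread."""
    return [
        "dbt", "ls",
        "--project-dir", project_dir,
        "--profiles-dir", profiles_dir,
        "--select", "*",  # a list entry, so it is never glob-expanded
        "--resource-type", "model",
        "--output", "json",
    ]


def dbt_ls_succeeds(project_dir: str, profiles_dir: str) -> bool:
    """Run `dbt ls` and return True only if it exits with return code 0."""
    try:
        result = subprocess.run(
            dbt_ls_args(project_dir, profiles_dir),
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        print("dbt executable not found on PATH")
        return False
    if result.returncode != 0:
        # Errors such as "Could not find profile named ..." appear in dbt's output.
        print(result.stdout)
        print(result.stderr)
    return result.returncode == 0


# Example (paths from the thread; requires dbt and the project on disk):
# dbt_ls_succeeds(
#     "/Users/hdot/vs_code/machine-learning/dbt",
#     "/Users/hdot/vs_code/machine-learning/dbt/config",
# )
```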
h
Hello @owen, yes, you’re right! I spent some time reconfiguring my Dagster/dbt repo and have resolved the previous error. Thanks so much for your time!
o
no problem at all! glad things are working for you 🙂 we're still actively developing this integration, so happy to hear any suggestions you have as you try it out!
h
Absolutely! Can’t wait to share the news with the team. Here were the blocking errors:
• Because we use a Dagster `@repository`, I had to manually import `analytics_assets` and add it to the list of jobs/sensors/assets/schedules etc.
• I had to move my `profiles.yml` file into a new directory called `dbt/config` (not sure if I can change this default).
• I had to add the following line of code to `assets.py`:
```python
from dagster import AssetGroup
```
I totally missed the import statement, my bad. The command Dagster ran from the Dagit launchpad automatically assumed that the profiles file lives in `dbt/config`. Not sure if that’s a feature or a bug, tbh, but it was fairly easy to debug once dbt and Dagster were set up correctly.
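The `dbt/config` location matches the `--profiles-dir` in the logged `dbt ls` command (`<project_dir>/config`). In dagster_dbt releases of this era, `load_assets_from_dbt_project` appears to accept a `profiles_dir` argument for pointing elsewhere, but check the API reference for your version. The sketch below checks that a named profile exists in `profiles.yml` in that directory before Dagster tries to load it. It is a text heuristic rather than a YAML parser, and both helper names are hypothetical.

```python
import re
from pathlib import Path


def default_profiles_dir(project_dir: str) -> Path:
    """The directory the logged command used: <project_dir>/config."""
    return Path(project_dir) / "config"


def profile_is_defined(profiles_dir: Path, profile_name: str) -> bool:
    """Rough check that profiles.yml has a top-level `<profile_name>:` key.

    Text heuristic, not a YAML parser: it only looks for the name at the
    start of a line, which is where dbt profile names normally appear.
    """
    profiles_file = profiles_dir / "profiles.yml"
    if not profiles_file.is_file():
        return False
    pattern = re.compile(rf"^{re.escape(profile_name)}:", re.MULTILINE)
    return bool(pattern.search(profiles_file.read_text()))


# Example (hypothetical local path):
# profile_is_defined(default_profiles_dir("/path/to/my/dbt/project/dir"), "moonfire_dbt")
```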