Is it reasonable or an anti-pattern to use dagster...
# ask-community
g
Is it reasonable or an anti-pattern to use dagster to trigger updates to materialized views?
The direct database (Postgres) options look questionable, from a scheduling perspective.
Gonna go ahead with using dagster to schedule the materialized view updates
y
Just to clarify, by materialized views, do you mean materialized views in a database for your business logic (not the Dagster DB that stores metadata)?
g
Yep, sorry this is a very general Data Engineering question, not Dagster specific at all.
y
Got it - yea I think it’s reasonable to do so, or even recommended. If you are using our newly introduced asset APIs (docs), you can put that logic in an asset and schedule it in an asset group so dagster can track the updates.
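For anyone reading later, here is a minimal sketch of the refresh logic that such an op or asset body could call. It is not from the thread: the function names and the view names are hypothetical, and it assumes a standard DB-API connection (for example one opened with psycopg2). The Dagster wrapper and schedule are left out because they depend on the version in use.

```python
import re

# Hypothetical helper: accepts "view" or "schema.view" style names only.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def refresh_statement(view_name: str, concurrently: bool = False) -> str:
    """Build the Postgres statement that refreshes one materialized view."""
    # Identifiers cannot be sent as bind parameters, so validate the name
    # before interpolating it into the SQL text.
    if not _IDENTIFIER.fullmatch(view_name):
        raise ValueError(f"unexpected view name: {view_name!r}")
    keyword = "CONCURRENTLY " if concurrently else ""
    return f"REFRESH MATERIALIZED VIEW {keyword}{view_name}"


def refresh_materialized_view(connection, view_name: str, concurrently: bool = False) -> str:
    """Run the refresh on a DB-API connection and commit it."""
    statement = refresh_statement(view_name, concurrently)
    cursor = connection.cursor()
    try:
        cursor.execute(statement)
    finally:
        cursor.close()
    connection.commit()
    return statement
```

Note that Postgres only allows CONCURRENTLY on a materialized view that has a unique index, so the flag defaults to off here.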
g
I am not, but might be a good time for me to give them a try
Guess it'd provide the perfect documentation in terms of the underlying tables that are transformed to generate the materialized view. Also wonder if this is where I should be using DBT / the DBT integration.
y
For integration, you can also check out a more comprehensive example here: https://github.com/dagster-io/dagster/tree/master/examples/modern_data_stack_assets - which shows how to use the Software-Defined Asset APIs alongside Modern Data Stack tools (specifically, Airbyte and dbt)
g
Can I use DBT without it running as a separate tool? (i.e. not the way Airbyte is set up)
y
Yes, you can use dbt itself w/ Dagster. You can use load_assets_from_dbt_project and load_assets_from_dbt_manifest to easily construct asset-based jobs from dbt models directly.
g
Will give it a quick go now then I think, thanks!
@yuhan how do I then expose the .build_job('Assets') in repo.py? Very pseudo-codey, but is the below roughly right?
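from dagster import AssetGroup, repository
from dagster_dbt import dbt_cli_resource

# dbt_assets and DBT_CONFIG are assumed to be defined earlier in this module
analytics_assets = AssetGroup(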
    [*dbt_assets],
    resource_defs={
        "dbt": dbt_cli_resource.configured(DBT_CONFIG),
    },
).build_job("Assets")
@repository
def analytics() -> list:
    return [analytics_assets]
y
yup^
g
Really pushing my luck with support here but I'm baffled by file_relative_path not managing to find either of my two DBT directories and can't think of any good way to debug it.
Stripped things back a touch, using an older simple example that I hope is still valid (was facing some slightly odd/annoying bugs)
from dagster import file_relative_path, pipeline
from dagster_dbt import dbt_cli_run

config = {"project-dir": file_relative_path(__file__, 'dbt_project')}
run_all_models = dbt_cli_run.configured(config, name="run_dbt_project")

@pipeline
def my_dbt_pipeline():
    run_all_models()
Weirdly I'm getting the error below despite the fact that I'm not specifying dbt anywhere ???
FileNotFoundError: [Errno 2] No such file or directory: 'dbt': 'dbt'
  File "/usr/local/lib/python3.7/site-packages/dagster/core/execution/plan/utils.py", line 47, in solid_execution_error_boundary
    yield
  File "/usr/local/lib/python3.7/site-packages/dagster/utils/__init__.py", line 405, in iterate_with_context
    next_output = next(iterator)
  File "/usr/local/lib/python3.7/site-packages/dagster/core/execution/plan/compute_generator.py", line 66, in _coerce_solid_compute_fn_to_iterator
    for event in _validate_and_coerce_solid_result_to_iterator(result, context, output_defs):
  File "/usr/local/lib/python3.7/site-packages/dagster/core/execution/plan/compute_generator.py", line 86, in _validate_and_coerce_solid_result_to_iterator
    for event in result:
  File "/usr/local/lib/python3.7/site-packages/dagster_dbt/cli/solids.py", line 115, in dbt_cli_run
    target_path=context.solid_config["target-path"],
  File "/usr/local/lib/python3.7/site-packages/dagster_dbt/cli/utils.py", line 74, in execute_cli
    process = subprocess.Popen(command_list, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
  File "/usr/local/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/usr/local/lib/python3.7/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
y
cc @owen can you look into this specific error? 🙏
o
hi @George Pearse , do you mind trying with this code?
from dagster_dbt import dbt_cli_resource, dbt_run_op

from dagster import job

my_dbt_resource = dbt_cli_resource.configured(
    {"project_dir": "path/to/dbt/project"}
)

@job(resource_defs={"dbt": my_dbt_resource})
def my_dbt_job():
    dbt_run_op()
Just to help isolate the source of the issue, might be worth skipping the file_relative_path and hardcoding the project dir.
g
Still pretty certain I'm getting exactly the same 'dbt' directory error. Will double check I'm not doing something dumb with how I create my docker containers
o
hm yeah I'm wondering if this might just mean "dbt is not installed"
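The traceback above ends in subprocess.Popen, which suggests the 'dbt' in the FileNotFoundError is the dbt executable rather than a directory. A quick way to check this from the same environment (for example inside the container) is sketched below, using only the standard library. The function names are hypothetical and not part of dagster-dbt.

```python
import shutil
import subprocess


def find_executable(name: str = "dbt"):
    """Return the full path of an executable on PATH, or None if it is absent."""
    return shutil.which(name)


def explain_missing_executable(name: str = "dbt") -> str:
    """Try to launch the executable and turn a failure into a readable message."""
    try:
        # Same kind of call that fails in the traceback: launching a
        # program that is not on PATH raises FileNotFoundError.
        subprocess.run(
            [name, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            check=False,
        )
    except FileNotFoundError as err:
        return (
            f"{name!r} executable not found on PATH ({err.strerror}); "
            "is it installed in this environment?"
        )
    return f"{name!r} is available at {shutil.which(name)}"


if __name__ == "__main__":
    print(explain_missing_executable("dbt"))
```

If find_executable("dbt") returns None in the environment where the run executes, installing dbt there is the likely fix.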
g
Is it dbt-core that I need?
Yep, think that's done it, thanks a lot Owen. Working with that example and can try the asset approach now because I think it was the same error all along.
Would be good to have a clearer error if possible
o
yeah that's a pretty confusing one, and glad you got it working!
@Dagster Bot docs better error message when dbt is not installed when using dagster-dbt
g
It's also weirdly hard to get the dbt project structure right, but I suspect that's a them thing. It says I don't have a dbt_project.yml in a directory where I absolutely definitely do.
Generated via dbt init as well
For anyone else who similarly gets stuck (I really hadn't read enough of the documentation), the dbt debugging pages are great: https://docs.getdbt.com/docs/guides/debugging-errors
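A rough sketch of one way to debug the "no dbt_project.yml" and file_relative_path confusion: print the absolute path that is actually being used and list what is in it. This assumes file_relative_path simply joins the relative path onto the directory of the file passed in; resolve_relative_to below is a hypothetical stand-in for that behaviour, not the Dagster function itself.

```python
import os


def resolve_relative_to(dunder_file: str, relative_path: str) -> str:
    """Join a path onto the directory of the given file and make it absolute."""
    return os.path.abspath(os.path.join(os.path.dirname(dunder_file), relative_path))


def check_dbt_project_dir(project_dir: str) -> list:
    """Return a list of readable problems; an empty list means the layout looks sane."""
    problems = []
    if not os.path.isdir(project_dir):
        problems.append(f"not a directory: {project_dir}")
        return problems
    if not os.path.isfile(os.path.join(project_dir, "dbt_project.yml")):
        # Listing the contents often shows the project nested one level deeper.
        contents = sorted(os.listdir(project_dir))
        problems.append(f"no dbt_project.yml in {project_dir}; found: {contents}")
    return problems


if __name__ == "__main__":
    project_dir = resolve_relative_to(__file__, "dbt_project")
    print(project_dir)
    for problem in check_dbt_project_dir(project_dir):
        print(problem)
```

One possible cause worth ruling out: dbt init creates a new folder named after the project, so the dbt_project.yml may sit one directory below the one being pointed at. In a Docker setup it is also worth confirming the directory was copied into the image.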
Do I need to use asset definitions to get the SQL behind a dbt model in the UI?
o
hi @George Pearse! Basically yes. If you run a regular dbt_run_op (so, not using asset definitions), we will still create asset materialization events (representations of what did happen during a run), but these don't contain the model's SQL and also don't show the relationships between different dbt models.
g
Yeah got this working and it's beautiful, big fan of the integration
o
awesome! if you have any further feedback let us know -- it's still in its early stages so getting it up and running definitely isn't always the smoothest experience 🙂
g
Would be nice to have a simple way to link to a website with the DBT docs specific to a model, just because it provides a little more detail, and could reduce duplicated work for you guys.
When looking at the Asset Definition page
(though I haven't sorted out deploying the DBT docs yet)