# ask-community
s
hey all! I'm trying to run a job with dbt models in dagster, but I'm running into the following error (attached in screenshot). I don't get this error when I run `dbt run` on the dbt models directly. I'm hoping someone can explain how dagster runs these dbt ops and why I might be seeing this error. dbt model:
```sql
{# incremental + dummy_partition helps reuse the same storage path. #}
{# For the based reference table, this helps later cleanup. #}

{{
  config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partitioned_by=('dummy_partition',),
  )
}}

SELECT
    _hoodie_commit_time,
    obj_id,
    full_document_json,
    event_type,
    org_id,
    'default' AS dummy_partition
FROM {{ ref('zbase__tracking_events') }}
WHERE
    _hoodie_commit_time LIKE
        CONCAT(
            (
                SELECT cp_str_yearmonthday
                FROM {{ ref('zbase__datetime') }}
                WHERE category = 't'
            ), '%')
    OR _hoodie_commit_time LIKE
        CONCAT(
            (
                SELECT cp_str_yearmonthday
                FROM {{ ref('zbase__datetime') }}
                WHERE category = 't-1day'
            ), '%')
ORDER BY event_type, obj_id
```
dagster job definition:
```python
@job(resource_defs={"dbt": dbt_resource}, tags=DAGSTER_K8S_CONFIG)
def dbt_run_test_job():
    dbt_run_op()
```
o
hi @Salina Wu! at the top of the logs in that screenshot, you can see the exact command that dagster is running for dbt (`dbt --log-format json run --project-dir ../dbt_data_tf --profiles-dir ../dbt_data_tf`). Running this command manually should have exactly the same results as having dagster run it for you (unless the profile has some values populated by environment variables that are not set in the environment dagster is running in, I guess).
can you confirm if running that command directly fails in the same way?
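To make the flag handling concrete, here is a minimal sketch of how a command line like the one above gets assembled from project/profiles settings. This is a hypothetical helper for illustration, not dagster's actual implementation:

```python
# Hypothetical helper (not dagster's actual code) showing how a dbt CLI
# invocation like the one in the logs is assembled from resource settings.
def build_dbt_command(subcommand, project_dir=None, profiles_dir=None,
                      log_format="json"):
    """Return the argv list for a dbt CLI call."""
    # Global flags (like --log-format) come before the subcommand;
    # per-invocation flags come after it.
    cmd = ["dbt", "--log-format", log_format, subcommand]
    if project_dir:
        cmd += ["--project-dir", project_dir]
    if profiles_dir:
        cmd += ["--profiles-dir", profiles_dir]
    return cmd


print(" ".join(build_dbt_command(
    "run", project_dir="../dbt_data_tf", profiles_dir="../dbt_data_tf")))
# → dbt --log-format json run --project-dir ../dbt_data_tf --profiles-dir ../dbt_data_tf
```

The point is that the paths are relative to wherever the process runs, which is why running the same command from a different directory can change the outcome.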
s
doesn't work for me:
```shell
(base) salinawu@ip-172-20-2-45 dbt-data-transformation % dbt --log-format json run --project-dir ../dbt_data_tf --profiles-dir ../dbt_data_tf
{"code": "Z002", "data": {"e": "Runtime Error\n  fatal: Invalid --project-dir flag. Not a dbt project. Missing dbt_project.yml file"}, "invocation_id": "e6707b1a-d377-4b05-b297-c3d286e9eaaa", "level": "error", "log_version": 1, "msg": "Encountered an error:\nRuntime Error\n  fatal: Invalid --project-dir flag. Not a dbt project. Missing dbt_project.yml file", "node_info": {}, "pid": 8922, "thread_name": "MainThread", "ts": "2022-03-29T22:37:37.571450Z", "type": "log_line"}
```
oh wait, I'm running in the wrong dir. one moment
yeah, I'm getting the same error when I run that manually
```shell
{"code": "Z030", "data": {"keyboard_interrupt": false, "num_errors": 1, "num_warnings": 0}, "invocation_id": "fb4988bf-b84c-427f-af24-dca7cef82fee", "level": "info", "log_version": 1, "msg": "\u001b[31mCompleted with 1 error and 0 warnings:\u001b[0m", "node_info": {}, "pid": 10144, "thread_name": "MainThread", "ts": "2022-03-29T22:41:36.057985Z", "type": "log_line"}
{"code": "Z028", "data": {"msg": "Runtime Error in model zbase__tracking_events_subset_prev_2days (models/zbase/zbase__tracking_events_subset_prev_2days.sql)"}, "invocation_id": "fb4988bf-b84c-427f-af24-dca7cef82fee", "level": "error", "log_version": 1, "msg": "\u001b[33mRuntime Error in model zbase__tracking_events_subset_prev_2days (models/zbase/zbase__tracking_events_subset_prev_2days.sql)\u001b[0m", "node_info": {}, "pid": 10144, "thread_name": "MainThread", "ts": "2022-03-29T22:41:36.058516Z", "type": "log_line"}
{"code": "Z029", "data": {"msg": "  SUBQUERY_MULTIPLE_ROWS: Scalar sub-query has returned multiple rows. You may need to manually clean the data at location 's3://forethought-athena-gatsby/dbt_v1/tables/4e772876-4b31-419d-b3da-54e1fdf34c57' before retrying. Athena will not delete data in your account."}, "invocation_id": "fb4988bf-b84c-427f-af24-dca7cef82fee", "level": "error", "log_version": 1, "msg": "  SUBQUERY_MULTIPLE_ROWS: Scalar sub-query has returned multiple rows. You may need to manually clean the data at location 's3://forethought-athena-gatsby/dbt_v1/tables/4e772876-4b31-419d-b3da-54e1fdf34c57' before retrying. Athena will not delete data in your account.", "node_info": {}, "pid": 10144, "thread_name": "MainThread", "ts": "2022-03-29T22:41:36.058935Z", "type": "log_line"}
{"code": "Z023", "data": {"stats": {"error": 1, "pass": 2, "skip": 1, "total": 4, "warn": 0}}, "invocation_id": "fb4988bf-b84c-427f-af24-dca7cef82fee", "level": "info", "log_version": 1, "msg": "Done. PASS=2 WARN=0 ERROR=1 SKIP=1 TOTAL=4", "node_info": {}, "pid": 10144, "thread_name": "MainThread", "ts": "2022-03-29T22:41:36.059387Z", "type": "log_line"}
```
o
hm.. -- and not when you just do a plain `dbt run` in the `dbt_data_tf/` directory?
From what I recall (might be inaccurate), when you don't explicitly give dbt a profiles dir argument, it will use the profile from `~/.dbt/profiles.yml`, so if `dbt run` works but `dbt run --profiles-dir ...` doesn't, then it's possible that some configuration differs between those two profiles.
s
sorry, I'm now seeing the error when I run dbt in the dbt_data_tf subfolder too; not sure why it was working before. looks like I had to add an ORDER BY and LIMIT 1 to the subqueries in my dbt model. thanks for helping and sorry for the confusion!
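For anyone hitting the same SUBQUERY_MULTIPLE_ROWS error: Athena requires a scalar subquery to return exactly one row, so the fix described above amounts to constraining each subquery, roughly like this (a sketch only; the ORDER BY column here is a guess, pick whatever makes the chosen row deterministic):

```sql
-- Sketch of the fix: force each scalar subquery down to a single row.
WHERE
    _hoodie_commit_time LIKE
        CONCAT(
            (
                SELECT cp_str_yearmonthday
                FROM {{ ref('zbase__datetime') }}
                WHERE category = 't'
                ORDER BY cp_str_yearmonthday  -- make the chosen row deterministic
                LIMIT 1
            ), '%')
```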
o
no problem -- glad you found the issue 🙂