# ask-community

Ricky Kim

01/04/2023, 4:21 PM
Hi 👋 I am trying to load dbt assets from multiple projects. After reading https://dagster.slack.com/archives/C01U954MEER/p1669307372666059, I was able to define multiple sets of assets by passing the `dbt_resource_key` argument to `load_assets_from_dbt_project`. But I am having trouble including both dbt asset groups in the repository. I can include either one of them in the return statement of `repository`, but when I include both, I get the error below.
dagster._core.errors.DagsterInvalidDefinitionError: Invalid dependencies: op "run_dbt_b7d19" does not have input "source_moo_moocards_moocards_moo_user". Available inputs: ['source_moo_recurly_billing_info', 'source_moo_recurly_invoices', 'source_moo_recurly_transactions']
The dbt assets are defined as below.
def get_dbt_group_name(node_info: Mapping[str, Any]):
    name = _get_node_group_name(node_info)
    return "dbt_" + name if name else None


# setup an `asset` with all dbt models
all_models = with_resources(
    # when loading assets from dbt, we invoke the `node_info_to_group_fn` argument
    # to label them all under a unified `group_name`
    load_assets_from_dbt_project(
        project_dir=DBT_PROJECT_DIR, profiles_dir=DBT_PROFILES_DIR, node_info_to_group_fn=get_dbt_group_name
    ),
    {
        "dbt": dbt_cli_resource.configured(
            {"project_dir": DBT_PROJECT_DIR, "profiles_dir": DBT_PROFILES_DIR, "target": DBT_TARGET}
        )
    },
)

# setup an `asset` with dbt prep models
prep_models = with_resources(
    load_assets_from_dbt_project(
        project_dir=DBT_PREP_DIR,
        profiles_dir=DBT_PROFILES_DIR,
        dbt_resource_key="dbt_prep",
        node_info_to_group_fn=get_dbt_group_name,
    ),
    {
        "dbt_prep": dbt_cli_resource.configured(
            {"project_dir": DBT_PREP_DIR, "profiles_dir": DBT_PROFILES_DIR, "target": DBT_TARGET}
        )
    },
)
Could I get some insight on how to load both dbt assets into the repository? Thank you!

owen

01/04/2023, 9:57 PM
hi @Ricky Kim! This is a pretty surprising error. I played around with this for a bit trying to replicate it but wasn't able to do so. Just to be clear, does this error show up as soon as you try to load the repository? And is there anything within your repository other than those two sets of assets? And are ['source_moo_recurly_billing_info', 'source_moo_recurly_invoices', 'source_moo_recurly_transactions'] the only sources in one of your projects? Or is that an incomplete list? A final question, do you happen to have a target_path set in your dbt_project.yml?
One thing to try might be switching over to `load_assets_from_dbt_manifest` (https://docs.dagster.io/integrations/dbt/reference#loading-models-using-load_assets_from_dbt_manifest), but this error really is surprising so it's somewhat hard to predict if that would help

Ricky Kim

01/05/2023, 9:40 AM
Thank you for your reply. To answer your questions:
"does this error show up as soon as you try to load the repository?" When I branch deploy, the repository fails to load. When I click "view the error", I see the error message.
"And is there anything within your repository other than those two sets of assets?" From the error message
dagster._core.errors.DagsterInvalidDefinitionError: Invalid dependencies: op "run_dbt_b7d19" does not have input "source_moo_moocards_moocards_moo_user". Available inputs: ['source_moo_recurly_billing_info', 'source_moo_recurly_invoices', 'source_moo_recurly_transactions']
['source_moo_recurly_billing_info', 'source_moo_recurly_invoices', 'source_moo_recurly_transactions'] is the complete set from the project stored in DBT_PREP_DIR (in my code snippet). "source_moo_moocards_moocards_moo_user" is just one of the 23 models from the project stored in DBT_PROJECT_DIR. My guess is that it's trying to load a model from DBT_PROJECT_DIR when it should load models from DBT_PREP_DIR.
"And are ['source_moo_recurly_billing_info', 'source_moo_recurly_invoices', 'source_moo_recurly_transactions'] the only sources in one of your projects?" Yes, this is the complete set of models from DBT_PREP_DIR.
"do you happen to have a target_path set in your dbt_project.yml?" Yes, I have the same target_path set in both projects' dbt_project.yml:
target-path: "target"
Regarding your suggestion of using `load_assets_from_dbt_manifest`, I will try that and see how it goes. But if you can spot anything from my answers, could you kindly let me know? Thank you.
Hi! I have an update on this. While trying a few different things, I found a workaround! I am not 100% clear on why it works, but here is what I did: when loading the assets from our main dbt project, I added the `exclude` argument with a non-existent tag.
all_models = with_resources(
    # when loading assets from dbt, we invoke the `node_info_to_group_fn` argument
    # to label them all under a unified `group_name`
    load_assets_from_dbt_project(
        project_dir=DBT_PROJECT_DIR,
        profiles_dir=DBT_PROFILES_DIR,
        node_info_to_group_fn=get_dbt_group_name,
        exclude="tag:nonexisting_tag",
    ),
    {
        "dbt": dbt_cli_resource.configured(
            {"project_dir": DBT_PROJECT_DIR, "profiles_dir": DBT_PROFILES_DIR, "target": DBT_TARGET}
        )
    },
)
It works similarly with the `select` argument, but since we want to bring in the assets for all models, `exclude` works nicely: given a non-existent tag, it excludes nothing and brings in everything. I know this is a somewhat hacky workaround, so I hope there will be a proper fix. But as you mentioned, you were not able to reproduce the issue, so I am not sure whether you can take a deeper look. My guess is that when a keyword argument like `select` or `exclude` is passed, the loading goes through a different code path that happens to avoid the problem I had. If you have any questions please let me know. Thank you.

Manish Khatri

01/10/2023, 5:56 PM
Hi @owen I've done a little more digging to try and get to the bottom of this one. I did a `pprint(vars(dbtproj_asset[0]))` for both of our dbt projects, in conjunction with the error message we are seeing:
Error loading repository location moo:dagster._core.errors.DagsterInvalidDefinitionError: 
Invalid dependencies: op "run_dbt_b7d19" does not have input "source_moo_moocards_moocards_order_item". 
Available inputs: ['source_moo_recurly_billing_info', 'source_moo_recurly_invoices', 'source_moo_recurly_transactions']
The mentioned `source_moo_moocards_moocards_order_item` is different each time we start dagit (it picks up the first entry in whatever ordered list is generated on each run of dbt project 1). dbt project 2 only has the 3 sources listed under Available inputs. Something strange seems to be going on where it can't distinguish between the 2 projects, as if it's looking at the assets of project 2 through the lens of project 1.
One important thing to note is that the 3 sources in dbt project 2 also exist in dbt project 1, i.e. ['source_moo_recurly_billing_info', 'source_moo_recurly_invoices', 'source_moo_recurly_transactions'] is common to both projects and appears in the `vars(…)` output under the `_keys_by_input_name` key.
The mystery is that providing the `select`/`exclude` kwarg to `load_assets_from_dbt_project` makes the complaints go away and the project loads fine. Does adding those flags trigger a different code flow? The `vars(..)` output of the assets with and without select/exclude shows an identical object, but perhaps something differs internally?

owen

01/11/2023, 3:19 PM
hey @Manish Khatri! I really appreciate the in-depth digging here. The fact that the exclude argument fixes the issue leads me to suspect a name collision. Basically, when multiple dbt projects are loaded, we need a unique name for the op that backs the assets of each project. By default, we read the project_id from the manifest.json file and name the op run_dbt_{project_id}. However, if "select" or "exclude" is specified, we add a hash of the select/exclude statement to the op name. So my best guess is that the project id for each of these projects is somehow identical (or at least the first 5 characters are identical), so we're creating two ops with duplicate names, which confuses the system. It might be worth checking that on your end. When you add an exclude statement that matches nothing, it changes the generated name, which resolves the collision.
I'm definitely interested to know if those project ids really are identical (dbt docs say that they should be unique per project, but it's possible that they're somehow not), but in the meantime, adding a no-op "exclude" should solve the issue
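The collision described above can be sketched with a small toy model. This is not Dagster's actual code: the `dbt_op_name` helper, the MD5-based suffix, and the two project ids are hypothetical and chosen for illustration. Only the `run_dbt_` prefix (seen in the error message) and the five-character truncation come from the thread.

```python
import hashlib
from typing import Optional


def dbt_op_name(project_id: str, select: str = "*", exclude: Optional[str] = None) -> str:
    """Toy model of the op naming scheme described in the thread (an assumption)."""
    # Base name: a fixed prefix plus the first five characters of the project id.
    name = f"run_dbt_{project_id[:5]}"
    # Only a non-default select/exclude adds a suffix derived from those arguments.
    if select != "*" or exclude:
        digest = hashlib.md5((select + (exclude or "")).encode("utf-8")).hexdigest()
        name += "_" + digest[:5]
    return name


# Hypothetical ids: both projects report the same project_id.
main_project_id = "b7d19aaaaaaaaaaa"
prep_project_id = "b7d19aaaaaaaaaaa"

# Without select/exclude, both projects map to the same op name.
collides = dbt_op_name(main_project_id) == dbt_op_name(prep_project_id)

# A no-op exclude on one project changes its op name and removes the collision.
fixed = dbt_op_name(main_project_id, exclude="tag:nonexisting_tag") != dbt_op_name(prep_project_id)

print(collides, fixed)
```

Under these assumptions, the workaround helps only because it changes the generated name. Giving the projects distinct ids addresses the cause directly.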

Manish Khatri

01/11/2023, 3:23 PM
Good shout, I presume `project_id` is the equivalent of the `name:` key in the `dbt_project.yml`? If so, then yes these are identical and we should change this (!)

owen

01/11/2023, 3:24 PM
ah yeah it might be a hash of the name field or something like that!
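One way to check this guess is to compare the `project_id` recorded in each project's `manifest.json`. The sketch below assumes the id lives under `metadata.project_id` and that dbt derives it as the MD5 hex digest of the project name. The helper names and the example project names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def manifest_project_id(manifest_path: Path) -> str:
    """Read metadata.project_id from a dbt manifest.json file."""
    manifest = json.loads(manifest_path.read_text())
    return manifest["metadata"]["project_id"]


def expected_project_id(project_name: str) -> str:
    """Assumption: project_id is the MD5 hex digest of the `name:` in dbt_project.yml."""
    return hashlib.md5(project_name.encode("utf-8")).hexdigest()


# Hypothetical project names: identical names give identical ids,
# while renaming one project gives it a different id.
same = expected_project_id("moo") == expected_project_id("moo")
different = expected_project_id("moo") != expected_project_id("moo_prep")
print(same, different)
```

Calling `manifest_project_id` on the `target/manifest.json` of each project and comparing the two results would confirm whether the ids really match.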

Manish Khatri

01/11/2023, 3:25 PM
Ok great, will fix that and see if that resolves this issue! 🤞