# ask-community
r
In the `dagster-dbt` reference they have a (conveniently...) simple example of how to read a table into a dataframe: https://docs.dagster.io/integrations/dbt/reference#defining-an-io-manager This fails on my end, by all appearances because it doesn't target the correct schema. The schemas are generated at runtime, so having to pass them manually would be a huge hassle. The schema does not appear to be stored in the asset metadata automatically, but please tell me there is some way of adding it? Can I use the inputs `runtime_metadata_fn` receives? For some reason it doesn't seem to get the `node_info`.
o
hi @Rasmus Bonnevie! When you say that the schema is generated at runtime, what runtime is this referring to? Is the schema known at the point that `dbt compile` is executed, or is it not known until `dbt run`? If the schema is known at compile time, then I think the best course of action would be to make the asset key that's generated for each model dependent on the schema. That would look like:
```python
from dagster import AssetKey
from dagster_dbt import load_assets_from_dbt_manifest

def get_node_asset_key(node_info):
    # prefix each model's asset key with its compiled schema
    return AssetKey([node_info["schema"], node_info["name"]])

load_assets_from_dbt_manifest(
    ...,
    node_info_to_asset_key=get_node_asset_key,
)
```
Then, in the IOManager, you could have
```python
schema_name = context.asset_key.path[-2]
table_name = context.asset_key.path[-1]
```
and go from there.
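fleshing that out a bit, a minimal version might look like this (assuming a pandas + SQLAlchemy setup, which is my assumption here rather than anything from the docs):
```python
import pandas as pd
from dagster import IOManager
from sqlalchemy import create_engine  # assumed storage backend, swap in your own

class DbtTableIOManager(IOManager):
    def __init__(self, conn_str: str):
        self._engine = create_engine(conn_str)

    def handle_output(self, context, obj):
        # dbt materializes its own tables, so there's nothing to write here
        pass

    def load_input(self, context):
        # relies on asset keys of the form [schema, table] set up above
        schema_name = context.asset_key.path[-2]
        table_name = context.asset_key.path[-1]
        return pd.read_sql_table(table_name, self._engine, schema=schema_name)
```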
r
Hi @owen 🙂 I guess it is at `compile`; I'm not doing anything too obscure. I follow the dbt guidelines where I have separate development and production schemas encoded in `profiles.yml`, i.e. `dev_rasmus` for me, and then I use `+schema: foo` for some directories in `dbt_project.yml` to add some structure, which gets compiled to a schema of `dev_rasmus_foo` for those models. I'd rather not have my asset keys depend on my user schema, and it's also a bit inconvenient to have to rebuild the schema with all suffixes manually for each table if I'm e.g. generating in a loop. My user schema even depends on an environment variable, so I'd have to parse that too if I were to e.g. read the `profiles.yml` in as a resource/asset.
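to spell out what dbt is doing there: its default `generate_schema_name` macro composes the name roughly like this (a Python sketch of the Jinja logic, just for illustration):
```python
from typing import Optional

def generate_schema_name(target_schema: str, custom_schema: Optional[str]) -> str:
    # mirrors dbt's default behaviour: the +schema config is appended
    # as a suffix to the target schema from profiles.yml
    if custom_schema is None:
        return target_schema
    return f"{target_schema}_{custom_schema}"

generate_schema_name("dev_rasmus", "foo")  # -> "dev_rasmus_foo"
```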
okay, it seems like the second input to `runtime_metadata_fn` is actually the necessary metadata/configuration, but even when I add e.g. database and schema to the dbt asset as metadata, it is not accessible in the IO `input_manager` I've added, at least not on `upstream_output.metadata`, which is empty.
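concretely, what I tried is roughly this (a sketch; the exact `node_info` keys are my reading of the manifest):
```python
def add_schema_metadata(context, node_info):
    # forward dbt's compiled database/schema as materialization metadata
    return {"database": node_info["database"], "schema": node_info["schema"]}

load_assets_from_dbt_manifest(
    ...,
    runtime_metadata_fn=add_schema_metadata,
)
```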
o
ah I see, that is a bit tricky. right now, the metadata attached to the upstream output is not forwarded along to downstream steps (it just gets stored in the database). the `input_context.metadata` field references metadata that is baked into the definition of the output, rather than anything that's generated at runtime (the runtime metadata is more for bookkeeping purposes). as for a workaround, I'd recommend the (somewhat painful) approach of replicating the schema name logic when configuring your IOManager. right now, the asset key for each model will depend on the configured schema (but not the base schema). so if you have `model_a` with `+schema: foo` applied to it, it should get the asset key `["foo", "model_a"]`, whereas `model_b` with no schema configured for it will just be `["model_b"]`. you'll then want to have an IOManager that can be configured with a `base_schema` property, i.e.
```python
from dagster import IOManager, io_manager

class MyIOManager(IOManager):
    def __init__(self, base_schema: str):
        self.base_schema = base_schema

    def _get_schema(self, context):
        # two-part asset keys carry the configured (+schema) name as their first component
        if len(context.asset_key.path) == 2:
            return f"{self.base_schema}_{context.asset_key.path[0]}"
        return self.base_schema

    ...

@io_manager(config_schema={"base_schema": str})
def my_io_manager(context):
    return MyIOManager(base_schema=context.resource_config["base_schema"])
```
As you note, this would require you to duplicate some config between `profiles.yml` and dagster, i.e.
```python
import os

dev_io_manager = my_io_manager.configured({"base_schema": os.getenv("...")})

prod_io_manager = my_io_manager.configured({"base_schema": "something"})
```
but it should at least unblock you
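for completeness, a downstream asset would then pull a dbt table through that IOManager by depending on the two-part key, something like this (the exact `AssetIn` signature depends on your dagster version, and the dataframe usage assumes your `load_input` returns one):
```python
from dagster import AssetIn, AssetKey, asset

# hypothetical downstream asset: loads model_a (configured with +schema: foo)
# as whatever MyIOManager.load_input returns
@asset(ins={"model_a": AssetIn(key=AssetKey(["foo", "model_a"]))})
def model_a_summary(model_a):
    return model_a.describe()
```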
r
yes, this was more or less what I ended up doing, but it just seemed like such a basic problem (especially with all that dbt metadata right there for the taking) that I was sure there must be some better alternative I was missing 🫤 A slightly more satisfying solution might be to have a profile resource that reads from `profiles.yml` so that you can access the schema? But the nicest would still be to be able to read it from the dbt table asset directly.
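something like this, maybe (untested sketch; the regex is a crude stand-in for dbt's Jinja rendering of `env_var`):
```python
import os
import re

import yaml
from dagster import resource

@resource(config_schema={"profiles_path": str, "profile": str, "target": str})
def dbt_profile_schema(context):
    # read profiles.yml and pull out the target schema, resolving
    # {{ env_var('...') }} references with a crude regex instead of Jinja
    with open(context.resource_config["profiles_path"]) as f:
        raw = f.read()
    raw = re.sub(
        r"{{\s*env_var\(['\"]([A-Za-z0-9_]+)['\"]\)\s*}}",
        lambda m: os.environ.get(m.group(1), ""),
        raw,
    )
    profiles = yaml.safe_load(raw)
    outputs = profiles[context.resource_config["profile"]]["outputs"]
    return outputs[context.resource_config["target"]]["schema"]
```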
r
just to +1 here: with BigQuery it ends up being quite clunky, since the dbt profile is really only a billing project + auth, and you can actually read and write across both projects and datasets with a single profile. For our setup we also do the `dbt_${USER}` prefix on all datasets. If we could access the full metadata (project, dataset, table), it would allow us to use a single IO manager across not only datasets but also projects. Our workaround for now is to have an IO manager per project (which then requires knowing which IO manager is needed for a particular dbt asset) and the `_${USER}` hack encoded in our IO manager. Not the end of the world, but it would be nice to clean up at some point!
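for reference, the `_${USER}` part of our workaround boils down to something like this (simplified sketch, names are made up):
```python
import os

def resolve_dataset(base_dataset: str, is_dev: bool) -> str:
    # replicate dbt's per-developer dataset naming inside the IO manager
    return f"{base_dataset}_{os.environ['USER']}" if is_dev else base_dataset
```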