Dennis Gera
04/17/2023, 12:23 PM
define an asset as a subset of a dbt loaded from manifest asset?
owen
04/17/2023, 9:35 PM
Dennis Gera
04/17/2023, 10:23 PM
load_assets_from_dbt_project. Initially, we did not know the best way to use these assets, so we ended up defining multiple dbt assets using logic like:
example_dbt_assets = load_assets_from_dbt_project(
    project_dir=DBT_PROJECT_DIR,
    profiles_dir=DBT_PROJECT_DIR + "/profile",
    select="tag:example",
    use_build_command=True,
    key_prefix="analytics",
    source_key_prefix=["analytics", "example"],
    node_info_to_group_fn=lambda _: "example",
)
Consequently, we had a bunch of asset groups, each defined by a dbt tag select statement. We then proceeded to define jobs like this:
example_assets_job = define_asset_job(
    name="example_assets_job",
    executor_def=in_process_executor,
    selection=AssetSelection.groups("example"),
)
Since our project has grown significantly, we are migrating to the load_assets_from_dbt_manifest function. I also wanted to improve the way we define our assets so that we wouldn’t have to keep loading assets from the manifest in various parts of our code. Ideally, I want to load all of our dbt assets once under dbt_assets = load_assets_from_dbt_manifest(...) and then define assets from that load.
I tried using a node_info_to_group_fn that would use the schema or the node’s first tag as the grouping criterion, but this is not the same as defining an asset with the select tag statement (if a model has many tags, using its tags in a group_fn doesn’t guarantee that it ends up in the group I want).
I’m wondering if this approach is feasible/recommended or if I’m better off just defining multiple load_assets_from_dbt_manifest calls in my project.
owen
04/17/2023, 10:27 PM
def my_group_fn(node_info):
    # if the model has a tag that I want to associate with a group, use that
    for tag in ["tags", "that", "are", "groups"]:
        if tag in node_info["tags"]:
            return tag
    # otherwise, use schema
    return node_info["schema"]
dbt_assets = load_assets_from_dbt_manifest(..., node_info_to_group_fn=my_group_fn)
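To make the tag-priority behavior concrete, here is a minimal, self-contained sketch of the same idea run against hand-written node dictionaries. The `GROUP_TAGS` list, the `group_for_node` name, and the sample nodes are all hypothetical; real `node_info` entries from a manifest carry many more fields, and only `tags` and `schema` are assumed here.

```python
# Hypothetical tag names, checked in priority order; not taken from the thread.
GROUP_TAGS = ["staging", "operations"]


def group_for_node(node_info):
    """Return the first priority tag found on the node, else its schema."""
    for tag in GROUP_TAGS:
        if tag in node_info.get("tags", []):
            return tag
    return node_info["schema"]


# Hand-written stand-ins for manifest nodes (assumed shape: "tags" and "schema").
sample_nodes = [
    {"name": "stg_orders", "tags": ["daily", "staging"], "schema": "raw"},
    {"name": "ops_report", "tags": ["operations", "staging"], "schema": "ops"},
    {"name": "misc_model", "tags": ["daily"], "schema": "analytics"},
]

groups = {node["name"]: group_for_node(node) for node in sample_nodes}
print(groups)
```

Because the priority list is scanned in a fixed order, a model with several tags always lands in the same group, which is the property the multi-tag concern above is about.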
Dennis Gera
04/17/2023, 10:31 PM
stg_assets_job = define_asset_job(
    name="stg_assets_job",
    executor_def=in_process_executor,
    selection=AssetSelection.groups("staging"),
)
and
operations_assets_job = define_asset_job(
    name="operations_assets_job",
    executor_def=in_process_executor,
    selection=AssetSelection.groups("operations"),
)
owen
04/17/2023, 10:32 PM
Dennis Gera
04/17/2023, 10:35 PM
load_assets_from_dbt_manifest?
owen
04/17/2023, 10:36 PM
load_assets_from_dbt_manifest(select="tag:foo") and load_assets_from_dbt_manifest(select="tag:bar"), then that model would be double-represented, which should cause an error when building your repository
Dennis Gera
04/17/2023, 10:38 PM
DbtManifestAssetSelection doesn’t load the assets from my manifest file, it just selects based on it, while load_assets_from_dbt_manifest loads everything once?
dbt_assets = load_assets_from_dbt_manifest(…) with multiple DbtManifestAssetSelection
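The split being asked about here (define every asset once, then select subsets of what exists) can be pictured with a toy model. This is a sketch in plain Python, not Dagster's implementation: `ToyAssetRegistry`, its methods, and the model names are hypothetical and exist only to illustrate why an overlapping second definition is rejected while overlapping selections are harmless.

```python
class ToyAssetRegistry:
    """Toy stand-in for a repository: each model may be defined only once."""

    def __init__(self):
        self._assets = {}

    def define(self, model_name, tags):
        # Defining the same model twice mirrors the "double-represented" case.
        if model_name in self._assets:
            raise ValueError(f"{model_name} is already defined")
        self._assets[model_name] = set(tags)

    def select_by_tag(self, tag):
        # Selecting never creates assets; it only filters what already exists.
        return sorted(name for name, tags in self._assets.items() if tag in tags)


registry = ToyAssetRegistry()
# One load: every model is defined exactly once.
registry.define("model_a", ["foo", "bar"])
registry.define("model_b", ["bar"])

# Two selections may overlap freely, since nothing new is defined.
foo_selection = registry.select_by_tag("foo")
bar_selection = registry.select_by_tag("bar")
print(foo_selection, bar_selection)

# A second definition of an existing model is rejected.
try:
    registry.define("model_a", ["foo"])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```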
owen
04/17/2023, 10:40 PM
load_assets_from.. defines the assets that you want to exist ("I want to create assets for every dbt model with the tag 'bar', and put them in a group called 'bar'"), whereas the DbtManifestAssetSelection says "of the assets that exist, I want the ones that correspond to dbt models with the tag 'bar'"
Dennis Gera
04/17/2023, 10:41 PM
owen
04/17/2023, 10:41 PM
Dennis Gera
04/18/2023, 2:51 PM
DbtManifestAssetSelection to select my dbt assets
example_dbt_assets = DbtManifestAssetSelection(
    manifest_json=json.load(open(os.path.join(DBT_PROJECT_DIR, "target", "manifest.json"), encoding="utf-8")),
    select="tag:example,tag:sources",
)
and then created a job using
example_assets_job = define_asset_job(
    name="example_assets_job",
    executor_def=in_process_executor,
    selection=example_dbt_assets,
)
But in the UI I’m seeing an empty job pipeline. Any idea why this is?
cc: @Tim Castillo
assets/__init__.py file and each DbtManifestAssetSelection is in an assets/example/dbt.py file. The jobs are then defined outside my assets folder. Not sure if this influences anything
owen
04/18/2023, 5:30 PM
, indicates the intersection between the two clauses, meaning "tag:example,tag:sources" means "give me the intersection between the set of models with tag 'example' and the set of models with tag 'sources'". If no model has both tags, this set will be empty. I think you want "tag:example tag:sources", which should give you the union between those sets
Dennis Gera
04/18/2023, 5:31 PM
tag:example and I got the same thing :blob_sad:
owen
04/18/2023, 5:34 PM
Dennis Gera
04/18/2023, 5:37 PM
owen
04/18/2023, 5:38 PM
Dennis Gera
04/18/2023, 5:39 PM
dbt_assets = load_assets_from_dbt_manifest(
    json.load(open(os.path.join(DBT_PROJECT_DIR, "target", "manifest.json"), encoding="utf-8")),
    io_manager_key="io_manager",
    key_prefix=["analytics"],
    source_key_prefix=["analytics"],
    node_info_to_group_fn=node_info_to_group_fn,
)
owen
04/18/2023, 5:47 PM
from dagster import AssetKey
from dagster_dbt.asset_utils import default_asset_key_fn
def hacky_asset_key_fn(node_info):
    # prepend the same "analytics" prefix that key_prefix applied when the assets were loaded
    orig_asset_key = default_asset_key_fn(node_info)
    return AssetKey(["analytics"] + orig_asset_key.path)
example_dbt_assets = DbtManifestAssetSelection(
    manifest_json=json.load(open(os.path.join(DBT_PROJECT_DIR, "target", "manifest.json"), encoding="utf-8")),
    select="tag:example,tag:sources",
    node_info_to_asset_key=hacky_asset_key_fn,
)
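A plausible reading of the snippet above, stated here as an assumption rather than something confirmed in the thread: the assets were loaded with `key_prefix=["analytics"]`, so a selection that computes unprefixed keys would match nothing, which would show up as an empty job. The sketch below models that with plain tuples of path segments. The helper names and model names are hypothetical, and no Dagster API is involved.

```python
def default_key(model_name):
    # Stand-in for an unprefixed asset key path (hypothetical, not Dagster's function).
    return (model_name,)


def prefixed_key(model_name, prefix=("analytics",)):
    # Mirrors the idea of prepending a key prefix to the default path.
    return prefix + default_key(model_name)


models = ["orders", "customers"]  # hypothetical dbt model names

# Keys of the assets that actually exist (loaded with the prefix).
existing_keys = {prefixed_key(m) for m in models}

# A selection that forgets the prefix finds no overlap with existing assets.
unprefixed_selection = {default_key(m) for m in models} & existing_keys

# A selection that applies the same prefix matches every asset.
prefixed_selection = {prefixed_key(m) for m in models} & existing_keys

print(len(unprefixed_selection), len(prefixed_selection))
```

Under this reading, the key function passed to the selection has to produce exactly the keys the load produced, which is why the prefix is repeated by hand.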
Dennis Gera
04/18/2023, 5:50 PM
owen
04/18/2023, 6:13 PM
Dennis Gera
05/08/2023, 4:37 PM
load_assets_from_dbt_manifest and several DbtManifestAssetSelection, I was able to successfully load the dagster repository and run jobs locally. However, when merged to production, I started getting the following error in job runs. All dbt assets appear to have loaded properly, but this issue appeared specifically when a job was run. Also, I was not able to reproduce this error when running this job locally. Any ideas on what could be the issue here?
owen
05/08/2023, 7:30 PM
manifest.json file is not a part of your docker image -- generally, we recommend putting a dbt compile inside your dagster_cloud_post_install.sh script so that your docker image will contain an up-to-date copy of your manifest.json
Dennis Gera
05/09/2023, 12:35 PM
manifest.json read from S3 instead of my docker image?
cc: @Gabriel Montañola
owen
05/09/2023, 6:23 PM
Gabriel Montañola
05/12/2023, 2:09 PM
Dennis Gera
05/15/2023, 9:29 PM
manifest.json to our docker image in the dbt directory -> dbt/manifest.json.
Gabriel Montañola
05/15/2023, 9:36 PM
owen
05/15/2023, 9:40 PM
.../dbt/target/manifest.json instead of /dbt/manifest.json.
Gabriel Montañola
05/15/2023, 9:40 PM
owen
05/15/2023, 9:41 PM