Hey I'm working with the new `@dbt_assets` syntax ...
# integration-dbt
m
Hey, I'm working with the new `@dbt_assets` syntax and I have to provide a path to the manifest. Before migrating to the new syntax I was using `load_assets_from_dbt_project`, and `manifest.json` was in `.gitignore` and not committed to the repo. With the new syntax I have to commit the manifest, and after a few days of working this way it's not ideal: the manifest always causes merge conflicts. Is there any plan to add functionality similar to `load_assets_from_dbt_project` to the new `@dbt_assets` decorator? I'm thinking of reimplementing `_load_manifest_for_project` and running it before `@dbt_assets`. I only have about 70 models. Is there any obvious disadvantage to this approach?
d
Following. Another way is to save the `manifest.json` file in a different location (S3, etc.).
m
Good tip. I like the simplicity of `load_assets_from_dbt_project`: it runs `dbt ls` on each deploy and creates the manifest itself, without depending on any external service.
r
In `dagster-dbt project scaffold`, we emulate the behavior of `load_assets_from_dbt_project` by creating the manifest at runtime. You can see the implementation here.
• We use the presence of an env var, `DAGSTER_DBT_PARSE_PROJECT_ON_LOAD`, to determine whether to build a manifest.
• This allows for development workflows where you run `DAGSTER_DBT_PARSE_PROJECT_ON_LOAD=1 dagster dev` locally, so that you can work while simultaneously updating your dbt and Dagster code. In production, you can produce the manifest in your build system when you deploy your code. You don't need to commit it to your repository.
• We recommend using a precompiled manifest because it is more efficient. Otherwise, every time the code location loads, or whenever a Dagster job is kicked off to materialize your assets, you have to compile your manifest. Depending on the scale of your project, this could be an expensive operation.
• By compiling it in the build system, you do this compilation only once for your deployed code.
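A minimal sketch of that env-var switch, for illustration only. It is not the scaffold's actual code: it shells out to `dbt parse` through `subprocess` instead of using dagster-dbt, and it assumes a dbt version whose `dbt parse` writes `target/manifest.json`. The function name `resolve_manifest_path` and its `env`/`run` parameters are hypothetical.

```python
import os
import subprocess
from pathlib import Path

ENV_VAR = "DAGSTER_DBT_PARSE_PROJECT_ON_LOAD"


def resolve_manifest_path(project_dir, env=None, run=subprocess.run):
    """Return the manifest path, rebuilding the manifest first if the env var is set.

    `env` and `run` are injectable (hypothetical parameters) so the logic can be
    exercised without dbt installed.
    """
    env = os.environ if env is None else env
    project_dir = Path(project_dir)
    manifest_path = project_dir / "target" / "manifest.json"

    # Any non-empty value enables parsing on load (the *presence* of the var).
    if env.get(ENV_VAR):
        # Development: regenerate the manifest every time this module is loaded.
        # Assumption: `dbt parse` writes target/manifest.json in your dbt version.
        run(["dbt", "parse", "--project-dir", str(project_dir)], check=True)

    # Production: the build system is expected to have produced the file already.
    return manifest_path
```

The returned path could then be passed to `@dbt_assets(manifest=...)`. With the variable unset, nothing is executed at load time, which is the cheaper production path described above.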
n
We're not yet using the new dbt API, but we've been getting much better job loading times and fewer loading errors by using `load_assets_from_dbt_manifest` compared to `load_assets_from_dbt_project`.
• Our dbt project is an independent Git project. Its `target` directory, which receives the manifest, is git-ignored.
• It is included in our Dagster Git project as a git submodule.
• When dbt models are updated, we create a new release of the dbt project and of the Dagster project with its updated submodule.
• Our Dagster project deployment/installation script runs `dbt ls` to generate the manifest in the `target` folder of the submodule, which remains ignored.
The only difficulty was finding the correct git commands to update the git submodules, `git submodule sync && git submodule update --init --force --recursive`, but since we found them (after heavy archaeological and experimental work) it has been working perfectly.
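The deployment steps above could be sketched roughly as follows. This is an illustration, not the actual script from this thread: the function names and the `dbt_project` submodule path are hypothetical, and it assumes `git` and `dbt` are on the `PATH` and that the script runs from the root of the Dagster repository.

```python
import subprocess
from pathlib import Path


def build_deploy_commands(dbt_project_dir):
    """Return the commands to run, in order, for the given submodule directory."""
    project_dir = str(Path(dbt_project_dir))
    return [
        # Bring the submodule to the commit pinned by the Dagster project.
        ["git", "submodule", "sync"],
        ["git", "submodule", "update", "--init", "--force", "--recursive"],
        # Generate the manifest in the (git-ignored) target folder of the submodule.
        ["dbt", "ls", "--project-dir", project_dir],
    ]


def run_deploy(dbt_project_dir, run=subprocess.run):
    """Run each command, stopping at the first failure (like chaining with `&&`)."""
    for command in build_deploy_commands(dbt_project_dir):
        run(command, check=True)
```

For example, `run_deploy("dbt_project")` would sync the submodule and then write the manifest under `dbt_project/target/`.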