Dane Linssen
12/22/2022, 12:10 PM
We use dagster_dbt. We call load_assets_from_dbt_project() and have all our dbt assets in Dagster. We can materialize (dbt) assets and everything works great! However, we use K8sRunLauncher, so asset materialization runs in a separate (ephemeral) pod.
Each dbt asset materialization generates a manifest.json, saved locally on the pod. We want to grab this file and store it on our S3 bucket; however, since the pod is ephemeral, it’s killed before we can retrieve the file.
What’s the best way to solve this?
Pieter Custers
12/22/2022, 1:03 PM
sean
12/22/2022, 5:43 PM
owen
12/22/2022, 5:51 PM
You could generate the manifest.json file before deploying your code (i.e. have a dbt compile step as part of your Dockerfile, then use load_assets_from_dbt_manifest instead of load_assets_from_dbt_project). With basic usage of dbt, it's unlikely for the contents of the manifest to change in meaningful ways between runs as long as the project doesn't change. load_assets_from_dbt_manifest is also significantly faster.
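A minimal sketch of loading assets from a precompiled manifest, assuming the Dockerfile has already run dbt compile so that the manifest exists in the image. The path and the helper names are hypothetical; load_assets_from_dbt_manifest is called positionally with the parsed manifest, which matches the dagster_dbt releases current at the time of this thread and may differ in later versions.

```python
import json
from pathlib import Path

# Hypothetical location of the manifest baked into the image by `dbt compile`.
MANIFEST_PATH = Path("dbt_project/target/manifest.json")


def read_manifest(path):
    """Parse a dbt manifest.json file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def build_dbt_assets(manifest_path=MANIFEST_PATH):
    """Build Dagster assets from a precompiled manifest, without invoking dbt."""
    # Imported lazily so read_manifest stays usable without dagster_dbt installed.
    from dagster_dbt import load_assets_from_dbt_manifest

    return load_assets_from_dbt_manifest(read_manifest(manifest_path))
```

In a repository definition you would call build_dbt_assets() once and pass the result wherever the output of load_assets_from_dbt_project was used before.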
if the manifest.json file is part of the image, you could then set up a separate process (could even be another asset) that reads that manifest file (which will be local to the Docker image) and persists it to your s3 bucket.
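A rough sketch of such a separate step, assuming boto3 is available and the pod has credentials for the bucket. The function and asset names, bucket, and key are hypothetical; upload_file(Filename, Bucket, Key) is the boto3 S3 client method. The client is passed in so the upload logic can be exercised without AWS access.

```python
from pathlib import Path


def persist_manifest(s3_client, manifest_path, bucket, key):
    """Upload the local manifest.json to S3 and return its s3:// URI."""
    path = Path(manifest_path)
    if not path.is_file():
        raise FileNotFoundError(f"No manifest found at {path}")
    s3_client.upload_file(str(path), bucket, key)
    return f"s3://{bucket}/{key}"


def build_manifest_asset(manifest_path, bucket, key):
    """Wrap persist_manifest in a Dagster asset (needs dagster and boto3)."""
    # Imported lazily so persist_manifest can be used and tested on its own.
    import boto3
    from dagster import asset

    @asset
    def dbt_manifest_on_s3():
        # Reads the manifest that ships inside the Docker image.
        return persist_manifest(boto3.client("s3"), manifest_path, bucket, key)

    return dbt_manifest_on_s3
```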
if this workflow doesn't work for you, let me know! would be happy to talk about other alternatives (although they'd likely require a bit of hacking)
Pieter Custers
12/23/2022, 7:53 AM
"With basic usage of dbt, it’s unlikely for the contents of the manifest to change in meaningful ways between runs as long as the project doesn’t change."
This is an important one; we didn’t realize it before. But generating the manifest.json at or before deploy time is not desirable / does not fit well in our CI/CD setup. So what we eventually did was call load_assets_from_dbt_project and right after that upload the manifest.json to S3 once, so basically at Dagster repo load time. Works well 🙂
Dane Linssen
12/23/2022, 10:27 AM
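For reference, the repo-load-time approach Pieter describes could look roughly like the sketch below. This is not his actual code: the function name, bucket, and key are hypothetical, and it assumes the manifest ends up in dbt's default target directory under the project. The loader and S3 client can be passed in, which keeps the ordering logic testable without dbt or AWS.

```python
from pathlib import Path


def load_assets_and_upload_manifest(project_dir, bucket, key, s3_client=None, loader=None):
    """Load dbt assets from the project, then upload the manifest it produced."""
    if loader is None:
        # Default to the dagster_dbt loader discussed in this thread.
        from dagster_dbt import load_assets_from_dbt_project as loader
    if s3_client is None:
        import boto3

        s3_client = boto3.client("s3")

    # Loading the project generates manifest.json as a side effect.
    assets = loader(project_dir)

    # Assumption: dbt's default target directory is in use.
    manifest_path = Path(project_dir) / "target" / "manifest.json"
    s3_client.upload_file(str(manifest_path), bucket, key)
    return assets
```

Called at module level in the repository file, this runs whenever Dagster loads the code, so the upload happens outside the ephemeral materialization pod's lifecycle.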