Dennis Gera
05/22/2023, 8:46 PM
DbtManifestAssetSelection that allows us to select only the models that have fresh data. In the dbt CLI we do this with the source_status method (source_status:fresher+). However, this method requires a sources.json artifact from a previous run for comparison. We want to know the recommended way to create, store, and then retrieve this sources.json file, given that baking it into the Docker image is not a viable option (our data would go stale pretty quickly).
We thought about generating the sources.json file, storing it in the pod's temporary storage, and comparing it to our prod sources.json file in S3. Is this possible, or is there a better way to do this?
@owen @Gabriel Montañola
owen
05/23/2023, 12:25 AM
DbtManifestAssetSelection might not be the ideal way to go about this, as AssetSelections should generally be static once your code is deployed (i.e. they should not resolve differently based on anything other than code changes).
Another potential way of going about this might be to take advantage of the (not-yet-released) @dbt_assets decorator. It'll go out in this week's release, but essentially it allows you to write whatever compute function you want for your dbt assets, rather than relying on the prebuilt function that we provide.
In short, you could do something along the lines of:
@dbt_assets(manifest=my_manifest)
def my_dbt_assets(context: OpExecutionContext, dbt: DbtClient):
    # get an up-to-date view of which sources are fresh, and wait for
    # the check to finish so its results exist before selecting on them
    dbt.cli(["source", "freshness"]).wait()
    # now just execute the ones with this status
    yield from dbt.cli(["run", "--select", "source_status:fresher+"]).stream()
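
For reference, the comparison that source_status:fresher+ relies on can be sketched in plain Python. This is only a simplified illustration, not dbt's own implementation: it assumes each entry under "results" in sources.json carries a unique_id and a max_loaded_at timestamp, and the function name fresher_sources and the demo source IDs are made up for the example. It treats a source as fresher when its max_loaded_at has advanced relative to the previous (e.g. prod) file, or when it did not appear there at all.

```python
from datetime import datetime
from typing import List


def _parse(ts: str) -> datetime:
    # Normalise a trailing "Z" so datetime.fromisoformat accepts it
    # on older Python versions as well.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def fresher_sources(previous: dict, current: dict) -> List[str]:
    """Return unique_ids that are new or whose max_loaded_at advanced.

    Hypothetical helper: `previous` and `current` are parsed sources.json
    documents (assumed shape: {"results": [{"unique_id", "max_loaded_at"}]}).
    """
    prev = {
        r["unique_id"]: r["max_loaded_at"]
        for r in previous.get("results", [])
        if "max_loaded_at" in r
    }
    fresher = []
    for r in current.get("results", []):
        if "max_loaded_at" not in r:
            continue  # skip entries without a timestamp
        uid = r["unique_id"]
        if uid not in prev or _parse(r["max_loaded_at"]) > _parse(prev[uid]):
            fresher.append(uid)
    return fresher


# Made-up example data standing in for the prod file and a fresh run.
prod = {"results": [
    {"unique_id": "source.demo.raw.orders", "max_loaded_at": "2023-05-22T08:00:00Z"},
    {"unique_id": "source.demo.raw.users", "max_loaded_at": "2023-05-22T08:00:00Z"},
]}
latest = {"results": [
    {"unique_id": "source.demo.raw.orders", "max_loaded_at": "2023-05-22T20:00:00Z"},
    {"unique_id": "source.demo.raw.users", "max_loaded_at": "2023-05-22T08:00:00Z"},
]}

print(fresher_sources(prod, latest))  # ['source.demo.raw.orders']
```

In practice dbt does this comparison itself when the previous sources.json sits in the directory passed via --state, so a helper like this is mainly useful for debugging which sources a run would pick up.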
Dennis Gera
05/23/2023, 12:25 PM
dbt_assets decorator works and trying it out on our project.
I'll then report back on the solution we come up with.