Jakub Zgrzebnicki
08/17/2023, 9:12 AM
Alex Orlovskyi
08/21/2023, 8:25 AM
sandy
08/21/2023, 3:22 PM
Dagster allows you to define and execute data quality checks on your software-defined assets. Each asset check verifies some property of a data asset, e.g. that it has no null values in a particular column.
When viewing an asset in Dagster’s UI, you can see all of its checks, and whether they’ve passed, failed, or haven’t run. When launching a run to execute an asset, by default its checks will also be executed. Checks can also be executed on their own, independent of asset materializations.
By setting their severity level to ERROR, you can specify that your checks impact control flow, i.e. downstream assets are materialized only if the checks on the upstream assets succeed.
We would appreciate any and all feedback on the proposal. (This is a re-post of what @Alex Orlovskyi already discovered, with some additional context and an @channel mention for wider distribution.)
johann
09/01/2023, 3:44 PM
geoHeil
09/06/2023, 4:23 PM
rex
09/14/2023, 9:04 PM
With dagster-dbt>=0.20.13, dbt tests can now be modeled as Dagster asset checks.
For more details and requirements for setup, see: https://github.com/dagster-io/dagster/discussions/16527.
Zach P
09/14/2023, 9:22 PM
johann
09/18/2023, 6:48 PM
@graph_asset and @graph_multi_asset. These can be used to implement checks that block materializations. (discussion)
To try them out: https://github.com/dagster-io/dagster/discussions/16266
Brittany Kozura
09/25/2023, 6:32 PM
Brittany Kozura
09/25/2023, 7:07 PM
johann
09/27/2023, 6:46 PM
AssetCheckResult(success=…) to AssetCheckResult(passed=…). We’re also going to make an internal change, so to use checks with Dagster Cloud you’ll need dagster version 1.5
Jason
09/28/2023, 1:18 PM
dagster._core.errors.DagsterInvalidDefinitionError: Duplicate check specs: {(AssetKey(['ANALYTICS', 'my_model_name']), 'assert_[some_name]_matches'): 2}
I'm able to find "my_model_name" in dbt, but I don't see any duplicate tests in the schema file. I don't know how it's creating or getting the assert[...] part. I tried searching the dbt repo for the whole name and for substrings of it, but still found nothing. For now, I've disabled the checks in dbt_project.yml in order to get Dagster up and running again.
Any ideas?
Duke
09/29/2023, 12:40 AM
geoHeil
10/03/2023, 8:45 AM
Huy Nguyễn
10/06/2023, 9:58 AM
Duke
10/06/2023, 1:22 PM
With dagster-dbt 0.21.1 we can now run dbt checks individually. Is there a difference between yielding from build vs. separating into run and test streams?
Brittany Kozura
10/06/2023, 3:19 PM
Jeff Nawrocki
10/12/2023, 2:27 PM
import os

import awswrangler as wr  # assumed: "wr" is the usual alias for awswrangler
import duckdb
from dagster import AssetCheckResult, asset, asset_check

# cfg.s3_path and duckdb_database_path are defined elsewhere in the project.


@asset(compute_kind="python")
def my_data(context) -> None:
    # DuckDB resolves "data" in the query below to this local DataFrame.
    data = wr.s3.read_parquet(cfg.s3_path)
    connection = duckdb.connect(database=os.fspath(duckdb_database_path), read_only=False)
    connection.execute("create schema if not exists database")
    connection.execute(
        "create or replace table database.my_data as select * from data"
    )


@asset_check(asset=my_data, description="Check that my asset has enough rows")
def my_asset_has_enough_rows() -> AssetCheckResult:
    connection = duckdb.connect(database=os.fspath(duckdb_database_path), read_only=False)
    df = connection.execute("select * from database.my_data").fetchdf()
    num_rows = df.shape[0]
    return AssetCheckResult(passed=num_rows > 5, metadata={"num_rows": num_rows})
I read in a parquet file from S3 and load it into a duckdb file. My asset check queries the duckdb file and checks whether num_rows > 5. The issue is that the asset check errors with the following:
dagster._core.executor.child_process_executor.ChildProcessCrashException
Stack Trace:
File "/Users/.../virtualenvs/.../lib/python3.10/site-packages/dagster/_core/executor/multiprocess.py", line 247, in execute
event_or_none = next(step_iter)
File "/Users/.../virtualenvs/.../lib/python3.10/site-packages/dagster/_core/executor/multiprocess.py", line 357, in execute_step_out_of_process
for ret in execute_child_process_command(multiproc_ctx, command):
File "/Users/.../virtualenvs/.../lib/python3.10/site-packages/dagster/_core/executor/child_process_executor.py", line 174, in execute_child_process_command
raise ChildProcessCrashException(exit_code=process.exitcode)
This seems to be caused by multiprocessing (the asset check runs while downstream assets are executing). My downstream assets are dbt assets that also query the duckdb file. I saw that there is a way to include the asset check in the @asset declaration, but I like having them separate. Any thoughts? It would be nice to begin materializing downstream assets only after the asset_check execution succeeds.
Note: this is only an issue if the asset_check queries the duckdb file. If I just return data from the asset and pass it into the asset_check directly, the check works fine. So this is a working alternative, but I like the idea that the asset_check could query the duckdb file instead of passing dataframes around.
Daniel
10/16/2023, 1:43 PM
geoHeil
10/16/2023, 2:12 PM
Brendan Jackson
10/18/2023, 2:23 PM
DagsterDbtTranslatorSettings(enable_asset_checks=True), which seems to work fine - I can run asset checks from the UI.
However, materialising the asset does not automatically trigger the asset checks. The asset body is just:
dbt.cli(["build"], context=context).stream()
but the asset checks in the GUI are shown as 'skipped' (and the checks are not run, based on the logs). What's going wrong?
Zach P
10/20/2023, 8:49 PM
Huy Nguyễn
10/30/2023, 8:31 AM
geoHeil
11/01/2023, 8:11 AM
James Robinson
11/16/2023, 2:55 PM
multi_asset_check functionality from the initial discussion has been implemented or cancelled? I can't find any reference to it outside of that post.
Jeff Nawrocki
11/21/2023, 12:17 AM
defs = Definitions(
    assets=[assets.my_asset],
    asset_checks=[assets.my_asset_has_enough_rows],
)
However, it seems that when I use the function load_assets_from_package_module, it is not loading the asset check:
defs = Definitions(
    assets=load_assets_from_package_module(source, group_name='test', key_prefix='test'),
    asset_checks=[assets.my_asset_has_enough_rows],
)
In this example, assets.py is in the sources subdirectory. All the assets show up, but no asset checks do, including the dbt tests. Has anyone run into this?
Monde Sinxi
11/28/2023, 3:58 PM