# community-showcase
j
Hello! Does anyone have any good recent examples of Great Expectations with Dagster? I haven't found much in the way of examples or documentation except for a blog post from 2020 here. GE output is pretty nice looking and I'd like to get some examples working with assets rather than ops.
r
I'm interested in this too. It seems to be available with ops: https://docs.dagster.io/integrations/great-expectations
But I can't find a way to use it with assets.
t
Could you use it as an op inside a graph_asset?
s
Hey @Jeff Nawrocki - we're currently investigating changes that would make data quality checks a more first-class concept in Dagster, and thus allow displaying GE output more natively. We're still in the early exploration stage of this - if you'd be open to a call some time, I'd be curious to hear about your requirements.
j
@sandy Sounds great! We currently use R's pointblank package to run data validations outside of our Dagster pipelines. It would be great to have a more intuitive way to use GE with SDAs though.
t
It's still early stage, but we are integrating Great Expectations as SDAs via `asset_factory`. Each Expectation Suite is a (partitioned) data asset in Dagster. Inside a custom GreatExpectationIOManager the data assets are materialized via GX `SimpleCheckpoint()`, and the GX validation results are added as run output metadata. One improvement that is still in progress is the ability to add upstream dependencies for the GX data assets, like dbt models or raw files. As GX is in a different code location than our dbt projects, `AssetSelection()` does not work (https://dagster.slack.com/archives/C01U954MEER/p1690985736107259). Right now the dependency is handled inside asset_factory with `deps=AssetKey()`, where the AssetKey of the GX Suite just matches the one from the dbt model (without a prefix). The problem is that we load non-prod assets into prod deployments and the other way round, and can run into inconsistent states. We're trying to solve this by synchronizing the manifest.json from all our dbt projects to the GX code location and probably creating data assets based on an EnvVar for the specific environment. As this is all custom, we are interested in your investigation as well @sandy.
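A rough sketch of the manifest-based idea, using only the standard library. It assumes one synced manifest per environment under a hypothetical `<dir>/<env>/manifest.json` layout and a hypothetical `DEPLOY_ENV` variable; only the `nodes`, `resource_type`, and `name` fields of the dbt manifest are relied on:

```python
import json
import os
from pathlib import Path
from typing import Dict, Iterable, List, Set


def load_manifest(manifest_dir: Path, env: str) -> dict:
    """Read the synced dbt manifest for one environment (hypothetical layout)."""
    with open(manifest_dir / env / "manifest.json") as f:
        return json.load(f)


def dbt_model_names(manifest: dict) -> Set[str]:
    """Collect the names of all dbt models listed in the manifest."""
    return {
        node["name"]
        for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model"
    }


def suite_dependencies(suite_names: Iterable[str], manifest: dict) -> Dict[str, List[str]]:
    """Map each suite to an unprefixed asset key path, skipping suites whose
    dbt model does not exist in this environment's manifest."""
    models = dbt_model_names(manifest)
    return {suite: [suite] for suite in suite_names if suite in models}


if __name__ == "__main__":
    env = os.environ.get("DEPLOY_ENV", "dev")  # hypothetical variable name
    manifest = load_manifest(Path("manifests"), env)
    print(suite_dependencies(["orders", "customers"], manifest))
```

Each returned path could then be passed to something like `deps=[AssetKey(path)]` inside the factory, so a suite only declares a dependency on a dbt model that actually exists in the environment it is deployed to.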