Hey! I'm trying to integrate Great Expectations but...
# announcements
e
Hey! I'm trying to integrate Great Expectations, but the example is not really clear to me. Are there any other resources where I can check examples?
y
hi @Eduardo Santizo have you checked out the example docs: https://docs.dagster.io/integrations/great-expectations
we also have a blog post talking about the integration: https://dagster.io/blog/great-expectations-for-dagster
e
Hey @yuhan. Yes, I practically copied the example with my own suite and got the error: 'DataContextConfig' object has no attribute 'validation_operators'. I wanted to check another example to see what I did wrong, but there's nothing ☹️
m
hi eduardo, what version of great expectations are you on and can we see the full error?
e
I'm using GE 0.13.10
This is the error
m
it's possible that GE broke us with 0.13 -- can you try downgrading and see if the error persists?
pip uninstall great-expectations && pip install great-expectations==0.12.10
should do it
yep i think actually this broke in 0.13.10
would you mind +1ing this issue: https://github.com/dagster-io/dagster/issues/3319
e
After downgrading the pipeline breaks even faster
Sorry to ask such a dumb question but how do you one up the issue?
m
ah there should be a button to add a reaction but
if this is also broken on 0.12.x, can you open a new issue?
and do you have any code you feel comfortable sharing that we can use to reproduce it?
j
fyi dagster-ge==0.11.0 is working for me with great-expectations pinned to 0.12.10
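For anyone comparing their environment against the combination reported above, here is a minimal sketch that checks installed package versions against those pins. The helper names (`find_mismatches`, `installed_versions`) are made up for illustration, and the pins are only the ones reported as working in this thread, not an official compatibility list. It assumes Python 3.8+ for `importlib.metadata`.

```python
from importlib import metadata

# Pins reported as working in this thread (not an official compatibility matrix).
KNOWN_GOOD = {"great-expectations": "0.12.10", "dagster-ge": "0.11.0"}


def find_mismatches(installed, pins=KNOWN_GOOD):
    """Return {package: (installed_version, pinned_version)} for every
    package whose installed version differs from the pin."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in pins.items()
        if installed.get(name) != wanted
    }


def installed_versions(names):
    """Look up installed versions; a package that is not installed maps to None."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    report = find_mismatches(installed_versions(KNOWN_GOOD))
    for name, (have, want) in report.items():
        print(f"{name}: installed {have}, pin from this thread {want}")
```

Running it as a script prints one line per package that differs from the pins and prints nothing when everything matches.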
e
I will try downgrading to dagster-ge==0.11.0 then. Currently I have the 0.11.1 version
@Jeff Hulbert Do you mind sharing a bit of your code so I can see how you implemented the integration?
j
from dagster import Field, ModeDefinition, check, composite_solid, lambda_solid
from dagster.utils import file_relative_path
from dagster_ge import ge_validation_solid_factory
from dagster_ge.factory import ge_data_context
# Assumption: DataFrame is the dagster_pandas type; the original snippet did not show this import
from dagster_pandas import DataFrame

# load_config, download_table_solid, and load_df_to_db_solid are defined elsewhere (not shown)

GE_PROJECT_DIR = file_relative_path(__file__, "../great_expectations")
GE_DATASOURCE_NAME = "pandasdata"

ge_data_context_conf = ge_data_context.configured({"ge_root_dir": GE_PROJECT_DIR})

local_mode = ModeDefinition(
    name="local",
    resource_defs={
        "ge_data_context": ge_data_context_conf,
    },
)

@lambda_solid
def continue_if_validated(df: DataFrame, expectation) -> DataFrame:
    if expectation["success"]:
        return df
    else:
        raise ValueError("GE Validation for dataset failed, see previous step")

def load_table_factory(table_name):
    check.str_param(table_name, "table_name")

    @composite_solid(
        name=f"load_table_{table_name}",
        config_schema={
            "table_name": Field(
                str,
                is_required=False,
                default_value=table_name,
            ),
        },
        config_fn=load_config,
    )
    def load_table_solid():
        """Download table, validate if great expectation suite exists, load it to database"""
        ge_suite_name = f"{table_name}.fail"
        validate_solid = ge_validation_solid_factory(
            name=f"validate_{table_name}",
            datasource_name=GE_DATASOURCE_NAME,
            suite_name=ge_suite_name,
        )
        continue_load_solid = continue_if_validated.alias(
            f"continue_if_validated_{table_name}"
        )
        df = download_table_solid()
        return load_df_to_db_solid(continue_load_solid(df, validate_solid(df)))

    return load_table_solid
grabbed pieces from a few different parts of the code, but this should give you an idea
that factory makes a composite solid that downloads the data, validates it with GE, then loads it if it's valid
cut out some extraneous stuff so not sure if this snippet 100% works
e
Thanks a lot! I will try it out
@max I was about to open a new issue, but after trying the code @Jeff Hulbert provided, I still got the same error. Just to be sure, I downgraded my dagster-ge package even further to the latest 0.10 version. Still, the same error about checkpoint_store_error. Apparently there is something wrong with my code and I'm not sure what it is
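# Imports assumed from context; the original message omitted them
from dagster import ModeDefinition, pipeline
from dagster.utils import file_relative_path
from dagster_ge import ge_validation_solid_factory
from dagster_ge.factory import ge_data_context

# Path to the great expectations folder in the local directory
ge_project_dir = file_relative_path(__file__, "./great_expectations")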

# Data source name in the expectation suite
ge_datasource_name = "test_datasource"

# Data context configuration of the root directory
ge_data_context_conf = ge_data_context.configured({"ge_root_dir": ge_project_dir})

# Basic mode definition for the great expectations data context configuration
basic_mode = ModeDefinition(
    name="basic",
    resource_defs={
        "ge_data_context": ge_data_context_conf,
    },
)

# Definition of the great expectations validation solid
ge_validate_CAMPR3 = ge_validation_solid_factory(
    name="ge_validation_solid",
    datasource_name=ge_datasource_name,
    suite_name="test_suite",
)

# Pipeline definition
@pipeline(
    # The following lines are needed for the great expectations integration
    mode_defs=[basic_mode],
)
def CAMPR3_pipeline():

    ids = retrieve_CAMP_ids()
    df = scrape_CAMPs(ids)
    validate_and_save(df, ge_validate_CAMPR3(df))
This is all the code related to the Great Expectations integration, but I can't seem to find the cause of the "checkpoint_store_error"
m
does the string checkpoint_store_name appear in your validation suite?
my hunch is this is a version mismatch between the version of GE that created the validation suite and the version that's installed in your dagster environment
e
Yep, this string appears in my great_expectations.yml file: "checkpoint_store_name: checkpoint_store"
I will check if that's the case. brb
That was exactly the error! Thanks a lot @max. The 0.13.4 version seems to be broken, but 0.12.10 works.
Thanks a lot!
m
happy to help!
@Dagster Bot docs mention that the version of GE used to create the expectations suite must match the version installed in the dagster environment
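The check that resolved this thread (looking for `checkpoint_store_name` in `great_expectations.yml`) can be sketched in a few lines. This is only a rough illustration: the function names are hypothetical, the parser just looks at unindented `key:` lines and is not a real YAML parser, and the two keys it looks for are simply the ones mentioned in this thread.

```python
import re


def top_level_keys(yaml_text):
    """Collect top-level mapping keys by matching unindented `key:` lines.
    Comments and indented (nested) lines are ignored."""
    keys = set()
    for line in yaml_text.splitlines():
        match = re.match(r"^([A-Za-z_][\w-]*)\s*:", line)
        if match:
            keys.add(match.group(1))
    return keys


def describe_config(yaml_text):
    """Return notes about the two config keys discussed in this thread."""
    keys = top_level_keys(yaml_text)
    notes = []
    if "checkpoint_store_name" in keys:
        notes.append("has checkpoint_store_name (the key that flagged the mismatch here)")
    if "validation_operators" not in keys:
        notes.append("has no validation_operators section (the attribute named in the AttributeError)")
    return notes


if __name__ == "__main__":
    # Hypothetical path; point this at your own project's config file.
    with open("great_expectations/great_expectations.yml") as handle:
        for note in describe_config(handle.read()):
            print(note)
```

An empty result does not prove the versions match; it only means neither of the two signals from this thread showed up in the file.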
d
d
Hi, I'm wondering if at some point we will be able to use the latest version of GE with dagster. I tried to run the example and I receive the same error:
AttributeError: 'DataContextConfig' object has no attribute 'validation_operators'
I have great-expectations 0.13.30, dagster 0.12.4, and dagster-ge 0.12.4. It seems that GE 0.12.x doesn't have the V3 API:
great_expectations --v3-api init
Usage: great_expectations [OPTIONS] COMMAND [ARGS]...
Try 'great_expectations --help' for help.

Error: no such option: --v3-api
This is an issue for anyone following the GE tutorial. @max