# ask-community
k
Trying to use dagster-pandera for validation and I keep running into this error (🧵)
```
dagster._core.errors.DagsterTypeCheckDidNotPass: Type check failed for op "generate_fct_table" output "result" - expected type "CreatedCollections". Description: Unexpected error during validation: Data type 'dbdate' not understood by Engine.
  File "/Users/kevin/.pyenv/versions/3.9.12/envs/default/lib/python3.9/site-packages/dagster/_core/execution/plan/execute_plan.py", line 273, in dagster_event_sequence_for_step
    for step_event in check.generator(step_events):
  File "/Users/kevin/.pyenv/versions/3.9.12/envs/default/lib/python3.9/site-packages/dagster/_core/execution/plan/execute_step.py", line 369, in core_dagster_event_sequence_for_step
    for user_event in check.generator(
  File "/Users/kevin/.pyenv/versions/3.9.12/envs/default/lib/python3.9/site-packages/dagster/_core/execution/plan/execute_step.py", line 90, in _step_output_error_checked_user_event_sequence
    for user_event in user_event_sequence:
  File "/Users/kevin/.pyenv/versions/3.9.12/envs/default/lib/python3.9/site-packages/dagster/_core/execution/plan/compute.py", line 192, in execute_core_compute
    for step_output in _yield_compute_results(step_context, inputs, compute_fn):
  File "/Users/kevin/.pyenv/versions/3.9.12/envs/default/lib/python3.9/site-packages/dagster/_core/execution/plan/compute.py", line 161, in _yield_compute_results
    for event in iterate_with_context(
  File "/Users/kevin/.pyenv/versions/3.9.12/envs/default/lib/python3.9/site-packages/dagster/_utils/__init__.py", line 445, in iterate_with_context
    next_output = next(iterator)
  File "/Users/kevin/.pyenv/versions/3.9.12/envs/default/lib/python3.9/site-packages/dagster/_core/execution/plan/compute_generator.py", line 124, in _coerce_op_compute_fn_to_iterator
    result = invoke_compute_fn(
  File "/Users/kevin/.pyenv/versions/3.9.12/envs/default/lib/python3.9/site-packages/dagster/_core/execution/plan/compute_generator.py", line 118, in invoke_compute_fn
    return fn(context, **args_to_pass) if context_arg_provided else fn(**args_to_pass)
  File "/Users/kevin/zora/zora-backend/data/zora/dags/derived_tables/graph.py", line 66, in build_table
    data = generate_fct_table()
  File "/Users/kevin/.pyenv/versions/3.9.12/envs/default/lib/python3.9/site-packages/dagster/_core/definitions/op_definition.py", line 482, in __call__
    return op_invocation_result(self, context, *args, **kwargs)
  File "/Users/kevin/.pyenv/versions/3.9.12/envs/default/lib/python3.9/site-packages/dagster/_core/definitions/op_invocation.py", line 183, in op_invocation_result
    return _type_check_output_wrapper(op_def, result, bound_context)
  File "/Users/kevin/.pyenv/versions/3.9.12/envs/default/lib/python3.9/site-packages/dagster/_core/definitions/op_invocation.py", line 425, in _type_check_output_wrapper
    return _type_check_function_output(op_def, result, context)
  File "/Users/kevin/.pyenv/versions/3.9.12/envs/default/lib/python3.9/site-packages/dagster/_core/definitions/op_invocation.py", line 435, in _type_check_function_output
    _type_check_output(output_defs_by_name[event.output_name], event, context)
  File "/Users/kevin/.pyenv/versions/3.9.12/envs/default/lib/python3.9/site-packages/dagster/_core/definitions/op_invocation.py", line 458, in _type_check_output
    raise DagsterTypeCheckDidNotPass(
```
Here's my pandera schema:

```python
import pandas as pd
import pandera as pa


class CreatedCollections(pa.SchemaModel):
    """Created collections"""

    ds: pd.Timestamp = pa.Field(description="datestamp of creation")
    chain: str = pa.Field(description="The chain a collection is created on")
    creates: int = pa.Field(ge=1, description="Number of creates")
    wallet_address: str = pa.Field(description="wallet_address")
```
The datestamp column is the problem: it's being read from BigQuery as a date, but I'm not sure what validation I need to add here.
s
Hey Kevin, IIUC the `ds` column of the pandas dataframe you are returning is not actually a `pd.Timestamp`, but is instead a `dbdate`. Is that right? (I’m a little confused by this because I don’t think `dbdate` is a pandas type?) Can you convert the column to `pd.Timestamp` before returning the dataframe?
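For context, `dbdate` is not a built-in pandas dtype; it is an extension dtype registered by the `db-dtypes` package, which the `google-cloud-bigquery` client uses for DATE columns when it builds a DataFrame. A minimal sketch of the conversion suggested above, assuming the values behave like Python `datetime.date` objects. The sample frame is hypothetical and uses a plain object column of `datetime.date` values rather than a real `dbdate` column, so check the conversion against your own data and library versions:

```python
import datetime

import pandas as pd

# Hypothetical frame: `ds` holds datetime.date objects (object dtype),
# standing in for a DATE column pulled from BigQuery.
df = pd.DataFrame({"ds": [datetime.date(2022, 10, 1), datetime.date(2022, 10, 2)]})

# Convert to pandas' datetime64 dtype so each value is a pd.Timestamp,
# which is what the schema's `ds: pd.Timestamp` annotation expects.
df["ds"] = pd.to_datetime(df["ds"])

print(isinstance(df["ds"].iloc[0], pd.Timestamp))  # True
```

Doing the conversion inside the op means the downstream consumers also receive the `datetime64` column, not just the validator.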
k
yeah, I just ended up setting `coerce=True` and it worked fine. Thanks!
Any idea when you will be improving the integration between pandera and Dagster, or introducing native data validation like this?
s
> any idea when you will be improving the integration between pandera and dagster

We don’t have anything planned atm, but if there are features you’d like to see or bugs to fix, you should open an issue.

> introducing native data validation

There is a feature we’re currently calling “Asset Expectations” on the roadmap for the next several months. It will be a framework for running arbitrary data quality checks on assets. I’m not sure how much the functionality will overlap with dagster-pandera.