# announcements
c
Hey folks 👋 Question regarding `dagster-pandas`: it does not seem like there is currently support for the concept of an index in the pandas dataframe dagster type. Is there a workaround for this, and/or is this on the roadmap to add at some point? For context, I have an IO manager that outputs a dataframe to an RDBMS table and uses the dataframe `index` to understand which column is the primary key (and uses it as the `merge` key). It looks like my only option right now would be to exclude that “column” from the pandas dagster type, though I would prefer to have the dagster validation on it as well (i.e. `non_nullable`, `unique`, plus the nice-to-have documentation!)
s
Hi Charles - I do not believe our current dagster-pandas package enables data validation on indexes. You make a convincing case that this would be useful - I filed an issue to track it here: https://github.com/dagster-io/dagster/issues/3814.
c
Awesome, thanks Sandy!
s
The best workaround that I can think of would be to create your own dagster type that adds your validation check on top of the existing ones. E.g. something like:
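from dagster import DagsterType
from dagster_pandas import create_dagster_pandas_dataframe_type

MyPandasDataFrame = create_dagster_pandas_dataframe_type(...)

def validate_index(df):
    ...

# type_check returns a TypeCheck object, so test .success rather than its truthiness
MyPandasDataFrameWithIndexCheck = DagsterType(
    name="MyPandasDataFrameWithIndexCheck",  # DagsterType needs a name (or key)
    type_check_fn=lambda context, value: MyPandasDataFrame.type_check(context, value).success and validate_index(value))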
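As a rough idea of what `validate_index` could look like for the `non_nullable` and `unique` checks mentioned above, here is a minimal pandas-only sketch. It assumes a single-level index (a `MultiIndex` would need different handling), and the `id` / `value` names in the sample frames are made up for illustration:

```python
import pandas as pd


def validate_index(df: pd.DataFrame) -> bool:
    """Return True when the index has no null entries and no duplicates."""
    index = df.index
    # Index.isna() gives a boolean array; any True means a null key.
    has_nulls = bool(index.isna().any())
    # Index.is_unique is False as soon as one value repeats.
    return (not has_nulls) and bool(index.is_unique)


# Hypothetical sample frames: one valid, one with a duplicate key, one with a null key.
good = pd.DataFrame({"value": [10, 20]}, index=pd.Index([1, 2], name="id"))
dupes = pd.DataFrame({"value": [10, 20]}, index=pd.Index([1, 1], name="id"))
nulls = pd.DataFrame({"value": [10, 20]}, index=pd.Index([1.0, None], name="id"))

print(validate_index(good), validate_index(dupes), validate_index(nulls))
```

Returning a plain bool keeps it compatible with the `and` in the lambda above; returning a richer result with a description of what failed would probably be nicer, but that is left out of this sketch.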
c
Ahh interesting! I actually had baked that validation step inside of my IO Manager which is just raising `ValueError`s. I like your approach better — thank you! 🙏