# announcements
s
Best-practice question regarding type hints with DataFrames: can you derive a `DagsterDataType` from a `PysparkDataFrame`? I have a generic solid `load_delta_table_to_df`, but in my pipeline I'd like to type-check that the returned DataFrame has certain columns (not always the same ones; see example attached). I try to achieve that with the custom DagsterTypes `NpsDataFrame` and `TagDataFrame` in my pipeline (see attachment), but those types don't show up in Dagit. How could I use a generic solid while returning differently typed DataFrames? I'd like to see `NpsDataFrame` and `TagDataFrame` instead of the generic `PySparkDataFrame`. Any best practices? Or should I add an additional parameter to `load_delta_table_to_df` where I define the output DataFrame type? Thanks a lot, guys!
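The column check the question describes can be sketched without Dagster or PySpark installed. Below is a minimal, hedged illustration: the `Frame` class and `check_columns` helper are stand-ins invented for this sketch (a real PySpark DataFrame also exposes a `.columns` list), and in Dagster the check function would be wrapped in a custom `DagsterType` such as `NpsDataFrame`.

```python
# Hedged sketch: pyspark/dagster are not assumed installed, so a tiny
# stand-in class plays the role of a PySpark DataFrame. Only the
# `.columns` attribute matters for this kind of type check.

class Frame:
    """Minimal stand-in for a PySpark DataFrame."""
    def __init__(self, columns):
        self.columns = list(columns)

def check_columns(frame, required):
    """Return (ok, missing): ok is True iff all required columns exist."""
    missing = [c for c in required if c not in frame.columns]
    return (not missing, missing)

# Illustrative column names only (not from the original attachment):
nps = Frame(["customer_id", "score", "comment"])
ok, missing = check_columns(nps, ["customer_id", "score"])
# ok is True, missing is []
```

In Dagster 0.x terms, `check_columns` would become the `type_check_fn` of a `DagsterType(name="NpsDataFrame", ...)`, with one such type per expected schema.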
a
ya, we can’t interpret type hints at the invocation site to modify the definition. One way you could solve the problem is to make a solid factory that takes the new name and the expected type as arguments and sets the `output_defs`:
https://docs.dagster.io/overview/solids-pipelines/solid-factories#main
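The factory idea above can be sketched in plain Python (dagster itself is not imported here, so this is an illustration of the pattern, not the library's API; the function and parameter names are invented for the sketch). A real version would decorate `_load` with `@solid(name=name, output_defs=[OutputDefinition(dagster_type)])` so each generated solid carries its own name and output type in Dagit.

```python
# Hedged sketch of a solid factory: one generic loader body, parameterized
# by a per-solid name and required-column list supplied by the caller.
from types import SimpleNamespace  # stand-in for a DataFrame with .columns

def make_load_delta_table_solid(name, required_columns):
    """Build a named loader that validates required columns on its output."""
    def _load(frame):
        missing = [c for c in required_columns if c not in frame.columns]
        if missing:
            raise TypeError(f"{name}: missing columns {missing}")
        return frame
    _load.__name__ = name  # per-solid name, as Dagit would display it
    return _load

# Usage (illustrative names): one factory call per typed output.
load_nps = make_load_delta_table_solid("load_nps_df", ["customer_id", "score"])
frame = SimpleNamespace(columns=["customer_id", "score", "comment"])
validated = load_nps(frame)  # passes; missing columns would raise TypeError
```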
s
good idea, thank you alex for the hint. I will try this!
Works quite well so far 🙂
Not sure if I’m trying too hard to define all the types, but I’m hoping to catch errors as early as possible with this approach. Let’s see how it goes 😉
m
Having just been bitten (again!) by untyped/unvalidated dataframes in a pipeline, I’m sure you won’t regret this investment 😀 (just because it’s called HistoryAPI doesn’t mean it won’t return dates in the future...)
s
ayyy, good to know! Thanks David for encouraging me 😅👍 Oops, yes, the dates are a topic of their own 😉