
Bernardo Cortez

12/21/2021, 5:02 PM
Hi everyone. I am trying to write an op with an optional input of type dagster_pyspark.types.DataFrame. I implemented this input in the ins dictionary as
'optional_input': In(DataFrame, default_value=None)
. This works when I pass a DataFrame to the op. However, if I pass no input, it raises this error:
DagsterTypeCheckDidNotPass: Type check failed for step input "optional_input" - expected type "PySparkDataFrame". Description: Value of type <class 'NoneType'> failed type check for Dagster type PySparkDataFrame, expected value to be of Python type DataFrame.
Besides, if I implement it as
'optional_input': In(Any, default_value=None)
, it raises an error when I pass a DataFrame to the op. Can someone help me? Thanks!
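A minimal sketch of the setup that reproduces this (op and job names are illustrative):

from dagster import In, job, op
from dagster_pyspark.types import DataFrame

@op(ins={'optional_input': In(DataFrame, default_value=None)})
def process(optional_input):
    # works when a DataFrame is wired in; fails the type check with
    # DagsterTypeCheckDidNotPass when the default of None is used
    ...

@job
def my_job():
    process()  # nothing wired in, so the default None applies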

owen

12/21/2021, 6:18 PM
Hi @Bernardo Cortez -- what error were you seeing for the
In(Any, ...)
solution? It's surprising to me that this would cause a problem. In any case, you should be able to do:
from typing import Optional

from dagster import In, op
from dagster_pyspark.types import DataFrame

@op(ins={"optional_input": In(Optional[DataFrame], default_value=None)})
def my_op(optional_input):
    ...
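With Optional, the default of None should pass the type check, so the op can run without an upstream input. A quick sketch of that (job name is illustrative):

from dagster import job

@job
def my_job():
    my_op()  # nothing wired in; default_value=None satisfies Optional[DataFrame]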

Bernardo Cortez

12/22/2021, 1:35 PM
The
In(Any, ...)
solution raises the following error:
CheckError: Failure condition: Inputs of type <dagster.core.types.dagster_type._Any object at 0x7f18b6472e80> not supported. Please specify a valid type for this input either in the solid signature or on the corresponding InputDefinition.

owen

12/22/2021, 6:07 PM
ah interesting -- you can also leave the Dagster type blank for that sort of case, which defaults the input to not doing any type checking
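i.e. something like this (a minimal sketch; with no Dagster type supplied, the input falls back to Any, which accepts anything, including None):

from dagster import In, op

@op(ins={"optional_input": In(default_value=None)})
def my_op(optional_input):
    # no Dagster type supplied, so no type check constrains this input
    ...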

Bernardo Cortez

12/23/2021, 12:37 PM
Well, if I do not specify a data type, I get the following error:
CheckError: Failure condition: Inputs of type <dagster.core.types.dagster_type._Any object at 0x7efe99ec9e80> not supported. Please specify a valid type for this input either in the solid signature or on the corresponding InputDefinition.

owen

12/23/2021, 3:25 PM
Oh I believe that's an issue that's coming from the IOManager code. It looks like you're using the Parquet IOManager from the hacker news example, which only accepts two specific input types on the input definition (pyspark dataframe and string). You'll need to modify the example code to get this setup working.
If you only ever expect to load the input as a pyspark dataframe, you can replace the load_input function on that IOManager with something that doesn't look at the dagster type of the input definition and just always loads a pyspark dataframe, along the lines of the sketch below.
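A rough sketch of that change, modeled loosely on the example's parquet IO manager (the _get_path helper and the pyspark resource key are assumptions here, not exact code from the example):

from dagster import IOManager, io_manager

class PySparkParquetIOManager(IOManager):
    def _get_path(self, context):
        # hypothetical helper: build a parquet path from run/step/output names
        return "/".join(["/tmp/storage", context.run_id, context.step_key, context.name])

    def handle_output(self, context, obj):
        # write the pyspark DataFrame out as parquet
        obj.write.parquet(self._get_path(context))

    def load_input(self, context):
        # ignore context.dagster_type entirely and always read the upstream
        # output back in as a pyspark DataFrame
        path = self._get_path(context.upstream_output)
        return context.resources.pyspark.spark_session.read.parquet(path)

@io_manager(required_resource_keys={"pyspark"})
def pyspark_parquet_io_manager(_):
    return PySparkParquetIOManager()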

Bernardo Cortez

12/27/2021, 10:00 AM
Yes! Thank you. This was the root of the problem!