# ask-community
j
Good morning! In the upgrade from dagster 0.11.14 to 0.11.15, all of our composite solids broke and we are getting error messages like
dagster.core.errors.DagsterInvalidDefinitionError: Input "df" in solid "generate_speed_solid" is not connected to the output of a previous solid and can not be loaded from configuration, creating an impossible to execute pipeline. Possible solutions are:
E                         * add a dagster_type_loader for the type "DataFrame"
E                         * connect "df" to the output of another solid
where the “generate_speed_solid” is one of the standard solids inside the composite solid. I have tested the individual solids and they work in a pipeline outside of the composite solid. We had been using type casting in our function definitions instead of explicitly calling InputDefinition and OutputDefinition in the solid wrappers. I can’t seem to find the documentation about how this needs to change. Can someone point me to the right resource? Thanks!
a
cc @yuhan can you provide some examples of how your composite solids are set up?
j
@composite_solid(
    config_fn=config_path_workflow,
    config_schema={
        "id_col": Columns.ENTITY_ID.field,
        "wkt_col": Columns.POINT_WKT.field,
        "time_col": Columns.TIME_LONG.field,
    },
)
def generate_basic_path_features_workflow(df: SparkDataFrame) -> SparkDataFrame:
    """
    Composite Solid that takes raw dataframes and generates basic speed, bearing and path segmentation features.

    Required Columns:
        id_col
        wkt_col
        time_col
    Args:
        df (SparkDataFrame): Spark DataFrame with a wkt point column containing lon,lat information
    Returns:
        SparkDataFrame
    """
    return generate_path_segments_solid(
        df=generate_bearing_solid(df=generate_speed_solid(df=df))
    )
a
are you setting these inputs via run_config for the pipeline?
j
Yes
a
I’m having trouble reproducing, would you be comfortable sending a debug file (from the … menu in the runs page in dagit or dagster debug export <run id>) or sending some more details
We had been using type casting in our function definitions instead of explicitly calling InputDefinition and OutputDefinition in the solid wrappers.
are you saying that you made some change in addition to migrating from 11.14 -> 11.15?
j
No, we didn’t change anything except upgrade the library in the background
a
How long have you been using dagster? How did you upgrade the library? We included some stricter peer dependencies in this release so my current hypothesis is that you were on a very old version of dagster-pyspark until recently and are experiencing breaking changes that happened with a past major release
The core of the issue is that DataFrame from your error message is no longer a DagsterType but just a regular python type that we don’t know how to load from config. Old versions of the library would globally map the python type to the dagster type automatically. The current library needs an explicit make_python_type_usable_as_dagster_type call to register that mapping, or for the python type hint to be the DataFrame imported from the dagster- library
d
hey @alex, We've been using dagster for a few months and like it quite a bit. Our library updated as part of our nightly build process through conda, that was the only change on our master branch last night.
j
I’ll work on pulling out a MWE and our conda environment file so that we can give that to you all.
d
We had been using a custom dagster type in the input definitions for some of our solids. We'd then have a standard python type hint within the call signature. Are you saying we need to do a strong definition of input and output definitions on all of the solids?
fwiw, I'm almost 100% sure we weren't using dagster-pyspark
j
We are not - I could try that though after I get the MWE. We had been using the type hinting since the API docs made it sound like the type hinting was equivalent to the explicit Input/Output Definition: https://docs.dagster.io/_apidocs/solids
a
We had been using a custom dagster type in the input definitions for some of our solids
Ah ok, so the InputDefinition with the DagsterType was providing the loader, which is how the system knows how to make an instance of that object from config. When you removed those InputDefinitions you removed that piece, resulting in the observed error.
Root input managers are the proposed replacement for how that functionality gets defined, so that you do not have to provide 2 different types / the same type twice https://docs.dagster.io/_apidocs/io-managers#root-input-managers-experimental
To get things working again, you can either add back the InputDefinitions with your DagsterTypes, or register your custom dagster type for that python type globally using make_python_type_usable_as_dagster_type https://docs.dagster.io/concepts/types#patterns
j
OK, I made a basic example that shows exactly the problem we are having. I followed the pattern given on the documentation that you referred me to. We’re using dagster 0.11.15, pandas>=1.1.3, pyspark 3.0.2, and python 3.8. I believe that these two pipelines should be equivalent, but when we invoke with a composite solid instead of the same list of individual solids, we get the error.
a
thanks for sending that repro, we’ll get this fixed for the next release
j
Thanks @alex! We really appreciate it. The composite solids have been a lifesaver for us, and we are storing several canned data preprocessing workflows as composite solids. For my own edification, would explicitly defining the dagster type loader solve the problem or is it more complicated than that?
a
I commented on https://dagster.phacility.com/D8365 which was the diff that went out last week introducing the bug you found. Some options to proceed til the fix is out:
* pin back to 0.11.14
* add a dummy loader to DSparkDataFrame to work around the bug
j
Yup - we pinned our repo for now. Thanks!
a
thank you for your patience - i somehow danced around triggering the bug in my attempted repros, which led me to speculate about all the suggestions above
y
have a fix out for review: https://dagster.phacility.com/D8583 - sorry for the thrash!
e
I didn't fully understand the setup above because I just skimmed through the above thread but we're also seeing the above issue where
• previously working pipelines broke from 0.11.14 -> 0.11.15
• with an identical message
• we use composite solids
• we have Dagster types defined and output definitions
• in the inputs of downstream solids we rely on type hints where we refer to the defined dagster type
if you also agree that it was caused by the same issue, could you please let me know when you expect to release the fix? (just so we can decide whether to wait for next version or workaround) thank you!
a
sounds like the same issue, fix should go out in the release on Thursday
y
the fix was landed this morning so it should go out on this thursday
j
Thank you!