sean
12/08/2020, 6:45 PM
dagstermill solids. In the docs, we see input defs defined like this:
k_means_iris = dm.define_dagstermill_solid(
    "k_means_iris",
    script_relative_path("iris-kmeans_2.ipynb"),
    input_defs=[InputDefinition("path", str, description="Local path to the Iris dataset")],
)
This way of doing things requires duplicating input definitions/descriptions between the notebook itself and the solid definition call. Ideally the input definitions could be parsed from the special parameters-tagged cell that you need to define anyway (some special comment formatting could maybe be used for the descriptions). A similar cell could be used for outputs.
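A minimal sketch of how the parameters-tagged cell could be parsed, assuming descriptions are written as trailing # comments on each assignment (a made-up convention, not something dagstermill supports today). It uses only the standard library and returns plain dicts rather than InputDefinition objects; parse_parameters_cell and the sample notebook are hypothetical names for illustration.

```python
import ast
import io
import json
import tokenize


def parse_parameters_cell(notebook_json):
    """Return [{"name", "type", "description"}] from the parameters-tagged cell."""
    nb = json.loads(notebook_json)
    for cell in nb.get("cells", []):
        tags = cell.get("metadata", {}).get("tags", [])
        if cell.get("cell_type") == "code" and "parameters" in tags:
            source = cell["source"]
            # Notebook JSON stores source as a string or a list of lines
            if isinstance(source, list):
                source = "".join(source)
            break
    else:
        return []  # no parameters-tagged cell found

    # Map line number -> comment text (assumed description convention)
    comments = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comments[tok.start[0]] = tok.string.lstrip("#").strip()

    inputs = []
    for node in ast.parse(source).body:
        # Only simple `name = literal` assignments are treated as inputs
        if (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
            and isinstance(node.value, ast.Constant)
        ):
            inputs.append(
                {
                    "name": node.targets[0].id,
                    "type": type(node.value.value),
                    "description": comments.get(node.lineno, ""),
                }
            )
    return inputs


# Hypothetical notebook with a single parameters-tagged cell
sample_notebook = json.dumps(
    {
        "cells": [
            {
                "cell_type": "code",
                "metadata": {"tags": ["parameters"]},
                "source": ['path = "iris.csv"  # Local path to the Iris dataset\n'],
            }
        ]
    }
)
parsed_inputs = parse_parameters_cell(sample_notebook)
```

Each dict could then be turned into an InputDefinition(name, type, description=...) before calling dm.define_dagstermill_solid. Inferring the type from the default value is a simplification; it would not cover inputs without a literal default.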
Is anything like this possible now or planned? I would be willing to work on this if devs think it is a good idea but no one is working on it.
max
12/08/2020, 7:25 PM
sean
12/08/2020, 7:32 PM
@solid(
    input_defs=[
        InputDefinition(name="a", dagster_type=int),
        InputDefinition(name="b", dagster_type=int),
    ],
    output_defs=[
        OutputDefinition(name="sum", dagster_type=int),
        OutputDefinition(name="difference", dagster_type=int),
    ],
)
def my_input_output_example_solid(context, a, b):
    yield Output(a + b, output_name="sum")
    yield Output(a - b, output_name="difference")
What I'm suggesting is two-fold:
• support a special cell corresponding to output definitions (there already is one for input definitions, the parameters cell).
• optionally parse the contents of these special cells for the input/output definitions to be used in the solid declaration
max
12/08/2020, 8:53 PM
define_dagstermill_solid into that cell
sean
12/08/2020, 9:40 PM
dagstermill.define_dagstermill_solid call, but then you are unnecessarily duplicating information.
And since dagstermill already constrains notebook structure (requirement of parameters cell if using inputs), why not provide a mechanism to infer input/output params based on further constraints (e.g. a special tagged output cell)?
Or, taking this idea further, why not provide a facility to fully define the solid within a notebook, as in providing a specially tagged cell or set of cells where one somehow specifies all the info that goes into the @solid(...) decorator call?
max
12/08/2020, 9:56 PM
sean
12/08/2020, 9:57 PM
extract_metadata?) could be added to dm.define_dagstermill_solid that takes a function reference, which gets passed the notebook object and should return a dictionary of solid params. Then the user could use whatever documentation style they want and just provide this adapter function to dagstermill. This also has the advantage of adding very little complexity on the dagstermill end.
max
12/08/2020, 10:38 PM
sean
12/08/2020, 11:33 PM
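The extract_metadata idea from earlier in the thread could be sketched as a thin wrapper, under some loud assumptions: extract_metadata is not an existing dagstermill parameter, define_solid_with_metadata is a hypothetical name, the "solid-metadata" tag is made up, and a stand-in factory replaces dm.define_dagstermill_solid so the sketch runs without dagster installed.

```python
import json
import os
import tempfile


def define_solid_with_metadata(name, notebook_path, extract_metadata, define_solid):
    """Hypothetical wrapper: derive solid params from the notebook itself.

    extract_metadata: callable(notebook_dict) -> dict of solid params
    define_solid: factory that receives those params as keyword arguments;
        in real use this might be dm.define_dagstermill_solid (assumption).
    """
    with open(notebook_path, encoding="utf-8") as f:
        notebook = json.load(f)
    params = extract_metadata(notebook)
    if not isinstance(params, dict):
        raise TypeError("extract_metadata must return a dict of solid params")
    return define_solid(name, notebook_path, **params)


def tagged_cell_metadata(notebook):
    """Example adapter: read params from a JSON cell tagged 'solid-metadata'."""
    for cell in notebook.get("cells", []):
        if "solid-metadata" in cell.get("metadata", {}).get("tags", []):
            source = cell["source"]
            # Notebook JSON stores source as a string or a list of lines
            if isinstance(source, list):
                source = "".join(source)
            return json.loads(source)
    return {}


def fake_define_solid(name, notebook_path, **params):
    """Stand-in factory that just records what it was given."""
    return {"name": name, "notebook_path": notebook_path, **params}


# Hypothetical notebook carrying its own solid metadata
notebook = {
    "cells": [
        {
            "cell_type": "raw",
            "metadata": {"tags": ["solid-metadata"]},
            "source": ['{"description": "K-means on the Iris dataset"}'],
        }
    ]
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "iris-kmeans_2.ipynb")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(notebook, f)
    solid = define_solid_with_metadata(
        "k_means_iris", path, tagged_cell_metadata, fake_define_solid
    )
```

The adapter owns the documentation convention, so the wrapper stays small: it only loads the notebook, checks the return type, and forwards the params.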