Hi team - I have a solid that is dependent on the ...
# ask-community
b
Hi team - I have a solid that is dependent on the execution of multiple upstream solids that each return `Nothing`. I am having issues calling a solid in the pipeline, despite defining the InputDefinition for each upstream dependency. I get this error:
```
dagster.core.errors.DagsterInvalidDefinitionError: In @pipeline sample_example_pipeline, could not resolve input based on position at index 0 for invocation sample_example_table. Use keyword args instead, available inputs are: ['input_start1', 'input_start2']
```
Has anyone defined a solid that has multiple `Nothing` inputs and could provide an example?
d
I happen to be working on a quick example for my coworker. I am using the experimental APIs, though; you can probably use the migration guide to convert it to solids/pipeline. using_nothings.py:
```python
from dagster import graph, op, InputDefinition, Nothing


@op
def do_nothing1(context) -> Nothing:
    context.log.info('Just me doing nothing')


@op
def do_nothing2(context) -> Nothing:
    context.log.info("I'm also doing nothing")


# Nothing-typed inputs carry no data, so they do not appear as function
# parameters; they only make this op wait for its upstream ops.
@op(
    input_defs=[
        InputDefinition(name="Nothing1", dagster_type=Nothing),
        InputDefinition(name="Nothing2", dagster_type=Nothing),
    ]
)
def i_eat_nothings(context) -> str:
    context.log.info('I am taking in Nothings')

    return "I am done eating Nothings"


@op
def i_eat_strings(context, message: str) -> Nothing:
    context.log.info(f'Message: {message}')


@graph
def nothing_graph():
    # Wire the Nothing inputs by keyword; they cannot be resolved by position.
    my_message = i_eat_nothings(
        Nothing1=do_nothing1(),
        Nothing2=do_nothing2()
    )

    i_eat_strings(my_message)


my_nothing_job = nothing_graph.to_job()
```

```shell
export DAGSTER_HOME=<path_containing_dagster.yaml>
dagit -f using_nothings.py --suppress-warnings
```
v
Maybe I'm missing something. Sometimes I hate it when programmers do this, but I'll do it too (hopefully you are doing it this way for some particular reason, and not just running into a fundamental misunderstanding of Dagster's paradigms):
Why are you doing it this way?
Reason I ask is: Dagster depends on Inputs and Outputs for flow execution and timing. So, in pipelines run in series and parallel, you depend on the output from the previous node to be filled in the next node. This is not just parameter passing in functions:
```python
result1 = function1()
result2 = function1(result1)
```
...but it affects TIMING. The second node (solid) will WAIT until its input is filled before it begins. So the code in your pipeline definition is not a strict top-down function. Things only start when their inputs are filled (by outputs of other solids, by config, etc.). Consequences:
1. You should always return something from your solids, even just a True/False or 0/1 exit status. That way you can capture it and know that the node (solid) completed.
2. You can use this return status or boolean to start other nodes. It becomes the TIMING for the next (or any other) node to begin. You can adjust timing by requiring inputs to be filled before a node starts. So, let's say `solidA` needs 1 input and `solidB` needs 2 inputs, but `solidC` needs 3 inputs. You can make `solidC` wait until the first two are done, no matter where in the pipeline you define it, by requiring those inputs from A and B. Waiting is important, especially when you start multiprocessing and things run in parallel. Then whatever CAN start WILL start, and whatever needs to WAIT will WAIT, and the pipeline can run very efficiently, not waiting for unrelated things (as it would in series, like above).
3. You can also have any number of unused outputs. You don't have to connect them. This is useful for making nodes (solids) reusable in different pipelines with different requirements. It makes them 'general purpose'.

Basically, `Inputs` and `Outputs` are not to be avoided. If you just want a program to run in series, then just run a Python script without Dagster, or put the entirety of the code inside ONE solid. `Inputs` and `Outputs` can and should be used for timing execution, and also for formatting (for example, returning the same data multiple ways, since Dagster enforces types). Make sense? Or am I missing something?
Example: Output of these nodes affects the timing of the steps. You can re-use the outputs, and accumulate them to make things happen later in the pipeline.
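To make the timing point concrete, here is a toy model in plain Python. It is not Dagster's scheduler, only a sketch of the idea that a node starts once every input it depends on has been filled. The function `run_graph`, the `deps` mapping, and the node bodies are all invented for illustration; `graphlib` needs Python 3.9 or newer.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_graph(nodes, deps):
    """Run each node only after all of its upstream nodes have finished.

    nodes: name -> callable taking a dict {upstream_name: upstream_result}
    deps:  name -> set of upstream names (every node must appear as a key)
    """
    results = {}
    order = []
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    while sorter.is_active():
        # get_ready() returns only nodes whose upstream nodes are all done
        for name in sorter.get_ready():
            inputs = {up: results[up] for up in deps[name]}
            results[name] = nodes[name](inputs)
            order.append(name)
            sorter.done(name)
    return order, results


# Hypothetical nodes: solidC waits for both solidA and solidB.
nodes = {
    "solidA": lambda inputs: True,
    "solidB": lambda inputs: True,
    "solidC": lambda inputs: all(inputs.values()),
}
deps = {"solidA": set(), "solidB": set(), "solidC": {"solidA", "solidB"}}

order, results = run_graph(nodes, deps)
print(order)
```

Here `solidC` is listed last no matter how the dictionary is ordered, because it cannot become ready until both upstream results exist. A real orchestrator could run `solidA` and `solidB` in parallel at that point; this sketch runs them one after another.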
b
Thanks for the write-up, Vlad! I have a pipeline that looks similar to the one you shared, where solid execution is dependent on the completion of upstream solids. The solids just run SQL scripts to create and/or update tables in our data warehouse, so we are yielding `AssetMaterialization` events from each solid, but not actually pulling that data back into Dagster. Some SQL scripts depend on tables created by previous solids, so I am using the `Nothing` type to explicitly establish that dependency even though no data is passed between the solids. Currently, if a solid doesn't successfully complete the SQL script, the solid will fail and downstream solids will not execute, which is intended. We are capturing asset information and metadata in the `AssetMaterialization` event. What is the advantage of returning a boolean from the solid as opposed to using `Nothing`?
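The fail-and-skip behaviour described above can be sketched in plain Python. This is a toy model under stated assumptions, not Dagster code: `run_in_order`, the step names, and the simulated SQL failure are hypothetical, and the steps are assumed to be listed in dependency order already.

```python
def run_in_order(steps, deps):
    """Run steps in order; skip any step whose upstream failed or was skipped.

    steps: list of (name, callable) pairs, assumed to be in dependency order
    deps:  name -> list of upstream names (ordering only, no data is passed)
    """
    status = {}
    for name, fn in steps:
        if any(status[up] != "success" for up in deps.get(name, ())):
            status[name] = "skipped"
            continue
        try:
            fn()
            status[name] = "success"
        except Exception:
            status[name] = "failed"
    return status


def create_table():
    # Stand-in for a SQL script that errors out.
    raise RuntimeError("SQL script failed")


status = run_in_order(
    [
        ("create_table", create_table),
        ("update_table", lambda: None),
        ("unrelated_table", lambda: None),
    ],
    {"update_table": ["create_table"]},
)
print(status)
```

Nothing flows from `create_table` to `update_table` here; the dependency exists only to order the two steps, which is the role the `Nothing` type plays in the pipeline described above.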
v
Sounds like you've handled it for your needs. To answer your question about why return a `Boolean` vs a `Nothing`: I guess it seems like it gives you more options, IMO.
• On the execution timing front, it is probably the same as returning `Nothing`. The next solid, if it's waiting for a `Nothing`, will start. But that assumes most of the solids you build will have a `Nothing` type input, which isn't likely. At least a `Boolean` is a common input, and probably more useful in a coding environment. So, from a reusability standpoint, the `Nothing` is a dead end, only usable for this one pipeline.
• Then there's the possibility of doing some decision-making in the pipeline. I think this is the intent of using Dagster for AI-type things. So, classic branching based on returned `Booleans` can be achieved: there are no IF statements, but you CAN make whole separate branches of solids execute based on returns (and stop execution of other branches for the False case).

I guess you could argue the `Nothing` does some of the same things. But to my thinking, returning a `Boolean` is more about the next node and also code/solid reuse. It is like a function in a library returning null or None 100% of the time, on purpose. That's not an expected result for a function that returns. Also, if a library has a ton of utility functions, you'll probably not go back and rewrite them, so you're stuck with what's there. Same with solids: you make them and then you're kinda stuck with them after a while. Might as well make them general purpose and really useful in other situations. Otherwise, you're kind of kicking the requirement to have `Nothing` type inputs down the road. Now it's like a requirement to use your library of solids for anyone (or your future self) because it's baked in. Just some thoughts. Cheers.
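The branching idea can be sketched with another plain-Python toy. Everything in it (`run_with_branches`, the `wiring` format, the node names) is hypothetical: each node emits a dict of named outputs, and a downstream node runs only if the specific output it is wired to was actually emitted. I believe Dagster expresses this kind of branching with optional outputs rather than by inspecting a Boolean downstream, but treat that as something to check in the docs.

```python
def run_with_branches(nodes, wiring, order):
    """Run nodes in order, skipping any whose wired upstream output is missing.

    nodes:  name -> callable(value) returning {output_name: value}
    wiring: name -> (upstream_name, output_name); root nodes are absent
    order:  node names, assumed to be in dependency order
    """
    emitted = {}
    ran = []
    for name in order:
        source = wiring.get(name)
        if source is None:
            value = None  # root node, nothing to wait for
        else:
            upstream, output = source
            if output not in emitted.get(upstream, {}):
                continue  # upstream was skipped or did not emit this output
            value = emitted[upstream][output]
        emitted[name] = nodes[name](value)
        ran.append(name)
    return ran


# Hypothetical pipeline: "check" emits only one of its two possible outputs.
nodes = {
    "check": lambda _: {"passed": True},
    "load": lambda v: {"result": v},
    "alert": lambda v: {"result": v},
    "report": lambda v: {"result": v},
}
wiring = {
    "load": ("check", "passed"),
    "alert": ("check", "failed"),
    "report": ("load", "result"),
}

ran = run_with_branches(nodes, wiring, ["check", "load", "alert", "report"])
print(ran)
```

Because `check` never emits `failed`, the `alert` branch does not run, while `load` and everything downstream of it does.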
k
Just popping by to say this thread helped me immensely with what I'm doing right now. Thanks for taking the time to write all that up @Vlad Dumitrascu
v
🙂👍 Glad it helped. Pass it forward.