Hi team I have a solid that is dependent on the execution of dagster #ask-community

Brien Gleason

09/03/2021, 3:32 PM

Hi team - I have a solid that is dependent on the execution of multiple upstream solids that each return `Nothing`. I am having issues calling a solid in the pipeline, despite defining the InputDefinition for each upstream dependency. I get this error:

dagster.core.errors.DagsterInvalidDefinitionError: In @pipeline sample_example_pipeline, could not resolve input based on position at index 0 for invocation sample_example_table. Use keyword args instead, available inputs are: ['input_start1', 'input_start2']

Has anyone defined a solid that has multiple `Nothing` inputs and could provide an example?

Daniel Kim

09/03/2021, 5:19 PM

I happen to be working on a quick example for my coworker. I am using the experimental APIs, though you can probably follow the migration guide to convert it to solids/pipeline. using_nothings.py:

from dagster import graph, op, InputDefinition, Nothing


@op
def do_nothing1(context) -> Nothing:
    context.log.info('Just me doing nothing')


@op
def do_nothing2(context) -> Nothing:
    context.log.info("I'm also doing nothing")


@op(
    input_defs=[
        InputDefinition(name="Nothing1", dagster_type=Nothing),
        InputDefinition(name="Nothing2", dagster_type=Nothing),
    ]
)
def i_eat_nothings(context) -> str:
    context.log.info('I am taking in Nothings')

    return "I am done eating Nothings"


@op
def i_eat_strings(context, message: str) -> Nothing:
    context.log.info(f'Message: {message}')


@graph
def nothing_graph():
    my_message = i_eat_nothings(
        Nothing1=do_nothing1(),
        Nothing2=do_nothing2()
    )

    i_eat_strings(my_message)


my_nothing_job = nothing_graph.to_job()

export DAGSTER_HOME=<path_containing_dagster.yaml>
dagit -f using_nothings.py --suppress-warnings

Vlad Dumitrascu

09/03/2021, 11:21 PM

Maybe I'm missing something. Sometimes I hate it when programmers do this, but I'll do it too (hopefully you are doing it this way for some particular reason, and not just running into a fundamental misunderstanding of Dagster's paradigms):

Why are you doing it this way?

Reason I ask is: Dagster depends on Inputs and Outputs for flow execution and timing. So, in pipelines run in series and parallel, you depend on the output from the previous node to be filled in the next node. This is not just parameter passing in functions:

result1 = function1()
result2 = function1(result1)

...but it affects TIMING. The second node (solid) will WAIT until its input is filled to begin. So your code in the pipeline definition is not a strict top-down function. Things will only start when their inputs are filled (either by outputs of other solids, or configs, etc.). Consequences:

1. You should always return something in your solids...even just a True/False or 0/1 exit status. This means you can capture that and know that the node (solid) completed.
2. You can use this return status or boolean to start other nodes. This, then, becomes the TIMING for the next (or other) node to begin. You can adjust timing by requiring inputs to be filled before a node starts. So, let's say `solidA` needs 1 input and `solidB` needs 2 inputs, but `solidC` needs 3 inputs...you can make `solidC` wait until the first two are done...no matter where in the pipeline you define it...by requiring those inputs from A and B. Waiting is an important thing, especially when you start multiprocessing and things are run in parallel. Then, whatever CAN start WILL start, and whatever needs to WAIT will WAIT...and the pipeline can run very efficiently, not waiting for unrelated things (in series, like above).
3. Also, you can have any number of unused outputs. You don't have to connect them. This is useful for making nodes (solids) reusable in different pipelines with different requirements. Makes them 'general purpose'.

Basically, `Inputs` and `Outputs` are not to be avoided. If you just want a program to run in series, then just run a Python script without Dagster...or put the entirety of the code inside ONE solid. `Inputs` and `Outputs` can and should be used for timing execution, and also for formatting (for example, returning the same data multiple ways...since Dagster enforces types). Make sense? Or am I missing something?
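The "whatever CAN start WILL start" idea above can be sketched in plain Python. This is only an illustration of dependency-gated ordering, not how Dagster schedules steps internally. It uses `graphlib.TopologicalSorter` from the standard library (Python 3.9+); the node names come from the message, while the edges and the `run_in_waves` helper are assumptions made for the example.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of upstream nodes whose outputs it waits for.
# The edges are assumed for illustration: solidC needs both A and B.
dependencies = {
    "solidA": set(),
    "solidB": set(),
    "solidC": {"solidA", "solidB"},
}


def run_in_waves(deps):
    """Group nodes into waves; every node in a wave has all its inputs filled."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # get_ready() returns only nodes whose upstream nodes are all done.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # Marking nodes as done is what unblocks their downstream nodes.
        sorter.done(*ready)
    return waves


waves = run_in_waves(dependencies)
print(waves)  # [['solidA', 'solidB'], ['solidC']]
```

Nodes in the same wave do not depend on each other, so an executor is free to run them in parallel; `solidC` only appears once both of its upstream nodes are marked done, regardless of where it is listed in the dictionary.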

Vlad Dumitrascu

09/03/2021, 11:25 PM

Example: Output of these nodes affects the timing of the steps. You can re-use the outputs, and accumulate them to make things happen later in the pipeline.

Brien Gleason

09/07/2021, 12:44 PM

Thanks for the write-up, Vlad! I have a pipeline that looks similar to the one you shared, where solid execution is dependent on the completion of upstream solids. The solids just run SQL scripts to create and/or update tables in our data warehouse, so we are yielding `AssetMaterialization` events from each solid, but not actually pulling that data back into Dagster. Some SQL scripts depend on tables created by previous solids, so I am using the `Nothing` type to explicitly establish that dependency even though no data is passed between the solids. Currently, if a solid doesn't successfully complete its SQL script, the solid will fail and downstream solids will not execute, which is intended. We are capturing asset information and metadata in the `AssetMaterialization` event. What is the advantage of returning a boolean from the solid as opposed to using `Nothing`?

Vlad Dumitrascu

09/07/2021, 5:22 PM

Sounds like you've handled it for your needs. To answer your question about why return a `Boolean` vs a `Nothing`: I guess it seems like it gives you more options, IMO.

• On the execution timing front, it is probably the same as returning `Nothing`. The next solid, if it's waiting for a `Nothing`, will start. But that assumes most of the solids you build will have a `Nothing` type input, which isn't likely. At least a `Boolean` is a common input, and probably more useful in a coding environment. So, from a re-usability standpoint, the `Nothing` is a dead end...only usable for this one pipeline.
• Then there's the possibility of doing some decision-making in the pipeline. I think this is the intent of using Dagster for AI type things. So, classic branching based on returned `Booleans` can be achieved...since there are no IF statements, but you CAN make whole separate branches of solids execute based on returns (and stop execution of other branches for the False case).

I guess you could argue the `Nothing` does some of the same things. But to my thinking, returning a `Boolean` is more about the next node and also code/solid re-use. It is like a function in a library returning null or None 100% of the time, on purpose. That's not an expected result for a function that returns. Also, if a library has a ton of utility functions, you'll probably not go back and re-write them...so you're stuck with what's there. Same with solids; you make them and then you're kinda stuck with them after a while. Might as well make them general purpose and really useful in other situations. Otherwise, you're kind of kicking the requirement to have `Nothing` type inputs down the road. Now it's like a requirement to use your library of solids for anyone (or your future self) because it's baked in. Just some thoughts. Cheers.

Kevin Haynes

11/03/2021, 10:40 PM

Just popping by to say this thread helped me immensely with what I'm doing right now. Thanks for taking the time to write all that up @Vlad Dumitrascu

Vlad Dumitrascu

11/11/2021, 6:25 PM

🙂👍Glad it helped. Pass it forward.
