reading through the docs it seems like solid dependencies ar dagster #announcements


dwall

01/21/2020, 10:05 PM

reading through the docs, it seems like solid dependencies are very much centered around input/output dependencies. However, we have a few scenarios where the completion of a solid matters much more than its output. For example, a solid that needs to wait until another solid's computation has completed, but doesn't need the output of that computation. Is there a pre-defined pattern for declaring dependencies between solids without needing to explicitly define inputs and outputs?

alex

01/21/2020, 10:07 PM

we created the `Nothing` type for this, it's not perfect but it might be helpful in this situation

alex

01/21/2020, 10:08 PM

you can create a `Nothing` type input which can either depend on a `Nothing` type output or an existing real output on a previous solid

alex

01/21/2020, 10:09 PM

that input then doesn't need to have a corresponding argument on the decorated solid function
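To make the idea concrete, here is a toy model in plain Python. It is not Dagster's API: `Step`, `run_steps`, and the three example functions are hypothetical names invented for illustration. It only sketches the behavior described above, where an ordering-only dependency makes a step wait for an upstream step without adding an argument to the step's function.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Step:
    """Toy stand-in for a solid (hypothetical, not a Dagster class)."""

    def __init__(self, name, fn, inputs=None, after=()):
        self.name = name
        self.fn = fn
        self.inputs = dict(inputs or {})  # argument name -> upstream step (data passed)
        self.after = tuple(after)         # upstream steps to wait for (no data passed)


def run_steps(steps):
    """Run steps in dependency order; return (execution order, results by step name)."""
    by_name = {step.name: step for step in steps}
    # Data edges and ordering-only edges both become edges of the same graph.
    graph = {s.name: set(s.inputs.values()) | set(s.after) for s in steps}
    results, order = {}, []
    for name in TopologicalSorter(graph).static_order():
        step = by_name[name]
        # Only data inputs become keyword arguments; `after` edges pass nothing.
        kwargs = {arg: results[upstream] for arg, upstream in step.inputs.items()}
        results[name] = step.fn(**kwargs)
        order.append(name)
    return order, results


def load_table():
    return [1, 2, 3]


def refresh_cache():
    return None  # side effect only, nothing useful to hand downstream


def report(rows):  # note: no parameter for refresh_cache
    return sum(rows)


order, results = run_steps([
    Step("report", report, inputs={"rows": "load_table"}, after=["refresh_cache"]),
    Step("load_table", load_table),
    Step("refresh_cache", refresh_cache),
])
```

`report` runs last even though it is listed first, and it receives only `rows`.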

dwall

01/21/2020, 10:10 PM

@alex oh nice. Do you by chance have an example of this being used in practice?

alex

01/21/2020, 10:10 PM

https://github.com/dagster-io/dagster/blob/master/examples/dagster_examples/toys/many_events.py

alex

01/21/2020, 10:11 PM

https://github.com/dagster-io/dagster/issues/1861 for discussion on how this could be better

abhi

01/21/2020, 10:11 PM

To make it more semantic, you could also use a `COMPLETED` signal constant which just returns a specified int. You can define a custom dagster type to use it in your input/output definitions
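A rough sketch of the signal-constant idea in plain Python, leaving out the Dagster type definition itself. The value `1` and the names `is_completed_signal`, `rebuild_index`, and `notify` are assumptions made up for this example; the predicate is the kind of check a custom type could wrap.

```python
COMPLETED = 1  # hypothetical signal value; any agreed-upon int would do


def is_completed_signal(value):
    """Accept only the COMPLETED constant (bools are rejected explicitly)."""
    return isinstance(value, int) and not isinstance(value, bool) and value == COMPLETED


def rebuild_index():
    # ... the real work would happen here ...
    return COMPLETED  # the output carries no data, only "I finished"


def notify(upstream_signal):
    # The input exists purely to force ordering; validate it and move on.
    if not is_completed_signal(upstream_signal):
        raise TypeError(f"expected COMPLETED signal, got {upstream_signal!r}")
    return "notified"


status = notify(rebuild_index())
```

Compared with an ordering-only input, this keeps an explicit argument on the downstream function, which some readers may find easier to follow.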

dwall

01/21/2020, 10:13 PM

@alex cool - this issue is exactly what I was looking for

dwall

01/21/2020, 10:18 PM

back in Airflow world, we typically used Sensors for this. In Dagster world I guess it would look something like a solid that polls a process until a condition is evaluated to True
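The body of such a polling solid could be sketched roughly as below. This is a generic helper in plain Python, not an Airflow or Dagster API; `poll_until` and its parameters are hypothetical names, and the `sleep` and `clock` arguments exist only so the loop can be exercised without real waiting.

```python
import time


def poll_until(condition, interval_seconds=1.0, timeout_seconds=60.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call `condition` until it returns True; return the number of attempts.

    Raises TimeoutError if the deadline passes before the condition holds.
    """
    deadline = clock() + timeout_seconds
    attempts = 0
    while True:
        attempts += 1
        if condition():
            return attempts
        if clock() >= deadline:
            raise TimeoutError(f"condition not met after {attempts} attempt(s)")
        sleep(interval_seconds)


# Usage: a condition that becomes true on the third check.
checks = iter([False, False, True])
attempts = poll_until(lambda: next(checks), interval_seconds=0, sleep=lambda _: None)
```

A solid wrapping this would return nothing useful on success, which is where an ordering-only dependency fits.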

max

01/21/2020, 10:36 PM

@sashank ^^ re our discussion last week

schrockn

01/21/2020, 10:40 PM

@dwall would love to see the exact example that you need this for. The principled/pedantic push to stick with data dependencies is an idealistic view of the world: that anytime there is no data dependency, it is probably some operational concern that should be captured by an abstraction or pattern within the system. In a purist end-state, any operational solid of this nature would be captured by an abstraction and be only emitted and managed by the system in the execution plan. In the interim, as noted in the thread, we can rely on the fact that data dependencies are a strict superset of execution dependencies, and we can express execution dependencies with the “Nothing” type. (Note: It’s clear we need to document this better) So with that context, it’s always helpful to see, in a concrete way, exactly what you are trying to do so that it can inform future design decisions.

dwall

01/22/2020, 2:49 PM

@schrockn yeah, sure. An example of this that we are actively bumping up against is using solids to wrap dbt invocations. We are running a dbt rpc server and are creating solids to communicate with that server to trigger different dbt things (run, test, snapshot, etc.). We want to define a specific order of events for these dbt invocations (for example, `dbt run` first, then `dbt test` upon completion), but we don't necessarily care about the input and output of each of these processes. For example, we still want to run `dbt test` upon the completion of `dbt run` regardless of the "output" of `dbt run`

dwall

01/22/2020, 2:49 PM

I think I'm seeing this exact use case in the `dagster-dbt` library for the `Nothing` output type that @alex mentioned above: https://github.com/dagster-io/dagster/blob/master/python_modules/libraries/dagster-dbt/dagster_dbt/__init__.py#L107

schrockn

01/22/2020, 2:59 PM

@dwall yup that’s it!

schrockn

01/22/2020, 2:59 PM

btw the dagster-dbt library is of, um, dubious quality

schrockn

01/22/2020, 3:00 PM

i wrote it in a few hours many moons ago, so would love to see what you cook up!

schrockn

01/22/2020, 3:00 PM

I’d love feedback about how to make the Nothing thing more obvious. it might be just a case of naming.

dwall

01/22/2020, 8:37 PM

@schrockn I think I managed to get something working pretty well. Check out this snippet:

dbt_rpc_example_pipeline.py

dwall

01/22/2020, 8:40 PM

this pattern seems to allow for dbt run and dbt test to depend on solids that produce something (`dbt_rpc_run` and `dbt_rpc_test`) and also solids that produce nothing (`dbt_rpc_poll`)
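The snippet attached to the message above is not preserved in this archive. The following is therefore only a guess at the shape of the pattern, written as plain Python against an in-memory fake. `FakeRpcServer`, its methods, the integer tokens, and `run_then_test` are all hypothetical; this is neither the dbt RPC API nor the original pipeline. Only the step names `dbt_rpc_run`, `dbt_rpc_test`, and `dbt_rpc_poll` come from the thread.

```python
import itertools


class FakeRpcServer:
    """In-memory stand-in for an RPC server (hypothetical, for illustration only)."""

    def __init__(self, polls_until_done=2):
        self._tokens = itertools.count(1)
        self._remaining = {}
        self._polls_until_done = polls_until_done
        self.log = []

    def submit(self, command):
        token = next(self._tokens)
        self._remaining[token] = self._polls_until_done
        self.log.append(f"submit {command}")
        return token

    def is_done(self, token):
        self._remaining[token] -= 1
        return self._remaining[token] <= 0


def dbt_rpc_run(server):
    return server.submit("run")   # produces something: a request token


def dbt_rpc_test(server):
    return server.submit("test")  # produces something: a request token


def dbt_rpc_poll(server, token):
    while not server.is_done(token):
        pass  # a real poller would sleep between checks
    server.log.append(f"done {token}")
    # produces nothing: downstream steps only need to know it finished


def run_then_test(server):
    dbt_rpc_poll(server, dbt_rpc_run(server))   # wait for the run to complete
    dbt_rpc_poll(server, dbt_rpc_test(server))  # then test, whatever the run returned
    return server.log


events = run_then_test(FakeRpcServer())
```

The test request is submitted only after the run's poll step finishes, and nothing from the run is passed into it.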
