# announcements
d
reading through the docs, it seems like solid dependencies are very much centered around input/output dependencies. However, we have a few scenarios where the completion of a solid matters much more than its output. For example, a solid that needs to wait until another solid's computation has completed, but doesn't need the output of that computation. Is there a pre-defined pattern for declaring dependencies between solids without needing to explicitly define inputs and outputs?
a
we created the `Nothing` type for this, it's not perfect but it might be helpful in this situation
you can create a `Nothing` type input which can either depend on a `Nothing` type output or an existing real output on a previous solid
that input then doesn't need to have an argument on the decorated `solid` fn
d
@alex oh nice. Do you by chance have an example of this being used in practice?
https://github.com/dagster-io/dagster/issues/1861 for discussion on how this could be better
a
To make it more semantic, you could also use a `COMPLETED` signal constant which just returns a specified int. You can define a custom dagster type to use it in your input/output definitions
d
@alex cool - this issue is exactly what I was looking for
back in Airflow world, we typically used Sensors for this. In Dagster world I guess it would look something like a solid that polls a process until a condition is evaluated to True
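A minimal sketch of that polling idea in plain Python, not Dagster API: the body of such a solid could loop on a condition and return nothing once it holds. The names `poll_until`, `interval`, and `timeout` are hypothetical and chosen only for illustration.

```python
import time
from typing import Callable


def poll_until(
    condition: Callable[[], bool],
    interval: float = 1.0,
    timeout: float = 60.0,
) -> None:
    """Block until condition() is truthy; raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return  # nothing to hand downstream: completion is the signal
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

A timeout keeps a stuck upstream process from blocking the run forever; whether a timeout should fail the step or retry is a design choice this sketch leaves open.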
m
@sashank ^^ re our discussion last week
s
@dwall would love to see the exact example that you need this for. The principled/pedantic push to stick with data dependencies comes from an idealistic view of the world: anytime there is no data dependency, it is probably some operational concern that should be captured by an abstraction or pattern within the system. In a purist end-state, any operational solid of this nature would be captured by an abstraction and would only be emitted and managed by the system in the execution plan. In the interim, as noted in the thread, we can rely on the fact that data dependencies are a strict superset of execution dependencies, and we can express execution dependencies with the “Nothing” type. (Note: It’s clear we need to document this better) So with that context, it’s always helpful to see, in a concrete way, exactly what you are trying to do so that it can inform future design decisions.
d
@schrockn yeah, sure. An example of this that we are actively bumping up against is using solids to wrap dbt invocations. We are running a dbt rpc server and are creating solids to communicate with that server to trigger different dbt things (run, test, snapshot, etc.). We want to define a specific order of events for these dbt invocations (for example, `dbt run` first, then `dbt test` upon completion), but we don't necessarily care about the input and output of each of these processes. For example, we still want to run `dbt test` upon the completion of `dbt run` regardless of the "output" of `dbt run`.
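A rough plain-Python sketch of that completion-only ordering, using the standard library's `graphlib` rather than any Dagster API. The helper `run_in_order` and the lambda stand-ins for the dbt invocations are hypothetical; the point is only that a step can wait on another step without consuming its return value.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_in_order(
    steps: Dict[str, Callable[[], object]],
    after: Dict[str, Set[str]],
) -> List[str]:
    """Run each step once all the steps it waits on have finished."""
    graph = {name: after.get(name, set()) for name in steps}
    finished: List[str] = []
    for name in TopologicalSorter(graph).static_order():
        steps[name]()  # return value is ignored: only completion matters
        finished.append(name)
    return finished


# Hypothetical stand-ins for the real dbt invocations.
steps = {
    "dbt_test": lambda: "test results",
    "dbt_run": lambda: "run results",
}
order = run_in_order(steps, after={"dbt_test": {"dbt_run"}})
print(order)  # ['dbt_run', 'dbt_test']
```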
I think I'm seeing this exact use case in the `dagster-dbt` library for the `Nothing` output type that @alex mentioned above: https://github.com/dagster-io/dagster/blob/master/python_modules/libraries/dagster-dbt/dagster_dbt/__init__.py#L107
s
@dwall yup that’s it!
btw the dagster-dbt library is of, um, dubious quality
i wrote it in a few hours many moons ago, so would love to see what you cook up!
I’d love feedback about how to make the Nothing thing more obvious. it might be just a case of naming.
d
@schrockn I think I managed to get something working pretty well. Check out this snippet:
this pattern seems to allow for dbt run and dbt test to depend on solids that produce something (`dbt_rpc_run` and `dbt_rpc_test`) and also solids that produce nothing (`dbt_rpc_poll`)