sephi 06/23/2020, 8:22 AM
We are wondering what the optimal way is to use a cache in a nested dagster pipeline. Currently we are running with
(version 2.3) with
with a Cloudera distribution (we are running without a dagster storage config). Our pipeline consists of
that have dependencies between them. The
are processing the data in various ways, including saving the data as an intermediate step. We noticed that adding
prevents some steps from being recalculated. What is the best practice for including the cache in the solids?
sandy 06/23/2020, 3:15 PM
at the end of solids whose outputs will be consumed by multiple downstream solids. does that answer your question?
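To make the reasoning concrete, here is a rough sketch in plain Python, assuming the data is lazily evaluated so that every downstream consumer re-triggers the upstream computation unless it is cached. `LazyData`, `upstream_solid`, `downstream_sum`, `downstream_max` and `run_pipeline` are all hypothetical names made up for illustration; none of them is a dagster or Spark API.

```python
# Hypothetical stand-in for a lazily evaluated dataset.
# This is NOT a dagster or Spark class; it only mimics lazy recomputation.
class LazyData:
    def __init__(self, compute):
        self._compute = compute   # function that produces the data
        self._cached = False      # whether cache() was requested
        self._value = None        # stored result once computed
        self.evaluations = 0      # how many times compute() actually ran

    def cache(self):
        # Mark the data for reuse after the first evaluation.
        self._cached = True
        return self

    def collect(self):
        # Reuse the stored result only if caching was requested
        # and the data has already been computed once.
        if self._cached and self.evaluations > 0:
            return self._value
        self.evaluations += 1
        self._value = self._compute()
        return self._value


def upstream_solid(use_cache):
    # Plays the role of a solid whose output feeds several downstream solids.
    data = LazyData(lambda: [x * 2 for x in range(5)])
    # Cache at the end of the solid, just before handing the output on.
    return data.cache() if use_cache else data


def downstream_sum(data):
    return sum(data.collect())


def downstream_max(data):
    return max(data.collect())


def run_pipeline(use_cache):
    shared = upstream_solid(use_cache)
    results = (downstream_sum(shared), downstream_max(shared))
    return results, shared.evaluations


print(run_pipeline(use_cache=False))  # upstream work runs once per consumer
print(run_pipeline(use_cache=True))   # upstream work runs once in total
```

Both runs return the same results; only the number of upstream evaluations differs, which is why caching pays off mainly when more than one downstream solid reads the same output.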
sephi 06/24/2020, 6:55 AM