Dimitris Stafylarakis

12/10/2020, 9:32 PM
Hi all. I came across this issue regarding incremental re-execution and was wondering how it relates to intermediates. Isn't the functionality described in the issue already covered by them?

cat

12/10/2020, 9:44 PM
Thanks @Dimitris Stafylarakis for asking this question! CC-ing @sandy who can share more

sandy

12/10/2020, 9:50 PM
Hi @Dimitris Stafylarakis - intermediates enable runs to use objects as input that were produced as output by prior runs. If I understand what stratospark is asking for in that issue, it's the ability to have Dagster automatically decide which steps to run based on what objects exist. Which of those are you interested in?

Dimitris Stafylarakis

12/10/2020, 10:22 PM
Not sure I follow entirely.
Is it about reusing objects within the same pipeline (intermediates) versus using them in different pipelines?

sandy

12/11/2020, 12:07 AM
The difference is in how we decide which steps to run. (1) The user tells Dagster directly which steps should run. (2) The user tells Dagster to decide which steps to run; Dagster then looks on the filesystem and runs only the steps whose outputs are not already there. We support (1), but only have experimental support for (2). In both cases, steps can use files on the filesystem as inputs, even if those files came from different runs. Does that make sense?
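
To make the two modes concrete, here is a minimal plain-Python sketch. It is not Dagster's API: `STEPS`, `output_path`, `missing_steps`, and `run` are hypothetical names, and pickled files in a temporary directory stand in for stored intermediates.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical three-step pipeline: step name -> (upstream step names, function).
# Declared in dependency order, so iterating the dict is a valid execution order.
STEPS = {
    "load": ([], lambda: [1, 2, 3]),
    "double": (["load"], lambda xs: [2 * x for x in xs]),
    "total": (["double"], lambda xs: sum(xs)),
}


def output_path(store, step):
    """File that holds the stored output of a step (an assumed layout)."""
    return Path(store) / f"{step}.pkl"


def missing_steps(store):
    """Mode 2: choose the steps that have no stored output yet."""
    return [name for name in STEPS if not output_path(store, name).exists()]


def run(store, steps_to_run):
    """Run only the selected steps, reading upstream inputs from stored files."""
    executed = []
    for name, (deps, fn) in STEPS.items():
        if name not in steps_to_run:
            continue
        # Inputs may have been written by this run or by an earlier one.
        inputs = [pickle.loads(output_path(store, d).read_bytes()) for d in deps]
        output_path(store, name).write_bytes(pickle.dumps(fn(*inputs)))
        executed.append(name)
    return executed


store = tempfile.mkdtemp()
first = run(store, list(STEPS))           # initial run: every step executes
explicit = run(store, ["total"])          # mode 1: the user names the steps
output_path(store, "total").unlink()      # simulate a missing output
automatic = run(store, missing_steps(store))  # mode 2: selection is derived
```

In both calls after the first, `total` reads the output of `double` from a file that an earlier call wrote, which is the role intermediates play in the thread. Note that this sketch only checks whether a file exists; it does not detect stale outputs.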

Dimitris Stafylarakis

12/11/2020, 8:16 AM
yeah it's clear now 🙂 thanks for the insights!