Hi, I have a graph/job that starts with an op that...
# ask-community
a
Hi, I have a graph/job that starts with an op that fans out an arbitrary number of dynamic outputs and executes some downstream ops on each before collecting the results. Some of the branches fail for external reasons (they are API calls), while most of them succeed. I wanted to re-execute only the failed branches, starting right after the fan-out, but got a
dagster._core.errors.DagsterExecutionStepNotFoundError: Can not build subset plan from unknown steps:...
error listing the errored steps in the form of
graph_name.op_name[mapping_key].
I’m confused, because I can see the outputs logged in the fan-out step, and I expected the rerun to pick those up. Am I missing something here?
c
This is using the re-execute from failure functionality? If so, that’s definitely unexpected.
a
Yes. I'm planning to create a shareable example sometime this week to investigate. Maybe it's my code or setup that makes this fail.
But at least I know it should work.
a
hi! @Andras Somi we want to do the same in our setup, did you succeed in the end?
a
@Arsenii Poriadin My dynamic op works like a charm. I never ran into the issue again after I finalized the code (the API I'm querying seems to be pretty reliable and the payloads are small)
a
so no need for the rock-solid re-execution mechanics then? 😄
can I ask you to take a look at my question here? what do you think about it?
a
I have only used op retries. With our scale it’s perfectly fine to just go to Dagit and check why a scheduled job failed and rerun it manually, if necessary. Your approach sounds reasonable though.