# ask-community

Caleb Overman

07/07/2023, 6:14 PM
hey all, i'm struggling to understand how to utilize `ReexecutionOptions` with a job run via a schedule…basically the job has a set of unreliable ops and i want a descendant op to always run even if something upstream fails…i've taken a look at the reexecution example but can't quite wrap my head around it

jamie

07/07/2023, 6:19 PM
hey @Caleb Overman the reexecution system is for re-running failed ops, not for continuing along the op graph. Just want to confirm that what you're looking for is a way to always execute a downstream op, even if the upstream op fails?

Caleb Overman

07/07/2023, 6:21 PM
yeah correct…i couldn’t find anything about continuing so thought reexecution might be an approach

jamie

07/07/2023, 6:23 PM
re-execution would be specifically for you to manually launch the job again and it would start from the first point of failure. So in your case, that would be the failed op. Would having retries on the unreliable ops be a solution? https://docs.dagster.io/concepts/ops-jobs-graphs/op-retries#op-retries
if you did that, basically if an op in the job failed it would get automatically retried according to the policy
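For reference, an op-level retry policy looks roughly like this (the max_retries and delay values, and the flakey_function placeholder, are illustrative, not a recommendation):
```python
from dagster import RetryPolicy, op


def flakey_function():
    ...  # placeholder for the unreliable work


# Illustrative numbers only -- tune max_retries/delay for your workload.
@op(retry_policy=RetryPolicy(max_retries=3, delay=10))
def unreliable_op():
    # Dagster retries the op automatically if this raises
    flakey_function()
```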

Caleb Overman

07/07/2023, 6:28 PM
we do have retries…basically the ops that fail do so consistently due to some underlying libraries we're still enhancing, meaning a retry won't succeed either…the ops are also dynamic so it's not very straightforward to exclude certain ones
basically hoping to just allow the failed ops to fail and continue, which would buy us some time to address the other issues

jamie

07/07/2023, 6:32 PM
yeah that makes sense
is it infeasible to put a try catch in the unreliable ops?

Caleb Overman

07/07/2023, 6:48 PM
really like that idea! unfortunately these ops are also k8s jobs that can fail unexpectedly and succeed on retry (hence having that setup), so catching the failure exception would prevent the retries we do use
we're in the middle of migrating from airflow to dagster and just trying to get things running so we can shut down airflow, and clearly not following best practices yet 😂

jamie

07/07/2023, 6:51 PM
ok i see. this is tricky!
you could do the retry manually in the try/catch and then once it's failed for real, continue on, like:
```python
from dagster import op


@op
def unreliable():
    try:
        flakey_function()
    except Exception:
        # first attempt failed -- try once more
        try:
            flakey_function()
        except Exception:
            # still failing -- swallow it and let the op finish
            ...
```
might be able to put that in a loop too
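A loop version of that idea could look something like this (the attempt count and flakey_function placeholder are illustrative):
```python
from dagster import op

MAX_ATTEMPTS = 3  # illustrative


def flakey_function():
    ...  # placeholder for the unreliable work


@op
def unreliable():
    for _ in range(MAX_ATTEMPTS):
        try:
            flakey_function()
            return  # success -- stop retrying
        except Exception:
            continue  # try again on the next iteration
    # all attempts failed -- return normally so downstream ops still run
```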

Caleb Overman

07/07/2023, 6:54 PM
ooh i like it

jamie

07/07/2023, 6:54 PM
if the downstream op can run even if the upstream fails, is there just like an implicit ordering (B runs after A) rather than data being passed around (A returns a value that B needs)?
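For what it's worth, a pure ordering dependency can be expressed with a `Nothing` input rather than a data dependency -- a rough sketch with made-up op names:
```python
from dagster import In, Nothing, job, op


@op
def build_partitions() -> Nothing:
    ...  # unreliable upstream work


@op(ins={"start": In(Nothing)})  # ordering-only: no value is actually passed
def build_tableau_extract():
    ...


@job
def extract_job():
    build_tableau_extract(start=build_partitions())
```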

Caleb Overman

07/07/2023, 6:56 PM
kinda…basically the upstream ops build a partitioned parquet dataset by extracting data from another system, then downstream we convert that entire parquet file to a tableau extract…so we want the tableau extract to get created even if one of the partitions fails
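If the fan-out uses dynamic outputs, the rough shape might be something like this sketch (op names and partition keys are made up, with the try/except bandaid inside each mapped op so the collected downstream op still runs):
```python
from dagster import DynamicOut, DynamicOutput, job, op


@op(out=DynamicOut())
def list_partitions():
    # hypothetical partition keys
    for key in ["2023_01", "2023_02"]:
        yield DynamicOutput(key, mapping_key=key)


@op
def extract_partition(partition_key: str):
    try:
        ...  # unreliable extraction into the parquet dataset
    except Exception:
        pass  # swallow the failure so the fan-in op still runs


@op
def build_tableau_extract(results):
    ...  # convert the full parquet dataset to a tableau extract


@job
def extract_job():
    build_tableau_extract(list_partitions().map(extract_partition).collect())
```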

jamie

07/07/2023, 6:58 PM
ok - if it's just a temp bandaid while you make the system more stable, then i feel like the try/catch thing should be fine. if it's going to be more permanent, then maybe there's another combo of dagster concepts that'll do it.

Caleb Overman

07/07/2023, 7:02 PM
agreed it’s probably the best approach for us right now…really appreciate the help!