# ask-community
m
Hi all! Does anyone know if there's a way to make Dagster "ignore" failed ops? Meaning that the op will end up in a failure status (as well as the job, of course), but the other ops won't be affected by it? Our use case is an ETL job that runs a few parallel ops, and if one op fails we don't want to fail the other ops, even if the job takes a long time. Thanks!
s
Hi May, I believe this is already the case, except for ops downstream of the failing op. You can confirm with this (alpha and delta execute successfully but beta fails):
```python
import time

from dagster import Definitions, job, op


@op
def alpha():
    # Succeeds after a short delay, so its downstream op starts after beta has already failed.
    time.sleep(5)
    return 1


@op
def beta():
    # Fails unconditionally; only ops downstream of this one would be skipped.
    raise Exception()


@op
def delta(x):
    # Downstream of alpha only, so it still runs despite beta's failure.
    return x + 1


@job
def foo_job():
    delta(alpha())
    beta()


defs = Definitions(jobs=[foo_job])
```
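
(Illustrative aside, not from the thread: one way to confirm this behavior locally is to run the job in-process with raise_on_error=False and inspect the per-step events. Based on the snippet above, the expectation is that alpha and delta succeed, beta fails, and the run as a whole is marked failed.)

```python
# Minimal local check (illustrative sketch): run foo_job in-process without raising
# on the failed op, then report each step's outcome.
if __name__ == "__main__":
    result = foo_job.execute_in_process(raise_on_error=False)
    for event in result.all_events:
        if event.event_type_value in ("STEP_SUCCESS", "STEP_FAILURE"):
            print(event.step_key, event.event_type_value)
    # Expected: alpha and delta report STEP_SUCCESS, beta reports STEP_FAILURE,
    # and the overall run is marked as failed.
    print("run success:", result.success)
```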
m
Hey Sean, thank you for your answer! My understanding was that in some cases the run manager might not launch new ops (even if they are parallel) when one of the ops in the job fails. Do you know anything about that? Specifically, in our case we have a hard-coded limit on the number of parallel ops, so we sometimes hit a case where two ops were supposed to run in parallel but actually run with a time gap between them. If the run manager indeed doesn't launch new ops after a failure, that would be a problem for us.
s
Is this something you've actually observed, or are you just worried about it? I'm not 100% up to date on this, but I believe any limitations on parallelism shouldn't matter. For instance, in the example snippet above, `delta` is launched well after `beta` has already failed (because `alpha` sleeps for 5 seconds).
m
Personally I have not experienced that issue, but our jobs currently aren't that long, and I also suspect 5 seconds won't be enough (though I'm not that familiar with Dagster's architecture). I don't know if it matters, but we are using the k8s_job_executor as the executor for our ops.
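
(Illustrative aside, not from the thread: a hard cap on parallel ops with the k8s_job_executor is typically expressed through its max_concurrent config. The op and job names below are hypothetical, and the exact config key should be checked against the dagster-k8s docs for the version in use.)

```python
from dagster import job, op
from dagster_k8s import k8s_job_executor


@op
def extract_a():
    ...


@op
def extract_b():
    ...


# Hypothetical job capping concurrent step pods at 2. Even with the cap, an op that
# fails should not prevent queued ops from running; only its downstream ops are skipped.
@job(executor_def=k8s_job_executor.configured({"max_concurrent": 2}))
def parallel_etl_job():
    extract_a()
    extract_b()
```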
s
The 5 seconds above is arbitrary: if there are still ops in the queue (that are not being skipped due to upstream failure), they should run. But to confirm, it sounds to me like you haven't actually encountered a problem yet? If that's the case, I think you're good to go; Dagster will only cancel ops downstream of failed ops.
m
Yes, we haven't actually had this issue but were warned about it. I will give it a try as is, and if something goes wrong I will write to you guys again. Thank you for all your help!