Is there a way to re-execute a run from failure vi...
# ask-community
c
Is there a way to re-execute a run from failure via the run_failure_sensor? I can identify the runs that I want to resume, but I'm not sure how to convert that into a RunRequest. Here's the code I have so far:
@run_failure_sensor
def run_execution_interrupted_sensor(context: RunFailureSensorContext):
    run_requests = []
    for event in context.get_step_failure_events():
        if event.event_specific_data.error_source == ErrorSource.INTERRUPT:
            ...
            run_requests.append(RunRequest(...))

    return SensorResult(run_requests=run_requests)
trying to build a smarter way to combat k8s killing run pods. Usually, this happens when GKE changes the nodes under the hood
j
Hi! I personally use graphql to launch runs from run_failure_sensor
s
Hi Charlie, I don’t think it’s possible to “directly” reexecute from a run failure sensor, but I’ve reached out to the team for confirmation. To elaborate on what Jordan is saying, you could use the GQL python client as a workaround: https://docs.dagster.io/concepts/dagit/graphql-client#graphql-python-client
c
nice yeah I hadn't considered graphql
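A minimal sketch of the GraphQL workaround mentioned above, using only the standard library. The mutation text is an assumption based on Dagster's `launchRunReexecution` mutation with a `FROM_FAILURE` strategy, so check it against the schema of your own instance before relying on it. The helper names `build_reexecution_payload` and `post_graphql` are hypothetical and not part of any Dagster API.

```python
import json
import urllib.request

# Assumed mutation shape; verify the field and type names in your
# instance's GraphQL schema before using this.
REEXECUTE_FROM_FAILURE = """
mutation Reexecute($reexecutionParams: ReexecutionParams) {
  launchRunReexecution(reexecutionParams: $reexecutionParams) {
    __typename
    ... on LaunchRunSuccess { run { runId } }
    ... on PythonError { message }
  }
}
"""


def build_reexecution_payload(parent_run_id: str) -> dict:
    """Build the JSON body asking to re-execute a failed run from failure."""
    return {
        "query": REEXECUTE_FROM_FAILURE,
        "variables": {
            "reexecutionParams": {
                "parentRunId": parent_run_id,
                "strategy": "FROM_FAILURE",
            }
        },
    }


def post_graphql(url: str, payload: dict, extra_headers=None, timeout: float = 30.0) -> dict:
    """POST a GraphQL payload and return the decoded JSON response."""
    headers = {"Content-Type": "application/json", **(extra_headers or {})}
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    # An explicit timeout keeps an unreachable host from blocking the sensor.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Inside the sensor, something like `post_graphql(url, build_reexecution_payload(run_id))` would be called once per run to resume, and the returned `__typename` should be inspected, because GraphQL errors can come back with an HTTP 200 status.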
s
Hey Charlie, I’ve confirmed you can’t reexecute from failure via the sensor. However, you might be able to avoid the need for this with Run Retries
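For the Run Retries suggestion, a rough sketch of per-job tags is below. It assumes run retries are already enabled for the instance or deployment, and that the tag names `dagster/max_retries` and `dagster/retry_strategy` match your Dagster version. The retry count of 3 is an arbitrary illustration.

```python
# Assumed tag names from Dagster's run retries feature; confirm them in the
# documentation for your version. Values are strings, as run tags usually are.
RUN_RETRY_TAGS = {
    "dagster/max_retries": "3",  # arbitrary example value
    "dagster/retry_strategy": "FROM_FAILURE",  # resume from failed steps
}

# Usage (requires dagster to be installed):
#
#     from dagster import job
#
#     @job(tags=RUN_RETRY_TAGS)
#     def my_job():
#         ...
```

With this approach the retry is launched by Dagster itself when a run fails, so no sensor or GraphQL call is needed.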
c
@sean should the graphql client work for dagster cloud?
I'm trying a simple get_run_status call and it's just hanging
changed some stuff and I'm seeing
gql.transport.exceptions.TransportServerError: 401 Client Error: Unauthorized for url: https://kipptaf.dagster.cloud/graphql
here's my code
@run_failure_sensor
def run_execution_interrupted_sensor(context: RunFailureSensorContext):
    client = DagsterGraphQLClient(hostname="kipptaf.dagster.cloud")

    for event in context.get_step_failure_events():
        run_status = client.get_run_status(event.logging_tags["run_id"])

        context.log.info(run_status)
@Jordan if you have any snippets you can share, that'd be appreciated
difference between the two is the port: • no port = 401 • port 3000 = hanging
j
I don't use the dagster cloud version so I can't really help you with that. My code looks like this:
url = "http://127.0.0.1:3000/graphql"
payload = {"query": graphql_query, "variables": variables}
headers = {"Content-Type": "application/json"}
response = requests.request("POST", url=url, json=payload, headers=headers)
d
Hey @sean, so how to authorize the GraphQL client for Dagster Cloud? The docs don't seem to cover that.
s
d
Yes, thank you
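The answer to the authorization question did not survive in this thread. As a hedged sketch only: Dagster Cloud is generally understood to expect a user token in a `Dagster-Cloud-Api-Token` header, with the deployment name in the URL path, served over HTTPS on the default port. That would be consistent with the 401 seen without a token and the hang on port 3000. The deployment name `prod`, the environment variable name, and the helper `dagster_cloud_graphql_target` are assumptions for illustration.

```python
import os


def dagster_cloud_graphql_target(organization: str, deployment: str, token: str):
    """Return the (url, headers) pair for a Dagster Cloud GraphQL request.

    Assumes the https://<org>.dagster.cloud/<deployment>/graphql URL layout
    and the Dagster-Cloud-Api-Token header; verify both for your account.
    """
    url = f"https://{organization}.dagster.cloud/{deployment}/graphql"
    headers = {"Dagster-Cloud-Api-Token": token}
    return url, headers


# "prod" and DAGSTER_CLOUD_API_TOKEN are assumed names, not taken from the thread.
url, headers = dagster_cloud_graphql_target(
    "kipptaf", "prod", os.environ.get("DAGSTER_CLOUD_API_TOKEN", "")
)
```

Recent versions of `DagsterGraphQLClient` appear to accept a `headers` argument, so the same dictionary could likely be passed there along with the `<org>.dagster.cloud/<deployment>` hostname. Check the signature in your installed version.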