dagster

Hi, we had a job fail because the engine could not find resources to spin up for about 4-5 minutes.
Can we do something to avoid this?

Hey <@U04RNUWG0UQ>, what agent are you using? Depending on the agent type, I can recommend a few ways to make it less likely to fail to start.

serverless agent
metadata of agent:
```
{
  "image_tag": "6522cfca-f55f630b",
  "version": "1.3.14rc2",
  "type": "ServerlessUserCodeLauncher"
}
```

And a question: would a retry policy on an op help? Would it work if no resources were spun up? :sweat_smile:

Ah, in serverless, a sensor that triggers on run failures or a retry policy on the job would help.