Arsenii Poriadin 05/09/2023, 5:29 PM
The default run launcher executes each run in a subprocess on the gRPC server, so when a high number of runs happen at once with no run queue in place, there's a risk that the server gets overloaded.
A couple of things I would suggest here to mitigate this:...
We're spinning up k8s pods for each run, but we don't create new pods for each op. We have a dynamic graph whose op count can grow quickly as service usage grows, so we end up launching a lot of subprocesses inside one pod, and that seems to kill it with OOM. Is there any way to have some kind of queue that limits the number of subprocesses we can launch at a time?
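For illustration, here is a minimal, framework-agnostic sketch of the kind of queue being asked about. It is not Dagster's API: `BoundedLauncher` and its methods are hypothetical names. A fixed-size worker pool acts as the queue, so once `max_concurrent` children are running, further work waits instead of spawning more processes in the same pod.

```python
from __future__ import annotations

import subprocess
import sys
import threading
from concurrent.futures import ThreadPoolExecutor


class BoundedLauncher:
    """Hypothetical helper: runs commands as subprocesses, at most
    `max_concurrent` at a time. Not part of Dagster or any other library."""

    def __init__(self, max_concurrent: int) -> None:
        # The pool's worker count is the cap; extra submissions wait in its
        # internal queue instead of spawning more child processes.
        self._pool = ThreadPoolExecutor(max_workers=max_concurrent)
        self._lock = threading.Lock()
        self._running = 0
        self.peak = 0  # most launch slots in use at once

    def _run(self, cmd: list[str]) -> int:
        with self._lock:
            self._running += 1
            self.peak = max(self.peak, self._running)
        try:
            # Blocks this worker thread until the child process exits.
            return subprocess.run(cmd, check=False).returncode
        finally:
            with self._lock:
                self._running -= 1

    def run_all(self, cmds: list[list[str]]) -> list[int]:
        """Queue every command and return their exit codes in input order."""
        futures = [self._pool.submit(self._run, cmd) for cmd in cmds]
        return [f.result() for f in futures]

    def close(self) -> None:
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    launcher = BoundedLauncher(max_concurrent=2)
    # Six short-lived Python children; never more than two run at once.
    sleeper = [sys.executable, "-c", "import time; time.sleep(0.2)"]
    print(launcher.run_all([sleeper] * 6), "peak:", launcher.peak)
    launcher.close()
```

The design choice is to bound concurrency with the pool size rather than with an explicit counter check, so there is no window in which a burst of submissions can exceed the cap.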
sean 05/09/2023, 9:29 PM
will limit the number of concurrent ops within a run, but I don’t think it will have any effect on the max number of processes arising from different runs
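To make the per-run vs. across-runs distinction concrete, here is a small thread-based simulation. It makes no Dagster calls; `simulate`, `per_run_limit`, and `global_limit` are hypothetical names used only for this model. Each run gets its own bounded pool (a per-run cap), and an optional semaphore shared by all runs models a cap that spans runs.

```python
from __future__ import annotations

import threading
import time
from concurrent.futures import ThreadPoolExecutor


class PeakCounter:
    """Thread-safe counter that remembers the highest value it reached."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def enter(self) -> None:
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)

    def exit(self) -> None:
        with self._lock:
            self.current -= 1


def simulate(num_runs: int, ops_per_run: int, per_run_limit: int,
             global_limit: int | None = None) -> int:
    """Hypothetical model: returns the peak number of ops alive at once
    when `num_runs` runs execute concurrently on the same host."""
    counter = PeakCounter()
    # Optional cap shared by *all* runs, unlike the per-run pool size.
    gate = threading.BoundedSemaphore(global_limit) if global_limit is not None else None

    def op() -> None:
        if gate is not None:
            gate.acquire()
        counter.enter()
        try:
            time.sleep(0.05)  # stand-in for an op's subprocess doing work
        finally:
            counter.exit()
            if gate is not None:
                gate.release()

    def run() -> None:
        # Per-run cap: at most `per_run_limit` ops of this run at once.
        with ThreadPoolExecutor(max_workers=per_run_limit) as pool:
            for _ in range(ops_per_run):
                pool.submit(op)

    with ThreadPoolExecutor(max_workers=num_runs) as runs:
        for _ in range(num_runs):
            runs.submit(run)
    return counter.peak


if __name__ == "__main__":
    print("per-run cap only:", simulate(3, 4, per_run_limit=2))
    print("plus shared cap: ", simulate(3, 4, per_run_limit=2, global_limit=2))
```

With three concurrent runs each capped at two ops, the first call can peak at up to six ops on the same host, while the shared cap keeps the total at two or fewer. If each run gets its own pod, as in the setup described earlier in the thread, the per-run cap is the one that bounds a single pod's subprocess count.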
Arsenii Poriadin 05/10/2023, 9:30 AM