Does anyone have experience with running >1.000...
# deployment-kubernetes
Does anyone have experience with running 1.000–10.000 ops concurrently? We are working on a Dagster architecture where every 15 minutes we need to execute a large volume of ops (i.e. 1.000–10.000). We have Dagster deployed on Kubernetes (GCP) with the Helm chart, and we're using the K8sRunLauncher so that all ops run concurrently, with each op executed in a dedicated Pod in Kubernetes (a rough sketch of this setup is below). However, with the current set-up this is not performant at all. We already notice various issues when executing 500 ops concurrently:

• Scaling the workers (i.e. Pods) takes a long time: the quickest Pods are available within 15 seconds, while the slowest take more than 10 minutes before they are available.
• We run into DB connection issues, likely due to too many open connections (we use an external PostgreSQL database hosted on GCP).
• The time it takes to complete an op (the workload of each op is practically identical) grows from under 15 seconds to over 15 minutes.

Looking for advice on how to best engineer this.
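For context, a minimal sketch of the shape of our job. The op and job names are illustrative, not our actual code, and this assumes the per-op Pods come from dagster-k8s's `k8s_job_executor`, which is the piece that gives one Pod per op on top of the K8sRunLauncher:

```python
from dagster import job, op, schedule
from dagster_k8s import k8s_job_executor


@op
def process_chunk(context):
    # Placeholder for the real workload; every op does practically
    # identical work on its own slice of the data.
    context.log.info("processing one chunk")


# k8s_job_executor executes each op in its own Kubernetes Job/Pod,
# which is where the hundreds of concurrent Pods come from.
@job(executor_def=k8s_job_executor)
def fan_out_job():
    # Fan out N aliased copies of the same op; in production N is
    # 1.000-10.000, here 500 to match the numbers above.
    for i in range(500):
        process_chunk.alias(f"process_chunk_{i}")()


# The whole fan-out is kicked off every 15 minutes.
@schedule(cron_schedule="*/15 * * * *", job=fan_out_job)
def every_fifteen_minutes(_context):
    return {}
```

Posted in #dagster-support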