# ask-community
m
Hi Team, we configured multiprocess_executor with max_concurrent: 4, using S3 for IO management, and configured the K8sRunLauncher in Helm. Whenever we execute a pipeline, it takes 2-3 minutes to start, and completing the tasks also takes longer than expected with multiprocess_executor. We disabled the K8sRunLauncher and increased the resources of the user code deployment pods and the Dagit pod as well, but it made no difference. Any thoughts on this, please? Using Dagster version 0.12.14, Helm chart with user code deployment.
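For reference, the setup is roughly the following in 0.12-style run config (a sketch: the resource key, bucket, and prefix are placeholders, and it assumes the s3_pickle_io_manager is bound as io_manager):

```yaml
execution:
  multiprocess:
    config:
      max_concurrent: 4              # run at most 4 solid subprocesses at a time
resources:
  io_manager:                        # assumed resource key for s3_pickle_io_manager
    config:
      s3_bucket: my-dagster-artifacts   # placeholder bucket name
      s3_prefix: pipeline-runs          # placeholder prefix
```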
j
I assume the delay is for the K8sRunLauncher to create a K8s Job
Increasing cluster resources may improve that startup time.
How did you disable the K8sRunLauncher in the Helm chart? It's the default
m
Yes, in the Helm chart, in configmap-instance.yaml. Then it uses the default one.
Yes, there was a lot of delay creating the K8s job, so we tried without K8s, but the same issue happened. So is increasing cluster memory the solution?
j
That's a K8s tuning question; I can't really say without access to the cluster. One common source of latency is pulling Docker images to the node.
m
If we pre-pull the Docker image to the node, will that handle this issue? Is there any documentation on this, please?
j
Once it's been pulled once, it's cached on the node. Minimizing pod start time is really a K8s tuning question; it's a bit outside Dagster's control: https://github.com/kubernetes/community/blob/master/sig-scalability/slos/pod_startup_latency.md
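As a generic illustration (plain K8s, nothing Dagster-specific; container and image names are hypothetical), the pull policy is what lets a node reuse a cached image:

```yaml
# Pod spec fragment: with IfNotPresent, a node that already has the image
# cached skips the registry pull when the next run pod is scheduled there.
spec:
  containers:
    - name: run-worker                      # hypothetical container name
      image: my-registry/user-code:0.12.14  # hypothetical image
      imagePullPolicy: IfNotPresent
```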
You're right, though, that if you switch to the DefaultRunLauncher, it will use the standing gRPC server, so you won't have that startup latency. I suspect you needed to restart Dagit / the daemon to pick up your change to the instance configmap.
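Roughly, the instance configmap override would look like this (a sketch; the DefaultRunLauncher is also what you get if the run_launcher block is omitted entirely):

```yaml
# dagster.yaml / instance configmap override
run_launcher:
  module: dagster.core.launcher
  class: DefaultRunLauncher
```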
m
Yes, I restarted Dagit/the daemon; it picked up the DefaultRunLauncher, and I increased the resources of the user code deployment pods too, but the same delay happened. For now we want to avoid the K8sRunLauncher, so any other solution would be very helpful for us.
j
What are the events in the event log during the delay?
m
During the delay, there were no event logs to debug with.
j
You could share a debug file for the run
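(If it helps, a run debug file can usually be generated with the CLI, e.g. `dagster debug export <run_id> <output_path>`; check `dagster debug export --help` on your version.)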
m
sure.
j
I’ll take a look when I get a chance!
m
Sure @johann thank you 🙂
Hi @johann, did you get a chance to look into it? ^
j
I've been chatting with the team about it: the delay is between when the run launcher creates a new subprocess for the run and when that subprocess logs for the first time.
Some possible sources of latency are:
• your system is under load and doesn't start the process immediately
• once the process starts, it has to connect to the DB and do other initialization tasks
I'll also call out that it's harder for us to debug an old version of Dagster, if upgrading is a possibility. If you want to look into it yourself, putting a print statement here would help isolate whether the latency is in process spin-up or in Dagster initialization.
m
We have 10 repos, but we are facing this issue in only one of them. In the remaining repos, parallel processing works as expected, and the configuration is the same across all of them. We increased resources for that repo's pod, described the pod, and checked the logs; everything looks fine, but we cannot understand what exactly is happening with that specific repo.