# ask-community
m
Hi Team, we configured multiprocess_executor with max_concurrent: 4, using S3 for IO management, and configured the K8sRunLauncher in Helm. Whenever we execute a pipeline, it takes 2-3 minutes to start, and completing the tasks also takes longer than expected with multiprocess_executor. We disabled the K8sRunLauncher and increased the resources of the user code deployment pods and the Dagit pod as well, but it made no difference. Any thoughts on this, please? Using Dagster version 0.12.14, Helm chart with user code deployment.
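For reference, the setup is roughly the following in 0.12-style run config (a sketch: the resource key, bucket, and prefix are placeholders, and it assumes the s3_pickle_io_manager is bound as io_manager):

```yaml
execution:
  multiprocess:
    config:
      max_concurrent: 4              # run at most 4 solid subprocesses at a time
resources:
  io_manager:                        # assumed resource key for s3_pickle_io_manager
    config:
      s3_bucket: my-dagster-artifacts   # placeholder bucket name
      s3_prefix: pipeline-runs          # placeholder prefix
```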
j
I assume the delay is for the K8sRunLauncher to create a K8s Job
Increasing cluster resources may improve that startup time.
How did you disable the K8sRunLauncher in the Helm chart? It's the default
m
Yes, in the Helm chart, in configmap-instance.yaml. Then it uses the default one.
Yes, there was a lot of delay creating the K8s job, so we tried without K8s, but the same issue happened. So is increasing cluster memory the solution?
j
That's a K8s tuning question; I can't really say without access to the cluster. One common source of latency is pulling Docker images to the node.
m
If we pre-pull the Docker image to the node, will that handle this issue? Is there any documentation on this, please?
j
Once it's been pulled once, it's cached on the node. Minimizing pod start time is really a K8s tuning question; it's a bit outside Dagster's control: https://github.com/kubernetes/community/blob/master/sig-scalability/slos/pod_startup_latency.md
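As a generic illustration (plain K8s, nothing Dagster-specific; container and image names are hypothetical), the pull policy is what lets a node reuse a cached image:

```yaml
# Pod spec fragment: with IfNotPresent, a node that already has the image
# cached skips the registry pull when the next run pod is scheduled there.
spec:
  containers:
    - name: run-worker                      # hypothetical container name
      image: my-registry/user-code:0.12.14  # hypothetical image
      imagePullPolicy: IfNotPresent
```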
You're right, though, that if you switch to the DefaultRunLauncher, it will use the standing gRPC server, so you won't have that startup latency. I suspect you needed to restart Dagit / the daemon to pick up your change to the instance configmap.
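Roughly, the instance configmap override would look like this (a sketch; the DefaultRunLauncher is also what you get if the run_launcher block is omitted entirely):

```yaml
# dagster.yaml / instance configmap override
run_launcher:
  module: dagster.core.launcher
  class: DefaultRunLauncher
```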
m
Yes, I restarted Dagit/the daemon; it picked up the DefaultRunLauncher, and I increased the resources of the user code deployment pods too, but the same delay happened. For now we want to avoid the K8sRunLauncher, so any other solution would be very helpful for us.
j
What are the events in the event log during the delay?
m
During the delay, there were no event logs to debug with.
j
You could share a debug file for the run
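(If it helps, a run debug file can usually be generated with the CLI, e.g. `dagster debug export <run_id> <output_path>`; check `dagster debug export --help` on your version.)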
m
sure.
j
I’ll take a look when I get a chance!
m
Sure @johann thank you 🙂
Hi @johann, did you get a chance to look into it? ^
j
I've been chatting with the team about it: the delay is between when the run launcher creates a new subprocess for the run and when that subprocess logs for the first time.
Some possible sources of latency are:
• your system is under load and doesn't start the process immediately
• once the process starts, it has to connect to the DB and do other initialization tasks
I'll also call out that it's harder for us to debug an old version of Dagster, if upgrading is a possibility. If you want to look into it yourself, putting a print statement here would help isolate whether the latency is in process spin-up or in Dagster initialization.
m
We have 10 repos, but we are facing this issue in only one of them. In the remaining repos, parallel processing works as expected, and the configuration is the same across all of them. We increased resources for that repo's pod, described the pod, and checked the logs; everything looks fine, but we cannot understand what exactly is happening with that specific repo.