dagster #announcements: i'm observing something weird/interesting. a while ago, i asked...

alir

07/21/2020, 3:56 PM

i'm observing something weird/interesting. a while ago, i asked about launching pipelines from other pipelines and I ended up issuing a POST to the /graphql endpoint. I'm on 0.8.1 now and it looks like if I issue multiple POST requests at the same time (around 5 or so), then the server goes into some deadlock-ish state and refuses to accept any more connections. Dagit becomes completely unusable. But if I issue pipeline launch requests using websockets to the same endpoint, I have no issues. Does this sound like something that's unique to my setup or does it sound like there's a problem somewhere?
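For reference, a GraphQL call over plain HTTP is a POST whose JSON body carries `query` and `variables`. A minimal standard-library sketch, assuming dagit listens at a hypothetical http://localhost:3000 and that the caller supplies the mutation text; the exact LAUNCH_PIPELINE_EXECUTION_MUTATION document for a given dagster version is not reproduced here, so `MUTATION_TEXT` is only a placeholder.

```python
import json
import urllib.request

# Hypothetical dagit address; adjust host/port to your deployment.
DAGIT_GRAPHQL_URL = "http://localhost:3000/graphql"


def build_graphql_request(url, query, variables=None):
    """Build (but do not send) a GraphQL-over-HTTP POST request."""
    body = json.dumps({"query": query, "variables": variables or {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_graphql_request(request, timeout=30.0):
    """Send the request and decode the JSON response body (blocks until dagit answers)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder: paste the launch mutation used by your dagster version here.
    MUTATION_TEXT = "..."
    req = build_graphql_request(DAGIT_GRAPHQL_URL, MUTATION_TEXT)
    print(req.get_method(), req.full_url)
```

Because `urlopen` blocks until dagit responds, a slow acknowledgement on the server side shows up directly as latency in the calling script.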

alir

07/21/2020, 4:08 PM

Or actually, maybe I don't even understand what I'm seeing.

alir

07/21/2020, 4:08 PM

Is the graphql server single-threaded?

max

07/21/2020, 4:14 PM

i think dagit runs exclusively over websockets, so I'm not shocked that the POST endpoint has issues

alir

07/21/2020, 5:10 PM

ah i was using dagster-graphql as guidance when I initially wrote this, since it too uses POST to /graphql.

alir

07/21/2020, 5:11 PM

is there anything to know about how many requests I can issue at a time to dagit, whether websockets or POST?

alir

07/21/2020, 5:12 PM

is there something underneath dagit (like flask?) that doesn't behave so well when I issue a large number of requests at the same time?

alir

07/21/2020, 5:17 PM

because even if I use websockets and start issuing a bunch of requests to /graphql, the dagit UI becomes unresponsive until all /graphql requests are ack-ed. I don't mind if my pipeline launch requests take a while, but I'd like the UI to still be responsive. I'm trying to think of a good workaround but nothing I come up with seems satisfactory.

alir

07/21/2020, 5:24 PM

I assumed that if I issue LAUNCH_PIPELINE_EXECUTION_MUTATION to /graphql, dagit would ack the request in a few ms and move on. But for some reason that I don't yet understand, it takes seconds to ack the request. Given that the server processes just one request at a time, my hypothesis is that the delays from several LAUNCH_PIPELINE_EXECUTION_MUTATION requests add up, causing the dagit UI to freeze up.
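The hypothesis can be checked with back-of-the-envelope queueing: if the server really handles one request at a time, each request waits for every request ahead of it to finish. A minimal sketch of a single-worker FIFO queue; the 2-second ack time and the 0.05-second UI query are illustrative numbers, not measurements from this thread.

```python
def fifo_completion_times(arrivals, service_times):
    """Finish time of each request on a single worker that serves requests in arrival order.

    arrivals must be sorted ascending; service_times[i] is how long request i occupies the worker.
    """
    worker_free_at = 0.0
    finished = []
    for arrival, service in zip(arrivals, service_times):
        start = max(arrival, worker_free_at)  # wait for the worker to become free
        worker_free_at = start + service
        finished.append(worker_free_at)
    return finished


if __name__ == "__main__":
    # Illustrative numbers only: five launch mutations arrive together and each takes
    # ~2 s to ack; a cheap UI query (0.05 s of work) arrives 0.1 s later.
    arrivals = [0.0] * 5 + [0.1]
    service = [2.0] * 5 + [0.05]
    finished = fifo_completion_times(arrivals, service)
    print(round(finished[-1] - arrivals[-1], 2))  # prints 9.95
```

With one worker, the UI query's latency grows linearly with the number of launches queued ahead of it, which would be consistent with the freezing described above.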

alir

07/21/2020, 5:37 PM

maybe i can just circumvent all of this with multiple dagit instances, and dedicate one to the UI and the rest for just the /graphql endpoint
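If the UI gets its own dagit process, the launching script only needs to spread its requests over the remaining ones. A minimal client-side round-robin sketch; the ports and URLs are hypothetical, and it assumes every API instance points at the same dagster instance and storage, so runs show up in the UI regardless of which process launched them.

```python
import itertools

# Hypothetical layout: port 3000 serves the UI; 3001-3003 only take scripted /graphql calls.
UI_URL = "http://localhost:3000/graphql"
API_URLS = [f"http://localhost:{port}/graphql" for port in (3001, 3002, 3003)]


class RoundRobin:
    """Hand out endpoints in a fixed rotation so no single API process gets every request."""

    def __init__(self, urls):
        if not urls:
            raise ValueError("need at least one endpoint")
        self._cycle = itertools.cycle(urls)

    def next_url(self):
        return next(self._cycle)


if __name__ == "__main__":
    picker = RoundRobin(API_URLS)
    for _ in range(5):
        print(picker.next_url())  # 3001, 3002, 3003, 3001, 3002
```

The UI URL is deliberately left out of the rotation, so scripted launches never queue in front of the browser's requests.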

alex

07/22/2020, 1:53 PM

do you have a run launcher configured on your instance or are you using the default?

alex

07/22/2020, 1:54 PM

if default, the pipeline executions will happen in a subprocess on the dagit machine, so if you're constrained by the number of CPUs you may see things start to grind to a halt

alir

07/22/2020, 1:54 PM

i have a run launcher and the pipeline executions run in celery

alex

07/22/2020, 2:00 PM

> I have a run launcher

how does your run launcher work?

alex

07/22/2020, 2:00 PM

or rather which one are you using

alir

07/22/2020, 2:05 PM

sorry i misspoke. I use the default run launcher. I confused the executors and the run launcher.

alex

07/22/2020, 2:12 PM

what run / event storage are you using? postgres? That may be the resource under contention as well

alir

07/22/2020, 2:13 PM

yep, postgres for run, event log, and schedule storage. although i guess the schedule storage part is less intensive

alex

07/22/2020, 2:13 PM

one workaround you could consider is staggering the launches by sleeping a random smallish amount of time
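Random jitter is most useful when launches come from independent callers (for example, several upstream pipelines) that cannot coordinate with each other. A minimal sketch; `launch` is a hypothetical zero-argument callable that sends one launch request, and the 2-second default is an arbitrary illustration, not a recommended value.

```python
import random
import time


def launch_with_jitter(launch, max_jitter_s=2.0, rng=None, sleep=time.sleep):
    """Wait a random delay in [0, max_jitter_s] seconds, then call launch().

    Meant to run independently in each caller, so callers that would otherwise
    fire at the same moment spread out over the jitter window.
    """
    rng = rng or random.Random()
    delay = rng.uniform(0.0, max_jitter_s)
    sleep(delay)
    return launch()
```

Injecting `rng` and `sleep` keeps the function testable without actually waiting, and a seeded `random.Random` makes the delays reproducible.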

alir

07/22/2020, 2:13 PM

that makes sense though

alex

07/22/2020, 2:14 PM

I believe we would need profiler results to debug further and see exactly which resource was under contention and causing your problem

alir

07/22/2020, 2:16 PM

yea, i tried to see if staggering would work and tested it with some bash scripts that issue curl requests. I'm not really sure what the right amount is in my use case, because the number of pipelines I execute depends on how many inputs I get.
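When the number of launches varies with the input, capping how many launch requests the script has outstanding at once avoids having to pick a stagger delay. A minimal sketch with `concurrent.futures`; `launch` is a hypothetical callable that sends one request, defaulting the cap to the CPU count is an assumption, and the cap only limits in-flight requests from this script, not how many runs dagit ends up executing.

```python
import os
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def launch_all(items, launch, max_in_flight=None):
    """Call launch(item) for every item with at most max_in_flight calls running at once.

    Results come back in the same order as items.
    """
    if max_in_flight is None:
        # Assumption: one outstanding request per CPU is a reasonable starting point.
        max_in_flight = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(launch, items))


def make_tracking_launcher(delay_s=0.05):
    """Hypothetical stand-in for a real launch call that records peak concurrency."""
    lock = threading.Lock()
    stats = {"now": 0, "peak": 0}

    def fake_launch(item):
        with lock:
            stats["now"] += 1
            stats["peak"] = max(stats["peak"], stats["now"])
        time.sleep(delay_s)  # pretend to wait for dagit to ack
        with lock:
            stats["now"] -= 1
        return item

    return fake_launch, stats


if __name__ == "__main__":
    fake, stats = make_tracking_launcher()
    launch_all(range(10), fake, max_in_flight=3)
    print("peak concurrent launches:", stats["peak"])  # at most 3
```

Unlike a fixed sleep, the cap works the same whether there are five inputs or five hundred.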

alir

07/22/2020, 2:16 PM

I'll try to run it under some profiler and see what happens
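For the profiling step, Python's built-in `cProfile` is one option. A minimal sketch that profiles a single callable and returns the top entries by cumulative time; profiling dagit itself would mean running the server process under the profiler, which this snippet does not do, and `launch_batch` is a hypothetical stand-in.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func(*args, **kwargs) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    def launch_batch(n):
        # Hypothetical stand-in for a script that fires n launch requests.
        return sum(i * i for i in range(n))

    value, report = profile_call(launch_batch, 100_000, top=5)
    print(report)
```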

alir

07/22/2020, 2:16 PM

meanwhile, do you see anything bad happening if I go down the path of having multiple dagit instances?

alex

07/22/2020, 2:18 PM

no problems i can predict. I guess if it's still locking up after you do that, it's likely a postgres contention issue

alir

07/22/2020, 2:19 PM

I guess I can also test it by temporarily switching to filesystem run and event storage and then re-running the tests

alex

07/22/2020, 2:22 PM

ya sqlite will have its own issues if you hammer it simultaneously so beware of that
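The SQLite caveat is easy to reproduce: SQLite allows only one writer at a time, so a second connection that tries to write while another holds the write lock gets `database is locked` once its busy timeout runs out. A minimal sketch with the standard `sqlite3` module and a throwaway database file; it only models two competing writers, not dagster's actual storage schema.

```python
import os
import sqlite3
import tempfile


def second_writer_error(timeout_s=0.0):
    """Return the error a second writer sees while a first connection holds an exclusive lock."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "runs.db")
        # isolation_level=None: transactions are controlled explicitly with BEGIN.
        first = sqlite3.connect(path, timeout=timeout_s, isolation_level=None)
        second = sqlite3.connect(path, timeout=timeout_s, isolation_level=None)
        try:
            first.execute("CREATE TABLE events (msg TEXT)")
            first.execute("BEGIN EXCLUSIVE")  # first writer takes the lock and keeps it
            first.execute("INSERT INTO events VALUES ('from first')")
            try:
                second.execute("INSERT INTO events VALUES ('from second')")
            except sqlite3.OperationalError as exc:
                return str(exc)
            return None
        finally:
            first.close()  # closing without COMMIT discards the open transaction
            second.close()


if __name__ == "__main__":
    print(second_writer_error())
```

Raising `timeout_s` makes the second writer wait up to that many seconds before failing, so under heavy concurrent load the contention tends to show up first as stalls.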
