Hey there! Wasn't sure where to ask this question... (dagster #deployment-kubernetes)


Scott Hood

02/22/2023, 2:56 PM

Hey there! Wasn't sure where to ask this question... We have been noticing that the gaps between ticks on our sensors are much longer than the interval specified. With sensors set to check every 30 seconds, we are seeing gaps of 4 minutes between ticks. We tried upgrading our Dagster instance, thinking it could have been due to the needed SQL indexing changes, but we are still getting long gaps. We use the Dagster Helm chart to deploy our daemon. Is there anything we need to set up, or any resources we should increase in the daemon values, to improve our sensors' reaction time?

daniel

02/22/2023, 3:14 PM

the first thing I would try here to improve this is to turn on the sensor threadpool that runs multiple sensors in parallel: https://github.com/dagster-io/dagster/blob/master/helm/dagster/values.yaml#L1084-L1089
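For intuition on why a threadpool could matter here, a back-of-the-envelope model may help. This is a simplified sketch, not Dagster's actual scheduling logic: it assumes every sensor is evaluated once per pass, `num_workers` at a time, and that a sensor cannot tick again until the pass completes. The function name, the sensor count (48), and the per-evaluation time (5 seconds) are hypothetical values picked to reproduce a 4-minute gap; none of them come from this thread.

```python
import math


def estimated_tick_gap(num_sensors, eval_seconds, interval_seconds, num_workers=1):
    """Estimate seconds between ticks of a single sensor.

    Simplified model (an assumption, not Dagster internals): one pass
    evaluates all sensors in batches of `num_workers`, and the gap is
    whichever is larger, the configured interval or the pass duration.
    """
    batches = math.ceil(num_sensors / num_workers)
    pass_seconds = batches * eval_seconds
    return max(interval_seconds, pass_seconds)


# Hypothetical workload: 48 sensors, each taking 5 seconds to evaluate.
serial = estimated_tick_gap(48, 5.0, 30)
threaded = estimated_tick_gap(48, 5.0, 30, num_workers=8)
print(f"serial: {serial:.0f}s, threaded: {threaded:.0f}s")
```

Under these assumptions the serial pass takes 240 seconds, so a 30-second sensor only ticks every 4 minutes, while 8 workers bring the pass back down to the 30-second interval. If evaluations are slow for another reason (for example, a slow database), adding workers alone may not close the gap.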

Scott Hood

02/22/2023, 3:36 PM

Thanks, turned it on. Will be monitoring for the next few hours to see if I notice a difference.

Scott Hood

02/27/2023, 1:29 PM

Hey @daniel, after enabling threading for both sensors and schedules, it looks like we haven't seen any improvement in the time between ticks. Any other suggestions?

daniel

02/27/2023, 1:41 PM

We may need to take a look at your daemon logs from a time period in question to have more specific suggestions here

Scott Hood

02/27/2023, 2:23 PM

Anything in particular? Looking at the logs, for the most part I only see a lot of "Checking for new runs of x sensor"

daniel

02/27/2023, 4:13 PM

If it's possible to share the full logs from a period of time where it was taking 4 minutes instead of 30 seconds, that might help us get to the bottom of why the threadpool isn't kicking in the way it's supposed to (or find a gap where things are taking longer than they're supposed to with no logs). I don't quite have enough information yet to recommend a specific thing to look for
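One way to locate such a period before sharing logs is to scan the daemon output for long gaps between consecutive checks of the same sensor. The sketch below is a minimal example under stated assumptions: it assumes each relevant line starts with a `YYYY-MM-DD HH:MM:SS` timestamp and contains `Checking for new runs for sensor: <name>`, which may not match your Dagster version's exact log format. The helper name `find_tick_gaps`, the 60-second threshold, and the sample lines are all hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed line shape (verify against your own daemon logs):
# "2023-02-27 14:00:00 +0000 - dagster.daemon.SensorDaemon - INFO - Checking for new runs for sensor: my_sensor"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r".*Checking for new runs for sensor: (?P<name>\S+)"
)


def find_tick_gaps(lines, threshold=timedelta(seconds=60)):
    """Return (sensor, previous_check, next_check, gap) for gaps above threshold.

    Timezone offsets are ignored, so all lines are assumed to share one zone.
    """
    last_seen = {}
    gaps = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a sensor-check line
        ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S")
        name = match["name"]
        previous = last_seen.get(name)
        if previous is not None and ts - previous > threshold:
            gaps.append((name, previous, ts, ts - previous))
        last_seen[name] = ts
    return gaps


# Made-up sample lines for illustration only.
prefix = "+0000 - dagster.daemon.SensorDaemon - INFO - Checking for new runs for sensor: my_sensor"
sample = [
    f"2023-02-27 14:00:00 {prefix}",
    f"2023-02-27 14:00:30 {prefix}",
    f"2023-02-27 14:04:30 {prefix}",
]
gaps = find_tick_gaps(sample)
for name, start, end, gap in gaps:
    print(f"{name}: no check between {start} and {end} ({gap})")
```

In practice you would pass the lines of a saved daemon log file instead of `sample`, then pull the full log excerpt around each reported window.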

daniel

02/27/2023, 4:33 PM

From your other post today about a relatively simple sensor timing out, I'm wondering if the root cause may be your DB running slowly

Scott Hood

02/27/2023, 8:36 PM

Our DB does tend to run at 80% capacity a lot... it's an Azure Postgres DB running on 4 cores. Not sure what the general sizing for Dagster databases typically is.

Scott Hood

02/28/2023, 3:22 PM

@daniel is there any documentation on best practices for Dagster database maintenance and resource sizing?

daniel

02/28/2023, 3:23 PM

I don't think we currently have any docs like that, although it sounds very useful. The unsatisfying answer is that the size of the DB you need is likely to heavily depend on the amount of jobs/assets/sensors/etc. that are happening at once, but we could definitely create some rough guidelines

daniel

02/28/2023, 3:24 PM

The first thing i'd check if you're having latency issues is whether there are any particularly slow queries happening on the DB - i'm not sure what specifically azure offers but many of these managed DBs offer some perf monitoring tools with slowest queries / highest frequency queries / etc.

Scott Hood

02/28/2023, 3:26 PM

Will do, looks like we need to enable it so will come back if I see anything odd

Scott Hood

02/28/2023, 9:10 PM

@daniel after turning on analytics for our database, I don't see any query that takes more than a fraction of a millisecond. The number of requests is slightly surprising, but I'm not sure what else on the Dagster side ends up querying the database other than run status sensors.

daniel

02/28/2023, 9:11 PM

Got it - any luck with those daemon logs? if it's not the DB they'll help understand what's going on

Scott Hood

02/28/2023, 9:21 PM

Sent in DM
