# ask-community
l
My jobs (which are supposed to spawn every 5 minutes and take ~1-3 minutes to complete) have been stuck in “Starting” for as long as 3 days, even though I have an autoscaling GKE cluster. I’m also seeing an extremely slow load followed by an error when I try to open this sensor page https://dagit.nautilusapp.xyz/workspace/example_repo@custom-example-user-code17/sensors/spawn_train_workers and this page https://dagit.nautilusapp.xyz/workspace/example_repo@custom-example-user-code17/sensors. Do I need to beef up or repair my cluster? I’m still getting a bunch of completed runs even now, so I know my sensor is functioning
although I am having trouble finding (via tags) all the jobs that I KNOW were supposed to have been spawned by the sensor (on both the All Runs and In Progress pages)
so I’m not confident that the sensor is fully functioning and issuing all the `RunRequest`s that it’s supposed to
a
DiskFull
it appears your database has consumed all the disk it has access to, which would cause writes to fail and put the app into the broken state you are observing
👍 1
l
gotcha, looks like this is the workload. Noob question, but is this correct: I can beef this up by setting the `resources` under `postgresql:` by following the conventions defined here https://github.com/helm/charts/blob/master/stable/postgresql/README.md?
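For reference, a rough sketch of what those values might look like. It assumes the Dagster chart passes everything under `postgresql:` through to the bundled stable/postgresql subchart, and the sizes are made-up placeholders. Note that in that chart `resources` covers CPU and memory, while the disk is sized by `persistence.size`, which is the setting that matters for a `DiskFull` error. Whether an existing volume can be grown in place depends on the storage class, so treat this as a pointer to where the settings live, not a tested fix.

```yaml
# values.yaml override (sketch; all sizes are assumed, not recommendations)
postgresql:
  persistence:
    size: 50Gi        # disk for the database volume; the subchart default is much smaller
  resources:
    requests:
      cpu: 500m       # CPU/memory only; this does not add disk space
      memory: 1Gi
```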
Is there any downside to creating a separate persistent, managed postgres instance with Google Cloud SQL and pointing to that (vs configuring it to live in the GKE cluster via helm here)? It seems more stable and easier to upgrade separately if I have an external postgres instance. Hopefully latency isn’t a big factor, since I’m using a regional cluster anyway. And is this what a typical “Managed Dagster Cloud” setup might look like anyway?
👍 1
a
I would recommend using a fully managed database if you are not comfortable with DB management https://cloud.google.com/blog/products/databases/to-run-or-not-to-run-a-database-on-kubernetes-what-to-consider
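If you go the managed route, pointing the chart at an external instance might look roughly like the sketch below. The key names are taken from the `postgresql` section of the Dagster Helm chart as I understand it, so check them against the `values.yaml` of your chart version. The host, user, password, and database name are hypothetical placeholders.

```yaml
# values.yaml override (sketch; every value below is a placeholder)
postgresql:
  enabled: false                 # do not deploy the in-cluster postgres
  postgresqlHost: "10.0.0.5"     # hypothetical private IP of the Cloud SQL instance
  postgresqlUsername: dagster    # hypothetical user
  postgresqlPassword: change-me  # placeholder; prefer supplying this via a secret
  postgresqlDatabase: dagster    # hypothetical database name
  service:
    port: 5432
```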
❤️ 1
l
ahh gotcha! thanks for the resource