Hi Dagsters, from time to time under the `Deployme...
# ask-community
s
Hi Dagsters, from time to time under `Deployment` in the dagit UI -> `code locations`, some of my repositories are not in status `Loaded` with green color. I want to be notified when this happens, because it means the jobs under the errored repository won't run. How can I create a monitor for code locations whose status is not loaded? Thanks!
a
I would not recommend doing this from within dagster, but rather outside of dagster. If you do it from within (aka from a code location), then your monitor would also not run when the underlying issue is encountered. What monitoring tools are you already using in your organization?
a
I would like to know this too. We are a start-up, so we might not have any monitoring tools at all.
a
I can find the configs I used if needed, but we already had the Prometheus stack in our environment. I'm using the blackbox exporter (https://github.com/prometheus/blackbox_exporter) to periodically probe the dagster GraphQL API and check the code location health. The exporter also supports gRPC polling, so you could have it poll the code location deployment directly too. Prometheus/alertmanager have alerts configured that notify us if the code locations go unhealthy. We have yet to have this happen outside of a time we were expecting it (we sometimes scale down the dagster daemon during maintenance, rather than pausing all of our sensors and schedules).
a
Thanks, I will give it a go. I was thinking to just make the API call to Dagster via a CloudRun service we have but maybe that's not the best idea?
s
We use Datadog for monitoring. @Adam Bloom how do you query GraphQL for code location health? Can you share the query please?
a
You can access the GraphQL playground at <your dagit url>/graphql. Here's the query we use:
```
query {
  workspaceOrError {
    ... on Workspace {
      locationEntries {
        name
        locationOrLoadError {
          ... on RepositoryLocation { name repositories { name } isReloadSupported }
          ... on PythonError { message }
        }
      }
    }
    ... on PythonError { message stack }
  }
}
```
Our failure condition is getting `DagsterUserCodeUnreachableError` in the response.
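For anyone without a Prometheus stack, here is a rough sketch of how the same check could be run from a small external script (a cron job, a Cloud Run service, etc.). It is only an illustration, not what we run: the function names `find_unhealthy_locations` and `fetch_workspace` are made up for this example, and it assumes the GraphQL endpoint accepts a plain JSON POST without authentication. It flags any location whose entry comes back as a `PythonError` (which is where `DagsterUserCodeUnreachableError` would show up) or with no location data at all.

```python
import json
import urllib.request

# Same query as above, collapsed onto fewer lines.
QUERY = """
query {
  workspaceOrError {
    ... on Workspace {
      locationEntries {
        name
        locationOrLoadError {
          ... on RepositoryLocation { name repositories { name } isReloadSupported }
          ... on PythonError { message }
        }
      }
    }
    ... on PythonError { message stack }
  }
}
"""


def find_unhealthy_locations(payload: dict) -> dict:
    """Return {location_name: error_message} for every location that did not load.

    `payload` is the decoded JSON body returned by the GraphQL endpoint.
    """
    workspace = (payload.get("data") or {}).get("workspaceOrError") or {}
    # No locationEntries means the workspace itself errored (top-level PythonError)
    # or the response carried no data at all.
    if "locationEntries" not in workspace:
        return {"<workspace>": workspace.get("message", "workspace query returned no entries")}

    unhealthy = {}
    for entry in workspace["locationEntries"]:
        result = entry.get("locationOrLoadError") or {}
        # With the query above, a loaded location includes "repositories";
        # a PythonError only includes "message".
        if "repositories" not in result:
            unhealthy[entry["name"]] = result.get("message", "no location data returned")
    return unhealthy


def fetch_workspace(graphql_url: str, timeout: float = 10.0) -> dict:
    """POST the query to the GraphQL endpoint and return the decoded JSON body.

    Assumption: the endpoint needs no auth headers; add them if yours does.
    """
    body = json.dumps({"query": QUERY}).encode("utf-8")
    request = urllib.request.Request(
        graphql_url,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

You would call `find_unhealthy_locations(fetch_workspace("<your dagit url>/graphql"))` on a timer and send an alert (Datadog event, Slack webhook, whatever you have) when the result is non-empty. A failed request should count as unhealthy too, since an unreachable webserver is worth knowing about.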
j
+1 to adam’s suggestions. In OSS dagster you’ll want to do the monitoring separately from Dagster, because the schedule you’re using to monitor it is also affected if the code location has an error. It’s worth mentioning that in Dagster Cloud we have native alerting, including for code location errors https://docs.dagster.io/dagster-cloud/managing-deployments/setting-up-alerts. On Cloud we’re running things separately from your code locations, so you’ll still get notified (unless our whole service goes down…)