Hi Dagsters from time to time under the `Deployment` in dagi dagster #ask-community

Sagit Dotan

06/18/2023, 12:44 PM

Hi Dagsters, from time to time under `Deployment` in the dagit UI -> `code locations`, some of my repositories are not in status `Loaded` (shown in green). I want to be notified when this happens, because it means the jobs under the errored repository won't run. How can I create a monitor for code locations whose status is not `Loaded`? Thanks!

Adam Bloom

06/18/2023, 2:10 PM

I would not recommend doing this from within dagster, but rather outside of dagster. If you do it from within (aka from a code location), then your monitor would also not run when the underlying issue is encountered. What monitoring tools are you already using in your organization?

➕ 1

Abhishek Agrawal

06/18/2023, 11:35 PM

I would like to know this too. We are a start-up, so we might not have any monitoring tools at all.

Adam Bloom

06/18/2023, 11:52 PM

I can find the configs I used if needed, but we already had the Prometheus stack in our environment. I'm using the blackbox exporter (https://github.com/prometheus/blackbox_exporter) to periodically probe the dagster GraphQL API and check the code location health. The exporter also supports gRPC probing, so you could have it poll the code location deployment directly too. Prometheus/Alertmanager have alerts configured that notify us if the code locations go unhealthy. We have yet to have this happen outside of a time we were expecting it (we sometimes scale down the dagster daemon during maintenance, rather than pausing all of our sensors and schedules).

Abhishek Agrawal

06/18/2023, 11:56 PM

Thanks, I will give it a go. I was thinking to just make the API call to Dagster via a CloudRun service we have but maybe that's not the best idea?

Sagit Dotan

06/19/2023, 5:57 AM

We use Datadog for monitoring. @Adam Bloom how can you query the graphQL for code location health? can you share the query please

Adam Bloom

06/19/2023, 4:05 PM

You can access the GraphQL playground at <your dagit url>/graphql. Here's the query we use:

Adam Bloom

06/19/2023, 4:06 PM

query {workspaceOrError{... on Workspace {locationEntries {name locationOrLoadError {... on RepositoryLocation {name repositories {name} isReloadSupported} ... on PythonError {message}}}} ... on PythonError {message stack}}}

Adam Bloom

06/19/2023, 4:06 PM

Our failure condition is getting `DagsterUserCodeUnreachableError` in the response.
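For anyone without a Prometheus stack, the same check can be sketched as a small standalone Python script run from outside Dagster (a cron job, a Cloud Run service, or similar). This is only an illustration, not the setup described above: the function names are made up, and the response shape is assumed to mirror the query as written, so that an entry whose `locationOrLoadError` carries a `message` field is treated as a load error. It flags any load error, not only `DagsterUserCodeUnreachableError`.

```python
import json
import urllib.request

# The query shared above, unchanged.
QUERY = (
    "query {workspaceOrError{... on Workspace {locationEntries {name "
    "locationOrLoadError {... on RepositoryLocation {name repositories {name} "
    "isReloadSupported} ... on PythonError {message}}}} "
    "... on PythonError {message stack}}}"
)


def fetch_workspace(graphql_url, timeout=10.0):
    """POST the query to the GraphQL endpoint and return the decoded JSON body."""
    payload = json.dumps({"query": QUERY}).encode("utf-8")
    request = urllib.request.Request(
        graphql_url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def find_unhealthy_locations(response):
    """Return {location name: error message} for locations that failed to load.

    Assumption: the query does not request __typename, so a PythonError is
    recognised by the presence of a "message" field.
    """
    workspace = (response.get("data") or {}).get("workspaceOrError") or {}
    if "locationEntries" not in workspace:
        # The workspace itself errored, or the response had no data at all.
        message = workspace.get("message", "no workspace data in response")
        return {"<workspace>": message}

    errors = {}
    for entry in workspace["locationEntries"]:
        result = entry.get("locationOrLoadError") or {}
        if "message" in result:
            errors[entry["name"]] = result["message"]
    return errors
```

Calling `find_unhealthy_locations(fetch_workspace("<your dagit url>/graphql"))` returns an empty dict when every location loaded. A non-empty result, or an exception from the HTTP call itself, is the signal to forward to whatever alerting you have (Datadog, email, Slack webhook).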

johann

06/21/2023, 5:21 PM

+1 to adam’s suggestions. In OSS dagster you’ll want to do the monitoring separately from Dagster, because the schedule you’re using to monitor it is also affected if the code location has an error. It’s worth mentioning that in Dagster Cloud we have native alerting, including for code location errors https://docs.dagster.io/dagster-cloud/managing-deployments/setting-up-alerts. On Cloud we’re running things separately from your code locations, so you’ll still get notified (unless our whole service goes down…)
