We are using dagster hybrid with a kubernetes agent. Now, for... dagster #ask-ai

Joris Ganne

02/08/2024, 2:42 PM

We are using dagster hybrid with a kubernetes agent. Now, for the first time ever, a run is hanging in the status queued. How can we fix this behaviour?

Scout

02/08/2024, 2:42 PM

When runs hang in the `QUEUED` status in a Dagster hybrid deployment with a Kubernetes agent, there are several steps you can take to troubleshoot and potentially resolve the issue:

1. Verify the Dagster daemon is set up and running: In the Dagster UI, navigate to Deployment > Daemons and verify that the daemon is running. The Run queue should also be running. If you used `dagster dev` to start the Dagster UI, the daemon should have been started for you.
2. Verify the Dagster daemon can access the same storage as the Dagster webserver process: Both the webserver process and the Dagster daemon should access the same storage, meaning they should use the same `dagster.yaml`. Locally, this means both processes should have the same `DAGSTER_HOME` environment variable set.
3. Troubleshoot the run queue configuration: If the daemon is running, runs may intentionally be left in the queue due to concurrency rules. To investigate, you can:
   - Check the output logged from the daemon process, as this will include skipped runs.
   - Check the `max_concurrent_runs` setting in your instance's `dagster.yaml`. If set to `0`, this may block the queue. You can check this setting in the Dagster UI by navigating to Deployment > Configuration and locating the `run_queue.max_concurrent_runs` setting.
4. Check the state of your run queue: In some cases, the queue may be blocked by some number of in-progress runs. To view the status of your run queue, click Runs in the top navigation of the Dagster UI and then open the Queued and In Progress tabs. If there are queued or in-progress runs blocking the queue, you can terminate them to allow other runs to proceed.

If you have gone through these steps and the issue persists, look into the specific configuration of your Kubernetes agent and make sure it is correctly set up to handle the agent queue. You can also check the status of the Kubernetes pods to see whether an issue at the infrastructure level is causing the runs to remain queued.

For further assistance, refer to the [Dagster documentation](https://docs.dagster.io/guides/limiting-concurrency-in-data-pipelines) on limiting concurrency in data pipelines, which includes information on troubleshooting runs that remain in the `QUEUED` status. If the problem continues, consider reaching out to the Dagster community or support channels with detailed information about your setup and the steps you've taken to troubleshoot.
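Two of the checks above (the `run_queue.max_concurrent_runs` setting and the state of the Kubernetes pods) can be roughly sketched in Python. This is only an illustration under stated assumptions, not a Dagster or Kubernetes API. It assumes you have already parsed the configuration shown under Deployment > Configuration into a dictionary, and that you have loaded the output of `kubectl get pods -n <namespace> -o json` with `json.loads`. The function names `queue_blocked_by_config` and `pods_not_running` and the sample data are hypothetical.

```python
from __future__ import annotations

from typing import Any


def queue_blocked_by_config(instance_config: dict[str, Any]) -> bool:
    """Return True if run_queue.max_concurrent_runs is explicitly set to 0.

    `instance_config` is assumed to be the already-parsed instance
    configuration (for example, the YAML shown under
    Deployment > Configuration loaded into a dict).
    """
    run_queue = instance_config.get("run_queue") or {}
    return run_queue.get("max_concurrent_runs") == 0


def pods_not_running(pod_list: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (name, phase) for every pod that is not Running or Succeeded.

    `pod_list` is assumed to be the parsed JSON printed by
    `kubectl get pods -o json`, which has an `items` array whose entries
    carry `metadata.name` and `status.phase`.
    """
    suspicious = []
    for pod in pod_list.get("items", []):
        name = pod.get("metadata", {}).get("name", "<unknown>")
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase not in ("Running", "Succeeded"):
            suspicious.append((name, phase))
    return suspicious


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample_config = {"run_queue": {"max_concurrent_runs": 0}}
    sample_pods = {
        "items": [
            {"metadata": {"name": "agent-pod"}, "status": {"phase": "Running"}},
            {"metadata": {"name": "run-pod"}, "status": {"phase": "Pending"}},
        ]
    }
    print("Queue blocked by config:", queue_blocked_by_config(sample_config))
    print("Pods needing attention:", pods_not_running(sample_pods))
```

A pod that stays in the `Pending` phase usually means the cluster could not schedule it, so `kubectl describe pod` on that pod is a sensible next step.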
