Nicolas Guary
02/29/2024, 3:13 PM
Scout
02/29/2024, 3:13 PM
STEP_WORKER_STARTED and RESOURCE_INIT_STARTED. This could indicate a slow resource initialization.
- Monitor Kubernetes Resources: Use kubectl describe to check the status of the pods and nodes. Look for events that indicate insufficient resources or failed scheduling attempts.
- Review Dagster Daemon Status: In the Dagster UI, navigate to Deployment > Daemons and verify that the daemon is running and healthy.
- Check for Errors: Review the logs for any errors that might indicate why the steps are delayed. This could include errors during resource initialization or issues with subprocess creation.
- Adjust Kubernetes Requests: If the issue is related to insufficient resources, consider increasing the CPU and memory requests for the run worker pods.
- Run Monitoring: Utilize Dagster's run monitoring features to detect and handle cases where the run worker crashes or gets stuck.
If after these checks and adjustments the issue persists, it may be necessary to delve deeper into the specific configuration of your Dagster deployment and Kubernetes cluster. You might also want to check for any recent changes in your infrastructure or Dagster configuration that could have introduced the delay.
For more detailed guidance and potential solutions, you can refer to the Dagster documentation and consider reaching out to the community or support channels if the problem continues to be challenging to diagnose and resolve.
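To make the timing check between STEP_WORKER_STARTED and RESOURCE_INIT_STARTED concrete, here is a minimal sketch of how the gaps between step events could be measured. It assumes you have already pulled (event type, Unix timestamp) pairs out of the run's event log by some means. The helper names, the 30-second threshold, and the sample timestamps are all hypothetical and only for illustration; this is not a Dagster API.

```python
def event_gaps(events):
    """Return (earlier_type, later_type, seconds) for consecutive events.

    `events` is a list of (event_type, unix_timestamp) pairs; they are
    sorted by timestamp first so the input order does not matter.
    """
    ordered = sorted(events, key=lambda event: event[1])
    return [
        (prev[0], curr[0], curr[1] - prev[1])
        for prev, curr in zip(ordered, ordered[1:])
    ]


def slow_transitions(events, threshold_seconds=30.0):
    """Keep only the transitions that took longer than the threshold."""
    return [gap for gap in event_gaps(events) if gap[2] > threshold_seconds]


# Hypothetical sample data: the event type names are real Dagster event
# types, but the timestamps are made up for this illustration.
sample = [
    ("STEP_WORKER_STARTING", 1000.0),
    ("STEP_WORKER_STARTED", 1004.0),
    ("RESOURCE_INIT_STARTED", 1049.0),
    ("RESOURCE_INIT_SUCCESS", 1051.0),
]

for before, after, seconds in slow_transitions(sample):
    print(f"{before} -> {after}: {seconds:.1f}s")
```

With the sample data, only the STEP_WORKER_STARTED to RESOURCE_INIT_STARTED transition exceeds the threshold. In a real run that would be the place to look for slow resource setup or a resource-starved pod.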