# ask-community
c
Curious to get some feedback on an error we hit when launching a large number of backfills. When we launch many backfills at once, some of the runs come back with the error below. The failures are scattered throughout, with no real consistency pointing to a trigger. Currently, in development, we are running everything out of a devcontainer (with additional resources allocated) and saving everything to a Postgres Docker container running inside the devcontainer. Could this be a resource issue tied to the setup, or might there be another source of the problem? Resources would be my first guess, but the irregularity of success/failure has me puzzled.
dagster._core.errors.DagsterLaunchFailedError: Tried to start a run on a server after telling it to shut down

  File "/home/vscode/.local/lib/python3.11/site-packages/dagster/_daemon/run_coordinator/queued_run_coordinator_daemon.py", line 332, in _dequeue_run
    instance.run_launcher.launch_run(LaunchRunContext(dagster_run=run, workspace=workspace))
  File "/home/vscode/.local/lib/python3.11/site-packages/dagster/_core/launcher/default_run_launcher.py", line 127, in launch_run
    DefaultRunLauncher.launch_run_from_grpc_client(
  File "/home/vscode/.local/lib/python3.11/site-packages/dagster/_core/launcher/default_run_launcher.py", line 95, in launch_run_from_grpc_client
    raise (
Here is my deployment config as well; it's very similar to the Docker example.
local_artifact_storage:
  module: dagster._core.storage.root
  class: LocalArtifactStorage
  config:
    base_dir: /workspaces/data-project/tmpxan393mf
run_storage:
  module: dagster_postgres.run_storage
  class: PostgresRunStorage
  config:
    postgres_db:
      db_name:
        env: DAGSTER_POSTGRES_DB
      hostname:
        env: DAGSTER_POSTGRES_HOST
      password:
        env: DAGSTER_POSTGRES_PASSWORD
      port: 5432
      username:
        env: DAGSTER_POSTGRES_USER
event_log_storage:
  module: dagster_postgres.event_log
  class: PostgresEventLogStorage
  config:
    postgres_db:
      db_name:
        env: DAGSTER_POSTGRES_DB
      hostname:
        env: DAGSTER_POSTGRES_HOST
      password:
        env: DAGSTER_POSTGRES_PASSWORD
      port: 5432
      username:
        env: DAGSTER_POSTGRES_USER
compute_logs:
  module: dagster._core.storage.local_compute_log_manager
  class: LocalComputeLogManager
  config:
    base_dir: /workspaces/data-project/tmpxan393mf/storage
schedule_storage:
  module: dagster_postgres.schedule_storage
  class: PostgresScheduleStorage
  config:
    postgres_db:
      db_name:
        env: DAGSTER_POSTGRES_DB
      hostname:
        env: DAGSTER_POSTGRES_HOST
      password:
        env: DAGSTER_POSTGRES_PASSWORD
      port: 5432
      username:
        env: DAGSTER_POSTGRES_USER
scheduler:
  module: dagster._core.scheduler
  class: DagsterDaemonScheduler
  config: {}
run_coordinator:
  module: dagster._core.run_coordinator
  class: QueuedRunCoordinator
  config:
    max_concurrent_runs: 20
run_launcher:
  module: dagster
  class: DefaultRunLauncher
  config: {}
telemetry:
  enabled: false
a
similar to the docker example
are you managing the code server containers yourself, or did you set the workspace to target the Python target directly (which means the daemon would manage its own copies of the code servers as subprocesses)?
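For context, here is a minimal sketch of those two workspace styles in workspace.yaml; the file name, host, port, and location name below are hypothetical placeholders, not values from this thread:

load_from:
  # daemon-managed: the daemon spawns its own code server subprocess
  - python_file:
      relative_path: definitions.py
  # self-managed: point at a gRPC code server container you run yourself
  - grpc_server:
      host: user_code
      port: 4000
      location_name: "user_code_grpc"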
h
Also seeing this error when launching a large number of backfills. My dagster.yaml simply has:
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
a
ok, putting out a fix that I expect will address this - in the meantime you can set this bit of run_coordinator config:
dequeue_use_threads: true
which should alleviate the issue and accelerate throughput
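For reference, merged into the dagster.yaml from earlier in the thread, that would look roughly like this; dequeue_num_workers is an optional knob and the value shown is just an assumption to tune for your setup:

run_coordinator:
  module: dagster._core.run_coordinator
  class: QueuedRunCoordinator
  config:
    max_concurrent_runs: 20
    # dequeue runs on a thread pool instead of serially
    dequeue_use_threads: true
    # optional: size of that pool (assumed value, tune as needed)
    dequeue_num_workers: 4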
c
🎉