Yogic Wahyu
05/29/2023, 2:39 AM
Yogic Wahyu
05/29/2023, 2:40 AM
Yogic Wahyu
05/29/2023, 2:42 AM
telemetry:
  enabled: false
run_storage:
  module: dagster_postgres.run_storage
  class: PostgresRunStorage
  config:
    postgres_db:
      username: xxxxx
      password: xxxxxx
      hostname: x.x.x.x
      db_name: xxxx
      port: xxxx
event_log_storage:
  module: dagster_postgres.event_log
  class: PostgresEventLogStorage
  config:
    postgres_db:
      username: xxxxx
      password: xxxxxx
      hostname: x.x.x.x
      db_name: xxxxxx
      port: xxx
schedule_storage:
  module: dagster_postgres.schedule_storage
  class: PostgresScheduleStorage
  config:
    postgres_db:
      username: xxxxxx
      password: xxxxxx
      hostname: x.x.x.x
      db_name: xxxxx
      port: xxxx
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    max_concurrent_runs: 30
    tag_concurrency_limits:
      - key: "dagster/priority"
        limit: 15
      - key: "job"
        value: "airbyte"
        limit: 5
      - key: "kind"
        value: "airbyte"
        limit: 5
      - key: "job"
        value: "dbt_asset_job"
        limit: 1
retention:
  schedule:
    purge_after_days: 60 # retention policy for schedule ticks of all types (keep the latest 60 days)
  sensor:
    purge_after_days:
      skipped: 7
      failure: 30
      success: 60 # keep success ticks from the latest 60 days
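To make the intent of the run_coordinator block easier to reason about, here is a minimal sketch of how the limits above could be read. It is a simplified model, not Dagster's actual implementation: the function names `matches` and `can_launch` are hypothetical, and it assumes runs really carry the `job` / `kind` / `dagster/priority` tags named in the config.

```python
from typing import Dict, List

# Limits copied from the config above. A limit without "value" is assumed
# to match any value of that tag key.
TAG_LIMITS = [
    {"key": "dagster/priority", "limit": 15},
    {"key": "job", "value": "airbyte", "limit": 5},
    {"key": "kind", "value": "airbyte", "limit": 5},
    {"key": "job", "value": "dbt_asset_job", "limit": 1},
]
MAX_CONCURRENT_RUNS = 30


def matches(limit: dict, tags: Dict[str, str]) -> bool:
    """Return True if a run with these tags counts against this limit."""
    if limit["key"] not in tags:
        return False
    return "value" not in limit or tags[limit["key"]] == limit["value"]


def can_launch(
    candidate_tags: Dict[str, str],
    in_progress: List[Dict[str, str]],
    limits: List[dict] = TAG_LIMITS,
    max_runs: int = MAX_CONCURRENT_RUNS,
) -> bool:
    """Simplified check: may a queued run start, given the runs in progress?"""
    # Global cap across all runs.
    if len(in_progress) >= max_runs:
        return False
    # Per-tag caps: only limits that the candidate itself matches can block it.
    for limit in limits:
        if not matches(limit, candidate_tags):
            continue
        active = sum(1 for tags in in_progress if matches(limit, tags))
        if active >= limit["limit"]:
            return False
    return True
```

Under this reading, a second run tagged `job: dbt_asset_job` stays queued while one is already in progress, and a sixth `job: airbyte` run waits behind the first five.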
daniel
05/30/2023, 6:50 PM
Yogic Wahyu
06/11/2023, 1:02 PM
<_InactiveRpcError of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "failed to connect to all addresses"
    debug_error_string = "{"created":"@1686486246.742005053","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1686486246.742003835","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
>
Unable to connect to gRPC server: 0
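As far as I understand it, `StatusCode.UNAVAILABLE` with "failed to connect to all addresses" usually means the client could not open a connection to the server at all. A quick way to confirm whether anything is listening on the code server port is a plain TCP check like the sketch below. The helper name `can_connect` is hypothetical, and the host and port in the usage lines are placeholders, not values taken from this deployment.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder address: replace with the host/port the code server uses.
    print(can_connect("localhost", 4000))
```

If this returns False from the machine running the daemon, the code server process is probably down or unreachable, rather than merely slow to answer.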
Possible root causes:
1. The daemon doesn't send a heartbeat within the timeout interval (60 seconds by default), so the code server (gRPC) shuts itself down. https://docs.dagster.io/deployment/dagster-instance#grpc-servers
2. The host hit a resource limit and the service was restarted, but not gracefully, so some components failed to come back up. (I checked the status logs in journalctl/systemd: as I remember, we first started Dagster about 1 month ago, but the unit shows it started 1 week ago, which suggests it restarted automatically? The unit config is attached below.)
3. There is some hidden sync mechanism between the components (dagit, the Dagster gRPC server, the daemon) that triggers expensive operations, such as calling the code server again on every load or for certain processes. (I am not sure about this, but I found something related in the latest bugfixes: https://docs.dagster.io/changelog#bugfixes.)
[Unit]
Description=Dagster Daemon Service
Wants=network-online.target
After=network-online.target
[Service]
User=ubuntu
Group=ubuntu
Type=simple
Environment="DAGSTER_HOME=/opt/dagster/dagster_home"
WorkingDirectory=/opt/dagster/repo_sync
ExecStart=dagster-daemon run
Restart=always
RestartSec=5s
[Install]
WantedBy=multi-user.target
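Since the unit uses `Restart=always`, one way to check the automatic-restart theory is to ask systemd how many times it restarted the service and when it last became active. This is a rough sketch: the unit name `dagster-daemon.service` is an assumption (use whatever the unit file is actually called), the helper names are hypothetical, and the `NRestarts` property needs a reasonably recent systemd.

```python
import subprocess
from typing import Dict


def parse_show_output(text: str) -> Dict[str, str]:
    """Parse the KEY=VALUE lines printed by `systemctl show` into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank or malformed lines
            props[key.strip()] = value.strip()
    return props


def unit_restart_info(unit: str) -> Dict[str, str]:
    """Query systemd for the restart counter and last activation time."""
    result = subprocess.run(
        ["systemctl", "show", unit, "--property=NRestarts,ActiveEnterTimestamp"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_show_output(result.stdout)


if __name__ == "__main__":
    # Assumed unit name; adjust to match the installed unit file.
    print(unit_restart_info("dagster-daemon.service"))
```

A non-zero `NRestarts` together with an `ActiveEnterTimestamp` from about a week ago would be consistent with the service having been restarted by systemd rather than by hand.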
Yogic Wahyu
06/11/2023, 1:09 PM
daniel
06/12/2023, 5:51 PM