Charlie Bini
09/06/2022, 2:27 PM
dagster._core.errors.DagsterInvariantViolationError: Unresolved ExecutionStep "load_destination[?]" is resolved by "compose_queries" which is not part of the current step selection
  File "/root/app/__pypackages__/3.10/lib/dagster/_grpc/impl.py", line 404, in get_external_execution_plan_snapshot
    create_execution_plan(
  File "/root/app/__pypackages__/3.10/lib/dagster/_core/execution/api.py", line 1005, in create_execution_plan
    return ExecutionPlan.build(
  File "/root/app/__pypackages__/3.10/lib/dagster/_core/execution/plan/plan.py", line 1023, in build
    return plan_builder.build()
  File "/root/app/__pypackages__/3.10/lib/dagster/_core/execution/plan/plan.py", line 238, in build
    plan = plan.build_subset_plan(
  File "/root/app/__pypackages__/3.10/lib/dagster/_core/execution/plan/plan.py", line 814, in build_subset_plan
    executable_map, resolvable_map = _compute_step_maps(
  File "/root/app/__pypackages__/3.10/lib/dagster/_core/execution/plan/plan.py", line 1449, in _compute_step_maps
    raise DagsterInvariantViolationError(
Charlie Bini
09/06/2022, 2:28 PM
dagster._core.errors.DagsterExecutionInterruptedError
  File "/root/app/__pypackages__/3.10/lib/dagster/_core/execution/plan/execute_plan.py", line 224, in dagster_event_sequence_for_step
    for step_event in check.generator(step_events):
  File "/root/app/__pypackages__/3.10/lib/dagster/_core/execution/plan/execute_step.py", line 319, in core_dagster_event_sequence_for_step
    for event_or_input_value in ensure_gen(
  File "/root/app/__pypackages__/3.10/lib/dagster/_core/execution/plan/inputs.py", line 501, in load_input_object
    yield from _load_input_with_input_manager(input_manager, load_input_context)
  File "/root/app/__pypackages__/3.10/lib/dagster/_core/execution/plan/inputs.py", line 857, in _load_input_with_input_manager
    with solid_execution_error_boundary(
  File "/usr/local/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/root/app/__pypackages__/3.10/lib/dagster/_core/execution/plan/utils.py", line 41, in solid_execution_error_boundary
    with raise_execution_interrupts():
  File "/usr/local/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/root/app/__pypackages__/3.10/lib/dagster/_core/errors.py", line 150, in raise_execution_interrupts
    with raise_interrupts_as(DagsterExecutionInterruptedError):
  File "/usr/local/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/root/app/__pypackages__/3.10/lib/dagster/_utils/interrupts.py", line 85, in raise_interrupts_as
    raise error_cls()
Charlie Bini
09/06/2022, 2:28 PM
Charlie Bini
09/06/2022, 3:42 PM
{
"insertId": "rh0qp1f7mk9do",
"jsonPayload": {
"involvedObject": {
"uid": "56418917-e257-4a94-b4da-68ced239cfd1",
"kind": "Node",
"resourceVersion": "112831116",
"name": "gk3-dagster-cloud-default-pool-71c0c832-zr8f",
"apiVersion": "v1"
},
"source": {
"component": "cluster-autoscaler"
},
"kind": "Event",
"reportingComponent": "",
"type": "Normal",
"apiVersion": "v1",
"reportingInstance": "",
"metadata": {
"resourceVersion": "276787",
"name": "gk3-dagster-cloud-default-pool-71c0c832-zr8f.171231483531ccd8",
"managedFields": [
{
"time": "2022-09-06T06:23:18Z",
"fieldsV1": {
"f:source": {
"f:component": {}
},
"f:firstTimestamp": {},
"f:message": {},
"f:lastTimestamp": {},
"f:involvedObject": {},
"f:reason": {},
"f:count": {},
"f:type": {}
},
"manager": "cluster-autoscaler",
"fieldsType": "FieldsV1",
"operation": "Update",
"apiVersion": "v1"
}
],
"namespace": "default",
"creationTimestamp": "2022-09-06T06:23:18Z",
"uid": "5a72ac05-3498-48b1-85c1-d74e87cfaece"
},
"reason": "ScaleDown",
"eventTime": null,
"message": "marked the node as toBeDeleted/unschedulable"
},
"resource": {
"type": "k8s_node",
"labels": {
"node_name": "gk3-dagster-cloud-default-pool-71c0c832-zr8f",
"project_id": "teamster-332318",
"location": "us-central1",
"cluster_name": "dagster-cloud"
}
},
"timestamp": "2022-09-06T06:23:18Z",
"severity": "INFO",
"logName": "projects/teamster-332318/logs/events",
"receiveTimestamp": "2022-09-06T06:23:23.664051690Z"
}
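For anyone trying to spot the same pattern in their own logs, here is a minimal sketch of checking whether a log entry like the one above is a cluster-autoscaler scale-down event. The helper names `is_autoscaler_scale_down` and `scaled_down_nodes` are made up for illustration, and the sample entry is abbreviated from the payload above; only field names that appear in that payload are used.

```python
from typing import Iterable, List, Tuple


def is_autoscaler_scale_down(entry: dict) -> bool:
    """Return True if the entry is a ScaleDown event emitted by cluster-autoscaler."""
    payload = entry.get("jsonPayload", {})
    return (
        payload.get("source", {}).get("component") == "cluster-autoscaler"
        and payload.get("reason") == "ScaleDown"
    )


def scaled_down_nodes(entries: Iterable[dict]) -> List[Tuple[str, str]]:
    """Collect (timestamp, node name) pairs for every scale-down event."""
    return [
        (e["timestamp"], e["jsonPayload"]["involvedObject"]["name"])
        for e in entries
        if is_autoscaler_scale_down(e)
    ]


# Abbreviated copy of the event posted above (hypothetical variable name).
sample_entry = {
    "jsonPayload": {
        "involvedObject": {
            "kind": "Node",
            "name": "gk3-dagster-cloud-default-pool-71c0c832-zr8f",
        },
        "source": {"component": "cluster-autoscaler"},
        "reason": "ScaleDown",
        "message": "marked the node as toBeDeleted/unschedulable",
    },
    "timestamp": "2022-09-06T06:23:18Z",
}

if __name__ == "__main__":
    for timestamp, node in scaled_down_nodes([sample_entry]):
        print(f"{timestamp}: autoscaler scaled down {node}")
```

Comparing these timestamps against the time of the DagsterExecutionInterruptedError is one way to confirm that a run was interrupted by a node being removed rather than by something in the job itself.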
yuhan
09/06/2022, 6:06 PM
Charlie Bini
09/06/2022, 6:42 PM
Charlie Bini
09/06/2022, 6:46 PM
yuhan
09/06/2022, 6:48 PM
johann
09/07/2022, 3:31 PM
GKE Autopilot decided to downscale the node that the job was running on.
At some point this is just a reality of running on k8s and is why the retries are helpful. But you can set
"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
to avoid the K8s scheduler opting to stop your workloads. https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler
Charlie Bini
09/08/2022, 3:21 PM
Charlie Bini
09/08/2022, 3:23 PM
johann
09/08/2022, 3:45 PM
Charlie Bini
09/08/2022, 5:45 PM
Error creating: admission webhook "gkepolicy.common-webhooks.networking.gke.io" denied the request: GKE Policy Controller rejected the request because it violates one or more policies: {"[denied by autogke-node-affinity-selector-limitation]":["Auto GKE disallows use of cluster-autoscaler.kubernetes.io/safe-to-evict=false annotation on workloads Requested by user: 'system:serviceaccount:kube-system:job-controller', groups: 'system:serviceaccounts,system:serviceaccounts:kube-system,system:authenticated'."]}
Charlie Bini
09/08/2022, 5:46 PM
"cluster-autoscaler.kubernetes.io/safe-to-evict"
isn't allowed on Autopilot unfortunately
Charlie Bini
09/08/2022, 5:47 PM
"dagster/retry_strategy": "ALL_STEPS"
appears to be the best solution for now
johann
09/08/2022, 5:47 PM
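To summarize where the thread landed, here is a rough sketch of the run tags discussed above, built as a plain dict so it runs without Dagster installed. The helper name `build_run_tags`, the `on_autopilot` flag, and the retry count of 3 are assumptions for illustration; the thread only names the "dagster/retry_strategy" tag and the safe-to-evict annotation. The sketch also assumes run retries are enabled for the deployment and that the annotation is passed through the "dagster-k8s/config" tag.

```python
import json

SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def build_run_tags(max_retries: int = 3, on_autopilot: bool = True) -> dict:
    """Build job tags that retry whole runs after an interruption.

    max_retries=3 is an assumed value, not one taken from the thread.
    """
    tags = {
        "dagster/max_retries": max_retries,
        # Re-run every step rather than resuming from the failed step.
        "dagster/retry_strategy": "ALL_STEPS",
    }
    if not on_autopilot:
        # GKE Autopilot rejected this annotation in the thread above,
        # so it is only added for clusters that are not on Autopilot.
        tags["dagster-k8s/config"] = {
            "pod_template_spec_metadata": {
                "annotations": {SAFE_TO_EVICT: "false"},
            },
        }
    return tags


if __name__ == "__main__":
    # The resulting dict would typically be passed as @job(tags=...).
    print(json.dumps(build_run_tags(), indent=2))
```

On Autopilot only the two retry tags are emitted, which matches the conclusion in the thread that retrying all steps is the workable option when the eviction annotation is denied.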