Alessandro Marrella
04/08/2021, 8:24 AM
dagster_celery_k8s:
An exception was thrown during execution that is likely a framework error, rather than an error in user code.
dagster.check.CheckError: Invariant failed. Description: Pipeline run dev_volumeclass_pipeline (0329a1a3-4013-4dfc-8f84-d9ee13492b9e) in state PipelineRunStatus.STARTED, expected NOT_STARTED or STARTING
any idea why this happens? (dagster 0.11.3)
johann
04/08/2021, 11:52 AM
dagster-run-… (the run worker) as opposed to dagster-job-… (the step worker)
Alessandro Marrella
04/08/2021, 1:12 PM
run container:
❯ kubectl logs dagster-run-0329a1a3-4013-4dfc-8f84-d9ee13492b9e-hbzkm -n dagster
{"__class__": "ExecuteRunArgsLoadComplete"}
{"__class__": "DagsterEvent", "event_specific_data": {"__class__": "EngineEventData", "error": {"__class__": "SerializableErrorInfo", "cause": null, "cls_name": "CheckError", "message": "dagster.check.CheckError: Invariant failed. Description: Pipeline run dev_volumeclass_pipeline (0329a1a3-4013-4dfc-8f84-d9ee13492b9e) in state PipelineRunStatus.STARTED, expected NOT_STARTED or STARTING\n", "stack": [" File \"/usr/local/lib/python3.7/site-packages/dagster/grpc/impl.py\", line 76, in core_execute_run\n yield from execute_run_iterator(recon_pipeline, pipeline_run, instance)\n", " File \"/usr/local/lib/python3.7/site-packages/dagster/core/execution/api.py\", line 80, in execute_run_iterator\n pipeline_run.pipeline_name, pipeline_run.run_id, pipeline_run.status\n", " File \"/usr/local/lib/python3.7/site-packages/dagster/check/__init__.py\", line 169, in invariant\n CheckError(\"Invariant failed. Description: {desc}\".format(desc=desc))\n", " File \"/usr/local/lib/python3.7/site-packages/future/utils/__init__.py\", line 446, in raise_with_traceback\n raise exc.with_traceback(traceback)\n"]}, "marker_end": null, "marker_start": null, "metadata_entries": []}, "event_type_value": "ENGINE_EVENT", "logging_tags": {}, "message": "An exception was thrown during execution that is likely a framework error, rather than an error in user code.", "pid": null, "pipeline_name": "dev_volumeclass_pipeline", "solid_handle": null, "step_handle": null, "step_key": null, "step_kind_value": null}
{"__class__": "DagsterEvent", "event_specific_data": null, "event_type_value": "PIPELINE_FAILURE", "logging_tags": {}, "message": "This pipeline run has been marked as failed from outside the execution context.", "pid": null, "pipeline_name": "dev_volumeclass_pipeline", "solid_handle": null, "step_handle": null, "step_key": null, "step_kind_value": null}
johann
04/08/2021, 1:20 PM
Alessandro Marrella
04/08/2021, 1:20 PM
johann
04/08/2021, 1:24 PM
kubectl describe pod dagster-run-0329a1a3-4013-4dfc-8f84-d9ee13492b9e-hbzkm -n dagster
If you’re ok sharing, please check that it doesn’t have any secrets etc
Alessandro Marrella
04/08/2021, 1:40 PM
❯ kubectl describe pod dagster-run-0329a1a3-4013-4dfc-8f84-d9ee13492b9e-hbzkm -n dagster
Name: dagster-run-0329a1a3-4013-4dfc-8f84-d9ee13492b9e-hbzkm
Namespace: dagster
Priority: 0
Node: REDACTED
Start Time: Wed, 07 Apr 2021 16:16:58 +0100
Labels: app.kubernetes.io/component=run_coordinator
app.kubernetes.io/instance=dagster
app.kubernetes.io/name=dagster
app.kubernetes.io/part-of=dagster
app.kubernetes.io/version=0.11.3
controller-uid=4b5c1c1f-0ec1-48ea-a790-af6dc114be03
job-name=dagster-run-0329a1a3-4013-4dfc-8f84-d9ee13492b9e
Annotations: <none>
Status: Succeeded
IP: REDACTED
IPs:
IP: REDACTED
Controlled By: Job/dagster-run-0329a1a3-4013-4dfc-8f84-d9ee13492b9e
Containers:
dagster-run-0329a1a3-4013-4dfc-8f84-d9ee13492b9e:
Container ID: docker://c8437d6c8e3ab0d971f5d2ea0f43b2961d36faaaf4eabc8a07bb6611e0d6c5fb
Image: REDACTED
Image ID: REDACTED
Port: <none>
Host Port: <none>
Args:
dagster
api
execute_run
{"__class__": "ExecuteRunArgs", "instance_ref": null, "pipeline_origin": {"__class__": "PipelinePythonOrigin", "pipeline_name": "dev_volumeclass_pipeline", "repository_origin": {"__class__": "RepositoryPythonOrigin", "code_pointer": {"__class__": "FileCodePointer", "fn_name": "repository", "python_file": "bin/repository.py", "working_directory": "/app"}, "container_image": "REDACTED", "executable_path": "/usr/local/bin/python"}}, "pipeline_run_id": "0329a1a3-4013-4dfc-8f84-d9ee13492b9e"}
State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 07 Apr 2021 16:17:00 +0100
Finished: Wed, 07 Apr 2021 16:17:15 +0100
Ready: False
Restart Count: 0
Environment Variables from:
dagster-anomaly-pipeline-user-env ConfigMap Optional: false
ses-keys Secret Optional: false
Environment:
DAGSTER_HOME: /opt/dagster/dagster_home
DAGSTER_PG_PASSWORD: <set to the key 'postgresql-password' in secret 'dagster-postgresql-secret'> Optional: false
DAGSTER_CURRENT_IMAGE: REDACTED
Mounts:
/opt/dagster/dagster_home/dagster.yaml from dagster-instance (rw,path="dagster.yaml")
/var/run/secrets/kubernetes.io/serviceaccount from default-token-5wdps (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
dagster-instance:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: dagster-instance
Optional: false
default-token-5wdps:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-5wdps
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
johann
04/08/2021, 2:13 PM
kubectl describe job dagster-run-0329a1a3-4013-4dfc-8f84-d9ee13492b9e -n dagster
as well
Alessandro Marrella
04/08/2021, 2:35 PM
❯ kubectl describe job dagster-run-0329a1a3-4013-4dfc-8f84-d9ee13492b9e -n dagster
Name: dagster-run-0329a1a3-4013-4dfc-8f84-d9ee13492b9e
Namespace: dagster
Selector: controller-uid=4b5c1c1f-0ec1-48ea-a790-af6dc114be03
Labels: app.kubernetes.io/component=run_coordinator
app.kubernetes.io/instance=dagster
app.kubernetes.io/name=dagster
app.kubernetes.io/part-of=dagster
app.kubernetes.io/version=0.11.3
Annotations: <none>
Parallelism: 1
Completions: 1
Start Time: Wed, 07 Apr 2021 15:51:34 +0100
Completed At: Wed, 07 Apr 2021 16:17:16 +0100
Duration: 25m
Pods Statuses: 0 Running / 1 Succeeded / 0 Failed
Pod Template:
Labels: app.kubernetes.io/component=run_coordinator
app.kubernetes.io/instance=dagster
app.kubernetes.io/name=dagster
app.kubernetes.io/part-of=dagster
app.kubernetes.io/version=0.11.3
controller-uid=4b5c1c1f-0ec1-48ea-a790-af6dc114be03
job-name=dagster-run-0329a1a3-4013-4dfc-8f84-d9ee13492b9e
Containers:
dagster-run-0329a1a3-4013-4dfc-8f84-d9ee13492b9e:
Image: REDACTED
Port: <none>
Host Port: <none>
Args:
dagster
api
execute_run
{"__class__": "ExecuteRunArgs", "instance_ref": null, "pipeline_origin": {"__class__": "PipelinePythonOrigin", "pipeline_name": "dev_volumeclass_pipeline", "repository_origin": {"__class__": "RepositoryPythonOrigin", "code_pointer": {"__class__": "FileCodePointer", "fn_name": "repository", "python_file": "bin/repository.py", "working_directory": "/app"}, "container_image": "REDACTED", "executable_path": "/usr/local/bin/python"}}, "pipeline_run_id": "0329a1a3-4013-4dfc-8f84-d9ee13492b9e"}
Environment Variables from:
dagster-anomaly-pipeline-user-env ConfigMap Optional: false
ses-keys Secret Optional: false
Environment:
DAGSTER_HOME: /opt/dagster/dagster_home
DAGSTER_PG_PASSWORD: <set to the key 'postgresql-password' in secret 'dagster-postgresql-secret'> Optional: false
DAGSTER_CURRENT_IMAGE: REDACTED
Mounts:
/opt/dagster/dagster_home/dagster.yaml from dagster-instance (rw,path="dagster.yaml")
Volumes:
dagster-instance:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: dagster-instance
Optional: false
Events: <none>
thanks for looking at this!
johann
04/08/2021, 2:55 PM
This doesn’t always happen, and in this case it happened after a few steps not immediately
This is strange because the run worker is the process spawning steps. Really seems like it would be a duplicate run worker, but your manifests are showing only a single pod
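For illustration, the duplicate-run-worker hypothesis can be sketched in plain Python. This is not Dagster's code: it assumes only what the error message states, namely that execution may begin when the run is NOT_STARTED or STARTING. RunStatus, CheckError, and begin_execution below are hypothetical stand-ins defined locally, and the in-memory dict stands in for the run storage.

```python
from enum import Enum


class RunStatus(Enum):
    # Only the three states named in the error message are modelled here.
    NOT_STARTED = "NOT_STARTED"
    STARTING = "STARTING"
    STARTED = "STARTED"


class CheckError(Exception):
    """Local stand-in for the invariant failure seen in the logs."""


def begin_execution(runs, run_id):
    """Hypothetical helper: refuse to start a run that has already started."""
    status = runs[run_id]
    if status not in (RunStatus.NOT_STARTED, RunStatus.STARTING):
        raise CheckError(
            "Pipeline run {} in state {}, expected NOT_STARTED or STARTING".format(
                run_id, status.name
            )
        )
    runs[run_id] = RunStatus.STARTED


# In-memory stand-in for the run storage, keyed by run id.
runs = {"0329a1a3": RunStatus.STARTING}

# The first worker picks up the run and moves it to STARTED.
begin_execution(runs, "0329a1a3")

# A second worker launched for the same run id now trips the invariant.
try:
    begin_execution(runs, "0329a1a3")
    second_worker_error = None
except CheckError as exc:
    second_worker_error = str(exc)
```

Under these assumptions the first call succeeds and the second raises, which is the pattern johann describes: the check fails only when something tries to execute a run that is already STARTED.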
Alessandro Marrella
04/08/2021, 3:44 PM