Igal Dahan
02/10/2022, 10:16 AM
tags={
    'dagster-k8s/config': {
        'container_config': {
            'resources': {
                'requests': {'cpu': '4000m', 'memory': '8000Mi'},
                'limits': {'cpu': '4000m', 'memory': '16000Mi'},
            }
        },
        'pod_spec_config': {
            'affinity': {
                'nodeAffinity': {
                    'requiredDuringSchedulingIgnoredDuringExecution': {
                        'nodeSelectorTerms': [
                            {
                                'matchExpressions': [
                                    {
                                        'key': 'cloud.google.com/gke-nodepool',
                                        'operator': 'In',
                                        'values': ['immunai-pipeline-pool'],
                                    }
                                ]
                            }
                        ]
                    }
                }
            }
        },
    },
},
and when we run a job, the pod creation is stuck:
Warning FailedScheduling 41m default-scheduler 0/4 nodes are available: 1 Insufficient cpu, 1 Insufficient memory, 1 node(s) were unschedulable, 2 node(s) didn't match Pod's node affinity/selector.
Warning FailedScheduling 41m default-scheduler 0/4 nodes are available: 1 Insufficient cpu, 1 Insufficient memory, 1 node(s) were unschedulable, 2 node(s) didn't match Pod's node affinity/selector.
Normal NotTriggerScaleUp 36s (x241 over 41m) cluster-autoscaler pod didn't trigger scale-up: 1 node(s) didn't match Pod's node affinity/selector
on the nodes the label exists:
gke-dagster-omic-dat-immunai-pipeline-ca9c3a0a-60cw Ready <none> 2d21h v1.21.6-gke.1500 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-2,beta.kubernetes.io/os=linux,cloud.google.com/gke-boot-disk=pd-standard,cloud.google.com/gke-container-runtime=containerd,cloud.google.com/gke-nodepool=immunai-pipeline-pool,cloud.google.com/gke-os-distribution=cos,cloud.google.com/machine-family=n1,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,igal=king,kubernetes.io/arch=amd64,kubernetes.io/hostname=gke-dagster-omic-dat-immunai-pipeline-ca9c3a0a-60cw,kubernetes.io/os=linux,label1=single-sample-pipeline,node.kubernetes.io/instance-type=n1-standard-2,topology.gke.io/zone=us-central1-a,topology.kubernetes.io/region=us-central1,topology.kubernetes.io/zone=us-central1-a
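(For context, a tags dict like the one above is normally attached to the job definition itself. The sketch below shows roughly how that wiring usually looks; the op body and the job wiring are assumptions for illustration, not the actual single_sample_pipeline code.)

from dagster import job, op

@op
def process_sample():
    # placeholder op; the real pipeline's ops are not shown in this thread
    pass

# Sketch only: attaching the dagster-k8s/config tags shown above to a job.
@job(
    tags={
        'dagster-k8s/config': {
            'container_config': {
                'resources': {
                    'requests': {'cpu': '4000m', 'memory': '8000Mi'},
                    'limits': {'cpu': '4000m', 'memory': '16000Mi'},
                }
            },
        }
    }
)
def staging_single_sample_job():
    process_sample()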
Andrea Giardini
02/10/2022, 10:41 AM
Igal Dahan
02/10/2022, 10:41 AM
kubectl describe po dagster-run-4cc9fae2-0ae5-45b3-9862-11278066c628-vhm6m
Name: dagster-run-4cc9fae2-0ae5-45b3-9862-11278066c628-vhm6m
Namespace: default
Priority: 0
Node: <none>
Labels: app.kubernetes.io/component=run_worker
app.kubernetes.io/instance=dagster
app.kubernetes.io/name=dagster
app.kubernetes.io/part-of=dagster
app.kubernetes.io/version=0.13.4
controller-uid=cee1d7f5-67ae-416d-8ead-9e2f5d6f1b77
job-name=dagster-run-4cc9fae2-0ae5-45b3-9862-11278066c628
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: Job/dagster-run-4cc9fae2-0ae5-45b3-9862-11278066c628
Containers:
dagster:
Image: gcr.io/immunai-registry-hub/panacea-ai/immunai-product-single_sample_pipeline:feature-ssp-k8s
Port: <none>
Host Port: <none>
Args:
/usr/bin/python3
-m
dagster
api
execute_run
{"__class__": "ExecuteRunArgs", "instance_ref": null, "pipeline_origin": {"__class__": "PipelinePythonOrigin", "pipeline_name": "staging_single_sample_job", "repository_origin": {"__class__": "RepositoryPythonOrigin", "code_pointer": {"__class__": "ModuleCodePointer", "fn_name": "staging_single_sample_repo", "module": "single_sample_pipeline"}, "container_image": "<http://gcr.io/immunai-registry-hub/panacea-ai/immunai-product-single_sample_pipeline:feature-ssp-k8s|gcr.io/immunai-registry-hub/panacea-ai/immunai-product-single_sample_pipeline:feature-ssp-k8s>", "executable_path": "/usr/bin/python3"}}, "pipeline_run_id": "4cc9fae2-0ae5-45b3-9862-11278066c628"}
Limits:
cpu: 4
memory: 16000Mi
Requests:
cpu: 4
memory: 8000Mi
Environment Variables from:
dagster-pipeline-env ConfigMap Optional: false
Environment:
DAGSTER_HOME: /opt/dagster/dagster_home
DAGSTER_PG_PASSWORD: <set to the key 'postgresql-password' in secret 'dagster-postgresql-secret'> Optional: false
LD_LIBRARY_PATH:
Mounts:
/opt/dagster/dagster_home/dagster.yaml from dagster-instance (rw,path="dagster.yaml")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j2chw (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
dagster-instance:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: dagster-instance
Optional: false
kube-api-access-j2chw:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 41m default-scheduler 0/4 nodes are available: 1 Insufficient cpu, 1 Insufficient memory, 1 node(s) were unschedulable, 2 node(s) didn't match Pod's node affinity/selector.
Warning FailedScheduling 41m default-scheduler 0/4 nodes are available: 1 Insufficient cpu, 1 Insufficient memory, 1 node(s) were unschedulable, 2 node(s) didn't match Pod's node affinity/selector.
Normal NotTriggerScaleUp 36s (x241 over 41m) cluster-autoscaler pod didn't trigger scale-up: 1 node(s) didn't match Pod's node affinity/selector
Andrea Giardini
02/10/2022, 10:48 AM
requiredDuringSchedulingIgnoredDuringExecution -> required_during_scheduling_ignored_during_execution
etc etc
Igal Dahan
02/10/2022, 10:50 AM
Andrea Giardini
02/10/2022, 10:52 AM
is it documented? https://docs.dagster.io/deployment/guides/kubernetes/customizing-your-deployment#job-or-op-kubernetes-configuration I see the docs here say exactly what you do... so I might be wrong
and what is camel case? (is it a phrase?) https://en.wikipedia.org/wiki/Camel_case
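(For illustration only: the 10:48 suggestion would mean rewriting the affinity block with snake_case keys, roughly as below. This is an untested sketch; the docs linked above show camelCase, so it is not clear this is actually the fix.)

# Untested sketch of the snake_case variant suggested at 10:48; the linked docs
# show camelCase, so this may or may not be what dagster-k8s expects.
tags={
    'dagster-k8s/config': {
        'pod_spec_config': {
            'affinity': {
                'node_affinity': {
                    'required_during_scheduling_ignored_during_execution': {
                        'node_selector_terms': [
                            {
                                'match_expressions': [
                                    {
                                        'key': 'cloud.google.com/gke-nodepool',
                                        'operator': 'In',
                                        'values': ['immunai-pipeline-pool'],
                                    }
                                ]
                            }
                        ]
                    }
                }
            }
        }
    }
},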
Igal Dahan
02/10/2022, 10:53 AM
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cloud.google.com/gke-nodepool
            operator: In
            values:
            - immunai-pipeline-pool
Andrea Giardini
02/10/2022, 10:54 AM
kubectl get pod $podname -o yaml ?
Igal Dahan
02/10/2022, 11:05 AM
Andrea Giardini
02/10/2022, 11:06 AM
Igal Dahan
02/10/2022, 11:07 AM
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: "2022-02-10T10:55:56Z"
generateName: dagster-run-e47b5c8a-09a6-4052-bef9-d0e9f6cfab6f-
labels:
app.kubernetes.io/component: run_worker
app.kubernetes.io/instance: dagster
app.kubernetes.io/name: dagster
app.kubernetes.io/part-of: dagster
app.kubernetes.io/version: 0.13.4
controller-uid: 4021f273-5ee1-432b-a31e-3d4e91197860
job-name: dagster-run-e47b5c8a-09a6-4052-bef9-d0e9f6cfab6f
name: dagster-run-e47b5c8a-09a6-4052-bef9-d0e9f6cfab6f-zpwh5
namespace: default
ownerReferences:
- apiVersion: batch/v1
blockOwnerDeletion: true
controller: true
kind: Job
name: dagster-run-e47b5c8a-09a6-4052-bef9-d0e9f6cfab6f
uid: 4021f273-5ee1-432b-a31e-3d4e91197860
resourceVersion: "14284445"
uid: 1fdf8d27-a718-490c-a521-d11a1597367c
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: cloud.google.com/gke-nodepool
operator: In
values:
- immunai-pipeline-pool
containers:
- args:
- /usr/bin/python3
- -m
- dagster
- api
- execute_run
- '{"__class__": "ExecuteRunArgs", "instance_ref": null, "pipeline_origin": {"__class__":
"PipelinePythonOrigin", "pipeline_name": "staging_single_sample_job", "repository_origin":
{"__class__": "RepositoryPythonOrigin", "code_pointer": {"__class__": "ModuleCodePointer",
"fn_name": "staging_single_sample_repo", "module": "single_sample_pipeline"},
"container_image": "<http://gcr.io/immunai-registry-hub/panacea-ai/immunai-product-single_sample_pipeline:feature-ssp-k8s|gcr.io/immunai-registry-hub/panacea-ai/immunai-product-single_sample_pipeline:feature-ssp-k8s>",
"executable_path": "/usr/bin/python3"}}, "pipeline_run_id": "e47b5c8a-09a6-4052-bef9-d0e9f6cfab6f"}'
env:
- name: DAGSTER_HOME
value: /opt/dagster/dagster_home
- name: DAGSTER_PG_PASSWORD
valueFrom:
secretKeyRef:
key: postgresql-password
name: dagster-postgresql-secret
- name: LD_LIBRARY_PATH
envFrom:
- configMapRef:
name: dagster-pipeline-env
image: gcr.io/immunai-registry-hub/panacea-ai/immunai-product-single_sample_pipeline:feature-ssp-k8s
imagePullPolicy: Always
name: dagster
resources:
limits:
cpu: "4"
memory: 16000Mi
requests:
cpu: "4"
memory: 8000Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /opt/dagster/dagster_home/dagster.yaml
name: dagster-instance
subPath: dagster.yaml
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-wch7g
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
imagePullSecrets:
- name: gcr-json-key
preemptionPolicy: PreemptLowerPriority
priority: 0
restartPolicy: Never
schedulerName: default-scheduler
securityContext: {}
serviceAccount: dagster
serviceAccountName: dagster
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- configMap:
defaultMode: 420
name: dagster-instance
name: dagster-instance
- name: kube-api-access-wch7g
projected:
defaultMode: 420
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
items:
- key: ca.crt
path: ca.crt
name: kube-root-ca.crt
- downwardAPI:
items:
- fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
path: namespace
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2022-02-10T10:55:56Z"
message: '0/4 nodes are available: 1 Insufficient cpu, 1 Insufficient memory,
1 node(s) were unschedulable, 2 node(s) didn''t match Pod''s node affinity/selector.'
reason: Unschedulable
status: "False"
type: PodScheduled
phase: Pending
qosClass: Burstable
Andrea Giardini
02/10/2022, 11:13 AM
Igal Dahan
02/10/2022, 11:14 AM
Andrea Giardini
02/10/2022, 11:15 AM