# deployment-kubernetes

Igal Dahan

02/10/2022, 10:16 AM
Hi all, I need help with k8s. I have set these tags on the job decorator:
tags={
        'dagster-k8s/config': {
            'container_config': {
                'resources': {
                    'requests': {'cpu': '4000m', 'memory': '8000Mi'},
                    'limits': {'cpu': '4000m', 'memory': '16000Mi'},
                }
            },
            'pod_spec_config': {
                'affinity': {
                    'nodeAffinity': {
                        'requiredDuringSchedulingIgnoredDuringExecution': {
                            'nodeSelectorTerms': [
                                {
                                    'matchExpressions': [
                                        {
                                            'key': 'cloud.google.com/gke-nodepool',
                                            'operator': 'In',
                                            'values': ['immunai-pipeline-pool'],
                                        }
                                    ]
                                }
                            ]
                        }
                    }
                }
            },
        },
    },
and when we run a job, the pod creation is stuck:
Warning  FailedScheduling   41m                  default-scheduler   0/4 nodes are available: 1 Insufficient cpu, 1 Insufficient memory, 1 node(s) were unschedulable, 2 node(s) didn't match Pod's node affinity/selector.
  Warning  FailedScheduling   41m                  default-scheduler   0/4 nodes are available: 1 Insufficient cpu, 1 Insufficient memory, 1 node(s) were unschedulable, 2 node(s) didn't match Pod's node affinity/selector.
  Normal   NotTriggerScaleUp  36s (x241 over 41m)  cluster-autoscaler  pod didn't trigger scale-up: 1 node(s) didn't match Pod's node affinity/selector
on the nodes, the label exists:
gke-dagster-omic-dat-immunai-pipeline-ca9c3a0a-60cw   Ready                      <none>   2d21h   v1.21.6-gke.1500   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-2,beta.kubernetes.io/os=linux,cloud.google.com/gke-boot-disk=pd-standard,cloud.google.com/gke-container-runtime=containerd,cloud.google.com/gke-nodepool=immunai-pipeline-pool,cloud.google.com/gke-os-distribution=cos,cloud.google.com/machine-family=n1,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,igal=king,kubernetes.io/arch=amd64,kubernetes.io/hostname=gke-dagster-omic-dat-immunai-pipeline-ca9c3a0a-60cw,kubernetes.io/os=linux,label1=single-sample-pipeline,node.kubernetes.io/instance-type=n1-standard-2,topology.gke.io/zone=us-central1-a,topology.kubernetes.io/region=us-central1,topology.kubernetes.io/zone=us-central1-a
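
A note besides the label: it is worth checking whether the requested resources can fit on that pool at all. n1-standard-2 nodes have 2 vCPUs and roughly 7.5 GiB of memory, so a request of cpu: 4000m / memory: 8000Mi can never be scheduled there regardless of the affinity. One way to compare, as a sketch (the node name is taken from the output above):
kubectl describe node gke-dagster-omic-dat-immunai-pipeline-ca9c3a0a-60cw | grep -A 7 Allocatable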

Andrea Giardini

02/10/2022, 10:41 AM
Can you post the full description of the pod?

Igal Dahan

02/10/2022, 10:41 AM
yes
kubectl describe po dagster-run-4cc9fae2-0ae5-45b3-9862-11278066c628-vhm6m
Name:           dagster-run-4cc9fae2-0ae5-45b3-9862-11278066c628-vhm6m
Namespace:      default
Priority:       0
Node:           <none>
Labels:         app.kubernetes.io/component=run_worker
                app.kubernetes.io/instance=dagster
                app.kubernetes.io/name=dagster
                app.kubernetes.io/part-of=dagster
                app.kubernetes.io/version=0.13.4
                controller-uid=cee1d7f5-67ae-416d-8ead-9e2f5d6f1b77
                job-name=dagster-run-4cc9fae2-0ae5-45b3-9862-11278066c628
Annotations:    <none>
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  Job/dagster-run-4cc9fae2-0ae5-45b3-9862-11278066c628
Containers:
  dagster:
    Image:      gcr.io/immunai-registry-hub/panacea-ai/immunai-product-single_sample_pipeline:feature-ssp-k8s
    Port:       <none>
    Host Port:  <none>
    Args:
      /usr/bin/python3
      -m
      dagster
      api
      execute_run
      {"__class__": "ExecuteRunArgs", "instance_ref": null, "pipeline_origin": {"__class__": "PipelinePythonOrigin", "pipeline_name": "staging_single_sample_job", "repository_origin": {"__class__": "RepositoryPythonOrigin", "code_pointer": {"__class__": "ModuleCodePointer", "fn_name": "staging_single_sample_repo", "module": "single_sample_pipeline"}, "container_image": "gcr.io/immunai-registry-hub/panacea-ai/immunai-product-single_sample_pipeline:feature-ssp-k8s", "executable_path": "/usr/bin/python3"}}, "pipeline_run_id": "4cc9fae2-0ae5-45b3-9862-11278066c628"}
    Limits:
      cpu:     4
      memory:  16000Mi
    Requests:
      cpu:     4
      memory:  8000Mi
    Environment Variables from:
      dagster-pipeline-env  ConfigMap  Optional: false
    Environment:
      DAGSTER_HOME:         /opt/dagster/dagster_home
      DAGSTER_PG_PASSWORD:  <set to the key 'postgresql-password' in secret 'dagster-postgresql-secret'>  Optional: false
      LD_LIBRARY_PATH:      
    Mounts:
      /opt/dagster/dagster_home/dagster.yaml from dagster-instance (rw,path="dagster.yaml")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j2chw (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  dagster-instance:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      dagster-instance
    Optional:  false
  kube-api-access-j2chw:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason             Age                  From                Message
  ----     ------             ----                 ----                -------
  Warning  FailedScheduling   41m                  default-scheduler   0/4 nodes are available: 1 Insufficient cpu, 1 Insufficient memory, 1 node(s) were unschedulable, 2 node(s) didn't match Pod's node affinity/selector.
  Warning  FailedScheduling   41m                  default-scheduler   0/4 nodes are available: 1 Insufficient cpu, 1 Insufficient memory, 1 node(s) were unschedulable, 2 node(s) didn't match Pod's node affinity/selector.
  Normal   NotTriggerScaleUp  36s (x241 over 41m)  cluster-autoscaler  pod didn't trigger scale-up: 1 node(s) didn't match Pod's node affinity/selector

Andrea Giardini

02/10/2022, 10:48 AM
I don't see the affinity in the pod description
I think you need to replace the camel case with the underscore syntax
nodeAffinity -> node_affinity
requiredDuringSchedulingIgnoredDuringExecution -> required_during_scheduling_ignored_during_execution
etc etc
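
For reference, a sketch of what that suggestion would look like for the pod_spec_config in the job tag, using snake_case keys (field names follow the kubernetes Python client models); whether dagster-k8s needs snake_case here or also accepts camelCase depends on the version:
# Sketch only: the same affinity expressed with snake_case keys
# (kubernetes Python client field names) instead of camelCase.
tags={
    'dagster-k8s/config': {
        'pod_spec_config': {
            'affinity': {
                'node_affinity': {
                    'required_during_scheduling_ignored_during_execution': {
                        'node_selector_terms': [
                            {
                                'match_expressions': [
                                    {
                                        'key': 'cloud.google.com/gke-nodepool',
                                        'operator': 'In',
                                        'values': ['immunai-pipeline-pool'],
                                    }
                                ]
                            }
                        ]
                    }
                }
            }
        },
    },
},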

Igal Dahan

02/10/2022, 10:50 AM
hmm
is it documented?
and what is camel case? (is it a phrase?)

Andrea Giardini

02/10/2022, 10:52 AM
is it documented?
https://docs.dagster.io/deployment/guides/kubernetes/customizing-your-deployment#job-or-op-kubernetes-configuration I see the docs here say exactly what you do... so I might be wrong
and what is camel case? (is it a phrase?)
https://en.wikipedia.org/wiki/Camel_case

Igal Dahan

02/10/2022, 10:53 AM
when we edit the pod we see:
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cloud.google.com/gke-nodepool
            operator: In
            values:
            - immunai-pipeline-pool

Andrea Giardini

02/10/2022, 10:54 AM
Weird... if the affinity is there and the node label is there it should work
what about
kubectl get pod $podname -o yaml
?
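
If the full manifest is too noisy, a jsonpath query against just the affinity field is a quick check (a sketch, assuming $podname holds the run pod's name):
kubectl get pod $podname -o jsonpath='{.spec.affinity}'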

Igal Dahan

02/10/2022, 11:05 AM
I also have another guy on the team who requested to join and is still pending; could you approve him? His email is eldan.hamdani@immunai.com

Andrea Giardini

02/10/2022, 11:06 AM
I am not sure what you are referring to...

Igal Dahan

02/10/2022, 11:07 AM
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2022-02-10T10:55:56Z"
  generateName: dagster-run-e47b5c8a-09a6-4052-bef9-d0e9f6cfab6f-
  labels:
    app.kubernetes.io/component: run_worker
    app.kubernetes.io/instance: dagster
    app.kubernetes.io/name: dagster
    app.kubernetes.io/part-of: dagster
    app.kubernetes.io/version: 0.13.4
    controller-uid: 4021f273-5ee1-432b-a31e-3d4e91197860
    job-name: dagster-run-e47b5c8a-09a6-4052-bef9-d0e9f6cfab6f
  name: dagster-run-e47b5c8a-09a6-4052-bef9-d0e9f6cfab6f-zpwh5
  namespace: default
  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: Job
    name: dagster-run-e47b5c8a-09a6-4052-bef9-d0e9f6cfab6f
    uid: 4021f273-5ee1-432b-a31e-3d4e91197860
  resourceVersion: "14284445"
  uid: 1fdf8d27-a718-490c-a521-d11a1597367c
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cloud.google.com/gke-nodepool
            operator: In
            values:
            - immunai-pipeline-pool
  containers:
  - args:
    - /usr/bin/python3
    - -m
    - dagster
    - api
    - execute_run
    - '{"__class__": "ExecuteRunArgs", "instance_ref": null, "pipeline_origin": {"__class__":
      "PipelinePythonOrigin", "pipeline_name": "staging_single_sample_job", "repository_origin":
      {"__class__": "RepositoryPythonOrigin", "code_pointer": {"__class__": "ModuleCodePointer",
      "fn_name": "staging_single_sample_repo", "module": "single_sample_pipeline"},
      "container_image": "gcr.io/immunai-registry-hub/panacea-ai/immunai-product-single_sample_pipeline:feature-ssp-k8s",
      "executable_path": "/usr/bin/python3"}}, "pipeline_run_id": "e47b5c8a-09a6-4052-bef9-d0e9f6cfab6f"}'
    env:
    - name: DAGSTER_HOME
      value: /opt/dagster/dagster_home
    - name: DAGSTER_PG_PASSWORD
      valueFrom:
        secretKeyRef:
          key: postgresql-password
          name: dagster-postgresql-secret
    - name: LD_LIBRARY_PATH
    envFrom:
    - configMapRef:
        name: dagster-pipeline-env
    image: gcr.io/immunai-registry-hub/panacea-ai/immunai-product-single_sample_pipeline:feature-ssp-k8s
    imagePullPolicy: Always
    name: dagster
    resources:
      limits:
        cpu: "4"
        memory: 16000Mi
      requests:
        cpu: "4"
        memory: 8000Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /opt/dagster/dagster_home/dagster.yaml
      name: dagster-instance
      subPath: dagster.yaml
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-wch7g
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: gcr-json-key
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: dagster
  serviceAccountName: dagster
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - configMap:
      defaultMode: 420
      name: dagster-instance
    name: dagster-instance
  - name: kube-api-access-wch7g
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-02-10T10:55:56Z"
    message: '0/4 nodes are available: 1 Insufficient cpu, 1 Insufficient memory,
      1 node(s) were unschedulable, 2 node(s) didn''t match Pod''s node affinity/selector.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: Burstable
a guy who wants to join this Slack channel
I was not clear, sorry.

Andrea Giardini

02/10/2022, 11:13 AM
I have no control over the Slack channel; I am just a member of the community. The Slack workspace and channel are free to join.

Igal Dahan

02/10/2022, 11:14 AM
I see, now it's working for us; the resources should fit.
thanks for the help
🍾
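
For reference, a sketch of the same tag with requests sized to fit the pool's n1-standard-2 nodes (roughly 2 vCPUs / 7.5 GiB per node); the numbers below are illustrative, not values from the thread:
tags={
    'dagster-k8s/config': {
        'container_config': {
            'resources': {
                # Illustrative sizes: requests must stay within a single node's allocatable CPU/memory.
                'requests': {'cpu': '1000m', 'memory': '3000Mi'},
                'limits': {'cpu': '2000m', 'memory': '5000Mi'},
            }
        },
        'pod_spec_config': {
            'affinity': {
                'nodeAffinity': {
                    'requiredDuringSchedulingIgnoredDuringExecution': {
                        'nodeSelectorTerms': [
                            {
                                'matchExpressions': [
                                    {
                                        'key': 'cloud.google.com/gke-nodepool',
                                        'operator': 'In',
                                        'values': ['immunai-pipeline-pool'],
                                    }
                                ]
                            }
                        ]
                    }
                }
            }
        },
    },
},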

Andrea Giardini

02/10/2022, 11:15 AM
Good to know, let me know in case you need additional help 🙂 I've been working with Dagster for some time and I am looking for more projects with it