Abhishek Agrawal

02/22/2023, 3:13 AM
Getting this while trying to add a code location in hybrid mode using GKE - Anyone faced this before?

daniel

02/22/2023, 3:16 AM
Hey Abhishek - can you try running 'kubectl describe job' on the job being referenced there (spina-prod-xxx) and see if any errors jump out from that description? If there's a pod with a similar name in the cluster, then running 'kubectl describe pod' on that pod or checking its logs can also explain why it's having trouble starting up
one common source of errors here is not having the dagster-cloud package installed in the image that's running your code, but usually there's some clue in the logs there that explains what's going on
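A minimal sketch of the inspection steps described above, assuming the pod name and namespace that appear later in this thread. The `debug_cmds` helper is hypothetical and only prints the commands, so it can be run without cluster access. `kubectl describe pod`, `kubectl logs`, and the `--previous` flag (logs from the last terminated container) are standard kubectl.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the kubectl commands for
# inspecting a pod that fails to start.
debug_cmds() {
  pod="$1"
  ns="$2"
  # Pod status, restart count, and recent events
  echo "kubectl describe pod $pod --namespace $ns"
  # Output of the current container
  echo "kubectl logs $pod --namespace $ns"
  # Output of the previous (crashed) container, useful in CrashLoopBackOff
  echo "kubectl logs $pod --namespace $ns --previous"
}

debug_cmds spina-prod-1a8182-7b8cc7b94-pmtpr dagster-cloud
```

Running the printed commands in order usually narrows things down: the describe output shows the container state and events, while the logs show what the process itself printed before exiting.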

Abhishek Agrawal

02/22/2023, 3:46 AM
thanks a ton Daniel. Should I just add the install command for dagster cloud in my requirements file and then re-create the docker image?
abhishek_agrawal@cloudshell:~ (mutinex-dev)$ kubectl describe pod spina-prod-1a8182-7b8cc7b94-pmtpr --namespace dagster-cloud
Name:             spina-prod-1a8182-7b8cc7b94-pmtpr
Namespace:        dagster-cloud
Priority:         0
Service Account:  user-cloud-dagster-cloud-agent
Node:             gk3-dagster-default-pool-12146bc5-c3j4/10.192.0.14
Start Time:       Wed, 22 Feb 2023 03:04:56 +0000
Labels:           pod-template-hash=7b8cc7b94
                  user-deployment=spina-prod-1a8182
Annotations:      seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:           Running
IP:               10.107.0.17
IPs:
  IP:           10.107.0.17
Controlled By:  ReplicaSet/spina-prod-1a8182-7b8cc7b94
Containers:
  dagster:
    Container ID:  containerd://84995ab668de9ac13526168ec51b4cb41878dd7cfdfbd9564dedf21746985486
    Image:         gcr.io/mutinex-dev/spina
    Image ID:      gcr.io/mutinex-dev/spina@sha256:981f467b395c32fb466ea7dbe9c7f8ebe5fb6e2ea47d96c8e787e120756ed33f
    Port:          <none>
    Host Port:     <none>
    Args:
      dagster
      api
      grpc
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 22 Feb 2023 03:08:58 +0000
      Finished:     Wed, 22 Feb 2023 03:08:58 +0000
    Ready:          False
    Restart Count:  5
    Limits:
      cpu:                250m
      ephemeral-storage:  1Gi
      memory:             512Mi
    Requests:
      cpu:                250m
      ephemeral-storage:  1Gi
      memory:             512Mi
    Environment Variables from:
      dagster-cloud-agent-token  Secret  Optional: false
    Environment:
      DAGSTER_LOCATION_NAME:                     spina
      DAGSTER_INJECT_ENV_VARS_FROM_INSTANCE:     1
      DAGSTER_CLI_API_GRPC_LAZY_LOAD_USER_CODE:  1
      DAGSTER_CLI_API_GRPC_HOST:                 0.0.0.0
      DAGSTER_INSTANCE_REF:                      {"__class__": "InstanceRef", "compute_logs_data": {"__class__": "ConfigurableClassData", "class_name": "CloudComputeLogManager", "config_yaml": "{}\n", "module_name": "dagster_cloud.storage.compute_logs"}, "custom_instance_class_data": {"__class__": "ConfigurableClassData", "class_name": "DagsterCloudAgentInstance", "config_yaml": "dagster_cloud_api:\n  agent_token:\n    env: DAGSTER_CLOUD_AGENT_TOKEN\n  deployment: prod\nuser_code_launcher:\n  class: K8sUserCodeLauncher\n  config:\n    dagster_home: /opt/dagster/dagster_home\n    env_secrets:\n    - dagster-cloud-agent-token\n    instance_config_map: user-cloud-dagster-cloud-agent-instance\n    namespace: dagster-cloud\n    pull_policy: Always\n    service_account_name: user-cloud-dagster-cloud-agent\n  module: dagster_cloud.workspace.kubernetes\n", "module_name": "dagster_cloud.instance"}, "event_storage_data": {"__class__": "ConfigurableClassData", "class_name": "GraphQLEventLogStorage", "config_yaml": "{}\n", "module_name": "dagster_cloud.storage.event_logs"}, "local_artifact_storage_data": {"__class__": "ConfigurableClassData", "class_name": "LocalArtifactStorage", "config_yaml": "base_dir: /opt/dagster/dagster_home\n", "module_name": "dagster.core.storage.root"}, "run_coordinator_data": {"__class__": "ConfigurableClassData", "class_name": "DefaultRunCoordinator", "config_yaml": "{}\n", "module_name": "dagster.core.run_coordinator"}, "run_launcher_data": {"__class__": "ConfigurableClassData", "class_name": "DefaultRunLauncher", "config_yaml": "{}\n", "module_name": "dagster"}, "run_storage_data": {"__class__": "ConfigurableClassData", "class_name": "GraphQLRunStorage", "config_yaml": "{}\n", "module_name": "dagster_cloud.storage.runs"}, "schedule_storage_data": {"__class__": "ConfigurableClassData", "class_name": "GraphQLScheduleStorage", "config_yaml": "{}\n", "module_name": "dagster_cloud.storage.schedules"}, "scheduler_data": {"__class__": 
"ConfigurableClassData", "class_name": "DagsterDaemonScheduler", "config_yaml": "{}\n", "module_name": "dagster.core.scheduler"}, "secrets_loader_data": {"__class__": "ConfigurableClassData", "class_name": "DagsterCloudSecretsLoader", "config_yaml": "{}\n", "module_name": "dagster_cloud.secrets"}, "settings": {}, "storage_data": {"__class__": "ConfigurableClassData", "class_name": "CompositeStorage", "config_yaml": "event_log_storage:\n  class_name: GraphQLEventLogStorage\n  config_yaml: '{}\n\n    '\n  module_name: dagster_cloud.storage.event_logs\nrun_storage:\n  class_name: GraphQLRunStorage\n  config_yaml: '{}\n\n    '\n  module_name: dagster_cloud.storage.runs\nschedule_storage:\n  class_name: GraphQLScheduleStorage\n  config_yaml: '{}\n\n    '\n  module_name: dagster_cloud.storage.schedules\n", "module_name": "dagster.core.storage.legacy_storage"}}
      DAGSTER_CLI_API_GRPC_PORT:                 4000
      DAGSTER_CURRENT_IMAGE:                     gcr.io/mutinex-dev/spina
      DAGSTER_CLI_API_GRPC_PACKAGE_NAME:         spina
      DAGSTER_CLOUD_DEPLOYMENT_NAME:             prod
      DAGSTER_CLOUD_LOCATION_NAME:               spina
      DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT:        0
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d7m6k (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-d7m6k:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Guaranteed
Node-Selectors:              <none>
Tolerations:                 kubernetes.io/arch=amd64:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From                                   Message
  ----     ------     ----                   ----                                   -------
  Normal   Scheduled  6m1s                   gke.io/optimize-utilization-scheduler  Successfully assigned dagster-cloud/spina-prod-1a8182-7b8cc7b94-pmtpr to gk3-dagster-default-pool-12146bc5-c3j4
  Normal   Pulled     5m14s                  kubelet                                Successfully pulled image "gcr.io/mutinex-dev/spina" in 44.158385722s
  Normal   Pulled     5m9s                   kubelet                                Successfully pulled image "gcr.io/mutinex-dev/spina" in 1.689241565s
  Normal   Pulled     4m51s                  kubelet                                Successfully pulled image "gcr.io/mutinex-dev/spina" in 1.717180699s
  Normal   Created    4m22s (x4 over 5m14s)  kubelet                                Created container dagster
  Normal   Pulled     4m22s                  kubelet                                Successfully pulled image "gcr.io/mutinex-dev/spina" in 1.676467814s
  Normal   Started    4m20s (x4 over 5m13s)  kubelet                                Started container dagster
  Normal   Pulling    3m26s (x5 over 5m58s)  kubelet                                Pulling image "gcr.io/mutinex-dev/spina"
  Normal   Pulled     3m24s                  kubelet                                Successfully pulled image "gcr.io/mutinex-dev/spina" in 1.719730509s
  Warning  BackOff    57s (x19 over 5m6s)    kubelet                                Back-off restarting failed container

daniel

02/22/2023, 3:46 AM
yeah, pip install dagster-cloud alongside dagster. Did the logs for the task indicate that it was having trouble importing the dagster_cloud module?
er logs for the pod, rather

Abhishek Agrawal

02/22/2023, 3:46 AM
I could not see it from the describe
ah okay
let me check

daniel

02/22/2023, 3:47 AM
We're planning to include those pod logs automatically in dagit when this happens instead of that cryptic timeout message, which should hopefully make this less confusing in the future

Abhishek Agrawal

02/22/2023, 3:48 AM
image.png
how do I see the logs?
sorry I'm new to k8s and learning as I go..

daniel

02/22/2023, 3:49 AM
ah that looks different actually! one sec
How did you build the image? If it was on a mac you may need to rebuild it with
docker build --platform linux/amd64

Abhishek Agrawal

02/22/2023, 3:50 AM
I built the image on my local and then I pushed to gcr.. I can do it again after adding dagster cloud install command

daniel

02/22/2023, 3:50 AM
'exec format error' there usually indicates that it's trying to use a different system architecture than the one it was built on
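As a rough sketch of how to check for this mismatch before pushing, assuming the cluster nodes are amd64 (which the `--platform linux/amd64` suggestion and the `kubernetes.io/arch=amd64` toleration in the pod description both point to): the `docker_platform` helper below is hypothetical and simply maps the output of `uname -m` on the build machine to a Docker platform string.

```shell
#!/bin/sh
# Hypothetical helper: maps a `uname -m` value to a Docker platform string.
docker_platform() {
  case "$1" in
    x86_64|amd64) echo "linux/amd64" ;;
    arm64|aarch64) echo "linux/arm64" ;;
    *) echo "unknown" ;;
  esac
}

# Assumption: the cluster nodes are amd64.
target="linux/amd64"
host="$(docker_platform "$(uname -m)")"

if [ "$host" = "$target" ]; then
  echo "host is $host: a plain docker build already matches the cluster"
else
  echo "host is $host: rebuild with docker build --platform $target"
fi
```

On an Apple Silicon Mac, `uname -m` reports `arm64`, so an image built without the flag targets `linux/arm64` by default and fails with 'exec format error' on amd64 nodes. Rebuilding with something like `docker build --platform linux/amd64 -t gcr.io/mutinex-dev/spina .` and pushing again should produce an image the cluster can run.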

Abhishek Agrawal

02/22/2023, 3:52 AM
hmm that's strange.. I will try with dagster-cloud first and see how I go.. I also think that it may be a reason for this failure

daniel

02/22/2023, 3:56 AM
I think if it was the dagster-cloud thing there would be a different error there related to not being able to import that package

Abhishek Agrawal

02/22/2023, 3:56 AM
ohh
that exec error came when I tried to get the logs.. the describe response didn't say much about the error though, right?
in terms of image, it's just built on my local and then pushed to gcr

daniel

02/22/2023, 3:57 AM
I think the exec error is the contents of the logs - it’s saying the pod couldn’t start up because the image wasn’t built in a way that could be executed in the cluster

Abhishek Agrawal

02/22/2023, 3:58 AM
ooh

daniel

02/22/2023, 3:58 AM
Try building it with the flag I posted earlier to match the architecture of the k8s cluster

Abhishek Agrawal

02/22/2023, 3:58 AM
got it.. will do it now
while I do it.. here's the config file when I'm adding the code locations -
location_name: spina
image: gcr.io/mutinex-dev/spina
code_source:
  package_name: spina
container_context:
  k8s:
    image_pull_policy: Always
    namespace: dagster-cloud
    run_k8s_config:
      container_config:
        resources:
          limits:
            cpu: 500m
            memory: 1024Mi
      pod_spec_config:
        node_selector:
          disktype: ssd
    server_k8s_config:
      container_config:
        resources:
          limits:
            cpu: 100m
            memory: 128Mi
and here's the Dockerfile which is pretty straightforward..
FROM python:3.9.0

WORKDIR /app

COPY spina_requirements.txt spina_requirements.txt

ADD . .

RUN pip3 install -r spina_requirements.txt
Do these look okay?

daniel

02/22/2023, 4:46 AM
that looks OK yeah - I would suggest python:3.9 over python:3.9.0 so that you get the latest version with security patches

Abhishek Agrawal

02/22/2023, 5:09 AM
thanks a ton @daniel! It works but now I need to solve something internally.. we were using JSON key files for GCP authentication so it's looking for that file..

daniel

02/22/2023, 12:44 PM

Abhishek Agrawal

02/22/2023, 12:52 PM
Hmmm that's pretty easy. I will check with my infra guys. Last I checked, they were against using JSON key files. I had a k8s-related question: for the print() statements I have in my code, is it possible to view them in the k8s pod somewhere? Would you know how to access those?

daniel

02/22/2023, 1:09 PM
Is it possible to make a new post for new questions? Our support on-call will see them and get them answered. For that specific question, the answer may depend on where exactly the print statement is

Abhishek Agrawal

02/22/2023, 1:25 PM
Done