# dagster-plus
a
Getting this while trying to add a code location in hybrid mode using GKE - has anyone faced this before?
d
Hey Abhishek - can you try running 'kubectl describe job' on the job being referenced there (spina-prod-xxx) and see if any errors jump out from that description? If there's a pod with a similar name in the cluster, running 'kubectl describe pod' on that pod or checking its logs can also explain why it's having trouble starting up
one common source of errors here is not having the dagster-cloud package installed in the image that's running your code, but usually there's some clue in the logs that explains what's going on
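For reference, those commands look something like this (the job and pod names here are the ones that appear later in this thread - substitute whatever your error message references):
Copy code
# describe the job the error message points at
kubectl describe job spina-prod-1a8182 --namespace dagster-cloud

# list the pods in the namespace, then describe the matching one
kubectl get pods --namespace dagster-cloud
kubectl describe pod spina-prod-1a8182-7b8cc7b94-pmtpr --namespace dagster-cloud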
a
thanks a ton Daniel. Should I just add the install command for dagster-cloud to my requirements file and then re-create the Docker image?
Copy code
abhishek_agrawal@cloudshell:~ (mutinex-dev)$ kubectl describe pod spina-prod-1a8182-7b8cc7b94-pmtpr --namespace dagster-cloud
Name:             spina-prod-1a8182-7b8cc7b94-pmtpr
Namespace:        dagster-cloud
Priority:         0
Service Account:  user-cloud-dagster-cloud-agent
Node:             gk3-dagster-default-pool-12146bc5-c3j4/10.192.0.14
Start Time:       Wed, 22 Feb 2023 03:04:56 +0000
Labels:           pod-template-hash=7b8cc7b94
                  user-deployment=spina-prod-1a8182
Annotations:      seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:           Running
IP:               10.107.0.17
IPs:
  IP:           10.107.0.17
Controlled By:  ReplicaSet/spina-prod-1a8182-7b8cc7b94
Containers:
  dagster:
    Container ID:  containerd://84995ab668de9ac13526168ec51b4cb41878dd7cfdfbd9564dedf21746985486
    Image:         gcr.io/mutinex-dev/spina
    Image ID:      gcr.io/mutinex-dev/spina@sha256:981f467b395c32fb466ea7dbe9c7f8ebe5fb6e2ea47d96c8e787e120756ed33f
    Port:          <none>
    Host Port:     <none>
    Args:
      dagster
      api
      grpc
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 22 Feb 2023 03:08:58 +0000
      Finished:     Wed, 22 Feb 2023 03:08:58 +0000
    Ready:          False
    Restart Count:  5
    Limits:
      cpu:                250m
      ephemeral-storage:  1Gi
      memory:             512Mi
    Requests:
      cpu:                250m
      ephemeral-storage:  1Gi
      memory:             512Mi
    Environment Variables from:
      dagster-cloud-agent-token  Secret  Optional: false
    Environment:
      DAGSTER_LOCATION_NAME:                     spina
      DAGSTER_INJECT_ENV_VARS_FROM_INSTANCE:     1
      DAGSTER_CLI_API_GRPC_LAZY_LOAD_USER_CODE:  1
      DAGSTER_CLI_API_GRPC_HOST:                 0.0.0.0
      DAGSTER_INSTANCE_REF:                      {"__class__": "InstanceRef", "compute_logs_data": {"__class__": "ConfigurableClassData", "class_name": "CloudComputeLogManager", "config_yaml": "{}\n", "module_name": "dagster_cloud.storage.compute_logs"}, "custom_instance_class_data": {"__class__": "ConfigurableClassData", "class_name": "DagsterCloudAgentInstance", "config_yaml": "dagster_cloud_api:\n  agent_token:\n    env: DAGSTER_CLOUD_AGENT_TOKEN\n  deployment: prod\nuser_code_launcher:\n  class: K8sUserCodeLauncher\n  config:\n    dagster_home: /opt/dagster/dagster_home\n    env_secrets:\n    - dagster-cloud-agent-token\n    instance_config_map: user-cloud-dagster-cloud-agent-instance\n    namespace: dagster-cloud\n    pull_policy: Always\n    service_account_name: user-cloud-dagster-cloud-agent\n  module: dagster_cloud.workspace.kubernetes\n", "module_name": "dagster_cloud.instance"}, "event_storage_data": {"__class__": "ConfigurableClassData", "class_name": "GraphQLEventLogStorage", "config_yaml": "{}\n", "module_name": "dagster_cloud.storage.event_logs"}, "local_artifact_storage_data": {"__class__": "ConfigurableClassData", "class_name": "LocalArtifactStorage", "config_yaml": "base_dir: /opt/dagster/dagster_home\n", "module_name": "dagster.core.storage.root"}, "run_coordinator_data": {"__class__": "ConfigurableClassData", "class_name": "DefaultRunCoordinator", "config_yaml": "{}\n", "module_name": "dagster.core.run_coordinator"}, "run_launcher_data": {"__class__": "ConfigurableClassData", "class_name": "DefaultRunLauncher", "config_yaml": "{}\n", "module_name": "dagster"}, "run_storage_data": {"__class__": "ConfigurableClassData", "class_name": "GraphQLRunStorage", "config_yaml": "{}\n", "module_name": "dagster_cloud.storage.runs"}, "schedule_storage_data": {"__class__": "ConfigurableClassData", "class_name": "GraphQLScheduleStorage", "config_yaml": "{}\n", "module_name": "dagster_cloud.storage.schedules"}, "scheduler_data": {"__class__": "ConfigurableClassData", "class_name": "DagsterDaemonScheduler", "config_yaml": "{}\n", "module_name": "dagster.core.scheduler"}, "secrets_loader_data": {"__class__": "ConfigurableClassData", "class_name": "DagsterCloudSecretsLoader", "config_yaml": "{}\n", "module_name": "dagster_cloud.secrets"}, "settings": {}, "storage_data": {"__class__": "ConfigurableClassData", "class_name": "CompositeStorage", "config_yaml": "event_log_storage:\n  class_name: GraphQLEventLogStorage\n  config_yaml: '{}\n\n    '\n  module_name: dagster_cloud.storage.event_logs\nrun_storage:\n  class_name: GraphQLRunStorage\n  config_yaml: '{}\n\n    '\n  module_name: dagster_cloud.storage.runs\nschedule_storage:\n  class_name: GraphQLScheduleStorage\n  config_yaml: '{}\n\n    '\n  module_name: dagster_cloud.storage.schedules\n", "module_name": "dagster.core.storage.legacy_storage"}}
      DAGSTER_CLI_API_GRPC_PORT:                 4000
      DAGSTER_CURRENT_IMAGE:                     gcr.io/mutinex-dev/spina
      DAGSTER_CLI_API_GRPC_PACKAGE_NAME:         spina
      DAGSTER_CLOUD_DEPLOYMENT_NAME:             prod
      DAGSTER_CLOUD_LOCATION_NAME:               spina
      DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT:        0
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d7m6k (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-d7m6k:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Guaranteed
Node-Selectors:              <none>
Tolerations:                 kubernetes.io/arch=amd64:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From                                   Message
  ----     ------     ----                   ----                                   -------
  Normal   Scheduled  6m1s                   gke.io/optimize-utilization-scheduler  Successfully assigned dagster-cloud/spina-prod-1a8182-7b8cc7b94-pmtpr to gk3-dagster-default-pool-12146bc5-c3j4
  Normal   Pulled     5m14s                  kubelet                                Successfully pulled image "gcr.io/mutinex-dev/spina" in 44.158385722s
  Normal   Pulled     5m9s                   kubelet                                Successfully pulled image "gcr.io/mutinex-dev/spina" in 1.689241565s
  Normal   Pulled     4m51s                  kubelet                                Successfully pulled image "gcr.io/mutinex-dev/spina" in 1.717180699s
  Normal   Created    4m22s (x4 over 5m14s)  kubelet                                Created container dagster
  Normal   Pulled     4m22s                  kubelet                                Successfully pulled image "gcr.io/mutinex-dev/spina" in 1.676467814s
  Normal   Started    4m20s (x4 over 5m13s)  kubelet                                Started container dagster
  Normal   Pulling    3m26s (x5 over 5m58s)  kubelet                                Pulling image "gcr.io/mutinex-dev/spina"
  Normal   Pulled     3m24s                  kubelet                                Successfully pulled image "gcr.io/mutinex-dev/spina" in 1.719730509s
  Warning  BackOff    57s (x19 over 5m6s)    kubelet                                Back-off restarting failed container
d
yeah, pip install dagster-cloud alongside dagster. Did the logs for the task indicate that it was having trouble importing the dagster_cloud module?
er logs for the pod, rather
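For the requirements change, something like this should do it (assuming spina_requirements.txt is the file your image installs from, as in the Dockerfile later in this thread):
Copy code
# add dagster-cloud alongside dagster in the requirements file
echo "dagster-cloud" >> spina_requirements.txt
# sanity-check locally that everything resolves before rebuilding the image
pip3 install -r spina_requirements.txt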
a
I could not see it from the describe output
ah okay
let me check
d
We're planning to include those pod logs automatically in Dagit when this happens, instead of that cryptic timeout message - that should hopefully make this less confusing in the future
a
[attached screenshot: image.png]
how do I see the logs?
sorry I'm new to k8s and learning as I go..
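(For the logs question: kubectl logs against the same pod name and namespace as the describe command above. For a pod stuck in CrashLoopBackOff, the --previous flag shows the output of the last crashed attempt, which is usually the useful part:)
Copy code
kubectl logs spina-prod-1a8182-7b8cc7b94-pmtpr --namespace dagster-cloud
# if the container keeps restarting, look at the previous attempt's output
kubectl logs spina-prod-1a8182-7b8cc7b94-pmtpr --namespace dagster-cloud --previous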
d
ah that looks different actually! one sec
How did you build the image? If it was on a Mac you may need to rebuild it with
Copy code
docker build --platform linux/amd64
a
I built the image on my local machine and then pushed it to GCR.. I can do it again after adding the dagster-cloud install command
d
'exec format error' there usually indicates that the image was built for a different system architecture than the one it's running on
a
hmm that's strange.. I will try with dagster-cloud first and see how I go.. I also think that may be a reason for this failure
d
I think if it was the dagster-cloud thing there would be a different error related to not being able to import that package
a
ohh
that exec error came when I tried to get the logs.. the describe response didn't say much about the error though, right?
in terms of the image, it's just built on my local machine and then pushed to GCR
d
I think the exec error is the contents of the logs - it’s saying the pod couldn’t start up because the image wasn’t built in a way that could be executed in the cluster
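One way to sanity-check that locally (this inspects the image on your machine, not the copy already pushed to GCR):
Copy code
# should print linux/amd64 for this GKE cluster; linux/arm64 means an Apple-silicon build
docker image inspect --format '{{.Os}}/{{.Architecture}}' gcr.io/mutinex-dev/spina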
a
ooh
d
Try building it with the flag I posted earlier to match the architecture of the k8s cluster
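Putting it together with the image tag from your config (Apple-silicon Macs produce arm64 images by default, and the node in the describe output above is amd64):
Copy code
docker build --platform linux/amd64 -t gcr.io/mutinex-dev/spina .
docker push gcr.io/mutinex-dev/spina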
a
got it.. will do it now
while I do it.. here's the config file when I'm adding the code locations -
Copy code
location_name: spina
image: gcr.io/mutinex-dev/spina
code_source:
  package_name: spina
container_context:
  k8s:
    image_pull_policy: Always
    namespace: dagster-cloud
    run_k8s_config:
      container_config:
        resources:
          limits:
            cpu: 500m
            memory: 1024Mi
      pod_spec_config:
        node_selector:
          disktype: ssd
    server_k8s_config:
      container_config:
        resources:
          limits:
            cpu: 100m
            memory: 128Mi
and here's the Dockerfile, which is pretty straightforward..
Copy code
FROM python:3.9.0

WORKDIR /app

# copy the requirements file on its own first so the pip install layer
# below is cached across code-only rebuilds
COPY spina_requirements.txt spina_requirements.txt

RUN pip3 install -r spina_requirements.txt

# copy the rest of the project into the image
COPY . .
Do these look okay?
d
that looks OK yeah - I would suggest python:3.9 over python:3.9.0 so that you get the latest version with security patches
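i.e. only the base image line would change:
Copy code
FROM python:3.9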
a
thanks a ton @daniel! It works but now I need to solve something internally.. we were using JSON key files for GCP authentication so it's looking for that file..
d
a
Hmmm that's pretty easy. I will check with my infra guys. Last I checked, they were against using JSON key files. I had a k8s-related question: for the print() statements I have in my code, is it possible to view them in a k8s pod somewhere? Would you know how to access those?
d
Is it possible to make a new post for new questions? Our support on-call will see them and get them answered. For that specific question, the answer may depend on where exactly the print statement is
a
Done