# deployment-kubernetes
j
It's me again. I am working on the dual Helm chart deploy (infrastructure managed by me, user workspaces managed by the analytics team). I have it working with `dagit.workspace.enabled: true` and the servers specified, but I want the other team to be able to add/remove servers without me being a bottleneck, so I want to move the server information to an `externalConfigmap` that they can manage and deploy as they develop and add new servers. I am testing by just deploying a ConfigMap that looks like the one made by the Helm chart, e.g.:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dagster-workspace
  namespace: dagster-cd
data:
  workspace.yaml: |
    load_from:
      - grpc_server:
        host: "k8s-example-user-code-1"
        port: 3030
        location_name: "user-code-example"
```
and setting `externalConfigmap: "dagster-workspace"` in the dagit values, but that is not working and I'm getting CrashLoopBackOffs. The issue appears to be that the deployment is still trying to mount the `dagster-workspace-yaml` ConfigMap that would be made if I had information entered in `servers`, and isn't being told to load the ConfigMap I'm passing in at `externalConfigmap`.
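For context, this is the shape of the externally managed ConfigMap being aimed for; a sketch (the second server name is hypothetical) of how the analytics team could add locations over time, using the nesting the workspace schema expects:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dagster-workspace          # referenced by dagit.workspace.externalConfigmap
  namespace: dagster-cd
data:
  workspace.yaml: |
    load_from:
      - grpc_server:
          host: "k8s-example-user-code-1"
          port: 3030
          location_name: "user-code-example"
      - grpc_server:                        # hypothetical second server the
          host: "analytics-user-code-2"     # analytics team could add later
          port: 3030
          location_name: "analytics-example"
```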
d
Hi Jakub - any chance you could pass along your values.yaml? Especially the workspace part?
Oh i see - when you say "the deployment is still trying to mount the dagster-workspace-yaml configmap" - "the deployment" here is the user code deployment?
j
yeah, when I describe the pod that is crashing it is still trying to mount `dagster-workspace-yaml`, which doesn't exist because I told it not to create it
d
I'm confused why the user code deployment would need to mount that in the first place
user code shouldn't need the workspace.yaml file
what's the name of the pod that's failing?
j
dagster-dagit-5fc6f96447-k6h99
the daemon is also failing
d
Ah that's dagit, not the user code deployment
ok, yeah, can you send over the relevant bits of the values.yaml for the workspace?
j
yeah, the problem is it can't pick up the workspace ConfigMap, I think
sure
d
This is the relevant bit of the Helm chart:
```
{{- define "dagit.workspace.configmapName" -}}
{{- $dagitWorkspace := .Values.dagit.workspace }}
{{- if and $dagitWorkspace.enabled $dagitWorkspace.externalConfigmap }}
{{- $dagitWorkspace.externalConfigmap -}}
{{- else -}}
{{ template "dagster.fullname" . }}-workspace-yaml
{{- end -}}
{{- end -}}
```
for whatever reason we're hitting the else it seems when we want to be hitting the if
Dunno if anything jumps out reading that - hopefully it'll be apparent from the values.yaml
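For what it's worth, that helper only resolves the ConfigMap name. A hypothetical sketch of the rendered dagit Deployment spec (an assumption about the chart's structure, not its exact source) shows why the mounted volume can keep the `dagster-workspace-yaml` name even when a different ConfigMap backs it:

```yaml
# Hypothetical excerpt of the rendered dagit Deployment spec: the volume name
# is fixed, while the ConfigMap reference is whatever dagit.workspace.configmapName
# resolves to - the external name when externalConfigmap is set.
spec:
  volumes:
    - name: dagster-workspace-yaml        # fixed volume name, regardless of source
      configMap:
        name: dagster-workspace           # helper output when externalConfigmap is set
  containers:
    - name: dagster
      volumeMounts:
        - name: dagster-workspace-yaml
          mountPath: /dagster-workspace/
```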
j
In the Dagster Helm values, from the deploy with the dagster/dagster chart:
```yaml
workspace:
    enabled: true
    servers: []
    externalConfigmap: "dagster-workspace"
...

dagster-user-deployments:
  enabled: true
  enableSubchart: false
```
Independently deployed ConfigMap:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dagster-workspace
  namespace: dagster-cd
data:
  workspace.yaml: |
    load_from:
      - grpc_server:
        host: "k8s-example-user-code-1"
        port: 3030
        location_name: "user-code-example"
```
Second Helm deployment, with the dagster/dagster-user-deployments chart:
```yaml
deployments:
  - name: "k8s-example-user-code-1"
    image:
      repository: "docker.io/dagster/user-code-example"
      tag: ~
      pullPolicy: Always
    dagsterApiGrpcArgs:
      - "-f"
      - "/example_project/example_repo/repo.py"
    port: 3030
```
still just hello-worlding it over here, trying to get things in the place I want them
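As an aside, assuming the user-deployments chart exposes each server behind a Service named after the entry in `deployments` (the convention the default chart values follow), these are the fields that have to line up between the two deploys; an illustrative sketch:

```yaml
# Illustrative correspondence between the dagster-user-deployments values and
# the externally managed workspace ConfigMap (names taken from the pastes above):
#   deployments[0].name  "k8s-example-user-code-1"  ->  grpc_server.host
#   deployments[0].port  3030                       ->  grpc_server.port
load_from:
  - grpc_server:
      host: "k8s-example-user-code-1"   # Service created for the user code deployment
      port: 3030                        # gRPC port the server listens on
      location_name: "user-code-example"
```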
d
and double-checking: workspace there is under dagit, right?
j
I'm not sure I understand.
I'll edit the last block
d
it should look like this (your paste left out the `dagit` part):
```yaml
dagit:
  workspace:
    enabled: true
    servers: []
    externalConfigmap: "dagster-workspace"
```
j
yes, correct
that first block is from the "main" dagster deploy
d
is it also creating a configmap called dagster-workspace-yaml?
(incorrectly)
j
It is not. I see that go away when I deploy with `servers: []`.
d
Can you post the full text of the error you're seeing?
er actually hang on i think i see it
j
redeploying; I've been messing around in the deployed manifest, trying some things out...
d
seems like an oversight in the PR that added the externalConfigmap stuff: https://github.com/dagster-io/dagster/pull/7882/files - funny enough, I think it might work if you still name your custom ConfigMap "dagster-workspace-yaml" - we can put out a quick fix for that
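Concretely, the workaround being floated here is to rename the externally managed ConfigMap to the fallback name the helper renders (with a release named `dagster`, that's `dagster-workspace-yaml`); a minimal sketch:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dagster-workspace-yaml   # the fallback name the helper renders for this release
  namespace: dagster-cd
data:
  workspace.yaml: |
    # ... same workspace contents as before ...
```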
j
hmm. trying it with dagster-workspace-yaml
no, still failing.
d
Hmm actually I may have misdiagnosed, yeah. Can I have that full error message after all?
j
so there really isn't an error message. it is failing readiness checks.
d
I think the fact that it's referencing dagster-workspace-yaml may be unrelated, for the same reason I was confused - it still creates a volume called dagster-workspace-yaml that's different from the ConfigMap name
What does the readiness check failure say, if anything?
j
connection refused
```
Readiness probe failed: Get "http://172.22.89.219:80/dagit_info": dial tcp 172.22.89.219:80: connect: connection refused
```
d
Ok how about the output of kubectl describe on the dagit pod that's crashing?
Just to double-check, is there possibly a missing indent in your workspace.yaml? Instead of:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dagster-workspace
  namespace: dagster-cd
data:
  workspace.yaml: |
    load_from:
      - grpc_server:
        host: "k8s-example-user-code-1"
        port: 3030
        location_name: "user-code-example"
```
try
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dagster-workspace
  namespace: dagster-cd
data:
  workspace.yaml: |
    load_from:
      - grpc_server:
          host: "k8s-example-user-code-1"
          port: 3030
          location_name: "user-code-example"
```
(maybe unrelated to the crash loop)
j
```
Name:             dagster-dagit-7cbd4ccdcf-cn2zw
Namespace:        dagster-cd
Priority:         0
Service Account:  dagster
Node:             ip-172-22-89-92.ec2.internal/172.22.89.92
Start Time:       Fri, 03 Mar 2023 14:01:49 -0600
Labels:           app.kubernetes.io/instance=dagster
                  app.kubernetes.io/name=dagster
                  component=dagit
                  pod-template-hash=7cbd4ccdcf
Annotations:      checksum/dagster-instance: 7dbdd4411fb97a93bc73cc4d6d2c7a516a167c32e5847ef00b1bd397487c8a74
                  checksum/dagster-workspace: 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b
                  kubernetes.io/psp: eks.privileged
Status:           Running
IP:               172.22.89.219
IPs:
  IP:           172.22.89.219
Controlled By:  ReplicaSet/dagster-dagit-7cbd4ccdcf
Init Containers:
  check-db-ready:
    Container ID:  docker://6c4981d742229d7168cd6402b73cbb6accec4a9dca4766c780090eb344f5f9ac
    Image:         library/postgres:14.6
    Image ID:      docker-pullable://postgres@sha256:f565573d74aedc9b218e1d191b04ec75bdd50c33b2d44d91bcd3db5f2fcea647
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      until pg_isready -h dagster-postgresql -p 5432 -U test; do echo waiting for database; sleep 2; done;
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 03 Mar 2023 14:01:50 -0600
      Finished:     Fri, 03 Mar 2023 14:02:08 -0600
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dlrg9 (ro)
Containers:
  dagster:
    Container ID:  docker://1d40ea4ed02abf5219f90b0835589f2863f926c82aaa2a741b73d5d9d31e0763
    Image:         docker.io/dagster/dagster-celery-k8s:1.1.20
    Image ID:      docker-pullable://dagster/dagster-celery-k8s@sha256:ddc3b429602d6fda0803a738bc8a52d97aab95bec0f98e3a414423f069edde9c
    Port:          80/TCP
    Host Port:     0/TCP
    Command:
      /bin/bash
      -c
       dagit -h 0.0.0.0 -p 80 -w /dagster-workspace/workspace.yaml
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 03 Mar 2023 14:02:27 -0600
      Finished:     Fri, 03 Mar 2023 14:02:29 -0600
    Ready:          False
    Restart Count:  2
    Readiness:      http-get http://:80/dagit_info delay=0s timeout=3s period=20s #success=1 #failure=3
    Environment Variables from:
      dagster-dagit-env  ConfigMap  Optional: false
    Environment:
      DAGSTER_PG_PASSWORD:  <set to the key 'postgresql-password' in secret 'dagster-postgresql-secret'>  Optional: false
    Mounts:
      /dagster-workspace/ from dagster-workspace-yaml (rw)
      /opt/dagster/dagster_home/dagster.yaml from dagster-instance (rw,path="dagster.yaml")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dlrg9 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  dagster-instance:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      dagster-instance
    Optional:  false
  dagster-workspace-yaml:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      dagster-workspace-yaml
    Optional:  false
  kube-api-access-dlrg9:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  56s                default-scheduler  Successfully assigned dagster-cd/dagster-dagit-7cbd4ccdcf-cn2zw to ip-172-22-89-92.ec2.internal
  Normal   Pulled     55s                kubelet            Container image "library/postgres:14.6" already present on machine
  Normal   Created    55s                kubelet            Created container check-db-ready
  Normal   Started    55s                kubelet            Started container check-db-ready
  Normal   Pulled     36s                kubelet            Successfully pulled image "docker.io/dagster/dagster-celery-k8s:1.1.20" in 97.909065ms
  Normal   Pulled     33s                kubelet            Successfully pulled image "docker.io/dagster/dagster-celery-k8s:1.1.20" in 97.175577ms
  Normal   Pulling    18s (x3 over 36s)  kubelet            Pulling image "docker.io/dagster/dagster-celery-k8s:1.1.20"
  Normal   Created    18s (x3 over 36s)  kubelet            Created container dagster
  Normal   Started    18s (x3 over 36s)  kubelet            Started container dagster
  Normal   Pulled     18s                kubelet            Successfully pulled image "docker.io/dagster/dagster-celery-k8s:1.1.20" in 91.269323ms
  Warning  Unhealthy  17s (x4 over 35s)  kubelet            Readiness probe failed: Get "http://172.22.89.219:80/dagit_info": dial tcp 172.22.89.219:80: connect: connection refused
  Warning  BackOff    1s (x5 over 31s)   kubelet            Back-off restarting failed container
```
d
Any logs on the pod?
looks like the container crashed - so it may actually be the indent thing if it isn't valid YAML the way it's expecting?
j
```
Defaulted container "dagster" out of: dagster, check-db-ready (init)
Traceback (most recent call last):
  File "/usr/local/bin/dagit", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/dagit/cli.py", line 225, in main
    cli(auto_envvar_prefix="DAGIT")  # pylint:disable=E1120
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/dagit/cli.py", line 173, in dagit
    code_server_log_level=code_server_log_level,
  File "/usr/local/lib/python3.7/site-packages/dagster/_cli/workspace/cli_target.py", line 276, in get_workspace_process_context_from_kwargs
    code_server_log_level=code_server_log_level,
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/workspace/context.py", line 495, in __init__
    {origin.location_name: self._load_location(origin) for origin in self._origins}
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/workspace/context.py", line 504, in _origins
    return self._workspace_load_target.create_origins() if self._workspace_load_target else []
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/workspace/load_target.py", line 41, in create_origins
    return location_origins_from_yaml_paths(self.paths)
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/workspace/load.py", line 47, in location_origins_from_yaml_paths
    for k, v in location_origins_from_config(cast(Dict, workspace_config), yaml_path).items():
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/workspace/load.py", line 56, in location_origins_from_config
    workspace_config = ensure_workspace_config(workspace_config, yaml_path)
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/workspace/config_schema.py", line 39, in ensure_workspace_config
    workspace_config,
dagster._core.errors.DagsterInvalidConfigError: Errors while loading workspace config from /dagster-workspace/workspace.yaml.
    Error 1: You can only specify a single field at path root:load_from[0]. You specified ['grpc_server', 'host', 'location_name', 'port']. The available fields are ['grpc_server', 'python_file', 'python_module', 'python_package']
```
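That final error line is the indentation issue flagged above: with the shallow indent, `host`, `port`, and `location_name` parse as siblings of `grpc_server` inside the same list item rather than as its children, so the entry has four fields where the schema allows one. Roughly, the two versions parse as:

```yaml
# Shallow indent: the list item is a mapping with four keys, which the
# workspace schema rejects ("You can only specify a single field").
load_from:
  - grpc_server: null
    host: "k8s-example-user-code-1"
    port: 3030
    location_name: "user-code-example"
---
# Deeper indent: a single grpc_server key whose value carries the three fields,
# which is what the schema expects.
load_from:
  - grpc_server:
      host: "k8s-example-user-code-1"
      port: 3030
      location_name: "user-code-example"
```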
d
aha! it is the indent thing after all
j
I don't think I was getting logs when I didn't have it as "dagster-workspace-yaml", though...
d
ah that's possible - try changing it back? could be a combination of two things
j
yup.
working on it
oop, no. same logs. maybe I was requesting them at bad times 😬
d
all good, it's confusing that it keeps the old name in the volume, I see why you thought it was that
j
it is. def got fixated on it
and it was the indent..
lol. omg.
yaml fire
Thank you so much!