#deployment-kubernetes

Michel Rouly

02/01/2022, 11:19 PM
Hey! Similar question to @Javier Llorente Mañas's above, but with a key difference. I'd like to set up a job that provisions its own PersistentVolumeClaim (PVC) based on storage requests associated with the job. By comparison, we can set resource requests (cpu/memory) for jobs using `dagster-k8s/config` -- it would be nice to also set storage requests, and then satisfy them by creating a PVC. The lifecycle I'm picturing looks like:
• Dagster spawns the job and a PVC mount
• the PVC spawns (or matches an existing) PV
  ◦ if the PV is new, a new backing resource (e.g. an AWS EBS volume) is spawned as well
• the Dagster job mounts the new PVC with the requested storage
• the job runs!
• the job terminates, the pod terminates, the PVC is released, and the PV is expired/persisted based on policy
One critical distinction from what's currently available (as I understand it) in `dagster-k8s/config` is that the PVC I'm describing does not exist before the Dagster job is spawned, so I can't simply use `volumes` and `volumeMounts` to refer to an existing PVC.
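For context, the existing mechanism looks roughly like this -- a sketch of a `dagster-k8s/config` tag value (set on an op/job in Python), shown here as YAML. Field names follow the snake_case Kubernetes model the tag accepts; the PVC name is hypothetical, and you should verify the exact keys against your Dagster version:

```yaml
# Sketch of a dagster-k8s/config tag value. Today the PVC named
# here must already exist before the run is launched.
container_config:
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
  volume_mounts:
    - name: scratch
      mount_path: /mnt/scratch
pod_spec_config:
  volumes:
    - name: scratch
      persistent_volume_claim:
        claim_name: preexisting-claim   # hypothetical, must exist already
```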

Andrea Giardini

02/02/2022, 8:49 AM
Hey @Michel Rouly, I think ephemeral volumes are what you are looking for -> https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/#generic-ephemeral-volumes. I haven't tested them myself, but it should be possible to integrate them with Dagster.
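If that pans out, the integration could look something like this untested sketch, assuming the `dagster-k8s/config` tag passes the snippet through to the pod spec (volume name and storage class are examples):

```yaml
# Hypothetical dagster-k8s/config fragment using a generic ephemeral
# volume: the PVC is created alongside the pod and deleted with it.
pod_spec_config:
  volumes:
    - name: scratch
      ephemeral:
        volume_claim_template:
          spec:
            access_modes: [ReadWriteOnce]
            resources:
              requests:
                storage: 10Gi
            storage_class_name: gp2-encrypted   # example class
container_config:
  volume_mounts:
    - name: scratch
      mount_path: /mnt/scratch
```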

Michel Rouly

02/02/2022, 1:09 PM
Hey Andrea -- that sounds right to me based on some digging I did yesterday. Unfortunately, it looks like generic ephemeral inline volumes were added in 1.23, which isn't available on EKS yet 🙃 And CSI ephemeral inline volumes aren't supported by the AWS EBS driver -- it only supports the persistent lifecycle mode.
...meaning I don't think that works on AWS Kubernetes at the moment, which is a bummer.
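One way to see which modes a driver advertises is its CSIDriver object; for the EBS driver it should look roughly like the fragment below. Note that generic ephemeral volumes use the ordinary Persistent lifecycle, so a missing Ephemeral mode only rules out CSI *inline* ephemeral volumes:

```yaml
# Illustrative CSIDriver object as installed by aws-ebs-csi-driver
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: ebs.csi.aws.com
spec:
  attachRequired: true
  volumeLifecycleModes:
    - Persistent   # no "Ephemeral" entry -> no CSI inline ephemeral support
```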

Andrea Giardini

02/02/2022, 1:13 PM
Hey Michel, it looks like it's already supported by the CSI driver -> https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/482#issuecomment-740273187
Which version of EKS are you on?
Ephemeral volumes are stable in 1.23, but they have been in beta since 1.19.

Michel Rouly

02/02/2022, 1:22 PM
Hm. We're on K8s 1.21 with EKS. I'm not sure which EKS platform version, but I can check. That's interesting. When I tried out a generic ephemeral volume, I got an error about a missing plugin, which I assumed indicated a lack of support in that version.
Specifically this error:
Unable to attach or mount volumes: unmounted volumes=[test-storage], unattached volumes=[test-storage kube-api-access-mcffp]: failed to get Plugin from volumeSpec for volume "test-storage" err=no volume plugin matched
for this manifest:
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
  - command: [ls]
    image: debian
    imagePullPolicy: Always
    name: debian
    volumeMounts:
    - name: test-storage
      mountPath: /mnt
  volumes:
  - name: test-storage
    ephemeral:
      volumeClaimTemplate:
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          storageClassName: gp2-encrypted
          volumeMode: Filesystem
Which seems like a textbook generic ephemeral volume 🤷

Andrea Giardini

02/02/2022, 1:38 PM
Mmmmm interesting... how is the `gp2-encrypted` storage class defined?

Michel Rouly

02/02/2022, 1:39 PM
Nothing too special:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: gp2-encrypted
parameters:
  encrypted: "true"
  fsType: ext4
  type: gp2
allowVolumeExpansion: false
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
So, another weird detail: when I apply that pod manifest from earlier, the PV, PVC, and EBS volume are all created. But the volume simply fails to mount to the Pod, so the Pod is stuck in `ContainerCreating`.
I've also tried all of this both with the AWS EBS CSI driver explicitly installed and without it. We normally don't have it running, and the EBS-backed `gp2-encrypted` storage class works fine. I've seen no difference in behavior; we get the same plugin error either way.

Andrea Giardini

02/02/2022, 1:50 PM
Can you have a look at the API server flags following this guide? https://docs.aws.amazon.com/eks/latest/userguide/api-server-flags.html In particular, we are interested in the `GenericEphemeralVolume` flag.
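For reference, on clusters where you control the control plane, that gate is an ordinary feature-gate flag (not user-settable on managed EKS; exact component coverage is an assumption -- check the Kubernetes feature-gates docs for your version):

```yaml
# Illustrative flag on a self-managed control plane:
#   kube-apiserver --feature-gates=GenericEphemeralVolume=true
```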

Michel Rouly

02/02/2022, 1:50 PM
Oh! That's new. Yeah, I'll give that a try.
Oh shoot. There's no way to view the API server flags for a cluster that wasn't just recently created?
However, you can create a new cluster with the same Kubernetes version and enable the API server logging when you create the cluster. Clusters with the same platform version have the same flags enabled, so your flags should match the new cluster's flags. When you finish viewing the flags for the new cluster in CloudWatch, you can delete the new cluster.
ah

Andrea Giardini

02/02/2022, 2:04 PM
#sad 😄

Michel Rouly

02/02/2022, 2:05 PM
Hah, alright. I guess I'll try that.

Andrea Giardini

02/04/2022, 4:35 PM
@Michel Rouly let me know if you figure it out. I have been thinking about implementing the same in my company (on GKE though).

Michel Rouly

02/04/2022, 4:39 PM
Hey Andrea. I had to pause as other tasks took priority. I contacted AWS support in parallel, and they indicated that the AWS EKS version we are on, which supports K8s 1.21, also supports generic ephemeral volumes -- which is exactly what we had been trying when I got that plugin error. I still plan to spin up a fresh EKS cluster, probably next week, to verify this. Assuming it works, I suspect generic ephemeral volumes are the way to associate storage volumes with Dagster worker pods.
Would be interested to know if it works out of the box on GKE.

Andrea Giardini

02/04/2022, 4:50 PM
Will let you know. On GKE `v1.21.6-gke.1500` I can create ephemeral volumes successfully -- I tried a simple pod YAML file, and I will try to build something on Dagster soon. I can confirm that a new storage class is not necessary.

Michel Rouly

02/18/2022, 4:13 PM
@André Augusto Sorry for dropping off! That's awesome -- I'm glad to hear it's working on GKE. Generic ephemeral volumes look like a pretty powerful tool for attaching storage to Dagster pods on Kubernetes.