# deployment-kubernetes
m
Any advice on using fluentd/bit to upload JSON-formatted application logs from user code in k8s? The standard would be a daemonset, but that pulls from the container's stdout, which is a bit noisy. Our current (on-prem) setup is to have a separate fluentd process that tails a JSON-formatted log file. Is there support for the k8s agent to start multi-container pods for user code (I don't see anything in its Helm values)? Is it a terrible idea to just also run fluentbit in our application image?
d
you can use these tags to have multi-container pods for jobs: https://docs.dagster.io/deployment/guides/kubernetes/customizing-your-deployment#job-or-op-kubernetes-configuration Supporting that for every pod that the agent spins up, including the code servers, would be the same change that would enable tolerations
ty thankyou 1
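For reference, a minimal sketch of what that `dagster-k8s/config` tag can carry, rendered here as YAML for readability; in a job definition it would be the equivalent Python dict passed via `tags=`. The sidecar name, image, and volume are placeholders, and the snake_case keys follow the Kubernetes Python client fields used in the linked docs:

```yaml
# Sketch of a "dagster-k8s/config" tag value (shown as YAML; passed as a dict in Python).
dagster-k8s/config:
  pod_spec_config:
    # Extra containers become sidecars next to the user-code container.
    containers:
      - name: log-forwarder            # placeholder sidecar
        image: fluent/fluentd:v1.16-1  # illustrative image/tag
        volume_mounts:
          - name: shared-logs
            mount_path: /var/log/app
    volumes:
      - name: shared-logs              # placeholder shared volume
        empty_dir: {}
  container_config:
    # Mounts added to the user-code container itself.
    volume_mounts:
      - name: shared-logs
        mount_path: /var/log/app
```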
m
Coming back to this, I think you can actually configure a fluentd daemonset to use the tail plugin to read from a file path just fine. Then so long as the fluentd daemonset and the Dagster workspace pod have a shared volume to read/write from/to, that should work smoothly. (Not sure if a local or persistent volume makes a difference.) So I think the only thing needed from Dagster is specifying a volume for the pod via the agent's `workspace.volumes`. I'm not sure how that differs from volumes in the `@job` tags' `pod_spec_config`, but I like dealing with this in the Helm / Kubernetes arena rather than the `@job` arena if they're equivalent.
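A sketch of that idea in the agent's Helm values, assuming the chart exposes `workspace.volumes` / `workspace.volumeMounts` as standard Kubernetes volume definitions applied to the pods it launches; the names and paths here are placeholders:

```yaml
# Agent values.yaml (sketch; assumes workspace.volumes / workspace.volumeMounts)
workspace:
  volumes:
    - name: app-logs               # placeholder
      hostPath:
        path: /var/log/app         # node path the fluentd daemonset would also tail
        type: DirectoryOrCreate
  volumeMounts:
    - name: app-logs
      mountPath: /var/log/app      # where user code writes its JSON log file
```

A persistentVolumeClaim would slot into the same place if a hostPath turns out not to work across nodes.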
Following up again, how would I provide a `ConfigMap` to the job? I'm trying to follow this example for a fluentd sidecar. I think I can use the `@graph.to_job(tags=...)` to specify the additional `containers` and `volumes` for the pod, and the additional `volumeMount` for the user code container, but I'm not sure where a `ConfigMap` fits in. Here's what I have so far. Specifically, I'm looking for how to add in `fluentd.conf` data in the `configvol` volume, based on the `admin/logging/fluentd-sidecar-config.yaml` in the example.
Maybe what I need to do is separately publish the `ConfigMap` into my cluster, and then just reference it by name from my `@job`. It seems a little odd to have my `@job` know about my EKS cluster config, but also makes sense that `etcd` contents would not be defined on the job.
d
I think that's right - Dagster doesn't currently provide any functionality for creating additional k8s resources like configmaps for you
👍🏻 1
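A sketch of that split, with a hypothetical ConfigMap name and a minimal fluentd tail source along the lines of the Kubernetes sidecar example; the path and tag are placeholders:

```yaml
# Published to the cluster separately (e.g. kubectl apply -f), outside of Dagster.
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-sidecar-config       # hypothetical name
data:
  fluentd.conf: |
    <source>
      @type tail
      path /var/log/app/*.json
      pos_file /var/log/app/fluentd.pos
      tag app.*
      <parse>
        @type json
      </parse>
    </source>
```

The job side then only needs the name: the `configvol` entry under `pod_spec_config.volumes` would reference it with a `config_map` source pointing at `fluentd-sidecar-config`, so the `@job` never has to carry the config contents themselves.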
m
For posterity, here's what I got working:
• No extra tags on the Dagster `@job`. According to the documentation, these only affect the k8s run launcher / job-per-pod mode, and not the job-per-step mode, which is what I'm using.
• `volumes` and `volumeMounts` in the agent's Helm chart. These did add the host volume mount I needed, mapping `/var/log` into my step pods. So then my application code can write out a JSON-formatted log file that `fluent-bit` can see.
• `fluent-bit` as a daemonset. By default this maps `/var/log` into the fluent-bit container, so I didn't need to define extra volumes on that side. And the `fluent-bit` Helm chart has options for its config file, which it then publishes as a ConfigMap, so I didn't need to separately set up a ConfigMap.
All in all, pretty concise and nicely organized; for the most part the application code remains agnostic of the deployment configuration.
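For completeness, a sketch of the fluent-bit side under the upstream `fluent/fluent-bit` Helm chart, where the pipeline sections are supplied as strings under `config.*` and rendered into the chart's ConfigMap; the path, parser, and output are placeholders rather than a tested config:

```yaml
# fluent-bit values.yaml (sketch; assumes the chart's config.inputs / config.outputs keys)
config:
  inputs: |
    # Tail the JSON log file written by the step pods (path is a placeholder)
    [INPUT]
        Name    tail
        Path    /var/log/app/*.json
        Parser  json
        Tag     app.*
  outputs: |
    # Stand-in output; point this at your real log destination
    [OUTPUT]
        Name    stdout
        Match   app.*
```

The agent-side values are essentially the `workspace.volumes` sketch above, pointed at `/var/log`.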