# announcements
paul.q
We use Elasticsearch with Logstash + Filebeat to ingest our logs. We developed a custom JSON logger to plug into Dagster to support what we want to do. Among other things, we like to use the "extra" kwarg on the logger methods (debug, warn, etc.) to pass a dictionary of useful context that's meaningful to our logging dashboards. So I was surprised when it failed with "do not allow until explicit support is handled". Looking further, I found this. Short of overriding all the methods in our custom logger to find a way to cram the "extra" blob into the log output in some other way, does anyone have: 1. Any suggestion as to how we might achieve what we're after, or 2. Any idea as to when this part of `log_manager` will get some treatment to accommodate "extra"? Thanks, Paul
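To illustrate, this is roughly the pattern we're after (names and fields simplified; our real logger does a lot more):

```python
import json
import logging

from dagster import logger, solid


@logger
def json_file_logger(init_context):
    """One JSON document per line, picked up by Filebeat and shipped to Elastic."""

    class JsonFormatter(logging.Formatter):
        def format(self, record):
            return json.dumps(
                {"level": record.levelname, "message": record.getMessage()}
            )

    handler = logging.FileHandler("pipeline.log.json")
    handler.setFormatter(JsonFormatter())
    log = logging.getLogger("json_file_logger")
    log.setLevel(logging.DEBUG)
    log.addHandler(handler)
    return log


@solid
def load_customers(context):
    # This is the call that currently fails: the log manager rejects "extra".
    context.log.info(
        "loaded customer batch",
        extra={"batch_id": 42, "row_count": 10000, "source": "crm"},
    )
```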
alex
@max
max
Hi @paul.q, we can prioritize support for this; it should be a straightforward fix. We disabled this at the outset out of caution around the way that Python smashes `extra` onto the underlying `LogRecord`. Can you open an issue on GitHub? We should be able to get this into next week's release.
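For background, stdlib logging copies each key of `extra` straight onto the `LogRecord` itself, which is where collisions with the metadata we attach could get messy. A plain-Python sketch, no Dagster involved:

```python
import logging

logging.basicConfig(format="%(message)s batch_id=%(batch_id)s")
log = logging.getLogger(__name__)

# Each key of "extra" is copied onto the LogRecord, so "batch_id" becomes
# record.batch_id and is visible to any formatter.
log.warning("loaded customer batch", extra={"batch_id": 42})

# Reusing a name the record already has (e.g. "message", "msg", "args")
# makes Logger.makeRecord raise a KeyError.
```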
paul.q
🙏
paul.q
I see this hasn't made it into 0.11.0. Any idea when we might see it? Thanks
Just bumping this again. This is one of the things that will stop us taking Dagster beyond 'boutique' projects and into a lot more BAU projects that handle data. We really rely on Elastic/Logstash/Kibana for monitoring. Lack of the "extra" dictionary in log records means we can't add very useful application variables that are consumed downstream by filters, alerts and counters. Is there a chance this can find its way into a release "soon"?
alex
Apologies for the speculation of a quick fix with no follow-up. cc @owen, who may spend some time on logging. Just to better understand the situation: do you care more about the messages produced by Dagster itself, or about those emitted in your solids via context.log? Have you considered bypassing Dagster and using Python logging directly instead of context.log, or is the `dagit` display of these messages also useful for you?
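i.e. something along these lines, just to illustrate the idea (the logger, solid, and field names here are made up):

```python
import logging

from dagster import solid

# A plain stdlib logger configured by your own code (or logging.config),
# completely outside Dagster's log manager.
elk_log = logging.getLogger("my_pipeline.elk")


@solid
def load_customers(context):
    # "extra" works as usual here because Dagster is not in the call path;
    # the trade-off is that this line never shows up in the structured event log.
    elk_log.info(
        "loaded customer batch",
        extra={"run_id": context.run_id, "batch_id": 42},
    )
```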
👀 1
paul.q
Hi @alex, yes, we could definitely use non-Dagster logging to achieve user messages with event-data dictionaries. The Dagster metadata is very useful, though. If we bypass context.log, we could try passing the context as part of the event_data dictionary and then put a lot of work into Logstash to clean up and 'normalise' what's sent to Elastic. Up until now, that effort has been entirely contained in our logger implementation, where it's pure Python and easy enough to debug as well. A further option is to write another package on top of Python logging, which would give us control of everything that's written to disk, essentially transplanting the logic from our Dagster logger implementation. But we then wouldn't get the benefit of seeing user messages (via context.log) in the dagit UI, would we? We've also built a REST API that gets pipeline run stats together with messages about pipeline/solid failures (via GraphQL, using the logs). I guess these would continue to work because event logs would be unaffected? Let us know whether it's worth waiting or we should switch approaches. Thanks, Paul
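For what it's worth, the 'transplant' option would look roughly like this: a formatter that folds anything non-standard on the record into the JSON document (a sketch only; the real thing would also need to carry the Dagster metadata):

```python
import json
import logging

# Attributes every LogRecord carries by default; anything else on the
# record was injected via "extra".
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None)))


class ExtraAwareJsonFormatter(logging.Formatter):
    def format(self, record):
        doc = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fold the "extra" keys into the top-level document for Logstash/Elastic.
        doc.update(
            {k: v for k, v in vars(record).items() if k not in _STANDARD_ATTRS}
        )
        return json.dumps(doc, default=str)
```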
Hi @alex and @owen, just for fun I built something that looks like a logger (it has debug, info, etc. methods), and I can pass the solid `context` object to it along with `extra`. With the `extra` dict passed to it, I munge it into a string and add it to the message, before calling the `context`'s `log.log` method at the end, so after that it's over to our custom JSON logger. Inside that I can unmunge the extra dict out of the message, add the elements into the log record, and also clean up the message (to remove the munged bit). It all works fine, except that the message that appears in the dagit console includes the munged portion as well. In our JSON logger, the message is cleaned up as desired. What I don't understand is: isn't the same log record being passed to all the handlers in the custom Dagster logger? If so, we would expect to get the same in the console log as we see in the JSON log records?
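For reference, the shim looks roughly like this (heavily simplified: the marker and class names are invented for the sketch, and only `info` is shown rather than the `log.log` call we actually make):

```python
import json
import logging

# Marker used to smuggle the "extra" dict through the message string,
# since context.log won't accept the kwarg directly.
_MARKER = " __EXTRA__="


class ContextLogShim:
    """Looks like a logger, but munges "extra" into the message before handing off."""

    def __init__(self, context):
        self._log = context.log

    def info(self, msg, extra=None):
        if extra:
            msg = f"{msg}{_MARKER}{json.dumps(extra, default=str)}"
        self._log.info(msg)  # same idea for debug/warning/error/critical


# Inside our custom JSON logger, the formatter reverses the munging.
class UnmungingJsonFormatter(logging.Formatter):
    def format(self, record):
        msg = record.getMessage()
        extra = {}
        if _MARKER in msg:
            msg, _, blob = msg.partition(_MARKER)
            extra = json.loads(blob)
        return json.dumps({"message": msg, **extra}, default=str)
```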
alex
> But we then wouldn't get the benefit of seeing user messages (via context.log) in the dagit UI, would we?

They would no longer be in the structured event stream, but the raw stdout/stderr logs should be visible via dagit, assuming you have your `ComputeLogManager` set up correctly for your deployment. It's not as nice as the structured event stream entries, but there should still be a way to see them.

> I guess these would continue to work because event logs would be unaffected?

Yep, this should be right.

> I munge it into a string and add it to the message, before calling the `context`'s `log.log` method at the end
> What I don't understand is: isn't the same log record being passed to all the handlers in the custom Dagster logger? If so, we would expect to get the same in the console log as we see in the JSON log records?

The loggers should all receive what's passed to the context's log method, which is what it sounds like is happening? I could be misunderstanding the details.
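One guess about the console behaviour, illustrated with plain stdlib logging rather than Dagster internals: each configured logger builds its own `LogRecord` from the (already munged) message string it receives, so a formatter that cleans the message inside your JSON logger never touches the record the console path formats. For example:

```python
import logging

MUNGED = 'loaded batch __EXTRA__={"batch_id": 42}'


class CleaningFormatter(logging.Formatter):
    def format(self, record):
        # This only affects what THIS handler writes out; the record built by
        # the other logger from the same string is a different object.
        return record.getMessage().split(" __EXTRA__=")[0]


json_logger = logging.getLogger("json")
json_logger.setLevel(logging.INFO)
json_handler = logging.StreamHandler()
json_handler.setFormatter(CleaningFormatter())
json_logger.addHandler(json_handler)

console_logger = logging.getLogger("console")
console_logger.setLevel(logging.INFO)
console_logger.addHandler(logging.StreamHandler())  # stock formatting

# Two .info calls -> two independent LogRecords built from the same string.
json_logger.info(MUNGED)     # prints: loaded batch
console_logger.info(MUNGED)  # prints: loaded batch __EXTRA__={"batch_id": 42}
```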