
Chris Chan

09/16/2021, 5:12 PM
By default, our Datadog integration forwards all stdout / stderr logs to Datadog. Is there a way to format / suppress the logging to stdout / stderr? In particular, all the DEBUG event logs make it hard to see useful information in Datadog, plus we'd like structured logs instead of plain text. The only way I've been able to change the Dagster event logging is to attach a `json_console_logger` named `console` to each pipeline and have a run config set the log level to INFO
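The structured-logs goal above can be illustrated with Python's stdlib `logging`: a formatter that emits one JSON object per record, plus a level of INFO to drop DEBUG noise. This is a generic sketch, not Dagster's `json_console_logger`:

```python
import json
import logging

# Minimal JSON formatter sketch (NOT Dagster's json_console_logger --
# just an illustration of "structured logs instead of plain text").
class JsonFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "name": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("example")
logger.addHandler(handler)
logger.setLevel(logging.INFO)  # suppresses DEBUG records

logger.debug("hidden")            # filtered out at INFO level
logger.info("pipeline started")   # emitted as one JSON object per line
```

Log aggregators like Datadog can then parse each line as a JSON document instead of free-form text.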

owen

09/16/2021, 6:01 PM
hi @Chris Chan, unfortunately what you've described is currently the only way of getting the behavior you're describing (consuming stdout / stderr + json formatting + setting a log level). If no loggers are specified on a pipeline, then Dagster will use the regular colored console logger at a DEBUG level, so to modify that you do indeed need to add a logger. I've been working on some changes to the logging system that allow you to set a global python_log_level in your dagster.yaml (e.g.
python_logs:
  python_log_level: INFO
), which it would probably make sense to have our default console logger inherit as well (although this wouldn't solve the json formatting part of your issue). Adding tap points for people to configure logging at the repository level is definitely something we're looking into, and this is a good motivating example.
you can make your life slightly easier and avoid setting the run config for every pipeline by using the `configured` API:
logger_defs={"json": json_console_logger.configured({"log_level": "INFO"})}
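Dagster's `configured` API pre-binds config to a definition so callers don't have to repeat it in run config. Conceptually it is close to partial application; a minimal plain-Python sketch of the idea (not Dagster's actual implementation; the names are illustrative):

```python
from functools import partial

# Rough analogy for Dagster's .configured() (hypothetical sketch, not the
# real API): pre-bind config so callers don't repeat it at run time.
def console_logger(config):
    # Stand-in for a logger definition that reads its level from config.
    return f"logging at {config['log_level']}"

# "Configured" version with the level baked in:
info_console_logger = partial(console_logger, {"log_level": "INFO"})

print(info_console_logger())  # -> logging at INFO
```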

Chris Chan

09/16/2021, 6:57 PM
thanks @owen - what I've found is that creating a configured version of the logger doesn't work without setting a run config. Specifically, this works:
my_job = my_graph.to_job(
    config={
        "loggers": {
            "console": {
                "config": {
                    "log_level": "INFO"
                }
            }
        }
    },
    logger_defs={"console": json_console_logger}
)
but this does not:
my_job = my_graph.to_job(
    logger_defs={"console": json_console_logger.configured({"log_level":"INFO"})}
)

owen

09/16/2021, 6:58 PM
hmm that seems strange to me, and is potentially a bug -- does the second statement have no effect, or does it create errors?

Chris Chan

09/16/2021, 6:59 PM
it has no effect - when I look at the dagit stdout / stderr, it outputs the DEBUG logs in plain text

owen

09/16/2021, 6:59 PM
gotcha, I'll create a ticket for this and investigate more

Chris Chan

09/16/2021, 6:59 PM
thank you!
@owen another note - while I'm able to use the run config to suppress the DEBUG logging when running through dagit / dagster-daemon, when executing through `dagster api execute_run` (I'm doing this because I wrote a custom run launcher), I'm still getting log messages in stdout of this type:
{
  "__class__": "DagsterEvent",
  "event_specific_data": {
    "__class__": "EngineEventData",
    "error": null,
    "marker_end": null,
    "marker_start": null,
    "metadata_entries": [
      {
        "__class__": "EventMetadataEntry",
        "description": null,
        "entry_data": {
          "__class__": "TextMetadataEntryData",
          "text": "20"
        },
        "label": "pid"
      }
    ]
  },
  "event_type_value": "ENGINE_EVENT",
  "logging_tags": {},
  "message": "Multiprocess executor: parent process exiting after 1.68s (pid: 20)",
  "pid": 20,
  "pipeline_name": "do_nothing",
  "solid_handle": null,
  "step_handle": null,
  "step_key": null,
  "step_kind_value": null
}
so those are still getting through. stderr logs appear to be what I would expect though
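Since each pass-through event is a single-line JSON object with a `__class__` of `DagsterEvent` (as in the dump above), one workaround on the consuming side is to filter those lines out before they reach the forwarder. A stdlib sketch (the filtering step itself is hypothetical, not something Dagster or Datadog provides out of the box):

```python
import json

def is_dagster_event(line: str) -> bool:
    """Return True if a stdout line is a serialized DagsterEvent.

    Field name "__class__" matches the event JSON shown in the thread;
    this is a filtering sketch, not part of Dagster or Datadog.
    """
    try:
        payload = json.loads(line)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("__class__") == "DagsterEvent"

lines = [
    '{"__class__": "DagsterEvent", "event_type_value": "ENGINE_EVENT"}',
    "ordinary plain-text log line",
]
kept = [l for l in lines if not is_dagster_event(l)]
print(kept)  # -> ['ordinary plain-text log line']
```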

owen

09/27/2021, 6:00 PM
hi @Chris Chan -- very sorry for the slow reply (I was out last week). it seems like the behavior with `.configured()` is actually not a bug, weirdly enough. The comment here indicates that only loggers referenced in the run config will actually be initialized. The reasoning makes some vague amount of sense, but it does result in some pretty strange end behavior that diverges from other configurable objects like resources. I'll look into the implications of changing that. For your other question, are all dagster events (like STEP_STARTED, OUTPUT_HANDLED, etc.) being logged this way, or just a subset of them (such as only engine events)?

Chris Chan

09/27/2021, 8:13 PM
@owen no problem! I was on vacation myself. I do see other Dagster events like `STEP_START`, `STEP_OUTPUT`, `LOADED_INPUT`

owen

09/27/2021, 8:29 PM
out of curiosity, is there a reason that you're using `dagster api execute_run` instead of `dagster pipeline launch` (docs)? I don't have a clear picture of why that would make a difference in logging behavior, but I believe they do end up hitting different initialization code paths, and I'm much less familiar with the first way of doing things.

Chris Chan

09/27/2021, 8:30 PM
yeah, I wrote our own custom run launcher (for Nomad) - so all the examples I saw of run launchers (the docker, kubernetes, etc.) used `dagster api execute_run`

owen

09/27/2021, 8:33 PM
ah I see, so you're running that inside your custom RunLauncher's launch_run()?

Chris Chan

09/27/2021, 8:37 PM
not exactly - inside my `launch_run()` it sends a request to initiate a Nomad job - the Nomad job is set to execute a shell script that runs `dagster api execute_run ...`, where the `...` is whatever payload is provided - the payload being what comes out of `serialize_dagster_namedtuple`
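For readers unfamiliar with `serialize_dagster_namedtuple`: it serializes Dagster's namedtuple-based objects to JSON tagged with `__class__` keys, as in the event dump earlier in the thread. A rough stdlib imitation (hypothetical, not Dagster's real implementation):

```python
import json
from collections import namedtuple

# Hypothetical imitation of what serialize_dagster_namedtuple roughly does:
# turn a namedtuple into JSON tagged with a "__class__" key, matching the
# shape of the event dump earlier in the thread. Not Dagster's real code.
def serialize_namedtuple(obj):
    data = {"__class__": type(obj).__name__}
    data.update(obj._asdict())
    return json.dumps(data, sort_keys=True)

# Illustrative payload type; the real run payload carries much more.
RunPayload = namedtuple("RunPayload", ["pipeline_name", "run_id"])
payload = serialize_namedtuple(RunPayload("do_nothing", "abc123"))
print(payload)
# -> {"__class__": "RunPayload", "pipeline_name": "do_nothing", "run_id": "abc123"}
```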

owen

09/27/2021, 8:42 PM
gotcha, that all makes sense!
hmm this one is a particularly strange one. what's the exact run config that is being supplied when you get the stdout log messages for every dagster event? I'm particularly confused because the json_console_logger has a default logging level of INFO, so even if your run config was somehow being ignored, the default should prevent those messages from being logged (because they're supposed to be at the DEBUG level).
ah I guess it's probably just what you showed before (with the to_job snippet)
yeah I'm very much struggling to reproduce this issue -- I can't see any reason why the execute_run code path would initialize your json_console_logger any differently than dagit, so it's pretty bewildering that they're producing different behaviors. Just to confirm, what version of dagster are you using at the moment?

Chris Chan

09/27/2021, 9:09 PM
I'm on `0.12.10` - not sure if this matters, but these runs are being kicked off by a sensor / `RunRequest` - so my run request looks like:
yield RunRequest(
    run_key=s3_key,
    run_config={
        "ops": {
            "get_key": {"config": {"s3_key": s3_key}},
        },
        "loggers": {"console": {"config": {"log_level": "INFO"}}},
    },
)
and I don’t have it baked into my job like I do above
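If the loggers block has to ride along in every `RunRequest`'s run config, a small helper can merge it in so sensors don't repeat it. A plain-Python sketch (the helper name is made up, not a Dagster API):

```python
# Hypothetical helper to avoid repeating the loggers block in every
# RunRequest's run_config. The names here are illustrative only.
DEFAULT_LOGGER_CONFIG = {"loggers": {"console": {"config": {"log_level": "INFO"}}}}

def with_default_loggers(run_config: dict) -> dict:
    # Shallow-merge: keep caller-supplied loggers if present, otherwise
    # fall back to the shared default block.
    merged = dict(run_config)
    merged.setdefault("loggers", DEFAULT_LOGGER_CONFIG["loggers"])
    return merged

cfg = with_default_loggers({"ops": {"get_key": {"config": {"s3_key": "some/key"}}}})
print(cfg["loggers"]["console"]["config"]["log_level"])  # -> INFO
```

A sensor would then pass `run_config=with_default_loggers({...})` when yielding each `RunRequest`.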

owen

09/27/2021, 9:43 PM
hm I wouldn't think that would change anything -- is the `logger_defs` argument in the `to_job` call still `{"console": json_console_logger}`? also is it possible to get a larger snippet of the output logs when you run this? finally, are you collecting these output logs via datadog running on the nomad machine(s), or doing something else to view the stdout/stderr logs?

Chris Chan

09/27/2021, 11:31 PM
yes, the job definition looks like this:
update_job = update.to_job(
    logger_defs={
        "console": json_console_logger
    },
    resource_defs={
...
I can pull a full set of logs for each job run, although I probably need to redact the details of the actual work - what would be the most useful format?
basically - our setup is that we have a Datadog agent running on each Nomad node (basically an EC2 instance) that monitors the running containers - it picks up stdout / stderr and forwards them to Datadog. I can also view the stdout / stderr directly from the container (but we generally don’t do that)

owen

09/28/2021, 12:10 AM
my hypothesis is that those log entries are not actually being produced by the json_console_logger (which would explain why changing the log level doesn't do anything, and why the format is a bit different from the usual json console logger output). I'm wondering if there's potentially any custom code in this run launcher that could somehow print to stdout?

Chris Chan

09/28/2021, 1:14 PM
hmm, I didn’t implement any logging in my run launcher - also all the logs come from the actual execution end. The run launcher just submits a job creation request and all the logs I’ve been discussing are within the container that’s created. I can try to replicate with the Docker run launcher if that would help

owen

09/28/2021, 4:19 PM
are all of the stdout lines that you're seeing dagster events or are there other things mixed in? I'd be interested to see a set of logs for a run (this can be just a dummy pipeline, whatever's easiest) -- github gist works but you can also dm me a copy/pasted version here. also, do you know if a custom executor is being used? you're right that the run launcher is probably innocent here.

Chris Chan

09/28/2021, 7:02 PM
no custom executor - I’ll work on getting a dummy pipeline that shows this behavior