# dagster-feedback

Mark Fickett

06/28/2022, 3:00 PM
After updating to
0.15.2
, I'm seeing lots of log messages like
2022-06-28 10:55:16 -0400 - dagster - DEBUG - battery_data_job_local - 983e56f8-a69b-4edd-937b-0417012c2a37 - 296222 - STEP_WORKER_STARTED - Executing step "blah" in subprocess.
that don't seem to be respecting the
dagster.yaml
setting of:
Copy code
python_logs:
  python_log_level: INFO
It looks like maybe
default_system_loggers
have
DEBUG
level by default, which is causing this. If I edit that directly it silences the log messages, but I don't see a way to configure it otherwise.
And digging further, the message seems to have been introduced in https://github.com/dagster-io/dagster/commit/483c2765fb8c1c2912a4f1981847b8cb7816b62a#diff-d7a8ba1870a1d1040d3670eb[…]23a2e6367c86c26a28a5005f85eee9 so dropping back to
0.15.0
removes it.

daniel

06/28/2022, 3:22 PM
Yeah, this was me, I'll take a look at this

Mark Fickett

06/28/2022, 3:22 PM
Thanks!

daniel

06/28/2022, 3:27 PM
Which process output are you seeing them in exactly?
dagit?

Mark Fickett

06/28/2022, 3:28 PM
In the console. I'm running a job via
dagster.execute_pipeline
.

daniel

06/28/2022, 3:32 PM
Roger
I'm having some trouble reproducing this as having worked in 0.15.0 - with a script like this:
Copy code
import requests
import csv
from dagster import job, op, get_dagster_logger, execute_pipeline, reconstructable, DagsterInstance


@op
def hello_cereal():
    response = requests.get("https://docs.dagster.io/assets/cereal.csv")
    lines = response.text.split("\n")
    cereals = [row for row in csv.DictReader(lines)]
    get_dagster_logger().info(f"Found {len(cereals)} cereals")

@job
def hello_cereal_job():
    hello_cereal()

if __name__ == "__main__":
    with DagsterInstance.get() as instance:
        execute_pipeline(reconstructable(hello_cereal_job), instance=instance)
If I run this with a
dagster.yaml
like this:
Copy code
python_logs:
  python_log_level: INFO
I still see a bunch of DEBUG lines in the logs like:
Copy code
2022-06-28 11:53:28 -0500 - dagster - DEBUG - hello_cereal_job - 8a54c239-6ea3-487e-8786-bf45200d2593 - 13536 - RUN_START - Started execution of run for "hello_cereal_job".
Is that different from what you're seeing?

Mark Fickett

06/28/2022, 5:01 PM
Ah bummer. I am seeing something different, but I'm also hacking around
DagsterLogHandler
to get console logging output the way I want (previous discussion on that), and maybe that's working in <= 0.15.0 but not with
default_system_loggers
. So maybe this is a feature request and not a regression with this specific message's handling.

daniel

06/28/2022, 5:02 PM
Ah, yeah, I think we would need to see your changes to evaluate that
With that setting, both the old events and the new events that are DEBUG go away for me.

Mark Fickett

06/28/2022, 5:16 PM
Here's my extension for DagsterLogHandler which I patch in. I think the issue (for this setup) is that the "Executing step" message doesn't go through DagsterLogHandler, whereas most other messages do. And the
dagster.yaml
log level setting doesn't come into play either way. But https://github.com/dagster-io/dagster/pull/8648 looks like just what I'd need!
Copy code
class _PatchedDagsterLogHandler(log_manager.DagsterLogHandler):
    """Customize DagsterLogHandler to get back original messages and add formatting."""

    def __init__(self, *args, **kwargs):
        self._customized_handler = logging.StreamHandler()
        # This handler's filter only affects console output, so the noisy messages
        # remain visible in the Dagit UI. (Also even if this filter edits the
        # messages' level in place, the Dagit UI and JSON logger still see them at
        # their original level, possibly because those loggers consume them first.)
        self._customized_handler.addFilter(NoisyLogsFilter())
        self._customized_handler.setFormatter(coloredlogs.ColoredFormatter(_FORMAT))
        self._customized_handler.setLevel(logging.INFO)
 
        super().__init__(*args, **kwargs)

    def _convert_record(self, record: logging.LogRecord) -> logging.LogRecord:
        """Restores the LogRecord's original msg.

        This needs to override the existing DagsterLogHandler's behavior because it
        affects both our console output, and logs sent to OpenTelemetry / JSON.

        https://github.com/dagster-io/dagster/issues/7642
        """
        converted = super()._convert_record(record)
        if "STEP_FAILURE" in converted.msg:
            # Dagster packs exception info for step failures into
            # converted.dagster_meta['dagster_event'].event_specific_data.error
            # not standard exc_info, so just use the provided formatted msg.
            return converted

        # Dagster's log_manager.DagsterLogHandler overwrites the LogRecord's msg and
        # args fields in-place. Although the original getMessage() value is saved in
        # dagster_meta["orig_message"], the args and exc_info are lost.
        record.msg = record.dagster_meta["orig_message"]
        record.args = ()  # Dagster stores the result of getMessage(), not msg + args.
        record.exc_text = None  # Tell downstream formatters to reformat exc_info.
        return record

    def handle(self, record):
        """Passes records to our custom console handler.

        We use this instead of the Dagster framework @logger mechanism because Dagster
        re-logs the message to @loggers and they therefore lose the logger name,
        filename/lineno, etc. (https://github.com/dagster-io/dagster/issues/7890) By
        passing records to our handler's .handle() method we make all original
        attributes available for formatting.

        Dagster doesn't expose a way to add more handlers alongside DagsterLogHandler,
        so adding a sub-handler here is just a way to hook into the log record stream.
        It also matches the pattern DagsterLogHandler itself uses, of dispatching log
        records to various other internal handlers and loggers.
        """
        super().handle(record)
        # The level check is usually done before the handler gets the record.
        if self._customized_handler and (
            record.levelno >= self._customized_handler.level
        ):
            self._customized_handler.handle(record)
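(Note: NoisyLogsFilter and _FORMAT in the snippet above are defined elsewhere in Mark's codebase and are not Dagster APIs. Below is a minimal sketch of what those missing pieces and the "patch in" step might look like; the filter's substrings, the format string, the helper name install_patched_handler, and the dagster.core.log_manager import path are illustrative assumptions rather than Mark's actual code, and it assumes _PatchedDagsterLogHandler from the snippet above lives in the same module.)
Copy code
import logging

from dagster.core import log_manager  # assumed path for 0.15.x-era releases


# Hypothetical stand-ins for the names referenced in the snippet above.
_FORMAT = "%(asctime)s %(name)s %(levelname)s %(message)s"


class NoisyLogsFilter(logging.Filter):
    """Drop framework events that are too chatty for console output."""

    _NOISY_SUBSTRINGS = ("STEP_WORKER_STARTED", "ENGINE_EVENT")  # illustrative list

    def filter(self, record: logging.LogRecord) -> bool:
        return not any(s in record.getMessage() for s in self._NOISY_SUBSTRINGS)


def install_patched_handler() -> None:
    """Rebind the module-level class so Dagster constructs the patched handler.

    This must run before any DagsterLogManager is created; rebinding after the
    log machinery is already set up has no effect on handlers that exist.
    """
    log_manager.DagsterLogHandler = _PatchedDagsterLogHandler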
Hi @daniel, checking back on https://github.com/dagster-io/dagster/pull/8648. It looks like it's still waiting for review -- do you think it's likely to go in? I've been holding off on upgrading our Dagster dependency versions for that, so we don't have noisy logs.

daniel

07/15/2022, 2:53 PM
So I've learned that at least one of our executors relies on the pod logs to do its job 😕 specifically the celery_k8s_executor. So this may need to be a different setting than instance.python_log_level
or we could break that executor
That's a pattern we've moved away from; I think it's just one or two executors that still use it.

Mark Fickett

07/15/2022, 2:58 PM
Gotcha. Well, I can't really balance that tradeoff, but we're not using Celery and do use console logs. (:

Averell

08/29/2022, 12:13 PM
Hi @daniel @Mark Fickett, does this issue affect all other launchers, or just Celery? With the docker launcher, I tried to set the log level to WARN:
Copy code
python_logs:
  python_log_level: WARN
but could still see INFO logs coming out of the daemon process. I'm on 1.0.6. Thanks

Mark Fickett

10/18/2022, 2:14 PM
Hi @daniel, revisiting this thread. The context is that I've held off updating past Dagster 0.15 because our team's typical workflow is to run the Dagster job via an
execute_job
call and view console logs, and the
default_system_loggers
DEBUG
output makes this too noisy to be usable. But I'd like to get up to date. Would you be OK with adding another setting such as
default_system_log_level
as a peer of
python_log_level
? Glad to help with a PR based on yours above, but I'm not sure where to find associated tests / docs / etc that would need updating as well. I was also trying to patch
default_system_loggers
or
LogManager
or
LogHandler
at runtime, but I can't find a point where my code is loaded early enough to do any patching before the log machinery is already set up and issuing step-start events.
Also just confirming that in the job config,
loggers: console: config: level: ERROR
shouldn't fix this? https://docs.dagster.io/concepts/logging/loggers#configuring-the-built-in-loggers
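(Note: for reference, here is roughly what that last idea looks like as a standalone sketch. The op and job are throwaway placeholders, log_level is the config key used on the linked docs page, and execute_job is the 1.x API mentioned above; as Mark notes just below, this run-level logger config did not end up silencing the DEBUG system-logger events.)
Copy code
from dagster import DagsterInstance, execute_job, job, op, reconstructable


@op
def say_hello(context):
    context.log.info("hello")


@job
def hello_job():
    say_hello()


if __name__ == "__main__":
    # Attempt discussed above: set the built-in console logger's level via run config.
    with DagsterInstance.get() as instance:
        execute_job(
            reconstructable(hello_job),
            instance=instance,
            run_config={"loggers": {"console": {"config": {"log_level": "ERROR"}}}},
        )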

daniel

10/18/2022, 2:39 PM
I'm going to tag in @owen about that last idea about loggers, although I believe he's at Coalesce so he may be a bit slower to respond - if there's a way to work around this with config, that sounds ideal though.
Mark, is that something you could try quickly to see if it matches what you would expect?

Mark Fickett

10/18/2022, 3:17 PM
I did try to set that
loggers
config and it didn't seem to have any effect. From spelunking in the code a bit, it looks like
default_system_loggers
doesn't look at any configs, and the multiprocess executor just calls
create_context_free_log_manager
which always uses the
default_system_loggers
. So if I'm following that right, with a multiprocess executor it just always installs the default system logger at DEBUG level and does not consult other loggers or configs, but I was hoping I'd missed something.
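(Note: the mechanism Mark is describing comes down to plain Python logging: a level configured on one logger has no effect on a separately constructed logger that was created with its own DEBUG level. A generic stdlib illustration follows; it uses no Dagster APIs and is only an analogy for python_log_level versus the hard-coded system logger.)
Copy code
import logging
import sys

# Logger A: set to INFO, analogous to what python_log_level configures.
managed = logging.getLogger("managed")
managed.setLevel(logging.INFO)
managed.addHandler(logging.StreamHandler(sys.stderr))

# Logger B: constructed independently at DEBUG, analogous to the system logger
# that default_system_loggers sets up. Changing logger A's level never touches it.
system = logging.Logger("system", level=logging.DEBUG)
system.addHandler(logging.StreamHandler(sys.stderr))

managed.debug("suppressed: below logger A's INFO threshold")   # not emitted
system.debug("still emitted: logger B was built at DEBUG")     # emitted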

daniel

10/18/2022, 4:11 PM
Ah, OK - so it seems that it's just those STEP_WORKER_STARTED and RUN_START events that are logged at the very beginning of the process, before the config is read in.
And you didn't have this problem with the "Executing step xxx in subprocess" messages in 0.15.x? Those were logged at INFO?
Oh right, we covered that earlier in the thread - there were modifications to DagsterLogHandler.
I think maybe we should just bring in that PR - I opened it up for discussion: https://github.com/dagster-io/dagster/pull/10073. It could be a breaking change for people using that feature with those executors though, so there's a possibility it'd wait until v1.1. We'll see what people think.

Mark Fickett

10/18/2022, 8:38 PM
OK, thanks. Is there anything I can do to help with that -- comment with use case on the PR, work up a minimal example, etc?
Also, those executors would still work as long as the log level were DEBUG, right? I guess I'm not quite sure of the interplay - if you're using those executors, I don't know whether you'd care about having excessive console logging, or if turning up the level would necessarily affect other things.

daniel

10/18/2022, 8:40 PM
Right - the breaking change would be configuring the log level to something other than DEBUG while using one of those executors.
Just landed that PR, so that'll go out later this week

Mark Fickett

10/24/2022, 2:49 PM
Awesome, thank you sir!
I see it in the release notes for 1.0.15, woohoo!