Binoy Shah

09/21/2022, 1:25 PM
Question about Loggers in general: I am wondering why Dagster has exposed loggers and their extension points as a public API.
1. As far as I understand, Dagster loggers are always job-scoped; there are no globally scoped loggers exposed.
   a. So how do I do logging for my custom code, which might not be part of jobs/ops?
2. "built-in logger that tracks all the execution events".. Is it tracking of events or recording of events?
   a. Do Dagster loggers create coded event outputs that other components track and act upon?
3. Do I have to extend loggers or use Dagster loggers for informative/diagnostic logging?
4. Do I have to use Dagster's logging mechanism to trigger any operational functionality? e.g. refreshing an asset, triggering a cron job, etc.?


Owen

09/21/2022, 5:06 PM
hi @Binoy Shah! loggers in dagster can definitely be confusing, as they're a bit of a blend of regular python logging with some extra constraints (because any message produced by a dagster logger is intended to be stored in persistent storage, so it can be associated with a particular execution of a job / op). Taking a crack at these questions in order...
1. This is true -- any dagster logging needs to happen somewhere during the execution of a job. If your custom code runs as part of that execution, you can call get_dagster_logger within it to get a logger whose messages will be associated with the currently-executing job. If your custom code is not intended to execute during a job run, then this logger will just write messages to stdout, and those messages will not be persisted anywhere.
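that fallback behavior can be sketched with plain stdlib logging (get_dagster_logger itself returns a standard logging.Logger; the function name below is illustrative, not Dagster's implementation):

```python
import logging
import sys

def get_plain_stdout_logger(name: str = "custom_code") -> logging.Logger:
    """Illustrative stand-in for what get_dagster_logger effectively
    gives you outside of a Dagster run: a plain stdout logger whose
    messages are printed but not persisted anywhere."""
    log = logging.getLogger(name)
    if not log.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(levelname)s - %(name)s - %(message)s")
        )
        log.addHandler(handler)
        log.setLevel(logging.INFO)
    return log

log = get_plain_stdout_logger()
log.info("running custom code outside of any job")
```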
2. the built-in logger is purely for recording unstructured information, e.g. "Foo value is 3" or "Finished doing X". Any structured information, such as when a particular step started or whether a job failed, is handled automatically by the dagster framework.
3. You don't have to extend loggers for this functionality, you can just use the default get_dagster_logger, or inside an op you can call
context.log.info("some message")
which will have the same effect.
4. similar to the answer for 2, all operational state is handled automatically by framework code, so there is no requirement to interact with loggers in these cases.
at a high level, logging is a completely opt-in functionality, and in many cases you do not need to use it at all. In simple cases, it's useful to be able to add extra unstructured information to the event log, which you can view in Dagit, to give some extra information when trying to debug or understand a job. This is where you would call
context.log.info("finished doing x")
get_dagster_logger().info("some value is 3")
inside an op or helper function
you can also customize how / where log messages are emitted. While log messages are always stored in persistent storage no matter what configuration you use, they are also emitted (by default) to the stdout stream, and you can add more sinks for advanced use cases
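the "more sinks" idea is just ordinary Python logging handler fan-out: each handler attached to a logger receives every record. A minimal stdlib sketch of that mechanism (the JsonListHandler class is a toy assumption for illustration, not a Dagster API; in Dagster you would configure this through its logging config instead):

```python
import json
import logging

class JsonListHandler(logging.Handler):
    """Toy 'extra sink': collects each log record as a JSON string
    in memory. The fan-out mechanism is the standard logging.Handler
    protocol that extra sinks plug into."""
    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
        }))

log = logging.getLogger("sink_demo")
log.setLevel(logging.INFO)
log.addHandler(logging.StreamHandler())  # default console sink (stderr)
json_sink = JsonListHandler()
log.addHandler(json_sink)                # additional custom sink

log.info("finished doing x")  # both handlers receive this record
```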

Binoy Shah

09/21/2022, 8:37 PM
Thank you Owen for the detailed explanation. This was very helpful and clarifies a lot of ambiguity. For my needs, our Kubernetes infrastructure has cluster-level log aggregators which scrape all container stdout/stderr streams, so our logging is implicitly captured. In such cases, where the infrastructure already handles logging and monitoring via other systems:
1. Should I turn off the DB storage of logs?
2. Do I need to "register" my external log aggregator, or put in some kind of log-forwarding mechanism?