# dagster-support

Alexey Zavalskiy

09/01/2022, 5:42 PM
Hi there, I have a Scrapy crawler which I’d like to run from Dagster. Is it possible to catch all Scrapy logs and show them in the Dagster interface during a job run (not only in the raw log tab, but also in the structured log)? I tried adding this to my dagster.yaml file, but it didn’t work (these are some of Scrapy’s built-in logger names):
```yaml
python_logs:
  python_log_level: DEBUG
  managed_python_loggers:
    - scrapy.core.engine
    - scrapy.utils.log
```
Attached is an example .stderr file from the job.

jamie

09/01/2022, 6:30 PM
the log you attached looks to me like it is getting logs from `scrapy.core.engine` and `scrapy.utils.log`, e.g.

```
2022-09-01 17:28:34 [scrapy.utils.log] INFO: Scrapy 2.6.2 started (bot: scrapybot)
```

and

```
2022-09-01 17:28:34 [scrapy.core.engine] INFO: Spider opened
```
is the issue that the log you attached isn't showing up in dagit?

Alexey Zavalskiy

09/02/2022, 6:11 AM
Yes, I’d like to display these logs in the dagit interface in the structured event logs view, not only in raw.
@jamie so is there any way to display these logs in dagit, or do I maybe have a mistake in my config?

jamie

09/13/2022, 2:57 PM
hi, sorry this fell off my radar. @owen could you take a look? I don't see anything wrong with the logging config, but I'm not super familiar with the custom logging system

owen

09/13/2022, 5:02 PM
hi @Alexey Zavalskiy -- your configuration looks correct to me, and I don't see an immediate reason why this wouldn't work. To help debug: if you set `managed_python_loggers` to just `- root` in your dagster.yaml, does that end up capturing logs?
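For reference, the root-logger variant owen suggests would look like this in dagster.yaml (a sketch adapted from the config posted at the top of the thread):

```yaml
python_logs:
  python_log_level: DEBUG
  managed_python_loggers:
    # capture everything that propagates up to the root logger
    - root
```

Since every Python logger propagates to `root` unless `propagate` is disabled, this variant is a useful sanity check that the capture machinery itself works, separate from whether the individual logger names match.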

Alexey Zavalskiy

09/15/2022, 9:51 AM
hi @owen, yes, setting `root` in `managed_python_loggers` leads to capturing the logs

owen

09/15/2022, 8:20 PM
hm interesting -- I looked into how Scrapy sets up its loggers and nothing seems out of the ordinary, so I'm a bit confused as to why you'd be seeing this behavior. One further thing to help debug would be to print/log `context.log._managed_loggers` inside your op; that should contain the actual Python logger objects that Dagster expects to capture messages from. With your original config, I'd expect it to hold two loggers: `<Logger scrapy.core.engine (INFO)>` and `<Logger scrapy.utils.log (INFO)>`. Also, logger names are sensitive to spaces, so `logging.getLogger("a.b.c") != logging.getLogger("a.b.c ")`, which is something to watch out for (I'm not sure whether spaces get stripped out in the yaml config or not)
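owen's point about spaces can be verified with plain stdlib `logging`, independent of Dagster (the logger names here are just illustrative):

```python
import logging

# getLogger() looks names up verbatim in a registry, so a trailing
# space yields a completely separate logger object.
clean = logging.getLogger("scrapy.core.engine")
spaced = logging.getLogger("scrapy.core.engine ")  # note trailing space

print(clean is spaced)  # False: two distinct loggers
print(repr(clean.name), repr(spaced.name))
```

If the yaml loader were to preserve a stray trailing space in a `managed_python_loggers` entry, Dagster would attach its handler to a logger Scrapy never writes to.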

Alexey Zavalskiy

09/16/2022, 4:02 PM
I logged `context.log._managed_loggers` from my op and got `[<Logger dagster.builtin (DEBUG)>, <Logger scrapy.core.engine (DEBUG)>, <Logger scrapy.utils.log (DEBUG)>]`, but I still don't see these logs in dagit unless the `root` logger is set. I checked for typos and stray spaces; that didn't help
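One more stdlib check that might help narrow this down (a sketch, not something suggested in the thread): the `logging` module keeps a registry of every logger created so far, so dumping the `scrapy.*` entries from inside the op would show the exact names Dagster has to match in `managed_python_loggers`:

```python
import logging

# Simulate Scrapy having created one of its loggers; in a real op this
# already happened when the crawler started.
logging.getLogger("scrapy.core.engine")

# loggerDict maps every registered logger name (including placeholder
# ancestors like "scrapy" and "scrapy.core") to its entry.
scrapy_loggers = sorted(
    name
    for name in logging.Logger.manager.loggerDict
    if name == "scrapy" or name.startswith("scrapy.")
)
print(scrapy_loggers)
```

If the names printed here differ even slightly from the entries in `managed_python_loggers`, Dagster's handler is never attached to them, which would explain why only `root` captures anything.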