Mark Fickett
12/22/2022, 4:43 PM

Shalabh Chaturvedi
12/22/2022, 5:16 PM
429 responses should auto-retry after a delay and we do not expect failures. This helps spread out the heavy load. Are you seeing failures in your logs, or do the runs eventually succeed?
Is it also possible to reduce your usage of context.log writes? That is the only request that is hitting the rate limit.
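(For reference, the distinction being discussed: context.log.* calls become event-log writes in Dagster Cloud, which is the request hitting the rate limit, while a plain Python logger that is not captured via managed_python_loggers only shows up in the stdout/stderr compute logs. A minimal sketch with illustrative names, not code from the actual pipeline:)

import logging

from dagster import op

# Hypothetical logger name; anything not captured via managed_python_loggers
# stays out of the event log and only appears in the compute logs.
chatty_log = logging.getLogger("data_pipeline.chatty")

@op
def normalize_task(context):
    # Low-volume, important messages: context.log shows up as structured
    # events in Dagit, but each call is an event-log write.
    context.log.info("starting normalization")

    for i in range(10_000):
        # High-volume messages: a plain logger avoids one event-log write
        # per line, as long as this logger is not captured via
        # managed_python_loggers.
        chatty_log.info("processed row %d", i)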
Mark Fickett
12/22/2022, 6:13 PM
In theory we could reduce log usage, but not easily. It would be a slow process to comb through our logs and try to find ones which we might not need. And generally we want all logs to be captured, since it's useful to see logs in Dagit.

Shalabh Chaturvedi
12/22/2022, 6:33 PM
Sometimes we see accidental heavy log usage (e.g. dumping large data in a tight loop). Good to know this is not the case and these logs are actually useful. Note that scaling the log writes (and scaling the database in general) is an active project we are working on and a key priority for us for the next couple of months. For this specific job - I noticed that the spike of log writes stopped before 9am Pacific time today. Was this a one-off job or a scheduled job? If we increase the rate limit, are you able to retry this job?
Mark Fickett
12/22/2022, 6:36 PM

daniel
12/22/2022, 6:44 PM

Mark Fickett
12/22/2022, 6:45 PM
managed_python_loggers with a console handler?
python_logs:
  python_log_level: INFO
  managed_python_loggers:
    # Capture logs from all Python loggers into Dagster's UI.
    # I assume this is what we'd change to go to compute logs,
    # but we still want context.log calls to funnel into the root logger.
    - root
  # Specify additional handlers. We want these to get all the log messages
  # from context.log and application logs / the Python root logger.
  dagster_handler_config:
    handlers:
      combined_json:
        class: processing.common.log.JsonFileHandler
        level: INFO
      otel_span:
        class: form_observability.OtelSpanEventHandler
        level: INFO
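(For reference, a handler wired in under dagster_handler_config is a standard logging.Handler importable by its class path; it receives the records from context.log plus the captured managed loggers. The sketch below is a hypothetical stand-in for processing.common.log.JsonFileHandler, whose real implementation is not shown in this thread:)

import json
import logging

class JsonFileHandler(logging.Handler):
    """Hypothetical stand-in: append each managed log record as one JSON line."""

    def __init__(self, filename="run_logs.jsonl"):
        super().__init__()
        self._filename = filename

    def emit(self, record: logging.LogRecord) -> None:
        try:
            line = json.dumps(
                {
                    "logger": record.name,
                    "level": record.levelname,
                    "message": record.getMessage(),
                    "created": record.created,
                }
            )
            with open(self._filename, "a") as f:
                f.write(line + "\n")
        except Exception:
            self.handleError(record)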
daniel
12/22/2022, 6:57 PM

Mark Fickett
12/22/2022, 6:58 PM

daniel
12/22/2022, 7:03 PM

Mark Fickett
12/22/2022, 7:10 PM
Switching from context.log to some other logging.getLogger(..) would be a big find-replace but not hard. And then I would just attach these handlers in our own setup (maybe in a resource).
There are some Dagster framework logs that are nice to see alongside the log messages we write. So I guess we might want to also attach the handlers via dagster_handler_config for those. It would be too bad not to see the logs interleaved normally in the compute log output, though.
Overall, it sounds like switching to compute logs is a potential workaround, but if we're not really abusing the log system and if raising the event log quota is possible, that would be easier and result in a bit more streamlined product experience.
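(A rough sketch of the workaround described above, with placeholder resource and handler names rather than the team's actual code: replace context.log calls with a plain module-level logger, and attach the custom handlers once from a resource so application logs still reach the JSON/OTel sinks without going through the event log. This assumes root is no longer listed under managed_python_loggers:)

import logging

from dagster import op, resource

# Stand-in for per-module logging.getLogger(__name__) calls.
log = logging.getLogger("data_pipeline")

@resource
def app_logging(init_context):
    """Attach custom handlers to the root logger once per run."""
    root = logging.getLogger()
    handler = logging.StreamHandler()  # stand-in for JsonFileHandler / OtelSpanEventHandler
    handler.setLevel(logging.INFO)
    # Avoid stacking duplicate handlers if the resource is initialized repeatedly.
    if not any(isinstance(h, logging.StreamHandler) for h in root.handlers):
        root.addHandler(handler)
    return root

@op(required_resource_keys={"app_logging"})
def normalize_task(context):
    # Previously context.log.info(...), i.e. one event-log write per call.
    log.info("normalizing...")  # now goes to compute logs and the custom handlers only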
daniel
12/22/2022, 7:12 PM

Mark Fickett
12/22/2022, 7:17 PM

daniel
12/22/2022, 7:25 PM

Mark Fickett
12/22/2022, 7:28 PM

daniel
12/22/2022, 7:30 PM

Mark Fickett
12/22/2022, 7:31 PM
An exception was thrown during step execution that is likely a framework error, rather than an error in user code.
dagster_cloud_cli.core.errors.GraphQLStorageError: Max retries (6) exceeded, too many 429 error responses.
Stack Trace:
File "/usr/local/lib/python3.10/site-packages/dagster/_cli/api.py", line 441, in _execute_step_command_body
yield from execute_plan_iterator(
, File "/usr/local/lib/python3.10/site-packages/dagster/_core/execution/api.py", line 1190, in __iter__
yield from self.iterator(
, File "/usr/local/lib/python3.10/site-packages/dagster/_core/execution/plan/execute_plan.py", line 114, in inner_plan_execution_iterator
for step_event in check.generator(
, File "/usr/local/lib/python3.10/site-packages/dagster/_core/execution/plan/execute_plan.py", line 333, in dagster_event_sequence_for_step
yield step_failure_event_from_exc_info(
, File "/usr/local/lib/python3.10/site-packages/dagster/_core/execution/plan/objects.py", line 122, in step_failure_event_from_exc_info
return DagsterEvent.step_failure_event(
, File "/usr/local/lib/python3.10/site-packages/dagster/_core/events/__init__.py", line 802, in step_failure_event
return DagsterEvent.from_step(
, File "/usr/local/lib/python3.10/site-packages/dagster/_core/events/__init__.py", line 418, in from_step
log_step_event(step_context, event)
, File "/usr/local/lib/python3.10/site-packages/dagster/_core/events/__init__.py", line 297, in log_step_event
step_context.log.log_dagster_event(
, File "/usr/local/lib/python3.10/site-packages/dagster/_core/log_manager.py", line 387, in log_dagster_event
self.log(level=level, msg=msg, extra={DAGSTER_META_KEY: dagster_event})
, File "/usr/local/lib/python3.10/site-packages/dagster/_core/log_manager.py", line 402, in log
self._log(level, msg, args, **kwargs)
, File "/usr/local/lib/python3.10/logging/__init__.py", line 1624, in _log
self.handle(record)
, File "/usr/local/lib/python3.10/logging/__init__.py", line 1634, in handle
self.callHandlers(record)
, File "/usr/local/lib/python3.10/logging/__init__.py", line 1696, in callHandlers
hdlr.handle(record)
, File "/data-pipeline/orchestration/log.py", line 96, in handle
super().handle(record)
, File "/usr/local/lib/python3.10/logging/__init__.py", line 968, in handle
self.emit(record)
, File "/usr/local/lib/python3.10/site-packages/dagster/_core/log_manager.py", line 267, in emit
handler.handle(dagster_record)
, File "/usr/local/lib/python3.10/logging/__init__.py", line 968, in handle
self.emit(record)
, File "/usr/local/lib/python3.10/site-packages/dagster/_core/instance/__init__.py", line 181, in emit
self._instance.handle_new_event(event)
, File "/usr/local/lib/python3.10/site-packages/dagster/_core/instance/__init__.py", line 1587, in handle_new_event
self._event_storage.store_event(event)
, File "/usr/local/lib/python3.10/site-packages/dagster_cloud/storage/event_logs/storage.py", line 371, in store_event
self._execute_query(
, File "/usr/local/lib/python3.10/site-packages/dagster_cloud/storage/event_logs/storage.py", line 248, in _execute_query
res = self._graphql_client.execute(
, File "/usr/local/lib/python3.10/site-packages/dagster_cloud_cli/core/graphql_client.py", line 111, in execute
raise GraphQLStorageError(
The above exception was caused by the following exception:
requests.exceptions.HTTPError: 429 Client Error: Too Many Requests for url: <https://formenergy.agent.dagster.cloud/graphql>
Stack Trace:
File "/usr/local/lib/python3.10/site-packages/dagster_cloud_cli/core/graphql_client.py", line 72, in execute
return self._execute_retry(query, variable_values, headers)
, File "/usr/local/lib/python3.10/site-packages/dagster_cloud_cli/core/graphql_client.py", line 157, in _execute_retry
response.raise_for_status()
, File "/usr/local/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
11:36:00.058
Step data_pipe_graph_sasquatch_anode.raw_data_graph._normalize_task[0ue9r_plus_25] failed health check: Discovered failed Kubernetes job dagster-step-e5abc3ae8aad5ae2b7ebf2295ce0fde4 for step data_pipe_graph_sasquatch_anode.raw_data_graph._normalize_task[0ue9r_plus_25].
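(The "Max retries (6) exceeded" line above matches the retry-after-a-delay behavior described at the start of the thread. A generic sketch of that pattern, not Dagster Cloud's actual client code, with a placeholder function name and retry count mirroring the error message:)

import time

import requests

MAX_RETRIES = 6  # placeholder, mirroring the "(6)" in the error above


def post_with_retry(url: str, payload: dict) -> requests.Response:
    for attempt in range(MAX_RETRIES + 1):
        response = requests.post(url, json=payload, timeout=30)
        if response.status_code != 429:
            response.raise_for_status()
            return response
        if attempt == MAX_RETRIES:
            break
        # Honor Retry-After if present, otherwise back off exponentially.
        delay = float(response.headers.get("Retry-After", 2 ** attempt))
        time.sleep(delay)
    raise RuntimeError(f"Max retries ({MAX_RETRIES}) exceeded, too many 429 error responses.")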
Removing root from the managed_python_loggers allowed the run to complete successfully. (But I haven't done the other cleanup / verification to make sure the logs get to where we need them otherwise.)

daniel
01/03/2023, 1:54 PM