Philip Strnad
08/17/2023, 10:42 PM
event_logs than job_ticks (which is where I believe the tick data is stored - but either way, event_logs uses about 21GB of space whereas the next largest table is asset_event_tags at 1.2GB). It sounds like the only way to clean up event_logs is to create a Dagster job that either executes a SQL statement directly on the table or calls context.instance.delete_run(...) based on some list of run IDs that I would generate from data in event_logs - are there other options I missed?
I'd like to keep the size of the db from growing unnecessarily large, and I'm pretty sure we don't really need to look at run data that's older than a few weeks.
Philip Strnad
08/17/2023, 10:42 PM
dagster_event_types in the table:
  count  | dagster_event_type
---------+-------------------------------
       6 | HOOK_COMPLETED
     202 | HOOK_SKIPPED
     330 | PIPELINE_CANCELING
     342 | PIPELINE_CANCELED
    2998 | PIPELINE_ENQUEUED
    6524 | STEP_SKIPPED
    9896 | PIPELINE_SUCCESS
   10788 | PIPELINE_FAILURE
   19334 | PIPELINE_START
   19398 | PIPELINE_STARTING
   49717 | STEP_FAILURE
  148834 | ENGINE_EVENT
  584840 | STEP_INPUT
  586052 | LOADED_INPUT
  669303 | ASSET_OBSERVATION
  829815 | ASSET_MATERIALIZATION
  831232 | HANDLED_OUTPUT
  831407 | STEP_OUTPUT
  835837 | STEP_SUCCESS
  885650 | STEP_START
  885653 | LOGS_CAPTURED
  885656 | RESOURCE_INIT_SUCCESS
  885841 | RESOURCE_INIT_STARTED
  885955 | STEP_WORKER_STARTED
  894480 | STEP_WORKER_STARTING
 1109948 | ASSET_MATERIALIZATION_PLANNED
 2556012 |
Sort of surprised to see so many null rows, which I think are produced by context.log()? I didn't think we did that much logging of our own, but I should check that. It does seem that deleting based on event type will leave behind a fragmented run history, so it's probably better to delete by run, for data consistency at least.
prha
08/17/2023, 11:18 PM
context.log calls. The input/output events are used to power the “Retry” feature, so they might be okay to remove after some amount of time. The asset events (materialization, observation) power the asset catalog / asset history, so deleting those will have an impact on history.
Philip Strnad
08/17/2023, 11:49 PM
context.instance.delete_run(), running that on a weekly or monthly basis and feeding in run IDs obtained from the runs table is probably a better solution. Which event types does delete_run() remove? I haven't had a chance to look at the code.
prha
08/18/2023, 12:01 AM
delete_run will delete all the events for the run (including materialization history, etc.).
Philip Strnad
08/21/2023, 9:38 PM
Philip Strnad
08/21/2023, 9:40 PM
prha
08/22/2023, 12:35 AM
Philip Strnad
08/22/2023, 4:43 PM
prha
08/22/2023, 4:45 PM
job_ticks table
Philip Strnad
08/22/2023, 4:46 PM
prha
08/22/2023, 4:52 PM
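
A minimal sketch of the retention approach discussed in this thread (delete whole runs older than a few weeks via delete_run), not taken from any of the messages above. It assumes you already have (run_id, created_at) pairs, for example read from the runs table; how they are fetched is left out. The names expired_run_ids, purge_runs, and RecordingInstance are hypothetical helpers for illustration. The only Dagster call assumed is delete_run(run_id) on the instance, as mentioned in the thread. Keep in mind that, per the reply above, delete_run removes all events for the run, including materialization history.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Protocol, Tuple


class RunDeleter(Protocol):
    """Anything exposing delete_run(run_id), e.g. context.instance inside an op."""

    def delete_run(self, run_id: str) -> None: ...


def expired_run_ids(
    runs: Iterable[Tuple[str, datetime]],
    now: datetime,
    retention: timedelta,
) -> List[str]:
    """Return IDs of runs created strictly before (now - retention)."""
    cutoff = now - retention
    return [run_id for run_id, created_at in runs if created_at < cutoff]


def purge_runs(instance: RunDeleter, run_ids: Iterable[str]) -> int:
    """Delete each run through the instance and return how many were deleted."""
    deleted = 0
    for run_id in run_ids:
        instance.delete_run(run_id)
        deleted += 1
    return deleted


class RecordingInstance:
    """Hypothetical stand-in for a Dagster instance; it only records the calls."""

    def __init__(self) -> None:
        self.deleted: List[str] = []

    def delete_run(self, run_id: str) -> None:
        self.deleted.append(run_id)


if __name__ == "__main__":
    now = datetime(2023, 8, 22, tzinfo=timezone.utc)
    runs = [
        ("run-old", now - timedelta(weeks=5)),
        ("run-recent", now - timedelta(days=2)),
    ]
    instance = RecordingInstance()
    count = purge_runs(instance, expired_run_ids(runs, now, timedelta(weeks=3)))
    print(count, instance.deleted)
```

Inside a scheduled Dagster job, the same purge_runs call could be given context.instance in place of the stand-in. Selecting whole runs by age, rather than deleting rows by event type, keeps each remaining run's history intact.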