Hi team! Our dagster event logs table is like 250G...
# ask-community
m
Hi team! Our dagster event logs table is like 250G now 🙈. Any advice on how to clean it up properly? And is there a way to store event logs somewhere else (like S3)?
f
Was wondering the same thing... keeping this in Postgres is a bit suboptimal.
a
I haven’t configured this yet, but there are some retention settings available. The event log holds things like asset materializations that you would probably want to retain, so it’s not as simple as “drop everything after X days”. https://docs.dagster.io/deployment/dagster-instance#data-retention Curious if there are any planned changes here though!
m
Sorry for bumping the question, but I really need this. I want to clean up the Dagster database (the event_logs table is 250G now) without breaking its consistency. Please help!
p
The event log is the ultimate source of truth that powers the asset views, run views, and some run retry functionality. We don’t have an easy way to configure cold-storage archiving, but you should be able to delete certain rows from your event logs table if you no longer need them. For example, rows with `dagster_event_type == null` correspond to calls to `context.log.[debug|info|error]` in your `@asset`/`@op` functions. If you don’t need those after some amount of time, you can probably just delete them without affecting any of the historical views in dagit (or retries of old runs).
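A minimal sketch of the cleanup described above, written against Python's built-in `sqlite3` so it runs standalone. It assumes a table named `event_logs` with `id`, `dagster_event_type` and `timestamp` columns; the table name and `dagster_event_type` come from this thread, while `id` and `timestamp` are assumptions you should check against your actual schema. Both helpers are hypothetical and not part of Dagster's API. The first shows how the rows break down by event type, so you can see how much of the table is plain log messages; the second deletes old untyped rows in batches.

```python
import sqlite3

# Hypothetical helpers: they assume an `event_logs` table with `id`,
# `dagster_event_type` and `timestamp` columns. Verify against your own schema.


def count_rows_by_event_type(conn: sqlite3.Connection):
    """Return (dagster_event_type, row_count) pairs, largest group first.

    Rows whose type is NULL are plain context.log.* messages.
    """
    return conn.execute(
        """
        SELECT dagster_event_type, COUNT(*) AS n
        FROM event_logs
        GROUP BY dagster_event_type
        ORDER BY n DESC
        """
    ).fetchall()


def delete_old_log_rows(conn: sqlite3.Connection, cutoff, batch_size: int = 10_000) -> int:
    """Delete untyped rows (dagster_event_type IS NULL) older than cutoff.

    Deletes at most batch_size rows per statement and commits after each batch,
    so a very large table is never hit by one huge DELETE.
    Returns the total number of rows deleted.
    """
    total = 0
    while True:
        cur = conn.execute(
            """
            DELETE FROM event_logs
            WHERE id IN (
                SELECT id FROM event_logs
                WHERE dagster_event_type IS NULL AND timestamp < ?
                LIMIT ?
            )
            """,
            (cutoff, batch_size),
        )
        conn.commit()
        total += cur.rowcount
        # A short batch means there is nothing left that matches.
        if cur.rowcount < batch_size:
            return total
```

On Postgres the same statements should work through a DB-API driver such as psycopg2, using `%s` placeholders instead of `?`. Note that deleting rows in Postgres does not shrink the table file by itself: `VACUUM` makes the space reusable, and only `VACUUM FULL` (which locks the table while it runs) returns it to the operating system.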
m
@prha, but if I don't really need to retry some old runs (e.g. their IO managers are in-memory and the source data no longer exists), can you suggest a way to clean up?
a
Here's a simple example script for deleting old runs that you could start from:
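```python
import datetime

from dagster import DagsterInstance, RunsFilter

instance = DagsterInstance.get()  # the instance configured via $DAGSTER_HOME
month_ago = datetime.datetime.now() - datetime.timedelta(days=30)
batch_size = 10

# Keep fetching the oldest batch until no runs older than the cutoff remain.
while True:
    old_run_records = instance.get_run_records(
        filters=RunsFilter(created_before=month_ago),
        ascending=True,  # start from the oldest
        limit=batch_size,
    )
    if not old_run_records:
        break
    for record in old_run_records:
        # delete all the database contents for this run, including its event log rows
        # (older Dagster versions expose the run as record.pipeline_run)
        instance.delete_run(record.dagster_run.run_id)
```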