Mykola Palamarchuk

11/21/2022, 2:45 AM
Hi team! Our dagster event logs table is like 250G now 🙈. Any advice how to properly clean-up? And is there a way to store event logs somewhere else (like s3)?

Felix Ruess

11/21/2022, 8:31 AM
Was wondering the same thing... keeping this in Postgres is a bit suboptimal

Adam Bloom

11/21/2022, 3:40 PM
I haven’t configured this yet, but there are some retention settings available. The event log holds things like asset materializations that you would probably want to retain, so it’s not as simple as “drop everything after X days”. https://docs.dagster.io/deployment/dagster-instance#data-retention Curious if there are any planned changes here though!
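For reference, the retention settings on that docs page live in `dagster.yaml` and cover schedule/sensor tick data rather than the run event log itself. A minimal sketch, where the day values are illustrative assumptions (-1 means retain indefinitely):

# dagster.yaml -- sketch of the data-retention settings from the linked docs
retention:
  schedule:
    purge_after_days: 90        # drop schedule ticks older than 90 days
  sensor:
    purge_after_days:
      skipped: 7                # skipped sensor ticks kept 7 days
      failure: 30               # failed ticks kept 30 days
      success: -1               # successful ticks kept forever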
m

Mykola Palamarchuk

11/23/2022, 12:39 PM
Sorry for bumping the question, but I really need this. I want to clean up the Dagster database (the event_logs table is 250G now) without destroying consistency. Please help!

prha

11/23/2022, 10:12 PM
The event log is the ultimate source of truth that powers the asset views, run views, and some run retry functionality. We don’t have an easy way to configure some cold storage archiving, but you should be able to delete certain rows from your event logs table if you no longer need them. For example, rows with `dagster_event_type == null` correspond to calls to `context.log.[debug|info|error]` in your `@asset`/`@op` functions. If you don’t need those after some amount of time, you can probably just delete them without affecting any of the historical views in dagit (or affecting retries of old runs).
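Concretely, that cleanup might look like the following SQL. This is a sketch only: the table and column names (event_logs, dagster_event_type, timestamp) match Dagster's default Postgres storage but should be verified against your schema, the 30-day cutoff is an assumed policy, and a production run should delete in smaller batches to avoid long locks.

-- Sketch: remove raw context.log.* entries (rows with no structured
-- Dagster event) older than an assumed 30-day cutoff.
DELETE FROM event_logs
WHERE dagster_event_type IS NULL
  AND timestamp < NOW() - INTERVAL '30 days';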

Mykola Palamarchuk

11/24/2022, 11:15 AM
@prha, but if I don't really need to retry some old runs (e.g. their IO managers are in-memory and the source data is gone), can you propose some way to clean them up?

alex

11/28/2022, 4:25 PM
here's a simple example script for deleting old runs that you could start from:
import datetime

from dagster import DagsterInstance, RunsFilter

# load the instance configured by DAGSTER_HOME
instance = DagsterInstance.get()

month_ago = datetime.datetime.now() - datetime.timedelta(days=30)
batch_size = 10

# fetch a batch of runs created more than 30 days ago
old_run_records = instance.get_run_records(
    filters=RunsFilter(created_before=month_ago),
    limit=batch_size,
    ascending=True,  # start from the oldest
)

for record in old_run_records:
    # delete all the database contents for this run
    instance.delete_run(record.pipeline_run.run_id)
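As written, the script deletes a single batch; to drain everything older than the cutoff you would presumably wrap the fetch/delete in a loop until no matching runs remain. A sketch of that wrapper (an assumption on my part, not part of the original message):

# Assumed driver loop: keep deleting batches until no runs older than
# the cutoff are left.
while True:
    records = instance.get_run_records(
        filters=RunsFilter(created_before=month_ago),
        limit=batch_size,
        ascending=True,
    )
    if not records:
        break
    for record in records:
        instance.delete_run(record.pipeline_run.run_id)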