# announcements
c
Hi! I wanted to ask how you handle Dagster logs that get too big. What would a cleanup of old logs/events/etc. look like, for example? In under a month we are seeing the Dagster DB take up huge space and become slower, probably because of this.
a
We have some utilities for removing everything, but I’m guessing you want to keep recent runs around, right?
Can you file an issue and include what criteria you would like to use to decide which runs to remove / keep?
In the short term, there might be enough in the Python APIs to write your own script, if you’re interested in that path.
c
Thanks! Initially it would make sense to keep the most recent runs within some window (e.g. 1 month). It could also be triggered manually once we notice the DB is getting too big.
I will file the issue. Also, could you point me toward the functionality you mentioned in the API?
a
`DagsterInstance.get().delete_run(run_id)`
```python
from datetime import datetime, timedelta, timezone

from dagster import DagsterInstance

def clean_old_runs(max_age_days=30):
    # get access to the instance DBs - assumes a local instance configured
    # with $DAGSTER_HOME/dagster.yaml
    instance = DagsterInstance.get()

    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)

    # get_runs can also take limit & cursor if you want to go in batches
    runs = instance.get_runs()

    for pipeline_run in runs:
        stats = instance.get_run_stats(pipeline_run.run_id)
        if stats.end_time is None:
            # run hasn't finished (or never started); leave it alone
            continue
        # stats.end_time is a unix timestamp
        ended_at = datetime.fromtimestamp(stats.end_time, tz=timezone.utc)
        if ended_at < cutoff:
            instance.delete_run(pipeline_run.run_id)
```
^ here’s a rough sketch of what it could look like
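And if the run history is large, you could page through it in batches using the cursor & limit params on get_runs rather than loading everything at once. A rough, untested sketch (iter_runs_in_batches is just an illustrative helper name):
```python
from dagster import DagsterInstance

def iter_runs_in_batches(batch_size=100):
    # Pages through run history via get_runs(cursor=..., limit=...),
    # so the whole history is never held in memory at once.
    instance = DagsterInstance.get()
    cursor = None
    while True:
        batch = instance.get_runs(cursor=cursor, limit=batch_size)
        if not batch:
            break
        yield from batch
        # the cursor is the run_id to continue paging after
        cursor = batch[-1].run_id

# e.g. feed these batches into the same age check as clean_old_runs above
```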
c
Alright! Let me check and I’ll get back to you.
Also, perhaps as a related question: we would like to do this because, for example, displaying the logs of a single run takes around 5 minutes to fully load. The Postgres DB we’re using is around 16 GB now, and that seemed like a candidate cause for the slowdown. Would you have any idea why it is so slow? What could we check to improve performance?
a
What version are you on? 0.8.0 fixes some bad performance bugs we had.
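To see where the space is going, you could also check which tables dominate in Postgres (likely the event log table). A sketch, assuming psycopg2 and a placeholder connection string:
```python
import psycopg2

# placeholder DSN - point this at the DB your dagster.yaml storage uses
conn = psycopg2.connect("postgresql://user:pass@localhost:5432/dagster")
with conn, conn.cursor() as cur:
    # pg_total_relation_size counts the table plus its indexes and TOAST data
    cur.execute(
        """
        SELECT relname, pg_size_pretty(pg_total_relation_size(relid))
        FROM pg_catalog.pg_statio_user_tables
        ORDER BY pg_total_relation_size(relid) DESC
        LIMIT 10
        """
    )
    for table, size in cur.fetchall():
        print(table, size)
```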