Nitin Madhavan

02/04/2022, 6:50 AM
Hi, I have been running a Dagster data pipeline for a few days. The compute logs are stored in a local folder, which keeps growing, and disk space is becoming a problem. Is there any way to clean up old data automatically or to put a size limit on the folder?

Deveshi

02/04/2022, 11:04 AM
Hey - I was facing the same problem. I wrote a small script that finds compute logs older than n days and deletes them (version 12.11). There might be a better alternative, but this one works for now.

Nitin Madhavan

02/04/2022, 1:18 PM
Thanks @Deveshi, I am also hoping for a better alternative. I have too many cleanup scripts running on the server 🙂

Nitin Madhavan

02/05/2022, 4:36 AM
@Deveshi Would it be possible for you to share the cleanup script you are using? Would be grateful🙏

Deveshi

02/07/2022, 5:49 PM
Hi @Nitin Madhavan, here you go. This deletes the associated run and event log first (stored in Postgres), then removes the compute log locally. I have it set up as a cron job, and it works well for my use case.
import os
import shutil
from pathlib import Path
import time
import subprocess
import datetime

dagster_home = '/home/ubuntu/dagster_home'
storage_dir = Path(dagster_home, "storage")

runs_to_delete = []

now = time.time()
# cutoff: 864000 seconds = 10 days
old = now - 864000

# "entry" avoids shadowing the built-in dir()
for entry in os.listdir(storage_dir):
    entry_path = str(Path(storage_dir, entry))
    if os.path.getmtime(entry_path) < old:
        runs_to_delete.append(entry)

print(f"{datetime.datetime.now()}:Begin: runs to delete = {len(runs_to_delete)}")

deleted = 0
for run_id in runs_to_delete:
    try:
        # delete the associated run; an argument list avoids shell interpolation of run_id
        subprocess.call(["/home/ubuntu/.local/bin/dagster", "run", "delete", run_id, "-f"])

        # clear compute storage
        if Path(storage_dir, run_id).is_dir():
            shutil.rmtree(Path(storage_dir, run_id))
        deleted += 1

    # Exception (not BaseException) so Ctrl+C and SystemExit still stop the script
    except Exception as err:
        print(f"exception occurred {datetime.datetime.now()} {err}")

print(f"{datetime.datetime.now()}: End: runs deleted = {deleted}")
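For anyone adapting the script: a minimal sketch of just the age check, pulled out into a function so the retention period is a parameter instead of the hard-coded 864000 seconds. The name find_stale_runs and its arguments are hypothetical and not part of Dagster. It only lists candidate run folders and deletes nothing, so it can be used as a dry run before enabling the cron job.

```python
import os
import time
from pathlib import Path


def find_stale_runs(storage_dir, max_age_days, now=None):
    """Return names of subdirectories last modified more than max_age_days ago.

    Hypothetical helper: assumes each run's compute logs live in a folder
    named after the run id, as in the script above.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    stale = []
    for entry in os.scandir(storage_dir):
        # Only directories are treated as run folders; loose files are skipped.
        if entry.is_dir() and entry.stat().st_mtime < cutoff:
            stale.append(entry.name)
    return sorted(stale)


if __name__ == "__main__":
    # Path taken from the script above; adjust to your own DAGSTER_HOME.
    storage = Path("/home/ubuntu/dagster_home", "storage")
    if storage.is_dir():
        for run_id in find_stale_runs(storage, max_age_days=10):
            print(f"would delete: {run_id}")
```

Passing now explicitly is only there to make the cutoff easy to check by hand; in normal use it can be left out.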

Nitin Madhavan

02/08/2022, 9:06 AM
Thanks a lot @Deveshi 🙏🙂