Mikael Frosthage
03/11/2024, 5:23 PM

Scout
03/11/2024, 5:23 PM

Mikael Frosthage
03/11/2024, 5:25 PM

Scout
03/11/2024, 5:25 PM
```python
import datetime

from dagster import DagsterInstance, RunsFilter

instance = DagsterInstance.get()

# Define the time threshold for what is considered old, e.g., 1 week
week_ago = datetime.datetime.now() - datetime.timedelta(days=7)

old_run_records = instance.get_run_records(
    filters=RunsFilter(created_before=week_ago),
    limit=10,  # Limit how many are fetched at a time; perform this operation in batches
    ascending=True,  # Start from the oldest
)

# In this simple example, we delete serially.
# For higher throughput, you could parallelize with threads.
for record in old_run_records:
    # Delete all the database contents for this run
    instance.delete_run(record.dagster_run.run_id)
```
This script fetches and deletes runs that were created before a given cutoff (one week ago in this example). You can adjust the `week_ago` variable to suit your needs. The `limit` parameter caps how many runs are fetched per call, so each execution deletes at most that many runs; run the script repeatedly (or wrap it in a loop) to work through the backlog in batches.

Please note that deleting runs is a destructive operation: it removes the run records, along with their tags and events, from the database. Dagster will no longer have any record of these runs, which could affect partitioned jobs and assets. Perform this operation with great care.

If you have a large number of runs to delete and are concerned about the cost of deleting them one by one, you might consider running the deletions in parallel using threads or adjusting the batch size.

Before running any script that modifies your instance's data, make sure you have appropriate backups and understand the implications of the changes. If you're unsure, or if Dagster's capabilities have changed since my last update, please refer to the latest documentation or ask the Dagster community.
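The batching and threading ideas above can be combined into a loop that keeps fetching until no old runs remain. This is a minimal sketch under stated assumptions, not an official Dagster utility: `delete_in_batches` and `purge_old_dagster_runs` are hypothetical helper names, the thread pool assumes concurrent `delete_run` calls are safe for your storage backend, and `max_batches` is an arbitrary safety cap that stops the loop if deletions never take effect. Re-querying from the oldest run each time works because deleted runs no longer match the filter, so no cursor is needed.

```python
import datetime
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def delete_in_batches(
    fetch_batch: Callable[[], List[str]],
    delete_one: Callable[[str], object],
    max_workers: int = 1,
    max_batches: int = 1000,
) -> int:
    """Fetch a batch of run IDs, delete them, and repeat until a fetch comes back empty.

    Returns the total number of deletions performed. max_batches is a hypothetical
    safety cap so the loop cannot spin forever if deletions never take effect.
    """
    deleted = 0
    for _ in range(max_batches):
        batch = fetch_batch()
        if not batch:
            break  # nothing older than the cutoff is left
        if max_workers > 1:
            # Delete this batch concurrently. Iterating the results re-raises any
            # exception from a failed call; leaving the with-block waits for all workers.
            with ThreadPoolExecutor(max_workers=max_workers) as pool:
                list(pool.map(delete_one, batch))
        else:
            for run_id in batch:
                delete_one(run_id)
        deleted += len(batch)
    return deleted


def purge_old_dagster_runs(days: int = 7, batch_size: int = 10, max_workers: int = 4) -> int:
    """Hypothetical wiring to Dagster, using only the calls from the script above."""
    # Imported here so delete_in_batches can be reused without Dagster installed.
    from dagster import DagsterInstance, RunsFilter

    instance = DagsterInstance.get()
    cutoff = datetime.datetime.now() - datetime.timedelta(days=days)

    def fetch_batch() -> List[str]:
        # Always start from the oldest matching runs; deleted runs drop out of the filter.
        records = instance.get_run_records(
            filters=RunsFilter(created_before=cutoff),
            limit=batch_size,
            ascending=True,
        )
        return [record.dagster_run.run_id for record in records]

    return delete_in_batches(fetch_batch, instance.delete_run, max_workers=max_workers)
```

For example, `purge_old_dagster_runs(days=30, batch_size=100)` would delete runs older than 30 days in batches of 100. Passing `max_workers=1` keeps the deletions serial, matching the original script.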