Hi! Is there a way to delete runs other than throu...
# ask-community
m
Hi! Is there a way to delete runs other than through Dagit or by using "dagster run wipe"? For example, I would like to be able to delete runs older than yesterday. We have jobs running every 2 and 5 minutes, so I don't want to do that by clicking through Dagit, but I don't want to wipe ALL history either. The thing is, our Postgres pod resources are limited and we want to keep it that way, but now Dagit struggles to show Runs and often errors. For now I would just like to make that work again by deleting old history, but eventually the perfect scenario would be an automatic periodic wipe of run history older than x, especially as the number of jobs and runs increases. Thanks in advance for help and suggestions. 🙂
j
We’re also planning in a future release to support this sort of pruning natively
👍 1
m
Great, thank you so much!
@johann one more question please. Am I right to assume that when we talk about "deleting a run" that means deleting rows from the associated tables in Dagster Postgres? I realized the GraphQL API won't make my life much easier, so I tried to figure out how to delete directly from this database. But since more tables are probably involved, I wasn't sure how to do that, IF my assumption is correct at all.
By the way, we deployed Dagster with Helm and replaced the Dagster Postgres with our own database (which is just another pod created with a separate Bitnami Postgres helm chart - we couldn't make Dagster's default Postgres work in our Openshift cluster).
j
“deleting a run” that means deleting rows from the associated tables in Dagster Postgres?
yes. Usually the largest table is the event logs, which are keyed per run
🙌 1
g
Hey @johann do you know if a command line tool or dagster provided job is already in development? I'd be tempted to contribute
m
Thanks for confirming my research! I was afraid there was more to it and that I would break something by deleting from the database directly. So for now, we can create a separate Dagster job that periodically deletes old runs and their associated event_logs directly from our database, but I'll keep an eye on new releases for this feature.
j
do you know if a command line tool or dagster provided job is already in development?
I don’t think we have it in development. It likely would be a new daemon process I think. If you were interested in putting together a script to delete via graphql, I think that could also be useful in the interim
👍 1
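A minimal sketch of the interim GraphQL script mentioned above, using only the Python standard library. The mutation name `deletePipelineRun` and its `runId` argument are assumptions about the GraphQL schema and should be checked against the schema served by your own Dagit instance. The endpoint URL and the helper names `build_delete_request` and `delete_runs` are hypothetical.

```python
import json
import urllib.request

# Assumption: the mutation name and argument below match your Dagster version.
# Verify them against the schema exposed by your Dagit GraphQL endpoint.
DELETE_RUN_MUTATION = """
mutation DeleteRun($runId: String!) {
  deletePipelineRun(runId: $runId) {
    __typename
  }
}
"""


def build_delete_request(graphql_url, run_id):
    """Build (but do not send) the POST request that deletes one run."""
    payload = json.dumps(
        {"query": DELETE_RUN_MUTATION, "variables": {"runId": run_id}}
    ).encode("utf-8")
    return urllib.request.Request(
        graphql_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def delete_runs(graphql_url, run_ids):
    """Send one delete request per run ID and collect the decoded responses."""
    results = {}
    for run_id in run_ids:
        request = build_delete_request(graphql_url, run_id)
        with urllib.request.urlopen(request) as response:
            results[run_id] = json.load(response)
    return results
```

Calling `delete_runs("http://localhost:3000/graphql", ["<run-id>"])` would issue one request per run, which is why this route gets slow when jobs run every few minutes.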
a
Hi @MasaN, @johann I'm also interested in a solution for purging these runs/event logs, and it seems the feature that Johann mentioned is not available yet. Of the 2 workarounds discussed, wouldn't removing rows from the DB tables be easier to implement than deleting run_ids one by one via GraphQL? Thanks.
j
It would certainly be faster!
m
Hello @Averell, I can confirm our solution works well. So we indeed have a separate Dagster job that gets run IDs older than 3 days (3 is what we agreed on) and then deletes the associated event logs (and runs) from the metadata database. We set up a schedule to run it daily and so far it works as expected. However, I also believe it would be nice to have a similar solution out-of-the-box, because when you start to like Dagster and use it for more and more jobs, your event log gets really crowded. 😉
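A rough sketch of that kind of clean-up, not the actual job described above. To stay runnable anywhere it uses an in-memory SQLite database as a stand-in for the Dagster Postgres instance. The table names `runs` and `event_logs` come from this thread, but the column names `run_id` and `create_timestamp` are assumptions to check against your own schema. The helpers `make_demo_db` and `delete_runs_older_than` are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def make_demo_db(now):
    """Create a stand-in database with one old run and one recent run."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE runs (run_id TEXT, create_timestamp TEXT)")
    conn.execute("CREATE TABLE event_logs (run_id TEXT, event TEXT)")
    old = (now - timedelta(days=5)).strftime(TS_FORMAT)
    new = (now - timedelta(hours=1)).strftime(TS_FORMAT)
    conn.executemany(
        "INSERT INTO runs VALUES (?, ?)", [("old-run", old), ("new-run", new)]
    )
    conn.executemany(
        "INSERT INTO event_logs VALUES (?, ?)",
        [("old-run", "start"), ("old-run", "end"), ("new-run", "start")],
    )
    conn.commit()
    return conn


def delete_runs_older_than(conn, cutoff):
    """Delete event logs first, then the runs, in a single transaction."""
    cutoff_str = cutoff.strftime(TS_FORMAT)
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "DELETE FROM event_logs WHERE run_id IN "
            "(SELECT run_id FROM runs WHERE create_timestamp < ?)",
            (cutoff_str,),
        )
        cursor = conn.execute(
            "DELETE FROM runs WHERE create_timestamp < ?", (cutoff_str,)
        )
    return cursor.rowcount  # number of runs removed


if __name__ == "__main__":
    now = datetime(2022, 1, 10, 12, 0, 0)
    conn = make_demo_db(now)
    print(delete_runs_older_than(conn, now - timedelta(days=3)))
```

Against Postgres the same two statements would go through a Postgres driver, with that driver's placeholder style (for example `%s` in psycopg2) instead of `?`. Any other tables that reference `run_id` would need the same treatment.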
a
Thanks a lot. I'll schedule that clean-up. BTW, did you think of archiving those logs instead of removing them? Or having some kind of cheaper storage than Postgres/MySQL? Thanks