Danny Jackowitz 02/16/2022, 6:17 PM
We’ve encountered a case where every ~12m our Dagster database CPUs are nearly pegged for ~10m, and we quickly tracked it down to the following query:

SELECT event_logs.id, event_logs.event FROM event_logs ORDER BY event_logs.timestamp DESC, event_logs.id DESC LIMIT ?

I see a fix for this just got merged yesterday(!) (there’s no index on the timestamp column, so it makes sense that the current query is so, so slow): https://github.com/dagster-io/dagster/pull/6620 While investigating, though, I noticed that our
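For anyone curious why the missing index hurts so much, here’s a quick sketch of the query-plan difference. This uses SQLite purely as a stand-in for the actual Dagster Postgres schema (the table is a toy reconstruction, and the index name is made up), but the principle is the same: without an index matching the ORDER BY, the database must scan and sort the entire table just to return the top N rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_logs (id INTEGER PRIMARY KEY, event TEXT, timestamp TEXT)"
)

QUERY = (
    "SELECT event_logs.id, event_logs.event FROM event_logs "
    "ORDER BY event_logs.timestamp DESC, event_logs.id DESC LIMIT 100"
)

def plan(conn):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail); keep the detail text.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]

# Without an index: full table scan plus a temp b-tree sort over every row.
before = plan(conn)

conn.execute("CREATE INDEX idx_event_logs_ts_id ON event_logs (timestamp, id)")

# With the index: a single backwards index scan, no sort step at all.
after = plan(conn)

print(before)
print(after)
```

The LIMIT only helps once the index exists; before that, the sort has to consider all of the millions of rows on every poll.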
event_logs table has accumulated many millions of rows, seemingly retained forever. I see this related issue as well: https://github.com/dagster-io/dagster/issues/4497 We’ve also noticed similar seemingly-infinite retention of our compute logs (we use S3). Finally, my question: until such cleanup is a first-class feature of Dagster, what is safe for us to prune out-of-band? Can we just periodically delete event_logs rows older than a given timestamp and use S3 lifecycle rules for the compute logs? Or will attempting to do so violate internal consistency assumptions made by Dagster and result in a bad time? Thanks for any guidance here.
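For concreteness, here is a minimal sketch of the kind of out-of-band pruning being asked about. It again uses SQLite as a stand-in for the real Postgres instance; the table/column names come from the query above, and the 90-day cutoff and batch size are arbitrary assumptions, not anything Dagster prescribes.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_logs (id INTEGER PRIMARY KEY, event TEXT, timestamp TEXT)"
)

# Seed fake rows: half "old" (200 days), half recent (1 day).
now = datetime(2022, 2, 16)
for i in range(1000):
    ts = now - timedelta(days=200 if i % 2 == 0 else 1)
    conn.execute(
        "INSERT INTO event_logs (event, timestamp) VALUES (?, ?)",
        ("{}", ts.isoformat(" ")),
    )
conn.commit()

def prune_older_than(conn, cutoff, batch_size=250):
    """Delete rows older than `cutoff` in small batches, so no single
    transaction holds locks for long on a busy production table."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM event_logs WHERE id IN ("
            "  SELECT id FROM event_logs WHERE timestamp < ? LIMIT ?)",
            (cutoff.isoformat(" "), batch_size),
        )
        conn.commit()
        total += cur.rowcount
        if cur.rowcount < batch_size:
            return total

deleted = prune_older_than(conn, now - timedelta(days=90))
remaining = conn.execute("SELECT COUNT(*) FROM event_logs").fetchone()[0]
print(deleted, remaining)  # 500 old rows deleted, 500 recent rows kept
```

Whether this is actually *safe* against Dagster’s consistency assumptions is exactly the open question in the thread; this only illustrates the mechanics.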
prha 02/16/2022, 6:59 PM
The event log table powers two main views: the individual Run view, and the asset details view. The first is self-explanatory: we need to fetch the events to display on the run page, including all of the op timing. The second shows all of the cross-run materialization events going back in time.
For compute logs, if the objects are missing, Dagit should just show “no compute logs available”.
Danny Jackowitz 02/16/2022, 7:09 PM
So both are strictly for “human” consumption via the Dagit UI? As in, Dagster isn’t using them to make any scheduling decisions? That’s the particular case that I’m worried about (more for the event logs), where we DELETE some old rows and then scheduling goes haywire because Dagster needs the full event history from the beginning of time to decide what to do. (For the compute logs, the concern is more whether there’s also some associated metadata, such that Dagster/Dagit thinks there should be compute logs and then fails when it can’t find them because the S3 objects were deleted, but not the metadata.)
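For reference, the S3 half of the proposed cleanup would just be a standard lifecycle expiration rule. A sketch of what that configuration looks like via boto3 follows; the bucket name, prefix, and 90-day expiration are all hypothetical placeholders (in particular, the prefix depends on where your compute log manager actually writes), not values taken from the thread.

```python
# Hypothetical prefix: adjust to wherever your S3 compute log manager writes.
lifecycle_config = {
    "Rules": [
        {
            "ID": "expire-dagster-compute-logs",
            "Filter": {"Prefix": "dagster/compute-logs/"},
            "Status": "Enabled",
            # Objects older than 90 days are deleted by S3 automatically.
            "Expiration": {"Days": 90},
        }
    ]
}

# Applying it would look roughly like this (not run here):
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="my-dagster-bucket",
#       LifecycleConfiguration=lifecycle_config,
#   )
print(lifecycle_config["Rules"][0]["ID"])
```

This only expires the S3 objects; it does nothing about any metadata rows on the Dagster side, which is the orphaned-metadata concern raised above.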
prha 02/16/2022, 7:11 PM
Danny Jackowitz 02/16/2022, 7:18 PM
prha 02/16/2022, 7:19 PM
Danny Jackowitz 02/16/2022, 7:23 PM
query). That said, we’re still seeing many GiB/month of growth in the event_logs table, and presumably that will only ramp up further as we migrate more jobs (we’re currently only running a tiny fraction within Dagster), so having an official, out-of-the-box way to manage retention would be a great feature to have.
prha 02/16/2022, 7:27 PM
Danny Jackowitz 02/16/2022, 7:39 PM