
Clement Emmanuel

05/24/2023, 4:01 PM
This query
SELECT
  event_logs.id,
  event_logs.event
FROM
  event_logs
WHERE
  event_logs.dagster_event_type = $1
ORDER BY
  event_logs.id DESC
Which seems to be invoked by
# context: MultiAssetSensorEvaluationContext
context.latest_materialization_records_by_partition_and_asset()
becomes very expensive as the event_logs table grows (which is unbounded, I believe, since it's essentially a write-only table). It is expensive even with the appropriate indexes, which the query plan does use. Has there been any throughput testing of this pattern, or are there any ideas on how to optimize it? Unless I'm missing something, this seems to make the canonical use of multi-asset sensors non-viable even at fairly modest scale: performance will keep decaying as materializations accumulate until the query eventually can't complete within the hard 60-second timeout (or, if materializations already exist when the sensor is turned on, it fails immediately).

Vitaly Markov

05/24/2023, 4:06 PM
It should add filters by asset key and cursor position (id), so it should not be too bad. But fundamentally it does scan waaay too much data. Feel free to upvote this feature-request: https://github.com/dagster-io/dagster/issues/14406 In my view, it's the only proper fix.
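A minimal sketch of the point above, using an in-memory SQLite table as a stand-in for Dagster's event-log storage (the schema and index here are assumptions modeled on the query earlier in the thread, not Dagster's actual storage layer). The unbounded form re-reads the entire materialization history on every evaluation, while adding asset-key and cursor (id) predicates lets an index bound the scan to only the rows written since the last tick:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE event_logs (
        id INTEGER PRIMARY KEY,
        dagster_event_type TEXT,
        asset_key TEXT,
        event TEXT
    )
""")
# Composite index the bounded query below can use end-to-end.
conn.execute(
    "CREATE INDEX idx_type_asset_id "
    "ON event_logs (dagster_event_type, asset_key, id)"
)

# Simulate a write-only event log: 10,000 materializations over two assets.
conn.executemany(
    "INSERT INTO event_logs (dagster_event_type, asset_key, event) "
    "VALUES (?, ?, ?)",
    [
        ("ASSET_MATERIALIZATION", f"asset_{i % 2}", f"event {i}")
        for i in range(10_000)
    ],
)

# Unbounded form (as in the query above): work grows with total history.
unbounded = conn.execute(
    "SELECT id, event FROM event_logs "
    "WHERE dagster_event_type = ? ORDER BY id DESC",
    ("ASSET_MATERIALIZATION",),
).fetchall()

# Cursor-bounded form: only rows for one asset past the last-seen id.
cursor_id = 9_990
bounded = conn.execute(
    "SELECT id, event FROM event_logs "
    "WHERE dagster_event_type = ? AND asset_key = ? AND id > ? "
    "ORDER BY id DESC",
    ("ASSET_MATERIALIZATION", "asset_0", cursor_id),
).fetchall()

print(len(unbounded))  # 10000: scans the full history every evaluation
print(len(bounded))    # 5: bounded by the cursor, stays cheap as history grows
```

The cursor-bounded query's cost tracks the number of *new* events per tick rather than total history, which is why pushing the cursor into the SQL (as the linked feature request proposes) is the structural fix rather than an incremental one.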

Clement Emmanuel

05/24/2023, 4:11 PM
I see, yeah, that makes a lot of sense. Good point that this is mostly an issue with partitioned assets.