
Clement Emmanuel

05/24/2023, 4:01 PM
This query
SELECT
  event_logs.id,
  event_logs.event
FROM
  event_logs
WHERE
  event_logs.dagster_event_type = $1
ORDER BY
  event_logs.id DESC
Which seems to be invoked by
# context: MultiAssetSensorEvaluationContext
context.latest_materialization_records_by_partition_and_asset()
becomes very expensive as the event_logs table grows (which is unbounded, I believe, since it's essentially a write-only table). It is expensive even with the appropriate indexes, which the query plan does use. Has there been any throughput testing of this pattern, or are there any ideas on how to optimize it? Unless I'm missing something, this seems to make the canonical use of multi-asset sensors non-viable even at fairly modest scale: performance will keep decaying as materializations accumulate until the query eventually can't complete within the hard 60-second timeout (or, if materializations already exist when the sensor is turned on, it fails immediately).

Vitaly Markov

05/24/2023, 4:06 PM
It should add filters by asset key and cursor position (id), so it should not be too bad. But fundamentally it does scan waaay too much data. Feel free to upvote this feature-request: https://github.com/dagster-io/dagster/issues/14406 In my view, it's the only proper fix.
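A minimal sketch of the point above, using an in-memory SQLite table as a stand-in for Dagster's event-log storage (the schema and index here are assumptions modeled on the query earlier in the thread, not Dagster's actual storage layer). The unbounded form re-reads the entire materialization history on every evaluation, while adding asset-key and cursor (id) predicates lets an index bound the scan to only the rows written since the last tick:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE event_logs (
        id INTEGER PRIMARY KEY,
        dagster_event_type TEXT,
        asset_key TEXT,
        event TEXT
    )
""")
# Composite index the bounded query below can use end-to-end.
conn.execute(
    "CREATE INDEX idx_type_asset_id "
    "ON event_logs (dagster_event_type, asset_key, id)"
)

# Simulate a write-only event log: 10,000 materializations over two assets.
conn.executemany(
    "INSERT INTO event_logs (dagster_event_type, asset_key, event) "
    "VALUES (?, ?, ?)",
    [
        ("ASSET_MATERIALIZATION", f"asset_{i % 2}", f"event {i}")
        for i in range(10_000)
    ],
)

# Unbounded form (as in the query above): work grows with total history.
unbounded = conn.execute(
    "SELECT id, event FROM event_logs "
    "WHERE dagster_event_type = ? ORDER BY id DESC",
    ("ASSET_MATERIALIZATION",),
).fetchall()

# Cursor-bounded form: only rows for one asset past the last-seen id.
cursor_id = 9_990
bounded = conn.execute(
    "SELECT id, event FROM event_logs "
    "WHERE dagster_event_type = ? AND asset_key = ? AND id > ? "
    "ORDER BY id DESC",
    ("ASSET_MATERIALIZATION", "asset_0", cursor_id),
).fetchall()

print(len(unbounded))  # 10000: scans the full history every evaluation
print(len(bounded))    # 5: bounded by the cursor, stays cheap as history grows
```

The cursor-bounded query's cost tracks the number of *new* events per tick rather than total history, which is why pushing the cursor into the SQL (as the linked feature request proposes) is the structural fix rather than an incremental one.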

Clement Emmanuel

05/24/2023, 4:11 PM
I see, yeah, that makes a lot of sense. Good point that this is mostly an issue with partitioned assets.