# deployment-ecs
Hi All, We are facing a lot of GraphQL errors in Dagit due to the large amount of data stored in the underlying PostgreSQL database (see screenshot for details). I have already increased the statement timeout for Dagit; however, this does not solve the underlying problem! Is there a way to set data retention for the "runs" table that Dagster creates in the PostgreSQL database, or some other solution to prevent these GraphQL errors (for instance, a more specific LIMIT inside the query — I'm not sure what the current LIMIT does, as there is no information regarding "param_1")? *Note that this GraphQL error occurs not only on the schedules page, but almost everywhere in the Dagit UI! Would love to hear your thoughts on this! Kind regards, Arnoud
Ah, I found the value for param_1: ['param_1': 1]. However, this does not seem to do much for the underlying data — it only limits the result to 1 schedule? :)
Hi Arnoud… One way to manage data growth is to manually set up a scheduled job that deletes runs over a certain age. If you do so, however, you would lose asset materialization history and some historical partition data, which may or may not matter depending on which features you are using. Regardless, we do have some work planned to improve the performance of some of these queries. Do you mind sharing the sizes of your runs and run_tags tables?
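For reference, the selection logic for such a scheduled purge job could be sketched roughly like this (this is only an illustrative helper, not an official Dagster utility; the actual deletion would go through Dagster's instance API, e.g. `DagsterInstance.delete_run`, for each selected run ID):

```python
from datetime import datetime, timedelta, timezone

def runs_older_than(runs, max_age_days, now=None):
    """Return the IDs of runs that fall outside the retention window.

    `runs` is an iterable of (run_id, created_at) pairs, where created_at
    is a timezone-aware datetime. This sketch covers only the selection
    step; a real purge job would fetch runs from storage and delete each
    returned ID via the Dagster instance API.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [run_id for run_id, created_at in runs if created_at < cutoff]
```

A cron job or a Dagster schedule could then call this on a daily cadence with whatever retention window fits your compliance needs.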
Hi Phil, thank you for the response! I will try to create a scheduled job that purges the data for the time being, then. The runs table is 552 MB and the run_tags table is 1127 MB. If you need any other information, let me know! We are very willing to help improve Dagster even more 🙂 Kind regards