Chris Le Sueur 06/07/2021, 11:03 AM
So I said this was intermittent, and it does seem to be some kind of race condition, because in other cases this initialisation proceeds and the tests all pass. When this happens, though, postgres still logs uniqueness violations; they're just wrapped with some retry logic:
```
dagster_1 |   File "/usr/local/lib/python3.8/site-packages/dagster/core/instance/ref.py", line 235, in run_storage
dagster_1 |     return self.run_storage_data.rehydrate()
dagster_1 |   File "/usr/local/lib/python3.8/site-packages/dagster/serdes/config_class.py", line 85, in rehydrate
dagster_1 |     return klass.from_config_value(self, result.value)
dagster_1 |   File "/usr/local/lib/python3.8/site-packages/dagster_postgres/run_storage/run_storage.py", line 88, in from_config_value
dagster_1 |     return PostgresRunStorage(
dagster_1 |   File "/usr/local/lib/python3.8/site-packages/dagster_postgres/run_storage/run_storage.py", line 62, in __init__
dagster_1 |     stamp_alembic_rev(pg_alembic_config(__file__), conn)
dagster_1 |   File "/usr/local/lib/python3.8/site-packages/dagster/core/storage/sql.py", line 46, in stamp_alembic_rev
dagster_1 |     stamp(alembic_config, rev)
dagster_1 | ...
dagster_1 |     cursor.execute(statement, parameters)
dagster_1 | sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "alembic_version_pkc"
dagster_1 | DETAIL:  Key (version_num)=(7cba9eeaaf1d) already exists.
```
(The retry logic is at line 59 of that same run_storage.py, just above the line 62 mentioned in the traceback.) I don't understand how the retry actually helps, though: if you retry something that produces a uniqueness violation, it will just hit the same violation again; the wrapped function is essentially a SQLAlchemy metadata `create_all`, which as far as I know doesn't check existing state before issuing its `CREATE` statements. This suggests there's something going on that I don't understand. On a "successful" run, the postgres log has 6 error messages from trying to create tables that already exist, and the dagster logs contain several instances of "Retrying failed database creation":
```
dagster_1 | WARNING:root:Retrying failed database creation
dagster_1 | WARNING:root:Retrying failed database creation
dagster_1 | WARNING:root:Retrying failed database creation
dagster_1 | WARNING:root:Retrying failed database creation
dagster_1 | WARNI [root] Retrying failed database creation
dagster_1 | WARNI [root] Retrying failed database creation
```
I am fairly sure the numeric correspondence (6 warnings, 6 duplicate-table errors) is a coincidence, since the runs schema alone contains several of the 6 tables being duplicated. I also don't understand the significance of the two different log formats (`WARNING:root:` versus `WARNI [root]`); I could only work out where one of them comes from in the dagster codebase. I should mention that, as you'd expect, we sometimes see slightly different errors; the failure may happen while initialising one of the other storages, for example. I am unfamiliar with driving alembic programmatically, so I don't really understand the significance of stamping the revision in the dagster code itself as opposed to allowing alembic to do it. If anyone has any idea why this might be happening, we'd be grateful. As an aside, I was wondering why there is an attempt to mitigate race conditions with retry logic here, instead of using SQL transaction logic to serialise the initialisation.
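A minimal sketch of the failure mode described above (my reading of it, not Dagster's actual code; `sqlite3` stands in for Postgres so this is self-contained): blindly retrying a non-idempotent initialisation `INSERT` just reproduces the uniqueness violation on every attempt, whereas an idempotent variant (`INSERT OR IGNORE` in sqlite, `INSERT ... ON CONFLICT DO NOTHING` in Postgres) is safe to retry:

```python
import sqlite3

def init_version_blind(conn):
    # Non-idempotent: always inserts, so a retry after a concurrent
    # writer has won hits the primary-key constraint again.
    conn.execute(
        "INSERT INTO alembic_version (version_num) VALUES ('7cba9eeaaf1d')"
    )

def init_version_idempotent(conn):
    # Idempotent: safe to retry; Postgres spelling would be
    # INSERT ... ON CONFLICT (version_num) DO NOTHING.
    conn.execute(
        "INSERT OR IGNORE INTO alembic_version (version_num) VALUES ('7cba9eeaaf1d')"
    )

def with_retries(fn, conn, attempts=3):
    # Simplistic retry wrapper, analogous in spirit to the
    # "Retrying failed database creation" logic in the logs above.
    last = None
    for _ in range(attempts):
        try:
            return fn(conn)
        except sqlite3.IntegrityError as exc:
            last = exc
    raise last

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alembic_version (version_num TEXT PRIMARY KEY)")

init_version_blind(conn)  # first writer wins
try:
    with_retries(init_version_blind, conn)  # retries cannot help here
except sqlite3.IntegrityError:
    print("blind retry still fails")

with_retries(init_version_idempotent, conn)  # succeeds on the first try
print("idempotent init ok")
```

In Postgres, the "SQL transaction logic" alternative could also take the form of an advisory lock (e.g. `pg_advisory_xact_lock`) held around the whole schema-creation-and-stamp step, so that concurrent initialisers serialise instead of racing.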
daniel 06/07/2021, 1:44 PM
Chris Le Sueur 06/07/2021, 2:14 PM
Introducing a wait here looks to be fixing this.
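For reference, the wait can be as simple as a retry-until-ready loop run before the dagster container starts; this is a hedged sketch, and the `pg_isready` invocation and `postgres` host name in the usage comment are assumptions, not details from the thread:

```shell
# wait_for ATTEMPTS CMD...: run CMD until it succeeds, sleeping between
# tries, and give up (non-zero exit) after ATTEMPTS failures.
wait_for() {
  attempts=$1; shift
  i=0
  until "$@"; do
    i=$((i + 1))
    if [ "$i" -ge "$attempts" ]; then
      echo "gave up after $attempts attempts: $*" >&2
      return 1
    fi
    sleep 1
  done
}

# Hypothetical usage in the dagster container's entrypoint:
#   wait_for 30 pg_isready -h postgres -p 5432 -q
#   exec dagster-daemon run
```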