Chris Le Sueur
06/07/2021, 11:03 AM
dagster_1 | File "/usr/local/lib/python3.8/site-packages/dagster/core/instance/ref.py", line 235, in run_storage
dagster_1 | return self.run_storage_data.rehydrate()
dagster_1 | File "/usr/local/lib/python3.8/site-packages/dagster/serdes/config_class.py", line 85, in rehydrate
dagster_1 | return klass.from_config_value(self, result.value)
dagster_1 | File "/usr/local/lib/python3.8/site-packages/dagster_postgres/run_storage/run_storage.py", line 88, in from_config_value
dagster_1 | return PostgresRunStorage(
dagster_1 | File "/usr/local/lib/python3.8/site-packages/dagster_postgres/run_storage/run_storage.py", line 62, in __init__
dagster_1 | stamp_alembic_rev(pg_alembic_config(__file__), conn)
dagster_1 | File "/usr/local/lib/python3.8/site-packages/dagster/core/storage/sql.py", line 46, in stamp_alembic_rev
dagster_1 | stamp(alembic_config, rev)
...
dagster_1 | cursor.execute(statement, parameters)
dagster_1 | sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "alembic_version_pkc"
dagster_1 | DETAIL: Key (version_num)=(7cba9eeaaf1d) already exists.
So I said intermittent, and it does seem to be some kind of race condition, because in other cases this initialisation proceeds and the tests all pass. Even when it succeeds, though, postgres still logs uniqueness violations; they're just absorbed by some retry logic in dagster (line 59 of run_storage.py, just above line 62 mentioned in the traceback). I don't fully understand how the retry helps: the wrapped function is just a sqlalchemy MetaData's create_all method, and while create_all does check for existing tables by default (checkfirst=True), the existence check and the CREATE TABLE are separate statements, so a retry looks vulnerable to the same race; also, the stamp at line 62 that actually fails in the traceback doesn't appear to be covered by that retry at all. This suggests there's something going on I don't understand (I've put a minimal sketch of my mental model after the log excerpt below). On a "successful" run, the postgres log has 6 error messages from trying to create tables that already exist, and the dagster logs contain several instances of "Retrying failed database creation":
dagster_1 | WARNING:root:Retrying failed database creation
dagster_1 | WARNING:root:Retrying failed database creation
dagster_1 | WARNING:root:Retrying failed database creation
dagster_1 | WARNING:root:Retrying failed database creation
dagster_1 | WARNI [root] Retrying failed database creation
dagster_1 | WARNI [root] Retrying failed database creation
I am fairly sure the numeric correspondence (six warnings, six duplicate-table errors) is a coincidence, since the runs schema alone accounts for several of the 6 tables being duplicated. I also don't understand the significance of the two different logging formats; I could only work out where one of them was coming from in the dagster codebase.
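To make that mental model concrete, here is a minimal sketch (not the dagster code; the table and connection details are made up) of how two processes can both get past create_all's default checkfirst=True and then race on the same CREATE TABLE:

import sqlalchemy as db

engine = db.create_engine("postgresql://dagster:dagster@postgres:5432/dagster")
metadata = db.MetaData()
runs = db.Table("runs", metadata, db.Column("id", db.Integer, primary_key=True))

# checkfirst=True (the default) inspects for existing tables before
# emitting DDL, but the inspection and the CREATE TABLE are separate
# statements: two processes can both see the table as missing, and the
# slower one then fails with a unique-constraint error like the one
# above. On a retry the inspection sees the winner's committed table
# and skips it, which would explain why the retries usually succeed.
metadata.create_all(engine)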
I should mention that, as you'd expect, we sometimes see slightly different errors; the same failure can occur while trying to initialise the event_log storage, for example.
I'm unfamiliar with driving alembic programmatically, so I don't really understand the significance of dagster stamping the revision itself here, as opposed to letting alembic manage it.
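For reference, as far as I can tell "stamping" just records the current revision in the alembic_version table without running any migrations, roughly like this (the config path is illustrative; dagster builds its own via pg_alembic_config(__file__)):

from alembic import command
from alembic.config import Config

alembic_cfg = Config("alembic.ini")
# Writes the given revision into alembic_version without running any
# migrations; on a fresh database this creates the table and INSERTs a
# single row, so two concurrent stamps collide on the version_num
# primary key, exactly as in the traceback above.
command.stamp(alembic_cfg, "head")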
If anyone has any idea why this might be happening, we'd be grateful. As an aside, I was wondering why the race is mitigated with retry logic here instead of with SQL locking to serialise the initialisation.
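Something like the following is what I had in mind; a sketch only, with an arbitrary lock key, and metadata standing in for dagster's actual table definitions:

import sqlalchemy as db

engine = db.create_engine("postgresql://dagster:dagster@postgres:5432/dagster")
metadata = db.MetaData()  # stand-in for dagster's run storage tables

LOCK_KEY = 712001  # arbitrary application-wide key for this lock

with engine.begin() as conn:
    # Transaction-scoped advisory lock: blocks until no other session
    # holds the key, and is released automatically at commit, so exactly
    # one process performs the DDL; the others wait and then find
    # everything already in place.
    conn.execute(db.text("SELECT pg_advisory_xact_lock(:key)"), {"key": LOCK_KEY})
    metadata.create_all(conn)
    # ... stamping the alembic revision would go here too ...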
daniel
06/07/2021, 1:44 PM
Chris Le Sueur
06/07/2021, 2:14 PM
dagit & followed by dagster-daemon. Introducing a wait here looks to be fixing this.
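Concretely, one way to introduce that wait (a sketch only; connection string, flags and timings are illustrative) is to poll for the alembic_version table from the traceback before starting the daemon, rather than sleeping blindly:

import subprocess
import time

import psycopg2

DSN = "postgresql://dagster:dagster@postgres:5432/dagster"

def storage_stamped(dsn):
    # alembic_version only exists once the schema has been created and
    # stamped, so its presence suggests initialisation has completed.
    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT to_regclass('public.alembic_version')")
            return cur.fetchone()[0] is not None
    finally:
        conn.close()

dagit = subprocess.Popen(["dagit", "-h", "0.0.0.0", "-p", "3000"])
while True:
    try:
        if storage_stamped(DSN):
            break
    except psycopg2.OperationalError:
        pass  # postgres itself may not be accepting connections yet
    time.sleep(1)
daemon = subprocess.Popen(["dagster-daemon", "run"])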