# ask-community
m
Hi Team, I’m (apparently) running into the concurrency limit of the default SQLite database with a job/op that `DynamicOutput`s every row in a pandas DF (about 3000):
```
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO event_logs (run_id, event, dagster_event_type, timestamp, step_key, asset_key, partition) VALUES (?, ?, ?, ?, ?, ?, ?)]
```
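For context, the pattern in question looks roughly like this, a minimal sketch of a per-row dynamic fan-out (op and job names and the DataFrame contents are made up for illustration):

```python
import pandas as pd
from dagster import DynamicOut, DynamicOutput, job, op


@op(out=DynamicOut())
def fan_out_rows():
    # Hypothetical stand-in for the real ~3000-row metadata DataFrame.
    df = pd.DataFrame({"image_id": range(3000)})
    for idx, row in df.iterrows():
        # One DynamicOutput per row -> one step (and its event log writes) per row.
        yield DynamicOutput(row.to_dict(), mapping_key=f"row_{idx}")


@op
def process_row(row: dict) -> dict:
    # Placeholder for the per-row work (formatting metadata, calling APIs, etc.).
    return row


@job
def metadata_job():
    fan_out_rows().map(process_row)
```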
My use case is rather simple and I don’t need to keep track of all the logs Dagster saves by default. I tried unsetting `DAGSTER_HOME`, but that just uses a temporary directory that hits the same problem. Is there a way to work around this without having to set up my own SQL database? Something like reducing the number of events saved, or preventing/limiting the `DynamicOutput` ops from running in parallel?
Is fiddling with `max_concurrent` the answer I’m looking for? https://docs.dagster.io/_apidocs/execution#dagster.multiprocess_executor
m
is running Postgres not an option for you? SQLite is fundamentally not designed for concurrent use cases -- we've tried to hack around that, but ultimately you will want a db that supports multiple concurrent writes
m
I mean, it’s a lot of overhead for a feature I don’t really use… My jobs (which are triggered manually via Dagit or the CLI) are built to feed a digital humanities project: formatting image metadata, handling image files on the cloud, and querying a few APIs. I don’t deploy to the cloud or use the daemons at all, and I’m not dealing with a complex business environment where this exhaustive logging might be needed. The Dagster features I make use of are more related to type checking/validation, code organization/visualization, and IO management.
Limiting `max_concurrent` to 4 seems to have fixed it, though. I’m on a 24-core machine, so that’s likely what caused the issue. I’ll experiment to see where exactly the limit is being reached…
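In case it helps anyone else, this is roughly how the cap can be applied on the job itself; a minimal sketch, where the op and job names are made up and only the `max_concurrent` value comes from this thread:

```python
from dagster import job, multiprocess_executor, op


@op
def noop():
    # Placeholder op; in practice this would be the dynamic fan-out graph above.
    pass


# Cap the multiprocess executor at 4 concurrent subprocesses so ~24 parallel
# steps don't all write to the SQLite event log at once.
@job(executor_def=multiprocess_executor.configured({"max_concurrent": 4}))
def limited_job():
    noop()
```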