# ask-community
c
I find that the SQLite database results in a lot of "SQLite lock" errors. I think it's the same problem mentioned here: https://github.com/dagster-io/dagster/discussions/8617 The problem seems to occur less often if I drop the number of allowed processes, but I haven't been able to make it disappear entirely. Being able to run with a local SQLite database with no extra setup is very convenient. Is this problem something that will improve? I realize that the nature of SQLite makes it less reliable. There is a use case where we'll want people to be able to run our processing chain on standalone laptops that won't always have good internet connectivity at the time processing is needed (field experiments, basically). The automatically generated SQLite database would make the processing chain more portable and easier to deploy in situations like this.
a
Concurrent access across processes is a fundamental limitation of SQLite, so this is not something I would expect to improve. How many concurrent jobs do you have running? Is this heavy job concurrency observed in the local laptop runs, or only in some shared deployment?
c
I was running a small test case, so I set the job concurrency to only 3. It was not a high-concurrency situation. On the laptops I'd expect a small amount of concurrency too, though a bit higher, based on the number of cores available on the system. If it's relevant, the "local" test I was running in this case was on a central shared high-performance Linux computing system at our company, rather than on a standalone laptop.
a
hmm, 3 is smaller than i expected. can you share the full stack trace of the exception you are getting?
c
I'll probably have to do that next week at some point. But will do. Thanks.
Out of curiosity, at what point would you expect issues? I realize a hard number is not something knowable, but are we talking 10? 50? 100? 500? If this is a rare event that we can get around with retries, that would probably be fine, but it seems to happen fairly often.
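To illustrate the retry workaround mentioned above, here is a minimal sketch of wrapping an operation so it retries when SQLite reports a lock. The helper name and parameters are hypothetical (not part of Dagster); it just shows the retry-with-backoff pattern against `sqlite3.OperationalError`:

```python
import random
import sqlite3
import time


def with_sqlite_retry(fn, attempts=5, base_delay=0.05):
    """Call fn(), retrying when SQLite says the database is locked.

    Hypothetical helper sketching the retry workaround discussed
    above; attempts/base_delay values are illustrative.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except sqlite3.OperationalError as exc:
            # Only retry lock contention; re-raise anything else,
            # and give up after the final attempt.
            if "locked" not in str(exc) or attempt == attempts - 1:
                raise
            # Exponential backoff with jitter before retrying.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

Whether this is "fine" depends on how often the lock errors occur; frequent contention under only 3 jobs suggests something environmental (e.g. the filesystem) rather than raw concurrency.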
a
Anecdotally, I have a local SQLite instance I've been using since I started on the project over 4 years ago, and I do not recall ever hitting this issue on macOS while doing various stress tests. I don't think I went much above ~100 concurrent, though. Operating system, filesystem, and SQLite/Python versions may all factor into this.
> a central shared high performance linux computing system at our company
If there is some sort of network based filesystem in use here that may be what is exacerbating the issue
c
It is a network-based filesystem. Supposedly, I'm running on one that is designed for high throughput use with cluster computing, but it is still a networked filesystem.
a
Yeah, throughput is not the issue; it's how certain primitive filesystem operations (notably file locking) behave and what their latencies are. If you are able to use the local filesystem for `$DAGSTER_HOME`, you may no longer encounter the issue. You can read more about SQLite's problems with NFS here: https://www.sqlite.org/faq.html
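As a concrete sketch of the suggestion above, assuming a Unix shell (the path below is just an example of a local, non-NFS location):

```shell
# Point DAGSTER_HOME at a directory on local disk rather than the
# network filesystem, so the SQLite files avoid NFS locking issues.
export DAGSTER_HOME="$HOME/.dagster_local"
mkdir -p "$DAGSTER_HOME"

# Then launch Dagster as usual, e.g.:
#   dagster dev
```

Note that run history stored in the old `$DAGSTER_HOME` will not carry over automatically when you switch directories.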