iequidoo
Collaborator

@iequidoo iequidoo commented Aug 17, 2025

No description provided.

From https://www.sqlite.org/wal.html:
> The default strategy is to allow successive write transactions to grow the WAL until the WAL
  becomes about 1000 pages in size, then to run a checkpoint operation for each subsequent COMMIT
  until the WAL is reset to be smaller than 1000 pages. By default, the checkpoint will be run
  automatically by the same thread that does the COMMIT that pushes the WAL over its size
  limit. This has the effect of causing most COMMIT operations to be very fast but an occasional
  COMMIT (those that trigger a checkpoint) to be much slower.

And while autocheckpoint runs in the `PASSIVE` mode and thus doesn't block concurrent readers and
writers, in our design it blocks writers because it runs while `write_mutex` is held, and thus may
cause the app to hang for a noticeable time. Let's disable autocheckpointing then; we can't rely on
it anyway.
@iequidoo
Collaborator Author

We should consider scheduling housekeeping() after program start; this way the user can also trigger WAL checkpointing by restarting the program. housekeeping() is already scheduled after deletion of messages and chats, but that isn't a convenient or obvious way to trigger it.

@iequidoo
Collaborator Author

Also, w/o WAL autocheckpointing there's another interesting option: we can skip running wal_checkpoint entirely for some time, e.g. after a version upgrade, and this way provide a way to roll back the upgrade by truncating the WAL. This is worse than a normal backup, of course, because after such a restoration some referenced blobs may be missing.

Some migrations want housekeeping to run. Also if housekeeping failed before, fixing the reason and
restarting the program is the most natural way to retry it.
@link2xt
Collaborator

link2xt commented Aug 22, 2025

I am not sure how many pages we accumulate during the day. I'm afraid the WAL will grow without bound in some workloads, e.g. for a bot that receives a lot of messages and normally deletes them almost immediately.

It seems the correct way is to register our own WAL hook (https://sqlite.org/c3ref/wal_hook.html) that, when the WAL grows too large, notifies a separate thread via a channel to do manual checkpointing. This thread/task would just wait for notifications and then take all the connections and do the checkpointing, just like it is currently done in housekeeping. A hook can be registered via https://docs.rs/rusqlite/0.37.0/rusqlite/struct.Connection.html#method.wal_hook; registering it also unregisters autocheckpointing.
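
A minimal, std-only sketch of the hook-plus-channel idea, to make the proposal concrete. It does not call SQLite or rusqlite: `on_wal_commit`, `spawn_checkpointer` and `WAL_PAGE_LIMIT` are hypothetical names, the page count is assumed to be whatever the real WAL hook receives, and the `checkpoint` closure stands in for "take all the connections and run the checkpoint". The bounded channel of capacity 1 is an assumption chosen so that the hook never blocks the committing writer and repeated notifications coalesce.

```rust
use std::sync::mpsc::{Receiver, SyncSender, TrySendError};
use std::thread::{self, JoinHandle};

/// Hypothetical threshold; 1000 pages mirrors SQLite's default autocheckpoint size.
pub const WAL_PAGE_LIMIT: u32 = 1000;

/// Body of a hypothetical WAL hook, called after a COMMIT with the current
/// WAL size in pages. It never checkpoints inline; it only wakes the
/// checkpointer. Returns true if a checkpoint request is now pending.
pub fn on_wal_commit(wal_pages: u32, notify: &SyncSender<()>) -> bool {
    if wal_pages < WAL_PAGE_LIMIT {
        return false;
    }
    match notify.try_send(()) {
        // Either we queued a request or one is already queued (channel full).
        Ok(()) | Err(TrySendError::Full(_)) => true,
        // The checkpointer is gone; nothing will act on the request.
        Err(TrySendError::Disconnected(_)) => false,
    }
}

/// Spawns the checkpointer thread. It blocks on the channel and runs
/// `checkpoint` once per received notification. The thread exits when all
/// senders are dropped and returns how many checkpoints it ran.
pub fn spawn_checkpointer<F>(rx: Receiver<()>, mut checkpoint: F) -> JoinHandle<u32>
where
    F: FnMut() + Send + 'static,
{
    thread::spawn(move || {
        let mut runs = 0u32;
        while rx.recv().is_ok() {
            // Placeholder for the real work, e.g. what housekeeping does today.
            checkpoint();
            runs += 1;
        }
        runs
    })
}
```

The channel would be created with `std::sync::mpsc::sync_channel::<()>(1)`, with the sender captured by the hook closure and the receiver handed to `spawn_checkpointer`. Because the hook only does a non-blocking `try_send`, a slow checkpoint would not stall the writer that happens to cross the threshold.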

@iequidoo
Collaborator Author

If a bot deletes messages or chats, delete_msgs_locally_done() triggers housekeeping, so the WAL shouldn't grow a lot. But I agree that it makes sense to register our own WAL hook to make sure we don't miss another scenario.

@iequidoo iequidoo marked this pull request as draft August 22, 2025 19:19