While benchmarking the Hugging Face tokenizer, which uses std::sync::RwLock for managing read/write access to shared data, we've observed a false sharing issue.
In the current implementation of RwLock, the inner lock sits directly adjacent to the data field, so in most cases both end up in the same cache line (64 bytes on typical x86 CPUs, though Intel's adjacent-line prefetcher effectively treats aligned 128-byte pairs as the sharing granularity).
https://github.com/rust-lang/rust/blob/master/library/std/src/sync/rwlock.rs#L80-L84
pub struct RwLock<T: ?Sized> {
inner: sys::RwLock,
poison: poison::Flag,
data: UnsafeCell<T>,
}
The lock word is written every time any reader acquires or releases the read lock, since the reader count is updated atomically. When that word shares a cache line with the data, the cache-coherence protocol invalidates the whole line on each such write, so a reader thread that is merely reading unmodified data must reload the entire cache block even though nothing it reads has changed. This significantly hurts performance, especially when readers far outnumber writers.
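For illustration only, here is a minimal, self-contained pattern (hypothetical names, not taken from the tokenizer) that exhibits the same effect: two logically independent atomics that usually land in one cache line, each hammered by a different thread, so every increment invalidates the other core's copy of the line:

use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

// Both counters are typically placed in the same cache line.
struct Counters {
    a: AtomicU64,
    b: AtomicU64,
}

fn main() {
    let c = Arc::new(Counters {
        a: AtomicU64::new(0),
        b: AtomicU64::new(0),
    });
    let (c1, c2) = (Arc::clone(&c), Arc::clone(&c));
    // Each thread touches only "its" counter, yet every increment
    // invalidates the line in the other core's cache (false sharing).
    let t1 = thread::spawn(move || {
        for _ in 0..10_000_000u64 {
            c1.a.fetch_add(1, Ordering::Relaxed);
        }
    });
    let t2 = thread::spawn(move || {
        for _ in 0..10_000_000u64 {
            c2.b.fetch_add(1, Ordering::Relaxed);
        }
    });
    t1.join().unwrap();
    t2.join().unwrap();
}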
To address this issue, one common practice is to move the data onto a different cache line by inserting padding. A quick fix that uses #[repr(align(128))] to align each field to a 128-byte boundary has shown more than a 10% throughput increase in our benchmark. However, it's important to note that this change might hurt throughput when writers outnumber readers, since a writer then has to write the lock and the data on two different cache lines. A sketch of the padded layout follows.
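This is a minimal sketch of the padding approach, assuming a 128-byte-aligned wrapper is enough to separate the lock from the data; the CachePadded type and the reduction of sys::RwLock to a single atomic are illustrative, not the actual std layout:

use std::cell::UnsafeCell;
use std::sync::atomic::AtomicU32;

// repr(align(128)) rounds the type's size up to a multiple of 128
// and aligns it, so neighboring fields cannot share its cache line.
#[repr(align(128))]
struct CachePadded<T>(T);

// Padded layout sketch (poison flag omitted; the real lock state
// in sys::RwLock is more than a single atomic).
struct PaddedRwLock<T> {
    inner: CachePadded<AtomicU32>, // lock word gets its own line(s)
    data: UnsafeCell<T>,           // data starts on a different line
}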
An alternative suggested by @pingzhaozz is to split the inner RwLock into two fields: write access control stays on the same cache line as the data field (a writer has to touch both anyway), while the reader-side lock is placed on another cache line, so one reader acquiring the read lock no longer forces other readers to reload the data's cache line. A rough sketch is shown below.
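This sketch of the split design reuses the hypothetical CachePadded wrapper from above; the field names and the reduction of the two locks to single atomics are assumptions for illustration, not a concrete proposal:

use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, AtomicU32};

// Writer state stays next to the data: a writer must touch both,
// so keeping them on one line costs nothing extra. The reader
// count is padded onto its own line, so readers entering and
// leaving no longer invalidate the line that holds the data.
#[repr(C)] // pin field order so writer and data stay adjacent
struct SplitRwLock<T> {
    writer: AtomicBool,              // write access control, near data
    data: UnsafeCell<T>,
    readers: CachePadded<AtomicU32>, // reader bookkeeping, padded away
}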
Any comments would be appreciated.