Sometimes it is necessary to develop concurrent code that must scale with the number of available CPU cores. I face such tasks quite frequently when working on VictoriaMetrics. A few examples:
VictoriaMetrics needs to accept incoming samples from many concurrent client connections as efficiently as possible. Every connection is processed by a dedicated goroutine. These goroutines read the ingested samples and put them into an in-memory buffer, which is periodically converted into a searchable data block and saved to disk. Initially the buffer was implemented as a slice of samples protected by a mutex. This hit scalability issues on systems with many CPU cores, since the goroutines spent a lot of time blocked on the mutex instead of doing useful work. So later this buffer was rewritten into sharded buffers, where every shard is protected by its own mutex. The samples are added to these shards in a round-robin, hash-based or random manner in order to reduce mutex contention. This helped improve the scalability of the data ingestion path on systems with many CPU cores, but it still suffers from mutex contention :( The contention can be reduced further by increasing the number of shards, but this increases memory usage.
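The sharded-buffer approach described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not VictoriaMetrics code: the `sample` type, `shardedBuffer` and the shard count are hypothetical, and shard selection here uses the round-robin variant.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// sample is a hypothetical ingested data point.
type sample struct {
	ts    int64
	value float64
}

// shardedBuffer splits the buffer into independently locked shards,
// so concurrent writers rarely contend on the same mutex.
type shardedBuffer struct {
	shards []bufferShard
	next   atomic.Uint64 // round-robin counter for picking a shard
}

type bufferShard struct {
	mu  sync.Mutex
	buf []sample
}

func newShardedBuffer(numShards int) *shardedBuffer {
	return &shardedBuffer{shards: make([]bufferShard, numShards)}
}

// add appends s to one of the shards chosen in a round-robin manner.
func (b *shardedBuffer) add(s sample) {
	idx := b.next.Add(1) % uint64(len(b.shards))
	sh := &b.shards[idx]
	sh.mu.Lock()
	sh.buf = append(sh.buf, s)
	sh.mu.Unlock()
}

// drain collects all buffered samples, e.g. before converting them
// into a searchable block.
func (b *shardedBuffer) drain() []sample {
	var all []sample
	for i := range b.shards {
		sh := &b.shards[i]
		sh.mu.Lock()
		all = append(all, sh.buf...)
		sh.buf = nil
		sh.mu.Unlock()
	}
	return all
}

func main() {
	b := newShardedBuffer(8)
	var wg sync.WaitGroup
	for g := 0; g < 4; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				b.add(sample{ts: int64(i), value: 1})
			}
		}()
	}
	wg.Wait()
	fmt.Println(len(b.drain())) // 4000
}
```

Note how the tradeoff from the text shows up directly: more shards means fewer collisions on any single mutex, but also more pre-allocated shard state held in memory.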
VictoriaLogs needs to accept incoming logs. It buffers them in memory and then periodically converts the buffered logs into searchable blocks. On systems with many CPU cores it faces the same scalability issues as described above.
The solution
To provide M-local storage in the sync package with the following API:
```go
// MLocal is a storage for thread-local T items.
//
// It holds up to a single item per thread (M).
type MLocal[T any] struct {
	// Init is an optional function for initializing the newly
	// created x for the current thread (M).
	Init func(x *T)
}

// Get returns the T item for the current thread.
//
// If the current thread has no item, then it is created
// and initialized via the optional Init function.
//
// The caller is responsible for protecting access
// to the returned item from concurrent goroutines.
func (m *MLocal[T]) Get() *T

// All returns all the T items at m.
//
// It is expected that All is called when there are no
// other concurrent goroutines that access m.
func (m *MLocal[T]) All() []*T
```
It is expected that the implementation allocates T items in a P-local area in order to avoid false sharing between items allocated for different threads.
The All method can be implemented in the simplest possible way: iterate over the existing items and put them into a newly allocated slice. This is OK given the documented restriction on this method - it may only be called when there are no concurrent goroutines accessing m.
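The intended Get/All usage pattern can be illustrated with a naive userland stand-in for the proposed type. Note that `refMLocal` is hypothetical: a real implementation would hand out one item per thread (M), which userland Go code cannot do, so this version allocates a fresh item on every Get call and only demonstrates the contract.

```go
package main

import (
	"fmt"
	"sync"
)

// refMLocal is a naive stand-in for the proposed sync.MLocal API.
// Unlike the proposal, it allocates a new item on every Get call,
// since userland Go code cannot identify the current thread (M).
type refMLocal[T any] struct {
	Init  func(x *T)
	mu    sync.Mutex
	items []*T
}

func (m *refMLocal[T]) Get() *T {
	x := new(T)
	if m.Init != nil {
		m.Init(x)
	}
	m.mu.Lock()
	m.items = append(m.items, x)
	m.mu.Unlock()
	return x
}

// All must only be called when no other goroutines access m,
// matching the restriction documented in the proposal, so it just
// copies the existing items into a new slice.
func (m *refMLocal[T]) All() []*T {
	return append([]*T(nil), m.items...)
}

func main() {
	var ml refMLocal[int64]
	var wg sync.WaitGroup
	for g := 0; g < 4; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			n := ml.Get() // one local counter per worker; no locking on the hot path
			for i := 0; i < 100; i++ {
				*n++
			}
		}()
	}
	wg.Wait()
	var total int64
	for _, n := range ml.All() {
		total += *n
	}
	fmt.Println(total) // 400
}
```

The point of the proposal is visible even in this stand-in: each worker mutates only its own item with no synchronization on the hot path, and aggregation happens once via All after the workers finish.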
This API ideally solves the practical tasks described above, by providing linear scalability with the number of CPU cores.
Alternatives
There are other proposals for P-local storage in Go, but they are too complex and try to solve many different use cases with different requirements and restrictions at once. That's why the chances they will be implemented in the near future are very low. And even if they are implemented, the implementation won't ideally fit particular practical use cases such as this one.