-
Notifications
You must be signed in to change notification settings - Fork 22
Description
Proposal
Problem statement
One of the fundamental properties of BTreeMap
is that it maintains elements in sorted order and enables efficient element lookup in O(log(N))
time. However the current API is overly fitted towards a key-value API like a HashMap
and fails to expose the ability to make queries about "nearby" keys. For example, finding the first element whose key is greater than X.
There is limited support for this with the range
API, but this is hard to use and doesn't allow mutating the tree (by inserting/deleting elements) while iterating. This is insufficient to address existing use cases.
Motivation, use-cases
One specific use case where such manipulations are useful is when using a BTreeMap
to represent a set of non-overlapping ranges with associated values. This is often used to associate metadata with ranges of memory addresses.
As a proof of concept, I implemented a RangeTree
type which conceptually holds a map of Range<K> -> V
. It has 3 main operations:
get
: returns the range containing the given key.remove
: removes any ranges that intersect the given range of keys. Partially overlapping ranges are truncated or split into 2 sub-ranges.insert
: removes any intersecting ranges (with the same behavior asremove
for partially overlapping ranges) and then inserts a new range.
There are two implementations of these operations: one with just the standard library API and one with the new proposed cursor API.
Benchmark results show a 25% to 50% speedup by using cursors:
insert non-overlapping time: [82.516 ns 88.146 ns 93.282 ns]
change: [-33.810% -27.769% -20.292%] (p = 0.00 < 0.05)
Performance has improved.
insert overlapping end time: [76.902 ns 80.527 ns 83.670 ns]
change: [-52.589% -49.404% -46.189%] (p = 0.00 < 0.05)
Performance has improved.
insert overlapping start
time: [73.223 ns 77.361 ns 81.151 ns]
change: [-32.392% -26.673% -20.424%] (p = 0.00 < 0.05)
Performance has improved.
insert overlapping all time: [187.08 ns 193.34 ns 198.79 ns]
change: [-29.838% -26.102% -22.680%] (p = 0.00 < 0.05)
Performance has improved.
insert within existing time: [75.748 ns 79.829 ns 83.380 ns]
change: [-32.192% -26.014% -19.129%] (p = 0.00 < 0.05)
Performance has improved.
remove non-overlapping time: [43.850 ns 46.363 ns 48.454 ns]
change: [-48.288% -43.011% -36.968%] (p = 0.00 < 0.05)
Performance has improved.
remove overlapping end time: [60.592 ns 64.339 ns 67.775 ns]
change: [-59.367% -56.141% -52.421%] (p = 0.00 < 0.05)
Performance has improved.
remove overlapping start
time: [45.273 ns 48.281 ns 50.901 ns]
change: [-49.392% -43.853% -37.379%] (p = 0.00 < 0.05)
Performance has improved.
remove overlapping all time: [182.28 ns 189.05 ns 194.94 ns]
change: [-37.365% -34.433% -31.290%] (p = 0.00 < 0.05)
Performance has improved.
remove within existing time: [39.387 ns 41.828 ns 43.930 ns]
change: [-46.469% -41.060% -34.474%] (p = 0.00 < 0.05)
Performance has improved.
Solution sketches
This proposal adds Cursor
and CursorMut
types to BTreeMap
based on similar types for LinkedList
.
A Cursor is like an iterator, except that it can freely seek back-and-forth, and can safely mutate the tree during iteration. This is because the lifetime of its yielded references is tied to its own lifetime, instead of just the underlying tree. A cursor either points to an element in the tree, or to a "ghost" non-element that is logically located after the last element and before the first element.
This proposal adds the following APIs to BTreeMap
:
impl<K, V> BTreeMap<K, V> {
/// Returns a cursor pointing to the first element that is above the given bound.
///
/// Passing `Unbounded` will return a cursor pointing to the first element in the tree.
fn lower_bound<Q>(&self, bound: Bound<&Q>) -> Cursor<'_, K, V>
where
K: Borrow<Q> + Ord,
Q: Ord;
/// Returns a mutable cursor pointing to the first element that is above the given bound.
///
/// Passing `Unbounded` will return a cursor pointing to the first element in the tree.
fn lower_bound_mut<Q>(&mut self, bound: Bound<&Q>) -> CursorMut<'_, K, V>
where
K: Borrow<Q> + Ord,
Q: Ord;
/// Returns a cursor pointing to the last element that is below the given bound.
///
/// Passing `Unbounded` will return a cursor pointing to the last element in the tree.
fn upper_bound<Q>(&self, bound: Bound<&Q>) -> Cursor<'_, K, V>
where
K: Borrow<Q> + Ord,
Q: Ord;
/// Returns a mutable cursor pointing to the last element that is below the given bound.
///
/// Passing `Unbounded` will return a cursor pointing to the last element in the tree.
fn upper_bound_mut<Q>(&mut self, bound: Bound<&Q>) -> CursorMut<'_, K, V>
where
K: Borrow<Q> + Ord,
Q: Ord;
}
/// An immutable cursor which only allows read-only operations.
struct Cursor<'a, K: 'a, V: 'a>;
impl<'a, K, V> Cursor<'a, K, V> {
/// Moving the cursor to the next/previous element.
fn move_next(&mut self);
fn move_prev(&mut self);
/// Access to the key and value of the current element.
///
/// Returns `None` if the cursor points to the "ghost" non-element.
fn key(&self) -> Option<&'a K>;
fn value(&self) -> Option<&'a V>;
fn key_value(&self) -> Option<(&'a K, &'a V)>;
/// Read-only access to adjacent elements.
fn peek_next(&self) -> Option<(&'a K, &'a V)>;
fn peek_prev(&self) -> Option<(&'a K, &'a V)>;
}
/// An mutable cursor which allows mutating the tree.
struct CursorMut<'a, K: 'a, V: 'a>;
impl<'a, K, V> CursorMut<'a, K, V> {
/// Moving the cursor to the next/previous element.
fn move_next(&mut self);
fn move_prev(&mut self);
/// Access to the key and value of the current element.
///
/// Returns `None` if the cursor points to the "ghost" non-element.
fn key(&self) -> Option<&K>;
fn value(&self) -> Option<&V>;
fn value_mut(&mut self) -> Option<&mut V>;
fn key_value(&self) -> Option<(&K, &V)>;
fn key_value_mut(&self) -> Option<(&K, &mut V)>;
/// Returns a mutable reference to the of the element that the cursor is
/// currently pointing to.
///
/// This can be used to modify the key, but you must ensure that the
/// `BTreeMap` invariants are maintained. Specifically:
///
/// * The key must remain unique within the tree.
/// * The key must remain in sorted order with regards to other elements in
/// the tree.
unsafe fn key_mut_unchecked(&mut self) -> Option<&mut K>;
/// Read-only access to adjacent elements.
fn peek_next(&self) -> Option<(&K, &V)>;
fn peek_prev(&self) -> Option<(&K, &V)>;
/// Returns a read-only cursor pointing to the current element.
///
/// The lifetime of the returned `Cursor` is bound to that of the
/// `CursorMut`, which means it cannot outlive the `CursorMut` and that the
/// `CursorMut` is frozen for the lifetime of the `Cursor`.
fn as_cursor(&self) -> Cursor<'_, K, V>;
/// Inserts a new element into the `BTreeMap` after the current one.
///
/// If the cursor is pointing at the "ghost" non-element then the new element is
/// inserted at the front of the `BTreeMap`.
///
/// # Panics
///
/// This function panics if:
/// - the given key compares less than or equal to the current element (if
/// any).
/// - the given key compares greater than or equal to the next element (if
/// any).
fn insert_after(&mut self, key: K, value: V);
/// Inserts a new element into the `BTreeMap` before the current one.
///
/// If the cursor is pointing at the "ghost" non-element then the new element is
/// inserted at the end of the `BTreeMap`.
///
/// # Panics
///
/// This function panics if:
/// - the given key compares greater than or equal to the current element
/// (if any).
/// - the given key compares less than or equal to the previous element (if
/// any).
fn insert_before(&mut self, key: K, value: V);
/// Inserts a new element into the `BTreeMap` after the current one.
///
/// If the cursor is pointing at the "ghost" non-element then the new element is
/// inserted at the front of the `BTreeMap`.
///
/// # Safety
///
/// You must ensure that the `BTreeMap` invariants are maintained.
/// Specifically:
///
/// * The key of the newly inserted element must be unique in the tree.
/// * All keys in the tree must remain in sorted order.
unsafe fn insert_after_unchecked(&mut self, key: K, value: V);
/// Inserts a new element into the `BTreeMap` before the current one.
///
/// If the cursor is pointing at the "ghost" non-element then the new element is
/// inserted at the end of the `BTreeMap`.
///
/// # Safety
///
/// You must ensure that the `BTreeMap` invariants are maintained.
/// Specifically:
///
/// * The key of the newly inserted element must be unique in the tree.
/// * All keys in the tree must remain in sorted order.
unsafe fn insert_before_unchecked(&mut self, key: K, value: V);
/// Removes the current element from the `BTreeMap`.
///
/// The element that was removed is returned, and the cursor is
/// moved to point to the next element in the `BTreeMap`.
///
/// If the cursor is currently pointing to the "ghost" non-element then no element
/// is removed and `None` is returned. The cursor is not moved in this case.
fn remove_current(&mut self) -> Option<(K, V)>;
/// Removes the current element from the `BTreeMap`.
///
/// The element that was removed is returned, and the cursor is
/// moved to point to the previous element in the `BTreeMap`.
///
/// If the cursor is currently pointing to the "ghost" non-element then no element
/// is removed and `None` is returned. The cursor is not moved in this case.
fn remove_current_and_move_back(&mut self) -> Option<(K, V)>;
}
Links and related work
LinkedList
cursor RFC and tracking issue.
What happens now?
This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.
Activity
scottmcm commentedon Nov 29, 2022
I disagree, because it allows violating invariants for a type where the
Ord
implementation is trusted.For example, today if you give me a
BTreeMap<usize, ()>
I can trust that there are no duplicates and that if I iterate they'll be in order, because I trustBTreeMap
, I trustusize
, and I trust()
.But with the addition of a safe
_unchecked
API, I can no longer trust maps even when they only hold my own types that I trust.Amanieu commentedon Nov 29, 2022
Alright I've changed those functions to be
unsafe
.the8472 commentedon Nov 29, 2022
Why only the unsafe APIs and not equivalent safe ones that do a comparison against the neighbors?
scottmcm commentedon Nov 29, 2022
A safe insert that inserts in the "correct" place but with improved performance when that place is close to the cursor might be nice. (Like the
map::insert
s that take an iterator position in C++, https://en.cppreference.com/w/cpp/container/map/insert.)Amanieu commentedon Nov 30, 2022
I wouldn't use them in
RangeTree
myself, but I've added checked variants of the functions. I'm still not a big fan of making the unchecked versions unsafe since it can introduce confusion. For example it's perfectly safe forRangeTree
to use the unchecked functions even if the key uses an untrustedOrd
implementation.the8472 commentedon Nov 30, 2022
The API docs say
The cursor API doesn't fall under any of those categories and I don't think "normally" provides enough wiggle room here.
jeffparsons commentedon Dec 12, 2022
I maintain the rangemap crate, which is essentially the same as the motivating use case described above, plus automatic coalescing of adjacent ranges that map to the same value. E.g. if you insert
(1..2, true)
and(2..3, true)
, then the map now contains[(1..3, true)]
.I have longed for this API since day one. I care about performance, of course, but I care even more about how much simpler my implementation could be if I could build on top of
CursorMut
.So I guess this is just my 2c from experiencing the lack of BTreeMap cursors: they feel to me like a very natural and powerful extension.
m-ou-se commentedon Jan 30, 2023
This looks quite complete and ready for unstable experimentation. Time to open a tracking issue and merge 105641. :)
9 remaining items