
Use BTreeMap with u128 values for sparse bit sets/"vectors" (in dataflow etc.). #47575

Closed

Description

eddyb (Member)

According to our current implementation of B-trees:

const B: usize = 6;
pub const MIN_LEN: usize = B - 1;
pub const CAPACITY: usize = 2 * B - 1;

it would appear that up to 11 key-value pairs can be stored in each node.

With u128 values representing 128 set elements each, up to 11 × 128 = 1408 set elements can therefore be stored in a single allocation, with a space overhead of around 50% compared to Vec<u128> in the dense case.
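
For context, a back-of-the-envelope version of that arithmetic (the byte counts are approximate and ignore per-node edge pointers and padding in the real B-tree layout):

// Rough arithmetic behind the numbers above.
const CAPACITY: usize = 11;         // key-value pairs per node (2 * B - 1)
const BITS_PER_VALUE: usize = 128;  // one u128 value per entry

fn main() {
    // Set elements representable by one full node.
    assert_eq!(CAPACITY * BITS_PER_VALUE, 1408);

    // Per 128 elements, a dense Vec<u128> stores 16 bytes, while a
    // BTreeMap entry stores ~20 bytes (16-byte u128 value + 4-byte u32
    // key) and nodes are typically not full -- roughly where the
    // "around 50% overhead" estimate comes from.
}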

Such sparse bitsets would be really useful for analysis algorithms (e.g. dataflow), in situations where the set elements tend to be localized, with multi-"word" gaps between the local groups.

cc @nikomatsakis @pnkfelix

Activity

nikomatsakis (Contributor) commented on Jan 19, 2018

Is the idea that the key would be index >> 7 and the value would be a bitset of size 128? I.e., to tell whether something is there, we compute the key and then mask the resulting value?
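
Concretely, that scheme as a tiny sketch (hypothetical helper; index >> 7 and index & 127 are the power-of-two forms of index / 128 and index % 128 for unsigned indices):

fn key_and_bit(index: usize) -> (usize, u128) {
    // index >> 7 == index / 128; index & 127 == index % 128.
    (index >> 7, 1u128 << (index & 127))
}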

eddyb (Member, Author) commented on Jan 19, 2018

@nikomatsakis That's pretty much it, yes. Here's what I've been playing with so far:

// FIXME(eddyb) move to rustc_data_structures.
use std::collections::BTreeMap;
use std::marker::PhantomData;
// `Idx` here is rustc's index trait (rustc_data_structures::indexed_vec::Idx).

#[derive(Clone)]
pub struct SparseBitSet<I: Idx> {
    map: BTreeMap<u32, u128>,
    _marker: PhantomData<I>,
}

/// Splits `index` into the map key (`index / 128`) and the single-bit
/// mask for its position within that key's 128-bit "word".
fn key_and_mask<I: Idx>(index: I) -> (u32, u128) {
    let index = index.index();
    let key = index / 128;
    let key_u32 = key as u32;
    assert_eq!(key_u32 as usize, key);
    (key_u32, 1 << (index % 128))
}

impl<I: Idx> SparseBitSet<I> {
    pub fn new() -> Self {
        SparseBitSet {
            map: BTreeMap::new(),
            _marker: PhantomData
        }
    }

    /// Number of set elements covered by the currently allocated chunks.
    pub fn capacity(&self) -> usize {
        self.map.len() * 128
    }

    pub fn contains(&self, index: I) -> bool {
        let (key, mask) = key_and_mask(index);
        self.map.get(&key).map_or(false, |bits| (bits & mask) != 0)
    }

    /// Sets the bit for `index`; returns `true` if it was not already set.
    pub fn insert(&mut self, index: I) -> bool {
        let (key, mask) = key_and_mask(index);
        let bits = self.map.entry(key).or_insert(0);
        let old_bits = *bits;
        let new_bits = old_bits | mask;
        *bits = new_bits;
        new_bits != old_bits
    }

    /// Clears the bit for `index`; returns `true` if it was previously set.
    pub fn remove(&mut self, index: I) -> bool {
        let (key, mask) = key_and_mask(index);
        if let Some(bits) = self.map.get_mut(&key) {
            let old_bits = *bits;
            let new_bits = old_bits & !mask;
            *bits = new_bits;
            // FIXME(eddyb) maybe remove entry if now `0`.
            new_bits != old_bits
        } else {
            false
        }
    }

    pub fn iter<'a>(&'a self) -> impl Iterator<Item = I> + 'a {
        self.map.iter().flat_map(|(&key, &bits)| {
            let base = key as usize * 128;
            (0..128).filter_map(move |i| {
                if (bits & (1 << i)) != 0 {
                    Some(I::new(base + i))
                } else {
                    None
                }
            })
        })
    }
}
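
A quick usage sketch (assuming rustc's Idx trait, which is implemented for usize):

fn example() {
    let mut set: SparseBitSet<usize> = SparseBitSet::new();

    // 3 and 300 fall into different 128-bit chunks, so the
    // underlying map ends up with two entries.
    assert!(set.insert(3));
    assert!(set.insert(300));
    assert!(!set.insert(3)); // already present

    assert!(set.contains(300));
    assert!(set.remove(300));
    assert!(!set.contains(300));

    assert_eq!(set.iter().collect::<Vec<_>>(), vec![3]);
}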

nikomatsakis (Contributor) commented on Jan 19, 2018

Makes sense. I wonder if it would be useful in NLL too.

eternaleye (Contributor) commented on Jan 19, 2018

One thing that might be worth considering is using binmaps.

eddyb (Member, Author) commented on Jan 22, 2018

We should also benchmark "sparse bit matrices" with BTreeMap (or FxHashMap) keys that also contain the "row" number, not just the upper bits of the "column" number.
They might prove more efficient than Vec<BTreeMap<u32, u128>>.

EDIT: This makes iteration within a "row" harder, especially with the HashMap variant.
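
A minimal sketch of that key encoding (hypothetical names; it packs the row into the upper half of a u64 key, alongside the upper bits of the column):

use std::collections::BTreeMap;

type SparseBitMatrix = BTreeMap<u64, u128>;

fn matrix_key(row: u32, column: usize) -> u64 {
    let chunk = column / 128;
    assert!(chunk <= u32::max_value() as usize);
    ((row as u64) << 32) | (chunk as u64)
}

fn matrix_mask(column: usize) -> u128 {
    1u128 << (column % 128)
}

fn matrix_insert(matrix: &mut SparseBitMatrix, row: u32, column: usize) {
    *matrix.entry(matrix_key(row, column)).or_insert(0) |= matrix_mask(column);
}

Since BTreeMap iterates keys in order, all chunks of a given row stay contiguous and per-row iteration becomes a range query; with FxHashMap that ordering is lost, which is the iteration difficulty noted in the EDIT.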

added the C-enhancement label (Category: An issue proposing an enhancement or a PR with one) on Jan 23, 2018
eddyb (Member, Author) commented on Feb 21, 2018

I've added a "chunked" API to my SparseBitSet, in the hope of 128x+ perf wins:

// FIXME(eddyb) move to rustc_data_structures.
use std::collections::BTreeMap;
use std::collections::btree_map::Entry;
use std::marker::PhantomData;

#[derive(Clone)]
pub struct SparseBitSet<I: Idx> {
    chunk_bits: BTreeMap<u32, u128>,
    _marker: PhantomData<I>,
}

#[derive(Copy, Clone)]
pub struct SparseChunk<I> {
    key: u32,
    bits: u128,
    _marker: PhantomData<I>,
}

impl<I: Idx> SparseChunk<I> {
    /// Builds a chunk containing only `index`'s bit.
    pub fn one(index: I) -> Self {
        let index = index.index();
        let key_usize = index / 128;
        let key = key_usize as u32;
        assert_eq!(key as usize, key_usize);
        SparseChunk {
            key,
            bits: 1 << (index % 128),
            _marker: PhantomData
        }
    }

    pub fn any(&self) -> bool {
        self.bits != 0
    }

    pub fn iter(&self) -> impl Iterator<Item = I> {
        let base = self.key as usize * 128;
        let mut bits = self.bits;
        // Walk bit positions from the LSB up, shifting the remaining
        // bits right each step so `take_while` can stop early once no
        // set bits are left.
        (0..128).map(move |i| {
            let current_bits = bits;
            bits >>= 1;
            (i, current_bits)
        }).take_while(|&(_, bits)| bits != 0)
          .filter_map(move |(i, bits)| {
            if (bits & 1) != 0 {
                Some(I::new(base + i))
            } else {
                None
            }
        })
    }
}

impl<I: Idx> SparseBitSet<I> {
    pub fn new() -> Self {
        SparseBitSet {
            chunk_bits: BTreeMap::new(),
            _marker: PhantomData
        }
    }

    /// Number of set elements covered by the currently allocated chunks.
    pub fn capacity(&self) -> usize {
        self.chunk_bits.len() * 128
    }

    /// Returns the subset of `chunk`'s bits that are present in `self`.
    pub fn contains_chunk(&self, chunk: SparseChunk<I>) -> SparseChunk<I> {
        SparseChunk {
            bits: self.chunk_bits.get(&chunk.key).map_or(0, |bits| bits & chunk.bits),
            ..chunk
        }
    }

    /// ORs in `chunk`'s bits; the returned chunk holds the bits that were newly set.
    pub fn insert_chunk(&mut self, chunk: SparseChunk<I>) -> SparseChunk<I> {
        if chunk.bits == 0 {
            return chunk;
        }
        let bits = self.chunk_bits.entry(chunk.key).or_insert(0);
        let old_bits = *bits;
        let new_bits = old_bits | chunk.bits;
        *bits = new_bits;
        let changed = new_bits ^ old_bits;
        SparseChunk {
            bits: changed,
            ..chunk
        }
    }

    /// Clears `chunk`'s bits; the returned chunk holds the bits that were actually cleared.
    pub fn remove_chunk(&mut self, chunk: SparseChunk<I>) -> SparseChunk<I> {
        if chunk.bits == 0 {
            return chunk;
        }
        let changed = match self.chunk_bits.entry(chunk.key) {
            Entry::Occupied(mut bits) => {
                let old_bits = *bits.get();
                let new_bits = old_bits & !chunk.bits;
                if new_bits == 0 {
                    bits.remove();
                } else {
                    bits.insert(new_bits);
                }
                new_bits ^ old_bits
            }
            Entry::Vacant(_) => 0
        };
        SparseChunk {
            bits: changed,
            ..chunk
        }
    }

    pub fn clear(&mut self) {
        self.chunk_bits.clear();
    }

    pub fn chunks<'a>(&'a self) -> impl Iterator<Item = SparseChunk<I>> + 'a {
        self.chunk_bits.iter().map(|(&key, &bits)| {
            SparseChunk {
                key,
                bits,
                _marker: PhantomData
            }
        })
    }

    pub fn contains(&self, index: I) -> bool {
        self.contains_chunk(SparseChunk::one(index)).any()
    }

    pub fn insert(&mut self, index: I) -> bool {
        self.insert_chunk(SparseChunk::one(index)).any()
    }

    pub fn remove(&mut self, index: I) -> bool {
        self.remove_chunk(SparseChunk::one(index)).any()
    }

    pub fn iter<'a>(&'a self) -> impl Iterator<Item = I> + 'a {
        self.chunks().flat_map(|chunk| chunk.iter())
    }
}
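
To illustrate where the chunked API should pay off, a sketch of a dataflow-style union (hypothetical helper, not part of the snippet above) that processes up to 128 bits per map operation instead of one:

fn union_into<I: Idx>(dest: &mut SparseBitSet<I>, src: &SparseBitSet<I>) -> bool {
    let mut changed = false;
    for chunk in src.chunks() {
        // `insert_chunk` ORs in a whole 128-bit chunk and reports
        // which bits were newly set.
        changed |= dest.insert_chunk(chunk).any();
    }
    changed
}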

cc @spastorino

self-assigned this on Feb 22, 2018
spastorino (Member) commented on Feb 23, 2018

Done here: #48245

tamird (Contributor) commented on Mar 1, 2018

Should this be closed now?

9 remaining items

Metadata

Labels: A-collections (Area: `std::collections`), C-enhancement (Category: An issue proposing an enhancement or a PR with one), I-compilemem (Issue: Problems and improvements with respect to memory usage during compilation), I-compiletime (Issue: Problems and improvements with respect to compile times), T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue)
