Flat containers #498

Merged
merged 13 commits into from
Aug 13, 2025
2 changes: 1 addition & 1 deletion src/data_format/mod.rs
@@ -10,7 +10,7 @@ mod storage;
pub(crate) mod utils;

use crate::cosmetic_filter_cache::CosmeticFilterCache;
- use crate::filters::unsafe_tools::VerifiedFlatbufferMemory;
+ use crate::flatbuffers::unsafe_tools::VerifiedFlatbufferMemory;
use crate::network_filter_list::NetworkFilterListParsingError;

/// Newer formats start with this magic byte sequence.
2 changes: 1 addition & 1 deletion src/data_format/storage.rs
@@ -10,7 +10,7 @@ use rmp_serde as rmps;
use serde::{Deserialize, Serialize};

use crate::cosmetic_filter_cache::{CosmeticFilterCache, HostnameRuleDb, ProceduralOrActionFilter};
- use crate::filters::unsafe_tools::VerifiedFlatbufferMemory;
+ use crate::flatbuffers::unsafe_tools::VerifiedFlatbufferMemory;
use crate::utils::Hash;

use super::utils::{stabilize_hashmap_serialization, stabilize_hashset_serialization};
2 changes: 1 addition & 1 deletion src/filters/fb_builder.rs
@@ -9,7 +9,7 @@ use std::vec;
use flatbuffers::WIPOffset;

use crate::filters::network::{NetworkFilter, NetworkFilterMaskHelper};
- use crate::filters::unsafe_tools::VerifiedFlatbufferMemory;
+ use crate::flatbuffers::unsafe_tools::VerifiedFlatbufferMemory;
use crate::network_filter_list::token_histogram;
use crate::optimizer;
use crate::utils::{to_short_hash, Hash, ShortHash};
2 changes: 1 addition & 1 deletion src/filters/fb_network.rs
@@ -4,7 +4,7 @@ use std::collections::HashMap;

use crate::filters::fb_builder::FlatBufferBuilder;
use crate::filters::network::{NetworkFilterMask, NetworkFilterMaskHelper, NetworkMatchable};
- use crate::filters::unsafe_tools::{fb_vector_to_slice, VerifiedFlatbufferMemory};
+ use crate::flatbuffers::unsafe_tools::{fb_vector_to_slice, VerifiedFlatbufferMemory};

use crate::regex_manager::RegexManager;
use crate::request::Request;
73 changes: 0 additions & 73 deletions src/filters/flat_filter_map.rs

This file was deleted.

2 changes: 0 additions & 2 deletions src/filters/mod.rs
@@ -6,6 +6,4 @@ mod network_matchers;
pub mod cosmetic;
pub(crate) mod fb_builder;
pub(crate) mod fb_network;
- pub(crate) mod flat_filter_map;
pub mod network;
- pub(crate) mod unsafe_tools;
86 changes: 86 additions & 0 deletions src/flatbuffers/containers/flat_multimap.rs
@@ -0,0 +1,86 @@
use std::marker::PhantomData;

use crate::flatbuffers::containers::sorted_index::SortedIndex;
use flatbuffers::{Follow, Vector};

/// A map-like container backed by flatbuffer references.
/// Provides O(log n) lookup using binary search on the sorted index.
/// `I` is the key type and `Keys` is the concrete key container: `&[I]` for fast
/// indexing when the keys can be viewed as a slice (e.g. `u32`, `u64`), or
/// `flatbuffers::Vector<I>` when no conversion from `Vector` to a slice exists (e.g. `str`).
Comment on lines +6 to +9 (Collaborator):

Since we're accumulating several of these flatbuffer utility structs, I'm wondering if it makes sense to create and publish a new separate crate just for those? I imagine others in the Rust community could find them useful too.

No need to do so here, but something to consider.

Collaborator (reply):

That could be an option once we finally stabilize the interface and merge it to master.

pub(crate) struct FlatMultiMapView<'a, I: Ord, V, Keys>
where
Keys: SortedIndex<I>,
V: Follow<'a>,
{
keys: Keys,
values: Vector<'a, V>,
_phantom: PhantomData<I>,
}

impl<'a, I: Ord + Copy, V, Keys> FlatMultiMapView<'a, I, V, Keys>
where
Keys: SortedIndex<I> + Clone,
V: Follow<'a>,
{
pub fn new(keys: Keys, values: Vector<'a, V>) -> Self {
debug_assert!(keys.len() == values.len());

Self {
keys,
values,
_phantom: PhantomData,
}
}

pub fn get(&self, key: I) -> Option<FlatMultiMapViewIterator<'a, I, V, Keys>> {
let index = self.keys.partition_point(|x| *x < key);
if index < self.keys.len() && self.keys.get(index) == key {
Some(FlatMultiMapViewIterator {
index,
key,
keys: self.keys.clone(), // Cloning is 3-4% faster than & in benchmarks
Review comment (Copilot AI, Aug 12, 2025):

[nitpick] The comment claims cloning is faster than references, but this seems counterintuitive and may be a premature optimization. Consider providing more context about why cloning would be faster or benchmark conditions that led to this conclusion.

Suggested change:
- keys: self.keys.clone(), // Cloning is 3-4% faster than & in benchmarks
+ keys: self.keys.clone(), // Clone keys for iterator; see implementation details for performance considerations

values: self.values,
})
} else {
None
}
}

#[cfg(test)]
pub fn total_size(&self) -> usize {
self.keys.len()
}
}

pub(crate) struct FlatMultiMapViewIterator<'a, I: Ord + Copy, V, Keys>
where
Keys: SortedIndex<I>,
V: Follow<'a>,
{
index: usize,
key: I,
keys: Keys,
values: Vector<'a, V>,
}

impl<'a, I, V, Keys> Iterator for FlatMultiMapViewIterator<'a, I, V, Keys>
where
I: Ord + Copy,
V: Follow<'a>,
Keys: SortedIndex<I>,
{
type Item = (usize, <V as Follow<'a>>::Inner);

fn next(&mut self) -> Option<Self::Item> {
if self.index < self.keys.len() && self.keys.get(self.index) == self.key {
self.index += 1;
Some((self.index - 1, self.values.get(self.index - 1)))
} else {
None
}
}
}

#[cfg(test)]
#[path = "../../../tests/unit/flatbuffers/containers/flat_multimap.rs"]
mod unit_tests;
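
Reader's note: the lookup in `FlatMultiMapView::get` and the iterator's `next` can be illustrated with plain slices. The sketch below is not part of this PR. The function name `multimap_get` is hypothetical, it works on `&[K]`/`&[V]` rather than `flatbuffers::Vector`, and it returns an empty iterator instead of `None` when the key is absent.

```rust
/// Hypothetical helper (not in the PR): yields every `(index, value)` pair
/// whose key equals `key`. `keys` must be sorted ascending and must be
/// parallel to `values` (same length, same order).
fn multimap_get<'a, K: Ord + Copy, V>(
    keys: &'a [K],
    values: &'a [V],
    key: K,
) -> impl Iterator<Item = (usize, &'a V)> + 'a {
    debug_assert_eq!(keys.len(), values.len());
    // Lower bound: first position whose key is not less than `key`.
    let start = keys.partition_point(|k| *k < key);
    keys[start..]
        .iter()
        // Equal keys are contiguous because `keys` is sorted.
        .take_while(move |k| **k == key)
        .enumerate()
        .map(move |(offset, _)| (start + offset, &values[start + offset]))
}
```

As in the PR's iterator, each item carries the position in the key array alongside the value, so callers can index other parallel arrays with it.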
49 changes: 49 additions & 0 deletions src/flatbuffers/containers/flat_set.rs
@@ -0,0 +1,49 @@
#![allow(dead_code)]

use std::marker::PhantomData;

use crate::flatbuffers::containers::sorted_index::SortedIndex;

/// A set-like container backed by flatbuffer references.
/// Provides O(log n) lookup using binary search on the sorted data.
/// `I` is the key type and `Keys` is the concrete key container: `&[I]` for fast
/// indexing when the keys can be viewed as a slice (e.g. `u32`, `u64`), or
/// `flatbuffers::Vector<I>` when no conversion from `Vector` to a slice exists (e.g. `str`).
pub(crate) struct FlatSetView<I, Keys>
where
Keys: SortedIndex<I>,
{
keys: Keys,
_phantom: PhantomData<I>,
}

impl<I, Keys> FlatSetView<I, Keys>
where
I: Ord,
Keys: SortedIndex<I>,
{
pub fn new(keys: Keys) -> Self {
Self {
keys,
_phantom: PhantomData,
}
}

pub fn contains(&self, key: I) -> bool {
let index = self.keys.partition_point(|x| *x < key);
index < self.keys.len() && self.keys.get(index) == key
}

#[inline(always)]
pub fn len(&self) -> usize {
self.keys.len()
}

#[inline(always)]
pub fn is_empty(&self) -> bool {
self.len() == 0
}
}

#[cfg(test)]
#[path = "../../../tests/unit/flatbuffers/containers/flat_set.rs"]
mod unit_tests;
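
Reader's note: `FlatSetView::contains` is a lower-bound search followed by an equality check. A minimal sketch of the same idea over a plain sorted slice follows. The name `sorted_contains` is hypothetical and not part of this PR, and it takes the key by reference rather than by value.

```rust
/// Hypothetical helper (not in the PR): membership test on a sorted slice.
/// `keys` must be sorted ascending.
fn sorted_contains<K: Ord>(keys: &[K], key: &K) -> bool {
    // Index of the first element that is not less than `key`.
    let index = keys.partition_point(|k| k < key);
    // The key is present only if that element exists and is equal to it.
    index < keys.len() && keys[index] == *key
}
```

The bounds check before indexing matters: when every element is smaller than `key`, `partition_point` returns `keys.len()`.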
3 changes: 3 additions & 0 deletions src/flatbuffers/containers/mod.rs
@@ -0,0 +1,3 @@
pub(crate) mod flat_multimap;
pub(crate) mod flat_set;
pub(crate) mod sorted_index;
73 changes: 73 additions & 0 deletions src/flatbuffers/containers/sorted_index.rs
@@ -0,0 +1,73 @@
use flatbuffers::{Follow, Vector};

// Represents a sorted sequence that supports binary search.
pub(crate) trait SortedIndex<I> {
fn len(&self) -> usize;
fn get(&self, index: usize) -> I;
fn partition_point<F>(&self, predicate: F) -> usize
where
F: FnMut(&I) -> bool;
}

// Implementation for slices. Prefer using this with fb_vector_to_slice
// where possible, because it is faster than reading values with the
// flatbuffer `get` method.
impl<I: Ord + Copy> SortedIndex<I> for &[I] {
#[inline(always)]
fn len(&self) -> usize {
<[I]>::len(self)
}

#[inline(always)]
fn get(&self, index: usize) -> I {
self[index]
}

#[inline(always)]
fn partition_point<F>(&self, predicate: F) -> usize
where
F: FnMut(&I) -> bool,
{
debug_assert!(self.is_sorted());
<[I]>::partition_point(self, predicate)
}
}

// General implementation for flatbuffers::Vector; it uses `get` to
// obtain values.
impl<'a, T: Follow<'a>> SortedIndex<T::Inner> for Vector<'a, T>
where
T::Inner: Ord,
{
#[inline(always)]
fn len(&self) -> usize {
Vector::len(self)
}

#[inline(always)]
fn get(&self, index: usize) -> T::Inner {
Vector::get(self, index)
}

fn partition_point<F>(&self, mut predicate: F) -> usize
where
F: FnMut(&T::Inner) -> bool,
{
debug_assert!(self.iter().is_sorted());

let mut left = 0;
let mut right = self.len();

while left < right {
let mid = left + (right - left) / 2;
let value = self.get(mid);
if predicate(&value) {
left = mid + 1;
} else {
right = mid;
}
}

left
}
}
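
Reader's note: the hand-written loop in the `Vector` implementation is the usual half-open binary search. The sketch below reproduces it against an arbitrary index-based getter so it can be compared with the standard `<[T]>::partition_point`. The name `partition_point_by` and its closure-based signature are assumptions made for illustration, not part of this PR.

```rust
/// Hypothetical helper (not in the PR): returns the index of the first
/// element for which `predicate` is false, given random access through `get`.
/// Assumes the predicate is true for a prefix and false for the rest.
fn partition_point_by<T>(
    len: usize,
    get: impl Fn(usize) -> T,
    mut predicate: impl FnMut(&T) -> bool,
) -> usize {
    // Invariant: predicate holds for every index < left,
    // and fails for every index >= right.
    let (mut left, mut right) = (0, len);
    while left < right {
        // Overflow-safe midpoint.
        let mid = left + (right - left) / 2;
        if predicate(&get(mid)) {
            left = mid + 1;
        } else {
            right = mid;
        }
    }
    left
}
```

On a sorted slice this should agree with the standard library, e.g. `partition_point_by(data.len(), |i| data[i], |s| *s < "banana")` and `data.partition_point(|s| *s < "banana")` return the same index.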
2 changes: 2 additions & 0 deletions src/flatbuffers/mod.rs
@@ -0,0 +1,2 @@
pub(crate) mod containers;
pub(crate) mod unsafe_tools;
File renamed without changes.
1 change: 1 addition & 0 deletions src/lib.rs
@@ -23,6 +23,7 @@ pub mod cosmetic_filter_cache;
mod data_format;
mod engine;
pub mod filters;
+ mod flatbuffers;
pub mod lists;
mod network_filter_list;
mod optimizer;