diff --git a/content/cpp/concepts/unordered-map/terms/bucket-count/bucket-count.md b/content/cpp/concepts/unordered-map/terms/bucket-count/bucket-count.md
new file mode 100644
index 00000000000..a3e9ad5713a
--- /dev/null
+++ b/content/cpp/concepts/unordered-map/terms/bucket-count/bucket-count.md
@@ -0,0 +1,182 @@
+---
+Title: 'bucket_count()'
+Description: 'Returns the number of buckets in the hash table of an unordered associative container.'
+Subjects:
+  - 'Computer Science'
+  - 'Web Development'
+Tags:
+  - 'Data Structures'
+  - 'Hash Maps'
+  - 'Hash Tables'
+  - 'STL'
+CatalogContent:
+  - 'learn-c-plus-plus'
+  - 'paths/computer-science'
+---
+
+The **`bucket_count()`** method is a member function of C++ unordered associative containers such as [`unordered_map`](https://www.codecademy.com/resources/docs/cpp/unordered-map), `unordered_set`, `unordered_multimap`, and `unordered_multiset`.
+
+It returns the current number of buckets in the hash table used internally by these containers. Each bucket can contain zero or more elements that hash to the same value, and understanding bucket distribution is crucial for analyzing hash table performance and collision patterns.
+
+The `bucket_count()` function is particularly useful for performance analysis, debugging hash table behavior, monitoring load factors, and understanding how elements are distributed across the hash table. It helps developers optimize hash functions and assess whether rehashing might be beneficial for their specific use case.
+
+## Syntax
+
+```pseudo
+container.bucket_count()
+```
+
+**Parameters:**
+
+This method takes no parameters.
+
+**Return value:**
+
+Returns a value of type `size_type` (typically `size_t`) representing the current number of buckets in the hash table.
+
+## Example
+
+This example demonstrates how to use `bucket_count()` with an `unordered_map` and inspect how elements are distributed across the buckets:
+
+```cpp
+// unordered_map::bucket_count
+#include <iostream>
+#include <string>
+#include <unordered_map>
+
+int main ()
+{
+  std::unordered_map<std::string,std::string> mymap = {
+    {"house","Vaikunth"},
+    {"apple","red"},
+    {"tree","green"},
+    {"book","Geeta"},
+    {"door","porte"},
+    {"grapefruit","pamplemousse"}
+  };
+
+  unsigned n = mymap.bucket_count();
+
+  std::cout << "mymap has " << n << " buckets.\n";
+
+  for (unsigned i=0; i<n; ++i) {
+    std::cout << "bucket #" << i << " contains: ";
+    for (auto it = mymap.begin(i); it != mymap.end(i); ++it)
+      std::cout << "[" << it->first << ":" << it->second << "] ";
+    std::cout << "\n";
+  }
+
+  return 0;
+}
+```
+
+The output might look like this (actual values may vary by implementation):
+
+```shell
+mymap has 13 buckets.
+bucket #0 contains: [book:Geeta] [tree:green] [apple:red]
+bucket #1 contains:
+bucket #2 contains:
+bucket #3 contains:
+bucket #4 contains:
+bucket #5 contains: [grapefruit:pamplemousse]
+bucket #6 contains:
+bucket #7 contains:
+bucket #8 contains:
+bucket #9 contains:
+bucket #10 contains:
+bucket #11 contains: [door:porte]
+bucket #12 contains: [house:Vaikunth]
+```
+
+This example shows how elements that hash to the same value share a bucket; the bucket count itself can increase when the hash table rehashes to maintain performance as more elements are added.
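+
+To watch the count actually grow, insert elements in a loop and report `bucket_count()` whenever it changes. The following is a minimal sketch; the exact growth points and bucket counts depend on the standard library implementation:
+
+```cpp
+#include <iostream>
+#include <string>
+#include <unordered_map>
+
+int main() {
+  std::unordered_map<int, std::string> m;
+  size_t lastBuckets = m.bucket_count();
+  std::cout << "start: " << lastBuckets << " buckets\n";
+
+  // Insert elements; report every automatic rehash
+  for (int i = 0; i < 100; ++i) {
+    m[i] = "value" + std::to_string(i);
+    if (m.bucket_count() != lastBuckets) {
+      lastBuckets = m.bucket_count();
+      std::cout << "after " << m.size() << " elements: "
+                << lastBuckets << " buckets\n";
+    }
+  }
+  return 0;
+}
+```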
+
+## Codebyte Example
+
+This interactive example demonstrates using `bucket_count()` to analyze hash table performance and distribution:
+
+```codebyte/cpp
+#include <iostream>
+#include <unordered_map>
+#include <string>
+#include <iomanip>
+
+int main() {
+  std::unordered_map<int, std::string> students;
+
+  std::cout << "Hash Table Analysis Tool" << std::endl;
+  std::cout << "========================" << std::endl;
+
+  // Initial state
+  std::cout << "Initial bucket count: " << students.bucket_count() << std::endl;
+
+  // Add student data
+  students[101] = "Alice";
+  students[102] = "Bob";
+  students[103] = "Carol";
+  students[104] = "David";
+  students[105] = "Eva";
+
+  std::cout << "\nAfter adding 5 students:" << std::endl;
+  std::cout << "Elements: " << students.size() << std::endl;
+  std::cout << "Bucket count: " << students.bucket_count() << std::endl;
+  std::cout << "Load factor: " << std::fixed << std::setprecision(3)
+            << students.load_factor() << std::endl;
+
+  // Show bucket distribution
+  std::cout << "\nBucket distribution:" << std::endl;
+  for (size_t i = 0; i < students.bucket_count(); ++i) {
+    std::cout << "Bucket " << std::setw(2) << i << ": "
+              << students.bucket_size(i) << " elements";
+
+    if (students.bucket_size(i) > 0) {
+      std::cout << " -> ";
+      for (auto it = students.begin(i); it != students.end(i); ++it) {
+        std::cout << it->first << ":" << it->second << " ";
+      }
+    }
+    std::cout << std::endl;
+  }
+
+  // Add more elements to trigger rehashing
+  for (int i = 106; i <= 115; ++i) {
+    students[i] = "Student" + std::to_string(i);
+  }
+
+  std::cout << "\nAfter adding 15 students total:" << std::endl;
+  std::cout << "Elements: " << students.size() << std::endl;
+  std::cout << "Bucket count: " << students.bucket_count() << std::endl;
+  std::cout << "Load factor: " << std::fixed << std::setprecision(3)
+            << students.load_factor() << std::endl;
+
+  // Calculate statistics
+  size_t empty_buckets = 0;
+  for (size_t i = 0; i < students.bucket_count(); ++i) {
+    if (students.bucket_size(i) == 0) {
+      empty_buckets++;
+    }
+  }
+
+  double utilization = static_cast<double>(students.bucket_count() - empty_buckets)
+                       / students.bucket_count() * 100;
+
+  std::cout << "Empty buckets: " << empty_buckets << std::endl;
+  std::cout << "Utilization: " << std::setprecision(1) << utilization << "%" << std::endl;
+
+  return 0;
+}
+```
+
+## Frequently Asked Questions
+
+### 1. Why does the bucket count sometimes change automatically?
+
+The bucket count changes when the hash table automatically rehashes to maintain performance. This typically happens when the load factor exceeds the maximum load factor threshold, causing the container to increase the number of buckets and redistribute its elements.
+
+### 2. Is there a relationship between bucket count and performance?
+
+Yes, the bucket count directly affects performance. More buckets generally mean fewer collisions and faster lookups, but also more memory usage. The load factor (size/bucket_count) is a key metric for balancing performance and memory efficiency.
+
+### 3. Can I control the bucket count manually?
+
+Yes, you can use the `rehash()` method to request a specific number of buckets, or `reserve()` to ensure the container can hold a certain number of elements without rehashing, as shown in the sketch below. However, the actual bucket count may differ from your request based on the implementation's requirements.
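+
+A minimal sketch of both calls (the resulting bucket counts are implementation-dependent):
+
+```cpp
+#include <iostream>
+#include <string>
+#include <unordered_map>
+
+int main() {
+  std::unordered_map<int, std::string> m;
+
+  // Request at least 64 buckets directly
+  m.rehash(64);
+  std::cout << "after rehash(64): " << m.bucket_count() << " buckets\n";
+
+  // Ask for capacity for 100 elements without further rehashing;
+  // the container picks a bucket count based on max_load_factor()
+  m.reserve(100);
+  std::cout << "after reserve(100): " << m.bucket_count() << " buckets\n";
+
+  return 0;
+}
+```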
\ No newline at end of file
diff --git a/content/cpp/concepts/unordered-map/terms/bucket-size/bucket-size.md b/content/cpp/concepts/unordered-map/terms/bucket-size/bucket-size.md
new file mode 100644
index 00000000000..e7b571231b9
--- /dev/null
+++ b/content/cpp/concepts/unordered-map/terms/bucket-size/bucket-size.md
@@ -0,0 +1,208 @@
+---
+Title: 'bucket_size()'
+Description: 'Returns the number of elements in a specific bucket of an unordered associative container.'
+Subjects:
+  - 'Computer Science'
+  - 'Web Development'
+Tags:
+  - 'Data Structures'
+  - 'Hash Maps'
+  - 'Hash Tables'
+  - 'STL'
+CatalogContent:
+  - 'learn-c-plus-plus'
+  - 'paths/computer-science'
+---
+
+The **`bucket_size()`** method is a member function of C++ unordered associative containers such as [`unordered_map`](https://www.codecademy.com/resources/docs/cpp/unordered-map), `unordered_set`, `unordered_multimap`, and `unordered_multiset`.
+
+It returns the number of elements stored in a specific bucket of the hash table. This function is essential for analyzing hash distribution patterns, detecting collisions, and understanding the performance characteristics of the hash table implementation.
+
+The `bucket_size()` function is particularly useful for debugging hash functions, identifying hotspots where many elements hash to the same bucket, monitoring collision rates, and optimizing hash table performance. It helps developers understand how evenly elements are distributed across buckets and whether the hash function is working effectively.
+
+## Syntax
+
+```pseudo
+container.bucket_size(bucket_index)
+```
+
+**Parameters:**
+
+- `bucket_index`: A value of type `size_type` (typically `size_t`) representing the index of the bucket to query. It must be less than the value returned by `bucket_count()`.
+
+**Return value:**
+
+Returns a value of type `size_type` (typically `size_t`) representing the number of elements in the specified bucket.
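+
+For instance, `bucket_size()` pairs naturally with `bucket()`, which reports the bucket index a given key hashes to. A minimal sketch (the index and count you see depend on the implementation):
+
+```cpp
+#include <iostream>
+#include <string>
+#include <unordered_map>
+
+int main() {
+  std::unordered_map<std::string, int> ages = {
+    {"alice", 30}, {"bob", 25}, {"carol", 41}
+  };
+
+  // bucket() reports which bucket a key hashes to;
+  // bucket_size() reports how many elements share that bucket
+  size_t idx = ages.bucket("alice");
+  std::cout << "\"alice\" is in bucket " << idx
+            << ", which holds " << ages.bucket_size(idx)
+            << " element(s)\n";
+  return 0;
+}
+```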
+
+## Example
+
+This example demonstrates how to use `bucket_size()` to analyze the distribution of elements across buckets in an `unordered_map`:
+
+```cpp
+// unordered_map::bucket_size
+#include <iostream>
+#include <string>
+#include <unordered_map>
+
+int main ()
+{
+  std::unordered_map<std::string,std::string> mymap = {
+    {"us","United States"},
+    {"uk","United Kingdom"},
+    {"fr","France"},
+    {"de","Germany"}
+  };
+
+  unsigned nbuckets = mymap.bucket_count();
+
+  std::cout << "mymap has " << nbuckets << " buckets:\n";
+
+  for (unsigned i=0; i<nbuckets; ++i) {
+    std::cout << "bucket #" << i << " has " << mymap.bucket_size(i) << " elements.\n";
+  }
+
+  return 0;
+}
+```
+
+The output might look like this (actual values may vary by implementation):
+
+```shell
+mymap has 7 buckets:
+bucket #0 has 0 elements.
+bucket #1 has 1 elements.
+bucket #2 has 1 elements.
+bucket #3 has 0 elements.
+bucket #4 has 1 elements.
+bucket #5 has 0 elements.
+bucket #6 has 1 elements.
+```
+
+## Codebyte Example
+
+This interactive example uses `bucket_size()` to detect and report hash collisions in an `unordered_set`:
+
+```codebyte/cpp
+#include <iostream>
+#include <unordered_set>
+#include <vector>
+#include <string>
+#include <iomanip>
+#include <algorithm>
+
+class CollisionAnalyzer {
+private:
+  std::unordered_set<int> data;
+
+public:
+  void addNumbers(const std::vector<int>& numbers) {
+    for (int num : numbers) {
+      data.insert(num);
+    }
+  }
+
+  void analyzeCollisions() {
+    std::cout << "Collision Analysis Report" << std::endl;
+    std::cout << "========================" << std::endl;
+    std::cout << "Total elements: " << data.size() << std::endl;
+    std::cout << "Total buckets: " << data.bucket_count() << std::endl;
+
+    size_t max_bucket_size = 0;
+    size_t collisions = 0;
+    size_t empty_buckets = 0;
+
+    // Collect bucket size statistics
+    for (size_t i = 0; i < data.bucket_count(); ++i) {
+      size_t size = data.bucket_size(i);
+
+      if (size == 0) {
+        empty_buckets++;
+      } else {
+        max_bucket_size = std::max(max_bucket_size, size);
+        if (size > 1) {
+          collisions += size - 1;
+        }
+      }
+    }
+
+    std::cout << "Empty buckets: " << empty_buckets << std::endl;
+    std::cout << "Max bucket size: " << max_bucket_size << std::endl;
+    std::cout << "Total collisions: " << collisions << std::endl;
+
+    // Show buckets with collisions
+    if (collisions > 0) {
+      std::cout << "\nBuckets with collisions:" << std::endl;
+      for (size_t i = 0; i < data.bucket_count(); ++i) {
+        if (data.bucket_size(i) > 1) {
+          std::cout << "Bucket " << i << " (size " << data.bucket_size(i) << "): ";
+          for (auto it = data.begin(i); it != data.end(i); ++it) {
+            std::cout << *it << " ";
+          }
+          std::cout << std::endl;
+        }
+      }
+    } else {
+      std::cout << "\nNo collisions detected - excellent hash distribution!" << std::endl;
+    }
+
+    // Performance assessment
+    double collision_rate = static_cast<double>(collisions) / data.size() * 100;
+    std::cout << "\nPerformance Metrics:" << std::endl;
+    std::cout << "Collision rate: " << std::fixed << std::setprecision(1)
+              << collision_rate << "%" << std::endl;
+
+    if (collision_rate < 10) {
+      std::cout << "Hash performance: Excellent" << std::endl;
+    } else if (collision_rate < 25) {
+      std::cout << "Hash performance: Good" << std::endl;
+    } else if (collision_rate < 50) {
+      std::cout << "Hash performance: Fair" << std::endl;
+    } else {
+      std::cout << "Hash performance: Poor - consider rehashing" << std::endl;
+    }
+  }
+};
+
+int main() {
+  CollisionAnalyzer analyzer;
+
+  // Test with different number patterns
+  std::cout << "Test 1: Sequential numbers" << std::endl;
+  std::vector<int> sequential = {10, 20, 30, 40, 50, 60, 70, 80};
+  analyzer.addNumbers(sequential);
+  analyzer.analyzeCollisions();
+
+  std::cout << "\n" << std::string(40, '=') << std::endl;
+
+  // Create a new analyzer for the second test
+  CollisionAnalyzer analyzer2;
+  std::cout << "\nTest 2: Random-like numbers" << std::endl;
+  std::vector<int> random_like = {157, 283, 491, 672, 829, 934, 1047, 1158};
+  analyzer2.addNumbers(random_like);
+  analyzer2.analyzeCollisions();
+
+  return 0;
+}
+```
+
+## Frequently Asked Questions
+
+### 1. What does it mean when a bucket has size greater than 1?
+
+When `bucket_size()` returns a value greater than 1, it indicates a collision: multiple elements have hashed to the same bucket. While some collisions are normal, consistently large bucket sizes may indicate poor hash function performance.
+
+### 2. How can I use `bucket_size()` to optimize my hash table?
+
+Use `bucket_size()` to identify buckets with many collisions. If you see consistently large bucket sizes, consider using a custom hash function, increasing the bucket count with `rehash()`, or analyzing your key distribution patterns.
+
+### 3. Is there a performance cost to calling `bucket_size()`?
+
+The `bucket_size()` function typically has O(1) complexity, as hash table implementations often maintain size information for each bucket. However, frequent calls in performance-critical code should still be used judiciously.
\ No newline at end of file
diff --git a/content/cpp/concepts/unordered-map/terms/load-factor/load-factor.md b/content/cpp/concepts/unordered-map/terms/load-factor/load-factor.md
new file mode 100644
index 00000000000..73c44ba65d7
--- /dev/null
+++ b/content/cpp/concepts/unordered-map/terms/load-factor/load-factor.md
@@ -0,0 +1,330 @@
+---
+Title: 'load_factor()'
+Description: 'Returns the current load factor of the unordered_map hash table, which is the ratio of element count to bucket count.'
+Subjects:
+  - 'Computer Science'
+  - 'Web Development'
+Tags:
+  - 'Data Structures'
+  - 'Hash Maps'
+  - 'Map'
+  - 'STL'
+CatalogContent:
+  - 'learn-c-plus-plus'
+  - 'paths/computer-science'
+---
+
+The **`load_factor()`** method is a member function of the `std::unordered_map` container that returns the current load factor of the hash table. The load factor is defined as the ratio between the number of elements in the container (its size) and the number of buckets (bucket count):
+
+```pseudo
+load_factor = size / bucket_count
+```
+
+This value is a key performance metric for hash tables, as it directly affects collision rates and operation time complexity.
+
+A lower load factor generally means fewer collisions and better performance, while a higher load factor indicates the container is becoming full, potentially degrading performance due to increased collision resolution work. Understanding and monitoring the load factor is essential for optimizing hash table performance in performance-critical applications such as real-time systems, game engines, and high-frequency trading applications.
+
+## Syntax
+
+```pseudo
+float load_factor() const;
+```
+
+**Parameters:**
+
+- None
+
+**Return value:**
+
+- Returns a floating-point value representing the container's current load factor (the ratio of size to bucket count).
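+
+Since the returned value is just this ratio, you can confirm it by computing the ratio by hand. A minimal sketch (the exact numbers depend on the implementation's bucket count):
+
+```cpp
+#include <iostream>
+#include <string>
+#include <unordered_map>
+
+int main() {
+  std::unordered_map<int, std::string> m = {
+    {1, "one"}, {2, "two"}, {3, "three"}
+  };
+
+  // load_factor() reports size() divided by bucket_count()
+  float ratio = static_cast<float>(m.size()) / m.bucket_count();
+  std::cout << "load_factor():       " << m.load_factor() << "\n";
+  std::cout << "size / bucket_count: " << ratio << "\n";
+  return 0;
+}
+```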
+
+## Example 1: Basic Usage of load_factor()
+
+This example demonstrates the basic usage of `load_factor()` and how it changes as elements are added:
+
+```cpp
+#include <iostream>
+#include <unordered_map>
+#include <string>
+#include <iomanip>
+
+int main() {
+  // Create an empty unordered_map
+  std::unordered_map<int, std::string> inventory;
+
+  // Display initial stats
+  std::cout << std::fixed << std::setprecision(3);
+  std::cout << "Initial state:" << std::endl;
+  std::cout << "Size: " << inventory.size() << std::endl;
+  std::cout << "Bucket count: " << inventory.bucket_count() << std::endl;
+  std::cout << "Load factor: " << inventory.load_factor() << std::endl;
+  std::cout << "Max load factor: " << inventory.max_load_factor() << std::endl;
+  std::cout << std::endl;
+
+  // Insert elements and observe load factor changes
+  std::cout << "Inserting inventory items:" << std::endl;
+  for (int i = 1; i <= 8; ++i) {
+    inventory[i] = "Item " + std::to_string(i);
+    std::cout << "After inserting " << i << " items:" << std::endl;
+    std::cout << "  Size: " << inventory.size() << std::endl;
+    std::cout << "  Bucket count: " << inventory.bucket_count() << std::endl;
+    std::cout << "  Load factor: " << inventory.load_factor() << std::endl;
+    std::cout << std::endl;
+  }
+
+  return 0;
+}
+```
+
+The output might look something like:
+
+```shell
+Initial state:
+Size: 0
+Bucket count: 1
+Load factor: 0.000
+Max load factor: 1.000
+
+Inserting inventory items:
+After inserting 1 items:
+  Size: 1
+  Bucket count: 13
+  Load factor: 0.077
+
+After inserting 2 items:
+  Size: 2
+  Bucket count: 13
+  Load factor: 0.154
+
+After inserting 3 items:
+  Size: 3
+  Bucket count: 13
+  Load factor: 0.231
+
+After inserting 4 items:
+  Size: 4
+  Bucket count: 13
+  Load factor: 0.308
+
+After inserting 5 items:
+  Size: 5
+  Bucket count: 13
+  Load factor: 0.385
+
+After inserting 6 items:
+  Size: 6
+  Bucket count: 13
+  Load factor: 0.462
+
+After inserting 7 items:
+  Size: 7
+  Bucket count: 13
+  Load factor: 0.538
+
+After inserting 8 items:
+  Size: 8
+  Bucket count: 13
+  Load factor: 0.615
+```
+
+The above code shows how the load factor changes as elements are inserted into the container. Initially, the load factor is 0 (empty container) and it rises with each insertion. When an insertion would push it past the max load factor, the container automatically rehashes (increases the bucket count), which brings the load factor back down; in the sample output, the jump from 1 bucket to 13 on the first insertion is exactly such a rehash.
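+
+The max load factor is the threshold that decides when that automatic rehash fires, and it can be tuned. A minimal sketch comparing a default container against one capped at 0.5 (actual bucket counts are implementation-dependent):
+
+```cpp
+#include <iostream>
+#include <unordered_map>
+
+int main() {
+  std::unordered_map<int, int> loose;
+  std::unordered_map<int, int> tight;
+  tight.max_load_factor(0.5f); // rehash sooner: more buckets, fewer collisions
+
+  for (int i = 0; i < 100; ++i) {
+    loose[i] = i;
+    tight[i] = i;
+  }
+
+  std::cout << "default cap: " << loose.bucket_count()
+            << " buckets, load factor " << loose.load_factor() << "\n";
+  std::cout << "0.5 cap:     " << tight.bucket_count()
+            << " buckets, load factor " << tight.load_factor() << "\n";
+  return 0;
+}
+```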
+
+## Example 2: Performance Analysis with load_factor()
+
+This example demonstrates how to use `load_factor()` to analyze and optimize hash table performance:
+
+```cpp
+#include <iostream>
+#include <unordered_map>
+#include <string>
+#include <chrono>
+#include <iomanip>
+
+// Function to measure the average lookup time in microseconds
+double measureLookupTime(const std::unordered_map<int, std::string>& map, int iterations) {
+  auto start = std::chrono::high_resolution_clock::now();
+
+  // Perform lookups; the result is discarded since only the lookup is timed
+  for (int i = 0; i < iterations; ++i) {
+    for (int key = 1; key <= 100; ++key) {
+      map.find(key);
+    }
+  }
+
+  auto end = std::chrono::high_resolution_clock::now();
+  auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
+  return static_cast<double>(duration.count()) / (iterations * 100);
+}
+
+int main() {
+  std::unordered_map<int, std::string> cache;
+
+  std::cout << std::fixed << std::setprecision(4);
+  std::cout << "Performance Analysis with Load Factor:" << std::endl;
+  std::cout << "| Elements | Buckets | Load Factor | Avg Lookup (μs) |" << std::endl;
+  std::cout << "|----------|---------|-------------|-----------------|" << std::endl;
+
+  // Insert elements in batches and measure performance
+  for (int batch = 0; batch < 10; ++batch) {
+    // Add 50 elements per batch
+    for (int i = 1; i <= 50; ++i) {
+      int key = batch * 50 + i;
+      cache[key] = "CachedData" + std::to_string(key);
+    }
+
+    // Measure performance
+    double avgTime = measureLookupTime(cache, 1000);
+
+    std::cout << "| " << std::setw(8) << cache.size()
+              << " | " << std::setw(7) << cache.bucket_count()
+              << " | " << std::setw(11) << cache.load_factor()
+              << " | " << std::setw(15) << avgTime << " |" << std::endl;
+  }
+
+  return 0;
+}
+```
+
+The output will show the relationship between load factor and lookup performance:
+
+```shell
+Performance Analysis with Load Factor:
+| Elements | Buckets | Load Factor | Avg Lookup (μs) |
+|----------|---------|-------------|-----------------|
+|       50 |      59 |      0.8475 |          0.1296 |
+|      100 |     127 |      0.7874 |          0.0939 |
+|      150 |     257 |      0.5837 |          0.0935 |
+|      200 |     257 |      0.7782 |          0.0948 |
+|      250 |     257 |      0.9728 |          0.0983 |
+|      300 |     541 |      0.5545 |          0.0930 |
+|      350 |     541 |      0.6470 |          0.0930 |
+|      400 |     541 |      0.7394 |          0.0755 |
+|      450 |     541 |      0.8318 |          0.0728 |
+|      500 |     541 |      0.9242 |          0.0944 |
+```
+
+This example demonstrates how monitoring the load factor helps in understanding performance characteristics: lookup times generally improve after the load factor drops due to rehashing, although timing noise means individual runs will differ.
+
+## Codebyte Example: Smart Cache Management
+
+This interactive example shows how to use `load_factor()` for intelligent cache management and performance optimization:
+
+```codebyte/cpp
+#include <iostream>
+#include <unordered_map>
+#include <string>
+#include <iomanip>
+
+class SmartCache {
+private:
+  std::unordered_map<int, std::string> cache;
+  static constexpr float OPTIMAL_LOAD_FACTOR = 0.6f;
+
+public:
+  void addItem(int key, const std::string& value) {
+    cache[key] = value;
+
+    // Check if we need to optimize
+    if (cache.load_factor() > OPTIMAL_LOAD_FACTOR * 1.5) {
+      optimize();
+    }
+  }
+
+  void optimize() {
+    std::cout << "Optimizing cache..." << std::endl;
+    std::cout << "Before optimization:" << std::endl;
+    displayStats();
+
+    // Set the optimal max load factor, then rehash so the new
+    // threshold takes effect right away
+    cache.max_load_factor(OPTIMAL_LOAD_FACTOR);
+    cache.rehash(0);
+
+    std::cout << "After optimization:" << std::endl;
+    displayStats();
+    std::cout << std::endl;
+  }
+
+  void displayStats() {
+    std::cout << std::fixed << std::setprecision(3);
+    std::cout << "  Size: " << cache.size() << std::endl;
+    std::cout << "  Bucket count: " << cache.bucket_count() << std::endl;
+    std::cout << "  Load factor: " << cache.load_factor() << std::endl;
+    std::cout << "  Max load factor: " << cache.max_load_factor() << std::endl;
+
+    // Calculate distribution statistics
+    size_t emptyBuckets = 0;
+    size_t maxBucketSize = 0;
+
+    for (size_t i = 0; i < cache.bucket_count(); ++i) {
+      size_t bucketSize = cache.bucket_size(i);
+      if (bucketSize == 0) emptyBuckets++;
+      if (bucketSize > maxBucketSize) maxBucketSize = bucketSize;
+    }
+
+    double emptyPercent = (static_cast<double>(emptyBuckets) / cache.bucket_count()) * 100;
+    std::cout << "  Empty buckets: " << emptyBuckets << " (" << emptyPercent << "%)" << std::endl;
+    std::cout << "  Max bucket size: " << maxBucketSize << std::endl;
+  }
+
+  std::string get(int key) {
+    auto it = cache.find(key);
+    return (it != cache.end()) ? it->second : "Not found";
+  }
+};
+
+int main() {
+  SmartCache smartCache;
+
+  std::cout << "Smart Cache Management Demo" << std::endl;
+  std::cout << "============================" << std::endl;
+
+  // Initial state
+  std::cout << "Initial cache state:" << std::endl;
+  smartCache.displayStats();
+  std::cout << std::endl;
+
+  // Add items to trigger optimization
+  std::cout << "Adding 100 cache entries..." << std::endl;
+  for (int i = 1; i <= 100; ++i) {
+    smartCache.addItem(i, "CacheEntry_" + std::to_string(i));
+
+    // Show progress at certain intervals
+    if (i % 25 == 0) {
+      std::cout << "Progress: " << i << " items added" << std::endl;
+      smartCache.displayStats();
+      std::cout << std::endl;
+    }
+  }
+
+  // Test retrieval
+  std::cout << "Testing cache retrieval:" << std::endl;
+  std::cout << "Key 50: " << smartCache.get(50) << std::endl;
+  std::cout << "Key 999: " << smartCache.get(999) << std::endl;
+
+  return 0;
+}
+```
+
+This example demonstrates a practical scenario where monitoring and managing the load factor is crucial for maintaining optimal performance. The smart cache automatically adjusts its configuration based on the load factor to keep operations efficient.
+
+## Frequently Asked Questions
+
+### 1. What is an ideal load factor for [`unordered_map`](https://www.codecademy.com/resources/docs/cpp/unordered-map)?
+
+There's no universally ideal load factor, as it depends on the hash function quality, key distribution, and performance requirements. However, most implementations perform well with load factors between 0.5 and 0.75. Values around 0.6-0.7 often provide a good balance between memory usage and performance.
+
+### 2. Why does the load factor matter for performance?
+
+The load factor directly affects hash collision probability. Higher load factors increase collision likelihood, potentially degrading average-case O(1) operations toward O(n) in the worst case. Lower load factors reduce collisions but may waste memory.
+
+### 3. How can I control the load factor in my application?
+
+You can control the load factor through several methods:
+
+- Use `max_load_factor()` to set the threshold for automatic rehashing
+- Call `reserve()` to pre-allocate buckets for the expected element count
+- Monitor `load_factor()` to trigger manual optimizations
+- Consider custom hash functions for better key distribution
+
+A minimal sketch combining the first three controls appears after the final question below.
+
+### 4. Does calling load_factor() have any performance impact?
+
+No, `load_factor()` is a const member function that simply returns a calculated value (`size() / bucket_count()`). It's a lightweight O(1) operation that doesn't modify the container or affect its performance.
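+
+As a sketch of those controls in practice (the bucket counts shown depend on the implementation):
+
+```cpp
+#include <iostream>
+#include <unordered_map>
+
+int main() {
+  std::unordered_map<int, int> m;
+
+  // Lower the rehash threshold: rehashes fire sooner, keeping chains short
+  m.max_load_factor(0.5f);
+
+  // Pre-allocate buckets for 1000 elements so the inserts below
+  // never trigger a rehash mid-stream
+  m.reserve(1000);
+
+  for (int i = 0; i < 1000; ++i) m[i] = i;
+
+  // Monitor the result
+  std::cout << "size: " << m.size()
+            << ", buckets: " << m.bucket_count()
+            << ", load factor: " << m.load_factor()
+            << " (max " << m.max_load_factor() << ")\n";
+  return 0;
+}
+```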