Commit a872a2b

ggml-alloc : fix discrepancy between measure&eval (#2639)
The ggml memory allocator always places a tensor in the best-fit free block, that is, the smallest free block large enough to hold the tensor. During the measure phase, the last block is very large, so it is never the best fit as long as some other block can hold the tensor. During the eval phase, however, the last block has a limited size and can become the best fit. A tensor may therefore be allocated at a different location during eval than during measure, which increases fragmentation of the scratch buffer. This commit makes the allocator behave identically in both phases: the last block is used only as a last resort, when no other free block can hold the tensor.
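
The selection rule described above can be illustrated with a minimal, standalone sketch. It is not the ggml code: `demo_block`, `pick_best_fit_all`, and `pick_best_fit_last_resort` are hypothetical names introduced here, and the block sizes in the usage note are made up for illustration. The first function models a plain best-fit search over every free block; the second searches all blocks except the last and falls back to the last block only if nothing else fits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a free-block record; only the size matters here. */
struct demo_block {
    size_t size;
};

/* Plain best fit: index of the smallest block with size >= `size`, or -1. */
static int pick_best_fit_all(const struct demo_block * blocks, int n, size_t size) {
    int    best      = -1;
    size_t best_size = SIZE_MAX;
    for (int i = 0; i < n; i++) {
        if (blocks[i].size >= size && blocks[i].size <= best_size) {
            best      = i;
            best_size = blocks[i].size;
        }
    }
    return best;
}

/* Best fit over all blocks except the last; the last block is the last resort. */
static int pick_best_fit_last_resort(const struct demo_block * blocks, int n, size_t size) {
    assert(n > 0);
    int best = pick_best_fit_all(blocks, n - 1, size);
    if (best == -1 && blocks[n - 1].size >= size) {
        best = n - 1;
    }
    return best;
}
```

As an assumed scenario, take free blocks of sizes 64 and 256 followed by a last block that is huge when measuring and only 128 when evaluating. For a request of 100, the plain search picks index 1 in the first case and index 2 in the second, while the last-resort variant picks index 1 in both.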
1 parent 0919a0f commit a872a2b

1 file changed, 12 additions and 5 deletions

ggml-alloc.c

Lines changed: 12 additions & 5 deletions
@@ -113,10 +113,10 @@ void ggml_allocr_alloc(struct ggml_allocr * alloc, struct ggml_tensor * tensor)
 
     size_t max_avail = 0;
 
-    // find the best fitting free block
+    // find the best fitting free block besides the last block
     int best_fit_block = -1;
     size_t best_fit_size = SIZE_MAX;
-    for (int i = 0; i < alloc->n_free_blocks; i++) {
+    for (int i = 0; i < alloc->n_free_blocks - 1; i++) {
         struct free_block * block = &alloc->free_blocks[i];
         max_avail = MAX(max_avail, block->size);
         if (block->size >= size && block->size <= best_fit_size) {
@@ -128,10 +128,17 @@ void ggml_allocr_alloc(struct ggml_allocr * alloc, struct ggml_tensor * tensor)
     AT_PRINTF("block %d\n", best_fit_block);
 
     if (best_fit_block == -1) {
-        fprintf(stderr, "%s: not enough space in the buffer (needed %zu, largest block available %zu)\n",
-                __func__, size, max_avail);
-        GGML_ASSERT(!"not enough space in the buffer");
+        // the last block is our last resort
+        struct free_block * block = &alloc->free_blocks[alloc->n_free_blocks - 1];
+        if (block->size >= size) {
+            best_fit_block = alloc->n_free_blocks - 1;
+            max_avail = MAX(max_avail, block->size);
+        } else {
+            fprintf(stderr, "%s: not enough space in the buffer (needed %zu, largest block available %zu)\n",
+                    __func__, size, max_avail);
+            GGML_ASSERT(!"not enough space in the buffer");
         return;
+        }
     }
     struct free_block * block = &alloc->free_blocks[best_fit_block];
     void * addr = block->addr;
