Block-Level Sequence Producer API #3333
Merged
60 commits
e43bf48
First building commit with sample matchfinder
embg 625bf62
Set up ZSTD_externalMatchCtx struct
embg a3c5c2b
move seqBuffer to ZSTD_Sequence*
embg 201588a
support non-contiguous dictionary
embg 5a01759
clean up parens
embg c6c5b5b
add clearExternalMatchfinder, handle allocation errors
embg 462e967
Add useExternalMatchfinder cParam
embg a3a5f3d
validate useExternalMatchfinder cParam
embg 8562b10
Disable LDM + external matchfinder
embg 6488d8d
Check for static CCtx
embg 50df826
Validate mState and mStateDestructor
embg f03da10
Improve LDM check to cover both branches
embg d955d16
Error API with optional fallback
embg a5241db
handle RLE properly for external matchfinder
embg 4defd0e
nit
embg f9fc1b1
Move to a CDict-like model for resource ownership
embg c126ecd
Add hidden useExternalMatchfinder bool to CCtx_params_s
embg aeb060d
Eliminate malloc, move to cwksp allocation
embg 349c36e
Handle CCtx reset properly
embg 80ec8aa
Ensure seqStore has enough space for external sequences
embg bf69b2c
fix capitalization
embg 5c4891f
Add DEBUGLOG statements
embg 72e50fa
Add compressionLevel param to matchfinder API
embg 56e6633
fix c99 issues and add a param combination error code
embg da90ae8
nits
embg 4d12960
Test external matchfinder API
embg a968472
C90 compat for simpleExternalMatchFinder
embg 80765c6
Fix some @nocommits and an ASAN bug
embg 15d909a
nit
embg 901ab34
nit
embg da18842
nits
embg ec9b6b0
forward declare copySequencesToSeqStore functions in zstd_compress_in…
embg b8b5547
nit
embg 38182da
nit
embg d9534ad
nits
embg aabfc34
Update copyright headers
embg f866d0e
Fix CMake zstreamtest build
embg 3a0efdf
Fix copyright headers (again)
embg b1e2422
typo
embg b6fe61d
Add externalMatchfinder demo program to make contrib
embg 3de6d5c
Reduce memory consumption for small blockSize
embg fc37297
ZSTD_postProcessExternalMatchFinderResult nits
embg 029ba01
test sum(matchlen) + sum(litlen) == srcSize in debug builds
embg c0be839
refExternalMatchFinder -> registerExternalMatchFinder
embg f4685d2
C90 nit
embg 153be31
zstreamtest nits
embg beab112
contrib nits
embg b6d48a2
contrib nits
embg a40cea4
allow block splitter + external matchfinder, refactor
embg 31260d8
add windowSize param
embg f89cedc
add contrib/externalMatchfinder/README.md
embg 628755c
docs
embg b12c8e4
go back to old RLE heuristic because of the first block issue
embg 12b7ca1
fix initializer element is not a constant expression
embg 4b15448
ref contrib from zstd.h
embg c2574e7
extremely pedantic compiler warning fix, meson fix, typo fix
embg 8052b10
Additional docs on API limitations
embg 1e60543
minor nits
embg 49cd2e8
Refactor maxNbSeq calculation into a helper function
embg 241f2a7
Fix copyright
embg
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
# build artifacts | ||
externalMatchfinder |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
# ################################################################ | ||
# Copyright (c) Yann Collet, Meta Platforms, Inc. | ||
# All rights reserved. | ||
# | ||
# This source code is licensed under both the BSD-style license (found in the | ||
# LICENSE file in the root directory of this source tree) and the GPLv2 (found | ||
# in the COPYING file in the root directory of this source tree). | ||
# ################################################################ | ||
|
||
PROGDIR = ../../programs | ||
LIBDIR = ../../lib | ||
|
||
LIBZSTD = $(LIBDIR)/libzstd.a | ||
|
||
CPPFLAGS+= -I$(LIBDIR) -I$(LIBDIR)/compress -I$(LIBDIR)/common | ||
|
||
CFLAGS ?= -O3 | ||
CFLAGS += -std=gnu99 | ||
DEBUGFLAGS= -Wall -Wextra -Wcast-qual -Wcast-align -Wshadow \ | ||
-Wstrict-aliasing=1 -Wswitch-enum \ | ||
-Wstrict-prototypes -Wundef -Wpointer-arith \ | ||
-Wvla -Wformat=2 -Winit-self -Wfloat-equal -Wwrite-strings \ | ||
-Wredundant-decls | ||
CFLAGS += $(DEBUGFLAGS) $(MOREFLAGS) | ||
|
||
default: externalMatchfinder | ||
|
||
all: externalMatchfinder | ||
|
||
externalMatchfinder: matchfinder.c main.c $(LIBZSTD) | ||
$(CC) $(CPPFLAGS) $(CFLAGS) $^ $(LDFLAGS) -o $@ | ||
|
||
.PHONY: $(LIBZSTD) | ||
$(LIBZSTD): | ||
$(MAKE) -C $(LIBDIR) libzstd.a CFLAGS="$(CFLAGS)" | ||
|
||
clean: | ||
$(RM) *.o | ||
$(MAKE) -C $(LIBDIR) clean > /dev/null | ||
$(RM) externalMatchfinder |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
externalMatchfinder | ||
===================== | ||
|
||
`externalMatchfinder` is a test tool for the external matchfinder API. | ||
It demonstrates how to use the API to perform a simple round-trip test. | ||
|
||
A sample matchfinder is provided in matchfinder.c, but the user can swap | ||
it out for a different one if desired. The sample matchfinder implements | ||
LZ compression with a 1024-entry hashtable. Dictionary compression is not currently supported. | ||
|
||
Command line: | ||
``` | ||
externalMatchfinder filename | ||
``` |
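The round trip performed by this tool goes through the full compressor and decompressor. When swapping in a different matchfinder, it can help to check the sequences themselves first. The following is a minimal sketch and not part of the PR: `Seq` is a local stand-in whose fields mirror the `{ offset, litLength, matchLength, rep }` layout used to initialize `ZSTD_Sequence` in matchfinder.c, and `replaySeqs` is a hypothetical helper that rebuilds the block from the sequences and compares it with the source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for ZSTD_Sequence (same field order). */
typedef struct { unsigned offset, litLength, matchLength, rep; } Seq;

/* Rebuilds the block into out (capacity >= srcSize).
 * Literals are copied from src; matches are copied from earlier output.
 * Returns 1 if the sequences are valid and reproduce src exactly, else 0. */
static int replaySeqs(const Seq* seqs, size_t nbSeqs,
                      const unsigned char* src, size_t srcSize,
                      unsigned char* out)
{
    size_t pos = 0;
    size_t i, j;
    for (i = 0; i < nbSeqs; i++) {
        size_t const lit = seqs[i].litLength;
        size_t const ml  = seqs[i].matchLength;
        size_t const off = seqs[i].offset;
        if (lit > srcSize - pos) return 0;        /* literals overrun the block */
        memcpy(out + pos, src + pos, lit);
        pos += lit;
        if (ml == 0) continue;                    /* literals-only sequence */
        if (off == 0 || off > pos || ml > srcSize - pos) return 0;
        for (j = 0; j < ml; j++) {                /* byte-wise: a match may overlap itself */
            out[pos + j] = out[pos + j - off];
        }
        pos += ml;
    }
    return pos == srcSize && memcmp(out, src, srcSize) == 0;
}
```

A sequence list that passes this check describes every byte of the block with offsets that point at data already produced. It says nothing about the window size or other limits that the compressor enforces.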
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,107 @@ | ||
/* | ||
* Copyright (c) Yann Collet, Meta Platforms, Inc. | ||
* All rights reserved. | ||
* | ||
* This source code is licensed under both the BSD-style license (found in the | ||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found | ||
* in the COPYING file in the root directory of this source tree). | ||
* You may select, at your option, one of the above-listed licenses. | ||
*/ | ||
|
||
#include <stdio.h> | ||
#include <stdlib.h> | ||
#include <string.h> | ||
#include <assert.h> | ||
|
||
#define ZSTD_STATIC_LINKING_ONLY | ||
#include "zstd.h" | ||
#include "zstd_errors.h" | ||
#include "matchfinder.h" // simpleExternalMatchFinder | ||
|
||
#define CHECK(res) \ | ||
do { \ | ||
if (ZSTD_isError(res)) { \ | ||
printf("ERROR: %s\n", ZSTD_getErrorName(res)); \ | ||
return 1; \ | ||
} \ | ||
} while (0) \ | ||
|
||
int main(int argc, char *argv[]) { | ||
if (argc != 2) { | ||
printf("Usage: externalMatchfinder <file>\n"); | ||
return 1; | ||
} | ||
|
||
ZSTD_CCtx* const zc = ZSTD_createCCtx(); | ||
|
||
int simpleExternalMatchState = 0xdeadbeef; | ||
|
||
// Here is the crucial bit of code! | ||
ZSTD_registerExternalMatchFinder( | ||
zc, | ||
&simpleExternalMatchState, | ||
simpleExternalMatchFinder | ||
); | ||
|
||
{ | ||
size_t const res = ZSTD_CCtx_setParameter(zc, ZSTD_c_enableMatchFinderFallback, 1); | ||
CHECK(res); | ||
} | ||
|
||
FILE *f = fopen(argv[1], "rb"); | ||
|
||
assert(f); | ||
{ | ||
int const ret = fseek(f, 0, SEEK_END); | ||
assert(ret == 0); | ||
} | ||
size_t const srcSize = ftell(f); | ||
{ | ||
int const ret = fseek(f, 0, SEEK_SET); | ||
assert(ret == 0); | ||
} | ||
|
||
char* const src = malloc(srcSize + 1); | ||
assert(src); | ||
{ | ||
size_t const ret = fread(src, srcSize, 1, f); | ||
assert(ret == 1); | ||
int const ret2 = fclose(f); | ||
assert(ret2 == 0); | ||
} | ||
|
||
size_t const dstSize = ZSTD_compressBound(srcSize); | ||
char* const dst = malloc(dstSize); | ||
|
||
assert(dst); | ||
|
||
size_t const cSize = ZSTD_compress2(zc, dst, dstSize, src, srcSize); | ||
CHECK(cSize); | ||
|
||
char* const val = malloc(srcSize); | ||
assert(val); | ||
|
||
{ | ||
size_t const res = ZSTD_decompress(val, srcSize, dst, cSize); | ||
CHECK(res); | ||
} | ||
|
||
if (memcmp(src, val, srcSize) == 0) { | ||
printf("Compression and decompression were successful!\n"); | ||
printf("Original size: %zu\n", srcSize); | ||
printf("Compressed size: %zu\n", cSize); | ||
} else { | ||
printf("ERROR: input and validation buffers don't match!\n"); | ||
for (size_t i = 0; i < srcSize; i++) { | ||
if (src[i] != val[i]) { | ||
printf("First bad index: %zu\n", i); | ||
break; | ||
} | ||
} | ||
return 1; | ||
} | ||
|
||
ZSTD_freeCCtx(zc); | ||
free(src); | ||
free(dst); | ||
free(val); | ||
return 0; | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,80 @@ | ||
/* | ||
* Copyright (c) Yann Collet, Meta Platforms, Inc. | ||
* All rights reserved. | ||
* | ||
* This source code is licensed under both the BSD-style license (found in the | ||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found | ||
* in the COPYING file in the root directory of this source tree). | ||
* You may select, at your option, one of the above-listed licenses. | ||
*/ | ||
|
||
|
||
#include "zstd_compress_internal.h" | ||
#include "matchfinder.h" | ||
|
||
#define HSIZE 1024 | ||
static U32 const HLOG = 10; | ||
static U32 const MLS = 4; | ||
static U32 const BADIDX = 0xffffffff; | ||
|
||
size_t simpleExternalMatchFinder( | ||
void* externalMatchState, | ||
ZSTD_Sequence* outSeqs, size_t outSeqsCapacity, | ||
const void* src, size_t srcSize, | ||
const void* dict, size_t dictSize, | ||
int compressionLevel, | ||
size_t windowSize | ||
) { | ||
const BYTE* const istart = (const BYTE*)src; | ||
const BYTE* const iend = istart + srcSize; | ||
const BYTE* ip = istart; | ||
const BYTE* anchor = istart; | ||
size_t seqCount = 0; | ||
U32 hashTable[HSIZE]; | ||
|
||
(void)externalMatchState; | ||
(void)dict; | ||
(void)dictSize; | ||
(void)outSeqsCapacity; | ||
(void)compressionLevel; | ||
|
||
{ int i; | ||
for (i=0; i < HSIZE; i++) { | ||
hashTable[i] = BADIDX; | ||
} } | ||
|
||
while (ip + MLS < iend) { | ||
size_t const hash = ZSTD_hashPtr(ip, HLOG, MLS); | ||
U32 const matchIndex = hashTable[hash]; | ||
hashTable[hash] = (U32)(ip - istart); | ||
|
||
if (matchIndex != BADIDX) { | ||
const BYTE* const match = istart + matchIndex; | ||
U32 const matchLen = (U32)ZSTD_count(ip, match, iend); | ||
if (matchLen >= ZSTD_MINMATCH_MIN) { | ||
U32 const litLen = (U32)(ip - anchor); | ||
U32 const offset = (U32)(ip - match); | ||
ZSTD_Sequence const seq = { | ||
offset, litLen, matchLen, 0 | ||
}; | ||
|
||
/* Note: it's crucial to stay within the window size! */ | ||
if (offset <= windowSize) { | ||
outSeqs[seqCount++] = seq; | ||
ip += matchLen; | ||
anchor = ip; | ||
continue; | ||
} | ||
} | ||
} | ||
|
||
ip++; | ||
} | ||
|
||
{ ZSTD_Sequence const finalSeq = { | ||
0, (U32)(iend - anchor), 0, 0 | ||
}; | ||
outSeqs[seqCount++] = finalSeq; | ||
} | ||
|
||
return seqCount; | ||
} |
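matchfinder.c relies on helpers from the internal header zstd_compress_internal.h (`ZSTD_hashPtr`, `ZSTD_count`). For readers who want to experiment outside the zstd tree, the same idea can be sketched in plain C. Everything below is an assumption made for illustration: `Seq` stands in for `ZSTD_Sequence`, `hash4` is a simple multiplicative hash rather than zstd's, and, unlike the sample, `findSeqs` stops emitting matches once `outCapacity` is nearly exhausted so that the final literals-only sequence always fits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TABLE_LOG  10
#define TABLE_SIZE (1u << TABLE_LOG)
#define MIN_MATCH  4
#define NO_INDEX   0xffffffffu

/* Hypothetical stand-in for ZSTD_Sequence (same field order). */
typedef struct { unsigned offset, litLength, matchLength, rep; } Seq;

/* Hashes 4 bytes into TABLE_LOG bits (illustrative, not zstd's hash). */
static uint32_t hash4(const unsigned char* p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return (uint32_t)(v * 2654435761u) >> (32 - TABLE_LOG);
}

/* Greedy single-probe LZ matchfinder over one block.
 * Returns the number of sequences written; the last one carries the
 * trailing literals and has matchLength == 0. */
static size_t findSeqs(Seq* out, size_t outCapacity,
                       const unsigned char* src, size_t srcSize,
                       size_t windowSize)
{
    uint32_t table[TABLE_SIZE];
    size_t ip = 0, anchor = 0, n = 0, i;
    for (i = 0; i < TABLE_SIZE; i++) table[i] = NO_INDEX;

    /* Keep one slot free for the final literals-only sequence. */
    while (ip + MIN_MATCH <= srcSize && n + 1 < outCapacity) {
        uint32_t const h = hash4(src + ip);
        uint32_t const cand = table[h];
        table[h] = (uint32_t)ip;
        if (cand != NO_INDEX && ip - cand <= windowSize) {
            size_t len = 0;
            while (ip + len < srcSize && src[cand + len] == src[ip + len]) len++;
            if (len >= MIN_MATCH) {
                Seq const s = { (unsigned)(ip - cand), (unsigned)(ip - anchor),
                                (unsigned)len, 0 };
                out[n++] = s;
                ip += len;
                anchor = ip;
                continue;
            }
        }
        ip++;
    }
    if (n < outCapacity) {
        Seq const last = { 0, (unsigned)(srcSize - anchor), 0, 0 };
        out[n++] = last;
    }
    return n;
}
```

Which matches are found depends on hash collisions, so the exact sequence list is not fixed. What should hold for any input is that the literal and match lengths add up to `srcSize` whenever `outCapacity` is at least 1.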
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
/* | ||
* Copyright (c) Yann Collet, Meta Platforms, Inc. | ||
* All rights reserved. | ||
* | ||
* This source code is licensed under both the BSD-style license (found in the | ||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found | ||
* in the COPYING file in the root directory of this source tree). | ||
* You may select, at your option, one of the above-listed licenses. | ||
*/ | ||
|
||
#ifndef MATCHFINDER_H | ||
#define MATCHFINDER_H | ||
|
||
#define ZSTD_STATIC_LINKING_ONLY | ||
#include "zstd.h" | ||
|
||
size_t simpleExternalMatchFinder( | ||
void* externalMatchState, | ||
ZSTD_Sequence* outSeqs, size_t outSeqsCapacity, | ||
const void* src, size_t srcSize, | ||
const void* dict, size_t dictSize, | ||
int compressionLevel, | ||
size_t windowSize | ||
); | ||
|
||
#endif |
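The callback declared above returns the number of sequences it wrote to `outSeqs`. The commit list for this PR mentions a debug-build test that `sum(matchlen) + sum(litlen) == srcSize`, which suggests the sequences are expected to account for every byte of the block. A minimal sketch of such a check follows; `Seq` is a local stand-in for `ZSTD_Sequence` and `seqsCoverBlock` is a hypothetical helper, not a zstd function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ZSTD_Sequence (same field order as the
 * initializers in matchfinder.c). */
typedef struct {
    unsigned offset;
    unsigned litLength;
    unsigned matchLength;
    unsigned rep;
} Seq;

/* Returns 1 if the literal and match lengths add up to exactly srcSize. */
static int seqsCoverBlock(const Seq* seqs, size_t nbSeqs, size_t srcSize)
{
    size_t total = 0;
    size_t i;
    for (i = 0; i < nbSeqs; i++) {
        total += (size_t)seqs[i].litLength + seqs[i].matchLength;
    }
    return total == srcSize;
}
```

A matchfinder under development could call a helper like this on its own output before returning, for example inside an `assert`, so that a miscounted final literal run is caught close to its source.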