ugrep pattern match testing and validation

Setup

Compile the two small utilities, pick and trickle, that support the tests:

$ ./setup.sh

Run

Run the extensive barrage of tests, which can take hours to complete:

$ ./run.sh
...
OK

The script reports OK, or halts when an error is detected. On failure, temp_words.txt holds the set of words searched as a pattern and temp_results.txt holds the output that exposed the problem.

The bulk of the tests are designed to go through all possible pattern match methods and optimizations with randomized patterns, independently of the ugrep command-line options. When ugrep was compiled with SIMD enabled, the SIMD optimizations it was compiled with are tested as well; this is one of SSE2, AVX2, AVX512BW, NEON, or AArch64.
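The idea of randomized pattern testing can be illustrated with a small differential check. This is only a sketch and not what run.sh does: the function name differential_check is hypothetical, using the system grep as the reference result is an assumption made for illustration, and shuf (GNU coreutils) is assumed to be available. It draws a random sample of words, searches for them as fixed strings with the tool under test and with grep, and compares the matching-line counts.

```shell
# differential_check TOOL WORDS TEXT N   (hypothetical helper, not part of the repository)
# Pick N random words from WORDS, search TEXT for them as fixed strings with
# both TOOL and the system grep, and compare the counts of matching lines.
differential_check() {
  tool=$1 words=$2 text=$3 n=$4
  sample=$(mktemp)
  shuf -n "$n" "$words" > "$sample"                 # random sample of patterns
  expected=$(grep -c -F -f "$sample" "$text")       # reference count (assumed oracle)
  actual=$("$tool" -c -F -f "$sample" "$text")      # count from the tool under test
  rm -f "$sample"
  if [ "$expected" = "$actual" ]; then
    echo OK
  else
    echo "MISMATCH: grep=$expected $tool=$actual"
    return 1
  fi
}

# Usage, assuming ugrep is on PATH and the data files are present:
#   differential_check ugrep words enwik8 100
```

A count comparison is deliberately coarse; the actual test scripts in the repository should be treated as the authoritative procedure.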

Data

words: extracted from enwik8 with ugrep -iwo '[a-z]+' enwik8 | sort -u > words
enwik8: 100MB Wikipedia file
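To see what the extraction step produces, the sketch below applies the same options to a one-line sample. It uses grep -E as a stand-in so that it runs without ugrep installed; the function name extract_words is hypothetical, and ugrep may treat edge cases differently from grep. Because -i only affects matching, case variants of a word remain distinct entries after sort -u.

```shell
# Stand-in for the extraction command, using grep -E instead of ugrep.
# extract_words is a hypothetical name used only in this sketch.
extract_words() {
  # -i: ignore case, -w: match whole words only, -o: print each match on its own line
  LC_ALL=C grep -E -iwo '[a-z]+' | LC_ALL=C sort -u
}

# Small sample in place of enwik8
printf 'The cat, the 2nd cat\n' | extract_words
```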

Note: we pick words of a specific byte length, or within a byte length range, to test with. We therefore test with ASCII words only, so that word length equals byte length. This has no impact on the validation of the internal byte-based pattern match methods, which do not care what the bytes represent, i.e. ASCII, UTF-8, or raw binary.
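Selecting words by byte length can be sketched with awk. The helper pick_by_length below is hypothetical and is not the interface of the repository's pick utility, whose options are not documented here. LC_ALL=C is set so that length counts bytes rather than multibyte characters.

```shell
# pick_by_length MIN MAX: print input lines whose byte length is in [MIN, MAX].
# Hypothetical helper; NOT the interface of the repository's pick utility.
pick_by_length() {
  LC_ALL=C awk -v min="$1" -v max="$2" 'length($0) >= min && length($0) <= max'
}

# Keep only the words that are 2 to 4 bytes long
printf 'a\nab\nabc\nabcd\nabcde\n' | pick_by_length 2 4
```

With the real data this would be used as pick_by_length 3 8 < words, for example.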

Genivia/ugrep-testing
