GitHub - BurntSushi/ripgrep: ripgrep recursively searches directories for a regex pattern
ripgrep is a line-oriented search tool that recursively searches your current directory for a regex pattern while respecting your gitignore rules. ripgrep has first class support on Windows, macOS and Linux, with binary downloads available for every release. ripgrep is similar to other popular search tools like The Silver Searcher, ack and grep.
stealth/grab: experimental and very fast implementation of a grep
parallel variant of grep optimized for SSDs and large directory tree searches
grep lower case upper case sequence manipulation
Insert space between lowercase character and uppercase character in BBEdit text file.
Using a GREP search to add a space between a lowercase and uppercase character
Using Grep in BBEdit to find and insert a space before a capitalized letter.
Is Prefix Of String In Table? A Journey Into SIMD String Processing.

Wrote some C and assembly code that uses SIMD instructions to perform prefix matching of strings. The C code was 4x to 7x faster than the baseline implementation for prefix matching. The assembly code was 9x to 12x faster than the baseline specifically for the negative match case (determining that an incoming string definitely does not prefix-match any of our known strings). The fastest negative match could be done in around 6 CPU cycles, which is pretty quick. (Integer division, for comparison, can take up to around 90 cycles, depending on the CPU and operand width.)


Goal: given a string, determine as fast as possible whether it prefix-matches a set of known strings. That is, is any string in the known set a prefix of the incoming search string?

A reference implementation was written in C as a baseline. It simply loops through an array of strings, comparing each one byte-by-byte against the search string, looking for a prefix match. Prefix match performance ranged from 28 to 130 CPU cycles, and negative match performance was around 74 cycles.
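A baseline of this kind might look roughly like the following minimal sketch. It assumes NUL-terminated strings and returns the index of the first known string that is a prefix of the input, or -1 if none is; the function name and return convention are hypothetical, not the original code.

```c
#include <stddef.h>

/* Hypothetical baseline: linear scan over the known strings, comparing
   byte-by-byte. Returns the index of the first known string that is a
   prefix of `input`, or -1 if none of them is. */
static int is_prefix_of_string_in_array(const char *const *strings,
                                        size_t count,
                                        const char *input)
{
    for (size_t i = 0; i < count; i++) {
        const char *s = strings[i];
        const char *p = input;

        /* Advance while the bytes agree and the known string has not ended.
           If the input ends first, *p is '\0' and the loop stops on the
           mismatch. */
        while (*s != '\0' && *s == *p) {
            s++;
            p++;
        }

        /* Every byte of strings[i] matched: it is a prefix of the input. */
        if (*s == '\0') {
            return (int)i;
        }
    }
    return -1; /* negative match: no known string is a prefix */
}
```

Every call touches each known string until the first mismatch, so the cost grows with the number of strings; that is the kind of loop the SIMD structure below is meant to short-circuit.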

A SIMD-friendly C structure called STRING_TABLE was derived. It is optimized for up to 16 strings, ideally each no longer than 16 characters. The table is created up-front from the set of known strings: it is sorted by ascending length, and for each string a unique character (with regard to the other strings' characters at the same byte offset) is extracted, along with its index. A 16-byte character array, STRING_SLOT, captures the unique characters. A 16-element array of unsigned characters, SLOT_INDEX, captures the index. Lengths are stored the same way, via SLOT_LENGTHS. Finally, a 16-element array of STRING_SLOTs captures up to the first 16 bytes of each string in the set.
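The layout and construction steps above can be sketched in C roughly as follows. This is a simplified, hypothetical rendering, not the original code: the field names (`UniqueChars`, `UniqueIndex`, `Lengths`, `Slots`, `NumberOfStrings`) and the helper `create_string_table` are assumptions, "its index" is read as the byte offset at which the unique character occurs, and the greedy "first unique offset, else offset 0" rule is a stand-in for however the original selects characters.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STRINGS 16
#define SLOT_SIZE 16

/* Hypothetical, simplified versions of the types described above. */
typedef struct { char Char[SLOT_SIZE]; } STRING_SLOT;
typedef struct { unsigned char Index[MAX_STRINGS]; } SLOT_INDEX;
typedef struct { unsigned char Length[MAX_STRINGS]; } SLOT_LENGTHS;

typedef struct {
    STRING_SLOT UniqueChars;         /* one distinguishing byte per string */
    SLOT_INDEX UniqueIndex;          /* byte offset of that byte */
    SLOT_LENGTHS Lengths;            /* full length of each string */
    STRING_SLOT Slots[MAX_STRINGS];  /* first 16 bytes, zero padded */
    unsigned char NumberOfStrings;
} STRING_TABLE;

/* qsort comparator: orders `const char *` elements by ascending length. */
static int compare_by_length(const void *a, const void *b)
{
    size_t la = strlen(*(const char *const *)a);
    size_t lb = strlen(*(const char *const *)b);
    return (la > lb) - (la < lb);
}

/* Builds a table from 1..16 non-empty strings (each at most 255 bytes,
   an assumption so lengths fit in an unsigned char).
   Returns 0 on success, -1 otherwise. */
static int create_string_table(const char **strings, size_t count,
                               STRING_TABLE *table)
{
    if (count == 0 || count > MAX_STRINGS) return -1;

    const char *sorted[MAX_STRINGS];
    memcpy(sorted, strings, count * sizeof(*sorted));
    qsort(sorted, count, sizeof(*sorted), compare_by_length);

    memset(table, 0, sizeof(*table));
    table->NumberOfStrings = (unsigned char)count;

    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(sorted[i]);
        if (len == 0 || len > 255) return -1;
        table->Lengths.Length[i] = (unsigned char)len;
        /* Copy at most 16 bytes; the rest of the slot stays zero. */
        memcpy(table->Slots[i].Char, sorted[i], len < SLOT_SIZE ? len : SLOT_SIZE);
    }

    for (size_t i = 0; i < count; i++) {
        size_t limit = table->Lengths.Length[i];
        if (limit > SLOT_SIZE) limit = SLOT_SIZE;

        size_t chosen = 0; /* fallback when no offset is unique */
        for (size_t off = 0; off < limit; off++) {
            int unique = 1;
            /* Unique: no other string has the same byte at this offset.
               Shorter strings hold 0 there, which never equals a real byte. */
            for (size_t k = 0; k < count && unique; k++) {
                if (k != i && table->Slots[k].Char[off] == table->Slots[i].Char[off])
                    unique = 0;
            }
            if (unique) { chosen = off; break; }
        }
        table->UniqueIndex.Index[i] = (unsigned char)chosen;
        table->UniqueChars.Char[i] = table->Slots[i].Char[chosen];
    }
    return 0;
}
```

For example, building a table from "abc", "xy" and "abcd" sorts them into "xy", "abc", "abcd"; "xy" gets offset 0 ('x'), "abcd" gets offset 3 ('d'), and "abc" has no unique byte, so this sketch falls back to offset 0 ('a'). Because the chosen offset always lies inside the string, a mismatch between the input's byte at that offset and the stored character can only rule a string out. That property is presumably what lets all 16 comparisons serve as a quick negative-match filter.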
