Performance Analysis of Hyperscan with hsbench
- The number and composition of the patterns: this affects which implementation strategies Hyperscan selects when building a compiled database.
- The data being scanned; for example, performance may be affected by the rate of matches or near-matches in the data.
- Pattern flags: some pattern flags -- such as SOM or UTF8 mode -- impose performance costs.
- The scanning mode: there are optimizations available to Hyperscan in block mode, where it knows precisely how much data will be scanned, which are not available in streaming mode. Similarly, streaming mode requires bookkeeping work at stream scan boundaries that is not needed in block mode.
- The platform hardware: Hyperscan is able to take advantage of modern instruction set features, such as Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® Bit Manipulation Instructions 2 (Intel® BMI2), where available.
- snort_literals: This is a set of 3,116 literal patterns extracted from the sample ruleset included with the Snort* 3 network intrusion detection system, available at https://github.com/snortadmin/snort3. Some of these are marked with the HS_FLAG_CASELESS flag so that they match case-insensitively, and all of them use HS_FLAG_SINGLEMATCH to limit matching to once per scan for each pattern.
- snort_pcres: This is a set of 847 regular expressions that were also extracted from the sample ruleset included with Snort 3, taken from rules targeted at HTTP traffic. Note that these are just the patterns extracted from the rules' "pcre:" options, and that scanning for them in a single pattern set with Hyperscan is not semantically equivalent to scanning for these rules within Snort; this is a sample case intended to show Hyperscan's capability for matching many expressions simultaneously.
- teakettle_2500: This is a set of 2,500 synthetic patterns generated with a script that produces regular expressions of limited complexity. These are composed of dictionary words separated by character class repeats and alternations. For example:
1:/loofas.+stuffer[^\n]*interparty[^\n]*godwit/is
2:/procurers.*arsons/s
3:/^authoress[^\r\n]*typewriter[^\r\n]*disservices/is
4:/praesidiadyeweedisonomic.*reactivating/is
5:/times/s
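The sample lines above follow an ID:/expression/flags layout. As a rough illustration, the sketch below splits such a line into its parts and maps the i and s flags onto Python's re module. The helper names are hypothetical, the flag mapping (case-insensitive and dot-matches-newline) is an assumption based on PCRE conventions, and Python's re is only a stand-in here, not Hyperscan's matching engine.

```python
import re

# Assumed meaning of the single-letter flags seen in the sample lines.
FLAG_MAP = {"i": re.IGNORECASE, "s": re.DOTALL}


def parse_pattern_line(line):
    """Split 'ID:/expression/flags' into (id, expression, flags)."""
    ident, sep, rest = line.strip().partition(":")
    if not sep or not rest.startswith("/"):
        raise ValueError("expected ID:/expression/flags, got %r" % line)
    # The expression may itself contain '/', so split on the last one.
    expr, _, flags = rest[1:].rpartition("/")
    return int(ident), expr, flags


def to_python_regex(expr, flags):
    """Compile the expression with Python's re as a stand-in matcher."""
    py_flags = 0
    for ch in flags:
        py_flags |= FLAG_MAP[ch]  # KeyError on flags this sketch does not know
    return re.compile(expr, py_flags)


if __name__ == "__main__":
    pid, expr, flags = parse_pattern_line(r"2:/procurers.*arsons/s")
    rx = to_python_regex(expr, flags)
    print(pid, expr, flags, bool(rx.search("procurers\nand arsons")))
```

Splitting on the last slash keeps expressions that contain a literal slash intact, which matters for patterns taken from HTTP rules.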
- gutenberg.db: A collection of English-language texts from Project Gutenberg, broken up into 10,240-byte streams of 2,048-byte blocks.
- alexa200.db: A large traffic sample constructed from a PCAP capture of an automated Web browser browsing a subset of the top sites listed on Alexa. This file contains 130,957 blocks (originally corresponding to packets), and only traffic to or from port 80 is included.
Using hsbench to Collect Performance Measurements
$ hsbench -e pcre/snort_literals -c corpora/alexa200.db -N
Signatures: pcre/snort_literals
Hyperscan info: Version: 4.4.1 Features: AVX2 Mode: BLOCK
Expression count: 3,116
Bytecode size: 1,111,416 bytes
Database CRC: 0x17dce83b
Scratch size: 4,289 bytes
Compile time: 0.313 seconds
Peak heap usage: 551,178,240 bytes
Time spent scanning: 11.767 seconds
Corpus size: 514,572,016 bytes (405,004 blocks)
Matches per iteration: 1,894,263 (3.770 matches/kilobyte)
Overall block rate: 688,357.34 blocks/sec
Overall throughput: 6,996.66 Mbit/sec
The -N argument above instructs hsbench to scan in block mode; by default, streaming mode will be used.
The number of times the corpus is scanned can be changed with the -n argument, and the time taken for each scan will be displayed if the --per-scan argument is specified.
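The summary figures in the output above can be cross-checked with some simple arithmetic. The sketch below assumes the corpus was scanned 20 times (a repeat count that is not shown in the output, but is the value that makes the reported rates line up) and that a kilobyte is 1,024 bytes. The function names are hypothetical. Because the printed scan time is rounded, the results should land close to the reported figures rather than match them digit for digit.

```python
def throughput_mbit_per_sec(corpus_bytes, repeats, seconds):
    """Total bits scanned across all repeats, per second, in units of 10^6 bits."""
    return corpus_bytes * 8 * repeats / seconds / 1e6


def block_rate(blocks, repeats, seconds):
    """Total blocks scanned across all repeats, per second."""
    return blocks * repeats / seconds


def matches_per_kilobyte(matches, corpus_bytes):
    """Matches in one pass over the corpus, per 1,024 bytes."""
    return matches / (corpus_bytes / 1024)


if __name__ == "__main__":
    # Values copied from the hsbench output above.
    corpus_bytes = 514_572_016
    blocks = 405_004
    seconds = 11.767
    matches = 1_894_263
    repeats = 20  # assumption, see the note above

    print("throughput (Mbit/sec):", throughput_mbit_per_sec(corpus_bytes, repeats, seconds))
    print("block rate (blocks/sec):", block_rate(blocks, repeats, seconds))
    print("matches/kilobyte:", matches_per_kilobyte(matches, corpus_bytes))
```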
- Hyperscan info: the version of Hyperscan used to create the pattern database, the platform it was constructed for ("AVX2" in this case) and the scanning mode ("BLOCK").
- Expression count: the number of patterns in the pattern file supplied.
- Bytecode size: the size of the Hyperscan database bytecode built from the patterns. This is a compiled data structure used to match the complete set of patterns during scanning. The bytecode is immutable once built and can be shared between scanning threads.
- Database CRC: a CRC32 of the database bytecode.
- Scratch size: the size of the mutable "scratch" space required to scan data against this bytecode. Each scanning context requires its own scratch space.
- Compile time: the time required to compile the Hyperscan database bytecode from the pattern set.
- Peak heap usage: the peak memory usage for the compilation process.
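When collecting many runs, it can help to turn the summary into structured data. The following is a minimal sketch that assumes each summary line has the form "Name: value", as in the output shown earlier; the function names are hypothetical and the sample text is abbreviated from that output.

```python
import re

SAMPLE = """\
Signatures: pcre/snort_literals
Hyperscan info: Version: 4.4.1 Features: AVX2 Mode: BLOCK
Expression count: 3,116
Bytecode size: 1,111,416 bytes
Scratch size: 4,289 bytes
Compile time: 0.313 seconds
"""


def parse_summary(text):
    """Map each 'Name: value' line to a dict entry, splitting on the first colon."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def leading_number(value):
    """Return the number at the start of a value such as '1,111,416 bytes'."""
    m = re.match(r"[\d,]+(\.\d+)?", value)
    if m is None:
        raise ValueError("no leading number in %r" % value)
    return float(m.group(0).replace(",", ""))


if __name__ == "__main__":
    summary = parse_summary(SAMPLE)
    print(summary["Hyperscan info"])
    print(leading_number(summary["Bytecode size"]))
```

Splitting on the first colon only keeps the nested "Version: ... Mode: ..." text of the Hyperscan info line together as a single value.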
Example Performance Measurements
$ taskset 1 hsbench -e pcre/snort_literals -c corpora/alexa200.db -N
$ taskset 1 hsbench -e pcre/snort_pcres -c corpora/alexa200.db -N
$ taskset 1 hsbench -e pcre/teakettle_2500 -c corpora/gutenberg.db
|Pattern Set|Scan Corpus|Number of Patterns|Matches/kilobyte|Blocks/sec|Megabits/sec|
|---|---|---|---|---|---|
|Snort literals|HTTP Traffic|3,116|3.686|541,772|5,861|
|Snort PCREs|HTTP Traffic|847|8.804|140,116|1,516|
|Teakettle 2500|Gutenberg Text|2,500|0.577|205,249|3,355|
The -T argument creates multiple scanning threads: for example, -T 0,1,2,3 will run four scanning threads, one on each of the four given cores.
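For scripted benchmark runs, the command line can be assembled programmatically. The sketch below is a hypothetical helper that uses only the arguments mentioned in this article (-e, -c, -N, -n, --per-scan and -T); it builds the argument list without running anything, so the result could be handed to something like subprocess.run on a machine where hsbench is installed.

```python
def build_hsbench_argv(patterns, corpus, block_mode=False, repeats=None,
                       per_scan=False, cores=None):
    """Return an argv list for hsbench; nothing is executed here."""
    argv = ["hsbench", "-e", patterns, "-c", corpus]
    if block_mode:
        argv.append("-N")                  # block mode instead of streaming
    if repeats is not None:
        argv += ["-n", str(repeats)]       # number of scans of the corpus
    if per_scan:
        argv.append("--per-scan")          # show the time taken for each scan
    if cores:
        # One scanning thread per listed core, e.g. "0,1,2,3".
        argv += ["-T", ",".join(str(c) for c in cores)]
    return argv


if __name__ == "__main__":
    argv = build_hsbench_argv("pcre/snort_literals", "corpora/alexa200.db",
                              block_mode=True, cores=[0, 1, 2, 3])
    print(" ".join(argv))
```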