=Paper= {{Paper |id=Vol-3039/paper17 |storemode=property |title=A Survey on Hardware Solutions for Signature-Based Security Systems |pdfUrl=https://ceur-ws.org/Vol-3039/paper17.pdf |volume=Vol-3039 |authors=Serhii Hilgurt |dblpUrl=https://dblp.org/rec/conf/ittap/Hilgurt21 }} ==A Survey on Hardware Solutions for Signature-Based Security Systems== https://ceur-ws.org/Vol-3039/paper17.pdf
A Survey on Hardware Solutions for Signature-Based Security
Systems
Serhii Ya. Hilgurta
a
    Pukhov Institute for Modelling in Energy Engineering of NAS of Ukraine, 15, General Naumov str., Kyiv, 03164,
    Ukraine


                 Abstract
                 Signature approach due to its exact matching is still a relevant technology used in applications
                 like network intrusion detection systems (NIDS), antivirus, anti-spam, worm-containment and
                 other security systems. Basing on signature approach, multi-pattern string matching technology
                 by comparing an input data stream against a set of predefined patterns, detects malicious
                 activities. Due to the increasing signature database size and the high throughput of today’s
                 networks the traditional software alone solutions can no longer meet the performance
                 requirements of such systems. Therefore, many hardware approaches are proposed to
                 accelerate the computational-intensive pattern matching procedure. On the other hand, a
                 developer of hardware-accelerated security system has to solve non-trivial problem of choosing
                 the most suitable approach in each particular case. Thus, some kind of guidance is needed to
                 facilitate this problem solving. This paper provides a comprehensive survey on the hardware
                 solutions were proposed in this area for the past years focusing on their effectiveness and
                 feasibility. A deep analysis of main directions of security systems hardware acceleration has
                 done. The most promising direction based on the use of FPGA and programmable logic is more
                 closely investigated.

                 Keywords1
                 Security, signature, multi-pattern matching, NIDS, FPGA, CAM, Bloom filter, Aho–Corasick

1. Introduction
    Whereas the propagation of Internet and network technologies in all spheres of human activities
gains in many benefits the increase in number and sophistication of attacks against the network
infrastructure and computer systems significantly escalates risks of information security compromising.
Therefore, many technologies and techniques are offered to make information systems more robust and
secure.
    Despite the great progress made in recent years by AI-based approaches, in particular, by deep neural
networking, security tools based on such methods still suffer from nonzero recognition error probability.
Even low probability of such error is able to disrupt the correct operation of the security information
system. In critical infrastructure such effects can have disastrous consequences. Therefore, signature-
based recognition methods with their theoretically exact match are still relevant when creating
information security systems.
    Signature-based technology relies on using so called signatures – descriptions of known attacks that
are compared against the intensive flow of input information (coded by bytes). Checking every byte of
every data element to see if it matches one of hundreds thousand and millions strings becomes a
computationally-intensive task as network speeds grow into the tens and hundreds of gigabits/second.
Thus, the multi-pattern string matching task has been a major performance bottleneck in information

ITTAP’2021: 1st International Workshop on Information Technologies: Theoretical and Applied Problems, November 16–18, 2021, Ternopil,
Ukraine
EMAIL: hilgurt@ukr.net (S.Ya. Hilgurt)
ORCID: 0000-0003-1647-1790 (S.Ya. Hilgurt)
              © 2021 Copyright for this paper by its authors.
              Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
              CEUR Workshop Proceedings (CEUR-WS.org)
security systems which have to scan the incoming data in real time [1]. There is an ever-widening
performance gap between the security systems processing requirements and their software
implementations because of the speed limitations of sequential software execution on standard CPUs
[2]. To keep up with these speeds specialized devices built on hardware-oriented solutions are required.
    Literature review demonstrates the high variety of technical solutions aimed to the hardware
acceleration of multi-pattern string matching based on different computing platforms, which use
approaches to building matching schemes of very different nature. Thus, choosing a solution to be most
suitable for particular application is a complex and cunning problem that developers of signature-based
security systems aim to solve effectively.
    This paper provides some kind of guidance for such developers through analyzing, systematizing
and classifying known platforms, approaches, techniques and customized solutions are invented by
researchers over the world to realize the multi-pattern matching task in hardware. The goal of this work
is also to facilitate the process of creating signature-based hardware-accelerated security systems by
becoming familiar with their features and peculiarities.
    To continue it is necessary to decide on some terms and definitions.

2. Terms and definitions
    Some of the definitions required to work with signature-based information security tools are as
follows.
    The signature in this paper is considered as aggregate information about a specific threat, stored in
the signature database of the correspondent security system.
    The term pattern refers to a sample of a text string (a fixed sequence of characters in a certain
encoding), which is part of the signature and is to be find in the input data.
    Pattern dictionary – all the patterns that the database of signatures contains.
    Pattern set – patterns that only some signatures contain that are selected for a certain purpose.
    The mathematical formulation of the task of multi-pattern string matching can be defined as follows.
D. Gusfield in the monograph [3], page 53, defines it as generalization of the task of exact matching:
"An immediate and important generalization of the exact matching problem is to find all occurrences
in text T of any pattern in a set of patterns P = {P1, P2, …, PZ}".
    So, given text T that consists of written in the line n symbols: "C1C2C3 … Cn", and the set of patterns
P = {Р1, Р2, …, PZ}, where Z is its size, and each pattern Pi (i = 1, 2, …, Z) consists of written in the
line m symbols: "C1C2C3 … Cm", where m is the length of the corresponding pattern (different for each
of them in the General case), the task of multi-pattern string matching is to identify all occurrences in
the text T any of the patterns of Ri from the set P.
    The fundamental difference multi-pattern string matching task from the single string recognition,
which is effectively solved by the classical algorithms Knuth–Morris–Pratt, Boyer–Moore, Karp–
Rabin, etc., is in the need to search not one pattern, but many patterns simultaneously.
    In a practical sense, the task of multi-pattern matching is to provide answers to the following
questions:
     whether there are substrings in the text T, that matches character by character with any of the
         patterns Pi;
     if any – with which ones;
     how many matches for each of the patterns is detected;
     in which position (relative to the beginning of the text T) is each of the matching substrings.
    Note that the sequences of characters corresponding to the patterns in the input data may not only
be repeated, but also partially overlap, i.e. the end of the suspicious substring may match the beginning
of another. Such situations must also be revealed when solving the multi-pattern matching task [4].

3. HPC vs. accelerators
  Traditional way to sole computationally-intensive problems is use of high performance computing
means as supercomputers, clusters, GRID and cloud. Unfortunately, the main feature of the entire HPC
segment is the apartness of their computing resources from the place where the calculation results are
needed. The large volume and intensity of information to be processed when performing multi-pattern
matching do not allow the use of the above-mentioned high-performance solutions. In other words, the
local nature of security tasks in fact makes it impossible to apply HPC equipment.
   Instead, it turned out to be more acceptable to use various joined coprocessors (accelerators), which
are located directly at the place of the problem, so do not require quick transfer of large amounts of data
over long distances. The principles of construction and capabilities of such devices vary considerably.
But the main feature that unites their diversity is the use of hardware solutions instead of software,
because the main purpose of such devices is to overcome the limitations faced by software solutions,
as mentioned above.

4. Hardware platforms for multi-pattern matching
   Devices that theoretically can be used as a hardware platform for solving the multi-pattern matching
task include:
    specialized coprocessors based on ASIC;
    network processors (NP);
    ternary content-addressable memory (TCAM);
    accelerators based on multi-core processors;
    graphics accelerators (GPGPU);
    based on FPGA reconfigurable accelerators (RA).
   Let's consider the features of these technologies and their possibilities to solve the multi-pattern
matching task for further comparison.

4.1.    Specialized coprocessors based on ASIC
    In specialized hardware coprocessors, due to the constriction of the target area, the maximum
acceleration is achieved the microelectronic technology can theoretically provide at present. The ASICs
(Application-Specific Integrated Circuits) is the base of modern special processors that are in fact the
top of the semiconductor circuit evolution, starting with small- and medium-scale integration devices,
whose electrical circuits have been redesigned for each application up to complex nanometer process
products, which allow developers to synthesize the most complex computing schemes.
    Creation of special hardware devices for solving the information security tasks on the basis of ASIC
chips is quite possible, and such developments exist. Due to specialization, such solutions have the best
performance and energy consumption characteristics compared to other technologies. But the
production of each ASIC is a resource-intensive process both in terms of cost and time. The fixed
internal structure of such integrated circuits does not allow making changes in already made products.
Properly speaking, a special processors based on ASIC is not a joined accelerator in the conventional
sense, because products on "fixed" logic a-priory cannot be universal. Therefore, the use of special
ASIC-based processors is reasonable under the condition of sufficiently large production volumes.

4.2.    Network processors
   Network processors were created to accelerate the execution of specialized network tasks, including
multi-pattern matching for signature-based information security systems. A typical NP in addition to
the components of a conventional network adapter contains a CPU, usually of VLIW- or RISC-
architecture, multi-port memory and additional logic resources to perform typical network operations.
On such a device it is convenient to organize computing, for instance, according to the multi-pattern
matching algorithm Aho–Corasick by using additional logic for a finite automaton control unit, the
memory – to store transition table and the CPU – to create and load these tables, and to perform general
control functions [5].
   There are also NPs that includes content-addressable memory chips with three states – the ternary
content-addressable memory (see the next subsection).
   But despite the fact that NPs were designed specifically for network problems, the fixed architecture
and limited computing resources have led to a decline in interest for this type of accelerators, and they
have not actually been used now.

4.3.    TCAM-memory
    The tertiary content-addressable memory, strictly speaking, is not a separate computing platform
[6], [7]. But the speed capabilities of TCAM (response time in nanosecond range) are of some interest
and distinguish it from other hardware that focuses on the algorithmic performance of multi-pattern
matching. A number of methods have been developed to use TCAM for multiple matching, but without
addressing its practical implementation as part of an accelerator.
    In contrast to conventional binary-, ternary-memory provides limited possibility for flexible
matching based on regular expression [6]. There are also several comparative studies in which TCAM
is considered as a computing technology, so this type of equipment was moved in this paper to a separate
section. Disadvantages of the approach based on using TCAM chips are: a fixed architecture and a limit
on the pattern dictionary volume.

4.4.    Multi-core processors
    A typical multi-core device is the Intel Xeon Phi product [8]. The main advantage of this technology
is ease of programming. The similarity to conventional processors allows using well known and verified
algorithmic approaches for multi-pattern matching. Strong computing capabilities of multi-core
processors simplify the implementation of complex procedures of multi-pattern matching, such as
flexible recognition using regular expressions [9].
    Disadvantages include the well-known shortcomings of microprocessor technology – fixed internal
structure and increased power consumption, as well as difficulties in organizing data exchange with the
central (control) computer and between processor cores, that complicates the parallelization of multi-
pattern matching task.

4.5.    Graphic technology GPGPU
    The technology of using graphics accelerators not for the main purpose, but to perform resource-
intensive computing was called GPGPU (General-Purpose Graphics Processing Units).
    The process of displaying graphical information on the screen of a personal computer initially was
more complex than the actual data processing in the CPU. And this task with the development of the
PC only continued to get more complicated. The market for graphics adapters was equal to market PCs,
so manufacturers have invested heavily in the development of graphics processing units. The most
powerful products in terms of functionality and performance contained an increasing number of
components capable to process information independently. In order to unify technical solutions and
simplify their management processes, well-known low-level programming techniques were used. As a
result it became unexpectedly possible not only to send data to the GPU for processing, but also to
retrieve results of data processing back.
    Modern GPUs consist of many independent processing elements, capable to simultaneously process
a large number of data streams [10]. Processing elements can be programmed independently or similar
to vector architecture. That is, GPGPUs can function both as SIMD (Single Instruction Multiple Data)
architecture and as a SPMD (Signal Program Multiple Data) structure, which is a kind of the MIMD
(Multiple Instruction, Multiple Data) architecture class. Like multi-core processors, GPGPUs realize
mainly algorithmic methods for solving multi-pattern matching tasks.
    Disadvantages of GPGPU technology also include a fixed internal structure (moreover optimized
for graphics problems) and excessive power consumption. The advantages of this platform are: high
overall performance and fine-tuned interaction with the central computer.
    Therefore, GPGPU platform can be effectively used to solve the multi-pattern matching task.
4.6.    FPGA-based reconfigurable accelerators
    The most flexible and universal tool for creating digital circuits, which allows developing computing
devices for any purpose at almost any level of complexity is programmable logic [11]. The ability of
modern FPGAs (which already contain millions of equivalent gates) to synthesize information
processing devices by any principles, approaches and computational architectures gives the developer
a wide space to choose and implement the most appropriate direction for creating recognition tools.
Namely the possibility to implement the latest methods and algorithms for information processing in
digital devices should be considered a significant advantage of programmable logic.
    Because the computing scheme created in FPGA is in fact a specialized device, it does not consume
excess electrical power to support universal components, which makes RAs a more energy saving
option compared to other platforms.
    The use of RA as joined accelerator allows building high-performance computing system on the
basis of universal computer.
    The presence of a large number of ready-made products that can be used as RA is an advantage of
reconfigurable platform too. An electronic resource [12], numbering more than a thousand items, makes
it possible to estimate the number and variety of reconfigurable accelerators. Among the most well-
known manufacturers of RAs are: Nallatech, Xilinx, Alpha Data, National Instruments, DiNI Group
and many others.
    However the main advantage of the direction based on the FPGAs and RAs should be considered
the ability to quickly (in a few seconds) change the functionality of the security device.
    The disadvantages of the programmable logic platform include the relative high cost and complexity
of digital schemes synthesis and FPGA configuration files generation.

5. Hardware platforms comparison
    As mentioned above, network processors and ternary content-addressable memory chips have
recently lost interest from developers and are no longer used. These platforms were considered only for
completeness of the study. Thus, for comparison, ASIC, GPGPU, Multi-Core and FPGA technologies
remain.
    The problem of choosing the most suitable hardware platform for signature-based security system
is complicated. A rigorous and well-grounded solution would be possible if for each platform there
were examples of experimental development of the same task in the same formulation, detailed to the
quantitative level. Unfortunately, no information about such developments has been found in scientific
publications and practical reports. In the best case, two or three platforms were compared, and
furthermore, one of them often was a traditional solution on the CPU. Moreover, the comparison was
mainly carried out qualitatively rather than quantitatively.
    Using the available sources of all types, as well as other accessible information and logical
considerations, let's conduct with the maximum degree of objectivity a comparative analysis of the
above hardware platforms for solving signature-based security tasks.
    One of the only works where almost all technologies are compared [7] provides a qualitative
assessment of the parameters of physical constraints, system design and performance (Table 1). The
study [13] qualitatively presents advantages and disadvantages of Multi-Core, ASIC and FPGA
technologies (Table 2). Quantitative comparison of certain developments on Multi-Core, GPGPU,
ASIC and FPGA platforms can be found in [14]. Some data from this publication are given in Table 3.
    Note that in the Table 3 the clock speeds of Multi-Core, GPGPU and ASIC projects are significantly
exceed the frequencies of FPGA-based solutions. This indicates the high potential of the reconfigurable
platform, for which, unlike other areas, the technological limit has not yet been reached, and i.e. the so-
called Moore's Law still works for the FPGA platform.
    Indeed, in one of more recent sources [15], it is reported that the corresponding implementation of
a specific signature-based network security task based on FPGA, according to the results of a number
of experiments, outperformed a traditional processor by 24-89 times, a 12-core processor by 4.7-7.4
times. Comparison of the reconfigurable solution with the implementation of the same task on the
GPGPU platform with 200 parallel streams showed an advantage of FPGA in bandwidth by 3.14 times,
propagation delay by 1020 times, and power consumption by 106 times.


Table 1
Implementation alternatives for signature matching
            Parameters                   ASIC       FPGA             GPU         NP             CPU
                         Cost           Highest    Medium            Low       Medium-          Low
                                                                                Low
    Physical        Power efficiency      Highest       Low-         High      Medium         Lowest
   constraints                                         Medium
                     Area efficiency      Highest       Worst       High        Medium         Low
                       Scalability         High         Low        Medium        High          High
                        Flexibility       Worst        Medium      Medium       Medium         Best
  System design
                      Design time         Highest      Medium       Low          Low          Lowest
                           Peak           Highest      Medium      Medium       Medium        Lowest
                      performance
  Performance
                       Application        Highest      Medium       High-       Medium        Lowest
                      performance                                  Medium


Table 2
Comparison among hardware platform solutions
  Platform       Multi-Core Processors               ASIC                                 FPGA
              Can enhance the aggregate Provide impressively high                Provide desirable high
              throughput dramatically by    per-stream throughput             performance, flexibility of
 Advantage      using a large number of                                               software and
              threads to process multiple                                            reconfigurable
                input stream in parallel                                              programming
                Additional Complexity is      The applicability is            It takes considerable time
               introduced in scheduling,      limited by the high                 to resynthesize the
Disadvantage
             buffering ordering, and load implementation cost and                design and reprogram
                       balancing            low reprogrammability                   the FPGA device


Table 3
Quantitative comparison
                                                    Num. of     Patterns      Clock rate    Throughput
   Approach                 Platform
                                                    Pattern      length         (MHz)         (Gbps)
    AC -DFA          Multi-Core (Cell/B.E.)          8400          ≤10           3200           2.5
    AC -DFA        GPGPU (GeForce 8600GT)            4000          ≤25           1200           2.3
     CDFA              ASIC (0.18 µm)                1785       No limit          763           6.1
    Bit-Split               FPGA                     1316       No limit      220 (200)         1.76
     B-FSM                  FPGA                    ∼8000       No limit      138 (125)         2.2
 Field-Merge                  FPGA                  6944         < 64            285             4.56
 Multi-Pipeline               FPGA                  9033        No limit         178            11.4

   In summary, we reasonably conclude that FPGA-based reconfigurable accelerators as a hardware
platform best meet the complex and dynamic nature of information security tasks, outperforming
competing technologies in terms of flexibility and cost-effectiveness. Therefore, the rest of the paper is
devoted to a more detailed consideration of this technology and a comparison of the approaches used
when constructing multi-pattern matching schemes on reconfigurable accelerators.

6. FPGA-based reconfigurable accelerators
   Programmable logic integrated circuits have been used successfully in engineering for a long time.
   The form of implementation of the FPGA-based equipment as joined co-processors is a
reconfigurable accelerator (Figure 1).




Figure 1: Appearance of a typical reconfigurable accelerator

    The structure of such a device contains both mandatory and additional components [16]. Every RA
contains at least one FPGA chip. Another mandatory component of the RA is onboard random access
memory (RAM) for storing intermediate results of calculations, which are usually used standard chips
of large capacity DDR-memory.
    A separate hardware interface controller is also highly recommended to be an RA component. In
addition to communicating with the host system's central processor, this controller also provides FPGA
chip programming, i.e. loads the configuration file (bitstream) into it.
    An important additional component of RA is the terminal equipment of communication channels for
direct high-speed exchange with the external environment, bypassing the CPU of the host system.
    Today, the world produces a large number of FPGA-based devices that can be used as RA and
implemented for solving information security tasks. These devices can range from high-performance,
general-purpose reconfigurable accelerators to less functional but more affordable products such as
starter kits, trainer boards, evaluation kits, development boards and prototype plates. The Table 4
exemplifies several reconfigurable accelerators, which cover a wide range of devices in terms of
specifications and functionality.
    Due to the relatively long obsolescence of RAs, accelerators with very different capabilities are used
concurrently. As we can see, some parameters of RA devices in the table vary by several orders of
magnitude.

7. Constructing reconfigurable multi-pattern match circuits
   Many different approaches to constructing hardware matching circuits on FPGAs were proposed
recently. Hundreds researches publish thousands of papers devoted to this scientific direction. To
understand this diversity and correctly assess the possibilities of various solutions, it is necessary to set
the appropriate efficiency criteria. Before deciding on the indicators, let us consider the most prevalent
class of protection tools - network intrusion detection systems as an example of a signature-based
security system.
Table 4
Specifications for typical RAs
  Manufacturer          Nallatech       Nallatech          Xilinx          Digilent         Digilent
                                                                         NetFPGA-1G-
Model of RA            XUP-VV8            385A           VC709 Kit                            Atlys
                                                                             CML
                                                                         Development
Kind of RA            Full-size RA     Full-size RA    Evaluation Kit                     Trainer Board
                                                                            Board
                     Xilinx Virtex    Intel (Altera)                                         Xilinx
                                                       Xilinx Virtex-7   Xilinx Kintex-
Type of FPGA         UltraScale+       Arria 10 GX                                         Spartan-6
                                                           VX690T        7 XC7K325T
                        VU13P             1150                                               LX45
Number        of
                      3 780 000         1 150 000         693 120          326 080           43 661
logical cells
BRAM size, Kbit           36               20                36               36               18
Total volume of
BRAM memory,             94,5             54,3              52,9             16,0              2,1
Mbit
Type of onboard
                    DDR4 SDRAM        DDR3 SDRAM       DDR3 SDRAM        DDR3 SDRAM       DDR2 SDRAM
RAM
Onboard RAM
                          512              32                8              0,512            0,128
volume, GB
                      4x QSFP-DD       2x QSFP28           4x SFP+       4x RJ-45 PHY     1x RJ-45 PHY
Network ports         (2x 100 Gb -      (100 Gb -         (10 Gb -           (1 Gb -         (1 Gb -
                     Ethern. each)    Ethern. each)    Ethern. each)     Ethern. each)      Ethern.)
Communication         PCI-Express      PCI-Express      PCI-Express       PCI-Express
                                                                                            USB 2.0
interface              x16 Gen.3        x8 Gen.3          x8 Gen.3         x16 Gen.2
                        Dual-slot                       Single-slot       Single-slot
                                        Single-slot
                      PCI-E board                        PCI-E card        PCI-E card     stand-alone
                                       PCI-E board
Form factor            standard-                         standard-         standard-      board 120 x
                                       half-height
                         height                            height/           height         133 mm
                                       half-length
                       3/4-length                          length         3/4-length
Cooling                  Passive         Active            Passive             No            Passive
Price, $                  8995           5995               4995              1499            490



7.1.    The structure of reconfigurable NIDS
    NIDS were historically the first and, consequently, the most studied FPGA-based tools of
information security [17]. Therefore, without losing the generality of reasoning, consider the typical
functions and efficiency parameters of reconfigurable security systems on the example of such systems.
The structure and composition of the reconfigurable network intrusion detection system (Figure 2) can
be compiled as a generalization of numerous published scientific solutions (see for example [17], [18]).
    The Matching module is the most important component of NIDS. It solves the computationally
complex task of multi-pattern string matching, i.e. checks the content of network packets against pattern
set. As we can see the signatures is “wired” into the circuitry of the device at hardware level. This
feature ensures the highest performance rate due to the maximally possible parallelism has been
reached.

7.2.    The efficiency parameters of reconfigurable NIDS
   There are three main groups of efficiency parameters (or indicators) of reconfigurable NIDS [19],
[20]: cost parameters, parameters of speed or performance parameters, functional parameters.
Figure 2: The structure of the reconfigurable network intrusion detection system

    Cost indicators are as follows:
     the amount of logical resources of programmable logic needed to create a digital circuit;
     memory costs of three types: onboard memory of RA, internal memory of FPGA (BRAM) and
         distributed memory of FPGA (flip-flops in logical cells);
     other costs, which make up the total cost of ownership, including the development, manufacture
         and programming costs.
    Performance parameters include:
     the volume of the pattern dictionary (i.e. the number of patterns to be matched);
     the speed of the system (which is defines as either the delay of data propagation from input to
         output or as bandwidth);
     the predictability of bandwidth as well.
    Functional indicators include:
     the ability of NIDS to work in the mode of network intrusion prevention system (NIPS);
     the ability to dynamically update the patterns without interrupting the matching process;
     ability to selective recognition (the ability to change a subset of recognizable patterns by an
         external signal);
     the ability to counter attacks targeted at the NIDS itself,
and others.
    An important intermediate metric that links speed and cost characteristics is scalability – the ability
to increase performance without excessive resource costs. There are three types of scalability: by the
bandwidth, by the pattern set size, and by the pattern length.

7.3. Comparison of the main approaches to the building of reconfiguring
matching modules
   The analysis of numerical researches from all over the world shows that when creating NIDS the
most effective and widely applied are these three approaches which use the following technical
solutions based on the corresponding technologies:
    content-addressable memory (CAM) based on digital comparators (DC) [21], [22], [23], [24],
        [25] and [26];
    Bloom filter (BF) based on hash-functions (HF) [27], [28], [29], [30], [31] and [32];
    Aho–Corasick algorithm (AC) on finite automata (FA) [14], [33], [34], [35], [36], [37] and
        [38].
   Each of these approaches has its own pros and cons. And none of these fully meets the requirements
for the reconfigurable signature-based security systems. The lack of a leading direction that would
outperform competitive solutions in all respects makes the developers offer numerous modifications of
the main approaches, use diverse techniques and technical solutions trying to overcome their
shortcomings.
   Let's consider briefly these technologies, approaches and its modifications, techniques and solutions
for every of three mentioned directions.

        7.3.1. Approach based on content-addressable memory and digital
             comparators
    Content-addressable memory is a class of devices that were created for quick code recognition and
perform a function opposite to traditional RAM: it finds the location of data in the memory device by
its content or reports about their absence. Digital comparators are the fast-acting basis of CAM [39].
    The direct solution for detecting the matching of input symbols with patterns is a set of digital
comparators, each of which compares the input byte with a predetermined symbol [23], [24]. Let's call
such scheme the Basic scheme on CAM or the BsCAM scheme (Figure 3). Just the input data fed, the
BsCAM circuit is able to give the output in one clock cycle. The simplicity and regularity of the
structure are also its advantages. The main disadvantage is the significant consumption of logical
resources of FPGA.




Figure 3: Direct recognition by digital comparators

    Difficulties when implementing on FPGA arise for the following reasons. Firstly, the outputs of
registers built on standard FPGA components are overloaded by a large number of comparators inputs.
Secondly, it is necessary to combine a large number of bit signals with a multi-input AND gate for long
patterns. Both problems are solved by pipelining the fan-out schemes, which leads to an additional
increase in hardware costs [23], [25]. This leads to the poor scalability of the approach by the pattern
set size.
    It is possible to reduce the CAM hardware costs of by the reuse of comparators, which in the extreme
case leads to the allocation of only one DC per character of the alphabet, and their combinations are
formed using delay circuits and the AND gates, as shown in Figure 4. Here DEL(1) and DEL(2) – delay
circuits of one and two cycles, respectively. This solution is called the Decoded Content-Addressable
Memory (DCAM), because the complete set of the DCs forms a decoder [25].
Figure 4: DCAM schematic solution

    A further reducing the complexity of the DCAM-based recognition scheme is possible by using a
partial matching technique where long patterns are broken into shorter parts that are recognized
consistently. It is sufficient to delay only the partial match signal instead of using long delay circuits
for a large number of cycles for long patterns. Figure 5 depicts the recognition scheme of a 31-character
pattern, built on this principle [26]. This solution was named the Recognition Scheme based on partially
decoded CAM or the DpCAM (Decoded partially Content-Addressable Memory) scheme.




Figure 5: DpCAM schematic solution

    A more detailed study shows that the use of the cost reduction techniques described above in the
case of their parallel connection (in the ParCAM scheme) leads to even higher resource savings than
linear growth [23]. Thus, the schemes on the DC have a good scalability by the bandwidth.
    There is also a technique for reducing costs by using non-byte data processing, which is implemented
by Hbc, HbcDCAM and ParHbcDCAM schemes on half-byte comparators [40], as well as the BCAM
scheme, which is built using binary decision diagram [41].
    The fact that the input data content does not affect the operation of the DC-based CAM recognition
scheme implements the bandwidth predictability functional efficiency indicator and makes security
system invulnerable to external attacks, which is another functional efficiency indicator.
    As can be seen from the considered solutions, when creating recognition tools on DCs, the
information about the patterns is actually "wired" into the hardware circuit, which makes it impossible
to dynamically reconfigure and fulfill the selective recognition.
    The results obtained by examining the features of the recognition schemes based on CAM and DC
are summarized in Table 5 below (see the section 7.4. "Comparison of the approaches").
        7.3.2. Approach based on Bloom filter and hash-functions
   The Bloom filter is an abstract device that allows detecting a match of a fragment of a given bit
sequence with a sample from the dictionary [27]. BF consists of two key components (Figure 6): a set
of K units that calculate the hash functions h1(x), h2(x), …, hK(x), and an array of M one-bit memory
cells (component Rg in the figure). In the initial state, this bit register (BR) is filled with zeros [31].




                             a                                     b
Figure 6: The principle of operation of the Bloom filter: a – programming; b – recognition

    At the programming stage of the Bloom filter, each of the n elements of the pattern dictionary (W-
bit length) is sequentially fed to the inputs of the hash function units. The output values are interpreted
as addresses (bit numbers) of BR and correspondent cells are filled with ones.
    While functioning, a fragment of the input stream of symbols (also of W bits length) is fed to BF
input, and the values of all K hash-functions are calculated as well. If all the cells of BR pointed by the
hash-functions outputs contain ones, it is assumed with certain probability that the input combination
of characters matches one of the patterns used while the BF programming process. But if at least one
hash-function points to a zero value, no match situation is guaranteed. Thus, BF operates with some
probability of false positive error, but without false negative errors [32].
    BF saves memory resources. The size of the pattern set does not directly affect the size of the BR:
adding new patterns to existing ones only increases the probability of a recognition error, but does not
increase the amount of memory required. The number and complexity of hash-functions, and therefore
hardware consumption and performance, also do not depend on the volume of the pattern set with this
approach. The pattern length also does not affect directly the amount of memory resources or
performance. That is, BF scales well in two ways: by pattern set size and by pattern length. Indeed,
even very long strings after conversion with hash-functions require the same K cells of BR to be stored.
    Unfortunately, the Bloom filter has an important drawback inherent in all hash-based solutions: the
size of the input character sequence to be analyzed must exactly correspond to the given set of hash-
functions. That is, one BF is able to recognize patterns of only the same length. To overcome this
shortcoming, it is necessary to build a structure containing several BF in order patterns of different
lengths can be used [29]. This reduces the cost-effectiveness of the Bloom filter approach, but not the
speed indicators.
    Difficulties when implementing BF on FPGA arise primarily because several hash-function
generators need to simultaneously access the bit register Rg (Figure 6). When implementing the latter
on a single memory device, collisions will occur. The solution is to split the BR into several memory
devices depending on the number of hash-functions. This requires, firstly, distributing the outputs of
the hash-function units between the corresponding RAM devises, and secondly, imposing additional
restrictions on the functionality of these units, so that output range of every unit fit the reduced size of
memory device. It has been proven that such constraints not significantly increase the false positive
error probability of Bloom filter [29].
    In the general case, several BRAM blocks are required to create a BR. The corresponding circuit is
called a Full-Size Bloom Filter or LBF (Large Bloom Filter). But in most practical applications for
security system the so-called mini-BF circuit, which are part of the LBF and contain only one block of
BRAM, can be used for the matching module construction. Let's call such decision the Simplified Bloom
filter or the SBF (Simplified Bloom Filter) scheme [30].
    In order to increase the speed, BF can be combined into a parallel structure similar to the ParCAM
scheme. But unlike DC, the properties of Bloom filter do not allow to achieve a sublinear law of cost
growth – the hardware costs of a parallel circuit are strictly proportional to the achieved acceleration,
i.e. BF bandwidth is not scaled as well as CAM-based circuits.
    The classic scheme of the Bloom filter does not allow on-the-run removing patterns added during
programming. If necessary, developer can use a Bloom filter with counters, or CBF (Counting Bloom
Filter) scheme [28]. This solution allows dynamic reconfiguration without stopping the security system.
    The need to fulfill the procedure of refining the results due to BF's system error, which is slower
than recognition, leads to unpredictability of bandwidth, and makes the Bloom filter vulnerable to
attacks on security systems [30], [31].
    The results of examining properties of the recognition schemes based on BF are also summarized in
Table 5.

        7.3.3. Approach based on Aho–Corasick algorithm and finite automata
    The mathematical apparatus of finite-state machines or finite automata (FA) has been used
successfully to create computer systems for a long time [42]. A special class of FAs – so-called
classifiers is widely used when building signature-based security systems that perform multi-pattern
string matching [43]. Such a machine outputs an active signal only if one of the predefined sequences
of symbols is received at its input. When this situation occurs, the classifier goes into one of the so-
called acceptable states [1]. While transitions are performed between unacceptable states, there are no
signals at the output of a FA–classifier.
    The Aho–Corasick algorithm is an example of a tool that, unlike many known single string
recognition algorithms, detects several samples in the input data simultaneously [33]. The essence of
the algorithm is in creating an FA according by certain rules, which recognizes the desired patterns
while operating. Let's call such FA the Aho–Corasick finite automaton (AC-FA).
    The theory and practical experience of using AC-FA is widely presented in the literature [14], [34],
[35], [36], [37] and [38]. However, for the effective building AC-FA with help of FPGAs, special
attention needs to be paid to the finite automaton transition function. Analysis of the problem gives an
understanding of the need to distinguish following four types of transitions in such machines: direct or
goto transitions, cross, failure and restartable transitions. The latter type, strictly speaking, belongs to
cross transitions, but its separation into an individual type allows increasing the effectiveness of FPGA-
based AC-FA realization due to some peculiarities of technical implementation of such machines [44].
    A generalized structure of the hardware implementation of AC-FA is shown in Figure 7. It based on
a memory unit (MU) in which the machine transition table is stored [36], [38]. Each MU cell contains
a number of the next state of automaton and a match vector (MV). A character code from the input
stream is concatenated with the value of the next state number retrieved from memory unit. The
obtained value is fed to the MU address input and selects the appropriate record. MVs contain ones in
the positions denoting the corresponding pattern. The control unit (CU) controls the operation of the
machine. If the RA onboard memory (external to the FPGA chip) is used when implementing the circuit,
we will call this solution the Basic AC-FA Scheme with External Memory or ACRAM scheme. A
solution based on BRAMs (FPGA internal memory blocks) will be called Basic AC-FA Scheme with
Block Memory or ACBRAM scheme.
    The most important advantage of AC-FA solutions is its bandwidth independence from the signature
database size and the pattern’s properties, in particular, their lengths, which results in predictability of
bandwidth. Typically, the FA takes one character from the input stream for each clock cycle. But in
practice, if the size of transition table is too large and it is necessary to use external memory, each access
to the MU can take several cycles. Furthermore, external memory is slower than internal. Thus, the
downside of this advantage is the relatively low speed, which is also difficult to increase, which means
poor AC-FA scalability by the bandwidth.




Figure 7: The structure of the typical Aho–Corasick finite automaton

    The main difficulties that arise when implementing AC-FA on the reconfiguration platform concerns
the organization of efficient data exchange with MU. When constructing an ACBRAM scheme there is
a possibility to synthesize a memory device of almost arbitrary structure due to the flexibility of BRAM
blocks configuring process. But in the case of ACRAM the problem is escalated because the onboard
memory of RA has a fixed structure not convenient in the general case for interaction with CU [45].
    As we can see, AC-FA requires a small amount of logic resources to create a control unit, registers
and MU controller. However, the size of memory can be very large, i.e. the approach is characterized
by high memory consumption. Increasing the number of patterns leads to "explosive" growth of MU
resources, which means very poor scalability by pattern set size. Therefore, the majority of researchers'
efforts when implementing AC-FA hardware in security systems are aimed at reducing memory
resources and moderating their rapid growth.
    Among the numerous modifications of AC-FA there are solutions that offer different ways to encode
the transition table. The matches of not only prefixes but also infixes of patterns are also analyzed [36],
[38]. However, in most modifications, memory reduction is achieved by handling certain types of finite
automaton transitions [35], [44]. In a number of works the technique of AC-FA pipelining has been
offered and developed [37]. The linear conveyor of H steps allows eliminating all the cross transitions
in AC-FA structure from an initial state to level H [14].
    A certain part of the researchers' efforts was aimed at improving the not very good speed efficiency
of the approach. Because AC-FA, like any finite state machine, processes input information strictly
sequentially, character by character, attempts have been made to speed up the AC algorithm by
processing more than one character per clock cycle [46]. Since it is not known in advance with which
offset the specific pattern will appear in the input data, it is necessary to organize the parallel operation
of the appropriate number of identical machines (modifications ParACRAM and ParACBRAM).
    Another solution related to non-byte data processing. The so-called Bit-split scheme was proposed
to replace a FA that processes 8-bit characters by several identical parallel sub-machines that analyze
1, 2 or 4 bits [34]. By reducing the size of character codes, the size of the alphabet is significantly
reduced. The Bit-split solution is the opposite of multi-character per clock cycle recognition schemes;
however, it is not aimed at speeding up but at reducing resources.
    The use of external RAM allows a simple implementation of the dynamic updating of the AC-FA
circuitry by overwriting FPGA contents, i.e. to completely change the algorithm of the machine without
stopping the security system. Regarding the indicator of selective recognition, it is implemented by
forming in the MU several transition tables for different submachines.
    The results of the study of FA-based recognition schemes properties that implement the AC
algorithm are also summarized in Table 5.

7.4.              Comparison of the approaches
   The comparative analysis of three approaches: the use of associative memory based on digital
comparators, Bloom filter based on hash functions and the Aho–Corasick algorithm based on finite
automata, as well as numerous techniques for their improvement allows to thoroughly assessing the
advantages and disadvantages of each of them in the sense of the performance indicators mentioned
above. The Table 5 shows the results of this comparative analysis.

Table 5
The comparison of the main approaches to the construction of the reconfigurable multi-pattern
matching modules
                                                             Approach
                  Parameter
                                              CAM           Bloom Filter     Aho–Corasick
 Logic costs                                  ---                 +              +++
         distributed                          ---                 +              +++
  Memory
   costs




         BRAM                                 +++                 +              ---
         onboard                              +++               +++              ---
 Speed                                        +++                 +                -
 Speed predictability                         +++               ---              +++
         the ability to counter attacks       +++               ---               +
  parameters
  Functional




         targeted at the defense system
         dynamic update                       ---                 +              +++
         selective recognition                ---                 +              +++
         ability to work in NIPS mode         +++                 +                -
                   by bandwidth                               +++                     +                     -
    Scalability




                   by pattern set size                        ---                    +++                   ---
          by pattern length                                    -                     +++                   +++
 Ability to use redundancy of pattern set                      +                     ---                   +++
 A significant drawback that negates the                  Excessive                Fixed             "Explosive"
 main advantages of the approach                        resource costs         pattern length       memory growth
Notation: "+" – medium advantage; "+++" – significant advantage; "-" – medium drawback; "- - -" – significant drawback.

    As we can see, none of the researched approaches shows obvious advantages over others, each has
its own positive features and disadvantages. And an advantage according to one of the indicators often
turns into a disadvantage with respect to another. As a result, each of the approaches has a significant
drawback, which, in fact, negates the main advantages of this approach.
    For instance, CAM based on DCs and their modifications provide maximum performance, but more
expensive than other approaches in terms of hardware resources and electricity consumption. They also
lose in scaling. The Bloom filter is more scalable and effective by resources, but imposes restrictions
on the length of the patterns. It also requires additional costs to check the obtained results due to its
inherent false positive errors probability when matching. Finite automata are modest in terms of logic
consumption, provide stable but relatively low bandwidth, are difficult to build, and lead to an
"explosive" increase in memory costs for large pattern dictionaries.

8. Conclusion
    Multi-pattern string matching is a fundamental technology used in signature-based security systems
like network intrusion detection systems, antivirus, anti-spam, worm-containment and so on. This task
is computational-intensive, and traditional software solutions do not keep up with speeds of modern
networks because the number and sophistication of attacks against the computer infrastructure are
increasing constantly. Hardware accelerators are able to overcome this bottleneck. There are several
types of such devices known as acceptable solution: specialized coprocessors based on ASIC, ternary
content addressable memory, accelerators based on multi-core processors, graphics accelerators and
reconfigurable accelerators based on FPGA. The latter best meet the information security area
requirements, outperforming competing platforms in flexibility and cost-effectiveness.
    The most promising approaches to the construction of reconfigurable matching modules of
signature-based security systems are: content addressable memory based on digital comparators, Bloom
filter based on hash-functions and Aho–Corasick algorithm implemented in the form of a finite
automaton. Numerous researchers have also considered a lot of modifications and improvements to the
basic solutions. But none of these approaches has a significant advantage over others in terms of
efficiency.
    The contribution of this work is as follows.
    A comprehensive survey on the known hardware solutions in this area for the past few years has
made. Most effective platforms, approaches, techniques and customized solutions to realize the multi-
pattern matching task in hardware are analyzed, systematized and classified.
    A comparative analysis of main hardware platforms for security systems acceleration has provided.
The most promising direction using programmable logic is more closely explored.
    Specific features of different approaches to the construction of FPGA-based matching schemes in
terms of resource costs, speed/throughput parameters, functional characteristics, as well as scaling
parameters are formulated and thoroughly investigated.
    In fact, a developer guide aiming to facilitate choosing the most appropriate solution for a specific
application of signature-based high performance security systems is proposed.

Acknowledgment
   This work was supported in part by the Informatization Program of the National Academy of
Sciences of Ukraine for 2020-2024.

References
[1] B. Smyth, Computing Patterns in Strings, Pearson Addison Wesley, Essex, 2003.
[2] H. Chen, Y. Chen, D. H. Summerville, A Survey on the Application of FPGAs for Network
    Infrastructure Security, IEEE Communications Surveys and Tutorials, Article volume 13, 4 (2011)
    541-561. doi:10.1109/surv.2011.072210.00075.
[3] D. Gusfield, Algorithms on Strings, Trees, Sequences: Computer Science and Computational
    Biology, Cambridge University Press, Cambridge, 1997.
[4] T. AbuHmed, A. Mohaisen, D. Nyang, Deep Packet Inspection for Intrusion Detection Systems:
    A Survey, Computer Networks, volume 57, 18 (2012) 4047-4064.
[5] Y. N. Lin, Y. C. Chang, Y. D. Lin, Y. C. Lai, Resource allocation in network processors for
    network intrusion prevention systems, Journal of Systems and Software, Article volume 80, 7 (July
    2007) 1030-1036. doi:10.1016/j.jss.2007.01.032.
[6] F. Yu, R. H. Katz, T. V. Lakshman, Gigabit rate packet pattern-matching using TCAM, in:
    Proceedings of the 12th IEEE International Conference on Network Protocols, 2004, pp. 174-183.
    doi:10.1109/icnp.2004.1348108.
[7] C. C. Xu, S. H. Chen, J. S. Su, S. M. Yiu, L. C. K. Hui, A Survey on Regular Expression Matching
     for Deep Packet Inspection: Applications, Algorithms, Hardware Platforms, IEEE
     Communications Surveys and Tutorials, Article volume 18, 4 (2016) 2991-3029. doi:10.1109/
     comst.2016.2566669.
[8] Intel® Xeon Phi™ Processor 7290, 2016. URL: https://ark.intel.com/content/www/ru/ru/ark/
     products/95830/intel-xeon-phi-processor-7290-16gb-1-50-ghz-72-core.html.
[9] Y. H. E. Yang, V. K. Prasanna, Optimizing regular expression matching with SR-NFA on multi-
     core systems, in: Proceedings of the Parallel Architectures and Compilation Techniques, PACT,
     2011, pp. 424-433. doi:10.1109/PACT.2011.73.
[10] J. D. Owens et al., A survey of general-purpose computation on graphics hardware, in: Proceedings
     of the Computer Graphics Forum, Review volume 26, no. 1, 2007, pp. 80-113. doi:10.1111/j.1467-
     8659.2007. 01012.x.
[11] C. Maxfield, The Design Warrior's Guide to FPGAs: Devices, Tools and Flows, Elsevier Science
     & Technology Books, Oxford, UK, 2004.
[12] FPGA Boards and Systems, 2021. URL: http://www.fpga-faq.com/FPGA_Boards.shtml.
[13] R. Abdulhammed, M. Faezipour, K. M. Elleithy, Network Intrusion Detection Using Hardware
     Techniques: A Review, in: Proceedings of the IEEE Long Island Systems, Applications and
     Technology Conference (Lisat), 2016, p. 7.
[14] W. Jiang, Y. H. E. Yang, V. K. Prasanna, Scalable multi-pipeline architecture for high performance
     multi-pattern string matching, in: Proceedings of the 24th IEEE International Parallel and
     Distributed Processing Symposium, IPDPS 2010. doi:10.1109/IPDPS.2010.5470374.
[15] V. Jyothi, S. K. Addepalli, R. Karri, DPFEE: A High Performance Scalable Pre-Processor for
     Network Security Systems, IEEE Transactions on Multi-Scale Computing Systems, Article
     volume 4, 1 (January-March 2018) 55-68. doi:10.1109/tmscs.2017.2765324.
[16] S. Hilgurt, Reconfigurable accelerators. Analytical overview, Electronic modeling, Article
     volume 35, 4 (2013) 49-72, (in Russian).
[17] J. M. Korostil, S. Y. Hilgurt, Principles of building FPGA-based network intrusion detection
     systems, Modeling and information technologies. Collection of scientific works of PIMEE NAS
     of Ukraine, 57 (2010) 87-94, (in Russian).
[18] T. Katashita, Y. Yamaguchi, A. Maeda, K. Toda, FPGA-based intrusion detection system for 10
     Gigabit Ethernet, IEICE Transactions on Information and Systems, Article volume E90D, 12
     (December 2007) 1923-1931. doi:10.1093/ietisy/e90-d.12.1923.
[19] S. Hilgurt, Constructing optimal reconfigurable pattern matching tools for information security,
     Ukrainian Scientific Journal of Information Security, volume 25, 2 (2019) 74-81, (in Ukrainian).
     doi:10.18372/2225-5036.25.13824.
[20] S. Hilgurt, Parallel combining different approaches to multi-pattern matching for FPGA-based
     security systems, Advances in cyber-physical systems, volume 5, 1 (2020) 8-15.
[21] S. A. Guccione, D. Levi, D. Downs, A reconfigurable content addressable memory, in:
     Proceedings of the Parallel and Distributed Processing, vol. 1800, 2000, pp. 882-889.
[22] Y. H. Cho, S. Navab, W. H. Mangione-Smith, Specialized hardware for deep network packet
     filtering, in: Proceedings of the Field-Programmable Logic and Applications, Proceedings:
     Reconfigurable Computing Is Going Mainstream, volume 2438, 2002, pp. 452-461.
[23] I. Sourdis, D. Pnevmatikatos, Fast, large-scale string match for a 10Gbps FPGA-based network
     Intrusion Detection System, in: Proceedings of the Field-Programmable Logic and Applications,
     Proceedings: Reconfigurable Computing Is Going Mainstream, volume 2778, 2003, pp. 880-889.
[24] Y. H. Cho, W. H. Mangione-Smith, Deep packet filter with dedicated logic and read only
     memories, in: Proceedings of the 12th Annual IEEE Symposium on Field-Programmable Custom
     Computing Machines, 2004, pp. 125-134. doi:10.1109/fccm.2004.25.
[25] I. Sourdis, D. Pnevmatikatos, Pre-decoded CAMs for efficient and high-speed NIDS pattern
     matching, in: Proceedings of the 12th Annual IEEE Symposium on Field-Programmable Custom
     Computing Machines, Proceedings, 2004, pp. 258-267. doi:10.1109/fccm.2004.46.
[26] I. Sourdis, D. N. Pnevmatikatos, S. Vassiliadis, Scalable multigigabit pattern matching for packet
     inspection, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Article
     volume 16, 2 (February 2008) 156-166. doi:10.1109/tvlsi.2007.912036.
[27] B. H. Bloom, Space/Time Trade-offs in Hash Coding with Allowable Errors, Communications of
     the ACM, Article volume 13, 7 (1970) 422-426. doi:10.1145/362686.362692.
[28] L. Fan, P. Cao, J. Almeida, A. Z. Broder, Summary cache: A scalable wide-area Web cache sharing
     protocol, IEEE – ACM Transactions on Networking, Article volume 8, 3 (June 2000) 281-293.
     doi:10.1109/90.851975.
[29] S. Dharmapurikar, P. Krishnamurthy, T. S. Sproull, J. W. Lockwood, Deep packet inspection using
     parallel bloom filters, IEEE Micro, Proceedings Paper volume 24, 1 (January-February 2004) 52-
     61. doi:10.1109/mm.2004.1268997.
[30] S. Dharmapurikar, M. Attig, J. Lockwood, Design and Implementation of a String Matching
     System for Network Intrusion Detection using FPGA-based Bloom Filters, All Computer Science
     and Engineering Research, Report Number: WUCSE-2004-12, 2004-03-25, Washington
     University in St. Louis, 2004.
[31] J. Harwayne-Gidansky, D. Stefan, I. Dalal, FPGA-based SoC for Real-Time Network Intrusion
     Detection using Counting Bloom Filters, in: Proceedings of the IEEE SoutheastCon, 2009.
[32] S. Geravand, M. Ahmadi, Bloom filter applications in network security: A state-of-the-art survey,
     Computer Networks, Article volume 57, 18 (December 2013) 4047-4064. doi:10.1016/
     j.comnet.2013.09.003.
[33] A. V. Aho, M. J. Corasick, Efficient String Matching: An Aid to Bibliographic Search,
     Communications of the ACM, vol. 18, 6 (1975) 333-340. doi:10.1145/360825.360855.
[34] L. Tan, T. Sherwood, A high throughput string matching architecture for intrusion detection and
     prevention, in 32nd International Symposium on Computer Architecture, Madison, WI, Jun 04-08
     2005, LOS ALAMITOS: IEEE Computer Soc, 2005, pp. 112-122.
[35] J. Lunteren, High-performance pattern-matching for intrusion detection, in: Proceedings of the
     25th IEEE International Conference on Computer Communications, volumes 1-7, 2006, pp. 1409-
     1421.
[36] C. Lin, Y. Tai, S. Chang, Optimization of pattern matching algorithm for memory based
     architecture, in: Proceedings of the 3rd ACM/IEEE Symposium on Architecture for networking
     and communications systems, Orlando, Florida, USA, 2007.
[37] D. Pao, W. Lin, B. Liu, Pipelined architecture for multi-string matching, IEEE Computer
     Architecture Letters, volume 7, 2 (2008) 33-36. doi:10.1109/L-CA.2008.5.
[38] C. H. Lin, S. C. Chang, Efficient Pattern Matching Algorithm for Memory Architecture, IEEE
     Transactions on Very Large Scale Integration (VLSI) Systems, Article volume 19, 1 (January
     2011) 33-41. doi:10.1109/tvlsi.2009. 2028346.
[39] K. Pagiamtzis, A. Sheikholeslami, Content-addressable memory (CAM) circuits and architectures:
     A tutorial and survey, IEEE Journal of Solid-State Circuits, Article volume 41, 3 (March 2006)
     712-727. doi:10.1109/jssc.2005.864128.
[40] J. Huang, Z. K. Yang, X. Du, W. Liu, FPGA based high speed and low area cost pattern matching,
     in: Proceedings of the IEEE Region 10 Conference (TENCON 2005), New York, 2006, pp. 2693-
     2697.
[41] S. Yusuf, W. Luk, Bitwise optimised CAM for network intrusion detection systems, in:
     Proceedings of the International Conference on Field Programmable Logic and Applications, FPL,
     Tampere, 2005, pp. 444-449. doi:10.1109/FPL.2005.1515762.
[42] H. R. Lewis, C. H. Papadimitriou, Elements of the Theory of the Computations, 2nd ed. Prentice-
     Hall 1998.
[43] R. M. Keller, Computer Science: Abstraction to Implementation. Harvey Mudd College, 2001.
[44] T. Song, W. Zhang, D. S. Wang, Y. B. Xue, A memory efficient multiple pattern matching
     architecture for network security, in: Proceedings of the 27th IEEE Conference on Computer
     Communications (Infocom), Volumes 1-5, 2008, pp. 673-681.
[45] Q. B. Wang, V. K. Prasanna, Multi-Core Architecture on FPGA for Large Dictionary String
     Matching, in: Proceedings of the 17th IEEE Symposium on Field Programmable Custom
     Computing Machines, 2009, pp. 96-103. doi:10.1109/fccm.2009.43.
[46] H. Lu, K. Zheng, B. Liu, X. Zhang, Y. H. Liu, A memory-efficient parallel string matching
     architecture for high-speed intrusion detection, IEEE Journal on Selected Areas, Communications,
     Article volume 24, 1 (October 2006) 1793-1804. doi:10.1109/jsac.2006.877221.