<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Multi-Pattern Matching with Online User Requests for Determining Browser Capabilities</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ignat Blazhko</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tetiana Kovaliuk</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nataliya Kobets</string-name>
          <email>nmkobets@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tamara Tielysheva</string-name>
          <email>telyshevatamara@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Borys Grinchenko Kyiv University</institution>
          ,
          <addr-line>18/2 Bulvarno-Kudriavska str., Kyiv, 04053</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”</institution>
          ,
          <addr-line>37, ave. Peremohy, Kyiv, 03056</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Taras Shevchenko National University of Kyiv</institution>
          ,
          <addr-line>64/13, Volodymyrska str., Kyiv, 01601</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>One of the most effective ways to get information about Internet users is to analyze User-Agent request header, because it contains information about the user's browser. Quick methods are needed for multi-pattern matching to instances of User Agents. The article considers the algorithm of patterns matching with input strings to determine the capabilities of the browser. The input strings are the headers of User-Agent queries, and strings with special wildcards represent search patterns. The developments in the field of solving the problem of comparing search templates with input lines are considered and analyzed. A new algorithm for solving the problem is proposed. The steps of the algorithm are given and the time complexity of the algorithm for the problem is analyzed. Comparisons of the performance of the received program with analogues are made and the advantage of the offered approach is shown. The program implementation was compared with the most prominent analogues and the advantage of the proposed approach has been shown in terms of execution speed.</p>
      </abstract>
      <kwd-group>
        <kwd>1 Pattern matching</kwd>
        <kwd>wildcards</kwd>
        <kwd>browser capabilities</kwd>
        <kwd>User Agent request</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Nowadays, when many companies operate on the Internet, the need to identify user information is
very acute. One of the most effective ways to obtain such information is to analyze the User-Agent
request HTTP header, which identifies the program making a request to the server and requesting access
to the website. The User-Agent request header contains specific information about the hardware and
software of the device making the request. This information typically includes information about the
browser, web visualization mechanism, operating system, and user’s device, including iPhone, iPad or
other mobile device, tablets, desktop computers. Web servers use User-Agent to maintain different web
pages in different browsers in case of not adaptive web design, display different content for different
operating systems, collect statistics that reflect the browsers and operating systems used by visitors.
Web crawling bots also use User-Agent.</p>
      <p>
        The data specified in the User-Agent request HTTP header [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] describes browsers (Chrome, Firefox,
Internet Explorer, Safari, BlackBerry, Opera), search engines (Google, Google Images, Yahoo), game
consoles (PlayStation 3, PlayStation Portable, Wii, Nintendo Wii U), offline browsers (Offline
Explorer), links (Link Checker, W3C-check link), electronic readers (Amazon Kindle), validators,
cloud platforms, e-mail libraries, scripts. By User-Agent, you can define functions that are supported
by a web browser, such as JavaScript, Java Applet, cookie, VBScript, and Microsoft ActiveX.
      </p>
      <p>
        The User-Agent string is used to match the content when the source server selects the appropriate
content or operating parameters to respond to the received user request. Thus, the concept of adapting
content to the needs of users is realized. The User-Agent string is one of the criteria by which search
engines can be excluded from accessing certain parts of the website using the robot exclusion standard
(according to the robots.txt file). Information about users has obvious applications in the advertising
sector, namely in cases where the user's device can operate as a criterion for determining the user's
affiliation to the target market. Web analysis is an equally important use case. Due to the availability of
information about customers, further analysis can significantly improve the decision about the
published content; increase the conversion of the site by turning site visitors into real customers.
Analyzing User-Agent request headers that are constantly updated also means that businesses will be
aware of the emergence of new devices that visit their resource, and therefore related problems will be
identified at an early stage. However, parsing the User-Agent request header is not a trivial task. Many
software libraries are used to solve this problem by balancing between accuracy and speed of obtaining
results. There are services that provide definition and analysis of the User-Agent request header, in
particular, deviceatlas.com. The accuracy problem can be solved by using open lists of User-Agent
templates, which are constantly updated. There are more than 62 million User-Agent strings [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], so
existing solutions cannot quickly work with all of them. The speed of response to the client is another
important indicator for business. Therefore, quick methods of analyzing these patterns are needed.
These are methods for matching a large number of search patterns to User-Agent instances. The
development of algorithms based on which libraries will be created to determine the capabilities of
browsers is an urgent and necessary task for any web service.
      </p>
      <p>
        The purpose of the research is to develop an algorithm for multi-pattern matching with wildcards to
be able to detect browser capabilities based on browscap.org data [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The library should work faster
than its alternatives − the speed should be comparable to libraries that use simplified (inaccurate)
methods to detect browser capabilities. Browscap is an open project dedicated to collecting and
disseminating a database of browser capabilities, actually a set of different files describing browser
capabilities. Browscap has built-in support for using these files.
      </p>
      <p>To achieve this goal we need to solve the following tasks:
• figure out browscap.org data structure and similarities between different user agents of the same
family;
• review and analyze existing studies that solve the problem of multi-pattern matching with
wildcards
• develop an algorithm for multi-pattern matching with the best known lower bound time
complexity for determining the browser capabilities;
• develop software implementation of the algorithm and compare the speed of the proposed
algorithm with analogue libraries, which give approximate results.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Problem Statement</title>
      <p>Let the set of search patterns P = { p1, p2 ,..., pm } be input to the search engine. Each search pattern
pi , i = 1, m is a string of characters that belong to the alphabet Σ . Each pattern contains special
characters "?" and "*", that do not belong to this alphabet, and are called wildcards. The character "?"
corresponds to any single character in the alphabet Σ , the character "*" corresponds to any string of
characters in the alphabet Σ , including the empty string. The search pattern pi , i = 1, m can be
represented as a string pi = ci1ki1ci2ki2...cij kij * , where cij is the j -th wildcard character for the i -th
pattern, kij is a keyword from a dictionary of unique keywords W , that is, a string of characters in the
alphabet Σ . There is an additional condition for the wildcard character ci1 : it can be either skipped or
equal to the character "*".</p>
      <p>Input requests are in the form of a string S = {si }, i = 1, n , consisting of n characters in the alphabet
Σ . It is necessary for each input string S = {si }, i = 1, n to find all the search patterns from the set P
that matches to the query string, i.e. get a set of patterns A = { pa1 , pa2 ,..., paanslen }, where paS denotes
the search pattern for the S -th request. The number of requests is not limited. Requests must be
answered as they are received.</p>
      <p>One of the features of this task is that the set of search patterns is very large and is in the millions of
copies. Query string sizes typically range from 100 to 200 characters.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Related Works</title>
      <p>The main tasks of text string analysis that arise in the development of information verification tools
in information retrieval systems include:
• the task of matching text strings;
• the problem of calculating the distance between text strings;
• the problem of fuzzy match text strings;
• the task of finding the longest repeating text substring.</p>
      <p>The algorithms for text string analysis are presented in the works of domestic and foreign
researchers, in particular Stephen G.A., Knuth D.E., Morris J.H., Pratt V.R., Karp R.M., Rabin, M.O.,
Boyer R.S., Moore J.S., Fischer M.J., Hirschberg D.S., Hunt J.W., Szymanski T.G., Landau G.M.,
Vishkin U. Hamming R.W., Levenshtein V.I.</p>
      <p>
        The task of determining the browser capabilities based on the analysis of the User-Agent HTTP
header is reduced to the task of matching text strings, which are used as search patterns and incoming
requests. Algorithms for matching search patterns with input strings have many important applications
and are used in antivirus software, systems for detecting various types of attacks and intrusion
prevention [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], in-text compression, text search, data mining, programming grammar checking rules [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
etc. For example, different types of attacks can be defined using rules, which are regular expressions
that match a set of possible strings. Algorithms for working with regular expressions require
preprocessing which is very memory-intensive and time-consuming. An example of the application of the
algorithm for matching search patterns with input strings is the study of DNA sequences, where
wildcards are used as a replacement for some components of protein sequences.
      </p>
      <p>Research on the problem of comparing search patterns with text can be classified in the following
areas:
• the case of one pattern and one string;
• the case of patterns with a fixed number of wildcards or additional restrictions on the number
of patterns;
• the case of patterns with the possibility of a flexible task of wildcards, for example, the task of
the minimum and maximum break length.</p>
      <p>
        In the general case, the task of matching text strings requires localizing all occurrences of a pattern
in the text. In paper [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] the widely used multiple string patterns matching algorithms have been analyzed
and discussed. The main algorithms for matching text strings include the Knuth-Morris-Pratt [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
Rabin-Karp [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] algorithms, the Boyer-Moore algorithm [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and its variations. One of the most efficient
algorithms for finding substrings in a string is the Aho-Corasick algorithm [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which groups all strings
from a dictionary into a prefix tree and then converts it into a finite state machine in which the search
is performed. The operating time is proportional O(m + n + k ) , where n is the length of the pattern
string, m is the total length of the dictionary strings, k is the length of the answer string, that is, the
total length of occurrences of words from the dictionary into the pattern string. The Rabin-Karp search
algorithm uses hashing to find words from the dictionary in the text. For text length n and m strings
with length | P | , execution time is O(n+ | P |) a memory cost O(m) .
      </p>
      <p>
        The Commentz-Walter algorithm [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is used to search for multiple patterns at text, which combines
the ideas of Boyer-Moore algorithms and Aho-Corasick. The time complexity of the search stage is
O(nlmax ) , where n is the length of the string, lmax is the length of the longest string from the
dictionary. The Wu-Manber string-matching algorithm [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], based on the idea of the Boyer-Moore
algorithm. The Wu-Manber algorithm is considered the fastest multi-pattern string-matching algorithm
in practice. The main difference is that the algorithm does not consider individual characters, but blocks
of characters of a given length. The disadvantage of this algorithm is the slowing down of its speed with
the increasing number of strings in the dictionary.
      </p>
      <p>
        The Fischer-Paterson algorithm [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] for the string matching is based on the Fast Fourier Transform
(FFT) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and has a time estimate of complexity O(n log m logσ ) , where σ is the size of the alphabet
Σ . Indyk's randomized algorithm [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] uses FFT to calculate the convolution between the pattern and
text with the time estimate O(n log n) . In [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], the optimal algorithm with time complexity
O(n + m + occ) is proposed, where occ is the number of pattern’ occurrences in the text. Algorithms
by K. Barton and K. Iliopoulos [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] use wildcards either only in the text or only in the patterns. The
where g is the number of wildcards in the pattern.
      </p>
      <p>
        There are matching algorithms, which use bit parallelism to speed up the algorithm for text search
patterns. Operations on the bits of the same machine word are performed in parallel. An example of
using bit parallelism in search algorithms is the bitap algorithm (also known as the Shift-Or, Shift-And
or Baeza-Yates–Gonnet algorithm) [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. This algorithm is an approximate string-matching algorithm.
The approximate equality of a substring to a given pattern is determined in terms of the Levenshtein
distance. In [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], an algorithm for matching a pattern with a text, based on a directed acyclic word graph
(DAWG) is described. A directed acyclic graph represents the suffixes of a given string in which each
edge is labelled with a character. The characters along a path from the root to a node are the substring,
which the node represents. An example of algorithms with several search patterns that have a fixed
number of wildcard characters is an algorithm based on the Hamming distance [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] between bit vectors.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref20 ref21 ref22">20-22</xref>
        ] algorithms have proposed that use a finite automaton according to the Aho-Сorasick
log(k)
algorithm and dynamic marking of ancestor nodes. The time complexity is O((| P | + | t |)
loglog(k)
limitations of these algorithms are using one copy of the text, only wildcards of the form "*", and costly
operations of construction and modification of automata. Algorithm [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] works with wildcards that
specify a range of characters. It is based on the suffix tree. The estimation of the time complexity of
) . The

this algorithm is O n + m

n l−1 
      </p>
      <p>∑Gi  , where n is the length of the input text, m is the number of search
| Σ | i=0 
patterns, | Σ | is the number of characters in the alphabet Σ , Gi is the average length of the i -th range
in the search pattern. The time complexity of this algorithm remains quite large and depends on the
specified ranges of wildcards.</p>
      <p>Thus, the existing methods have a number of disadvantages, with a large number of search patterns,
they work very slowly, poorly scalable for large templates and texts, which makes them unsuitable for
solving the problem.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Algorithm for matching search patterns with incoming query strings</title>
    </sec>
    <sec id="sec-5">
      <title>4.1. Description of the method for solving the problem</title>
      <p>
        To solve the problem of matching search patterns with input strings, an algorithm based on the prefix
tree [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] as a data structure and a finite automaton built on a modified Aho-Corasick algorithm is
proposed. The prefix tree is built based on input patterns according to the compressed prefix tree [25].
Each pattern is represented as a string of keywords and special characters "*" and "?" between them.
Each edge in the prefix tree contains information about the transition keyword and a special character
that precedes it. The tree is built only once when obtaining search patterns and is unchanged during the
further operation of the algorithm. To reduce the memory consumption, each keyword in the set W of
all keywords in the set P of the patterns is assigned a character from the alphabet ∆ , i.e. there is a
bijective mapping such that f :δ j → w j ; g : w j → δ j .
      </p>
      <p>An automaton built according to the Aho-Corasick algorithm is used to quickly search for keywords
in the text of incoming requests. Terminal links are added to the automaton to speed up the search for
keywords that are prefixes to other keywords. This allows you to find all keywords in the text S within
a linear time of the number of the keywords in the text. Links to nodes of the prefix tree will be added
to the final states of the automaton, which may correspond to the beginning of some search pattern.
This will cut off options that are not possible in advance. When a new string S is received, it is scanned
from left to right and at the same time, a set of prefixes of possible search patterns is constructed, using
information about transitions in the prefix tree.
4.2.</p>
    </sec>
    <sec id="sec-6">
      <title>Description of the data structure</title>
      <p>A prefix tree (trie) is a data structure in the form of a tree, in which the path from the root to leaf
defines a string. The strings with the same prefixes have a common path from the root with the length
of that prefix. The prefix tree allows you to store an associative array, the strings of which are the keys.
Unlike binary trees, the key is not stored in the tree leaf. The key value can be obtained by looking at
all parent nodes, each of which stores one or more alphabet characters. The root of the tree is associated
with an empty string. An internal node will contain a key if that key is a prefix of some other key in
that tree. The prefix tree can be compressed to optimize memory usage. Nodes that have only one child
are compressed. In this case, the transitions can contain not one character, but a whole string that
corresponds to the transition.</p>
      <p>Since prefix trees are built in such a way that all keys with a common prefix have common nodes
that correspond to this prefix, this makes it possible to search without having a full key, but only its
prefix. The result will be a set of possible keys that are stored in the tree.
4.3.</p>
    </sec>
    <sec id="sec-7">
      <title>Practical implementation options</title>
      <p>The main difference between the implementations is the way of storing transitions between nodes.
The chosen method depends on what needs to be optimized: runtime, memory consumption, or
something else. For example, for a small alphabet, it may be advisable to store each node in an array in
which the transitions for each of the characters in the alphabet will be recorded. This will make the
access time to the next node as small as possible O(1) , but will increase memory consumption since
not all characters of the alphabet will contain transitions. You can also use a red-black search tree [26],
which will not increase memory consumption and will give a logarithmic access time O(log x) to the
next node. Another alternative is the use of a hash table, which makes it possible to perform an operation
of accessing an element in time O(1) with an asymptotic estimate of the total memory cost with an
ordinary array. Although it is possible that the search operation will be performed on O(x) , which is
influenced by the fullness of the table and the choice of hashing methods.
4.4.</p>
    </sec>
    <sec id="sec-8">
      <title>Construction of the automaton based on the Aho-Corasick algorithm</title>
      <p>The Aho-Korasik algorithm is a string search algorithm that allows you to find the elements of a
certain dictionary in an input string. The algorithm finds a match as it reads the input string and returns
matches found. The algorithm does not need to know the whole string in advance.</p>
      <p>At the beginning of the algorithm, a finite state automaton is constructed based on strings from the
dictionary. This automaton is a prefix tree with additional links, which are called suffix links. Such
links indicate the longest suffix of the current state-string that exists in this automaton. The link
indicates the initial state (root) in the absence of such suffix.</p>
      <p>The behaviour of the automaton can be described by three functions: the transition function, the
failure function, the output function. The transition function for each next input character of the line
indicates in what state the transition needs to be made or informs that there are no such transitions. The
failure function is used if there are no transitions. This function works as follows: while there is no
transition from the state to the transition function, we make transitions by suffix links. If necessary,
these transitions with suffix links are repeated until the initial state is reached. The output function
allows you to check the correspondence of each state of the automaton to a certain string in the
dictionary and returns it as a result.</p>
      <p>Additional links (final or valid) can be added to speed up the search for dictionary strings in the
input string. These links for each state of the automaton will point to another state, the string of which
is the longest suffix of the current state-string that matching to some string in the dictionary. Suffix
references can be pre-calculated during the construction of the automaton. This will allow you to find
quickly all the words in the dictionary that are in the current input string and end with the given character
while processing the next character. If the dictionary is known in advance, the construction of the
automaton can be performed only once. The constructed automaton can be reused by transition to the
initial state for each new input string.
4.5.</p>
    </sec>
    <sec id="sec-9">
      <title>Description of algorithm steps</title>
      <p>The algorithm can be divided into two parts: the algorithm for pre-processing search patterns and
the algorithm for processing incoming query strings. Consider the steps of the proposed algorithm.</p>
      <p>Step 1. Let's build a set of all unique keywords W . For each i -th search pattern pi we add kij to
m
the set W : W = {w j | j ∈ J} = {kij | j = 1,li }.</p>
      <p>i =1</p>
      <p>Step 2. Each keyword w j ∈W is matched with a number δ j to obtain a bijective reflection:
f :δ j → w j ; g : w j → δ j .</p>
      <p>Step 3. Let's build a set Β of converted search patterns, by replacing keywords in pi with characters
of the alphabet
∆ :| ∆ |=| W | and removing the character
ci1
in case of its presence:
Β = {bi = g (ki1)ci2 ...g (kili ) | i = 1, m}.</p>
      <p>Step 4. Based on the set Β , we will construct a prefix tree T as follows. Each prefix g (kij ) will be
considered as an ordinary symbol cij of the alphabet ∆ , if it is equal to "*" or absent. In this case, we
will add an edge with a symbol to this prefix tree. Since the size of the alphabet can be quite large, in
each of the nodes we will store a hash table with a list of edges, which are symbols g (kij ) , as a search
key. This will allow you to quickly make transitions along specific edges, which are symbols of the
alphabet ∆ . If symbol cij equal to "?", we will additionally notice such edges to distinguish them from
ordinary edges.</p>
      <p>Step 5. Let's build a finite automaton according to the Aho-Corasick algorithm based on keywords
from W . For each state that corresponds to a certain keyword w j , we will additionally save number
δ j . Transitions for each of the states by alphabetic characters Σ will be stored in the hash table.</p>
      <p>Step 5.1. Moving along the suffixed links, for all states of the automaton, we will preliminarily
calculate the "final" links termLink to the nearest of the states, from which it is possible to get to the
state corresponding to a certain keyword from W .</p>
      <p>Step 5.2. For states of the automaton matching to a keyword w j such that it is present in any search
pattern at the first position and does not have a wildcard character that precedes it, we will store a link
headLink to the corresponding node in the prefix tree T , which can be found by moving from the root
of T along the edge g(w j ) .</p>
      <p>Step 5.3. For states of the automaton matching to a keyword w j such that it is present in any search
pattern at the first position and has a wildcard character "*" that precedes it, we will additionally store
a link looseHeadLink to the corresponding node in the prefix tree T , which can be reached by moving
from the root T along the edge g (w j ) .</p>
      <p>After performing pre-processing of the search patterns (the first five steps of the algorithm), the
algorithm can accept text analysis requests − input string S . For each individual string, you need to do
the following:</p>
      <p>Step 6. Create an empty array q in which to store a pair of values: the index of string S , and the
links to the trie T node. At the beginning of the operation, the state of the automaton corresponds to the
root node curState = rootState .</p>
      <p>Step 7. Scan the string S from left to right. For the current character sidx ,idx = 1, n , do the following:
Step 7.1. Let's move from the current state of the automaton to the new one by symbol sidx , using
the transition function in the Aho-Corasick algorithm, and update curState .</p>
      <p>Step 7.2. If the current state of the automaton matches some keyword w j and | w j |= idx , then add a
pair of values (idx,curState.headLink ) to the array q .</p>
      <p>Step 7.3. If the current state has a reference to looseHeadLink , then add a pair
(idx,curState.looseHeadLink ) to the array q .</p>
      <p>Step 7.4. If the current state of the automaton matches some keyword w j and | w j |≠ idx , then
lb = idx− | w j | . For all pairs from an array q such that qi .idx ≤ lb it is necessary to check up, whether
the transition from a tree T node along an edge g(w j ) .</p>
      <p>Step 7.4.1. If the transition exists and the edge is not marked with the symbol "?", make the transition
on it, adding a pair idx and a node to the array q .</p>
      <p>Step 7.4.2. If the transition exists and the edge is marked with the symbol "?", we will additionally
check that qi .idx = lb − 1 and add the pair idx and the node to which we have passed, to the array q .</p>
      <p>Step 7.4.3. If a new pair of values has been added to q , check that the tree node in that pair matches
some search pattern. If so, we will add this search template to the answer A .</p>
      <p>Step 7.5. If the current state has a link termLink , go to this link and return to step 7.3.
Step 8. Return the set of found search patterns A and finish the algorithm.</p>
      <p>As you can see, in the process of the algorithm, we move all possible paths in the prefix tree T , and
this ensures that all search patterns are found in the string S , because they are all in T .</p>
    </sec>
    <sec id="sec-10">
      <title>5. Test case</title>
      <p>Let there be an alphabet Σ = {c1,c2 ,c3,c4 ,c5}. The input is set P = { p1, p2 , p3, p4 , p5 , p6 , p7}search
patterns. Patterns are as follows: p1 = *c1 * c1c3 ; p2 = *c1 ; p3 = c1c2 * c4c2c5 ; p4 = c2c5 * c1c3 * c4c2c5 ;
p5 = c1c2 * c2c5 * c4c2c5 ; p6 = c2c5 * c1 * c2c5 ; p7 = c2c5 * c4c2c5 .</p>
      <p>Select keywords from search patterns and get the set W = {w1, w2 , w3, w4 , w5} of keywords as: w1 = c1;
w2 = c1c2 ; w3 = c1c3 ; w4 = c2c5 ; w5 = c4c2c5 .</p>
      <p>Let's build a set Β = {b1,b2 ,b3,b4 ,b5 ,b6 ,b7} of converted search patterns based on keywords. The
converted search patterns are of the form: b1 = *w1 * w3 ; b2 = *w1 ; b3 = w2 * w5 ; b4 = w4 * w3 * w5 ;
b5 = w2 * w4 * w5 ; b6 = w4 * w1 * w4 ; b7 = w4 * w5 .</p>
      <p>We construct a prefix tree T , using the set Β = {b1,b2 ,b3,b4 ,b5 ,b6 ,b7} (figure 1).</p>
      <p>Using a set of keywords W = {w1, w2 , w3 , w4 , w5} and the prefix tree T , we construct a finite
automaton according to the modified Aho-Corasick algorithm (figure 2). To simplify the image of the
automaton, suffix links that lead to the initial state are not shown in the figure.</p>
      <p>Then comes the input string S = c2c5c5c5c1c2c4c3c3c4c2c5c2c1c3c2c5 . Let's start moving from left to
right along the string S . Initially, the array q is empty. The elements added to q can be considered an
implicit construction of the subtree of the prefix tree T , which for each subsequent character
si ∈ S , i = 1, m contains all the prefixes of the tree T .</p>
      <p>When reading the second character from the input string, we go into the state state7 of the
automaton, which corresponds to the keyword w4 . This keyword w4 is labelled hasHeadLink , which
means it can be the first keyword in the search pattern and must be at the beginning of the string S .
Since the link to the corresponding node in the prefix tree T was saved in the automaton, let's add the
node v3 to the array q : q = {(2, v3 )} .</p>
      <p>Consider the prefix tree after processing the fifth character. The automaton is in a state state1 that
matches the keyword w . This state has a final link terminalLink , the keyword has a label
1
looseHeadLink , and therefore can be the beginning of a search pattern in any part of the input string
S . We will add a node v1 that matches this initial keyword to the array q . As we see v1 corresponds to
the full search pattern p2 . We will add it to the answer. We will also check the presence of transitions
from the nodes added to the array q by the keyword w1 . Since there is a transition from the node v3 to
the node v7 , add these nodes to array q (Figure 4) and have the array in the form
q = {(2, v3 ), (5, v1), (5, v7 )} .</p>
      <p>When processing the last character of the string S , the automaton is in a state state8 that matches
to the keyword w . Among the elements of the array q is a node v8 , from which it is possible to go
5
along the edge w5 to the node v12 . The node v12 matches the search pattern p4 . Let's add this match
to the answer. The state state8 has a final reference to the state state7 that matches the keyword w4 .
But among the elements q there is no node from which you can go along the edge w4 to a new, not yet
visited node. At the end of the algorithm on the last character, the array q looks like this (Figure 5):
q = {(2, v3 ), (5, v1), (5, v7 ), (12, v9 ), (12, v11), (15, v4 ), (15, v8 ), (18, v12 )}</p>
      <p>As a result, five search patterns were found in the input string S : A = { p2 , p7 , p6 , p1, p4} . When
executing the algorithm, we did not change the prefix tree, but only used information about the
transitions. The automaton also remained unchanged, only the transition from state to state was
performed. Since during the operation of the algorithm to find keywords in a string, an automaton is
built according to the Aho-Corasick algorithm, this guarantees that all keywords are found, i.e. when
parsing a string, we will not miss a single keyword.</p>
      <p>Because the Aho-Corasick algorithm is used to find keywords in the string S when the algorithm is
running, this guarantees that all keywords will be found, i.e. we will not miss a single keyword when
parsing the string S . When processing a string S , we implicitly build a subtree of the prefix tree T ,
checking for each subsequent keyword the ability to increase the subtree by adding nodes that exist in
T . In the process of the algorithm, we move all possible paths in the prefix tree T , and this ensures
that all possible search patterns are found in the string S , because they are all in T .</p>
    </sec>
    <sec id="sec-11">
      <title>6. Analysis of algorithm execution time and memory consumption</title>
      <p>The assessment of the complexity of the algorithm can be divided into two parts: the assessment of
the pre-processing of search patterns, which is performed only once when the program is initialized,
and the assessment of the processing of the next string coming to the program during its operation.</p>
      <p>Consider first the stage of pre-processing of search patterns. Let's construct the set of all keywords
in time O(| M |) , where | M | is the total length of all search patterns. Let K is the total number of
keywords. Then the complexity of constructing the set Β at step 3 is O(| M | +K log | W |) . Building a
prefix tree will take O(K log | W |) . Estimation of the time of construction of the finite automaton in
step 5 is O(| M | log | Σ |) . That is, in general, given the relationships between variables, the total score
will not exceed O(| M | log | W |) . In this case, the memory consumption is linear with respect to the
total length of the search patterns, i.e. have an estimate O(| M |) .</p>
      <p>Now let us move on to evaluating the time and memory of processing a string S that comes during
program run. The algorithm goes through all the instances of the found keywords in the string. Let their
number be equal occ . For each found keyword, we check all possible states in q , from which it is
possible to navigate by this keyword. The number of such states is the number of prefixes in T that
were found in the part of the string preceding our keyword. That is, the time complexity of the algorithm
can be estimated as O(occ × prefnum × log | W | +anslen) . For the worst case, number of prefixes
prefnum can be estimated as O(| T |) if we do not count duplicates in the algorithm as possible different
answers. The bottom score in the case where there are no matches with the prefixes T is Ω(occ) . We
can conclude that the algorithm should work well for our data type, when the input strings S are not
overloaded with keywords, and the search patterns for the most part are not subsequences of each other.</p>
      <p>During the pre-processing of search patterns at the first and second steps, memory is spent O(| W |)
on building keywords. At the third step, memory is spent O(| M |) to build a set of transformed search
patterns. At the fourth step, memory is spent O(| M |) on building a prefix tree, and at the fifth step,
memory will be spent O(| W |) on building an automaton using the Aho-Corasick algorithm. So, the
total memory costs are linear with respect to the total length of the search patterns, that is, they have an
estimate O(| M |) . It should be noted that the actual memory consumption can be much less, it depends
on the number of unique keywords in the search patterns and the degree of similarity of the prefixes of
the search patterns. When processing a string S , memory is spent on support an array of nodes. That is,
memory consumption is linearly dependent on the number of prefixes O( prefnum) that were found in
the string.</p>
    </sec>
    <sec id="sec-12">
      <title>7. Software implementation</title>
      <p>The software implementation of the developed algorithm is carried out in the Java programming
language. The best library in terms of user-agent string analysis speed, which uses a complete list of
search patterns from the browscap database [27], was chosen as an analogue for comparison. Software
to determine the capabilities of the browser is created using modern tools. Cross-platforming is achieved
through the use of a Java virtual machine (JVM), which implements the principle of "write once, run
anywhere". The SBT system for automatically compiling projects written in Scala and Java is used to
manage the dependencies and plug-ins required for any particular type. The OpenCSV library is used
to read patterns, test user-agent instances, and parse CSV files. Comparison of the results of the
developed program on the test case with the results of the library-analogue browscap-php is carried out
using the JUnit framework. The Browscap-php library, which uses the browscap resource, checks the
correctness of the program and returns information about the capabilities of the user's browser. JMH
tools [28], which are one of the best for measuring the execution time of Java code in conditions close
to real, were used for benchmarking. In this work, JMH is used to compare the speed of the program
with similar programs.</p>
      <p>The main program is presented in three packages: Trie, Parser and Capabilities. The Trie package is
responsible for constructing and presenting compressed information about all input patterns in the form
of trees. These are the steps of the offline operation of the algorithm. The Parser package is responsible
for the main operation of the program, i.e. implements the online steps of the algorithm. The Capabilities
package contains classes and interfaces for presenting information about browser capabilities.</p>
      <p>The structure of the program consists of five components (figure 6).</p>
      <p>The User Agent DataBase component is responsible for presenting and interfacing browser features
and capabilities, preserving the relationship between search templates and the browsers that match
them. The Pattern Trie component implements and maintains a data structure in the form of a prefix
tree, which is built at the stage of pre-processing of search patterns and is used when processing
incoming requests (rows). The Automaton component implements and maintains a finite automaton
built according to the modified Aho-Corasick algorithm at the stage of pre-processing of search
patterns, which is used in the processing of incoming requests (strings). The Patterns Preprocessor
component is responsible for the initial processing of search patterns and builds a prefix tree and
automaton. The User-Agent processor handles incoming requests (strings), works with the Pattern Trie
and Automaton components to find matches among the search patterns with the input string.</p>
      <p>Consider the input query as a string:
«Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/59.0.3071.115 Safari/537.36»</p>
      <p>The program under consideration analyzes the User-Agent string and displays the following results
(figure 7):</p>
      <p>The developed program displays full information about the used device, operating system, and the
browser. The parsed User-Agent string corresponds to the Chrome browser installed on MacOS.
Browser features like JavaScript, iframe, and others are listed.</p>
    </sec>
    <sec id="sec-13">
      <title>8. Results and discussion</title>
      <p>Problem description. Research in the field of word processing and analysis methods was performed,
in particular, a review of methods and algorithms for comparing search patterns with instances of texts
to solve the problem of determining the capabilities of the browser was done.</p>
      <p>Solution methods. Our own algorithm for solving the problem of comparing search patterns with
input strings was developed based on the analyzed research. The strengths of the studied algorithms
and their shortcomings were taken into account in the development of the algorithm. As a result, the
developed algorithm received the best known estimate of the operating time from below.</p>
      <p>Results. The idea of using a prefix tree and a finite automaton built on the Aho-Corasick algorithm
to process incoming string queries were put forward. These data structures have been modified and
interconnected to achieve this goal. Testing the correctness of the software application was carried out
using the results of the official library browscap-php based on browscap data.</p>
      <p>The list of thousands of the most popular User-Agents, requests from which entered the analytical
system, was used as a corpus for testing.</p>
      <p>A prototype library to determine the capabilities of the browser was developed on the basis of the
developed algorithm. The speed of analysis of User-Agent strings collected in the browscap.org
database is determined by 6,978±0,046 query processing per millisecond. Given that the database
contains more than 62,5 million records of User-Agent strings, the search for the desired User-Agent
can take 30 seconds or longer. The running time of the developed application was comparable for
different real User-Agent strings. A significant advantage in the speed of processing strings by the
developed program was found in comparison with the browscap-java library [27]. The proposed
software implementation performs 40.584±0.415 string processing per millisecond. During the
benchmarking and testing of the developed library, the efficiency of its work and a significant advantage
in speed compared to the nearest competitor was demonstrated, namely a difference of 7 times. The
results of benchmarking are presented in table 1.</p>
      <p>The value of the results. For businesses that operate on the Internet, the accuracy and speed of the
results are very important for further effective work with users. The need to develop fast methods for
matching a large number of search patterns (more than 200,000) with User-Agent instances (more than
62.5 million User-Agent records) is relevant.</p>
      <p>Information about the source of the request may be necessary to solve the following tasks:
• redirecting requests to the mobile version;
• using specific styles for specific browsers;
• collecting statistics on the number of requests from different devices;
• creation of special rules for processing requests from robots;
• prohibition of access to the site for any web utilities and etc.</p>
    </sec>
    <sec id="sec-14">
      <title>9. Conclusions</title>
      <p>The problem of insufficient speed of existing libraries to determine the capabilities of the user's
(client's) browser, which uses the browscap database, was formulated. A detailed analysis of the
literature, available research and developments in the field of matching search patterns with text is
made. An algorithm for matching an arbitrary number of search patterns with text instances in real-time
has been developed. The time complexity of the algorithm and memory consumption were analyzed.
The software implementation has shown a significant performance advantage over existing analogues
for identifying browsers capabilities.</p>
      <p>Further research can be done to optimize memory consumption during the operation of the
algorithm, as well as to reduce the pre-processing time of search patterns in the stage of initialization
of the library. You can also extend the ability to specify wildcards, such as making it possible to specify
ranges of the number of characters that can be substituted.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>User</given-names>
            <surname>Agents Database</surname>
          </string-name>
          ,
          <year>2020</year>
          . URL: https://developers.whatismybrowser.com/useragents/database/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Browser</given-names>
            <surname>Capabilities Project</surname>
          </string-name>
          ,
          <year>2020</year>
          . URL: https://browscap.org/
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sommer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kreibich</surname>
          </string-name>
          ,
          <source>Recent Advances in Intrusion Detection: 13th International Symposium, RAID</source>
          <year>2010</year>
          , Ottawa, Ontario, Canada,
          <source>September 15-17</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Yakushev</surname>
          </string-name>
          ,
          <source>Virtual Machine for Regular Expressions</source>
          ,
          <year>2018</year>
          . URL: https://www.slideshare.net/AlexanderYakushev1/
          <article-title>virtual-machine-for-regular-expressions.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Islam</surname>
          </string-name>
          ,
          <article-title>Analysis of Multiple String Pattern Matching Algorithms</article-title>
          ,
          <source>International Journal of Advanced Computer Science and Information Technology</source>
          ,
          <volume>3</volume>
          :
          <issue>4</issue>
          (
          <year>2014</year>
          )
          <fpage>344</fpage>
          -
          <lpage>353</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Knuth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Morris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. R.</given-names>
            <surname>Pratt</surname>
          </string-name>
          ,
          <article-title>Fast pattern matching in strings</article-title>
          .
          <source>SIAM Journal on Computing</source>
          ,
          <volume>6</volume>
          :
          <fpage>2</fpage>
          , (
          <year>1977</year>
          )
          <fpage>323</fpage>
          -
          <lpage>350</lpage>
          , doi: 10.1137/0206024.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Karp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. O.</given-names>
            <surname>Rabin</surname>
          </string-name>
          ,
          <article-title>Efficient randomized pattern-matching algorithms</article-title>
          .
          <source>IBM Journal of Research and Development</source>
          ,
          <volume>31</volume>
          :
          <fpage>2</fpage>
          , (
          <year>1987</year>
          )
          <fpage>249</fpage>
          -
          <lpage>260</lpage>
          , doi: 10.1147/rd.312.0249.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Boyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Moore</surname>
          </string-name>
          ,
          <article-title>A fast string searching algorithm</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>20</volume>
          :
          <fpage>10</fpage>
          (
          <year>1977</year>
          )
          <fpage>762</fpage>
          -
          <lpage>772</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A. V.</given-names>
            <surname>Aho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Corasick</surname>
          </string-name>
          ,
          <article-title>Efficient string matching: an aid to bibliographic search</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>18</volume>
          :6 (
          <year>1975</year>
          )
          <fpage>333</fpage>
          -
          <lpage>340</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>B.</given-names>
            <surname>Commentz-Walter</surname>
          </string-name>
          ,
          <article-title>A string matching algorithm fast on the average</article-title>
          . In: Maurer H.A. (eds) Automata,
          <article-title>Languages and Programming</article-title>
          .
          <source>ICALP</source>
          ,
          <volume>71</volume>
          (
          <year>1979</year>
          ). Springer, Berlin, Heidelberg, doi:10.1007/3-540-09510-1_
          <fpage>10</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Manber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A Fast</given-names>
            <surname>Algorithm for Multi-Pattern</surname>
          </string-name>
          <string-name>
            <surname>Searching</surname>
          </string-name>
          ,
          <source>Technical Report TR-94- 17</source>
          Department of Computer Science, University of Arizona,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>M. J. Fischer</surname>
            ,
            <given-names>M. S.</given-names>
          </string-name>
          <string-name>
            <surname>Paterson</surname>
          </string-name>
          , String Matching and
          <article-title>Other Products</article-title>
          . In: Complexity of Computation,
          <string-name>
            <surname>SIAM-AMS</surname>
            <given-names>Proceedings</given-names>
          </string-name>
          , (
          <year>1974</year>
          )
          <fpage>113</fpage>
          -
          <lpage>125</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Nussbaumer</surname>
          </string-name>
          ,
          <source>Fast Fourier transform and convolution algorithms</source>
          . Springer Science &amp; Business
          <string-name>
            <surname>Media</surname>
          </string-name>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Indyk</surname>
          </string-name>
          ,
          <article-title>Faster algorithms for string matching problems: Matching the convolution bound</article-title>
          .
          <source>In: Proceedings of the 39th Annual Symposium on Foundations of Computer Science</source>
          , IEEE, (
          <year>1998</year>
          )
          <fpage>166</fpage>
          -
          <lpage>173</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Rahman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Iliopoulos</surname>
          </string-name>
          ,
          <article-title>Pattern matching algorithms with don't cares</article-title>
          .
          <source>In: Proceedings of the 33rd Conference on Current Trends in Theory and Practice of Computer Science</source>
          , SOFSEM, (
          <year>2007</year>
          )
          <fpage>116</fpage>
          -
          <lpage>126</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>C.</given-names>
            <surname>Barton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Iliopoulos</surname>
          </string-name>
          ,
          <article-title>On the average-case complexity of pattern matching with wildcards</article-title>
          ,
          <source>CoRR</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>R.</given-names>
            <surname>Baeza-Yates</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. H.</given-names>
            <surname>Gonnet</surname>
          </string-name>
          ,
          <article-title>A new approach to text searching</article-title>
          ,
          <source>Communications of the ACM</source>
          , (
          <year>1992</year>
          )
          <fpage>74</fpage>
          -
          <lpage>82</lpage>
          , doi: 10.1145/135239.135243
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>G.</given-names>
            <surname>Navarro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Raffinot</surname>
          </string-name>
          , Flexible Pattern Matching in Strings: Practical On-line
          <source>Search Algorithms for Texts and Biological Sequences</source>
          , Cambridge: Cambridge University Press,
          <year>2002</year>
          , doi: 10.1017/CBO9781316135228
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <article-title>Distance and similarity measures for hesitant fuzzy sets</article-title>
          .
          <source>Information Sciences</source>
          ,
          <volume>181</volume>
          :
          <fpage>11</fpage>
          (
          <year>2011</year>
          )
          <fpage>2128</fpage>
          -
          <lpage>2138</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>G.</given-names>
            <surname>Kucherov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rusinowitch</surname>
          </string-name>
          ,
          <article-title>Matching a set of strings with variable length don't cares</article-title>
          ,
          <source>Theoretical Computer Science</source>
          ,
          <volume>178</volume>
          (
          <year>1997</year>
          )
          <fpage>129</fpage>
          -
          <lpage>154</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Yi Zhang, L.Zuo
          <string-name>
            <surname>Hu</surname>
          </string-name>
          ,
          <article-title>A faster algorithm for matching a set of patterns with variable length don't cares</article-title>
          .
          <source>Information Processing Letters</source>
          ,
          <volume>110</volume>
          :6 (
          <year>2010</year>
          )
          <fpage>216</fpage>
          -
          <lpage>220</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>T.</given-names>
            <surname>Haapasalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Silvasti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sippu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Soisalon-Soininen</surname>
          </string-name>
          ,
          <article-title>Online dictionary matching with variable-length gaps</article-title>
          .
          <source>In: International Symposium on Experimental Algorithms</source>
          . Springer, Berlin, Heidelberg, (
          <year>2011</year>
          )
          <fpage>76</fpage>
          -
          <lpage>87</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Na</surname>
            <given-names>Liu</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Fei</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          <article-title>. Multi-pattern matching with variable-length wildcards using suffix tree</article-title>
          .
          <source>Pattern Analysis and Applications</source>
          ,
          <volume>21</volume>
          (
          <year>2018</year>
          )
          <fpage>1151</fpage>
          -
          <lpage>1165</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sartaj</surname>
          </string-name>
          ,
          <article-title>Handbook of data structures and applications</article-title>
          . 2nd ed.,
          <source>Chapman</source>
          and Hall/CRC,
          <year>2018</year>
          .
          <string-name>
            <given-names>D.R.</given-names>
            <surname>Morrison</surname>
          </string-name>
          , Practical Algorithm to Retrieve Information Coded in Alphanumeric.
          <source>Journal of the ACM</source>
          ,
          <volume>15</volume>
          :4 (
          <year>1968</year>
          )
          <fpage>514</fpage>
          -
          <lpage>534</lpage>
          . T. Cormen,
          <string-name>
            <given-names>C.</given-names>
            <surname>Leiserson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rivest</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Stein</surname>
          </string-name>
          , Introduction to Algorithms, 3rd ed., MIT Press, Cambridge, Massachusetts,
          <year>2009</year>
          ,
          <fpage>308</fpage>
          -
          <lpage>309</lpage>
          . browscap-java,
          <year>2020</year>
          , URL: https://github.com/blueconic/browscap-iava.
          <source>Java Microbenchmark Harness</source>
          , URL: https://openidk.java.net/proiects/code-tools/imh/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>