<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A combined method of similar code sequences search in executable files</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>A S Yumaganov</string-name>
          <email>yumagan@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Samara National Research University</institution>
          ,
          <addr-line>Moskovskoye shosse, 34, Samara, Russia, 443086</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>394</fpage>
      <lpage>400</lpage>
      <abstract>
        <p>The article is devoted to the development of a method of searching for similar code sequences in executable files, which is based on both syntax analysis of the code and analysis of the function control flow graph. The syntax analysis method used in this paper is based on a comparison of the spatial distribution of processor instructions in the function body. The analysis of the function control flow graph uses a structural description of fixed-order subgraphs of this graph. The results of experimental studies, including a comparison of the proposed method with previously known methods of searching for similar code sequences, are presented.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The problem of finding similar code sequences in executable files is highly relevant today.
According to the studies presented in [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ], developers often reuse previously written program code
when developing new software. This approach to software development
is called code reuse. Despite its obvious advantages, it can also introduce errors
and vulnerabilities into the software being developed. In addition, reuse of third-party code may
be illegal. Code reuse is also common in the development of malicious programs [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
Thus, finding similar code sequences in executable files helps to solve
several problems: detecting known vulnerabilities, detecting software plagiarism, and searching
for malware.
      </p>
      <p>Currently, there are many methods of searching for similar code sequences in executable
files. Known algorithms and methods for solving this problem are usually based either
on syntactic analysis of the assembler program code or on analysis of the structure of the
control flow graphs of the executable file functions.</p>
      <p>
        Let us consider the methods based on syntax analysis of the code. In [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ], two closely related
methods of searching for similar code sequences were presented. These methods are based on comparing
sequences or permutations of processor instructions of a fixed length (k-grams or n-perms) in
functions. The IDA disassembler uses the IDA FLIRT algorithm [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to identify library functions;
this algorithm is based on a comparison of function patterns. The author of [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] presented a method for
detecting malware based on the analysis of the frequency of occurrence of processor instructions
inside the examined file. The search quality of this group of methods
is significantly affected by syntactic code changes: replacing processor instructions
with equivalent ones, rearranging instructions, inserting new instructions, and deleting old ones.
Methods based on the analysis of the function control flow graph make it possible to overcome this
drawback. The authors of the method presented in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] used information about basic blocks (the
vertices of the control flow graph) to detect and classify malware. The authors of [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] used a
method based on determining the isomorphism of control flow graphs to identify malware. A
similar approach was described in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], where malware detection was based on determining
the isomorphism of fixed-size subgraphs of the function and comparing the signatures (fingerprints)
of their basic blocks. However, this group of methods also has disadvantages: low search
accuracy for functions with a small number of basic blocks and high sensitivity to structural changes
in functions.
      </p>
      <p>In this paper, a combined method is proposed for finding functions of an executable file that are
similar to known functions from some software "archive". The
proposed method uses both syntactic analysis of the code and analysis of the function
control flow graph. In the presented method, the description of a function is formed from its
similarity to the functions of the basis library.</p>
      <p>The paper is structured as follows. Section 2 presents the basic definitions and a
brief description of the proposed method. Section 3 discusses how the
syntactic description of a function is obtained; Section 4 describes how the structural description
is obtained. Section 5 is devoted to the similar functions search
algorithm. Section 6 describes the technique used to evaluate the effectiveness of the proposed
method and the results of the experiments. The final part of the paper contains the conclusions and
a list of references.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Basic concepts and principle of operation</title>
      <p>The following definitions are used in this paper:
current library - the set of functions of the executable file under investigation;
archive data - the set of known functions;
basis library - an auxiliary set of functions used to compare the functions of the archive data
and the current library.</p>
      <p>
        Given the above definitions, the problem solved by the proposed method is
formulated as follows: for a given (or each) function of the current library, find the most similar
function in the archive data. In this paper, we use two definitions of the function similarity
measure. The first is based on the position of the functional groups of processor instructions
in the body of the function. The second is based on the analysis of the structure of the function
control flow graph. In the author's previously published works [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ], two methods
of searching for similar code sequences were presented, each based on one of the above
definitions of the function similarity measure. This paper presents a combined
method. In this method, the description of a function is formed
through its similarity to the functions of the basis library, using the two different definitions of
the function similarity measure described in [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ].
      </p>
      <p>The proposed method of searching for similar code sequences includes several stages. At the first
stage, the description of the archive data functions is formed through the library of basis
functions. For each function, two descriptions are obtained (syntactic and structural),
corresponding to the two different measures of function similarity. At the second stage, the description
of the current library functions is formed in the same way. At the final stage, the search for similar
functions is performed. The search algorithm is described in detail in the fifth section.</p>
    </sec>
    <sec id="sec-3">
      <title>3. The syntactic description of the function</title>
      <p>
        Using the IDA [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] disassembler, we can partition the assembler code of the analysed
executable file into functions. The assembler code consists of a sequence of processor instructions
and associated operands. All processor instructions can be divided into K functional groups
according to the type of operation they perform. Examples of such groups are
arithmetic instructions, logical instructions, and data transfer instructions.
For each of the K functional groups and a given function, we obtain a list of the offsets, relative to
the beginning of the function, at which the instructions of this group are located.
      </p>
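      <p>Let us define the spatial distribution of instructions of the k-th group as the absolute frequency
with which instructions of this group fall into the i-th relative normalized interval (I = 100):
        <disp-formula>
          <label>(1)</label>
          <tex-math>\tilde{f}_i^k = \sum_{j=0}^{N_k-1} I\left(\frac{n_j^k}{N}\cdot 100 \in (i-1,\, i]\right), \quad i = \overline{1, I},</tex-math>
        </disp-formula>
where <inline-formula><tex-math>n_0^k, \ldots, n_{N_k-1}^k</tex-math></inline-formula> are the absolute offsets, relative to the beginning of the function, of the instructions
of group k, <inline-formula><tex-math>N_k</tex-math></inline-formula> is the total number of instructions of this group in the function, N is the length
of the function, and <inline-formula><tex-math>I(\cdot)</tex-math></inline-formula> is the event indicator, which takes the value "1" if its
argument is true and "0" otherwise.</p>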
      <p>To obtain the spatial distribution of instructions in integral form, we use the following
formula:
        <disp-formula>
          <label>(2)</label>
          <tex-math>\hat{f}_i^k = \frac{\sum_{y=0}^{i} \tilde{f}_y^k}{\sum_{j=0}^{I} \tilde{f}_j^k}, \quad i = \overline{1, I}.</tex-math>
        </disp-formula>
      </p>
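      <p>The spatial distribution and its integral form can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the author's implementation: the function names spatial_distribution and integral_form are hypothetical, offsets are assumed to be integers in the range from 0 to N - 1, and an extra bin with index 0 is kept for an instruction at offset 0 so that the sums starting from index 0 in the integral form are well defined.</p>
      <preformat><![CDATA[
```python
from itertools import accumulate


def spatial_distribution(offsets, length, intervals=100):
    """Count the instructions of one group per normalized interval.

    hist[i] (i = 1..intervals) is the number of offsets n whose relative
    position n / length * intervals lies in (i - 1, i].
    hist[0] holds an instruction at offset 0 (an assumption of this sketch).
    Offsets are assumed to be integers with 0 <= n < length.
    """
    hist = [0] * (intervals + 1)
    for n in offsets:
        # Integer ceiling of n * intervals / length avoids float rounding.
        i = -(-n * intervals // length)
        hist[i] += 1
    return hist


def integral_form(hist):
    """Cumulative share of instructions up to interval i, for i = 1..I."""
    total = sum(hist)
    if total == 0:
        # No instructions of this group: return an all-zero description.
        return [0.0] * (len(hist) - 1)
    running = list(accumulate(hist))
    return [running[i] / total for i in range(1, len(hist))]


if __name__ == "__main__":
    # Hypothetical offsets of one instruction group in a 100-byte function.
    hist = spatial_distribution([0, 5, 10, 50], length=100, intervals=10)
    print(hist)
    print(integral_form(hist))
```
]]></preformat>
      <p>The integral form is non-decreasing and ends at 1 whenever the group occurs in the function, so small shifts of individual instructions change the description only slightly.</p>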
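      <p>Then, the spatial position of the k-th group of processor instructions in the body of the
function is described by the vector
        <disp-formula>
          <label>(3)</label>
          <tex-math>a_k = \left(\hat{f}_1^k, \hat{f}_2^k, \ldots, \hat{f}_I^k\right)^T, \quad k = \overline{0, K-1}.</tex-math>
        </disp-formula>
As a result, the description of the considered function has the following form:
        <disp-formula>
          <label>(4)</label>
          <tex-math>A = (a_0, a_1, \ldots, a_{K-1}).</tex-math>
        </disp-formula>
      </p>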
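      <p>The matrix B, which describes a function of the basis library, is formed in a
similar way. The similarity measure of two functions whose descriptions are given by the matrices A
and B has the following form:
        <disp-formula>
          <label>(5)</label>
          <tex-math>\mu(A, B) = \sum_{k=0}^{K-1} \lambda_k \cos(a_k, b_k),</tex-math>
        </disp-formula>
where
        <disp-formula>
          <tex-math>\sum_{k=0}^{K-1} \lambda_k = 1</tex-math>
        </disp-formula>
and <inline-formula><tex-math>\cos(a_k, b_k)</tex-math></inline-formula> is the cosine of the angle between the vectors (the cosine similarity). If two functions are identical, the measure of similarity takes the value
"1"; it approaches "0" as the functions become less similar.</p>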
      <p>Let the library of basis functions contain J functions, each of which has a description in the
form of a matrix <inline-formula><tex-math>B_j</tex-math></inline-formula>. Then the description of the considered function through the library of
basis functions has the following form:
        <disp-formula>
          <label>(6)</label>
          <tex-math>x_A = \left(\mu(A, B_0), \mu(A, B_1), \ldots, \mu(A, B_{J-1})\right)^T.</tex-math>
        </disp-formula>
      </p>
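      <p>The similarity measure and the description of a function through the basis library can be sketched in Python as follows. This is only an illustration under stated assumptions: the names cosine, similarity, and describe_by_basis are hypothetical, a function description is represented as a list of K vectors (one per instruction group), and equal weights are used because the paper does not state how the weights are chosen.</p>
      <preformat><![CDATA[
```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity(a_desc, b_desc, weights=None):
    """Weighted sum of per-group cosines; the weights must sum to 1.

    Equal weights are an assumption of this sketch.
    """
    if weights is None:
        weights = [1.0 / len(a_desc)] * len(a_desc)
    return sum(w * cosine(a, b) for w, a, b in zip(weights, a_desc, b_desc))


def describe_by_basis(a_desc, basis):
    """Vector of similarities between one function and each basis function."""
    return [similarity(a_desc, b_desc) for b_desc in basis]


if __name__ == "__main__":
    # Toy descriptions with K = 2 groups and I = 2 intervals (made-up values).
    func = [[0.5, 1.0], [0.2, 1.0]]
    basis = [func, [[1.0, 1.0], [1.0, 1.0]]]
    print(describe_by_basis(func, basis))
```
]]></preformat>
      <p>The length of the resulting vector equals the number of basis functions, which is why a dimensionality reduction step is applied afterwards.</p>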
      <p>
        Further, using the obtained intermediate description of the function, its final description is
formed using the PCA (principal component analysis) method for reducing the dimensionality of
the data. The process of forming the final description is described in detail in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The resulting
description of the function is stored in the corresponding database (archive or current).
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. The structural description of the function</title>
      <p>The IDA disassembler makes it possible to obtain the control flow graph of each function of the analysed executable file.
The control flow graph of a function is a directed graph whose vertices are the basic blocks of the
function. A basic block is a sequence of processor instructions. The first instruction
of a basic block receives control from some processor instruction, and the last one is an
instruction that passes control to another basic block. The edges of the control flow graph
determine the order of the basic blocks in the control flow of the function.</p>
      <p>The control flow graph of the analysed function is divided into subgraphs of fixed order k
(k-subgraphs) as follows: each basic block is chosen as a starting node and all edges beginning
from that node are traversed until k nodes are encountered. In this paper, k = 3 is used.</p>
      <p>
        The description of each of the k-subgraphs of the function consists of a pair of vectors: a
and b. The a vector is a binary vector obtained by combining the rows of the adjacency matrix
of a given k-subgraph. The b vector characterizes the presence or absence of reading or writing
operations in operands of various types in this k-subgraph. A detailed description of the vector
b is presented in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Thus, the initial description of a function consists of a set of pairs of
vectors a and b describing each k-subgraph of the function.
      </p>
      <p>
        The intermediate description of the function is formed on the basis of its similarity with the
functions of the basis library. As a measure of similarity, we use the generalized Jaccard index
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]:
      </p>
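      <p>
        <disp-formula>
          <label>(7)</label>
          <tex-math>J(x, y) = \frac{\sum_i \min(x_i, y_i)}{\sum_i \max(x_i, y_i)},</tex-math>
        </disp-formula>
where x is the set of vector pairs describing the first function, y is the set of vector pairs
describing the second function, <inline-formula><tex-math>x_i</tex-math></inline-formula> is the number of occurrences of pair i in x, <inline-formula><tex-math>y_i</tex-math></inline-formula> is the number of occurrences of pair i
in y, and i runs over all unique pairs of vectors in the combined set <inline-formula><tex-math>x \cup y</tex-math></inline-formula>. In the case of
complete similarity between functions the value of the similarity measure is "1"; in the case of complete
dissimilarity it is "0".</p>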
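      <p>The generalized Jaccard index over multisets of vector pairs can be sketched in Python as follows. This is a minimal illustration, not the author's code: the function name generalized_jaccard is hypothetical, each k-subgraph is assumed to be encoded as a hashable pair of tuples (a, b), and the value returned for two empty descriptions is a convention chosen here.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def generalized_jaccard(x, y):
    """Generalized Jaccard index of two multisets of hashable items.

    x and y are iterables of k-subgraph descriptions, e.g. pairs of tuples.
    """
    count_x, count_y = Counter(x), Counter(y)
    keys = count_x.keys() | count_y.keys()  # unique items of both multisets
    numerator = sum(min(count_x[k], count_y[k]) for k in keys)
    denominator = sum(max(count_x[k], count_y[k]) for k in keys)
    if denominator == 0:
        # Both descriptions are empty; treating them as identical is an
        # assumption of this sketch.
        return 1.0
    return numerator / denominator


if __name__ == "__main__":
    # Made-up (a, b) pairs standing in for three different k-subgraphs.
    p = ((0, 1, 0, 0, 0, 1, 0, 0, 0), (1, 0))
    q = ((0, 1, 1, 0, 0, 0, 0, 0, 0), (1, 1))
    r = ((0, 0, 0, 0, 0, 0, 0, 0, 0), (0, 0))
    print(generalized_jaccard([p, p, q], [p, r]))
```
]]></preformat>
      <p>In the usage example the two functions share only one occurrence of one k-subgraph out of four distinct occurrences, so the index is well below 1.</p>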
      <p>Let x be the set of vector pairs describing the analysed function, <inline-formula><tex-math>y_i</tex-math></inline-formula> be the set of vector pairs
describing the i-th function of the basis library, and I be the number of functions in the basis library.
Then the intermediate description of the function under investigation has the following form:
        <disp-formula>
          <label>(8)</label>
          <tex-math>z = \left(J(x, y_0), J(x, y_1), \ldots, J(x, y_{I-1})\right)^T.</tex-math>
        </disp-formula>
      </p>
      <p>The final description of the function is obtained after applying the PCA dimension reduction
method and is stored in the appropriate database (archive or current).</p>
    </sec>
    <sec id="sec-5">
      <title>5. Search for similar functions</title>
      <p>The final stage of the proposed method is the search for similar functions based on the previously
obtained function description vectors.</p>
      <p>
        In the author's previous works, a search algorithm with the following assumption was
used: the size of a changed function differs from that of the original by no more than 30% [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. This
assumption was used to improve the quality of the search. Thus, at the first stage of the search,
the archive functions were filtered by size; then the Euclidean distance to each function
in the filtered list was calculated, and the result was sorted by increasing
Euclidean distance.
      </p>
      <p>This paper presents a combined method of similar functions search, which uses both the syntactic
and the structural description of functions. However, if the number of basic blocks of the function
being studied is small (bbmin ≤ 5), only the syntactic description of the function and the search
algorithm described above are used. This condition increases the search accuracy, since
small functions of similar size tend to have very similar structural descriptions.</p>
      <p>The search method presented in this paper uses the following algorithm for similar functions
search (if bbmin &gt; 5):</p>
      <p>At the first stage, preliminary filtering of archive functions is performed. For the function
under study and all the functions of the archive data, the Euclidean distance between the
vectors of the syntactic (or structural) description of the functions is calculated, and the archive functions are sorted
by ascending distance. The first topth = 30 elements of this list of the most similar archive
functions are used in the second stage of the search.</p>
      <p>At the second stage of the search, the Euclidean distance from the function under
investigation to each function in the list obtained above is calculated. However, in this case, the
other type of description vector is used. In other words, if at the first stage
the functions were compared by the vectors corresponding to their syntactic description,
then at the second stage they are compared by the vectors corresponding to their
structural description, and vice versa. Then, the result is sorted by increasing
Euclidean distance.</p>
      <p>As a result, for the analysed function, a list of archive data functions is obtained, sorted by
similarity in descending order.</p>
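      <p>The two-stage search described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the record layout (dictionaries with the hypothetical keys name, syntactic, and structural holding the final description vectors) and the function name two_stage_search are not taken from the paper, and the separate branch for functions with few basic blocks is omitted.</p>
      <preformat><![CDATA[
```python
import math


def two_stage_search(query, archive, first="structural", top=30):
    """Rank archive functions by similarity to the query function.

    Stage 1 keeps the `top` archive functions closest to the query by the
    `first` kind of description; stage 2 re-sorts that short list by the
    other kind of description. Distances are Euclidean.
    """
    second = "syntactic" if first == "structural" else "structural"

    def distance(kind, item):
        return math.dist(query[kind], item[kind])

    shortlist = sorted(archive, key=lambda f: distance(first, f))[:top]
    return sorted(shortlist, key=lambda f: distance(second, f))


if __name__ == "__main__":
    # Made-up two-dimensional description vectors for three archive functions.
    query = {"syntactic": (0.0, 0.0), "structural": (0.0, 0.0)}
    archive = [
        {"name": "a", "syntactic": (5.0, 0.0), "structural": (1.0, 0.0)},
        {"name": "b", "syntactic": (1.0, 0.0), "structural": (2.0, 0.0)},
        {"name": "c", "syntactic": (0.0, 0.0), "structural": (9.0, 0.0)},
    ]
    ranked = two_stage_search(query, archive, first="structural", top=2)
    print([f["name"] for f in ranked])
```
]]></preformat>
      <p>In the usage example, function c is removed at the filtering stage even though its syntactic description is the closest one, which shows how the choice of the first-stage description affects the final list.</p>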
    </sec>
    <sec id="sec-6">
      <title>6. Experiments</title>
      <p>To evaluate the efficiency of the proposed method of searching for similar code sequences in executable
files, the functions of one dynamic library are used as archive data, and the functions of a
different version of the same library are used as the current library. It is assumed that
the names of the functions do not change from one version of the dynamic library to another
and that there are no functions with the same name among the functions
of the archive data.</p>
      <p>
        Using the search algorithm described in the fifth section, for a given function of the current
library, we obtain an ordered list of archive data functions. Let us assign to this list a binary sequence
<inline-formula><tex-math>\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_L)</tex-math></inline-formula>, the i-th element of which is equal to one if the name of the
function at the i-th position of the list is identical to the name of the function being checked,
and to zero otherwise. The following criteria for evaluating the
quality of information retrieval [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ] are used.
      </p>
      <p>Precision for the k-th position of the list:
        <disp-formula>
          <tex-math>P_k = \frac{1}{k}\sum_{l=1}^{k} \alpha_l.</tex-math>
        </disp-formula>
      </p>
      <p>Recall for the k-th position of the list (here K denotes the total number of matching functions in the list):
        <disp-formula>
          <tex-math>R_k = \frac{1}{K}\sum_{l=1}^{k} \alpha_l.</tex-math>
        </disp-formula>
      </p>
      <p>The average precision of the list:
        <disp-formula>
          <tex-math>AveP = \sum_{k=1}^{L} P_k (R_k - R_{k-1}), \quad R_0 = 0.</tex-math>
        </disp-formula>
      </p>
      <p>The average precision over all functions of the current library is calculated by the
formula
        <disp-formula>
          <label>(9)</label>
          <tex-math>\bar{P} = \frac{1}{S}\sum_{s=0}^{S-1} AveP_s,</tex-math>
        </disp-formula>
where S is the number of functions in the current library.</p>
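      <p>The evaluation criteria above can be sketched in Python as follows. This is a minimal illustration with hypothetical function names; the input is the binary sequence assigned to a ranked list, and the number of matching functions used in the recall is taken to be the sum of that sequence, which is an assumption of this sketch.</p>
      <preformat><![CDATA[
```python
def average_precision(relevance):
    """Average precision of one ranked list given as a 0/1 sequence."""
    total_relevant = sum(relevance)
    if total_relevant == 0:
        # No matching function in the list: score it as 0.0 (a convention).
        return 0.0
    hits = 0
    previous_recall = 0.0
    result = 0.0
    for k, is_match in enumerate(relevance, start=1):
        hits += is_match
        precision = hits / k            # precision at position k
        recall = hits / total_relevant  # recall at position k
        result += precision * (recall - previous_recall)
        previous_recall = recall
    return result


def mean_average_precision(relevance_lists):
    """Mean of the average precision over all functions of a library."""
    scores = [average_precision(r) for r in relevance_lists]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up result lists: the match is found at rank 2 and at rank 1.
    print(mean_average_precision([[0, 1, 0], [1, 0, 0]]))
```
]]></preformat>
      <p>When each list contains exactly one matching function, as assumed in the experiments, the average precision of a list reduces to the reciprocal of the rank at which the match appears.</p>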
      <p>
        Several versions of the libtiff library [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] were used for the experiments. The functions of the
library libtiff 4.0.8 were used as the functions of the archive data. The libraries were compiled
with the optimization flag /Od (optimization disabled).
      </p>
      <p>The results of comparing the two methods of preliminary filtering of the archive data functions
are presented in table 1.</p>
      <p>The average precision of the search for the considered libraries is higher when preliminary
filtering of the archive data functions by structural description is used. Therefore, this method of
preliminary filtering of archive functions is used in the further experiments.</p>
      <p>
        In the next experiment, the average search precision of the proposed method was compared
with that of two previously known methods: a method based on the analysis of the spatial position of
functional groups of processor instructions [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and a method based on k-gram comparison [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. As the comparison
object for the first method, the one recommended by the authors was used (the
spatial distribution of instructions in the function body in integral form). For the second method,
the parameter value k = 5 was likewise chosen on the basis of the authors' recommendations.
The results are presented in table 2.
      </p>
      <p>
        The analysis of the results shows that the method of searching for similar code sequences
presented in this paper outperforms the previously known methods for each of the current
libraries used. Moreover, the "closer" the version of the current library is to the version of the archive
data library, the smaller the advantage of the presented method over the method of [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. This is explained
by the fact that, for the libraries used in the experimental studies, some functions of older versions of
the current library have significant syntactic changes with respect to the archive functions, and
preliminary filtering of the archive data by structural description can significantly improve the
precision of the search in such cases.
      </p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>The paper presents a combined method of searching for similar code sequences in executable files
using both syntactic and structural descriptions of functions. Experimental results
demonstrating the superiority of the developed method over some previously known methods
have been presented. Further studies will focus on improving the accuracy of similar
functions search and on studying the efficiency of the proposed method on executable files
compiled with different compilation settings.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Abdalkareem</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shihab</surname>
            <given-names>E</given-names>
          </string-name>
          and
          <string-name>
            <surname>Rilling</surname>
            <given-names>J</given-names>
          </string-name>
          <year>2017</year>
          <article-title>On code reuse from StackOverflow: An exploratory study on Android apps</article-title>
          <source>Information and Software Technology</source>
          <volume>88</volume>
          <fpage>148</fpage>
          -
          <lpage>158</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Gharehyazie</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ray</surname>
            <given-names>B</given-names>
          </string-name>
          and
          <string-name>
            <surname>Filkov</surname>
            <given-names>V</given-names>
          </string-name>
          <year>2017</year>
          <article-title>Some from here, some from there: cross-project code reuse in GitHub</article-title>
          <source>Proc. of the 14th International Conference on Mining Software Repositories</source>
          <volume>1</volume>
          <fpage>291</fpage>
          -
          <lpage>301</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <source>Examining Code Reuse Reveals Undiscovered Links Among North Korea's Malware Families</source>
          URL:
          <uri>https://securingtomorrow.mcafee.com/mcafee-labs/examining-code-reuse-reveals-undiscovered-links-among-north-koreas-malware-families/</uri>
          (05.11.2018)
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Myles</surname>
            <given-names>G</given-names>
          </string-name>
          and
          <string-name>
            <surname>Collberg</surname>
            <given-names>C</given-names>
          </string-name>
          <year>2005</year>
          <article-title>K-gram based software birthmarks</article-title>
          <source>Proc. of the ACM symposium on Applied computing</source>
          <volume>1</volume>
          <fpage>314</fpage>
          -
          <lpage>318</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Karim</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walenstein</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lakhotia</surname>
            <given-names>A</given-names>
          </string-name>
          and
          <string-name>
            <surname>Parida</surname>
            <given-names>L</given-names>
          </string-name>
          <year>2005</year>
          <article-title>Malware phylogeny generation using permutations of code</article-title>
          <source>Journal in Computer Virology</source>
          <volume>1</volume>
          <fpage>13</fpage>
          -
          <lpage>23</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <source>IDA F.L.I.R.T Technology: In-Depth</source>
          URL:
          <uri>https://www.hex-rays.com/products/ida/tech/flirt/in_depth.shtml</uri>
          (05.11.2018)
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Bilar</surname>
            <given-names>D</given-names>
          </string-name>
          <year>2007</year>
          <article-title>Opcodes as predictor for malware</article-title>
          <source>International Journal of Electronic Security and Digital Forensics</source>
          <volume>1</volume>
          <fpage>156</fpage>
          -
          <lpage>168</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Gheorghescu</surname>
            <given-names>M</given-names>
          </string-name>
          <year>2005</year>
          <article-title>An automated virus classification system</article-title>
          <source>Virus Bulletin Conference</source>
          <volume>1</volume>
          <fpage>294</fpage>
          -
          <lpage>300</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Bruschi</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martignoni</surname>
            <given-names>L</given-names>
          </string-name>
          and
          <string-name>
            <surname>Monga</surname>
            <given-names>M</given-names>
          </string-name>
          <year>2006</year>
          <article-title>Detecting self-mutating malware using control-flow graph matching</article-title>
          <source>Proc. of the Third international conference on Detection of Intrusions and Malware &amp; Vulnerability Assessment</source>
          <volume>1</volume>
          <fpage>129</fpage>
          -
          <lpage>143</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Kruegel</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kirda</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mutz</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robertson</surname>
            <given-names>W</given-names>
          </string-name>
          and
          <string-name>
            <surname>Vigna</surname>
            <given-names>G</given-names>
          </string-name>
          <year>2005</year>
          <article-title>Polymorphic worm detection using structural information of executables</article-title>
          <source>Recent Advances in Intrusion Detection</source>
          <volume>1</volume>
          <fpage>207</fpage>
          -
          <lpage>226</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Yumaganov</surname>
            <given-names>A</given-names>
          </string-name>
          and
          <string-name>
            <surname>Myasnikov</surname>
            <given-names>V</given-names>
          </string-name>
          <year>2017</year>
          <article-title>A method of searching for similar code sequences in executable binary files using a featureless approach</article-title>
          <source>Computer Optics</source>
          <volume>41</volume>
          (
          <issue>5</issue>
          )
          <fpage>756</fpage>
          -
          <lpage>764</lpage>
          DOI:
          <pub-id pub-id-type="doi">10.18287/2412-6179-2017-41-5-756-764</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Yumaganov</surname>
            <given-names>A</given-names>
          </string-name>
          and
          <string-name>
            <surname>Myasnikov</surname>
            <given-names>V</given-names>
          </string-name>
          <year>2018</year>
          <article-title>Searching for similar code sequences in executable files based on the structural analysis of functions</article-title>
          <source>J. Phys.: Conf. Ser.</source>
          <volume>1096</volume>
          <elocation-id>012093</elocation-id>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <source>Hex-Rays IDA: About</source>
          URL:
          <uri>http://hex-rays.com/products/ida/</uri>
          (05.11.2018)
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Spath</surname>
            <given-names>H</given-names>
          </string-name>
          <year>1981</year>
          <article-title>The minisum location problem for the Jaccard metric</article-title>
          <source>Operations-Research-Spektrum</source>
          <volume>3</volume>
          <fpage>91</fpage>
          -
          <lpage>94</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
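          <string-name>
            <surname>Buckland</surname>
            <given-names>M</given-names>
          </string-name>
          and
          <string-name>
            <surname>Gey</surname>
            <given-names>F</given-names>
          </string-name>
          <year>1994</year>
          <article-title>The relationship between recall and precision</article-title>
          <source>JASIS</source>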
          <volume>45</volume>
          <fpage>12</fpage>
          -
          <lpage>19</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Powers</surname>
            <given-names>D</given-names>
          </string-name>
          <year>2011</year>
          <article-title>Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness &amp; Correlation</article-title>
          <source>Journal of Machine Learning Technologies</source>
          <volume>2</volume>
          <fpage>37</fpage>
          -
          <lpage>63</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <source>TIFF Library and Utilities</source>
          URL:
          <uri>http://www.libtiff.org/</uri>
          (05.11.2018)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>