<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Reliable Network Entity Tracking Using Behavioural Bag of Words Representation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jaroslav Hlaváč</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Kopp</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Polák</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jan Kohout</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cisco Systems, Cognitive Research Team in Prague</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Faculty of Information Technology, Czech Technical University in Prague</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>1802</year>
      </pub-date>
      <abstract>
        <p>In this paper, we study the problem of entity identification and tracking in the domain of network security. The ability to uniquely identify and track network entities over time is essential for network behavioural analytics. Our approach leverages the Bag of Words (BoW) representation, enabling us to build representations from many different features from multiple data sources. However, normalisation methods traditionally used with BoW in other application domains (e.g. tf-idf, stop words) do not work well with network data, as they are not designed to capture behavioural patterns. Some features, such as common network servers (e.g. google.com, update.microsoft.com) or executed binaries (e.g. web browsers), are often too frequent but are still valuable behavioural indicators. In order to capture important long-term patterns in entities' behaviour, we introduce time-aware normalisation of the BoW representations. We compare different representations for device tracking on real network telemetry. Our results show that using multiple data sources significantly improves entity tracking, especially when combined with the proposed time-aware normalisation.</p>
      </abstract>
      <kwd-group>
        <kwd>bag of words</kwd>
        <kwd>entity tracking</kwd>
        <kwd>network behavioural analytics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The ability to identify and track network entities (devices,
users, etc.) is crucial for user and entity behavioural
analytics (UEBA), which is the application domain of our
paper. UEBA systems usually create a representation of
the normal behaviour of an entity and then look for
abnormal activity in the subsequent network communication.
In general, an entity representation is a numerical vector
in a latent space that describes real-world objects, such
as words or movies, their relationships, and behaviour.
Ideally, the semantic similarity in the input space is
captured in the latent space, meaning that similar input
objects (words, movies) are close to each other in the latent
space. In our case, the input objects are network devices
and users.</p>
      <p>
        We are looking for a representation that would serve as
a unique fingerprint for each device/user, and differences
between them would correspond to differences in their
behaviour. A straightforward way to represent user or device
behaviour on the network is by sets of features, such as
visited application servers, web domains, or used programs.
Unfortunately, such a representation does not form a metric
space, as it supports only pair-wise similarity comparisons
(see [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] for a more detailed explanation). Furthermore,
such a representation is inconvenient to work with, and set
operations are computationally expensive. A
representation in the latent space helps to overcome these issues.
      </p>
      <p>There are other challenges that an ideal representation
should respect. The entities (users and devices) change
in time; therefore, their representation moves in the latent
space. Tracking this movement opens new possibilities for
anomaly detection. Any significant shift in the latent space
indicates a sudden change in user/device behaviour and may
be worth reporting as an anomaly. Clustering the
representations could help to discover groups of similarly behaving
entities. Finding that an entity has changed or is frequently
changing its group can be treated as an anomaly.</p>
      <p>In this paper, we focus on the first problem,
which is user/device tracking. Uniquely identifying and
tracking any network device is the first step of all the
above-mentioned use cases. We compare device
representations based on the Bag of Words (BoW) built on top
of multiple features from different data sources, as well as
their combination. BoW is a universal approach that
allows the differences between representations to be explained directly
from the feature vectors.</p>
      <p>Furthermore, we introduce the time-aware
normalisation of the BoW representations to reduce the influence of
the most common network servers, binaries, etc. We
compare the representations on a week of telemetry gathered
in a real company network.</p>
      <p>The rest of the paper is organised as follows. The next
section covers the related work in the field of entity
representation. Section 3 formally describes the entity
representation task, followed by the experimental evaluation in
Section 4. Section 5 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>There are two major challenges that need to be solved
in the task of time-aware tracking of network entities.
Firstly, representations need to change in time as the
behaviour of the entity evolves. Secondly, the change in
network entity behaviour needs to be easily explainable from
the deviation in the representation vector.</p>
      <p>
        We are not aware of any prior research on finding
representations from multi-modal data in the area of network
security. However, the problem of finding entity
representations is actively researched in other domains, such
as healthcare [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], graph node classification [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ], natural
language processing [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], recommender systems [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and
others.
      </p>
      <sec id="sec-2-1">
        <p>
          Recommender systems use time-aware representations
to successfully predict which item the user might be
interested in during the next interaction with the application,
e.g. the next movie to watch [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] or the next item to buy [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
Recurrent neural networks (RNNs) that take
user-item transactions as input (such as JODIE [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]) have
recently proven promising in tracking and predicting
the trajectories of both users and items. However, in our
network security use case, they suffer from a lack of
explainability. We need to track the trajectories of an entity,
but we also need to attribute a change in behaviour to the
specific feature or set of transactions that caused the
deviation. This allows faster investigation of a potential security
incident.
        </p>
        <p>
          The explainability problem can be solved directly by
the Bag of Words representation, which makes it possible
to tie the (dis)similarity of two objects to concrete
features. BoW is a well-known and effective approach,
originally used for document classification but also applied in other
domains such as image recognition [
          <xref ref-type="bibr" rid="ref12">19, 12</xref>
          ] and NLP [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. It
suffers from two weaknesses: the sparsity of the resulting
representations and the inability to capture the semantics of
the represented entity.
        </p>
        <p>The sparsity in the latent space is often tackled by
entirely removing the most common features and using only
the top N most frequent features from the remaining
vocabulary. In our use case, this approach is infeasible, as both
high- and low-frequency features are needed to capture the
behaviour of an entity.</p>
        <p>
          In [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], the semantics are captured by dividing the image
into subsections and applying the BoW representation to
each of them. We also divide the traffic into smaller parts. However, as
our data are time series rather than images, we exploit
temporal vicinity (time windows) rather than spatial vicinity.
        </p>
        <p>
          The problem of uniquely identifying users is also
being studied in user-computer interaction. Passive
monitoring of user actions is used to identify the user. For
example, [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] treats keystrokes and mouse movements as
a biometric means of authentication. In [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], the authors were able
to uniquely identify users by collecting data from browser
sessions. Our problem is more complex, as we have access
neither to the direct interaction of the human user with the
computer nor to active probing. Our only source of data
is passive log access. Even distinguishing between the
human-generated and machine-generated components of network traffic is a
difficult task.
        </p>
        <p>
          Lastly, a related problem is studied in named-entity
recognition [
          <xref ref-type="bibr" rid="ref11 ref14">14, 11</xref>
          ]. Named-entity recognition focuses
on finding tokens that identify entities of predefined
categories (names, currencies, etc.) in textual or similar
data. The methods often rely on contextual
knowledge from surrounding words or sentences. However, we
can hardly rely on such context in network data, as it can be
scattered over hundreds or even thousands of logs.
Moreover, the entities we track are users and devices that
exhibit complex and highly dynamic behaviour that often
changes over time. Therefore, methods from named-entity
recognition are not directly applicable.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Entity Representation</title>
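      <p>The representation of an entity is the result of a mapping
<inline-formula><tex-math>e : \mathcal{X} \rightarrow \mathcal{L}</tex-math></inline-formula> from a general Cartesian feature space <inline-formula><tex-math>\mathcal{X}</tex-math></inline-formula> to a
latent space <inline-formula><tex-math>\mathcal{L}</tex-math></inline-formula>, in which each entity is represented by a numerical vector.
We assume that there exists a similarity function
<disp-formula><tex-math>\mathrm{sim} : \mathcal{L} \times \mathcal{L} \rightarrow [0, 1]</tex-math></disp-formula>
in the latent space <inline-formula><tex-math>\mathcal{L}</tex-math></inline-formula>. This similarity function fulfils the
standard requirements of symmetry and identity of
indiscernibles.</p>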
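      <p>The goal is to find a mapping that creates a
time-aware behavioural fingerprint of an entity. Formally,
we define the following requirements for the mapping:</p>
      <p>• Requirement 1: Representations of an entity in two
subsequent time periods should be similar to each
other. This can be expressed by the following formula
for the average self-similarity:
<disp-formula><label>(1)</label><tex-math>r_1 = \frac{1}{N} \sum_{i=1}^{N} \mathrm{sim}\left(e(x_i^{t_1}), e(x_i^{t_2})\right),</tex-math></disp-formula>
where <inline-formula><tex-math>N</tex-math></inline-formula> is the number of entities in the set, <inline-formula><tex-math>e</tex-math></inline-formula> is the
mapping from the raw feature space <inline-formula><tex-math>\mathcal{X}</tex-math></inline-formula> to the latent space <inline-formula><tex-math>\mathcal{L}</tex-math></inline-formula>,
<inline-formula><tex-math>x_i \in \mathcal{X}</tex-math></inline-formula> is the representation of entity <inline-formula><tex-math>i</tex-math></inline-formula> in <inline-formula><tex-math>\mathcal{X}</tex-math></inline-formula>, and <inline-formula><tex-math>t_1, t_2</tex-math></inline-formula> are
consecutive time periods.</p>
      <p>• Requirement 2: Different entities need to be
dissimilar and distinguishable by their representations. This
can be expressed by the following formula for the
average dissimilarity between different entities:
<disp-formula><label>(2)</label><tex-math>r_2 = \frac{2}{N(N-1)} \sum_{i=1}^{N} \sum_{j=1}^{i-1} \left(1 - \mathrm{sim}\left(e(x_i^{t}), e(x_j^{t})\right)\right),</tex-math></disp-formula>
where <inline-formula><tex-math>N</tex-math></inline-formula> is the number of entities in the set, <inline-formula><tex-math>e</tex-math></inline-formula> is the
mapping from the raw feature space <inline-formula><tex-math>\mathcal{X}</tex-math></inline-formula> to the latent space <inline-formula><tex-math>\mathcal{L}</tex-math></inline-formula>,
and <inline-formula><tex-math>x_i, x_j \in \mathcal{X}</tex-math></inline-formula> are different entities observed in the same time period <inline-formula><tex-math>t</tex-math></inline-formula>.</p>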
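      <p>As an illustration, the Python sketch below evaluates the average self-similarity of Equation (1) and the average dissimilarity of Equation (2) for a given set of representations. It assumes, for illustration only, that representations are sparse term-to-weight dictionaries and that sim is the cosine similarity, which lies in [0, 1] for non-negative BoW vectors; the function names are hypothetical.</p>
      <preformat preformat-type="code">
```python
import math
from typing import Callable, Dict, List

Vector = Dict[str, float]  # sparse representation: term -> weight
Sim = Callable[[Vector, Vector], float]


def cosine_sim(u: Vector, v: Vector) -> float:
    """Cosine similarity; lies in [0, 1] for non-negative vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def r1(reps_t1: List[Vector], reps_t2: List[Vector], sim: Sim) -> float:
    """Eq. (1): average similarity of entity i at t1 to itself at t2.
    reps_t1[i] and reps_t2[i] must describe the same entity."""
    return sum(sim(a, b) for a, b in zip(reps_t1, reps_t2)) / len(reps_t1)


def r2(reps_t: List[Vector], sim: Sim) -> float:
    """Eq. (2): average dissimilarity over all pairs of different entities."""
    n = len(reps_t)
    total = sum(1.0 - sim(reps_t[i], reps_t[j])
                for i in range(n) for j in range(i))  # j = 0 .. i-1
    return 2.0 * total / (n * (n - 1))


day1 = [{"a": 1.0}, {"b": 1.0}]
day2 = [{"a": 1.0}, {"a": 1.0, "b": 1.0}]
print(r1(day1, day2, cosine_sim))  # (1 + 1/sqrt(2)) / 2, about 0.854
print(r2(day1, cosine_sim))        # 1.0: the two entities share no terms
```
      </preformat>
      <p>A weighted sum of the two values then gives a single score for comparing candidate mappings, as suggested above.</p>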
      <p>With the requirements above and a given similarity
function <inline-formula><tex-math>\mathrm{sim}</tex-math></inline-formula>, we want to find the mapping <inline-formula><tex-math>e</tex-math></inline-formula> that
maximises both requirements <inline-formula><tex-math>r_1</tex-math></inline-formula> and <inline-formula><tex-math>r_2</tex-math></inline-formula>, e.g. in the form of their
weighted sum.
In this work, we use the bag of words (BoW) representation [16]
as a baseline for further work. BoW is an
information retrieval technique originating in document
classification. It represents a document in a vector
space by counting term occurrences,
discarding the document’s structure. A term is usually a word
or an n-gram. The dimension of the representation space is
determined by the number of unique terms (called the
vocabulary) in the set of compared documents.</p>
      <p>With network flows and endpoint logs at our disposal, a
bag is constructed from all values (terms) of one feature
observed in a given time period; for example, all executable hashes
used by a device in one day (treated as a “document”)
are added to the bag. The vocabulary is then the set of
all hashes used by the devices in the network over an
extended time window (e.g. a day or a week).</p>
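      <p>A minimal sketch of this bag construction for one feature and one time period follows. The record format (device identifier, feature value) and the function names are hypothetical; real telemetry would first have to be parsed into such pairs.</p>
      <preformat preformat-type="code">
```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def build_bags(records: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """One bag of term counts per device, e.g. all file hashes seen in one day."""
    bags: Dict[str, Counter] = defaultdict(Counter)
    for device, term in records:
        bags[device][term] += 1
    return dict(bags)


def build_vocabulary(*periods: Dict[str, Counter]) -> List[str]:
    """All terms observed by any device in any of the given periods."""
    terms = set()
    for bags in periods:
        for bag in bags.values():
            terms.update(bag)
    return sorted(terms)


def to_vector(bag: Counter, vocab: List[str]) -> List[int]:
    """Dense count vector with one component per vocabulary term."""
    return [bag.get(term, 0) for term in vocab]


day = build_bags([("dev-1", "hash_chrome"), ("dev-1", "hash_chrome"),
                  ("dev-2", "hash_vim")])
vocab = build_vocabulary(day)
print(vocab)                           # ['hash_chrome', 'hash_vim']
print(to_vector(day["dev-1"], vocab))  # [2, 0]
```
      </preformat>
      <p>Passing several periods to build_vocabulary yields the shared vocabulary needed when representations from different days are compared.</p>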
      <p>Using only raw occurrence counts does not work well
for many of the features, as usually a few values occur
significantly more often than others, so all vectors may look
similar because of these frequent values. In the case of
executable hashes, this could be the hash of Google Chrome,
as it is the most common browser. Therefore, tf-idf (term
frequency–inverse document frequency) [17] is used to
weight the vector by the amount of information each term
brings. If a term is very common, its idf value is small,
reducing the impact of the term on the resulting vector.</p>
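      <p>The tf-idf re-weighting can be sketched as follows. The exact tf-idf variant is not specified here; the sketch assumes the common form idf(t) = log(N / df(t)), where N is the number of bags (one per device) and df(t) is the number of bags containing term t. Function and term names are hypothetical.</p>
      <preformat preformat-type="code">
```python
import math
from collections import Counter
from typing import Dict


def idf_weights(bags: Dict[str, Counter]) -> Dict[str, float]:
    """idf(t) = log(N / df(t)) with one document (bag) per device."""
    n_docs = len(bags)
    df: Counter = Counter()
    for bag in bags.values():
        df.update(set(bag))  # each term counted at most once per device
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf(bag: Counter, idf: Dict[str, float]) -> Dict[str, float]:
    """Multiply raw term counts (tf) by idf; unseen terms get weight 0."""
    return {term: tf * idf.get(term, 0.0) for term, tf in bag.items()}


bags = {
    "dev-1": Counter({"hash_chrome": 500, "hash_rare_tool": 3}),
    "dev-2": Counter({"hash_chrome": 450}),
    "dev-3": Counter({"hash_chrome": 300, "hash_vim": 2}),
}
idf = idf_weights(bags)
print(tfidf(bags["dev-1"], idf))
# hash_chrome is used by every device, so its weight drops to 0.0,
# while hash_rare_tool keeps the weight 3 * log(3)
```
      </preformat>
      <p>With this plain variant, a term present on every device is weighted to zero; smoothed variants such as log(1 + N / df(t)) shrink such terms without discarding them entirely.</p>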
      <p>The frequent features can have several orders of
magnitude more occurrences than the other features. In
document classification problems, the most frequent words
(is, are, with, the, a, an, etc.) can be removed from the
vocabulary. This is not possible for network and endpoint
telemetry, as the most frequent terms can change (e.g.
updating a program changes the executable’s file hash)
or can contain valuable information for entity
identification (e.g. the autonomous systems (AS) most frequently
contacted by Windows machines are maintained by
Microsoft, distinguishing them from Linux machines).</p>
      <p>Our experiments showed that re-weighting features with tf-idf
is not enough. Therefore, we utilised the time
window bag of words (tw-BoW) representation. In tw-BoW,
each feature value is counted only once for each time window it
occurred in (e.g. for ports, no matter how many times
a device accessed port 80 within a time window, it counts as only
one occurrence). This caps the maximal
value of each vector component (e.g. at 288 for a
representation of one day split into 5-minute windows). Smoothing
the vector in this way enables less frequent values
of a given feature to have a bigger impact on the final
vector. Otherwise, the most frequent features could
overshadow the others even after tf-idf re-weighting.</p>
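      <p>The following sketch contrasts time window counting with plain occurrence counting. It assumes events are given as (device identifier, Unix timestamp in seconds, term) triples and uses the 5-minute windows of Section 4; the event format and function names are illustrative assumptions.</p>
      <preformat preformat-type="code">
```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

WINDOW_SECONDS = 5 * 60                           # 5-minute time windows
WINDOWS_PER_DAY = 24 * 60 * 60 // WINDOW_SECONDS  # 288

Event = Tuple[str, int, str]  # (device_id, unix_timestamp_s, term)


def classic_bow(events: Iterable[Event]) -> Dict[str, Counter]:
    """Every occurrence of a term is counted."""
    bags: Dict[str, Counter] = defaultdict(Counter)
    for device, _ts, term in events:
        bags[device][term] += 1
    return dict(bags)


def tw_bow(events: Iterable[Event]) -> Dict[str, Counter]:
    """A term is counted at most once per time window, so for one day
    each component is capped at WINDOWS_PER_DAY."""
    windows: Dict[str, Dict[str, Set[int]]] = defaultdict(lambda: defaultdict(set))
    for device, ts, term in events:
        windows[device][term].add(ts // WINDOW_SECONDS)
    return {
        device: Counter({term: len(seen) for term, seen in terms.items()})
        for device, terms in windows.items()
    }


# dev-1 hits port 80 100 times in the first window and once 10 hours later,
# and port 22 once in the first window.
events = [("dev-1", t, "port:80") for t in range(100)]
events += [("dev-1", 36_000, "port:80"), ("dev-1", 30, "port:22")]
print(classic_bow(events)["dev-1"]["port:80"])  # 101
print(tw_bow(events)["dev-1"]["port:80"])       # 2
```
      </preformat>
      <p>Under plain counting, the burst of 100 accesses dominates the component for port 80, whereas tw-BoW records only that port 80 was used in two distinct windows.</p>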
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>This section covers experiments with different entity
representations. The main goal was to compare mappings
from the raw feature space to the latent vector space in the use
case of tracking entities in the network over time.</p>
      <p>The endpoint client IDs were used as labels for the
purpose of device tracking.</p>
      <p>Two mapping approaches based on the BoW
method explained in Section 3.1 are compared in the
experiments. The first approach is the classic BoW
representation, where feature frequencies are computed from all
term occurrences (e.g. for ports, each access to
destination port 80 counts as one occurrence); tf-idf is then used
to re-weight each feature according to the frequencies
observed in the network.</p>
      <p>The second approach is the time window BoW
representation (again weighted by tf-idf), where each feature value
counts only once for each time window it occurred in. For
this experiment, we used 5-minute time windows.
Therefore, before the tf-idf re-weighting, each vector component ranges from 0 to 288 (the number
of 5-minute windows in 24 hours).</p>
      <sec id="sec-4-1">
        <title>Evaluation</title>
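        <p>The representations were evaluated according to their
ability to track a device over time in the latent space. This
evaluation criterion is formalised by Requirements 1 and 2
in Section 3. To evaluate the quality of a mapping,
similarities between all devices appearing on one day and all
devices appearing on the other day were computed. In total,
<inline-formula><tex-math>N \times M</tex-math></inline-formula> similarities were computed for every two days,
where <inline-formula><tex-math>N</tex-math></inline-formula> is the number of devices on the first day and <inline-formula><tex-math>M</tex-math></inline-formula>
is the number of devices on the second day. The <inline-formula><tex-math>M</tex-math></inline-formula>
similarities in each row <inline-formula><tex-math>i</tex-math></inline-formula> of the resulting similarity matrix were ranked according
to
<disp-formula><label>(3)</label><tex-math>\mathrm{rank}_i = 1 + \left|\left\{ j \mid \mathrm{sim}\left(e(x_i^{t_1}), e(x_j^{t_2})\right) &gt; \mathrm{sim}\left(e(x_i^{t_1}), e(x_i^{t_2})\right) \right\}\right|,</tex-math></disp-formula>
where <inline-formula><tex-math>\mathrm{sim}</tex-math></inline-formula> is a similarity measure, <inline-formula><tex-math>e(x_i^{t_1}), e(x_j^{t_2}) \in \mathcal{L}</tex-math></inline-formula> are the
representations of different devices <inline-formula><tex-math>x_i, x_j</tex-math></inline-formula> in the latent space
<inline-formula><tex-math>\mathcal{L}</tex-math></inline-formula>, and <inline-formula><tex-math>t_1, t_2</tex-math></inline-formula> are two consecutive days.</p>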
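        <p>Equation 3 translates directly into code. In this sketch, the similarity matrix is assumed to be a nested dictionary keyed by device identifiers (for instance the endpoint client IDs), so a device present on both days is matched by its identifier; the names are hypothetical. Because of the strict inequality, devices tied with the self-similarity do not worsen the rank.</p>
        <preformat preformat-type="code">
```python
from typing import Dict

SimMatrix = Dict[str, Dict[str, float]]  # sims[i][j] = sim(e(x_i^t1), e(x_j^t2))


def ranks(sims: SimMatrix) -> Dict[str, int]:
    """Eq. (3): rank_i = 1 + number of day-t2 devices j that are strictly
    more similar to device i (day t1) than device i itself on day t2."""
    result = {}
    for i, row in sims.items():
        if i not in row:
            continue  # device i was not seen on day t2, so it cannot be ranked
        self_sim = row[i]
        result[i] = 1 + sum(1 for s in row.values() if s > self_sim)
    return result


sims = {
    "a": {"a": 0.90, "b": 0.40, "c": 0.95},  # c looks more like a than a does
    "b": {"a": 0.20, "b": 0.80, "c": 0.10},
    "c": {"a": 0.50, "b": 0.50, "c": 0.50},  # three-way tie
}
print(ranks(sims))  # {'a': 2, 'b': 1, 'c': 1}
```
        </preformat>
        <p>Device c obtains rank 1 despite the three-way tie, which is exactly the situation that the text does not want to count as a precise hit.</p>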
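        <p>The following metrics were used to compare the quality
of the representations:</p>
        <p>• Mean rank <inline-formula><tex-math>R</tex-math></inline-formula> at which each device representation
appeared on the second day:
<disp-formula><label>(4)</label><tex-math>R = \frac{1}{M} \sum_{i=1}^{M} \mathrm{rank}_i,</tex-math></disp-formula>
where <inline-formula><tex-math>M</tex-math></inline-formula> is the number of devices on the second day
and <inline-formula><tex-math>\mathrm{rank}_i</tex-math></inline-formula> is the rank from Equation 3. The lower the
mean rank, the better the representation. In the best
case, the mean rank would be 1, allowing all devices to be
tracked precisely over time.</p>
        <p>• Percentage of precise hits <inline-formula><tex-math>A</tex-math></inline-formula>, defined as
<disp-formula><label>(5)</label><tex-math>A = \frac{1}{M} \sum_{i=1}^{M} \mathbb{I}[\mathrm{rank}_i = 1],</tex-math></disp-formula>
where <inline-formula><tex-math>M</tex-math></inline-formula> is the number of devices on the second day,
<inline-formula><tex-math>\mathbb{I}</tex-math></inline-formula> is the indicator function, which is 1 if the rank is
equal to 1 and 0 otherwise, and <inline-formula><tex-math>\mathrm{rank}_i</tex-math></inline-formula> is the rank
from Equation 3. The value of <inline-formula><tex-math>A</tex-math></inline-formula> shows the proportion of devices that could
be uniquely identified. If multiple devices are
tied with the same highest similarity, this does not count
as a precise hit, because the device cannot be identified
uniquely (cf. Requirement 2 in Section 3). For
example, when two devices access the same set of
autonomous systems during the day, they both correctly
appear at the first rank, but they cannot be
differentiated based on the ASN.</p>
        <p>• Cumulative distribution function (CDF) of the rank at which
each device appears:
<disp-formula><label>(6)</label><tex-math>f(x) = P(\mathrm{rank}_i \leq x),</tex-math></disp-formula>
where the right-hand side is the probability that a randomly
selected device <inline-formula><tex-math>i</tex-math></inline-formula> appears within the top <inline-formula><tex-math>x</tex-math></inline-formula> ranks.
This metric is useful for comparing different algorithms.</p>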
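        <p>A sketch of the three metrics, computed from a dictionary of per-device ranks as produced by Equation 3, follows; the input format and names are illustrative. precise_hit_ratio follows Equation 5 literally; excluding ties at the top similarity, as the text above requires, would additionally need the similarity values themselves.</p>
        <preformat preformat-type="code">
```python
from typing import Dict, List


def mean_rank(ranks: Dict[str, int]) -> float:
    """Eq. (4): average rank; 1.0 means every device is tracked exactly."""
    return sum(ranks.values()) / len(ranks)


def precise_hit_ratio(ranks: Dict[str, int]) -> float:
    """Eq. (5): fraction of devices whose rank equals 1."""
    return sum(1 for r in ranks.values() if r == 1) / len(ranks)


def rank_cdf(ranks: Dict[str, int], max_rank: int) -> List[float]:
    """Eq. (6): f(x) = P(rank_i is at most x) for x = 1 .. max_rank."""
    n = len(ranks)
    return [sum(1 for r in ranks.values() if x >= r) / n
            for x in range(1, max_rank + 1)]


example = {"a": 2, "b": 1, "c": 1}
print(mean_rank(example))          # 1.3333333333333333
print(precise_hit_ratio(example))  # 0.6666666666666666
print(rank_cdf(example, 3))        # [0.6666666666666666, 1.0, 1.0]
```
        </preformat>
        <p>The first value of the CDF equals the precise hit ratio, and the curve reaches 1.0 once every device lies within the top x ranks.</p>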
        <p>These metrics enable a comparison of the BoW and time
window BoW approaches, as well as of the
different features used to create them.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Experimental Setup</title>
        <p>The experiments were performed on a week of real
telemetry (Jan 11 to Jan 18, 2021) from a corporate network. The
telemetry was collected both on the endpoint devices and on
network proxies. Table 1 shows the number of devices
present in the network on each day of the week. The
difference between the numbers of devices seen on the endpoints
and in the network is mainly caused by endpoint devices
that communicate only within the private network, so their
traffic was never observed on a proxy. A significant
drop in the number of devices can be observed during the
weekend.</p>
        <p>We used three different features, listed in Table 2, to
test the viability of the BoW approach. Private destination
IPs are addresses falling into the ranges defined in RFC 1918
[18]. The private destination IP address was chosen because it is one of the
few features that capture behaviour on the internal network. These addresses
were collected on the endpoints, as network proxies usually
do not handle internal traffic.</p>
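        <p>Deciding whether a destination address is private in the RFC 1918 sense needs only Python's standard ipaddress module. The sketch below checks the three RFC 1918 blocks explicitly, because the module's is_private property is broader (it is also true for loopback and link-local addresses, for example). The helper name is hypothetical.</p>
        <preformat preformat-type="code">
```python
import ipaddress

# The three address blocks reserved for private internets by RFC 1918.
RFC1918_NETWORKS = (
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
)


def is_rfc1918(address: str) -> bool:
    """True if the address falls into an RFC 1918 range (never for IPv6)."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in RFC1918_NETWORKS)


print(is_rfc1918("172.20.1.5"))  # True
print(is_rfc1918("172.32.0.1"))  # False: just outside 172.16.0.0/12
print(ipaddress.ip_address("127.0.0.1").is_private, is_rfc1918("127.0.0.1"))  # True False
```
        </preformat>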
        <p>The file hash is a string uniquely identifying a file. Files
that were created, opened or executed on the endpoint are
supplied to a hashing function to compute the hash.</p>
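        <p>The hashing function is not named here; a common choice for file identification is SHA-256, which the sketch below assumes. Reading the file in chunks keeps memory use constant even for large binaries. The function name is hypothetical.</p>
        <preformat preformat-type="code">
```python
import hashlib


def file_hash(path: str, chunk_size: int = 65536) -> str:
    """Hex-encoded SHA-256 digest of a file's contents, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```
        </preformat>
        <p>Because the digest depends only on the file content, any change to a binary, such as a program update, yields a new term in the vocabulary.</p>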
        <p>The autonomous system number (ASN) comes from enriching
the network telemetry with information from a GeoIP
database. It is expected that a single network in one
location communicates mostly with several autonomous
systems, and that these dominant autonomous systems are
similar for every device in the network. The experiment is
designed to test whether the remaining, less frequent ASNs
can serve to distinguish devices.</p>
        <p>Table 1: Number of devices present in the network on each day from Jan 11 (Mon) to Jan 18 (Mon).</p>
      </sec>
      <sec id="sec-4-3">
        <p>The representations were created for one feature at a
time using the BoW and time window BoW approaches
covered in Section 3.1. A one-day (24-hour) period was
selected to create a representation, assuming that it
contains most of the regular behavioural routines of the device
while being short enough to detect a change in behaviour as
soon as possible. The dimensions of the latent spaces
(defined by the vocabulary size) changed slightly from day
to day; they were approximately 3500 for dstIpPrivate, 12000
for fileHash, and 850 for autonomusSystemNumber.</p>
        <p>For every two consecutive days, two representations (one per day)
were created for each device to test device tracking over
time. The vocabulary used for the BoW mapping contains all
terms (observed feature values) that occurred during these
two days. A day's representation was created for each
device by counting feature occurrences and re-weighting the
resulting vector by tf-idf. The cosine similarity between
representations from consecutive days was used to evaluate the
quality of the representations. This process was repeated for
each day of the week.</p>
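        <p>The per-day procedure can be sketched end to end: the bags of both days share one vocabulary (here implicitly, as the union of their terms), are re-weighted by tf-idf, and are compared by cosine similarity. Computing idf over the bags of both days, with one document per device and day, is an assumption of this sketch, as are the function names.</p>
        <preformat preformat-type="code">
```python
import math
from collections import Counter
from typing import Dict, Tuple

Bags = Dict[str, Counter]           # device_id -> term counts for one day
Reps = Dict[str, Dict[str, float]]  # device_id -> tf-idf weighted vector


def tfidf_two_days(day1: Bags, day2: Bags) -> Tuple[Reps, Reps]:
    """Weight both days with idf computed over all bags of the two days."""
    docs = list(day1.values()) + list(day2.values())
    df: Counter = Counter()
    for bag in docs:
        df.update(set(bag))
    idf = {t: math.log(len(docs) / c) for t, c in df.items()}

    def weigh(bags: Bags) -> Reps:
        return {d: {t: n * idf[t] for t, n in bag.items()} for d, bag in bags.items()}

    return weigh(day1), weigh(day2)


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_matrix(day1: Bags, day2: Bags) -> Dict[str, Dict[str, float]]:
    """sims[i][j]: cosine similarity of device i on day 1 and device j on day 2."""
    rep1, rep2 = tfidf_two_days(day1, day2)
    return {i: {j: cosine(u, v) for j, v in rep2.items()} for i, u in rep1.items()}


mon = {"dev-1": Counter({"chrome": 5, "putty": 2}),
       "dev-2": Counter({"chrome": 4, "excel": 3})}
tue = {"dev-1": Counter({"chrome": 6, "putty": 1}),
       "dev-2": Counter({"chrome": 2, "excel": 5})}
sims = similarity_matrix(mon, tue)
print(round(sims["dev-1"]["dev-1"], 6), sims["dev-1"]["dev-2"])  # 1.0 0.0
```
        </preformat>
        <p>The resulting nested dictionary is the input expected by the ranking of Equation 3; the tw-BoW variant differs only in how the bags are counted.</p>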
      </sec>
      <sec id="sec-4-4">
        <title>Results</title>
        <p>Figure 1 shows the distributions of similarity to self and to the most
similar other device for the fileHash feature. The plotted
histograms represent similarities of device representations
from Monday, January 11, and Tuesday, January 12; the
distributions for the other days are similar. The orange colour
depicts the similarity to self on the second day. A better
representation of devices shows up as a larger number of
orange bars that are higher than the blue ones. Mean
similarities to self and to the closest other device are listed in
Table 3. The results show that using
tw-BoW increases the mean similarity to self relative to the mean
similarity to other devices. However, the mean similarities to
the closest other device are still higher, indicating that
individual features are not enough for accurate device tracking.</p>
        <p>Figure 1: Similarity distributions for the fileHash feature. (a) fileHash: BoW; (b) fileHash: tw-BoW.</p>
        <p>Figure 2: (a) All representations, with the CDFs of rank giving the probability of the device being within the top N ranks of similarities in the next day. (b) CDF of rank for fileHash on different days of the week; the quality of representations deteriorates significantly over the weekend as the number of devices drops.</p>
        <p>Figure 1 visualises what that means for the fileHash
feature.</p>
        <p>To compare the ability to track devices over time, the
cumulative distribution functions from Equation 6 are plotted in
Figure 2a. A larger area under the CDF indicates a better
representation. The tw-BoW representations outperform the raw
BoW approach for all the tested features. The file hash shows
the best results, with 75% of devices being within the top ten
ranks in the second-day representations. The autonomous
system number performs the worst of the three features,
which indicates that infrequently accessed autonomous
systems are not enough to differentiate between devices.
The private destination IP address slightly outperforms the ASN
feature. Inspecting the data revealed that most private network
communication was directed to several addresses that belong to
load-balanced servers.</p>
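        <p>Complete results averaged over the whole week are
listed in Table 4, with the best values for each category
highlighted in bold. For all three tested features, the time
window BoW representation showed better results, and the file
hash proved to be the best feature for device tracking.</p>
        <p>To combine the three modalities, the similarities computed for the individual features were averaged:
<disp-formula><tex-math>\mathrm{sim}_{\mathrm{avg}}(x_i, x_j) = \frac{1}{|\mathcal{M}|} \sum_{m \in \mathcal{M}} \mathrm{sim}_m(x_i, x_j),</tex-math></disp-formula>
where <inline-formula><tex-math>\mathcal{M} = \{\mathrm{dstIp}, \mathrm{fileHash}, \mathrm{ASN}\}</tex-math></inline-formula> is a set enumerating the
similarities for the different features and <inline-formula><tex-math>x_i, x_j</tex-math></inline-formula> are the
compared devices.</p>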
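        <p>Averaging per-feature similarity matrices can be sketched as below. The nested-dictionary format and the function name are illustrative assumptions; the sketch keeps only devices present in every feature's matrices, mirroring the restriction to common devices described in the next paragraph.</p>
        <preformat preformat-type="code">
```python
from typing import Dict, List

SimMatrix = Dict[str, Dict[str, float]]  # sims[i][j] for one feature


def average_similarity(per_feature: List[SimMatrix]) -> SimMatrix:
    """Arithmetic mean of the per-feature similarities over all features,
    restricted to devices present in every feature's telemetry."""
    rows = set.intersection(*(set(m) for m in per_feature))
    cols = set.intersection(*(set(row) for m in per_feature for row in m.values()))
    k = len(per_feature)
    return {
        i: {j: sum(m[i][j] for m in per_feature) / k for j in sorted(cols)}
        for i in sorted(rows)
    }


ip_sims = {"a": {"a": 0.2, "b": 0.9}, "b": {"a": 0.9, "b": 0.3}}
hash_sims = {"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.2, "b": 0.8}}
asn_sims = {"a": {"a": 0.7, "b": 0.6}, "b": {"a": 0.6, "b": 0.7}}
avg = average_similarity([ip_sims, hash_sims, asn_sims])
# dstIp alone confuses a with b; the average ranks each device first for itself
print(avg["a"]["a"] > avg["a"]["b"], avg["b"]["b"] > avg["b"]["a"])  # True True
```
        </preformat>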
        <p>Only devices that were present in all three telemetries
were used in this experiment, which significantly reduces
the number of devices in the dataset. Figure 3 shows the
CDFs for ranks using common devices. Using the average
similarity shows an improvement, increasing the precise
hit ratio significantly. Concrete numbers can be found in
Table 5, together with results for individual features on the
reduced dataset.</p>
        <p>Lastly, to visualise the differences between all
approaches, Figure 4 shows confusion matrices for 50
randomly selected devices; a lighter tile colour means a higher
similarity. Figure 4a suggests that the private destination IP address
is not a good feature for tracking individual users.
However, groups of devices behaving very
similarly might be extracted from the data. Inspecting the raw
data revealed that the similarity comes mainly from a
few common IP addresses, which belong to LDAP and
HTTP servers in the network. As these servers
use load balancing, the clusters are not stable over time, because
devices communicate with different IP addresses. The fileHash
representation in Figure 4c differentiates most of the devices well, with several
exceptions (very light squares off the diagonal). The averaged
similarities shown in Figure 4d reduce the impact of these
exceptions, which corresponds with the results in Table 5.
Surprisingly, the time window BoW representation has
proven to be quite effective for device tracking while
using only three easily interpretable features from different
modalities. More features and modalities can be added for
further improvement.</p>
        <p>Even though file hashes alone show promising results,
they do not allow a device to be tracked confidently over
time. Averaging the similarities from different feature
representations significantly increases the number of
precisely tracked devices. The downside is that both network
and endpoint telemetry have to be present. This could be
addressed in the future by combining the representations
of different features even when one of the features is missing.</p>
        <p>The results also indicate that the behaviour of most
network devices differs significantly between
workdays and weekends. Creating separate representations for
weekdays and weekends could improve overall tracking.</p>
        <p>Figure 4: Confusion matrices for 50 randomly selected devices. (a) dstIpPrivate; (b) ASN; (c) fileHash; (d) average.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In this paper, we studied the problem of user/device
identification and tracking. Uniquely identifying and
tracking any network device is the first step to proper user
and entity behavioural analytics (UEBA). As a baseline,
we tested device representations based on Bag of Words
(BoW) built from multiple features from different data
sources. Our experiments on real-world data showed that
combining these features is beneficial for device
identification and tracking.</p>
      <p>Furthermore, we introduced the time-aware
normalisation of the BoW representations. It reduces the influence
of the most common values in each data source (network
servers, binaries, autonomous systems) and significantly
improves unique user/device identification accuracy.</p>
      <p>In the future, we would like to focus on learning
lower-dimensional representations that would still capture the
entity behaviour and would be adaptable over time.</p>
      <p>[16] Nikolaos Passalis and Anastasios Tefas. Entropy optimized
feature-based bag-of-words representation for information
retrieval. IEEE Transactions on Knowledge and Data
Engineering, 28(7):1664–1677, 2016.</p>
      <p>[17] Juan Ramos et al. Using tf-idf to determine word relevance
in document queries. In Proceedings of the First
Instructional Conference on Machine Learning, volume 242, pages
29–48. Citeseer, 2003.</p>
      <p>[18] Y. Rekhter, B. Moskowitz, D. Karrenberg, G. J. de Groot,
and E. Lear. Address Allocation for Private Internets. RFC
1918, IETF, February 1996.</p>
      <p>[19] Chih-Fong Tsai. Bag-of-words representation in image
annotation: A review. International Scholarly Research
Notices, 2012, 2012.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Károly</given-names>
            <surname>Boda</surname>
          </string-name>
          , Ádám Máté Földes, Gábor György Gulyás, and
          <string-name>
            <given-names>Sándor</given-names>
            <surname>Imre</surname>
          </string-name>
          .
          <article-title>User tracking on the web via cross-browser fingerprinting</article-title>
          .
          <source>In Nordic Conference on Secure IT Systems</source>
          , pages
          <fpage>31</fpage>
          -
          <lpage>46</lpage>
          . Springer,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Hung-Hsuan</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Behavior2vec: Generating distributed representations of users' behaviors on products for recommender systems</article-title>
          .
          <source>ACM Transactions on Knowledge Discovery from Data (TKDD)</source>
          ,
          <volume>12</volume>
          (
          <issue>4</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Sajad</given-names>
            <surname>Darabi</surname>
          </string-name>
          , Mohammad Kachuee, Shayan Fazeli, and
          <string-name>
            <given-names>Majid</given-names>
            <surname>Sarrafzadeh</surname>
          </string-name>
          .
          <article-title>TAPER: Time-aware patient EHR representation</article-title>
          .
          <source>IEEE journal of biomedical and health informatics</source>
          ,
          <volume>24</volume>
          (
          <issue>11</issue>
          ):
          <fpage>3268</fpage>
          -
          <lpage>3275</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Carlos A.</given-names>
            <surname>Gomez-Uribe</surname>
          </string-name>
          and
          <string-name>
            <given-names>Neil</given-names>
            <surname>Hunt</surname>
          </string-name>
          .
          <article-title>The Netflix recommender system: Algorithms, business value, and innovation</article-title>
          .
          <source>ACM Transactions on Management Information Systems (TMIS)</source>
          ,
          <volume>6</volume>
          (
          <issue>4</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Aditya</given-names>
            <surname>Grover</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jure</given-names>
            <surname>Leskovec</surname>
          </string-name>
          .
          <article-title>node2vec: Scalable feature learning for networks</article-title>
          .
          <source>In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          , pages
          <fpage>855</fpage>
          -
          <lpage>864</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>William L.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Rex</given-names>
            <surname>Ying</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jure</given-names>
            <surname>Leskovec</surname>
          </string-name>
          .
          <article-title>Inductive representation learning on large graphs</article-title>
          .
          <source>In Proceedings of the 31st International Conference on Neural Information Processing Systems</source>
          , pages
          <fpage>1025</fpage>
          -
          <lpage>1035</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Jaroslav</given-names>
            <surname>Hlaváč</surname>
          </string-name>
          , Martin Kopp, and
          <string-name>
            <given-names>Jan</given-names>
            <surname>Kohout</surname>
          </string-name>
          .
          <article-title>Cluster representatives selection in non-metric spaces for nearest prototype classification</article-title>
          .
          <source>arXiv preprint arXiv:2107.01345</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Monisha</given-names>
            <surname>Kanakaraj</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ram Mohana Reddy</given-names>
            <surname>Guddeti</surname>
          </string-name>
          .
          <article-title>NLP based sentiment analysis on Twitter data using ensemble classifiers</article-title>
          .
          <source>In 2015 3rd International Conference on Signal Processing, Communication and Networking (ICSCN)</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . IEEE,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Srijan</given-names>
            <surname>Kumar</surname>
          </string-name>
          , Xikun Zhang, and
          <string-name>
            <given-names>Jure</given-names>
            <surname>Leskovec</surname>
          </string-name>
          .
          <article-title>Predicting dynamic embedding trajectory in temporal interaction networks</article-title>
          .
          <source>In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</source>
          , pages
          <fpage>1269</fpage>
          -
          <lpage>1278</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Svetlana</given-names>
            <surname>Lazebnik</surname>
          </string-name>
          , Cordelia Schmid, and
          <string-name>
            <given-names>Jean</given-names>
            <surname>Ponce</surname>
          </string-name>
          .
          <article-title>Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories</article-title>
          .
          <source>In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)</source>
          , volume
          <volume>2</volume>
          , pages
          <fpage>2169</fpage>
          -
          <lpage>2178</lpage>
          . IEEE,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Jing</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Aixin</given-names>
            <surname>Sun</surname>
          </string-name>
          , Jianglei Han, and
          <string-name>
            <given-names>Chenliang</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>A survey on deep learning for named entity recognition</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Teng</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Tao</given-names>
            <surname>Mei</surname>
          </string-name>
          , In-So Kweon, and
          <string-name>
            <given-names>Xian-Sheng</given-names>
            <surname>Hua</surname>
          </string-name>
          .
          <article-title>Contextual bag-of-words for visual categorization</article-title>
          .
          <source>IEEE Transactions on Circuits and Systems for Video Technology</source>
          ,
          <volume>21</volume>
          (
          <issue>4</issue>
          ):
          <fpage>381</fpage>
          -
          <lpage>392</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Kai Chen, Greg Corrado, and
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>arXiv preprint arXiv:1301.3781</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>David</given-names>
            <surname>Nadeau</surname>
          </string-name>
          and
          <string-name>
            <given-names>Satoshi</given-names>
            <surname>Sekine</surname>
          </string-name>
          .
          <article-title>A survey of named entity recognition and classification</article-title>
          .
          <source>Lingvisticae Investigationes</source>
          ,
          <volume>30</volume>
          (
          <issue>1</issue>
          ):
          <fpage>3</fpage>
          -
          <lpage>26</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Piotr</given-names>
            <surname>Panasiuk</surname>
          </string-name>
          , Maciej Szymkowski, Marcin Dąbrowski, and
          <string-name>
            <given-names>Khalid</given-names>
            <surname>Saeed</surname>
          </string-name>
          .
          <article-title>A multimodal biometric user identification system based on keystroke dynamics and mouse movements</article-title>
          .
          <source>In IFIP International Conference on Computer Information Systems and Industrial Management</source>
          , pages
          <fpage>672</fpage>
          -
          <lpage>681</lpage>
          . Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>