<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Human Activity Recognition with Deep Metric Learners</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kyle Martin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anjana Wijekoon</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nirmalie Wiratunga</string-name>
          <email>n.wiratunga@rgu.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Robert Gordon University</institution>
          ,
          <addr-line>Aberdeen, Scotland</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Establishing a strong foundation for similarity-based return is a top priority in Case-Based Reasoning (CBR) systems. Deep Metric Learners (DMLs) are a group of neural network architectures which learn to optimise case representations for similarity-based return by training upon multiple cases simultaneously to incorporate relationship knowledge. This is particularly important in the Human Activity Recognition (HAR) domain, where understanding similarity between cases supports aspects such as personalisation and open-ended HAR. In this paper, we present a short review of three DMLs and compare their performance across three HAR datasets. Our findings support research indicating that DMLs are valuable for improving similarity-based return, and suggest that considering more cases simultaneously offers better performance.</p>
      </abstract>
      <kwd-group>
        <kwd>Human Activity Recognition</kwd>
        <kwd>Deep Metric Learning</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Metric Learning</kwd>
        <kwd>Matching Networks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Establishing a strong foundation for similarity-based return is a top priority in
Case-Based Reasoning (CBR) systems. Without a firm understanding of the similarity
between cases, CBR systems are poorly placed to offer solutions from previous
knowledge and become increasingly reliant on their adaptation component [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. With this in
mind, it is reasonable to claim that a strong similarity component can be central to the
success of a CBR system.
      </p>
      <p>
        Deep Metric Learners (DMLs) are a group of neural network architectures which
learn to optimise case representations for similarity-based return. This is achieved by
training on multiple cases simultaneously. Early examples of DMLs trained upon pairs
of input cases [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], while more recent learners use triplets [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] or a representative from
each cluster [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] to better incorporate contextual knowledge of the feature space. Due
to their lack of reliance on class knowledge, DML algorithms are traditionally applied
to matching problems with potentially very large numbers of classes, such as
signature verification [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] or face re-identification [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. However, recent research has
demonstrated that DMLs have great potential within the Human Activity Recognition (HAR)
domain. Feature representations learnt with Siamese Networks produced better
classification results in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The well-established robustness of Matching Networks [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] on
few-shot and one-shot learning problems [
        <xref ref-type="bibr" rid="ref15 ref5">5,15</xref>
        ] has been exploited to achieve efficient
personalisation of HAR [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Recent work has also highlighted their strong performance
in supporting open-ended HAR [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. It is challenging in the HAR domain to collect
large datasets with sensory equipment, and it is also desirable to learn the personal
nuances of human activities from the limited amount of data available. Learning feature
representations that are similarity-driven is significant because it enables personal
nuances to be incorporated from a limited number of cases.
      </p>
      <p>Despite their growing impact on this domain, there remains a lack of literature
examining the quality of representations obtained by different DML architectures for a
HAR task. With this in mind, we are motivated to review the performance of
three DMLs (Siamese Neural Network, Triplet Network and Matching Network) across
three HAR datasets (SelfBACK, PAMAP2 and MEx). The contributions of this paper
are therefore as follows: (1) we offer an introductory review of three DML architectures
and (2) present a comparison of their performance on three HAR datasets.</p>
      <p>The paper is structured in the following manner. In section 2 we provide an
introductory review of each of the DML architectures that we consider in this paper. In
section 3 we detail the experimental setup of our comparative evaluation and give details
of the three HAR tasks, while in section 4 we analyse and discuss the results. Finally in
section 5 we present some conclusions.
</p>
    </sec>
    <sec id="sec-2">
      <title>Deep Metric Learners</title>
      <p>In this section we explore the unique traits of several DML architectures: Siamese
Neural Networks, Triplet Networks and Matching Networks. Though individual
architectures possess distinct nuances, there are several themes which are consistent across
DMLs. With this in mind, let us introduce some general notation used throughout this
paper. Let X be a set of labelled cases, such that for x ∈ X, the function y(x) returns
the class label, y, of case x. In the context of this paper, we will define matching cases as
those which have the same class, while non-matching cases have differing classes.
The embedding function θ is an appropriate parameterisation of any function used to
create the vectorised representation of a given x, while the function D_W represents an
arbitrary metric function measuring the distance between two vector representations.</p>
      <sec id="sec-2-1">
        <title>2.1 Siamese Neural Networks</title>
        <p>Siamese Neural Networks (SNNs) are deep metric learners which receive pairs of cases
as input. The SNN architecture consists of two identical embedding functions, enabling
the SNN to generate a multi-dimensional embedding for each member of a pair. Input
pairs are labelled as either matching or non-matching. Correspondingly, the
objective of training is to minimise the distance between the generated embeddings of
matching pair members while maximising the distance between embeddings of
non-matching pair members. Thus the overall goal of the network is the development of a
space optimised for similarity-based return.</p>
        <p>To achieve this goal, each training pair, p ∈ P, consists of two cases from the
training set, p = (x̂, ẍ). Whether the pair is matching or non-matching is governed by
the relationship of the pivot case, x̂, to the passive case, ẍ. In the context of this work,
the pair's relationship class is established by comparing the class labels of its members
(e.g. y(x̂) with y(ẍ)). For this we use the function Y(p), which returns p's relationship
class label, such that Y(p) = 0 symbolises a matching pair when y(x̂) = y(ẍ), and
Y(p) = 1 symbolises a non-matching pair when y(x̂) ≠ y(ẍ).</p>
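        <p>The relationship function Y(p) described above can be sketched as follows (an illustrative helper; the function name is ours, not from the paper):</p>

```python
def pair_label(y_pivot, y_passive):
    """Y(p): return 0 when the pair members share a class (matching),
    1 when their classes differ (non-matching)."""
    return 0 if y_pivot == y_passive else 1
```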
        <p>After input to the network, the generated embeddings for each member of a pair can
be compared using a distance metric, D_W(θ(x̂), θ(ẍ)). This distance metric plays a
key role in the unique contrastive loss used by SNNs (as in Equation 3); it penalises
members of matching pairs until they occupy the exact same spot in the space (using
L_G in Equation 1) and penalises members of non-matching pairs until they sit at least
a set margin distance of α apart (using L_I in Equation 2) to calculate the error for the
model update (see Figure 1). The error is then backpropagated over both embedding
functions to ensure they remain identical.</p>
        <p>L_G = (1 − Y_A) · D_W(θ(x̂), θ(ẍ))²   (1)</p>
        <p>L_I = Y_A · (max(0, α − D_W(θ(x̂), θ(ẍ))))²   (2)</p>
        <p>L = L_G + L_I   (3)</p>
        <p>Using both errors means the similarity metric can be directly learned by the network
through the comparison of the actual pair label Y_A (which, as above, is equal to 0
for matching and 1 for non-matching pairs respectively) and the distance between pair
members, while using the generated embeddings of pair members during distance
comparisons ensures iterative model refinement. It is also these learned embeddings which
act as the improved representation for similarity-based return after training is complete.</p>
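        <p>The contrastive loss above can be sketched as a single function, assuming the embedding distance has already been computed (the names and default margin are illustrative, not from the paper):</p>

```python
def contrastive_loss(dist, pair_label, margin=1.0):
    """Contrastive loss for one pair given a precomputed embedding distance.

    pair_label: 0 for a matching pair, 1 for a non-matching pair.
    """
    l_g = (1 - pair_label) * dist ** 2                # Eq. 1: pull matching pairs together
    l_i = pair_label * max(0.0, margin - dist) ** 2   # Eq. 2: push non-matching pairs a margin apart
    return l_g + l_i                                  # Eq. 3
```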
        <p>
          Designed for matching tasks such as signature verification [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] or similar text
retrieval [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], SNNs generalise well to classification tasks when supported by a similarity-based
return component such as k-Nearest Neighbour (kNN). Research into SNNs has
highlighted their capacity to support one-shot learning [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], an area of research where recent
innovations in deep metric learning, such as Matching Networks, still demonstrate
state-of-the-art results [
          <xref ref-type="bibr" rid="ref15 ref17">15,17</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Triplet Networks</title>
        <p>Triplet Networks (TNs) are DMLs which learn from three input cases simultaneously.
Together described as a triplet, these inputs are the anchor case (x_a), a positive case
(x⁺) and a negative case (x⁻). The anchor case acts as a point of comparison, meaning
that the positive and negative cases are defined by their relationship to the anchor (i.e.
matching and non-matching respectively). Similar to SNN, the objective during
training is to minimise the distance between an anchor and its associated positive case and
maximise the distance between an anchor and its associated negative case. However,
considering three cases at once ensures that the update of weights is more focused. This
is because SNNs learn based on only one aspect at any given time (i.e. either the pair
members are alike, or they are not), meaning that more pairs are required to build the full
picture. Considering three cases at once allows the triplet network to better understand
the context of the anchor case.</p>
        <p>Fig. 2: Triplet Network Architecture</p>
        <p>A triplet network is comprised of three identical embedding functions, each of
which creates an embedding for one input (see Figure 2) before the error is calculated
using triplet loss:</p>
        <p>L = D_W(θ(x_a), θ(x⁺)) − D_W(θ(x_a), θ(x⁻)) + α   (4)</p>
        <p>Like contrastive loss (see Equation 3), triplet loss is a distance-based function. The
formula will generate a loss value in situations where the distance between the anchor case
and the negative case, D_W(θ(x_a), θ(x⁻)), is less than the distance between the anchor
case and the positive case, D_W(θ(x_a), θ(x⁺)). The network is therefore penalised
until matching cases are closer than non-matching cases. A minimum boundary between
non-matching cases is enforced by the margin α.</p>
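        <p>A minimal sketch of triplet loss, assuming precomputed anchor-positive and anchor-negative distances. We add the hinge at zero that implementations commonly use (so the loss vanishes once the margin is satisfied, matching the "penalised until" behaviour described above); names are ours:</p>

```python
def triplet_loss(d_pos, d_neg, margin=1.0):
    """Triplet loss (Equation 4), clamped at zero: penalise until the
    anchor is at least `margin` closer to the positive case than to
    the negative case.

    d_pos: D_W(theta(x_a), theta(x+))
    d_neg: D_W(theta(x_a), theta(x-))
    """
    return max(0.0, d_pos - d_neg + margin)
```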
        <p>
          Unlike SNNs, TNs were designed to be supported by a similarity-based return
component [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] and cannot perform classification tasks on their own. Despite their common
use in matching problems such as facial recognition [
          <xref ref-type="bibr" rid="ref13 ref6">6,13</xref>
          ] or image-based
search [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], TNs are very capable of establishing an effective basis for similarity-based
return on multi-class problems [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Though convergence of these networks can be
achieved through the creation of random triplets, recent work has demonstrated that a
training strategy which optimises triplet creation can improve training efficiency [
          <xref ref-type="bibr" rid="ref13 ref16">13,16</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Matching Networks</title>
        <p>
          Matching Networks (MNs) [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] are unique in that they can be used flexibly as either
a classifier or a DML. MNs learn to match a query case to members of a support set, S,
of size k × n; where k is the number of representatives per cluster and n is the
number of clusters in the dataset. The similarity between the query case and the
support set case pairs (x′, x′_i) is calculated with a suitable similarity metric (D_W(x′, x′_i)),
and an attention mechanism in the form of a weighted majority vote estimates the class
distribution (see Equations 6 and 7). This is enforced by the categorical cross-entropy
loss function, which quantifies the difference between the estimated and actual
distributions.</p>
        <p>θ(x) = x′   (5)</p>
        <p>a(x′, x′_i) = e^{D_W(x′, x′_i)} / Σ_{j=1}^{|S|} e^{D_W(x′, x′_j)}   (6)</p>
        <p>ŷ = Σ_{i=1}^{|S|} a(x′, x′_i) × y_i   (7)</p>
        <p>
          MN was first applied in the domain of one-shot and few-shot learning with image
recognition [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. Prototypical Networks [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] are an adaptation of MN applied in the
same domain. Here the model creates a prototype (by averaging over similar elements in
the support set) for each class in the support set, then behaves as a one-shot learning MN
model; this outperformed the original MN in few-shot learning. More recently, MN was
exploited successfully to achieve personalised HAR [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] and open-ended HAR [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ],
where the support set is successfully utilised to enforce personal traits of human activities.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Evaluation</title>
      <p>In this section, we offer details of our evaluation of three deep metric learners - SNN,
TN and MN. We perform an empirical comparison of the representations gained from
each network architecture to determine their effectiveness within the HAR domain. For
both SNN and TN we use k-NN accuracy as a proxy for representation goodness, while
for MN we use classification accuracy from the model’s final parametric comparison
between query and support set to indicate the quality of representations. We establish
statistical significance using a one-tailed t-test on classification accuracy at a
confidence level of 95%.</p>
      <sec id="sec-3-1">
        <title>Datasets</title>
        <p>
          SelfBACK [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] features time series data1 collected from 34 users, each performing 9
different ambulatory and sedentary activities: jogging, lying, sitting,
standing, walking downstairs, walking upstairs, and walking at a slow, medium or fast pace.
Data was collected by mounting a tri-axial accelerometer on the right thigh and
right-hand wrist of participants, and was recorded at a sampling rate of 100Hz for 3
minutes.
        </p>
        <p>MEx (Multi-modal Exercise Dataset)2 is a physiotherapy exercise dataset that
contains data collected using a pressure mat, a depth camera and two accelerometers
placed on the wrist and the thigh. Data was recorded for 7 exercises with 30 users; each
user performed one exercise for a maximum of 60 seconds. The seven exercises included
in the dataset are Knee rolling, Bridging, Pelvic tilt, The Clam, Extension in lying,
Prone Punches and Superman. The pressure mat, depth camera and accelerometer data
were recorded at 75Hz, 15Hz and 100Hz respectively. In this work we use only the
data from the two accelerometers, for comparability with the other datasets.</p>
        <p>
          PAMAP2 [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] is a Physical Activity Monitoring dataset3 which contains data from 3
IMUs located on the wrist, chest and ankle. Data was recorded from 9 users at
approximately 9Hz for 18 activity classes, following a pre-defined protocol. Activities
include ambulatory and sedentary activities and activities of daily living. One user and
10 activities were filtered out of this dataset due to insufficient data. The refined dataset
contained 8 users and 8 activity classes.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Data Pre-processing</title>
        <p>The following pipeline was applied to all datasets to pre-process the data and form
cases, progressively converting a raw signal into a Discrete Cosine Transformation (DCT)
feature vector.</p>
        <p>1 Public dataset available at https://github.com/rgu-selfback/Datasets</p>
        <p>2 Public dataset available at https://data.mendeley.com/datasets/p89fwbzmkd/1</p>
        <p>3 Public dataset available at http://archive.ics.uci.edu/ml/datasets/pamap2+physical+activity+monitoring</p>
        <p>Fig. 4: (a) Convolutional θ; (b) Dense θ</p>
        <p>1. Use a sliding window of 500 timestamps to segment the original raw sensor signal
(no overlap for SelfBACK and PAMAP2, 2 second overlap for MEx).
2. Extract 3-dimensional (x, y, z) raw accelerometer data from each sensor.
3. Apply DCT and extract the 60 most significant features from each dimension.
4. Concatenate all DCT feature vectors from each dimension of all sensors to form the
final feature vector. The lengths of the resulting feature vectors for SelfBACK, PAMAP2
and MEx are 360, 540 and 360 respectively.</p>
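        <p>Steps 1 and 3 of the pipeline can be sketched as follows. This is an illustrative implementation assuming "most significant" means the lowest-frequency DCT-II coefficients; the helper names and window/feature defaults mirror the text but are our own:</p>

```python
import math

def sliding_windows(signal, width=500, step=500):
    """Step 1: fixed-width segmentation of a raw signal
    (step == width gives no overlap, as for SelfBACK and PAMAP2)."""
    return [signal[i:i + width] for i in range(0, len(signal) - width + 1, step)]

def dct_features(window, n_features=60):
    """Step 3: unnormalised DCT-II of one axis of a windowed signal,
    keeping the first (lowest-frequency) coefficients."""
    n = len(window)
    coeffs = []
    for k in range(min(n_features, n)):
        c = sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, x in enumerate(window))
        coeffs.append(c)
    return coeffs
```

Concatenating 60 such coefficients per axis, for 3 axes per sensor, yields the stated vector lengths (e.g. 2 sensors × 3 axes × 60 = 360 for SelfBACK and MEx, 3 IMUs × 3 axes × 60 = 540 for PAMAP2).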
      </sec>
      <sec id="sec-3-3">
        <title>Experimental Setup</title>
        <p>We define a nearest-neighbour classification task for all three HAR datasets:
SelfBACK, PAMAP2 and MEx. We use Leave-One-Person-Out (LOPO) validation with
each dataset, performing 34, 8 and 30 experiments for SelfBACK, PAMAP2 and MEx
respectively. We record the accuracy of each experiment and present the mean value as
the performance metric.</p>
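        <p>LOPO validation as described above can be sketched as follows, assuming cases are grouped by a person identifier (names are illustrative):</p>

```python
def lopo_splits(cases_by_person):
    """Leave-One-Person-Out validation: hold each person out as the test
    set once, yielding one experiment per person (34 for SelfBACK, 8 for
    PAMAP2, 30 for MEx). `cases_by_person` maps person id -> their cases."""
    people = sorted(cases_by_person)
    for held_out in people:
        train = [c for p in people if p != held_out for c in cases_by_person[p]]
        test = cases_by_person[held_out]
        yield held_out, train, test
```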
        <p>
          All feature embedding functions (Figure 4a and Figure 4b) used the ReLU
activation and were trained using the Adam optimiser [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. We implemented an empirical
evaluation to identify the best performing hyper-parameters for all datasets (see
Table 1). These parameters are kept constant across all experiments, meaning that SNN,
TN and MN all used the same embedding function to allow fair comparison. To ensure
comparability between DMLs, we also enforced random pair, triplet and subset
creation. Each training case was represented within two pairs (one matching and one
non-matching), a single triplet and a single query for comparison to a subset respectively.
This was to mitigate the fact that TNs and MNs inherently consider more information
than SNNs. In future, it would be interesting to conduct a co-ordinated examination of
the networks invoking pair, triplet or subset mining strategies.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>Table 1: Hyper-parameter settings: batch size, training epochs, output length of the
feature embedding function, number of neighbours for kNN and margin α (general and
SNN/TN); mini-batch size, samples per class and classes per set (MN).</p>
      <p>Each dataset was evaluated against the kNN algorithm, which sets the baseline for the
DML algorithms. Table 2 presents the comparative results we obtained with the algorithms
detailed in the previous section. An asterisk indicates that the performance improvement
over the kNN baseline is statistically significant at the 95% confidence level, and we
highlight the best performing algorithm with bold text.</p>
      <p>It is evident that the representations learned with DML algorithms are better for
similarity-based return than the raw representation. With a single exception, all
representations learned with DMLs outperform the baseline. On the SelfBACK dataset, all
DML algorithms significantly outperform the kNN baseline with both MLP and CNN
feature embedding functions. The MN algorithm achieves the best performance, and the
MLP embedding function earns a 1.34% performance improvement over the CNN
embedding function. Similarly, results on the MEx dataset show that all DML algorithms
significantly outperform the kNN baseline and the MN algorithm achieves the best
performance. Unlike the SelfBACK results, the CNN feature embedding function edges
out the MLP embedding function with a performance improvement of 1.30%. These
results suggest that considering more cases simultaneously is advantageous to the
algorithm. Furthermore, the choice of MLP or CNN to operate as an embedding function
seems to be problem specific.</p>
      <p>We observe a distinct difference on the PAMAP2 dataset, where the TN algorithm with
a CNN feature embedding function outperforms all other DML algorithms and the
baseline. In addition, the MN algorithm performs the poorest and does not significantly
outperform the baseline. This dataset is smaller than the other two, which may be a
contributing factor in the reduced performance of the MN algorithm. It is interesting
that the TN algorithm significantly outperforms the baseline even with limited training
data. We plan to explore this insight further in future work.</p>
      <p>Overall the results are indicative that deep metric learning methods can learn an
effective feature representation for similarity calculations. Most importantly, the
improvements to performance were statistically significant in almost every circumstance.
This is an important insight for using similarity-based algorithms, where we
previously relied on hand-crafted feature engineering.</p>
      <p>Table 2: Comparative results of kNN (raw representation), SNN, TN and MN on the
SelfBACK, MEx and PAMAP2 datasets.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In this paper we review three DML algorithms for learning feature representations and
evaluate them against hand-crafted feature representations in a similarity-based
classification task. We select HAR as the classification task because similarity-based
classification holds special significance there, improving performance by capturing
personal nuances. Our results show that the feature representations learnt with DML
algorithms significantly outperform hand-crafted feature representations in the selected
domain. These results highlight the potential of DML algorithms to create effective
feature representations efficiently, which is crucial in domains such as case-based
reasoning. In future we plan to extend this review to other domains and compare further
aspects of DMLs, such as the performance of different case mining approaches.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bromley</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guyon</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Signature verification using a 'siamese' time delay neural network</article-title>
          .
          <source>International Journal of Pattern Recognition and Artificial Intelligence</source>
          <volume>7</volume>
          (
          <issue>4</issue>
          ),
          <fpage>669</fpage>
          -
          <lpage>688</lpage>
          (
          <year>August 1993</year>
          ). https://doi.org/doi:10.1142/S0218001493000339
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ganesan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chakraborti</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>An empirical study of knowledge tradeoffs in case-based reasoning</article-title>
          .
          <source>In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18</source>
          . pp.
          <fpage>1817</fpage>
          -
          <lpage>1823</lpage>
          .
          <source>International Joint Conferences on Artificial Intelligence Organization</source>
          (7
          <year>2018</year>
          ). https://doi.org/10.24963/ijcai.2018/251
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Hoffer</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ailon</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Deep metric learning using triplet network</article-title>
          . In:
          <string-name>
            <surname>Feragen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pelillo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loog</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (eds.)
          <article-title>Similarity-Based Pattern Recognition</article-title>
          . pp.
          <fpage>84</fpage>
          -
          <lpage>92</lpage>
          . Springer International Publishing,
          Cham
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ba</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Adam: A method for stochastic optimization</article-title>
          .
          <source>arXiv preprint arXiv:1412.6980</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Koch</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zemel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salakhutdinov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Siamese neural networks for one-shot image recognition</article-title>
          .
          <source>In: Deep Learning Workshop. ICML '15 (July</source>
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Liao</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ying Yang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhan</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosenhahn</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Triplet-based deep similarity learning for person re-identification</article-title>
          .
          <source>In: Proceedings of the IEEE International Conference on Computer Vision</source>
          . pp.
          <fpage>385</fpage>
          -
          <lpage>393</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Scene classification via triplet networks</article-title>
          .
          <source>IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing</source>
          <volume>11</volume>
          (
          <issue>1</issue>
          ),
          <fpage>220</fpage>
          -
          <lpage>237</lpage>
          (
          <year>Jan 2018</year>
          ). https://doi.org/10.1109/JSTARS.2017.2761800
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiratunga</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Massie</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clos</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>A convolutional siamese network for developing similarity knowledge in the selfback dataset</article-title>
          . p.
          <fpage>85</fpage>
          -
          <lpage>94</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Neculoiu</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Versteegh</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rotaru</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Learning text similarity with siamese recurrent networks</article-title>
          .
          <source>In: Rep4NLP@ACL</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Reiss</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stricker</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Introducing a new benchmarked dataset for activity monitoring</article-title>
          .
          <source>In: Wearable Computers (ISWC), 2012 16th International Symposium on</source>
          . pp.
          <fpage>108</fpage>
          -
          <lpage>109</lpage>
          . IEEE (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Sani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiratunga</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Massie</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cooper</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Selfback - activity recognition for self-management of low back pain</article-title>
          .
          <source>In: Research and Development in Intelligent Systems XXXIII</source>
          . pp.
          <fpage>281</fpage>
          -
          <lpage>294</lpage>
          . SGAI '
          <volume>16</volume>
          , Springer Nature, Cham, Switzerland (December
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Sani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiratunga</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Massie</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cooper</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Personalised human activity recognition using matching networks</article-title>
          . In:
          <string-name>
            <surname>Cox</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Funk</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Begum</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <source>(eds.) Case-Based Reasoning Research and Development</source>
          . pp.
          <fpage>339</fpage>
          -
          <lpage>353</lpage>
          . Springer International Publishing, Cham
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Schroff</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalenichenko</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Philbin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Facenet: A unified embedding for face recognition and clustering</article-title>
          .
          <source>In: Proc. of the 2015 IEEE Conf. on Computer Vision and Pattern Recognition</source>
          . pp.
          <fpage>815</fpage>
          -
          <lpage>823</lpage>
          . CVPR '15, IEEE Computer Society, Washington, DC, USA (June
          <year>2015</year>
          ). https://doi.org/10.1109/CVPR.2015.7298682
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Snell</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Swersky</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zemel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Prototypical networks for few-shot learning</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          . pp.
          <fpage>4077</fpage>
          -
          <lpage>4087</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blundell</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lillicrap</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wierstra</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , et al.:
          <article-title>Matching networks for one shot learning</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          . pp.
          <fpage>3630</fpage>
          -
          <lpage>3638</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leung</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosenberg</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Philbin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Learning fine-grained image similarity with deep ranking</article-title>
          .
          <source>In: Proc. of the 2014 IEEE Conf. on Computer Vision and Pattern Recognition</source>
          . pp.
          <fpage>1386</fpage>
          -
          <lpage>1393</lpage>
          . CVPR '14, IEEE Computer Society, Washington, DC, USA (June
          <year>2014</year>
          ). https://doi.org/10.1109/CVPR.2014.180
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Wijekoon</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiratunga</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Zero-shot learning with matching networks for open-ended human activity recognition</article-title>
          . (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>