                                       An Investigation of Transfer Learning for a
                                                     Lifelog Dataset

                                              Akanksha Rajpute*, Tejal Nijai**, and Graham Healy

                                                School of Computing, Dublin City University, Ireland
                                      {akanksha.rajpute4,tejal.nijai2}@mail.dcu.ie, graham.healy@dcu.ie



                                        Abstract. Achieving high image classification performance is often dif-
                                        ficult when little training data is available, particularly when using deep
                                        learning approaches. Lifelog image classification is an example of this,
                                        where there can often be insufficient data available to directly train a
                                        deep learning model in an end-to-end manner. Transfer learning has
                                        been proposed as a potential solution here, that is, using the existing
                                        knowledge in a pre-trained deep learning model (for another image clas-
                                        sification task) as a starting point for a new image classification task.
                                        Our image classification problem in this paper is about classifying daily
                                        activities from lifelog images. We evaluate two different types of trans-
                                        fer learning approaches to improve training time and performance in the
                                        presence of limited training data. In this paper, we outline a comparative
                                        study of two different transfer learning approaches: a) the application of
                                        traditional classifiers using features extracted by a pre-trained model and
                                        b) the fine-tuning of a pre-trained model. We benchmark these two dif-
                                        ferent approaches for transfer learning using metrics for accuracy, recall,
                                        precision, and F1-score to identify which approach performs best. For
                                        the LSC2018 dataset used in this study, we find that using a pre-trained
                                        VGG19 model as a feature extractor in combination with XGBoost gives
                                        the best performance in terms of accuracy.

                                        Keywords: Transfer Learning · Lifelog · Feature Extraction · Fine-tuning · Pre-trained Models.


                                1     Introduction
                                Lifelogging represents a phenomenon whereby people can digitally record their
                                own daily lives in varying amounts of detail, for a variety of purposes [10]. This
                                could be for example to allow an individual to analyze their lifestyle or to keep a
                                record of life experiences for memories [10]. Most lifelogging research has focused
                                on visual lifelogging in order to capture life details (examples shown in Figure 1).
                                Considering the high rate of image capture of lifelog wearable devices, over a mil-
                                lion images could be captured in a year by a single user (assuming that a lifelog
                                image is captured every 20 seconds over a 16-hour day).
                                *  Equal contribution
                                ** Equal contribution




Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

As the sheer volume of images generated makes it infeasible for a user to manually label these images,
automated methods are needed to extract rich metadata in the form of image
concepts/labels in order to support processes like indexing, search and summa-
rization [8]. These are vital components that any software system using lifelog data needs in order to support many different types of lifelog applications [10].
Many highly accurate image classification and concept detection models already
exist that rely on deep learning approaches (e.g. [4, 5]), and the outputs of these
pre-trained solutions can partly align with the image concepts lifelog software
systems require. An ostensibly sensible solution here is to train new lifelog image concept detectors in the same way as these existing solutions; however, very often sufficient quantities of labelled training data are not available. This is particu-
larly evident when considering the millions of labelled images that are commonly
used to train popular image classification models [18]. One solution here is to
use transfer learning, a process whereby the "knowledge" in a pre-trained model
(trained on millions of labelled samples) for one task can be used as the basis
for learning to solve a new problem where relatively little labelled training data
may be available. Transfer learning is a prevalent technique in the deep learn-
ing community because it can both A) mitigate issues in training deep neural
networks where relatively little data is available by infusing a base model with
task-independent knowledge of visual primitives and concepts, and through this
B) avoid long training times [2]. This is valuable in practice because most real-world problems do not have millions of labelled data samples. In this paper,
we explore how transfer learning approaches can be used to overcome inherent
limitations in a lifelog dataset that has relatively few labelled samples. There
are various pre-trained models available which have performed well on the Im-
ageNet Large Scale Visual Recognition Challenge (ILSVRC) [18], and thus are
used in our work. For our investigation, we use the LSC2018 data [1], which is a
dataset containing various pre-labelled lifelog images captured by an individual
over a one month period. We compare two approaches of transfer learning using
this labelled lifelog dataset. In the first approach, we use traditional machine
learning approaches on features extracted using pre-trained deep convolutional
network models. In the second approach we use the existing pre-trained models,
and fine-tune the weights of the last layers of the model for our lifelog image
classification problem. We seek to discover which existing pre-trained models
(VGG16 [5] vs VGG19 [4]) provide the best accuracy for our custom classifica-
tion problem defined on the lifelog dataset.
    The structure of the paper is as follows: related work is presented in Section 2. Section 3 describes the proposed methodology. Section 4 presents the experimental results, followed by the discussion and conclusion in Sections 5 and 6, respectively.


2   Related Work

The use of CNNs (Convolutional Neural Networks) for image classification problems has become commonplace, as they give excellent results compared to non-CNN approaches in many application domains [19]. We describe the current
work in this area for different image classification problems, in order to high-
light the most relevant techniques and methods available for improving activity
classification of lifelog images. Previous work [16] has explored learning and
transferring mid-level image representations using CNNs, in order to overcome
problems related to limited labelled training data. In this work, image repre-
sentations were learned using CNNs on a large-scale annotated dataset, where
transfer learning was then used to accomplish other downstream visual recogni-
tion tasks. In the paper, the PASCAL VOC 2007 and 2012 datasets were used
for an object class recognition problem using a transfer learning approach from a
CNN model trained on ImageNet. By reusing the existing network and adapting
new layers, classification and object recognition results were improved on this
new task.
    Other work [13] has explored classifying images of tread patterns to help
in providing useful knowledge for investigating criminal cases and coping with
traffic accidents. The authors demonstrate the use of a CNN model (AlexNet [12])
as a feature extractor, and show increased efficiency through the application
of transfer learning. In this work, features from single layers, including convolu-
tional layers Conv3, Conv4 and fully connected layers fc6 and fc7 were compared
in terms of performance as inputs for traditional machine learning methods to
learn from. As the feature dimensionality of these layers was large, Principal Component Analysis (PCA) was applied for dimensionality reduction. Additionally,
dimensionality reduction was used in this way to eliminate interference compo-
nents due to the noise. In their results they show that using PCA gives better
accuracy than not using PCA. On these extracted features, a SVM Classifier
was used to compare the performance with existing algorithms for tread pattern
classification.
    In [7], transfer learning was used to classify fine-grained images, a challenging task where instances of different classes share some common parts but vary in shape and appearance. For the experiment, the Stanford Dog
dataset [11] was used to classify 120 breeds of dogs. A higher accuracy for recog-
nition on the dog breed recognition problem was demonstrated using fine-tuning
of VGG16 and InceptionV3 pre-trained models. The experiment was performed
using four different approaches, namely: A) a simple sequential network, B) a
simple sequential network with global average pooling, C) transfer learning with
the Adam optimizer for the classifier, and D) transfer learning with Adamax training of the classifier. Transfer learning with Adamax training of the classifier outperformed the other approaches with an accuracy of 93%. This conveys the power of fine-tuning pre-trained models to classify hard-to-distinguish images.
   There are many pre-trained CNN models available that have been shown to
perform well on image classification tasks. To decide on base models to use in
our task, we refer to the following papers describing models that have performed well on different image classification problems. In [5], the authors examine using transfer
learning on pre-trained models (AlexNet, VGG16, DenseNet201, GoogLeNet,
and ResNet) for flower classification. In this study, the CNN models were trained
on the ImageNet dataset, and then were fine-tuned on the flowers dataset [14].
The dataset consists of five classes, including chamomile, tulip, rose, sunflower
and dandelion. The model design was discovered by trial and error, and by
using adaptive methods. In this comparison, VGG16 performed best with 93.52% accuracy compared to AlexNet, DenseNet201, GoogLeNet and ResNet.
    The authors in [4] propose a new approach for detecting highly realistic computer-generated images by exploring inconsistencies in the region of the eyes.
Such inconsistencies are captured by exploring the power of features extracted
via the transfer learning approach with the VGG19 model. The VGG19 architec-
ture was used for feature extraction, and on these features, a traditional machine
learning classifier approach (SVM) was applied, achieving an accuracy of 80%. The authors of [15]
performed a comparative analysis on the fine-tuning of two pre-trained mod-
els (InceptionV3 and Xception) focusing on diabetic retinopathy screening. The
authors describe approaches to fine-tuning pre-trained networks by studying dif-
ferent tuning parameters and their effect on the overall system performance with respect to the application of diabetic retinopathy screening.
    Another piece of work [3] compared the performance of an extreme learning machine (ELM) against a standard fully connected layer in terms of overall training time and testing accuracy when applying transfer learning to pre-trained models such as VGG16, ResNet50, and InceptionV3, using the CIFAR10 and Fruit-360 datasets for the experiments. Other
authors [17] have focused on the effect of well-known optimizers (Adam, SGD,
and RMSProp) on CNN models, namely ResNet50 and InceptionV3. These optimizers were used to fine-tune the CNN models for 15 epochs on a cat vs dog
dataset generated by handpicking hundreds of images of cats and dogs from the
Kaggle cat vs dog dataset. A lower learning rate (i.e. 0.001) was used in con-
junction with categorical cross entropy. The experiment showed that the SGD
optimizer outperformed the other two for ResNet50. An accuracy of nearly 97%
was observed for ResNet50 with 500 training and 100 validation images.

   From this related work, we identified VGG16 and VGG19 as suitable network architectures with which to explore transfer learning for the classification of lifelog images, since many previous studies have used these models with positive outcomes in conceptually similar experiments.


3     Methodology

3.1   Dataset Description

For our study, we use the LSC2018 dataset [1], which consists of 27 days (from 15
Aug 2016 to 10 Sep 2016) of data of an active lifelogger. The dataset is based on
the NTCIR-14 Lifelog dataset [9]. The dataset includes 41,692 images captured
by a wearable camera, multimedia files, and related metadata files (containing
image labels) on a per-minute basis (3 images per minute). For this research, we
used the lifelog images and the corresponding metadata CSV file that contains
labels for each of the images associated with the activities of the lifelogger. Some
lifelog camera samples from the dataset are shown in Figure 1.




Fig. 1. Sample images from the LSC2018 Lifelog Image Dataset for labels: "at home"
(leftmost), "using mobile" (center) and "walking" (rightmost).




3.2   Dataset Preprocessing

After filtering the images using the metadata CSV file (that contains the image
labels), we found that 24,162 of these images were annotated across 209 categories. We found that many of the categories were very similar, e.g.
drinking water, drinking coffee, drinking tea and drinking beer were all categories.
In order to ensure there was sufficient labelled data to support our analysis, we
combined labels for such similar and overlapping categories to make them into
single categories based upon the dominant activity e.g. eating, drinking, walking.
This first pass of merging categories resulted in 25 categories¹, shown in Table 1.
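
A minimal sketch of this label-merging step is shown below (assuming the labels live in a pandas DataFrame column named activity; the column name and the mapping entries are illustrative, not the full mapping used):

```python
import pandas as pd

# Illustrative excerpt of the merge map (raw label -> merged category).
MERGE_MAP = {
    "Eating apple": "Eating",
    "Eating food": "Eating",
    "In bedroom": "At home",
    "In living room": "At home",
    "Travelling in car": "Commuting in car",
    "Commuting to work in car": "Commuting in car",
}

def merge_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse near-duplicate activity labels into single categories."""
    df = df.copy()
    # Labels not present in the map are kept unchanged.
    df["activity"] = df["activity"].replace(MERGE_MAP)
    return df
```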
    Issues were identified with some categories of images, for example some were
blurred, some had very few sample images (e.g. less than 20), and some images
semantically belonged to more than one category. For example, the center image
in Figure 1 is labelled as "Using Mobile" in the dataset, but it has both a mobile
and a laptop in the image.
    In order to mitigate these issues, we sorted each category by the number of
labelled images available (in descending order), and kept the top-10 categories
for our analysis². A breakdown of the number of labelled samples available per
category is shown in Table 2.
¹ For example, categories such as "eating apple", "eating food", "eating curd", and
  "eating strawberry" were merged into the single category "eating". "In bedroom", "In
  living room" and "In a bedroom" were merged as "At home". Similarly, "Car",
  "Commuting to work in car", "Travelling in car" and "Travelling back from work in
  car" were merged as "Commuting in car".
² When all 25 categories were used, the measured performance was poor, as would be
  expected given that many categories had insufficient images to train with.
                          Table 1. List of 25 categories

Attending conference, presentation    At home       Brushing teeth           Casual conversation   Cleaning and washing
Commuting in car                      Drinking      Eating                   Gardening             In airplane
Organising things                     Other         Printing                 Reading paper         Relaxing
Retail shopping and purchasing        Travelling    Using desktop computer   Using laptop          Using mobile
Using tablet                          Walking       Watching TV              Work meeting          Writing

Table 2. Ranked number of labelled images per category (across 10 cate-
gories)

                   Category                Number of Samples
                   Using laptop           4440
                   Relaxing               3055
                   Walking                2467
                   Commuting in car       2181
                   Using desktop computer 1934
                   Work meeting           1511
                   Casual Conversation    1180
                   In airplane            1028
                   Watching TV            882
                   Eating                 849




3.3    Dataset Analysis and Approaches
The image dataset (after filtering) used for our experiments contained 19,527 images. Table 2 shows the breakdown of the distribution of images across the 10 categories. A stratified 60%, 20% and 20% split for the training, validation and test sets was used for all of our experiments. Analysis was carried out using deep learning virtual machine instances on the Google Cloud Platform along with a local machine with a 4GB Nvidia 940MX graphics card. We investigated two approaches to transfer learning, as we wanted to compare the performance of each and identify which performed best for the classification of lifelog activities. For this, we use a pre-trained VGG19 model³.
³ The VGG19 model is available from https://keras.io/api/applications/
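
A sketch of such a stratified split using scikit-learn is shown below (the variable names and placeholder data are illustrative, not the code used in our experiments):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: X holds per-image feature vectors, y the activity labels.
X = np.random.rand(19527, 4096)
y = np.random.randint(0, 10, size=19527)

# First carve off the 20% test set, stratified on the labels.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

# Then split the remaining 80% into 60% train / 20% validation
# (0.25 of 80% equals 20% of the full dataset).
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, stratify=y_trainval,
    random_state=42)
```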
    We followed similar feature extraction and fine-tuning methods as described
in the papers in our related work section (detailed below). Moreover, instead of considering only accuracy as an evaluation metric, we also considered precision, recall, and F1-score, because accuracy is not always an informative metric when there are imbalances in a labelled dataset, i.e. if any label in the dataset is disproportionately frequent then a classifier could learn to blindly predict this label and still achieve high accuracy. Precision gives insight into the false positive rate. Recall indicates how many true positives (correctly predicted labels) are found from all the positive samples in the test set. F1-score is the harmonic mean of precision and recall, and a useful metric for imbalanced dataset evaluation. In our analysis, we identified three highly performing classifiers for the feature extraction approach. In addition, we adopted three different ways of fine-tuning in the transfer learning process, by training only certain layers and freezing the others.
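
These metrics can be computed with scikit-learn, as in the sketch below (macro-averaging, which weights each category equally, is shown as one reasonable choice for an imbalanced dataset; it is an assumption rather than the exact averaging scheme used):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for a set of predictions."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"accuracy": accuracy_score(y_true, y_pred),
            "precision": precision,
            "recall": recall,
            "f1": f1}
```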

Approach 1: Using CNNs as Feature Extractors
For Approach 1 we used a pre-trained VGG19 model as a feature extractor
in combination with traditional machine learning classifiers, as this is one of
the most common ways to do transfer learning [6]. In this process, features
are extracted from images using pre-trained models, where the activations at
particular layers of the model are used as feature vectors for other machine
learning methods (e.g. SVM). In our work, features were extracted on a layer-
by-layer basis (from the flatten layer up to the output layer) from the CNN models in order to examine the usefulness of the features (activations) at each layer.
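
A minimal Keras sketch of this feature-extraction step is given below ('flatten', 'fc1', 'fc2' and 'predictions' are the standard layer names of the Keras VGG19 model; the preprocessing and batching details are assumptions):

```python
import numpy as np
from tensorflow.keras.applications import VGG19
from tensorflow.keras.applications.vgg19 import preprocess_input
from tensorflow.keras.models import Model

# Load VGG19 with its ImageNet weights, including the classifier layers.
base = VGG19(weights="imagenet", include_top=True)

def extract_features(images: np.ndarray, layer_name: str = "fc1") -> np.ndarray:
    """Return activations at `layer_name` for a batch of 224x224 RGB images."""
    extractor = Model(inputs=base.input,
                      outputs=base.get_layer(layer_name).output)
    return extractor.predict(preprocess_input(images.astype("float32")))
```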
    Using the features extracted at these layers of the pre-trained CNN, we benchmarked several machine learning classifiers, namely SVM, Random Forest, Naive Bayes, XGBoost, and Bayesian ridge regression. We used principal com-
ponent analysis (PCA) to reduce the dimensionality of the extracted features,
selecting the number of components that would retain 95% of the variance of the
original data. We used PCA because features extracted at fully-connected layers
were prohibitively large to be sensibly used with our classifiers. For the VGG16 and VGG19 models, the last three layers define the classifier, i.e. the fully connected (fc1), fully connected (fc2) and predictions (dense) layers. We also explored hyperparameter tuning of the classifiers to improve performance. For hyperparameter tuning, we used a grid search cross-validation method to find the optimal parameters for the classifiers (using the validation set). For example, in the case of SVM, we explored three hyperparameters, i.e. the values of C, gamma and kernel type. For random forests, we explored the number of estimators, max features, max depth and criterion to maximize performance. With the XGBoost classifier, we explored hyperparameters such as max depth, number of estimators and learning rate. The detailed results for the classifiers' outcomes for VGG19 are in Table 3.
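
The combination of PCA (retaining 95% of the variance) and a grid search over classifier hyperparameters could be expressed as in the sketch below (the parameter grids are illustrative, not the exact values we searched):

```python
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipe = Pipeline([
    # Keep the number of components that explains 95% of the variance.
    ("pca", PCA(n_components=0.95)),
    ("svm", SVC()),
])

param_grid = {
    "svm__C": [0.1, 1, 10],
    "svm__gamma": ["scale", 1e-3, 1e-4],
    "svm__kernel": ["rbf", "linear"],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
# search.fit(train_features, train_labels)
# best_model = search.best_estimator_
```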

Approach 2: Fine Tuning
For Approach 2, we used fine-tuning (with a pre-trained VGG19 model) as a
transfer learning technique. In this approach, we considered three different ways
to do fine-tuning. For Method 1, we replaced the last layer of the model, i.e. the predictions (dense) layer, with a dense softmax output layer, and trained only this layer while freezing all others. For Method 2, we froze all the layers of the models (VGG16 and VGG19) except the last eight layers and trained their weights. We selected the last 8 layers as trainable because we wanted not only the classifier layers to be updated in training but also some of the convolutional layers.
Table 3. Approach 1 - Feature Extraction Score Table (in percent) of VGG19 for
10 categories, by layer at which features were extracted and classifier

                          SVM     Random Forest   XGBoost   Naive Bayes   Bayesian Ridge Regression
Accuracy
Predictions (dense)       82.27   81.80           85.83     52.15         71.40
Fully Connected 2 (fc2)   86.67   83.24           88.60     66.91         82.03
Fully Connected 1 (fc1)   87.08   83.55           89.60     71.48         76.78
Flatten                   79.18   74.20           80.03     72.30         68.47
Precision
Predictions (dense)       82.82   76.64           85.36     52.78         69.91
Fully Connected 2 (fc2)   87.34   85.13           88.76     67.14         77.11
Fully Connected 1 (fc1)   88.48   85.71           89.90     70.59         84.04
Flatten                   72.13   71.77           75.55     44.84         57.71
Recall
Predictions (dense)       78.82   76.20           83.75     60.50         68.07
Fully Connected 2 (fc2)   84.85   80.48           87.38     71.08         75.22
Fully Connected 1 (fc1)   85.03   80.18           88.28     73.26         75.02
Flatten                   64.09   65.14           66.19     44.66         45.33
F1 Score
Predictions (dense)       80.79   76.43           84.56     56.37         68.98
Fully Connected 2 (fc2)   86.10   82.72           88.07     69.05         76.15
Fully Connected 1 (fc1)   86.78   82.83           89.09     71.90         79.27
Flatten                   67.86   68.31           70.51     44.74         50.77




For Method 3, we removed the top layers of the model (i.e. the dense classifier layers) and replaced them with a dense fully connected layer (with a ReLU activation function) and a softmax output layer. We used an SGD optimizer (no minibatching) with a learning rate of 0.0001. For all methods we used a sparse categorical cross-entropy loss function. Each model was trained over a number of epochs until a good fit was achieved, as shown in Figure 2.
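
A minimal Keras sketch of Method 2 is shown below (a 10-way softmax head replaces the 1000-way ImageNet predictions layer, and only the last eight layers are left trainable; the exact construction of the replacement head is an assumption):

```python
from tensorflow.keras.applications import VGG19
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD

base = VGG19(weights="imagenet", include_top=True)

# Swap the 1000-way ImageNet predictions layer for a 10-way softmax.
x = base.get_layer("fc2").output
outputs = Dense(10, activation="softmax", name="lifelog_predictions")(x)
model = Model(inputs=base.input, outputs=outputs)

# Method 2: freeze everything except the last eight layers.
for layer in model.layers[:-8]:
    layer.trainable = False

model.compile(optimizer=SGD(learning_rate=0.0001),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```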
    The learning curves in Figure 2 show that each model
began to show signs of overfitting at a different number of epochs, indicating
that it was possible to overtrain and in some cases this was detrimental to per-
formance. For Method 1 (replacing the last layer) this occurred at approximately
17 epochs where accuracy on the validation set began to decrease while training
set accuracy continued to increase. For Method 2, we can see the model learns
for the initial 2-3 epochs after which performance on the validation set stabilises
while the training set accuracy continues to increase. For Method 3, we can see
the validation set accuracy only marginally improves after an initial learning
period up to 3 epochs.
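
Although we selected epoch counts by inspecting the curves, such a stopping point could also be found automatically with an early-stopping callback; a sketch (the monitored metric and patience value are assumptions):

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop when validation accuracy has not improved for 3 consecutive epochs,
# and roll back to the best weights seen so far.
stopper = EarlyStopping(monitor="val_accuracy", patience=3,
                        restore_best_weights=True)
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=30, callbacks=[stopper])
```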
    Table 4 shows the Approach 2 results for VGG19⁴. We evaluated performance on the test set using accuracy, precision, recall and F1-score. Here, the F1-score is particularly important because our dataset has an uneven class distribution; as the harmonic mean of precision and recall, a high F1-score indicates that a model performs well on both.
⁴ From our analysis, VGG16 performed marginally worse than VGG19 and for this reason we do not include its results.


Table 4. Approach 2 - Fine-Tuning Score Table (in percent) for 10 categories of
VGG19

Methods                                           Accuracy  Precision  Recall  F1-Score
M1: Replacing the last prediction layer           77.63     77.15      74.37   75.73
M2: Freezing all layers except the last 8         86.68     85.86      85.28   85.54
M3: Replacing classifier layers with new layers   81.19     80.70      78.93   79.56




Fig. 2. Training curves of VGG19 for the three methods: Method 1 (leftmost),
Method 2 (center), Method 3 (rightmost)


   In Figure 2, we can see that the training curves for Method 1, where only the last layer was replaced, suggest a good fit, while the other methods show signs of overfitting or underfitting. Nevertheless, Method 2 achieved the better overall performance.


4     Results
Comparing model performance with the different classifiers in Approach 1, VGG19 and XGBoost emerged as the best combination, with an average accuracy across the classifier layers of 86.015%, precision of 84.89%, recall of 81.40% and F1-score of 83.05%. In Approach 2, VGG19 also emerged as the better model across all three applied methods, with an accuracy of 86.68%, precision of 85.86%, recall of 85.28% and F1-score of 85.54% when we kept only the last 8 layers trainable (Method 2).




                      Fig. 3. Screenshots of Incorrect Predictions


   When we compare Approach 1 and Approach 2, we can see that Approach 1 gave a higher accuracy (with XGBoost). Moreover, Approach 1 (feature extraction and fitting a classifier) is the most common approach to transfer learning⁵.
⁵ In our experiments, VGG19 emerged as the better performing model for transfer learning in both approaches.


5      Discussion

In this supervised machine learning classification study, we found that the labelling of the images is crucial. As humans, we often identify one aspect of an image while labelling it, which might not be correct from the model's perspective, i.e. introducing label noise into training and evaluation. For example, there are some images in the categories "Work meeting" and "Reading paper" that contain both papers and a laptop. These images are distinct from the perspective of the lifelogger, but could create confusion during model training and prediction. In effect, some images belong to multiple classes, e.g. in Figure 3 the person has a phone and is in an airplane, so the image could belong to "Using mobile" or "In airplane"; it was labelled as "Using mobile" but predicted as "In airplane". This is a clear example of two overlapping categories. Similarly, the second image is predicted as "Using desktop computer" when it is actually an image of a home. Such mislabelled samples in the training, validation and testing sets could be introducing (label-)noise into our training and evaluation procedure, resulting in sub-optimal results. In future work, we will examine how to mitigate such issues when dealing with real-world datasets.


6      Conclusion

In this paper, we conducted our research using the top-10 labelled categories of the LSC2018 dataset, i.e. those with the most samples per category.
We explored two transfer learning techniques using a pre-trained VGG19 model, namely feature extraction and fine-tuning. We then evaluated the performance of these two approaches using precision, recall, accuracy and F1-score. For our first approach, we benchmarked a battery of classifiers (e.g. XGBoost, SVM and Random Forest) using the VGG19 model (across a number of layers) as a feature extractor. Here we found that XGBoost applied to the fully connected layer 1 (fc1) features of the pre-trained VGG19 model produced the best results. In our second approach (fine-tuning), we used three distinct methods, where we found that training only the last 8 layers produced the best accuracy for this approach. Although we find that fine-tuning is an effective approach (with 86.68% accuracy for Method 2), the accuracies for all methods in this approach were surpassed by simply using the pre-trained VGG19 model as a feature extractor in combination with an XGBoost model (89.6% accuracy).
    Future studies will assess the impact of dataset labelling and methods to reduce labelling noise. Similarly, in future work we will examine the impact of data availability to see if additional data might improve the fine-tuning approach. Moreover, as this paper focuses on a single lifelog dataset, future work will need to explore the limitations of these approaches when applied to other lifelog datasets.
    We conclude that, in instances where there is little labelled data, using a pre-trained model as a feature extractor is a suitable approach to classify images of daily activities present in a lifelog.


References
 1. LSC 2019 @ ICMR 2019, http://lsc.dcu.ie/2019/data/index.html
 2. Alom, M.Z., Taha, T.M., Yakopcic, C., Westberg, S., Nasrin, S., Asari, V.K.:
    The History Began from AlexNet: A Comprehensive Survey on Deep Learning
    Approaches (2018), https://arxiv.org/pdf/1803.01164.pdf
 3. Alshalali, T., Josyula, D.: Fine-tuning of pre-trained deep learning models with
    extreme learning machine. Proceedings - 2018 International Conference on Com-
    putational Science and Computational Intelligence, CSCI 2018 pp. 469–473 (2018).
    https://doi.org/10.1109/CSCI46756.2018.00096
 4. Carvalho, T., De Rezende, E.R., Alves, M.T., Balieiro, F.K., Sovat, R.B.: Expos-
    ing computer generated images by eye’s region classification via transfer learn-
    ing of VGG19 CNN. Proceedings - 16th IEEE International Conference on Ma-
    chine Learning and Applications, ICMLA 2017, pp. 866–870 (2017).
    https://doi.org/10.1109/ICMLA.2017.00-47
 5. Cengil, E., Cinar, A.: Multiple classification of flower images using transfer learn-
    ing. 2019 International Conference on Artificial Intelligence and Data Processing
    Symposium, IDAP 2019 (2019). https://doi.org/10.1109/IDAP.2019.8875953
 6. Erfani, S.M., Rajasegarar, S., Karunasekera, S., Leckie, C.: High-dimensional and
    large-scale anomaly detection using a linear one-class svm with deep learning.
    Pattern Recognition 58, 121–134 (2016)
 7. Golodov, V.A., Dubrovina, M.S., Paziy, A.S.: Transfer Learning Ap-
    proach to Fine-Grained Image Classification. Proceedings - 2019 Interna-
    tional Russian Automation Conference, RusAutoCon 2019 pp. 1–5 (2019).
    https://doi.org/10.1109/RUSAUTOCON.2019.8867653
 8. Gurrin, C., Joho, H., Hopfgartner, F., Zhou, L., Albatal, R., Healy, G., Nguyen,
    D.T.D.: Experiments in lifelog organisation and retrieval at ntcir. In: Evaluating
    Information Retrieval and Access Tasks, pp. 187–203. Springer (2020)
 9. Gurrin, C., Joho, H., Hopfgartner, F., Zhou, L., Ninh, V.T., Le, T.K., Albatal,
    R., Dang-Nguyen, D.T., Healy, G.: Overview of the ntcir-14 lifelog-3 task. In:
    Proceedings of the 14th NTCIR conference. pp. 14–26. NII (2019)
10. Gurrin, C., Smeaton, A.F., Doherty, A.R.: LifeLogging: Personal big data.
    Foundations and Trends in Information Retrieval 8(1), 1–125 (2014).
    https://doi.org/10.1561/1500000033
11. Khosla, A., Jayadevaprakash, N., Yao, B., Fei-Fei, L.: Novel dataset for fine-grained
    image categorization. Proc. IEEE Conf. Comput. Vision and Pattern Recognition
    (2011)
12. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep con-
    volutional neural networks. In: Advances in neural information processing systems.
    pp. 1097–1105 (2012)
13. Liu, Y., Zhang, S., Wang, F., Ling, N.: Tread Pattern Image Classification using
    Convolutional Neural Network Based on Transfer Learning. IEEE Workshop on
    Signal Processing Systems, SiPS: Design and Implementation, pp. 300–
    305 (2018). https://doi.org/10.1109/SiPS.2018.8598400
14. Mamaev, A.: Flowers recognition (Jun 2018),
    https://www.kaggle.com/alxmamaev/flowers-recognition
15. Mohammadian, S., Karsaz, A., Roshan, Y.M.: Comparative Study of Fine-Tuning
    of Pre-Trained Convolutional Neural Networks for Diabetic Retinopathy Screening.
    2017 24th Iranian Conference on Biomedical Engineering and 2017 2nd Interna-
    tional Iranian Conference on Biomedical Engineering, ICBME 2017 (December)
    (2018). https://doi.org/10.1109/ICBME.2017.8430269
16. Oquab, M., Bottou, L., Laptev, I., Sivic, J.: Learning and transferring mid-level im-
    age representations using convolutional neural networks. Proceedings of the IEEE
    Computer Society Conference on Computer Vision and Pattern Recognition pp.
    1717–1724 (2014). https://doi.org/10.1109/CVPR.2014.222
17. Poojary, R., Pai, A.: Comparative Study of Model Optimization Techniques
    in Fine-Tuned CNN Models. 2019 International Conference on Electrical and
    Computing Technologies and Applications, ICECTA 2019 pp. 1–4 (2019).
    https://doi.org/10.1109/ICECTA48151.2019.8959681
18. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z.,
    Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet Large
    Scale Visual Recognition Challenge. International Journal of Computer Vision
    115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y
19. Zhang, Q., Zhang, M., Chen, T., Sun, Z., Ma, Y., Yu, B.: Recent advances in
    convolutional neural network acceleration. Neurocomputing 323, 37–51 (2019).
    https://doi.org/10.1016/j.neucom.2018.09.038