=Paper= {{Paper |id=Vol-2871/paper2 |storemode=property |title=Embedding-based Neural Network Models for Book Recommendation in University Libraries |pdfUrl=https://ceur-ws.org/Vol-2871/paper2.pdf |volume=Vol-2871 |authors=Jaeyoung Choi,Chaeeun Han,Heeyoon Yang,Yeonkyoung Hong,Seoyoung Jeon,Yongjun Zhu |dblpUrl=https://dblp.org/rec/conf/iconference/ChoiHYHJZ21 }} ==Embedding-based Neural Network Models for Book Recommendation in University Libraries== https://ceur-ws.org/Vol-2871/paper2.pdf
                                                                  1st Workshop on AI + Informetrics - AII2021




           Embedding-based Neural Network Models for
           Book Recommendation in University Libraries

       Jaeyoung Choi1* 0000-0002-4190-9005, Chaeeun Han1* 0000-0001-5275-7142,
                Heeyoon Yang1* 0000-0002-7914-0879, Yeonkyoung Hong1
      0000-0001-9486-777X, Seoyoung Jeon1 0000-0001-9115-2867, and Yongjun Zhu1
                                0000-0003-4787-5122**

           Department of Library and Information Science, Sungkyunkwan University, Seoul
                                        03063, South Korea
                 {cjengy,hancece,chri0220,hyk0829,cjfaktks98,yzhu}@skku.edu



               Abstract. Recommendation systems have been widely used in various
               commercial applications for predicting the rating a user may give to an
               item. To encourage students to read more books, personalized book rec-
               ommendation systems are of great interest in university libraries. Because
               university libraries do not ask students to rate books that they borrowed,
               book reviews and ratings are not available. Without book ratings, im-
               plementing personalized book recommendation systems in libraries is a
               challenging problem. In this study, we propose a library book
               recommendation system that uses embedding-based neural network models.
               The system takes book metadata and user information as input features,
               and deep learning models are used to create embeddings of the features.
               A multi-class classification model and a multi-label classification model
               were trained, and soft voting was used to integrate the final outcomes.
               The models were evaluated by 72 university students: on a 5-point Likert
               scale, the multi-class classification model received an average score of
               3.4, whereas the multi-label classification model received 3.0.

               Keywords: recommendation system · book recommendation · deep neu-
               ral networks · university libraries


      1      Introduction
          University libraries keep many books to serve students with various reading
      interests. Although this broadens the range of book choices for students, it
      also makes it more difficult for them to choose books that they may like to
      read. It likewise places a burden on librarians, who have become less confident
      about recommending books to students. Capturing the characteristics of the
      students is key to a book recommendation service, and this requires us to
      implement personalized book recommendation systems that rely on big library
       * equal contribution
      ** corresponding author



Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0
International (CC BY 4.0).

data and advanced machine learning methods. To our knowledge, many libraries
still rely on the manual labor of librarians to recommend books, which may be
less effective and efficient[2,7,8].
     Some university libraries have implemented personalized book recommendation
systems that incorporate a book rating component. As an example, Seoul National
University implemented a system called ‘S-Curation’ that recommends books to its
users based on their borrowing records and the keywords that they previously used
to search for books[5]. The system requires students to rate the recommended books
according to their preferences, and the rating data are used to recommend further
books to them. Although asking students to rate each recommended book is an
effective way of improving the system’s performance, it requires a significant
amount of user involvement. This kind of system may not be practical in most
university libraries because it depends heavily on users’ active involvement and
may require considerable time to collect enough user feedback to provide a
reliable service. Therefore, personalized book recommendation systems without the
rating component may be a more practical option that is affordable and easy for
many university libraries to adopt.
     In this paper, we propose personalized book recommendation models that
use only book metadata and user data. By applying embedding-based deep
learning methods to the data, we aim to implement effective models that perform
well when rating data are not available. The rest of this paper is organized as
follows. In Related Works, we discuss various recommendation systems. In Data
and Methods, we propose embedding-based neural network models. In Results, we
evaluate the proposed methods. Lastly, we conclude and discuss our future work.


2   Related Works

    Recommendation systems have been studied with various methods such as
collaborative filtering, hybrid filtering, machine learning, and deep learning.
Collaborative filtering recommends items to users by collecting their preferences
through ratings[3,6]. Hybrid filtering combines collaborative filtering with
content-based filtering and thus reflects the characteristics of both the books
and the users[4,12]. Similarity-based machine learning approaches[13] and
embedding-based deep learning[9,10] have been applied in recommendation systems
as well.
    Studies of recommendation systems for libraries have also used the
abovementioned methods. Fu et al. proposed a book recommendation system based on
user-based collaborative filtering. They built on the idea that university library
users have diverse interests depending on what they learn during the semester.
They used users’ borrowing records, users’ information, and book metadata, and
suggested that local recommendation, which considers the students’ department,
performs better than global recommendation[3]. Liu et al. employed SVD++, a
model-based collaborative filtering method, to develop a university library book
recommendation system. As it is challenging for libraries
to obtain ratings from users, they used the book loan duration as the rating
value[6]. Using hybrid filtering, Tian et al. designed a personalized book
recommendation system for university students based on the users’ borrowing
records, users’ information, and book metadata. They showed that the hybrid
method outperformed collaborative filtering and content-based filtering used
individually[12]. Tsuji et al. applied SVM to book recommendation. The input to
their model consisted of several similarity values derived from borrowing records,
book titles, and book categories[13]. Rahutomo et al. proposed an embedding
model that produces book recommendations using content filtering. They generated
embeddings from the books’ original attributes, namely book title, author, and
publisher, and from demographic information of the students of Binus University,
and trained these embeddings through deep learning[10]. Covington et al. proposed
an embedding model for recommendation over a large video corpus. They built a
two-stage neural network model using the user’s video search history, demographic
information, and video embeddings. This method outperformed the matrix
factorization approaches previously used at YouTube[9].
     As previously mentioned, rating-based models require extensive user responses.
Although such user feedback plays an important role in recommendation models,
most university libraries do not have it. Recommendation models that do not
depend on ratings, on the other hand, are easier to adopt and thus more
practical. The key to this kind of model is the effective utilization and
representation of book and user data. Therefore, we aim to use embedding
methods for the representation of library data and propose embedding-based
recommendation models.



3     Data and Methods

3.1   Data

    We acquired book borrowing history data from the Sungkyunkwan University
(SKKU) library. The data cover the borrowing history of all undergraduate
students from 2015 to 2019 and include 34,335 students, 206,089 books spanning
all genres, and 662,402 loan records. We applied the following data preprocessing
and feature selection steps. First, from the downloaded book list, we removed
items such as CDs, DVDs, encyclopedias, and theses. Likewise, we excluded books
that had been designated for a specific purpose and recommended to all students,
since these books do not necessarily reflect individuals’ reading preferences.
To protect information privacy, all students were anonymized, and only their
college-level affiliation and book borrowing history were used. Book metadata
such as publication year, number of pages, genre, title, and book cover image
were used. Table 1 describes the features we used for analysis.

                               Table 1. List of Features

                      Feature          Description
                      Book Title       book’s title
                      Book Genre       book’s genre
                      Book Cover       book’s cover image
                      Book Year        book’s publish date
                      Book Page        book’s number of pages
                      User Affiliation student’s associated college



3.2    Methods

    Book recommendation models were implemented in two different ways by
adopting different output values as the outcomes. The first model is a multi-class
prediction model that predicts the last book a user borrowed and thus recommends
one book to the user, while the second model is a multi-label prediction model
that predicts multiple books for the user. In the first model, for each user, we
select 85 books (i.e., the median value), use the first 84 books as input, and
predict the last (i.e., the 85th) book. For example, if a student borrowed 100
books, the first 85 books would be selected and the remaining 15 books would be
discarded. The second model, on the other hand, uses a maximum of 84 books as
input and predicts the remaining books. For example, if a student borrowed 100
books, the first 84 books would be used as input, and the remaining 16 books
would be used as outcomes. We implemented two deep learning models that share the
same model architecture but have different output layers. Fig. 1 displays the
proposed model architecture.
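
    The two input/output schemes described above can be sketched as follows.
This is a minimal illustration rather than the authors' code: the function
names, the `window` parameter, and the treatment of users with too few loans
(they simply yield no sample here) are assumptions made for the example.

```python
from typing import List, Optional, Tuple

WINDOW = 85  # books kept per user for the multi-class model (the median reported in the text)


def multiclass_sample(history: List[str], window: int = WINDOW) -> Optional[Tuple[List[str], str]]:
    """Keep the first `window` loans; the first window-1 are inputs, the last is the target."""
    if len(history) < window:
        return None  # assumption: the text does not say how shorter histories are handled
    kept = history[:window]
    return kept[:-1], kept[-1]


def multilabel_sample(history: List[str], window: int = WINDOW) -> Optional[Tuple[List[str], List[str]]]:
    """Use at most window-1 loans as inputs; every later loan becomes a target label."""
    inputs, targets = history[: window - 1], history[window - 1:]
    if not targets:
        return None  # assumption: a user with no remaining loans yields no sample
    return inputs, targets


# A student who borrowed 100 books, listed in chronological order.
loans = [f"book_{i}" for i in range(1, 101)]
mc_inputs, mc_target = multiclass_sample(loans)
ml_inputs, ml_targets = multilabel_sample(loans)
print(len(mc_inputs), mc_target)        # 84 book_85
print(len(ml_inputs), len(ml_targets))  # 84 16
```

    For the 100-loan example in the text, the multi-class sample discards loans
86 to 100, whereas the multi-label sample keeps loans 85 to 100 as its 16 targets.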




    Fig. 1. The Proposed Model Architecture for Multi-class and Multi-label Model

    The embeddings of the borrowed books and the student’s information pass
through average-pooling layers and are concatenated into a single vector, which
carries all the information in the inputs. The embedded features are the
publication year, page count, cover image, genre, and title of the loaned books,
together with the student’s school. The concatenated vector goes through batch
normalization layers, dropout layers, and dense layers with ReLU or tanh
activation in sequence. The output layer of the multi-class classification model
uses the softmax activation, while that of the multi-label classification model
uses the sigmoid activation. In turn, they predict a single value and multiple
values, respectively.
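
    The pooling, concatenation, and the two output activations can be
illustrated with a small pure-Python sketch. It is not the trained network: the
toy embeddings, their dimensions, and the hand-written logits that stand in for
the batch normalization, dropout, and dense layers are all assumptions.

```python
import math
from typing import List

Vector = List[float]


def average_pool(embeddings: List[Vector]) -> Vector:
    """Average a variable-length list of equal-sized embeddings into one vector."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def softmax(logits: Vector) -> Vector:
    """Multi-class head: probabilities over all books that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits: Vector) -> Vector:
    """Multi-label head: an independent probability for each book."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


# Toy inputs: two borrowed books with 3-dimensional title embeddings,
# plus a 2-dimensional embedding of the student's school.
title_embeddings = [[0.2, 0.4, 0.6], [0.4, 0.0, 0.2]]
school_embedding = [[1.0, 0.0]]

# Pool each feature separately, then concatenate into a single vector.
features = average_pool(title_embeddings) + average_pool(school_embedding)

# Stand-in for the hidden layers: pretend they produced these logits
# for a catalogue of three books.
logits = [2.0, 0.5, -1.0]

class_probs = softmax(logits)  # one book is predicted: the argmax
label_probs = sigmoid(logits)  # several books may exceed a threshold
print(len(features), class_probs.index(max(class_probs)))
print([p > 0.5 for p in label_probs])
```

    With softmax the probabilities compete, so a single book is selected; with
sigmoid each book is scored independently, so several books can be selected at
once.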
    Embeddings lie at the heart of the proposed methods. Embeddings turn sparse
data into a denser representation by projecting them into a vector space of
relatively lower dimension, which makes them more efficient to process[14]. For
this reason, we used embeddings to bring together the diverse types of data.
Three of the embedding layers were constructed using XLM-RoBERTa (Cross-lingual
Robustly Optimized BERT Pretraining Approach): those for the title of the book,
the genre of the book, and the school of the student. We chose XLM-RoBERTa
particularly because the book titles included Korean, English, and Chinese; it is
a transformer-based multilingual masked language model pre-trained on text in 100
languages, including these three[1]. The fourth embedding layer was constructed
from the cover image of each book. We used EfficientNet to construct the image
embeddings. EfficientNet applies a compound scaling method that balances three
kinds of scaling: depth, width, and resolution. Through feature extraction with
the pretrained EfficientNet, we were able to create meaningful embedding
values[11]. The last layer was a combination of the numerical values. This layer
contained three versions each of the book’s publication year and page count: the
square, the root, and the original value. Lastly, all of these embeddings were
normalized with a MinMaxScaler to bring the data into a common range.
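
    The numerical features and the final scaling step can be sketched as below.
The sketch assumes that "root" means the square root and re-implements min-max
scaling by hand instead of calling a library scaler; the page counts are made-up
values.

```python
import math
from typing import List


def numeric_variants(value: float) -> List[float]:
    """Return the original value, its square, and its square root."""
    return [value, value ** 2, math.sqrt(value)]


def min_max_scale(column: List[float]) -> List[float]:
    """Rescale one feature column linearly to the range [0, 1]."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0 for _ in column]  # constant column: avoid division by zero
    return [(x - low) / (high - low) for x in column]


# Hypothetical page counts for three books.
pages = [100.0, 400.0, 900.0]
rows = [numeric_variants(p) for p in pages]   # one row per book
columns = [list(col) for col in zip(*rows)]   # one column per variant
scaled = [min_max_scale(col) for col in columns]
print(scaled[0])  # original values scaled: [0.0, 0.375, 1.0]
print(scaled[2])  # square roots scaled:    [0.0, 0.5, 1.0]
```

    Scaling each column separately keeps the squared values, which are several
orders of magnitude larger, from dominating the other inputs.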


4     Results

4.1   Model Implementation

    As the output layer of the multi-class prediction model uses the softmax
activation, each student is assigned only one of the 206,089 books. The
multi-label prediction model uses the sigmoid activation, so each student is
assigned multiple books. These models were evaluated by accuracy. To improve the
models’ performance, we trained multiple models while tuning hyperparameters such
as the learning rate, the momentum, and the optimizer. After tuning the models,
we constructed an ensemble model with soft voting. Table 2 shows the accuracy of
the final models built through soft voting.
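
    Soft voting averages the class probabilities predicted by the individual
models and picks the class with the highest average. The sketch below assumes
equal weights for all models and uses invented probabilities; the text does not
state how many models were combined or whether they were weighted.

```python
from typing import List


def soft_vote(model_probs: List[List[float]]) -> List[float]:
    """Average the per-class probabilities predicted by several models."""
    n_models = len(model_probs)
    return [sum(per_class) / n_models for per_class in zip(*model_probs)]


# Hypothetical probabilities from three trained models over four candidate books.
predictions = [
    [0.50, 0.25, 0.125, 0.125],
    [0.25, 0.50, 0.125, 0.125],
    [0.50, 0.25, 0.125, 0.125],
]
ensemble = soft_vote(predictions)
best_book = max(range(len(ensemble)), key=ensemble.__getitem__)
print(best_book)  # index of the book with the highest averaged probability
```

    Unlike hard (majority) voting, soft voting lets a confident model outweigh
several uncertain ones, because probabilities rather than votes are averaged.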
    As shown in Table 2, the multi-class prediction model performed better than
the multi-label prediction model. The training objective of the former model
concentrates on one candidate value, whereas the latter model’s training focus is

                        Table 2. Accuracy for the Final Models

                            multi-class model multi-label model
                   accuracy 0.998             0.546



spread across several candidate values, making it more challenging to predict
outcomes.


4.2   User-Centered Evaluation

    We recruited 72 SKKU undergraduate students to evaluate the two models.
For a fair comparison, each model selected five books with high classification
probabilities, resulting in a total of ten book recommendations for each student.
The students were asked to evaluate the performance of the models by rating each
of the ten recommended books on a 5-point Likert scale. The five ratings that a
user gave to each model’s books were averaged and treated as the final score that
the user gave to the model. The multi-class classification model received a
higher satisfaction rating than the multi-label classification model: the
multi-class model scored 3.4 points on average (sd = 1.21) out of 5, while the
multi-label model scored 3.0 points on average (sd = 1.46). The distribution of
the scores is shown in Fig. 2.
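
    The selection and scoring procedure can be sketched as follows. The ratings
are invented, and two details are assumptions: that the five books are the five
with the highest probabilities, and that the standard deviation is the sample
standard deviation of the per-user scores.

```python
from statistics import mean, stdev
from typing import Dict, List


def top_k(probabilities: List[float], k: int = 5) -> List[int]:
    """Indices of the k books with the highest predicted probability."""
    order = sorted(range(len(probabilities)), key=lambda i: probabilities[i], reverse=True)
    return order[:k]


def model_score(ratings: List[int]) -> float:
    """A user's score for one model: the mean of their ratings for its five books."""
    return mean(ratings)


# Hypothetical ratings from three students for one model's five recommendations.
ratings_by_user: Dict[str, List[int]] = {
    "student_a": [4, 5, 3, 4, 4],
    "student_b": [2, 3, 3, 2, 5],
    "student_c": [5, 4, 4, 3, 4],
}
user_scores = [model_score(r) for r in ratings_by_user.values()]
print(top_k([0.10, 0.40, 0.06, 0.30, 0.02, 0.08, 0.04]))  # [1, 3, 0, 5, 2]
print(round(mean(user_scores), 2), round(stdev(user_scores), 2))
```

    Averaging within each user first gives every participant equal weight in the
model's overall score, regardless of how their individual ratings vary.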



                 Fig. 2. Distribution of the User Evaluation Score




    Interviews were also conducted alongside the evaluation. Some students
responded that they found the models highly reliable because the recommendations
included books they had borrowed from outside the school library. Other students
responded that the list of recommended books exactly matched their preferences,
as it contained books they personally owned. While most responses were positive,
limitations of the models were also pointed out. Some students mentioned that the
recommendations reflected their fields of study well but were not good at
capturing their everyday reading patterns. This feedback is plausible as
we did not use personal information of individuals for privacy reasons. With
more individual-level data, recommendations could be better tailored to
each individual.


5    Conclusions and Future Work

    This study employs deep neural network techniques to propose book
recommendation models that use book metadata and user data when rating data are
not available. Two types of models, a multi-class classification model and a
multi-label classification model, were proposed and evaluated by university
students. The multi-class classification model performed better than the
multi-label classification model (i.e., 3.4 vs. 3.0 on the 5-point Likert scale).
While most recommendation systems rely on users’ ratings of items, the proposed
models do not rely on users’ book ratings, given that ratings are generally not
available in university libraries. Therefore, the proposed models may be more
feasible in real-world settings.
    In future work, we plan to improve the models using more borrowing
records from an extended period of time. In addition, we plan to add a filtering
option that lets students narrow down the recommendations based on their specific
information needs, and to cluster books based on their similarities. We believe
these approaches would improve users’ experience and satisfaction.


References
1. Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F.,
   Grave, E., Ott, M., Zettlemoyer, L., Stoyanov, V.: Unsupervised Cross-lingual
   Representation Learning at Scale. In: Proceedings of the 58th Annual Meeting of
   the Association for Computational Linguistics, pp. 8440–8451. Association for
   Computational Linguistics, Online (2020)
2. Carnegie Library of Pittsburgh Homepage,
   https://www.carnegielibrary.org/get-book-recommendations-at-home/. Last accessed
   28, Jan 2021
3. Fu, S., Zhang, Y.: On the Recommender System for University Library. International
   Association for Development of the Information Society (2019)
4. Ghadling, S., Belavadi, K., Bhegade, S., Ghojage, P., Kamble, S.: Digital library:
   using hybrid book recommendation engine. International Journal of Engineering
   and Computer Science 4(11), 01–02 (2015)
5. Library of Seoul National University Homepage,
   https://library.snu.ac.kr/notice/view/2754823. Last accessed 28, Jan 2021
6. Liu, G., Zhao, X.: Recommender System for Books in University Library with
   Implicit Data. In: Proceedings of the 2018 International Conference on Network,
   Communication, Computer Engineering (NCCE 2018), pp. 164–168. Atlantis Press
   (2018) https://doi.org/10.2991/ncce-18.2018.28
7. Memorial Hall Library Homepage, https://mhl.org/advice. Last accessed 28, Jan
   2021
8. New York Public Library Homepage, https://www.nypl.org/shelf-help. Last ac-
   cessed 28, Jan 2021
9. Covington, P., Adams, J., Sargin, E.: Deep Neural Networks for YouTube
   Recommendations. In: Proceedings of the 10th ACM Conference on Recommender
   Systems, New York, NY (2016)
10. Rahutomo, R., Haryono, S., Perbangsa, A., Pardamean, B.: Em-
   bedding Model Design for Producing Book Recommendation (2019).
   https://doi.org/10.1109/ICIMTech.2019.8843769
11. Tan, M., Le, Q.: EfficientNet: Rethinking Model Scaling for Convolutional Neural
   Networks. In: Proceedings of the 36th International Conference on Machine Learn-
   ing, pp. 6105–6114. (2019)
12. Tian, Y., Zheng, B., Wang, Y., Zhang, Y., Wu, Q.: College library personalized rec-
   ommendation system based on hybrid recommendation algorithm. Procedia CIRP
   83, 490–494 (2019) https://doi.org/10.1016/j.procir.2019.04.126
13. Tsuji, K., Takizawa, N., Sato, S., Ikeuchi, U., Ikeuchi, A., Yoshikane, F., It-
   sumura, H.: Book recommendation based on library loan records and biblio-
   graphic information. Procedia-Social and Behavioral Sciences 147, 478–486 (2014)
   https://doi.org/10.1016/j.sbspro.2014.07.142
14. Young, T., Hazarika, D., Poria, S., Cambria, E.: Recent Trends in Deep Learning
   Based Natural Language Processing. In: IEEE Computational Intelligence Maga-
   zine, vol. 13, pp. 55–75. (2018)