Building a Sentiment Analysis Model for Libraries: The CSBNO Consortium Approach
Anna Maria Tammaro¹, Michele Tomaiuolo¹, Monica Mordonini¹, Mattia Pellegrino¹ and Riccardo Demicelis²
¹ University of Parma, Parma, Italy
² CSBNO Consortium, Milan, Italy

                  Abstract
                  The CSBNO Consortium investigated its library communities during the lockdown and at the
                  reopening of the libraries, to learn about their wishes and expectations. Sentiment analysis
                  could improve the analysis of these data by bringing the community's perception of the
                  library into service design. The framework and the methodology of the research are described
                  in the three planned phases: selection and loading of training data, text processing, and
                  creation of a model. The research is in its initial phase, and three characteristics will be
                  analyzed: information control, library space, and affect of service. The findings will support
                  CSBNO in promoting innovative libraries that actively engage with participatory communities.

                  Keywords
                  Sentiment analysis for libraries; User studies; Participatory approach

1. Introduction
    The CSBNO (Culture Socialità Biblioteche Network Operativo) Consortium manages 60 libraries
in the Milan area and coordinates the transformation of these libraries and the innovation of their
services, so that they can meet the changing needs of society. CSBNO collaborates with other innovative
European libraries gathered in the NEWCOMER¹ project, funded by Erasmus+, which promotes the vision
of innovative libraries that improve their communities. The NEWCOMER project partners intend to
promote innovative libraries by actively engaging with users through a participatory approach. A
Manifesto² is shared by all NEWCOMER project partners.

1.1.       How to get to know the library community?
   At the beginning of the Covid-19 pandemic in Italy, during the first lockdown from March to May
2020, CSBNO tried to stay connected to the library community, informing users that the libraries, even
though closed, were continuing to provide services. The greatest difficulty for CSBNO was changing the
service model from face-to-face to remote services and understanding the communities' wishes and
expectations of the library. For data collection, librarians from the CSBNO Consortium made more than
30,000 telephone calls to members classified as active or inactive. Active members are defined as users
enrolled in CSBNO libraries since 31/12/2017 who still use the loan service, restricted to those aged
between 25 and 65. Inactive members are defined as users enrolled in CSBNO libraries since 31/12/2017
who no longer use the loan service.
IRCDL 2022: 18th Italian Research Conference on Digital Libraries, February 24–25, 2022, Padova, Italy
   annamaria.tammaro@unipr.it (A. Maria Tammaro); michele.tomaiuolo@unipr.it (M. Tomaiuolo);
monica.mordonini@unipr.it (M. Mordonini); mattia.pellegrino@unipr.it (M. Pellegrino); riccardo.demicelis@csbno.net (R. Demicelis)
   0000-0002-9205-2435 (A. Maria Tammaro); 0000-0002-6030-9435 (M. Tomaiuolo); 0000-0002-5916-9770 (M. Mordonini);
0000-0002-6592-7451 (M. Pellegrino)
   © 2022 Copyright for this paper by its authors.
   Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
   CEUR Workshop Proceedings (CEUR-WS.org)
   ¹ https://publiclibraries2030.eu/our-projects/newcomer/
   ² https://davidlankes.org/a-manifesto-for-global-librarianship/

   Two datasets were collected:
   1. telephone replies received from inactive library members;
   2. telephone replies received from active members.
   The first dataset concerns responses collected during the lockdown from inactive members who had
been active in the past.
   The second dataset concerns responses collected during the lockdown from active members who use
the library loan service.
   At the end of the lockdown in May 2020, the libraries of the CSBNO Consortium took part in a
national satisfaction survey called “The Library for You” on users' perception of the library. The aim
of the national survey was to analyze satisfaction with libraries upon reopening. The questionnaire,
administered soon after the lockdown, allowed respondents to answer qualitative questions about the
level of service and to leave open comments, which provide additional data for understanding community
opinions. The answers concerning the CSBNO libraries were extracted.
   The third dataset concerns the responses to the national satisfaction survey from the CSBNO library
community.

2. Aims and objectives
   The CSBNO Consortium intends to build a sentiment analysis model as a tool to explore community
expectations and wishes, on which to build a participatory approach to service design. The aim of the
research is to establish a data mining model to perform sentiment analysis on the qualitative comments
collected by libraries. The objective is to test a new analytical method for understanding the data
collected from the community and for comparing them year by year.
   The feedback mechanism most used by libraries in Italy is the questionnaire-based survey, such as
“The Library for You” survey. However, this kind of data collection has the drawback that it reaches only
active users. To overcome this limitation, sentiment analysis, or opinion mining, can be applied with data
mining programs to any text dataset, such as the telephone replies of inactive members. As the name
suggests, sentiment analysis involves the analysis and identification of positive and negative opinions
and emotions within a given text [9]. By building such a model today, CSBNO will be able to analyze
future library surveys quickly and effectively, obtaining an assessment of users' overall perception of
specific areas of the library.

2.1.    Sentiment analysis for libraries
    To our knowledge, sentiment analysis has not yet been applied to libraries in Italy. The international
library community has used sentiment analysis in three ways: on social media content, on the free-text
answers of questionnaires, and on other corpora.
    An experience that is important for this research was carried out by Canadian libraries, which
collected the free-text responses of the LibQUAL+ questionnaire [6]. The characteristics analyzed by
Canadian libraries were:
    •    Information control: access to information, promotion, skills and bibliographic guides;
    •    Library space: approach to physical or digital space;
    •    Affect of service: negative and positive sentiment towards the service.
    These three characteristics capture users' feelings about the two fundamental services of the library,
seen as access to a collection and as physical space, plus the emotional perception that the library in
general arouses in users.
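The three characteristics, combined with a polarity label, suggest a simple annotation schema for each comment. The sketch below is only an illustration under stated assumptions: the class names, the field names and the dataset identifiers are hypothetical, and the example comment is invented.

```python
from dataclasses import dataclass
from enum import Enum


class Polarity(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


class Characteristic(Enum):
    # The three characteristics used in the LibQUAL+ comment analysis [6]
    INFORMATION_CONTROL = "information control"
    LIBRARY_SPACE = "library space"
    AFFECT_OF_SERVICE = "affect of service"


@dataclass(frozen=True)
class AnnotatedComment:
    dataset: str                    # hypothetical ids: "inactive", "active", "survey"
    text: str                       # the original comment, kept verbatim
    polarity: Polarity
    characteristic: Characteristic


# Invented example ("The librarians are always kind and helpful.")
example = AnnotatedComment(
    dataset="survey",
    text="Le bibliotecarie sono sempre gentili e disponibili.",
    polarity=Polarity.POSITIVE,
    characteristic=Characteristic.AFFECT_OF_SERVICE,
)
print(example)
```

Using enumerations rather than free strings keeps the manual labels consistent across annotators and datasets.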

3. Methodology
   To compare respondent feedback over time, the comments of the three datasets collected by CSBNO
will be analyzed to track their sentiment and the topics they relate to. To handle such a significant
amount of data, computer-aided data mining tools will be used to conduct sentiment analysis on the
comments of each dataset. The framework for the sentiment analysis model essentially involves three
steps: selection of training data, text processing, and creation of a model. Two students have been
involved in the project.

3.1.    Selection and loading of training data
   The pre-tagged training data are selected and loaded into the program. To create a model, both the
text elements and their corresponding sentiment labels must be selected.
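Loading the pre-tagged data amounts to reading (text, label) pairs and discarding unusable rows. The following is a minimal sketch, assuming a hypothetical CSV layout with a header row `text,label` and the three labels positive, negative and neutral; the rows are invented and stand in for the manually tagged comments.

```python
import csv
import io
from collections import Counter

VALID_LABELS = {"positive", "negative", "neutral"}


def load_training_data(stream):
    """Read (text, label) pairs from a CSV stream with a 'text,label' header.

    Rows with an empty text or an unknown label are skipped, so that the
    classifier only sees usable, tagged examples.
    """
    samples = []
    for row in csv.DictReader(stream):
        text = (row.get("text") or "").strip()
        label = (row.get("label") or "").strip().lower()
        if text and label in VALID_LABELS:
            samples.append((text, label))
    return samples


# Invented rows; the third one has no text and is therefore skipped.
raw = io.StringIO(
    "text,label\n"
    "Personale gentilissimo,positive\n"
    "Orari troppo ridotti,negative\n"
    ",positive\n"
    "Nessun commento,neutral\n"
)
training = load_training_data(raw)
print(Counter(label for _, label in training))
```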

3.2.    Text processing
   The text considered by the CSBNO Consortium is in Italian. Text preprocessing removes minor language
differences, such as lowercase versus uppercase letters, plurals, and verb tenses, using common stemming
and stop-word removal techniques, in order to create an accurate text analysis model. However, since some
models use the grammatical structure of the text, the original plain text is also kept in the dataset for
possible use in later steps of the analysis. Once preprocessing is finished, the training corpus is used to
create positive, negative and neutral feature vectors, capturing the polarized elements that characterize
the text of each comment. These feature vectors are saved for future use.
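The preprocessing step can be sketched as follows. This is an illustration, not the project's pipeline: the stop-word list is a tiny hypothetical sample, and the naive suffix stripping stands in for a real Italian stemmer such as Snowball. Negations like "non" are deliberately kept, since they can invert the polarity of a comment.

```python
import re

# Tiny, hypothetical Italian stop-word list; a real pipeline would use a full
# list. "non" is intentionally absent because negation matters for sentiment.
STOP_WORDS = {"il", "lo", "la", "i", "gli", "le", "un", "una", "di", "a", "da",
              "in", "con", "su", "per", "e", "è", "che", "sono"}

# Naive suffix stripping, standing in for a proper stemmer (e.g. Snowball).
SUFFIXES = ("issimo", "issima", "issimi", "issime", "mente", "zione", "zioni",
            "i", "e", "o", "a")


def stem(token):
    for suffix in SUFFIXES:
        # Strip the first matching suffix, keeping a stem of at least 3 letters.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(comment):
    """Lowercase, tokenize, drop stop words and stem; keep the raw text too."""
    tokens = re.findall(r"[a-zàèéìíòóù]+", comment.lower())
    stems = [stem(t) for t in tokens if t not in STOP_WORDS]
    return {"raw": comment, "stems": stems}


print(preprocess("Il personale è gentilissimo, ma gli orari non sono comodi."))
```

Because the raw comment is returned alongside the stems, models that need the grammatical structure can still use the original text.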

3.3.    Creating a model
    Using these feature vectors, the program applies a classification algorithm to build a model that
separates other, unseen comments into positive, negative, or neutral. As an orthogonal task, the selected
comments will also be classified according to their topic. This further classification will provide a
deeper and more complete understanding of the collected opinions. To verify the accuracy of the models,
they are tested on pre-tagged test data, measuring the precision, recall and accuracy of the
classification. The model is saved for future use.
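The evaluation on pre-tagged test data can be illustrated with per-class precision and recall plus overall accuracy. The sketch below uses standard definitions; the gold and predicted labels are invented for illustration.

```python
def evaluate(gold, predicted, labels=("positive", "negative", "neutral")):
    """Per-class precision and recall, plus overall accuracy."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and of equal length")
    pairs = list(zip(gold, predicted))
    report = {}
    for label in labels:
        tp = sum(1 for g, p in pairs if g == label and p == label)  # correct hits
        fp = sum(1 for g, p in pairs if g != label and p == label)  # false alarms
        fn = sum(1 for g, p in pairs if g == label and p != label)  # misses
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        report[label] = {"precision": precision, "recall": recall}
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    return report, accuracy


# Invented labels for five test comments
gold = ["positive", "positive", "negative", "neutral", "negative"]
pred = ["positive", "negative", "negative", "neutral", "negative"]
report, accuracy = evaluate(gold, pred)
print(report, accuracy)
```

Reporting precision and recall per class matters here because survey comments are often unbalanced between polarities, and accuracy alone can hide a poorly recognized minority class.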
    In simplified terms, the most traditional approaches to sentiment analysis work by providing the
algorithm with a so-called “bag of words”, which allows it to recognize the words and groups of words
that humans use to express positive and negative opinions. The process is a form of supervised machine
learning: pre-tagged datasets are used as training examples to “teach” the computer and create the basis
for classifying future unlabeled information. By providing pre-tagged “positive” (good, polite,
excellent, etc.) and “negative” (terrible, shoddy, rude, etc.) words, the data mining software can build
a model that will be applied to future comments to determine their polarity, that is, whether they express
a positive or negative feeling. With the same approach, it is also possible to classify a text according to
its specific topic, in a topic detection task. Alternatively, clustering algorithms can be used to group
together texts with similar features, in an unsupervised scenario.
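A supervised bag-of-words classifier can be sketched with multinomial Naive Bayes, one of the families mentioned in this work [5]. This from-scratch version with add-one smoothing is an illustration only, not the project's implementation; the tokenized training comments are invented.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over a bag of words, with add-one smoothing."""

    def fit(self, documents, labels):
        # documents: list of token lists; labels: one polarity label per document
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(n.values()) for c, n in self.word_counts.items()}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / n_docs)  # log prior P(label)
            for word in tokens:
                if word in self.vocabulary:  # words unseen in training are ignored
                    likelihood = (self.word_counts[label][word] + 1) / (self.total_words[label] + v)
                    score += math.log(likelihood)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


docs = [["personale", "gentile"], ["ottimo", "servizio"],
        ["orari", "scomodi"], ["servizio", "pessimo"]]
labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayesSentiment().fit(docs, labels)
print(model.predict(["personale", "ottimo"]))  # positive
print(model.predict(["orari", "pessimo"]))     # negative
```

Summing log-probabilities instead of multiplying probabilities avoids numerical underflow on longer comments.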
    In this work, the most representative and consolidated techniques of sentiment analysis and topic
detection will be compared. In particular, the best algorithms of different families will be considered,
including those based on some notion of geometric distance between samples (e.g. kNN, SVM) [4],
decision tree ensembles (RF, XGBoost) [3], probability and statistics (NB) [5], and perceptrons and small
neural networks [1]. These algorithms, or their combinations [2], have shown good accuracy on many
different small and medium-sized datasets [8].
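As an example of the distance-based family, a k-nearest-neighbours classifier can compare bag-of-words count vectors with cosine similarity and take a majority vote. This is a minimal sketch with invented training comments and an assumed k=3; SVM, the other distance-based method cited, is not shown.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def knn_predict(train, tokens, k=3):
    """train: list of (token list, label); majority vote among the k nearest samples."""
    query = Counter(tokens)
    ranked = sorted(train, key=lambda item: cosine(Counter(item[0]), query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


train = [
    (["personale", "gentile"], "positive"),
    (["personale", "disponibile"], "positive"),
    (["ottimo", "servizio"], "positive"),
    (["orari", "scomodi"], "negative"),
    (["sala", "rumorosa"], "negative"),
    (["orari", "ridotti"], "negative"),
]
print(knn_predict(train, ["orari", "troppo", "ridotti"]))  # negative
```

kNN needs no training phase, but every prediction scans the whole training set, which is acceptable for the small, manually labeled samples described here.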
    However, some newer algorithms have improved accuracy on larger datasets by exploiting so-called
deep neural network architectures, together with more advanced techniques for computing the feature
vectors of the training set. The traditional vectorization, based on the bag-of-words model, creates a
dataset with a very large number of features and requires a careful feature selection phase to obtain the
best results. Instead, word embeddings and dense representations [10] map each word into a
multidimensional space, where semantically related words are represented as points at a short distance.
The vector representing each sample is calculated from the positions of its words in this
multidimensional space. Moreover, deep neural networks have shown impressive results in many
applications, including sentiment analysis, for example with BERT [7]. However, these networks are
characterized by a very large number of parameters that must be learned, requiring training samples in
the order of magnitude of Big Data.
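The idea of dense representations can be shown with a toy example: related words get nearby vectors, and a comment's vector can be computed from its words' vectors. Averaging is assumed here as one common, simple choice; the 3-dimensional embeddings are hypothetical with invented values, whereas real ones (e.g. word2vec or fastText) have hundreds of dimensions learned from large corpora.

```python
import math

# Hypothetical 3-dimensional embeddings with invented values.
EMBEDDINGS = {
    "gentile": [0.9, 0.1, 0.0],  # "kind"
    "cortese": [0.8, 0.2, 0.1],  # "polite"
    "sala":    [0.0, 0.9, 0.3],  # "room"
    "spazio":  [0.1, 0.8, 0.4],  # "space"
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def sentence_vector(tokens, embeddings):
    """Average the embeddings of the known tokens; None if no token is known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


print(cosine(EMBEDDINGS["gentile"], EMBEDDINGS["cortese"]))  # high: related words
print(cosine(EMBEDDINGS["gentile"], EMBEDDINGS["sala"]))     # low: unrelated words
print(sentence_vector(["personale", "gentile", "cortese"], EMBEDDINGS))
```

Unlike a bag-of-words vector, whose length grows with the vocabulary, the comment vector keeps the fixed size of the embedding space.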
   In the present work, these new techniques will also be used, exploiting pre-trained models with
additional phases of transfer learning and fine-tuning to adapt the models to the particular task at hand.
These additional steps usually require much smaller datasets than those used to train the whole model.
   The research is in its initial phase. We intend to analyze a training set of comments randomly
selected from the responses of the three datasets collected. This training set will be manually reviewed
by the two students and labeled as having a positive or negative feeling.
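Drawing the training sample evenly from the three datasets keeps each source represented before manual labeling. The sketch below assumes hypothetical dataset names, an illustrative `per_dataset` parameter and a fixed seed for reproducibility; the comments are placeholders.

```python
import random
from collections import Counter


def sample_for_annotation(datasets, per_dataset, seed=2022):
    """Draw up to `per_dataset` random comments from each dataset for manual labeling."""
    rng = random.Random(seed)  # fixed seed so the same sample can be reproduced
    sample = []
    for name, comments in datasets.items():
        k = min(per_dataset, len(comments))
        for text in rng.sample(comments, k):
            # The label is left empty and will be filled in by the annotators.
            sample.append({"dataset": name, "text": text, "label": None})
    return sample


datasets = {
    "inactive": ["commento 1", "commento 2", "commento 3"],
    "active": ["commento 4", "commento 5"],
    "survey": ["commento 6", "commento 7", "commento 8", "commento 9"],
}
to_label = sample_for_annotation(datasets, per_dataset=2)
print(Counter(item["dataset"] for item in to_label))
```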
   Using the data mining platform, these training sets of comments will provide the basis for creating
data-specific positive and negative word vectors to power the sentiment analysis model. We also plan to
create an additional process to isolate individual topics within longer comments, allowing for more
nuanced sentiment analysis.
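Isolating topics within a longer comment could start by splitting it into clauses and tagging each clause by keyword. This is a sketch under stated assumptions: the keyword prefixes are hypothetical, splitting on punctuation is a naive stand-in for proper clause segmentation, and the example comment is invented. Each tagged clause could then be passed separately to a polarity classifier.

```python
import re

# Hypothetical keyword prefixes (Italian) for the three characteristics of Section 2.1.
TOPIC_KEYWORDS = {
    "information control": ("libr", "catalog", "prestit", "ebook"),
    "library space": ("sala", "spazi", "post", "rumor"),
    "affect of service": ("personal", "bibliotecar", "gentil", "cortes"),
}


def split_by_topic(comment):
    """Split a comment into clauses and tag each clause with the matching topics."""
    clauses = [c.strip() for c in re.split(r"[.,;!?]+", comment) if c.strip()]
    tagged = []
    for clause in clauses:
        tokens = re.findall(r"[a-zàèéìíòóù]+", clause.lower())
        # A topic matches when any token starts with one of its keyword prefixes.
        topics = [topic for topic, prefixes in TOPIC_KEYWORDS.items()
                  if any(tok.startswith(p) for tok in tokens for p in prefixes)]
        tagged.append((clause, topics))
    return tagged


for clause, topics in split_by_topic(
        "Il personale è gentile, ma la sala studio è rumorosa. Mancano libri recenti!"):
    print(topics, "<-", clause)
```

Matching whole-token prefixes rather than raw substrings avoids false hits such as "post" inside "risposta".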

4. Conclusions
   The sentiment analysis model is expected to provide a complementary tool for analyzing the
quantitative and qualitative results of simple satisfaction surveys of active and inactive library users.
Applying sentiment analysis could facilitate a participatory approach with communities, allowing a
simple and efficient year-by-year analysis of open comments. The CSBNO Consortium expects the
sentiment analysis process to provide the means to isolate specific topics based on specified keywords,
allowing individual institutions to tailor results for more in-depth analysis.

5. References
[1] M. S. Akhtar, A. Kumar, D. Ghosal, A. Ekbal, P. Bhattacharyya, A multilayer perceptron based
     ensemble technique for fine-grained financial sentiment analysis, in Proceedings of the 2017
     Conference on Empirical Methods in Natural Language Processing (2017), pp. 540-546.
[2] G. Angiani, S. Cagnoni, N. Chuzhikova, P. Fornacciari, M. Mordonini, M. Tomaiuolo, Flat and
     hierarchical classifiers for detecting emotion in tweets, in Conference of the Italian Association for
     Artificial Intelligence (2016), pp. 51-64.
[3] R. H. Hama Aziz, N. Dimililer, SentiXGboost: enhanced sentiment analysis in social media posts
     with ensemble XGBoost classifier, in Journal of the Chinese Institute of Engineers (2021), 44(6),
     pp. 562-572.
[4] M. R. Huq, A. Ali, A. Rahman, Sentiment analysis on Twitter data using KNN and SVM, in
     International Journal of Advanced Computer Science and Applications (2017), 8(6), pp. 19-25.
[5] R. A. Laksono, K. R. Sungkono, R. Sarno, C. S. Wahyuni, Sentiment analysis of restaurant
     customer reviews on tripadvisor using naïve bayes, in IEEE 2019 12th International Conference
     on Information & Communication Technology and System (2019), pp. 49-54.
[6] M. T. Moore, Constructing a sentiment analysis model for LibQUAL+ comments, in Performance
     Measurement and Metrics (2017), 18(1), pp. 78-87. https://doi.org/10.1108/PMM-07-2016-0031
[7] C. Sun, L. Huang, X. Qiu, Utilizing BERT for aspect-based sentiment analysis via constructing
     auxiliary sentences (2019), arXiv preprint arXiv:1903.09588.
[8] M. Tomaiuolo, G. Lombardo, M. Mordonini, S. Cagnoni, A. Poggi, A survey on troll detection, in
     Future Internet (2020), 12(2), p. 31.
[9] T. Wilson, J. Wiebe, P. Hoffmann, Recognizing contextual polarity in phrase-level sentiment
     analysis, in Proceedings of the Conference on Human Language Technology and Empirical Methods
     in Natural Language Processing, Association for Computational Linguistics (2005), pp. 347-354.
[10] A. Yadav, D. K. Vishwakarma, Sentiment analysis using deep learning architectures: a review, in
     Artificial Intelligence Review (2020), 53(6), pp. 4335-4385.