CEUR-WS Vol-2155: Reviews of Cultural Artefacts: Towards a Schema for their Annotation — https://ceur-ws.org/Vol-2155/kutzner.pdf
        Reviews of Cultural Artefacts: Towards a Schema for their Annotation
     Kristin Kutzner, Anna Moskvina, Kristina Petzold, Claudia Roßkopf, Ulrich Heid, Ralf
                                        Knackstedt
Institute for Business Administration and Business Information Systems, Institute for Information Science and Language
              Technologies, Institute for Creative Writing and Literary Studies, Institute for Cultural Politics
                       University of Hildesheim, Universitätsplatz 1, 31141 Hildesheim, Germany
                   {kutznerk, moskvina, petzold, rosskopf, heidul, ralf.knackstedt}@uni-hildesheim.de

                                                                Abstract
Digital transformation allows new forms of discussion on cultural and aesthetic practices. Several stakeholders are able to comment on and discuss cultural artefacts (books, museums and their exhibitions). Today, a variety of platforms exists: from general platforms (e.g., Amazon, Tripadvisor) to specialized ones (e.g., LovelyBooks, Behance). So far, there has not been any analysis of reviews of cultural artefacts across platforms. Accordingly, this study identifies and classifies text components of reviews about cultural artefacts. Based on the coding paradigm in the sense of Grounded Theory, the components are (1) empirically identified and (2) structured, resulting in a first category system, which is then (3) evaluated and modified, yielding a multi-layered category system. Thereafter, a group of students applied the system, providing insights into its understandability. By providing such a categorisation, we intend to contribute to further research in analysing the contents and text structure of online reviews.

Keywords: cultural artefacts, reviews, multi-layered category system
1.    Introduction

Today, a variety of online platforms, including e-commerce, social media and specialized rating platforms, allows consumers to rate and communicate their opinion about products or services. According to industry research reports, purchasing decisions of consumers are highly influenced by online reviews (Deloitte, 2007; Duan et al., 2008). Further, a Nielsen (2012) report, surveying more than 28,000 internet users worldwide, found that such online customer reviews are the second most trusted source of brand information.

Furthermore, digital transformation supports new or changed forms of collaboration (Kutzner et al., 2018) and participation (O’Reilly, 2005), including discussion on cultural and aesthetic practices. Through the interactive mode of digital media, for instance, the formerly clear lines between different groups of stakeholders are blurring, e.g. between producer and consumer (see the notion of ‘Prosumer’ in Toffler, 1980) and between laymen and professionals. Nearly every cultural artefact can get a review: a movie (e.g., imdb.com), a book (e.g., goodreads.com) or a new mobile application. In this study we focus on artistic artefacts (museums and their exhibitions) and books as cultural artefacts. Writing a review about such an artefact, a reviewer can choose among a variety of platforms, from general, commercial platforms (e.g., Amazon, Tripadvisor) and specialized, community-based platforms (e.g., LovelyBooks, Behance) to more text-oriented platforms (e.g., Sobooks, Mojoreads, Lectory). Some of these platforms not only present the related cultural artefacts, but also support the interaction between several reviewers (comment function), the rating of cultural artefacts and their participative further development (co-creation). In this field, reviews about cultural artefacts can be seen as textual materializations of cultural practices and of their perception.

Earlier studies have tended to address heterogeneous and mostly isolated aspects of reviews. Most of them analysed reviews in English; for instance, several studies analysed online consumer reviews and product ratings from e-commerce platforms, especially Amazon (e.g., McAuley and Leskovec, 2013). A few studies analysed reviews in German (e.g., Mehling et al., 2018). Computational and linguistic processing focused on techniques like opinion mining (e.g., Pang and Lee, 2008) or sentiment analysis (e.g., Wiegand and Ruppenhofer, 2015; Klinger et al., 2016) to investigate how reviews can be automatically classified as positive, negative or neutral. However, to the best of our knowledge, there is no research approach analysing reviews, especially reviews of cultural artefacts, across platforms. Therefore, this study focuses on reviews of cultural artefacts (books, museums and their exhibitions) in German from different types of platforms. Accordingly, we aim to answer the following research question as a basis for further research:

        What kind of components are contained in reviews of cultural artefacts?

Our contribution is a multi-layered category system for characterising components of reviews of cultural artefacts (e.g., contents and communicative acts). The category system contributes to our ongoing research on reviews in the digital world and aims at characterising and analysing reviews of cultural artefacts. Therefore, as a next step after building the category system, we hope to be able to find patterns of components within reviews. First, we briefly outline the background of reviews, cultural artefacts and the digital world in which they are written (Section 2). Based on our research design (Section 3), we iteratively built the multi-layered category system (Section 4) and evaluated it, applying it several times (Section 5). We then discuss the results and future research directions (Section 6) and conclude with our main findings (Section 7).

2.    Background

In this section, we specify the terms related to our research question. Therefore, we introduce our definition of a cultural artefact, the concept of a review and its components.

In this study, we define cultural artefacts as artistic artefacts (museums and their exhibitions) and books. This focus on the cultural field also defines the sort of review we are
looking for. In this context the concept of a review is characterized by its strong relation to the tradition of art and literary criticism as a professional journalistic form of discussing and reviewing newly published cultural artefacts.

In general, scholars and practitioners do not agree on a universal definition of the term review and of its components. Therefore, a review can be analysed and understood from different points of view. Some of the most prominent positions include the following statements:

•  Review as a product of the evaluation of a cultural artefact: a review is considered an article, published by a journalist, that describes, explains, interprets and/or evaluates a cultural artefact. The most characteristic parts of a review include recommendation or dissuasion as well as statements on the originality and entertaining qualities of an artefact (Stegert, 1997).

•  Review as the central text type of literary criticism: a review is understood as the critical discussion of a new publication; the most common and most important type of text in literary criticism (Pfohlmann, 2005).

•  Review as an expert expression of opinion: a review is the opinion-expressing form of literary and art criticism. Book reviews, film criticism, a judgmental report on a painting exhibition or an expert journalistic expression are examples (La Roche et al., 2013).

•  Review means asking about the terms of the arts, their functions and their origin. It further involves judging, making oneself unpopular and not being afraid of misunderstandings (Rauterberg, 2007).

As the statements indicate, a review may contain several components, addressing several aspects of a cultural artefact. For instance, descriptions, explanations and interpretations of an artefact as well as its critical discussion and expressions of the reviewer’s opinion are components of a review. Consequently, a review is characterised by several textual components and, therefore, can be analysed from different perspectives. Furthermore, reviews and their components might vary depending on the type of platform (e.g., general platforms like Amazon and Tripadvisor, specialized and community-based platforms like LovelyBooks or Behance) and on the addressed cultural artefact (e.g., books, museums and their exhibitions). However, existing research has mostly tended to address isolated aspects of reviews. Therefore, there still seems to be a need to investigate multiple perspectives on reviews of cultural artefacts. Accordingly, in this study we aim to provide a multi-layered category system that covers several perspectives on a review, as a basis for further research in this field.

3.    Research Design

In order to identify components of reviews we conducted an iterative three-stage research design, containing build and evaluate activities that characterise Design Science Research in Information Systems (e.g., March and Smith, 1995; Peffers et al., 2007). It consists of (Stage 1) the identification of components and (Stage 2) the development of a category system (build activities). To leverage rigorousness and to demonstrate the utility, quality and efficacy of the category system, we evaluated the system (Stage 3, evaluate activities), applying it several times (Figure 1).

  Stage 1 (build): Identify components. Input: sample reviews. Method: perform open coding iteratively. Output: 130 types of review components.
  Stage 2 (build): Develop category system. Input: 130 types of review components. Method: perform axial coding iteratively. Output: multi-layered category system.
  Stage 3 (evaluate): Evaluate category system. Input: sample reviews, multi-layered category system. Method: apply the multi-layered category system iteratively. Output: adapted multi-layered category system.

                  Figure 1: Research design.

Stage 1: Identify components. First, we empirically derived components of reviews. To start this process, we selected ten sample reviews from different types of platforms (e.g., social media platforms, blogs, other rating and exchange platforms), addressing artistic artefacts and books. To contribute to the robustness, nine researchers independently analysed the reviews. Each researcher named segments of the reviews with short labels that characterise the components of the reviews. In the sense of Grounded Theory, this procedure is called open coding (Glaser, 1978). It is a first step towards making analytic interpretations of the reviews (Charmaz, 2006). In a workshop, the researchers consolidated their components. In total, 130 different types of review components have been identified.

Stage 2: Develop category system. Second, the researchers independently structured and reassembled the identified components of reviews in a new way, a procedure called axial coding in the sense of Grounded Theory (e.g., Strauss and Corbin, 1990, 1998), resulting in a multi-layered category system. The results were consolidated in a subsequent workshop. The category system has been enriched by addressing review components of both artistic artefacts and books. In addition, similar components have been merged.

Stage 3: Evaluate category system. Selecting further sample reviews of artistic artefacts and books, the researchers independently applied the category system and annotated the reviews. Again, the researchers consolidated their results and experiences while annotating the reviews. As a result, the category system has been modified. These steps have been repeated several times until no more changes occurred. In May 2018, 24 students applied the resulting category system, annotating about 430 randomly selected Amazon book reviews (McAuley et al., 2015; He and McAuley, 2016). As a step towards evaluation of the category system, we chose to analyse reviews of books from Amazon, due to its availability for scientific research, its large database, and the great variety of artefacts (different authors, different books) and reviewers. As the category system addresses both reviews of artistic artefacts and of books, it has to be applied to reviews of artistic artefacts as well. Therefore, the evaluation of May 2018 can be seen as a first step within an evaluation activity in progress. In the May 2018 exercise, several reviews
have been independently analysed by three students. Subsequently, we measured the nominal scale agreement (Fleiss’ kappa, e.g., Fleiss, 1971; Fleiss and Cohen, 1973) and the absolute observed agreement of each category, to measure the agreement between the three raters. We further asked the students which categories they found easier to identify than others.

4.    Category System of Review Components

Following the research design (Section 3), we built a multi-layered category system, containing multiple components of a review (Stages 1 and 2). In this section, we describe the category system in general and, for reasons of space, we only present selected components of Layers 1 and 2 in more detail.

  Layer 1—Content: Cat_1.1 … Cat_1.n
  Layer 2—General Criticism: …
  Layer 3—Style: …
  Layer 4—Further Information: …

          Figure 2: Multi-layered category system.

Overview of the system. The category system is divided into four different layers that distinguish several components of reviews: Layer 1—Content addresses various themes that are directly related to the cultural artefact (e.g., the story of the artefact, the reviewer’s emotions and her or his assessment of the artefact). Furthermore, aspects of the background and of the reception context of the artefact are annotated as part of this layer (e.g., the location where the artefact has been perceived, the biography of the author, reflection about the structure and objective of the review). When a reviewer addresses aspects of the artefact or its context by means of the text components classified on Layer 1, he or she may have certain intentions and/or come up with criticism. To indicate these intentions, Layer 2—General Criticism contains different communicative acts (e.g., summarise, recommend, discourage, thank, ask questions) that are always related to the components of Layer 1. In writing a review, reviewers sometimes use particular language styles (e.g., rhetorical means like irony, hyperbole, and metaphor). Layer 3—Style contains these stylistic components. Moreover, multimedia content (e.g., pictures, links, emojis) is sometimes used in reviews. In addition, some sorts of metadata (e.g., the structure of the review) can be captured. Layer 4—Further Information addresses these aspects of a review.

The category system is used to annotate text strings in the reviews. Further, the categories are organized in a hierarchical structure, where each category has a numerical and a textual label used in the practical annotation work. For instance, Layer 1 contains a category Cat_1 that is divided into several subcategories from SubCat_1.1 to SubCat_1.n. A subcategory can be divided into further subcategories, and so on (Figure 2).

As a compact guideline for annotators, we built a tabular overview that complements the hierarchically ordered, labelled components by operationalised explanations and examples (codebook).

Presentation of selected categories. As described above, each layer consists of several hierarchically ordered categories. In total, the category system contains 108 different categories. More than half of the categories are assigned to Layer 1. On the topmost level, Layer 1 contains eight categories (Figure 3). For reasons of space, we will only present selected subcategories of category 1.1 in more detail (Figure 4).

  Layer 1—Content
  1.1   Reviewer-Artefact-Relation: Focused
  1.2   Reviewer-Artefact-Relation: Extended
  1.3   Artefact-Author-Relation and its Environment
  1.4   Artefact in Medium 1-Artefact in Medium 2-Relation (Intermediality of the Artefacts)
  1.5   Relation between Artefacts
  1.6   Reflexions on the Review(ing Process)
  1.7   Reviewers’ Self-thematisation
  1.8   Relation between Reviews

          Figure 3: Topmost categories of Layer 1.

  Layer 1—Content
  1.1       Reviewer-Artefact-Relation: Focused
  1.1.1     Individual aspects
  1.1.1.1   Citation
  1.1.1.2   Translation
  1.1.1.3   Author
  1.1.1.4   Content
  1.1.1.5   Title
  1.1.1.6   Physical Properties of the Artefact
  1.1.1.7   Outer Appearance
  1.1.1.8   Language Style
  1.1.2     View of the Artefact as a Whole
  …         …
  Layer 2—General Criticism
  2         Communicative Acts in the Reviews
  2.1       Summary
  …         …
  2.9       Mention without Assessment
  2.10      Assessment
  2.10.1    Positive/Agreement
  2.10.2    Negative/Disagreement
  2.10.3    Ambivalent
  …         …

      Figure 4: Selected categories of Layer 1 and 2.
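The hierarchically numbered labels described above (e.g., 1.1.1.8 below 1.1.1 below 1.1) lend themselves to a simple lookup table in annotation tooling. A minimal sketch in Python, with category names taken from Figures 3 and 4; the data structure and helper functions are our illustration, not part of the published codebook:

```python
# Sketch: a subset of the numerically labelled category hierarchy
# (names from Figures 3 and 4) stored as a flat lookup table.
CATEGORIES = {
    "1.1": "Reviewer-Artefact-Relation: Focused",
    "1.1.1": "Individual aspects",
    "1.1.1.3": "Author",
    "1.1.1.4": "Content",
    "1.1.1.8": "Language Style",
    "1.1.2": "View of the Artefact as a Whole",
    "2": "Communicative Acts in the Reviews",
    "2.1": "Summary",
    "2.9": "Mention without Assessment",
    "2.10.1": "Positive/Agreement",
}

def parent(label):
    """Parent category label, or None for a topmost category."""
    head, _, _ = label.rpartition(".")
    return head or None

def layer(label):
    """The layer (1-4) is encoded in the leading digit of the label."""
    return int(label.split(".", 1)[0])

print(parent("1.1.1.8"), layer("2.10.1"))  # 1.1.1 2
```

Such a table makes it cheap, for example, to aggregate counts of fine-grained subcategories (all 1.1.1.x labels) up to their parent category during analysis.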
Layer 1—Category 1.1. This category contains various themes addressed by the reviewers that are directly related to the cultural artefact. First, the reviewer might discuss individual aspects (category 1.1.1) related to the artefact. For instance, he or she might quote some content, or address the translation, the author, detailed content or the title of the artefact. Physical properties such as the quality of paper or illustrations, the outer appearance of the reviewed book (e.g., cover or used condition) and the language style of the artefact can also be addressed by the reviewer. Alternatively or in addition, not only individual aspects but also the artefact as a whole can be discussed (Figure 4).

Layer 2—Category 2. In addressing the categories of Layer 1, the reviewer always pursues some intention. Therefore, Layer 2 offers different communicative acts in the reviews for annotating these intentions (Figure 4).

5.    Application and Evaluation

As a first test of the category system, 24 students were asked to apply it in the manual annotation of 430 randomly selected Amazon reviews (Stage 3 of the research design). We now present an example of an annotated text passage which illustrates how labels from our category system are attached to text components of a review. In addition, we present selected results of the evaluation of the category system.

Annotation example. Annotators of reviews might usefully read a given review several times, from different perspectives. As a result, each text passage of the review is annotated with categories of both Layers 1 and 2. If it involves a particular language style or multimedia content, the categories of Layers 3 and 4 can be additionally annotated.

  Original, German Text Passage       Free Translation
  Ein sehr charmant witziges          A very charming
    [1.1.2, 2.10.1]                     [1.1.2, 2.10.1]
  und unterhaltsames Buch             and entertaining book
    [1.1.2, 2.10.1]                     [1.1.2, 2.10.1]
  ganz im typischen Stil              in the typical style of
    [1.1.1.8, 2.9]                      [1.1.1.8, 2.9]
  Ellen DeGeneres.                    Ellen DeGeneres.
    [1.1.1.3, 2.9]                      [1.1.1.3, 2.9]

Figure 5: Annotation example from a review text passage.

For instance, we read a text passage from a sample review (for reasons of presentation, we translated it freely from German to English): «A very charming and entertaining book in the typical style of Ellen DeGeneres» (Amazon). First, we analyse the text passage from the perspective of Layer 1. The reviewer addresses the book in general (category 1.1.2), stating that it is «very charming and entertaining». Thereafter, the reviewer addresses the issue of the language style, «the typical style» (category 1.1.1.8), and the author, «Ellen DeGeneres» (category 1.1.1.3). In a second round, we read the text passage from the perspective of Layer 2. Writing that the book is very charming and entertaining, the reviewer wants to assess the artefact in a positive way (category 2.10.1). Both the «typical style» and «Ellen DeGeneres» are simply mentioned by the reviewer (category 2.9). Third, the annotator should take the perspectives of Layers 3 and 4. However, as the sample text passage does not address these categories, no further annotation is required (Figure 5).

Evaluation. Applying the category system and annotating 430 reviews, the students had to annotate sentences or indicator expressions (such as those underlined in Figure 5) with the labels from our category system. In sum, 14,235 text passages have been identified and annotated with categories by the students. Some categories have been used more frequently than others. For instance, the content of an artefact (category 1.1.1.4) is the category most commonly recognized by the students (2,603 times). The summary of some content is the second most common category (category 2.1), recognized by the students 2,403 times (Figure 6).

  Top 10—Frequencies of Categories (Total of 430 Reviews)
  #      No.        Category
  2603   1.1.1.4    Content
  2403   2.1        Summary
  1369   2.10.1     Positive/Agreement
  1297   1.1.2      View of the Artefact as a Whole
  542    1.1.1.8    Language Style
  510    1.1.4      Own Emotions
  355    1.1.1.6    Physical Properties of the Artefact
  319    1.1.1.3    Author
  316    2.10.2     Negative/Disagreement
  282    2.9        Mention without Assessment

Figure 6: Top 10—Frequencies of categories, total.

Expecting 30 students to annotate the reviews, we divided the reviews equally among the students. To be able to measure the nominal scale agreement (Fleiss’ kappa) among the different raters (e.g., Fleiss, 1971; Fleiss and Cohen, 1973), each review has been assigned to three different students. However, in the end only 24 students worked on the task. As a consequence, not all reviews have been annotated three times.

  Top 10—Frequencies of Categories (Triple of 139 Reviews)
  #      No.        Category
  952    1.1.1.4    Content
  862    2.1        Summary
  516    2.10.1     Positive/Agreement
  405    1.1.2      View of the Artefact as a Whole
  206    1.1.1.8    Language Style
  185    1.1.4      Own Emotions
  123    1.1.1.6    Physical Properties of the Artefact
  119    2.10.2     Negative/Disagreement
  105    2.9        Mention without Assessment
  91     1.1.1.7    Outer Appearance

Figure 7: Top 10—Frequencies of categories, threefold.
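The evaluation rests on two agreement measures: Fleiss' kappa (Fleiss, 1971) and the absolute observed agreement per category. A hedged sketch of both calculations; the rating matrix below is an invented toy example, not data from the study:

```python
# Sketch of the two agreement measures used in the evaluation.
# fleiss_kappa follows Fleiss (1971); the toy rows are invented.

def fleiss_kappa(ratings):
    """ratings[i][k] = number of raters assigning item i to category k;
    every row sums to the same number of raters n."""
    N, n, K = len(ratings), sum(ratings[0]), len(ratings[0])
    # proportion of all assignments that fall into category k
    p = [sum(row[k] for row in ratings) / (N * n) for k in range(K)]
    # mean observed agreement per item
    p_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1))
                for row in ratings) / N
    p_e = sum(pk * pk for pk in p)  # chance agreement
    return (p_bar - p_e) / (1 - p_e)

def absolute_agreement(ratings):
    """Share of items on which all raters chose the same category."""
    n = sum(ratings[0])
    return sum(1 for row in ratings if max(row) == n) / len(ratings)

# Toy data: 4 text passages, 3 raters, binary decision whether one
# category (say, 1.1.1.4 "Content") applies; column 0 = yes, 1 = no.
rows = [[3, 0], [2, 1], [0, 3], [3, 0]]
print(round(fleiss_kappa(rows), 3), absolute_agreement(rows))  # 0.625 0.75
```

As discussed below, inconsistently annotated passages and rarely used categories both depress kappa, which is one reason the absolute observed agreement is reported alongside it.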
For analysing the agreement between three raters regarding each category, we reduced the data set to reviews that have been annotated threefold (Triple), resulting in 139 reviews and different overall frequency figures for the categories (Figure 7).

A calculation of the inter-annotator agreement on this task is not trivial and provides only limited insight. On the one hand, not all text passages have been annotated consistently by all annotators at both levels (Layers 1 and 2); this leads, for example, to low values of Fleiss' kappa. On the other hand, the category system contains a large number of categories, many of which have rarely been used, so that the calculation does not produce adequately interpretable results. As a consequence, we obtain rather low kappa values (the same applies to Krippendorff's alpha for most categories; Krippendorff, 2011), which at first sight suggest little agreement between the annotators. Nevertheless, we can interpret some of the data by calculating the absolute agreement for each category. For instance, regarding the most frequently used category, Content (1.1.1.4), the raters use this category unanimously in 48% of the cases. Furthermore, in 24% of the annotations, two raters agree and one rater disagrees (i.e., he or she does not use the category Content for annotating the same sentence). Only in 28% of the cases do two raters disagree and one agrees. Thus, in 72% of the cases the majority of the raters agrees on the use of the category Content (Figure 8). As described in Section 3, we also asked for the students' personal opinions about accurately assignable (vs. problematic) categories. The most commonly mentioned category is Content: this confirms the calculated result. Thus, we can conclude that this category is understandable for the raters and easy to identify in the reviews. A similar interpretation applies to the use of category 1.1.2 (View of the Artefact as a Whole). Regarding the absolute agreement for this category, in 66% of the cases only one rater uses category 1.1.2. In only 7% of the cases do all three raters agree on the use of this category (Figure 8). In the students' comments, category 1.1.2 is mentioned as not accurately assignable by the majority. Thus, both results indicate that category 1.1.2 is not understandable enough, is hard to identify in the review texts, and therefore has to be improved.

No.        Category                           3:0    2:1    1:2
1.1.1.4    Content                            48%    24%    28%
1.1.2      View of the Artefact as a Whole     7%    27%    66%

Figure 8: Relation of raters who agree/disagree (agree : disagree).

6. Discussion and Future Research Directions

As described in Section 5, the absolute observed agreement per category provides first insights into the comprehensibility of the categories. As a consequence, we are able to revise the category system. From the first annotation experiment we learn that positive evaluative statements are much more frequent in our sample than negative ones; that many reviews make reference to the style or language of the reviewed book; and that the physical appearance of the book is mentioned quite prominently, likely because of the special situation of Amazon delivering the book. Finally, we note that a considerable number of annotated text passages make reference to the emotions of the reviewers during the reception process.

However, because of several circumstances (e.g., the variety of incomplete annotations), the measurement of nominal scale agreement (Fleiss' kappa) could not provide sufficient results. As described above (Section 3), the category system is meant to be platform-independent and generic enough to be applicable both to reviews of artistic artefacts and to reviews of books. Thus, the described evaluation can only be seen as a partial evaluation activity in progress. The enhanced category system will also be applied, in more controlled experiments, to reviews of artistic artefacts, resulting in new annotations of reviews and additional sample annotations to measure the agreement between raters.

Moreover, the category system represents an idealized schema for annotating reviews. It serves as a starting point for manually analysing the components of reviews. As a next step, however, we want to identify the review components automatically with methods of machine learning. For instance, support vector machines (e.g., Cortes and Vapnik, 1995; Boser et al., 1992) or decision trees (e.g., Rasoul and Landgrebe, 1991; Cho et al., 2002) can support the automatic classification of the components. The annotations manually produced by the students (Section 3) and/or in new annotation rounds may serve as training data to learn a model and to predict components of reviews.

Furthermore, in the medium term, we want to identify not only components of reviews, but also patterns of and relationships between components (e.g., the relationship between the view of the artefact as a whole and its assessment). To this end, the training data can be used as input for further machine learning algorithms, such as cluster analysis (e.g., Hartigan and Wong, 1979; Elkan, 2003) or sequential pattern-mining algorithms such as Apriori-based algorithms (e.g., Slimani and Lazzez, 2013). Moreover, to support the visualisation and analysis of reviews and their components, domain-specific modelling approaches (e.g., Guizzardi et al., 2002; Kishore and Sharman, 2004) can be valuable further research directions.

As a result, this study provides a double contribution to the field of Digital Humanities: First, the category system and the first sample annotations constitute a starting point for a classification of textual components of reviews, and for its subsequent automation by means of machine learning methods. Second, this study and the ongoing research project of which it is a part go beyond these methodological questions and address—by examining digital practices of cultural participation through reviews—the understanding of digital culture and society on a more fundamental level, which is, eventually, a central question of Digital Humanities.

7. Conclusion

In order to identify components that are contained in reviews of cultural artefacts, we used the coding paradigm in the sense of Grounded Theory (e.g., Glaser, 1978; Strauss and Corbin, 1990, 1998). We thus empirically
derived components by analysing and annotating selected reviews. These components were then structured, resulting in a multi-layered category system. To evaluate the system, we applied the categories by annotating several reviews, and we modified the system until a stable version was reached. As a result, the system contains Layer 1—Content, Layer 2—General Criticism, Layer 3—Style and Layer 4—Further Information. Each layer consists of several categories that are organized in tree-like hierarchies. To further evaluate the category system, students applied the resulting system, annotating about 430 Amazon book reviews. The measurement of the absolute agreement for each category provides first insights into the comprehensibility of the categories. Thus, it supports revising the category system.
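The absolute agreement used here boils down to counting, per annotated unit, how many of the three raters applied a given category. A minimal sketch, with invented toy counts that happen to mirror the proportions reported for the category Content:

```python
from collections import Counter

def absolute_agreement(votes):
    """For one category, classify each annotated unit by how many of the
    three raters used the category (3:0, 2:1 or 1:2) and return the
    relative frequencies in percent."""
    counts = Counter()
    for n_raters in votes:  # n_raters = number of raters (1..3) using the category
        counts[f"{n_raters}:{3 - n_raters}"] += 1
    total = sum(counts.values())
    return {k: round(100 * v / total) for k, v in counts.items()}

# Toy data: for 25 units, how many of the 3 raters used the category.
content_votes = [3] * 12 + [2] * 6 + [1] * 7
print(absolute_agreement(content_votes))  # {'3:0': 48, '2:1': 24, '1:2': 28}
```

Unlike chance-corrected coefficients such as Fleiss' kappa, this raw count remains interpretable even when a category is used only rarely.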
Overall, our findings contribute to the ongoing research on reviews in the digital world and to the analysis and characterisation of reviews of cultural artefacts. Based on the category system and on the sample annotations, researchers can further identify review components automatically with methods of machine learning and discover patterns of review components.
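The envisaged automatic identification of review components could, for instance, be set up as a sentence classification task with a support vector machine. The following sketch uses scikit-learn; the library choice and the tiny labelled training set are our own illustrative assumptions, not part of the study:

```python
# Sketch: classify review sentences into component categories with an SVM,
# as suggested in the discussion. Training sentences and labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train = [
    ("A charming and entertaining book.", "1.1.2"),            # artefact as a whole
    ("I loved every page of this novel.", "1.1.2"),
    ("The prose is witty and very readable.", "1.1.1.8"),      # language style
    ("Her style is plain but effective.", "1.1.1.8"),
    ("The hardcover arrived with a torn dust jacket.", "1.1.1.6"),  # physical properties
    ("The paper quality of this edition is poor.", "1.1.1.6"),
]
sentences, labels = zip(*train)

# Bag-of-words tf-idf features fed into a linear support vector classifier.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(sentences, labels)

print(model.predict(["The writing style is elegant."]))
```

In practice, the manually produced student annotations would replace the invented sentences above, and held-out reviews would be used to estimate classification quality per category.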

              8.   Acknowledgements
This research was conducted in the scope of the research
project “Rez@Kultur” (01JKD1703), which is funded by
the German Federal Ministry of Education and Research
(BMBF). We would like to thank them for their support.




9. Bibliographical References

Boser, B. E., Guyon, I. M. and Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages 144–152, ACM.
Charmaz, K. (2006). Constructing Grounded Theory. A Practical Guide Through Qualitative Analysis. SAGE.
Cho, Y. H., Jae, K. K. and Soung, H. K. (2002). A personalized recommender system based on web usage mining and decision tree induction. Expert Systems with Applications, 23(3):329–342.
Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3):273–297.
Deloitte (2007). New Deloitte Study Shows Inflection Point for Consumer Products Industry; Companies Must Learn to Compete in a More Transparent Age. Press Release, Deloitte Services LP, New York, October 1.
Duan, W., Gu, B. and Whinston, A. B. (2008). Do online reviews matter? An empirical investigation of panel data. Decision Support Systems, 45:1007–1016.
Elkan, C. (2003). Using the Triangle Inequality to Accelerate k-Means. In Proceedings of the International Conference on Machine Learning, Washington DC, USA.
Fleiss, J. L. (1971). Measuring Nominal Scale Agreement among many Raters. Psychological Bulletin, 76(5):378–382.
Fleiss, J. L. and Cohen, J. (1973). The Equivalence of Weighted Kappa and the Intraclass Correlation Coefficient as Measures of Reliability. Educational and Psychological Measurement, 33:613–619.
Glaser, B. G. (1978). Theoretical Sensitivity. Mill Valley, CA, The Sociology Press.
Guizzardi, G., Ferreira Pires, L. and Van Sinderen, M. J. (2002). On the role of domain ontologies in the design of domain-specific visual modeling languages. In Proceedings of the ACM OOPSLA.
Hartigan, J. A. and Wong, M. A. (1979). Algorithm AS 136: A K-Means Clustering Algorithm. Journal of the Royal Statistical Society, Series C (Applied Statistics), 28(1):100–108.
He, R. and McAuley, J. (2016). Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proceedings of the 25th International Conference on World Wide Web, pages 507–517, International World Wide Web Conferences Steering Committee.
Kishore, R. and Sharman, R. (2004). Computational Ontologies and Information Systems: Foundations. Communications of the Association for Information Systems, 14(8):158–183.
Klinger, R., Suliya, S. S. and Reiter, N. (2016). Automatic Emotion Detection for Quantitative Literary Studies – A case study based on Franz Kafka's «Das Schloss» and «Amerika». In Digital Humanities, Kraków, Poland.
Krippendorff, K. (2011). Agreement and Information in the Reliability of Coding. Communication Methods and Measures, 5(2):1–20.
Kutzner, K., Schoormann, T. and Knackstedt, R. (2018). Digital Transformation in Information Systems Research: A Taxonomy-based Approach to Structure the Field. In European Conference on Information Systems, Portsmouth, UK.
La Roche, W., Hooffacker, G. and Meier, K. (2013). Einführung in den praktischen Journalismus: Mit genauer Beschreibung aller Ausbildungswege Deutschland, Österreich, Schweiz. Springer-Verlag.
March, S. T. and Smith, G. (1995). Design and Natural Science Research on Information Technology. Decision Support Systems, 15(4):251–266.
McAuley, J. and Leskovec, J. (2013). Hidden Factors and Hidden Topics: Understanding Rating Dimensions with Review Text. In Proceedings of the 7th ACM Conference on Recommender Systems, pages 165–172, New York, NY, USA, ACM.
McAuley, J., Targett, C., Shi, Q. and Van Den Hengel, A. (2015). Image-based recommendations on styles and substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 43–52, ACM.
Mehling, G., Kellermann, A., Kellermann, H. and Rehfeldt, M. (2018). Leserrezensionen auf amazon.de: Eine teilautomatisierte inhaltsanalytische Studie. Bamberg, Bamberg University Press.
Nielsen (2012). Nielsen's latest Global Trust in Advertising report. https://retelur.files.wordpress.com/2007/10/global-trust-in-advertising-2012.pdf (downloaded 2018-04-30).
O'Reilly, T. (2005). What is the Web 2.0? http://www.oreilly.com/pub/a//web2/archive/whatis-web-20.html (downloaded 2018-06-15).
Pang, B. and Lee, L. (2008). Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval, 2(1–2):1–135.
Peffers, K., Tuunanen, T., Rothenberger, M. and Chatterjee, S. (2007/2008). A Design Science Research Methodology for Information Systems Research. Journal of Management Information Systems, 24(3):45–77.
Pfohlmann, O. (2005). Kleines Lexikon der Literaturkritik. Verlag LiteraturWissenschaft.de.
Rasoul, S. S. and Landgrebe, D. (1991). A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man, and Cybernetics, 21(3):660–674.
Rauterberg, H. (2007). Und das ist Kunst?!: Eine Qualitätsprüfung. Frankfurt am Main, S. Fischer Verlag.
Slimani, T. and Lazzez, A. (2013). Sequential Mining: Patterns and Algorithms Analysis. International Journal of Computer and Electronics Research, 2(5):639–647.
Stegert, G. (1997). Die Rezension: Zur Beschreibung einer komplexen Textsorte. Beiträge zur Fremdsprachenvermittlung, 31:89–110.
Strauss, A. and Corbin, J. (1990). Basics of Qualitative Research: Grounded Theory Procedures and Techniques. Newbury Park, CA, Sage.
Strauss, A. and Corbin, J. (1998). Basics of Qualitative Research: Grounded Theory Procedures and Techniques. Thousand Oaks, CA, Sage, 2nd edition.
Toffler, A. (1980). The Third Wave: The Revolution That Will Change Our Lives. London/New York, Collins.
Wiegand, M. and Ruppenhofer, J. (2015). Opinion Holder and Target Extraction based on the Induction of Verbal Categories. In Proceedings of the 19th Conference on Computational Natural Language Learning, pages 215–225, Beijing, China, Association for Computational Linguistics.


