Creating and Exploiting the Mappings from Conference
  Review Forms to a Generic Set of Review Criteria

Vojtěch Svátek1 , Sára Juranková1 , Radomír Šalda2 , Petr Strossa1 , and Zdeněk Vondra2
      1
          Dept. of Information and Knowledge Engineering, University of Economics, Prague
                       Nám. W. Churchilla 4, 130 67 Prague 3, Czech Republic
                            {svatek|jurs02|petr.strossa}@vse.cz
                       2
                         Dept. of Multimedia, University of Economics, Prague
                       Nám. W. Churchilla 4, 130 67 Prague 3, Czech Republic
                                {salr00|zdenek.vondra}@vse.cz


          Abstract. Conference papers are evaluated according to many criteria reflected
          in numerical scores, and the wording of the criteria differs among conferences.
          This makes the role of meta-reviewers tough when summarizing the evaluation
          across multiple criteria and reviewers. Based on a micro-study within semantic
          technology conferences, we conjecture that the criteria can, for particular fields,
          be mapped to generic metrics, and provide a provisional ontological representa-
          tion for such a mapping and a set of metrics, as well as a manual mapping tool.
          Finally, we showcase an application exploiting the mappings: a graphics genera-
          tor that aggregates the review data into a complex pictorial metaphor.


1     Introduction
Conference papers are often evaluated according to multiple criteria reflected in numer-
ical scores. In large conferences with many reviews per paper this amounts to dozens
of partial figures. This sheer number, and the fact that the wording of the criteria differs
from one conference to another, make the role of meta-reviewers during the discussion
periods difficult, and the effort invested into the detailed scoring may partly get lost.
    In the research we first explored whether the criteria can be generalized across
events within a field such as Semantic Technology (ST) to a small set of review met-
rics. Based on the positive outcome of this study, we designed a provisional ontology
for representing the mappings between specific review forms and such generic criteria,
and developed a simple mapping authoring tool and a mapping execution component.
Finally, we developed a tool that demonstrates one possible way of exploiting the map-
pings: a review visualizer that assembles the metric values, for a set of reviews of the
same paper, into a compound pictogram relying on the racing cars metaphor.
    Contributions of this paper are thus both the small empirical study and a multi-
part demo. In the demo we can demonstrate how: 1) a mapping from a form to the
common set of metrics can be created and published, 2) the values for a concrete set of
reviews can be manually entered, thus emulating an automatic input from a hypothetical
component of a conference review system, and 3) the pictorial scene can be generated.
    Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
2     Review Criteria Micro-Study and Mapping to Generic Metrics

We analyzed the review forms of nine ST conferences, always for the latest edition we
could access as author/s or reviewer/s. We semantically clustered the field labels (referring to the reviewer guidelines where in doubt), yielding seven partial review metrics that we named as in the first column of Tab. 2, plus two global metrics, Confidence and Overall score, present in all forms. The partial criteria converged well despite the varying wording, though some forms omitted certain metrics entirely. ISWC and SEMANTiCS were clearly influenced by one another, having the same set of fields. ESWC had two fields, both of which we subsumed under ‘Technical quality’. We do not list K-CAP, as it had no
partial numerical field; this may be related to the ‘workshop flavor’ of this event.


3     Ontological Representation of Review Forms and Metrics

For a review of existing relevant ontologies, we refer to our up-to-date study of
research-related ontologies in general [2]. The recently developed FAIR ontology3 cov-
ers the overall review process (reviewers, reviews and venues). The associated Review
Measures module of the BIDO ontology4 contains, among other things, a large collection of individuals corresponding to different rating/confidence scales and their values. None of these ontologies, however, addresses the semantics of partial review metrics. Therefore, we rapidly prototyped an ontology (not yet considering all best practices, thus
likely subject to revisions in the future) that supports the publishing of metrics and
their relationships to review forms. The ontology is online at http://kizi.vse.cz/
pictoreview/ontology/, and contains the classes ReviewMetrics, ReviewForm, Re-
viewFormField and F2M_Mapping (for the field-to-metric mapping), plus the connect-
ing properties. The proposed metric set (applicable to ST conferences, and probably to those of many other computing fields) is at http://kizi.vse.cz/pictoreview/metrics/. Finally, a sample mapping (the one used in the example below) is at http://kizi.vse.cz/pictoreview/map/semantics18/.
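    As an illustration of the intended usage, the following sketch (in Python, using rdflib) shows how one field-to-metric mapping might be expressed with these classes. It is a sketch only: the connecting property names (fieldOf, mapsField, mapsToMetric) are placeholders of our own, whereas the actual properties are those defined in the online ontology.

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

# Namespaces of the published ontology, metric set, and one concrete mapping
ONT = Namespace("http://kizi.vse.cz/pictoreview/ontology/")
MET = Namespace("http://kizi.vse.cz/pictoreview/metrics/")
MAP = Namespace("http://kizi.vse.cz/pictoreview/map/semantics18/")

g = Graph()
g.bind("ont", ONT)
g.bind("met", MET)

# A review form with one field, mapped to the generic 'Relevance' metric.
# The property names below are illustrative, not taken from the online ontology.
g.add((MAP.form, RDF.type, ONT.ReviewForm))
g.add((MAP.appropriateness, RDF.type, ONT.ReviewFormField))
g.add((MAP.appropriateness, ONT.fieldOf, MAP.form))
g.add((MAP.appropriateness, RDFS.label, Literal("Appropriateness")))

g.add((MAP.m1, RDF.type, ONT.F2M_Mapping))
g.add((MAP.m1, ONT.mapsField, MAP.appropriateness))
g.add((MAP.m1, ONT.mapsToMetric, MET.Relevance))

print(g.serialize(format="turtle"))
```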


4     Demo Suite

We developed a suite of four simple tools to demonstrate the whole concept. They are
bundled by the web page http://pictoreview.vse.cz/. The source code for the
first three tools is at https://github.com/jurs02/PictoReviewDev.
     The first tool allows the user to create a mapping from the custom set of review form
fields of a particular event to the proposed set of generic metrics. The mapping can be
1:1, 1:N or N:1. An example of a mapping (for the SEMANTiCS’18 Research Track)
is in the first two columns of Tab. 1. The mapping can be currently stored as a JSON
structure or as an RDF dataset described by our ontology from Sect. 3.
 3
     https://sparontologies.github.io/fr/current/fr.html
 4
     https://sparontologies.github.io/bido-review-measures/current/
     bido-review-measures.html
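    For illustration, the JSON form of a stored mapping (as produced by the first tool) could look roughly as follows; this is a minimal sketch under our own assumptions about the key names, not the tool's actual schema. The fields correspond to the SEMANTiCS'18 example of Tab. 1.

```python
# Hypothetical JSON mapping structure (key names are our own assumptions).
# Each form field maps to a list of generic metrics, so a 1:N mapping is a
# list with several entries, and N:1 arises when several fields share a metric.
semantics18_mapping = {
    "form": "SEMANTiCS 2018 Research Track",
    "fields": {
        "Appropriateness":                ["Relevance"],
        "Originality / innovativeness":   ["Novelty"],
        "Implementation and soundness":   ["Technical quality"],
        "Related work":                   ["State of the art"],
        "Evaluation":                     ["Evaluation"],
        "Impact of ideas and results":    ["Significance"],
        "Clarity and quality of writing": ["Presentation"],
        "Reviewer's confidence":          ["Confidence"],
        "Overall evaluation":             ["Overall score"],
    },
}
```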
    The second tool is a simple mapping execution API, which transforms a set of re-
view form fillings of a specific conference (a JSON structure) to the generic metrics
(also output in JSON), using the JSON mapping (valid for that conference) authored by
the first tool. For an N:1 mapping (i.e., multiple fields mapped to a single metric), the numerical mean of the values is computed. Note that the first and second tools together provide (a baseline of) a general review data interoperability infrastructure, usable independently of the rest of the demo; for example, the reviewing emphases of different conferences
could be compared based on the mappings.
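    A minimal sketch of this execution step, assuming the hypothetical mapping structure shown above, could look as follows; for N:1 mappings all field values contributing to one metric are averaged.

```python
from statistics import mean

def execute_mapping(mapping: dict, review: dict) -> dict:
    """Transform one review form filling (field label -> score)
    into generic metric values (metric -> averaged score)."""
    collected = {}  # metric -> list of contributing field values
    for field, metrics in mapping["fields"].items():
        if field not in review:
            continue  # a form may lack some of the fields
        for metric in metrics:
            collected.setdefault(metric, []).append(review[field])
    # N:1 case: values of several fields mapped to one metric are averaged
    return {metric: mean(values) for metric, values in collected.items()}

# Example: Reviewer 1 from Tab. 1
review_1 = {
    "Appropriateness": 4, "Originality / innovativeness": 3,
    "Implementation and soundness": 4, "Related work": 4,
    "Evaluation": 4, "Impact of ideas and results": 3,
    "Clarity and quality of writing": 3, "Reviewer's confidence": 3,
    "Overall evaluation": 0,
}
print(execute_mapping(semantics18_mapping, review_1))
```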
    The third tool emulates the role of a hypothetical plug-in to an off-the-shelf review
management system (RMS). The user manually enters both JSON data structures ex-
pected by the second tool: the (saved) mapping, and the specific review form fillings,
for example, those from the last three columns of Tab. 1. The data is then transformed
to generic metrics (by the second tool) and passed to the fourth tool.
    The fourth tool, the pictogram generator, finally converts the generic metric
values to components of a complex pictorial metaphor. We identified ‘car’ as a rela-
tively close metaphor to a research paper, and car components (plus other ‘car race’
features) as visual variables expressing the metrics values. In Fig. 1 we see the visual
representation of the set of reviews from our SEMANTiCS’18 example, cf. Tab. 1. The
whole picture encodes 27 numerical values: 9 metrics × 3 reviewers. For brevity let us
only point out the ‘good’ and ‘bad’ scores. Reviewer 3 (R3) appreciated the novelty
of the paper (big engine), its evaluation (solid wheels), and also presentation (smiling
face). R1 valued the state of the art (shining headlamp) and technical quality (body style:
cabrio as most suitable for a racing car), and also evaluation (wheels). R2, in turn, only
praised the paper for its high relevance (this would be indicated by the track quality,
however, the difference is too small;5 with an even lower value, the track would change
to dirt or even turf), while the presentation, in particular, was poor (frowning face). The
reviewer confidence (lower for R1) does not measure the paper quality as such; there-
fore we use an orthogonal visual magnitude paradigm, the color saturation/salience.
Finally, the cars are positioned on the track by their overall scores.
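    As a rough illustration only (the generator's actual interval boundaries are not part of this paper), the binning of a partial score into one of the three visual levels mentioned in footnote 5 might look as follows:

```python
def visual_level(score: int) -> str:
    """Bin a 1-5 partial score into one of three visual variable levels,
    e.g. engine size for Novelty or wheel quality for Evaluation.
    Boundaries are illustrative assumptions, not the generator's settings."""
    if score <= 2:
        return "low"     # e.g. small engine, dirt or turf track, frowning face
    if score == 3:
        return "medium"
    return "high"        # scores 4 and 5 fall into the same interval (fn. 5)
```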


                 Table 1. Numerical scores of the example SEMANTiCS’18 paper

               Original review field          Mapped to metric Rev. 1 Rev. 2 Rev. 3
               Appropriateness                Relevance          4      5      4
               Originality / innovativeness   Novelty            3      3      4
               Implementation and soundness Technical quality    4      3      3
               Related work                   State of the art   4      3      3
               Evaluation                     Evaluation         4      3      4
               Impact of ideas and results    Significance       3      3      3
               Clarity and quality of writing Presentation       3      2      5
               Reviewer’s confidence          Confidence         3      4      4
               Overall evaluation             Overall score      0     -1      2



 5
     For simplicity, all scores except the overall evaluation are mapped to three-valued visual vari-
     ables only, thus 4 and 5 fall to the same interval.
                Table 2. Proposed mapping between generic review metrics and form fields of KE conferences

(Columns, left to right: ECAI 2016 | EKAW 2020 | ESWC 2018 | FOIS 2016 | IJCAI 2019 | ISWC & SEMANTiCS 2018 | KR 2014)

Relevance:         Relevance | NA | Relevance to ESWC | NA | Relevance | Appropriateness | Relevance of the paper to KR
Novelty:           Originality | Novelty | Novelty of the proposed solution | Novelty or innovation | Originality | Originality / innovativeness | Novelty of the contribution
Technical quality: Technical quality | Technical soundness and depth | Correctness and completeness of the proposed solution; Demonstration and discussion of the properties of the proposed approach | Scientific or technical quality | Technical quality | Implementation and soundness | Technical quality
State of the art:  Scholarship | NA | Evaluation of the state-of-the-art | References | Scholarship | Related work | Discussion of related work
Evaluation:        NA | NA | Reproducibility and generality of the experimental study | NA | NA | Evaluation | NA
Significance:      Significance | NA | NA | NA | Significance | Impact of ideas and results | NA
Presentation:      Presentation quality | Clarity and quality of writing | NA | Presentation | Clarity and quality of writing | Clarity and quality of writing | Quality of the presentation
        Fig. 1. Visual metaphor of three reviews of the example SEMANTiCS’18 paper


5   Future Prospects
The paper presents an initial proof of concept of a review form interoperability framework, plus a review pictogram generator on top of it. To bring the concept closer to real usage, we have to undertake experiments determining whether and in what settings the pictograms provide added value over numerical tables. Some of the visual vari-
ables adhere to metaphors studied by psychologists [1] (e.g., “linear scales are paths” for
overall score, or “thought is motion” for originality) and might thus be relatively intu-
itive; however, others might require a longer adaptation period. As regards the semantic
web aspects of the research, we plan to submit the current review metric ontology to a
redesign process based on competency questions; review ontologies (such as FAIR and
BIDO), and possibly even multimedia ontologies, are likely to be reused.
The research has been supported by CSF 18-23964S (authors SJ, RS, and PS) and by
VSE IGS no. 43/2020 (authors VS and SJ). The authors are grateful to Jaroslav Svo-
boda, Martin Voldřich and Stanislav Vojíř for their help in setting up the infrastructure,
and to Kristýna Horná for providing the car racing graphics.


References
1. Lakoff, G.: The contemporary theory of metaphor. In: Ortony, A. (ed.) Metaphor and Thought.
   Cambridge University Press (1993).
2. Nguyen V. B., Svátek V., Rabby G., Corcho O.: Ontologies Supporting Research-related Infor-
   mation Foraging Using Knowledge Graphs: Literature Survey and Holistic Model Mapping.
   In: EKAW 2020. Springer LNCS, to appear.