=Paper=
{{Paper
|id=Vol-2721/paper567
|storemode=property
|title=Creating and Exploiting the Mappings from Conference Review Forms to a Generic Set of Review Criteria
|pdfUrl=https://ceur-ws.org/Vol-2721/paper567.pdf
|volume=Vol-2721
|authors=Vojtěch Svátek,Sára Juranková,Radomír Šalda,Petr Strossa,Zdeněk Vondra
|dblpUrl=https://dblp.org/rec/conf/semweb/SvatekJSSV20
}}
==Creating and Exploiting the Mappings from Conference Review Forms to a Generic Set of Review Criteria==
Vojtěch Svátek¹, Sára Juranková¹, Radomír Šalda², Petr Strossa¹, and Zdeněk Vondra²

¹ Dept. of Information and Knowledge Engineering, University of Economics, Prague, Nám. W. Churchilla 4, 130 67 Prague 3, Czech Republic, {svatek|jurs02|petr.strossa}@vse.cz
² Dept. of Multimedia, University of Economics, Prague, Nám. W. Churchilla 4, 130 67 Prague 3, Czech Republic, {salr00|zdenek.vondra}@vse.cz

Abstract. Conference papers are evaluated according to many criteria reflected in numerical scores, and the wording of the criteria differs among conferences. This makes the role of meta-reviewers tough when summarizing the evaluation across multiple criteria and reviewers. Based on a micro-study within semantic technology conferences, we conjecture that the criteria can, for particular fields, be mapped onto generic metrics, and we provide a provisional ontological representation for such a mapping and a set of metrics, as well as a manual mapping tool. Finally, we showcase an application exploiting the mappings: a graphics generator that aggregates the review data into a complex pictorial metaphor.

1 Introduction

Conference papers are often evaluated according to multiple criteria reflected in numerical scores. In large conferences with many reviews per paper this amounts to dozens of partial figures. This sheer number, and the fact that the wording of the criteria differs from one conference to another, make the role of meta-reviewers during the discussion periods difficult, and the effort invested into the detailed scoring may partly get lost.

In this research we first explored whether the criteria can be generalized, across events within a field such as Semantic Technology (ST), to a small set of review metrics. Based on the positive outcome of this study, we designed a provisional ontology for representing the mappings between specific review forms and such generic criteria, and developed a simple mapping authoring tool and a mapping execution component. Finally, we developed a tool that demonstrates one possible way of exploiting the mappings: a review visualizer that assembles the metric values, for a set of reviews of the same paper, into a compound pictogram relying on the racing-car metaphor.

The contributions of this paper are thus both the small empirical study and a multi-part demo. In the demo we can show how: 1) a mapping from a form to the common set of metrics can be created and published, 2) the values for a concrete set of reviews can be manually entered, thus emulating an automatic input from a hypothetical component of a conference review system, and 3) the pictorial scene can be generated.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2 Review Criteria Micro-Study and Mapping to Generic Metrics

We analyzed the review forms of nine ST conferences, always for the latest edition we could access as author/s or reviewer/s. We semantically clustered the field labels (referring to the reviewer guidelines where in doubt), yielding seven partial review metrics that we named as in the first column of Tab. 2, plus two global metrics, Confidence and Overall score, present in all forms. The partial criteria converged well despite the varying wording, though some forms lacked certain metrics altogether. ISWC and SEMANTiCS were clearly influenced by one another, having the same set of fields. ESWC had two fields that we both grouped under ‘Technical quality’. We do not list K-CAP, as it had no partial numerical field; this may be related to the ‘workshop flavor’ of this event.
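To make the clustering outcome concrete, the following minimal sketch (ours, not an artifact of the study) shows how differently worded form fields collapse onto the proposed generic metrics; the field labels are taken from the surveyed forms (cf. Tab. 2), and the metric names are the ones proposed in this paper.

```python
# Illustrative only: differently worded review form fields collapse onto one generic metric.
# Field labels are taken from the surveyed forms (cf. Tab. 2); the mapping object is ours.
FIELD_TO_METRIC = {
    "Appropriateness": "Relevance",
    "Relevance to ESWC": "Relevance",
    "Originality / innovativeness": "Novelty",
    "Novelty of the contribution": "Novelty",
    "Implementation and soundness": "Technical quality",
    "Scientific or technical quality": "Technical quality",
    "Related work": "State of the art",
    "Scholarship": "State of the art",
    "Clarity and quality of writing": "Presentation",
    "Quality of writing": "Presentation",
}

# The two global metrics are present in all surveyed forms.
GLOBAL_METRICS = ("Confidence", "Overall score")
```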
3 Ontological Representation of Review Forms and Metrics

For a review of existing relevant ontologies we refer to our recent survey of research-related ontologies in general [2]. The recently developed FAIR ontology (https://sparontologies.github.io/fr/current/fr.html) covers the overall review process (reviewers, reviews and venues). The associated Review Measures module of the BIDO ontology (https://sparontologies.github.io/bido-review-measures/current/bido-review-measures.html) contains, among others, a large collection of individuals corresponding to different rating/confidence scales and their values. None of these ontologies, however, addresses the semantics of partial review metrics. We therefore rapidly prototyped an ontology (not yet following all best practices, and thus likely subject to future revisions) that supports publishing metrics and their relationships to review forms. The ontology is online at http://kizi.vse.cz/pictoreview/ontology/, and contains the classes ReviewMetrics, ReviewForm, ReviewFormField and F2M_Mapping (for the field-to-metric mapping), plus the connecting properties. The proposed metric set (applicable to ST conferences, and probably to events of many other computing fields) is at http://kizi.vse.cz/pictoreview/metrics/. Finally, a sample mapping (the one used in the example below) is at http://kizi.vse.cz/pictoreview/map/semantics18/.

4 Demo Suite

We developed a suite of four simple tools to demonstrate the whole concept. They are bundled on the web page http://pictoreview.vse.cz/. The source code of the first three tools is at https://github.com/jurs02/PictoReviewDev.

The first tool allows the user to create a mapping from the custom set of review form fields of a particular event to the proposed set of generic metrics. The mapping can be 1:1, 1:N or N:1. An example of a mapping (for the SEMANTiCS’18 Research Track) is in the first two columns of Tab. 1. The mapping can currently be stored as a JSON structure or as an RDF dataset described by our ontology from Sect. 3.

The second tool is a simple mapping execution API, which transforms a set of review form fillings of a specific conference (a JSON structure) to the generic metrics (also output in JSON), using the JSON mapping (valid for that conference) authored by the first tool. For an N:1 mapping (i.e., of multiple fields to a single metric), the numerical mean of the values is computed (a sketch of this step follows below). Note that the first and second tool together provide (a baseline of) a general review data interoperability infrastructure, usable independently of the rest of the demo; for example, the reviewing emphases of different conferences could be compared based on the mappings.

The third tool emulates the role of a hypothetical plug-in to an off-the-shelf review management system (RMS). The user manually enters both JSON data structures expected by the second tool: the (saved) mapping, and the specific review form fillings, for example, those from the last three columns of Tab. 1. The data is then transformed to generic metrics (by the second tool) and passed to the fourth tool.
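As a minimal illustration of the mapping execution step, the sketch below (ours; the demo’s actual JSON schema is not specified here, so the key names "field" and "metric" are assumptions) transforms one review’s form scores into generic metric values, averaging whenever several fields map to the same metric.

```python
# Minimal sketch (not the demo's actual code) of executing a field-to-metric mapping
# on one review. The JSON key names "field" and "metric" are our assumptions; the paper
# only states that both the mapping and the review fillings are JSON structures.
from statistics import mean

def apply_mapping(mapping, review_scores):
    """mapping: list of {"field": ..., "metric": ...} entries (1:1, 1:N or N:1);
    review_scores: dict form field -> numerical score;
    returns dict generic metric -> score (mean over fields for N:1 mappings)."""
    per_metric = {}
    for entry in mapping:
        score = review_scores.get(entry["field"])
        if score is not None:
            per_metric.setdefault(entry["metric"], []).append(score)
    return {metric: mean(scores) for metric, scores in per_metric.items()}

# Mapping and scores of Reviewer 1 from the SEMANTiCS'18 example (cf. Tab. 1)
mapping = [
    {"field": "Appropriateness", "metric": "Relevance"},
    {"field": "Originality / innovativeness", "metric": "Novelty"},
    {"field": "Implementation and soundness", "metric": "Technical quality"},
    {"field": "Related work", "metric": "State of the art"},
    {"field": "Evaluation", "metric": "Evaluation"},
    {"field": "Impact of ideas and results", "metric": "Significance"},
    {"field": "Clarity and quality of writing", "metric": "Presentation"},
    {"field": "Reviewer's confidence", "metric": "Confidence"},
    {"field": "Overall evaluation", "metric": "Overall score"},
]
review_1 = {"Appropriateness": 4, "Originality / innovativeness": 3,
            "Implementation and soundness": 4, "Related work": 4, "Evaluation": 4,
            "Impact of ideas and results": 3, "Clarity and quality of writing": 3,
            "Reviewer's confidence": 3, "Overall evaluation": 0}
print(apply_mapping(mapping, review_1))
```

For a purely 1:1 mapping such as this one the scores simply pass through; with an N:1 mapping (e.g., the two ESWC fields grouped under Technical quality) their mean would be taken.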
The fourth tool, the pictogram generator, finally converts the generic metric values into components of a complex pictorial metaphor. We identified ‘car’ as a relatively close metaphor to a research paper, and car components (plus other ‘car race’ features) as visual variables expressing the metric values. In Fig. 1 we see the visual representation of the set of reviews from our SEMANTiCS’18 example, cf. Tab. 1. The whole picture encodes 27 numerical values: 9 metrics × 3 reviewers. For brevity, let us only point out the ‘good’ and ‘bad’ scores. Reviewer 3 (R3) appreciated the novelty of the paper (big engine), its evaluation (solid wheels), and also its presentation (smiling face). R1 valued the state of the art (shining headlamp) and technical quality (body style: a cabrio, as most suitable for a racing car), and also the evaluation (wheels). R2, in turn, only praised the paper for its high relevance; this would be indicated by the track quality, but the difference is too small to show, since for simplicity all scores except the overall evaluation are mapped to three-valued visual variables only, so 4 and 5 fall into the same interval (cf. the sketch below); with an even lower value, the track would change to dirt or even turf. The presentation, in particular, was rated poor by R2 (frowning face). The reviewer confidence (lower for R1) does not measure the paper quality as such; therefore we use an orthogonal visual magnitude paradigm, the color saturation/salience. Finally, the cars are positioned on the track by their overall scores.

Table 1. Numerical scores of the example SEMANTiCS’18 paper

| Original review field | Mapped to metric | Rev. 1 | Rev. 2 | Rev. 3 |
|---|---|---|---|---|
| Appropriateness | Relevance | 4 | 5 | 4 |
| Originality / innovativeness | Novelty | 3 | 3 | 4 |
| Implementation and soundness | Technical quality | 4 | 3 | 3 |
| Related work | State of the art | 4 | 3 | 3 |
| Evaluation | Evaluation | 4 | 3 | 4 |
| Impact of ideas and results | Significance | 3 | 3 | 3 |
| Clarity and quality of writing | Presentation | 3 | 2 | 5 |
| Reviewer’s confidence | Confidence | 3 | 4 | 4 |
| Overall evaluation | Overall score | 0 | -1 | 2 |

Table 2. Proposed mapping between generic review metrics and form fields of KE conferences

| Review metric | Form field labels mapped to it across ECAI 2016, EKAW 2020, ESWC 2018, FOIS 2016, IJCAI 2019, ISWC/SEMANTiCS 2018 and KR 2014 (NA where a form lacks the metric) |
|---|---|
| Relevance | Relevance; Relevance to ESWC; Appropriateness; Relevance of the paper to KR |
| Novelty | Novelty; Novelty of the contribution; Novelty or innovation; Novelty of the proposed solution; Originality; Originality / innovativeness |
| Technical quality | Technical quality; Scientific or technical quality; Technical soundness and depth; Implementation and soundness; Correctness and completeness of the proposed solution / Demonstration and discussion of the properties of the proposed approach |
| State of the art | Related work; Discussion of related work; Scholarship; References; Evaluation of the state-of-the-art |
| Evaluation | Evaluation; Reproducibility and generality of the experimental study |
| Significance | Significance; Impact of ideas and results |
| Presentation | Clarity and quality of writing; Clarity and quality of the presentation; Quality of writing; Presentation; Presentation quality |

Fig. 1. Visual metaphor of three reviews of the example SEMANTiCS’18 paper
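To make the three-level visual encoding concrete, here is a minimal sketch of the binning step behind the pictogram generator; the thresholds and level names are our assumptions, since the text only states that partial scores are reduced to three-valued visual variables.

```python
# Minimal sketch of the three-level binning used for the partial metrics.
# Thresholds and level names are assumptions; only "three-valued visual variables"
# (with 4 and 5 sharing an interval) is stated in the paper.
def to_visual_level(score, low=2, high=4):
    """Map a 1-5 partial metric score onto one of three visual levels."""
    if score <= low:
        return "weak"     # e.g. frowning face, dim headlamp, dirt or turf track
    if score < high:
        return "neutral"
    return "strong"       # e.g. big engine, solid wheels, smiling face

# R3's Novelty score of 4 (cf. Tab. 1) lands in the same level as a 5 would.
print(to_visual_level(4))
```

The overall score and reviewer confidence are handled differently: the former positions the car on the track, the latter drives the colour saturation/salience.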
5 Future Prospects

The paper presents an initial proof of concept of a review form interoperability framework, plus a review pictogram generator on top of it. To bring the concept closer to real usage, we have to undertake experiments determining whether and in what setting the pictograms provide an added value over numerical tables. Some of the visual variables adhere to metaphors studied by psychologists [1] (e.g., “linear scales are paths” for the overall score, or “thought is motion” for originality) and might thus be relatively intuitive; however, others might require a longer adaptation period. As regards the semantic web aspects of the research, we plan to submit the current review metric ontology to a redesign process based on competency questions; review ontologies (such as FAIR and BIDO), and possibly even multimedia ontologies, are likely to be reused.

The research has been supported by CSF 18-23964S (authors SJ, RS, and PS) and by VSE IGS no. 43/2020 (authors VS and SJ). The authors are grateful to Jaroslav Svoboda, Martin Voldřich and Stanislav Vojíř for their help in setting up the infrastructure, and to Kristýna Horná for providing the car racing graphics.

References

1. Lakoff, G.: The contemporary theory of metaphor. In: Ortony, A. (ed.) Metaphor and Thought. Cambridge University Press (1993)
2. Nguyen, V. B., Svátek, V., Rabby, G., Corcho, O.: Ontologies Supporting Research-related Information Foraging Using Knowledge Graphs: Literature Survey and Holistic Model Mapping. In: EKAW 2020. Springer LNCS, to appear.