<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semantification of CEUR-WS with Wikidata as a target Knowledge Graph</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Wolfgang Fahl</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tim Holzheim</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christoph Lange</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Decker</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Fraunhofer FIT</institution>
          ,
          <addr-line>Sankt Augustin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>RWTH Aachen University</institution>
          ,
          <addr-line>Computer Science i5, Aachen</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>3352</volume>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>For modern scholarly communication infrastructure, knowledge graphs are already ubiquitous, yet some platforms of the publication lifecycle still struggle to catch up. In this paper, we use the example of the CEUR-WS publishing platform to show a viable approach to transition from a traditional HTML/PDF/text based environment to a single point of truth that separates the data and metadata storage from its presentation. This is possible using a public infrastructure such as Wikidata, which minimizes the maintenance effort and improves the publishing workflow. The CEUR Workshop Proceedings (CEUR-WS) publishing platform (https://ceur-ws.org/) was introduced in 1995 as a means to publish proceedings of scientific workshops (and smaller conferences) in computer science. Technically, HTML, PDF and a filesystem directory hierarchy are the core formats being used, and delivery is via the HTTP and FTP protocols. There have been multiple attempts in the past to make the metadata of the CEUR-WS platform available for computer based analysis and querying. None of these attempts has been consistent and continuous so far. We report on the successful start of the semantification of CEUR-WS with Wikidata as the target knowledge graph, with the goal of achieving consistency and continuity for the future. The challenge was in handling the textual natural language description parts of the CEUR-WS content, which are inherently still part of the semi-digitized approach of using HTML and PDF. We propose a better separation of the concerns of metadata, display and storage and have started implementing it.</p>
      </abstract>
      <kwd-group>
        <kwd>Publishing</kwd>
        <kwd>Information Extraction</kwd>
        <kwd>Knowledge Graph</kwd>
        <kwd>Linked Data</kwd>
        <kwd>Metadata Extraction</kwd>
        <kwd>Named Entity Linking</kwd>
        <kwd>Named Entity Recognition</kwd>
        <kwd>NLP</kwd>
        <kwd>RDF</kwd>
        <kwd>Semantification</kwd>
        <kwd>Semantic Web</kwd>
        <kwd>Wikidata</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Modern scientific publishing infrastructure uses knowledge graphs as a native basis. Google
Scholar1 is a popular and famous example. Microsoft Academic Graph [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] had the “graph” in its
name. Although Microsoft terminated the service by end of 2021 the idea and content has been
picked up and extended by OpenAlex [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Publishing and indexing platforms that are based on
older technology such as MARC [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], PICA [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] or XML struggle to catch up but gradually move
to graph-based approaches, as DBLP did in March 20222 by offering SPARQL dumps that may be
queried, e.g., using the QLever dblp SPARQL endpoint3 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <sec id="sec-1-1">
        <title>1.1. CEUR Workshop Proceedings</title>
        <p>CEUR-WS is a free online service that provides open access to proceedings of workshops (and
smaller conferences) in the field of computer science and information technology, hosted by
RWTH Aachen University’s Chair for Information Systems and Databases. CEUR-WS is
operated by the CEUR-WS Editors – a team of unpaid volunteers – working as a non-profit
organization. Over 3,300 Volumes containing over 65,000 PDF documents in total have been
published until March 2023.</p>
        <p>
          CEUR-WS does not use a Content Management System for publishing but relies on pure
HTML and PDF for rendering its public website4. The metadata for these publications is only
indirectly available by indexing services such as dblp [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and K10plus[
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]5. Unfortunately, as of March 2023, neither dblp nor K10plus has a complete set of metadata
records for all volumes, and in both cases there is a delay of some weeks or months before new
volumes are picked up for indexing.
        </p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. The Trend towards FAIR Data and Open Science: Semantification</title>
        <p>
          Since their inception, the FAIR principles [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] have been a success. They have been adopted by various
industries (e.g., the pharmaceutical industry) and by national and international projects (e.g., the
Common European Dataspaces). Persistent Identifiers (PIDs) and rich metadata are the core
components of the FAIR principles, and they provide the means to create Knowledge Graphs [9].
        </p>
        <p>Representing digital traces of scholarly communication in Knowledge Graphs (KGs) [10] is
useful for supporting use cases such as literature search and recommendation of events for
attendance or publishing. The metadata of the most relevant entities as outlined in Figure 16
needs to be made available to offer such a knowledge graph.</p>
        <p>The term “Semantic Web” [11] was coined by Tim Berners-Lee et al. to describe the
effect of resources on the Web being interlinked by means of such metadata. We therefore chose
“Semantification” as the title of the project and of this paper to describe the process of creating a
Knowledge Graph and making the results available in Semantic Web fashion. The
Semantification of CEUR-WS has been attempted multiple times in the past [12, 13] – always under the
assumption that a local RDF/SPARQL endpoint would be the goal to achieve. The results have
not been consistent and durable, since the publishing workflow has not been adapted and the
single point of truth for the metadata is still buried in the HTML/PDF/text documents. The
fear of maintenance follow-up problems, given that CEUR-WS is a non-profit service with no
budget, was a major obstacle.
2https://blog.dblp.org/2022/03/02/dblp-in-rdf/
3https://qlever.cs.uni-freiburg.de/dblp
4https://ceur-ws.org
5https://dblp.org/, https://opac.k10plus.de
6The original SVG graphic is clickable and leads to the corresponding Wikidata properties and entity types</p>
        <p>[Figure 1: the most relevant entity types – scholar, author, editor, paper, proceedings, event, event series, institution – and their relations: affiliation, published in, is proceedings of, part of the series, cites]</p>
      </sec>
      <sec id="sec-1-3">
        <title>1.3. Challenges in making CEUR-WS more FAIR via Semantification</title>
        <p>
          To semantify CEUR-WS, the following requirements were most relevant:
• the metadata should follow the FAIR [
          <xref ref-type="bibr" rid="ref8">14, 8</xref>
          ] principles
– F1: (Meta)data are assigned globally unique and persistent identifiers (see
Section 2.2).
– F2: Data are described with rich metadata.
– F3: Metadata clearly and explicitly include the identifier of the data they describe.
– F4: (Meta)data are registered or indexed in a searchable resource.
• relevant queries should be supported, as derived from the original set of queries of the
2014 Semantic publishing challenge as outlined in Section 2.1
• the metadata should reuse an established ontology
• the manual and automatic curation of entries should be possible with public access for all
stakeholders, e.g., editors, authors, organizers, publishers, indexers
• the infrastructure should be stable and there should be sufficient trust in its long-term
availability
• an open source non-profit infrastructure is preferred since this is also the mode of
operation of CEUR-WS
        </p>
        <p>Given the HTML/PDF/text input of CEUR-WS, we need to create the corresponding
Knowledge Graph and separate the single-point-of-truth computer-readable metadata from the different
representations such as HTML so that the above requirements are fulfilled.</p>
        <p>Both the HTML and PDF encodings of the original scientific content are structured for the
purpose of optimizing the display/output on paper or screens; therefore, there is a structure
loss compared to what was originally available in the text document processor files that the
authors might have been using [15]. Figure 2 shows the part of the publishing process
where the rendering step causes this loss. Most scholars are not aware of this loss in the daily
use of published content just because the documents are optimized for display and human
consumption [16]. For the metadata extraction and use in knowledge graphs the difference is
sometimes disastrous, e.g., when simple text can no longer be extracted from a PDF document
due to exotic styling and formatting, or simply because only a scanned graphic image
version of an older document is available that needs Optical Character Recognition to extract
the textual content.</p>
        <p>[Figure 2: the publishing process – a scientist writes findings with a text processor (e.g., LaTeX/Word); the text processor document is rendered, e.g., to a PDF/HTML display document and submitted for publishing; the publisher prints it or puts it online; the community reads the findings]</p>
        <p>The metadata needed for creating the knowledge graph is only available in natural
language/text form and follows rules that have been changed multiple times during the history of
CEUR-WS. From 2013 to 2023, there have been 33 different versions of the index file template
with no proper tracing of what was changed from version to version.</p>
      </sec>
      <sec id="sec-1-4">
        <title>1.4. Contributions of this Paper</title>
        <p>Our first contribution is to provide the tools, infrastructure and approaches to modify the
CEUR-WS publication workflow to consistently supply FAIR, high-quality metadata (Semantification).</p>
        <p>Secondly, splitting the metadata for the three main entities Event, Event Series and
Proceedings is a major step towards improving the metadata quality, e.g., by allowing event series
data to be checked for completeness and be completed from different sources where possible.
Unfortunately, currently none of the stakeholders has the motivation to do this completion,
although it is valuable, e.g., for assessing the quality of an event series. The necessary change
of perspective to convince the stakeholders is a chicken-and-egg problem, which the CEUR-WS
semantification will help to resolve.</p>
        <p>Our third contribution is a bootstrapping [17, 18] approach that allows for getting rid of the
manual editing of the CEUR-WS website content and instead using a CMS approach based on the
single-point-of-truth metadata that separates the concerns of storage and display, while making
sure the results are already visible and usable during the ongoing project.</p>
        <p>These contributions are general enough to be applied to other publishing use cases.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Semantic Publishing challenge</title>
        <p>The Semantification of CEUR-WS has been publicly pursued by the Semantic Publishing
challenge [12] since 2014. Given an excerpt of CEUR-WS, scholars were asked to prepare an RDF
knowledge graph that would allow a set of 20 queries to be answered to complete Task 1.</p>
        <p>One original task of the Semantic Publishing Challenge (SemPub2015)7 read as follows: Task
1: Extraction and assessment of workshop proceedings information. Participants are required to
extract information from a set of HTML tables of contents, partly including microformat and
RDFa annotations but not necessarily being valid HTML, of selected computer science workshop
proceedings published with the CEUR-WS.org open access service. The extracted information is
expected to answer queries about the quality of these workshops, ….</p>
        <p>Kolchin et al. [19, 20] have submitted results to the challenge twice with an approach using
XPath Queries on the HTML DOM markup and converted the results directly to triples; see
ceur-ws-lod repository on GitHub8. The reusability of this approach is limited since the parsing
and generation code are intermixed.</p>
        <p>Sateli and Witte [13] applied the GATE framework [21, 22] and a pipeline to create triples
from the parsing result [23]; see also their supplementary material9.</p>
        <p>The 2015 work of Milicka and Burget [24] used awk text pattern matching10 as a tool to parse
the table of contents files per Volume.</p>
        <p>The objective of these attempts was to create a local SPARQL endpoint that allows performing
the required queries and fulfills the role of a target knowledge graph and point of truth.</p>
        <p>Tasks 2 and 3 called for the detailed analysis of the PDF files.</p>
        <p>All challenge contributions had a purely scientific focus and were not fit for making the
results operational and being used in the actual CEUR-WS publishing workflow.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Persistent Identifiers in a scholarly publishing context</title>
        <p>A study about Linked Data [25] found that each year about 10% of Linked Data URIs are no
longer dereferenceable. One way to mitigate the issue is to introduce persistent identifiers (PIDs),
which aim to fulfill the following principles [26]: longevity, scalability, extensibility, and security.
As noted in [27], what matters for ensuring the longevity of PIDs is that “persistence is purely a
matter of service”. Thus, PIDs can only remain persistent if someone is committed to ensuring
they stay accessible to users. This requires an engagement or a service level agreement for PID
availability, in contrast to URIs, where no such agreement exists.</p>
        <p>7https://github.com/ceurws/lod/wiki/SemPub2015
8https://github.com/ailabitmo/ceur-ws-lod
9https://www.semanticsoftware.info/sempub-challenge-2015
10https://github.com/FitLayout/ToolsEswc/tree/master/awk</p>
        <p>As [9] notes, PIDs can be resolved via a URI, which follows the first principle of the Den Haag
Manifesto from 201111. With this alignment, Semantic Web tools, standards, and concepts can be
used to link, map, query, and integrate different data formats and data sources, and knowledge
graphs become usable based on the FAIR data principles.</p>
        <p>Franken et al. [28] have promoted the idea of using persistent identifiers (PIDs) for scientific
events in the same way as there are already persistent identifiers for papers (DOI), people
(ORCID), organisations (GRID, ROR) and books (ISBN). They argue that it is also becoming more
and more common practice to use PIDs to identify other important entities or objects. But as
mentioned by Bryl et al. [29], it will only be beneficial if more metadata is provided and the PID
is actively used to interlink with other entities.</p>
        <p>Introducing PIDs in the form of DOIs to CEUR-WS brings up the follow-up problems of who should
be responsible for minting the DOIs, when the minting should be done and what the target URL
of the DOI should be – not all organizers might like the landing page not to be under their own
control. Wikidata entity identifiers (Q-identifiers) seem to be a better alternative, since a rich
set of other identifiers may be linked to any Wikidata Entity including DOIs, homepages and
local and internationally known library and commercial and non-commercial indexing service
identifiers.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Metadata Extraction from PDF, HTML and Text</title>
        <p>As part of the Scholia Open Source Project12, Nielsen [30] created a scraper tool capable of
creating QuickStatements [31] output; see scrape/ceurws.py13. Using the scrape/QuickStatements
chain allows for creating Wikidata entries for each paper. The Vol-3184/paper414 Wikidata
entry has been created this way by us to show the effect. Unfortunately, the author name string
(P2093)15 property is used instead of immediately performing the disambiguation step for the author
strings.</p>
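        <p>The scrape/QuickStatements chain can be illustrated with a minimal sketch. The helper below is hypothetical, not the Scholia scraper itself; it only uses well-known Wikidata identifiers – Q13442814 (scholarly article) and the P2093 author name string property whose use is criticised above:</p>

```python
def quickstatements_for_paper(title: str, author_names: list[str]) -> str:
    """Build a minimal QuickStatements V1 batch for one paper.

    Sketch only: it creates an item as Q13442814 (scholarly article) and,
    like the scraper discussed above, falls back to P2093 (author name
    string) instead of disambiguated author items.
    """
    lines = [
        "CREATE",
        f'LAST\tLen\t"{title}"',  # English label
        "LAST\tP31\tQ13442814",   # instance of: scholarly article
    ]
    for name in author_names:
        lines.append(f'LAST\tP2093\t"{name}"')
    return "\n".join(lines)

batch = quickstatements_for_paper("An Example Paper", ["Jane Doe"])
print(batch)
```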
        <p>The Proceedings Title Parser16 [32] has a CEUR-WS parsing mode, which already had the RDFa
extractor capability (allowing to cover more recent CEUR-WS volumes using that markup style).
Part of this work has been reused and extended to a fully fledged parser in the work we are
reporting here.</p>
        <p>CERMINE (Content ExtRactor and MINEr) [33] is a software library and a web service17 for
extracting metadata and content from PDF files containing academic publications. The text
content is analysed and a structured XML document containing metadata on, e.g., authors
and citations is created18. The lookup features of CERMINE are limited, e.g., to finding the ISO
country code of the country of an institution an author is affiliated with.</p>
        <p>GROBID (GeneRation Of BIbliographic Data) [34] has been gradually extended to be a machine
learning library for extracting, parsing and re-structuring raw documents such as PDF into
structured XML/TEI encoded documents with a particular focus on technical and scientific
publications. We have tried out GROBID on over 50,000 papers so that the results are now a
further potential source for disambiguation according to the original tasks 2 and 3.</p>
        <p>[Figure 3: the CEUR-WS semantification pipeline – PDF, text and HTML input goes through preprocessing, tokenization, NER, matching/reconciliation, NEL and ID lookup/enrichment to produce semantified metadata records (Proceedings, Paper, Event, Series, Author, Editor, Affiliation) forming the CEUR-WS single point of truth, from which the published pages can be regenerated]</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. CEUR-WS Semantification</title>
      <sec id="sec-3-1">
        <title>3.1. Overview</title>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Preprocessing</title>
        <p>To extract the relevant markup elements we use the BeautifulSoup4 python library with additions
to handle the RDFa-like annotations that have been applied in newer CEUR-WS volumes. The
main obstacle for the extraction of the markup elements is that so far 33 different versions
have been used for the volume pages, which were often also edited manually, resulting in small
differences even within the versions. The usage of the different page versions follows a long-tail
Zipfian distribution with 5 versions covering 60% of all volumes.</p>
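        <p>The span extraction step can be illustrated with a dependency-free sketch. The project itself uses BeautifulSoup4; the CeurSpanExtractor class below, built on Python's standard-library HTML parser, is our illustration rather than project code:</p>

```python
from html.parser import HTMLParser

class CeurSpanExtractor(HTMLParser):
    """Collect the text of <span> elements by their CEUR-* class attribute."""

    def __init__(self):
        super().__init__()
        self.spans = {}       # class name -> list of text values
        self._current = None  # class of the span currently being read
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            cls = dict(attrs).get("class") or ""
            if cls.startswith("CEUR"):
                self._current = cls
                self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._current is not None:
            text = "".join(self._buffer).strip()
            self.spans.setdefault(self._current, []).append(text)
            self._current = None

markup = '<span class="CEURLOCTIME">Hersonissos, Greece, May 30th, 2022</span>'
extractor = CeurSpanExtractor()
extractor.feed(markup)
print(extractor.spans)
```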
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Tokenization</title>
        <sec id="sec-3-3-1">
          <title>3.3.1. Disambiguation using Event Signatures</title>
          <p>As outlined in Section 2.2, PIDs would be useful for uniquely identifying scientific events. As long as
PIDs are not available, it is necessary to use a quasi-identifier [35] consisting of a set of metadata
elements that we call “Event Signature”. There is neither a standardized definition of event
signatures nor a recommendation for their use in references and proceedings titles. Retrieving
the signature from the volume’s textual description is a core step in creating the CEUR-WS
knowledge graph.</p>
          <p>As part of [28], the main author has shown that a typical scientific event “signature” consists
of the following metadata (the example event being ISWC 201919 The Semantic Web – ISWC
2019: 18th International Semantic Web Conference, Auckland, New Zealand, October 26–30, 2019):
acronym a short name for the conference, often consisting of 3 to 8 upper case letters trying
to be unique but actually often being ambiguous. For instance, ISWC may refer to
the International Semantic Web Conference or to the International Symposium on
Wearable Computing.
frequency annual, biennial, triennial – most events have an annual frequency and this is
mostly not stated explicitly.
event reach target reach of the conference such as international, European, East Asian
event type such as Conference, Workshop, Symposium
year a two or four digit reference to the year in which the event took place – not to be confused
with the year of publication of the proceedings, which might be different (2019)
ordinal often used to enumerate the conference series instances (18th)
date start date and end date or date range of the conference (October 26–30)
location description of the location of the conference often consisting of country, region and
city – sometimes with details about the exact venue. (Auckland, New Zealand)
title the title often contains scope, type and subject of the conference (International Semantic</p>
          <p>Web Conference)
subject a description of what the conference is about, often prefixed with “on” (Semantic Web)
delimiters a variety of syntactic delimiters such as blanks, comma, colon, brackets are used
depending on the citation style.</p>
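          <p>A rough sketch of how such a signature might be pulled from a title string with a regular expression; the pattern below covers only the single citation style of the ISWC example and is our illustration, not the project's parser:</p>

```python
import re

# Matches one common layout: "... ACRONYM YEAR: Nth Title, Location, Month D–D, YEAR".
# Real CEUR-WS titles vary far more and need the full parser described in the paper.
SIGNATURE = re.compile(
    r"(?P<acronym>[A-Z]{3,8})\s+(?P<year>\d{4}).*?"
    r"(?P<ordinal>\d{1,3})(?:st|nd|rd|th)\s+"
    r"(?P<title>[^,]+),\s+"
    r"(?P<location>.+?),\s+"
    r"(?P<daterange>[A-Z][a-z]+ \d{1,2}[–-]\d{1,2}),\s+(?P=year)$"
)

text = ("The Semantic Web – ISWC 2019: 18th International Semantic Web "
        "Conference, Auckland, New Zealand, October 26–30, 2019")
m = SIGNATURE.search(text)
print(m.group("acronym"), m.group("ordinal"), m.group("location"))
```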
          <p>The event signature needs to be extracted from the CEUR-WS main and volume tables of
content and stored as triples for the target KG.</p>
          <p>The distinction between proceedings, event and event series needs to be made – therefore
the result needs to be split and disambiguated against existing entries in the target KG.</p>
          <p>The mapping as outlined in Table 1 has been used to map to the “event” and “proceedings”
entry in Wikidata. The top rows of the table show the common properties of the proceedings
and event item entries, followed by the special properties for events and then the special
properties for proceedings.</p>
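          <p>Table 1 in the paper is authoritative for the mapping; the dictionaries below merely illustrate the idea with a subset of well-known Wikidata properties, and the helper function is hypothetical:</p>

```python
# Illustrative subset of a signature-to-Wikidata property mapping.
EVENT_PROPERTY_MAP = {
    "title":    "P1476",  # title
    "acronym":  "P1813",  # short name
    "ordinal":  "P393",   # edition number
    "location": "P276",   # location
    "start":    "P580",   # start time
    "end":      "P582",   # end time
    "series":   "P179",   # part of the series
}
PROCEEDINGS_PROPERTY_MAP = {
    "title":             "P1476",  # title
    "publication_date":  "P577",   # publication date
    "is_proceedings_of": "P4745",  # is proceedings from (links to the event)
}

def to_statements(qid: str, signature: dict, mapping: dict):
    """Turn extracted signature fields into (item, property, value) triples."""
    return [(qid, mapping[key], value)
            for key, value in signature.items() if key in mapping]

stmts = to_statements("Q113512465", {"acronym": "Text2KG", "ordinal": "1"},
                      EVENT_PROPERTY_MAP)
print(stmts)
```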
          <p>The series entry for the Text2KG example has been created and the new “colocated” property is
filled for this example. The Text2KG Workshop series scholia overview20 shows the connections.</p>
          <p>As an example, we are using the Wikidata entry for the Text2KG@ESWC-202221
workshop, whose proceedings have been published as CEUR-WS Volume 318422 with the Wikidata
proceedings item being shared by another workshop (a special but still frequent case).
20https://scholia.toolforge.org/event-series/Q116982161
21https://www.wikidata.org/wiki/Q113512465
22https://ceur-ws.org/Vol-3184/</p>
        </sec>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Named Entity Recognition and Linking (NER &amp; NEL)</title>
        <p>The Named Entity Recognition (NER) and Named Entity Linking (NEL) tasks for the CEUR-WS
Semantification are based on the textual input from the HTML markup that needs parsing
into tokens that represent entities and then matching the textual content of the tokens against
Wikidata entries that might need disambiguation. The semi-structured HTML markup helps
hereby to reduce the input complexity for the parsing of the different entity types and therefore
increases the accuracy of the matching process [36]. Items to disambiguate are derived from
the event signature as outlined in section 3.3.1: Volumes, Papers, Editors, Authors, Locations
(Country/Region/City), Dates, Ordinals, Acronym, Homepage.</p>
        <p>In the phase of the project we are reporting on, the focus was on the mass creation of Proceedings
and Event entries. The Paper, Editor and Author disambiguation and the Event series
completion have been prepared, and example results are available to show that the elements are
available and may be systematically created and queried.</p>
        <sec id="sec-3-4-1">
          <title>3.4.1. Location NER and NEL</title>
          <p>The location of an event is described in the table of contents in span elements usually classified
as CEURLOCTIME. Their value contains semi-structured information about the event’s location
and date range, e.g., “Hersonissos, Greece, May 30th, 2022”. Despite the varying formats of
the location and date definition, the location information can be fairly easily separated from the
date. This leaves a string that should contain information about the city and country. There
are cases where also the region or venue is named, and since more and more conferences have
moved to virtual meetings since 2020, the location string can also contain indications of
that. Since the location string can be identified on extraction, Named Entity Recognition (NER)
and Named Entity Linking (NEL) are done in one step to get the Wikidata Qids of the mentioned
locations. For the NEL we used geograpy3, a Python library that has a database with the labels
of countries, regions and cities in multiple languages linking to the corresponding Wikidata Qid.
The response of this label lookup is a list of possible locations of the aforementioned categories.
The list is sorted by the category order city, region, country, where the cities are also ordered by
population. To this list we apply a further ranking: since we know that in most cases the country
is named within the string, we can verify that we select a city whose country was also
detected. For the given example the result would then be “Hersonissos, Greece” → Chersonesos
Irakliou (Q1018106)23 (Greece (Q41)24).</p>
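          <p>The described ranking can be sketched as follows. The candidate dictionaries are a stand-in for what a lookup library such as geograpy3 returns, and the second city entry is invented purely to show the country check at work:</p>

```python
def rank_locations(candidates, detected_country_qids):
    """Order NEL candidates: cities before regions before countries,
    prefer a city whose country was also detected in the string, and
    break remaining ties by descending population."""
    order = {"city": 0, "region": 1, "country": 2}

    def key(candidate):
        in_detected = candidate.get("country_qid") in detected_country_qids
        return (order[candidate["kind"]], not in_detected,
                -candidate.get("population", 0))

    return sorted(candidates, key=key)

candidates = [
    {"kind": "country", "qid": "Q41", "label": "Greece"},
    # invented homonym in another country, larger population:
    {"kind": "city", "qid": "Q9999999", "label": "Hersonissos (invented)",
     "country_qid": "Q30", "population": 50000},
    {"kind": "city", "qid": "Q1018106", "label": "Chersonesos Irakliou",
     "country_qid": "Q41", "population": 27000},
]
best = rank_locations(candidates, {"Q41"})[0]
print(best["qid"])
```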
        </sec>
        <sec id="sec-3-4-2">
          <title>3.4.2. Editors and Authors</title>
          <p>The author and editor name disambiguation is one of the main challenges of libraries and
indexers [37]. Due to the common occurrence of duplicate names, abbreviations of first names,
typos and encoding errors, disambiguation is an expensive and error-prone task if high accuracy
is aimed for.</p>
          <p>In CEUR-WS, the editor information is given in an HTML element containing the editor
signature, usually given name and family name with a reference to the affiliations, as shown in
23https://www.wikidata.org/wiki/Q1018106
24https://www.wikidata.org/wiki/Q41
&lt;b&gt; TEXT2KG edited by &lt;/b&gt;
&lt;/p&gt;&lt;h3&gt;
&lt;span class="CEURVOLEDITOR"&gt;Sanju Tiwari&lt;/span&gt; ... 1
&lt;span class="CEURVOLEDITOR"&gt;Nandana Mihindukulasooriya&lt;/span&gt; ... 2
&lt;span class="CEURVOLEDITOR"&gt;Francesco Osborne&lt;/span&gt; ... 3 4
&lt;span class="CEURVOLEDITOR"&gt;Dimitris Kontokostas&lt;/span&gt; ... 5
&lt;span class="CEURVOLEDITOR"&gt;Jennifer D’Souza&lt;/span&gt; ... 6
&lt;span class="CEURVOLEDITOR"&gt;Mayank Kejriwal&lt;/span&gt; ... 7
&lt;/h3&gt;</p>
          <p>Listing 1: CEUR-WS volume page HTML markup excerpt of editor definition
Listing 1. For newer publications, ORCIDs of authors and editors might be directly available
from the PDF input. For older volumes, only the plain name and affiliation are provided, and no
identifier that could simplify the disambiguation. Fortunately, around 80% of the volumes are
indexed at dblp. dblp provides high quality disambiguated data about the proceedings editors
and paper authors, also accomplished through manual curation [38]. Therefore, extracting
the editors and resolving the names to identifiers by looking up the dblp id seems to be the
best option. The same strategy as used for the editors also applies to the authors, but here the
affiliation needs to be extracted from the paper PDF.</p>
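          <p>Extracting the editor names from the Listing 1 markup can be illustrated with a short sketch; the real pipeline parses the full HTML DOM, but a regular expression over the CEURVOLEDITOR spans is enough here:</p>

```python
import re

# Excerpt of the Listing 1 markup (editor spans only).
listing1 = """
<span class="CEURVOLEDITOR">Sanju Tiwari</span> ... 1
<span class="CEURVOLEDITOR">Nandana Mihindukulasooriya</span> ... 2
<span class="CEURVOLEDITOR">Francesco Osborne</span> ... 3 4
<span class="CEURVOLEDITOR">Dimitris Kontokostas</span> ... 5
<span class="CEURVOLEDITOR">Jennifer D’Souza</span> ... 6
<span class="CEURVOLEDITOR">Mayank Kejriwal</span> ... 7
"""
# Each editor name is the text content of a CEURVOLEDITOR span.
editors = re.findall(r'<span class="CEURVOLEDITOR">([^<]+)</span>', listing1)
print(editors)
```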
        </sec>
      </sec>
      <sec id="sec-3-5">
        <title>3.5. ID Lookup/Enrichment</title>
        <sec id="sec-3-5-1">
          <title>3.5.1. dblp and k10plus Volume matching and linking</title>
          <p>dblp and k10plus records may be trivially matched by the volume number of a proceedings
volume – a unique identifier – combined with the URN of the CEUR-WS proceedings series.</p>
          <p>We are using the QLever dblp SPARQL endpoint mentioned in Section 1 to match CEUR-WS
volumes against dblp entries by volume number.</p>
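          <p>The matching key can be derived from any URL that references a CEUR-WS volume; `volume_number` is a hypothetical helper illustrating the idea:</p>

```python
import re

def volume_number(url: str):
    """Extract the CEUR-WS volume number – the key used to match volumes
    across CEUR-WS, dblp and k10plus records. Hypothetical helper."""
    m = re.search(r"ceur-ws\.org/Vol-(\d+)", url)
    return int(m.group(1)) if m else None

print(volume_number("https://ceur-ws.org/Vol-3184/"))
```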
          <p>
            For the k10plus matching we use the catmandu library, which allows querying the PICA [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ]
based k10plus database for URN matches; see the PPN/Volume/WikidataItem matching SPARQL
Query25.
          </p>
          <p>Based on the resolved IDs from the ID lookup and NEL, we can now enrich our data by
querying additional or missing data. For example in Volume 335626, only “Tokyo” is defined as
location; linking the string to Tokyo (Q1490)27 then enables querying for the missing country
information. Similar enrichments are done for editors and authors to complete the records.</p>
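          <p>Such an enrichment lookup can be sketched as a Wikidata SPARQL query; P17 (country) is the real Wikidata property, while the query-building helper is our illustration:</p>

```python
def country_of_city_query(city_qid: str) -> str:
    """Build a Wikidata SPARQL query fetching the missing country
    (P17 = country) for an already linked city item."""
    return (
        "SELECT ?country ?countryLabel WHERE {\n"
        f"  wd:{city_qid} wdt:P17 ?country .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )

query = country_of_city_query("Q1490")  # Tokyo, as in the Volume 3356 example
print(query)
```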
        </sec>
      </sec>
      <sec id="sec-3-6">
        <title>3.6. Decision to use Wikidata as a target</title>
        <p>Wikidata [39] is a knowledge graph based on an RDF triple store that has been successfully
used to gather and link metadata of scholarly communication artifacts [30].
25https://w.wiki/6Qm5
26https://ceur-ws.org/Vol-3356/
27https://www.wikidata.org/wiki/Q1490</p>
        <p>In 2022, we decided to directly target Wikidata instead of trying to set up our own RDF/SPARQL
endpoint as outlined in Section 2.1. Wikidata is well suited to handle the challenges listed in
Section 1.3.</p>
      </sec>
      <sec id="sec-3-7">
        <title>3.7. Workshop colocated with conference</title>
        <p>Most CEUR-WS Volumes are proceedings of workshops, the majority of which are colocated with a
conference. For this “colocation” relation there was no specific property in Wikidata when
we started the semantification. Kolchin [20] had already pointed out in 2015 that “BIBO doesn’t have
an event is part of bigger event semantics”, so the need was long known. We initiated
the creation of P11633 (colocated with)28 by starting a property proposal29 according to the
Wikidata’s property proposal process30 which states “When after some time there are some
supporters, but no or very few opponents, the property is created by a property creator or an
administrator.”</p>
        <p>One further criterion was that at least three examples need to be supplied. Unfortunately, we
presented a few dozen examples, but not in the expected format, which held up the process by
a few months.</p>
        <p>After that there was a lively and productive discussion that led to the clarification that the
property is asymmetric. The property was considered highly relevant and well defined. After
almost a year of preparation and discussion, the property is now available and shall be used in
the future.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Implementation and Demos</title>
      <sec id="sec-4-1">
        <title>4.1. Open Source</title>
        <p>The Python library for the CEUR-WS semantification, including the source code for the CEUR-WS
Volume Browser31, is available as open source32.</p>
        <p>A prototype for the presentation33 of the CEUR-WS semantification results has been created
using Semantic MediaWiki [40], which uses the same open source platform as Wikipedia but
adds extensions for markup that is transformed to RDF triples, leading to a “semantification”
of the wiki.</p>
        <p>A GitHub project for the single-point-of-truth metadata handling and conversion to different
representations has been started at ceurws/ceur-spt34.</p>
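        <p>A minimal sketch of the single-point-of-truth idea (all names here are hypothetical, not the actual ceur-spt API): one canonical record per volume, from which the different representations are generated rather than maintained separately:</p>

```python
# Single point of truth: each representation is derived from one canonical
# record, so a correction made once propagates to every output format.
import json

VOLUME = {"number": 3262, "title": "Wikidata Workshop 2022"}  # canonical record

def as_json(vol: dict) -> str:
    """JSON representation, e.g. for an API endpoint."""
    return json.dumps(vol, sort_keys=True)

def as_turtle(vol: dict) -> str:
    """One RDF Turtle triple, e.g. for SPARQL-capable consumers."""
    return 'ceurws:Vol-%d dct:title "%s" .' % (vol["number"], vol["title"])

print(as_json(VOLUME))
print(as_turtle(VOLUME))
```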
        <p>Further background research material is supplied via the Semantic MediaWikis for Wolfgang
Fahl’s PhD35 (public) and the ConfIDent requirements wiki36 (access on request).
28https://www.wikidata.org/wiki/Property:P11633
29https://www.wikidata.org/wiki/Wikidata:Property_proposal/colocated_with
30https://www.wikidata.org/wiki/Wikidata:Property_proposal
31http://ceur-ws-browser.bitplan.com/
32https://github.com/WolfgangFahl/pyCEURmake
33http://ceur-ws.bitplan.com
34https://github.com/ceurws/ceur-spt
35https://cr.bitplan.com/index.php/Category:Text2KG
36http://rq.bitplan.com</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. CEUR-WS Volume Browser</title>
        <p>Figure 4 shows a screenshot of the CEUR-WS Volume Browser, which we created as a means
to support semantification tasks such as transferring metadata of recently added volumes to
Wikidata, as well as showing the available index entries for volumes that have been published
for a few weeks or months already. The example shown is CEUR-WS Volume 3262, Wikidata
Workshop 202237.</p>
        <p>Figure 5 shows an enlarged section of the screenshot in which the links between the
proceedings volumes and the external knowledge graphs are presented. For the example these are the
wikidata item38, dblp39, k10plus40, and the scholia links to the proceedings, event and event
series41 (which has links to the event and proceedings).</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. CEUR-WS Semantic MediaWiki</title>
        <p>The CEUR-WS Semantic MediaWiki is available as a prototype, as depicted in Figure 6, which
shows how a content management system approach may be applied to the metadata and allows
for new features such as full text search. Semantic MediaWiki is a useful prototyping tool,
since it makes it possible to try out semantic properties and relations that are not yet fit for full public
exposure via Wikidata. The example screenshot shows how a MediaWiki displays links with
non-existing targets in red, making it easy to judge the coverage of the disambiguation.
37https://ceur-ws.org/Vol-3262/
38http://www.wikidata.org/entity/Q115053286
39https://dblp.org/db/conf/semweb/wikidata2022
40https://opac.k10plus.de/DB=2.299/PPNSET?PPN=1830580760
41https://scholia.toolforge.org/event-series/Q106429025</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <sec id="sec-5-1">
        <title>5.1. Disambiguation</title>
        <p>With our extraction method for editors we were able to obtain 11,764 editor signatures from 3354
volumes. Comparing these signatures against the editor records we were able to query from
dblp showed that 9321 signatures (4942 unique editors) can be linked to dblp and thus have
at least a DBLP author ID (P2456)42. For 2233 volumes this means that all of their editors can be
extracted and disambiguated to a dblp author ID. However, it also showed that for 387 volumes the
extraction method returned fewer editors than defined in dblp, with the majority of these volumes
being among the early 500 volumes, which were created manually with a high variety in format.</p>
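        <p>The completeness check can be sketched as follows; the function and sample data are our illustration, not code or names from the extraction pipeline:</p>

```python
# Flag volumes where HTML extraction found fewer editor signatures than
# dblp defines for the same volume (these need manual review).
def flag_incomplete(extracted: dict, dblp: dict) -> list:
    """Volume numbers whose extracted editor count is below the dblp count."""
    return sorted(
        vol for vol, editors in dblp.items()
        if len(extracted.get(vol, [])) < len(editors)
    )

# Hypothetical data: volume 42 lost an editor during extraction.
extracted = {3262: ["A. Editor", "B. Editor"], 42: ["C. Editor"]}
dblp_editors = {3262: ["A. Editor", "B. Editor"], 42: ["C. Editor", "D. Editor"]}
print(flag_incomplete(extracted, dblp_editors))  # [42]
```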
        <p>With the goal of entering the editors into Wikidata, each editor signature also needs to be
disambiguated to a Qid. Having the DBLP author ID greatly helps in the disambiguation process,
as it allows querying dblp for further person identifiers. This list of identifiers can then be used
to query Wikidata to check whether a person exists with at least one of those identifiers. The check
against Wikidata showed that 1467 editors could be identified; it also revealed 62 conflicting
items. We found that dblp’s coverage of person metadata synchronized with Wikidata already
leaves only 77 CEUR-WS editors missing. Applying the disambiguation results and linking
to CEUR-WS and dblp metadata will require mass editing of Wikidata via a special CEUR-WS
bot, which requires approval by Wikidata before being applicable in the next project phase.</p>
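        <p>The Qid lookup step can be sketched as follows; the helper function and property table are our illustration, and the sample identifiers are hypothetical:</p>

```python
# Build a Wikidata Query Service query that matches any person item carrying
# at least one of the identifiers dblp lists for a person. Only the query
# string is built here; sending it to the endpoint is left out.

IDENTIFIER_PROPS = {"orcid": "P496", "dblp": "P2456"}  # external-ID properties

def qid_lookup_query(identifiers: dict) -> str:
    """Build a SPARQL query matching a person item by any given identifier."""
    unions = " UNION ".join(
        '{ ?person wdt:%s "%s" . }' % (IDENTIFIER_PROPS[kind], value)
        for kind, value in identifiers.items()
    )
    return "SELECT DISTINCT ?person WHERE { %s }" % unions

# Hypothetical identifiers for one editor signature:
print(qid_lookup_query({"dblp": "66/4172", "orcid": "0000-0002-1825-0097"}))
```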
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Metadata Query capability</title>
        <p>Having the CEUR-WS metadata available in Wikidata allows standard SPARQL queries,
e.g., via the Wikidata Query Service, to be applied to analyze it. Figure 4 shows a map of
the distribution of event locations created with such a query. The relevance of the original set
of 20 queries for Task 1 that were set as a benchmark in 2014 for different stakeholders was
subjectively rated by us to sort the queries by priority43. The most relevant queries and the 5
queries Q1.5, Q1.12, Q1.13, Q1.16 and Q1.17 that rely on the main index have been implemented
as SPARQL queries44 that are compatible with the Wikidata Query Service endpoint, to prove
that our approach covers the intentions of the original challenge. Our result supplies even more
capabilities given the option to run federated SPARQL queries over the connected Wikidata,
dblp and k10plus knowledge graphs. The use of Wikidata ids as persistent identifiers is a core
success factor here.</p>
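        <p>As an illustration, a WDQS-compatible query in the spirit of these queries (a plausible reconstruction, not one of the published queries, assuming Q27230297 is the CEUR Workshop Proceedings series item): list proceedings together with the conference their workshop was colocated with via P11633:</p>

```python
# SPARQL over the CEUR-WS metadata in Wikidata: proceedings in the series,
# the workshop they document, and the conference that workshop was colocated
# with. Only the query string is shown; it would be sent to the WDQS endpoint.

CEUR_WS = "wd:Q27230297"  # CEUR Workshop Proceedings series item (assumption)

QUERY = f"""
SELECT ?proceedings ?workshop ?conference WHERE {{
  ?proceedings wdt:P179 {CEUR_WS} .   # part of the CEUR-WS series
  ?proceedings wdt:P4745 ?workshop .  # proceedings of this workshop
  ?workshop wdt:P11633 ?conference .  # workshop colocated with conference
}}
"""
print(QUERY)
```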
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Further Evaluation</title>
        <p>Making the CEUR-WS Volume metadata available on Wikidata has improved the indexing
coverage to 100% of all valid volumes, compared to 69% for k10plus and 76% for dblp.</p>
        <p>The timeliness of the CEUR-WS metadata in Wikidata is much higher than for dblp or k10plus.
For dblp it takes a few days to weeks, and for k10plus it may take weeks to months before the
metadata shows up. The Wikidata update may be done immediately when publishing, with
no delay. With the separation of event and proceedings entries it is now possible to show
future events for which no proceedings are available yet as soon as the events have been
announced, and later link the detailed proceedings metadata to the event record.
43https://cr.bitplan.com/index.php/List_of_Queries
44https://cr.bitplan.com/index.php/Semantic_Publishing_Challenge_Queries
</p>
        <p>[Figure: single-point-of-truth workflow: extract and reconcile, generate pages, add new volumes, DOI minting, generate HTML view, exchange event metadata, and keep in sync with dblp, Wikidata and K10plus]</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>We have presented the first steps of the CEUR-WS semantification, which result in the metadata
of CEUR-WS Volumes being available in Wikidata. The linking of the relevant entities for
workshops, the conferences these workshops might be colocated with, the event series the
workshops and events might form, as well as the linking to editor, author and paper entries and
the affiliated institutions, has been prototyped.</p>
      <p>The cross-linking with dblp and k10plus has been performed and may now be continuously
applied in the future.</p>
      <p>Given that all four involved metadata sources – CEUR-WS, Wikidata, dblp and k10plus –
involve a lot of manual curation, data quality errors which derive from human error still
have to be mitigated, with the goal of achieving a lower error rate than would be possible with
manual efforts alone.</p>
      <sec id="sec-6-1">
        <title>6.1. Future Work</title>
        <p>Acknowledgements. Main open source libraries used by the work described:
BeautifulSoup445, Catmandu46, justpy47, geograpy348, py-yprinciple-gen49.</p>
        <p>We would like to thank Jakob Voß for helping with the k10plus matching and creating the
Wikidata property colocated with (P11633)50 in due time.</p>
        <p>This paper is dedicated to the memory of CEUR-WS board member Ralf Klamma † January
2023.</p>
        <p>This research has been partly funded by a grant of the Deutsche Forschungsgemeinschaft (DFG)51.
45https://pypi.org/project/beautifulsoup4/
46https://github.com/LibreCat/Catmandu
47https://github.com/justpy-org/justpy
48https://github.com/somnathrakshit/geograpy3
49https://github.com/WolfgangFahl/py-yprinciple-gen
50https://www.wikidata.org/wiki/Property:P11633
51ConfIDent project; see https://gepris.dfg.de/gepris/projekt/426477583</p>
        <p>T. Clark, M. Crosas, I. Dillo, O. Dumon, S. Edmunds, C. T. Evelo, R. Finkers, A.
GonzalezBeltran, A. J. Gray, P. Groth, C. Goble, J. S. Grethe, J. Heringa, P. A. ’t Hoen, R. Hooft,
T. Kuhn, R. Kok, J. Kok, S. J. Lusher, M. E. Martone, A. Mons, A. L. Packer, B. Persson,
P. Rocca-Serra, M. Roos, R. van Schaik, S.-A. Sansone, E. Schultes, T. Sengstag, T. Slater,
G. Strawn, M. A. Swertz, M. Thompson, J. van der Lei, E. van Mulligen, J. Velterop,
A. Waagmeester, P. Wittenburg, K. Wolstencroft, J. Zhao, B. Mons, The FAIR guiding
principles for scientific data management and stewardship, Scientific Data 3 (2016). URL:
https://doi.org/10.1038/sdata.2016.18. doi:10.1038/sdata.2016.18.
[9] H. Cousijn, R. Braukmann, M. Fenner, C. Ferguson, R. van Horik, R. Lammey, A. Meadows,
S. Lambert, Connected Research: The Potential of the PID Graph, Patterns (New York,
N.Y.) 2 (2021) 100180. doi:https://doi.org/10.1016/j.patter.2020.100180.
[10] A. Hogan, E. Blomqvist, M. Cochez, C. d’Amato, G. de Melo, C. Gutierrez, S. Kirrane,
J. E. L. Gayo, R. Navigli, S. Neumaier, A.-C. Ngonga Ngomo, A. Polleres, S. M. Rashid,
A. Rula, L. Schmelzeisen, J. Sequeda, S. Staab, A. Zimmermann, Knowledge graphs, ACM
Computing Surveys 54 (2021) 1–37. URL: https://doi.org/10.1145/3447772. doi:10.1145/
3447772.
[11] T. Berners-Lee, J. Hendler, O. Lassila, The semantic web. a new form of web content that is
meaningful to computers will unleash a revolution of new possibilities., Scientific American
284 (5) (2001) 34–43. URL: https://www.scientificamerican.com/article/the-semantic-web/.
[12] C. Lange, A. Di Iorio, Semantic publishing challenge – assessing the quality of scientific
output, in: Communications in Computer and Information Science, Springer International
Publishing, 2014, pp. 61–76. URL: https://doi.org/10.1007/978-3-319-12024-9_8. doi:10.
1007/978-3-319-12024-9_8.
[13] B. Sateli, R. Witte, Automatic construction of a semantic knowledge base from CEUR
workshop proceedings, in: Semantic Web Evaluation Challenges, Springer International
Publishing, 2015, pp. 129–141. URL: https://doi.org/10.1007/978-3-319-25518-7_11. doi:10.
1007/978-3-319-25518-7_11.
[14] GO FAIR International Support and Coordination Office, FAIR principles, 2019. URL: https:
//www.go-fair.org/fair-principles/.
[15] W. Fahl, The history of scientific publishing, 2023. URL: https://cr.bitplan.com/index.php/The_History_of_Scientific_Publishing.</p>
        <p>
[16] C. Yu, C. Zhang, J. Wang, Extracting body text from academic PDF documents
for text mining, CoRR abs/2010.12647 (2020). URL: https://arxiv.org/abs/2010.12647.
arXiv:2010.12647.
[17] T. Bardini, Bootstrapping: Douglas Engelbart, Coevolution, and the Origins of Personal Computing, Stanford University Press, USA, 2001.</p>
        <p>
[18] Christina Engelbart, About Bootstrapping, https://www.dougengelbart.org/content/view/
226/269/, 2007. Online; accessed 12 March 2023.
[19] M. Kolchin, F. Kozlov, A template-based information extraction from web sites with
unstable markup, in: Semantic Web Evaluation Challenge, Communications in Computer
and Information Science, Springer International Publishing, 2014, pp. 89–94. URL: https:
//doi.org/10.1007/978-3-319-12024-9_11. doi:10.1007/978-3-319-12024-9_11.
[20] M. Kolchin, E. Cherny, F. Kozlov, A. Shipilo, L. Kovriguina, CEUR-WS-LOD: Conversion of
CEUR-WS workshops to linked data, in: Semantic Web Evaluation Challenges, Springer
International Publishing, 2015, pp. 142–152. URL: https://doi.org/10.1007/978-3-319-25518-7_
12. doi:10.1007/978-3-319-25518-7_12.
[21] H. Cunningham, D. Maynard, K. Bontcheva, V. Tablan, Gate: An architecture for
development of robust hlt applications, in: Proceedings of the 40th Annual
Meeting on Association for Computational Linguistics, ACL ’02, Association for
Computational Linguistics, USA, 2002, p. 168–175. URL: https://doi.org/10.3115/1073083.1073112.
doi:10.3115/1073083.1073112.
[22] H. Cunningham, V. Tablan, A. Roberts, K. Bontcheva, Getting more out of
biomedical documents with GATE’s full lifecycle open source text analytics, PLoS
Computational Biology 9 (2013) e1002854. URL: https://doi.org/10.1371/journal.pcbi.1002854.
doi:10.1371/journal.pcbi.1002854.
[23] B. Sateli, R. Witte, From papers to triples: An open source workflow for semantic publishing
experiments, in: Semantics, Analytics, Visualization. Enhancing Scholarly Data, Springer
International Publishing, 2016, pp. 39–44. URL: https://doi.org/10.1007/978-3-319-53637-8_
5. doi:10.1007/978-3-319-53637-8_5.
[24] M. Milicka, R. Burget, Information extraction from web sources based on
multiaspect content analysis, in: Semantic Web Evaluation Challenges, Springer
International Publishing, 2015, pp. 81–92. URL: https://doi.org/10.1007/978-3-319-25518-7_7.
doi:10.1007/978-3-319-25518-7_7.
[25] T. Käfer, A. Abdelrahman, J. Umbrich, P. O’Byrne, A. Hogan, Observing linked data
dynamics, in: The Semantic Web: Semantics and Big Data, Springer Berlin Heidelberg,
Berlin, Heidelberg, 2013, pp. 213–227. doi:10.1007/978-3-642-38288-8_15.
[26] K. R. Sollins, Pervasive persistent identification for information centric networking, in:
Proceedings of the second edition of the ICN workshop on Information-centric networking,
ACM, New York, NY, USA, 2012, pp. 1–6. doi:10.1145/2342488.2342490.
[27] J. Kunze, E. Bermès, The ark identifier scheme, 2022. URL: https://www.ietf.org/archive/id/
draft-kunze-ark-36.html.
[28] J. Franken, A. Birukou, K. Eckert, W. Fahl, C. Hauschke, C. Lange, Persistent identification
for conferences, Data Science Journal 21 (2022). doi:10.5334/dsj-2022-011.
[29] V. Bryl, A. Birukou, K. Eckert, M. Kessler, What’s in the proceedings? combining publisher’s
and researcher’s perspectives, in: A. García Castro, C. Lange, P. Lord, R. Stevens (Eds.),
4th Workshop on Semantic Publishing (SePublica), number 1155 in CEUR Workshop
Proceedings, Aachen, 2014. URL: http://ceur-ws.org/Vol-1155#paper-01.
[30] F. Å. Nielsen, D. Mietchen, E. Willighagen, Scholia, scientometrics and wikidata, in:
E. Blomqvist, K. Hose, H. Paulheim, A. Ławrynowicz, F. Ciravegna, O. Hartig (Eds.), The
Semantic Web: ESWC 2017 Satellite Events, Springer International Publishing, Cham, 2017,
pp. 237–259. doi:10.1007/978-3-319-70407-4_36.
[31] M. Manske, QuickStatements, 2016. URL: https://www.wikidata.org/wiki/Help:QuickStatements.</p>
        <p>
[32] W. Fahl, K. Eckert, C. Lange, Extracting event metadata from proceedings titles, 2022. URL:
https://zenodo.org/record/6568728. doi:10.5281/ZENODO.6568728.
[33] D. Tkaczyk, P. Szostek, M. Fedoryszak, P. J. Dendek, Ł. Bolikowski, CERMINE: automatic
extraction of structured metadata from scientific literature, International Journal on
Document Analysis and Recognition (IJDAR) 18 (2015) 317–335. URL: https://doi.org/10.
1007/s10032-015-0249-8. doi:10.1007/s10032-015-0249-8.
[34] P. Lopez, GROBID: Combining automatic bibliographic data recognition and term
extraction for scholarship publications, in: Research and Advanced Technology for Digital
Libraries, Springer Berlin Heidelberg, 2009, pp. 473–474. URL: https://doi.org/10.1007/
978-3-642-04346-8_62. doi:10.1007/978-3-642-04346-8_62.
[35] OECD, OECD Glossary of Statistical Terms, OECD, 2008. URL: https://doi.org/10.1787/
9789264055087-en. doi:10.1787/9789264055087-en.
[36] M. Cochinwala, V. Kurien, G. Lalk, D. Shasha, Eficient data reconciliation,
Information Sciences 137 (2001) 1–15. URL: https://www.sciencedirect.com/science/article/pii/
S0020025500000700. doi:https://doi.org/10.1016/S0020-0255(00)00070-0.
[37] S. Subramanian, D. King, D. Downey, S. Feldman, S2and: A benchmark and evaluation
system for author name disambiguation, in: 2021 ACM/IEEE Joint Conference on Digital
Libraries (JCDL), IEEE, 2021, pp. 170–179. URL: https://doi.org/10.1109/jcdl52503.2021.
00029. doi:10.1109/jcdl52503.2021.00029.
[38] J. Kim, Evaluating author name disambiguation for digital libraries: a case of DBLP,
Scientometrics 116 (2018) 1867–1886. URL: https://doi.org/10.1007/s11192-018-2824-5.
doi:10.1007/s11192-018-2824-5.
[39] D. Vrandečić, M. Krötzsch, Wikidata, Communications of the ACM 57 (2014) 78–85. URL:
https://doi.org/10.1145/2629489. doi:10.1145/2629489.
[40] M. Krötzsch, D. Vrandečić, Semantic MediaWiki, in: Foundations for the Web of
Information and Services, Springer Berlin Heidelberg, 2011, pp. 311–326. URL: https:
//doi.org/10.1007/978-3-642-19797-0_16. doi:10.1007/978-3-642-19797-0_16.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-H.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kanakia</surname>
          </string-name>
          , Microsoft Academic Graph:
          <article-title>When experts are not enough</article-title>
          ,
          <source>Quantitative Science Studies</source>
          <volume>1</volume>
          (
          <year>2020</year>
          )
          <fpage>396</fpage>
          -
          <lpage>413</lpage>
          . URL: https://doi.org/10.1162/qss_a_00021. doi:10.1162/qss_a_00021.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Priem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Piwowar</surname>
          </string-name>
          , R. Orr,
          <article-title>OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts</article-title>
          ,
          <source>26th International Conference on Science, Technology and Innovation Indicators (STI</source>
          <year>2022</year>
          )
          (
          <year>2022</year>
          ). URL: https://zenodo.org/record/6936227. doi:10.5281/ZENODO.6936227.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ganseman</surname>
          </string-name>
          ,
          <article-title>Refactoring a library's legacy catalog: a case study</article-title>
          ,
          <source>in: IAML Congress</source>
          <year>2015</year>
          ,
          <year>2015</year>
          . URL: http://wiki.muziekcollecties.be/images/IAML2015_JG.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Costers</surname>
          </string-name>
          ,
          <article-title>The pica catalogue system - paper 26</article-title>
          , in:
          <source>Proceedings of the IATUL Conferences</source>
          <year>1979</year>
          , Purdue University,
          <year>1979</year>
          , pp.
          <fpage>73</fpage>
          -
          <lpage>77</lpage>
          . URL: https://docs.lib.purdue.edu/iatul/1979/ papers/26/.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Bast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Buchhold</surname>
          </string-name>
          ,
          <article-title>QLever</article-title>
          , in:
          <source>Proceedings of the 2017 ACM Conference on Information and Knowledge Management</source>
          , ACM,
          <year>2017</year>
          , pp.
          <fpage>647</fpage>
          -
          <lpage>656</lpage>
          . URL: https://doi.org/10.1145/3132847.3132921. doi:10.1145/3132847.3132921.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ley</surname>
          </string-name>
          ,
          <article-title>DBLP</article-title>
          ,
          <source>Proceedings of the VLDB Endowment</source>
          <volume>2</volume>
          (
          <year>2009</year>
          )
          <fpage>1493</fpage>
          -
          <lpage>1500</lpage>
          . URL: https://doi.org/10.14778/1687553.1687577. doi:10.14778/1687553.1687577.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>B.</given-names>
            <surname>Wiermann</surname>
          </string-name>
          ,
          <article-title>K10plus - Zehn Bundesländer in einem Bibliothekssystem</article-title>
          ,
          <year>2019</year>
          . URL: https://blog.slub-dresden.de/beitrag/2019/03/27/k10plus-zehn-bundeslaender-in-einem-bibliothekssystem.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. J.</given-names>
            <surname>Aalbersberg</surname>
          </string-name>
          , G. Appleton,
          <string-name>
            <given-names>M.</given-names>
            <surname>Axton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Baak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Blomberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-W.</given-names>
            <surname>Boiten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. B. da Silva</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Bourne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bouwman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Brookes</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>