=Paper=
{{Paper
|id=Vol-3640/paper3
|storemode=property
|title=Bravo MaRDI: A Wikibase Knowledge Graph on Mathematics
|pdfUrl=https://ceur-ws.org/Vol-3640/paper3.pdf
|volume=Vol-3640
|authors=Moritz Schubotz,Eloi Ferrer,Johannes Stegmüller,Daniel Mietchen,Olaf Teschke,Larissa Pusch,Tim Conrad
|dblpUrl=https://dblp.org/rec/conf/wikidata/SchubotzFSMTP023
}}
==Bravo MaRDI: A Wikibase Knowledge Graph on Mathematics==
https://ceur-ws.org/Vol-3640/paper3.pdf
Moritz Schubotz¹, Eloi Ferrer², Johannes Stegmüller¹, Daniel Mietchen¹,
Olaf Teschke¹, Larissa Pusch² and Tim Conrad²
¹ FIZ Karlsruhe – Leibniz Institute for Information Infrastructure, Berlin, Germany
² Zuse Institute Berlin, Berlin, Germany
Abstract
Mathematical world knowledge is a fundamental component of Wikidata. However, to date, no expertly
curated knowledge graph has focused specifically on contemporary mathematics. Addressing this
gap, the Mathematical Research Data Initiative (MaRDI) has developed a comprehensive knowledge
graph that links multimodal research data in mathematics. It encompasses traditional research data
items such as datasets, software, and publications, as well as semantically advanced objects such as
mathematical formulas and hypotheses. This paper details the capabilities of the MaRDI knowledge graph,
which is based on Wikibase, leading up to its inaugural public release, codenamed Bravo, available at
https://portal.mardi4nfdi.de.
This paper is submitted to the ISWC Wikidata Workshop 2023 novel resource track.
1. The One-Stop-Portal Vision
The Mathematical Research Data Initiative (MaRDI) [10] strives to bring the FAIR data principles
[23] to life for mathematics and its research data (MathRD). A critical aspect of this ambition
involves creating a unified portal¹ that offers comprehensive access to all open research data
within mathematics and related fields.
In this manuscript, we introduce the first public release of the portal, codenamed Bravo, now freely
accessible through direct access or various APIs. Rather than constructing such a portal from scratch,
a tactic adopted by parallel initiatives in Germany's National Research Data Infrastructure (NFDI), we
chose to utilize Wikimedia technology, predominantly MediaWiki, as a foundation, for several
reasons:
• MediaWiki offers a thriving open-source environment with a myriad of free and
open-source software.
• A considerable user base exists for MediaWiki; many mathematicians are already
acquainted with Wikipedia and, to some extent, Wikidata.
• Upon the project's completion, we have the option to incorporate our data into Wikidata
or another compatible platform.
• Both the Wikimedia mathematics community and portal users, notably mathematicians
based in Germany, mutually benefit from our advancements.
• We seamlessly inherit knowledge from Wikipedia and Wikidata, painstakingly curated
by hundreds of thousands of contributors over the past two decades.
¹ Analogous initiatives call this type of service a 'Single Point of Access' [11].
Wikidata'23: Wikidata workshop at ISWC 2023
moritz.schubotz@fiz-karlsruhe.de (M. Schubotz); ferrer@zib.de (E. Ferrer);
johannes.stegmueller@fiz-karlsruhe.de (J. Stegmüller); daniel.mietchen@fiz-karlsruhe.de (D. Mietchen);
olaf.teschke@fiz-karlsruhe.de (O. Teschke); pusch@zib.de (L. Pusch); conrad@zib.de (T. Conrad)
https://schubotz.org (M. Schubotz)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
Moritz Schubotz et al. CEUR Workshop Proceedings 1–13
However, this decision comes with challenges: becoming part of the Wikimedia movement
necessitates embracing its work culture and guidelines and engaging with the community to
achieve consensus on sustainable solutions. Our contributions in this context are as follows:
1. We deliver an up-to-date ecosystem for hosting Wikibase and associated services on-premises.
2. We customize the general-purpose Wikibase infrastructure to cater to the specific needs
of the mathematical community.
3. We have devised a mechanism for importing external data that aligns with the Wikidata
data model.
4. We have actively enhanced the MediaWiki ecosystem by expanding and improving
services such as QuickStatements, MediaWiki extensions like Math and MathSearch, and
libraries like the Wikibase Integrator.
5. We have seeded the MaRDI KG with nearly 100 million triples² aggregated from various
sources.
This paper is organized as follows: Section 2 begins with an overview of our technology stack.
This is followed by Section 3, which presents our mathematics-specific extensions. Subsequently,
Section 4 details the infrastructure for seeding and updating the knowledge graph. In Section 5,
we illustrate the functionality of our system with an example. Finally, Section 6 provides a
conclusion and discusses future prospects.
2. Technical Infrastructure Overview
The MaRDI portal's infrastructure was initially built using the Wikibase release pipeline.
The release pipeline includes a docker-compose file that defines several services, including
MediaWiki with Wikibase extensions, the Wikidata Query Service, and the Blazegraph
knowledge graph backend. The docker-compose file allows a preconfigured group of services
to be launched with a single command, which improves reproducibility and independence
from the underlying system or web host.
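To illustrate how a compose file groups services, the following Python sketch builds a minimal Compose specification and serializes it as JSON, which is also valid YAML. The service names, image names, and port mapping are placeholders chosen for illustration; they are not the configuration of the Wikibase release pipeline or of the MaRDI portal.

```python
import json


def build_compose_spec() -> dict:
    """Return a minimal Compose specification with three linked services.

    All service and image names are hypothetical placeholders.
    """
    return {
        "services": {
            "wiki": {
                "image": "example/mediawiki-wikibase:latest",
                "ports": ["8080:80"],
                "depends_on": ["database"],
            },
            "database": {"image": "example/sql-database:latest"},
            "query-service": {
                "image": "example/query-service:latest",
                "depends_on": ["wiki"],
            },
        }
    }


def start_order(spec: dict) -> list:
    """Order services so that each one comes after its dependencies.

    Assumes the dependency graph has no cycles.
    """
    services = spec["services"]
    ordered = []

    def visit(name: str) -> None:
        if name in ordered:
            return
        for dependency in services[name].get("depends_on", []):
            visit(dependency)
        ordered.append(name)

    for name in services:
        visit(name)
    return ordered


if __name__ == "__main__":
    spec = build_compose_spec()
    print(json.dumps(spec, indent=2))  # the file content
    print(start_order(spec))           # dependencies come first
```

Saved under a name such as `docker-compose.json`, a specification like this could presumably be started with `docker compose -f docker-compose.json up -d`; the real pipeline ships a YAML file with considerably more services and settings.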
The MaRDI portal operates on a single OpenStack instance with 32 GB of main memory
and 8 CPUs, alongside a 300 GB HDD. Only Docker and Git are installed on the host system. All
configurations are housed in partially encrypted Git repositories, which feature a CI pipeline
enabling direct deployment to production without requiring access to the actual machine. If
the data volume increases, we can transition from a single-node deployment
to a Docker swarm with minimal modifications to our service specifications.
² Refer to the latest statistics from https://portal.mardi4nfdi.de/wiki/Portal.
The following provides a concise description of selected services from our infrastructure,
which currently comprises 28 services [8]:
Traefik As the primary entry point to our portal, the Traefik reverse proxy handles all
external network traffic. Traefik manages SSL certificates via its built-in Let's Encrypt
support and applies protective measures that guard individual services against
unauthorized use. Additionally, Traefik collects access logs, enabling centralized
usage analysis across our interconnected services.
MediaWiki To maintain compatibility with Wikipedia and manage only one version
of the mathematics-specific extensions, we started from a previous effort [4], forked
the infrastructure, and updated the container to the same versions that Wikipedia
runs in production.
WDQS WDQS stands for Wikidata Query Service, which we have adapted to query
the MaRDI portal and renamed MaRDI Query Service. This service is deployed as
four Docker containers: the Blazegraph database storing the knowledge graph, an
updater that synchronizes the Wikibase entries with the Blazegraph backend, a frontend
GUI to perform SPARQL queries, and a proxy to protect the SPARQL endpoint. The GUI
for this service can be accessed at query.portal.mardi4nfdi.de.
LaTeXML LaTeXML is a service that converts LaTeX input to content and presentation
MathML. It is employed for MathSearch.
BaseX BaseX is an XML database utilized for formula search. The MathSearch extension is
responsible for keeping this service current.
Backup A straightforward, custom backup process was set up to preserve precisely the data
we wish to retain, including the information stored in the SQL database, the MediaWiki
pages as XML, and the uploaded files.
Scholia Scholia is a tool that facilitates scholarly exploration of subsets of Wikidata's
knowledge graph via a web browser [19]. Its backend queries the Wikidata Query Service
through predefined SPARQL queries parametrized by Wikidata identifiers, and its
frontend displays the results of several of these queries as profiles. A multitude of such profile
types exist, including works, venues, topics, authors, awards, events, or organizations.
MaRDI's instance³ is deployed via a fork [3] of the original [2] deployed on the
Wikimedia Toolforge⁴. The intent behind the fork is to ensure maximum customizability for
mathematics, with general improvements fed back upstream wherever possible. The
ongoing customization includes efforts towards federated queries that utilize both the
MaRDI knowledge graph and Wikidata.
³ https://scholia.portal.mardi4nfdi.de/
⁴ https://scholia.toolforge.org/
Portainer Portainer is a management tool for Docker containers, images, volumes, and net-
works that operates via a web interface. It provides multiple ways of interacting with
containers, including viewing their logs and accessing the container console. Portainer is
compatible with standalone Docker installations, Docker Swarm, and Kubernetes.
Watchtower Watchtower is a tool that keeps Docker images up to date and guarantees the early
installation of critical patches, provided Docker containers follow semantic versioning.
Prometheus & Grafana Prometheus is a monitoring and alerting toolkit that we use to track
the health of the portal and the infrastructure. We visualize the data gathered by
Prometheus using the dashboarding tool Grafana.
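The query service and Scholia entries above both revolve around SPARQL queries sent over HTTP. As a minimal sketch of that pattern, the Python code below fills an identifier into a predefined query template, in the spirit of Scholia's parametrized queries, and prepares a GET request in the style of the SPARQL 1.1 Protocol. The endpoint path `/sparql`, the `wd:` prefix being predefined, the template text, and the identifier `Q1` are all assumptions made for illustration and are not taken from the MaRDI deployment.

```python
import re
import urllib.parse
import urllib.request

# Assumed endpoint: the text only names the GUI host; the /sparql path is a guess.
ENDPOINT = "https://query.portal.mardi4nfdi.de/sparql"

# Hypothetical template listing the statements of one item.
# Assumes the service predefines the wd: prefix for local items.
TEMPLATE = """
SELECT ?property ?value WHERE {{
  wd:{item_id} ?property ?value .
}}
LIMIT {limit}
"""


def render_query(item_id: str, limit: int = 10) -> str:
    """Fill the template after checking that the identifier looks like Q123."""
    if not re.fullmatch(r"Q[1-9][0-9]*", item_id):
        raise ValueError(f"not an item identifier: {item_id!r}")
    return TEMPLATE.format(item_id=item_id, limit=int(limit)).strip()


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Prepare (but do not send) a GET request asking for JSON results."""
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )


if __name__ == "__main__":
    request = build_request(render_query("Q1", limit=5))
    print(request.full_url)
    # Sending is left out on purpose; it would be
    # urllib.request.urlopen(request) followed by json.load on the response.
```

Validating the identifier before substitution keeps arbitrary text out of the query, which matters whenever the parameter comes from a URL, as it does in profile-style pages.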
To guard against unintended side effects arising from changes to this multifaceted service
infrastructure, we implemented a testing concept that is triggered by any modification. This
increases our confidence that we do not deploy erroneous code to our production environment.
Identity and access management (IAM) is another crucial aspect. MaRDI uses
OAuth as its protocol, which is also supported by NFDIIAM, the NFDI-wide basic service.
3. Mathematics-specific Extensions
Our portal leverages two math-specific MediaWiki extensions, namely, Extension-Math [26,
22, 15, 14] and Extension-MathSearch [28, 20, 17]. The former extension, deployed across all
Wikimedia projects, including Wikipedia, facilitates the rendering of mathematical expressions.
The latter makes mathematical expressions searchable and links knowledge graph data to
HTML5.
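To make the distinction between presentation and content MathML (both produced by the LaTeXML service described in Section 2) concrete, the sketch below parses two encodings of the expression x + 1 with Python's standard library. The markup was written by hand for illustration and is not actual LaTeXML output, and the two helper functions are hypothetical; they only hint at why a structure-based formula search benefits from the content tree.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Presentation MathML describes layout: identifier, operator, number in a row.
PRESENTATION = f"""
<math xmlns="{MATHML_NS}">
  <mrow><mi>x</mi><mo>+</mo><mn>1</mn></mrow>
</math>
""".strip()

# Content MathML describes meaning: the operator plus applied to x and 1.
CONTENT = f"""
<math xmlns="{MATHML_NS}">
  <apply><plus/><ci>x</ci><cn>1</cn></apply>
</math>
""".strip()


def local_name(element: ET.Element) -> str:
    """Strip the namespace part from an element tag."""
    return element.tag.split("}", 1)[-1]


def presentation_tokens(markup: str) -> list:
    """Return (element, text) pairs for the token elements in document order."""
    root = ET.fromstring(markup)
    return [
        (local_name(el), el.text)
        for el in root.iter()
        if local_name(el) in {"mi", "mo", "mn"}
    ]


def applied_operators(markup: str) -> list:
    """Return the operator (first child) of every apply element."""
    root = ET.fromstring(markup)
    operators = []
    for apply_el in root.iter(f"{{{MATHML_NS}}}apply"):
        children = list(apply_el)
        if children:
            operators.append(local_name(children[0]))
    return operators


if __name__ == "__main__":
    print(presentation_tokens(PRESENTATION))
    print(applied_operators(CONTENT))
```

In the presentation tree the plus sign is just a character between two tokens, whereas the content tree records it as the operator being applied, so a query such as "all formulas that apply plus" becomes a simple tree lookup.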
3.1. Rendering of Mathematical Expressions
The MaRDI portal, alongside Wikipedia and Wikidata, employs texvc markup to express math-
ematical formulas. This LaTeX variant does not permit runtime syntax changes, favoring
context-free formula processing with a regular grammar. Mathematical expressions in wikitext
are denoted by the wikitext tag