=Paper= {{Paper |id=Vol-3640/paper3 |storemode=property |title=Bravo MaRDI: A Wikibase Knowledge Graph on Mathematics |pdfUrl=https://ceur-ws.org/Vol-3640/paper3.pdf |volume=Vol-3640 |authors=Moritz Schubotz,Eloi Ferrer,Johannes Stegmüller,Daniel Mietchen,Olaf Teschke,Larissa Pusch,Tim Conrad |dblpUrl=https://dblp.org/rec/conf/wikidata/SchubotzFSMTP023 }} ==Bravo MaRDI: A Wikibase Knowledge Graph on Mathematics== https://ceur-ws.org/Vol-3640/paper3.pdf
                                Bravo MaRDI: A Wikibase Knowledge Graph on
                                Mathematics
                                Moritz Schubotz1 , Eloi Ferrer2 , Johannes Stegmüller1 , Daniel Mietchen1 ,
                                Olaf Teschke1 , Larissa Pusch2 and Tim Conrad2
                                1
                                    FIZ Karlsruhe – Leibniz Institute for Information Infrastructure, Berlin, Germany
                                2
                                    Zuse Institute Berlin, Berlin, Germany


                                                                         Abstract
                                                                         Mathematical world knowledge is a fundamental component of Wikidata. However, to date, no expertly
                                                                         curated knowledge graph has focused specifically on contemporary mathematics. Addressing this
                                                                         gap, the Mathematical Research Data Initiative (MaRDI) has developed a comprehensive knowledge
                                                                         graph that links multimodal research data in mathematics. This encompasses traditional research data
                                                                         items like datasets, software, and publications and includes semantically advanced objects such as
                                                                         mathematical formulas and hypotheses. This paper details the abilities of the MaRDI knowledge graph,
                                                                         which is based on Wikibase, leading up to its inaugural public release, codenamed Bravo, available on
                                                                         https://portal.mardi4nfdi.de.


                                           This paper is submitted to the ISWC Wikidata Workshop 2023 novel resource track.


                                1. The One-Stop-Portal Vision
                                The Mathematical Research Data Initiative (MaRDI) [10] strives to bring the FAIR data principles
                                [23] to life for mathematics and its research data (MathRD). A critical aspect of this ambition
                                involves creating a unified portal1 that offers comprehensive access to all open research data
                                within mathematics and related fields.
                                   In this manuscript, we introduce Bravo, the first public release of the portal, now freely accessible
                                through direct access or various APIs. Rather than constructing such a portal from scratch, a
                                tactic adopted by parallel initiatives in Germany’s National Research Data Infrastructure (NFDI), we
                                chose to use Wikimedia technology, predominantly MediaWiki, as a foundation, for several
                                reasons:

                                                  • MediaWiki offers a thriving open-source environment with a myriad of free and
                                                    open-source software.
                                                  • A considerable user base exists for MediaWiki; many mathematicians are already
                                                    acquainted with Wikipedia and, to some extent, Wikidata.
                                Wikidata’23: Wikidata workshop at ISWC 2023
                                Email: moritz.schubotz@fiz-karlsruhe.de (M. Schubotz); ferrer@zib.de (E. Ferrer);
                                johannes.stegmueller@fiz-karlsruhe.de (J. Stegmüller); daniel.mietchen@fiz-karlsruhe.de (D. Mietchen);
                                olaf.teschke@fiz-karlsruhe.de (O. Teschke); pusch@zib.de (L. Pusch); conrad@zib.de (T. Conrad)
                                Web: https://schubotz.org (M. Schubotz)
                                                                       © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073




                                                  1
                                                      Analogous initiatives refer to this type of service as a ’Single Point of Access’ [11].



Moritz Schubotz et al. CEUR Workshop Proceedings                                            1–13


    • Upon the project’s completion, we have the option to incorporate our data into Wikidata
      or another compatible platform.

    • Both the Wikimedia mathematics community and portal users, notably mathematicians
      based in Germany, mutually benefit from our advancements.

    • We seamlessly inherit knowledge from Wikipedia and Wikidata, painstakingly curated
      by hundreds of thousands of contributors over the past two decades.

  However, this decision comes with challenges; becoming part of the Wikimedia movement
necessitates embracing their work culture and guidelines and engaging with the community to
achieve consensus on sustainable solutions. Our contributions in this context are as follows:

   1. We provide an up-to-date ecosystem for hosting Wikibase and associated services on-premises.

   2. We customize the general-purpose Wikibase infrastructure to cater to the specific needs
      of the mathematical community.

   3. We have devised a mechanism for importing external data that aligns with the Wikidata
      data model.

   4. We have actively enhanced the MediaWiki ecosystem by expanding and improving
      services such as QuickStatements, MediaWiki extensions like Math and MathSearch, and
      libraries like the Wikibase Integrator.

   5. We have seeded the MaRDI KG with nearly 100 million triples2 aggregated from various
      sources.

  This paper is organized as follows: Section 2 begins with an overview of our technology stack.
This is followed by Section 3, which presents our mathematics-specific extensions. Subsequently,
Section 4 details the infrastructure for seeding and updating the knowledge graph. In Section 5,
we illustrate the functionality of our system with an example. Finally, Section 6 provides a
conclusion and discusses future prospects.


2. Technical Infrastructure Overview
The MaRDI portal’s infrastructure was initially built using the Wikibase release pipeline.
The release pipeline includes a docker-compose file that defines several services, including
MediaWiki with Wikibase extensions, the Wikidata Query Service, and the Blazegraph
knowledge graph backend. The docker-compose file allows a preconfigured
group of services to be launched with a single command, which improves reproducibility and
independence from the underlying system or web host.
  The MaRDI portal runs on a single OpenStack instance with 32 GB of main memory
and 8 CPUs, alongside a 300 GB HDD. Only Docker and Git are installed on the host system. All

   2
       Refer to the latest statistics from https://portal.mardi4nfdi.de/wiki/Portal.





configurations are housed in partially encrypted Git repositories, which feature a CI pipeline
enabling direct deployment to production without necessitating access to the actual machine. If
the data volume increases, we have the capability to transition from a single-node deployment
to a Docker swarm with minimal modifications to our service specifications.
   The following provides a concise description of selected services from our infrastructure,
which currently comprises 28 services [8]:

Traefik The Traefik reverse proxy is the primary entry point to our portal: all external network
     traffic is channeled through it. Traefik manages SSL certificates
     via its built-in Let’s Encrypt support and protects individual
     services against unauthorized use. Additionally, Traefik collects access logs, enabling
     centralized usage analysis for our interconnected services.

MediaWiki To maintain compatibility with Wikipedia and manage only one version
    of the mathematics-specific extensions, we forked the container infrastructure from a
    previous effort [4] and keep the container updated to the same versions that Wikipedia
    runs in production.

WDQS We have adapted the Wikidata Query Service (WDQS) to query
   the MaRDI portal and renamed it the MaRDI Query Service. This service is deployed as
   four Docker containers: the Blazegraph database storing the knowledge graph, an
   updater that synchronizes the Wikibase entries with the Blazegraph backend, a frontend
   GUI to perform SPARQL queries, and a proxy to protect the SPARQL endpoint. The GUI
   for this service can be accessed at query.portal.mardi4nfdi.de.

LaTeXML LaTeXML is a service that converts LaTeX input to content and presentation
     MathML. It is employed for MathSearch.

BaseX BaseX is an XML database utilized for formula search. The MathSearch extension is
     responsible for keeping this service current.

Backup A straightforward, custom backup process was installed to preserve precisely the data
    we wish to retain, including the information saved on the SQL database, the MediaWiki
    pages as XML and the uploaded files.

Scholia Scholia is a tool that facilitates scholarly exploration of subsets of Wikidata’s knowledge
     graph via a web browser [19]. Its backend queries the Wikidata Query Service
     through predefined SPARQL queries parametrized by Wikidata identifiers, and its frontend
     displays the results of several of these queries as profiles. A multitude of such profile
     types exist, including works, venues, topics, authors, awards, events, or organizations.
     MaRDI’s instance3 is deployed via a fork [3] of the original [2] deployed on the Wikimedia
     Toolforge4 . The intent behind the fork is to ensure maximum customizability for
     mathematics, with general improvements fed back upstream wherever possible. The

   3
       https://scholia.portal.mardi4nfdi.de/
   4
       https://scholia.toolforge.org/





        ongoing customization includes efforts towards federated queries that utilize both the
        MaRDI knowledge graph and Wikidata.

Portainer Portainer is a management tool for Docker containers, images, volumes, and networks
     that operates via a web interface. It provides multiple ways of interacting with
     containers, including viewing their logs and accessing the container console. Portainer is
     compatible with standalone Docker installations, Docker Swarm, and Kubernetes.

Watchtower Watchtower is a tool that keeps Docker images up to date and guarantees the early
    installation of critical patches, provided Docker containers follow semantic versioning.

Prometheus & Grafana Prometheus is a monitoring and alerting toolkit that we use to track the
    health of the portal and the infrastructure. We visualize the data gathered by Prometheus using
    the dashboarding tool Grafana.

   To guard against unintended side effects arising from changes to this multifaceted service
infrastructure, we implemented a testing concept that is triggered by any modification. This
increases our confidence that we do not deploy erroneous code to our production environment.
   Identity and access management (IAM) is another crucial aspect. MaRDI uses
OAuth as its protocol, which is also supported by NFDIIAM, the NFDI-wide basic service.


3. Mathematics-specific Extensions
Our portal leverages two math-specific MediaWiki extensions, namely, Extension-Math [26,
22, 15, 14] and Extension-MathSearch [28, 20, 17]. The former extension, deployed across all
Wikimedia projects, including Wikipedia, facilitates the rendering of mathematical expressions.
The latter makes mathematical expressions searchable and links knowledge graph data to
HTML5.

3.1. Rendering of Mathematical Expressions
The MaRDI portal, alongside Wikipedia and Wikidata, employs texvc markup to express mathematical
formulas. This LaTeX variant does not permit runtime syntax changes, favoring
context-free formula processing with a regular grammar. Mathematical expressions in wikitext
are denoted by the wikitext tag <math> (to be distinguished from the HTML5 element <math>).
These <math> tags enclose the texvc expressions, which are then rendered. In contrast, Wikibase
introduces a data type specifically for mathematical expressions. Examples of such notations
can be found in the DLMF defining formula in the MaRDI-KG [24]5 and in math-tagged elements
in Wikitext6 .
   Typically, portal users view page content in their web browsers. The Extension-Math processes
the texvc math from the Wikitext or the Wikibase entities into HTML, allowing the

    5
    https://portal.mardi4nfdi.de/wiki/Item:Q1799
    6
    https://portal.mardi4nfdi.de/wiki/Non-negative_Matrix_Factorization_for_Time-Resolved_Raman_Spectroscopy_Data





browser to display the rendered formulas correctly. This rendering pipeline leverages an external
REST-based service to generate SVG images from texvc within the MediaWiki ecosystem [26].
   As of 2023, all Chromium-based browsers support MathML, a web standard defined by the
W3C7 . MathML is part of HTML5, akin to SVG, and enables native math rendering without
needing images or browser extensions. Thus, since January 2023, the latest versions of Edge,
Chrome, Firefox, and Safari8 all render MathML natively. Due to changes
in the MediaWiki ecosystem9 that shift away from microservices, and to augment processing
capabilities for semantic annotation, the MaRDI project developed a rendering pipeline written
in pure PHP [5]. It generates MathML directly within the Math extension without the need for
communication with REST-based services.
   Initially, TexVC leverages a parsing expression grammar to extract a language-independent
parse tree from LaTeX formulas. This parse tree verifies the formula’s syntactical correctness
and provides feedback to authors in edit mode [12]. To validate this component’s functionality,
we used 312,190 formulas extracted from the English Wikipedia [27].
   The parse tree then undergoes root-first traversal, translating each item to a corresponding
MathML item. The native MathML component creates the correct MathML representation for
the 724 LaTeX commands supported by TexVC. The MathML used for automatic validation is
generated from Mathoid [26] and LaTeXML [29].
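
   The root-first traversal described above can be illustrated with a minimal Python sketch. It
is not the PHP implementation in the Math extension: the Node class, the node kinds, and the
two tag tables are hypothetical simplifications chosen for this example, and only a handful of
MathML elements are covered.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List


@dataclass
class Node:
    """A node of a toy parse tree (hypothetical; not the TexVC node classes)."""
    kind: str                       # "num", "ident", "op", "row", or "frac"
    text: str = ""                  # token text, used by leaf nodes only
    children: List["Node"] = field(default_factory=list)


# MathML element emitted for each leaf kind and each inner kind.
LEAF_TAGS = {"num": "mn", "ident": "mi", "op": "mo"}
INNER_TAGS = {"row": "mrow", "frac": "mfrac"}


def to_mathml(node: Node) -> str:
    """Translate a parse tree to presentation MathML, handling the root first."""
    if node.kind in LEAF_TAGS:
        tag = LEAF_TAGS[node.kind]
        return f"<{tag}>{escape(node.text)}</{tag}>"
    # The element for the current node is chosen before its children are visited.
    tag = INNER_TAGS[node.kind]
    inner = "".join(to_mathml(child) for child in node.children)
    return f"<{tag}>{inner}</{tag}>"


# Hand-built parse tree for the LaTeX input \frac{a+1}{2}.
tree = Node("frac", children=[
    Node("row", children=[Node("ident", "a"), Node("op", "+"), Node("num", "1")]),
    Node("num", "2"),
])
mathml = f"<math>{to_mathml(tree)}</math>"
print(mathml)
```

   In this sketch, each node kind maps to exactly one MathML element, which keeps the translation
a single recursive pass over the tree.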
   The MaRDI project contributed to the open-source community by developing a PHP version10
of the TypeScript-based mhchemParser [1, 13], integrating it into the MediaWiki ecosystem
[9]. The mhchem syntax is employed alongside the mathematical formula notation to write
chemical equations within the Wikibase ecosystem. Implementing this component in PHP
enables the processing of these equations within the PHP-based math processing pipeline. The
component’s functionality was validated using 116 automated tests, which incorporate the
mhchem specification11 .
   The native MathML pipeline in the MaRDI Wikibase paves the way for further advancements,
such as parsing formulas from arXiv, semantic annotation of formulas, enhancing accessibility
for visually impaired users, and resolving disambiguation scenarios.

3.2. Formula Search in the MaRDI Portal
The MaRDI Portal employs the MathSearch extension[28]12 to search mathematical expressions
based on texvc input. The original version utilized the Math Web Search engine, also used in
the formula search of zbMATH Open [21]. In the MaRDI project, we optimized the formula
search components for the Docker-based infrastructure running the latest MediaWiki versions.
The MaRDI setup13 uses the BaseX database to store the search index created for all formulae
in the MaRDI portal.

   7
      https://www.w3.org/TR/MathML/
   8
      https://www.lambdatest.com/web-technologies/mathml
    9
      https://phabricator.wikimedia.org/T303822
   10
      https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Math/+/923597
   11
      https://texdoc.org/serve/mhchem/0
   12
      https://www.mediawiki.org/wiki/Extension:MathSearch
   13
      https://github.com/MaRDI4NFDI/formulasearch





4. Seeding Mathematical Research Data
4.1. Defining Mathematical Research Data (MathRD)
In the realm of mathematics, research data (MathRD) [16, 6] encompasses all data forms integral
to the research process. These data forms facilitate the creation and analysis of mathematical
models, substantiate proofs, test algorithms, and generally elucidate mathematical phenomena.
MathRD presents itself in various types, structures, and degrees of accessibility and
interoperability.
   MathRD ranges from highly structured to largely unstructured forms. Highly structured forms
include symbolic or numerical data, used for encapsulating theorems, proofs, number sequences,
or matrices, as well as geometric data delineating objects like curves, surfaces, or polytopes.
Less structured forms include mathematical models or observational data. At the
extreme end of the spectrum, plain-text data such as scientific papers, online resources, articles,
and books are also encompassed under MathRD.
   The MaRDI knowledge graph is engineered to effectively capture the intricate structure of this
data. It does this by aggregating metadata from diverse sources, each describing mathematical
research data, into a consolidated graph.

4.2. Constructing the MaRDI Knowledge Graph
As previously alluded to in Section 1, the data model for the MaRDI knowledge graph is an
extension of the existing data model in Wikidata. Practically, this indicates that entities for the
MaRDI knowledge graph are sourced from Wikidata if pre-existing there. Once integrated into
the MaRDI knowledge graph, these entities can be further enriched with additional statements,
though the detailed modalities of whether and how to keep the MaRDI data in sync with
Wikidata remain to be worked out.
   This approach is particularly important for properties: we design our data model around
the properties already available in Wikidata and define new properties only
when a corresponding one is absent there. As a result, many of the properties found in
Wikidata are also present in the MaRDI knowledge graph. We anticipate the introduction of
new, mathematics-specific properties as the knowledge graph expands.
   To facilitate entity import from Wikidata, we have established an entity importer functionality
built upon the Python module WikibaseIntegrator14 . This module forms the basis for a Python
class capable of transferring any given entity or list of entities from Wikidata to our Wikibase
instance. This importer class copies not only the label, description, and aliases of the entity
but also its statements, with the option to import additional connected entities. The importer
class includes a parameter that determines the desired depth level of the import. By default,
one level of depth is selected, importing all statements for each imported entity, and importing
only label, description, and aliases for the entities mentioned in these statements.
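
   The effect of the depth parameter can be sketched in Python as follows. This is a toy model
under stated assumptions, not the importer class itself: the SOURCE dictionary stands in for
Wikidata, the entity IDs are invented, the function name plan_import is hypothetical, and the
behavior for depths other than the default of one is our reading of the description above.

```python
from typing import Dict, List

# Toy stand-in for the source graph: entity ID -> IDs mentioned in its statements.
# All IDs are invented for illustration.
SOURCE: Dict[str, List[str]] = {
    "Q1": ["Q2", "Q3"],
    "Q2": ["Q4"],
    "Q3": [],
    "Q4": ["Q5"],
    "Q5": [],
}


def plan_import(root: str, depth: int = 1) -> Dict[str, str]:
    """Decide per entity whether to import it 'full' or 'labels-only'.

    Entities fewer than `depth` links away from the root receive all their
    statements; entities exactly `depth` links away receive only label,
    description, and aliases. Entities further away are not imported.
    """
    plan: Dict[str, str] = {}
    frontier: List[str] = [root]
    for level in range(depth + 1):
        mode = "full" if level < depth else "labels-only"
        next_frontier: List[str] = []
        for entity in frontier:
            if entity in plan:          # already planned at a shallower level
                continue
            plan[entity] = mode
            if mode == "full":          # only fully imported entities are expanded
                next_frontier.extend(SOURCE.get(entity, []))
        frontier = next_frontier
    return plan


print(plan_import("Q1"))            # default: one level of depth
print(plan_import("Q1", depth=2))
```

   With the default depth, only the requested entity is imported with its statements, and the
entities it mentions are created as stubs so that the statements have valid targets.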
   The MaRDI knowledge graph further includes properties such as wikidata PID 15 and wikidata

   14
        https://github.com/LeMyst/WikibaseIntegrator
   15
        https://portal.mardi4nfdi.de/wiki/Property:P11





QID 16 , which store the Wikidata identifier for each imported property and item, respectively.
This information is also stored in an internal SQL table, mapping the MaRDI knowledge graph
identifiers to Wikidata identifiers. This table additionally incorporates a parameter for each
entity indicating whether all its statements have been imported or only its label, description,
and aliases.
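
   A mapping table of this kind could look as follows. The sketch uses Python with an in-memory
SQLite database purely for illustration; the table name, column names, and helper functions are
hypothetical and do not reflect the actual schema of the portal, and the identifiers are invented.

```python
import sqlite3
from typing import List, Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE wb_id_mapping (          -- hypothetical table name
        local_id       TEXT PRIMARY KEY,  -- identifier in the MaRDI knowledge graph
        wikidata_id    TEXT NOT NULL UNIQUE,
        has_all_claims INTEGER NOT NULL   -- 1: all statements imported, 0: labels only
    )
    """
)


def record_import(local_id: str, wikidata_id: str, has_all_claims: bool) -> None:
    """Insert a mapping, or overwrite it when the entity is imported again."""
    conn.execute(
        "INSERT OR REPLACE INTO wb_id_mapping VALUES (?, ?, ?)",
        (local_id, wikidata_id, int(has_all_claims)),
    )


def lookup_local_id(wikidata_id: str) -> Optional[str]:
    """Return the local identifier for a Wikidata identifier, if it was imported."""
    row = conn.execute(
        "SELECT local_id FROM wb_id_mapping WHERE wikidata_id = ?", (wikidata_id,)
    ).fetchone()
    return row[0] if row else None


def labels_only() -> List[str]:
    """List local identifiers whose statements have not been imported yet."""
    rows = conn.execute(
        "SELECT local_id FROM wb_id_mapping WHERE has_all_claims = 0 ORDER BY local_id"
    ).fetchall()
    return [row[0] for row in rows]


# Invented identifiers: one stub entity and one fully imported entity.
record_import("Q100", "Q5", has_all_claims=False)
record_import("Q101", "Q6", has_all_claims=True)
print(lookup_local_id("Q5"), labels_only())
```

   Looking up the Wikidata identifier before each import avoids creating duplicates, and the
flag shows which stub entities still need their statements.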
   This approach provides a comprehensive view of the overlap between the MaRDI
knowledge graph and Wikidata at any given time. Moreover, it lays the groundwork for keeping
imported entities synchronized with Wikidata and for maintaining consistency between the
knowledge graphs, which should ease the eventual integration of the MaRDI knowledge graph into Wikidata.
   The properties imported from Wikidata support the creation of new items in the MaRDI
knowledge graph. In addition to the Wikidata items, metadata from eight distinct sources have
been imported to varying degrees. These include formula metadata from DLMF, software metadata
from swMATH, publication metadata from zbMATH Open, arXiv, and Crossref, metadata
from resources stored at Zenodo, metadata on discrete geometric objects stored at polyDB17 ,
and metadata on R packages published at CRAN18 .

4.3. Persistent Identifiers and Compliance with FAIR Principles
Adherence to FAIR principles necessitates the assignment of persistent identifiers to each linked
resource in the MaRDI knowledge graph. It’s essential to differentiate between extrinsic and
intrinsic identifiers:

    • Extrinsic identifiers: These identifiers are linked to a specific resource through a register.
      DOIs, Wikibase IDs or ORCID IDs are instances of this category.

    • Intrinsic identifiers: These identifiers are innately linked to the resource and do not
      necessitate an external register for association. An example is the name of a software
      package, if used as its identifier.

  Every resource in the MaRDI knowledge graph is assigned an extrinsic identifier by being
created as a Wikibase entity. These follow the format used in Wikidata, with item identifiers
beginning with ’Q’ and property identifiers with ’P’. Additionally, entities imported from a given
source incorporate a statement that includes the identifier from the original source. Currently,
the MaRDI knowledge graph supports the following extrinsic identifiers:

    • DOI

    • ORCID iD

    • Digital Library of Mathematical Functions ID

    • swMATH work ID

   16
      https://portal.mardi4nfdi.de/wiki/Property:P12
   17
      https://polydb.org/
   18
      https://cran.r-project.org/





    • zbMATH Open document ID

    • zbMATH author ID

    • arXiv ID

    • arXiv author ID

    • Mathematics Subject Classification ID

    • Zenodo ID

  Further, two types of resources are linked to their source using intrinsic identifiers:

    • CRAN Project: This identifier, imported from Wikidata19 , links to an R package published
      at the Comprehensive R Archive Network. The identifier is a string that corresponds to
      the name of the R package.

    • PolyDB ID: This identifier, created in the MaRDI knowledge graph, links to the original
      metadata source of a polyDB collection. The identifier string corresponds to the name of
      the collection.
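
  As a small illustration of how such identifiers lead back to their sources, the following
Python sketch maps a few of the identifier types above to resolver URLs. The IdentifierScheme
class and the SCHEMES table are hypothetical helpers, not part of the portal. Following the
definitions above, the CRAN package name is classified as intrinsic, while the other three are
assigned through a register and are therefore extrinsic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentifierScheme:
    """Hypothetical description of one identifier type."""
    name: str
    kind: str          # "extrinsic" (assigned via a register) or "intrinsic"
    url_template: str  # resolver URL with an {id} placeholder


SCHEMES = {
    "doi": IdentifierScheme("DOI", "extrinsic", "https://doi.org/{id}"),
    "orcid": IdentifierScheme("ORCID iD", "extrinsic", "https://orcid.org/{id}"),
    "arxiv": IdentifierScheme("arXiv ID", "extrinsic", "https://arxiv.org/abs/{id}"),
    # The identifier is the package name itself, so no register is involved.
    "cran": IdentifierScheme("CRAN Project", "intrinsic",
                             "https://CRAN.R-project.org/package={id}"),
}


def resolve(scheme: str, identifier: str) -> str:
    """Build the URL under which the identified resource can be found."""
    return SCHEMES[scheme].url_template.format(id=identifier)


# Identifiers taken from the reference list of this paper, plus one R package name.
print(resolve("doi", "10.5281/zenodo.8189613"))
print(resolve("arxiv", "2307.00137"))
print(resolve("cran", "ggplot2"))
```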


5. Case study: Orthogonal Polynomials and Special Functions
In this section, we showcase some capabilities of the MaRDI portal using semantically enhanced
mathematical formulas as an example, specifically focusing on Orthogonal Polynomials and
Special Functions (OPSF).
   In the field of OPSF, relationships between functions are essential. The NIST Digital Library
of Mathematical Functions, along with its companion, the Digital Repository of Mathematical
Formulae [25, 24], is an indispensable source of identifiers and formulas pertaining to this field.
Their main function is as aggregators of well-established knowledge, rather than providers of
novel work.
   The team behind these resources has developed a special variant of LaTeX to enable a higher
level of machine-readable semantics. For example, one typically writes i in LaTeX to represent the
imaginary unit. In the context of DLMF/DRMF, however, one would use \iunit . Both versions
render the same output. However, when the LaTeX source code is rendered for the web rather than
as a PDF, the \iunit version offers the added advantage of being clickable. This improves
accessibility by linking to a human-readable explanation of what the imaginary unit represents.
   In Figure 1a, \iunit links to https://dlmf.nist.gov/1.9#E1, which is considered a permanent
identifier. This means that the numbering remains consistent, irrespective of the introduction
of additional material.
   To make DLMF data accessible from the MaRDI portal, we have imported the formulas into
our knowledge graph. Each formula has a corresponding Wikibase item. For example, the
formula displayed in Figure 1a corresponds to Q3045 and links to the definition Q1399 .

   19
        https://www.wikidata.org/wiki/Property:P5565







Figure 1: A DLMF formula in different representations.
(a) Screenshot from the NIST Digital Library of Mathematical Functions (DLMF),
    https://dlmf.nist.gov/7.5, demonstrating the use of \iunit .
(b) Representation of 1a in the MaRDI Wikibase instance.
(c) User-friendly formula home page of 1a in the MaRDI Portal.


  Our SPARQL query interface enables users to retrieve all formulas that include a link to the
imaginary unit with the following query20 :
SELECT ?item WHERE { ?item wdt:P4 wd:Q1399 .}
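
   The same query can also be sent programmatically. The Python sketch below only builds the
request URL and parses a response in the standard SPARQL JSON results format; it does not
contact the server. The endpoint path in SPARQL_ENDPOINT is an assumption (only the GUI address
query.portal.mardi4nfdi.de is given above), and the function names are hypothetical.

```python
from typing import Any, Dict, List
from urllib.parse import urlencode

# Assumption: the SPARQL endpoint lives under this path on the query service host.
SPARQL_ENDPOINT = "https://query.portal.mardi4nfdi.de/sparql"

# The query from the text; the wd: and wdt: prefixes are taken to be predefined
# by the query service, as in the example above.
QUERY = "SELECT ?item WHERE { ?item wdt:P4 wd:Q1399 . }"


def build_request_url(query: str, endpoint: str = SPARQL_ENDPOINT) -> str:
    """Encode a SPARQL query as an HTTP GET request URL asking for JSON results."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


def item_uris(result: Dict[str, Any]) -> List[str]:
    """Extract the ?item URIs from a SPARQL JSON results document."""
    return [b["item"]["value"] for b in result["results"]["bindings"]]


url = build_request_url(QUERY)
print(url)

# A hand-written response in the SPARQL JSON results format, for illustration only.
sample = {
    "head": {"vars": ["item"]},
    "results": {"bindings": [
        {"item": {"type": "uri", "value": "https://portal.mardi4nfdi.de/entity/Q3045"}},
    ]},
}
print(item_uris(sample))
```

   Fetching the URL, for example with urllib.request.urlopen, and passing the decoded JSON to
item_uris would then yield the matching items, provided the assumed endpoint path is correct.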

   More complex queries are also supported, such as searching for all formulas indirectly
dependent on the gamma function.
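
   One way to express such an indirect dependency is a SPARQL 1.1 property path. The Python
sketch below generates a query of this kind; it assumes that the same property P4 used above
also chains from formula to formula, which the text does not state, and the function name is
hypothetical. Because the identifier of the gamma function is not given here, the example reuses
Q1399 from the query above.

```python
import re


def dependent_formulas_query(target_qid: str, link_property: str = "P4") -> str:
    """Build a SPARQL query for items that reach `target_qid` through one or
    more `link_property` statements (direct and indirect links)."""
    if not re.fullmatch(r"Q[1-9][0-9]*", target_qid):
        raise ValueError(f"not an item identifier: {target_qid!r}")
    if not re.fullmatch(r"P[1-9][0-9]*", link_property):
        raise ValueError(f"not a property identifier: {link_property!r}")
    # The path operator '+' follows the property one or more times.
    return (
        f"SELECT DISTINCT ?item WHERE {{ ?item wdt:{link_property}+ wd:{target_qid} . }}"
    )


print(dependent_formulas_query("Q1399"))
```

   The only difference from the direct query is the + after the property, so the result also
contains the directly linked formulas.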
   We have additionally created formula homepages, similar to the DRMF project, to make the
data available for classic searches and to offer a more convenient view compared to the standard
Wikidata item view. Once the import of zbMATH articles is completed, bibliographic references
related to the formula, software models, and algorithms will be linked from here.
   20
        https://tinyurl.com/2bd4jv8k





  Our software can also translate the expression from DLMF to Maple and Mathematica [7].
However, due to potential licensing issues, we only display the formulas on the separate site
https://lct.wmflabs.org, and not in the portal.


6. The road ahead
The Bravo Release of the MaRDI platform served as a proof of concept for leveraging MediaWiki
and associated services from the Wikimedia ecosystem as a platform for the National Research
Data Infrastructure, focusing on the field of mathematics. We successfully demonstrated
the platform’s capability to handle domain-specific requirements, such as the manipulation
of mathematical expressions, proving the adaptability of this general-purpose platform for
specialized needs.
   However, we recognize that several aspects of the platform’s usability require enhancement.
The current interface design, reminiscent of a Wikipedia fork, needs to be reimagined to reflect
the nuances of a research data management platform. Furthermore, while advanced users might
find formulating SPARQL queries straightforward, the average user might need additional
support. Although Wikidata provides visual aids to improve query generation, we aim to go a
step further by integrating a conversational interface to facilitate SPARQL query generation
following our previous MathQA efforts [18].
   The platform’s citation capabilities also leave room for improvement. While MediaWiki has
long provided permanent links (using the oldid flag), these could be further enhanced. For
instance, we are contemplating the implementation of a feature that stores a snapshot of the
PDF on Zenodo for convenience. However, the issue of citing query results remains an open
research problem, posing a challenge for reproducibility in data-intensive mathematics.
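
   For reference, a permanent link of the kind mentioned above pins a page to one revision
through the oldid parameter. The Python sketch below builds such a link; the /w/index.php script
path is an assumption about the portal's configuration, and the revision number in the example
is invented.

```python
from urllib.parse import urlencode

# Assumption: the portal serves MediaWiki's index.php under /w/.
INDEX_PHP = "https://portal.mardi4nfdi.de/w/index.php"


def permalink(title: str, oldid: int) -> str:
    """Build a link to one specific revision of a page."""
    if oldid <= 0:
        raise ValueError("revision identifiers are positive integers")
    return INDEX_PHP + "?" + urlencode({"title": title, "oldid": oldid})


# The revision number 12345 is made up for this example.
print(permalink("Item:Q1799", 12345))
```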
   The road ahead is not without its obstacles, but we are committed to continuous improvement
and development. Our aim is to deliver a platform that not only adheres to the FAIR principles but
also becomes an integral part of the mathematical research data ecosystem, thereby facilitating
collaboration and advancement in mathematical research.


Acknowledgments
This work was funded by the MaRDI project [10] under DFG grant number 460135501. The text
of this manuscript was improved with several AI tools, including Grammarly and ChatGPT.


References
 [1] [Software excerpt] M. Hensel and M. van den Berg, mhchemParser. swhid:
     ⟨https://github.com/mhchem/mhchemParserswh:1:cnt:ddf2c75241bf79a175e655176b7808b07f7e9783;origin=https://github.com/mhchem/mhchemParser;visit=swh:1:snp:7ec88857e3585721a5525a67085731cde73bfb51;anchor=swh:1:rev:a4456994f67c21819affc40e415ce5b404df05ab;path=/src/mhchemParser.ts⟩.






 [2] [Software Release] D. Mietchen, F. Å. Nielsen, C. MacKenzie, E. Willighagen, A. Rutz, and
     J. Poelen, Scholia. swhid:
     ⟨swh:1:dir:a90e241734badc22ae4154488d320b1087fd4263;origin=https://github.com/WDscholia/scholia;visit=swh:1:snp:475bacef8c802eefdad22c940284b89767fa8e37;anchor=swh:1:rev:6d14811f1a52ec68c2b359f64807b306fd82d257⟩.
 [3] [Software Release] D. Mietchen and M. Schubotz, MaRDI Fork of Scholia. swhid:
     ⟨swh:1:dir:b19ebb2df2a5a706b9ed0123e2e4c9847bb3b802;origin=https://github.com/MaRDI4NFDI/scholia;visit=swh:1:snp:293cd1547e8700fb806efabd54a8fa272538c061;anchor=swh:1:rev:cd9083745656c676edb1c81f8655d5d7a333a5b3⟩.
 [4] [Software Release] M. Schubotz, mediawiki-docker. swhid:
     ⟨swh:1:dir:c574bf0d9641a95d65cb2eb676b9cfb161781b60;origin=https://github.com/physikerwelt/mediawiki-docker;visit=swh:1:snp:1996cb919e55391cc057329bf311c28a5f3be4e7;anchor=swh:1:rev:56e6f1f27d4d1bde6fe2cdfd8f004c15636d6a29⟩.
 [5] [Software Module] J. Stegmüller and M. Schubotz, Extension:Math. lic: GPL. swhid:
     ⟨swh:1:dir:96973202c47de62339be4e20031446171c0e78eb;origin=https://github.com/wikimedia/mediawiki-extensions-Math;visit=swh:1:snp:5615bfab6c6038fb784d9b86419783bfbeefd826;anchor=swh:1:rev:73f2bd4e7008960b3e97e1de16f34e6a5816b8ac;path=/src/TexVC/⟩.
 [6] P. Benner, K. Lund, and J. Saak. “Towards a Benchmark Framework for Model Order
     Reduction in the Mathematical Research Data Initiative (MaRDI)”. In: CoRR abs/2307.00137
     (2023). doi: 10.48550/arXiv.2307.00137. arXiv: 2307.00137.
 [7] A. Greiner-Petter, M. Schubotz, C. Breitinger, P. Scharpf, A. Aizawa, and B. Gipp. “Do
     the Math: Making Mathematics in Wikipedia Computable”. In: IEEE Trans. Pattern Anal.
     Mach. Intell. 45.4 (Apr. 2023), pages 4384–4395. issn: 0162-8828.
     doi: 10.1109/TPAMI.2022.3195261.
 [8] [Software excerpt] M. Schubotz, A. Ortiz, J. Stegmüller, L. Pusch, E. Ferrer, and D. Nolte,
     Docker-compose repository for MaRDI 2023. swhid:
     ⟨swh:1:cnt:0a0d351f48352fa08626eb293ed856f036636caa;origin=https://github.com/MaRDI4NFDI/portal-compose;visit=swh:1:snp:476d7e7e164fe7f054feffe57b4df4b98326af2e;anchor=swh:1:rev:2e1f77a314a041c736edaa37853b63f54274d915;path=/docker-compose.yml;lines=24-508⟩.
 [9] J. Stegmüller. “mhchemParser in PHP”. In: Zenodo, June 2023.
     doi: 10.5281/zenodo.8189613.
[10]   M. Hintermüller. “MaRDI: Mathematical Research Data Initiative Proposal”. In: (May
       2022). doi: 10.5281/ZENODO.6552435.
[11]   R. Shigapov. Knowledge graphs in BERD and in NFDI. English. 2022.
       doi: 10.5281/ZENODO.7373258.
[12]   J. Stegmüller and M. Schubotz. “Formula validation in MediaWiki with TexVC”. In: Zenodo,
       Nov. 2022. doi: 10.5281/zenodo.7305299.
[13]   M. Hensel. “The mhchem Bundle”. In: available via CTAN, e.g., at the site
       www.dante.de/cgi-bin/ctan-index (2021).




[14]   P. Scharpf, M. Schubotz, and B. Gipp. “Mathematics in Wikidata”. In: Proceedings of the
       2nd Wikidata Workshop (Wikidata 2021) co-located with the 20th International Semantic
       Web Conference (ISWC 2021), Virtual Conference, October 24, 2021. Edited by L. Kaffee,
       S. Razniewski, and A. Hogan. Volume 2982. CEUR Workshop Proceedings. CEUR-WS.org,
       2021.
[15]   M. Schubotz, A. Greiner-Petter, N. Meuschke, O. Teschke, and B. Gipp. “Mathematical
       Formulae in Wikimedia Projects 2020”. In: JCDL ’20: Proceedings of the ACM/IEEE Joint
       Conference on Digital Libraries in 2020, Virtual Event, China, August 1-5, 2020. Edited by
       R. Huang, D. Wu, G. Marchionini, D. He, S. J. Cunningham, and P. Hansen. ACM, 2020,
       pages 447–448. doi: 10.1145/3383583.3398557.
[16]   K. Hulek, F. Müller, M. Schubotz, and O. Teschke. “Mathematical Research Data – An
       Analysis Through zbMATH References”. In: EMS Newsletter 2019–9.113 (Aug. 2019),
       pages 54–57. issn: 1027-488X. doi: 10.4171/NEWS/113/14.
[17]   P. Scharpf, M. Schubotz, and B. Gipp. “Representing Mathematical Formulae in Content
       MathML Using Wikidata”. In: Proceedings of the 3rd Joint Workshop on
       Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital
       Libraries (BIRNDL 2018) Co-Located with the 41st International ACM SIGIR Conference on
       Research and Development in Information Retrieval (SIGIR 2018). Edited by P. Mayr, M. K.
       Chandrasekaran, and K. Jaidka. Volume 2132. CEUR Workshop Proceedings. CEUR-WS.org,
       2018, pages 46–59.
[18]   M. Schubotz, P. Scharpf, K. Dudhat, Y. Nagar, F. Hamborg, and B. Gipp. “Introducing
       MathQA - a Math-Aware Question Answering System”. In: Proceedings of the Annual
       International ACM/IEEE Joint Conference on Digital Libraries (JCDL), Workshop on Knowledge
       Discovery. Fort Worth, USA, June 2018. doi: 10.1108/idd-06-2018-0022.
[19]   F. Å. Nielsen, D. Mietchen, and E. Willighagen. “Scholia and scientometrics with Wikidata”.
       In: Scientometrics 2017. Nov. 2017, pages 237–259. doi: 10.1007/978-3-319-70407-4_36.
[20]   M. Schubotz. Augmenting Mathematical Formulae for More Effective Querying & Efficient
       Presentation. Epubli Verlag, Berlin, 2017. isbn: 978-3-7450-6208-3.
       doi: 10.14279/depositonce-6034.
[21]   F. Müller and O. Teschke. “Full text formula search in zbMATH”. In: Eur. Math. Soc. Newsl
       102 (2016), page 51.
[22]   M. Schubotz and A. P. Sexton. “A Smooth Transition to Modern mathoid-based Math
       Rendering in Wikipedia with Automatic Visual Regression Testing”. In: Joint Proceedings
       of the FM4M, MathUI, and ThEdu Workshops, Doctoral Program, and Work in Progress at the
       Conference on Intelligent Computer Mathematics 2016 co-located with the 9th Conference on
       Intelligent Computer Mathematics (CICM 2016), Bialystok, Poland, July 25-29, 2016. Edited
       by A. Kohlhase, P. Libbrecht, B. R. Miller, A. Naumowicz, W. Neuper, P. Quaresma, F. W.
       Tompa, and M. Suda. Volume 1785. CEUR Workshop Proceedings. CEUR-WS.org, 2016,
       pages 132–145.




[23]   M. D. Wilkinson, M. Dumontier, I. J. Aalbersberg, G. Appleton, M. Axton, A. Baak, N.
       Blomberg, J.-W. Boiten, L. B. Da Silva Santos, P. E. Bourne, J. Bouwman, A. J. Brookes,
       T. Clark, M. Crosas, I. Dillo, O. Dumon, S. Edmunds, C. T. Evelo, R. Finkers,
       A. Gonzalez-Beltran, A. J. Gray, P. Groth, C. Goble, J. S. Grethe, J. Heringa, P. A. ’T Hoen, R. Hooft,
       T. Kuhn, R. Kok, J. Kok, S. J. Lusher, M. E. Martone, A. Mons, A. L. Packer, B. Persson,
       P. Rocca-Serra, M. Roos, R. Van Schaik, S.-A. Sansone, E. Schultes, T. Sengstag, T. Slater,
       G. Strawn, M. A. Swertz, M. Thompson, J. Van Der Lei, E. Van Mulligen, J. Velterop, A.
       Waagmeester, P. Wittenburg, K. Wolstencroft, J. Zhao, and B. Mons. “The FAIR Guiding
       Principles for Scientific Data Management and Stewardship”. In: Scientific Data 3.1 (Mar.
       2016), page 160018. issn: 2052-4463. doi: 10.1038/sdata.2016.18.
[24]   H. S. Cohl, M. Schubotz, M. A. McClain, B. V. Saunders, C. Y. Zou, A. S. Mohammed, and
       A. A. Danoff. “Growing the Digital Repository of Mathematical Formulae with Generic
       Sources”. In: Intelligent Computer Mathematics, Lecture Notes in Artificial Intelligence 9150.
       Edited by M. Kerber, J. Carette, C. Kaliszyk, F. Rabe, and V. Sorge. Volume 9150. LNCS.
       Springer, 2015, pages 280–287. doi: 10.1007/978-3-319-20615-8_18.
[25]   H. S. Cohl, M. A. McClain, B. V. Saunders, M. Schubotz, and J. C. Williams. “Digital
       Repository of Mathematical Formulae”. In: Intelligent Computer Mathematics - International
       Conference, CICM 2014, Coimbra, Portugal, July 7-11, 2014. Proceedings. Edited by S. M.
       Watt, J. H. Davenport, A. P. Sexton, P. Sojka, and J. Urban. Volume 8543. Lecture Notes in
       Computer Science. Springer, 2014, pages 419–422. doi: 10.1007/978-3-319-08434-3_30.
[26]   M. Schubotz and G. Wicke. “Mathoid: Robust, scalable, fast and accessible math rendering
       for wikipedia”. In: Intelligent Computer Mathematics: International Conference, CICM 2014,
       Coimbra, Portugal, July 7-11, 2014. Proceedings. Springer. 2014, pages 224–235.
[27]   Schubotz and Obrovac. LaTeX formulae from English Wikipedia. Version 1.0.0. Zenodo,
       Nov. 2014. doi: 10.5281/zenodo.7494266.
[28]   M. Schubotz. “Making Math Searchable in Wikipedia”. In: arXiv:1304.5475 (Apr. 2013).
       doi: 10.48550/arXiv.1304.5475. arXiv: 1304.5475 [cs].
[29]   D. Ginev, H. Stammerjohanns, B. Miller, and M. Kohlhase. “The LaTeXML Daemon:
       Editable Math on the Collaborative Web”. en. In: 6824. Proceedings of the Conference on
       Intelligent Computer Mathematics, Bertinoro, IT, Dec. 2011.
       doi: 10.1007/978-3-642-22673-1_25.



