=Paper= {{Paper |id=Vol-3066/paper4 |storemode=property |title=Analysis of Crossref Reports in Order to Improve the Quality of Metadata of Scientific Publications |pdfUrl=https://ceur-ws.org/Vol-3066/paper4.pdf |volume=Vol-3066 |authors=Aleksey Ermakov |dblpUrl=https://dblp.org/rec/conf/ssi/Ermakov21 }} ==Analysis of Crossref Reports in Order to Improve the Quality of Metadata of Scientific Publications== https://ceur-ws.org/Vol-3066/paper4.pdf
Analysis of Crossref Reports in Order to Improve the Quality
of Metadata of Scientific Publications
Aleksey Ermakov
Keldysh Institute of Applied Mathematics. Moscow, Russia

                 Abstract
                 Issues related to improving the quality of metadata of scientific publications placed in the
                 Crossref bibliographic database are considered. All information contained in metadata
                 obtained from publishers of scientific publications is analyzed by Crossref and displayed in
                 various reports. Analysis of these reports gives publishers an idea of the completeness and
                 correctness of the bibliographic data presented. The quality of metadata directly or indirectly
                 affects the number of views of and links to a publication and, consequently, the ratings of
                 scientific publications, authors and organizations.

                 Keywords
                 metadata of publications, Crossref reports, citations, ratings of scientific publications

1. Introduction
     The preparation of a scientific publication is inextricably linked with the topic and direction of
research in which the authors work. As a rule, publications continue the research team's work in this
direction and build on results obtained earlier either by the authors themselves or by their scientific
advisors or colleagues. This reliance on prior research results is accompanied by citation of the
relevant scientific materials. Correct citation is very important not only for the authors of a new
scientific article, but also for the authors of the cited articles, as well as for their scientific
publications, since it directly or indirectly affects the number of views of and links to a publication
and, consequently, the ratings of scientific publications, authors and organizations.
     Publication metadata is needed to make bibliographic data about the article itself, authors, etc.,
identifiable, public and accessible to others.
     All published scientific materials are assigned a DOI (Digital Object Identifier), a digital
identifier that provides a link (URL) to the permanent location of an object, or of information about it
(metadata), on the Internet.
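     The relationship between a DOI and its link can be sketched as follows. This is a minimal
illustration, assuming the public resolver address https://doi.org/; the function names are
hypothetical, and the sample DOI is the one given in reference [3].

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver address


def split_doi(doi):
    """Split a DOI into its prefix (registrant part) and suffix (item part)."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def doi_to_url(doi):
    """Build the persistent link for a DOI, percent-encoding unsafe characters."""
    prefix, suffix = split_doi(doi)
    return f"{DOI_RESOLVER}{prefix}/{quote(suffix, safe='/()')}"


print(doi_to_url("10.20948/abrau-2020-55"))
```

The link stays the same even if the publisher later moves the publication, because only the target
recorded for the DOI changes.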
      The overwhelming majority of authors have an ORCID, a unique identifier (ID) that an author of
scientific works obtains so that their results and written works can be attributed to them. The main
task of ORCID is to identify the researcher unambiguously in all bibliographic databases.
      The Crossref association [1], of which the Keldysh Institute of Applied Mathematics (KIAM) has been
a member since 2016, maintains a worldwide collaborative cross-citation service that acts as a gateway
between publishers' electronic platforms. This service does not store the full texts of scientific
publications; its database records the relationships between publications using DOI technology [2],
as well as the metadata of published scientific materials.
      The tools developed by Crossref (and by some other organizations such as Google Scholar, Scopus
and Web of Science, which use different sources for their citation data) make it easier for both the
author of a publication and its readers to find, cite, evaluate and reuse the results of scientific
research. In addition, the statistics of “reading” (visits to a publication), as well as back
citations [3] (links from publications that refer to the article), in one way or another affect the
rating of researchers and of the Institute as a whole. Therefore, it is important for KIAM, both as a
source of scientific materials and as a research center, to monitor the correctness of the metadata of
its scientific publications.

SSI-2021: Scientific Services & Internet, September 20–23, 2021, Moscow (online)
EMAIL: Ermakov@keldysh.ru (Alexey V. Ermakov);
ORCID: 0000-0002-6054-0813 (Alexey V. Ermakov)
              © 2021 Copyright for this paper by its authors.
              Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
              CEUR Workshop Proceedings (CEUR-WS.org)

To check the quality of metadata, Crossref has developed a fairly large set of tools that help publishers
evaluate and improve their metadata.
    This paper reviews reports that any publisher or interested author can access on the Crossref website.
Analysis of the information presented in these reports allows us to qualitatively assess the completeness
of the content being uploaded related to published scientific materials and outline ways to improve the
infrastructure of scientific publications and tools for interaction with Crossref.

2. List of publications with metadata available for viewing
    Let's talk in detail about the Crossref reports using the example of some scientific publications of
KIAM and Kazan Federal University (KFU). Using the KIAM publications as an example, we will
show how a publisher can use Crossref reports to find errors and / or inaccuracies in publication
metadata and make the necessary corrections. Using the KFU publications as an example, we will show
how an interested author can discover inaccurate or incomplete metadata for a publication of interest.
The author can then contact the publisher, point out the inaccuracies and ask for them to be
corrected.
    Any scientific publishing house can study the Crossref reports in the same way, focusing on their
scientific publications.
    All organizations cooperating with Crossref are given the opportunity not only to see information
about their publications, but also to make the necessary adjustments.
    The information is presented in the form of three alphabetically sorted lists:
    • Journals
    https://www.crossref.org/06members/51depositor.html
    • Conference proceedings
    https://www.crossref.org/06members/51depositorCP.html
    • Books
    https://www.crossref.org/06members/51depositorB.html
    Fig. 1 shows a fragment of the alphabetical list of journal publishers worldwide, in which the
Keldysh Institute of Applied Mathematics is represented by three journals:
    • Mathematica Montisnigri (Mathematics of Montenegro);
    • Mathematicheskoe modelirovanie (Mathematical modeling);
    • Keldysh Institute Preprints.
    Fig. 2 shows a fragment of the alphabetical list of publishers of conference proceedings worldwide,
in which the Keldysh Institute of Applied Mathematics is represented by the materials of two
conferences [4]:
    • Futurity designing. Digital reality problems – 3 issues – 2018, 2019, 2020.
    • Scientific Services & Internet – 5 issues – 2016 and 2017, and then combined into a series of 2018,
2019 and 2020.
    Fig. 3 shows a fragment of the alphabetical list of journal publishers worldwide, in which Kazan
Federal University is represented by 11 journals. In Section 7 we will look at one of them in more
detail: the Russian Digital Libraries Journal [5].
    The various Crossref tools (for loading content, resolving a reference, etc.) use different
algorithms depending on the required balance between simplicity and complexity and between speed and
accuracy. In any case, the author or the publisher's staff member is advised to check independently
that bibliographic records were recognized correctly and that the resulting reference is accurate.




Figure 1: KIAM in the list of journal publishers




Figure 2: KIAM in the list of conference proceedings publishers




   Figure 3: KFU in the list of journal publishers

3. Depositor’s report
   Perhaps to emphasize that a publisher who uploads metadata to Crossref receives a certain benefit
and some free services, the authors of the project adopted banking terms: placing content is called a
deposit, the publisher is called a depositor, and so on.
   Depositor reports for each publisher are used to verify basic information about DOI registrations.
The reports are “tied” to the three key lists of publishers supported by Crossref: the list of journal
publishers, the list of conference proceedings publishers, and the list of monograph (book) publishers
(see Section 2). There are currently no depositor reports for other types of content, such as photos,
pictures, videos and audio files.
   The index page is updated weekly. Title-level reports are updated as the metadata is updated. It is
possible to obtain and analyze a report on all KIAM publications (journals, conference proceedings,
monographs), but we will consider one of the journals as an example.
   Selecting the publisher Keldysh Institute of Applied Mathematics in the list of journals (see
Fig. 1) and then selecting, for example, Mathematicheskoe modelirovanie, we obtain a detailed report
that gives, for each DOI, the owner prefix, the timestamp, the date of the last update of the record
and the number of citations of the publication according to Crossref data (see Fig. 4).




Figure 4: Fragment of the Depositor's Report for the journal “Mathematicheskoe modelirovanie”

4. Conflict report
   A DOI is a unique identifier, so there should always be exactly one DOI for each content item. A
publisher receives a conflict report if it has at least one DOI conflict.
   It is important to fix these conflicts as soon as possible because they can lead to problems in the
future. The presence of two different DOIs for the same content means that a researcher will not know
which one to cite, which risks distorting the citation count. In addition, the publisher may forget
that there are two DOIs for one object and update only one of them when the content changes. Anyone
who then uses the other DOI, which has not been updated, will follow a broken link. Therefore, bad
metadata should be eliminated quickly so that the problem does not arise.
   A conflict report shows where two (or more) DOIs were submitted with the same metadata, or
indicates that the publisher of a scientific publication might have duplicate DOIs when submitting
metadata to Crossref.
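   The idea of a conflict, several DOIs sharing the same metadata, can be sketched as follows. This
is only an illustration, not Crossref's actual matching algorithm: the record fields, the comparison
key and the sample DOIs are all assumptions made for the example.

```python
from collections import defaultdict


def find_conflicts(records):
    """Group DOIs by identical metadata and return the groups with 2+ DOIs."""
    by_metadata = defaultdict(list)
    for rec in records:
        # Hypothetical comparison key: journal, volume, first page, year, title.
        key = (rec["journal"].casefold(), rec["volume"], rec["first_page"],
               rec["year"], rec["title"].casefold())
        by_metadata[key].append(rec["doi"])
    return [sorted(dois) for dois in by_metadata.values() if len(dois) > 1]


# Made-up sample records: the first two differ only in the DOI.
sample = [
    {"doi": "10.5555/example-b", "journal": "Example Journal", "volume": "46",
     "first_page": "5", "year": "2019", "title": "A sample title"},
    {"doi": "10.5555/example-a", "journal": "Example Journal", "volume": "46",
     "first_page": "5", "year": "2019", "title": "A sample title"},
    {"doi": "10.5555/example-c", "journal": "Example Journal", "volume": "47",
     "first_page": "5", "year": "2020", "title": "Another title"},
]
print(find_conflicts(sample))
```

The sparser the deposited metadata, the more likely it is that two genuinely different items produce
the same key.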
   All DOI conflicts associated with journal articles, conference proceedings or monographs are
reported in a single Conflict Report on the Crossref website (Fig. 5). A publisher with active
conflicts receives a reminder by email every month.




Figure 5: The journal “Mathematica Montisnigri” in the list of DOI conflicts




    The DOI Conflict Report shown in Fig. 5 indicates 15 conflicts that arose when the metadata of the
journal “Mathematica Montisnigri” was deposited in Crossref. The first line of this report, with an
unspecified journal name and zero conflicts, is a small glitch on the part of the Crossref
programmers; we informed them about it and they fully agreed.
    We need to understand the cause of each conflict and try to correct it. Clicking on the name of
the journal gives a full report on all 15 conflicts. An example of one of them:
    Created: 2020-03-19 14:23:13.0
    ConfID: 5583368
    CauseID: 1465359938
    OtherID: 1462438643
    JT: Mathematica Montisnigri
    MD: Jokanović, 46 ,null,5,2019,A breaf survey on Armendariz and central Armendariz rings
    DOI: 10.20948/mathmon-2019-46-1(Journal) (5583368-N )
    DOI: 10.20948/mathmontis-2019-46-1(Journal)
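
    An entry of this kind is easy to process automatically. The sketch below parses the entry shown
above; the function name is hypothetical, and the assumption that the DOI ends at the first opening
parenthesis holds for this entry but not for DOIs that themselves contain parentheses.

```python
ENTRY = """\
Created: 2020-03-19 14:23:13.0
ConfID: 5583368
CauseID: 1465359938
OtherID: 1462438643
JT: Mathematica Montisnigri
MD: Jokanović, 46 ,null,5,2019,A breaf survey on Armendariz and central Armendariz rings
DOI: 10.20948/mathmon-2019-46-1(Journal) (5583368-N )
DOI: 10.20948/mathmontis-2019-46-1(Journal)
"""


def parse_conflict_entry(text):
    """Return the scalar fields as a dict and the conflicting DOIs as a list."""
    fields, dois = {}, []
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # skip lines without a "key: value" shape
        value = value.strip()
        if key == "DOI":
            # Drop the trailing "(Journal)" marker and the conflict flag.
            dois.append(value.split("(", 1)[0].strip())
        else:
            fields[key] = value
    return fields, dois


fields, dois = parse_conflict_entry(ENTRY)
print(fields["JT"], dois)
```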

   As it turned out, the attempt by the editorial staff of the journal to switch to new DOI identifiers
while continuing to use the old ones is unacceptable from the point of view of DOI ideology and
Crossref technology.
   As the example shows, the first article of volume 46 was uploaded twice with different identifiers
(DOIs). Yet each content item should always have only one DOI, since two different DOIs for the same
content can mislead readers of the publication. Publishers themselves can also get confused when
changing metadata and re-uploading content.
   Crossref offers 3 scenarios for correcting DOI conflicts:
   Scenario 1: You assigned two DOIs to different content items, but accidentally sent the same
metadata for both of them. In this case, one of the DOIs has incorrect metadata. If you re-upload the
corrected metadata for this DOI, the conflict will be resolved.
   Scenario 2: You assigned two DOIs to the same content item. In this case, you can resolve the
conflict by designating one of the DOIs as the primary and the other as an alias. The alias DOI will
be automatically redirected to the primary DOI, so it is enough to maintain only the primary one.
   Scenario 3: The two DOIs refer to different content items, but their metadata is so similar that a
conflict has been flagged. This happens when very little metadata is included for the items. It is
best to register additional metadata to resolve the conflict. Alternatively, you can accept the
conflict by removing the conflict status and marking it as resolved. This will not affect the metadata
or DOI records, but will remove the conflict from the conflict report.
   For us, the simplest solution would have been to remove the “invalid” DOI. But a DOI cannot be
deleted: this is a fundamental principle of the DOI system. Therefore, it was decided to follow the
second path proposed by Crossref and declare the “wrong” DOI an alias of the “correct” one using the
DOI administration subsystem (Fig. 6).




Figure 6: Change DOI status to resolve conflict




   After the status of the “incorrect” DOI is changed, any request to it is automatically redirected
to the link specified in the primary (“correct”) DOI.
   Having updated all “incorrect” DOIs in this way and changed their status to “alias”, we removed
the KIAM publications from the list of DOI conflicts.

5. Fields or missing metadata report
   The report on fields or missing metadata provides detailed information on the completeness of the
metadata. Like the Depositor's Report, it is “tied” to one of the three key lists of publishers
supported by Crossref: the list of journal publishers, the list of conference proceedings publishers
and the list of monograph (book) publishers (see Section 2).
   This report can be accessed by selecting the icon (green right arrow) next to the scientific publisher's
name in one of the above lists.
   It should be noted that the sets of metadata transmitted for journal articles, papers in conference
proceedings and monographs differ somewhat. For any scientific publication we transmit its title in
Russian and English, a list of authors (for each author, the first name and surname in Russian and
English, as well as the ORCID), the year of publication, and the page number or page range. For
journals and proceedings, the volume number and / or issue number are added. In addition, over the
past 2 years we have begun to upload abstracts and reference lists to Crossref, which is very
important for tracking mutual citation [3].
   Consider the information on our journals presented in the reports on fields or missing metadata
(Fig. 7).




Figure 7: Report on fields or missing metadata in KIAM journals

    The column headings of the table in Fig. 7 are listed below (fields that Crossref considers
incomplete or incorrectly filled in are marked in red):
    • Participation Title – title of the publication;
    • Ignore Fields – indicates which missing fields should be ignored;
    • V = volume – volume number;
    • I = issue – issue number;
    • P = page – number or range of pages;
    • A = author – author;
    • S = single-author – single author;
    • T = article title – the title of the article;
    • N = no-first-name – the author's first name is not specified;
    • F = first name initial only – only an initial is given as the author's first name;
    • U = missing iParadigms Url – since 2008 Crossref has partnered with iParadigms LLC to offer its
members, leading scientific and professional publishers, the ability to check originality using the
CrossCheck and iThenticate services. The CrossCheck database includes full-text journals from leading
academic publishers and is growing rapidly as publishers participating in Crossref subscribe to the
service.
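    The legend above can be turned into a small lookup table, for example when remarks from the report
are processed in bulk. In this sketch, the representation of a row's remarks as a string of column
codes (such as "VI") and the function name are assumptions; the report itself is an HTML table.

```python
# Column codes as listed in the legend above.
FIELD_FLAGS = {
    "V": "volume number",
    "I": "issue number",
    "P": "page number or range",
    "A": "author",
    "S": "single author",
    "T": "article title",
    "N": "author's first name not specified",
    "F": "first name given as initial only",
    "U": "missing iParadigms URL",
}


def describe_flags(flags, ignored=""):
    """Translate column codes into descriptions, skipping the ignored ones."""
    skip = set(ignored.upper())
    return [FIELD_FLAGS[code] for code in flags.upper()
            if code in FIELD_FLAGS and code not in skip]


print(describe_flags("VI"))               # remarks on volume and issue
print(describe_flags("VIS", ignored="I"))  # issue remarks switched off
```

The `ignored` argument mirrors the Ignore Fields column: codes listed there are dropped from the
result.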
    As we can see, with the report on fields or missing metadata Crossref emphasizes that, although
some bibliographic metadata is optional for content registration purposes, all publishers are strongly
recommended to register metadata that is as complete as possible for each item. The report even
highlights in red the fields for which, from the point of view of Crossref, the publisher clearly does
not follow these recommendations.



    Analyzing the information presented in Fig. 7, we note that no volume numbers are assigned for any
issue of the KIAM Preprints, and no issue numbers are assigned for any volume of the journal
“Mathematica Montisnigri”. For the journal “Mathematical Modeling”, the remarks concern not all issues
but only several volumes from the initial period of DOI assignment, when DOIs were assigned to our
journal by external organizations that were, moreover, incorrectly registered in Crossref.
    It should be noted that we ourselves assigned the DOIs to all articles of “Mathematica
Montisnigri” and can therefore make the necessary corrections and / or additions to the metadata
loaded into Crossref. The situation with “Mathematical Modeling” is more complicated: until 2020, as a
result of competitive tenders, the journal passed from hand to hand and received completely different
DOIs. The metadata deposited for its articles also differed greatly in completeness and correctness.
And now KIAM, as the journal's publisher, has neither the right nor the ability to correct or
supplement anything in the information on the 2016–2019 issues.
    Analyzing further Crossref's comments on registering publication metadata, we see marks in the
column for assigning a single author (S = single-author) and some others.
    Similar remarks apply to the metadata of conference proceedings and monographs. We carefully
examine these (and other) Crossref remarks and, where possible, try to eliminate them by making the
necessary edits to the metadata.
    However, the publisher of a particular journal can opt out of this scrutiny by Crossref by using
the Change switch (second column in Fig. 7, Ignore Fields) to specify which fields should be ignored
(Fig. 8).




Figure 8: Specifying which missing fields to ignore

6. DOI Scanner Report
    The DOI Scanner report is produced only for journals and, accordingly, is “tied” to the list of
journal publishers.
    The DOI Scanner samples articles from each journal of a particular publisher to check that the
registered DOIs resolve to the correct page. For each journal, a sample of DOIs is selected amounting
to approximately 5% of the journal's total number of DOIs (up to a maximum of 50 DOIs). Fig. 9 lists
all the journals published by KIAM; for each, the total number of DOIs and the date of the last scan
are indicated.
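    The sampling rule just described amounts to simple arithmetic, sketched below. The rounding mode
and the minimum of one DOI are assumptions, since the description only says “approximately 5%”; the
function name is hypothetical.

```python
def scanner_sample_size(total_dois, share=0.05, cap=50):
    """Expected sample size: about 5% of a journal's DOIs, at most 50."""
    if total_dois <= 0:
        return 0
    # Assumption: round to the nearest integer and sample at least one DOI.
    return min(cap, total_dois, max(1, round(total_dois * share)))


for total in (100, 877, 2000):
    print(total, scanner_sample_size(total))
```

Under these assumptions a title with 877 registered DOIs would be sampled at 44 DOIs, and any title
with 1000 or more DOIs at the cap of 50.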




Figure 9: Date of the last scan of the KIAM journals

   You can get access to the details of the crawler's work for a given journal by selecting a date in the
“Last Crawl date” column.




Figure 10: Date of the last scan of the KIAM journals

   As the scanner report in Fig. 10 shows, no errors were found for the KIAM Preprints; 10 DOIs were
scanned (and validated). These fields are “clickable”, so the results of the scanning procedure can be
viewed in more detail (Figs. 11, 12).




Figure 11: List of selected DOIs for Crawling




Figure 12: Checked and confirmed DOIs

    This function (DOI scanning) is, in our opinion, quite interesting and useful, especially for
journals that have changed their database server or administration, since such changes could introduce
serious errors into the procedure for depositing content and make publication materials inaccessible.
    However, the way Crossref has implemented this function seems a little strange and raises several
questions:
    • Why, for example, are only 10 items checked for the KIAM Preprints, which number 877 issues since
the beginning of DOI assignment, although the description of the algorithm speaks of about 5% of the
total number?
    • Why does the Re-Crawl use the same sample as the previous scan a few years ago? After all, the
publisher is interested in checking the correctness not only of old materials, but of all publications
of the journal in question.
    So far, we have not received an answer to these questions from Crossref Support, but we continue
to actively interact with it.
    Another important point is that Crossref is only one of the DOI registration agencies. Since DOI
registration agencies do not exchange metadata, publishers working with other registration services
will not be able to use the Crossref tools.

7. Participation report
   The authors of the project and the developers of Crossref urge publishers not only to deposit the
metadata of scientific publications, but to make it as complete as possible. In addition, Crossref
encourages the scientific and publishing community to make active use of the services it offers for
analyzing the completeness and correctness of the deposited information, and thereby to take part in
the development and expansion of these services.
   For each publisher working with Crossref, there is a separate Participation report that shows what
percentage of its deposited content includes each of ten key metadata elements. Participation reports
show where there are gaps and what can be improved in terms of metadata completeness.




Figure 13: Participation report for scientific publications KFU

    Fig. 13 shows the header of the Participation report for the publisher Kazan Federal University:
the total number of content items is 2187, comprising 1699 journal articles and 488 articles in
conference proceedings. At the bottom of the header (Fig. 13) there are 2 menus: the type of scientific
publication (left) and the period analyzed (all time, the current period covering the last 2 years, or
“old” materials deposited more than 2 years ago). The central field of the header allows you to enter
the name of a journal or proceedings, or even the title of a publication, and analyze the completeness
of the corresponding metadata.




Figure 14: Percentages of the Russian Digital Libraries Journal Participation Report



    The main part of the Participation report for the Russian Digital Libraries Journal is shown in
Fig. 14 (the current period, the last 2 years, is selected):
    • a reference list (References) is deposited for 0% of publications;
    • 0% of references are open (Open References), that is, available to all users of all Crossref
services (there are no open references, since no reference lists are deposited);
    • ORCID is indicated for 21% of authors;
    • for 0% of publications, the name and identifier (Funder Registry IDs) of the funder, that is, of
at least one of the organizations that funded the study, are indicated;
    • for 0% of publications, funding award numbers (Funding award numbers) are indicated;
    • 0% of the content uses the Crossmark service (Crossmark-enabled), which gives readers quick and
easy access to the current status of a content item (as part of the publisher's policy on corrections,
retractions, withdrawals and other updates);
    • 10% of the registered content contains Text-mining URLs, which support the automatic analysis and
extraction of information from a large number of documents. At the moment, most scientific
organizations in the world (including, it seems to us, KFU) are not interested in providing a special
set of instructions with which someone, for whatever reason, will analyze their scientific materials;
    • 0% of publication metadata contains URLs pointing to the license (License URLs), which defines
the conditions under which readers can access the content;
    • 10% of publication metadata includes URLs for similarity checking (Similarity Check URLs),
intended for publishers that cooperate with CrossCheck and iThenticate;
    • 22% of publication metadata includes abstracts (Abstracts), which give a deeper understanding of
the content of the work.
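    Percentages of this kind can also be estimated by the publisher before depositing. The sketch below
computes, for a list of metadata records, the share of records that contain a given element. The
record layout, the field names and the sample data are assumptions made for illustration; how Crossref
itself counts each indicator may differ.

```python
def coverage(records, has_element):
    """Percentage of records for which has_element(record) is true."""
    if not records:
        return 0
    hits = sum(1 for rec in records if has_element(rec))
    return round(100 * hits / len(records))


# Made-up sample: four records with hypothetical field names.
sample = [
    {"abstract": "text", "references": [], "license_url": None},
    {"abstract": "", "references": [], "license_url": None},
    {"abstract": "", "references": [], "license_url": None},
    {"abstract": "", "references": [], "license_url": None},
]

print("abstracts:", coverage(sample, lambda r: bool(r["abstract"])))
print("references:", coverage(sample, lambda r: bool(r["references"])))
print("license URLs:", coverage(sample, lambda r: bool(r["license_url"])))
```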
    In our opinion, one should not chase 100% on every indicator, but it should be clear that more
complete and accurate publication metadata in one way or another [6] affects the ratings of
publications, authors and organizations. And indicating the grants and foundations that support
scientific work has a positive effect on the relationship with those foundations.

8. References
[1] Association Crossref. URL: https://www.crossref.org/about/.
[2] International DOI Foundation (IDF). URL: https://www.doi.org/.
[3] A. V. Ermakov, Bibliograficheskaya ssylka kak instrument avtora i chitatelya // Nauchnyj servis
    v seti Internet: trudy XXII Vserossijskoj nauchnoj konferencii (21–25 sentyabrya 2020 g., onlajn).
    M.: IPM im. M.V. Keldysha, 2020. S. 268–275. https://doi.org/10.20948/abrau-2020-55
[4] M. I. Slepenkov, Materialy konferencij v onlajnovoj biblioteke IPM im. M.V. Keldysha. Preprinty
    IPM im. M.V. Keldysha. 2020. № 18. 16 s. https://doi.org/10.20948/prepr-2020-18
[5] Russian Digital Libraries Journal. URL: https://elbib.ru.
[6] T. A. Polilova, Infrastruktura nauchnyh publikacij // Preprinty IPM im. M.V. Keldysha. 2009.
    № 15. 30 s. URL: https://library.keldysh.ru/preprint.asp?id=2009-15



