<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Knowledge Acquisition of Metadata on AI Progress</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>Zhiyu Chen*</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Mohamed Trabelsi*</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Brian D. Davison</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jeff Heflin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lehigh University</institution>
          ,
          <addr-line>Bethlehem, PA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
<p>We propose an ontology to help AI researchers keep track of scholarly progress on AI-related tasks such as natural language processing and computer vision. We first define the core entities and relations in the proposed Machine Learning Progress Ontology (MLPO). We then describe how to use natural language processing techniques to construct a Machine Learning Progress Knowledge Base (MPKB) that can support various downstream tasks.</p>
      </abstract>
      <kwd-group>
        <kwd>dataset search</kwd>
        <kwd>information extraction</kwd>
        <kwd>ontology</kwd>
        <kwd>knowledge base</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
In recent years, there has been a significant increase in the number of published
papers on AI-related tasks, leading to the introduction of new tasks, datasets, and
methods. Despite progress in scholarly search engines, it remains challenging to connect
previous technologies with new work. Researchers from the semantic web community
have noted the importance of organizing scholarly data from large collections of
papers with tools like the Computer Science Ontology [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Natural language processing
researchers have proposed methods to extract information from research articles for
better literature review [
        <xref ref-type="bibr" rid="ref8">8</xref>
]. Unlike previous work, which focuses on extracting
paper metadata and key insights, we propose to design an ontology and knowledge
base for better evaluation of AI research. Papers With Code¹ is a website that charts
the progress of machine learning models on various tasks and benchmarks.
These charts can help researchers identify the literature related to their
work and select appropriate baselines to compare against. Although manually
updating this leaderboard may keep it accurate, doing so will become increasingly difficult and
time-consuming as the number of published papers grows.
      </p>
      <p>
        Knowledge extraction from research papers has been studied by the information
extraction (IE) community for years. Hou et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
] extract ⟨Task, Dataset, Metric, Score⟩
tuples from a paper, where the paper content is extracted from PDF files. In their
two-stage extraction framework, they first extract ⟨Task, Dataset, Metric⟩ tuples, and then
for each tuple they separately extract ⟨Dataset, Metric, Score⟩ tuples. Kardas et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
* equal contribution
      </p>
      <p>
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0).
¹ https://paperswithcode.com/
specifically focus on extracting results from tables by taking advantage of the available
LaTeX source code of papers. Work developed in parallel to ours was proposed by Jain et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
which uses the data from Papers With Code as a distant supervision signal and
introduces a new document-level IE dataset for extracting scientific entities from papers.
Our work is complementary to AI-KG [
        <xref ref-type="bibr" rid="ref2">2</xref>
] which takes the abstract of a paper as input;
we also consider the other sections and tables of a paper, where the evaluation scores of
different metrics typically occur. Our ontology can be considered the front end of a
knowledge system that organizes all the knowledge extracted by different backend
IE tasks.
      </p>
<p>In this paper, we first introduce the Machine Learning Progress Ontology (MLPO),
which defines the core entities and relations useful for tracking the progress of the AI
literature. Then, we propose to construct the Machine Learning Progress Knowledge Base
(MPKB) from a paper corpus using information extraction techniques. The ontology
definition and the knowledge-construction pipeline are available online².</p>
    </sec>
    <sec id="sec-2">
<title>2 Machine Learning Progress Ontology</title>
      <p>As shown in Figure 1, the MLPO focuses on the results of machine learning
experiments, which differentiates it from prior work. This ontology defines five core classes:
Task, Dataset, Result, Model and Paper. To support proper citation of results, it also
includes general properties such as Venue, Author and Title, which have already been
defined in the BIBO ontology³. In total, MLPO has 22 classes, 18 object properties and
24 data properties.</p>
      <p>[Figure 1: The core classes of MLPO (Paper, Task, Dataset, Model, Result) and the properties connecting them: mlp:propose, mlp:solvedBy, mlp:onTask, mlp:reportResultsFrom, mlp:testOnDataset, mlp:testOnModel, and mlp:testOnMetric with range xsd:decimal.]</p>
      <p>It is important to note that the Result class connects to all of the other core classes. From
a single paper, we can extract multiple Result individuals, and each Result individual
records the dataset used, the model used, the target task, and the reported
evaluation score. For the Task class, we create subclasses representing different AI tasks
(e.g., natural language processing tasks). We create various data properties for evaluation
metrics, each with its own range constraint. For example, the range of the data property
“TestOnEM”, which represents the exact-match metric, is a decimal, as shown in
Figure 2. We used WebProtégé⁴ to develop our ontology, and an example of extracted
individuals is shown in Figure 2.
² https://github.com/Zhiyu-Chen/Machine-Learning-Progress-Ontology
³ https://www.dublincore.org/specifications/bibo/</p>
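      <p>To make the class and property names above concrete, the fragment below sketches one hypothetical Result individual in Turtle syntax. The property names follow Figure 1, while the individual names and the score value are illustrative; the actual IRIs are those of the ontology published online.</p>
      <preformat>
```turtle
# Assumes prefix declarations binding mlp: to the MLPO namespace,
# xsd: to the XML Schema datatypes, and : to a data namespace.
:result1 a mlp:Result ;                          # one extracted Result individual
    mlp:onTask :machine_reading_comprehension ;  # the target task
    mlp:testOnDataset :squad2_dev ;              # the dataset used
    mlp:testOnModel :some_model ;                # the model used (hypothetical name)
    mlp:reportResultsFrom :paper1 ;              # the paper reporting the score
    mlp:testOnEM "86.8"^^xsd:decimal .           # exact-match score (illustrative value)
```
      </preformat>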
    </sec>
    <sec id="sec-3">
<title>3 Knowledge Base Construction</title>
      <p>Constructing the Machine Learning Progress Knowledge Base (MPKB) involves two
tasks: scientific entity recognition (SER) and relation classification (RC). For the SER
task, we identify the core entities in a paper: datasets, tasks and metrics. For
the RC task, for simplicity, we only show how to identify two relations in a
paper: whether a dataset is used for a task, and whether it is evaluated with a metric. For the example
in Figure 2, we would like to know whether “SQUAD2 dev” is used for the task of
“machine reading comprehension” and is evaluated with “TestOnEM”. We believe these
methods can also be applied to recognize other entities and relations. We leave
extracting all the relations defined in MLPO to future work.</p>
      <p>
3.1 Scientific Entity Extraction
We treat entity extraction as a sequence tagging problem. One challenge is that we only
have document-level rather than sequence-level annotations. As a solution, we use fuzzy
matching to find the entity spans in a paper. Given the text of a paper, we first use spaCy⁵
to find the noun phrases. Then we match the noun phrases against pre-curated entity names
using a similarity measure based on Levenshtein distance⁶. For tasks and metrics, we
set the similarity threshold to 0.5. For datasets, we set the matching threshold to 1 (i.e.,
exact match). If the fuzzy-matching similarity between a noun phrase and an entity
name is larger than the corresponding threshold, then we annotate the noun phrase as
the target entity.
⁴ https://webprotege.stanford.edu/
⁵ https://spacy.io/
⁶ https://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/
We also designed a tagging schema similar to BILOU [
        <xref ref-type="bibr" rid="ref9">9</xref>
]. For section
titles in the paper, we annotate every token as being at the first, middle or last position.
For every sentence in each section, we tag each word as being at the first, middle or last position
of the sentence. For tokens belonging to an entity in a sentence, we tag them with the
corresponding entity types. Based on the paper text and the annotated tags, we train a
BiLSTM-CRF model [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to predict the tags of test data.
      </p>
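      <p>As a concrete sketch of the fuzzy-matching step above, the snippet below scores candidate noun phrases against a small curated entity list with the per-type thresholds described in the text. It uses Python's standard-library difflib as a stand-in for the Levenshtein-based matcher, and the curated names are illustrative, not the actual lists.</p>
      <preformat>
```python
from difflib import SequenceMatcher

# Per-type similarity thresholds from the paper: 0.5 for tasks and
# metrics, 1.0 (i.e., exact match) for datasets.
THRESHOLDS = {"Task": 0.5, "Metric": 0.5, "Dataset": 1.0}

# Illustrative curated entity names, standing in for the real lists.
CURATED = {
    "Task": ["machine reading comprehension"],
    "Metric": ["exact match"],
    "Dataset": ["SQuAD 2.0 dev"],
}

def similarity(a, b):
    """Normalized string similarity in [0, 1]; difflib's ratio is a
    standard-library stand-in for a Levenshtein-based score."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def annotate(noun_phrase):
    """Return (entity_type, curated_name) pairs whose similarity to the
    noun phrase reaches the type-specific threshold."""
    hits = []
    for etype, names in CURATED.items():
        for name in names:
            if similarity(noun_phrase, name) >= THRESHOLDS[etype]:
                hits.append((etype, name))
    return hits
```
      </preformat>
      <p>With these thresholds, an exact dataset mention clears the threshold of 1.0, while a near-miss such as a truncated dataset name does not.</p>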
      <p>
3.2 Relation Classification
We use an information retrieval method for relation classification. To construct the
query q, we concatenate the text of a result tuple ⟨Task, Dataset, Metric⟩. We select the
first 100 tokens from each section of a paper as its text representation Tp. Finally, we
match the two inputs with a neural ranking model. In particular, we use Conv-KNRM
[
        <xref ref-type="bibr" rid="ref1">1</xref>
] to predict the binary relevance score of a triple-paper pair:

label = ConvKNRM(q, Tp)    (1)

The label is equal to 1 if the triple is relevant to the paper, and 0 otherwise. We choose
Conv-KNRM in this paper because it is efficient; a state-of-the-art model
like BERT [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] can also be used as in Hou et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
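      <p>The query construction and section truncation above can be sketched as follows. The relevance_label function is a deliberately trivial lexical-overlap stand-in for the learned Conv-KNRM scorer, with an illustrative threshold; only build_query and paper_text mirror steps described in the text.</p>
      <preformat>
```python
def build_query(task, dataset, metric):
    """Concatenate the textual fields of a (Task, Dataset, Metric)
    result tuple into a single query string q."""
    return " ".join([task, dataset, metric])

def paper_text(sections, tokens_per_section=100):
    """Build the paper representation Tp from the first 100 tokens
    of each section, as described above."""
    kept = []
    for sec in sections:
        kept.extend(sec.split()[:tokens_per_section])
    return " ".join(kept)

def relevance_label(query, tp, threshold=0.5):
    """Toy stand-in for Conv-KNRM: label 1 if enough query tokens
    occur in the paper text, else 0; the real system learns this
    decision with a neural ranking model."""
    q_tokens = set(query.lower().split())
    p_tokens = set(tp.lower().split())
    if not q_tokens:
        return 0
    overlap = len(q_tokens.intersection(p_tokens)) / len(q_tokens)
    return int(overlap >= threshold)
```
      </preformat>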
    </sec>
    <sec id="sec-4">
<title>4 Experiments and Evaluation</title>
      <p>
        We randomly divided the paper collection of the NLP-TDMS dataset [
        <xref ref-type="bibr" rid="ref4">4</xref>
] into training
(80%) and testing (20%) sets. For the BiLSTM-CRF model, we set the embedding
dimension to 100 and use a BiLSTM with 2 layers. When training relation classification,
we create k positive result tuple-paper pairs (one for each tuple used to annotate the
paper) and n − k negative pairs, where n is the total number of result tuples in the ground
truth. This results in many more negative samples than positive samples: 94% of result
tuple-paper pairs are negative. To address this imbalance, we oversample the positive
class by creating 20 copies of each positive sample.
      </p>
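      <p>The pair construction and oversampling just described can be sketched as follows; the function name and argument layout are illustrative, not taken from the paper's code.</p>
      <preformat>
```python
def build_training_pairs(paper_id, annotated, all_tuples, pos_copies=20):
    """Create labeled (result tuple, paper) pairs for one paper.

    annotated holds the k tuples used to annotate the paper (positives);
    all_tuples holds all n ground-truth result tuples, so the remaining
    n - k tuples become negatives. Positives are oversampled pos_copies
    times (20 in the paper) to offset the class imbalance."""
    pairs = []
    for t in all_tuples:
        if t in annotated:
            pairs.extend([(t, paper_id, 1)] * pos_copies)
        else:
            pairs.append((t, paper_id, 0))
    return pairs
```
      </preformat>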
      <p>From the result tables, we can see that among the entity types, Task is the
easiest to recognize. Dataset has higher precision but lower recall than Metric.
Such variance may indicate that tasks follow more recognizable patterns in papers
than the other entity types, making the predicted sequence tagging more accurate.
Conv-KNRM achieves high scores on all the evaluation metrics when predicting
irrelevant paper-triple pairs. The most challenging part for the neural network is capturing
the semantic similarity between paper content and the ⟨Task, Dataset, Metric⟩ triple for
positive pairs.</p>
      <p>[Table: entity recognition results per tag (Task, Dataset, Metric).]</p>
      <sec id="sec-4-1">
        <title>Relation classification results by paper-triple label</title>
        <p>Irrelevant (0): Precision 0.93, Recall 0.99, F1 0.96.
Relevant (1): Precision 0.98, Recall 0.51, F1 0.67.</p>
      </sec>
    </sec>
    <sec id="sec-5">
<title>5 Conclusion</title>
      <p>
        We have proposed an ontology specifically designed for progress tracking of AI tasks.
We also proposed methods to extract information from papers to construct a
knowledge base for AI evaluation. The resulting knowledge graph can be used for various
downstream tasks. For example, we can ask the system to return the top-k text
classification models ranked by accuracy on the Yelp reviews dataset [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] by constructing the
corresponding SPARQL query. Combined with methods of document summarization,
we may be able to automatically generate a survey paper for a given task.
      </p>
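      <p>Such a query might look like the following sketch; mlp:onTask, mlp:testOnDataset and mlp:testOnModel follow the property names in Figure 1, while the individual IRIs (e.g. :text_classification, :yelp_reviews) and the mlp:testOnAccuracy property are illustrative assumptions.</p>
      <preformat>
```sparql
# Assumes PREFIX declarations binding mlp: to the MLPO namespace
# and : to the data namespace of the knowledge base.
# Top-5 models by accuracy on the Yelp reviews dataset (illustrative IRIs).
SELECT ?model ?acc
WHERE {
  ?r a mlp:Result ;
     mlp:onTask :text_classification ;
     mlp:testOnDataset :yelp_reviews ;
     mlp:testOnModel ?model ;
     mlp:testOnAccuracy ?acc .
}
ORDER BY DESC(?acc)
LIMIT 5
```
      </preformat>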
      <sec id="sec-5-1">
        <title>Acknowledgment</title>
        <p>This material is based upon work supported by the National Science Foundation under
Grant No. IIS-1816325.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Dai</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xiong</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Callan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
<article-title>Convolutional neural networks for soft-matching n-grams in ad-hoc search</article-title>
          .
          <source>In: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining</source>
          . p.
          <fpage>126</fpage>
          -
          <lpage>134</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name><surname>Dessì</surname>, <given-names>D.</given-names></string-name>,
          <string-name><surname>Osborne</surname>, <given-names>F.</given-names></string-name>,
          <string-name><surname>Recupero</surname>, <given-names>D.R.</given-names></string-name>,
          <string-name><surname>Buscaldi</surname>, <given-names>D.</given-names></string-name>,
          <string-name><surname>Motta</surname>, <given-names>E.</given-names></string-name>,
          <string-name><surname>Sack</surname>, <given-names>H.</given-names></string-name>:
          <article-title>AI-KG: an automatically generated knowledge graph of artificial intelligence</article-title>
          . In: International Semantic Web Conference. Springer (In Press) (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          . pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . Association for Computational Linguistics, Minneapolis, Minnesota (Jun
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Hou</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jochim</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gleize</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonin</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ganguly</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Identification of tasks, datasets, evaluation metrics, and numeric scores for scientific leaderboards construction</article-title>
          .
          <source>In: 57th ACL</source>
          . pp.
          <fpage>5203</fpage>
          -
          <lpage>5213</lpage>
          (
          <year>Jul 2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
<article-title>Bidirectional LSTM-CRF models for sequence tagging</article-title>
          .
          <source>arXiv preprint arXiv:1508.01991</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name><surname>Jain</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>van Zuylen</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Hajishirzi</surname>, <given-names>H.</given-names></string-name>,
          <string-name><surname>Beltagy</surname>, <given-names>I.</given-names></string-name>:
          <article-title>SciREX: A challenge dataset for document-level information extraction</article-title>
          .
          <source>In: Proc. 58th Annual Meeting of the Association for Computational Linguistics</source>
          . pp.
          <fpage>7506</fpage>
          -
          <lpage>7516</lpage>
          . Online (Jul
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kardas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Czapla</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stenetorp</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruder</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riedel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name><surname>Taylor</surname>, <given-names>R.</given-names></string-name>,
          <string-name><surname>Stojnic</surname>, <given-names>R.</given-names></string-name>:
          <article-title>AxCell: Automatic extraction of results from machine learning papers</article-title>
          . <source>arXiv preprint arXiv:2004.14356</source> (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Nasar</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jaffry</surname>
            ,
            <given-names>S.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malik</surname>
            ,
            <given-names>M.K.</given-names>
          </string-name>
          :
          <article-title>Information extraction from scientific articles: a survey</article-title>
          .
          <source>Scientometrics</source>
          <volume>117</volume>
          (
          <issue>3</issue>
          ),
          <fpage>1931</fpage>
          -
          <lpage>1990</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Ratinov</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roth</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Design challenges and misconceptions in named entity recognition</article-title>
          .
          <source>In: Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)</source>
          . pp.
          <fpage>147</fpage>
          -
          <lpage>155</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Salatino</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thanapalasingam</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mannocci</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Osborne</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motta</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>The computer science ontology: a large-scale taxonomy of research areas</article-title>
          . In: International Semantic Web Conference. pp.
          <fpage>187</fpage>
          -
          <lpage>205</lpage>
          . Springer (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qin</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Document modeling with gated recurrent neural network for sentiment classification</article-title>
          .
          <source>In: Proceedings of the 2015 conference on empirical methods in natural language processing</source>
          . pp.
          <fpage>1422</fpage>
          -
          <lpage>1432</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>