<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Preface of the First International Biochemical Knowledge Extraction Challenge (BiKE)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Edgard Marx</string-name>
          <email>edgard.marx@htwk-leipzig.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marilia Valli</string-name>
          <email>marilia.valli@ifsc.usp.br</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joao da Silva e Silva</string-name>
          <email>jvictor.ssilva@ifsc.usp.br</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sanju Tiwari</string-name>
          <email>tiwarisanju18@ieee.org</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paulo do Carmo</string-name>
          <email>paulo.carmo@htwk-leipzig.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Leipzig University of Applied Sciences</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Sao Paulo University</institution>
          ,
          <country country="BR">Brazil</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universidad Autonoma de Tamaulipas</institution>
          ,
          <country country="MX">Mexico</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <abstract>
        <p>The knowledge accumulated in over 50 years of biodiversity studies published in scientific articles can become more easily accessible when organized and shared through knowledge graphs. Such graphs can assist the development of different fields of science and of bio-friendly products with high added value, and can guide public policies that benefit science and strengthen the bio-economy. However, to date, most of the structured biochemical information available on the Web is manually curated, and it is practically impossible to keep pace with the research constantly being published in scientific articles.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Challenge</title>
      <p>The BiKE challenge invited researchers to participate by reusing existing biochemical
extraction methods or designing new ones. The challenge consisted of extracting relevant information
from biochemical research articles and constructing a Biochemical Knowledge Graph (BKG)
according to a given ontology. Biochemical Knowledge Graphs are knowledge graphs containing
biological and chemical information from living organisms.</p>
    </sec>
    <sec id="sec-2">
      <title>Training &amp; Test Datasets</title>
      <p>The dataset used for training and evaluation was generated from hundreds of peer-reviewed
scientific articles with information on more than 2,521 possibilities of natural product extraction.
The dataset was built manually by chemistry specialists who read the articles and annotated the
relevant properties associated with each natural product discussed in the academic
publication. For this challenge, we focus on five NuBBE properties for training and prediction: (I)
compound name (rdfs:label), (II) bioactivity (nubbe:biologicalActivity), (III) species from which
the natural products were extracted (nubbe:collectionSpecie), (IV) collection site of these species
(nubbe:collectionSite), and (V) isolation type (nubbe:collectionType). The table below presents
an overview of the number of unique properties.</p>
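      <p>As an illustration of how the five properties map onto graph links, the following minimal Python sketch turns one annotated record into (subject, predicate, object) triples. The predicate names come from the list above; the dictionary keys, the choice of the paper as subject, the function name record_to_triples, and all record values are assumptions made for this example and are not taken from the NatUKE code.</p>
      <preformat>
```python
# Predicate names follow the five NuBBE properties named in the text.
# The dictionary keys on the left are hypothetical field names.
PROPERTY_TO_PREDICATE = {
    "name": "rdfs:label",
    "bioactivity": "nubbe:biologicalActivity",
    "species": "nubbe:collectionSpecie",
    "collection_site": "nubbe:collectionSite",
    "isolation_type": "nubbe:collectionType",
}


def record_to_triples(paper_id, record):
    """Turn one annotated record into (subject, predicate, object) triples.

    Assumption: the paper identifier is used as the subject of every link.
    Fields missing from the record are skipped.
    """
    triples = []
    for field, predicate in PROPERTY_TO_PREDICATE.items():
        value = record.get(field)
        if value is not None:
            triples.append((paper_id, predicate, value))
    return triples


# Placeholder values only; they do not come from the dataset.
example = record_to_triples(
    "paper-001",
    {"name": "compound-A", "species": "species-X", "isolation_type": "type-1"},
)
for triple in example:
    print(triple)
```
      </preformat>
      <p>In this sketch, a record that lacks a property simply yields fewer triples, which mirrors the fact that not every article reports all five characteristics.</p>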
      <p>All papers are present in all train splits, but the papers selected for each test split have all
links to manually extracted characteristics removed. As a result, these papers are no longer
connected to the rest of the knowledge graph. The provided code, which generates the knowledge
graph representation with Python’s networkx, reconnects them through topics extracted with
BERTopic. The assigned topics are also filtered by the following rule: a topic that
is present in more than 80% of the examples is eliminated, since it does not discriminate between
them. Part of the challenge was to find other ways to reconnect the knowledge graph
with automatically extracted characteristics, such as citation networks for the authors, conferences,
and others.</p>
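      <p>The topic filtering rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code shipped with the benchmark: the input format (a dictionary from paper identifier to a list of topic identifiers), the function names filter_topics and topic_edges, and the example data are hypothetical. Only the threshold of 80% comes from the text.</p>
      <preformat>
```python
from collections import Counter


def filter_topics(paper_topics, max_share=0.8):
    """Remove topics assigned to more than max_share of all papers.

    paper_topics maps a paper id to the list of topic ids assigned to it
    (hypothetical input format).
    """
    n_papers = len(paper_topics)
    # Count each topic once per paper.
    counts = Counter(t for topics in paper_topics.values() for t in set(topics))
    too_common = {t for t, c in counts.items() if c / n_papers > max_share}
    return {
        paper: [t for t in topics if t not in too_common]
        for paper, topics in paper_topics.items()
    }


def topic_edges(paper_topics):
    """List (paper, topic node) edges that reconnect papers through topics."""
    return [
        (paper, f"topic:{t}")
        for paper, topics in sorted(paper_topics.items())
        for t in topics
    ]


# Topic 0 appears in 5 of 5 papers and is dropped; topic 1 appears in
# 4 of 5 papers, which is not more than 80%, so it is kept.
paper_topics = {
    "p1": [0, 1],
    "p2": [0, 1],
    "p3": [0, 1],
    "p4": [0, 1, 2],
    "p5": [0],
}
filtered = filter_topics(paper_topics)
edges = topic_edges(filtered)
print(edges)
```
      </preformat>
      <p>The resulting edge list could then be loaded into a networkx graph, for example with Graph.add_edges_from. Note that a paper whose only topic is filtered out, such as p5 in this example, stays disconnected, which is one reason additional reconnection strategies are of interest.</p>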
      <p>
        The challenge provided the original flat data and the original networkx knowledge graph.
It also provided 10 previously randomized train/test splits, which contain the maintained
and the removed links, respectively. For every train/test split, we also provided a prepared networkx
knowledge graph. The source code and documentation for the benchmark, dubbed NatUKE
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is publicly available at https://github.com/AKSW/natuke.
      </p>
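      <p>For readers who want to reproduce a similar setup on their own data, the sketch below shows one way to build randomized splits in which the links of the selected test papers are hidden. It is an illustration only: the official splits are the ones distributed with the benchmark, and the function name make_splits, the test_fraction value, the seed, and the example links are assumptions that do not come from the challenge description.</p>
      <preformat>
```python
import random


def make_splits(paper_links, n_splits=10, test_fraction=0.2, seed=0):
    """Build n_splits randomized splits of paper links.

    paper_links maps a paper id to the list of values it is linked to.
    In each split, 'train' holds the maintained links and 'test' holds
    the links removed from the selected test papers.
    test_fraction is a hypothetical parameter, not a value from the text.
    """
    rng = random.Random(seed)
    papers = sorted(paper_links)
    n_test = max(1, round(len(papers) * test_fraction))
    splits = []
    for _ in range(n_splits):
        test_papers = set(rng.sample(papers, n_test))
        train = [(p, v) for p in papers if p not in test_papers for v in paper_links[p]]
        test = [(p, v) for p in papers if p in test_papers for v in paper_links[p]]
        splits.append({"train": train, "test": test})
    return splits


# Placeholder data: ten papers with one link each.
links = {f"paper-{i}": [f"value-{i}"] for i in range(10)}
splits = make_splits(links)
print(len(splits), len(splits[0]["train"]), len(splits[0]["test"]))
```
      </preformat>
      <p>Fixing the seed makes the splits repeatable, which matters when several methods are compared on the same hidden links.</p>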
    </sec>
    <sec id="sec-3">
      <title>Evaluation Metrics</title>
      <p>The challenge focused on correctly ranking predictions of the real links that were
hidden in the knowledge graph. Like MRR (Mean Reciprocal Rank), hits@k is a
ranking metric suited to cases in which there is only one correct document. In contrast, mAP (mean
Average Precision) and nDCG (normalized Discounted Cumulative Gain) are designed for
rankings in which a list of relevant documents is available. Hits@k was chosen because customizing
the k value allows each characteristic extraction to be evaluated with reasonable
expectations. Following the rule used in NatUKE, the final k values in this table range from 1 to
50, considering multiples of 5, and two thresholds: (1) a score equal to or higher than 0.50
is achieved; and (2) a score equal to or higher than 0.20 is achieved. Please refer to the NatUKE
benchmark paper for further details.</p>
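      <p>To make the two single-answer metrics concrete, the following Python sketch computes hits@k and MRR from ranked candidate lists. It uses the standard definitions of both metrics and is not the official evaluation script. The function names, the example rankings, and the grid K_GRID (one reading of "from 1 to 50, considering multiples of 5") are assumptions introduced for illustration.</p>
      <preformat>
```python
def hits_at_k(ranked_lists, targets, k):
    """Fraction of queries whose correct item appears in the top k."""
    hits = sum(
        1 for ranked, target in zip(ranked_lists, targets) if target in ranked[:k]
    )
    return hits / len(targets)


def mean_reciprocal_rank(ranked_lists, targets):
    """Average of 1/rank of the correct item; 0 when it is not ranked."""
    total = 0.0
    for ranked, target in zip(ranked_lists, targets):
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)
    return total / len(targets)


# Assumed grid of k values: 1, then the multiples of 5 up to 50.
K_GRID = [1] + list(range(5, 51, 5))

# Placeholder rankings for four queries whose correct answer is "a".
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"], ["x", "y", "z"]]
targets = ["a", "a", "a", "a"]

scores = {k: hits_at_k(rankings, targets, k) for k in K_GRID}
mrr = mean_reciprocal_rank(rankings, targets)
print(scores[1], scores[5], round(mrr, 6))
```
      </preformat>
      <p>In this example only the first query ranks the correct answer first, so hits@1 is 0.25, while hits@5 rises to 0.75 because three of the four queries contain the answer somewhere in their short lists. Scanning the scores over the grid is what allows a k to be reported at which a given threshold is reached.</p>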
    </sec>
    <sec id="sec-4">
      <title>Talks</title>
      <p>• Towards Natural Inspired Products from Biodiversity</p>
      <p>Edgard Marx, Leipzig University of Applied Sciences (HTWK), Germany</p>
      <p>• NatUKE: A Benchmark for Natural Product Knowledge Extraction from
Academic Literature</p>
      <p>Paulo Ricardo Viviurka do Carmo, Leipzig University of Applied Sciences (HTWK), Germany</p>
      <p>• The NuBBE Knowledge Graph: A Biochemical Knowledge Graph of Natural
Products from Brazilian Biodiversity</p>
      <p>István J. Mócsy, Leipzig University of Applied Sciences (HTWK), Germany</p>
    </sec>
    <sec id="sec-5">
      <title>Best Knowledge Extraction Awards</title>
      <p>The Best Extraction Method award was intended to recognize the top three competitors who
demonstrated exceptional ability, commitment, and a comprehensive grasp of
the ideas and procedures involved in extracting pertinent data from challenging biochemical
datasets. These participants demonstrated outstanding critical thinking, problem-solving skills,
and a thorough understanding of cutting-edge computational tools and methods. Each prize
came with a unique certificate of appreciation featuring the winner’s name and
workshop accomplishments, followed by a monetary reward. The three winners of the first
edition of the BiKE challenge were:</p>
      <p>1st place: BiKE Challenge: Result of ChemiScope by using ChatGPT</p>
      <p>Matthias Jooß, Jonas Gwozdz and Pit Fröhlich</p>
      <p>2nd place: Improving Natural Product Automatic Extraction with Named Entity Recognition</p>
      <p>Stefan Schmidt-Dichte and István J. Mócsy</p>
      <p>3rd place: Enhancing Biochemical Extraction with BFS-driven Knowledge Graph
Embedding approach</p>
      <p>Bhushan Zope, Sashikala Mishra and Sanju Tiwari</p>
    </sec>
    <sec id="sec-6">
      <title>General Chair</title>
    </sec>
    <sec id="sec-7">
      <title>Organizing Committee</title>
      <p>• Edgard Marx, Leipzig University of Applied Sciences (HTWK), Germany</p>
      <p>• Marilia Valli, Sao Paulo University (USP), Brazil</p>
      <p>• Joao Victor da Silva e Silva, Sao Paulo University (USP), Brazil</p>
      <p>• Sanju Tiwari, Universidad Autonoma de Tamaulipas (UAT), Mexico</p>
      <p>• Paulo Ricardo Viviurka do Carmo, Leipzig University of Applied Sciences (HTWK), Germany</p>
    </sec>
    <sec id="sec-8">
      <title>Advisory Committee</title>
      <p>• Vanderlan da Silva Bolzani, Sao Paulo State University (UNESP), Brazil</p>
      <p>• Adriano Defini Andricopulo, Sao Paulo University (USP), Brazil</p>
      <p>• Thomas Riechert, Leipzig University of Applied Sciences (HTWK), Germany</p>
      <p>• Alan Pilon, Sao Paulo University (USP), Brazil</p>
      <p>The editors would like to thank the advisory team, authors, program committee, and other
organizers for their constant support in making this event successful.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P. V.</given-names>
            <surname>Do Carmo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Marx</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Marcacini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Valli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. V.</given-names>
            <surname>Silva e Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pilon</surname>
          </string-name>
          ,
          <article-title>NatUKE: A Benchmark for Natural Product Knowledge Extraction from Academic Literature</article-title>
          , in:
          <source>2023 IEEE 17th International Conference on Semantic Computing (ICSC)</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>199</fpage>
          -
          <lpage>203</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1109/ICSC56153.2023.00039</pub-id>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>