<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Nara, Japan. * Corresponding author. Email: debayan.banerjee@leuphana.de (D. Banerjee); tilahun.tafa@leuphana.de (T. A. Tafa); ricardo.usbeck@leuphana.de (R. Usbeck)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>DBLPLink 2.0 - An Entity Linker for the DBLP Scholarly Knowledge Graph</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Debayan Banerjee</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tilahun Abedissa Tafa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ricardo Usbeck</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Leuphana University of Lüneburg</institution>
          ,
          <addr-line>Lüneburg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Hamburg</institution>
          ,
          <addr-line>Hamburg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>In this work, we present an entity linker for the 2025 version of DBLP's RDF-based knowledge graph. Compared to the 2022 version, DBLP now models publication venues as a new entity type called dblp:Stream. In the earlier version of DBLPLink, we trained KG embeddings and re-rankers on a dataset to produce entity links. In contrast, in this work we develop a zero-shot entity linker based on LLMs and a novel method, in which we re-rank candidate entities based on the log-probabilities of the "yes" token output at the penultimate layer of the LLM. The demo can be accessed at https://dblplink-2.skynet.coypu.org/.</p>
      </abstract>
      <kwd-group>
        <kwd>Entity Linker</kwd>
        <kwd>DBLP</kwd>
        <kwd>Knowledge Graphs</kwd>
        <kwd>LLM</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and Related Work</title>
      <p>around two major entity types: Creator and Publication. Subsequently, in June 2024, DBLP introduced<sup>2</sup>
a new entity type, dblp:Stream, which encompasses multiple sub-classes under the broad category of
publication venues, for example conferences, journals, series, and repositories.</p>
      <p>Our initial thought was to retrain DBLPLink 1.0 on the new KG and produce DBLPLink 2.0. However,
moving DBLPLink 1.0 to a new KG requires computing new KG embeddings for all entities, retraining the
entity label span detector, and re-training the re-ranker. In light of recent approaches using LLM-based
prompting and zero-shot methods, we decided to build a new architecture from scratch for DBLPLink 2.0.
DBLPLink 2.0 is able to link Person and Publication entity types as before, and additionally, can also link
Stream entity types. DBLPLink 2.0 can be accessed at https://dblplink-2.skynet.coypu.org/. The code
and data used to build this demo can be accessed at https://github.com/semantic-systems/dblplink-2.0.</p>
    </sec>
    <sec id="sec-2">
      <title>2. User Interface</title>
      <p>As seen in Figure 1, the web UI is divided into five elements. A presents a set of question templates
which may be clicked to fill the text box. B contains the text box where the user may
type an input text and click Submit to start the entity linking process. C displays a process log
which is dynamically updated from the backend, keeping the user informed of the current step being
executed. D is the results area, where the detected mention spans and their types are displayed first. Next, the
candidates fetched from the text search are displayed. Finally, the linked results are displayed under
"Final Linked Results". E is a carousel of sub-pages, which provides further information, such as how to
access the entity linker via an API call, more information about the backend entity linker architecture,
and details of how to contact the authors and maintainers.</p>
      <p><sup>2</sup> https://blog.dblp.org/2024/06/14/the-dblp-knowledge-graph-major-extension-and-an-update-to-the-rdf-schema/</p>
      <p>Further, as seen in Figure 2, the final linked results tab, when expanded, displays a list of
linked entities per span, sorted by log-probability score. The first column is the Span ID, where 0 stands for
the first span, 1 for the second span, and so forth. The second column is the entity label of the
candidate as fetched from the Elasticsearch label database. The third column displays the DBLP type
of the entity candidate. The fourth column displays the log-probability score of the given entity. Note
that the scores are negative, and hence they appear sorted in descending order of absolute value. The
fifth column is the evidence sentence, which is the triple that produced the strongest log-probability
score among all the triples for the given entity. The sixth column provides a clickable
URL for the entity, which takes the user to the entity’s DBLP page.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Entity Linker Architecture</title>
      <p>Our entity linking pipeline combines prompted large language models (LLMs), type-specific retrieval
from an Elasticsearch index, and neighborhood-based re-ranking using KG context. We illustrate the
method using the input question:</p>
      <p>Who are the co-authors of Ashish Vaswani in "Attention is All You
Need" in neurips?</p>
      <sec id="sec-3-1">
        <title>3.1. Mention and Type Extraction via Prompted LLM</title>
        <p>We first extract named entity mentions from the input using a prompted LLM. The prompt is as follows:</p>
        <p>You are an information extraction assistant.</p>
        <p>Extract named entities from the following sentence and classify them into
one of the following types: person, publication, venue.</p>
        <p>Let the output be a JSON array of objects with fields ’label’ and ’type’.
Not all types may be present in a sentence. Now extract entities from the
following sentence:
Sentence: "Who are the co-authors of Ashish Vaswani in the ’attention is
all you need’ paper in neurips?"</p>
        <p>Entities:
{"label": "Ashish Vaswani", "type": "person"},
{"label": "attention is all you need", "type": "publication"},
{"label": "neurips", "type": "venue"}</p>
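        <p>A minimal sketch of how such model output could be parsed and validated is given here. It assumes the model returns comma-separated JSON objects, as in the example above, possibly without the enclosing array brackets. The function name parse_entities and the validation rules are illustrative assumptions, not the code of the demo.</p>
        <preformat><![CDATA[
```python
import json

# Types named in the extraction prompt.
ALLOWED_TYPES = {"person", "publication", "venue"}


def parse_entities(raw: str) -> list:
    """Parse LLM output into a list of {'label', 'type'} dicts.

    Hypothetical helper: tolerates output that lists JSON objects
    without the surrounding array brackets.
    """
    text = raw.strip()
    if not text.startswith("["):
        # Wrap bare, comma-separated objects into a JSON array.
        text = "[" + text.rstrip(",") + "]"
    try:
        items = json.loads(text)
    except json.JSONDecodeError:
        # Unparseable output yields no mentions.
        return []
    entities = []
    for item in items:
        if not isinstance(item, dict):
            continue
        label = item.get("label")
        etype = item.get("type")
        # Keep only well-formed entries with a known type.
        if (isinstance(label, str) and label.strip()
                and isinstance(etype, str) and etype in ALLOWED_TYPES):
            entities.append({"label": label.strip(), "type": etype})
    return entities


raw_output = """{"label": "Ashish Vaswani", "type": "person"},
{"label": "attention is all you need", "type": "publication"},
{"label": "neurips", "type": "venue"}"""

entities = parse_entities(raw_output)
for entity in entities:
    print(entity["type"], "->", entity["label"])
```
]]></preformat>
        <p>Dropping entries with an unknown type keeps the later type-specific retrieval step from receiving labels it cannot route to an index.</p>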
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Candidate Entity Retrieval</title>
        <p>Each extracted label is matched against a type-specific Elasticsearch index to retrieve a list of candidate
entities. For example:
• "Ashish Vaswani" → [Ashish Vaswani, Vicky Vaswani, ...]
• "attention is all you need" → [doi:10.5555/attention-paper, ...]
• "neurips" → [NeurIPS, NeurIPS 2022, NeurIPS 2023, ...]</p>
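        <p>The retrieval step can be sketched as building one search request per extracted mention. The index names, the field name label, and the use of a fuzzy match query are assumptions made for illustration; the actual index layout of the demo is not described here. The request bodies follow the standard Elasticsearch query DSL and could be sent to the _search endpoint of the respective index.</p>
        <preformat><![CDATA[
```python
# Hypothetical mapping from extracted type to index name.
INDEX_BY_TYPE = {
    "person": "dblp_person_labels",
    "publication": "dblp_publication_labels",
    "venue": "dblp_stream_labels",
}


def build_label_query(label: str, n: int = 10) -> dict:
    """Build a query body returning up to n label matches.

    The field name 'label' is an assumption about the index mapping.
    """
    return {
        "size": n,
        "query": {
            "match": {
                "label": {"query": label, "fuzziness": "AUTO"}
            }
        },
    }


def plan_retrieval(entities: list, n: int = 10) -> list:
    """Return (index_name, query_body) pairs, one per mention."""
    requests = []
    for entity in entities:
        index = INDEX_BY_TYPE.get(entity["type"])
        if index is None:
            # Skip mentions whose type has no index.
            continue
        requests.append((index, build_label_query(entity["label"], n)))
    return requests


mentions = [
    {"label": "Ashish Vaswani", "type": "person"},
    {"label": "neurips", "type": "venue"},
]
for index_name, body in plan_retrieval(mentions):
    print(index_name, body)
```
]]></preformat>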
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Knowledge Graph Neighborhood Expansion</title>
        <p>For each candidate entity, we fetch up to k one-hop neighbors from the knowledge graph. These triples
are converted into readable sequences using a template of the form:</p>
        <p>[Head] - [Relation] - [Tail]
Example for Ashish Vaswani (author):
• Ashish Vaswani - authored - attention is all you need
• Ashish Vaswani - affiliated with - Google Brain
• Ashish Vaswani - published at - NeurIPS</p>
        <p>This yields a set of short sentences describing the local graph structure of each candidate.</p>
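        <p>The linearization template above can be sketched in a few lines. The function name linearize and the representation of triples as (head, relation, tail) tuples of already human-readable strings are assumptions for illustration.</p>
        <preformat><![CDATA[
```python
def linearize(triples: list, k: int = 10) -> list:
    """Turn up to k (head, relation, tail) triples into short sentences
    following the '[Head] - [Relation] - [Tail]' template."""
    return [f"{head} - {relation} - {tail}" for head, relation, tail in triples[:k]]


neighborhood = [
    ("Ashish Vaswani", "authored", "attention is all you need"),
    ("Ashish Vaswani", "affiliated with", "Google Brain"),
    ("Ashish Vaswani", "published at", "NeurIPS"),
]

for sentence in linearize(neighborhood, k=10):
    print(sentence)
```
]]></preformat>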
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Candidate Scoring with LLM Log-Probability</title>
        <p>Each linearized triple is evaluated by an LLM in the context of the original question. The prompt is:</p>
        <p>Given this input text: "Who are the co-authors of Ashish Vaswani in the
’attention is all you need’ paper in neurips?"
And the neighborhood context:
Ashish Vaswani - authored - attention is all you need
Is this the correct entity?</p>
        <p>Answer with ’yes’ or ’no’.</p>
        <p>We extract the log-probability of the next token being "yes" (before generation), which serves as a
soft alignment score for that triple. Each candidate entity receives multiple such scores, one per
triple. These are aggregated using mean pooling, i.e., the average log-probability over all triples is
computed.</p>
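        <p>The scoring step can be illustrated with a small, self-contained sketch. In practice the next-token logits would come from a forward pass of the LLM over the prompt; here they are replaced by hypothetical toy values over a two-token vocabulary, so that only the log-softmax and the mean pooling are shown. All function names are illustrative assumptions.</p>
        <preformat><![CDATA[
```python
import math


def log_softmax(logits: dict) -> dict:
    """Numerically stable log-softmax over a token -> logit mapping."""
    peak = max(logits.values())
    log_z = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
    return {token: value - log_z for token, value in logits.items()}


def yes_log_prob(next_token_logits: dict, yes_token: str = "yes") -> float:
    """Log-probability that the next token is the 'yes' token."""
    return log_softmax(next_token_logits)[yes_token]


def mean_pool(scores: list) -> float:
    """Average the per-triple log-probabilities of one candidate."""
    if not scores:
        # A candidate without any triple cannot be supported.
        return float("-inf")
    return sum(scores) / len(scores)


# Hypothetical next-token logits, one dict per linearized triple.
toy_logits_per_triple = [
    {"yes": 2.0, "no": 0.0},
    {"yes": 0.5, "no": 0.5},
]

triple_scores = [yes_log_prob(logits) for logits in toy_logits_per_triple]
candidate_score = mean_pool(triple_scores)
print(triple_scores, candidate_score)
```
]]></preformat>
        <p>Since log-probabilities are at most zero, a candidate score closer to zero indicates stronger support from its neighborhood.</p>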
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Entity Re-ranking</title>
        <p>All candidate entities for a given mention are ranked according to their aggregated log-probability
scores. The top-ranked candidate is selected as the final linked entity.</p>
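        <p>A sketch of the re-ranking step follows. It also selects, for each candidate, the triple with the strongest log-probability as the evidence sentence shown in the results table. The input structure, the function name rerank, and the toy scores are hypothetical and serve only to illustrate the procedure.</p>
        <preformat><![CDATA[
```python
def rerank(candidates: dict) -> list:
    """Rank candidates by mean log-probability, best first.

    candidates maps an entity label to a list of
    (linearized_triple, log_prob) pairs. Returns a list of
    (entity, mean_log_prob, evidence_sentence) tuples.
    """
    ranked = []
    for entity, scored_triples in candidates.items():
        if not scored_triples:
            continue
        mean = sum(lp for _, lp in scored_triples) / len(scored_triples)
        # Evidence sentence: the triple with the highest log-probability.
        evidence = max(scored_triples, key=lambda pair: pair[1])[0]
        ranked.append((entity, mean, evidence))
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked


# Toy scores, invented for illustration only.
toy_candidates = {
    "Vicky Vaswani": [
        ("Vicky Vaswani - authored - (hypothetical paper)", -3.0),
    ],
    "Ashish Vaswani": [
        ("Ashish Vaswani - authored - attention is all you need", -0.2),
        ("Ashish Vaswani - published at - NeurIPS", -0.6),
    ],
}

ranked = rerank(toy_candidates)
linked_entity = ranked[0][0]
print(linked_entity, ranked[0][2])
```
]]></preformat>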
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Implementation Details</title>
      <p>
        The web demo is implemented using the Reflex web development framework, which allows building
dynamic web interfaces written purely in Python. To find optimal parameters for the different
components of the entity linker pipeline, we randomly selected a set of 100 questions from the test set
of the DBLP_QuAD dataset [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. As seen in Table 1, we tested several different LLMs of small sizes,
keeping in mind the limited GPU infrastructure available to us as university-based researchers. We
tested the 0.5B, 1.5B, 3B, 7B, and 14B models of the Qwen-2.5 family, as well as Llama-3.1-8B and Mistral-7B-Instruct-v0.2.
Based on the results of our experiments, we found the Mistral model lagging far behind, with an F1 score
of 0.09. In comparison, Qwen-2.5-3B provided an optimal balance between size and performance; hence
the web demo makes use of this model. The "text only" performance in the fourth row is a setting
where the top text-based match is chosen as the final entity linking result. In effect, the subsequent
neighbourhood-based re-ranking step is skipped. When comparing this result to the row above, it is
clear that the entity linker performs better than pure text-match-based entity linking. Additionally,
from the last column’s results, it seems that in only 62% of the cases do the labels produced by the
mention span detector translate to relevant candidates being fetched from the Elasticsearch labels
database. All the experiments were performed with a setting of n=10 and k=10, where n is the number of
candidates from text search and k is the number of neighbours per entity. We performed experiments
with greater n and k, but saw negligible improvements when compared to the rise in execution time
caused by the larger context to be parsed by the LLMs. Hence, we settled on values of 10 for n and k.
      </p>
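      <p>For reference, a Hits@n measure of the kind reported in Table 1 can be sketched as follows. The definition assumed here, which is not spelled out in the text, counts a mention as a hit when its gold entity appears among the top n retrieved candidates. The function name and the toy identifiers are hypothetical.</p>
      <preformat><![CDATA[
```python
def hits_at_n(ranked_candidates: list, gold_entities: list, n: int = 10) -> float:
    """Fraction of mentions whose gold entity is among the top n candidates.

    ranked_candidates[i] is the ranked candidate list for mention i,
    gold_entities[i] is the expected entity for that mention.
    """
    if not gold_entities:
        return 0.0
    hits = 0
    for candidates, gold in zip(ranked_candidates, gold_entities):
        if gold in candidates[:n]:
            hits += 1
    return hits / len(gold_entities)


# Toy identifiers, invented for illustration only.
toy_ranked = [["e1", "e2", "e3"], ["e4", "e5"], ["e6"]]
toy_gold = ["e2", "e9", "e6"]

score = hits_at_n(toy_ranked, toy_gold, n=10)
print(round(score, 4))
```
]]></preformat>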
    </sec>
    <sec id="sec-5">
      <title>5. Limitations and Future Work</title>
      <p>Due to the non-availability of a new entity linking dataset over the current DBLP schema, we were unable
to perform an extensive evaluation for this task, especially on the new dblp:Stream entity type. Also,
because the underlying KGs are different, we could not directly compare DBLPLink 2.0’s performance
with that of DBLPLink 1.0. As future work, we shall prioritise the collection of a new dataset which would
allow deeper analysis of our entity linker.</p>
      <p>Table 1, Hits@10 column (values in row order): 0.0000, 0.3100, 0.4900, 0.4300, 0.5000, 0.0300, 0.4600, 0.4000, 0.1000.</p>
      <p>No use of generative AI was made in writing this paper. We relied on the spell-check feature of
Sharelatex software which was provided to us by the University of Leuphana as a tool to write research
papers. ChatGPT was used for generating the initial templates of the code that the demo runs on. The
code was later improved by the authors themselves to make it fully functional.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrandečić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krötzsch</surname>
          </string-name>
          ,
          <article-title>Wikidata: A Free Collaborative Knowledgebase</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>57</volume>
          (
          <year>2014</year>
          )
          <fpage>78</fpage>
          -
          <lpage>85</lpage>
          . URL: https://dl.acm.org/doi/10.1145/2629489.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Ö.</given-names>
            <surname>Sevgili</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Arkhipov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Biemann</surname>
          </string-name>
          ,
          <article-title>Neural Entity Linking: A Survey of Models based on Deep Learning</article-title>
          ,
          <source>Semantic Web Journal</source>
          <volume>13</volume>
          (
          <year>2022</year>
          )
          <fpage>527</fpage>
          -
          <lpage>570</lpage>
          . URL: https://dl.acm.org/doi/10.3233/SW-222986.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>French</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. T.</given-names>
            <surname>McInnes</surname>
          </string-name>
          ,
          <article-title>An Overview of Biomedical Entity Linking throughout the Years</article-title>
          ,
          <source>Journal of Biomedical Informatics</source>
          <volume>137</volume>
          (
          <year>2023</year>
          )
          <elocation-id>104252</elocation-id>
          . URL: https://www.sciencedirect.com/science/article/abs/pii/S153204642200257X.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Elhammadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. V.S.</given-names>
            <surname>Lakshmanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Simpson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Huai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>A High Precision Pipeline for Financial Knowledge Graph Construction</article-title>
          ,
          <source>in: Proceedings of the 28th International Conference on Computational Linguistics</source>
          , Barcelona, Spain (Online),
          <year>2020</year>
          , pp.
          <fpage>967</fpage>
          -
          <lpage>977</lpage>
          . URL: https://aclanthology.org/2020.coling-main.84.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Priem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Piwowar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Orr</surname>
          </string-name>
          ,
          <article-title>OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts</article-title>
          ,
          <year>2022</year>
          . URL: https://arxiv.org/abs/2205.01833. arXiv:2205.01833.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Stocker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oelen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Y.</given-names>
            <surname>Jaradeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Haris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. A.</given-names>
            <surname>Oghli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Heidari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hussein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-L.</given-names>
            <surname>Lorenz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kabenamualu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. E.</given-names>
            <surname>Farfar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Prinz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Karras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>D'Souza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Vogt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <article-title>Fair scientific information with the open research knowledge graph</article-title>
          ,
          <source>FAIR Connect 1</source>
          (
          <year>2023</year>
          )
          <fpage>19</fpage>
          -
          <lpage>21</lpage>
          . URL: https://journals.sagepub.com/doi/abs/10.3233/FC-221513. doi:10.3233/FC-221513.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ley</surname>
          </string-name>
          ,
          <article-title>The dblp computer science bibliography: Evolution, research issues, perspectives</article-title>
          , in:
          <string-name>
            <given-names>A. H. F.</given-names>
            <surname>Laender</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Oliveira</surname>
          </string-name>
          (Eds.),
          <source>String Processing and Information Retrieval</source>
          , SpringerLink Bücher, Springer-Verlag Berlin Heidelberg, Berlin, Heidelberg,
          <year>2002</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . doi:10.1007/3-540-45735-6_1.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <article-title>Deola: A system for linking author entities in web document with dblp</article-title>
          ,
          <source>in: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management</source>
          , CIKM '16, Association for Computing Machinery, New York, NY, USA,
          <year>2016</year>
          , pp.
          <fpage>2449</fpage>
          -
          <lpage>2452</lpage>
          . URL: https://doi.org/10.1145/2983323.2983330. doi:10.1145/2983323.2983330.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <surname>Arefa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Usbeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Biemann</surname>
          </string-name>
          ,
          <article-title>DBLPLink: An Entity Linker for the DBLP Scholarly Knowledge Graph</article-title>
          ,
          <source>in: Proceedings of the 22nd International Semantic Web Conference Posters, Demos and Industry Tracks</source>
          , volume
          <volume>3632</volume>
          , Athens, Greece,
          <year>2023</year>
          . URL: https://ceur-ws.org/Vol-3632/ISWC2023_paper_428.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Awale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Usbeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Biemann</surname>
          </string-name>
          ,
          <article-title>Dblp-quad: A question answering dataset over the dblp scholarly knowledge graph</article-title>
          ,
          <source>in: Proceedings of the 13th International Workshop on Bibliometric-enhanced Information Retrieval co-located with 45th European Conference on Information Retrieval (ECIR 2023)</source>
          , Dublin, Ireland, April 2nd,
          <year>2023</year>
          , pp.
          <fpage>37</fpage>
          -
          <lpage>51</lpage>
          . URL: https://ceur-ws.org/Vol-3617/paper-05.pdf.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>