Keyword Extraction in Scientific Documents

Susie Xi Rao (1,∗), Piriyakorn Piriyatamwong (1,∗), Parijat Ghoshal (2,∗), Sara Nasirian (3), Sandra Mitrović (3), Emmanuel de Salis (4), Michael Wechner (5), Vanya Brucker (5), Peter Egger (1) and Ce Zhang (1)

1 Chair of Applied Economics, ETH Zurich, Switzerland
2 Neue Zürcher Zeitung AG, Zurich, Switzerland
3 Dalle Molle Institute for Artificial Intelligence, Lugano, Switzerland
4 Haute-Ecole Arc, Neuchâtel, Switzerland
5 Wyona AG, Zurich, Switzerland

SwissText 2022: Swiss Text Analytics Conference, June 08–10, 2022, Lugano, Switzerland
∗ Corresponding author. † These authors contributed equally.
srao@ethz.ch (S. X. Rao); ppiriyata@ethz.ch (P. Piriyatamwong); parijat.ghoshal@nzz.ch (P. Ghoshal); sara.nasirian@supsi.ch (S. Nasirian); sandra.mitrovic@supsi.ch (S. Mitrović); emmanuel.desalis@he-arc.ch (E. de Salis); michael.wechner@wyona.com (M. Wechner); vanya.brucker@wyona.com (V. Brucker); pegger@ethz.ch (P. Egger); cezhang@ethz.ch (C. Zhang)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

Abstract

The scientific publication output grows exponentially. It is therefore increasingly challenging to keep track of trends and changes. Understanding scientific documents is an important step in downstream tasks such as knowledge graph building, text mining, and discipline classification. In this workshop, we provide a better understanding of keyword and keyphrase extraction from the abstracts of scientific publications.

1. Introduction

Keyphrases are single- or multi-word expressions (often nouns) that capture the main ideas of a given text, but do not necessarily appear in the text itself [1, 2, 3]. Keyphrases have been shown to be useful for many tasks in the Natural Language Processing (NLP) domain, such as (1.) indexing, archiving and pinpointing information in the Information Retrieval (IR) domain [3, 4, 5, 6], (2.) document clustering [3, 7, 8], and (3.) summarizing texts [3, 9, 10, 11], just to name a few.

Keyphrase extraction has been at the forefront of various application domains, ranging from the scientific community [1, 2, 12], finance [13, 14], law [15], and news media [11, 16, 17] to patenting [18, 19] and medicine [20, 21, 22]. Despite being a seemingly straightforward task for human domain experts, automatic keyphrase extraction remains challenging.

Challenge 1: Benchmark Dataset and Keyword Reference List. One main reason is the lack of benchmark datasets and keyword reference lists, as authors often do not provide their keyphrase list unless explicitly requested or required to do so [3]. In scientific publications, we see a large variation across domains (e.g., economics, computer science, mathematics, engineering fields, humanities). For instance, publications in some disciplines, such as economics, are required to have author-generated or journal-curated keywords, while in other domains, such as computer science and engineering, not all publication venues (e.g., journals, proceedings) require authors to input keywords.

In less technical domains, such as news media, keyphrase lists may be more accessible in terms of availability and the ease of manual curation, even when reference lists are not readily available. This is because in the news domain, people have a particular interest in Named Entities (labelled entities such as person, location, event, time), as we will discuss in Section 6. However, manually curating keyphrase lists is often practically infeasible: hiring domain experts is costly, while the quality of crowdsourced annotations is difficult to control [2, 3, 11]. With the limited availability of benchmark datasets, large language models, which succeed in other NLP tasks, simply fail to optimize and generalize, as they generally require a large, well-annotated training dataset [16]. The lack of training datasets also poses challenges for the evaluation of keyword extraction systems.

Challenge 2: Evaluation of Keyword Extraction. Defining an evaluation protocol and a corresponding metric is far from trivial for the following reasons.
(1.) We should look at the ground-truth list of keywords critically. As mentioned above, there can be more than one ground-truth list of keyphrases for a given abstract. The keyword list provided in our dataset is a reference list supplied by the authors or by the publishers. One should treat it only as a reference list, not as the one and only correct list of keywords.

(2.) Keyphrases are extracted with different aims in system design. As we will explain in the rationale for the three systems in Section 3, the systems are designed to tackle various problems and are therefore optimized for different use cases. System 1 uses a simple TextRank algorithm (see Section 4), which outputs the most prominent set of keyphrases/keywords; System 2 uses TextRank on top of a clustering algorithm (see Section 5), which is targeted at grouping similar articles and then learns from the cluster of articles; and System 3 uses pre-trained models and tools for Named-Entity Recognition (NER) (see Section 6), with the goal of fully utilizing existing models and tools by only pre-processing the input and/or post-processing the output.

(3.) There are different objective functions that we want to optimize. Precision, recall, accuracy, false positive rate, and false negative rate are among the most common performance metrics for various application scenarios [23]. We might also consider the order of keyphrases, for example, as sorted by criteria such as frequency or TextRank score [24, 25]. In search engines, the hit rate is also an important metric [26]. Furthermore, one can evaluate exact matches and fuzzy matches. Fuzzy matches can be broken down into two types: "partial" matches and semantically equivalent matches [27, 28, 24]. There are other evaluation methods that account for the ranks and orders of the extracted keywords; see the Medium article [24] for inspiration.

Challenge 3: Growing Number of Scientific Publications. During the last decades, the number of scientific publications has increased exponentially each year [29], making it increasingly challenging for researchers to keep track of trends and changes, even strictly within their own field of interest [3, 30]. This bolsters the need for automatic keyword extraction for use cases such as text recommendation and summarization systems. The effect of the growing publication volume is clearly visible in major academic search engines such as Google Scholar, Web of Science, Scopus, and Microsoft Academic. For a simple query ("data mining"), three out of four failed to bring up relevant scientific publications that are prominent in the field and anticipated by human domain experts.

Figure 1: Comparison of various academic products with the query "data mining". (a) Web of Science. (b) Google Scholar. (c) Scopus. (d) Microsoft Academic.

See the query results in Figure 1 for a keyword search for "data mining" in different academic products. We can see that the search results vary largely across products, and it could be difficult for readers to choose between the different results without prior knowledge of the field. So far, only Microsoft Academic (Figure 1 (d)) has returned relevant research results that point to the most influential author and work in the field of data mining. This is because Microsoft Academic has enabled a hierarchical discipline classification (indexed by keyphrases) that supports its users when reviewing the search results. In summary, without relevant and correct keyphrases, effective indexing, and thus querying, is not feasible.

Challenge 4: Domain-Specific Keyword Extraction. Another challenge in keyphrase extraction is its domain-specific nature. One case is that a keyphrase extractor trained on generic texts may miss technical terms that do not look like usual keyword noun chunks, such as the chemical name "C4H*Cl" [31]. The issue arises in the tokenization step: a non-alphabetic character such as "4" or "*" might be treated as a separator, so such a keyword gets split into "C", "H" and "Cl", losing its original meaning. Even if the separator handling works perfectly, this type of chemical name would still confuse keyphrase extractors that filter candidate keyphrases based on Part-of-Speech (POS) tags: for POS-based extractors, it is unclear whether "C4H*Cl" is an adjective, a noun, or another POS tag.

Another case is when the keyphrase consists of a mix of generic and specific words, such as "Milky Way". "Way" is generally a stopword [32], so the keyphrase extractor might only detect "Milky" and throw away "Way" without realizing that the term "Way" is not a stopword in this specific context.

Finally, we would like to mention KeyBERT, a state-of-the-art BERT-based keyword extractor [33]. KeyBERT works by extracting multi-word chunks whose vector embeddings are most similar to that of the original sentence. Without considering the syntactic structure of the text, KeyBERT sometimes outputs keyphrases that are incorrectly trimmed, such as "algorithm analyzes" or "learning machine learning". This problem only worsens for the aforementioned examples from chemistry and astronomy, since it is not straightforward how to tokenize, i.e., "split", words and how to handle non-alphabetic characters.
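The tokenization pitfall above can be reproduced in a few lines of Python (a minimal illustration; the regular expressions are our own, not taken from any of the cited extractors):

```python
import re

text = "The reaction of C4H*Cl with water was studied."

# Naive tokenization: every non-alphabetic character acts as a separator,
# so the chemical name is shattered into meaningless pieces.
naive = re.findall(r"[A-Za-z]+", text)
# 'C4H*Cl' becomes the tokens 'C', 'H', 'Cl' inside the token stream.

# A more permissive pattern keeps digits and '*' inside tokens,
# preserving the chemical name as a single unit.
permissive = re.findall(r"[A-Za-z0-9*]+", text)

print(naive)
print(permissive)
```

The same kind of pattern adjustment would be needed for any extractor whose candidate selection relies on purely alphabetic tokens.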
Our Goals and Contributions in this Workshop. Despite the challenges, keyphrase extraction is an important step for many downstream tasks, as described above. In this workshop, we aim to cover the foundations of keyphrase extraction in scientific documents and to provide a discussion venue for academia and industry on the topic of keyword extraction. Our contributions in the workshop are as follows.

(1.) We make new use of the existing dataset from the Web of Science (WOS) [34]. This dataset has been used as a benchmark dataset for hierarchical classification systems. Since it comes with reference lists of keywords, we utilize it as a benchmark dataset for keyword extraction. In this workshop, together with the participants, we study the feasibility of that dataset in three systems.

(2.) We introduce three systems commonly used in academia and industry for keyword extraction. For the various use cases of keyword extraction, we also design baseline evaluation metrics for each system.

(3.) We encourage participants to discuss, extend, and evaluate the systems that we have introduced.

System Design of Keyword Extraction. For keyword extraction, we provide two systems based on the unsupervised, graph-based algorithm TextRank [35]. System 1 (see Section 4) develops the TextRank keyword extractor from scratch in order to understand the reasoning behind it. System 2 (see Section 5) combines the TextRank algorithm with the K-Means clustering algorithm [36, 37] to provide keyphrases for each specific field ("cluster"). In System 3 (see Section 6), we cover the NER task, where entities in a sentence are identified as person, organization, and other predefined categories. We focus primarily on the biomedical domain using the state-of-the-art biomedical NER tool HunFlair [38]. We also provide some baseline NER models for participants to evaluate.

Beyond this workshop, the keyphrase extraction and NER methods we present are applicable to other text corpora, including media texts and legal texts; one only has to be aware of the domain-specific nature and properly adjust the algorithm pipeline. To this end, we have linked the 20 Newsgroups text dataset for the participants to try their keyphrase extraction systems on.

2. Benchmark Dataset

We take a subset of 46,985 records from the Web of Science (WOS) dataset. The original WOS dataset was provided by Kamran Kowsari in the paper HDLTex: Hierarchical Deep Learning for Text Classification [34]. The original data was provided in .txt format. For ease of work, we have pre-processed the original data and stored it in .csv dataframe format, which is most compatible with our Python working setup. The final dataframe has the format shown in Table 1, where (1) each record corresponds to a single scientific document, and (2) has the following columns:

• Domain: the domain the document belongs to,
• area: the sub-domain the document belongs to,
• keywords: the list of keyphrases provided by the authors, stored as a single string with separator ";",
• Abstract: the abstract of the document.

Columns Y1 and Y2 are simply the indices of the columns Domain and area, respectively. Column Y is the sub-sub-domain, which we do not use here but include for reference.

Table 1: A sample of the WOS benchmark dataset.
Y1 | Y2 | Y   | Domain  | area            | keywords                                                    | Abstract
5  | 50 | 122 | Medical | Sports Injuries | Elastic therapeutic tape; Material properties; Tension test | The aim of this study was to analyze stabilometry in athletes...
5  | 48 | 120 | Medical | Senior Health   | Sports injury; Athletes; Postural stability                 | This study examined the influence of range of motion of the ankle joints on elderly people's balance ability...

In the corpus, we are provided with scientific articles from seven domains: Medical, Computer Science (CS), Biochemistry, Psychology, Civil, Electronics and Communication Engineering (ECE), and Mechanical and Aerospace Engineering (MAE). Therefore, column Y1 consists of unique values from 0 to 6.

In Table 1, note that both records have the same domain index Y1 of "5", corresponding to the Domain "Medical". Their sub-domains Y2 differ: the first record is about "Sports Injuries", while the second is about "Senior Health". The keywords and Abstract of each record match its sub-domain.

Finally, the records are split at a ratio of 70:30 into train/test sets with 32,899 and 14,096 abstracts, respectively. We provide the training set, including the keywords column, to the participants for training their keyword and/or NER extraction systems, and the test set for the participants to evaluate their systems. The reason for splitting the dataframe is that the participants should not overfit their systems to the whole dataset. We encourage them to design their systems based on the features learnt from the training set and to apply the identical pipeline to the test set.
3. Systems

We now discuss the three systems we provide to the participants as simple baselines for keyword extraction using the benchmark dataset. Certainly, there are various possible extensions to them. We list the participant contributions under Section 7.

4. System 1: TextRank Algorithm

In System 1, we build the TextRank algorithm from scratch and add customizations to our needs, e.g., filtering by Part-of-Speech tags.

4.1. TextRank

The TextRank algorithm is a graph-based algorithm which, as the name suggests, is used to assign scores to texts, thereby producing a ranking [35]. It has numerous use cases in the NLP domain, including webpage ranking (better known as PageRank), extractive text summarization, and keyword extraction [35, 39, 19, 17, 40, 41]. Across different use cases, the base TextRank algorithm remains the same; one only needs to adjust what is designated as nodes, edges, and edge weights when constructing the graph from the text corpus. A higher edge weight means a higher chance of choosing that particular edge to proceed to the next node. For example, in the web context, the PageRank algorithm considers webpages as nodes and the hyperlinks between webpage pairs as edges. Here, the edges are asymmetrically directed, since there can be a hyperlink from one page to another but not necessarily vice versa. The edges can then be weighted by the number of hyperlinks.

In our keyword extraction, the TextRank algorithm works by considering terms in the text as graph nodes, term co-occurrence as edges, and the number of co-occurrences of two terms within a certain window as the edge weights. Note that the co-occurrence window has a fixed, pre-specified size (say, 5-gram within sentence boundaries). Based on this notion, the graph is treated as weighted but undirected.

Subsequently, each term's score is given by how "likely" an agent, starting at a random point in the graph and continuously jumping along the weighted edges, will end up at that term's node after a long time horizon. (In the web analogy, the webpage score corresponds to the chance that an Internet user ends up on that webpage after continuously browsing through the hyperlinks; in this sense, we retrieve the most popular webpages.) The terms with higher scores are then considered more important, that is, the "keywords" extracted by the TextRank system.

4.2. Implementation

We implement a very basic keyword extraction system based on the TextRank algorithm from scratch, in order for the participants to get hands-on experience with how the algorithm works. Subsequently, we propose additional improvement ideas so that participants have the opportunity to be creative and improve the basic system.

For the implementation, we mainly use the Python package for natural language processing called spaCy [42]. spaCy utilizes pre-trained language models to perform many NLP tasks, among them Part-of-Speech tagging (POS tagging), semantic dependency parsing, and Named-Entity Recognition. In our case, we use spaCy along with its small pre-trained model for English (en_core_web_sm) as a text pre-processor and tokenizer. The remaining tasks are handled by the usual built-in Python libraries.

Our basic system consists of the following steps:

(1.) Text pre-processing: stopword and punctuation removal.
(2.) Text tokenization: tokenize the text and build a vocabulary list.
(3.) Build the adjacency matrix of the graph.
  • Matrix indices in rows and columns: terms in the vocabulary list.
  • Matrix entries: co-occurrences of term pairs within the same window of pre-specified size.
(4.) Normalize the matrix and compute its stationary distribution.
(5.) Retrieve the keyword(s) corresponding to the terms with the highest stationary probabilities.

The implemented code is stored as a Jupyter notebook hosted on Google Colaboratory, which allows the participants to test and work directly on the code online without local installation. There, a step-by-step description is provided and a code sanity check was performed. For example, our system extracts the valid keywords "cute", "dog", "cat" (in descending order of term prominence) for the short text: "This is a very cute dog. This is another cute cat. This dog and this cat are cute".

4.3. Further Ideas

Inspired by existing keyword extraction systems in Python such as summa [43] and pke [44], we have provided participants with a list of ideas to further improve the keyword extraction system, along with hints for a Python implementation using spaCy (see the Jupyter notebook):

• Improve the pre-processing step:
  – Remove numbers.
  – Standardize casing, such as lower-casing the entire text.
  – Use a domain-specific or custom-made stopword list.
• Improve the tokenization step:
  – Filter by Part-of-Speech tags to only include nouns in the vocabulary list.
  – Use a domain-specific tokenizer such as ScispaCy [45] for biomedical data.
  – Lemmatize or stem tokens before recording them in the vocabulary list and building the adjacency matrix, so that different versions of the same word (such as the plural "solitons" and the singular "soliton") are mapped to the same record.
• Add a post-processing step:
  – Exclude keywords that are too short.
• Agglomerate keywords (and perhaps add back some stopwords) to form "keyphrases" ("the" and "of" should not be removed within "the Department of Health").

Advanced participants are also directed to another Python package, NetworkX, which has a built-in, computationally efficient implementation of the PageRank computation underlying TextRank [46].

4.4. Evaluation: Instance-Based Performance

In System 1, the objective is instance-based; that is, for each abstract, we evaluate how well the algorithm performs. The metric could be accuracy, that is, the ability to find as many keyphrases (compared to the reference list) as possible. We can also compute precision and recall scores (micro or macro). We provide a simple baseline evaluation function in the notebook. Here, we allow fuzzy matching at the phrase level, where the cut-off ratio and the edit distance between the candidate term and the reference term can be adjusted.
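The five steps of Section 4.2 can be sketched in plain Python (a simplified, dependency-free illustration of the notebook's approach: spaCy's tokenization is replaced by a regular-expression split, and the stopword list, window size, and damping factor are our own choices):

```python
import re

STOPWORDS = {"this", "is", "a", "very", "another", "and", "are"}  # illustrative only

def textrank_keywords(text, window=2, damping=0.85, iters=50):
    # (1.)+(2.) Pre-process and tokenize: split into sentences, drop
    # stopwords and punctuation, and build the vocabulary.
    sentences = [
        [w for w in re.findall(r"[a-z]+", s.lower()) if w not in STOPWORDS]
        for s in re.split(r"[.!?]", text)
    ]
    vocab = sorted({w for s in sentences for w in s})
    # (3.) Adjacency "matrix" as a nested dict: co-occurrence counts of
    # term pairs within the window (undirected, weighted).
    weight = {w: dict.fromkeys(vocab, 0.0) for w in vocab}
    for s in sentences:
        for i, w in enumerate(s):
            for v in s[max(0, i - window):i]:
                if v != w:
                    weight[w][v] += 1.0
                    weight[v][w] += 1.0
    # (4.) Power iteration: stationary distribution of the random walk
    # over the normalized matrix (PageRank-style update with damping).
    score = dict.fromkeys(vocab, 1.0 / len(vocab))
    for _ in range(iters):
        new = {}
        for w in vocab:
            rank = sum(
                score[v] * weight[v][w] / sum(weight[v].values())
                for v in vocab if weight[v][w] > 0
            )
            new[w] = (1 - damping) / len(vocab) + damping * rank
        score = new
    # (5.) The highest-scoring terms are the extracted keywords.
    return sorted(vocab, key=score.get, reverse=True)

text = "This is a very cute dog. This is another cute cat. This dog and this cat are cute."
print(textrank_keywords(text))
```

On the sanity-check text above, "cute" co-occurs with both other terms and receives the highest stationary probability, matching the notebook's top keyword.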
5. System 2: TextRank with Clustering

In System 2, we extend the TextRank keyword extraction described in System 1 (see Section 4) and apply it to groups of texts clustered by the K-Means algorithm. In this way, we obtain a more focused keyword list for each text group and learn about its characteristics.

5.1. K-Means Algorithm

The K-Means algorithm is a clustering algorithm which partitions points in a vector space into "K" clusters ("K" being pre-specified), such that each point belongs to the cluster with the nearest cluster centroid (the "mean") [36, 37]. It works in the following steps.

(1.) Assign K random points as the cluster "means".
(2.) Repeat the following until convergence:
  a) Assignment step: assign each point to the cluster whose mean has the least squared Euclidean distance to the point,
  b) Update step: recalculate each "mean" as the average of all the points assigned to that cluster,
  c) Terminate when the cluster assignment stabilizes.

We ultimately choose the K-Means algorithm for clustering because of its low complexity: it works very fast for large datasets like ours [47, 48]. One often hidden caveat of the K-Means algorithm is the choice of the number of clusters "K". However, in our specific use case with scientific publications, we usually have a good estimate based on the number of target disciplines. Therefore, K-Means serves our purpose well.

5.2. Preprocessing: Sentence-BERT Embeddings

As mentioned in the previous section, K-Means clusters points in a vector space. Therefore, we need to transform each text in our dataset into a vector representation. This is often done by averaging pre-trained word embeddings over all the words that appear in the document, regardless of whether they are context-free embeddings like GloVe [49] or contextualized embeddings like BERT [50]. However, this has been shown to perform worse than directly deriving contextualized sentence embeddings (Sentence-BERT [51]). Therefore, we opt for contextualized sentence embeddings from Sentence-BERT, which is trained with Siamese BERT networks [51]. More technical details can be found in the original paper by N. Reimers and I. Gurevych [51]. Sentence-BERT transforms each text into a 384-dimensional, semantically meaningful vector, which is then ready to be the input to the K-Means algorithm for clustering.

5.3. Implementation

We add the clustering step to our pipeline, which effectively results in the following procedure:

(1.) For each document, extract its Sentence-BERT embedding,
(2.) Cluster the documents into K groups based on their Sentence-BERT embeddings, i.e., by the sentence contents,
(3.) For each document cluster, extract its keyphrases.

First, we generate embedding representations for each text, which is very easy with the Python package sentence-transformers. The package offers several pre-trained models for different purposes, from which we choose the small model (all-MiniLM-L6-v2). Second, to group the documents, we use the K-Means implementation in the package sklearn [52]. Furthermore, we provide a cluster visualization using the package matplotlib [53]. We set the parameter K = 7 for the K-Means algorithm, which is the number of disciplines in the WOS dataset.

Finally, we extract the keyphrases from each cluster. Unlike in System 1, we do not implement the TextRank algorithm from scratch but instead use the existing Python package pke [44]. pke provides implementations of numerous keyword extraction algorithms from the literature, and allows customizations such as Part-of-Speech tag filters and a limit on the maximum number of words in a single keyphrase. In our case, we simply use the basic TextRank algorithm, also to demonstrate that even this very basic algorithm can already yield satisfying outputs.

Like in System 1, the code implemented for System 2 is stored as a Jupyter notebook hosted on Google Colaboratory. A step-by-step description is provided, and a code sanity check succeeds at characterizing a cluster: the cluster consisting mostly of medical articles has relevant keyphrases such as "patient group", "treatment effects", and "autism patient" among its top-10 extracted keyphrases.

5.4. Further Ideas

We invite participants to explore improvement ideas and provide coding hints on how to implement them with pke:

• Customize the TextRank algorithm:
  – Change the window size.
• Use keyword extraction algorithms alternative to TextRank, such as:
  – the TopicRank algorithm [54],
  – the Multipartite algorithm [55],
  – the BERTopic algorithm [56].
• Impose extra criteria on valid keyphrases, such as:
  – change the maximum number of words allowed in a single keyphrase,
  – restrict keyphrases to only contain the top certain percentage of all keywords.

5.5. Evaluation: Cluster-Based Performance

Using a similar evaluation function as in System 1 (see Section 4.4), we now look at a cluster-based objective. This means that we take all the keywords from the articles clustered in the same group and build a new reference list of keywords. Subsequently, the user-generated list is evaluated against this expanded list. Notably, this approach increases the coverage of keywords in the reference, in the hope of covering more out-of-abstract keywords. However, it comes at the cost of increasing the denominator when we compare the user-generated list to the reference list. One way to better present the reference list of one cluster is to process the list by criteria such as frequency. Another way to evaluate is to use word embedding similarities (cf. KeyBERT [33] as an example of leveraging embeddings). In this way, we have a better view of the extracted keywords and the degree to which the user-generated list is close to the reference list. In particular, this technique is useful for assessing the difference set between the user-generated list and the reference one.
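The assignment/update loop of Section 5.1 can be sketched in plain Python (a dependency-free illustration on 2-D points; in the actual pipeline the points are 384-dimensional Sentence-BERT vectors and sklearn's K-Means implementation is used instead):

```python
import random

def kmeans(points, k, iters=100, seed=0):
    # (1.) Assign k random points as the initial cluster "means".
    rng = random.Random(seed)
    means = rng.sample(points, k)
    assignment = []
    for _ in range(iters):
        # a) Assignment step: nearest mean by squared Euclidean distance.
        new_assignment = [
            min(range(k), key=lambda j: sum((p - m) ** 2 for p, m in zip(pt, means[j])))
            for pt in points
        ]
        # c) Terminate when the cluster assignment stabilizes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # b) Update step: recalculate each mean as the average of its points.
        for j in range(k):
            members = [pt for pt, a in zip(points, assignment) if a == j]
            if members:
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, means

# Two well-separated 2-D groups; K-Means should recover them.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
assignment, means = kmeans(points, k=2)
print(assignment)
```

The choice of K here is trivial by construction; in the workshop setting, K = 7 follows from the number of target disciplines.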
In particular, Figure 2: NZZ Topic Page based on keywords and named this technique is useful for assessing the difference set entities from news articles. Accessible at nzz.ch/themen. between the user-generated list and the reference one. 6. System 3: Named-Entity are not limited by the fixed categories of an NER model, and may contain named entities if those entities are repre- Recognition as Keyword sentative of a given document. For example, a document Extraction about Heathrow Airport can contain keywords such as “arrival”, “customs”, “departure”, “duty free”, “immigra- The goal of system 3 is to emulate some of the constraints tion” and “London”. Depending on the model classes, an that may exist in a practical setting. These could be sit- NER model on the same text could extract entities such uations where a keyword extractor system cannot be as “British Airways” (ORG), “London” (LOC), “United implemented as the output of these systems may be in- Kingdom” (LOC), etc. In this example, there is overlap correct or non-sensical. Another situation could be that between the keywords and named entities; however, due one is required to use existing tools such as a Named- to the defining characteristics of both approaches, there Entity Recognition system and must enact measures to is a significant difference between the lists. improve the output of the model. Figure 2 demonstrates the use of keyword extraction and named-entity recognition in the industry setting 6.1. Named Entities, Named-Entity at Neue Zürcher Zeitung (NZZ), where key terms are Recognition and Keyword Extraction extracted and relevant articles are assigned to the terms. A named entity (NE) in most cases is a proper noun, the 6.2. Use of Keywords in the News Domain most common categories being person, location and or- ganization; however, other categories that are not proper As mentioned above, for a given text, keywords and the nouns, such as temporal expressions, are also possible. 
output of a NER model may overlap. When it comes Named-Entity Recognition consists of locating and classi- to analyzing news, a typical NER model (with common fying named entities mentioned in unstructured text into categories such as person, organization, and location) predefined categories [57, Chapter. 8.3]. Keywords are excels at finding named entities for the model-specific single or multi-word expressions that under ideal circum- categories. However, only extracting the entities is inad- stances should concisely represent the key content of a equate for finding nuanced differences between multiple document [58, Page 3]. As the goal of NER is to assign articles that contain identical named entities. In Table 2 a label to spans of text [57, Chapter. 8.3], it is a classi- we see the titles of 10 articles published in Neue Zürcher fication task that can be solved by building a machine Zeitung (NZZ) during March 2022. According to the NER learning model [59]. model for German texts used internally by the NZZ, all The difference between keyword extraction and NER articles have “Ukraine” (location) as a common named is as follows. Named entities are words or phrases with a entity. Despite the similarities, there are thematic dif- specific label determined by predefined classes of a given ferences between these articles. After using a keyword NER model. Therefore, these entities may not necessarily extraction system that uses similar methodologies men- represent the essential content of a document. Keywords tioned in Systems 1 and 2, keywords that are not named 6.4. 
Pre-Trained NER Models Number NZZ Article Title 1 Eine Zürcherin nimmt ukrainische Flüchtlinge auf – und fühlt sich vom Staat alleingelassen «Eine Solidaritätsbekundung auf Instagram zu posten, reicht nicht»: 2 3 There are some disadvantages to using pre-trained NER Viele Zürcherinnen und Zürcher möchten Flüchtlinge aus der Ukraine bei sich zu Hause aufnehmen 150 Ukraine-Flüchtlinge sind im Kinderdorf – wie geht es weiter? 4 Krieg in der Ukraine: Wie ein SVP-Dorf Flüchtlinge aufnimmt 5 models. One should take into consideration that using a Neutralität im Ukraine-Krieg - wo genau steht die Schweiz? 6 Neutralität: Fand in der Schweiz gerade eine Zeitenwende statt 7 pre-trained model to extract named entities out of docu- Putin, die Schweiz und die zwei Seiten der Neutralität 8 Christoph Blocher: Neutralität ist nicht nur Selbstzweck 9 10 ments from different domains can result in a fall in model Sicherheitspolitik: Militärische Neutralität weiterdenken Sicherheitspolitik: Solidarische Neutralität performance [65]. The training data and categories of Table 2 the model will influence the output. For example, the Titles of 10 articles published in Neue Zürcher Zeitung (NZZ) string “ATP” can be labeled as an organization (e.g. As- during March 2022. sociation of Tennis Professionals) by one model and as a chemical (e.g. adenosine triphosphate) by a biomedical- NER model. Creating an NER model for a specific type entities were found. These keywords demonstrate the- of entity requires the annotation of a corpus, which can matic groupings between the articles. The most common be a significant expense and effort for the user [65]. keyword for articles 1-4 is “Flüchtlinge” (“refugees”), and for articles 5-10 is “Neutralität” (“neutrality”). This differ- 6.5. 
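The contrast between the two term lists can be made concrete with simple set operations (the Heathrow lists below repeat the illustrative example from Section 6.1; the sets are hand-written, not model output):

```python
# Hand-written lists from the Heathrow Airport illustration; in practice
# the entities would come from an NER model and the keywords from an extractor.
keywords = {"arrival", "customs", "departure", "duty free", "immigration", "London"}
entities = {"British Airways", "London", "United Kingdom"}

overlap = keywords & entities        # terms that are both keywords and entities
keywords_only = keywords - entities  # content terms a fixed-category NER model misses
entities_only = entities - keywords  # entities that are not representative keywords

print(overlap, keywords_only, entities_only)
```

The non-empty difference sets are exactly what makes keywords useful for distinguishing articles that share identical named entities.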
Further Ideas ence can also be observed in the article titles, and upon closer inspection of the article content, it is evident that The challenge of this system lies in working with pre- some of the articles (1-4) revolve around the topic of calculated data from systems that cannot be influenced. refugees from Ukraine, while other articles (5-10) discuss The participants are provided with multiple tables with the notion of neutrality. Using named entities or, in some the output of two different NER systems, fastText doc- cases, a predefined list of keywords can be useful to de- ument, and word vectors (see Section 6.3). In addition, fine broad topic pages (see nzz.ch/themen), but keywords they also have a table at their disposal to verify whether offer concise yet semantically insights into the content a keyword for a given document is present in the abstract of a document. Therefore, they can be potentially used and whether it was discovered by any of the NER models to automatically identify possible subtopics with a news (with 100% string matches). The intuition of System 3 story or discover emerging topics from newly published is that given the resources (cost, time, hardware), one articles. needs to come up with the best possible strategies to detect meaningful keywords. 6.3. Data Preparation 6.6. Evaluation: Instance-based The FLAIR framework [60] was chosen as it contains many out-of-the-box NER models for generic and biomed- Performance ical texts. Furthermore, the framework is also useful In addition to the pre-calculated data, the participants for integrating pre-trained embeddings and models. As were also given evaluation functions to compare differ- many of the texts are from the biomedical domain, the ences between their system NER model output and the ScispaCy library was used for word and sentence tok- keyword list that came with the documents. There are enization [61]. 
The results of the NER models were given cases where an item from the curated keyword list does to the participants. The ner-english model is a 4-class not contain the keyword in the abstract, or contains a NER model for English, which comes with FLAIR [62]. partial or inflected form of the keyword. The evaluation This model has the following categories: locations (LOC), function contains a partial string matching sequence, persons (PER), organizations (ORG), and miscellaneous where one can choose the amount of character similarity (MISC) [63]. We also provided participants with NER between two strings. For example, a document has the results from HunFlair [38], which is an NER tagger for label “radio frequency”, but the string “radio frequen- biomedical texts. This biomedical NER tagger is based on cies” is present in the abstract and the inflected form the HUNER tagger, and has the follwing named-entity cat- was also found by one of the NER models. For this case, egories: Chemicals, Diseases, Species, Genes or Proteins, participants can set a string similarity value (e.g., 80% and Cell lines [64]. As an additional hint to participants, similarity) to circumvent the issues caused by inflected document embeddings for each item in the train and test forms, or partially mentioned forms (“radio frequency” sets, as well as word embeddings for the entire corpus, vs. “radio frequency scanner”). Using the resources at were generated from a fastText model2 trained on the their disposal, participants must develop the best possible English Common Crawl dataset (cc.en.300.bin )3 . strategies to build a system that can detect the maximum number of relevant keywords. 2 https://fasttext.cc/ (last accessed: June 20, 2022). 3 https://fasttext.cc/docs/en/crawl-vectors.html (last accessed: June 20, 2022). 7. 
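Such a partial match can be sketched with Python's standard difflib. This is an illustrative stand-in for the evaluation function handed to participants, using the 80% threshold from the example above; the function names are ours, not the workshop's:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Character-level similarity ratio between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def keyword_found(keyword, candidates, threshold=0.8):
    """True if any candidate string reaches the similarity threshold."""
    return any(similarity(keyword, c) >= threshold for c in candidates)

# The inflected form "radio frequencies" still counts as a hit
# for the curated label "radio frequency" at an 80% threshold.
candidates = ["radio frequencies", "scanner"]
print(keyword_found("radio frequency", candidates))  # → True
print(keyword_found("keyphrase", candidates))        # → False
```

Raising the threshold toward 100% recovers exact string matching; lowering it tolerates more inflection and truncation at the risk of spurious matches.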
7. Participant Contributions

Our participants have further investigated keyphrase extraction in System 1 and provided valuable contributions to our proceedings. Their original theses can be found in the following Google Drive folder.

The basic TextRank keyword extractor in System 1 has been extended with the following data preprocessing steps: (1) remove numbers; (2) restrict valid keywords to nouns only; (3) restrict valid keywords by imposing a minimum string length. The contribution can be found in the Google Drive folder.

Additionally, the evaluation system has been generalized to output numerical performance scores, allowing simpler comparisons of different keyword extractors. The contribution can be found in the Google Drive folder.
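The three preprocessing filters can be sketched in a few lines of pure Python, assuming each candidate token already carries a part-of-speech tag (the "NOUN" tag name and the sample candidates below are illustrative, not the contribution's actual tagger output):

```python
def filter_candidates(candidates, min_length=3):
    """Apply the three preprocessing filters to (token, pos_tag) pairs:
    (1) drop tokens containing digits, (2) keep only nouns,
    (3) enforce a minimum string length."""
    return [
        token
        for token, pos in candidates
        if not any(ch.isdigit() for ch in token)  # (1) remove numbers
        and pos == "NOUN"                         # (2) nouns only
        and len(token) >= min_length              # (3) minimum length
    ]

# Illustrative candidates, tagged by hand.
candidates = [
    ("keyphrase", "NOUN"),
    ("extract", "VERB"),
    ("2022", "NUM"),
    ("ai", "NOUN"),    # too short under min_length=3
    ("graph", "NOUN"),
]
print(filter_candidates(candidates))  # → ['keyphrase', 'graph']
```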
Finally, a comparison between the TextRank algorithm and further unsupervised keyphrase extraction methods has been provided. The limitation of TextRank is that it only considers the co-occurrences of word pairs and not their semantics, which may cause certain extracted "frequent" word pairs to be either irrelevant or under-represented. Therefore, an experiment has been performed using the pke library to compare the performance of the TextRank algorithm and several other unsupervised keyphrase extraction algorithms on the benchmark test dataset. The contribution can be found in the Google Drive folder.
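To make the co-occurrence point concrete, here is a simplified pure-Python sketch of TextRank's ranking step (following Mihalcea and Tarau [35]; the pke implementation differs in tokenization, POS filtering, and phrase formation): words become nodes, co-occurrence within a sliding window adds edges, and a few PageRank iterations score the nodes. Because the scores depend only on co-occurrence structure, semantically central but rare words can end up ranked low:

```python
from collections import defaultdict

def textrank_scores(tokens, window=2, damping=0.85, iterations=30):
    """Score words by PageRank over an undirected co-occurrence graph."""
    # Build co-occurrence edges within a sliding window.
    neighbors = defaultdict(set)
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[i] != tokens[j]:
                neighbors[tokens[i]].add(tokens[j])
                neighbors[tokens[j]].add(tokens[i])
    neighbors = dict(neighbors)
    # Standard PageRank iteration on the unweighted graph.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping) + damping * sum(
                scores[n] / len(neighbors[n]) for n in neighbors[w]
            )
            for w in neighbors
        }
    return scores

tokens = "keyword extraction ranks keyword candidates by graph centrality".split()
ranked = sorted(textrank_scores(tokens).items(), key=lambda kv: -kv[1])
print([word for word, _ in ranked[:3]])
```

A repeated word such as "keyword" accumulates edges and score regardless of its semantic importance, which is precisely the limitation the contribution examines.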
Beyond the academic setting, the use of keyword extraction is demonstrated in an industry setting: Wyona AG utilizes keyword extractors in the working pipeline of the Q&A chatbot "Katie". The contribution can be found in the Google Drive folder.

8. Conclusion

In this workshop, we provided the background and baseline systems for keyword extraction, shared a benchmark dataset on scientific keyword extraction, and invited contributions from participants from industry and academia. The methodologies discussed can be extended to keyword extraction in other domains (e.g., legal and news).

Acknowledgements

The authors would like to thank the organizers from SwissText2022 for hosting our workshop. Peter Egger and the Chair of Applied Economics acknowledge the support of the Department of Management, Technology, and Economics at ETH Zurich. Ce Zhang and the DS3Lab gratefully acknowledge the support from the Swiss State Secretariat for Education, Research and Innovation (SERI) under contract number MB22.00036 (for European Research Council (ERC) Starting Grant TRIDENT 101042665), the Swiss National Science Foundation (Project Numbers 200021_184628 and 197485), Innosuisse/SNF BRIDGE Discovery (Project Number 40B2-0_187132), the European Union Horizon 2020 Research and Innovation Programme (DAPHNE, 957407), the Botnar Research Centre for Child Health, the Swiss Data Science Center, Alibaba, Cisco, eBay, Google Focused Research Awards, Kuaishou Inc., Oracle Labs, Zurich Insurance, and the Department of Computer Science at ETH Zurich. We would like to thank Neue Zürcher Zeitung for collaborating on this project.

References

[1] T. D. Nguyen, M.-Y. Kan, Keyphrase extraction in scientific publications, in: D. H.-L. Goh, T. H. Cao, I. T. Sølvberg, E. Rasmussen (Eds.), Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers, Springer Berlin Heidelberg, Berlin, Heidelberg, 2007, pp. 317–326.
[2] S. N. Kim, M.-Y. Kan, Re-examining automatic keyphrase extraction approaches in scientific articles, in: Proceedings of the Workshop on Multiword Expressions: Identification, Interpretation, Disambiguation and Applications (MWE 2009), Association for Computational Linguistics, Singapore, 2009, pp. 9–16.
[3] E. Frank, G. W. Paynter, I. H. Witten, C. Gutwin, C. G. Nevill-Manning, Domain-specific keyphrase extraction, in: T. Dean (Ed.), Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, IJCAI 99, Stockholm, Sweden, July 31 – August 6, 1999, 2 Volumes, Morgan Kaufmann, 1999, pp. 668–673.
[4] C. Gutwin, G. Paynter, I. Witten, C. Nevill-Manning, E. Frank, Improving browsing in digital libraries with keyphrase indexes, Decision Support Systems 27 (1999) 81–104. doi:10.1016/S0167-9236(99)00038-X.
[5] O. Medelyan, I. H. Witten, Domain-independent automatic keyphrase indexing with small training sets, J. Am. Soc. Inf. Sci. Technol. 59 (2008) 1026–1040.
[6] O. Borisov, M. Aliannejadi, F. Crestani, Keyword extraction for improved document retrieval in conversational search, CoRR abs/2109.05979 (2021).
[7] J. Han, T. Kim, J. Choi, Web document clustering by using automatic keyphrase extraction, in: 2007 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology – Workshops, 2007, pp. 56–59. doi:10.1109/WI-IATW.2007.46.
[8] K. M. Hammouda, D. N. Matute, M. S. Kamel, Corephrase: Keyphrase extraction for document clustering, in: P. Perner, A. Imiya (Eds.), Machine Learning and Data Mining in Pattern Recognition, Springer Berlin Heidelberg, Berlin, Heidelberg, 2005, pp. 265–274.
[9] M. Litvak, M. Last, Graph-based keyword extraction for single-document summarization, in: Proceedings of the Workshop on Multi-Source Multilingual Information Extraction and Summarization, MMIES ’08, Association for Computational Linguistics, USA, 2008, pp. 17–24.
[10] K. Sarkar, A keyphrase-based approach to text summarization for English and Bengali documents, Int. J. Technol. Diffus. 5 (2014) 28–38. doi:10.4018/ijtd.2014040103.
[11] J. R. Thomas, S. K. Bharti, K. S. Babu, Automatic keyword extraction for text summarization in e-newspapers, in: Proceedings of the International Conference on Informatics and Analytics, ICIA-16, Association for Computing Machinery, New York, NY, USA, 2016. doi:10.1145/2980258.2980442.
[12] S. N. Kim, O. Medelyan, M.-Y. Kan, T. Baldwin, SemEval-2010 task 5: Automatic keyphrase extraction from scientific articles, in: Proceedings of the 5th International Workshop on Semantic Evaluation, Association for Computational Linguistics, Uppsala, Sweden, 2010, pp. 21–26.
[13] J. Li, Y. Li, Z. Xue, Keywords extraction algorithm of financial review based on Dirichlet multinomial model, in: Y. Jia, W. Zhang, Y. Fu (Eds.), Proceedings of 2020 Chinese Intelligent Systems Conference, Springer Singapore, Singapore, 2021, pp. 107–116.
[14] M. Pejić Bach, Ž. Krstić, S. Seljan, L. Turulja, Text mining for big data analysis in financial sector: A literature review, Sustainability 11 (2019). doi:10.3390/su11051277.
[15] M. Jungiewicz, M. Łopuszyński, Unsupervised keyword extraction from Polish legal texts, in: A. Przepiórkowski, M. Ogrodniczuk (Eds.), Advances in Natural Language Processing, Springer International Publishing, Cham, 2014, pp. 65–70.
[16] D. Wu, W. Uddin Ahmad, S. Dev, K.-W. Chang, Representation learning for resource-constrained keyphrase generation, arXiv e-prints (2022). arXiv:2203.08118.
[17] J. Piskorski, N. Stefanovitch, G. Jacquet, A. Podavini, Exploring linguistically-lightweight keyword extraction techniques for indexing news articles in a multilingual set-up, in: Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation, Association for Computational Linguistics, Online, 2021, pp. 35–44.
[18] S. Suzuki, H. Takatsuka, Extraction of keywords of novelties from patent claims, in: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, The COLING 2016 Organizing Committee, Osaka, Japan, 2016, pp. 1192–1200.
[19] J. Hu, S. Li, Y. Yao, L. Yu, G. Yang, J. Hu, Patent keyword extraction algorithm based on distributed representation for patent classification, Entropy 20 (2018). doi:10.3390/e20020104.
[20] H. Ding, X. Luo, Attention-based unsupervised keyphrase extraction and phrase graph for COVID-19 medical literature retrieval, ACM Trans. Comput. Healthcare 3 (2021). doi:10.1145/3473939.
[21] M. Komenda, M. Karolyi, A. Pokorná, M. Víta, V. Kríž, Automatic keyword extraction from medical and healthcare curriculum, in: 2016 Federated Conference on Computer Science and Information Systems (FedCSIS), 2016, pp. 287–290.
[22] Q. Li, Y.-F. B. Wu, Identifying important concepts from medical documents, Journal of Biomedical Informatics 39 (2006) 668–679. doi:10.1016/j.jbi.2006.02.001.
[23] A. Zehtab-Salmasi, M.-R. Feizi-Derakhshi, M.-A. Balafar, Frake: Fusional real-time automatic keyword extraction, 2021. arXiv:2104.04830.
[24] C. Sun, L. Hu, S. Li, T. Li, H. Li, L. Chi, A review of unsupervised keyphrase extraction methods using within-collection resources, Symmetry 12 (2020). doi:10.3390/sym12111864.
[25] D. Mahata, R. R. Shah, J. Kuriakose, R. Zimmermann, J. R. Talburt, Theme-weighted ranking of keywords from text documents using phrase embeddings, in: 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), 2018, pp. 184–189. doi:10.1109/MIPR.2018.00041.
[26] S.-C. Kuai, W.-H. Liao, C.-Y. Chang, G.-J. Yu, Fbkea: A feature-based keyword extraction algorithm for improving hit performance, in: 2021 IEEE International Conference on Consumer Electronics – Taiwan (ICCE-TW), 2021, pp. 1–2. doi:10.1109/ICCE-TW52618.2021.9602870.
[27] R. Saga, H. Kobayashi, T. Miyamoto, H. Tsuji, Measurement evaluation of keyword extraction based on topic coverage, in: C. Stephanidis (Ed.), HCI International 2014 – Posters’ Extended Abstracts, Springer International Publishing, Cham, 2014, pp. 224–227.
[28] F. Liu, X. Huang, W. Huang, S. X. Duan, Performance evaluation of keyword extraction methods and visualization for student online comments, Symmetry 12 (2020). doi:10.3390/sym12111923.
[29] L. Bornmann, R. Mutz, Growth rates of modern science: A bibliometric analysis based on the number of publications and cited references, Journal of the Association for Information Science and Technology (JASIST) 66 (2015) 2215–2222. doi:10.1002/asi.23329.
[30] B. Hua, Y. Shin, Extraction of sentences describing originality from conclusion in academic papers, in: Y. Zhang, C. Zhang, P. Mayr, A. Suominen (Eds.), Proceedings of the 1st Workshop on AI + Informetrics (AII2021) co-located with the iConference 2021, Virtual Event, March 17th, 2021, volume 2871 of CEUR Workshop Proceedings, CEUR-WS.org, 2021, pp. 58–70.
[31] M. Krallinger, F. Leitner, O. Rabal, M. Vazquez, J. Oyarzábal, A. Valencia, Chemdner: The drugs and chemical names extraction challenge, Journal of Cheminformatics 7 (2015) S1.
[32] M. F. Porter, An algorithm for suffix stripping, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1997, pp. 313–316.
[33] M. Grootendorst, KeyBERT: Minimal keyword extraction with BERT, 2020. doi:10.5281/zenodo.4461265.
[34] K. Kowsari, D. E. Brown, M. Heidarysafa, K. Jafari Meimandi, M. S. Gerber, L. E. Barnes, HDLTex: Hierarchical deep learning for text classification, in: Machine Learning and Applications (ICMLA), 2017 16th IEEE International Conference on, IEEE, 2017.
[35] R. Mihalcea, P. Tarau, TextRank: Bringing order into text, in: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Barcelona, Spain, 2004, pp. 404–411.
[36] S. Lloyd, Least squares quantization in PCM, IEEE Transactions on Information Theory 28 (1982) 129–137. doi:10.1109/TIT.1982.1056489.
[37] J. MacQueen, Classification and analysis of multivariate observations, in: 5th Berkeley Symp. Math. Statist. Probability, 1967, pp. 281–297.
[38] L. Weber, M. Sänger, J. Münchmeyer, M. Habibi, U. Leser, A. Akbik, HunFlair: An easy-to-use tool for state-of-the-art biomedical named entity recognition, Bioinformatics 37 (2021) 2792–2794. doi:10.1093/bioinformatics/btab042.
[39] J. Son, Y. Shin, Music lyrics summarization method using TextRank algorithm, Journal of Korea Multimedia Society 21 (2018) 45–50. doi:10.9717/kmms.2018.21.1.045.
[40] C. Wu, L. Liao, F. Afedzie Kwofie, F. Zou, Y. Wang, M. Zhang, TextRank keyword extraction method based on multi-feature fusion, in: X.-S. Yang, S. Sherratt, N. Dey, A. Joshi (Eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Springer Singapore, Singapore, 2022, pp. 493–501.
[41] S. Pan, Z. Li, J. Dai, An improved TextRank keywords extraction algorithm, in: Proceedings of the ACM Turing Celebration Conference – China, ACM TURC ’19, Association for Computing Machinery, New York, NY, USA, 2019. doi:10.1145/3321408.3326659.
[42] I. Montani, M. Honnibal, S. V. Landeghem, A. Boyd, H. Peters, P. O. McCann, M. Samsonov, J. Geovedi, J. O’Regan, D. Altinok, G. Orosz, S. L. Kristiansen, D. de Kok, L. Miranda, Roman, E. Bot, L. Fiedler, G. Howard, Edward, W. Phatthiyaphaibun, R. Hudson, Y. Tamura, S. Bozek, murat, R. Daniels, P. Baumgartner, M. Amery, B. Böing, explosion/spaCy: New Span Ruler component, JSON (de)serialization of Doc, span analyzer and more, 2022. doi:10.5281/zenodo.6621076.
[43] F. Barrios, F. López, L. Argerich, R. Wachenchauzer, Variations of the similarity function of TextRank for automated summarization, CoRR abs/1602.03606 (2016). arXiv:1602.03606.
[44] F. Boudin, pke: An open source Python-based keyphrase extraction toolkit, in: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, Osaka, Japan, 2016, pp. 69–73.
[45] M. Neumann, D. King, I. Beltagy, W. Ammar, ScispaCy: Fast and robust models for biomedical natural language processing, in: Proceedings of the 18th BioNLP Workshop and Shared Task, Association for Computational Linguistics, Florence, Italy, 2019, pp. 319–327. doi:10.18653/v1/W19-5034.
[46] A. Hagberg, P. Swart, D. S. Chult, Exploring network structure, dynamics, and function using NetworkX (2008).
[47] D. Xu, Y. Tian, A comprehensive survey of clustering algorithms, Annals of Data Science 2 (2015) 165–193. doi:10.1007/s40745-015-0040-1.
[48] A. E. Ezugwu, A. M. Ikotun, O. O. Oyelade, L. Abualigah, J. O. Agushaka, C. I. Eke, A. A. Akinyelu, A comprehensive survey of clustering algorithms: State-of-the-art machine learning applications, taxonomy, challenges, and future research prospects, Engineering Applications of Artificial Intelligence 110 (2022) 104743. doi:10.1016/j.engappai.2022.104743.
[49] J. Pennington, R. Socher, C. Manning, GloVe: Global vectors for word representation, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Doha, Qatar, 2014, pp. 1532–1543. doi:10.3115/v1/D14-1162.
[50] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 4171–4186. doi:10.18653/v1/N19-1423.
[51] N. Reimers, I. Gurevych, Sentence-BERT: Sentence embeddings using Siamese BERT-networks, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, 2019, pp. 3982–3992. doi:10.18653/v1/D19-1410.
[52] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.
[53] J. D. Hunter, Matplotlib: A 2D graphics environment, Computing in Science & Engineering 9 (2007) 90–95. doi:10.1109/MCSE.2007.55.
[54] A. Bougouin, F. Boudin, B. Daille, TopicRank: Graph-based topic ranking for keyphrase extraction, in: International Joint Conference on Natural Language Processing (IJCNLP), Nagoya, Japan, 2013, pp. 543–551. URL: https://hal.archives-ouvertes.fr/hal-00917969.
[55] F. Boudin, Unsupervised keyphrase extraction with multipartite graphs, in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), Association for Computational Linguistics, New Orleans, Louisiana, 2018, pp. 667–672. doi:10.18653/v1/N18-2105.
[56] M. Grootendorst, BERTopic: Neural topic modeling with a class-based TF-IDF procedure, arXiv preprint arXiv:2203.05794 (2022).
[57] D. Jurafsky, J. H. Martin, Speech and language processing (draft), in preparation [cited 2020 June 1]. Available from: https://web.stanford.edu/~jurafsky/slp3 (2018).
[58] M. W. Berry, J. Kogan, Text mining: Applications and theory, John Wiley & Sons, 2010.
[59] A. Mansouri, L. S. Affendey, A. Mamat, Named entity recognition approaches, International Journal of Computer Science and Network Security 8 (2008) 339–344.
[60] A. Akbik, T. Bergmann, D. Blythe, K. Rasul, S. Schweter, R. Vollgraf, FLAIR: An easy-to-use framework for state-of-the-art NLP, in: NAACL 2019, 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), 2019, pp. 54–59.
[61] M. Neumann, D. King, I. Beltagy, W. Ammar, ScispaCy: Fast and robust models for biomedical natural language processing, CoRR abs/1902.07669 (2019).
[62] A. Akbik, D. Blythe, R. Vollgraf, Contextual string embeddings for sequence labeling, in: COLING 2018, 27th International Conference on Computational Linguistics, 2018, pp. 1638–1649.
[63] E. F. Tjong Kim Sang, F. De Meulder, Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition, in: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, 2003, pp. 142–147.
[64] L. Weber, J. Münchmeyer, T. Rocktäschel, M. Habibi, U. Leser, HUNER: Improving biomedical NER with pretraining, Bioinformatics 36 (2019) 295–302. doi:10.1093/bioinformatics/btz528.
[65] M. Marrero, J. Urbano, S. Sánchez-Cuadrado, J. Morato, J. M. Gómez-Berbís, Named entity recognition: Fallacies, challenges and opportunities, Computer Standards & Interfaces 35 (2013) 482–489. doi:10.1016/j.csi.2012.09.004.

A. List of participants to the workshop

We thank our workshop participants for valuable feedback, contributions, and suggestions.

Susie Xi Rao, ETH Zurich (Organizer)
Piriyakorn Piriyatamwong, ETH Zurich (Organizer)
Parijat Ghoshal, NZZ AG (Organizer)
Vanya Brucker, Wyona AG
Andrea Bussolan, SUPSI
Mercedes García Martínez, Pangeanic
Sandra Mitrović, IDSIA USI-SUPSI
Sara Nasirian, SUPSI
Emmanuel de Salis, HE-Arc
Natasa Sarafijanovic-Djukic, FFHS
Dietrich Trautmann, Thomson Reuters
Michael Wechner, Wyona AG
Peter Egger, ETH Zurich (Principal Investigator)
Ce Zhang, ETH Zurich (Principal Investigator)