<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Which Neural Network? Finding the Right Neural Network Architecture for a Research Problem</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michael Färber</string-name>
          <email>michael.faerber@kit.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicolas Weber</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Heidelberg University, Natural Language Processing Group</institution>
          ,
          <addr-line>Im Neuenheimer Feld 325, 69120 Heidelberg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Karlsruhe Institute of Technology (KIT), Web Science Group</institution>
          ,
          <addr-line>Kaiserstr. 89, 76133 Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Workshop Proce dings</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Considering the increasing rate of scientific papers published in recent years, for researchers throughout all disciplines it has become a challenge to keep track of which latest scientific methods are suitable for which applications. In particular, an unmanageable amount of neural network architectures has been published. In this paper, we propose the task of recommending neural network architectures based on textual problem descriptions. We frame the recommendation as a text classification task and develop appropriate text classification models for this task. In experiments based on three data sets, we find that an SVM classifier outperforms a more complex model based on BERT. Overall, we give evidence that neural network architecture recommendation is a nontrivial but gainful research topic.</p>
      </abstract>
      <kwd-group>
        <kwd>recommender systems</kwd>
        <kwd>machine learning</kwd>
        <kwd>neural network architectures</kwd>
        <kwd>open science</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>A multitude of neural network architectures has been proposed, with many more to come. The knowledge about these architectures and their typical applications is scattered across a rapidly growing body of literature. Machine learning researchers and practitioners, such as data scientists and software developers, therefore face the recurring question: When to use which neural network architecture?²</p>
      <p>
        So far, approaches to neural architecture search and search engines for research data management have been proposed. Neural architecture search [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is concerned with the task of automatically finding the optimal neural network architecture design for a specific task. However, neural architecture search approaches usually restrict themselves to a specific architecture type (e.g., RNN or CNN) and target finding the optimal configuration within that type. Instead, the focus of this paper is on a different level of granularity: the idea is to create a model that finds the most suitable neural network architecture for a research problem described in natural language. Furthermore, neural network search engines and ontologies, such as FAIRnets [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ], differ from our approach because they allow only keyword queries.
      </p>
      <p>¹See https://www.wikidata.org/.</p>
      <p>²See https://datascience.stackexchange.com/questions/20222/how-to-decide-neural-network-architecture.</p>
      <p>
        Chen et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] found that real information needs are most often formulated as phrases and not as keywords; keyword queries constitute only 32% of the investigated queries. In addition, such search systems return, rather than recommend, neural network architectures.
      </p>
      <p>In this paper, we propose the task of neural network architecture recommendation. It differs from other text classification tasks in the fact that research problem descriptions as input are largely not available and first need to be created. To this end, we propose two methods that extract problem descriptions from papers’ abstracts. In addition, the usage of neural network architectures is highly imbalanced in the literature, making the recommendation task a nontrivial challenge. We train and evaluate two state-of-the-art machine-learning-based approaches for neural network architecture recommendation, using the extracted problem descriptions and neural network architectures derived from Wikidata. Our proposed approach can benefit students as well as researchers of various domains. For researchers with little expertise in the field of machine learning in particular, our approach simplifies the process of selecting a suitable neural network model and presumably reduces the time spent on preliminary research on appropriate neural architectures.</p>
      <p>To summarize, we make the following contributions:
1. We create evaluation data sets for neural network
architecture recommendation, consisting of 66
unique architectures and 284,337 textual problem
descriptions.</p>
      <p>2. We train and evaluate several classifiers capable of predicting neural network architectures based on textual problem descriptions.³</p>
      <p>Table 1: The 66 neural network architectures retrieved from Wikidata: Kernel Perceptron, Multilayer Perceptron, Restricted Boltzmann Machine, winner-take-all, Hopfield Network, Neural Abstraction Pyramid, Shift Invariant NN, Spatial Transformer Network, Neural History Compressor, Kohonen NN, Radial Basis Function Network, Connectionist Expert System, Boltzmann Machine, Bidirectional Associative Memory, Neural Turing Machine, Self-organizing Map, ResNet, RNTN, TDNN, BCPNN, MCDNN, HONN, Elman Network, RecCC, Jordan Network, ADALINE, LSTM, CPPN, CMAC, PCNN, DRPNN, NNPDA, MANN, RecNN, RoBERTa, Neocognitron, Cresceptron, Modular NN, Deep NN, Feedforward NN, Perceptron, Highway Network, Transformer, AlexNet, Text-CNN, EntNet, Hamming NN, LeNet-5, Stochastic NN, CapsNet, 3D-CNN, GCN, CNN, GRU, SNN, DNC, PNN, DBN, GAN, ESN, ELM, VAE, HTM, LSM, RNN, DQN.</p>
      <p>The paper is structured as follows: In Section 2, we describe the creation of the neural network architecture set as well as of two data sets with scientific problem descriptions. Section 3 discusses the methods to predict the neural network architectures based on textual descriptions. In Section 4, we present our experiments. We conclude in Section 5 with a summary.</p>
    </sec>
    <sec id="sec-data">
      <title>2. Data</title>
      <sec id="sec-1-1">
        <title>2.1. Neural Network Architectures</title>
        <sec id="sec-1-1-1">
          <title>2.2.1. Extraction by Abstract Splitting</title>
          <p>
            Our approach utilizes the knowledge graph Wikidata to
obtain a list of neural network architectures. The follow- The first approach of creating a data set is based on the
ing aspects are taken into consideration: (1) all subclasses observation of Jiang et al. [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]. The main idea is that
of artificial neural networks; (2) the hierarchical struc- abstracts can often be conceptually split into an
introducture of these subclasses; (3) aliases and abbreviations. tion and a solution part. After manually checking 500
Our query returns 67 results, of which 66 (see Table 1) randomly selected papers from four conferences (SIGIR,
are appropriate for the task at hand (the additional item SIGKDD, RecSys, and CIKM), the result indicates that
returned is the “artificial neural network” item itself). 71% of the abstracts adhere to this structure [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ].
We observe that the key phrases “in this paper” and
2.2. Problem Descriptions “this paper” play an important role in the transition
between the problem statement and solution parts (see
TaOur aim is to recommend neural network architectures ble 2). We therefore check for each sentence in the
abbased on problem descriptions. However, problem de- stracts whether these key phrases occur. If there is a
scriptions are, to the best of our knowledge, not available match, we mark the sentence as the beginning of the
to a large degree. However, we argue that parts of papers’ solution part and all prior sentences as the problem
deabstracts are a good approximation of textual research scription part. Table 2 provides an illustration of our
problems. Thus, we use the paper abstracts and metadata abstract-splitting approach.
from the Microsoft Academic Graph (MAG; [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ]).
          </p>
          <p>3All data and source code is available online at https://
github.com/michaelfaerber/NNARec.
4[BERT, GPT-2, GPT-3, natural language, self-attention]</p>
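        <p>The list can be reproduced with a SPARQL query against the public Wikidata endpoint. The following minimal Python sketch assumes the Wikidata identifiers Q43080 (“artificial neural network”) and P279 (“subclass of”); the exact query used for the paper may differ.</p>
        <preformat>
import requests

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed identifiers: Q43080 = "artificial neural network", P279 = "subclass of".
QUERY = """
SELECT DISTINCT ?arch ?archLabel ?alias WHERE {
  ?arch wdt:P279* wd:Q43080 .
  OPTIONAL { ?arch skos:altLabel ?alias . FILTER (lang(?alias) = "en") }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""

def fetch_architectures():
    """Return a mapping from architecture label to its set of English aliases."""
    resp = requests.get(SPARQL_ENDPOINT,
                        params={"query": QUERY, "format": "json"},
                        headers={"User-Agent": "nnarec-example/0.1"})
    resp.raise_for_status()
    names = {}
    for row in resp.json()["results"]["bindings"]:
        label = row["archLabel"]["value"]
        names.setdefault(label, set())
        if "alias" in row:
            names[label].add(row["alias"]["value"])
    return names
        </preformat>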
        </sec>
        <sec id="sec-1-1-2">
          <title>Example Problem Description: The prediction of fail</title>
          <p>ures in rotating machines is an important issue in
industries to improve safety, to reduce the cost of maintenance
and to prevent accidents.</p>
          <p>Example Solution: In this paper a predictive maintenance
algorithm, based on the analysis of the orbits shape of the
rotor shaft is proposed. It is based on an autonomous image
pattern recognition algorithm, implemented by using a</p>
        </sec>
        <sec id="sec-1-1-3">
          <title>Convolutional Neural Network (CNN).[...]</title>
        </sec>
        <sec id="sec-1-1-4">
          <title>Example Target Label: CNN</title>
          <p>To evaluate the effectiveness of this method, we let two experienced researchers classify 500 randomly selected splits into the following categories: (1) the split is correct, (2) the split is incorrect, but a correct split is possible, and (3) the abstract cannot be split into an introduction part and a solution part. The differences between the annotators lie mostly in the annotators’ conceptions of where to set a split, rather than in whether a split is possible at all. Inter-annotator agreement amounts to a Cohen’s kappa of 0.7538, which indicates good agreement for this task. Overall, based on our analysis, 88.6% of the randomly sampled splits are evaluated as being correct.</p>
          <p>Once the abstracts have been split, only the parts of abstracts with mentions of neural network architectures in their respective solution part are included in the data set, with the introduction parts as problem descriptions and the neural network architectures as the labels. We refer to the resulting data set as the Abstract Splitting (AS) data set.</p>
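          <p>As a minimal sketch of the splitting heuristic, the following Python function marks the first sentence containing one of the key phrases as the start of the solution part; the sentence segmentation here is simplified and not necessarily the one used for the paper.</p>
          <preformat>
import re

KEY_PHRASES = ("in this paper", "this paper")

def split_abstract(abstract: str):
    """Split an abstract into (problem, solution) at the first sentence
    containing a key phrase; return None if no valid split is found."""
    # Naive sentence segmentation; a proper tokenizer may be preferable.
    sentences = re.split(r"(?&lt;=[.!?])\s+", abstract.strip())
    for i, sentence in enumerate(sentences):
        lowered = sentence.lower()
        if any(phrase in lowered for phrase in KEY_PHRASES):
            if i == 0:  # No problem description precedes the solution part.
                return None
            return " ".join(sentences[:i]), " ".join(sentences[i:])
    return None
          </preformat>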
        </sec>
        <sec id="sec-2-2-2">
          <title>2.2.2. Extraction by Key Phrase Templates</title>
          <p>The aforementioned method has the drawback that only the neural network mentioned in the solution part of an abstract is assumed to be directly related to the problem description outlined in the first part of this abstract; problem descriptions in other parts of the abstract are ignored. To combat this issue, we create a method that identifies problem descriptions more precisely.</p>
          <p>In a first step, we analyze the abstracts that contain neural network architecture mentions to obtain an understanding of recurring phrases in problem descriptions. From these phrases, we then create templates to extract problem descriptions from all abstracts. Table 3 illustrates an example of a template and a match. Overall, we came up with 44 templates that are based on regular expressions.</p>
          <p>As we can see in Table 3, this method generally results in shorter problem descriptions than the plain abstract-splitting method proposed above. As only the problem descriptions and the neural network architecture names are of interest, and not the long method descriptions, we additionally identify the neural network architecture names mentioned in METHOD (in the example in Table 3: CNN), given our list of neural network architecture names. To minimize redundancy for extractions made in the same abstract, if one string is a substring of the other, the longer one is chosen and the other one is dismissed.</p>
          <p>A last step to reduce noise is to filter out common phrases in the texts that carry no information (e.g., “solve this problem” given the template “we use METHOD to solve this PROBLEM”). While the quality of the extracted problem descriptions is overall satisfying, from the 284,337 abstracts mentioning neural network architectures, only 35,829 problem descriptions remain based on this method. The resulting data set is designated the Key Phrase Extraction (KE) data set.</p>
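          <p>The 44 templates themselves are not reproduced in this excerpt; the following sketch shows one hypothetical regular-expression template in the spirit of “we use METHOD to solve this PROBLEM”, including the substring-based deduplication described above.</p>
          <preformat>
import re

# One illustrative template; the paper's 44 actual regular expressions
# are not published in this excerpt.
TEMPLATE = re.compile(
    r"we (?:use|apply|propose) (?P&lt;method&gt;.+?) "
    r"to (?:solve|address|tackle) (?P&lt;problem&gt;.+?)[.;]",
    re.IGNORECASE,
)

def extract(abstract: str, architecture_names: set):
    """Yield (problem, architecture) pairs for matches whose METHOD span
    mentions a known neural network architecture name."""
    for match in TEMPLATE.finditer(abstract):
        method = match.group("method").lower()
        mentioned = [n for n in architecture_names if n.lower() in method]
        # Keep only the longest name when one is a substring of another.
        mentioned = [m for m in mentioned
                     if not any(m != o and m.lower() in o.lower()
                                for o in mentioned)]
        for name in mentioned:
            yield match.group("problem"), name
          </preformat>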
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Neural Network Architecture Mentions</title>
        <p>Due to the differences in the data set creation, the distribution of neural network architectures differs between our AS and KE data sets. To make them comparable, we take two steps. First, to avoid losing all instances of sparse classes, the hierarchical structure of some neural network architectures allows for the inclusion of sparse classes into their parent classes (e.g., GRU is integrated into RNN). We perform this step for all classes with fewer than 200 instances, given there is a hierarchy to exploit. Second, because some architectures are rarely mentioned, only classes with at least 200 instances in both data sets are considered. This leads to both data sets containing the same classes. From the initial 66 neural network architectures retrieved from Wikidata, only 15 remain; they are listed in Figure 1.</p>
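        <p>A compact sketch of the two normalization steps follows; the parent mapping shown is a hypothetical excerpt, while the 200-instance threshold is the one stated above.</p>
        <preformat>
from collections import Counter

MIN_INSTANCES = 200

def merge_sparse_classes(labels, parent):
    """Map labels with fewer than MIN_INSTANCES occurrences to their parent
    class where a hierarchy exists, e.g. parent = {"GRU": "RNN"}."""
    counts = Counter(labels)
    return [parent.get(label, label) if counts[label] &lt; MIN_INSTANCES else label
            for label in labels]

def shared_frequent_classes(as_labels, ke_labels):
    """Classes with at least MIN_INSTANCES instances in both data sets."""
    as_counts, ke_counts = Counter(as_labels), Counter(ke_labels)
    return {c for c in as_counts
            if as_counts[c] &gt;= MIN_INSTANCES and ke_counts[c] &gt;= MIN_INSTANCES}
        </preformat>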
        <p>[Figure 1: Class distribution of the mod-AGENDA data set over the 15 remaining neural network architecture classes: ELM, RBF, Spiking NN, PNN, GAN, SOM, Feedforward NN, Deep Belief Network, Autoencoder, Perceptron, MLP, LSTM, RNN, Deep NN, and CNN.]</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Preparing AGENDA as Test Set</title>
        <p>
          The Abstract GENeration data set (AGENDA; Koncel-Kedziorski et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]) has been used for automatic text generation based on knowledge graphs and consists of
knowledge graphs paired with paper titles and paper
abstracts from the AI domain. As mentions of tasks and
methods are also labeled in these paper abstracts, we
can use this data set for an additional, complementary
evaluation, particularly as an additional test data set
considering its size.</p>
        <p>It is important to note that the text spans labeled as problem descriptions in this data set are rather short, so as to be compatible with knowledge graph entities. We therefore increase the context by considering whole sentences as problem descriptions. The resulting modified data set, designated mod-AGENDA, has 1,327 instances distributed over 15 classes, as Figure 1 shows.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Methods</title>
      <p>The task in this paper falls into the realm of supervised classification. The overwhelming majority of instances in each of our data sets has only a single label. Thus, in the following evaluation, we consider the task as a multiclass, single-label classification task. For this paper, we consider the following widely used text classification approaches.</p>
      <p>TF-IDF + SVM. One approach is based on an SVM, using TF-IDF for representing the text as vectors. As this can lead to very high-dimensional sparse vectors, it makes sense to filter out stopwords for the vector representation.</p>
      <p>BERT + Classification Layer. As our second approach, we use a fine-tuned BERT model with an additional classification layer.</p>
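      <p>A minimal sketch of the first approach with standard scikit-learn components is shown below; the concrete SVM variant and hyperparameters are not specified in this excerpt, so LinearSVC with default settings is an assumption.</p>
      <preformat>
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# TF-IDF vectors (English stopwords removed) feeding a one-vs-rest linear SVM.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    OneVsRestClassifier(LinearSVC()),
)

# Usage: model.fit(train_texts, train_labels)
#        predictions = model.predict(test_texts)
      </preformat>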
    </sec>
    <sec id="sec-3">
      <title>4. Experiments</title>
      <sec id="sec-3-1">
        <title>4.1. Evaluation Settings</title>
        <p>We use a train-test split of 80:20 for the AS and KE data sets. Each of the methods is trained and tested on either the AS data set or the KE data set. In addition, the models trained on the KE and AS data sets are evaluated on the modified AGENDA data set to assess the generalizability of the approaches.</p>
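        <p>Concretely, such a split can be realized with scikit-learn; the toy data and the random_state value below are illustrative, not taken from the paper.</p>
        <preformat>
from sklearn.model_selection import train_test_split

# Toy stand-ins for the extracted problem descriptions and labels.
texts = ["predict failures in rotating machines", "translate text",
         "classify images", "detect objects in video"]
labels = ["CNN", "RNN", "CNN", "CNN"]

# 80:20 train-test split, as used for the AS and KE data sets.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42)
        </preformat>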
        <p>
          We consider the following methods: (1) SVM. We use scikit-learn's TfidfVectorizer for numeric representations and an SVM implemented via a one-vs-rest classification scheme. (2) Fine-tuned SciBERT. We use SciBERT [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], a scientific domain-specific, pretrained BERT model, and fine-tune it on the classification task with the Adam optimizer [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. (3) Most frequent class (MFC). We consider the MFC as a baseline.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Evaluation Results</title>
        <p>Precision, recall, F1-score⁵ (all macro-averaged), and accuracy for the MFC baseline, the SVM, and fine-tuned SciBERT are reported in Table 4. The results show that the SVM classifier trained and tested on the KE data set is the most successful with respect to recall, F1 score, and accuracy. It beats the more complex SciBERT classifier by more than 100% in accuracy (0.5908 vs. 0.2576) and F1 score (0.4629 vs. 0.1793). However, we note that accuracy is not an excellent metric for unbalanced data sets.</p>
        <p>Regarding the classifiers trained and tested on the AS data set, the SVM also beats the SciBERT model with respect to precision, F1 score, and accuracy, but by a smaller margin: here, the accuracy of the SVM is 0.17 higher and the F1-score 0.1 higher than those of SciBERT.</p>
        <p>
          ⁵The F1-score is calculated as the arithmetic mean over the individual F1 scores [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
        <sec id="sec-3-1-1">
          <title>Method</title>
          <p>MFC
SVM
SciBERT
MFC
SVM
SciBERT
MFC
SVM
SciBERT
SVM
SciBERT</p>
          <p>Precision
(Macro)
the F1-score is 0.1 higher than that of SciBERT.</p>
        <p>The SVM and SciBERT models trained on the AS and KE data sets perform better than the MFC baseline in most cases. Notably, MFC achieves a higher accuracy than SciBERT on the KE data set.</p>
        <p>When evaluating the approaches on the mod-AGENDA data, the results drop significantly. Nonetheless, the SVM classifier still achieves the best results, with only a small difference between using the AS and the KE data set as training data. SciBERT still outperforms the MFC baseline.</p>
        <p>The methods trained on the AS data set generalize somewhat better than the methods trained on the KE data set, despite the simpler creation process of the AS data set. A likely reason for this phenomenon is that the AS data set is more similar to the AGENDA data set than the KE data set is. In particular, the research problem descriptions in the KE data set are much shorter than those in the AS data set.</p>
        <p>Overall, given 0.59 and 0.57 as the best accuracy scores and 0.46 and 0.44 as the top F1 scores, we conclude that neural network recommendation based on textual task descriptions is a nontrivial task (motivating our paper), while the results indicate that users (e.g., early-career researchers) might nevertheless find such recommender systems helpful.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusion</title>
      <p>This paper introduced the task of recommending neural network architectures based on textual problem descriptions. To this end, we created two data sets of labeled problem descriptions. The first method splits abstracts by means of signaling phrases and labels the problem parts by matching neural network architecture names. The second method uses recurring phrases to extract shorter and more precise problem descriptions via regular expressions. We used both data sets to train and evaluate classifiers. We identified the SVM-based approach as a promising method, outperforming a BERT-based approach.</p>
      <p>
        In the future, we will extend our recommender system to machine learning methods in general and combine it with the recommendation of other scholarly entities, such as data sets [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Furthermore, we plan to provide a running system for neural network architecture recommendation, accompanied by a user study.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Elsken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Metzen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          ,
          <article-title>Neural architecture search: A survey</article-title>
          ,
          <source>J. Mach. Learn. Res</source>
          .
          <volume>20</volume>
          (
          <year>2019</year>
          )
          <volume>55</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>55</lpage>
          :
          <fpage>21</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Weller</surname>
          </string-name>
          ,
          <article-title>FAIRnets Search - A Prototype Search Service to Find Neural Networks</article-title>
          ,
          <source>in: Proceedings of the International Conference on Semantic Systems, SEMANTiCS'19</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Weller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Färber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sure-Vetter</surname>
          </string-name>
          ,
          <article-title>Making Neural Networks FAIR</article-title>
          ,
          <source>in: Proceedings of the Second Iberoamerican Conference and First IndoAmerican Conference</source>
          ,
          <source>KGSWC'20</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>29</fpage>
          -
          <lpage>44</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          , et al.,
          <article-title>Towards More Usable Dataset Search: From Query Characterization to Snippet Generation</article-title>
          ,
          <source>in: Proceedings of 28th ACM International Conference on Information and Knowledge Management</source>
          ,
          <source>CIKM'19</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>2445</fpage>
          -
          <lpage>2448</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sinha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shen</surname>
          </string-name>
          , et al.,
          <article-title>An Overview of Microsoft Academic Service (MAS) and Applications</article-title>
          ,
          <source>in: Proceedings of the 24th International Conference on World Wide Web Companion, WWW'15</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>243</fpage>
          -
          <lpage>246</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Recommending Academic Papers via Users' Reading Purposes</article-title>
          ,
          <source>in: Proceedings of the Sixth ACM Conference on Recommender Systems, RecSys'12</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>241</fpage>
          -
          <lpage>244</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Caponetto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rizzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Russotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Xibilia</surname>
          </string-name>
          ,
          <article-title>Deep Learning Algorithm for Predictive Maintenance of Rotating Machines Through the Analysis of the Orbits Shape of the Rotor Shaft</article-title>
          ,
          <source>in: Proceedings of the 1st International Conference on Smart Innovation, Ergonomics and Applied Human Factors, SEAHF'19</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>245</fpage>
          -
          <lpage>250</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Koncel-Kedziorski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bekal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Luan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lapata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hajishirzi</surname>
          </string-name>
          ,
          <article-title>Text Generation from Knowledge Graphs with Graph Transformers</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT'19</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>2284</fpage>
          -
          <lpage>2293</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>I.</given-names>
            <surname>Beltagy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cohan</surname>
          </string-name>
          ,
          <article-title>SciBERT: A Pretrained Language Model for Scientific Text</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP'19</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>3613</fpage>
          -
          <lpage>3618</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ba</surname>
          </string-name>
          ,
          <article-title>Adam: A Method for Stochastic Optimization</article-title>
          ,
          <source>in: Proceedings of the 3rd International Conference on Learning Representations, ICLR'15</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Opitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Burst</surname>
          </string-name>
          ,
          <article-title>Macro F1 and macro F1</article-title>
          ,
          <source>CoRR abs/1911.03347</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Färber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Leisinger</surname>
          </string-name>
          ,
          <article-title>Recommending Datasets for Scientific Problem Descriptions</article-title>
          ,
          <source>in: Proceedings of the 30th ACM International Conference on Information and Knowledge Management</source>
          ,
          <source>CIKM'21</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>3014</fpage>
          -
          <lpage>3018</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>