<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Neural Citation Recommendation: A Reproducibility Study</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Michael</forename><surname>Färber</surname></persName>
							<email>michael.faerber@kit.edu</email>
							<affiliation key="aff0">
								<orgName type="institution">Karlsruhe Institute of Technology (KIT)</orgName>
								<address>
									<settlement>Karlsruhe</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Timo</forename><surname>Klein</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Karlsruhe Institute of Technology (KIT)</orgName>
								<address>
									<settlement>Karlsruhe</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Joan</forename><surname>Sigloch</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Karlsruhe Institute of Technology (KIT)</orgName>
								<address>
									<settlement>Karlsruhe</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Neural Citation Recommendation: A Reproducibility Study</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">4266312D24C672FC0CC75E60E49185EA</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T21:40+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>recommender systems</term>
					<term>bibliometrics</term>
					<term>citation context</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Context-aware citation recommendation aims to replace the manual search for relevant citations by automatically recommending suitable papers as citations for a specified input text. In this paper, we examine the reproducibility of a state-of-the-art approach to context-aware citation recommendation, namely the neural citation network (NCN) by Ebesu and Fang <ref type="bibr" target="#b0">[1]</ref>. We re-implement the network and run evaluations on both RefSeer, the originally used data set, and arXiv CS as an additional data set. We provide insights into how the different hyperparameters of the neural network affect the performance of the NCN and how they can be used to improve the model. In this way, we contribute to making citation recommendation approaches and their evaluations more transparent and to creating more effective neural network-based models in the future.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In order to generate recommendations, the text surrounding the placeholder, often referred to as the citation context, is used as input to the recommender system. The output is a ranked list of candidate papers for the query placeholder.</p><p>In recent years, several approaches to global and local citation recommendation have been proposed <ref type="bibr" target="#b1">[2]</ref>. In this paper, we analyze the reproducibility of one specific local citation recommendation approach, namely the neural citation network (NCN) by Ebesu and Fang <ref type="bibr" target="#b0">[1]</ref>. We chose this approach due to its recency, its promising results on a large data set, and its wide acceptance in the scientific community (based on citation counts). Note that we were unable to run the source code published online by Ebesu and Fang; furthermore, the Python version they used is outdated. Thus, after re-implementing the network, we used both the original data set and an additional data set for training and evaluating the NCN in order to examine its performance under varying circumstances.</p><p>Overall, we make the following contributions in this paper: 1. We re-implement the NCN by Ebesu and Fang <ref type="bibr" target="#b0">[1]</ref>, a state-of-the-art approach to local citation recommendation. 2. We run extensive experiments with the NCN using the original data set RefSeer as well as arXiv CS as a further data set. 3. We analyze the evaluation results and draw conclusions for the future development of local citation recommendation approaches.</p><p>The rest of this paper is structured as follows: We give an overview of the NCN architecture in Sec. 2. In Sec. 3, we present our experimental setup and the evaluation results. We conclude in Sec. 4.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">The Neural Citation Network</head><p>The NCN proposed by Ebesu and Fang <ref type="bibr" target="#b0">[1]</ref> consists of an encoder-decoder model coupled with an attention mechanism (see Fig. <ref type="figure">1</ref>).</p><p>Fig. <ref type="figure">1</ref>: Architecture of the neural citation network (NCN).</p><p>Encoders. Encoders are deployed as part of the NCN in order to turn the raw citation context and the citing/cited authors' names into feature tensors holding the relevant information about the context and the authors, respectively.</p><p>1. Context encoder. The part of the NCN responsible for encoding the citation context is a time-delay neural network (TDNN) as introduced by Collobert and Weston <ref type="bibr" target="#b3">[4]</ref>. It allows multiple forward propagations through the network at once, so that all feature maps are calculated in parallel. The TDNN used by Ebesu and Fang consists of a convolutional layer followed by a pooling layer and a fully connected layer. 2. Author encoder. In order to include author information when generating citation recommendations, the NCN comprises an author encoder, which uses the same architecture as the context encoder outlined above. It is applied separately to (1) the embeddings of the authors' names A_q of the document from which the query context originated and (2) the embeddings of the authors' names A_d of all documents in the database. The author encoder is applied multiple times, using TDNNs with varying filter region sizes in the convolutional layer. The final representation resulting from applying the context encoder and the author encoders is denoted as</p><formula xml:id="formula_0">s = [f(X_q) ⊕ f(A_q) ⊕ f(A_d)],</formula><p>with X_q being the representation of a given citation context.</p><p>Decoder. The NCN's decoder is a recurrent neural network (RNN) that uses the gated recurrent unit (GRU) <ref type="bibr" target="#b4">[5]</ref> as its gating mechanism together with the attention mechanism <ref type="bibr" target="#b5">[6]</ref>. It is applied to the title of every document that can be used as a citation for the query citation context.<ref type="foot" target="#foot_0">2</ref> The purpose of the decoder is to generate a score for every document in the database indicating its suitability as a citation for the given query context. These scores are ultimately used to generate citation recommendations for the query context.</p><p>Attention mechanism. The NCN makes use of the attention mechanism originally introduced by Bahdanau et al. <ref type="bibr" target="#b5">[6]</ref>. With the help of attention, the encodings s_j that originate from the context and author encoders are weighted depending on the decoder output h_{i-1} for the word prior to i. The result is a context vector c_i, a weighted sum of the encoder outputs s_j in accordance with their relevance. Attention thus puts emphasis on encodings that are particularly important for the current time step. The attention mechanism is implemented as a feed-forward neural network that concludes with a softmax layer converting attention vectors a_ij into attention scores α_ij. 
These indicate the importance of the encoder output s_j for the i-th word in the title of the document currently being decoded. To illustrate, in Fig. <ref type="figure">2</ref> ("Illustrative example of attention weights") we visualize the matrix α_ij for the target sentence "Imagenet classification with deep convolutional neural networks" by the well-known authors "Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton" after the sequence was tokenized and preprocessed. The context for this example was set to "Neural networks are really cool, especially if they are convolutional" with the authors "Chuck Norris, Bruce Lee." This toy example shows how the decoder sensibly puts little emphasis on the citing authors as compared to the context and cited authors. The context vector c_i is determined for every word i in the document title.</p></div>
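The encoder and attention components described in this section can be sketched in PyTorch, the framework used for our re-implementation. This is an illustrative sketch, not the authors' code: all class names, layer sizes, and the filter region size are our own assumptions.

```python
# Sketch of a TDNN-style encoder (embedding -> 1-D convolution -> max pooling
# over time -> fully connected layer) and additive (Bahdanau) attention.
# All dimensions are illustrative placeholders, not the paper's exact values.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TDNNEncoder(nn.Module):
    def __init__(self, vocab_size=20000, emb_dim=128, num_filters=128,
                 filter_size=5, out_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # One filter region size per encoder instance; the NCN applies several
        # such encoders with varying region sizes.
        self.conv = nn.Conv1d(emb_dim, num_filters, kernel_size=filter_size)
        self.fc = nn.Linear(num_filters, out_dim)

    def forward(self, tokens):                      # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)      # (batch, emb_dim, seq_len)
        x = torch.relu(self.conv(x))                # (batch, num_filters, L')
        x = F.max_pool1d(x, x.size(2)).squeeze(2)   # max over time
        return self.fc(x)                           # (batch, out_dim)


class BahdanauAttention(nn.Module):
    """Scores encoder outputs s_j against the previous decoder state h_{i-1}."""

    def __init__(self, enc_dim=128, dec_dim=128, attn_dim=128):
        super().__init__()
        self.W_s = nn.Linear(enc_dim, attn_dim, bias=False)
        self.W_h = nn.Linear(dec_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, s, h_prev):   # s: (batch, T, enc_dim), h_prev: (batch, dec_dim)
        # Attention vectors a_ij, then softmax -> attention scores alpha_ij.
        a = self.v(torch.tanh(self.W_s(s) + self.W_h(h_prev).unsqueeze(1)))
        alpha = torch.softmax(a, dim=1)             # (batch, T, 1)
        c = (alpha * s).sum(dim=1)                  # context vector c_i
        return c, alpha.squeeze(2)
```

In a full decoder, the context vector `c` would be fed into the GRU together with the previously generated title word at every decoding step.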
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Evaluation</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Data Sets</head><p>We used two data sets in our evaluation.</p><p>1. RefSeer. Following Ebesu and Fang <ref type="bibr" target="#b0">[1]</ref>, we used RefSeer <ref type="bibr" target="#b6">[7]</ref> as our first data set. Although we followed Ebesu and Fang's instructions for creating their evaluation data set, we were unable to generate exactly the same data set from the original RefSeer data, as we could not find any information about citing authors within the data set, only about cited authors. For comparability, we randomly selected 4.5 M out of the generated 14.9 M citation contexts in order to end up with the same data set size as the one used by Ebesu and Fang. Note that the data set we reused did not contain author information of the citing papers; we thus expected poorer performance than that of the model published by Ebesu and Fang. 2. arXiv CS. We used the arXiv.org publications in the computer science domain as our second data set, as proposed by Färber et al. <ref type="bibr" target="#b7">[8]</ref> for citation-based tasks. We cut off the citation contexts and citation titles at lengths of 100 and 30 words, respectively, to achieve a trade-off between model performance and training time (see Fig. <ref type="figure">3</ref>: Distribution of citation context and citation title lengths in the preprocessed arXiv CS data set). Overall, we used 502,353 pairs of citations and citation contexts. We chose this data set in order to obtain insights into how well our models perform under circumstances different from the ones presented by Ebesu and Fang. Thus, our paper is not only a replicability paper (with a focus on repeating prior experiments to see when the methods work) but also a reproducibility paper (repeating experiments in new contexts). 
For model training and evaluation, we split the data sets into 80% training, 10% validation, and 10% test data sets and set a seed to ensure reproducibility.</p></div>
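The split described above can be sketched in plain Python. The 80/10/10 ratios and the use of a fixed seed come from the text; the function name and the concrete seed value are illustrative placeholders.

```python
# Deterministic 80/10/10 train/validation/test split. A fixed seed makes the
# shuffle, and therefore the split, reproducible across runs.
import random


def split_dataset(examples, seed=42):  # seed value is an arbitrary placeholder
    rng = random.Random(seed)
    idx = list(range(len(examples)))
    rng.shuffle(idx)
    n_train = int(0.8 * len(examples))
    n_valid = int(0.1 * len(examples))
    train = [examples[i] for i in idx[:n_train]]
    valid = [examples[i] for i in idx[n_train:n_train + n_valid]]
    test = [examples[i] for i in idx[n_train + n_valid:]]
    return train, valid, test
```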
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Model Re-Implementation</head><p>We rebuilt the NCN from scratch. Our final code is available on GitHub.<ref type="foot" target="#foot_2">3</ref> We used PyTorch to re-implement the network, which was originally coded in TensorFlow version r0.11. We used the torchtext package to convert the data set into a suitable format for PyTorch and to facilitate the preprocessing steps. Furthermore, we used the SpaCy library in combination with torchtext to tokenize the data set. After lemmatizing the data and removing stopwords using the combined SpaCy and nltk stopword corpora, we numericalized the data set using a vocabulary size of 20,000 tokens for citation contexts, citation titles, and authors. To facilitate propagating batches through the network, we made use of the same bucketing technique as Ebesu and Fang. Like Ebesu and Fang, we further use the BM25 ranking function in the decoder part of the network to preselect citation titles for a given citation context.</p><p>Table <ref type="table">1</ref>: Results of our replicability and reproducibility studies on the neural citation network (NCN) <ref type="bibr" target="#b0">[1]</ref> using the recall@10 metric. We show the total number of trainable parameters ("# Param.") as an indicator of model complexity.</p></div>
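The role BM25 plays here, preselecting candidate citation titles for a given context, can be illustrated with a minimal plain-Python scorer. This is a sketch with the common k1/b defaults and naive whitespace tokenization; our actual preselection code may differ.

```python
# Minimal Okapi BM25 ranking: score each candidate title against the query
# citation context and return candidate indices, best first.
import math
from collections import Counter


def bm25_rank(query, docs, k1=1.5, b=0.75):
    docs_tok = [d.lower().split() for d in docs]
    n = len(docs_tok)
    avgdl = sum(len(d) for d in docs_tok) / n
    df = Counter()                      # document frequency per term
    for d in docs_tok:
        df.update(set(d))
    scores = []
    for d in docs_tok:
        tf = Counter(d)
        s = 0.0
        for t in query.lower().split():
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return sorted(range(n), key=lambda i: -scores[i])
```

In the full pipeline, only the top-ranked titles from this preselection would then be re-scored by the neural decoder.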
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Evaluation Results</head><p>Citation recommendation approaches are difficult to evaluate, as the citation provided by the original authors cannot be seen as the unequivocal ground truth. Therefore, we did not consider ranking metrics but solely recall@k as our evaluation metric. Table <ref type="table">1</ref> shows the evaluation results.</p><p>RefSeer. We were unable to run our code on exactly the same data set as Ebesu and Fang did, and our model for RefSeer does not include citing authors' information (see Sec. 3.1), leading to a slightly different number of parameters. Presumably due to the missing citing author information, our results are worse than the ones reported by Ebesu and Fang (namely, a recall@10 of 0.0929 instead of around 0.29). Overall, all of the recall@10 values were in a similar range. However, using setups other than the one proposed by Ebesu and Fang seems promising.</p><p>arXiv CS. We evaluated our trained models on the first 20,000 of the 50,235 test examples, which significantly reduced the evaluation running time and allowed us to perform detailed ablation studies.</p><p>By applying the hyperparameters used by Ebesu and Fang, our re-implemented NCN yielded a recall@10 of 0.1637, as compared to 0.29 in the original paper. Thus, we were unable to replicate the performance of the original model. We hypothesize that this is a result of our significantly smaller data set, which comprised only 9.44% of the original paper's training examples (401,882 examples compared to 4,258,383 in the original paper). In order to tune performance, we used differing hyperparameter settings and evaluated our model after every modification. Our changes included the use of different vocabulary sizes when preprocessing the data set as well as varying batch and embedding sizes when propagating data through the network. 
We also altered the number of filters in the convolutional layer of the TDNN encoder and the number of GRU layers in the RNN decoder. Table <ref type="table">1</ref> shows that the best configuration achieved a 9.77% improvement over Ebesu and Fang's hyperparameter values (recall@10 of 0.1797 vs. 0.1637). While the NCN's performance generally increases with larger capacity, this effect only persists up to a certain size.<ref type="foot" target="#foot_3">4</ref> In particular, enlarging the embedding size past 128 dimensions and increasing the vocabulary to more than 20,000 tokens did not guarantee an improved recall@10 value. We suspect this to be a consequence of our small data set relative to the model's increased capacity <ref type="bibr" target="#b8">[9]</ref>.</p><p>In addition to experimenting with various architectural changes, we also tried different batch sizes. Masters and Luschi <ref type="bibr" target="#b9">[10]</ref> showed that training with smaller mini-batches can lead to improved test performance. However, we were unable to replicate these results for our best configuration; for the larger NCN models, a decreased batch size instead led to inferior test performance. On the other hand, our enlarged filter region sizes for the TDNN context encoder consistently boosted the model's performance. At the same time, this modification is computationally cheap, in terms of both additional parameters and wall time, as the TDNN encoders run in parallel.</p><p>We observed during the evaluation runs that models with a lower validation loss generally achieved a better recall@10 value (given equal batch sizes). While this intuitively makes sense, as we use the loss function to re-rank the top titles, we can also find counterexamples.</p></div>
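Recall@k, the metric reported throughout this section, can be stated compactly. The function and variable names below are our own; the metric itself is the standard one: the fraction of test queries for which the actually cited paper appears among the top-k recommendations.

```python
# recall@k over a test set: ranked_lists holds one ranked list of recommended
# document ids per query; ground_truth holds the id of the paper actually cited.
def recall_at_k(ranked_lists, ground_truth, k=10):
    hits = sum(1 for ranked, true_id in zip(ranked_lists, ground_truth)
               if true_id in ranked[:k])
    return hits / len(ground_truth)
```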
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Discussion</head><p>We believe that there is still room to improve the NCN, both in terms of the model's hyperparameters and its architecture. Our research shows that changing the filter lengths in the convolutional layer of the network's encoder leads to consistently better results; further investigation into their effects on model quality may thus be rewarding. The original architecture only used Dropout <ref type="bibr" target="#b10">[11]</ref> to regularize the network. It may be worthwhile to investigate other regularization techniques such as batch normalization <ref type="bibr" target="#b11">[12]</ref> for convolutional layers or layer normalization <ref type="bibr" target="#b12">[13]</ref> for recurrent layers.</p><p>We conclude that the NCN yields reasonable results even when applied to a smaller data set, such as the arXiv CS subset used in our paper. We believe a major reason for not achieving similar performance results on the other data set (arXiv CS) was the significantly smaller number of training examples <ref type="bibr" target="#b8">[9]</ref>. Thus, in the future, it might be more important to use large data sets than to further tune model hyperparameters in order to obtain better recall@10 scores.</p></div>
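How the suggested regularizers could slot into the NCN's layers can be sketched as follows. This is our assumption of a possible placement, not a tested configuration; all sizes and the dropout rate are placeholders.

```python
# Sketch: BatchNorm1d after the encoder convolution, LayerNorm on the GRU
# outputs, and the original Dropout regularizer kept in place.
import torch
import torch.nn as nn


class RegularizedBlocks(nn.Module):
    def __init__(self, emb_dim=128, num_filters=128, hidden=128):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, num_filters, kernel_size=5)
        self.bn = nn.BatchNorm1d(num_filters)   # normalizes conv feature maps
        self.gru = nn.GRU(num_filters, hidden, batch_first=True)
        self.ln = nn.LayerNorm(hidden)          # normalizes recurrent outputs
        self.drop = nn.Dropout(0.2)             # the original regularizer

    def forward(self, x):                        # x: (batch, emb_dim, seq_len)
        h = torch.relu(self.bn(self.conv(x)))    # (batch, num_filters, L')
        out, _ = self.gru(h.transpose(1, 2))     # (batch, L', hidden)
        return self.drop(self.ln(out))
```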
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusion</head><p>For this paper, we re-implemented the neural citation network <ref type="bibr" target="#b0">[1]</ref> for citation recommendation and ran evaluations on both RefSeer, the originally used data set, and arXiv CS as a second evaluation data set. We were unable to achieve the same model performance as Ebesu and Fang did. However, we provided insights into how the different hyperparameters affect the NCN's model performance and how these insights can be used to further improve the model. In this way, we exemplified how to make citation recommendation approaches and their evaluations more transparent, facilitating the creation of more effective models in the future.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="4,177.99,115.83,259.37,255.85" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="5,152.06,115.84,311.24,181.56" type="bitmap" /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">For very large databases, a pre-selection algorithm may make sense to save computing time. See Section 3.2 for further information.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">See https://github.com/X3N4/neural_citation.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">We use the term "model size" to refer to the embedding dimension, number of convolutional filters, and the GRU dimension. These parameters are set to the same value in most configurations.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Neural Citation Network for Context-Aware Citation Recommendation</title>
		<author>
			<persName><forename type="first">T</forename><surname>Ebesu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Fang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR&apos;</title>
				<meeting>the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR&apos;</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="page" from="1093" to="1096" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Citation Recommendation: Approaches and Datasets</title>
		<author>
			<persName><forename type="first">M</forename><surname>Färber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jatowt</surname></persName>
		</author>
		<idno>CoRR abs/2002.06961</idno>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Context-aware Citation Recommendation</title>
		<author>
			<persName><forename type="first">Q</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kifer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Mitra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Giles</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 19th International Conference on World Wide Web. WWW &apos;</title>
				<meeting>the 19th International Conference on World Wide Web. WWW &apos;</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="421" to="430" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning</title>
		<author>
			<persName><forename type="first">R</forename><surname>Collobert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Weston</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 25th International Conference on Machine Learning. ICML&apos;</title>
				<meeting>the 25th International Conference on Machine Learning. ICML&apos;</meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="volume">08</biblScope>
			<biblScope unit="page" from="160" to="167" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation</title>
		<author>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Van Merrienboer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ç</forename><surname>Gülçehre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bahdanau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Bougares</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Schwenk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. EMNLP&apos;</title>
				<meeting>the 2014 Conference on Empirical Methods in Natural Language Processing. EMNLP&apos;</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="page" from="1724" to="1734" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Neural Machine Translation by Jointly Learning to Align and Translate</title>
		<author>
			<persName><forename type="first">D</forename><surname>Bahdanau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 3rd International Conference on Learning Representations. ICLR&apos;</title>
				<meeting>the 3rd International Conference on Learning Representations. ICLR&apos;</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="volume">15</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">RefSeer: A citation recommendation system</title>
		<author>
			<persName><forename type="first">W</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Mitra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">L</forename><surname>Giles</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2014 IEEE/ACM Joint Conference on Digital Libraries. JCDL&apos;</title>
				<meeting>the 2014 IEEE/ACM Joint Conference on Digital Libraries. JCDL&apos;</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="page" from="371" to="374" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">A High-Quality Gold Standard for Citation-based Tasks</title>
		<author>
			<persName><forename type="first">M</forename><surname>Färber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Thiemann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jatowt</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Eleventh International Conference on Language Resources and Evaluation. LREC&apos;18</title>
				<meeting>the Eleventh International Conference on Language Resources and Evaluation. LREC&apos;18</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Revisiting Unreasonable Effectiveness of Data in Deep Learning Era</title>
		<author>
			<persName><forename type="first">C</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Shrivastava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gupta</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2017 IEEE International Conference on Computer Vision. ICCV&apos;</title>
				<meeting>the 2017 IEEE International Conference on Computer Vision. ICCV&apos;</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="page" from="843" to="852" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Revisiting Small Batch Training for Deep Neural Networks</title>
		<author>
			<persName><forename type="first">D</forename><surname>Masters</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Luschi</surname></persName>
		</author>
		<idno>CoRR abs/1804.07612</idno>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Dropout: A Simple Way to Prevent Neural Networks from Overfitting</title>
		<author>
			<persName><forename type="first">N</forename><surname>Srivastava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">E</forename><surname>Hinton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Krizhevsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Salakhutdinov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1929" to="1958" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift</title>
		<author>
			<persName><forename type="first">S</forename><surname>Ioffe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Szegedy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 32nd International Conference on Machine Learning. ICML&apos;</title>
				<meeting>the 32nd International Conference on Machine Learning. ICML&apos;</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="page" from="448" to="456" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Layer Normalization</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">J</forename><surname>Ba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Kiros</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">E</forename><surname>Hinton</surname></persName>
		</author>
		<idno>CoRR abs/1607.06450</idno>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note>BIR 2020 Workshop on Bibliometric-enhanced Information Retrieval</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
