<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Towards Representation Learning for Biomedical Concept Detection in Medical Images: UA.PT Bioinformatics in ImageCLEF 2017</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Eduardo</forename><surname>Pinho</surname></persName>
							<email>eduardopinho@ua.pt</email>
							<affiliation key="aff0">
								<orgName type="department">DETI - Institute of Electronics and Informatics Engineering of Aveiro</orgName>
								<orgName type="institution">University of Aveiro</orgName>
								<address>
									<country key="PT">Portugal</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">João</forename><forename type="middle">Figueira</forename><surname>Silva</surname></persName>
							<email>joaofsilva@ua.pt</email>
							<affiliation key="aff0">
								<orgName type="department">DETI - Institute of Electronics and Informatics Engineering of Aveiro</orgName>
								<orgName type="institution">University of Aveiro</orgName>
								<address>
									<country key="PT">Portugal</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jorge</forename><forename type="middle">Miguel</forename><surname>Silva</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">DETI - Institute of Electronics and Informatics Engineering of Aveiro</orgName>
								<orgName type="institution">University of Aveiro</orgName>
								<address>
									<country key="PT">Portugal</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Carlos</forename><surname>Costa</surname></persName>
							<email>carlos.costa@ua.pt</email>
							<affiliation key="aff0">
								<orgName type="department">DETI - Institute of Electronics and Informatics Engineering of Aveiro</orgName>
								<orgName type="institution">University of Aveiro</orgName>
								<address>
									<country key="PT">Portugal</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Towards Representation Learning for Biomedical Concept Detection in Medical Images: UA.PT Bioinformatics in ImageCLEF 2017</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">DCD775F5496631EC73FE8E66F0304106</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T20:28+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>ImageCLEF</term>
					<term>Representation Learning</term>
					<term>Deep Learning</term>
					<term>Multimedia Retrieval</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Representation learning is a field that has rapidly evolved during the last decade, with much of this progress being driven by the latest breakthroughs in deep learning. Digital medical imaging is a particularly interesting application since representation learning may enable better medical decision support systems. ImageCLEFcaption focuses on automatic information extraction from biomedical images. This paper describes two representation learning approaches for the concept detection sub-task. The first approach consists of k-means clustering to create bags of words from SIFT descriptors. The second approach is based on a custom deep denoising convolutional autoencoder. A set of perceptron classifiers was trained and evaluated for each representation type. On the test set, the best run using bags of words reached a mean F1 score of 0.0488, and the run using the autoencoder reached 0.0414.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Representation learning has evolved rapidly over the last decade <ref type="bibr" target="#b1">[2]</ref>. The discovery of more powerful representation learning techniques opens up tremendous prospects for semi-supervised and unsupervised decision systems, and further unlocks the potential of content-based image retrieval (CBIR). A significant part of this progress is a consequence of the latest breakthroughs in deep learning. Health informatics is no exception to this extensive use of deep learning: in medical imaging in particular, a vast range of use cases has been tackled with solutions that rely on deep learning <ref type="bibr" target="#b14">[15]</ref>. Medical imaging datasets are scarce, frequently class-imbalanced and often non-annotated, so the rapid developments in deep learning and representation learning are of particular interest to the medical field, since they may enable better concept representations of digital medical images.</p><p>Representation learning, sometimes called feature learning, is often defined as learning a function that transforms the available data samples into a representation that makes other machine learning tasks easier to approach. Feature extraction is the related concept of obtaining these representations. This can be achieved using a wide range of approaches, such as k-means clustering, sparse coding <ref type="bibr" target="#b11">[12]</ref> and Restricted Boltzmann Machines (RBMs) <ref type="bibr" target="#b7">[8]</ref>. 
More recently, as interest has shifted towards deep learning techniques, approaches based on autoencoders <ref type="bibr" target="#b15">[16]</ref> have also been developed.</p><p>The long-running ImageCLEF initiative introduced the caption challenge in 2017 <ref type="bibr" target="#b9">[10]</ref>, which aims at automatic information extraction from biomedical images. This challenge is divided into two sub-tasks: concept detection and caption prediction. The concept detection sub-task <ref type="bibr" target="#b4">[5]</ref> is the first part of the caption prediction task, in which the goal is to automatically recognize certain concepts from the UMLS vocabulary <ref type="bibr" target="#b2">[3]</ref> in biomedical images. The obtained list of concepts is then used in the caption prediction sub-task, where a short human-readable description of the image must be generated.</p><p>For this challenge, we hypothesized that a sufficiently powerful representation of images would enable a medical imaging archive to automatically detect biomedical concepts with some level of certainty and efficiency, thus improving the system's information retrieval capabilities over non-annotated data. This paper presents our proposed solution for the concept detection sub-task, and describes our methods of image feature extraction for the purpose of biomedical concept detection, followed by their evaluation under the ImageCLEF 2017 challenge.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Methods</head><p>This task was accompanied by three data sets containing various images from biomedical journals: the training set (164614 images), the validation set (10000 images) and the testing set (10000 images). Only the first two included the list of concepts applicable to each image, whereas the testing set's annotations were hidden from the participants.</p><p>In order to evaluate the stated hypothesis of a middle-level representation for medical images, we addressed the concept detection task in two phases. First, two feature extraction methods for image representation were chosen and built:</p><p>- In Section 2.1, as a classical approach, bags of visual words were used as image descriptors, obtained from the clustering of visual keypoints;</p><p>- In Section 2.2, a deep convolutional sparse autoencoder was trained and features were extracted from its bottleneck vector.</p><p>Second, the two representations were validated by training fast classifiers over the new representations, in Section 2.3.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Bags of Visual Words</head><p>Without resizing or preprocessing the images, Scale Invariant Feature Transform (SIFT) keypoint descriptors <ref type="bibr" target="#b12">[13]</ref> were extracted from all three datasets. An OpenCV <ref type="bibr" target="#b3">[4]</ref> implementation was used for SIFT keypoint extraction and descriptor computation. Each image could yield a variable number of descriptors of size 128. In cases where the SIFT algorithm did not retrieve any keypoints, the algorithm's parameters were adjusted to loosen the edge detection criteria.</p><p>From the training set, 500 files were randomly chosen and their respective keypoints collected to serve as template keypoints. A visual vocabulary (codebook) of size k = 1000 was then obtained by performing k-means clustering on all template keypoints and retrieving the centroid of each cluster, yielding an ordered list of 1000 vectors of fixed size, V = {V_i}.</p><p>Once a visual vocabulary was available, we constructed each image's bag of visual words (BoW) by finding, for each keypoint descriptor of the image, the closest visual vocabulary point and incrementing the corresponding position in the BoW. In other words, for an image's BoW B = {o_i} and each image keypoint descriptor d_j, o_i is incremented when V_i is the visual vocabulary point in V with the smallest Euclidean distance to d_j. Finally, each BoW was normalized so that the sum of its elements equals 1. We can picture the bag of visual words as a histogram of visual descriptor occurrences, which can be used for representing visual content in CBIR.</p></div>
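<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration only, the BoW construction described in Section 2.1 can be sketched in Python with NumPy. The function name bag_of_visual_words and the brute-force distance computation are assumptions made for this sketch; they are not taken from the original implementation, and descriptor extraction and k-means clustering are assumed to have been done beforehand.</p>

```python
import numpy as np


def bag_of_visual_words(descriptors, vocabulary):
    """Build a normalized bag of visual words for one image.

    descriptors: array of shape (n, d), one keypoint descriptor per row
                 (d = 128 for SIFT).
    vocabulary:  array of shape (k, d), the k-means centroids V.
    Hypothetical helper written for illustration only.
    """
    descriptors = np.asarray(descriptors, dtype=np.float64)
    vocabulary = np.asarray(vocabulary, dtype=np.float64)
    if descriptors.size == 0:
        # No keypoints were found: return an empty histogram.
        return np.zeros(len(vocabulary))
    # Squared Euclidean distance from every descriptor to every centroid.
    diff = descriptors[:, None, :] - vocabulary[None, :, :]
    distances = np.sum(diff * diff, axis=2)
    # Index of the closest visual word for each descriptor.
    nearest = np.argmin(distances, axis=1)
    # Count occurrences per visual word, then normalize to sum to 1.
    counts = np.bincount(nearest, minlength=len(vocabulary))
    return counts / counts.sum()
```

<p>The broadcasting step allocates an array of n x k x d values, which is acceptable for a sketch but would likely be replaced by a chunked or tree-based nearest-neighbour search when processing a full dataset.</p></div>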
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Sparse Autoencoder</head><p>A deep convolutional neural network was designed for the unsupervised extraction of visual features from biomedical images (Figure <ref type="figure" target="#fig_0">1</ref>). It is a quasi-symmetrical autoencoder with a series of encoding and decoding blocks (henceforth named ci and di respectively, for i ∈ {1, 2, 3}) with shared weights. A sparse latent code representation of dimensionality 10000 lies in the middle. As a denoising autoencoder, its goal is to learn the pair of functions (E, D) so that x' = D(E(x̃)) is closest to the original input x, where x̃ is a slightly corrupted version of x. The aim of making E a function of x̃ is to force the process to be more stable and robust, thus leading to representations of higher quality <ref type="bibr" target="#b15">[16]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.1">Encoder / Decoder Specification</head><p>Each encoder block ci is composed of two sequences of 2D convolution components, where each convolution is followed by batch normalization <ref type="bibr" target="#b8">[9]</ref> and Rectified Linear Unit (ReLU) activations. Then, a 2D max-pooling layer with a 2x2 kernel is added. The first convolutional layer of c1 relies on a kernel (filter) of size 7x7, whereas the remaining convolutions have a kernel of size 3x3. The exact number of kernels in each convolutional layer is shown in Figure <ref type="figure" target="#fig_0">1</ref>, starting with 64 filters and doubling with each new encoder block. The layers are also described in greater detail in Table <ref type="table">1</ref>. In the third block, instead of max-pooling, a 1x1 kernel convolution is performed, followed by global average pooling and a ReLU activation, yielding the code tensor z. The 1x1 convolution followed by global average pooling behaves similarly to a fully connected network, with the advantage of making the network invariant to input dimensions.</p><formula xml:id="formula_0">Figure 1 labels: input x (128x128x3), z = E(x), x' = D(z), output x' (128x128x3)</formula><p>Tbl. 1: A tabular representation of the encoder layers' specifications. The Details column may include the normalization and activation layers that follow a layer (where BN stands for batch normalization and ReLU(x) = max(0, x)).</p><p>The decoding blocks replicate the encoding process in inverse order (Table <ref type="table">2</ref>). Decoding starts with an upsampling of the code to the same three-dimensional feature shape as the encoder's final convolutional layer. Convolutions in these blocks are transposed (also called fractionally-strided convolutions in the literature, and deconvolutions in some papers). 
Furthermore, the unpooling layer is a guided unpooling operation, in which each of the encoder's max-pool activations is restored to the same position in the corresponding decoding phase. This is achieved by passing the switch indices from the encoder's max-pooling layers, as in <ref type="bibr" target="#b13">[14]</ref>.</p><p>Tbl. 2: A tabular representation of the decoder layers' specifications. The Details column may include the normalization and activation layers that follow a layer, as well as information about weight sharing.</p><p>The autoencoder was designed with this symmetry to enable the encoder and decoder pair to share the weights of the kernels. These weights were initialized as in <ref type="bibr" target="#b6">[7]</ref>, and were shared so that matrix W_i is used by the ith convolution layer in the encoder (counting from the left) and by the ith transposed convolution in the decoder (counting from the right). The only exception is the pair of convolutional layers closest to the latent code, which does not share any parameters. The layers' bias parameters and batch normalization coefficients are also not shared between the two parts of the network.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.2">Preprocessing and Augmentation</head><p>Training samples were obtained through the following process: each image was resized so that its shorter edge measured 160 pixels. Afterwards, the sample was augmented by taking a square crop 128 pixels wide, chosen at random among 9 possible kinds of crops (4 corners, 4 edges and center). Validation and test images were simply resized to fit the 128x128 dimensions. In all cases, the images' RGB pixel values were normalized with the formula n(v) = v/128.0 − 1, thus lying within the range [-1, 1]. In the training phase, Gaussian noise with a standard deviation of 0.05 was applied to the input, yielding the noisy sample x̃.</p></div>
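<div xmlns="http://www.tei-c.org/ns/1.0"><p>The preprocessing steps of Section 2.2.2 can be sketched as follows, as a minimal NumPy example rather than the original pipeline. It assumes the image has already been resized so that its shorter edge measures 160 pixels, and it assumes that the 9 kinds of crops lie on a 3x3 grid of positions (corners, edge midpoints and center); the function names are hypothetical.</p>

```python
import numpy as np

CROP = 128  # crop side length in pixels, as stated in the text


def normalize(pixels):
    """Map 8-bit RGB values to roughly [-1, 1] with n(v) = v / 128.0 - 1."""
    return np.asarray(pixels, dtype=np.float32) / 128.0 - 1.0


def crop_origins(height, width, size=CROP):
    """Top-left (row, col) of the 9 crops: 4 corners, 4 edges and center.

    Assumption of this sketch: the crops sit on a 3x3 grid of positions.
    """
    rows = [0, (height - size) // 2, height - size]
    cols = [0, (width - size) // 2, width - size]
    return [(r, c) for r in rows for c in cols]


def training_sample(image, rng, noise_std=0.05):
    """Return (noisy, clean) 128x128 crops from an already resized image."""
    origins = crop_origins(image.shape[0], image.shape[1])
    r, c = origins[rng.integers(len(origins))]
    clean = normalize(image[r:r + CROP, c:c + CROP])
    # Gaussian corruption of the input, used only during training.
    noise = rng.normal(0.0, noise_std, size=clean.shape)
    noisy = clean + noise.astype(np.float32)
    return noisy, clean
```

<p>Here rng is expected to be a NumPy random generator, for instance one created with np.random.default_rng(). The clean crop is returned alongside the noisy one because the reconstruction error is measured against the uncorrupted input.</p></div>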
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.3">Network Training Details</head><p>The network was trained through stochastic gradient descent, using the Adam optimizer <ref type="bibr" target="#b10">[11]</ref> to minimize the mean squared error between the original input x and the output x'. A sparse representation was achieved with two mechanisms: first, since the final encoding activation is ReLU, negative outputs from the previous layer are zeroed. Second, an absolute value penalization was applied to z, thus adding the extra minimization goal of keeping the code sum small. The final decoder loss function was therefore:</p><formula xml:id="formula_1">L(E, D) = \frac{1}{2r} \sum_{i=0}^{r} (x_i - x'_i)^2 + \Omega(z)</formula><p>where</p><formula xml:id="formula_2">\Omega(z) = s \times \max\left(0, \sum_{z_i \in z} z_i - t\right)</formula><p>is the sparsity penalty function, r = 128×128 is the number of pixels in the input images, and x represents the original input without synthesized noise. t and s are, respectively, the penalization threshold and the sparsity coefficient hyperparameters, which we set to t = 1 and s = 0.0001. At the final training iteration, an average of 73% of the features extracted from the training set were zeros. Sparser representations are possible by adjusting these hyperparameters.</p><p>The model was trained over 20600 steps with a mini-batch size of 64, which is approximately 8 epochs (one pass over the 164614 training images takes about 2572 steps). The autoencoder's loss evolved according to Figure <ref type="figure">2</ref>. The base learning rate was 0.005, but the first 50 steps used a "warm-up" learning rate of 0.001 to prevent the initial loss values from producing extreme activations, which could make it harder for the network to converge (a similar procedure was followed in <ref type="bibr" target="#b5">[6]</ref>). 
The learning rate was multiplied by 0.2 every 5140 steps (approximately two epochs) to facilitate convergence.</p><p>TensorFlow <ref type="bibr" target="#b0">[1]</ref> with GPU support was used to train the neural network and retrieve the latent codes of each image in the three datasets. A development build between versions 1.1.0 and 1.2.0 was used (specifically, commit hash 163b1c078d in the official GitHub repository<ref type="foot" target="#foot_0">1</ref>) in order to have early access to the gradient implementation of the max-pooling layer with arg-max propagation. TensorBoard <ref type="bibr" target="#b0">[1]</ref> was used during development for monitoring and visualization. Training took approximately 22 hours to complete on one of the GPUs of an NVIDIA Tesla K80 graphics card in an Ubuntu server machine.</p><p>Fig. <ref type="figure">2</ref>: Autoencoder loss evolution over the steps of the training process.</p></div>
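<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the arithmetic of Section 2.2.3 concrete, the loss terms and the learning rate schedule can be written as plain NumPy functions. This is a sketch under stated assumptions and not the TensorFlow code used in the experiments: the function names are hypothetical, the code z is assumed to be non-negative (it follows a ReLU), and the staircase decay is assumed to be counted from step 0, which the text does not specify.</p>

```python
import numpy as np


def sparsity_penalty(z, s=0.0001, t=1.0):
    """Omega(z) = s * max(0, sum(z) - t), with z non-negative after ReLU."""
    return s * max(0.0, float(np.sum(z)) - t)


def autoencoder_loss(x, x_out, z, r=128 * 128):
    """Squared reconstruction error scaled by 1 / (2r), plus the penalty.

    x is the original input without noise; x_out is the decoder output.
    """
    diff = np.asarray(x, dtype=np.float64) - np.asarray(x_out, dtype=np.float64)
    return float(np.sum(diff * diff)) / (2 * r) + sparsity_penalty(z)


def learning_rate(step, base=0.005, warmup=0.001, warmup_steps=50,
                  decay=0.2, decay_every=5140):
    """Warm-up rate for the first steps, then a staircase decay.

    Assumption: the decay counter starts at step 0, not after the warm-up.
    """
    if step in range(warmup_steps):
        return warmup
    return base * decay ** (step // decay_every)
```

<p>Under these assumptions the schedule yields 0.001 for steps 0 to 49, 0.005 up to step 5139, and then 0.001, 0.0002 and 0.00004 after each successive multiplication by 0.2.</p></div>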
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Concept Classification Perceptron</head><p>For each representation learned, a simple classification model was applied to the resulting features. Aiming for low complexity and high classification speed, the three sets of classifiers were perceptrons trained using stochastic gradient descent over the training set.</p><p>In the first run submitted (#1), the learning rate was configured to slowly decay during the training process (µ(t) = 1/(αt + 1)), and an L2 regularization term with coefficient α = 0.005 was included. The last two runs submitted (#2 and #3) had slight changes in hyperparameters, which appeared to yield better outcomes when considering a small sample of biomedical concepts. A constant learning rate of 1.0 was used instead, and classes were balanced so that the weight of each class was the inverse of its frequency in the set, thus aiming to penalize false negatives through higher losses. While run #2 used the SIFT bags of words, run #3 used the latent features from the autoencoder.</p><p>Due to computational time constraints, the number of gradient descent epochs was relatively limited: 50 iterations for the SIFT bags of words and 20 for the autoencoder features. Biomedical concepts' identifiers were sorted by their total number of occurrences in the training set. With this list, the 1000 most frequent concepts in the training set were retrieved and a perceptron model was trained for each. With more time and computational resources, this approach scales linearly to cover all labelled concepts. Once trained, the models' performance was evaluated with the validation set, in terms of precision, recall, and mean F1 score. The same model was then used to predict the concept list of each image in the testing set. Other than in this final prediction phase, neither the feature extractors nor the trained classifiers were ever fed samples from the testing set. 
Moreover, no other sources of data were used in the full process.</p><p>Table <ref type="table">3</ref> shows the metrics obtained on the validation and test sets. The validation F1 metric, which was obtained by evaluating the model against the validation set, only considers the 1000 most frequent concepts in the training set. Nonetheless, these metrics were deemed acceptable for a quantitative comparison among local runs, and they indeed produced the same ranking order as the final test metrics.</p><p>Tbl. 3: Results obtained from the three submissions to the ImageCLEF 2017 concept detection task.</p><p>Contrary to our expectations, the hyperparameter changes applied in runs #2 and #3 appeared to slightly degrade the model's performance, regardless of the extracted features. The applied modifications may also benefit from additional classifier training steps for better convergence.</p><p>The autoencoder designed for this challenge exhibited the worst performance among the submitted runs, while demanding more computational resources for training and feature extraction. However, given the well-known and recently well-studied milestones of deep convolutional neural networks in image analysis, this does not invalidate their use in general. Rather, it suggests that model training was difficult in this domain, and that proper concept detection learning would likely require further tuning of the network's hyperparameters and more iterations.</p><p>Even after reducing the problem to a feature vector instead of the original input, the number of annotated concepts was too large (over 20 thousand) to train a good classifier for each concept. Higher numbers of annotated concepts imply higher computational costs, and the available compute power was limited. 
To deal with this tradeoff, we chose to tackle only the most frequent concepts, thus reducing computational costs, with the drawback of not providing any results for the remaining concepts.</p></div>
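<div xmlns="http://www.tei-c.org/ns/1.0"><p>The bookkeeping around the perceptron classifiers (concept ranking, class balancing and the decaying learning rate of run #1) can be illustrated with a few standard-library helpers. These are hypothetical functions written for this sketch, not the code used for the submitted runs, and the concept identifiers in any example input are placeholders.</p>

```python
from collections import Counter


def most_frequent_concepts(annotations, limit=1000):
    """Rank concept identifiers by the number of training images they
    annotate and keep the most frequent ones.

    annotations: dict mapping an image identifier to its list of concepts.
    """
    counts = Counter(
        concept
        for concepts in annotations.values()
        for concept in set(concepts)
    )
    return [concept for concept, _ in counts.most_common(limit)]


def inverse_frequency_weights(labels):
    """Weight each class by the inverse of its relative frequency."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / n for cls, n in counts.items()}


def decayed_learning_rate(t, alpha=0.005):
    """mu(t) = 1 / (alpha * t + 1), the decay described for run #1."""
    return 1.0 / (alpha * t + 1.0)
```

<p>With inverse-frequency weights, a concept present in one of every four images gives its positive class a weight of 4, so a missed positive costs three times as much as an error on the negative class, which matches the stated goal of penalizing false negatives.</p></div>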
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusion</head><p>This paper describes our proposal to solve the concept detection sub-task of the ImageCLEFcaption task, resting on the hypothesis that a sufficiently powerful representation of images can enable a medical imaging archive to automatically detect biomedical concepts with some level of certainty and efficiency. Results are presented for the three submitted runs: the first two are based on a BoW approach, whereas the third is based on a deep convolutional sparse autoencoder.</p><p>First and foremost, it is important to mention that extreme variability was experienced in this challenge. Regarding the test results, the best run using BoW obtained a mean F1 score of 0.0488, and the run using the autoencoder obtained 0.0414.</p><p>Attaining better feature representations is a major step that can have a significant impact on the end results. Given the major breakthroughs already achieved with deep learning techniques, we believe that further research in representation learning should be carried out, as it will allow us to determine improved ways of training neural networks for this purpose, and to evaluate a wider variety of feature learning solutions.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1:</head><label>1</label><figDesc>Fig. 1: Schematic representation of the autoencoder. The c# and d# blocks contain convolutional and pooling/unpooling layers in the indicated order. The dashed arrows represent the transfer of pooling indices to guide the unpooling layers.</figDesc></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://github.com/tensorflow/tensorflow</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Acknowledgements</head><p>This work is financed by the ERDF - European Regional Development Fund through the Operational Programme for Competitiveness and Internationalisation - COMPETE 2020 Programme, and by National Funds through the FCT - Fundação para a Ciência e a Tecnologia. Eduardo Pinho is funded by the FCT under the grant PD/BD/105806/2014. João Figueira Silva is funded by the research grant Project PTDC/EEI-ESS/6815/2014 and Jorge Miguel Silva is funded by the research grant Project CMUP-ERI/ICT/0028/2014-SCREEN-DR.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems</title>
		<author>
			<persName><forename type="first">Martín</forename><surname>Abadi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ashish</forename><surname>Agarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Paul</forename><surname>Barham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Eugene</forename><surname>Brevdo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zhifeng</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Craig</forename><surname>Citro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Greg</forename><forename type="middle">S</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andy</forename><surname>Davis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jeffrey</forename><surname>Dean</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Matthieu</forename><surname>Devin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sanjay</forename><surname>Ghemawat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ian</forename><surname>Goodfellow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrew</forename><surname>Harp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Geoffrey</forename><surname>Irving</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><surname>Isard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yangqing</forename><surname>Jia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rafal</forename><surname>Jozefowicz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lukasz</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Manjunath</forename><surname>Kudlur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Josh</forename><surname>Levenberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dan</forename><surname>Mane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rajat</forename><surname>Monga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sherry</forename><surname>Moore</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Derek</forename><surname>Murray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Chris</forename><surname>Olah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mike</forename><surname>Schuster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jonathon</forename><surname>Shlens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Benoit</forename><surname>Steiner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ilya</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kunal</forename><surname>Talwar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Paul</forename><surname>Tucker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Vincent</forename><surname>Vanhoucke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Vijay</forename><surname>Vasudevan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Fernanda</forename><surname>Viegas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Oriol</forename><surname>Vinyals</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pete</forename><surname>Warden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Martin</forename><surname>Wattenberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Martin</forename><surname>Wicke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yuan</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xiaoqiang</forename><surname>Zheng</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1603.04467</idno>
		<ptr target="http://arxiv.org/abs/1603.04467" />
		<imprint>
			<date type="published" when="2016-03">Mar. 2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Representation learning: A review and new perspectives</title>
		<author>
			<persName><forename type="first">Yoshua</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aaron</forename><surname>Courville</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pascal</forename><surname>Vincent</surname></persName>
		</author>
		<idno type="DOI">10.1109/TPAMI.2013.50</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Pattern Analysis and Machine Intelligence</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="page" from="1798" to="1828" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">The unified medical language system (UMLS): integrating biomedical terminology</title>
		<author>
			<persName><forename type="first">Olivier</forename><surname>Bodenreider</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic acids research</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="D267" to="D270" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
	<note>suppl</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">The OpenCV Library</title>
		<author>
			<persName><forename type="first">Gary</forename><surname>Bradski</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Doctor Dobbs Journal</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="issue">11</biblScope>
			<biblScope unit="page" from="120" to="126" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Overview of ImageCLEFcaption 2017 -Image Caption Prediction and Concept Detection for Biomedical Images</title>
		<author>
			<persName><forename type="first">Carsten</forename><surname>Eickhoff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Immanuel</forename><surname>Schwall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alba</forename><surname>García Seco De Herrera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Henning</forename><surname>Müller</surname></persName>
		</author>
		<ptr target="http://ceur-ws.org" />
	</analytic>
	<monogr>
		<title level="m">CLEF 2017 Labs Working Notes. CEUR Workshop Proceedings</title>
				<meeting><address><addrLine>Dublin, Ireland</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017-09">Sept. 2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Deep residual learning for image recognition</title>
		<author>
			<persName><forename type="first">Kaiming</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xiangyu</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Shaoqing</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jian</forename><surname>Sun</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</title>
				<meeting>the IEEE Conference on Computer Vision and Pattern Recognition</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="770" to="778" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Delving deep into rectifiers: Surpassing human-level performance on imagenet classification</title>
		<author>
			<persName><forename type="first">Kaiming</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xiangyu</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Shaoqing</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jian</forename><surname>Sun</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE International Conference on Computer Vision</title>
				<meeting>the IEEE International Conference on Computer Vision</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="1026" to="1034" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">A fast learning algorithm for deep belief nets</title>
		<author>
			<persName><forename type="first">Geoffrey</forename><forename type="middle">E</forename><surname>Hinton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Simon</forename><surname>Osindero</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yee-Whye</forename><surname>Teh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neural computation</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="issue">7</biblScope>
			<biblScope unit="page" from="1527" to="1554" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift</title>
		<author>
			<persName><forename type="first">Sergey</forename><surname>Ioffe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christian</forename><surname>Szegedy</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1502.03167</idno>
		<ptr target="http://arxiv.org/abs/1502.03167" />
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="1" to="11" />
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Overview of ImageCLEF 2017: Information extraction from images</title>
		<author>
			<persName><forename type="first">Bogdan</forename><surname>Ionescu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Henning</forename><surname>Müller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mauricio</forename><surname>Villegas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Helbert</forename><surname>Arenas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Giulia</forename><surname>Boato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Duc-Tien</forename><surname>Dang-Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yashin</forename><surname>Dicente Cid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Carsten</forename><surname>Eickhoff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alba</forename><surname>García Seco De Herrera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Cathal</forename><surname>Gurrin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bayzidul</forename><surname>Islam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Vassili</forename><surname>Kovalev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Vitali</forename><surname>Liauchuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Josiane</forename><surname>Mothe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Luca</forename><surname>Piras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><surname>Riegler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Immanuel</forename><surname>Schwall</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Experimental IR Meets Multilinguality, Multimodality, and Interaction. 8th International Conference of the CLEF Association</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<meeting><address><addrLine>CLEF; Dublin, Ireland</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2017-09">Sept. 2017</date>
			<biblScope unit="volume">10456</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Adam: A Method for Stochastic Optimization</title>
		<author>
			<persName><forename type="first">Diederik</forename><forename type="middle">P</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jimmy</forename><forename type="middle">Lei</forename><surname>Ba</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/1412.6980.pdf" />
	</analytic>
	<monogr>
		<title level="m">International Conference on Learning Representations</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Efficient sparse coding algorithms</title>
		<author>
			<persName><forename type="first">Honglak</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alexis</forename><surname>Battle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rajat</forename><surname>Raina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrew</forename><forename type="middle">Y</forename><surname>Ng</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1506.03733v1</idno>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<idno type="ISSN">1049-5258</idno>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="801" to="808" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Distinctive Image Features from Scale-Invariant Keypoints</title>
		<author>
			<persName><forename type="first">David</forename><forename type="middle">G</forename><surname>Lowe</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Computer Vision</title>
		<imprint>
			<biblScope unit="volume">60</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="91" to="110" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Learning deconvolution network for semantic segmentation</title>
		<author>
			<persName><forename type="first">Hyeonwoo</forename><surname>Noh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Seunghoon</forename><surname>Hong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bohyung</forename><surname>Han</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE International Conference on Computer Vision</title>
				<meeting>the IEEE International Conference on Computer Vision</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="1520" to="1528" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Deep Learning for Health Informatics</title>
		<author>
			<persName><forename type="first">Daniele</forename><surname>Ravi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Charence</forename><surname>Wong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Fani</forename><surname>Deligianni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Melissa</forename><surname>Berthelot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Javier</forename><surname>Andreu-Perez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Benny</forename><surname>Lo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Guang-Zhong</forename><surname>Yang</surname></persName>
		</author>
		<idno type="DOI">10.1109/JBHI.2016.2636665</idno>
		<ptr target="http://ieeexplore.ieee.org/document/7801947/" />
	</analytic>
	<monogr>
		<title level="j">IEEE Journal of Biomedical and Health Informatics</title>
		<idno type="ISSN">2168-2194</idno>
		<imprint>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="4" to="21" />
			<date type="published" when="2017-01">Jan. 2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion</title>
		<author>
			<persName><forename type="first">Pascal</forename><surname>Vincent</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hugo</forename><surname>Larochelle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Isabelle</forename><surname>Lajoie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoshua</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pierre-Antoine</forename><surname>Manzagol</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="3371" to="3408" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
