<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Extracting Temporal Features into a Spatial Domain Using Autoencoders for Sperm Video Analysis</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Vajira</forename><surname>Thambawita</surname></persName>
							<email>vajira@simula.no</email>
							<affiliation key="aff0">
								<address>
									<settlement>SimulaMet</settlement>
									<country key="NO">Norway</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Oslo Metropolitan University</orgName>
								<address>
									<country key="NO">Norway</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Pål</forename><surname>Halvorsen</surname></persName>
							<affiliation key="aff0">
								<address>
									<settlement>SimulaMet</settlement>
									<country key="NO">Norway</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Oslo Metropolitan University</orgName>
								<address>
									<country key="NO">Norway</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Hugo</forename><surname>Hammer</surname></persName>
							<affiliation key="aff0">
								<address>
									<settlement>SimulaMet</settlement>
									<country key="NO">Norway</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Oslo Metropolitan University</orgName>
								<address>
									<country key="NO">Norway</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Michael</forename><surname>Riegler</surname></persName>
							<affiliation key="aff0">
								<address>
									<settlement>SimulaMet</settlement>
									<country key="NO">Norway</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="institution">Kristiania University College</orgName>
								<address>
									<country key="NO">Norway</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Trine</forename><forename type="middle">B</forename><surname>Haugen</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Oslo Metropolitan University</orgName>
								<address>
									<country key="NO">Norway</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Extracting Temporal Features into a Spatial Domain Using Autoencoders for Sperm Video Analysis</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">6355335F7362EC38A47009166F5A2866</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T20:15+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper, we present a two-step deep learning method that predicts sperm motility and morphology from video recordings of human spermatozoa. First, we use an autoencoder to extract temporal features from a given semen video and plot these into image space as what we call feature-images. Second, these feature-images are used to perform transfer learning to predict the motility and morphology values of human sperm. The presented method shows its capability to extract temporal information into spatial-domain feature-images, which can be used with traditional convolutional neural networks. Furthermore, the accuracy of the predicted motility of a given semen sample shows that a deep learning-based model can capture the temporal information of microscopic recordings of human semen.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>The 2019 Medico task <ref type="bibr" target="#b6">[7]</ref> focuses on automatically predicting semen quality based on video recordings of human spermatozoa. This is a change from previous years, which mainly focused on classifying images taken from the gastrointestinal tract <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b10">11]</ref>. For this year's task, we look at predicting the morphology and motility of a given semen sample. Motility is defined by three variables, namely, the percentage of progressive, non-progressive, and immotile sperm. Morphology is determined by the percentage of sperm with tail defects, midpiece defects, and head defects. For this competition, the organizers have provided a predefined three-fold split of the VISEM dataset <ref type="bibr" target="#b4">[5]</ref>, which contains 85 videos of semen samples from different participants and a preliminary analysis of each sample, which is used as the ground truth. In the dataset paper, the authors presented baseline mean absolute error (MAE) values for motility and morphology. Furthermore, the importance of computer-aided sperm analysis is evident from the work done over the last few decades <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b11">12]</ref>.</p><p>To solve this year's task, we propose a deep learning-based method consisting of two steps: (i) unsupervised feature extraction using an autoencoder <ref type="bibr" target="#b0">[1]</ref> and (ii) video regression using a standard convolutional neural network (CNN) and transfer learning.
Our autoencoder differs from the state-of-the-art autoencoders used to extract video features <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b12">13]</ref>, which extract feature vectors for use with long short-term memory models or multi-layer perceptrons (MLPs). In contrast, we use autoencoders to extract feature-images for use in CNNs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">APPROACH</head><p>Our method can be split into two distinct steps. First, we use an autoencoder to extract temporal features from multiple frames of a video into a feature-image. Second, we pass the extracted feature-image into a standard pre-trained CNN to predict the motility and morphology of the spermatozoa in a given video. In this paper, we present the preliminary results for four experiments based on four different input types. The first input type (I1) uses a single raw frame. Input type two (I2) is a stack of identical frames copied across the channel dimension. The third (I3) and fourth (I4) input types stack 9 and 18 consecutive frames from a video, respectively.</p><p>The first two experiments (using I1 and I2) were performed as baseline experiments. The two other experiments (using I3 and I4) were performed to see how temporal information affects the prediction performance of the approach. For all input types, we split the extracted datasets into three folds based on the folds provided by the organizers. Then, three-fold cross-validation was conducted to evaluate our four experiments. An overview of all experiments is shown in Figure <ref type="figure" target="#fig_0">1</ref>.</p></div>
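<div xmlns="http://www.tei-c.org/ns/1.0"><p>The four input types described above can be sketched as follows. This is a minimal illustration and not the authors' code: the helper name make_input, the grayscale frame layout (num_frames, height, width), and the number of identical copies stacked for I2 (9 here) are assumptions made only for this example, since the text does not state them.</p>
<p><code lang="python"><![CDATA[
```python
import numpy as np

# Frames per input type; 9 and 18 come from the text.
I3_FRAMES = 9
I4_FRAMES = 18


def make_input(frames, start, kind, copies=9):
    """Build one input array of shape (channels, height, width).

    frames: array of shape (num_frames, height, width), assumed grayscale.
    start:  index of the first frame to use.
    kind:   one of "I1", "I2", "I3", "I4".
    copies: hypothetical number of identical copies stacked for I2.
    """
    if kind == "I1":
        # A single raw frame.
        return frames[start][np.newaxis]
    if kind == "I2":
        # The same frame repeated along the channel dimension.
        return np.repeat(frames[start][np.newaxis], copies, axis=0)
    if kind == "I3":
        # 9 consecutive frames.
        return frames[start:start + I3_FRAMES].copy()
    if kind == "I4":
        # 18 consecutive frames.
        return frames[start:start + I4_FRAMES].copy()
    raise ValueError(f"unknown input type: {kind}")


# Synthetic stand-in for a video: 30 frames of 256x256 pixels.
video = np.random.default_rng(0).random((30, 256, 256), dtype=np.float32)
shapes = {kind: make_input(video, 0, kind).shape for kind in ("I1", "I2", "I3", "I4")}
print(shapes)
```
]]></code></p>
<p>Only I3 and I4 contain different frames along the channel dimension, so they are the only inputs from which temporal information could be learned; I2 has the same channel count as I3 in this sketch but carries no motion.</p></div>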
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Step 1 - Unsupervised temporal feature extraction</head><p>In step 1, we trained an autoencoder that takes an input frame or frames (I1, I2, I3 or I4) from the sperm videos as depicted in Figure <ref type="figure" target="#fig_0">1</ref>.</p><p>The encoder of the autoencoder extracted feature-images and passed them through the decoder to reconstruct the input frame or frames (R1, R2, R3, and R4). These feature-images differ from the output of traditional autoencoders, which extract feature vectors instead of feature-images. In this autoencoder, the mean square error (MSE) loss function is used to calculate the difference between the input data and the reconstructed data. This error value is then backpropagated to train the autoencoder. After training for 2,000 epochs, we carry the encoder part of the autoencoder over to step 2.</p></div>
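<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal PyTorch sketch of such an autoencoder is given below. It is an illustration under stated assumptions, not the architecture used in the paper: the layer sizes, the three-channel feature-image, the Adam optimizer, the learning rate, and the class name FeatureImageAutoencoder are all hypothetical. What it does take from the text is that the bottleneck is an image with the same resolution as the input frames, and that training minimizes the MSE between input and reconstruction.</p>
<p><code lang="python"><![CDATA[
```python
import torch
from torch import nn


class FeatureImageAutoencoder(nn.Module):
    """Convolutional autoencoder whose bottleneck is an image, not a vector."""

    def __init__(self, in_channels, feature_channels=3):
        super().__init__()
        # Padding keeps height and width unchanged, so the feature-image
        # has the same resolution as the input frames.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, feature_channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(feature_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, in_channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        feature_image = self.encoder(x)
        return self.decoder(feature_image), feature_image


torch.manual_seed(0)
model = FeatureImageAutoencoder(in_channels=9)  # e.g. input type I3
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

# Synthetic batch standing in for stacked frames (small size for speed).
batch = torch.rand(2, 9, 32, 32)

# One training step: reconstruct the input and backpropagate the MSE.
reconstruction, feature_image = model(batch)
loss = criterion(reconstruction, batch)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(feature_image.shape, loss.item())
```
]]></code></p>
<p>After training, only model.encoder would be kept and reused in step 2.</p></div>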
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Step 2 - CNN regression model</head><p>We selected the pre-trained ResNet-34 <ref type="bibr" target="#b5">[6]</ref> as our base CNN to predict the motility and morphology values of the sperm videos. However, any pre-trained CNN could be chosen for this step, and in future work we will test and compare different ones in more detail. First, we pass an input frame or frames (I1, I2, I3 or I4) through the pre-trained encoder (only the encoder section of the autoencoder), which was trained on the same input data in an unsupervised way. Then, the outputs of the encoder are passed through the CNN, whose last layer is modified to output three prediction values for motility or morphology.</p><p>MediaEval'19, 27-29 October 2019, Sophia Antipolis, France Github: https://github.com/vlbthambawita/MedicoTask_2019_paper_2 Thambawita et al. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">RESULTS AND ANALYSIS</head><p>According to the average MAE values shown in Table <ref type="table" target="#tab_0">1</ref>, inputs I3 and I4 yield a lower average motility MAE (10.970 and 9.427) than the baseline inputs I1 and I2 (13.017 each). These improvements imply that our model is able to encode temporal features into a spatial feature-image representation. Furthermore, input I4, which uses 18 stacked frames, shows a better average motility MAE than input I3, which uses 9. This gain suggests that, to predict sperm motility from videos, it is better to analyze more frames at the same time. This might be because the behaviour of sperm needs to be observed over time and not in single frames. Moreover, the predictions for our baseline inputs I1 and I2 show the same average values. This shows that our model learns temporal information from different sperm video frames.</p><p>Otherwise, if the gain came merely from stacking more frames rather than from differences between them, our two baseline inputs I1 and I2 would show different average values. The average morphology MAE values in Table <ref type="table" target="#tab_0">1</ref> are almost equal to each other across all inputs (5.606 to 5.777). This is expected because the morphology of a sperm can be observed in a single frame. Although additional frames do not improve the morphology predictions, these values support the claim that our model learns temporal information from multiple frames, because only the motility predictions improve when we increase the number of frames analyzed simultaneously.</p></div>
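<div xmlns="http://www.tei-c.org/ns/1.0"><p>The averages discussed above can be reproduced from the per-fold values in Table 1 with a few lines of Python. The fold values below are copied from the table; the function name mean_absolute_error and the toy prediction at the end are illustrative assumptions and are not taken from the dataset.</p>
<p><code lang="python"><![CDATA[
```python
def mean_absolute_error(predicted, actual):
    """Average of |prediction - ground truth| over all values."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Per-fold motility MAE values copied from Table 1.
motility_mae = {
    "I1": [13.330, 12.880, 12.840],
    "I2": [12.890, 13.010, 13.150],
    "I3": [10.850, 11.310, 10.750],
    "I4": [9.462, 9.426, 9.393],
}

# The "Average" column is the mean over the three folds.
motility_avg = {name: round(sum(v) / len(v), 3) for name, v in motility_mae.items()}
print(motility_avg)

# Toy example: predicted vs. ground-truth percentages for one sample
# (illustrative numbers, not taken from the dataset).
toy_mae = mean_absolute_error([50.0, 30.0, 20.0], [45.0, 25.0, 30.0])
print(toy_mae)
```
]]></code></p>
<p>Computed this way, I1 and I2 share the same three-fold average even though their per-fold values differ.</p></div>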
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">CONCLUSION AND FUTURE WORKS</head><p>In this paper, we proposed a novel method to extract temporal features from videos to create feature-images, which can be used to train traditional CNN models. Furthermore, we show that the feature-images capture temporal information present in a sequence of frames, which can be used to predict the motility of the sperm in a video. This method can be improved by using different error functions to force the model to learn more temporal data. For example, researchers can experiment with variational autoencoders <ref type="bibr" target="#b7">[8]</ref> and generative adversarial learning methods <ref type="bibr" target="#b3">[4]</ref> to improve this technique. Additionally, it may be beneficial to embed long short-term memory units to investigate how our feature-images compare to actual extracted temporal features.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: A big-picture overview of our two-step deep learning model. Step 1: an autoencoder architecture used to extract image features. Step 2: the pre-trained ResNet-34 CNN for predicting the regression values of motility and morphology. I1, I2, I3 and I4: input frames extracted from the video dataset. R1, R2, R3 and R4: reconstructed data corresponding to the input data I1, I2, I3 and I4. The four sample feature frames show four feature-images extracted by the autoencoder after training for 2,000 epochs (the actual resolution of a feature-image is 256x256, which is equal to the original frame size of the input data).</figDesc><graphic coords="2,85.48,83.69,441.05,214.81" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Mean absolute error values collected from the proposed method from different inputs: I1, I2, I3 and I4</figDesc><table><row><cell></cell><cell></cell><cell cols="2">Motility</cell><cell cols="2">Morphology</cell></row><row><cell>Input</cell><cell>Fold</cell><cell>MAE</cell><cell>Average</cell><cell>MAE</cell><cell>Average</cell></row><row><cell></cell><cell>Fold 1</cell><cell>13.330</cell><cell></cell><cell>5.698</cell><cell></cell></row><row><cell>I1</cell><cell>Fold 2</cell><cell>12.880</cell><cell>13.017</cell><cell>5.748</cell><cell>5.715</cell></row><row><cell></cell><cell>Fold 3</cell><cell>12.840</cell><cell></cell><cell>5.698</cell><cell></cell></row><row><cell></cell><cell>Fold 1</cell><cell>12.890</cell><cell></cell><cell>5.573</cell><cell></cell></row><row><cell>I2</cell><cell>Fold 2</cell><cell>13.010</cell><cell>13.017</cell><cell>5.593</cell><cell>5.606</cell></row><row><cell></cell><cell>Fold 3</cell><cell>13.150</cell><cell></cell><cell>5.653</cell><cell></cell></row><row><cell></cell><cell>Fold 1</cell><cell>10.850</cell><cell></cell><cell>5.567</cell><cell></cell></row><row><cell>I3</cell><cell>Fold 2</cell><cell>11.310</cell><cell>10.970</cell><cell>5.748</cell><cell>5.632</cell></row><row><cell></cell><cell>Fold 3</cell><cell>10.750</cell><cell></cell><cell>5.580</cell><cell></cell></row><row><cell></cell><cell>Fold 1</cell><cell>9.462</cell><cell></cell><cell>5.900</cell><cell></cell></row><row><cell>I4</cell><cell>Fold 2</cell><cell>9.426</cell><cell>9.427</cell><cell>5.738</cell><cell>5.777</cell></row><row><cell></cell><cell>Fold 3</cell><cell>9.393</cell><cell></cell><cell>5.692</cell><cell></cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Autoencoders, unsupervised learning, and deep architectures</title>
		<author>
			<persName><forename type="first">Pierre</forename><surname>Baldi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of ICML workshop on unsupervised and transfer learning</title>
				<meeting>ICML workshop on unsupervised and transfer learning</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="37" to="49" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Abnormal event detection in videos using spatiotemporal autoencoder</title>
		<author>
			<persName><forename type="first">Yong</forename><forename type="middle">Shean</forename><surname>Chong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yong</forename><forename type="middle">Haur</forename><surname>Tay</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Symposium on Neural Networks</title>
				<meeting>the International Symposium on Neural Networks</meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="189" to="196" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Estimation of Sperm Concentration and Total Motility From Microscopic Videos of Human Semen Samples</title>
		<author>
			<persName><forename type="first">Karan</forename><surname>Dewan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tathagato</forename><surname>Rai Dastidar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Maroof</forename><surname>Ahmad</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR</title>
				<meeting>the IEEE Conference on Computer Vision and Pattern Recognition (CVPR</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Generative adversarial nets</title>
		<author>
			<persName><forename type="first">Ian</forename><surname>Goodfellow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jean</forename><surname>Pouget-Abadie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mehdi</forename><surname>Mirza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bing</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><surname>Warde-Farley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sherjil</forename><surname>Ozair</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aaron</forename><surname>Courville</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoshua</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Advances in neural information processing systems (NIPS)</title>
				<meeting>the Advances in neural information processing systems (NIPS)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="2672" to="2680" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">VISEM: A Multimodal Video Dataset of Human Spermatozoa</title>
		<author>
			<persName><forename type="first">Trine</forename><forename type="middle">B</forename><surname>Haugen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Steven</forename><forename type="middle">A</forename><surname>Hicks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jorunn</forename><forename type="middle">M</forename><surname>Andersen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Oliwia</forename><surname>Witczak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hugo</forename><forename type="middle">L</forename><surname>Hammer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rune</forename><surname>Borgli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pål</forename><surname>Halvorsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><forename type="middle">A</forename><surname>Riegler</surname></persName>
		</author>
		<idno type="DOI">10.1145/3304109.3325814</idno>
		<ptr target="https://doi.org/10.1145/3304109.3325814" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th ACM on Multimedia Systems Conference (MMSys) (MMSys&apos;19)</title>
				<meeting>the 10th ACM on Multimedia Systems Conference (MMSys) (MMSys&apos;19)<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Deep residual learning for image recognition</title>
		<author>
			<persName><forename type="first">Kaiming</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xiangyu</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Shaoqing</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jian</forename><surname>Sun</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)</title>
				<meeting>the IEEE conference on computer vision and pattern recognition (CVPR)</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="770" to="778" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Medico Multimedia Task at MediaEval</title>
		<author>
			<persName><forename type="first">Steven</forename><surname>Hicks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pål</forename><surname>Halvorsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Trine</forename><forename type="middle">B</forename><surname>Haugen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jorunn</forename><forename type="middle">M</forename><surname>Andersen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Oliwia</forename><surname>Witczak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Konstantin</forename><surname>Pogorelov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hugo</forename><forename type="middle">L</forename><surname>Hammer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Duc-Tien</forename><surname>Dang-Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mathias</forename><surname>Lux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><surname>Riegler</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the CEUR Workshop on Multimedia Benchmark Workshop (MediaEval)</title>
				<meeting>the CEUR Workshop on Multimedia Benchmark Workshop (MediaEval)</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Auto-encoding variational bayes</title>
		<author>
			<persName><forename type="first">Diederik</forename><forename type="middle">P</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Max</forename><surname>Welling</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1312.6114</idno>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">The future of computer-aided sperm analysis</title>
		<author>
			<persName><forename type="first">Sharon</forename><forename type="middle">T</forename><surname>Mortimer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gerhard</forename><surname>van der Horst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><surname>Mortimer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Asian journal of andrology</title>
		<imprint>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page">545</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Medico Multimedia Task at MediaEval 2018</title>
		<author>
			<persName><forename type="first">Konstantin</forename><surname>Pogorelov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><surname>Riegler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pål</forename><surname>Halvorsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Steven</forename><forename type="middle">Alexander</forename><surname>Hicks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kristin</forename><forename type="middle">Ranheim</forename><surname>Randel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Duc-Tien</forename><surname>Dang-Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mathias</forename><surname>Lux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Olga</forename><surname>Ostroukhova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Thomas</forename><surname>De Lange</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the CEUR Workshop on Multimedia Benchmark Workshop (MediaEval)</title>
				<meeting>the CEUR Workshop on Multimedia Benchmark Workshop (MediaEval)</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Multimedia for medicine: the medico task at MediaEval</title>
		<author>
			<persName><forename type="first">Michael</forename><surname>Riegler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Konstantin</forename><surname>Pogorelov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pål</forename><surname>Halvorsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Carsten</forename><surname>Griwodz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Thomas</forename><surname>Lange</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kristin</forename><surname>Randel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sigrun</forename><surname>Eskeland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Duc-Tien</forename><surname>Dang-Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mathias</forename><surname>Lux</surname></persName>
		</author>
		<author>
			<persName><surname>Others</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Automatic Tracking and Motility Analysis of Human Sperm in Time-Lapse Images</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">F</forename><surname>Urbano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Masson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vermilyea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kam</surname></persName>
		</author>
		<idno type="DOI">10.1109/TMI.2016.2630720</idno>
		<ptr target="https://doi.org/10.1109/TMI.2016.2630720" />
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Medical Imaging</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="792" to="801" />
			<date type="published" when="2017-03">March 2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Unsupervised extraction of video highlights via robust recurrent auto-encoders</title>
		<author>
			<persName><forename type="first">Huan</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Baoyuan</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Stephen</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><surname>Wipf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Minyi</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Baining</forename><surname>Guo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE international conference on computer vision (ICCV)</title>
				<meeting>the IEEE international conference on computer vision (ICCV)</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="4633" to="4641" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
