<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">TU-Net: Transformer based U-Net for left ventricle MRI segmentation</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Amit</forename><surname>Pandey</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of CSET</orgName>
								<orgName type="institution">Bennett University</orgName>
								<address>
									<settlement>Gautam Buddha Nagar</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Akansha</forename><surname>Singh</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of CSET</orgName>
								<orgName type="institution">Bennett University</orgName>
								<address>
									<settlement>Gautam Buddha Nagar</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ajith</forename><surname>Abraham</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">School of AI</orgName>
								<orgName type="institution">Bennett University</orgName>
								<address>
									<settlement>Gautam Buddha Nagar</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Krishna</forename><forename type="middle">Kant</forename><surname>Singh</surname></persName>
							<affiliation key="aff2">
								<orgName type="institution">Delhi Technical Campus</orgName>
								<address>
									<settlement>Greater Noida</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">TU-Net: Transformer based U-Net for left ventricle MRI segmentation</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">CD6526B1985B8D9086CAE77368AD2B68</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:12+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>MRI</term>
					<term>Cardiac function</term>
					<term>U-Net</term>
					<term>Multi-Head Self-Attention</term>
					<term>medical image segmentation</term>
					<term>Self-Attention</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Accurate segmentation of the left ventricle in cardiac MRI images is crucial for evaluating cardiac function and diagnosing cardiovascular conditions. Traditional approaches, including the commonly used U-Net architecture, struggle to capture the global contextual information required for precise segmentation. This study introduces U-Net MHSA, an enhanced version of U-Net that incorporates Multi-Head Self-Attention (MHSA) in the bottleneck layer to overcome these limitations. By combining the strengths of convolution layers and attention mechanisms, our model effectively captures long-range dependencies while preserving spatial coherence. U-Net MHSA outperforms the baseline U-Net on the MICCAI 2009 Left Ventricle Segmentation Challenge dataset, achieving higher precision (0.799531) and accuracy (0.797943), with a minor trade-off of slightly reduced recall and Intersection over Union (IoU). The overall results show that integrating MHSA into the U-Net architecture improves medical image segmentation.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Medical image segmentation <ref type="bibr" target="#b0">[1]</ref> plays a crucial role in modern healthcare, where accurate and precise diagnostic tools such as Magnetic Resonance Imaging (MRI), X-ray, and CT scans <ref type="bibr" target="#b1">[2]</ref> are essential for clinical decision-making. Traditional manual and semi-automatic segmentation methods depend heavily on human input; they are not only less accurate and precise but also time-consuming. In recent years, machine learning <ref type="bibr" target="#b2">[3]</ref>, deep learning <ref type="bibr" target="#b3">[4]</ref>, and convolutional neural networks <ref type="bibr" target="#b4">[5]</ref> have revolutionized the medical imaging field. U-Net <ref type="bibr" target="#b5">[6]</ref>, a convolutional-neural-network-based architecture introduced in 2015, transformed medical imaging with its distinctive U-shaped architecture and skip connections. Through skip connections, U-Net concatenates low-level features with high-level features for more accurate and precise segmentation of medical images. Despite its many advantages and successes, U-Net also has limitations. The initial layers of the encoder path produce weak feature representations, and passing these feature maps through skip connections adds little value while increasing time and space complexity. U-Net is also unable to capture long-range dependencies or exploit parallel computation. To address these limitations, we propose TU-Net, a hybrid model that integrates MHSA <ref type="bibr" target="#b6">[7]</ref> into the bottleneck of the U-Net architecture. TU-Net combines the strengths of both architectures, capturing global image context while retaining the fine-grained spatial features essential for accurate and precise segmentation.
In the following sections we describe the self-attention mechanism, the MHSA block, and the U-Net architecture in detail.</p><p>ProfIT AI 2024: 4th International Workshop of IT-professionals on Artificial Intelligence (ProfIT AI 2024), September 25-27, 2024, Cambridge, MA, USA. Emails: e21soep0035@bennett.edu.in (A. Pandey); akansha1.singh@bennett.edu.in (A. Singh); ajith.abraham@bennett.edu.in (A. Abraham); Krishnaiitr2011@gmail.com (K.K. Singh). ORCID: 0009-0000-1317-952X (A. Pandey); 0000-0002-5520-8066 (A. Singh); 0000-0002-0169-6738 (A. Abraham); 0000-0002-6510-6768 (K.K. Singh)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Methodology</head><p>In this section, we explain the methodology used in developing TU-Net, a novel architecture that improves on the baseline U-Net model with Transformer-based Multi-Head Self-Attention (MHSA) for left ventricle MRI segmentation. The steps of our model are shown in Figure <ref type="figure">1</ref>. In the first step, the input image passes into the encoders; in the second step, the output of the last encoder passes into the MHSA block; finally, the output of the MHSA block passes through the decoders to produce the output segmentation map.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">U-Net Architecture with Integration of MHSA Block</head><p>The U-Net architecture shown in Figure <ref type="figure">2</ref> is known for its distinctive U-shaped encoder-decoder design, which enables precise localization and segmentation. In the encoder path, feature maps are extracted by two successive 3x3 convolutions, each followed by a ReLU activation, after which a 2x2 max-pooling operation down-samples the feature maps. This process is repeated five times, as five encoders are used in the U-Net architecture. After the fifth encoder, in the bottleneck section, we integrate the MHSA module, which processes the feature maps received from the last encoder and enables the proposed architecture to capture global context and long-range dependencies within the image. Conversely, in the decoder path, feature maps are up-sampled using 2x2 up-convolutions and concatenated with the feature maps from the corresponding encoder stage. Two successive 3x3 convolutions, each followed by a ReLU activation, are then applied; this process is likewise repeated five times, and a final 1x1 convolution after the last decoder produces the output segmentation map. </p></div>
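The bottleneck integration described above can be sketched as follows. This is a minimal NumPy illustration (a single attention head, no batch dimension, and random matrices standing in for the learned projections), not the authors' implementation: the bottleneck feature map is flattened into a sequence of spatial tokens, attended over, and reshaped back so the decoder path receives the usual spatial layout.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_bottleneck(feat, Wq, Wk, Wv):
    """Flatten a bottleneck feature map into tokens, attend, reshape back.

    feat: (H, W, C) feature map from the last encoder.
    Wq, Wk, Wv: (C, C) projection matrices (stand-ins for learned weights).
    Returns an (H, W, C) map ready for the decoder path.
    """
    H, W, C = feat.shape
    x = feat.reshape(H * W, C)         # one token per spatial position
    Q, K, V = x @ Wq, x @ Wk, x @ Wv   # linear projections
    A = softmax(Q @ K.T / np.sqrt(C))  # scaled dot-product attention weights
    return (A @ V).reshape(H, W, C)    # back to the spatial layout
```

Because every token attends to every other token, each bottleneck position can draw on context from the whole image, which is exactly what the plain convolutional bottleneck lacks.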
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Multi-Head Self-Attention (MHSA)</head><p>MHSA is an advanced technique used in transformer models to improve their ability to process information. Instead of relying on a single attention mechanism whose queries, keys, and values all have dimensionality d_model, MHSA divides this process into multiple parallel attention operations. Each of these operations, known as a head, maps the queries, keys, and values into smaller dimensions d_k and d_v using distinct learned linear projections. Attention is computed in parallel for each head, and the resulting d_v-dimensional outputs are concatenated and re-projected to produce the final output. This approach allows the model to focus on various representation subspaces at different positions, whereas a single attention head would average these aspects together.</p><p>To overcome U-Net's limitation in capturing long-range dependencies, we incorporated MHSA into the bottleneck of the U-Net architecture. MHSA, a concept derived from transformers, allows the model to attend to various parts of the input image simultaneously, thereby capturing global context more effectively, as shown in Figure <ref type="figure">1</ref>. The TU-Net architecture retains the basic structure of U-Net but integrates MHSA in the bottleneck layer to enhance its ability to capture global information. The self-attention mechanism shown in Figure <ref type="figure">4</ref> works by calculating attention scores between positions within the input image. It consists of three main components: Query (Q), Key (K), and Value (V). The attention scores A are calculated by taking the scaled dot-product of Q and K and applying a Soft-Max function to obtain the attention weights, as shown in Equation 1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><formula xml:id="formula_0">A = softmax(QK^T / √d_k) (<label>1</label>)</formula><p>where d_k is the dimensionality of the key vectors. These weights are then applied to V to produce the final output, as shown in Equation <ref type="formula">2</ref>.</p><formula xml:id="formula_2">Attention(Q, K, V) = A · V (2)</formula><p>This process is performed multiple times in parallel to create MHSA, enabling the model to simultaneously focus on different regions of the image, as illustrated in Figure <ref type="figure">3</ref>. The step-wise working of MHSA is shown in Figure <ref type="figure">5</ref>, from left to right. In the first step, the input sequence is passed in; in the second step, each element is embedded (embedding is needed only before the first encoder). In the third step, the input is split across eight heads, and X (or R) is multiplied by the corresponding weight matrices. In the fourth step, the attention scores are calculated using the Q, K, and V matrices. In the final step, the resulting Z matrices are concatenated and multiplied by the weight matrix W0 to produce the output.  </p></div>
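Equations (1) and (2) and the head-splitting steps above can be combined into one function. The following NumPy sketch (a simplification with shared-shape projection matrices and no batch dimension; not the authors' code) computes attention per head and then concatenates and re-projects, as described:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, Wq, Wk, Wv, Wo, num_heads=8):
    """Equations (1) and (2) computed per head, then concatenated.

    x: (n, d_model) token sequence; Wq, Wk, Wv, Wo: (d_model, d_model).
    Each head attends over a d_k = d_model // num_heads slice.
    """
    n, d_model = x.shape
    d_k = d_model // num_heads
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    # Split into heads: (num_heads, n, d_k)
    Q = Q.reshape(n, num_heads, d_k).transpose(1, 0, 2)
    K = K.reshape(n, num_heads, d_k).transpose(1, 0, 2)
    V = V.reshape(n, num_heads, d_k).transpose(1, 0, 2)
    A = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d_k))  # eq. (1), per head
    Z = A @ V                                             # eq. (2), per head
    Z = Z.transpose(1, 0, 2).reshape(n, d_model)          # concatenate heads
    return Z @ Wo                                         # final projection W0
```

Each head sees only a d_k-dimensional slice, so the total cost is comparable to a single full-width attention while allowing the heads to specialize on different representation subspaces.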
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Training Procedure</head><p>The TU-Net architecture was trained on the MICCAI 2009 <ref type="bibr" target="#b7">[8]</ref> Left Ventricle Segmentation Challenge dataset; the dataset and training details are given in Table 1. Before training, the MRI images were subjected to several preprocessing steps to ensure uniformity and enhance model performance: each image was resized to 256 x 256 pixels, and the pixel intensity values were normalized. To prevent overfitting, data augmentation techniques such as random rotations, shifts, flips, and zooms were applied to the training dataset. The Adam optimizer, known for its efficiency and ability to handle sparse gradients, was used to train the TU-Net model. A hybrid loss function combining binary cross-entropy and Dice loss was employed to balance pixel-wise accuracy with the overlap between ground truth and predicted masks. During training, the TU-Net model's parameters were iteratively adjusted to minimize the loss function through forward and backward propagation steps. In the forward pass, the input images were fed through the model to obtain predictions, which were compared to the ground truth masks to compute the loss. In the backward pass, the computed loss was used to update the model parameters through the Adam optimizer.</p><p>The model's performance was validated on the 500-image validation set after each epoch, providing insight into its generalization to unseen data; this validation process also guided hyperparameter tuning. After training concluded, the final model was evaluated on a test set of 266 images to gauge its performance in real-world scenarios. The final TU-Net model, incorporating MHSA in the bottleneck layer, comprised 48,195,073 parameters in total, 48,183,297 of them trainable and 11,776 non-trainable, for a model size of 183.85 MB.
The training procedure ensured that the model was well-optimized for accurate and reliable segmentation of the left ventricle in MRI images, using the hyperparameters listed in Table <ref type="table" target="#tab_0">1</ref>.</p></div>
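The hybrid loss described above can be sketched as follows. This is an illustrative NumPy version only: the paper does not specify how the binary cross-entropy and Dice terms are weighted, so an equal-weight sum is assumed here.

```python
import numpy as np

def bce_dice_loss(y_true, y_pred, eps=1e-7):
    """Hybrid loss: binary cross-entropy plus Dice loss.

    Equal weighting of the two terms is an assumption; the paper does
    not state how the terms are combined. y_true is a binary mask and
    y_pred holds predicted probabilities in (0, 1).
    """
    p = np.clip(y_pred, eps, 1.0 - eps)       # avoid log(0)
    bce = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    inter = np.sum(y_true * y_pred)
    dice = (2.0 * inter + eps) / (np.sum(y_true) + np.sum(y_pred) + eps)
    return bce + (1.0 - dice)                 # BCE + Dice loss
```

The BCE term penalizes each pixel independently, while the Dice term rewards overlap between the whole predicted and ground-truth masks, which keeps the loss informative even when the foreground class is small.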
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Evaluation Metrics</head><p>Performance is evaluated with key metrics including precision, recall, specificity, intersection over union (IoU), and an overall accuracy score obtained from the evaluate-generator function, offering a comprehensive assessment of segmentation quality.</p></div>
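These metrics follow directly from the pixel-wise confusion counts, as the following NumPy sketch shows (a standard formulation, not taken from the authors' code):

```python
import numpy as np

def segmentation_metrics(y_true, y_pred):
    """Pixel-wise precision, recall, specificity and IoU for binary masks.

    Assumes both foreground and background pixels occur in the masks,
    so no denominator is zero.
    """
    t, p = y_true.astype(bool), y_pred.astype(bool)
    tp = np.sum(t & p)    # foreground correctly predicted
    fp = np.sum(~t & p)   # background predicted as foreground
    fn = np.sum(t & ~p)   # foreground missed
    tn = np.sum(~t & ~p)  # background correctly predicted
    return {
        "precision":   tp / (tp + fp),
        "recall":      tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "iou":         tp / (tp + fp + fn),
    }
```

Note that IoU ignores true negatives, which is why a model can score very high specificity (the background dominates cardiac MRI slices) while its IoU remains modest.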
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Results</head><p>The performance of the TU-Net model with Multi-Head Self-Attention (MHSA) was evaluated against the standard U-Net model using several key metrics: precision, recall, specificity, IoU, and accuracy. The evaluation was conducted on the MICCAI 2009 Left Ventricle Segmentation Challenge dataset, focusing on segmentation of the left ventricle in MRI images. Table <ref type="table" target="#tab_1">2</ref> summarizes the comparative results of the two models. Precision was higher for the U-Net MHSA model (0.799531) than for the standard U-Net model (0.773880), indicating that the incorporation of MHSA helped reduce false positives. Recall was higher for the U-Net model (0.653408) than for the U-Net MHSA model (0.576392), suggesting that while the U-Net MHSA model had fewer false positives, it also produced more false negatives. Specificity was slightly better for the U-Net MHSA model (0.997670) than for the standard U-Net model (0.996921); this improvement, albeit small, indicates better performance in correctly identifying negative samples.</p><p>The IoU metric was slightly lower for the U-Net MHSA model (0.503610) than for the standard U-Net model (0.548658), suggesting that the standard U-Net had slightly better spatial overlap between the predicted and true segmentation masks. Accuracy, evaluated using the evaluate-generator function, was significantly higher for the U-Net MHSA model (0.797943) than for the standard U-Net model (0.710639), indicating superior overall performance in segmenting the left ventricle.</p><p>In addition to the tabular results, Figure <ref type="figure">6</ref> presents a comparative graph that visually represents the performance differences between the convolutional U-Net model and the U-Net MHSA model. This graph highlights the enhanced accuracy and precision of the U-Net MHSA model, despite the trade-off in recall and IoU. Figure <ref type="figure">7</ref> shows a visual comparison between U-Net MHSA and U-Net. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Discussion</head><p>This study aimed to enhance the U-Net architecture for medical image segmentation by incorporating MHSA into its bottleneck layer. The results indicate that the enhanced model, U-Net MHSA, shows considerable improvements compared to the standard U-Net, especially regarding precision and overall accuracy. Integrating MHSA into the U-Net framework enables the model to more effectively capture long-range dependencies and contextual relationships within the image, which are crucial for precise segmentation. Our findings show that U-Net MHSA achieved a precision of 0.799531 and an accuracy of 0.797943, outperforming the standard U-Net, which had a precision of 0.773880 and an accuracy of 0.710639. These enhancements highlight the benefits of incorporating attention mechanisms to improve the TU-Net's ability to focus on important features throughout the entire image.</p><p>However, while U-Net MHSA showed notable gains in precision and accuracy, it did exhibit a slightly lower recall (0.576392) and IoU (0.503610) compared to the standard U-Net, which had a recall of 0.653408 and an IoU of 0.548658. This suggests that although U-Net MHSA is more precise in identifying the left ventricle, it may miss some true positives, leading to a lower recall. The decreased IoU indicates a reduced overlap between predicted and actual segmentations, pointing to a potential area for further optimization. The trade-off between precision and recall observed in our study is a common challenge in segmentation tasks. Precision measures how many of the identified segments are correct, while recall measures how many of the actual segments were identified. Achieving a balance between these metrics is crucial for practical applications, especially in medical imaging, where both false positives and false negatives can have significant consequences. 
One of the strengths of our approach is the ability of MHSA to capture global context, which is often overlooked by traditional convolution operations that primarily focus on local features. By attending to different parts of the image simultaneously, MHSA provides a more comprehensive understanding of spatial relationships, enhancing the model's ability to delineate complex anatomical structures. The overall higher accuracy of U-Net MHSA highlights its robustness and effectiveness for the task of left ventricle segmentation. The additional computational cost introduced by the MHSA module is justified by the performance gains, demonstrating the potential of self-attention mechanisms for improving convolutional neural network architectures.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions</head><p>We presented U-Net MHSA for medical image segmentation, in particular segmentation of the left ventricle in cardiac images. Incorporating MHSA into the bottleneck layer of this advanced architecture yields significant improvements in precision and overall accuracy, and U-Net MHSA outperforms the standard U-Net: precision improves from 0.773880 to 0.799531 and accuracy from 0.710639 to 0.797943. Alongside these benefits, there is some decrease in recall and Intersection over Union (IoU) with U-Net MHSA. U-Net MHSA demonstrates the potential of combining convolutional neural network architectures with self-attention mechanisms to improve segmentation performance. Future research should focus on optimizing the attention mechanism and validating the model on different segmentation tasks and datasets to ensure its generalizability and robustness in various clinical scenarios.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :Figure 2 :</head><label>12</label><figDesc>Figure 1: Steps of Proposed Model</figDesc><graphic coords="2,127.53,394.24,348.00,109.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 3 :Figure 4 :Figure 5 :</head><label>345</label><figDesc>Figure 3: Multi-head self-attention (MHSA)</figDesc><graphic coords="3,217.53,550.28,167.50,191.24" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 6 :Figure 7 :</head><label>67</label><figDesc>Figure 6: Detailed Working process of MHSA Module</figDesc><graphic coords="6,167.03,410.17,269.00,165.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="4,103.03,63.55,396.79,397.75" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="4,77.75,490.89,451.00,261.20" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc></figDesc><table><row><cell>Hyperparameters</cell><cell></cell></row><row><cell>Hyperparameter</cell><cell>Value</cell></row><row><cell>Image Size</cell><cell>256 x 256</cell></row><row><cell>Batch Size</cell><cell>64</cell></row><row><cell>Epochs</cell><cell>50</cell></row><row><cell>Training Images</cell><cell>4900</cell></row><row><cell>Validation Images</cell><cell>500</cell></row><row><cell>Test Images</cell><cell>266</cell></row><row><cell>Total Parameters</cell><cell>48,195,073 (183.85 MB)</cell></row><row><cell>Trainable Parameters</cell><cell>48,183,297 (183.80 MB)</cell></row><row><cell>Non-trainable Parameters</cell><cell>11,776 (46.00 KB)</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc></figDesc><table><row><cell cols="2">Performance Comparison</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>Model</cell><cell>Precision</cell><cell>Recall</cell><cell>Specificity</cell><cell>IoU</cell><cell>Accuracy</cell></row><row><cell>U-Net</cell><cell>0.773880</cell><cell>0.653408</cell><cell>0.996921</cell><cell>0.548658</cell><cell>0.710639</cell></row><row><cell>U-Net MHSA</cell><cell>0.799531</cell><cell>0.576392</cell><cell>0.997670</cell><cell>0.503610</cell><cell>0.797943</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>We would like to express our sincere gratitude to the Department of Computer Science at Bennett University for providing the necessary resources and support throughout this research. Special thanks to our colleagues and mentors, whose insights and expertise were invaluable in the development and refinement of this study. This work was not funded. We also extend our appreciation to the MICCAI 2009 Left Ventricle Segmentation Challenge for providing the dataset.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A study of image segmentation algorithms for different types of images</title>
		<author>
			<persName><forename type="first">Krishna</forename><forename type="middle">Kant</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Akansha</forename><surname>Singh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Computer Science Issues (IJCSI)</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page">414</biblScope>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Biomedical imaging modalities: a tutorial</title>
		<author>
			<persName><forename type="first">Raj</forename><surname>Acharya</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computerized Medical Imaging and Graphics</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="3" to="25" />
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Machine learning and the internet of medical things in healthcare</title>
		<author>
			<persName><forename type="first">Pushpa</forename><surname>Singh</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2021">2021</date>
			<publisher>Academic Press</publisher>
			<biblScope unit="page" from="89" to="111" />
		</imprint>
	</monogr>
	<note>Diagnosing of disease using machine learning</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Era of deep neural networks: A review</title>
		<author>
			<persName><forename type="first">Poonam</forename><surname>Sharma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Akansha</forename><surname>Singh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">8th International Conference on Computing, Communication and Networking Technologies (ICCCNT)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">An introduction to convolutional neural networks</title>
		<author>
			<persName><forename type="first">K</forename><surname>O'shea</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1511.08458</idno>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">U-net: Convolutional networks for biomedical image segmentation</title>
		<author>
			<persName><forename type="first">Olaf</forename><surname>Ronneberger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Philipp</forename><surname>Fischer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Thomas</forename><surname>Brox</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Medical image computing and computer-assisted intervention-MICCAI 2015: 18th international conference</title>
				<meeting><address><addrLine>Munich, Germany</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2015">October 5-9, 2015</date>
		</imprint>
	</monogr>
	<note>part III 18</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Attention is all you need</title>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Cardiac MR Left Ventricle Segmentation Challenge</title>
		<ptr target="http://hdl.handle.net/10380/307" />
		<imprint/>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
