<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A Self-Supervised Learning Approach for Detecting BRCA Mutations in Breast Cancer Histopathological Images</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Faycal</forename><surname>Touazi</surname></persName>
							<email>f.touazi@univ-boumerdes.dz</email>
							<affiliation key="aff0">
								<orgName type="department">Computer Science Department</orgName>
								<orgName type="laboratory">LIMOSE Laboratory</orgName>
								<orgName type="institution">University M&apos;hamed Bougara</orgName>
								<address>
									<addrLine>Independence Avenue</addrLine>
									<postCode>35000</postCode>
									<settlement>Boumerdes</settlement>
									<country key="DZ">Algeria</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Djamel</forename><surname>Gaceb</surname></persName>
							<email>d.gaceb@univ-boumerdes.dz</email>
							<affiliation key="aff0">
								<orgName type="department">Computer Science Department</orgName>
								<orgName type="laboratory">LIMOSE Laboratory</orgName>
								<orgName type="institution">University M&apos;hamed Bougara</orgName>
								<address>
									<addrLine>Independence Avenue</addrLine>
									<postCode>35000</postCode>
									<settlement>Boumerdes</settlement>
									<country key="DZ">Algeria</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Chaima</forename><surname>Belkadi</surname></persName>
							<email>c.belkadi@univ-boumerdes.dz</email>
							<affiliation key="aff0">
								<orgName type="department">Computer Science Department</orgName>
								<orgName type="laboratory">LIMOSE Laboratory</orgName>
								<orgName type="institution">University M&apos;hamed Bougara</orgName>
								<address>
									<addrLine>Independence Avenue</addrLine>
									<postCode>35000</postCode>
									<settlement>Boumerdes</settlement>
									<country key="DZ">Algeria</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Besma</forename><surname>Loubar</surname></persName>
							<email>b.loubar@univ-boumerdes.dz</email>
							<affiliation key="aff0">
								<orgName type="department">Computer Science Department</orgName>
								<orgName type="laboratory">LIMOSE Laboratory</orgName>
								<orgName type="institution">University M&apos;hamed Bougara</orgName>
								<address>
									<addrLine>Independence Avenue</addrLine>
									<postCode>35000</postCode>
									<settlement>Boumerdes</settlement>
									<country key="DZ">Algeria</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">A Self-Supervised Learning Approach for Detecting BRCA Mutations in Breast Cancer Histopathological Images</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">CF1E392493DD566AB42EA6E4C72AB3DC</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:10+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Breast Cancer</term>
					<term>BRCA mutation</term>
					<term>deep learning</term>
					<term>Self-supervised learning</term>
					<term>BRCA1</term>
					<term>BRCA2</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Breast and ovarian cancers are among the most pressing health issues affecting women globally, with genetic mutations, particularly in the BRCA1 and BRCA2 genes, significantly influencing their development. This paper offers a comprehensive overview of these cancers, emphasizing the genetic, anatomical, and histopathological factors that contribute to their onset and progression. A detailed examination of the anatomy of the female breast and ovaries provides insight into the origins of these malignancies. The critical role of histopathology in identifying specific cancer subtypes and gene mutations is explored, underscoring its vital importance in diagnosis and treatment. Our results demonstrate that the developed deep learning framework, integrating Vector Quantized-Variational Autoencoders (VQ-VAE) and DBSCAN for clustering, achieved an accuracy of 95% in classifying BRCA mutation-positive and negative cases, outperforming traditional diagnostic methods. By investigating the interplay between genetic predisposition and histopathological analysis, this paper aims to enhance the understanding of breast and ovarian cancers and their implications for public health.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Breast cancer remains one of the most prevalent cancers globally, affecting millions of women each year. Early detection is a critical factor in improving survival rates, as it allows for timely intervention and treatment. Traditional methods for breast cancer detection, such as mammography, have long been the gold standard in screening programs. In recent years, deep learning has emerged as a powerful tool for enhancing breast cancer detection, particularly in medical imaging tasks such as mammography interpretation.</p><p>Deep learning algorithms, particularly Convolutional Neural Networks (CNNs), have demonstrated remarkable success in breast cancer detection from mammography images. Studies have shown that deep learning models can achieve accuracy levels comparable to radiologists in identifying tumors <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b3">4]</ref>.</p><p>Mutations in the BRCA1 and BRCA2 genes are among the most well-known genetic risk factors for breast cancer. These mutations, which can be inherited, significantly increase a woman's lifetime risk of developing breast cancer. Women who carry a BRCA1 or BRCA2 mutation have a 50 to 85% risk of developing breast cancer by the age of 70, compared to a 12% risk in the general population <ref type="bibr" target="#b4">[5]</ref>.</p><p>The discovery of a BRCA mutation in a patient is of crucial importance, not only for early diagnosis and cancer management, but also for informed decision-making regarding preventive measures and treatment options. 
Identifying such mutations can lead to personalized surveillance strategies, risk-reducing surgeries, and targeted therapies, thus improving the overall prognosis and quality of life of high-risk patients <ref type="bibr" target="#b5">[6]</ref> [7] <ref type="bibr" target="#b7">[8]</ref>.</p><p>Deep learning has revolutionized various domains, and its impact on the medical field is particularly profound. The ability of deep learning algorithms to analyze complex patterns in large datasets has led to significant advances in medical diagnostics, treatment planning, and personalized medicine. In the context of medical imaging, deep learning models have demonstrated exceptional accuracy in tasks such as detecting abnormalities, classifying diseases, and predicting patient outcomes. These models, which often outperform traditional methods, have the potential to assist clinicians in making more informed decisions and improving patient care.</p><p>Our study leverages deep learning techniques to address the challenges associated with detecting BRCA1 and BRCA2 mutations in histopathological images of breast and ovarian cancer. By training a robust model on a curated dataset, we aim to provide a reliable tool for identifying these genetic mutations. The results presented in this work highlight the effectiveness of our approach and demonstrate the potential of deep learning in enhancing the accuracy of cancer detection and prognosis. Through this research, we contribute to the growing body of evidence supporting the integration of deep learning into clinical practice, ultimately aiming to improve outcomes for patients with hereditary cancer risks.</p><p>This paper is organized as follows: Section 2 reviews related works. Section 3 outlines our proposed approach, focusing on the Vector Quantized Variational AutoEncoder (VQ-VAE). Section 4 describes the experimental setup, including the TCGA-BRCA dataset, preprocessing, and evaluation metrics. 
Section 5 presents the results, covering clustering, BRCA patch classification, and SVS image classification, along with comparisons to related work. Finally, Section 6 concludes with a summary of findings and future research directions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Works</head><p>In this section, we offer a comprehensive review of recent studies that focus on detecting BRCA mutations in breast cancer using deep learning methods.</p><p>Shen Zhao et al. <ref type="bibr" target="#b8">[9]</ref> developed a deep learning framework for comprehensive molecular and prognostic stratifications of triple-negative breast cancer (TNBC). The framework features two CNNs in series: the first, a tissue type classifier based on ResNet-18, achieved a weighted F1 score of 0.96, classifying tissue types with near 90% accuracy. The second CNN predicted molecular features and relapse risks with AUCs ranging from 0.71 to 0.76.</p><p>Xiaoxiao Wang et al. <ref type="bibr" target="#b9">[10]</ref> proposed a deep learning model based on CNNs to predict BRCA gene mutations from histopathological images. Trained on the JSPHCM and JSCH datasets, their model demonstrated high performance, with AUC values around 79%.</p><p>Tristan Lazard et al. <ref type="bibr" target="#b10">[11]</ref> employed multiple instance learning (MIL) techniques to identify morphological patterns indicative of homologous recombination deficiency in luminal breast cancers. Their model, tested on a dataset of 673 WSIs from TCGA and an in-house dataset, achieved an AUC of 71%.</p><p>Nam Nhut Phan et al. <ref type="bibr" target="#b11">[12]</ref> developed a deep learning pipeline for classifying breast cancer molecular subtypes from unannotated pathological images. Their approach utilized a two-step transfer learning process with CNNs such as ResNet50, ResNet101, VGG16, and Xception. Initially, the models were pre-trained on ImageNet and then fine-tuned on an internal dataset. They were subsequently trained on the TCGA-BRCA dataset to classify breast cancer into basal, HER2, luminal A, and luminal B subtypes. The images were normalized to 512x512 pixels, and patches were extracted from WSIs. 
The models achieved average AUCs ranging from 88 to 92%.</p><p>Kurian et al. <ref type="bibr" target="#b12">[13]</ref> proposed a semi-supervised learning approach to classify breast cancer subtypes using histopathological images from the TCGA-BRCA dataset. They focused on differentiating between Basal and Luminal A PAM50 subtypes by analyzing a curated subset of 180 whole slide images (WSIs) selected to minimize heterogeneity. Their model leveraged a Deep Neural Network (DNN) architecture based on SimCLR with a ResNet18 backbone for out-of-distribution (OOD) detection, pre-trained on a large histology image dataset. Patch extraction from annotated tumor regions enabled the model to focus on relevant regions, although it introduced potential label noise. They achieved a patient-level accuracy of 81.43%.</p><p>The methodology employed by Valieris et al. <ref type="bibr" target="#b13">[14]</ref> involved developing a deep learning framework to detect homologous recombination (HR) deficiency in breast tumors using the TCGA-BRCA dataset. The model leveraged whole slide images (WSI), utilizing advanced image processing techniques to extract histopathological features indicative of HR deficiency. To address the complexity and variability in these images, the authors implemented a multiple instance learning (MIL) approach, allowing the model to learn from entire tumor samples without the need for manual segmentation. Their model achieved an area under the curve (AUC) of 80%.</p><p>Table <ref type="table" target="#tab_0">1</ref> provides a comparative summary of the performance achieved in state-of-the-art studies for breast cancer classification, highlighting different datasets, methods, and evaluation metrics used. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Proposed approach</head><p>In this section, we describe our proposed approach for the detection and diagnosis of breast cancer using advanced deep learning techniques. Our approach is designed to address the challenges of analyzing histopathological images and aims to provide a comprehensive solution to detect and classify breast masses. Before detailing the methodology, we first introduce a key architecture used in our proposal.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Vector Quantized Variational AutoEncoder</head><p>The Vector Quantized Variational AutoEncoder (VQ-VAE) <ref type="bibr" target="#b14">[15]</ref> is a type of variational autoencoder that introduces vector quantization to obtain a discrete latent representation, distinguishing itself from traditional VAEs, which produce continuous latent codes. The VQ-VAE uses a codebook, a matrix $e$ of dimensions $K \times D$, where $K$ represents the number of embeddings and $D$ is the dimensionality of each embedding (see Figure <ref type="figure" target="#fig_0">1</ref>). This architecture enables the encoding of data into discrete codes, which helps in learning compact and structured representations. The model consists of three main components:</p><p>• An encoder: that maps input data $x$ (such as images) into continuous latent representations $z$.</p><p>• A vector quantizer: that transforms these continuous representations into discrete vectors $e_k$ using the codebook, by selecting the closest embedding through minimizing the Euclidean distance:</p><formula xml:id="formula_0">\mathrm{quantization}(z) = \arg\min_{e_k \in E} \|z - e_k\|_2<label>(1)</label></formula><p>• A decoder: that reconstructs the original data from the discrete latent codes.</p><p>The codebook, which contains the learned embeddings, plays a critical role in the quantization process. It allows the continuous output of the encoder to be mapped to discrete codes, facilitating the generation of data from these discrete codes. By using a discrete latent space, VQ-VAE simplifies model optimization and enables the use of generative models based on discrete distributions, such as PixelCNN or other autoregressive models, to model the latent codes. 
One of the key advantages of VQ-VAE is its ability to capture meaningful structural representations in the data, making it particularly useful for high-quality generation tasks, especially in areas like medical imaging and discrete signal modeling. </p></div>
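The nearest-neighbour lookup in Eq. (1) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation; the codebook size K = 8 and dimension D = 4 are toy values (the actual model uses a much larger codebook):

```python
import numpy as np

# Toy codebook: K embeddings of dimension D (illustrative sizes only).
K, D = 8, 4
rng = np.random.default_rng(0)
codebook = rng.normal(size=(K, D))  # the matrix e of shape K x D

def quantize(z, codebook):
    """Replace each continuous latent vector with its nearest codebook entry
    (minimum Euclidean distance), as in Eq. (1)."""
    # squared distance from every z to every codebook vector
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)  # argmin over the e_k
    return codebook[indices], indices

z = rng.normal(size=(5, D))  # a small batch of "encoder outputs"
quantized, idx = quantize(z, codebook)
```

In a real VQ-VAE the `indices` are what the decoder and any downstream autoregressive prior consume; here they simply identify the chosen codebook rows.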
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Methodology</head><p>In this section, we detail the steps of our methodology for analyzing histopathological images to identify BRCA mutations and classify cancerous versus normal cells. Our approach employs image preprocessing, feature extraction using Vector Quantized-Variational Autoencoder (VQ-VAE), and clustering techniques to organize images based on mutation status, enabling precise classification and improving diagnostic accuracy (see Figure <ref type="figure" target="#fig_1">2</ref>). • Step 1: Input Images The process begins with the acquisition of large histopathology images in SVS (Aperio whole-slide image) format. These images are particularly challenging to handle due to their high resolution and substantial size, which necessitates advanced processing techniques. To manage this, the SVS images are divided into smaller patches of size 1400 × 920 pixels. This approach simplifies the analysis and processing of the images while retaining important details. The dataset comprises a diverse set of patches. • Step 2: Dataset Categorization: After dividing the dataset into small patches that CNN models can process, further refinement involves categorizing patches based on BRCA mutation status. The patches are divided into three distinct categories: those related to SVS images where BRCA1 is identified, those where BRCA2 is identified, and those with no identified BRCA mutations. This detailed categorization enables a more focused analysis of IDC patches in relation to specific BRCA mutations, enhancing the understanding of their histopathological features.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>• Step 3: Image Characterization Using Vector Quantized-Variational Autoencoder (VQ-VAE)</head><p>In our approach, we utilize a Vector Quantized-Variational Autoencoder (VQ-VAE) <ref type="bibr" target="#b15">[16]</ref> with a codebook of 1024 discrete vectors to handle and analyze high-resolution histopathological images. This model is crucial for effectively managing the complexity of these images through its encoder-decoder architecture.</p><p>-Encoder Network: The encoder network transforms high-resolution input images into a continuous latent representation. It consists of multiple neural network layers that extract significant features and reduce the dimensionality of the images while retaining important details. -Vector Quantization: Following the generation of the continuous latent representation, VQ-VAE applies vector quantization. This process maps the continuous latent vectors to the nearest discrete vectors in a predefined codebook of 1024 entries. This quantization step converts the latent space into a more manageable and structured form, which simplifies further analysis. -Codebook: The codebook, comprising 1024 discrete vectors, is updated during training to minimize reconstruction loss. This ensures that the codebook effectively captures the essential characteristics of the input images. -Decoder Network: The decoder network reconstructs the high-resolution images from the quantized latent representation. Using the discrete codes produced by the encoder, the decoder aims to accurately recreate the original images, preserving critical features and details. -Dimensionality Reduction and Efficient Analysis: The combination of the encoder, vector quantization, and decoder facilitates dimensionality reduction of high-resolution images. 
This reduction compresses the data into a latent space that retains essential information, making the data more suitable for efficient analysis and processing.</p><p>• Step 4: Feature Extraction with VQ-VAE: To extract meaningful features from the images, we utilize the VQ-VAE model. The VQ-VAE's encoder network processes the histopathological images to generate continuous latent representations, which are then quantized into discrete vectors using the codebook. This approach captures intricate patterns and features within the images. The aim is to characterize the images with a reduced dimensionality representation, which simplifies and enhances the clustering operation. This method provides a comprehensive and structured feature representation by reducing the dimensionality of the high-resolution images, making it easier to perform effective clustering. • Step 5: Latent Space Representation: After feature extraction, each image is encoded into a latent vector. These latent vectors collectively form a dataset that is used for subsequent analysis. This latent space representation simplifies the data and prepares it for clustering and other operations.</p><p>• Step 6: Clustering and Annotation: After encoding the images into latent vectors, we apply the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm to group similar vectors into clusters. This clustering approach organizes the images into meaningful groups based on their feature vectors, distinguishing between BRCA mutation-positive and BRCA mutation-negative cases. • Step 7: Classification: The final stage involves classifying the images into two main categories: normal cells and cancerous cells. This classification is based on the previously obtained clusters and latent space representations. 
The model aims to enhance the accuracy of differentiating between various types of cancerous and non-cancerous tissues, thereby improving diagnostic capabilities.</p><p>By integrating VQ-VAE with advanced CNNs and clustering techniques, our approach provides a robust framework for analyzing histopathological images. This methodology is designed to improve the performance of breast cancer detection and diagnosis, offering a more accurate and comprehensive analysis of histological samples.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experimentations and results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">TCGA-BRCA Dataset</head><p>The TCGA-BRCA dataset, referenced in <ref type="bibr" target="#b16">[17]</ref>, is part of the Cancer Genome Atlas (TCGA) project, which aims to enhance the understanding of cancer through comprehensive genomic studies. This dataset includes RNA sequencing, somatic mutation profiles, and gene-level copy number variation data from 1,098 breast invasive carcinoma cases. It contains 1,978 images from these 1,098 patients, with 763 tumor samples that include single nucleotide polymorphism (SNP) and copy number variation (CNV) data generated using the Affymetrix 6.0 SNP array, alongside somatic mutation information obtained from the Illumina sequencing platform. Data sources include the Genomic Data Commons (GDC) Data Portal, Pan-Cancer Atlas, and The Broad Institute's TCGA GDAC Firehose. The dataset is publicly available through both the GDC Data Portal and the Cancer Imaging Archive (TCIA). </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Data Pre-processing</head><p>In this study, we undertake a comprehensive pre-processing procedure to prepare histopathology images for deep learning analysis.</p><p>First, 200 SVS image files were collected from the TCGA-BRCA dataset, including images with BRCA1 or BRCA2 mutations, as well as some without these mutations. These images can be as large as 130,000 × 99,000 pixels. To facilitate efficient processing and analysis, the images were divided into smaller patches of 1400 × 920 pixels (see Figure <ref type="figure" target="#fig_3">4</ref> for examples). Each patch was then classified according to BRCA mutation status, distinguishing between BRCA mutation-positive and BRCA mutation-negative cases. This classification is critical for investigating the role of BRCA mutations in breast cancer. The dataset was organized according to the BRCA mutation status, ensuring a comprehensive organization of the data (see Table <ref type="table" target="#tab_1">2</ref> for the distribution of images).</p></div>
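The tiling step described above can be sketched as follows. This is a hedged illustration, not the authors' pipeline: real SVS slides would be read with a dedicated library such as OpenSlide (an assumption, not stated in the paper), so a synthetic NumPy array stands in for a slide here:

```python
import numpy as np

# A synthetic RGB array stands in for a whole-slide image; reading actual
# SVS files would require a library such as OpenSlide (assumed, not the
# authors' stated tooling).
def extract_patches(image, patch_h, patch_w):
    """Split an H x W x C image into non-overlapping patches,
    discarding incomplete border tiles."""
    h, w = image.shape[:2]
    patches = []
    for top in range(0, h - patch_h + 1, patch_h):
        for left in range(0, w - patch_w + 1, patch_w):
            patches.append(image[top:top + patch_h, left:left + patch_w])
    return patches

slide = np.zeros((4600, 2800, 3), dtype=np.uint8)  # stand-in "slide"
patches = extract_patches(slide, 920, 1400)        # the paper's 1400 x 920 tiles
```

For a real 130,000 × 99,000 slide the same loop would be applied per region via the slide reader rather than loading the full image into memory.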
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Used Metrics and Loss Functions</head><p>In this study, we use a variety of metrics and loss functions to evaluate and optimize our deep learning models for breast cancer detection and diagnosis. This includes the VQ-VAE model, which employs a specialized loss function. Below, we outline the metrics and loss functions used:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Loss Functions:</head><p>• Binary Cross-Entropy Loss: Applied for binary classification tasks, such as distinguishing between cancerous and normal patches. It measures the performance of a classification model whose output probabilities lie between 0 and 1. The formula is:</p><formula xml:id="formula_1">\mathrm{Loss}_{\mathrm{BCE}} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right]<label>(2)</label></formula><p>where $y_i$ denotes the ground-truth label and $\hat{y}_i$ is the predicted probability. • VQ-VAE Loss Function: The VQ-VAE model utilizes a specialized loss function that includes three key components:</p><p>-Reconstruction Loss: Measures how well the decoder reconstructs the input from the quantized representation. It ensures that the reconstructed image is similar to the original input image. The formula is:</p><formula xml:id="formula_2">\mathrm{Loss}_{\mathrm{Recon}} = \frac{1}{N}\sum_{i=1}^{N}\|x_i - \hat{x}_i\|^2<label>(3)</label></formula><p>where $x_i$ is the original image and $\hat{x}_i$ is the reconstructed image. -Codebook Loss: Encourages the codebook vectors to move closer to the encoder output, ensuring that the quantization process effectively captures the data's structure. This component helps to learn a better representation by minimizing the distance between the encoder output and the codebook vectors. The formula is:</p><formula xml:id="formula_3">\mathrm{Loss}_{\mathrm{Codebook}} = \frac{1}{N}\sum_{i=1}^{N}\|z_i - e_{q(z_i)}\|^2<label>(4)</label></formula><p>where $z_i$ is the continuous latent vector and $e_{q(z_i)}$ is the quantized vector. -Commitment Loss: Penalizes the encoder for not committing to a specific codebook vector, promoting stability in the learned representations. This component helps to stabilize the learning process and ensure that the encoder uses the codebook vectors effectively. The formula is:</p><formula xml:id="formula_4">\mathrm{Loss}_{\mathrm{Commitment}} = \beta \frac{1}{N}\sum_{i=1}^{N}\|z_i - e_{q(z_i)}\|^2<label>(5)</label></formula><p>where $\beta$ is a hyperparameter that controls the weight of the commitment loss term.</p></div>
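The three VQ-VAE loss terms in Eqs. (3)-(5) can be evaluated numerically as in the toy sketch below. This is illustrative only (random tensors, an assumed β = 0.25); in actual training the codebook and commitment terms involve stop-gradient operators that plain NumPy does not model:

```python
import numpy as np

# Toy tensors standing in for a batch of N flattened images and their latents.
rng = np.random.default_rng(1)
N = 4
x     = rng.normal(size=(N, 16))            # original inputs
x_hat = x + 0.1 * rng.normal(size=(N, 16))  # decoder reconstructions
z     = rng.normal(size=(N, 8))             # encoder outputs
e_q   = z + 0.05 * rng.normal(size=(N, 8))  # nearest codebook vectors
beta  = 0.25                                # commitment weight (assumed value)

loss_recon      = np.mean(np.sum((x - x_hat) ** 2, axis=1))  # Eq. (3)
loss_codebook   = np.mean(np.sum((z - e_q) ** 2, axis=1))    # Eq. (4)
loss_commitment = beta * loss_codebook                       # Eq. (5)
total_loss = loss_recon + loss_codebook + loss_commitment
```

Because the codebook and commitment losses share the same squared distance, the commitment term simply rescales it by β, which is why β controls how strongly the encoder is pulled toward its chosen code.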
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5.">Evaluation Metrics:</head><p>• Accuracy: Measures the proportion of correctly classified patches out of the total number of patches. The formula is:</p><formula xml:id="formula_5">\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}<label>(6)</label></formula><p>• Precision: The proportion of patches predicted as positive that are truly positive:</p><formula xml:id="formula_5a">\mathrm{Precision} = \frac{TP}{TP + FP}<label>(7)</label></formula><p>• Recall: The proportion of truly positive patches that are correctly identified:</p><formula xml:id="formula_5b">\mathrm{Recall} = \frac{TP}{TP + FN}<label>(8)</label></formula><p>• F1 Score: The harmonic mean of Precision and Recall, providing a balanced measure of model performance. The formula is:</p><formula xml:id="formula_6">\mathrm{F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}<label>(9)</label></formula><p>• Area Under the Receiver Operating Characteristic Curve (AUC-ROC): Measures the model's ability to distinguish between classes across all classification thresholds. A higher AUC value indicates better model performance. The AUC is calculated as the area under the ROC curve, which plots the true positive rate (TPR) against the false positive rate (FPR):</p><formula xml:id="formula_7">\mathrm{AUC} = \int_{0}^{1} \mathrm{TPR} \, d(\mathrm{FPR})<label>(10)</label></formula></div>
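The threshold-based metrics above reduce to simple counts over the confusion matrix, as this minimal sketch shows (the label vectors are illustrative, not results from the paper):

```python
import numpy as np

# Illustrative labels only: 1 = BRCA mutation, 0 = no mutation.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])

tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # true positives
tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # true negatives
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false positives
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # false negatives

accuracy  = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * precision * recall / (precision + recall)
```

AUC-ROC is not computed here because it requires predicted probabilities and a sweep over thresholds rather than hard labels.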
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Results and discussion</head><p>The discussion section is dedicated to analyzing and interpreting the results obtained from our experiments:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Clustering Results</head><p>The clustering process using the DBSCAN algorithm <ref type="bibr" target="#b17">[18]</ref> (Density-Based Spatial Clustering of Applications with Noise) aimed to categorize patches into groups based on the presence or absence of BRCA mutations. The parameters for DBSCAN were set to eps = 8.7 and min_samples = 180000, guiding the clustering of the latent space representations. The clustering process identified two distinct categories of clusters: the first contains patches from both images with BRCA mutations and images without these mutations, while the second exclusively contains patches from images identified with BRCA mutations. This separation supports targeted analysis and model training based on the presence or absence of the BRCA mutation.</p></div>
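For illustration, the density-based grouping that DBSCAN performs can be sketched in pure Python. The eps and min_samples below are toy values chosen for tiny 2-D points, not the paper's settings, and a production pipeline would use an optimized implementation such as scikit-learn's:

```python
import math

# Minimal DBSCAN sketch (pure Python, O(n^2); illustrative only).
def dbscan(points, eps, min_samples):
    """Return one cluster label per point; -1 marks noise."""
    n = len(points)
    labels = [None] * n
    cluster = -1

    def neighbors(i):
        # all points within eps of point i (includes i itself)
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = -1              # provisionally noise
            continue
        cluster += 1                    # i is a core point: start a cluster
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # noise reached from a core -> border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_samples:  # j is itself core: keep expanding
                queue.extend(j_nbrs)
    return labels

# Two dense blobs plus one isolated point.
pts = [(0, 0), (0.2, 0.1), (0.1, 0.3), (5, 5), (5.1, 5.2), (5.2, 4.9), (10, 0)]
labels = dbscan(pts, eps=0.6, min_samples=2)
```

In the paper's setting, the `points` would be the VQ-VAE latent vectors rather than 2-D coordinates.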
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">BRCA Patch Classification</head><p>In this section, we present the results of classifying BRCA patches using four different deep learning models: VGG16 <ref type="bibr" target="#b18">[19]</ref>, ResNet <ref type="bibr" target="#b19">[20]</ref>, EfficientNet <ref type="bibr" target="#b20">[21]</ref>, and Inception V3 <ref type="bibr" target="#b21">[22]</ref>. The classification task involves distinguishing between patches with BRCA mutations and those without. The dataset used for BRCA mutation classification consists of histopathological image patches, divided into three subsets: training, validation, and testing. Table <ref type="table" target="#tab_4">4</ref> summarizes the distribution of labels across the training, validation, and test sets.</p><p>The training set comprises a total of 57,152 samples, with 48,176 labeled as NO_BRCA and 8,976 as BRCA. The validation set includes 11,431 samples, of which 9,639 are labeled as NO_BRCA and 1,792 as BRCA. Finally, the test set contains 14,288 samples, with 12,044 labeled as NO_BRCA and 2,244 as BRCA.</p><p>The classification results for the detection of BRCA mutation in patches using four different models are presented in Table <ref type="table">5</ref>, which outlines key metrics such as accuracy, AUC, precision, recall, and F1 score for both the BRCA and No Mutation classes. These metrics provide a comprehensive evaluation of each model's performance, highlighting their ability to distinguish between patients with BRCA mutations and those without. All models exhibited exceptional performance, with accuracy exceeding 98%. Inception V3 achieved the highest accuracy at 98.94%, closely followed by EfficientNet and VGG16, both at 98.81%, while ResNet achieved 98.71%. 
The Area Under the Curve (AUC) further supports these findings, with all models surpassing the 97% threshold, led by Inception V3 at 97.36%.</p><p>Detailed precision, recall, and F1-score results reveal that Inception V3 consistently outperformed the other models in all metrics. For the BRCA class, Inception V3 achieved a precision of 98%, a recall of 95%, and an F1 score of 97%. For the No Mutation class, Inception V3 reached near-perfect performance, with a recall of 100%, precision of 99%, and an F1-score of 99%. These results highlight Inception V3's balanced sensitivity (recall) and precision across both classes, making it a reliable model for BRCA mutation detection.</p><p>Although all models show strong performance, Inception V3 stands out with the best overall metrics in accuracy, AUC, and F1 score. EfficientNet and VGG16 shared similar results, achieving an accuracy of 98.81% and maintaining high precision and recall for both classes. ResNet, although slightly lower in performance compared to the other models, still achieved competitive results with a precision of 97% for the BRCA class and high recall values.</p><p>The consistently high performance of all models underscores the effectiveness of deep learning architectures for histopathological image classification. However, the slight edge of Inception V3 in both AUC and F1-score suggests that its architecture may be better suited for extracting subtle features in histopathological images related to BRCA mutations, possibly due to its ability to capture multi-scale features.</p><p>To further analyze the classification performance, confusion matrices for the four models (EfficientNet, VGG16, ResNet, and Inception V3) are illustrated in Figure <ref type="figure" target="#fig_6">5</ref>, showing the distribution of true positives, false positives, true negatives, and false negatives for both the BRCA and No Mutation classes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">SVS Image Classification Results</head><p>The classification performance of our model for detecting BRCA mutations in histopathological SVS images is summarized in Table <ref type="table" target="#tab_6">6</ref>. The model achieved strong metrics for both the BRCA Mutation and No Mutation classes. Specifically, precision, recall, and F1-score for both classes were balanced, indicating robust classification results. As shown in Table <ref type="table" target="#tab_6">6</ref>, the model achieved an overall accuracy of 95%, with an AUC score of 93.27%. The high precision and recall for both classes demonstrate the effectiveness of our approach in detecting BRCA mutations, reducing the risk of false positives and false negatives.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4.">Comparison with Related Work</head><p>Table <ref type="table" target="#tab_7">7</ref> presents a comparison of our method with related work in the field of BRCA mutation detection from histopathological images. Our approach, combining VQVAE and DBSCAN with InceptionV3, outperformed previous studies, achieving the highest AUC of 93.27%.</p><p>Our approach offers several distinct advantages over other methods for the detection of BRCA mutations, primarily due to the integration of advanced unsupervised learning and clustering techniques. Using the VQVAE model, we efficiently encode high-dimensional histopathological images into a compact latent space, allowing the extraction of critical features while preserving key image details. Unlike conventional methods, which may struggle to capture subtle variations in tissue morphology, our model's ability to reconstruct intricate patterns enhances the detection of relevant features. Moreover, the incorporation of DBSCAN for clustering within this latent space adds a significant layer of robustness, effectively grouping similar patterns and reducing noise. This ensures that irrelevant or noisy data are filtered out, improving classification accuracy.</p></div>
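The clustering stage described here can be sketched as DBSCAN applied to latent vectors. The synthetic latents below stand in for VQ-VAE codes, and the `eps`/`min_samples` values are placeholders, not the paper's tuned hyperparameters:

```python
# Sketch of the clustering stage: DBSCAN over (synthetic) latent vectors.
# eps and min_samples are illustrative, not the paper's settings.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two tight synthetic groups of 8-D latent codes plus a few outliers.
latents = np.vstack([
    rng.normal(0.0, 0.05, size=(50, 8)),
    rng.normal(1.0, 0.05, size=(50, 8)),
    rng.uniform(-3, 3, size=(3, 8)),
])

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(latents)
print(sorted(set(labels)))  # label -1 marks noise points that get filtered out
```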
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>In this study, we presented a deep learning framework for predicting BRCA gene mutations in breast cancer from histopathological images. By integrating Vector Quantized-Variational Autoencoders (VQ-VAE) for feature extraction and DBSCAN for clustering, we established a robust model that accurately classifies cases as BRCA mutation-positive or negative. This approach surpasses conventional methods and highlights the potential of artificial intelligence to automate complex diagnostic processes in medical imaging. Future work will focus on improved data augmentation techniques to further increase the model's accuracy in detecting BRCA mutations. By generating synthetic samples that capture the variability of BRCA mutation expression, we aim to improve the robustness and generalization of the model; this will be particularly valuable for addressing dataset imbalance and improving the classification of rare mutation cases. We will also extend our investigation to the roles of other genetic mutations in breast cancer.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Declaration on Generative AI</head><p>During the preparation of this work, the authors used ChatGPT for grammar and spelling checks, as well as paraphrasing. After utilizing this tool, the authors reviewed and edited the content as necessary, taking full responsibility for the final publication.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: VQ-VAE Architecture: The image is encoded into a grid of latent vectors. These vectors are replaced by the nearest codebook vector at the bottleneck. Finally, the quantized vectors pass through the decoder to reconstruct the image <ref type="bibr" target="#b14">[15]</ref>.</figDesc><graphic coords="4,99.21,65.61,396.85,198.43" type="bitmap" /></figure>
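The quantization step shown in Figure 1, where each latent vector is replaced by its nearest codebook entry, can be sketched in a few lines of NumPy; the codebook and latent values below are illustrative:

```python
# Minimal sketch of the VQ-VAE bottleneck: map each latent vector to
# its nearest codebook entry by L2 distance. Values are illustrative.
import numpy as np

def quantize(latents, codebook):
    """Replace each row of `latents` with the nearest row of `codebook`."""
    # Pairwise squared distances, shape (num_latents, codebook_size).
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)
    return codebook[idx], idx

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
latents = np.array([[0.1, -0.1], [0.9, 1.2]])
quantized, idx = quantize(latents, codebook)
print(idx)  # index of the chosen codebook vector for each latent
```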
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Overview of the proposed architecture for histopathological image analysis.</figDesc><graphic coords="4,72.00,426.97,451.28,283.48" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Examples of SVS images from the TCGA-BRCA dataset</figDesc><graphic coords="6,127.55,468.53,340.17,226.78" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Example of generated patches from SVS images</figDesc><graphic coords="7,146.49,220.05,302.29,226.78" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>•</head><label></label><figDesc>Precision and Recall: Precision measures the proportion of true positive predictions among all positive predictions, while Recall measures the proportion of true positive predictions among all actual positives. These metrics are defined as: Precision = True Positives / (True Positives + False Positives) (7) and Recall = True Positives / (True Positives + False Negatives) (8)</figDesc></figure>
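The two definitions above translate directly into code; the true-positive, false-positive, and false-negative counts used here are illustrative:

```python
# Direct translation of the precision and recall definitions,
# with illustrative counts.
def precision(tp, fp):
    """Fraction of positive predictions that are actually positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of actual positives that were predicted positive."""
    return tp / (tp + fn)

print(precision(tp=95, fp=2))  # 95/97
print(recall(tp=95, fn=5))     # 95/100 = 0.95
```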
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head></head><label></label><figDesc>(a) Confusion Matrix -VGG Model (b) Confusion Matrix -ResNet Model (c) Confusion Matrix -EfficientNet Model (d) Confusion Matrix -InceptionV3 Model</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Comparison of Confusion Matrices for Different Models</figDesc><graphic coords="11,72.00,261.28,203.08,166.79" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Performance Achieved in State-of-the-Art Breast Cancer Studies</figDesc><table><row><cell>Reference</cell><cell>Dataset</cell><cell>Year</cell><cell>Methods</cell><cell></cell><cell>Metrics</cell></row><row><cell>Tristan Lazard et al. [11]</cell><cell>TCGA</cell><cell>2022</cell><cell>ResNet-18</cell><cell></cell><cell>AUC 71%</cell></row><row><cell>Xiaoxiao Wang et al. [10]</cell><cell>JSPHCM, JSCH</cell><cell>2021</cell><cell>ResNet-18</cell><cell></cell><cell>AUC 79%</cell></row><row><cell>Kurian et al. [13]</cell><cell>TCGA-BRCA</cell><cell>2023</cell><cell>SimCLR</cell><cell></cell><cell>81.34% accuracy</cell></row><row><cell>Valieris et al. [14]</cell><cell>TCGA</cell><cell>2020</cell><cell>ResNet-34</cell><cell></cell><cell>AUC 80%</cell></row><row><cell>Nam Nhut Phan et al. [12]</cell><cell>TCGA-BRCA</cell><cell>2021</cell><cell>2-Step</cell><cell>ResNet50,101,</cell><cell>AUC 92%</cell></row><row><cell></cell><cell></cell><cell></cell><cell cols="2">VGG16, Xception</cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 Dataset Statistics for BRCA Mutation Classification Mutation Status Number of SVS Images Number of Patches</head><label>2</label><figDesc></figDesc><table><row><cell>BRCA1</cell><cell>53</cell><cell>38,849</cell></row><row><cell>BRCA2</cell><cell>38</cell><cell>25,526</cell></row><row><cell>No BRCA Mutation</cell><cell>109</cell><cell>56,000</cell></row><row><cell>Total</cell><cell>200</cell><cell>120,375</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Results of DBSCAN Clustering</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Cluster Total Number of Images With BRCA Mutation No BRCA Mutation</head><label></label><figDesc></figDesc><table><row><cell>Cluster 1</cell><cell>91104</cell><cell>17467</cell><cell>73637</cell></row><row><cell>Cluster 2</cell><cell>32420</cell><cell>3611</cell><cell>25661</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 4</head><label>4</label><figDesc>Dataset Summary for BRCA Mutation Classification</figDesc><table><row><cell>Dataset</cell><cell cols="3">Total Samples NO_BRCA BRCA</cell></row><row><cell>Training</cell><cell>64,384</cell><cell>55,408</cell><cell>8,976</cell></row><row><cell>Validation</cell><cell>11,431</cell><cell>9,639</cell><cell>1,792</cell></row><row><cell>Test</cell><cell>44,555</cell><cell>42,011</cell><cell>2,500</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 6 SVS Images Classification Results Class Precision Recall F1-Score</head><label>6</label><figDesc></figDesc><table><row><cell>BRCA Mutation</cell><cell>90%</cell><cell>90%</cell><cell>90%</cell></row><row><cell>No Mutation</cell><cell>97%</cell><cell>97%</cell><cell>97%</cell></row><row><cell>Accuracy</cell><cell>95%</cell><cell>AUC</cell><cell>93.27%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_7"><head>Table 7</head><label>7</label><figDesc>Comparison with Related Works</figDesc><table><row><cell>Reference</cell><cell>Dataset</cell><cell>Year</cell><cell>Methods</cell><cell>Metrics</cell></row><row><cell>Tristan Lazard et al. [11]</cell><cell>TCGA</cell><cell>2022</cell><cell>ResNet-18</cell><cell>AUC 71%</cell></row><row><cell>Xiaoxiao Wang et al. [10]</cell><cell>JSPHCM, JSCH</cell><cell>2021</cell><cell>ResNet-18</cell><cell>AUC 79%</cell></row><row><cell>Kurian et al. [13]</cell><cell>TCGA-BRCA</cell><cell>2023</cell><cell>SimCLR</cell><cell>81.34% accuracy</cell></row><row><cell>Valieris et al. [14]</cell><cell>TCGA</cell><cell>2020</cell><cell>ResNet-34</cell><cell>AUC 80%</cell></row><row><cell>Nam Nhut Phan et al. [12]</cell><cell>TCGA-BRCA</cell><cell>2021</cell><cell>2-Step ResNet50,101, VGG16, Xception</cell><cell>AUC 92%</cell></row><row><cell>Our Work</cell><cell>TCGA-BRCA</cell><cell>2024</cell><cell>VQVAE, DBSCAN, VGG16, ResNet50, EfficientNet, InceptionV3</cell><cell>AUC 93.27%</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Two-stage approach for semantic image segmentation of breast cancer: Deep learning and mass detection in mammographic images</title>
		<author>
			<persName><forename type="first">F</forename><surname>Touazi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Gaceb</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chirane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Herzallah</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IDDM</title>
		<imprint>
			<biblScope unit="page" from="62" to="76" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Improving breast cancer diagnosis in mammograms with progressive transfer learning and ensemble deep learning</title>
		<author>
			<persName><forename type="first">M</forename><surname>Khaled</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Touazi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Gaceb</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Arabian Journal for Science and Engineering</title>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Enhancing breast mass cancer detection through hybrid vit-based image segmentation model</title>
		<author>
			<persName><forename type="first">F</forename><surname>Touazi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Gaceb</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Boudissa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Assas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The 6th Conference on Computing Systems and Applications</title>
				<meeting><address><addrLine>Algiers, Algeria</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="1" to="10" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Breast cancer detection using deep learning: Datasets, methods, and challenges ahead</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Dar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rasool</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Assad</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computers in biology and medicine</title>
		<imprint>
			<biblScope unit="volume">149</biblScope>
			<biblScope unit="page">106073</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Brca1-and brca2-associated hereditary breast and ovarian cancer</title>
		<author>
			<persName><forename type="first">N</forename><surname>Petrucelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">B</forename><surname>Daly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename></persName>
		</author>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Pathology of hereditary breast and ovarian cancer</title>
		<author>
			<persName><forename type="first">A</forename><surname>Hodgson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Turashvili</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Frontiers in Oncology</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">531790</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Brca mutations: implications of genetic testing in ovarian cancer</title>
		<author>
			<persName><forename type="first">V</forename><surname>Talwar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rauthan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Indian Journal of Cancer</title>
		<imprint>
			<biblScope unit="volume">59</biblScope>
			<biblScope unit="page" from="S56" to="S67" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Impact of non-brca genes in the indication of riskreducing surgery in hereditary breast and ovarian cancer syndrome (hboc)</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">F</forename><surname>Madrigal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Y R</forename><surname>Garcés</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">J J</forename><surname>Ruiz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Current Problems in Cancer</title>
		<imprint>
			<biblScope unit="volume">47</biblScope>
			<biblScope unit="page">101008</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Deep learning framework for comprehensive molecular and prognostic stratifications of triple-negative breast cancer</title>
		<author>
			<persName><forename type="first">S</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-Y</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Lv</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-C</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>You</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z.-A</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W.-T</forename><surname>Yang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Fundamental Research</title>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Prediction of brca gene mutation in breast cancer based on deep learning and histopathology images</title>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Xu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Frontiers in Genetics</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page">661109</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Deep learning identifies morphological patterns of homologous recombination deficiency in luminal breast cancers from whole slide images</title>
		<author>
			<persName><forename type="first">T</forename><surname>Lazard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Bataillon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Naylor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Popova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F.-C</forename><surname>Bidard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Stoppa-Lyonnet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-H</forename><surname>Stern</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Decencière</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Walter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vincent-Salomon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Cell Reports Medicine</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Predicting breast cancer gene expression signature by applying deep convolutional neural networks from unannotated pathological images</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">N</forename><surname>Phan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-C</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L.-M</forename><surname>Tseng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">Y</forename><surname>Chuang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Frontiers in oncology</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page">769447</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Robust semi-supervised learning for histopathology images through self-supervision guided out-of-distribution scoring</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">C</forename><surname>Kurian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Varsha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Patil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Khade</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sethi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 23rd International Conference on Bioinformatics and Bioengineering (BIBE)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2023">2023. 2023</date>
			<biblScope unit="page" from="121" to="128" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Deep learning predicts underlying features on pathology images with therapeutic relevance for breast and gastric cancer</title>
		<author>
			<persName><forename type="first">R</forename><surname>Valieris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Amaro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">A B D T</forename><surname>Osório</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">P</forename><surname>Bueno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Rosales Mitrowsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Carraro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">N</forename><surname>Nunes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Dias-Neto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">T D</forename><surname>Silva</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Cancers</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page">3687</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Neural discrete representation learning</title>
		<author>
			<persName><forename type="first">A</forename><surname>Van Den Oord</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Vinyals</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kavukcuoglu</surname></persName>
		</author>
		<idno>CoRR abs/1711.00937</idno>
		<ptr target="http://arxiv.org/abs/1711.00937" />
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Neural discrete representation learning</title>
		<author>
			<persName><forename type="first">A</forename><surname>Van Den Oord</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Vinyals</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing systems</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Molecular analysis of tcga breast cancer histologic types</title>
		<author>
			<persName><forename type="first">A</forename><surname>Thennavan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Beca</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Garcia-Recio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Allison</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">C</forename><surname>Collins</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Gary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-Y</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">J</forename><surname>Schnitt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">A</forename><surname>Hoadley</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Cell genomics</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">A density-based algorithm for discovering clusters in large spatial databases with noise</title>
		<author>
			<persName><forename type="first">M</forename><surname>Ester</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H.-P</forename><surname>Kriegel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sander</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Xu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">kdd</title>
		<imprint>
			<biblScope unit="volume">96</biblScope>
			<biblScope unit="page" from="226" to="231" />
			<date type="published" when="1996">1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Simonyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zisserman</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1409.1556</idno>
		<title level="m">Very deep convolutional networks for large-scale image recognition</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Deep residual learning for image recognition</title>
		<author>
			<persName><forename type="first">K</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sun</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Conference on computer vision and pattern recognition</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="770" to="778" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Efficientnet: Rethinking model scaling for convolutional neural networks</title>
		<author>
			<persName><forename type="first">M</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Le</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International conference on machine learning</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="6105" to="6114" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Rethinking the inception architecture for computer vision</title>
		<author>
			<persName><forename type="first">C</forename><surname>Szegedy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Vanhoucke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ioffe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Shlens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wojna</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE conference on computer vision and pattern recognition</title>
				<meeting>the IEEE conference on computer vision and pattern recognition</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="2818" to="2826" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
