
M-MNet for Cancer Classification by Constructing Somatic Mutation Map

Chenxu Quan

Xin Chen

Fenghui Liu

Lin Qi

Yun Tie

The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, 450000, China

Zhengzhou University, Zhengzhou, Henan, 450000, China


Because cancer has a high fatality rate, timely detection and treatment at an early stage are very important. Cancer classification based on somatic mutation data helps physicians identify cancer types at the genetic level and reduces the likelihood of misdiagnosis and missed diagnosis. However, somatic mutation data are one-dimensional (1-D) and highly redundant, which leads to low robustness and overfitting in classification models. In addition, current models based on convolutional neural networks (CNNs) fail to take the global features of the input data into account, which limits their classification performance. In this paper, we propose a gene mutation map construction method, based on the RGB three-channel principle of images, that transforms the dimension of somatic mutation data and makes it suitable for existing image classification models. Then, based on the prediction results of driver genes, a feature selection optimization (FSO) algorithm is applied to the original mutation map to address its high noise and sparsity. Furthermore, a classification network named M-MNet is introduced, built on the inverted residual module and the multi-head self-attention module. Experimental results show that the proposed method improves overall classification performance, achieving 94.62% accuracy and a 94.34% F1 score in the cancer classification task across 19 tumor cohorts.

Keywords: Somatic mutation, Driver genes, Cancer classification, Feature selection optimization, M-MNet

1. Introduction

There are many types of mutations in the somatic cell genome, such as Single Nucleotide Variants (SNVs), Insertions and Deletions (InDels), and chromosome Structural Variations (SVs) [1]. When specific genomic mutations occur in somatic cells, they can promote the development and malignant proliferation of cancer cells. The genes that carry such mutations are known as driver genes. For example, it has been confirmed that mutations in the VHL and MET genes lead to kidney cancer [2]. Depending on the mechanisms by which they induce cancer, driver genes can be classified as oncogenes or tumor suppressor genes, which interact to maintain stable positive and negative regulatory signals. Oncogenes are often expressed at low levels, or not expressed at all, in the genome. When they undergo mutation and abnormal activation, they become carcinogenic factors that induce cancer. Tumor suppressor genes, on the other hand, are genes that have inhibitory effects on cell growth and potential anti-cancer effects. Mutation or inactivation of tumor suppressor genes can lead to cell carcinogenesis. However, not all mutated genes are driver genes. There are neutral mutations in the human body that do not promote cell carcinogenesis, and genes that undergo such mutations are called passenger genes.

Cancer driver genes play a significant role in many clinical aspects of cancer, including prevention, early detection and diagnosis, staging and classification, and rehabilitation treatment. Both driver genes and passenger genes are mutated genes, but they have distinct roles in regulating cellular physiological mechanisms and call for different pathological analysis and treatment. Therefore, it is of great significance to identify driver genes among all mutated genes in tumor cells. Because cancer gene mutation mechanisms are highly complex [3], it is challenging to identify features that effectively differentiate driver genes from passenger genes, and accurately identifying driver genes remains a major challenge. In recent years, with the rapid development of computer science and the emergence of second-generation sequencing technologies, such as high-throughput sequencing, it has become possible to analyze cancer driver genes with the support of big data. This data-driven research approach significantly improves the efficiency of cancer research. Taking advantage of these advances, many complex computational methods have been proposed for detecting cancer driver gene mutations and for analyzing in depth the regulatory mechanisms behind driver genes in cancer.

This article focuses on the prediction of driver genes and cancer classification tasks based on somatic mutation data, aiming to address the challenges encountered in driver gene prediction and cancer classification. Among them, the driver gene prediction methods based on mutation position face the following problems: (1) The statistical determination of mutation probabilities for nucleotide contexts is independent, which can lead to overfitting of background mutation signals and overlook important signals conveyed by low-frequency mutation sites. (2) The computational complexity increases exponentially when expanding nucleotide contexts. (3) Traditional clustering algorithms such as K-means and DBSCAN are no longer suitable for handling complex mutation data, resulting in poor clustering performance.

The main contributions of this study to address the aforementioned problems are as follows: 1) Firstly, this study applies the RGB three-channel principle to perform a dimensionality transformation on somatic mutation data, producing two-dimensional gene mutation images. This enables the use of existing image classification models for somatic mutation data. 2) Secondly, this study employs a feature selection optimization algorithm to address the high noise and sparsity of the two-dimensional mutation image. This optimization of feature gene selection effectively enhances the model's generalization ability. 3) Lastly, to better capture both local and global features of the images, this study proposes the M-MNet network model, which combines the inverted residual module with the multi-head self-attention module. Specifically, the inverted residual module is used to extract local features, followed by the multi-head self-attention module to capture global features. This enables accurate feature extraction and classification of mutation images.

2. Related Work

The computational methods used for cancer driver gene discovery fall into three main classes: methods for identifying individual cancer driver genes, methods for identifying cancer driver gene modules, and methods for discovering personalized cancer driver genes (i.e., driver genes specific to individual patients).

Single cancer driver gene identification can be divided into two subcategories based on the key techniques used: mutation-based methods and subnetwork-based methods. Mutation-based methods use various features of mutations, such as their significance, functional impact, and location, to discover cancer driver genes. Methods based on the significance of gene mutations include MuSiC [4] and MutSigCV [5]. Methods based on the functional impact of gene mutations include OncodriveFM [6], OncodriveFML [7], DriverML [8], and others. These methods select driver gene candidates based on mutations that have a significant functional impact rather than on the number of mutations. Therefore, they can often detect low-frequency mutations that play important roles in cancer development. Methods based on the location of gene mutations usually rely on clustering and are often referred to as hotspot-based methods [9]. Hotspots typically refer to high-density mutation regions, often driven by positive selection, and are commonly found in functionally important domains or residues in the three-dimensional protein structure [10]. The OncodriveCLUST method [11] is a typical representative of hotspot methods. The MMC algorithm [12] adjusts the influence weights of mutation sites on surrounding sites through kernel density estimation in order to identify clusters of gene mutations. The clustering algorithm HotMAPS [13] considers more hotspot information in the 3D protein space by combining tumor mutation data with PDB data. Additionally, MutPanning [14] identifies driver factors based on the ratio of the non-synonymous substitution rate to the synonymous substitution rate (dN/dS). Network-based methods often predict cancer driver genes by evaluating the role of genes in biological networks and combining this with gene mutation information. DriverNet [15] reveals cancer driver genes by assessing the impact of mutations on cancer transcriptional networks.

Identification of cancer driver modules. The majority of methods for identifying cancer driver modules are based on the mutual exclusivity of mutations. CoMEt [16] uses mutual exclusivity techniques to detect cancer driver modules. Similarly, WeSME [17] evaluates the mutual exclusivity of gene mutations to detect cancer driver genes. However, WeSME does not evaluate genes within the same pathway but only considers mutually exclusive pairs of mutated genes as candidate cancer driver genes for modularization.

Personalized cancer driver gene identification. Personalized cancer driver gene identification methods use gene regulatory networks to recognize driver genes. For example, PNC [18] identifies the minimum gene set that covers all edges in a bipartite graph as cancer driver genes.

3. Methods

3.1. Data Dimension Transformation

Unlike traditional 1-D data, gene mutation data is random and low in frequency. Mutation information is scattered randomly across a long, redundant gene text sequence, which makes it difficult for existing deep learning models to capture effectively. The long data length also places high demands on model complexity and performance, so it is difficult to extract features effectively for classification by conventional means. Thus, we transform the dimension of somatic mutation data based on the RGB three-channel image principle so that it is suitable for training deep learning models.

Figure 1 illustrates the construction process of the gene mutation map. First, the mutated genes in each tumor cohort of the dataset are counted and sorted by chromosomal position (chromosomes 1, 2, ..., 22, X, and Y). The mutated genes on the $i$-th chromosome of the $j$-th tumor cohort are arranged into a matrix $G_{ij}$, where $0 \le i \le 23$ and $0 \le j \le 18$.

Figure 1: Construction process of the gene mutation map (panels: Replace, Insert, Delete).

Next, the mutated genes from different tumor cohorts but the same chromosome are re-sorted. At this stage, the set $G_{i\cdot} = G_{i0} \cap G_{i1} \cap \cdots \cap G_{i18}$, which consists of the mutated genes from different tumor cohorts on the $i$-th chromosome, has length $l_i$. Assuming the constructed gene mutation map has a shape of $N \times N$, the $i$-th chromosome occupies $r_i$ rows in the mutation map, where:

$$ r_i = \begin{cases} \lfloor l_i / N \rfloor + 1, & l_i \bmod N \neq 0 \\ l_i / N, & l_i \bmod N = 0 \end{cases} \tag{1} $$

Then all the genes on the chromosomes occupy $R$ rows, where $R$ is defined as:

$$ R = \sum_{i=0}^{23} r_i = \sum_{i=0}^{23} \begin{cases} \lfloor l_i / N \rfloor + 1, & l_i \bmod N \neq 0 \\ l_i / N, & l_i \bmod N = 0 \end{cases}, \quad R \le N \tag{2} $$

According to the above definition, we can build a gene mutation map of size $N \times N$ that includes all mutated genes from the 19 tumor cohorts. In this article, the single-nucleotide replacements, insertions, and deletions in each gene are counted and correspond to the R, G, and B channels of the image, respectively. To map mutation information onto an image, the method first selects the maximum mutation count, $a_{\max}$, over all samples for the corresponding mutation type; $a_{\max}$ is then used as one of the indicators for mapping mutation counts to grayscale values. For convenience, assume that a mutated gene undergoes $a$ single-nucleotide replacements in the sample gene fragment and that its value in the red channel is $p$. The relationship between $a$ and $p$ is:

$$ p = \frac{255\,a}{a_{\max}}, \quad p \le 255, \quad a \le a_{\max} \tag{3} $$

According to the above definition, the conversion from mutation counts to grayscale values in a single-channel image can be accomplished. Finally, the single-channel images corresponding to each mutation type in the samples are merged to obtain a two-dimensional mutation image.
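The row allocation and the count-to-grayscale mapping described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the grayscale value is assumed to be a linear rescaling of the count by the per-type maximum, and rounding to the nearest integer is an added assumption because the text does not say how fractional values are handled.

```python
import numpy as np

def rows_per_chromosome(gene_counts, width):
    """Rows occupied by each chromosome in a map of the given width.

    Equivalent to ceil(l_i / width): a full extra row is needed whenever
    the gene count is not an exact multiple of the width.
    """
    return [(count + width - 1) // width for count in gene_counts]

def counts_to_channel(counts, max_count):
    """Linearly map mutation counts to 8-bit grayscale values (0..255).

    Rounding to the nearest integer is an assumption of this sketch.
    """
    counts = np.asarray(counts, dtype=np.float64)
    if max_count <= 0:
        return np.zeros(counts.shape, dtype=np.uint8)
    scaled = np.clip(counts, 0, max_count) * 255.0 / max_count
    return np.round(scaled).astype(np.uint8)

def build_mutation_map(replace, insert, delete, max_counts):
    """Stack three count matrices (replace, insert, delete) into an RGB image."""
    channels = [
        counts_to_channel(matrix, max_count)
        for matrix, max_count in zip((replace, insert, delete), max_counts)
    ]
    return np.stack(channels, axis=-1)

# Toy example: four chromosomes with 5, 4, 0, and 9 mutated genes, map width 4.
rows = rows_per_chromosome([5, 4, 0, 9], width=4)

# Toy 2 x 2 count matrices for one sample (hypothetical values).
replace = np.array([[0, 2], [10, 1]])
insert = np.array([[0, 0], [1, 0]])
delete = np.array([[3, 0], [0, 0]])
image = build_mutation_map(replace, insert, delete, max_counts=(10, 1, 3))
```

In this toy case `rows` evaluates to `[2, 1, 0, 3]`, and `image` has shape `(2, 2, 3)` with one channel per mutation type.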

3.2. Feature Selection Optimization

The high noise and sparsity of mutation profile data hinder the effective extraction of feature information. Overlearning the noise in the data can increase the complexity of the model and lead to overfitting. To avoid interference caused by high noise and sparsity, this method narrows the scope of feature selection based on previous work [19]. It employs clustering algorithms to filter candidate driver genes and potential driver genes for feature selection optimization (FSO). Passenger genes that have no significant relevance to tumor formation or development are eliminated. This approach helps improve the overall training efficiency and classification performance of the model.

Assume that the dimension of the gene mutation map after FSO is $N' \times N'$ and that $n'$ is the number of feature genes selected for each tumor cohort. According to the principles of constructing the mutation map, $N'$ and $n'$ must satisfy $24N' + 19n' - 24 \le N'^2$.

This is because, in the dimension transformation process proposed in this paper, each row of the mutation map corresponds to the mutation information on one chromosome. Since the distribution of mutated genes across the 24 human chromosomes is not uniform, if there are $N' + 1$ mutated genes on one chromosome, that chromosome will occupy two rows in the mutation map. To better suit the training of deep learning models, this paper sets $N'$ to 56. The maximum integer value that $n'$ can take is then theoretically 95. However, in practice, the probability of this situation (where $N' + 1$ mutated genes occupy two rows) occurring is almost zero. Therefore, in this paper, $n'$ is set to 100. This means that for each tumor cohort, the selected mutated genes are the top 100 ranked genes according to the clustering prediction results. If the number of candidate driver genes is less than 100, potential driver genes are selected as well. After FSO, the dimension of the gene mutation matrix is reduced to 56 × 56, effectively addressing the high noise and sparsity issues of the original dimension transformation method.

Figure 2: Overall architecture of M-MNet (gene mutation maps, 3×3 CNN, inverted residual module with inverted residual structures A and B, MHSA module with multi-head self-attention and self-attention layers, global average pooling, classification).
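The capacity bound can be checked with a short calculation. The sketch below assumes the constraint has the form 24N' + 19n' - 24 ≤ N'², which reproduces the stated maximum of 95 for N' = 56; the function names are hypothetical.

```python
def max_genes_per_cohort(side, n_chromosomes=24, n_cohorts=19):
    """Largest n' satisfying n_chromosomes*side + n_cohorts*n' - n_chromosomes <= side**2."""
    budget = side * side - n_chromosomes * side + n_chromosomes
    return budget // n_cohorts

def fits_worst_case(side, genes_per_cohort, n_chromosomes=24, n_cohorts=19):
    """True if the worst-case layout is guaranteed to fit in a side x side map."""
    needed = n_chromosomes * side + n_cohorts * genes_per_cohort - n_chromosomes
    return needed <= side * side

theoretical_max = max_genes_per_cohort(56)   # 95
guaranteed_at_95 = fits_worst_case(56, 95)   # True
guaranteed_at_100 = fits_worst_case(56, 100) # False: relies on the worst case not occurring
```

With 100 genes per cohort the worst-case guarantee no longer holds, which is consistent with the text's argument that this choice depends on the worst case being very unlikely in practice.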

3.3. M-MNet Classification Network

The proposed network model, M-MNet, is based on the inverted residual module and the multi-head self-attention module. The overall architecture of M-MNet is illustrated in Figure 2.

The network takes the gene mutation map as input. It starts with a 3 × 3 convolutional layer with a stride of 2 to extract shallow features. Then, multiple inverted residual modules are employed to extract high-dimensional features. The inverted residual module consists of two types of inverted residual structures, A and B. In the final stage of the model, several multi-head self-attention modules are introduced to capture global features. Each multi-head self-attention module consists of two 1 × 1 convolutional layers and a multi-head self-attention layer. Finally, the output is obtained through a 7 × 7 global average pooling layer.

1) Inverted Residual Module: In the conventional residual structure, dimension reduction is performed through a 1 × 1 convolutional layer before the dimension is increased again through another 1 × 1 convolutional layer. In the module used here, by contrast, dimension augmentation is performed through a 1 × 1 convolutional layer, followed by feature extraction through a 3 × 3 convolutional layer, and finally dimension reduction through another 1 × 1 convolutional layer. This structure is named the "inverted residual block" because of its reversed order of dimension change. This approach effectively uses the feature information from different channels at the same spatial location. It is important to note that while the conventional residual structure uses the ReLU activation function, the inverted residual structure uses the ReLU6 activation function, defined as follows:

$$ y = \mathrm{ReLU6}(x) = \min(\mathrm{ReLU}(x), 6) = \min(\max(0, x), 6) \tag{4} $$

The ReLU function can lose information when it applies a non-linear transformation to low-dimensional features, and the inverted residual structure has a low-dimensional output. Therefore, in the dimension reduction step, a linear activation function is used instead of the ReLU activation function. This choice helps preserve information and prevent unnecessary loss.
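The expand, filter, and project order of the inverted residual structure can be illustrated with a small NumPy forward pass. This is a sketch under stated assumptions, not the M-MNet implementation: the 3 × 3 layer is taken to be a depthwise convolution (as in MobileNet-V2; the text only says "3 × 3 convolutional layer"), batch normalization and strides are omitted, and the shortcut is added only when input and output shapes match. All function and parameter names are hypothetical.

```python
import numpy as np

def relu6(x):
    """ReLU6 activation: min(max(0, x), 6)."""
    return np.minimum(np.maximum(0.0, x), 6.0)

def pointwise_conv(x, weights):
    """1x1 convolution. x: (H, W, C_in), weights: (C_in, C_out)."""
    return x @ weights

def depthwise_conv3x3(x, kernel):
    """3x3 depthwise convolution, stride 1, zero padding. x: (H, W, C), kernel: (3, 3, C)."""
    height, width, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + height, dx:dx + width, :] * kernel[dy, dx, :]
    return out

def inverted_residual(x, w_expand, k_depthwise, w_project):
    """Expand channels, filter spatially, then project back with a linear 1x1 layer."""
    y = relu6(pointwise_conv(x, w_expand))         # dimension augmentation
    y = relu6(depthwise_conv3x3(y, k_depthwise))   # 3x3 feature extraction
    y = pointwise_conv(y, w_project)               # linear dimension reduction (no ReLU)
    return x + y if y.shape == x.shape else y      # shortcut when shapes match

rng = np.random.default_rng(0)
features = rng.normal(size=(7, 7, 8))
output = inverted_residual(
    features,
    w_expand=rng.normal(size=(8, 48)) * 0.1,
    k_depthwise=rng.normal(size=(3, 3, 48)) * 0.1,
    w_project=rng.normal(size=(48, 8)) * 0.1,
)
```

The final projection deliberately has no activation, mirroring the point above that a non-linearity on the low-dimensional output would discard information.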

2) Multi-head Self-attention Module: The core module of the Transformer network is Multi-Head Self-Attention (MHSA), which is a specialized form of the multi-head attention mechanism. In MHSA, the inputs K, V, and Q of the multi-head attention are all derived from the hidden state matrix $H \in \mathbb{R}^{n \times d}$ of the same input sequence, where $d$ is the dimension of the hidden state and $n$ is the length of the sequence. The Transformer network uses absolute positional encoding to enable the attention mechanism to perceive positional information. However, recent studies have found that relative-distance-aware positional encoding is more suitable for visual tasks. This is because attention then considers not only the content of information but also the relative distances between features at different positions, effectively linking information across objects with positional awareness. In this paper, the two-dimensional relative positional encoding self-attention from reference [20] is used to implement the multi-head self-attention mechanism.

In the two-dimensional relative positional encoding attention module, all attention operations are performed on a two-dimensional feature map. The relative distance positional encodings $R_h$ and $R_w$ represent the height and width of the feature map, respectively. The attention logits are denoted as $qk^T + qr^T$, where $q$, $k$, and $r$ represent the query, key, and positional encoding (relative distance encoding), respectively. The ⊕ and ⊗ symbols represent element-wise addition and matrix multiplication, respectively, and 1 × 1 denotes point-wise convolution. It should be noted that the Transformer network uses Layer Normalization (LN), whereas the multi-head self-attention in our model uses Batch Normalization (BN). The MHSA block in the Transformer includes an output projection, but the MHSA used in this paper does not. Additionally, while the Transformer uses a single non-linear activation function in the Feed-forward Network (FFN), this paper uses three non-linear activation functions.
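A single attention head with the logits qk^T + qr^T can be sketched as follows. This is an illustrative NumPy version under stated assumptions: the positional term r is formed by broadcasting a height encoding of shape (H, 1, d) against a width encoding of shape (1, W, d), the 1 × 1 projections are written as matrix products, and normalization layers, multiple heads, and any scaling of the logits are omitted. The function and argument names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def relative_position_attention(x, w_q, w_k, w_v, r_h, r_w):
    """Single-head self-attention over an (H, W, d) feature map.

    w_q, w_k, w_v: (d, d) point-wise (1x1) projection weights.
    r_h: (H, 1, d) height encoding; r_w: (1, W, d) width encoding.
    """
    height, width, dim = x.shape
    q = (x @ w_q).reshape(height * width, dim)
    k = (x @ w_k).reshape(height * width, dim)
    v = (x @ w_v).reshape(height * width, dim)
    r = (r_h + r_w).reshape(height * width, dim)  # broadcast sum of both encodings
    logits = q @ k.T + q @ r.T                    # content-content + content-position
    attention = softmax(logits, axis=-1)          # (H*W, H*W)
    return (attention @ v).reshape(height, width, dim)

rng = np.random.default_rng(0)
H, W, d = 4, 4, 8
feature_map = rng.normal(size=(H, W, d))
attended = relative_position_attention(
    feature_map,
    w_q=rng.normal(size=(d, d)) * 0.1,
    w_k=rng.normal(size=(d, d)) * 0.1,
    w_v=rng.normal(size=(d, d)) * 0.1,
    r_h=rng.normal(size=(H, 1, d)) * 0.1,
    r_w=rng.normal(size=(1, W, d)) * 0.1,
)
```

Because every position attends to all H·W positions, the output at each location mixes information from the whole feature map, which is the global context that convolution alone does not provide.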

3.4. Label Smoothing Regularization

In classification algorithms, data is typically labeled using hard labels, which are represented as one-hot encodings. However, when the training data is insufficient to reflect the true distribution of the data, the network model may suffer from overfitting, resulting in decreased generalization ability. To address this issue, this article employs the Label Smoothing Regularization (LSR) technique to enhance the robustness of the model.

Label smoothing regularization combines the hard label with a uniform distribution, constraining the model's predictions by adding noise to the target output. In the label smoothing regularization strategy for multi-class tasks, the one-hot encoded label vector is replaced with a label vector $\hat{y}$ as follows:

$$ \hat{y}_i = \begin{cases} 1 - \varepsilon, & i = y \\ \dfrac{\varepsilon}{K - 1}, & i \neq y \end{cases} \tag{5} $$

where $\varepsilon$ is a small hyperparameter typically set to 0.1, $y$ is the index of the true class, and $K$ represents the number of classes. Label smoothing regularization is employed to prevent the model from relying solely on the training data distribution during training. By adding noise to the output, the label smoothing technique constrains the model to some extent and mitigates overfitting. Additionally, label smoothing can make the clusters of different classes more compact, increasing the inter-class distance and reducing the intra-class distance. This, in turn, enhances the model's generalization ability.
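The smoothed target vector can be built directly from integer class labels. The sketch below assumes the form in which the true class receives 1 - ε and the remaining ε is shared equally among the other K - 1 classes; the function name is hypothetical.

```python
import numpy as np

def smooth_labels(labels, num_classes, epsilon=0.1):
    """Replace one-hot targets with smoothed targets.

    The true class gets 1 - epsilon; every other class gets epsilon / (K - 1),
    so each row still sums to 1.
    """
    labels = np.asarray(labels)
    smoothed = np.full((labels.shape[0], num_classes), epsilon / (num_classes - 1))
    smoothed[np.arange(labels.shape[0]), labels] = 1.0 - epsilon
    return smoothed

# Two samples from a 19-class problem (one class per tumor cohort).
targets = smooth_labels([3, 18], num_classes=19, epsilon=0.1)
```

Note that some libraries implement a slightly different variant that spreads ε over all K classes, including the true one, so the exact target values depend on which convention is used.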

4. Experiments and Discussion

All experiments in this paper were conducted on Ubuntu 16.04 with Python 3.9 in an Anaconda environment. The deep learning library used was TensorFlow. The GPU was a GeForce RTX 2080 Ti with 12GB memory. The training parameters were kept consistent across experiments. The optimizer was Momentum-SGD (Stochastic Gradient Descent with Momentum). The number of epochs was set to 150, the initial learning rate was 0.01, the decay factor was 0.5, and the momentum parameter was set to 0.9. Early stopping was used to halt training before validation accuracy decreased as the number of iterations increased. The evaluation metrics for model performance in this study were accuracy, precision, recall, and F1 score.
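For a multi-class task, the four metrics can be computed from a confusion matrix. The text does not state how per-class precision, recall, and F1 are averaged, so the sketch below assumes support-weighted averaging; under that assumption the averaged recall equals the accuracy. The function name is hypothetical.

```python
import numpy as np

def weighted_metrics(y_true, y_pred, num_classes):
    """Accuracy plus support-weighted precision, recall, and F1 (averaging is an assumption)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)            # rows: true class, columns: predicted class
    tp = np.diag(cm).astype(np.float64)
    predicted = cm.sum(axis=0).astype(np.float64)
    support = cm.sum(axis=1).astype(np.float64)
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    pr_sum = precision + recall
    f1 = np.divide(2 * precision * recall, pr_sum, out=np.zeros_like(tp), where=pr_sum > 0)
    weights = support / support.sum()
    return {
        "accuracy": float(tp.sum() / cm.sum()),
        "precision": float(weights @ precision),
        "recall": float(weights @ recall),
        "f1": float(weights @ f1),
    }

# Toy example with three classes and six samples.
scores = weighted_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
```

Macro averaging (an unweighted mean over classes) is an equally common choice and would give different values when the cohorts are imbalanced.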

4.1. Dataset

The experimental dataset was downloaded from the TCGA (The Cancer Genome Atlas), which consists of somatic mutation data from 19 tumor cohorts. The data format is in MAF (Mutation Annotation Format) files, which include various mutation types such as silent, missense, nonsense, splice site, and frameshift insertions/deletions. After obtaining the somatic mutation dataset, this study initially converted the MAF files into TSV (Tab-Separated Values) format. Subsequently, filtering was applied to each cohort dataset to exclude synonymous mutations and minimize false positives. This study focused on single nucleotide variations and excluded structural variations and insertions/deletions from the somatic mutation data.
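The filtering step can be sketched with the standard library alone. This is an illustration rather than the authors' pipeline: it assumes the standard MAF column names `Hugo_Symbol`, `Variant_Classification`, `Variant_Type`, and `Tumor_Sample_Barcode`, treats `Silent` as the synonymous class, and keeps only rows whose variant type is `SNP`. The sample rows and barcodes are made up for the example.

```python
import csv
import io

# Tiny in-memory stand-in for a MAF file (hypothetical rows).
SAMPLE_MAF = (
    "#version 2.4\n"
    "Hugo_Symbol\tVariant_Classification\tVariant_Type\tTumor_Sample_Barcode\n"
    "VHL\tMissense_Mutation\tSNP\tSAMPLE-01\n"
    "MET\tSilent\tSNP\tSAMPLE-01\n"
    "TP53\tFrame_Shift_Del\tDEL\tSAMPLE-02\n"
    "TP53\tNonsense_Mutation\tSNP\tSAMPLE-02\n"
)

def filter_maf(handle):
    """Keep non-synonymous single nucleotide variants from a tab-separated MAF stream."""
    data_lines = (line for line in handle if not line.startswith("#"))  # skip comment header
    reader = csv.DictReader(data_lines, delimiter="\t")
    return [
        row
        for row in reader
        if row["Variant_Classification"] != "Silent" and row["Variant_Type"] == "SNP"
    ]

kept = filter_maf(io.StringIO(SAMPLE_MAF))
kept_genes = [(row["Hugo_Symbol"], row["Variant_Classification"]) for row in kept]
```

For a real file, the same function can be applied to an open file handle instead of the `io.StringIO` wrapper.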

The final dataset contains a total of 6,906 samples, 236,245 gene elements, and 1,678,190 single nucleotide mutation positions. The gene list from the Cancer Gene Census (CGC) was retrieved from the COSMIC (Catalogue of Somatic Mutations in Cancer) website in April 2022. The genomic coordinates for coding genes were obtained from the ENCODE website, using Gencode version 19.

4.2. Ablation Study

To validate and interpret the effectiveness and necessity of the proposed feature selection optimization algorithm, we conducted classification tasks using three different network models: VGG-16, Inception ResNet-V2, and MobileNet-V2. The input data for these models were gene mutation maps obtained by dimensionality transformation of somatic mutation data. The FSO algorithm was employed to perform feature selection on the two-dimensional gene mutation maps.

Table 1 compares the classification performance of the different network models when using gene mutation map construction alone and when using gene mutation map construction combined with the feature selection optimization algorithm. Table 1 shows that, compared to VGG-16, FSO+VGG-16 improved accuracy by 6.11 percentage points, precision by 5.27 percentage points, and F1 score by 5.69 percentage points. FSO+Inception ResNet-V2 improved all metrics by about 1 percentage point compared to the original model. FSO+MobileNet-V2 improved all metrics by about 2 percentage points compared to MobileNet-V2.

Figure 3: (a) VGG-16; (b) FSO+VGG-16.

4.3. Feature Visualization

To explain more clearly how the feature selection optimization module improves overall model performance, this subsection uses t-SNE (t-distributed Stochastic Neighbor Embedding) to visualize the features extracted by the model. t-SNE projects the features that enter the final classification layer onto a two-dimensional plane. The plot covers a total of 690 gene mutation map samples and reflects, to some extent, the model's classification performance.

From Figure 3, it can be observed that after undergoing feature selection optimization, several tumor cohorts such as SARC, OV, and GBM are no longer clustered together with cohorts like LGG, PRAD, and THCA. The classification performance has improved, but there are still some individual samples from cohorts like OV, LIHC, and BRCA that cannot be correctly classified.

In addition, this study compares the proposed M-MNet model with the previously mentioned best-performing MobileNet-V2 network through visualization. Figure 4 illustrates the classification results of the MobileNet-V2 network and the M-MNet network proposed in this paper for the 690 test samples from the 19 tumor cohorts.

From the figure, it can be observed that the MobileNet-V2 network misclassifies certain samples from tumor cohorts such as PRAD, LIHC, SARC, and COAD into the OV cohort, and it also misclassifies some samples from the CESC and UCEC cohorts into the SKCM cohort. In the M-MNet network, however, this situation is improved, indicating the effectiveness and robustness of the proposed network model.

Figure 4: (a) MobileNet-V2; (b) M-MNet.

4.4. Classification performance

We trained the M-MNet network model using gene mutation maps before and after feature selection optimization. We also compared the results with the previous models, and the experimental results are shown in Table 2.

Comparing the fifth and seventh rows of Table 2, we can see that the MobileNet-V2 network achieved an accuracy of 91.18%, precision of 91.42%, recall of 91.18%, and F1 score of 91.30%. Compared to the MobileNet-V2 network, which uses only inverted residual blocks, the M-MNet network improved all metrics by about 2 percentage points on average. Furthermore, comparing the sixth and eighth rows shows that FSO+M-MNet outperformed FSO+MobileNet-V2 by about 1 percentage point on average, including a 1.28 percentage point improvement in accuracy. This demonstrates the effectiveness and necessity of combining the multi-head self-attention module with the inverted residual module. FSO+M-MNet achieved improvements of 1.43, 0.96, 1.43, and 1.19 percentage points in accuracy, precision, recall, and F1 score, respectively, compared to M-MNet. The FSO module had a smaller impact on the overall performance of M-MNet than on that of VGG-16. This indicates that the feature selection optimization algorithm remains effective for the proposed M-MNet network, although its benefit becomes smaller when it is applied to models with better original performance. The gene mutation map construction, the feature selection optimization algorithm, and the M-MNet network proposed in this paper all contributed to the improvement in model performance, demonstrating the effectiveness and robustness of this approach in cancer classification tasks.

5. Conclusion

This paper proposes a feature extraction method that combines the construction of gene mutation maps based on the RGB three-channel principle with feature selection optimization. The FSO algorithm is applied to overcome the high noise and sparsity of the constructed gene mutation maps. To obtain a classification model with better performance and stronger robustness, this paper also proposes the M-MNet classification model, which is based on the inverted residual module and the multi-head self-attention module and is trained using label smoothing regularization. The proposed feature extraction method effectively solves the problem of high-dimensional mutation data being unsuitable for existing convolutional neural networks. Experimental results show that the overall performance of M-MNet is better than that of the other classification models compared.

6. Acknowledgments

Not Applicable.

References

[1] Tomao, Papa, Rossi, Strudel, Vici, G. L. Russo, Tomao, Emerging role of cancer stem cells in the biology and treatment of ovarian cancer: basic knowledge and therapeutic possibilities for an innovative approach, Journal of Experimental & Clinical Cancer Research 32 (2013) 48-48.

[2] W. M. Linehan, C. J. Ricketts, The metabolic basis of kidney cancer, Seminars in Cancer Biology 23 (2012) 46.

[3] Meyerson, S. Gabriel, G. Getz, Advances in understanding cancer genomes through second-generation sequencing, Nature Reviews Genetics (2018).

[4] Sirvan, Peronne, Deepak, Salendra, L. F. Thomas, Kishore, Vinay, Sysmut: decoding the functional significance of rare somatic mutations in cancer, Briefings in Bioinformatics (2022) 4.

[5] Candia, E. Bayarsaikhan, Tandon, Budhu, X. W. Wang, The genomic landscape of Mongolian hepatocellular carcinoma, Nature Communications 11 (2020).

[6] Abel Gonzalez-Perez, Nuria Lopez-Bigas, Functional impact bias reveals cancer drivers, Nucleic Acids Research (2012).

[7] Mularoni, Sabarinathan, Deu-Pons, Gonzalez-Perez, López-Bigas, OncodriveFML: a general framework to identify coding and non-coding regions with cancer driver mutations, Genome Biology 17 (2016).

[8] Yi, Juze, Xinyi, Wei-Chung, Shu-Hsuan, Xing, Liyuan, Yaning, Qingbiao, L. a. Pengyuan, DriverML: a machine learning algorithm for identifying driver genes in cancer sequencing studies, Nucleic Acids Research (2019) e45-e45.

[9] Wei, Yang, Zhao, Chang, Ma, Dong, Guo, J. Ma, Quantification of EGFR mutations in primary and metastatic tumors in non-small cell lung cancer, Journal of Experimental & Clinical Cancer Research (2014).

[10] Zhao, Ren, Zhang, H. Liu, Y. Zhang, Landscape of homologous recombination-related (HRR) genes mutations in colon cancer, Journal of Clinical Oncology 39 (2021) e15525-e15525.

[11] S. W. K. Ng, F. J. Rouhani, S. F. Brunner, Brzozowska, S. J. Aitken, Yang, Abascal, Moore, Nikitopoulou, L. a. Chappell, Convergent somatic mutations in metabolism genes in chronic liver disease, Nature 598 (2021) 473-478.

[12] Poole, Leinonen, I. Shmulevich, T. A. Knijnenburg, Bernard, Multiscale mutation clustering algorithm identifies pan-cancer mutational clusters associated with pathway-level changes in gene expression, PLOS Computational Biology (2017).

[13] Tokheim, Bhattacharya, Niknafs, D. M. Gygax, Kim, M. C. Ryan, Masica, Karchin, Exome-scale discovery of hotspot mutation regions in human cancer using 3D protein structure, Cancer Research (2016) 3719.

[14] Wiegmann, Pacheco, Reikowski, Stettner, Qiu, Bouvier, Bertram, Faisal, Brummel, Libuda, Identification of the reversible skin layer on co, ACS Catalysis 12 (2022) 3256-3268.

[15] M. F. B. Asad, M. N. A. Hallak, A. Sukari, Y. Baca, M. Nagasaka, Prognostic impact of XPO1 mutations in metastatic non-small cell lung cancer (NSCLC), Journal of Clinical Oncology 39 (2021) e20533-e20533.

[16] Mark D. M. Leiserson, Hsin-Ta Wu, Fabio Vandin, Benjamin J. Raphael, CoMEt: a statistical approach to identify combinations of mutually exclusive alterations in cancer, Genome Biology 16 (2015) 1-20.

[17] Y. A. Kim, Madan, T. M. Przytycka, WeSME: Uncovering mutual exclusivity of cancer drivers and beyond, arXiv e-prints (2016).

[18] R. L. Stevens, Deep learning in cancer and infectious disease: Novel driver problems for future HPC architecture (2017).

[19] Quan, Liu, Qi, Tie, Lrt-cluster: A new clustering algorithm based on likelihood ratio test to identify driving genes, Interdisciplinary Sciences: Computational Life Sciences 15 (2023) 217-230.

[20] Srinivas, T.- Lin, Parmar, Shlens, Abbeel, Vaswani, Bottleneck transformers for visual recognition, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021) 16514-16524.

[21] Huang, Wang, Lu, Zhou, Li, Liu, Chang, A novel image-to-knowledge inference approach for automatically diagnosing tumors, Expert Systems with Applications 229 (2023) 120450. URL: https://www.sciencedirect.com/science/article/pii/S0957417423009521. doi:10.1016/j.eswa.2023.120450.

[22] Szegedy, W. Liu, Jia, Sermanet, Reed, Anguelov, Erhan, Vanhoucke, Rabinovich, Going deeper with convolutions, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1-9. doi:10.1109/CVPR.2015.7298594.

[23] Wang, Wang, L. Liu, Li, Huang, Fusion of human cognitive knowledge and machine inference for breast cancer detection, in: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), 2023, pp. 179-184. doi:10.1109/ICARM58088.2023.10218759.

[24] Schlichtkrull, Kipf, Bloem, R. van den Berg, I. Titov, Welling, Modeling relational data with graph convolutional networks, in: Extended Semantic Web Conference, 2017. URL: https://api.semanticscholar.org/CorpusID:5458500.

[25] Dosovitskiy, Beyer, Kolesnikov, Weissenborn, Zhai, Unterthiner, Dehghani, Minderer, G. Heigold, Gelly, Uszkoreit, Houlsby, An image is worth 16x16 words: Transformers for image recognition at scale, arXiv abs/2010.11929 (2020). URL: https://api.semanticscholar.org/CorpusID:225039882.

[26] Wang, Wang, L. Liu, Li, Huang, Fully automated interpretable breast ultrasound assisted diagnosis system, in: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), 2023, pp. 173-178. doi:10.1109/ICARM58088.2023.10218807.