<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Improving the prediction of disease-related variants using protein three-dimensional structure</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Emidio</forename><surname>Capriotti</surname></persName>
							<email>emidio@stanford.edu</email>
							<affiliation key="aff1">
								<orgName type="department">Department of Mathematics and Computer Sciences</orgName>
								<orgName type="institution">University of Balearic Islands</orgName>
								<address>
									<settlement>Palma de Mallorca</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Russ</forename><forename type="middle">B</forename><surname>Altman</surname></persName>
							<email>russ.altman@stanford.edu</email>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="department">Departments of Bioengineering and Genetics</orgName>
								<orgName type="institution">Stanford University</orgName>
								<address>
									<settlement>Stanford</settlement>
									<region>CA</region>
									<country>United States of America</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Improving the prediction of disease-related variants using protein three-dimensional structure</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">FF473950FDE5F6ABEFC20FCAD0FFF2AF</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T14:21+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Background:</head><p>Single Nucleotide Polymorphisms (SNPs) are an important source of human genome variability. Non-synonymous SNPs in coding regions, which result in single amino acid polymorphisms (SAPs), may affect protein function and lead to pathology. Several methods attempt to estimate the impact of SAPs using different sources of information. Although sequence-based predictors have shown good performance, prediction quality can be further improved by introducing new features derived from the protein three-dimensional structure.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Results:</head><p>In this paper, we present a structure-based machine learning approach to predict disease-related SAPs. We have trained a Support Vector Machine (SVM) on a set of 3,342 disease-related mutations and 1,644 neutral polymorphisms from 784 protein chains. The SVM input features are derived from protein sequence, structure and function information. After dataset balancing, the structure-based method reaches an overall accuracy of 84%, a correlation coefficient of 0.67, and an area under the receiver operating characteristic curve (AUC) of 0.91. Compared with a similar sequence-based predictor, the structure-based method increases the overall accuracy and the AUC by about 3% and the correlation coefficient by 0.06.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Conclusion:</head><p>This work demonstrates that structural information can increase the accuracy of detecting disease-related SAPs. Our results also quantify the magnitude of the improvement on a large dataset. This improvement agrees with previously observed results in the prediction of protein stability change upon mutation.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Background</head><p>Currently the number of validated Single Nucleotide Polymorphisms (SNPs) exceeds 14 million <ref type="bibr">[1]</ref>. In general, mutations occurring in coding regions may have a greater impact on gene function than those occurring in non-coding regions <ref type="bibr">[2]</ref>. Only a small fraction of SNPs (~61,000) corresponds to the subset of annotated missense coding SNPs <ref type="bibr">[3]</ref>. For this subset of Single Amino acid Polymorphisms (SAPs), curators of the Swiss Institute of Bioinformatics provide a classification that divides SAPs into disease-related and neutral according to the peer-reviewed literature. In the last few years several methods have been developed to predict the impact of a given single point protein mutation <ref type="bibr">[4]</ref><ref type="bibr">[5]</ref><ref type="bibr">[6]</ref><ref type="bibr">[7]</ref><ref type="bibr" target="#b7">[8]</ref><ref type="bibr" target="#b8">[9]</ref><ref type="bibr" target="#b9">[10]</ref><ref type="bibr" target="#b10">[11]</ref><ref type="bibr" target="#b11">[12]</ref><ref type="bibr" target="#b12">[13]</ref><ref type="bibr" target="#b13">[14]</ref><ref type="bibr" target="#b14">[15]</ref><ref type="bibr" target="#b15">[16]</ref>. These algorithms predict the protein stability change <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b15">16]</ref>, the variation in protein functional activity <ref type="bibr">[6]</ref> and the onset of human pathologies <ref type="bibr">[4,</ref><ref type="bibr">5,</ref><ref type="bibr">[7]</ref><ref type="bibr" target="#b7">[8]</ref><ref type="bibr" target="#b8">[9]</ref><ref type="bibr" target="#b11">[12]</ref><ref type="bibr" target="#b12">[13]</ref><ref type="bibr" target="#b13">[14]</ref><ref type="bibr" target="#b14">[15]</ref>. 
The majority of the methods rely on information derived from protein sequence <ref type="bibr">[4,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b13">14]</ref>; others use protein structure data <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b16">17]</ref> and knowledge-based information <ref type="bibr">[7,</ref><ref type="bibr" target="#b12">13,</ref><ref type="bibr" target="#b14">15]</ref>. In this paper we focus on SAPs and present a new machine learning method that predicts disease-related SAPs by combining protein sequence, structural and functional information. We also quantify the improvement in performance resulting from the use of protein structure information.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Performance of the method</head><p>In recent decades machine learning approaches have been used successfully to address several biological problems and to develop new prediction methods. We modified a previously developed predictor by introducing new three-dimensional structure information. In particular, we use new features that describe the structural environment of the mutation within a shell of 6 Å radius around the C-α. To quantify the improvement in accuracy resulting from the use of 3D structure information, we compare the performance of a structure-based method (SVM-3D) with that of a sequence-based one (SVM-SEQ). Table 1 reports different accuracy measures for both predictors. The structure-based method achieves 3% better overall accuracy and 0.06 better correlation. Comparing the ROC curves (Fig. <ref type="figure" target="#fig_0">1 A</ref>), SVM-3D achieves an Area Under the Curve (AUC) 0.02 higher than SVM-SEQ. At a false positive rate of 10%, SVM-3D has a 6% higher true positive rate. The output returned by the SVM has been used to calculate the Reliability Index (RI) and to filter predictions. If predictions with RI&gt;5 are selected, the SVM-3D method reaches 90% overall accuracy and a 0.81 correlation coefficient on 74% of the whole dataset (see Fig <ref type="figure" target="#fig_0">1 B</ref>). Analyzing the predictions of the SVM-SEQ and SVM-3D methods, we found that the outputs agree in 88% of the cases. On this subset the overall accuracy is 86% and the correlation coefficient is 0.73. For the remaining 12% of the predictions, the SVM-SEQ method yields a very poor overall accuracy and correlation, 37% and -0.25 respectively. SVM-3D performs slightly better than a random predictor, with 63% overall accuracy and a 0.25 correlation (see Table <ref type="table" target="#tab_0">2</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Structure environment analysis</head><p>Protein three-dimensional structural information is an important feature for predicting the effect of SAPs. The analysis of the protein structure provides information about the environment of the mutation. In fact, the effect of a mutation depends on the position of the mutated residue: whether it is buried in the hydrophobic core or exposed on the surface of the protein. Fig. <ref type="figure" target="#fig_1">2</ref> panel A plots the distributions of the relative solvent accessible area (RSA) for disease-related and neutral variants. The two distributions have mean RSA values of 20.6 and 35.7 for disease-related and neutral variants respectively (see Fig <ref type="figure" target="#fig_1">2 panel A</ref>). They are significantly different: the Kolmogorov-Smirnov test returns a p-value of 2.8 × 10^-71. We calculated the overall accuracy and correlation coefficient of our method after dividing the dataset into 10 bins according to the RSA value of the mutated residue. The SVM-3D method shows better performance in the prediction of buried (RSA&lt;20) and highly exposed (RSA&gt;80) residues (see Fig <ref type="figure" target="#fig_1">2 panel B</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Scoring the residue interactions</head><p>Protein three-dimensional structure information is needed to identify interactions between residues that are far apart in the sequence but close in 3D space. We defined two types of interactions: lost interactions are those that disappear when the wild-type residue is mutated, and new interactions are those formed by the mutant residue. In this section we compare the frequency of lost and new interactions associated with disease-related or neutral mutations. We calculated the log-odds score for lost and new interactions, shown in panels A and B respectively (see Fig. <ref type="figure" target="#fig_2">3</ref>). According to these results, the most deleterious lost contacts are those between Cys-Cys, and newly formed Trp-Trp interactions are the most damaging ones. A missing Cys-Cys interaction could lead to the loss of a disulfide bond, and the mutation of a residue into a tryptophan close to another tryptophan could result in stereo-chemical problems. An example of a missing Cys-Cys interaction is observed in the mutation of Cys163 in glycosylasparaginase (Swiss-Prot:ASPG_HUMAN). This mutation is responsible for the onset of aspartylglucosaminuria (MIM:208400). Looking at the protein structure (Fig <ref type="figure">4</ref>), we found that the mutation of Cys163 to serine results in the loss of the disulfide bridge between Cys163 and Cys179 (Cys140 and Cys156 respectively in the PDB structure 1APY chain A). An interesting example of a possibly damaging newly formed interaction is observed in the thyroid hormone receptor (Swiss-Prot:THB_HUMAN), where the mutation of Arg243 into tryptophan causes thyroid hormone resistance (MIM:188570,274300). Analyzing the protein structure (11$; chain A) we found that the new tryptophan could be close to another one at position 239. 
This mutation could result in stereo-chemical problems in the pocket around position 243 (see Fig <ref type="figure">5</ref>). Both examples are correctly predicted by the structure-based method and wrongly predicted by the sequence-based algorithm.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Conclusion</head><p>We developed a new machine learning approach based on protein structure information to predict the effect of SAPs. The method has been compared with a previously developed sequence-based predictor to quantify the increase in accuracy achieved with protein structure information. Using a balanced set of 6,630 mutations, the structure-based method achieves about 3% higher accuracy and AUC and 0.06 higher correlation than the sequence-based one. Although the increase in accuracy is modest, structure information can be particularly useful in specific situations, providing insight into the disease mechanism, as in the cases discussed above. The prediction improvement agrees with previously observed results in the prediction of protein stability change upon mutation <ref type="bibr" target="#b9">[10]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Methods</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Datasets</head><p>The performance of machine learning methods depends strongly on the training set. For this reason the selection of a representative set of SAPs is a pivotal issue in the development of predictive algorithms. A previous analysis of different SAP databases has shown that the annotated set of variants from the Swiss-Var database is the best available one <ref type="bibr" target="#b17">[18]</ref>. Accordingly, we selected our set of SAPs from Swiss-Var release 57.9 (Oct 2009) and mapped all the variants onto the protein structures available in the Protein Data Bank (PDB) <ref type="bibr" target="#b18">[19]</ref>. To reduce the number of sequence alignments between Swiss-Prot sequences and sequences derived from the PDB, we used a precompiled list of correspondences between Swiss-Prot and PDB codes available at the ExPASy web site. Using this list we aligned each pair of sequences with the BLAST algorithm <ref type="bibr" target="#b19">[20]</ref> and filtered out alignments with: i) gaps, ii) sequence identity lower than 100% or iii) length shorter than 40 residues. The remaining alignments were used to calculate the correspondence between the Swiss-Prot and PDB residue numbering. When a mutation maps to more than one protein structure, the one with the best resolution was selected. After this filtering procedure we obtained a set of 4,986 mutations from 784 protein chains. The dataset of variants mapped onto protein structures is composed of 3,342 disease-related SAPs and 1,644 neutral polymorphisms. To balance the dataset we doubled the number of neutral variants by considering their reverse mutations as neutral. The final set contains 6,630 mutations, about equally distributed between disease-related and neutral SAPs.</p></div>
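<div xmlns="http://www.tei-c.org/ns/1.0"><p>The balancing step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the Variant record and all function names are hypothetical, and the sketch assumes that a reverse mutation simply swaps the wild-type and mutant residues at the same position while keeping the neutral label.</p>
<p><![CDATA[
```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical record for one single amino acid polymorphism."""
    protein: str    # e.g. a Swiss-Prot entry name
    wild_type: str  # one-letter code of the original residue
    position: int   # residue number in the protein sequence
    mutant: str     # one-letter code of the substituted residue
    label: str      # "disease" or "neutral"


def reverse_variant(v: Variant) -> Variant:
    """Swap wild-type and mutant residues, keeping position and label."""
    return v._replace(wild_type=v.mutant, mutant=v.wild_type)


def balance_with_reverse_neutrals(variants: List[Variant]) -> List[Variant]:
    """Return the input variants plus the reverse of every neutral one."""
    reversed_neutrals = [reverse_variant(v) for v in variants if v.label == "neutral"]
    return list(variants) + reversed_neutrals


def balanced_size(n_disease: int, n_neutral: int) -> int:
    """Dataset size after doubling the neutral class."""
    return n_disease + 2 * n_neutral
```
]]></p>
<p>With the counts reported above, balanced_size(3342, 1644) evaluates 3,342 + 2 × 1,644 = 6,630, which matches the size of the final set.</p></div>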
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Implemented SVM-based predictors</head><p>The task is to predict whether a given single amino acid polymorphism is neutral or disease-related. It is treated as a binary classification problem for the protein upon mutation. The Support Vector Machine (SVM) input features for the structure-based predictor include: the amino acid mutation, the structural environment of the mutation, the features derived from the sequence profile, and a function-based log-odds score calculated from the GO classification. The final input vector consists of 48 elements:</p><p>• 20 components encoding the mutation (Mut)</p><p>• 21 components encoding local protein structure information (3D)</p><p>• 5 input features derived from the sequence profile (Prof)</p><p>• 2 elements encoding the number of GO terms associated with the protein and the GO log-odds score (LGO).</p><p>A similar sequence-based SVM predictor has been used to measure the increase in accuracy resulting from the use of protein three-dimensional structure information. The structure-based SVM differs only in the 21-element vector encoding the local protein structure environment (3D), which replaces the 20-element vector encoding the sequence environment. More details about the SVM input features are given in the supplementary materials.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Interaction score</head><p>Residue interactions are defined by considering all the residues within a shell of 6 Å radius around the C-α of the mutated residue. On this basis we calculate a log-odds score by dividing the frequency of lost interactions associated with disease by the frequency of the same type of interactions that have no pathological effect. Although mutations can cause protein structural changes, as a first approximation we assume that the position of the C-α of the new residue does not change significantly after the mutation. Hence, we consider as new interactions those between the mutant residue and the residues previously interacting with the wild-type residue. A score for the possible damaging effect of lost or new interactions is calculated as follows</p><formula xml:id="formula_0">LC_k = \log_2 \left[ \frac{f(c_k(i,j), D)}{f(c_k(i,j), N)} \right] \qquad [1]</formula><p>where f(c_k(i,j), D) and f(c_k(i,j), N) are the frequencies of contacts between residues i and j for disease-related (D) and neutral (N) variants respectively, and k is equal to l or n for lost and new interactions respectively.</p></div>
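<div xmlns="http://www.tei-c.org/ns/1.0"><p>Equation 1 can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the procedure actually used: contacts are given as pairs of three-letter residue names, a pair is treated as unordered, frequencies are normalized within each class, and a pseudocount (not mentioned in the text for this score) is added so that pairs absent from one class do not produce a division by zero. The function names are hypothetical.</p>
<p><![CDATA[
```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def pair_key(res_i: str, res_j: str) -> Pair:
    """Treat a contact between residue types as unordered (assumption)."""
    return tuple(sorted((res_i, res_j)))


def contact_log_odds(disease_contacts: Iterable[Pair],
                     neutral_contacts: Iterable[Pair],
                     pseudocount: float = 1.0) -> Dict[Pair, float]:
    """log2 ratio of the contact frequency in disease vs neutral variants.

    Call once with lost contacts (k = l) and once with new contacts (k = n).
    The pseudocount is an assumption of this sketch.
    """
    d = Counter(pair_key(a, b) for a, b in disease_contacts)
    n = Counter(pair_key(a, b) for a, b in neutral_contacts)
    pairs = set(d) | set(n)
    d_total = sum(d.values()) + pseudocount * len(pairs)
    n_total = sum(n.values()) + pseudocount * len(pairs)
    scores = {}
    for p in pairs:
        f_disease = (d[p] + pseudocount) / d_total
        f_neutral = (n[p] + pseudocount) / n_total
        scores[p] = math.log2(f_disease / f_neutral)
    return scores
```
]]></p>
<p>A positive score marks a contact type that is more frequent among disease-related variants, a negative score one that is more frequent among neutral variants.</p></div>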
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Accuracy measures</head><p>The performance of our methods is evaluated using a 20-fold cross-validation procedure on the whole SAP dataset. The dataset has been divided so that the ratio of disease-related to neutral polymorphisms in each subset is similar to that of the whole set. Furthermore, all the proteins in the dataset are clustered according to their sequence similarity with the blastclust program in the BLAST suite <ref type="bibr" target="#b19">[20]</ref>, adopting the default length coverage of 0.9 and a percentage similarity threshold of 30%. We kept all the mutations belonging to a protein in the same cross-validation subset to avoid overestimating the performance. Classical accuracy measures such as the overall accuracy (Q2), the sensitivity (S), the probability of correct predictions (P), the Matthews correlation coefficient (C), the false and true positive rates (FPR, TPR) and the area under the ROC curve (AUC) are used to score the performance of our predictors. A Reliability Index (RI) score has been calculated to select the more reliable predictions. More details about the definition of the statistical indexes used in this work are provided in the supplementary materials.</p></div>
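<div xmlns="http://www.tei-c.org/ns/1.0"><p>Three of these measures can be sketched in Python for a binary confusion matrix. The overall accuracy and the Reliability Index follow the definitions given in the supplementary materials (Q2 = P/N and RI = 20 times the absolute value of O(D) - 0.5). The Matthews correlation coefficient is written in its standard confusion-matrix form, which is assumed here to be equivalent to the definition used in this work. Function and argument names are hypothetical, and disease-related is taken as the positive class.</p>
<p><![CDATA[
```python
import math


def overall_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Q2: correctly predicted mutations over all mutations."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_correlation(tp: int, tn: int, fp: int, fn: int) -> float:
    """Standard Matthews correlation coefficient (assumed form)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


def reliability_index(svm_output: float) -> float:
    """RI = 20 * |O(D) - 0.5| for an SVM output O(D) between 0 and 1."""
    return 20.0 * abs(svm_output - 0.5)
```
]]></p>
<p>Under this definition RI ranges from 0 (output of exactly 0.5) to 10 (output of 0 or 1), so a filter such as RI greater than 5 keeps predictions whose output is farther than 0.25 from the decision threshold.</p></div>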
<div xmlns="http://www.tei-c.org/ns/1.0"><head>TABLES</head><p>Table <ref type="table">1</ref>. Performance of the sequence-based (SVM-SEQ) and structure-based (SVM-3D) methods.</p><p>The accuracy measures are defined in the supplementary materials. D and N stand for disease-related and neutral variants respectively.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1.</head><label>1</label><figDesc>Figure 1.</figDesc><graphic coords="7,91.91,111.83,412.46,187.82" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. Analysis of the protein three-dimensional structure environment. Distribution of relative solvent accessible area (RSA) for disease-related and neutral variants (A) and prediction accuracy as a function of the RSA (B). Accuracy measures (Q2, C) are defined in supplementary material. DB is the fraction of the whole dataset for disease-related (D) and neutral (N) mutations.</figDesc><graphic coords="7,91.91,374.99,432.14,161.90" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 3</head><label>3</label><figDesc>Fig. 3 Log-odds score for lost residue interactions (A) and for newly formed interactions (B). The red zones correspond to damaging lost or new interactions. Blue points correspond to neutral interactions.</figDesc><graphic coords="8,91.91,72.23,432.14,198.86" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4.</head><label>4</label><figDesc>Figure 4. Structure of the Glycosylasparaginase (PDB code 1APY chain A) and details of the interactions around Cys163 (Cys140 in the PDB structure).</figDesc><graphic coords="8,91.91,346.67,322.10,177.02" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 5.</head><label>5</label><figDesc></figDesc><graphic coords="9,91.91,72.23,322.10,202.58" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 2 .</head><label>2</label><figDesc>Performance of the methods on the agreeing and disagreeing subsets of predictions. SEQ∩3D indicates the subset of agreeing predictions; SEQ-3D and 3D-SEQ are the predictions of SVM-SEQ and SVM-3D respectively on the subset of disagreeing predictions. The accuracy measures are defined in the supplementary materials. PM is the fraction of the dataset. D and N stand for disease-related and neutral variants respectively.</figDesc><table><row><cell></cell><cell>Q2</cell><cell>P[D]</cell><cell>S[D]</cell><cell>P[N]</cell><cell>S[N]</cell><cell>C</cell><cell>AUC</cell><cell></cell></row><row><cell>SVM-SEQ</cell><cell>0.81</cell><cell>0.80</cell><cell>0.82</cell><cell>0.81</cell><cell>0.9</cell><cell>0.61</cell><cell>0.89</cell><cell></cell></row><row><cell>SVM-3D</cell><cell>0.84</cell><cell>0.82</cell><cell>0.86</cell><cell>0.85</cell><cell>0.81</cell><cell>0.67</cell><cell>0.91</cell><cell></cell></row><row><cell></cell><cell>Q2</cell><cell>P[D]</cell><cell>S[D]</cell><cell>P[N]</cell><cell>S[N]</cell><cell>C</cell><cell>AUC</cell><cell>PM</cell></row><row><cell>SEQ∩3D</cell><cell>0.86</cell><cell>0.85</cell><cell>0.89</cell><cell>0.88</cell><cell>0.84</cell><cell>0.73</cell><cell>0.92</cell><cell>88</cell></row><row><cell>SEQ-3D</cell><cell>0.63</cell><cell>0.66</cell><cell>0.65</cell><cell>0.60</cell><cell>0.60</cell><cell>0.25</cell><cell>0.68</cell><cell>12</cell></row><row><cell>3D-SEQ</cell><cell>0.37</cell><cell>0.40</cell><cell>0.35</cell><cell>0.34</cell><cell>0.40</cell><cell>-0.25</cell><cell>0.40</cell><cell></cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>EC acknowledges support from the Marie Curie International Outgoing Fellowship program (PIOF-GA-2009-237225). RBA would like to acknowledge the following funding sources: NIH LM05652 and GM61374.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Supplementary Material</head><p>Improving the prediction of disease-related variants using protein three-dimensional structure. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Support Vector Machine (SVM) input features</head><p>The SVM-based methods developed in this work take the following features as input: i) residue mutation; ii) protein sequence profile; iii) functional score based on Gene Ontology (GO) terms; and iv) either sequence or structure mutation environment.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Encoding residue mutation</head><p>The input vector describing the mutation consists of 20 values, one for each of the 20 residue types. They explicitly define the mutation by setting the element corresponding to the wild-type residue to -1 and the element corresponding to the newly introduced residue to 1 (all the remaining elements are kept equal to 0).</p></div>
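<div xmlns="http://www.tei-c.org/ns/1.0"><p>The mutation encoding described above can be sketched as follows in Python. The alphabetical ordering of the one-letter residue codes is an assumption of this sketch (the text does not specify the order of the 20 elements), and the function name is hypothetical.</p>
<p><![CDATA[
```python
from typing import List

# Assumed ordering of the 20 residue types (one-letter codes, alphabetical).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_mutation(wild_type: str, mutant: str) -> List[int]:
    """20-element vector: -1 at the wild-type residue, +1 at the new one."""
    if wild_type == mutant:
        raise ValueError("wild-type and mutant residues must differ")
    vector = [0] * len(AMINO_ACIDS)
    vector[AMINO_ACIDS.index(wild_type)] = -1  # raises ValueError if unknown
    vector[AMINO_ACIDS.index(mutant)] = 1
    return vector
```
]]></p>
<p>For example, a cysteine to serine substitution is encoded by encode_mutation("C", "S"), which sets the cysteine element to -1, the serine element to 1 and leaves the other 18 elements at 0.</p></div>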
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Encoding mutation structure environment</head><p>The protein structural environment is encoded with a 21-element vector. The first 20 elements encode the number of residues of each type that have at least one heavy atom within a shell around the C-α of the mutated residue. After an optimization procedure a shell of 6 Å radius was selected. The 21st element is the relative solvent accessible area calculated using the DSSP program <ref type="bibr">[1]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Encoding mutation sequence environment</head><p>The 20 input values for the mutation sequence environment (one for each of the 20 residue types) encode the number of residues of each type found inside a window centered at the residue that undergoes mutation and that symmetrically spans the sequence to the left (N-terminus) and to the right (C-terminus) with a length of 19 residues <ref type="bibr">[2]</ref>.</p></div>
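<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal Python sketch of the 21-element structural environment vector is given below. It makes several assumptions that are not stated in the text: residue types are ordered alphabetically by one-letter code, the list of neighbours passed in already excludes the mutated residue itself, an atom lying exactly on the 6 Å boundary is counted as inside the shell, and the relative solvent accessible area is supplied by the caller (for instance from a DSSP run) rather than computed here. All names are hypothetical.</p>
<p><![CDATA[
```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

# Assumed ordering of the 20 residue types (one-letter codes, alphabetical).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_structure_environment(ca_coord: Coord,
                                 neighbours: Sequence[Tuple[str, Sequence[Coord]]],
                                 rsa: float,
                                 radius: float = 6.0) -> List[float]:
    """Counts of residue types near the mutated C-alpha, followed by the RSA.

    neighbours: (one-letter residue type, heavy-atom coordinates) for every
    other residue of the chain; the mutated residue is assumed to be excluded.
    """
    counts = [0.0] * len(AMINO_ACIDS)
    for residue_type, heavy_atoms in neighbours:
        # A residue is counted once if any of its heavy atoms is in the shell.
        if any(math.dist(ca_coord, atom) <= radius for atom in heavy_atoms):
            counts[AMINO_ACIDS.index(residue_type)] += 1.0
    return counts + [rsa]
```
]]></p>
<p>Concatenating this vector with the 20 mutation components, the 5 profile features and the 2 GO features gives the 48 inputs of the structure-based predictor.</p></div>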
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Encoding sequence profile information</head><p>We derive for each mutation: the frequency of the wild-type residue, the frequency of the mutated residue, the number of totally and locally aligned sequences and a conservation index (CI) for the position at hand: the more functionally important a residue is, the more it is conserved over evolution <ref type="bibr">[3]</ref>. The conservation index is calculated as:</p><p>where f_a(i) is the relative frequency of residue a at mutated position i and f_a is the overall frequency of the same residue in the alignment. The sequence profile is computed from the output of the program <ref type="bibr">[4]</ref> run on the uniref90 database (Oct 2009) (E-value threshold = 10^-9, number of runs = 1).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Functional based score</head><p>The Gene Ontology log-odds score (LGO) provides information about the correlation between a given mutation type (disease-related or neutral) and the protein function. The annotation data come from the GO Database (version Mar 2010) and were retrieved from the web resource hosted at the European Bioinformatics Institute (EBI). To calculate the LGO, we first derived the GO terms from all three branches (molecular function, biological process and cellular component) for all the proteins in our dataset. For each annotated term the appropriate ontology tree was traversed upward to retrieve all the parent terms with the GO-TermFinder tool (http://search.cpan.org/dist/GO-TermFinder/) <ref type="bibr">[5]</ref>, counting each GO term only once. The log-odds score associated with each protein is calculated as:</p><p>LGO = \sum_{GO} \log_2 \left[ \frac{f_{GO}(D)}{f_{GO}(N)} \right]</p><p>where f_GO is the frequency of occurrence of a given GO term for the disease-related (D) and neutral (N) mutations, adding one pseudo-count to each class. To prevent overfitting, the LGO scores are evaluated with f_GO values computed over the training sets, without including the GO term counts of the corresponding test set.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Support Vector Machine software</head><p>The LIBSVM package (http://www.csie.ntu.edu.tw/~cjlin/libsvm/) has been used for the SVM implementation <ref type="bibr">[6]</ref>. The selected SVM kernel is a Radial Basis Function (RBF) kernel K(x_i, x_j) = exp(-γ ||x_i - x_j||^2), and the γ and C parameters are optimized with a grid search. After input rescaling the best parameter values are C=8 and γ=0.03125.</p></div>
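<div xmlns="http://www.tei-c.org/ns/1.0"><p>The RBF kernel above can be evaluated directly, as in the following Python sketch. This is plain Python for illustration and does not call LIBSVM; the constant and function names are hypothetical. The values of C and γ are those reported above; note that they are exact powers of two (8 = 2^3 and 0.03125 = 2^-5), which is consistent with, but does not prove, a search over a power-of-two grid.</p>
<p><![CDATA[
```python
import math
from typing import Sequence

BEST_C = 8.0          # value reported in the text (2**3)
BEST_GAMMA = 0.03125  # value reported in the text (2**-5)


def rbf_kernel(x_i: Sequence[float], x_j: Sequence[float],
               gamma: float = BEST_GAMMA) -> float:
    """K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    if len(x_i) != len(x_j):
        raise ValueError("input vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * squared_distance)
```
]]></p>
<p>The kernel equals 1 for identical input vectors and decays toward 0 as the squared distance grows; with γ=0.03125 it falls to exp(-1) at a squared distance of 32.</p></div>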
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Statistical indexes for accuracy measure</head><p>The prediction accuracy is scored with several measures. In this paper the efficiency of our predictors has been scored using the following statistical indexes. The overall accuracy is: Q2 = P/N [3] where P is the total number of correctly predicted mutations and N is the total number of mutations. The Matthews correlation coefficient C is defined as:</p><p>where D is the normalization factor: where p(s) and o(s) are the same as in Equation <ref type="formula">5</ref> (ranging from 0 to 1).</p><p>For each prediction a reliability index (RI) is calculated as follows: RI = 20 · |O(D) - 0.5| [8] where O(D) is the SVM output. Other standard scoring measures, such as the area under the ROC curve (AUC) and the true positive rate (TPR = Q(s)) at 10% False Positive Rate (FPR = 1 - P(s)), are also computed <ref type="bibr">[7]</ref>.</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">dbSNP: the NCBI database of genetic variation</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">T</forename><surname>Sherry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">H</forename><surname>Ward</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kholodov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Baker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Phan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">M</forename><surname>Smigielski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Sirotkin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Res</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="308" to="311" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Characterization of single-nucleotide polymorphisms in coding regions of human genes</title>
		<author>
			<persName><forename type="first">M</forename><surname>Cargill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Altshuler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ireland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sklar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Ardlie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Patil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shaw</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">R</forename><surname>Lane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">P</forename><surname>Lim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kalyanaraman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nat Genet</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="231" to="238" />
			<date type="published" when="1999">1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Annotating single amino acid polymorphisms in the UniProt/Swiss-Prot knowledgebase</title>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">L</forename><surname>Yip</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Famiglietti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">D</forename><surname>Duek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">P</forename><surname>David</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gateau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bairoch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Hum Mutat</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="361" to="366" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">D</forename><surname>Thomas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kejariwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Campbell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Mi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Diemer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Ladunga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ulitsky-Lazareva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Muruganujan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rabkin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Res</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="334" to="341" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">SNPs, protein structure, and disease</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Moult</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Hum Mutat</title>
		<imprint>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="263" to="270" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">SNAP predicts effect of mutations on protein function</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Bromberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Yachdav</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Rost</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="issue">20</biblScope>
			<biblScope unit="page" from="2397" to="2398" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Functional annotations improve the predictive score of human disease-related mutations in proteins</title>
		<author>
			<persName><forename type="first">R</forename><surname>Calabrese</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Capriotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Fariselli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">L</forename><surname>Martelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Casadio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Hum Mutat</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="1237" to="1244" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Use of estimated evolutionary strength at the codon level improves the prediction of disease-related protein mutations in humans</title>
		<author>
			<persName><forename type="first">E</forename><surname>Capriotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Arbiza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Casadio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dopazo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Dopazo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Marti-Renom</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Hum Mutat</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="198" to="204" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information</title>
		<author>
			<persName><forename type="first">E</forename><surname>Capriotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Calabrese</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Casadio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="issue">22</biblScope>
			<biblScope unit="page" from="2729" to="2734" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure</title>
		<author>
			<persName><forename type="first">E</forename><surname>Capriotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Fariselli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Casadio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Res</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="W306" to="W310" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
	<note>Web Server issue</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations</title>
		<author>
			<persName><forename type="first">R</forename><surname>Guerois</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">E</forename><surname>Nielsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Serrano</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J Mol Biol</title>
		<imprint>
			<biblScope unit="volume">320</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="369" to="387" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">LS-SNP: large-scale annotation of coding non-synonymous SNPs based on multiple information sources</title>
		<author>
			<persName><forename type="first">R</forename><surname>Karchin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Diekhans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kelly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">J</forename><surname>Thomas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Pieper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Eswar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Haussler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sali</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="issue">12</biblScope>
			<biblScope unit="page" from="2814" to="2820" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Automated inference of molecular mechanisms of disease from amino acid substitutions</title>
		<author>
			<persName><forename type="first">B</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">G</forename><surname>Krishnan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">E</forename><surname>Mort</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Xin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">K</forename><surname>Kamati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">N</forename><surname>Cooper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">D</forename><surname>Mooney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Radivojac</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="issue">21</biblScope>
			<biblScope unit="page" from="2744" to="2750" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Predicting deleterious amino acid substitutions</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">C</forename><surname>Ng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Henikoff</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Genome Res</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="863" to="874" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Human non-synonymous SNPs: server and survey</title>
		<author>
			<persName><forename type="first">V</forename><surname>Ramensky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bork</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sunyaev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Res</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="issue">17</biblScope>
			<biblScope unit="page" from="3894" to="3900" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">A three-state prediction of single point mutations on protein stability changes</title>
		<author>
			<persName><forename type="first">E</forename><surname>Capriotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Fariselli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Rossi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Casadio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BMC Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page">S6</biblScope>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
	<note>Suppl</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">S</forename><surname>Wong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goldman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Nielsen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Genetics</title>
		<imprint>
			<biblScope unit="volume">168</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="1041" to="1051" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Deleterious SNP prediction: be mindful of your training data!</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Care</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Needham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J</forename><surname>Bulpitt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">R</forename><surname>Westhead</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="664" to="672" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data</title>
		<author>
			<persName><forename type="first">H</forename><surname>Berman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Henrick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Nakamura</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Markley</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Res</title>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="page" from="D301" to="D303" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
	<note>Database issue</note>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">F</forename><surname>Altschul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">L</forename><surname>Madden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">A</forename><surname>Schaffer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Miller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">J</forename><surname>Lipman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Res</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="issue">17</biblScope>
			<biblScope unit="page" from="3389" to="3402" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features</title>
		<author>
			<persName><forename type="first">W</forename><surname>Kabsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Sander</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Biopolymers</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="issue">12</biblScope>
			<biblScope unit="page" from="2577" to="2637" />
			<date type="published" when="1983">1983</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information</title>
		<author>
			<persName><forename type="first">E</forename><surname>Capriotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Calabrese</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Casadio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="issue">22</biblScope>
			<biblScope unit="page" from="2729" to="2734" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">AL2CO: calculation of positional conservation in a protein sequence alignment</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">V</forename><surname>Grishin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="700" to="712" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">F</forename><surname>Altschul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">L</forename><surname>Madden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">A</forename><surname>Schaffer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Miller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">J</forename><surname>Lipman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Res</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="issue">17</biblScope>
			<biblScope unit="page" from="3389" to="3402" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">GO::TermFinder--open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">I</forename><surname>Boyle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Weng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gollub</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Jin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Botstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Cherry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sherlock</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">18</biblScope>
			<biblScope unit="page" from="3710" to="3715" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Training nu-support vector classifiers: theory and algorithms</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">C</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Lin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neural Comput</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="issue">9</biblScope>
			<biblScope unit="page" from="2119" to="2147" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Assessing the accuracy of prediction algorithms for classification: an overview</title>
		<author>
			<persName><forename type="first">P</forename><surname>Baldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Brunak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chauvin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">A</forename><surname>Andersen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Nielsen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="412" to="424" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
