<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Naive Features for Sentiment Analysis on Mexican Touristic Opinions Texts</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Gabriela</forename><surname>Carmona-Sánchez</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Benemérita Universidad Autónoma de Puebla (BUAP)</orgName>
								<address>
									<postCode>72000</postCode>
									<settlement>Puebla</settlement>
									<country key="MX">Mexico</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ángel</forename><surname>Carmona</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Benemérita Universidad Autónoma de Puebla (BUAP)</orgName>
								<address>
									<postCode>72000</postCode>
									<settlement>Puebla</settlement>
									<country key="MX">Mexico</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Miguel</forename><forename type="middle">Á</forename><surname>Álvarez-Carmona</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Centro de Investigación Científica y de Educación Superior de Ensenada</orgName>
								<orgName type="institution">Unidad de Transferencia Tecnológica Tepic</orgName>
								<address>
									<addrLine>CICESE-UT3</addrLine>
									<postCode>63173</postCode>
									<settlement>Nayarit</settlement>
									<country key="MX">Mexico</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="institution">Consejo Nacional de Ciencia y Tecnología (Conacyt)</orgName>
								<address>
									<postCode>03940</postCode>
									<settlement>CDMX</settlement>
									<country key="MX">Mexico</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Naive Features for Sentiment Analysis on Mexican Touristic Opinions Texts</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">16EE2EFEFA59DF5148ADC414F8D95616</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T00:24+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Naive features</term>
					<term>Sentiment analysis</term>
					<term>Mexican tourist texts</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper presents a simple approach that extracts naive features to represent and classify tourists' opinions about Mexican places, developed to participate in the Rest-Mex 2021 evaluation forum. The proposed approach extracts 15 simple features. Several classification algorithms (SVM, KNN, Decision Tree, Random Forest, and Naive Bayes) were then used to evaluate the quality of these features. A weighting scheme was also proposed to select the best combination of algorithm and features; under this scheme, the best algorithm for this feature set was KNN with seven neighbors. The best features were those related to the length of words and characters and the number of stop words. With this approach, an MAE of 0.76 was obtained, which placed 10th out of 15 teams; considering the simplicity of the solution, this is an acceptable result.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>In recent years, tourist texts have become important in artificial intelligence research because of the advantages that can be obtained from analyzing them. One of these is the analysis of the sentiment that tourists express in writing on digital platforms such as TripAdvisor. In this way, it is possible to automatically determine the user's experience, decide whether a comment is positive or negative and, with this information, identify improvements that can enhance the experience of other tourists over time.</p><p>This task belongs to natural language processing, specifically to sentiment analysis, which determines whether the author of a text expresses a positive or negative opinion about a product or service received. Some variants of the task also determine whether the opinion is neutral; others go further and predict a value on a numerical scale between 0 and N, where 0 is the most negative and N the most positive <ref type="bibr" target="#b7">[8]</ref>.</p><p>In this way, the sentiment analysis task can be seen as an automatic classification task where the instances are texts, and the class is the text polarity.</p><p>Typically, various textual representations are used for this task, such as n-grams, dictionaries, and embeddings, among others, which are used to train and test classifiers and observe their performance. However, there are scenarios where it is critical that few computational resources are used, both in time and memory, because of the limitations of specific tasks, for example, when implementing solutions for IoT devices <ref type="bibr" target="#b4">[5]</ref>.</p><p>In this work, we propose to study the scope and effectiveness of features that simply describe the text to be analyzed. Because of their simplicity, we call these features Naive Features.</p><p>To test these features, we use the data set released for the Rest-Mex 2021 evaluation forum <ref type="bibr" target="#b1">[2]</ref>. For this edition, a corpus of texts from tourists who visited Guanajuato, Mexico, and its attractions was released. In this way, the effectiveness of these features can be tested on texts in Spanish, since one of their advantages is that they are independent of the language.</p><p>The rest of the document is organized as follows. Section 2 describes the methodology followed in this work. Section 3 presents the results and their analysis. Finally, Section 4 presents the conclusions of this work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Methodology</head><p>This section presents the data set used in the experiments and the methodology proposed to represent the tourist texts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Data set</head><p>The sentiment analysis task on tourism texts, proposed this year within the Rest-Mex 2021 evaluation forum, consists of predicting a class for each review provided in the evaluation set. The available classes are integers in the range [1, 5]. Reviews were taken from the TripAdvisor website and were written by tourists who evaluated some of the emblematic places in Guanajuato, Mexico. It is essential to mention that the whole set is in Spanish, being the first data set with these features available for evaluation.</p><p>The forum organizers released two data sets: one for training and one for evaluation. The training set consists of 5197 opinions, each with the 9 pieces of information described below:</p><p>-Index: The index of each opinion.</p><p>-Title: The title that the tourist gave to the opinion.</p><p>-Opinion: The opinion expressed by the tourist.</p><p>-Place: The tourist place that the tourist visited and to which the opinion is directed.</p><p>-Gender: The gender of the tourist.</p><p>-Age: The age of the tourist at the time of issuing the opinion.</p><p>-Country: The country of origin of the tourist.</p><p>-Date: The date the opinion was issued.</p><p>-Label: The label representing the polarity of the opinion: [1, 2, 3, 4, 5].</p><p>The training set classes are unbalanced, as Table <ref type="table" target="#tab_0">1</ref> shows. Finally, the test data set contains 2216 rows with the same information as the training set, except for the class label.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Proposed Approaches</head><p>To address the sentiment analysis task on tourist data, we propose simple features that capture information relevant to the polarity of an opinion and that are quick to compute and represent. The aim is to offer an option for applications restricted in time or memory (such as IoT solutions) that cannot use approaches which, although highly effective, can be slow or computationally expensive. The features have the additional advantage of being language-independent.</p><p>Given a text in the data set, its representation is given by the following proposed features:</p><p>-F1: Number of capital letters in the opinion</p><p>-F2: The length of the longest word in the opinion</p><p>-F3: The average word length in the opinion</p><p>-F4: Number of words in the opinion</p><p>-F5: Number of characters in the opinion</p><p>-F6: The ratio between the number of different words and total words in the opinion</p><p>-F7: The number of digits in the opinion</p><p>-F8: The ratio of the number of stop words and total words in the opinion</p><p>-F9: Number of punctuation marks in the opinion</p><p>-F10: Number of stop words in the opinion</p><p>-F11: Number of characters in the opinion without stop words</p><p>-F12: The ratio between the number of different words and total words in the opinion without stop words</p><p>The information available for each opinion is also added as:</p><p>-F13: The gender of the person who gave the opinion</p><p>-F14: The age of the person who gave the opinion</p><p>-F15: The country of the person who gave the opinion</p><p>For feature F13, which refers to the gender of the opinion's author, the value is 0 for a man, 1 for a woman, and 2 if the gender is not known. For feature F15, the country is coded as 0 if the person is from Mexico and 1 otherwise.</p><p>Each opinion is transformed into a vector representation of dimension 15, which is easy and fast to compute and independent of the language, so data sets in different languages can be evaluated.</p><p>The 10-fold cross-validation approach was used to classify the data set <ref type="bibr" target="#b8">[9]</ref>. For each partition, the following classifiers were applied:</p><p>-Support Vector Machines (SVM) <ref type="bibr" target="#b5">[6]</ref></p><p>-k-Nearest Neighbor (KNN) with k ∈ {1, 3, 5, 7} <ref type="bibr" target="#b2">[3]</ref></p><p>-Decision Tree (DT) <ref type="bibr" target="#b3">[4]</ref></p><p>-Random Forest (RF) <ref type="bibr" target="#b6">[7]</ref></p><p>-Naive Bayes (NB) <ref type="bibr" target="#b0">[1]</ref></p><p>Accuracy, F-measure, and MAE were used as evaluation measures; MAE is the measure that the organizers take as official.</p></div>
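<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a rough illustration of the protocol described above, the following sketch runs k-fold cross-validation with a k-nearest-neighbor majority vote over feature vectors and reports the MAE. It is not the authors' implementation: the Euclidean distance, the absence of feature scaling, the tie-breaking rule, the shuffling seed, and the function names knn_predict and cross_val_mae are all assumptions made for this example.</p>
```python
from collections import Counter
import math
import random


def knn_predict(train_x, train_y, query, k=7):
    """Majority vote among the k training vectors closest to query."""
    # Sort training indices by Euclidean distance to the query vector.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    # Assumed tie-break: prefer the smallest label so the result is deterministic.
    best = max(votes.values())
    return min(label for label, count in votes.items() if count == best)


def cross_val_mae(xs, ys, k=7, folds=10, seed=0):
    """MAE of KNN where every opinion is predicted once, while held out."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    total_error = 0
    for fold in range(folds):
        test_idx = indices[fold::folds]  # every folds-th shuffled index
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        for i in test_idx:
            total_error += abs(knn_predict(train_x, train_y, xs[i], k) - ys[i])
    return total_error / len(xs)
```
<p>Here xs would hold one 15-dimensional vector per opinion and ys the labels from 1 to 5. Because features such as the number of characters have a much larger range than the ratios, a real experiment would probably need to decide whether to scale the features first; the text does not say which choice was made.</p></div>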
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Results</head><p>In this section, the results obtained on the training partition are presented. Afterward, the model chosen to be evaluated on the test partition is presented together with its result.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Training data set results</head><p>Table <ref type="table">2</ref> shows the results of each classification algorithm for the representation described in Section 2. The table reports accuracy (Acc), macro F-measure (F), the F-measure of each class (F CN, where N is one of the 5 classes), and mean absolute error (MAE).</p><p>In the results, it can be seen that SVM obtains the best results for accuracy and for class 5, which is the majority class; however, it obtains 0 for all other classes. This means that although it performs well on MAE, it only ever predicts one class. On the other hand, KNN obtains the best macro F-measure. Class 1 obtains its best result with KNN-1, class 2 with NB, class 3 with KNN-3, and class 4 with DT. The best MAE is obtained with RF; however, this algorithm cannot capture any instance of class 1.</p></div>
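<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 2. Training data set results</head><table><row><cell>Algorithm</cell><cell>Acc</cell><cell>F</cell><cell>F C1</cell><cell>F C2</cell><cell>F C3</cell><cell>F C4</cell><cell>F C5</cell><cell>MAE</cell></row><row><cell>SVM</cell><cell>51,74</cell><cell>0,13</cell><cell>0</cell><cell>0</cell><cell>0</cell><cell>0</cell><cell>0,68</cell><cell>0,71</cell></row><row><cell>KNN-1</cell><cell>38,21</cell><cell>0,21</cell><cell>0,04</cell><cell>0,05</cell><cell>0,15</cell><cell>0,29</cell><cell>0,51</cell><cell>0,87</cell></row><row><cell>KNN-3</cell><cell>38,59</cell><cell>0,21</cell><cell>0,02</cell><cell>0,04</cell><cell>0,19</cell><cell>0,25</cell><cell>0,54</cell><cell>0,91</cell></row><row><cell>KNN-5</cell><cell>41,96</cell><cell>0,21</cell><cell>0</cell><cell>0,01</cell><cell>0,18</cell><cell>0,3</cell><cell>0,55</cell><cell>0,77</cell></row><row><cell>KNN-7</cell><cell>44,17</cell><cell>0,21</cell><cell>0,02</cell><cell>0,01</cell><cell>0,16</cell><cell>0,28</cell><cell>0,59</cell><cell>0,74</cell></row><row><cell>DT</cell><cell>38,57</cell><cell>0,2</cell><cell>0,01</cell><cell>0,04</cell><cell>0,15</cell><cell>0,3</cell><cell>0,52</cell><cell>0,87</cell></row><row><cell>RF</cell><cell>48,64</cell><cell>0,19</cell><cell>0</cell><cell>0,02</cell><cell>0,07</cell><cell>0,21</cell><cell>0,65</cell><cell>0,7</cell></row><row><cell>NB</cell><cell>47,46</cell><cell>0,19</cell><cell>0,02</cell><cell>0,08</cell><cell>0,01</cell><cell>0,21</cell><cell>0,64</cell><cell>0,79</cell></row></table><p>Table <ref type="table" target="#tab_1">3</ref> shows the best features under the information gain measure. Only the features that obtained a value greater than 0,1 appear in this table. These results give evidence that the best feature for this task is the number of words in the opinion and the second is the ratio of stop words to total words, while the number of digits, the average word length, the ratio of different words to total words without stop words, and the length of the longest word complete the list.</p><p>Figure <ref type="figure" target="#fig_1">1</ref> shows the decision tree obtained when only the word count feature is used. Ten opinions from class 1 (Bad label) and ten from class 5 (Good label), chosen randomly, were used to build this decision tree. For this sample, negatively valued opinions tend to have more words, which suggests that dissatisfied tourists use more words to express their dissatisfaction; this could be the reason why this feature is important in this task.</p><p>Although the MAE results can be considered good for all the algorithms, the F-measure results are low, which is a consequence of the data imbalance. This makes choosing a classification model difficult.</p></div>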
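<div xmlns="http://www.tei-c.org/ns/1.0"><p>The information gain ranking can be illustrated with a small sketch. The text does not say how the numeric features were discretized, so the equal-width bins mentioned in the note after the code, like the function names entropy and information_gain, are assumptions made for this example rather than the procedure actually used.</p>
```python
from collections import Counter, defaultdict
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_bins, labels):
    """Entropy of the labels minus the weighted entropy inside each feature bin."""
    groups = defaultdict(list)
    for bin_id, label in zip(feature_bins, labels):
        groups[bin_id].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```
<p>For instance, word counts could be binned with [count // 50 for count in word_counts] before calling information_gain; the bin width of 50 is arbitrary. A feature whose bins separate the classes perfectly reaches the entropy of the labels, while a feature whose bins all contain the same class mix scores 0.</p></div>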
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Test data set results</head><p>To choose the best model among those presented in Table <ref type="table">2</ref>, we propose a weighting scheme that determines which algorithm offers the best balance between the different results (accuracy, F-measure of each class, and MAE).</p><p>A linear combination measures the quality of each of the results obtained, as presented in equation 1.</p><formula xml:id="formula_0">Q = C_1 \cdot Acc + C_2 \cdot F + C_3 \cdot F_1 + C_4 \cdot F_2 + C_5 \cdot F_3 + C_6 \cdot F_4 + C_7 \cdot F_5 + C_8 \cdot MAE \quad (1)</formula><p>Here C_i represents the importance of each variable in the equation, F_j represents the F-measure for class j, and Acc, F, and MAE represent the values of accuracy, macro F-measure, and mean absolute error, respectively.</p><p>To choose the value of each constant C_i, the following weights are proposed:</p><p>-C_1: Since accuracy is not an informative measure when the collection is unbalanced, it is given a weight of only 1.</p><p>-C_2: A high macro F-measure is desired, so it is given a weight of 10.</p><p>-C_3 to C_7: The more opinions a class has, the easier it is to classify, which means that classes with little data are harder. To reward well-classified elements of minority classes, the weight of F_j is 100 − D(j), where D(j) is the percentage of opinions that belong to class j.</p><p>-C_8: Since MAE is the measure the organizers use to rank the results, it is given the greatest weight, which must be negative because the ideal is to get as close to zero as possible on this measure. Thus the value of this constant is -100.</p><p>Table <ref type="table" target="#tab_2">4</ref> shows the results of equation 1. The algorithm with the best value of Q is KNN-7, which does not stand out on any individual measure; however, it obtains the best balance. On the other hand, SVM, which obtained good results for accuracy, F-measure for class 5, and MAE, obtains the worst value of Q.</p><p>For this reason, the model generated by KNN-7 was sent to the organizers to be evaluated in the Rest-Mex 2021 evaluation forum.</p></div>
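<div xmlns="http://www.tei-c.org/ns/1.0"><p>The weighting scheme of equation 1 can be sketched in a few lines. The weights 1, 10, 100 − D and -100 come from the text; the function name quality and, above all, the class percentages in HYPOTHETICAL_PCT are assumptions, since the class distribution of Table 1 is not reproduced in this document. The value computed below is therefore only illustrative and should not be expected to match Table 4.</p>
```python
def quality(acc, macro_f, class_f, mae, class_pct):
    """Linear combination Q of equation 1 with the weights proposed in the text."""
    c1, c2, c8 = 1, 10, -100
    # Each per-class F-measure is weighted by 100 - D(j): rarer classes weigh more.
    class_term = sum((100 - pct) * f for pct, f in zip(class_pct, class_f))
    return c1 * acc + c2 * macro_f + class_term + c8 * mae


# Placeholder class percentages for classes 1 to 5 (assumed, not the real distribution).
HYPOTHETICAL_PCT = [2, 3, 10, 33, 52]

# SVM row of Table 2: Acc 51,74, F 0,13, per-class F (0, 0, 0, 0, 0,68), MAE 0,71.
svm_q = quality(51.74, 0.13, [0, 0, 0, 0, 0.68], 0.71, HYPOTHETICAL_PCT)
```
<p>Accuracy enters the sum as a percentage while the F-measures and MAE stay on their native scales, which is why MAE needs a weight of -100 to dominate the ranking.</p></div>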
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Official results</head><p>For the official results, the proposed model obtained the following:</p><p>-Accuracy: 45,71</p><p>-Macro F-measure: 0,17</p><p>-MAE: 0,76</p><p>With these results, the proposed approach obtained 10th place out of 15 teams. It also obtained better F-measure results than the baseline and was capable of classifying instances in three of the five classes.</p></div>
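<div xmlns="http://www.tei-c.org/ns/1.0"><p>For reference, the three measures reported above can be computed from true and predicted labels as in the following sketch. The function names are chosen for this example, and averaging the macro F-measure over all five classes, with a score of 0 for a class that is neither present nor predicted, is an assumption about how the official evaluation handles such classes.</p>
```python
def accuracy(y_true, y_pred):
    """Percentage of opinions whose predicted label equals the true label."""
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return 100 * hits / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average distance between true and predicted labels on the 1 to 5 scale."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f(y_true, y_pred, classes=(1, 2, 3, 4, 5)):
    """Unweighted mean of the per-class F-measure."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted = sum(p == c for p in y_pred)
        actual = sum(t == c for t in y_true)
        # F1 = 2*TP / (predicted + actual); defined as 0 when the class never occurs.
        scores.append(2 * tp / (predicted + actual) if predicted + actual else 0.0)
    return sum(scores) / len(scores)
```
<p>Unlike accuracy, MAE penalizes a prediction by how far it lands from the true label, so confusing class 5 with class 4 costs 1 while confusing it with class 1 costs 4.</p></div>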
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusions</head><p>In this work, a study was presented to measure the performance of naive features on the sentiment analysis problem for Mexican tourist texts.</p><p>This solution represents each tourist opinion with 15 simple features that can be extracted very quickly. This simplicity makes the solution suitable for applications with severe limits on space and memory, for example on IoT devices, which can use some of these features to obtain acceptable performance with a shorter response time.</p><p>When evaluating this solution on the Rest-Mex 2021 corpus, an MAE of 0,76 was obtained, whereas the best result in the competition was 0,47. Considering that the maximum possible error is 4 (for example, when the true label is 5 and the prediction is 1), the difference of 0,29 represents 7,25 % of the possible error, which is an acceptable loss considering the simplicity of the solution.</p><p>Evidence is given that the number of words in the opinion carries much information about polarity. The length of the words is also an essential source of information that a classifier can use. Other important features for this task are those related to stop words. Finally, for this task and this data set, demographic features such as the gender, age, and place of origin of the author of the opinion do not seem to provide relevant information for the classification.</p><p>As future work, we propose to apply this solution to a multilingual collection in order to exploit its best property, which is that it is a language-independent solution.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>-</head><label></label><figDesc>F1: Number of capital letters in the opinion -F2: The length of the longest word in the opinion -F3: The average words length in the opinion -F4: Number of words in the opinion -F5: Number of characters in the opinion -F6: The ratio between the number of different words and total words in the opinion -F7: The number of digits in the opinion -F8: The ratio of the number of stop words and total words in the opinion -F9: Number of punctuation marks in the opinion -F10: Number of stop words in the opinion -F11: Number of characters in the opinion without stop words -F12: The ratio between the number of different words and total words in the opinion without stop words</figDesc></figure>
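<div xmlns="http://www.tei-c.org/ns/1.0"><p>The twelve text-derived features F1 to F12 listed above can be computed in a single pass over an opinion, as the following sketch suggests. Several details are assumptions because the text does not specify them: the whitespace tokenization, the punctuation set, the tiny placeholder stop-word list STOP_WORDS, counting spaces in F5, and reading F11 as the total length of the non-stop words. The function name naive_features is also chosen for this example.</p>
```python
import string

# Hypothetical, deliberately tiny stop-word list; a real run would use a full Spanish list.
STOP_WORDS = {"el", "la", "los", "las", "de", "y", "en", "que", "un", "una", "es", "muy"}
# ASCII punctuation plus the Spanish opening marks (assumed punctuation set).
PUNCTUATION = string.punctuation + "¡¿"


def naive_features(opinion, stop_words=STOP_WORDS):
    """Return F1 to F12 for one opinion as a dict keyed by feature name."""
    words = [w.strip(PUNCTUATION) for w in opinion.split()]
    words = [w for w in words if w]
    lowered = [w.lower() for w in words]
    content = [w for w in lowered if w not in stop_words]
    n_words = max(len(words), 1)      # avoid dividing by zero on empty opinions
    n_content = max(len(content), 1)
    n_stop = len(lowered) - len(content)
    return {
        "F1": sum(ch.isupper() for ch in opinion),
        "F2": max((len(w) for w in words), default=0),
        "F3": sum(len(w) for w in words) / n_words,
        "F4": len(words),
        "F5": len(opinion),
        "F6": len(set(lowered)) / n_words,
        "F7": sum(ch.isdigit() for ch in opinion),
        "F8": n_stop / n_words,
        "F9": sum(ch in PUNCTUATION for ch in opinion),
        "F10": n_stop,
        "F11": sum(len(w) for w in content),
        "F12": len(set(content)) / n_content,
    }
```
<p>Apart from the stop-word list, nothing in this function depends on the language of the opinion, which is the property the approach relies on.</p></div>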
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Decision tree for the number of words feature on 20 random opinions</figDesc><graphic coords="6,165.94,144.18,283.48,467.73" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Distribution of the class on the Rest-Mex 2021 training data set.</figDesc><table><row><cell>https://sites.google.com/cicese.edu.mx/rest-mex-2021</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 3 .</head><label>3</label><figDesc>Information gain for the best features</figDesc><table><row><cell cols="2">Feature Description</cell><cell>Information gain</cell></row><row><cell>F4</cell><cell>Number of words</cell><cell>0,307</cell></row><row><cell>F8</cell><cell>Ratio of stop words and total words</cell><cell>0,299</cell></row><row><cell>F7</cell><cell>Number of digits</cell><cell>0,272</cell></row><row><cell>F3</cell><cell>Average words length</cell><cell>0,149</cell></row><row><cell>F12</cell><cell>Ratio of different words and total words without stop words</cell><cell>0,143</cell></row><row><cell>F2</cell><cell>Length of the longest word</cell><cell>0,130</cell></row></table><note>NB, class 3 with KNN-3, and class 4 with DT. The best result of mae is obtained with RF; however, this algorithm cannot capture any instance of class 1.</note></figure>
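<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 4 .</head><label>4</label><figDesc>Measure of quality Q</figDesc><table><row><cell>Algorithm</cell><cell>SVM</cell><cell>KNN-1</cell><cell>KNN-3</cell><cell>KNN-5</cell><cell>KNN-7</cell><cell>DT</cell><cell>RF</cell><cell>NB</cell></row><row><cell>Q</cell><cell>14,63</cell><cell>19,63</cell><cell>15,20</cell><cell>30,74</cell><cell>36,73</cell><cell>17,11</cell><cell>34,22</cell><cell>26,17</cell></row></table></figure>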
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Sentiment analysis of online food reviews using big data analytics</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">M</forename><surname>Ahmed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Javed Awan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">S</forename><surname>Khan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Yasin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">M</forename><surname>Shehzad</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Elementary Education Online</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="827" to="836" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Overview of rest-mex at iberlef 2021: Recommendation system for text mexican tourism</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Á</forename><surname>Álvarez-Carmona</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Aranda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Arce-Cárdenas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Fajardo-Delgado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Guerrero-Rodríguez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">P</forename><surname>López-Monroy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Martínez-Miranda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Pérez-Espinosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rodríguez-González</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Procesamiento del Lenguaje Natural</title>
		<imprint>
			<biblScope unit="volume">67</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A new covid-19 detection method from human genome sequences using cpg island features and knn classifier</title>
		<author>
			<persName><forename type="first">H</forename><surname>Arslan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Arslan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Engineering Science and Technology, an International Journal</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="839" to="847" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Classification based on decision tree algorithm for machine learning</title>
		<author>
			<persName><forename type="first">B</forename><surname>Charbuty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Abdulazeez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Applied Science and Technology Trends</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">01</biblScope>
			<biblScope unit="page" from="20" to="28" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Vulnerabilities and limitations of mqtt protocol used between iot devices</title>
		<author>
			<persName><forename type="first">D</forename><surname>Dinculeană</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Cheng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Applied Sciences</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page">848</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Review on landslide susceptibility mapping using support vector machines</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zhao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Catena</title>
		<imprint>
			<biblScope unit="volume">165</biblScope>
			<biblScope unit="page" from="520" to="529" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Thyroid disorder analysis using random forest classifier</title>
		<author>
			<persName><forename type="first">S</forename><surname>Mishra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tadesse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dash</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jena</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Ranjan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Intelligent and cloud computing</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="385" to="390" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Sentiment analysis</title>
		<author>
			<persName><forename type="first">S</forename><surname>Mukherjee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ML. NET Revealed</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="113" to="127" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">A novel soft computing model (gaussian process regression with k-fold cross validation) for daily and monthly solar radiation forecasting (part: I)</title>
		<author>
			<persName><forename type="first">A</forename><surname>Rohani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Taki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Abdollahpour</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Renewable Energy</title>
		<imprint>
			<biblScope unit="volume">115</biblScope>
			<biblScope unit="page" from="411" to="422" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
