<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">HAHA at FakeDeS 2021: A Fake News Detection Method Based on TF-IDF and Ensemble Machine Learning</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Kun</forename><surname>Li</surname></persName>
							<email>2106967047@qq.com</email>
							<affiliation key="aff0">
								<orgName type="department">School of Information Science and Engineering</orgName>
								<orgName type="institution">Yunnan University</orgName>
								<address>
									<settlement>Yunnan</settlement>
									<country key="CN">P.R. China</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">HAHA at FakeDeS 2021: A Fake News Detection Method Based on TF-IDF and Ensemble Machine Learning</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">D98BFE9A4D4B57DAD31B04457B3DA00B</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T00:23+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Fake News Detection</term>
					<term>TF-IDF</term>
					<term>Machine Learning</term>
					<term>Ensemble Model</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper describes our participation in the FakeDeS <ref type="bibr" target="#b4">[5]</ref> task at IberLEF 2021: Fake News Detection in Spanish. For this task, we propose a method that combines classic TF-IDF feature extraction with a Stacking ensemble of weak classifiers. It not only analyzes the content of the news, but also uses additional information such as publishers and topics to improve the performance of our model. We used five machine learning models as base classifiers, achieved very competitive results on both the validation set and the test set, and took second place in the final evaluation phase.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Fake news is content that uses false information to deceive its audience in order to achieve a particular purpose. It fails to reflect events as they actually happened and contains false elements. The information provided by fake news is designed to manipulate people for different purposes: terrorism, political elections, advertising, satire, etc. In social networks, fake news spreads in seconds among thousands of people, and research has shown that misinformation spreads faster, farther, deeper, and more widely than true information <ref type="bibr" target="#b11">[12]</ref>, so it is necessary to develop tools to help control the amount of fake news on the network.</p><p>Earlier methods for detecting fake news mainly analyzed features from various sources, including the content of the text, user data and the form of news dissemination. They mainly distinguish true and false news through language features, such as writing style and special headlines <ref type="bibr" target="#b6">[7]</ref>, and vocabulary and syntactic analysis <ref type="bibr" target="#b9">[10]</ref>. In addition to language features <ref type="bibr" target="#b2">[3]</ref>, some studies have proposed classification schemes based on user features and time features <ref type="bibr" target="#b6">[7]</ref>. Recent fake news detection methods mainly use machine learning and deep learning techniques, with special attention to language-based methods <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b13">14,</ref><ref type="bibr" target="#b14">15]</ref>. Several studies use TF-IDF feature extraction for fake news detection and have achieved good results <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b12">13,</ref><ref type="bibr" target="#b1">2]</ref>.</p><p>A fake news detection system is designed to help users detect and filter potentially deceptive news. Predicting deliberately misleading news relies on the analysis of previously reviewed real and deceptive news, that is, an annotated corpus. Text is the main carrier of news information, and the study of news text helps to identify fake news effectively. The specific task is: given the text of a news item, determine whether it is real news or fake news. The evaluation of systems uses a new testing corpus containing news related to COVID-19 and news from other Ibero-American countries. This introduces two main challenges to the task: thematic and language variation. Our systems need to take into consideration that part of the testing corpus contains news on a topic that does not exist in the training corpus; likewise, another part of the testing corpus contains news in a variety of Spanish different from that of the training corpus. This paper proposes a fake news detection method based on TF-IDF and ensemble machine learning.</p><p>TF-IDF is simple and fast to compute, and it performs well on long texts. Section 2 introduces the corpus and analyzes the composition and distribution of the data. Section 3 introduces the methodology: data processing methods, feature extraction methods, the base models used, and the final ensemble model. The experimental setup and results are presented in Section 4. Finally, Section 5 outlines the conclusions and future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Corpus Description</head><p>The Spanish fake news corpus <ref type="bibr" target="#b8">[9]</ref> consists of news collected from several online sources: established newspaper websites, media company websites, special websites dedicated to verifying fake news, and websites designated by different journalists as sites that regularly publish fake news. All these articles are written in Mexican Spanish. The corpus contains 971 news items collected from January to July 2018, divided into a training set of 676 items and a test set of 295 items. Only two categories (True or Fake) are considered in the annotation of the corpus, and each item has the following fields:</p><p>-Category: Fake/ True.</p><p>-Topic: Science/ Sport/ Economy/ Education/ Entertainment/ Politics/ Health/ Security/ Society.</p><p>-Headline: The title of the news.</p><p>-Text: The complete text of the news.</p><p>-Link: The URL where the news was published.</p><p>Overall, the numbers of fake and real news items are fairly balanced, and within each topic the numbers of fake and real items are almost the same. However, the proportion of fake news differs greatly between websites: some websites publish almost exclusively fake news, while others publish only real news. This observation informs our feature extraction: we consider the impact of "Link" and "Topic" on the classification results in our experiments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Method and Technology</head><p>This section includes 4 parts: data preprocessing, feature extraction methods, classification models, and ensemble model methods.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Data Preprocessing</head><p>Data preprocessing is usually the first step in natural language processing tasks and is essential for good results. First, we extract the most informative part of Link, the website host name; the result looks like this: www.elruinaversal.com. We do this because our analysis of the news sources showed that almost all the news on some websites is fake. Each row of the data consists of Category, Topic, Source, Headline, Text and Link, and the contents of all these columns are merged into a new input. Finally, we clean the merged input, using regular expressions to remove links, special symbols, punctuation, etc. Before stop word removal, the longest text has length 2578 and the shortest 31. We used nltk to remove the stop words; afterwards, the longest text has length 1379 and the shortest 18. However, our experiments indicate that removing stop words reduces the performance of the model, which is discussed in Section 4. We therefore compare four data processing variants: 1) merge all columns + remove stop words; 2) merge all columns + keep stop words; 3) only Text + remove stop words; 4) only Text + keep stop words.</p></div>
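<div xmlns="http://www.tei-c.org/ns/1.0"><p>The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code used in the experiments: the function names, the regular expressions and the tiny stop word set are assumptions (the paper uses nltk's stop word list), and the sketch leaves the Category label out of the merged input, which is also an assumption.</p>
<p>
```python
import re
from urllib.parse import urlparse

# Tiny illustrative stop word set (assumption); the paper relies on nltk instead.
STOP_WORDS = {"el", "la", "los", "las", "de", "en", "y", "que", "un", "una"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
NON_WORD_PATTERN = re.compile(r"[^\w\s]|_")


def link_to_source(link):
    """Keep only the host part of a URL, e.g. 'www.example.com'."""
    parsed = urlparse(link if "//" in link else "//" + link)
    return parsed.netloc.lower()


def clean_text(text, remove_stop_words=False):
    """Remove links, punctuation and special symbols; optionally drop stop words."""
    text = URL_PATTERN.sub(" ", text)
    text = NON_WORD_PATTERN.sub(" ", text)
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return " ".join(tokens)


def build_input(row, merge_all=True, remove_stop_words=False):
    """Build one model input from a row given as a dict of column values."""
    if not merge_all:
        return clean_text(str(row.get("Text", "")), remove_stop_words)
    columns = ["Topic", "Source", "Headline", "Text"]
    merged = " ".join(str(row.get(c, "")) for c in columns)
    host = link_to_source(str(row.get("Link", "")))
    # The host is prepended after cleaning so that it survives as one token.
    cleaned = clean_text(merged, remove_stop_words)
    return (host + " " + cleaned).strip()
```
</p>
<p>The two keyword arguments of build_input correspond to the four processing variants listed above: merging or not merging the columns, and removing or keeping stop words.</p></div>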
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Feature Extraction</head><p>Content-based methods focus on extracting various features of fake news content, including knowledge-based and style-based features. This paper mainly uses two methods to extract text features: 1) LabelEncoder; 2) TF-IDF.</p><p>-LabelEncoder: We use Sklearn's LabelEncoder to encode categorical features, that is, to convert discrete numbers or text into integers between 0 and n-1, where n is the number of distinct values. We applied LabelEncoder to the Topic and Source features, and it proved very useful in the experiments.</p><p>-TF-IDF: Term Frequency-Inverse Document Frequency (TF-IDF) is a statistical method for evaluating the importance of a word to a document within a document set or corpus. The importance of a word is positively correlated with the number of times it appears in the document, and negatively correlated with how often it appears across the corpus. TF-IDF thus limits the influence of commonly used words and improves the relevance between keywords and documents. TF (term frequency) refers to the number of times a word appears in a document. It is usually normalized, defined as the number of times a word appears in the document divided by the total number of words in the document, which prevents the result from being biased towards long documents (the same word usually has a higher raw count in a long document than in a short one). IDF refers to the inverse document frequency: the fewer documents contain a word, the greater its IDF value, indicating that the word has a strong ability to discriminate between documents. TF-IDF extracts text features from Spanish news well. For texts several thousand in length, TF-IDF is better than RNNs and other neural network models at extracting features. TF-IDF also handles the challenge of language variation easily.</p></div>
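<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the two feature extraction steps concrete, the following sketch implements them in plain Python exactly as defined in the text: TF as the normalized count and IDF as log(N / df). The function names are hypothetical. Note that scikit-learn's TfidfVectorizer uses a smoothed IDF and L2 normalization by default, so its weights differ numerically from this textbook variant.</p>
<p>
```python
import math
from collections import Counter


def label_encode(values):
    """Map each distinct value to an integer in 0..n-1 (sorted order)."""
    classes = sorted(set(values))
    index = {value: i for i, value in enumerate(classes)}
    return [index[v] for v in values], classes


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    TF is the term count divided by the document length; IDF is
    log(N / df), where df is the number of documents containing the term.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```
</p>
<p>A word that occurs in every document receives weight zero under this definition, which is how TF-IDF suppresses very common words without an explicit stop word list.</p></div>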
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Base Classification Model</head><p>We use five basic weak classifiers as our base models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Ensemble Model</head><p>LightGBM <ref type="bibr" target="#b5">[6]</ref>: a fast, distributed, high-performance gradient boosting framework based on decision tree algorithms. It can be used for ranking, classification, regression, and many other machine learning tasks. LightGBM is an improvement of the GBDT algorithm <ref type="bibr" target="#b3">[4]</ref>. Instead of the traditional pre-sorting approach, LightGBM uses a histogram-based algorithm over the feature values. Weak learners are trained iteratively to obtain the final model, which trains well and is not prone to overfitting. The training method is GBDT: an algorithm that classifies or regresses data by using an additive model and continuously reducing the residuals generated during the training process. We choose the above LogisticRegression (LR), SGDClassifier (SGDC), PassiveAggressiveClassifier (PAC), RidgeClassifier (RC), and LinearSVC (LSVC) as the weak base models. Figure <ref type="figure" target="#fig_1">1</ref> shows how the Stacking method is used: all the trained base models make predictions on the entire training set, so each base model produces a classification prediction result. Each base model is trained using 5-fold cross-validation; we concatenate the prediction results of the base models and finally send all these features to the LightGBM model for training.</p></div>
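<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Stacking step can be illustrated with a small, self-contained sketch. It assumes the common out-of-fold variant of stacking, in which each base model predicts only the fold it was not trained on; the paper does not spell out this detail. The toy classifier and all function names are hypothetical stand-ins for the five scikit-learn models, and the final LightGBM meta-learner is not included.</p>
<p>
```python
def k_fold_indices(n_samples, n_folds=5):
    """Return n_folds interleaved lists of validation indices."""
    return [list(range(start, n_samples, n_folds)) for start in range(n_folds)]


class MeanThresholdClassifier:
    """Toy weak classifier (hypothetical): uses a single numeric feature.

    Each training split must contain both classes (labels 0 and 1).
    """

    def __init__(self, feature):
        self.feature = feature

    def fit(self, X, y):
        pos = [row[self.feature] for row, label in zip(X, y) if label == 1]
        neg = [row[self.feature] for row, label in zip(X, y) if label == 0]
        self.pos_mean = sum(pos) / len(pos)
        self.neg_mean = sum(neg) / len(neg)
        return self

    def predict(self, X):
        # Assign each row to the class whose training mean is closer.
        values = [row[self.feature] for row in X]
        return [
            1 if abs(v - self.neg_mean) > abs(v - self.pos_mean) else 0
            for v in values
        ]


def out_of_fold_predictions(make_model, X, y, n_folds=5):
    """Predict every sample with a model that never saw it during training."""
    oof = [None] * len(X)
    for val_idx in k_fold_indices(len(X), n_folds):
        held_out = set(val_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = make_model().fit(
            [X[i] for i in train_idx], [y[i] for i in train_idx]
        )
        for i, pred in zip(val_idx, model.predict([X[i] for i in val_idx])):
            oof[i] = pred
    return oof


def stack_features(model_factories, X, y, n_folds=5):
    """Concatenate the out-of-fold predictions of all base models column-wise."""
    columns = [
        out_of_fold_predictions(factory, X, y, n_folds)
        for factory in model_factories
    ]
    return [list(row) for row in zip(*columns)]
```
</p>
<p>The matrix returned by stack_features has one row per training sample and one column per base model; in the setting described above, a matrix of this kind would serve as the input to the final LightGBM model.</p></div>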
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experiments</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Experimental Setup</head><p>First, data processing is performed; the hyperparameters of the experiment are then as follows. Almost all base models use default parameters, and each base model is trained with 5-fold cross-validation; the LightGBM model is also trained with 5-fold cross-validation. The boosting type of the LightGBM model is GBDT; the learning rate is set to 0.01; the maximum number of iterations, num_boost_round, is set to 10000; the progress is displayed every 50 iterations. Finally, the model outputs the accuracy and the F1-macro score. The hyperparameters of each classifier are as follows: LR (random_state=1017, C=3), SGDC (random_state=1017, loss='log'), PAC (random_state=1017, C=2), RC (random_state=1017), LSVC (random_state=1017).</p></div>
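<div xmlns="http://www.tei-c.org/ns/1.0"><p>For reference, the two reported metrics can be computed as in the following sketch. It is a plain-Python illustration with hypothetical function names, not the evaluation script of the task: accuracy is the fraction of correct predictions, and F1-macro is the unweighted mean of the per-class F1 scores.</p>
<p>
```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_for_class(y_true, y_pred, label):
    """F1 score of one class, treating that class as the positive one."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = [f1_for_class(y_true, y_pred, label) for label in labels]
    return sum(scores) / len(scores)
```
</p>
<p>Because every class contributes equally to the mean, F1-macro penalizes a model that performs well on only one of the two classes, even when the classes are of similar size.</p></div>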
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Result</head><p>We evaluate the model using the F1 measure and accuracy on the "fake" class. The results of the base models and the ensemble model on the training data are shown in Table <ref type="table" target="#tab_0">1</ref>, where "Merge All" indicates that all the information is concatenated and "Stopwords" indicates that stop words are removed from the text. Accuracy and F1-macro are both obtained on the validation set. From Table <ref type="table" target="#tab_0">1</ref>, we can see that the performance of the ensemble model is always better than that of the base models, regardless of whether the columns are merged and whether stop words are removed. For the same model, the best results are obtained by applying both data processing methods at the same time: the highest accuracy, 91.1%, and the highest F1-macro score, 91.3%. At the same time, removing the stop words actually weakens the performance of the model, and merging all column information significantly improves the performance of the base models and also improves the ensemble model to a certain extent.</p><p>For the same base model, merging improves both the accuracy and the F1-macro score. Without merging, the accuracy of the model is significantly reduced, and removing stop words from the text reduces the performance of the model. Regardless of how the data is processed, the ensemble model is better than the base models in accuracy, and the F1-macro score of the ensemble model is higher with merging. The accuracy of the ensemble model is at least 2.6% and at most 7% higher than that of the base models, and the F1-macro score is improved by at least 2.1% and at most 6.8%. This shows that our ensemble learning model plays a very good role in improving performance. The more information the model receives as input, the better its performance, but too much data affects the efficiency of the model.</p><p>On the test data set, we only obtained two results: 1) not merging and removing stop words, 2) merging and not removing stop words. The F1-macro score of the first is only 0.6975, while the F1-macro score of the second reaches 0.7548, as shown in Table <ref type="table" target="#tab_1">2</ref>, which once again shows that data processing and ensemble learning methods can effectively improve the performance of the model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusions and Further Work</head><p>This article describes our approach to the fake news detection task at IberLEF 2021. We used classic feature extraction methods and machine learning techniques, and achieved high performance on the development set through ensemble methods. Compared with other deep learning and machine learning methods, the performance on the test set is also very competitive. Compared with MEX-A3T 2020 <ref type="bibr" target="#b12">[13]</ref>, the accuracy on the validation set has increased by about 8%, and the F1-macro score by 6%. Compared with last year's best papers, the results of our model are also very competitive. The best F1-macro score we obtained on the test set was 0.7548, which ranked second. Due to changes in language and content, the performance on the test set is still lower than on the development set. In future work, we will explore more advanced technologies and better feature extraction methods to achieve better results in the next competition. Secondly, we also plan to apply our model to other languages and to better address the current flood of COVID-19 information. Finally, it is harmful to treat all news from one link as fake or as real; we will address this problem in future work.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>-LogisticRegression (LR): Logistic regression models the connection between features and output results. It is a supervised machine learning algorithm for classification problems and is closely related to neural networks, which can be regarded as multiple stacked logistic regression classifiers. Logistic regression can be used for binary and multi-class classification problems. -SGDClassifier (SGDC): Mainly used for large-scale sparse data problems. It is a family of linear classifiers trained with the stochastic gradient descent algorithm. By default it is a linear (soft-margin) support vector machine classifier; in this article it is configured as logistic regression. -PassiveAggressiveClassifier (PAC): A classic online linear classifier, which can continuously integrate new samples to adjust the classification model and enhance its classification ability. It can process streaming data and perform incremental learning. -RidgeClassifier (RC): This classifier fits the classification model with a penalized least-squares loss, which allows a choice of numerical solvers with different computational performance profiles. -LinearSVC (LSVC): A linear support vector classifier implemented with liblinear, which can be used for binary or multi-class classification.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 1.</head><label>1</label><figDesc>Fig. 1. The processed outputs of the five base models are combined and then passed to ensemble learning.</figDesc><graphic coords="5,134.77,278.55,345.81,121.29" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1.</head><label>1</label><figDesc>Results on the development data set for fake news detection using different classifier models and preprocessing techniques.</figDesc><table><row><cell>Model</cell><cell>Merge All</cell><cell>Stopwords</cell><cell>Accuracy</cell><cell>F1-macro</cell></row>
<row><cell>LR</cell><cell>Yes</cell><cell>Yes</cell><cell>0.876</cell><cell>0.880</cell></row>
<row><cell>SGDC</cell><cell>Yes</cell><cell>Yes</cell><cell>0.883</cell><cell>0.890</cell></row>
<row><cell>PAC</cell><cell>Yes</cell><cell>Yes</cell><cell>0.885</cell><cell>0.892</cell></row>
<row><cell>RC</cell><cell>Yes</cell><cell>Yes</cell><cell>0.873</cell><cell>0.878</cell></row>
<row><cell>LSVC</cell><cell>Yes</cell><cell>Yes</cell><cell>0.879</cell><cell>0.883</cell></row>
<row><cell>Ensemble</cell><cell>Yes</cell><cell>Yes</cell><cell>0.911</cell><cell>0.913</cell></row>
<row><cell>LR</cell><cell>Yes</cell><cell>No</cell><cell>0.859</cell><cell>0.861</cell></row>
<row><cell>SGDC</cell><cell>Yes</cell><cell>No</cell><cell>0.865</cell><cell>0.871</cell></row>
<row><cell>PAC</cell><cell>Yes</cell><cell>No</cell><cell>0.870</cell><cell>0.875</cell></row>
<row><cell>RC</cell><cell>Yes</cell><cell>No</cell><cell>0.862</cell><cell>0.865</cell></row>
<row><cell>LSVC</cell><cell>Yes</cell><cell>No</cell><cell>0.865</cell><cell>0.869</cell></row>
<row><cell>Ensemble</cell><cell>Yes</cell><cell>No</cell><cell>0.902</cell><cell>0.904</cell></row>
<row><cell>LR</cell><cell>No</cell><cell>Yes</cell><cell>0.808</cell><cell>0.805</cell></row>
<row><cell>SGDC</cell><cell>No</cell><cell>Yes</cell><cell>0.811</cell><cell>0.816</cell></row>
<row><cell>PAC</cell><cell>No</cell><cell>Yes</cell><cell>0.805</cell><cell>0.813</cell></row>
<row><cell>RC</cell><cell>No</cell><cell>Yes</cell><cell>0.817</cell><cell>0.815</cell></row>
<row><cell>LSVC</cell><cell>No</cell><cell>Yes</cell><cell>0.812</cell><cell>0.812</cell></row>
<row><cell>Ensemble</cell><cell>No</cell><cell>Yes</cell><cell>0.882</cell><cell>0.884</cell></row>
<row><cell>LR</cell><cell>No</cell><cell>No</cell><cell>0.822</cell><cell>0.819</cell></row>
<row><cell>SGDC</cell><cell>No</cell><cell>No</cell><cell>0.822</cell><cell>0.828</cell></row>
<row><cell>PAC</cell><cell>No</cell><cell>No</cell><cell>0.827</cell><cell>0.832</cell></row>
<row><cell>RC</cell><cell>No</cell><cell>No</cell><cell>0.819</cell><cell>0.816</cell></row>
<row><cell>LSVC</cell><cell>No</cell><cell>No</cell><cell>0.828</cell><cell>0.828</cell></row>
<row><cell>Ensemble</cell><cell>No</cell><cell>No</cell><cell>0.898</cell><cell>0.900</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2.</head><label>2</label><figDesc>The results of our model on the official test set.</figDesc><table><row><cell>Model</cell><cell>F1-macro</cell><cell>Rank</cell></row><row><cell>Ensemble</cell><cell>0.7548</cell><cell>2</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>We would like to thank the organizers for the opportunity and organization of this task, as well as teachers and seniors for their help. Finally, we would like to thank the school for supporting my research and the patient work of future reviewers.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Detecting opinion spams and fake news using text classification</title>
		<author>
			<persName><forename type="first">H</forename><surname>Ahmed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Traore</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Saad</surname></persName>
		</author>
		<idno type="DOI">10.1002/spy2.9</idno>
		<ptr target="https://onlinelibrary.wiley.com/doi/abs/10.1002/spy2.9" />
	</analytic>
	<monogr>
		<title level="j">Security and Privacy</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page">e9</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Tecnm at MEX-A3T 2020: Fake news and aggressiveness analysis in mexican spanish</title>
		<author>
			<persName><forename type="first">S</forename><surname>Arce-Cardenas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Fajardo-Delgado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Á Á</forename><surname>Carmona</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">Iber-LEF@SEPLN. CEUR Workshop Proceedings</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">2664</biblScope>
			<biblScope unit="page" from="265" to="272" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Information credibility on twitter</title>
		<author>
			<persName><forename type="first">C</forename><surname>Castillo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mendoza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Poblete</surname></persName>
		</author>
		<idno type="DOI">10.1145/1963405.1963500</idno>
		<ptr target="https://doi.org/10.1145/1963405.1963500" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 20th International Conference on World Wide Web</title>
				<meeting>the 20th International Conference on World Wide Web<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="675" to="684" />
		</imprint>
	</monogr>
	<note>WWW &apos;11</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Greedy function approximation: A gradient boosting machine</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Friedman</surname></persName>
		</author>
		<idno type="DOI">10.1214/aos/1013203451</idno>
		<ptr target="https://doi.org/10.1214/aos/1013203451" />
	</analytic>
	<monogr>
		<title level="j">The Annals of Statistics</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="1189" to="1232" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Overview of fakedes task at iberlef 2020: Fake news detection in spanish</title>
		<author>
			<persName><forename type="first">H</forename><surname>Gómez-Adorno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Posadas-Durán</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Bel-Enguix</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Porto</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Procesamiento del Lenguaje Natural</title>
		<imprint>
			<biblScope unit="volume">67</biblScope>
			<biblScope unit="issue">0</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Lightgbm: A highly efficient gradient boosting decision tree</title>
		<author>
			<persName><forename type="first">G</forename><surname>Ke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Meng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Finley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Ye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">Y</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 31st International Conference on Neural Information Processing Systems</title>
				<meeting>the 31st International Conference on Neural Information Processing Systems<address><addrLine>Red Hook, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Curran Associates Inc</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="3149" to="3157" />
		</imprint>
	</monogr>
	<note>NIPS&apos;17</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Prominent features of rumor propagation in online social media</title>
		<author>
			<persName><forename type="first">S</forename><surname>Kwon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Jung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICDM.2013.61</idno>
		<ptr target="https://doi.org/10.1109/ICDM.2013.61" />
	</analytic>
	<monogr>
		<title level="m">IEEE 13th International Conference on Data Mining</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="1103" to="1108" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">A survey on natural language processing for fake news detection</title>
		<author>
			<persName><forename type="first">R</forename><surname>Oshikawa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Qian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">Y</forename><surname>Wang</surname></persName>
		</author>
		<ptr target="https://www.aclweb.org/anthology/2020.lrec-1.747" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 12th Language Resources and Evaluation Conference</title>
				<meeting>the 12th Language Resources and Evaluation Conference<address><addrLine>Marseille, France</addrLine></address></meeting>
		<imprint>
			<publisher>European Language Resources Association</publisher>
			<date type="published" when="2020-05">May 2020</date>
			<biblScope unit="page" from="6086" to="6093" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Detection of fake news in a new corpus for the spanish language</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Posadas-Durán</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Gómez-Adorno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sidorov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J M</forename><surname>Escobar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Intelligent &amp; Fuzzy Systems</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="4869" to="4876" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">A stylometric inquiry into hyperpartisan and fake news</title>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kiesel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Reinartz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bevendorff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/P18-1022</idno>
		<ptr target="https://www.aclweb.org/anthology/P18-1022" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics</title>
		<title level="s">Long Papers</title>
		<meeting>the 56th Annual Meeting of the Association for Computational Linguistics<address><addrLine>Melbourne, Australia</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2018-07">Jul 2018</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="231" to="240" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Fake news detection on social media: A data mining perspective</title>
		<author>
			<persName><forename type="first">K</forename><surname>Shu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sliva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
		<idno type="DOI">10.1145/3137597.3137600</idno>
		<ptr target="https://doi.org/10.1145/3137597.3137600" />
	</analytic>
	<monogr>
		<title level="j">SIGKDD Explor. Newsl.</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="22" to="36" />
			<date type="published" when="2017-09">Sep 2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">The spread of true and false news online</title>
		<author>
			<persName><forename type="first">S</forename><surname>Vosoughi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Roy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Aral</surname></persName>
		</author>
		<idno type="DOI">10.1126/science.aap9559</idno>
		<ptr target="https://science.sciencemag.org/content/359/6380/1146" />
	</analytic>
	<monogr>
		<title level="j">Science</title>
		<imprint>
			<biblScope unit="volume">359</biblScope>
			<biblScope unit="issue">6380</biblScope>
			<biblScope unit="page" from="1146" to="1151" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">ITCG&apos;s participation at MEX-A3T 2020: Aggressive identification and fake news detection based on textual features for Mexican Spanish</title>
		<author>
			<persName><forename type="first">D</forename><surname>Zaizar-Gutiérrez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Fajardo-Delgado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Á Á</forename><surname>Carmona</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Á G</forename><surname>Cumbreras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gonzalo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">M</forename><surname>Cámara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Martínez-Unanue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><surname>Zafra</surname></persName>
		</author>
		<ptr target="http://ceur-ws.org/Vol-2664/mexa3tpaper4.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020) co-located with 36th Conference of the Spanish Society for Natural Language Processing (SEPLN 2020)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">S</forename><forename type="middle">M J</forename><surname>Zambrano</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><forename type="middle">A O</forename><surname>Miranda</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Zamorano</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Gutiérrez</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Rosá</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Montes-Y-Gómez</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Vega</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><forename type="middle">G</forename></persName>
		</editor>
		<meeting>the Iberian Languages Evaluation Forum (IberLEF 2020) co-located with 36th Conference of the Spanish Society for Natural Language Processing (SEPLN 2020)<address><addrLine>Málaga, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020-09-23">September 23rd, 2020</date>
			<biblScope unit="volume">2664</biblScope>
			<biblScope unit="page" from="258" to="264" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">An overview of online fake news: Characterization, detection, and discussion</title>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">A</forename><surname>Ghorbani</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.ipm.2019.03.004</idno>
		<ptr target="https://www.sciencedirect.com/science/article/pii/S0306457318306794" />
	</analytic>
	<monogr>
		<title level="j">Information Processing &amp; Management</title>
		<imprint>
			<biblScope unit="volume">57</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page">102025</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Fake news: Fundamental theories, detection strategies and challenges</title>
		<author>
			<persName><forename type="first">X</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Zafarani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Shu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
		<idno type="DOI">10.1145/3289600.3291382</idno>
		<ptr target="https://doi.org/10.1145/3289600.3291382" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining</title>
		<meeting>the Twelfth ACM International Conference on Web Search and Data Mining<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="836" to="837" />
		</imprint>
	</monogr>
	<note>WSDM &apos;19</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
