<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A simple language-agnostic yet strong baseline system for hate speech and offensive content identification</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Yves</forename><surname>Bestgen</surname></persName>
							<email>yves.bestgen@uclouvain.be</email>
							<affiliation key="aff0">
								<orgName type="laboratory">Laboratoire d&apos;analyse statistique des textes -Statistical Analysis of Text Laboratory (LAST -SATLab)</orgName>
								<orgName type="institution">Université catholique de Louvain</orgName>
								<address>
									<addrLine>10 place Cardinal Mercier, Louvain-la-Neuve</addrLine>
									<postCode>1348</postCode>
									<country key="BE">Belgium</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Forum for Information Retrieval Evaluation</orgName>
								<address>
									<addrLine>December 13-17</addrLine>
									<postCode>2021</postCode>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">A simple language-agnostic yet strong baseline system for hate speech and offensive content identification</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">C30766602A455159C7975E9214954CC0</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T01:34+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Character n-grams</term>
					<term>logistic regression</term>
					<term>gradient boosting decision tree</term>
					<term>low-resource languages</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>For automatically identifying hate speech and offensive content in tweets, the SATLab team proposes a system based on a classical supervised algorithm fed only with character n-grams, and thus completely language-agnostic. After optimization of the feature weighting and the classifier parameters, it reached, in the multilingual HASOC 2021 challenge, a medium performance level in English, the language for which it is easy to develop deep learning approaches relying on many external linguistic resources, but a far better level in the two less-resourced languages, Hindi and Marathi. When performances are averaged over the three problems in these languages, it even ranked first among the teams that participated in more than one of them. These performances suggest that it is an interesting reference level for evaluating the benefits of more complex approaches, such as deep learning or the use of complementary resources.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The spread of hate speech and offensive content in social networks has become a crucial problem. The tremendous number of posts broadcast at any given time makes their identification by human evaluation impractical. The task is made even more complex by the large number of languages in which such content is spread. Not surprisingly, a lot of research is being done to develop automatic detection systems. As in many NLP domains, deep learning approaches and pre-computed embeddings have proven to be the most efficient, even in languages with few resources <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>. However, traditional machine learning systems have sometimes proven to be very competitive <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref>. One may thus wonder what level of performance can be achieved by a much simpler, yet heavily optimized, classical supervised approach that is completely language-agnostic and relies only on a few thousand examples to feed the supervised learner, without any additional resources. If such a system is (relatively) successful, it would provide a computationally cheap baseline that could help evaluate the benefits of additional knowledge, complex architectures, deep learning or language expertise. The HASOC 2021 shared task "Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages" <ref type="bibr" target="#b4">[5]</ref> is particularly relevant for developing such a system because it covers three languages. One of them, English, is obviously the most studied language in natural language processing and the one for which the largest number of resources is available. 
Hindi and, even more so, Marathi have been much less studied and are still classified as low-resource languages <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b7">8]</ref>. One may therefore expect a priori that the approach proposed here will be much more competitive in these two languages.</p><p>The remainder of this paper presents the datasets made available for this shared task and the challenge rules, the system developed, and the results obtained, which confirm that the proposed approach is a strong language-agnostic baseline for hate speech and offensive content identification.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Materials and Task</head><p>The SATLab participated in subtask 1 of the HASOC 2021 shared task "Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages", which proposes two problems to be solved in three languages <ref type="bibr" target="#b4">[5]</ref>. The first problem requires categorizing tweets into two categories: Hate and Offensive (HOF) or not (NOT). It is proposed for English, Hindi and Marathi. The second problem requires categorizing the same tweets into four categories by dividing the Hate and Offensive category into three subcategories, Hate speech (HATE), Offensive (OFFN) and Profane (PRFN), the remaining tweets being labeled NONE. It is offered for English and Hindi.</p><p>For each language, learning and test materials were provided by the task organizers <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b7">8]</ref>. The frequencies (#) and percentages (%) in each category of each problem for each language are given in Table <ref type="table" target="#tab_0">1</ref>.</p><p>This table deserves several comments. First of all, the learning set is much smaller in Marathi (1874 tweets, i.e., 18% of the learning tweets in the three languages) than in the other two languages, the difference between the latter two being much smaller (37% for English and 45% for Hindi). The proportion of tweets in the HOF category is much larger in English (65.1%) than in the other two languages (31.2% in Hindi and 35.7% in Marathi). The difference clearly comes from the PRFN category, which is much more frequent in English (31.1%) than in Hindi, where it represents only a very small percentage (4.6%).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Challenge rules</head><p>The rules of the challenge allowed teams to use any additional resources, including materials from previous HASOC tasks, lexical norms such as emotional word lists, pre-computed embeddings, syntactic parsers or even machine translation systems to translate the other languages into English. The system proposed by the SATLab does not include any of these additional resources.</p><p>The official measure chosen by the organizers to rank the teams is the Macro-F1, which has the advantage of giving the same weight to all categories, however rare they may be (e.g., less than 5% of PRFN in Hindi).</p><p>Each team was allowed to submit five runs for each subtask between August 20 and 30, 2021, and the team's best performance was displayed in the leaderboard. Compared to the ten or so other shared tasks I have participated in, it is important to underline that the submission system proposed by the challenge organizers (https://hasocfire.github.io/submission/login.html) was particularly ergonomic. Moreover, the fact that teams could not hide their best score, as is often possible in other systems, made the competition fairer in my opinion.</p></div>
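<div xmlns="http://www.tei-c.org/ns/1.0"><p>To illustrate why the Macro-F1 gives rare categories the same weight as frequent ones, the following minimal Python sketch computes it as the unweighted mean of the per-class F1 scores. The function name macro_f1 is hypothetical, and this is not the organizers' scoring script. On a toy set containing 20% HOF tweets, a degenerate classifier that always predicts NOT reaches an accuracy of 0.8 but a Macro-F1 of only 0.44, because the F1 of the missed HOF class (0) counts as much as that of NOT (0.89).</p><eg>
```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores: every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); the denominator is never 0 here because
        # every label comes from y_true or y_pred
        per_class.append(2 * tp / (2 * tp + fp + fn))
    return sum(per_class) / len(per_class)


# A classifier that always answers NOT on an 80/20 NOT/HOF split
y_true = ["NOT"] * 8 + ["HOF"] * 2
y_pred = ["NOT"] * 10
print(round(macro_f1(y_true, y_pred), 4))  # 0.4444, although accuracy is 0.8
```
</eg></div>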
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Proposed System</head><p>In order to meet the requirements presented in the introduction, the proposed system is based only on character n-grams <ref type="bibr" target="#b9">[10]</ref>, an approach frequently used in natural language processing when a system has to support several languages. These n-grams were extracted from the lowercased tweets, the only specificity being that the n-grams starting or ending a tweet were distinguished from the others by the presence of a specific character. All character n-grams observed at least twice in the material were retained.</p><p>During the n-gram extraction, three parameters had to be set:</p><p>• The length of the n-grams in number of characters. The minimum length was systematically set to 1, while the maximum lengths evaluated ranged from four to eight characters.</p><p>• The weighting applied to the frequency of each feature in each instance. Two well-established weighting schemes were evaluated:</p><p>- Sublinear TF-IDF:</p><formula xml:id="formula_0">\text{(sl)TF-IDF} = \left(1 + \log \mathit{tf}\right) \times \log \frac{N}{\mathit{df}}<label>(1)</label></formula><p>where 𝑡𝑓 refers to the frequency of the term in the document, 𝑁 is the number of documents in the set and 𝑑𝑓 the number of documents that include the term.</p><p>- BM25 <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b11">12]</ref>, which is considered one of the most efficient weighting schemes <ref type="bibr" target="#b12">[13]</ref>. It is a kind of TF-IDF that takes into account the length of the document. The following formula was used:</p><formula xml:id="formula_1">\text{BM25} = \frac{\mathit{tf}}{\mathit{tf} + k_1 \left(1 - b + b \, \frac{\mathit{dl}}{\mathit{avg}_{\mathit{dl}}}\right)} \times \log \frac{N - \mathit{df} + 0.5}{\mathit{df} + 0.5}<label>(2)</label></formula><p>in which:</p><p>* The first factor is the TF component which, contrary to the usual TF-IDF, has an asymptotic maximum: it saturates as 𝑡𝑓 grows, at a rate tuned by the 𝑘₁ parameter.</p><p>* (1 − 𝑏 + 𝑏 × 𝑑𝑙/𝑎𝑣𝑔_𝑑𝑙), where 𝑑𝑙 is the length of the document and 𝑎𝑣𝑔_𝑑𝑙 the average length of the documents in the set, is the document length normalization factor, whose impact is tuned by the parameter 𝑏 (and by 𝑘₁).</p><p>* The second factor is a variant of the usual IDF, proposed by Robertson and Spärck Jones <ref type="bibr" target="#b10">[11]</ref>.</p><p>In our analyses, 𝑘₁ was set to 2 and 𝑏 to 0.75.</p><p>• Normalization of the feature scores for each instance:</p><p>- The classical L2 normalization.</p><p>- A MinMax transformation:</p><formula xml:id="formula_2">\text{MinMax} = \frac{\mathit{score}_i - \mathit{min}}{\mathit{max} - \mathit{min}} + 0.01<label>(3)</label></formula><p>where 𝑠𝑐𝑜𝑟𝑒ᵢ is the score of feature 𝑖 in the instance, and 𝑚𝑖𝑛 and 𝑚𝑎𝑥 are the lowest and highest feature scores of that instance. It is important to note that this transformation is applied independently to each instance and not, as is often the case, to each feature. The value of 0.01 is added to distinguish the lowest-scoring feature of an instance from the value 0, which codes the absence of a feature.</p><p>These character n-grams were the only features provided to the supervised learning procedure. Two well-established procedures were evaluated:</p><p>• The (dual) L2-regularized logistic regression as implemented in the LIBLinear package <ref type="bibr" target="#b13">[14]</ref>, an extremely fast approach that is very simple to use because only two parameters have to be optimized: the regularization parameter C and -wi, which adjusts C for each category. This approach was used for the initial submission to each of the five problems.</p><p>• A gradient boosting decision tree approach as implemented in the free LightGBM software <ref type="bibr" target="#b15">[16]</ref>, which is much slower and harder to optimize because it has many parameters, but which recently outperformed all deep-learning-based systems participating in the CMCL 2021 shared task on predicting gaze data during reading <ref type="bibr" target="#b14">[15]</ref>. 
It was only used in a second stage.</p><p>The system was optimized independently for each language during the learning phase using a 3-fold cross-validation procedure, whose folds were stratified according to the four categories of problem 2 for English and Hindi and the two categories of problem 1 for Marathi. This cross-validation step determined the parameter values of the initial SATLab submissions, which are given in Table <ref type="table" target="#tab_1">2</ref>.</p></div>
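<div xmlns="http://www.tei-c.org/ns/1.0"><p>The n-gram extraction step can be sketched in Python as follows. Several details are assumptions, since the paper does not specify them: the boundary marker is the hypothetical control character \x01 and counts as one of the n characters, n-grams made only of the marker are discarded, and the threshold of two occurrences is applied to total counts over the whole corpus. The function names are illustrative.</p><eg>
```python
from collections import Counter

# Hypothetical marker: the paper only says that a "specific character" flags
# n-grams that start or end a tweet. Here it counts as one of the n characters.
BOUNDARY = "\x01"


def char_ngrams(text, max_n=5, min_n=1):
    """Count the character n-grams of lengths min_n..max_n in one tweet."""
    padded = BOUNDARY + text.lower() + BOUNDARY
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(padded) - n + 1):
            gram = padded[i:i + n]
            if gram.strip(BOUNDARY):  # drop n-grams made only of the marker
                counts[gram] += 1
    return counts


def build_vocabulary(texts, max_n=5, min_count=2):
    """Keep the n-grams that occur at least min_count times in the corpus."""
    totals = Counter()
    for text in texts:
        totals.update(char_ngrams(text, max_n))
    return sorted(g for g, c in totals.items() if c >= min_count)


print(build_vocabulary(["Abc", "abd"], max_n=2))  # ['\x01a', 'a', 'ab', 'b']
```
</eg><p>In this toy corpus, "\x01a" survives because both tweets start with "a", whereas "c", "d" and the tweet-final n-grams occur only once and are dropped.</p></div>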
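<div xmlns="http://www.tei-c.org/ns/1.0"><p>Equations (1) and (2) translate directly into code. The sketch below assumes natural logarithms (the base is not stated in the paper, and it only rescales the weights) and uses the settings reported above, k1 = 2 and b = 0.75, as defaults. The function names are hypothetical, and the BM25 numerator follows Eq. (2) as given, without the (k1 + 1) factor found in some other BM25 variants.</p><eg>
```python
import math


def sublinear_tfidf(tf, df, n_docs):
    """Eq. (1): (1 + log tf) * log(N / df), for a feature present in the document."""
    return (1.0 + math.log(tf)) * math.log(n_docs / df)


def bm25(tf, df, n_docs, doc_len, avg_doc_len, k1=2.0, b=0.75):
    """Eq. (2): saturating TF component times the Robertson/Sparck Jones IDF."""
    length_norm = 1.0 - b + b * doc_len / avg_doc_len
    tf_part = tf / (tf + k1 * length_norm)
    idf_part = math.log((n_docs - df + 0.5) / (df + 0.5))
    return tf_part * idf_part


# An n-gram seen 3 times in a tweet and present in 10 of 1000 tweets
print(sublinear_tfidf(3, 10, 1000))
print(bm25(3, 10, 1000, doc_len=50, avg_doc_len=50))
# Longer-than-average tweets get smaller BM25 weights for the same counts
print(bm25(3, 10, 1000, doc_len=100, avg_doc_len=50) < bm25(3, 10, 1000, doc_len=50, avg_doc_len=50))  # True
```
</eg><p>With this IDF variant, an n-gram present in more than half of the documents receives a negative weight, since (N − df + 0.5)/(df + 0.5) then falls below 1.</p></div>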
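<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two per-instance normalizations can be sketched as follows, representing each instance as a dictionary from n-gram to weight that holds only the features present in it (absent features are implicitly 0). The paper does not say what happens when all features of an instance share the same score; mapping them to 1.01 is an assumption of this sketch, as are the function names.</p><eg>
```python
import math


def l2_normalize(scores):
    """Scale one instance's feature scores to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {f: v / norm for f, v in scores.items()} if norm else dict(scores)


def minmax_instance(scores, offset=0.01):
    """Eq. (3) applied within one instance (not per feature column).

    `scores` holds only the features present in the instance, so after the
    offset even the lowest one stays above 0, the value coding absence.
    """
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    # Assumption: if every score is equal, map all of them to 1 + offset
    return {f: ((v - lo) / span if span else 1.0) + offset for f, v in scores.items()}


tweet = {"ab": 2.0, "bc": 4.0, "cd": 6.0}  # hypothetical weighted n-grams
print({f: round(v, 2) for f, v in minmax_instance(tweet).items()})
# {'ab': 0.01, 'bc': 0.51, 'cd': 1.01}
```
</eg></div>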
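<div xmlns="http://www.tei-c.org/ns/1.0"><p>The role of C and of the -wi option can be illustrated with the primal form of the L2-regularized logistic regression objective in which, following the LIBLinear documentation, the weight of category i multiplies C for the instances of that category (e.g., w_HOF = 0.5 for English-1 in Table 2 halves the penalty on misclassified HOF tweets). The sketch below is only illustrative: it handles two classes and minimizes this objective by plain gradient descent, whereas LIBLinear solves the dual problem with a dedicated solver. The function names, the learning rate and the number of epochs are hypothetical.</p><eg>
```python
import math


def weighted_lr_objective(w, X, y, C, class_weight):
    """0.5 * ||w||^2 + sum_i C * class_weight[y_i] * log(1 + exp(-y_i * w.x_i)).

    Labels are -1/+1; class_weight plays the role of LIBLinear's -wi factors,
    i.e. class i is penalized with C * w_i.
    """
    reg = 0.5 * sum(v * v for v in w)
    loss = 0.0
    for xi, yi in zip(X, y):
        margin = yi * sum(a * b for a, b in zip(w, xi))
        loss += C * class_weight[yi] * math.log1p(math.exp(-margin))
    return reg + loss


def train(X, y, C=1.0, class_weight=None, lr=0.1, epochs=500):
    """Minimize the objective above by plain batch gradient descent."""
    class_weight = class_weight or {-1: 1.0, 1: 1.0}
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = list(w)  # the gradient of 0.5 * ||w||^2 is w itself
        for xi, yi in zip(X, y):
            margin = yi * sum(a * b for a, b in zip(w, xi))
            # derivative of log(1 + exp(-m)) w.r.t. w: -y * x / (1 + exp(m))
            coef = -C * class_weight[yi] * yi / (1.0 + math.exp(margin))
            for j, xij in enumerate(xi):
                grad[j] += coef * xij
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Toy data: the first column is a constant bias feature; HOF = +1, NOT = -1
X_demo = [[1.0, 2.0], [1.0, 1.5], [1.0, -1.0], [1.0, -2.0]]
y_demo = [1, 1, -1, -1]
# Down-weight the +1 class, as w_HOF = 0.5 does for English-1 in Table 2
w = train(X_demo, y_demo, C=1.0, class_weight={1: 0.5, -1: 1.0})
print([1 if sum(a * b for a, b in zip(w, x)) > 0 else -1 for x in X_demo])
```
</eg></div>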
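<div xmlns="http://www.tei-c.org/ns/1.0"><p>A stratified 3-fold split can be sketched as follows: the instances of each category are shuffled and dealt round-robin over the folds, so that each fold keeps approximately the category proportions of the whole learning set. This is one common way of stratifying, not necessarily the one used for the SATLab runs, and the function name is hypothetical; scikit-learn's StratifiedKFold class provides a ready-made equivalent.</p><eg>
```python
import random
from collections import defaultdict


def stratified_folds(labels, k=3, seed=0):
    """Return a fold index (0..k-1) for each instance, stratified by label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    folds = [0] * len(labels)
    offset = 0
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        for j, idx in enumerate(indices):
            # Continue the round-robin across labels to balance fold sizes
            folds[idx] = (offset + j) % k
        offset += len(indices)
    return folds


# Problem-2 style labels, as used for stratification in English and Hindi
labels = ["NONE"] * 6 + ["HATE"] * 3 + ["OFFN"] * 3 + ["PRFN"] * 3
folds = stratified_folds(labels)
for f in range(3):
    print(f, sorted(labels[i] for i, x in enumerate(folds) if x == f))
# each fold: ['HATE', 'NONE', 'NONE', 'OFFN', 'PRFN']
```
</eg></div>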
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results</head><p>In this section, the performance of the initial system proposed by the SATLab and the various optimization attempts that have been made are first presented. Secondly, these performances are compared to those of other teams in order to determine whether the proposed approach is competitive enough to serve as a baseline for evaluating the benefits of using deep learning approaches and resources supplementary to those provided in the task itself. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">SATLab submissions</head><p>Table <ref type="table">3</ref> presents the performance of the main versions of the SATLab system submitted for the five problems, and thus the benefits brought on the test set by the optimization attempts. The first row reports the performance of the original system for each problem during the cross-validation step. As expected, performance is lower for the problems requiring the identification of more than two categories, as well as when a category is particularly rare (Hindi-2). Strong differences between the three languages are also observed. Since the parameters were selected on a single split into three folds, one can assume that these scores are, at least slightly, overestimated.</p><p>The second row shows the performance of the same versions on the test set, that is, the initial submissions to the challenge. Except for Marathi (0.8547 versus 0.8565), all scores are higher on the test set than during the cross-validation step.</p><p>As five runs could be submitted for each problem, I first tried to optimize the logistic regression classifier by modifying very slightly the two LIBLinear parameters (i.e., C and -wi). These attempts brought a (very) slight benefit for two of the five problems (Hindi-2 and Marathi), as shown in the third row of Table <ref type="table">3</ref>.</p><p>In a second step, a LightGBM classifier was trained for each of the five problems, using a random grid search procedure to optimize its parameters. As shown in the fourth row of Table <ref type="table">3</ref>, this step resulted in a stronger performance improvement in two problems: English-1 and Marathi. For the other three problems, LightGBM did not improve on the performance of LIBLinear. The selected parameters for the two successful problems are given in Appendix 1. 
The number of boosting iterations was determined during cross-validation using the LightGBM early stopping procedure, which stops training when the performance on the validation fold has not improved for 200 consecutive rounds. The final system values on the test set for the five problems are bolded in the table. The run names of these solutions in the official leaderboard are, respectively: English 1b, English 2, Hindi 1, Hindi B S4 and Marathi 3.</p></div>
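<div xmlns="http://www.tei-c.org/ns/1.0"><p>The early stopping rule described above can be sketched independently of LightGBM, assuming a validation score for which higher is better (for a loss, the comparison would be reversed). The function name and the synthetic learning curve are hypothetical. In LightGBM itself, the same behaviour is obtained through its early_stopping callback with stopping_rounds=200.</p><eg>
```python
def early_stopping_round(val_scores, patience=200):
    """Return the 1-based best round under patience-based early stopping.

    val_scores[i] is the validation score after boosting round i + 1.
    Training stops once `patience` consecutive rounds have failed to beat
    the best score so far; the best round is the one that is kept.
    """
    best_score = float("-inf")
    best_round = 0
    for i, score in enumerate(val_scores, start=1):
        if score > best_score:
            best_score, best_round = score, i
        elif i - best_round >= patience:
            break  # no improvement during the last `patience` rounds
    return best_round


# Synthetic curve: improves for 300 rounds, then plateaus slightly lower
curve = [r / 300 for r in range(1, 301)] + [0.99] * 500
print(early_stopping_round(curve))  # 300
```
</eg></div>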
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Benchmarking the approach</head><p>The main objective of SATLab's participation in HASOC 2021 was to propose a competitive system relying only on the training data and employing only classical supervised learning procedures. To determine whether this goal was achieved, Tables 4-6 compare the performance of the approach to that of the other participating systems.</p><p>Table <ref type="table" target="#tab_2">4</ref> shows, for each of the five problems, the number of participating teams, the scores of the top three teams, the score of the best SATLab version, and the scores of the teams ranked immediately above and below it. As can be seen, the approach performs among the best in the two less-resourced languages, Hindi and Marathi; it even ranks second in the Hindi-2 problem, very close to the first (and the third) team. In English, on the other hand, the system is ranked in the middle of the pack, 0.048 (English-1) and 0.054 (English-2) below the best team.</p><p>The difference in performance between English and the other two languages is particularly evident in Tables <ref type="table" target="#tab_4">5 and 6</ref>, which present the average scores of the teams for the five problems (Table <ref type="table" target="#tab_3">5</ref>), for the two English problems, and for the three problems in the less-resourced languages (Table <ref type="table" target="#tab_4">6</ref>). Before calculating these averages, the scores for each problem were divided by the maximum score obtained for that problem. This transformation<ref type="foot" target="#foot_0">1</ref> gives an equivalent weight to all problems, so that all teams can be presented in the same table without distorting the results, whatever the number of problems they participated in. Without it, the teams that participated in the most difficult problems would be penalized compared to those that did not. 
In these tables, the number of problems each team participated in is given by the column in which its score appears, and the last row gives the total number of teams that participated in a given number of problems.</p><p>In terms of the overall average (Table <ref type="table" target="#tab_3">5</ref>), SATLab ranks sixth overall and third among the 16 teams that participated in all five problems. In English (Table <ref type="table" target="#tab_4">6</ref>), on the other hand, it ranks only 20th. In the two less-resourced languages (Table <ref type="table" target="#tab_4">6</ref>), it is second, exceeded only by a team that participated in only one of the five problems.</p></div>
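<div xmlns="http://www.tei-c.org/ns/1.0"><p>The transformation can be checked against the published tables. Using the Macro-F1 scores of Table 4, the following Python sketch divides each SATLab score by the best score obtained on the same problem and averages the ratios, taking the minimum score as 0 as explained in the footnote. It reproduces the values reported for SATLab in Table 5 (0.9601) and Table 6 (0.9302 for English and 0.9800 for Hindi and Marathi). The function and variable names are illustrative.</p><eg>
```python
# Macro-F1 of SATLab and of the best team on each problem (Table 4)
satlab = {"English-1": 0.7823, "English-2": 0.6114, "Hindi-1": 0.7718,
          "Hindi-2": 0.5586, "Marathi": 0.8749}
best = {"English-1": 0.8305, "English-2": 0.6657, "Hindi-1": 0.7825,
        "Hindi-2": 0.5603, "Marathi": 0.9144}


def transformed_mean(scores, maxima, problems):
    """Average of score / best score over the given problems (minimum taken as 0)."""
    ratios = [scores[p] / maxima[p] for p in problems]
    return sum(ratios) / len(ratios)


print(round(transformed_mean(satlab, best, satlab), 4))  # 0.9601 (Table 5)
print(round(transformed_mean(satlab, best, ["English-1", "English-2"]), 4))  # 0.9302
print(round(transformed_mean(satlab, best, ["Hindi-1", "Hindi-2", "Marathi"]), 4))  # 0.98
```
</eg></div>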
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>A system based exclusively on the character n-grams present in the posts to be categorized, employing no additional linguistic resources and thus completely language-agnostic, is proposed to automatically identify hate speech and offensive content in social network posts. It relies on traditional machine learning procedures such as logistic regression. Used in the HASOC 2021 challenge <ref type="bibr" target="#b4">[5]</ref>, it reached a medium performance level in English, the language for which it was easy to develop deep learning approaches relying on many external linguistic resources. Its performance, averaged over the two Hindi problems and the Marathi problem, ranks it first among the teams that proposed systems for at least two of these problems. These performances suggest that it is an interesting reference level for evaluating the benefits of more complex approaches frequently used for this type of task, such as deep learning or the use of complementary resources <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2,</ref><ref type="bibr" target="#b8">9]</ref>. However, it is essential to note that the proposed system never ranked first in any individual problem. 
It is therefore clearly not the best-performing system for any of the five problems.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Dataset statistics of subtask 1</figDesc><table><row><cell>Language</cell><cell></cell><cell cols="2">Problem 1 (learning phase)</cell><cell cols="4">Problem 2 (learning phase)</cell><cell>Learning phase</cell><cell>Test phase</cell></row><row><cell></cell><cell></cell><cell>NOT</cell><cell>HOF</cell><cell>NONE</cell><cell>HATE</cell><cell>OFFN</cell><cell>PRFN</cell><cell>Total</cell><cell>Total</cell></row><row><cell>English</cell><cell>#</cell><cell>1342</cell><cell>2501</cell><cell>1342</cell><cell>683</cell><cell>622</cell><cell>1196</cell><cell>3843</cell><cell>1281</cell></row><row><cell></cell><cell>%</cell><cell>34.9</cell><cell>65.1</cell><cell>34.9</cell><cell>17.8</cell><cell>16.2</cell><cell>31.1</cell><cell>75.0</cell><cell>25.0</cell></row><row><cell>Hindi</cell><cell>#</cell><cell>3161</cell><cell>1433</cell><cell>3161</cell><cell>566</cell><cell>654</cell><cell>213</cell><cell>4594</cell><cell>1532</cell></row><row><cell></cell><cell>%</cell><cell>68.8</cell><cell>31.2</cell><cell>68.8</cell><cell>12.3</cell><cell>14.2</cell><cell>4.6</cell><cell>75.0</cell><cell>25.0</cell></row><row><cell>Marathi</cell><cell>#</cell><cell>1205</cell><cell>669</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell>1874</cell><cell>525</cell></row><row><cell></cell><cell>%</cell><cell>64.3</cell><cell>35.7</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell>78.1</cell><cell>21.9</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Parameters for the initial submissions</figDesc><table><row><cell>Language</cell><cell cols="2">English</cell><cell cols="2">Hindi</cell><cell>Marathi</cell></row><row><cell>Problem</cell><cell>1</cell><cell>2</cell><cell>1</cell><cell>2</cell><cell>1</cell></row><row><cell>N-gram length</cell><cell>5</cell><cell>5</cell><cell>5</cell><cell>5</cell><cell>5</cell></row><row><cell>Weighting</cell><cell>TF-IDF</cell><cell>TF-IDF</cell><cell>BM25</cell><cell>TF-IDF</cell><cell>BM25</cell></row><row><cell>Normalization</cell><cell>MinMax</cell><cell>L2</cell><cell>L2</cell><cell>MinMax</cell><cell>L2</cell></row><row><cell>C</cell><cell>1.1</cell><cell>2.5</cell><cell>3.7</cell><cell>0.083</cell><cell>6</cell></row><row><cell>w_HOF</cell><cell>0.5</cell><cell></cell><cell>2.2</cell><cell></cell><cell>2</cell></row><row><cell>w_HATE</cell><cell></cell><cell>2.0</cell><cell></cell><cell>1.87</cell><cell></cell></row><row><cell>w_OFFN</cell><cell></cell><cell>3.0</cell><cell></cell><cell>0.93</cell><cell></cell></row><row><cell>w_PRFN</cell><cell></cell><cell>0.8</cell><cell></cell><cell>5.60</cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table"><head>Table 3</head><label>3</label><figDesc>Macro-F1 during cross-validation and on the test set</figDesc><table><row><cell>Language</cell><cell cols="2">English</cell><cell cols="2">Hindi</cell><cell>Marathi</cell></row><row><cell>Problem</cell><cell>1</cell><cell>2</cell><cell>1</cell><cell>2</cell><cell>1</cell></row><row><cell>CV</cell><cell>0.7483</cell><cell>0.5876</cell><cell>0.7551</cell><cell>0.5133</cell><cell>0.8565</cell></row><row><cell>Initial</cell><cell>0.7635</cell><cell>0.6114</cell><cell>0.7718</cell><cell>0.5563</cell><cell>0.8547</cell></row><row><cell>Best LR</cell><cell></cell><cell></cell><cell></cell><cell>0.5586</cell><cell>0.8595</cell></row><row><cell>Best LGBM</cell><cell>0.7823</cell><cell></cell><cell></cell><cell></cell><cell>0.8749</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 4</head><label>4</label><figDesc>Macro-F1 on the test set for the five problems</figDesc><table><row><cell>English-1 N=56</cell><cell></cell><cell>English-2 N=37</cell><cell></cell></row><row><cell>Rank Team</cell><cell>Macro-F1</cell><cell>Rank Team</cell><cell>Macro-F1</cell></row><row><cell>1 NLP-CIC</cell><cell>0.8305</cell><cell>1 NLP-CIC</cell><cell>0.6657</cell></row><row><cell>2 HUNLP</cell><cell>0.8215</cell><cell>2 neuro-utmn-thales</cell><cell>0.6577</cell></row><row><cell>3 neuro-utmn-thales</cell><cell>0.8199</cell><cell>3 HASOC21rub</cell><cell>0.6482</cell></row><row><cell>...</cell><cell></cell><cell>...</cell><cell></cell></row><row><cell>22 hate-busters</cell><cell>0.7894</cell><cell>15 KuiYongyi</cell><cell>0.6116</cell></row><row><cell>23 SATLab</cell><cell>0.7823</cell><cell>16 SATLab</cell><cell>0.6114</cell></row><row><cell>24 TAD</cell><cell>0.7776</cell><cell>17 hate-busters</cell><cell>0.6096</cell></row><row><cell>...</cell><cell></cell><cell>...</cell><cell></cell></row><row><cell>Hindi-1 N=34</cell><cell></cell><cell>Marathi N=25</cell><cell></cell></row><row><cell>Rank Team</cell><cell>Macro-F1</cell><cell>Rank Team</cell><cell>Macro-F1</cell></row><row><cell>1 t1</cell><cell>0.7825</cell><cell>1 WLV-RIT</cell><cell>0.9144</cell></row><row><cell>2 Super Mario</cell><cell>0.7797</cell><cell>2 neuro-utmn-thales</cell><cell>0.8808</cell></row><row><cell>3 Hasnuhana</cell><cell>0.7797</cell><cell>3 Hasnuhana</cell><cell>0.8756</cell></row><row><cell>...</cell><cell></cell><cell>4 SATLab</cell><cell>0.8749</cell></row><row><cell>6 KuiYongyi</cell><cell>0.7725</cell><cell>5 PreCog IIIT</cell><cell>0.8734</cell></row><row><cell>7 SATLab</cell><cell>0.7718</cell><cell>...</cell><cell></cell></row><row><cell>8 neuro-utmn-thales</cell><cell>0.7682</cell><cell></cell><cell></cell></row><row><cell>...</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Hindi-2 N=24</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Rank Team</cell><cell>Macro-F1</cell><cell></cell><cell></cell></row><row><cell>1 NeuralSpace</cell><cell>0.5603</cell><cell></cell><cell></cell></row><row><cell>2 SATLab</cell><cell>0.5586</cell><cell></cell><cell></cell></row><row><cell>3 hate-busters</cell><cell>0.5582</cell><cell></cell><cell></cell></row><row><cell>...</cell><cell></cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 5</head><label>5</label><figDesc>Transformed Macro-F1 for the five problems</figDesc><table><row><cell></cell><cell cols="5">Nbr. of problems the team participated in</cell></row><row><cell>Rank Team</cell><cell>5</cell><cell>4</cell><cell>3</cell><cell>2</cell><cell>1</cell></row><row><cell>1 WLV-RIT</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell>1.0000</cell></row><row><cell>2 NLP-CIC</cell><cell>0.9814</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>3 neuro-utmn-thales</cell><cell></cell><cell>0.9800</cell><cell></cell><cell></cell><cell></cell></row><row><cell>4 HASOC21rub</cell><cell></cell><cell></cell><cell></cell><cell>0.9693</cell><cell></cell></row><row><cell>5 NeuralSpace</cell><cell>0.9666</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>6 SATLab</cell><cell>0.9601</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>7 KuiYongyi</cell><cell>0.9596</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>8 CAROLL Passau</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell>0.9590</cell></row><row><cell>9 IMS-SINAI</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell>0.9569</cell></row><row><cell>10 hate-busters</cell><cell>0.9517</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>...</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>Number of Teams</cell><cell>16</cell><cell>8</cell><cell>6</cell><cell>18</cell><cell>15</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 6</head><label>6</label><figDesc>Transformed Macro-F1 for the two English problems and for the Hindi and Marathi problems</figDesc><table><row><cell cols="3">English 1 &amp; 2</cell><cell cols="4">Hindi 1 &amp; 2 and Marathi</cell></row><row><cell></cell><cell cols="2">#problems</cell><cell></cell><cell cols="3">#problems</cell></row><row><cell>Rk Team</cell><cell>2</cell><cell>1</cell><cell>Rk Team</cell><cell>3</cell><cell>2</cell><cell>1</cell></row><row><cell>1 NLP-CIC</cell><cell>1.0000</cell><cell></cell><cell>1 WLV-RIT</cell><cell></cell><cell></cell><cell>1.0000</cell></row><row><cell>2 neuro-utmn-thales</cell><cell>0.9876</cell><cell></cell><cell>2 SATLab</cell><cell>0.9800</cell><cell></cell><cell></cell></row><row><cell>3 HASOC21rub</cell><cell>0.9693</cell><cell></cell><cell>3 NeuralSpace</cell><cell>0.9787</cell><cell></cell><cell></cell></row><row><cell>4 HUNLP</cell><cell>0.9675</cell><cell></cell><cell>4 neuro-utmn-thales</cell><cell></cell><cell>0.9725</cell><cell></cell></row><row><cell>5 HNLP</cell><cell>0.9674</cell><cell></cell><cell>5 KuiYongyi</cell><cell>0.9707</cell><cell></cell><cell></cell></row><row><cell>...</cell><cell></cell><cell></cell><cell>6 NLP-CIC</cell><cell>0.9690</cell><cell></cell><cell></cell></row><row><cell>19 hate-busters</cell><cell>0.9331</cell><cell></cell><cell>7 hate-busters</cell><cell>0.9640</cell><cell></cell><cell></cell></row><row><cell>20 SATLab</cell><cell>0.9302</cell><cell></cell><cell>8 CAROLL Passau</cell><cell></cell><cell></cell><cell>0.9590</cell></row><row><cell>21 TeamOulu</cell><cell></cell><cell>0.9272</cell><cell>9 BIU</cell><cell cols="2">0.9484</cell><cell></cell></row><row><cell>...</cell><cell></cell><cell></cell><cell>...</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Number of Teams</cell><cell>38</cell><cell>25</cell><cell>Number of Teams</cell><cell>17</cell><cell>13</cell><cell>8</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">This transformation of the scores considers that the minimum score in each task is the same, 0, and that therefore no correction should be made at this level. This seems to me justified by the fact that, even if it is unlikely, a system can be wrong on all instances, but also and especially because it is the deviation from the maximum score that is important.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>The author wishes to thank the organizers of this shared task for putting together this valuable event and the reviewers for their very constructive comments. He is a Research Associate of the Fonds de la Recherche Scientifique -FNRS (Fédération Wallonie Bruxelles de Belgique). Computational resources have been provided by the supercomputing facilities of the Université Catholique de Louvain (CISM/UCL) and the Consortium des Equipements de Calcul Intensif en Fédération Wallonie Bruxelles (CECI).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Overview of the HASOC track at FIRE 2019: Hate speech and offensive content identification in indoeuropean languages</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Modha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Majumder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Patel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Mandalia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Patel</surname></persName>
		</author>
		<idno type="DOI">10.1145/3368567.3368584</idno>
		<ptr target="https://doi.org/10.1145/3368567.3368584" />
	</analytic>
	<monogr>
		<title level="m">FIRE &apos;19: Forum for Information Retrieval Evaluation</title>
		<editor>
			<persName><forename type="first">P</forename><surname>Majumder</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Mitra</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Gangopadhyay</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Mehta</surname></persName>
		</editor>
		<meeting><address><addrLine>Kolkata, India</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2019-12">December 2019</date>
			<biblScope unit="page" from="14" to="17" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Overview of the HASOC track at FIRE 2020: Hate speech and offensive language identification in Tamil, Malayalam, Hindi, English and German</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Modha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K M</forename></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<idno type="DOI">10.1145/3441501.3441517</idno>
		<ptr target="https://doi.org/10.1145/3441501.3441517" />
	</analytic>
	<monogr>
		<title level="m">FIRE 2020: Forum for Information Retrieval Evaluation</title>
		<editor>
			<persName><forename type="first">P</forename><surname>Majumder</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Mitra</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Gangopadhyay</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Mehta</surname></persName>
		</editor>
		<meeting><address><addrLine>Hyderabad, India</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2020">December 16-20, 2020</date>
			<biblScope unit="page" from="29" to="32" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">IIIT-Hyderabad at HASOC 2019: Hate speech detection</title>
		<author>
			<persName><forename type="first">V</forename><surname>Mujadia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Mishra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Sharma</surname></persName>
		</author>
		<ptr target="http://ceur-ws.org/Vol-2517/T3-12.pdf" />
	</analytic>
	<monogr>
		<title level="m">Working Notes of FIRE 2019 - Forum for Information Retrieval Evaluation</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">P</forename><surname>Mehta</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Majumder</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Mitra</surname></persName>
		</editor>
		<meeting><address><addrLine>Kolkata, India</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">December 12-15, 2019</date>
			<biblScope unit="volume">2517</biblScope>
			<biblScope unit="page" from="271" to="278" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">IRLab@IITBHU at HASOC 2019: Traditional machine learning for hate speech and offensive content identification</title>
		<author>
			<persName><forename type="first">A</forename><surname>Saroj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">K</forename><surname>Mundotiya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename></persName>
		</author>
		<ptr target="http://ceur-ws.org/Vol-2517/T3-17.pdf" />
	</analytic>
	<monogr>
		<title level="m">Working Notes of FIRE 2019 - Forum for Information Retrieval Evaluation</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">P</forename><surname>Mehta</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Majumder</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Mitra</surname></persName>
		</editor>
		<meeting><address><addrLine>Kolkata, India</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">December 12-15, 2019</date>
			<biblScope unit="volume">2517</biblScope>
			<biblScope unit="page" from="308" to="314" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Overview of the HASOC Subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages and Conversational Hate Speech</title>
		<author>
			<persName><forename type="first">S</forename><surname>Modha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Shahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Madhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Satapara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ranasinghe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">FIRE 2021: Forum for Information Retrieval Evaluation, Virtual Event</title>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2021-12">December 2021</date>
			<biblScope unit="page" from="13" to="17" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<ptr target="https://aclanthology.org/W18-3400" />
		<title level="m">Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP</title>
		<editor>
			<persName><forename type="first">R</forename><surname>Haffari</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Cherry</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Foster</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Khadivi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Salehi</surname></persName>
		</editor>
		<meeting>the Workshop on Deep Learning Approaches for Low-Resource NLP<address><addrLine>Melbourne</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Ortega</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Ojha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-H</forename><surname>Liu</surname></persName>
		</author>
		<title level="m">Proceedings of the 4th workshop on technologies for machine translation of low-resource languages: Introduction</title>
		<meeting>the 4th workshop on technologies for machine translation of low-resource languages: Introduction</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note>Proceedings of Machine Translation Summit XVIII. I-VI</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Cross-lingual offensive language identification for low resource languages: The case of Marathi</title>
		<author>
			<persName><forename type="first">S</forename><surname>Gaikwad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ranasinghe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">M</forename><surname>Homan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of RANLP</title>
		<meeting>RANLP</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Overview of the HASOC subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Modha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Shahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Madhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Satapara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Majumder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schäfer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ranasinghe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Nandini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename></persName>
		</author>
		<ptr target="http://ceur-ws.org/" />
	</analytic>
	<monogr>
		<title level="m">Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation</title>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Improving the character ngram model for the DSL task with BM25 weighting and less frequently used feature sets</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Bestgen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)</title>
		<meeting>the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)<address><addrLine>Valencia, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="115" to="123" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">The probabilistic relevance framework: BM25 and beyond</title>
		<author>
			<persName><forename type="first">S</forename><surname>Robertson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zaragoza</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Foundations and Trends in Information Retrieval</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="333" to="389" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Optimizing a supervised classifier for a difficult language identification problem</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Bestgen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)</title>
		<meeting>the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="96" to="101" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">An Introduction to Information Retrieval</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Raghavan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Schütze</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2008">2008</date>
			<publisher>Cambridge University Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">LIBLINEAR: A library for large linear classification</title>
		<author>
			<persName><forename type="first">R.-E</forename><surname>Fan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-J</forename><surname>Hsieh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X.-R</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-J</forename><surname>Lin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page" from="1871" to="1874" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">LAST at CMCL 2021 shared task: Predicting gaze data during reading with a gradient boosting decision tree approach</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Bestgen</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.cmcl-1.10</idno>
		<ptr target="https://aclanthology.org/2021.cmcl-1.10" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics</title>
		<meeting>the Workshop on Cognitive Modeling and Computational Linguistics</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="90" to="96" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">LightGBM: A highly efficient gradient boosting decision tree</title>
		<author>
			<persName><forename type="first">G</forename><surname>Ke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Meng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Finley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Ye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T.-Y</forename><surname>Liu</surname></persName>
		</author>
		<ptr target="http://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree.pdf" />
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems 30</title>
		<editor>
			<persName><forename type="first">I</forename><surname>Guyon</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">U</forename><forename type="middle">V</forename><surname>Luxburg</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Bengio</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Wallach</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Fergus</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Vishwanathan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Garnett</surname></persName>
		</editor>
		<imprint>
			<publisher>Curran Associates, Inc.</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="3146" to="3154" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">A</forename></persName>
		</author>
		<title level="m">&apos;num_leaves&apos;: 14, &apos;learning_rate&apos;: 0.0095, &apos;min_data_in_leaf&apos;: 6, &apos;max_depth&apos;: 10, &apos;feature_fraction&apos;: 0.78875</title>
		<imprint/>
	</monogr>
	<note>&apos;min_data_in_leaf&apos;: 3, &apos;max_depth&apos;: 11, &apos;feature_fraction&apos;: 0.12125, &apos;bagging_freq&apos;: 4, &apos;bagging_fraction&apos;: 0.8, &apos;metric&apos;: &apos;binary&apos;, &apos;objective&apos;: &apos;binary&apos;, &apos;is_unbalance&apos;: &apos;false&apos;. The threshold used to decide that an instance belongs to the HOF class was set at 0.35.</note>
</biblStruct>

				</listBibl>
			</div>
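Entry b16 in the reference list above is not a bibliographic reference: it appears to be text spilled from a note that reports LightGBM hyperparameters and a decision threshold of 0.35 for the HOF class. A minimal Python sketch of how those reported values could be applied, assuming the second, more complete parameter set, a greater-or-equal comparison at the threshold, and the HASOC label names HOF and NOT; the helper `label_hof` and the probabilities are hypothetical, and LightGBM is not imported so the sketch runs on its own.

```python
# Second parameter set as reported in entry b16. The parameters that
# preceded 'min_data_in_leaf' in that set are missing from the source
# and are deliberately not filled in here.
LGBM_PARAMS = {
    "min_data_in_leaf": 3,
    "max_depth": 11,
    "feature_fraction": 0.12125,
    "bagging_freq": 4,
    "bagging_fraction": 0.8,
    "metric": "binary",
    "objective": "binary",
    "is_unbalance": False,  # reported as 'false'
}

# Reported threshold for assigning the HOF class (instead of 0.5).
HOF_THRESHOLD = 0.35


def label_hof(probabilities, threshold=HOF_THRESHOLD):
    """Map predicted probabilities of HOF to 'HOF' or 'NOT' labels."""
    return ["HOF" if p >= threshold else "NOT" for p in probabilities]


# Made-up probabilities standing in for a binary model's predictions.
probs = [0.10, 0.34, 0.35, 0.50, 0.90]
print(label_hof(probs))
```

In an actual run, the parameter dictionary (once complete) would be passed as the `params` argument of `lightgbm.train`, and the probabilities returned by the resulting booster's `predict` method would be thresholded with `label_hof`. Setting the threshold below 0.5 generally raises recall on the HOF class at some cost in precision.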
		</back>
	</text>
</TEI>
