<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">matteo-brv @ DaDoEval: An SVM-based Approach for Automatic Document Dating</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Matteo</forename><surname>Brivio</surname></persName>
							<email>matteo.brivio@student.uni-tuebingen.de</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Linguistics</orgName>
								<orgName type="institution">University of Tübingen</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">matteo-brv @ DaDoEval: An SVM-based Approach for Automatic Document Dating</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">FD5493D52EA1F63C306836390A0D4161</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T01:04+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper describes our contribution to the EVALITA 2020 shared task DaDoEval (Dating Document Evaluation). The solution we present is based on a linear multi-class Support Vector Machine classifier trained on a combination of character and word n-grams, as well as the number of word tokens per document. Despite its simplicity, the system ranked first both in the coarse-grained classification task on same-genre data and in the one on cross-genre data, achieving a macro-average F1 score of 0.934 and 0.413, respectively. The system implementation is available at https://github.com/matteobrv/DaDoEval.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Temporal information, such as the publication date of a document, is of major relevance in a number of domains, like historical linguistics and digital humanities <ref type="bibr" target="#b19">(Niculae et al., 2014)</ref>. This is arguably even more true for a wide range of information retrieval tasks, such as document exploration, similarity search, summarisation and clustering, where the temporal dimension plays a major role in improving search results <ref type="bibr" target="#b11">(Alonso et al., 2007;</ref> <ref type="bibr" target="#b10">Alonso et al., 2011)</ref>.</p><p>Such information, however, is not always readily available and must therefore be inferred, relying on qualitative methods, quantitative methods, or both <ref type="bibr" target="#b1">(Ciula, 2017)</ref>. Despite their significance, methods for temporal text classification and automatic document dating are still rather unexplored compared to other text classification tasks <ref type="bibr" target="#b19">(Niculae et al., 2014)</ref>. This is likely to change, as the increasing availability of large-scale, time-annotated digital resources, such as Google n-grams<ref type="foot" target="#foot_0">1</ref>, is promoting research in this direction. Two recent examples of this trend, in line with the present task, are the Diachronic Text Evaluation shared task organised by <ref type="bibr" target="#b9">Popescu et al. (2015)</ref> at SemEval 2015 and the RetroC Challenge presented by <ref type="bibr" target="#b7">Graliński et al. (2017)</ref>.</p><p>In this work we propose a simple, yet effective, approach for automatic document dating based on a linear multi-class Support Vector Machine classifier, trained on a combination of character and word n-grams, as well as document length in word tokens.</p><p>The solution is evaluated in the context of the DaDoEval (Dating Document Evaluation) shared task at EVALITA 2020 <ref type="bibr" target="#b15">(Menini et al., 2020;</ref> <ref type="bibr" target="#b0">Basile et al., 2020)</ref>. The task is based on Alcide De Gasperi's corpus of public documents <ref type="bibr" target="#b13">(Tonelli et al., 2019)</ref> and is organised into six sub-tasks: (I) coarse-grained classification on same-genre data, (II) coarse-grained classification on cross-genre data, (III) fine-grained classification on same-genre data, (IV) fine-grained classification on cross-genre data, (V) year-based classification on same-genre data, (VI) year-based classification on cross-genre data.</p><p>The proposed solution tackles the first two sub-tasks, coarse-grained classification on same-genre and cross-genre data. Both sub-tasks require assigning each document sample to one of the five main time periods identified in De Gasperi's political life, spanning a range of over fifty years from 1901 to 1954.</p><p>The paper is structured as follows: in section 2 we provide a brief overview of the training data set, in section 3 we go over the system setup and describe the feature space, section 4 is dedicated to results analysis and discussion, in section 5 we consider possible improvements, while section 6 is reserved for final remarks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Data</head><p>The training data set released for the shared task includes 2,210 document samples extracted from Alcide De Gasperi's corpus of public documents, a multi-genre collection of 2,759 texts written or transcribed between 1901 and 1954 <ref type="bibr" target="#b13">(Tonelli et al., 2019)</ref>.</p><p>With respect to the coarse-grained classification sub-tasks, the given samples are organised into five classes (see Table <ref type="table" target="#tab_0">1</ref>) corresponding to the main time periods historians identified in De Gasperi's political life: Habsburg years (1901-1918), Beginning of political activity (1919-1926), Internal exile (1927-1942), From fascism to the Italian Republic (1943-1947), Building the Italian Republic (1948-1954).</p><p>A preliminary analysis of the data set reveals an imbalanced class distribution, with a considerably lower number of samples in the third class, corresponding to the 1927-1942 interval. This, however, is partially mitigated by the markedly higher average number of word tokens per sample observed in this class compared to the other ones.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">System Description</head><p>The proposed solution is based on a Support Vector Machine (SVM) classifier implemented using the Scikit-learn library <ref type="bibr" target="#b6">(Pedregosa et al., 2011)</ref>.</p><p>To account for the rather imbalanced data set, the SVM is configured so that classes are assigned weights inversely proportional to their frequency in the input data.</p><p>Following the assumption that most text categorisation problems are linearly separable <ref type="bibr" target="#b17">(Joachims, 1998)</ref>, the model uses a linear kernel implemented in terms of libsvm <ref type="bibr" target="#b4">(Chang and Lin, 2011)</ref>, while relying on a one-versus-one decision strategy to handle both sub-tasks as multi-class, single-label classification problems.</p></div>
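<div xmlns="http://www.tei-c.org/ns/1.0"><p>The class weighting described above can be illustrated with a minimal sketch. It assumes the "balanced" heuristic documented by Scikit-learn, in which each class receives the weight n_samples / (n_classes * class_count), and it uses the per-class counts of the full training release from Table 1. The weights actually used by the system would be computed on the training split, so the figures below are illustrative only, and the function name is hypothetical.</p>

```python
# Samples per class, taken from Table 1 (full training release, 2,210 samples).
SAMPLES_PER_CLASS = {
    "1901-1918": 572,
    "1919-1926": 342,
    "1927-1942": 150,
    "1943-1947": 514,
    "1948-1954": 632,
}


def balanced_class_weights(counts):
    """Weight each class by n_samples / (n_classes * class_count).

    Assumption: this mirrors the "balanced" heuristic; rarer classes
    receive proportionally larger weights.
    """
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


weights = balanced_class_weights(SAMPLES_PER_CLASS)
for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

<p>With these counts the under-represented 1927-1942 class receives by far the largest weight, which is the intended effect of the weighting.</p></div>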
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Feature space</head><p>The system relies solely on the data provided by the task organisers, which is split into a training set (80%) and a development set (20%). No preprocessing is applied, as measures such as case normalisation and punctuation removal appear to worsen, rather than improve, the classification results on the development set.</p><p>Each document in the data set is represented using three sets of features: document length in terms of word tokens, character n-grams and word n-grams. In this respect, we explore the idea that SVMs trained on combinations of character and word n-grams are particularly effective in tackling text classification tasks <ref type="bibr" target="#b3">(Çöltekin and Rama, 2017;</ref> <ref type="bibr" target="#b2">Çöltekin and Rama, 2018)</ref>.</p><p>Character n-grams are extracted for n ∈ {3, 4, 5} and span across word boundaries, thus capturing punctuation and space characters occurring at the beginning and at the end of each word token. Word n-grams, on the other hand, are extracted for n ∈ {1, 2}. Both feature sets are weighted using term frequency-inverse document frequency (TF-IDF) to scale down the impact of the most frequent n-grams.</p><p>The number of word tokens per document is computed in a naive way, splitting each sample at every white space. Similarly to n-gram features, token counts are scaled down to a 0-1 range in an attempt to avoid numerical problems and prevent features in higher numeric ranges from dominating those in smaller ones <ref type="bibr" target="#b5">(Hsu et al., 2003)</ref>.</p></div>
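<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three feature sets described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the system's implementation, which relies on Scikit-learn: the function names are hypothetical, TF-IDF weighting is left out, and splitting on runs of white space stands in for the naive tokenisation mentioned in the text.</p>

```python
def char_ngrams(text, sizes=(3, 4, 5)):
    """Character n-grams over the raw text, crossing word boundaries."""
    return [text[i:i + n] for n in sizes for i in range(len(text) - n + 1)]


def word_ngrams(text, sizes=(1, 2)):
    """Word n-grams over white-space separated tokens."""
    tokens = text.split()
    return [
        " ".join(tokens[i:i + n])
        for n in sizes
        for i in range(len(tokens) - n + 1)
    ]


def token_count(text):
    """Naive document length: number of white-space separated tokens."""
    return len(text.split())


def min_max_scale(values):
    """Rescale values to the 0-1 range (all zeros if they are constant)."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


sample = "Il Trentino, oggi."
print(char_ngrams(sample, sizes=(3,))[:4])
print(word_ngrams(sample))
print(min_max_scale([120, 480, 300]))  # hypothetical token counts
```

<p>Because character n-grams are taken over the untouched string, n-grams such as "Il " or "o, " retain the spaces and punctuation that surround word tokens, which is the behaviour the feature description relies on.</p></div>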
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Optimisation and Tuning</head><p>The system hyper-parameters are optimised to obtain the best F1 score on the development set.</p><p>A subset of the hyper-parameters is tuned empirically through several experiments or on the basis of existing literature. This is the case for kernel type, decision strategy, class balancing, tolerance for stopping criterion (tol) and n-gram size.</p><p>The remaining hyper-parameters considered during optimisation are the regularisation parameter (C) together with the maximum and minimum document frequency (max_df, min_df), which in the present approach are used to set an acceptance threshold for high and low frequency n-grams. These hyper-parameters are tuned through the BayesSearchCV algorithm implemented in the scikit-optimize library <ref type="bibr" target="#b18">(Head et al., 2020)</ref>, using shuffled 5-fold cross-validation. BayesSearchCV relies on Bayesian Optimisation and explores the hyper-parameter search space by exploiting the information available from previous evaluations. This is in contrast to other approaches, such as grid and random search, which move across the search space either in an exhaustive or completely random manner.</p><p>Table <ref type="table" target="#tab_1">2</ref> summarises the best hyper-parameter setup obtained from the tuning process.</p></div>
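<div xmlns="http://www.tei-c.org/ns/1.0"><p>The role of the minimum and maximum document frequency as acceptance thresholds can be illustrated with a short sketch. It assumes that both values are proportions of documents, as is the case for fractional values in Scikit-learn vectorisers; the function name and the toy documents are hypothetical.</p>

```python
from collections import Counter


def filter_by_document_frequency(docs_ngrams, min_df, max_df):
    """Keep n-grams whose share of documents lies within [min_df, max_df]."""
    n_docs = len(docs_ngrams)
    document_frequency = Counter()
    for ngrams in docs_ngrams:
        # Count each n-gram at most once per document.
        document_frequency.update(set(ngrams))
    kept = set()
    for ngram, count in document_frequency.items():
        share = count / n_docs
        if share >= min_df and max_df >= share:
            kept.add(ngram)
    return kept


# Toy corpus: each inner list holds the n-grams of one document.
docs = [["a", "b"], ["a", "c"], ["a", "b"], ["a"]]
kept = filter_by_document_frequency(docs, min_df=0.3, max_df=0.9)
print(sorted(kept))  # ['b']
```

<p>In this toy corpus "a" occurs in every document and is discarded as too frequent, "c" occurs in a quarter of the documents and is discarded as too rare, and only "b" survives.</p></div>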
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Results</head><p>In this section we present the results for the two sub-tasks in which the system participated. Results are summarised in Table <ref type="table">3</ref> and reported in terms of macro-average F1 score.</p><p>The system ranked first in both the same-genre and the cross-genre coarse-grained classification task, obtaining a macro-average F1 score of 0.934 and 0.413, respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 3</head><p>Columns: SUB-TASK, TEAM, RUN, MACRO F1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Classification on same-genre data</head><p>The runs submitted for the first sub-task are based on test samples of the same genre as the ones in the training set. The system scored well above the baseline, which was computed with a Logistic Regression model trained on TF-IDF-weighted word unigrams, without performing any preprocessing.</p><p>Overall, the results registered on the test set are in line with those observed during training. This is confirmed by the data summarised in Table <ref type="table" target="#tab_3">4</ref> and by the confusion matrix in Figure <ref type="figure" target="#fig_1">1</ref>.</p><p>The confusion matrix depicts a run on the development set which achieved a macro-average F1 score of 0.95, while Table <ref type="table" target="#tab_3">4</ref> reports the per-class results of the best test run submitted for the sub-task. In both cases, 1919-1926, 1943-1947 and 1948-1954 are the classes showing the highest number of misclassifications and, incidentally, are also the ones corresponding to the shortest time periods.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Classification on cross-genre data</head><p>The runs submitted for the second sub-task are based on samples coming from a cross-genre, out-of-domain test data set. These samples are a subset of the documents collected for the Epistolario project <ref type="bibr" target="#b14">(Tonelli et al., 2020)</ref>, an ongoing effort to create a digital archive of Alcide De Gasperi's private and public correspondence.</p><p>As expected, despite scoring above the baseline, cross-genre results are considerably lower than those obtained in the same-genre task. Per-class results summarised in Table <ref type="table" target="#tab_4">5</ref> show that the promising performance registered in the same-genre task does not transfer to the cross-genre one, suggesting that the model generalises poorly. Particularly interesting and worth investigating are the results registered for the third class, corresponding to the 1927-1942 interval. For this class, precision and recall are both equal to 0, indicating that the model did not correctly assign any sample to this time period.</p></div>
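<div xmlns="http://www.tei-c.org/ns/1.0"><p>The macro-average F1 scores reported in this section can be related to the per-class results with a small worked example. The sketch below assumes that the macro-average is the unweighted mean of the per-class F1 values and uses the rounded figures from Tables 4 and 5, so it recovers the official scores only up to rounding; the names are hypothetical.</p>

```python
# Per-class F1 values as reported in Table 4 (same-genre, sub-task 1).
SAME_GENRE_F1 = {
    "1901-1918": 0.948,
    "1919-1926": 0.913,
    "1927-1942": 0.973,
    "1943-1947": 0.898,
    "1948-1954": 0.936,
}

# Per-class F1 values as reported in Table 5 (cross-genre, sub-task 2).
CROSS_GENRE_F1 = {
    "1901-1918": 0.636,
    "1919-1926": 0.261,
    "1927-1942": 0.0,
    "1943-1947": 0.667,
    "1948-1954": 0.5,
}


def macro_average(per_class_scores):
    """Unweighted mean over classes: every class counts equally."""
    return sum(per_class_scores.values()) / len(per_class_scores)


same_genre = round(macro_average(SAME_GENRE_F1), 3)
cross_genre = round(macro_average(CROSS_GENRE_F1), 3)
print(same_genre)   # 0.934
print(cross_genre)  # 0.413
```

<p>Since every class contributes equally regardless of its size, the zero F1 of the 1927-1942 class alone lowers the cross-genre macro-average by a fifth of what a perfect score on that class would have contributed.</p></div>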
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Possible improvements</head><p>Results for the same-genre task are quite encouraging and in line with those obtained on the development set, where the F1 score ranges between 0.92 and 0.96. However, with the current data and setup, there might not be much room for further improvement. Nonetheless, additional features like richness measures and linguistically motivated features (e.g. POS tags) are explored in other contributions <ref type="bibr" target="#b12">(Štajner and Zampieri, 2013;</ref> <ref type="bibr" target="#b8">Zampieri et al., 2016)</ref> and could help achieve more stable results.</p><p>On the other hand, results for the second sub-task suggest a lack of generalisation on cross-genre, out-of-domain data. In this respect, even though SVM-based systems for text classification should be able to perform well and take advantage of high-dimensional feature spaces <ref type="bibr" target="#b17">(Joachims, 1998)</ref>, it might still be worthwhile experimenting with some feature selection methods. Another angle worth considering is that the system might be too sensitive to the shallow n-gram features used to represent the training data. In this case, including deeper text features, such as those encoding syntactic information, might help the system to abstract away from the lexical level. A first step in this direction is attempted by <ref type="bibr" target="#b16">Szymanski and Lynch (2015)</ref>, who employ Google Syntactic N-grams in an SVM-based system that participated in the Diachronic Text Evaluation shared task <ref type="bibr" target="#b9">(Popescu et al., 2015)</ref> at SemEval 2015.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusions</head><p>In this paper we describe a simple, yet effective, approach for automatic document dating implemented for the DaDoEval shared task at EVALITA 2020. The system is based on a linear Support Vector Machine and is trained on a small set of stylistic and lexical features, resulting in a fast and efficient classification model.</p><p>In particular, the approach achieves top scores in both coarse-grained classification sub-tasks, thus confirming that SVM-based systems trained on character and word n-grams are indeed well suited to tackle text classification problems.</p><p>Nonetheless, results observed in the second task suggest that the model does not generalise well on cross-genre data, leaving room for further improvements.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>The five coarse-grained classes correspond to the main time periods historians identified in De Gasperi's political life: Habsburg years (1901-1918), Beginning of political activity (1919-1926), Internal exile (1927-1942), From fascism to the Italian Republic (1943-1947), Building the Italian Republic (1948-1954).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Confusion matrix for a development set run with a macro-average F1 score of 0.95.</figDesc><graphic coords="4,116.85,80.65,144.48,144.48" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Training set overview, showing the number of document samples per class and the average number of word tokens per sample, rounded up to the nearest integer.</figDesc><table><row><cell></cell><cell>1901-1918</cell><cell>1919-1926</cell><cell>1927-1942</cell><cell>1943-1947</cell><cell>1948-1954</cell></row><row><cell>SAMPLES PER CLASS</cell><cell>572</cell><cell>342</cell><cell>150</cell><cell>514</cell><cell>632</cell></row><row><cell>AVG. SAMPLE LENGTH</cell><cell>867</cell><cell>1033</cell><cell>3044</cell><cell>633</cell><cell>1209</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Final hyper-parameter setup for each system component.</figDesc><table><row><cell>COMPONENT</cell><cell>PARAMETER</cell><cell>VALUE</cell></row><row><cell>TfidfVectorizer</cell><cell>analyzer</cell><cell>word</cell></row><row><cell></cell><cell>max_df</cell><cell>0.9</cell></row><row><cell></cell><cell>min_df</cell><cell>0.004</cell></row><row><cell></cell><cell>ngram_range</cell><cell>(1, 2)</cell></row><row><cell></cell><cell>lowercase</cell><cell>False</cell></row><row><cell>TfidfVectorizer</cell><cell>analyzer</cell><cell>char</cell></row><row><cell></cell><cell>max_df</cell><cell>0.3</cell></row><row><cell></cell><cell>min_df</cell><cell>0.001</cell></row><row><cell></cell><cell>ngram_range</cell><cell>(3, 5)</cell></row><row><cell></cell><cell>lowercase</cell><cell>False</cell></row><row><cell>SVM</cell><cell>kernel</cell><cell>linear</cell></row><row><cell></cell><cell>decision function</cell><cell>ovo</cell></row><row><cell></cell><cell>tol</cell><cell>1e-12</cell></row><row><cell></cell><cell>C</cell><cell>0.881</cell></row><row><cell></cell><cell>class_weight</cell><cell>balanced</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 :</head><label>4</label><figDesc>Per-class results of the best test run for sub-task 1.</figDesc><table><row><cell>CLASS</cell><cell>PRECISION</cell><cell>RECALL</cell><cell>F1</cell></row><row><cell>1901-1918</cell><cell>0.914</cell><cell>0.986</cell><cell>0.948</cell></row><row><cell>1919-1926</cell><cell>0.96</cell><cell>0.872</cell><cell>0.913</cell></row><row><cell>1927-1942</cell><cell>0.973</cell><cell>0.973</cell><cell>0.973</cell></row><row><cell>1943-1947</cell><cell>0.898</cell><cell>0.898</cell><cell>0.898</cell></row><row><cell>1948-1954</cell><cell>0.939</cell><cell>0.933</cell><cell>0.936</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5 :</head><label>5</label><figDesc>Per-class results of the best test run for sub-task 2.</figDesc><table><row><cell>CLASS</cell><cell>PRECISION</cell><cell>RECALL</cell><cell>F1</cell></row><row><cell>1901-1918</cell><cell>0.583</cell><cell>0.7</cell><cell>0.636</cell></row><row><cell>1919-1926</cell><cell>1.0</cell><cell>0.15</cell><cell>0.261</cell></row><row><cell>1927-1942</cell><cell>0.0</cell><cell>0.0</cell><cell>0.0</cell></row><row><cell>1943-1947</cell><cell>0.6</cell><cell>0.75</cell><cell>0.667</cell></row><row><cell>1948-1954</cell><cell>0.354</cell><cell>0.85</cell><cell>0.5</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://books.google.com/ngrams</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>We thank Dr. Çağrı Çöltekin for his patient encouragement and valuable suggestions throughout this project.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">EVALITA 2020: Overview of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</title>
		<author>
			<persName><forename type="first">Valerio</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Danilo</forename><surname>Croce</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Maria</forename><surname>Di Maro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lucia</forename><forename type="middle">C</forename><surname>Passaro</surname></persName>
		</author>
		<ptr target="CEUR.org" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop</title>
				<meeting>Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop<address><addrLine>EVALITA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Digital palaeography: What is digital about it?</title>
		<author>
			<persName><forename type="first">Arianna</forename><surname>Ciula</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Digital Scholarship in the Humanities</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="89" to="105" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Tübingen-Oslo at SemEval-2018 task 2: SVMs perform better than RNNs in emoji prediction</title>
		<author>
			<persName><forename type="first">Çağrı</forename><surname>Çöltekin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Taraka</forename><surname>Rama</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of The 12th International Workshop on Semantic Evaluation</title>
				<meeting>The 12th International Workshop on Semantic Evaluation</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="34" to="38" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Tübingen system in VarDial 2017 shared task: experiments with language identification and cross-lingual parsing</title>
		<author>
			<persName><forename type="first">Çağrı</forename><surname>Çöltekin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Taraka</forename><surname>Rama</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects</title>
				<meeting>the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects<address><addrLine>VarDial</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="146" to="155" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">LIBSVM: A library for support vector machines</title>
		<author>
			<persName><forename type="first">Chih-Chung</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Chih-Jen</forename><surname>Lin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Intelligent Systems and Technology</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page">27</biblScope>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">A practical guide to support vector classification</title>
		<author>
			<persName><forename type="first">Chih-Wei</forename><surname>Hsu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Chih-Chung</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Chih-Jen</forename><surname>Lin</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2003">2003</date>
		</imprint>
		<respStmt>
			<orgName>Department of Computer Science, National Taiwan University</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical report</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Scikit-learn: Machine Learning in Python</title>
		<author>
			<persName><forename type="first">F</forename><surname>Pedregosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Varoquaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gramfort</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Michel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Thirion</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Grisel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Blondel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Prettenhofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Weiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Dubourg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vanderplas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Passos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Cournapeau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Brucher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Perrot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Duchesnay</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2825" to="2830" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">The RetroC Challenge: How to Guess the Publication Year of a Text?</title>
		<author>
			<persName><forename type="first">Filip</forename><surname>Graliński</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rafał</forename><surname>Jaworski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Łukasz</forename><surname>Borchmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Piotr</forename><surname>Wierzchoń</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage</title>
				<meeting>the 2nd International Conference on Digital Access to Textual Cultural Heritage</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="29" to="34" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Modeling Language Change in Historical Corpora: The Case of Portuguese</title>
		<author>
			<persName><forename type="first">Marcos</forename><surname>Zampieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Shervin</forename><surname>Malmasi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mark</forename><surname>Dras</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC&apos;16)</title>
				<meeting>the 10th International Conference on Language Resources and Evaluation (LREC&apos;16)</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="4098" to="4104" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Semeval 2015, task 7: Diachronic text evaluation</title>
		<author>
			<persName><forename type="first">Octavian</forename><surname>Popescu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Carlo</forename><surname>Strapparava</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 9th International Workshop on Semantic Evaluation</title>
				<meeting>the 9th International Workshop on Semantic Evaluation<address><addrLine>SemEval</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="870" to="878" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Temporal Information Retrieval: Challenges and Opportunities</title>
		<author>
			<persName><forename type="first">Omar</forename><surname>Alonso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jannik</forename><surname>Strötgen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ricardo</forename><surname>Baeza-Yates</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><surname>Gertz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st International Temporal Web Analytics Workshop</title>
				<meeting>the 1st International Temporal Web Analytics Workshop</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="1" to="8" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">On the value of temporal information in information retrieval</title>
		<author>
			<persName><forename type="first">Omar</forename><surname>Alonso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><surname>Gertz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ricardo</forename><surname>Baeza-Yates</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIGIR Forum</title>
		<imprint>
			<biblScope unit="volume">41</biblScope>
			<biblScope unit="page" from="35" to="41" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Stylistic Changes for Temporal Text Classification</title>
		<author>
			<persName><forename type="first">Sanja</forename><surname>Štajner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Marcos</forename><surname>Zampieri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 16th International Conference on Text, Speech and Dialogue (TSD)</title>
		<title level="s">Lecture Notes in Artificial Intelligence - LNAI</title>
		<meeting>the 16th International Conference on Text, Speech and Dialogue (TSD)</meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="volume">8082</biblScope>
			<biblScope unit="page" from="519" to="526" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Prendo la Parola in Questo Consesso Mondiale: A Multi-Genre 20th Century Corpus in the Political Domain</title>
		<author>
			<persName><forename type="first">Sara</forename><surname>Tonelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rachele</forename><surname>Sprugnoli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Giovanni</forename><surname>Moretti</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of CLIC-it 2019</title>
				<meeting>CLIC-it 2019</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Epistolario De Gasperi: National Edition of De Gasperi&apos;s Letters in Digital Format</title>
		<author>
			<persName><forename type="first">Sara</forename><surname>Tonelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rachele</forename><surname>Sprugnoli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Giovanni</forename><surname>Moretti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Stefano</forename><surname>Malfatti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Marco</forename><surname>Odorizzi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of AIUCD</title>
				<meeting>AIUCD</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">DaDoEval @ EVALITA 2020: Same-Genre and Cross-Genre Dating of Historical Documents</title>
		<author>
			<persName><forename type="first">Stefano</forename><surname>Menini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Giovanni</forename><surname>Moretti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rachele</forename><surname>Sprugnoli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sara</forename><surname>Tonelli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop</title>
				<meeting>Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop<address><addrLine>EVALITA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">UCD: Diachronic Text Classification with Character, Word, and Syntactic N-grams</title>
		<author>
			<persName><forename type="first">Terrence</forename><surname>Szymanski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gerard</forename><surname>Lynch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 9th International Workshop on Semantic Evaluation</title>
				<meeting>the 9th International Workshop on Semantic Evaluation<address><addrLine>SemEval</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="879" to="883" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Text categorization with support vector machines: Learning with many relevant features</title>
		<author>
			<persName><forename type="first">Thorsten</forename><surname>Joachims</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th European Conference on Machine Learning (ECML&apos;98)</title>
				<meeting>the 10th European Conference on Machine Learning (ECML&apos;98)</meeting>
		<imprint>
			<date type="published" when="1998">1998</date>
			<biblScope unit="volume">1398</biblScope>
			<biblScope unit="page" from="137" to="142" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">Tim</forename><surname>Head</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Manoj</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Holger</forename><surname>Nahrstaedt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gilles</forename><surname>Louppe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Iaroslav</forename><surname>Shcherbatyi</surname></persName>
		</author>
		<idno type="DOI">10.5281/zenodo.4014775</idno>
		<ptr target="http://doi.org/10.5281/zenodo.4014775" />
		<title level="m">scikit-optimize/scikit-optimize</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">8</biblScope>
		</imprint>
	</monogr>
	<note>Version v0</note>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Temporal Text Ranking and Automatic Dating of Texts</title>
		<author>
			<persName><forename type="first">Vlad</forename><surname>Niculae</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Marcos</forename><surname>Zampieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Liviu</forename><surname>Dinu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alina</forename><forename type="middle">M</forename><surname>Ciobanu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics</title>
				<meeting>the 14th Conference of the European Chapter of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="17" to="21" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
