<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A comparison of machine learning techniques for predicting insemination outcome in Irish dairy cows</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Caroline</forename><surname>Fenlon</surname></persName>
							<email>caroline.fenlon@ucdconnect.ie</email>
							<affiliation key="aff0">
								<orgName type="department">School of Computer Science</orgName>
								<orgName type="institution">University College Dublin</orgName>
								<address>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Luke</forename><surname>O'Grady</surname></persName>
							<email>luke.ogrady@ucd.ie</email>
							<affiliation key="aff1">
								<orgName type="department">School of Veterinary Medicine</orgName>
								<orgName type="institution">University College Dublin</orgName>
								<address>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">John</forename><surname>Dunnion</surname></persName>
							<email>john.dunnion@ucd.ie</email>
							<affiliation key="aff0">
								<orgName type="department">School of Computer Science</orgName>
								<orgName type="institution">University College Dublin</orgName>
								<address>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Laurence</forename><surname>Shalloo</surname></persName>
							<email>laurence.shalloo@teagasc.ie</email>
							<affiliation key="aff2">
								<orgName type="department">Animal and Grassland Research and Innovation Centre</orgName>
								<orgName type="institution" key="instit1">Teagasc</orgName>
								<orgName type="institution" key="instit2">Moorepark</orgName>
								<address>
									<settlement>Fermoy</settlement>
									<region>County Cork</region>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Stephen</forename><surname>Butler</surname></persName>
							<email>stephen.butler@teagasc.ie</email>
							<affiliation key="aff2">
								<orgName type="department">Animal and Grassland Research and Innovation Centre</orgName>
								<orgName type="institution" key="instit1">Teagasc</orgName>
								<orgName type="institution" key="instit2">Moorepark</orgName>
								<address>
									<settlement>Fermoy</settlement>
									<region>County Cork</region>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Michael</forename><surname>Doherty</surname></persName>
							<email>michael.doherty@ucd.ie</email>
							<affiliation key="aff1">
								<orgName type="department">School of Veterinary Medicine</orgName>
								<orgName type="institution">University College Dublin</orgName>
								<address>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">A comparison of machine learning techniques for predicting insemination outcome in Irish dairy cows</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">5A1DDDBB02DE72791CDFE0EDB1A90AF1</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T14:13+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Reproductive performance has an important effect on economic efficiency in dairy farms with short yearly periods of breeding. The individual factors affecting the outcome of an artificial insemination have been extensively researched in many univariate models. In this study, these factors are analysed in combination to create a comprehensive multivariate model of conception in Irish dairy cows. Logistic regression, Naïve Bayes, Decision Tree learning and Random Forests are trained using 2,723 artificial insemination records from Irish research farms. An additional 4,205 breeding events from commercial dairy farms are used to evaluate and compare the performance of each data mining technique. The models are assessed in terms of both discrimination and calibration ability. The logistic regression model was found to be the most useful model for predicting insemination outcome. This model is proposed as being appropriate for use in decision support and in general simulation of Irish dairy cows.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Dairy production systems in Ireland are primarily based on seasonal calving patterns. Reproductive performance in these systems has an important impact on economic efficiency. In these pasture-based farms, the aim is to align peak grass availability with peak lactating cow energy demands, by breeding animals during a set time period. Poor reproductive performance results in extended periods of calving, suboptimal utilisation of pastures and increased feed costs.</p><p>The individual factors affecting conception have been extensively researched. However, few models have comprehensively examined the factors influencing the outcome of insemination in combination, particularly at the individual breeding event level <ref type="bibr" target="#b18">[19]</ref>. Most statistical analysis has focused on identifying important factors in isolation and analysing overall measures of reproductive performance, such as calving to conception interval or the probability of conception during a breeding season <ref type="bibr" target="#b4">[5]</ref>.</p><p>Statistically important factors incorporating both genetic and phenotypic effects (parity, stage of lactation, calving events, measures of energy balance and milk production) were identified as significant in previous analyses of records from Irish herds <ref type="bibr" target="#b1">[2]</ref> <ref type="bibr" target="#b6">[7]</ref>. Binary logistic regression was used to form a predictive model of conception outcome. In this study, the aim was to identify and apply other appropriate machine learning techniques to the problem of predicting insemination outcome. To allow direct comparison of the models, they were all built using the same variables as the previous study.</p><p>When evaluating binary predictions, two categories of assessment are possible: discrimination and calibration <ref type="bibr" target="#b20">[21]</ref>. 
Discrimination measures a model's ability to correctly classify cases, i.e. the separation between the successful and unsuccessful outcomes. Evaluations of discrimination depend on a cut-off point to transform the predicted probabilities into outcomes and ignore the raw predictions. Classification tables show the rate of correct class predictions, separated by positive and negative instances. These values can be used to calculate precision and recall <ref type="bibr" target="#b16">[17]</ref>. To identify the optimal cut-off point, receiver operating characteristic (ROC) curves are used to plot the true-positive rate against the false-positive rate.</p><p>Calibration compares the predictions to the true proportions of events occurring, i.e. determining if the observed frequency of occurrence is similar to the predicted probability, within groups of records. Reliability measures such as the Hosmer-Lemeshow test <ref type="bibr" target="#b11">[12]</ref> are used to test overall goodness-of-fit. Calibration plots <ref type="bibr" target="#b3">[4]</ref> allow visual inspection of deviation, with statistical tests for analysis of bias and spread. Analysis of deviances may be used to highlight outlying records or covariate values.</p><p>As breeding outcome may be considered both in terms of the probability of occurrence and the binary prediction, the models used were compared using both forms of assessment. Evaluation was carried out on an external dataset of records from typically managed commercial Irish dairy herds.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Methods</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Data</head><p>The data available for model training were sourced from the centralised database at Teagasc's Animal and Grassland Research and Innovation Centre, Moorepark, Co. Cork. The animals included in the dataset were from the Curtins and Ballydague spring-calving research herds, both of which emulate typical Irish dairy management systems. Additional variables were available in this dataset which were used to find the significant factors in the modelling process. After cleaning, inference and missing value removal, 2,723 artificial insemination service records from 658 lactating cows (1,552 lactations) were available for analysis. Service outcome (i.e. conception or no conception) was recorded as a binary variable and was confirmed by ultrasound pregnancy diagnosis between 30 and 60 days post-service or subsequent calving 282 ± 15 days after conception. 47.88% of the services resulted in conception. The variables analysed were: parity (the number of times the cow has previously calved); log days in milk (days since last calving); inter-service interval; the difficulty of the last calving; body condition score (measure of how fat or thin the cow is), as a second-order polynomial effect due to its non-linear relationship with conception probability; and genetic traits for milk production and calving interval.</p><p>Observations within the external testing dataset were recorded on 9 commercial dairy farms involved in a herd fertility consultancy program operated by the School of Veterinary Medicine, University College Dublin (UCD) <ref type="bibr" target="#b19">[20]</ref>. 4,205 services from 1,471 cows (2,702 lactations) were available for prediction. The same measurements as in the training set were available. 47.49% of these services were successful.</p><p>Descriptive statistics from both datasets are shown in Table <ref type="table" target="#tab_0">1</ref>. 
All data manipulation, analysis and evaluation were carried out using the R statistical programming language <ref type="bibr" target="#b17">[18]</ref> and R libraries. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Machine Learning Techniques</head><p>Four widely used methods capable of modelling binary values or probabilities were used to model the outcome of breeding to service.</p><p>Logistic Regression. Binary logistic regression <ref type="bibr" target="#b11">[12]</ref> (R function glm <ref type="bibr" target="#b17">[18]</ref>) is a generalisation of simple linear regression designed to model the effect of independent variables on the probability of the modelled outcome occurring. Logistic regression assumes all independent variables are normally distributed and not strongly correlated. Regression analysis allows for interactions between independent variables to be included in the model. Random effects can be incorporated to account for the influence of unmeasurable events or global effects. In this study, a basic logistic regression model without interactions or random effects was built to allow for direct comparison with other models. Logistic regression models predict the probability of the event occurring, which can then be transformed to a binary outcome using a threshold probability.</p><p>Naïve Bayes. The implementation of Naïve Bayes used in this study (e1071 library function naiveBayes <ref type="bibr" target="#b15">[16]</ref>) also makes the assumption that numeric features are normally distributed, but assumes no dependencies between them.</p><p>If known, a-priori probabilities can be set; in this case, the overall conception rate was used. The Bayes rule calculates the probability of each potential outcome, given the a-priori probabilities and the input values. The outcome with the highest probability is then chosen as the predicted result.</p><p>Decision Tree. Tree models are created by recursively splitting the training dataset into subsets based on the value of an attribute. The next node is chosen by finding the attribute that can provide the most information when splitting the set. 
Cut-off thresholds are generated to discretise numeric variables. Using the rpart function (from the R library of the same name <ref type="bibr" target="#b21">[22]</ref>) results in probabilistic terminal nodes for binary outcomes.</p><p>Random Forest. Random forests (randomForest library function randomForest <ref type="bibr" target="#b13">[14]</ref>) are an ensemble learning method for Decision Trees. The method uses both bootstrapping and random feature selection to train a large number of Decision Trees <ref type="bibr" target="#b22">[23]</ref>.</p><p>In this study, random forests with 100, 250 and 500 trees were built.</p></div>
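<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Bayes rule described above for Naïve Bayes can be illustrated with a minimal Python sketch. It is not the e1071 implementation used in the study: the function names, the two features and the toy records are hypothetical, and only the a-priori probability (the 47.88% training conception rate) comes from the text. Numeric features are assumed to be normally distributed and independent within each class.</p>
<p><code lang="python">
```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution, used as the per-feature likelihood."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit(rows, labels):
    """Estimate a mean and a standard deviation per class and per feature."""
    params = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / (len(column) - 1)
            stats.append((mean, math.sqrt(var)))
        params[cls] = stats
    return params


def posterior(params, priors, row):
    """Bayes rule: prior times product of likelihoods, normalised to sum to 1."""
    joint = {}
    for cls, stats in params.items():
        p = priors[cls]
        for x, (mean, sd) in zip(row, stats):
            p *= gaussian_pdf(x, mean, sd)
        joint[cls] = p
    total = sum(joint.values())
    return {cls: p / total for cls, p in joint.items()}


# Hypothetical toy records: (days in milk, body condition score).
rows = [(60.0, 2.75), (75.0, 3.0), (90.0, 3.0), (110.0, 3.25),
        (55.0, 2.5), (70.0, 2.75), (95.0, 2.5), (120.0, 3.0)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = conceived, 0 = did not conceive
params = fit(rows, labels)
priors = {1: 0.4788, 0: 1.0 - 0.4788}  # overall conception rate as the prior
probs = posterior(params, priors, (85.0, 3.0))
predicted = max(probs, key=probs.get)  # outcome with the highest probability
```
</code></p>
<p>The dictionary probs holds the posterior probability of each outcome, so it can be used either as a probability estimate (for calibration) or, through predicted, as a class label (for discrimination).</p></div>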
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Evaluation</head><p>Discrimination analysis. For each of the models, the true and predicted service outcomes (given a threshold probability of 50%) were tabulated in a confusion matrix. From this, precision, recall and F-measure were calculated. The Matthews correlation coefficient was also calculated to show the performance of the models in comparison with a random classifier <ref type="bibr" target="#b14">[15]</ref>. It ranges from -1 (completely inaccurate predictions) to +1 (completely accurate predictions), with 0 indicating the same performance as random prediction.</p><p>Receiver operating characteristic (ROC) curves were used to assess how performance varied as the discrimination threshold was altered. The plot presents the true positive rate against the false positive rate, allowing the optimal probability or classifier to be interpreted visually or using summary statistics, such as the area under the curve.</p><p>Calibration analysis. Each model was used to predict the probability of conception occurring in each row of the test set, using the predict function with appropriate arguments.</p><p>The Hosmer-Lemeshow test <ref type="bibr" target="#b10">[11]</ref> was used to evaluate the overall goodness-of-fit of the models on the testing data. The test (R function hoslem.test from the ResourceSelection <ref type="bibr" target="#b12">[13]</ref> package) splits the observations (sorted by predicted probability) into 10 equal-sized groups of risk and compares the observed number of events to the mean predicted number of events within each group. The disadvantage of overall goodness-of-fit tests is that they cannot identify more specific cases of poor prediction <ref type="bibr" target="#b5">[6]</ref>. 
For a thorough investigation of capabilities, they should be used in conjunction with the more in-depth tests of calibration described below.</p><p>For each set of model predictions, a calibration plot was drawn by grouping the observations into 25 equi-interval bins and plotting the mean predicted probability against the proportion of true events within each group. The data were split into 25 bins to allow for groups of acceptable size while still maintaining low within-group probability variation. Bins containing fewer than 20 records were not plotted. Confidence intervals for the proportions of successful inseminations were calculated using the F distribution (calibration.plot function of the PresenceAbsence R package <ref type="bibr" target="#b7">[8]</ref>).</p><p>Binned prediction deviations were visually examined for patterns. 95% of the binned values should lie within two standard deviations of 0 <ref type="bibr" target="#b8">[9]</ref>. The absolute group deviances were averaged to find the mean absolute calibration error.</p></div>
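<div xmlns="http://www.tei-c.org/ns/1.0"><p>The binning behind the calibration plot and the mean absolute calibration error can be sketched in Python as follows. This is an illustration under stated assumptions, not the PresenceAbsence code used in the study: the function names and the toy predictions are hypothetical, and the group deviance is assumed to be the observed proportion minus the mean predicted probability in each retained bin.</p>
<p><code lang="python">
```python
def calibration_bins(probs, outcomes, n_bins=25, min_count=20):
    """Group records into equal-width probability bins and summarise each bin.

    Returns (mean predicted probability, observed proportion, count) for
    every bin that holds at least min_count records.
    """
    sums = [0.0] * n_bins
    events = [0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(probs, outcomes):
        # A probability of exactly 1.0 belongs to the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        sums[index] += p
        events[index] += y
        counts[index] += 1
    return [(sums[i] / counts[i], events[i] / counts[i], counts[i])
            for i in range(n_bins) if counts[i] >= min_count]


def mean_absolute_calibration_error(bins):
    """Average of |observed proportion - mean predicted probability| over bins."""
    return sum(abs(obs - pred) for pred, obs, _ in bins) / len(bins)


# Hypothetical predictions: 40 records at 0.42 (16 events), 40 at 0.58
# (26 events) and 5 at 0.90, which fall in a bin too small to be kept.
probs = [0.42] * 40 + [0.58] * 40 + [0.90] * 5
outcomes = [1] * 16 + [0] * 24 + [1] * 26 + [0] * 14 + [1] * 5
bins = calibration_bins(probs, outcomes)
mace = mean_absolute_calibration_error(bins)
```
</code></p>
<p>In this toy case two bins survive the minimum-size filter, with absolute deviations of 0.02 and 0.07, giving a mean absolute calibration error of 0.045.</p></div>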
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Results</head><p>All of the variables described were significant at P = 0.05 (using the drop1 function on the logistic regression model).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Discrimination</head><p>The ROC curve of each of the models is shown in Figure <ref type="figure">1</ref>. The confusion matrix for each model is in Table <ref type="table" target="#tab_1">2</ref>. Discrimination test results (precision, recall, F-score and Matthews correlation coefficient) are in Table <ref type="table" target="#tab_2">3</ref>. All of the models performed similarly in these tests, with F-scores ranging from 50.01% to 52.03%. All of the models performed better than a random classifier in the Matthews correlation coefficient (range 0.11 to 0.16).</p></div>
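<div xmlns="http://www.tei-c.org/ns/1.0"><p>The discrimination statistics reported above follow directly from the confusion matrix counts. The short Python sketch below uses the standard definitions of precision, recall, F-score and the Matthews correlation coefficient; the function name is hypothetical, and the counts are the logistic regression entries from Table 2.</p>
<p><code lang="python">
```python
import math


def discrimination_metrics(tp, fp, fn, tn):
    """Precision, recall, F-score and Matthews correlation coefficient."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return precision, recall, f_score, mcc


# Logistic regression counts as reported in Table 2:
# predicted true: 895 conceived, 651 did not conceive;
# predicted false: 1102 conceived, 1557 did not conceive.
precision, recall, f_score, mcc = discrimination_metrics(
    tp=895, fp=651, fn=1102, tn=1557)
print(f"{precision:.2%} {recall:.2%} {f_score:.2%} {mcc:.2f}")
```
</code></p>
<p>The script prints 57.89% 44.82% 50.52% 0.16, which agrees with the logistic regression values reported for the discrimination tests.</p></div>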
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Calibration</head><p>Results of statistical tests carried out to measure calibration and goodness-of-fit are shown in Table <ref type="table" target="#tab_3">4</ref>. These results can be seen visually in the calibration plots (Figure <ref type="figure">2</ref>) and deviance plots (Figure <ref type="figure">3</ref>).</p></div>
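<div xmlns="http://www.tei-c.org/ns/1.0"><p>For readers unfamiliar with the Hosmer-Lemeshow test, the following Python sketch shows one way the statistic and its p-value can be computed. It is a simplified illustration, not the hoslem.test function used in the study: records are split into near-equal groups by rank rather than by quantile cut points, the degrees of freedom are assumed to be the number of groups minus 2, the closed-form chi-square tail only holds for an even number of groups, and the data are hypothetical.</p>
<p><code lang="python">
```python
import math


def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow statistic and p-value for an even number of groups."""
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        size = len(chunk)
        # Events and non-events both contribute to the chi-square sum.
        statistic += (observed - expected) ** 2 / expected
        statistic += (observed - expected) ** 2 / (size - expected)
    # The chi-square upper tail with an even number of degrees of freedom
    # (groups - 2) has a closed form, so no statistics library is needed.
    half_df = (groups - 2) // 2
    half_x = statistic / 2.0
    p_value = math.exp(-half_x) * sum(
        half_x ** i / math.factorial(i) for i in range(half_df))
    return statistic, p_value


# Hypothetical data: ten risk levels with 100 records each, where the
# number of events matches the predicted probability exactly.
probs, outcomes = [], []
for events in range(5, 100, 10):
    probs += [events / 100] * 100
    outcomes += [1] * events + [0] * (100 - events)
stat_good, p_good = hosmer_lemeshow(probs, outcomes)

# The same outcomes with every prediction inflated by 0.1 (capped at 0.99).
inflated = [min(p + 0.1, 0.99) for p in probs]
stat_bad, p_bad = hosmer_lemeshow(inflated, outcomes)
```
</code></p>
<p>A large p-value (as for the well-calibrated predictions here) indicates no evidence of a difference between observed and predicted event counts, while the inflated predictions produce a large statistic and a p-value close to zero.</p></div>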
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Fig. 1: ROC curves of the four machine learning models</head><p>There were no significant differences found between the true and predicted logistic regression and Decision Tree outcomes with the Hosmer-Lemeshow test. The test found significant differences between the true outcomes and the predictions from the Naïve Bayes and all of the Random Forest models.</p><p>The models had mean absolute calibration error ranging from 3.48% to 6.40%, with the Random Forest model built with 100 trees having the highest rate of calibration error. The Decision Tree just exceeds the accepted limit of 5% of deviance values outside the two standard deviation limit. The Naïve Bayes and all of the Random Forest models were well above this limit. Some evidence of a deviance pattern is seen in the Naïve Bayes deviance plot, while a very clear pattern is observed for the Random Forest models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Discussion</head><p>The logistic regression model had the best calibration performance; its calibration error was lowest, along with the most compact deviance spread. The model's F-score was similar to the other models, but it had the highest precision and lowest recall. Its Matthews correlation coefficient was the highest of the models. The Naïve Bayes model failed the Hosmer-Lemeshow test of overall goodness-of-fit, and the calibration plot showed some points outside the 95% confidence interval. With 20% of its deviance values outside two standard deviations of 0 and some observation of systematic deviance, it showed poor capability of predicting the probability of conception. This was in spite of discrimination performance comparable to the rest of the models.</p><p>The probabilities predicted from the Decision Tree model had a very narrow range; only four distinct probabilities were predicted, resulting in only two probability groups with enough records to display on the calibration plot. This also reduced the number of rows used to calculate the Hosmer-Lemeshow test statistic. Although the discrimination evaluation of the Decision Tree did not differ greatly from the other models, its poor calibration performance makes it an unsuitable choice for predicting the outcome of service.</p><p>Because the algorithm continues to create trees until every record is correctly classified, the Random Forests were perfect models of the training data used to build them. Although these models had the best test performance in terms of discrimination, their calibration results were poor. 
The calibration plots show significant bias, and the distinctly non-random deviance plots indicate that the models are not capturing some important element related to the outcome <ref type="bibr" target="#b9">[10]</ref>.</p><p>Data that are not well separated along different outcomes will be very common in epidemiological applications, where probabilities close to 1 or 0 are uncommon and most in-group probabilities tend to be centred close to 50%. The benefit of modelling these outcomes is to identify events with probabilities outside the norm. This can aid the decision making of farmers and their advisors when selecting the best animals for costly insemination techniques such as sexed semen <ref type="bibr" target="#b2">[3]</ref>. Because the probability is the focus, rather than the ultimate outcome, a predictive model with good calibration is key. Thus the logistic regression model is the best model for predicting service outcome. Easily interpretable coefficients or odds ratios may be used to inform farmers about the important risk factors for service outcome.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion</head><p>This paper demonstrates a novel application of machine learning algorithms in the context of Irish agriculture. Each technique was trained using data from research herds and tested with an external dataset representing the typical commercial dairy herd in Ireland. The methods implemented all show similar discriminative ability, but logistic regression was found to be the most capable at correctly predicting the probability of conception. Further improvements to the model might be made using regression with ensemble methods such as bagging <ref type="bibr" target="#b0">[1]</ref>. This is, to the authors' knowledge, the first time comprehensive statistical modelling of service outcome in Irish cows has been reported. Having a generalisable predictive model of how various risk factors combine to influence the probability of conception will help farmers to better understand the performance potential of their animals when making management decisions, such as culling or selection of herd replacements. In addition, the fact that the model is based on easily recordable and obtainable data should further increase the practical utility of the model as a decision support tool. As well as the stand-alone benefits of the model, it is being integrated into a detailed whole-farm model of Irish dairy animals, which will simulate nutrition, reproduction, management and economics in daily time-steps for the entire life of each animal.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 2 :</head><label>2</label><figDesc>Fig. 2: Calibration plots for the models of dairy cow conception</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Descriptive statistics of Moorepark and UCD School of Veterinary Medicine datasets</figDesc><table><row><cell>Variable</cell><cell>Training data mean (SD)</cell><cell>Testing data mean (SD)</cell></row><row><cell>Parity</cell><cell>2.48 (1.51)</cell><cell>2.78 (1.74)</cell></row><row><cell>Days in milk</cell><cell>91.86 (29.83)</cell><cell>85.60 (28.83)</cell></row><row><cell>Calving interval genetic trait</cell><cell>-3.32 (2.68)</cell><cell>-2.90 (2.47)</cell></row><row><cell>Milk genetic trait</cell><cell>82.55 (184.91)</cell><cell>169.33 (153.00)</cell></row><row><cell>Body condition score at breeding</cell><cell>2.89 (0.31)</cell><cell>2.86 (0.22)</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Confusion matrices for each of the models</figDesc><table><row><cell>Model</cell><cell>Prediction</cell><cell>Conceived</cell><cell>Did not conceive</cell></row><row><cell>Logistic Regression</cell><cell>Predicted True</cell><cell>895</cell><cell>651</cell></row><row><cell>Logistic Regression</cell><cell>Predicted False</cell><cell>1102</cell><cell>1557</cell></row><row><cell>Naïve Bayes</cell><cell>Predicted True</cell><cell>928</cell><cell>745</cell></row><row><cell>Naïve Bayes</cell><cell>Predicted False</cell><cell>1069</cell><cell>1463</cell></row><row><cell>Decision Tree</cell><cell>Predicted True</cell><cell>924</cell><cell>774</cell></row><row><cell>Decision Tree</cell><cell>Predicted False</cell><cell>1073</cell><cell>1434</cell></row><row><cell>Random Forest (100 trees)</cell><cell>Predicted True</cell><cell>981</cell><cell>843</cell></row><row><cell>Random Forest (100 trees)</cell><cell>Predicted False</cell><cell>1016</cell><cell>1365</cell></row><row><cell>Random Forest (250 trees)</cell><cell>Predicted True</cell><cell>988</cell><cell>813</cell></row><row><cell>Random Forest (250 trees)</cell><cell>Predicted False</cell><cell>1009</cell><cell>1395</cell></row><row><cell>Random Forest (500 trees)</cell><cell>Predicted True</cell><cell>989</cell><cell>830</cell></row><row><cell>Random Forest (500 trees)</cell><cell>Predicted False</cell><cell>1008</cell><cell>1378</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 :</head><label>3</label><figDesc>Discrimination statistical tests</figDesc><table><row><cell>Model</cell><cell>Precision</cell><cell>Recall</cell><cell>F-score</cell><cell>Matthews Correlation Coefficient</cell></row><row><cell>Logistic Regression</cell><cell>57.89%</cell><cell>44.82%</cell><cell>50.52%</cell><cell>0.16</cell></row><row><cell>Naïve Bayes</cell><cell>55.47%</cell><cell>46.47%</cell><cell>50.57%</cell><cell>0.13</cell></row><row><cell>Decision Tree</cell><cell>54.42%</cell><cell>46.27%</cell><cell>50.01%</cell><cell>0.11</cell></row><row><cell>Random Forest (100 trees)</cell><cell>53.78%</cell><cell>49.12%</cell><cell>51.35%</cell><cell>0.11</cell></row><row><cell>Random Forest (250 trees)</cell><cell>54.86%</cell><cell>49.47%</cell><cell>52.03%</cell><cell>0.13</cell></row><row><cell>Random Forest (500 trees)</cell><cell>54.37%</cell><cell>49.52%</cell><cell>51.83%</cell><cell>0.12</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 :</head><label>4</label><figDesc>Calibration statistical tests</figDesc><table><row><cell>Model</cell><cell>Hosmer-Lemeshow p-value</cell><cell>Mean absolute calibration error</cell><cell>Deviances within 2 SD</cell></row><row><cell>Logistic Regression</cell><cell>0.19</cell><cell>3.48%</cell><cell>100%</cell></row><row><cell>Naïve Bayes</cell><cell>0.00</cell><cell>4.26%</cell><cell>80%</cell></row><row><cell>Decision Tree</cell><cell>1.00</cell><cell>4.66%</cell><cell>94%</cell></row><row><cell>Random Forest (100 trees)</cell><cell>0.00</cell><cell>6.40%</cell><cell>63%</cell></row><row><cell>Random Forest (250 trees)</cell><cell>0.00</cell><cell>5.96%</cell><cell>64%</cell></row><row><cell>Random Forest (500 trees)</cell><cell>0.00</cell><cell>5.99%</cell><cell>68%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head>Fig. 1 :</head><label>1</label><figDesc>Fig. 1: ROC curves of the four machine learning models (true positive rate against false positive rate) for Logistic Regression, Naïve Bayes, Decision Tree and Random Forest (100, 250 and 500 trees)</figDesc></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Acknowledgements. This research was supported by funding from the Dairy Levy Research Fund. The authors would like to thank Anne Geoghegan in Teagasc, Moorepark and the farmers involved in the UCD School of Veterinary Medicine consultancy programme for assistance in gathering the data used in this study. The suggestions of the AICS reviewers were gratefully received.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Bagging Predictors</title>
		<author>
			<persName><forename type="first">L</forename><surname>Breiman</surname></persName>
		</author>
		<idno type="DOI">10.1007/BF00058655</idno>
		<ptr target="http://link.springer.com/10.1007/BF00058655" />
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="123" to="140" />
			<date type="published" when="1996">1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Relationships among milk yield, body condition, cow weight, and reproduction in spring-calved Holstein-Friesians</title>
		<author>
			<persName><forename type="first">F</forename><surname>Buckley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>O'Sullivan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">F</forename><surname>Mee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">D</forename><surname>Evans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Dillon</surname></persName>
		</author>
		<idno type="DOI">10.3168/jds.S0022-0302(03)73823-5</idno>
		<ptr target="http://dx.doi.org/10.3168/jds.S0022-0302(03)73823-5" />
	</analytic>
	<monogr>
		<title level="j">Journal of Dairy Science</title>
		<imprint>
			<biblScope unit="volume">86</biblScope>
			<biblScope unit="issue">7</biblScope>
			<biblScope unit="page" from="2308" to="2319" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Applications and cost benefits of sexed semen in pasture-based dairy production systems</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">T</forename><surname>Butler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">A</forename><surname>Hutchinson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">R</forename><surname>Cromie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Shalloo</surname></persName>
		</author>
		<ptr target="http://journals.cambridge.org/abstract_S1751731114000664" />
	</analytic>
	<monogr>
		<title level="j">Animal</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="issue">s1</biblScope>
			<biblScope unit="page" from="165" to="172" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note>Suppl</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Properties and Benefits of Calibrated Classifiers</title>
		<author>
			<persName><forename type="first">I</forename><surname>Cohen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Goldszmidt</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of ECML</title>
				<meeting>ECML</meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page" from="125" to="148" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">The influence of genetic selection and feed system on the reproductive performance of spring-calving dairy cows within future pasture-based production systems</title>
		<author>
			<persName><forename type="first">J</forename><surname>Coleman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">M</forename><surname>Pierce</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Berry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Brennan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Horan</surname></persName>
		</author>
		<idno type="DOI">10.3168/jds.2009-2108</idno>
		<ptr target="http://dx.doi.org/10.3168/jds.2009-2108" />
	</analytic>
	<monogr>
		<title level="j">Journal of Dairy Science</title>
		<imprint>
			<biblScope unit="volume">92</biblScope>
			<biblScope unit="issue">10</biblScope>
			<biblScope unit="page" from="5258" to="5269" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>Cox</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">J</forename><surname>Snell</surname></persName>
		</author>
		<ptr target="https://books.google.com/books?hl=en&amp;lr=&amp;id=0R8J71LCLXsC&amp;pgis=1" />
		<title level="m">Analysis of Binary Data, Second Edition</title>
		<imprint>
			<date type="published" when="1989">1989</date>
			<publisher>CRC Press</publisher>
			<pubPlace>Boca Raton</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Genetic merit for fertility traits in Holstein cows: I. Production characteristics and reproductive efficiency in a pasture-based system</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">B</forename><surname>Cummins</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lonergan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C O</forename><surname>Evans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Berry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">D</forename><surname>Evans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">T</forename><surname>Butler</surname></persName>
		</author>
		<ptr target="http://www.ncbi.nlm.nih.gov/pubmed/22365213" />
	</analytic>
	<monogr>
		<title level="j">Journal of Dairy Science</title>
		<imprint>
			<biblScope unit="volume">95</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="1310" to="1322" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">A</forename><surname>Freeman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Moisen</surname></persName>
		</author>
		<ptr target="http://www.jstatsoft.org/v23/i11/paper" />
		<title level="m">PresenceAbsence: An R Package for Presence Absence Analysis</title>
				<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Data Analysis Using Regression and Multilevel/Hierarchical Models</title>
		<author>
			<persName><forename type="first">A</forename><surname>Gelman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hill</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2006">2006</date>
			<publisher>Cambridge University Press</publisher>
			<pubPlace>Cambridge</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">rms: Regression Modeling Strategies</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">E</forename><surname>Harrell</surname></persName>
		</author>
		<ptr target="http://cran.r-project.org/package=rms" />
	</analytic>
	<monogr>
		<title level="m">R package version</title>
				<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="3" to="4" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Goodness of fit tests for the multiple logistic regression model</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">W</forename><surname>Hosmer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lemeshow</surname></persName>
		</author>
		<idno type="DOI">10.1080/03610928008827941</idno>
		<ptr target="http://www.tandfonline.com/doi/abs/10.1080/03610928008827941" />
	</analytic>
	<monogr>
		<title level="j">Communications in Statistics - Theory and Methods</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">10</biblScope>
			<biblScope unit="page" from="1043" to="1069" />
			<date type="published" when="1980">1980</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Applied Logistic Regression</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">W</forename><surname>Hosmer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lemeshow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">X</forename><surname>Sturdivant</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2013">2013</date>
			<publisher>Wiley</publisher>
			<pubPlace>Hoboken</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Resource Selection Functions Based on Use-Availability Data: Theoretical Motivation and Evaluation Methods</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Johnson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">E</forename><surname>Nielsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">H</forename><surname>Merrill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">L</forename><surname>McDonald</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">S</forename><surname>Boyce</surname></persName>
		</author>
		<idno type="DOI">10.2193/0022-541X(2006)70[347</idno>
		<ptr target="http://dx.doi.org/10.2193/0022-541X(2006)70[347" />
	</analytic>
	<monogr>
		<title level="j">The Journal of Wildlife Management</title>
		<imprint>
			<biblScope unit="volume">70</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="347" to="357" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
	<note>2.0.CO;2</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Classification and Regression by randomForest</title>
		<author>
			<persName><forename type="first">A</forename><surname>Liaw</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiener</surname></persName>
		</author>
		<ptr target="http://cran.r-project.org/doc/Rnews/" />
	</analytic>
	<monogr>
		<title level="j">R News</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="18" to="22" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Comparison of the predicted and observed secondary structure of T4 phage lysozyme</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">W</forename><surname>Matthews</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BBA - Protein Structure</title>
		<imprint>
			<biblScope unit="volume">405</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="442" to="451" />
			<date type="published" when="1975">1975</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>Meyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Dimitriadou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Hornik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Weingessel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Leisch</surname></persName>
		</author>
		<title level="m">e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien</title>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
	<note>R package version</note>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Advanced data mining techniques</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">L</forename><surname>Olson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Delen</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2008">2008</date>
			<publisher>Springer Science &amp; Business Media</publisher>
			<pubPlace>New York</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<author>
			<persName><surname>R Core Team</surname></persName>
		</author>
		<ptr target="http://www.r-project.org/" />
		<title level="m">R: A Language and Environment for Statistical Computing</title>
		<imprint>
			<date type="published" when="2015">2015</date>
			<publisher>R Foundation for Statistical Computing</publisher>
			<pubPlace>Vienna, Austria</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Prediction of insemination outcomes in Holstein dairy cattle using alternative machine learning algorithms</title>
		<author>
			<persName><forename type="first">S</forename><surname>Shahinfar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Page</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Guenther</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Cabrera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Fricke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Weigel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Dairy Science</title>
		<imprint>
			<biblScope unit="volume">97</biblScope>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">The effect of Lameness before and during the breeding season on fertility in 10 pasture-based Irish dairy herds</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Somers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Huxley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Lorenz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Doherty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>O'Grady</surname></persName>
		</author>
		<ptr target="http://www.irishvetjournal.org/content/68/1/14" />
	</analytic>
	<monogr>
		<title level="j">Irish Veterinary Journal</title>
		<imprint>
			<biblScope unit="volume">68</biblScope>
			<biblScope unit="issue">14</biblScope>
			<biblScope unit="page" from="1" to="7" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Assessment of the adequacy of mathematical models</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">O</forename><surname>Tedeschi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Agricultural Systems</title>
		<imprint>
			<biblScope unit="volume">89</biblScope>
			<biblScope unit="page" from="225" to="247" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m" type="main">rpart: Recursive Partitioning and Regression Trees</title>
		<author>
			<persName><forename type="first">T</forename><surname>Therneau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Atkinson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ripley</surname></persName>
		</author>
		<ptr target="http://cran.r-project.org/package=rpart" />
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Random decision forests</title>
		<author>
			<persName><forename type="first">Tin</forename><forename type="middle">Kam</forename><surname>Ho</surname></persName>
		</author>
		<ptr target="http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=598994" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of 3rd International Conference on Document Analysis and Recognition</title>
				<meeting>3rd International Conference on Document Analysis and Recognition</meeting>
		<imprint>
			<date type="published" when="1995">1995</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="278" to="282" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
