<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A comparison of machine learning techniques for predicting insemination outcome in Irish dairy cows</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Caroline Fenlon</string-name>
          <email>caroline.fenlon@ucdconnect.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luke O'Grady</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>John Dunnion</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laurence Shalloo</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stephen Butler</string-name>
          <email>stephen.butler@teagasc.ie</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Doherty</string-name>
          <email>michael.doherty@ucd.ie</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computer Science, University College Dublin</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <issue>0</issue>
      <abstract>
        <p>Reproductive performance has an important effect on economic efficiency in dairy farms with short yearly periods of breeding. The individual factors affecting the outcome of an artificial insemination have been extensively researched in many univariate models. In this study, these factors are analysed in combination to create a comprehensive multivariate model of conception in Irish dairy cows. Logistic regression, Naïve Bayes, Decision Tree learning and Random Forests are trained using 2,723 artificial insemination records from Irish research farms. An additional 4,205 breeding events from commercial dairy farms are used to evaluate and compare the performance of each data mining technique. The models are assessed in terms of both discrimination and calibration ability. The logistic regression model was found to be the most useful model for predicting insemination outcome. This model is proposed as being appropriate for use in decision support and in general simulation of Irish dairy cows.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Dairy production systems in Ireland are primarily based on seasonal calving
patterns. Reproductive performance in these systems has an important impact on
economic efficiency. In these pasture-based farms, the aim is to align peak grass
availability with peak lactating cow energy demands, by breeding animals during
a set time period. Poor reproductive performance results in extended periods of
calving, suboptimal utilisation of pastures and increased feed costs.</p>
      <p>
        The individual factors affecting conception have been extensively researched.
However, few models have comprehensively examined the factors influencing the
outcome of insemination in combination, particularly at the individual breeding
event level [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Most statistical analysis has focused on identifying important
factors in isolation and analysing overall measures of reproductive performance,
such as calving to conception interval or the probability of conception during a
breeding season [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Factors incorporating both genetic and phenotypic
effects (parity, stage of lactation, calving events, measures of energy balance and
milk production) were identified as significant in previous analyses of records
from Irish herds [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ][
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Binary logistic regression was used to form a predictive
model of conception outcome. In this study, the aim was to identify and apply
other appropriate machine learning techniques to the problem of predicting
insemination outcome. To allow direct comparison of the models, they were all
built using the same variables as the previous study.
      </p>
      <p>
        When evaluating binary predictions, two categories of assessment are
possible: discrimination and calibration [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. Discrimination measures a model's
ability to correctly classify cases, i.e. the separation between the successful and
unsuccessful outcomes. Threshold-based measures of discrimination depend on a cut-off point
to transform the predicted probabilities into outcomes and ignore the raw
predictions. Classification tables show the numbers of correct and incorrect class
predictions, separated by positive and negative instances. These values can be used to calculate
precision and recall [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. To identify the optimal cut-off point, receiver operating
characteristic (ROC) curves are used to plot the true-positive rate against the
false-positive rate across all possible cut-off points.
      </p>
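      <p>As an illustration of the classification-table step, the following minimal Python sketch (the study itself used R) converts predicted probabilities into predicted outcomes at a cut-off point and counts the four cells of the table. The function name classification_table and the toy data are hypothetical, and treating a probability exactly equal to the cut-off as a positive prediction is an assumption.</p>
      <preformat preformat-type="code">
```python
def classification_table(probs, outcomes, cutoff=0.5):
    """Cross-tabulate 0/1 outcomes against predictions made at `cutoff`.

    A record is predicted positive (conception) when its probability is at
    least the cut-off; treating ties as positive is an arbitrary choice here.
    """
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for p, y in zip(probs, outcomes):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["fn" if y == 1 else "tn"] += 1
    return counts


# Toy predictions for six services (1 = conception); not data from the study.
probs = [0.62, 0.48, 0.55, 0.30, 0.71, 0.44]
outcomes = [1, 1, 0, 0, 1, 0]
print(classification_table(probs, outcomes))
```
      </preformat>
      <p>Changing cutoff moves records between the predicted-positive and predicted-negative cells, which is why threshold-based measures ignore the raw probabilities themselves.</p>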
      <p>
        Calibration compares the predictions to the true proportions of events
occurring, i.e. determining if the observed frequency of occurrence is similar to the
predicted probability, within groups of records. Reliability measures such as the
Hosmer-Lemeshow test [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] are used to test overall goodness-of-fit. Calibration
plots [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] allow visual inspection of deviation, with statistical tests for analysis of
bias and spread. Analysis of deviances may be used to highlight outlying records
or covariate values.
      </p>
      <p>As breeding outcome may be considered both in terms of the probability
of occurrence and the binary prediction, the models used were compared using
both forms of assessment. Evaluation was carried out on an external dataset of
records from typically managed commercial Irish dairy herds.
</p>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <sec id="sec-2-1">
        <title>Data</title>
        <p>The data available for model training were sourced from the centralised database
at Teagasc's Animal and Grassland Research and Innovation Centre, Moorepark,
Co. Cork. The animals included in the dataset were from the Curtins and
Ballydague spring-calving research herds, both of which emulate typical Irish dairy
management systems. Additional variables were available in this dataset which
were used to find the significant factors in the modelling process. After cleaning,
inference and missing value removal, 2,723 artificial insemination service records
from 658 lactating cows (1,552 lactations) were available for analysis. Service
outcome (i.e. conception or no conception) was recorded as a binary variable
and was confirmed by ultrasound pregnancy diagnosis between 30 and 60 days
post-service or by subsequent calving 282 ± 15 days after conception. Of these
services, 47.88% resulted in conception. The variables analysed were: parity (the number
of times the cow has previously calved); log-transformed days in milk (days since last
calving); inter-service interval; the difficulty of the last calving; body condition score
(a measure of how fat or thin the cow is), included as a second-order polynomial effect due
to its non-linear relationship with conception probability; and genetic traits for
milk production and calving interval.</p>
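        <p>A minimal Python sketch of how one service record could be turned into these model inputs. The field names and the example record are hypothetical; the sketch assumes a natural logarithm for days in milk and raw polynomial terms for body condition score. If the model was built with R's poly() function, which produces orthogonal polynomials by default, the coefficients would differ from raw powers but the fitted probabilities would not.</p>
        <preformat preformat-type="code">
```python
import math


def build_features(record):
    """Derive model inputs from one service record.

    Field names are hypothetical. Assumes the natural log for days in milk and
    a raw (not orthogonal) second-order polynomial for body condition score.
    """
    bcs = record["bcs"]
    return {
        "parity": record["parity"],
        "log_dim": math.log(record["days_in_milk"]),
        "inter_service_interval": record["inter_service_interval"],
        "calving_difficulty": record["calving_difficulty"],
        "bcs": bcs,
        "bcs_sq": bcs ** 2,  # quadratic term captures the non-linear effect
        "milk_genetic": record["milk_genetic"],
        "ci_genetic": record["ci_genetic"],
    }


# Illustrative record; the values are invented, not taken from the datasets.
service = {
    "parity": 2, "days_in_milk": 90, "inter_service_interval": 21,
    "calving_difficulty": 1, "bcs": 2.75, "milk_genetic": 80.0, "ci_genetic": -3.0,
}
features = build_features(service)
print(round(features["log_dim"], 3), features["bcs_sq"])
```
        </preformat>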
        <p>
          Observations within the external testing dataset were recorded on 9
commercial dairy farms involved in a herd fertility consultancy program operated
by the School of Veterinary Medicine, University College Dublin (UCD) [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
4,205 services from 1,471 cows (2,702 lactations) were available for prediction.
The same measurements as in the training set were available. 47.49% of these
services were successful.
        </p>
        <p>
          Descriptive statistics from both datasets are shown in Table 1. All data
manipulation, analysis and evaluation were carried out using the R statistical
programming language [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] and R libraries.
        </p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>Descriptive statistics for the training and testing datasets.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Variable</th>
                <th>Training data mean (SD)</th>
                <th>Testing data mean (SD)</th>
              </tr>
            </thead>
            <tbody>
              <tr><td>Parity</td><td>2.48 (1.51)</td><td>2.78 (1.74)</td></tr>
              <tr><td>Days in milk</td><td>91.86 (29.83)</td><td>85.60 (28.83)</td></tr>
              <tr><td>Calving interval genetic trait</td><td>-3.32 (2.68)</td><td>-2.90 (2.47)</td></tr>
              <tr><td>Milk genetic trait</td><td>82.55 (184.91)</td><td>169.33 (153.00)</td></tr>
              <tr><td>Body condition score at breeding</td><td>2.89 (0.31)</td><td>2.86 (0.22)</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <sec id="sec-2-1-6">
          <title>Models</title>
          <p>Four widely used methods capable of modelling binary outcomes or probabilities
were used to predict service outcome.</p>
          <p>
            Logistic Regression. Binary logistic regression [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ] (R function glm [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ]) is
a generalised linear model that extends linear regression to model the effect of
independent variables on the probability of the modelled outcome occurring, through the
log-odds of that outcome. Logistic regression makes no assumption about the distribution
of the independent variables, but assumes that they are linearly related to the log-odds
of the outcome and are not strongly correlated. Regression analysis allows for interactions
between independent variables to be included in the model. Random effects can be incorporated
to account for the influence of unmeasurable events or global effects. In this
study, a basic logistic regression model without interactions or random effects
was built to allow for direct comparison with other models. Logistic regression
models predict the probability of the event occurring, which can then be
transformed to a binary outcome using a threshold probability.
          </p>
          <p>
            Naïve Bayes. The implementation of Naïve Bayes used in this study (e1071
library function naiveBayes [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ]) assumes that numeric features are normally distributed within each
outcome class and that all features are conditionally independent given the outcome.
If known, a-priori probabilities can be set; in this case, the overall conception rate
was used. The Bayes rule calculates the probability of each potential outcome,
given the a-priori probabilities and the input values. The outcome with the
highest probability is then chosen as the predicted result.
          </p>
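          <p>The two probabilistic classifiers described above can be sketched in plain Python as follows. This is not the glm or naiveBayes implementation: the function names are hypothetical, logistic regression is fitted here by gradient ascent on the log-likelihood (R's glm uses iteratively reweighted least squares, which targets the same maximum-likelihood estimates), and the Naïve Bayes prior is taken from the class frequencies of the training data, with per-class variances using an n - 1 denominator.</p>
          <preformat preformat-type="code">
```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_logistic(w, x):
    """Probability of the event; w[0] is the intercept, w[1:] the coefficients."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Maximum-likelihood logistic regression by batch gradient ascent."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - predict_logistic(w, xi)  # d log-likelihood / d linear predictor
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def fit_gaussian_nb(X, y):
    """Class prior plus per-feature mean and sample variance for each class."""
    model = {}
    for c in (0, 1):
        rows = [xi for xi, yi in zip(X, y) if yi == c]
        cols = list(zip(*rows))
        means = [sum(col) / len(col) for col in cols]
        variances = [sum((v - m) ** 2 for v in col) / (len(col) - 1)
                     for col, m in zip(cols, means)]
        model[c] = (len(rows) / len(X), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """P(class 1 | x) by Bayes' rule, treating features as independent given the class."""
    log_joint = {}
    for c, (prior, means, variances) in model.items():
        lp = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            lp += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        log_joint[c] = lp
    top = max(log_joint.values())  # subtract the max for numerical stability
    weights = {c: math.exp(lp - top) for c, lp in log_joint.items()}
    return weights[1] / (weights[0] + weights[1])


# Toy data: one numeric feature, six services (1 = conception).
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [0, 0, 1, 0, 1, 1]
w = fit_logistic(X, y)
nb = fit_gaussian_nb(X, y)
print(round(predict_logistic(w, [6.0]), 2), round(predict_gaussian_nb(nb, [6.0]), 2))
```
          </preformat>
          <p>Both models return a probability of conception, which can then be converted into a predicted outcome with a threshold such as 50%.</p>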
          <p>
            Decision Tree. Tree models are created by recursively splitting the training
dataset into subsets based on the value of an attribute. The next node is chosen
by finding the attribute that can provide the most information when splitting the
set. Cut-off thresholds are generated to discretise numeric variables. Using the
rpart function (from the R library of the same name [
            <xref ref-type="bibr" rid="ref22">22</xref>
            ]) results in probabilistic
terminal nodes for binary outcomes.
          </p>
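          <p>A minimal Python sketch of choosing the cut-off on one numeric attribute by information gain, the criterion described above. It assumes binary outcomes and is not rpart's implementation: rpart uses the Gini index by default for classification and offers information gain through parms = list(split = "information"). The function names and toy data are hypothetical.</p>
          <preformat preformat-type="code">
```python
import math


def entropy(labels):
    """Shannon entropy (bits) of a list of 0/1 outcomes."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def best_cut(values, labels):
    """Cut-off on one numeric attribute that maximises information gain.

    Candidate cut-offs lie midway between consecutive distinct values; records
    above the cut-off go to the right child, the rest to the left child.
    Returns (cut_off, gain); cut_off is None if no split improves purity.
    """
    n = len(labels)
    parent = entropy(labels)
    distinct = sorted(set(values))
    best = (None, 0.0)
    for lo, hi in zip(distinct, distinct[1:]):
        cut = (lo + hi) / 2
        right = [y for x, y in zip(values, labels) if x > cut]
        left = [y for x, y in zip(values, labels) if not x > cut]
        children = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        gain = parent - children
        if gain > best[1]:
            best = (cut, gain)
    return best


# Toy attribute values for six services and their outcomes (1 = conception).
print(best_cut([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]))
```
          </preformat>
          <p>Repeating this search over every attribute and then recursively within each child subset grows the tree; in the simplest case, the proportion of conceptions among a terminal node's training records serves as that node's predicted probability.</p>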
          <p>
            Random Forest. Random forests (randomForest library function randomForest
[
            <xref ref-type="bibr" rid="ref14">14</xref>
            ]) are an ensemble learning method built from Decision Trees. They use both
bootstrapping and random feature selection to train a large number of Decision Trees [
            <xref ref-type="bibr" rid="ref23">23</xref>
            ].
In this study, random forests with 100, 250 and 500 trees were built.
          </p>
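          <p>The ensemble idea can be illustrated with the simplified Python sketch below, which combines bootstrapping and random feature selection. To keep it short, it makes assumptions that differ from the randomForest package: each tree is a single split (a stump) rather than a fully grown tree, and the random feature subset is drawn once per tree, whereas randomForest samples mtry candidate features at every split (by default the square root of the number of variables for classification). The forest's probability is the share of trees voting for conception, similar in spirit to randomForest's type = "prob" predictions. All names and data are hypothetical.</p>
          <preformat preformat-type="code">
```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 outcomes."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def fit_stump(X, y, features):
    """Depth-one tree: best single split over the candidate feature indices."""
    n = len(y)
    best = None
    for f in features:
        for cut in sorted(set(row[f] for row in X)):
            left = [yi for row, yi in zip(X, y) if not row[f] > cut]
            right = [yi for row, yi in zip(X, y) if row[f] > cut]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or best[0] > score:
                best = (score, f, cut, sum(left) / len(left), sum(right) / len(right))
    if best is None:  # no usable split: every record gets the overall rate
        rate = sum(y) / n
        return (None, None, rate, rate)
    return best[1:]


def stump_vote(stump, row):
    """Majority class (1 or 0) of the leaf the row falls into."""
    f, cut, p_left, p_right = stump
    p = p_right if f is not None and row[f] > cut else p_left
    return 1 if p > 0.5 else 0


def fit_forest(X, y, n_trees=100, mtry=1, seed=1):
    rng = random.Random(seed)
    n, k = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of records
        features = rng.sample(range(k), mtry)  # random subset of features
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def forest_probability(forest, row):
    """Share of trees voting for conception."""
    return sum(stump_vote(s, row) for s in forest) / len(forest)


# Toy data: two numeric features, six services (1 = conception).
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 6.0], [4.0, 1.0], [5.0, 4.0], [6.0, 2.0]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y, n_trees=100)
print(forest_probability(forest, [6.0, 2.0]), forest_probability(forest, [1.0, 5.0]))
```
          </preformat>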
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Evaluation</title>
        <p>
          Discrimination analysis. For each of the models, the true and predicted
service outcomes (given a threshold probability of 50%) were tabulated in a
confusion matrix. From this, precision, recall and F-measure were calculated. The
Matthews correlation coefficient was also calculated to show the performance of
the models in comparison with a random classifier [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. It ranges from -1
(total disagreement between predictions and observations) to +1 (perfect
prediction), with 0 indicating the same performance as random prediction.
        </p>
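        <p>The discrimination measures above follow directly from the four cells of the confusion matrix. A minimal Python sketch, with a hypothetical function name and toy counts rather than the study's results; returning an MCC of 0 when a row or column of the table is empty is a common convention, assumed here.</p>
        <preformat preformat-type="code">
```python
import math


def discrimination_metrics(tp, fp, fn, tn):
    """Precision, recall, F-measure and Matthews correlation coefficient (MCC)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any margin of the table is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f": f_measure, "mcc": mcc}


# Toy confusion matrix (not the study's counts).
print(discrimination_metrics(tp=2, fp=1, fn=1, tn=2))
```
        </preformat>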
        <p>Receiver operating characteristic (ROC) curves were used to assess how
performance varied as the discrimination threshold was altered. The plot presents
the true positive rate against the false positive rate, allowing the optimal
probability threshold or classifier to be identified visually or using summary statistics, such
as the area under the curve.</p>
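        <p>A minimal Python sketch of how ROC points and the area under the curve can be computed from predicted probabilities, by lowering the threshold through every distinct predicted value and integrating with the trapezoidal rule. The function names and toy data are hypothetical; the study's own curves were produced in R.</p>
        <preformat preformat-type="code">
```python
from itertools import groupby


def roc_points(probs, outcomes):
    """(false positive rate, true positive rate) at every distinct threshold.

    Records are ranked by predicted probability; lowering the threshold past
    each distinct value moves all records with that value to 'positive'.
    """
    pos = sum(outcomes)
    neg = len(outcomes) - pos
    ranked = sorted(zip(probs, outcomes), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, group in groupby(ranked, key=lambda t: t[0]):
        for _, y in group:
            if y == 1:
                tp += 1
            else:
                fp += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


probs = [0.62, 0.48, 0.55, 0.30, 0.71, 0.44]  # toy predictions
outcomes = [1, 1, 0, 0, 1, 0]
print(round(auc(roc_points(probs, outcomes)), 3))
```
        </preformat>
        <p>The area under the curve equals the probability that a randomly chosen service that resulted in conception receives a higher predicted probability than a randomly chosen unsuccessful one (ties counting one half); for the toy data this is 8 of the 9 possible pairs.</p>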
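        <p>Calibration analysis. Each model was used to predict the probability of
conception occurring in each row of the test set, using the predict function with
arguments that return probabilities rather than class labels (type = "response" for glm,
type = "raw" for naiveBayes and type = "prob" for rpart and randomForest).</p>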
        <p>
          The Hosmer-Lemeshow test [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] was used to evaluate the overall
goodness-of-fit of the models on the testing data. The test (R function hoslem.test from
the ResourceSelection [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] package) splits the observations (sorted by predicted
probability) into 10 equal-sized groups of risk and compares the observed
number of events to the expected number of events (the sum of the predicted
probabilities) within each group. The
disadvantage of overall goodness-of-fit tests is that they cannot identify more
specific cases of poor prediction [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. For a thorough assessment of calibration,
they should be used in conjunction with the more detailed calibration analyses
described below.
        </p>
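        <p>A minimal Python sketch of the Hosmer-Lemeshow statistic as described above: sort by predicted probability, form 10 near-equal groups, compare observed with expected events and non-events, and refer the statistic to a chi-square distribution with g - 2 = 8 degrees of freedom. It is not the hoslem.test code: hoslem.test forms its groups from quantiles of the predicted probabilities, so with many tied predictions (as for a Decision Tree) its grouping can differ from this sort-and-split version. The closed-form p-value used here is valid only for an even number of degrees of freedom, and every group is assumed to have an expected count strictly between 0 and its size. The function names are hypothetical.</p>
        <preformat preformat-type="code">
```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds only for even df."""
    half = x / 2
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow statistic and p-value (groups - 2 degrees of freedom).

    Records are sorted by predicted probability and cut into `groups`
    near-equal-sized groups; observed and expected counts of events and
    non-events are compared in each group.
    """
    ranked = sorted(zip(probs, outcomes), key=lambda t: t[0])
    n = len(ranked)
    stat = 0.0
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)  # expected events = sum of probabilities
        observed = sum(y for _, y in chunk)
        stat += (observed - expected) ** 2 / expected  # events
        stat += (observed - expected) ** 2 / (size - expected)  # non-events
    return stat, chi2_sf_even_df(stat, groups - 2)


# Toy check: 20 records, every prediction 0.5 and outcomes alternating 1, 0.
stat, p_value = hosmer_lemeshow([0.5] * 20, [1, 0] * 10)
print(stat, p_value)
```
        </preformat>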
        <p>
          For each set of model predictions, a calibration plot was drawn by
grouping the observations into 25 equi-interval bins and plotting the mean predicted
probability against the proportion of true events within each group. The data
were split into 25 bins to allow for acceptable-sized groups while still maintaining low
within-group probability variation. Bins containing fewer than 20 records were
not plotted. Confidence intervals for the proportions of successful inseminations
were calculated using the F distribution (calibration.plot function of the
PresenceAbsence R package [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]).
        </p>
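        <p>A minimal Python sketch of the binning behind the calibration plots: 25 equal-width probability bins, bins with fewer than 20 records dropped, and the mean predicted probability compared with the observed proportion of conceptions in each remaining bin. The function name and toy data are hypothetical, and the F-distribution confidence intervals produced by calibration.plot are not reproduced here.</p>
        <preformat preformat-type="code">
```python
def calibration_bins(probs, outcomes, n_bins=25, min_count=20):
    """Mean predicted probability and observed proportion per equal-width bin.

    Returns (bin index, records, mean predicted, observed proportion) for bins
    holding at least `min_count` records; sparser bins are skipped.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # a probability of 1.0 joins the top bin
        bins[idx].append((p, y))
    rows = []
    for idx, members in enumerate(bins):
        if min_count > len(members):
            continue
        size = len(members)
        mean_pred = sum(p for p, _ in members) / size
        observed = sum(y for _, y in members) / size
        rows.append((idx, size, mean_pred, observed))
    return rows


# Toy data: 30 predictions of 0.5 (15 conceptions) and 5 predictions of 0.9.
probs = [0.5] * 30 + [0.9] * 5
outcomes = [1] * 15 + [0] * 15 + [1] * 5
print(calibration_bins(probs, outcomes))
```
        </preformat>
        <p>Plotting mean_pred against observed for the retained bins gives the calibration plot; a well-calibrated model keeps the points close to the diagonal.</p>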
        <p>
          Binned prediction deviations were visually examined for patterns. For a well-calibrated
model, 95% of the binned values should lie within two standard deviations of 0 [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. The absolute
group deviances were averaged to find the mean absolute calibration error.
        </p>
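        <p>The deviation summaries can be sketched in Python as follows, under two stated assumptions: a bin's deviation is its observed proportion minus its mean predicted probability, and the ±2 band uses the binomial standard error sqrt(p(1 - p)/n), as in Gelman and Hill's binned residual plots. The function name and toy bins are hypothetical.</p>
        <preformat preformat-type="code">
```python
import math


def binned_residual_summary(bins):
    """Mean absolute calibration error and share of bins outside +/- 2 SE.

    `bins` holds (records, mean predicted, observed proportion) per bin.
    The band uses SE = sqrt(p * (1 - p) / n), as in a binned residual plot.
    """
    deviations = [obs - pred for _, pred, obs in bins]
    mean_abs_error = sum(abs(d) for d in deviations) / len(deviations)
    outside = sum(
        1 for (n, pred, _), d in zip(bins, deviations)
        if abs(d) > 2 * math.sqrt(pred * (1 - pred) / n)
    )
    return mean_abs_error, outside / len(bins)


# Two toy bins of 100 records each (not results from the study).
mace, share_outside = binned_residual_summary([(100, 0.50, 0.55), (100, 0.40, 0.30)])
print(round(mace, 3), share_outside)
```
        </preformat>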
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>All of the variables described were significant at P = 0.05 (using the drop1
function on the logistic regression model).</p>
      <sec id="sec-3-1">
        <title>Discrimination</title>
        <p>The ROC curve of each of the models is shown in Figure 1. The confusion matrix
for each model is in Table 2. Discrimination test results (precision, recall, F-score
and Matthews correlation coefficient) are in Table 3. All of the models performed
similarly in these tests, with F-scores ranging from 50.01% to 52.03%. All of the
models performed better than a random classifier according to the Matthews correlation
coefficient (range 0.11 to 0.16).</p>
      </sec>
      <sec id="sec-3-2">
        <title>Calibration</title>
        <p>Results of statistical tests carried out to measure calibration and
goodness-of-fit are shown in Table 4. These results can be seen visually in the calibration
(Figure 2) and deviance plots (Figure 3).</p>
        <table-wrap id="tab2">
          <label>Table 2</label>
          <caption>
            <p>Confusion matrix for each model.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Observed outcome</th>
                <th>Model</th>
                <th>Predicted True</th>
                <th>Predicted False</th>
              </tr>
            </thead>
            <tbody>
              <tr><td rowspan="6">Conceived</td><td>Logistic Regression</td><td>895</td><td>1102</td></tr>
              <tr><td>Naïve Bayes</td><td>928</td><td>1069</td></tr>
              <tr><td>Decision Tree</td><td>924</td><td>1073</td></tr>
              <tr><td>Random Forest (100 trees)</td><td>981</td><td>1016</td></tr>
              <tr><td>Random Forest (250 trees)</td><td>988</td><td>1009</td></tr>
              <tr><td>Random Forest (500 trees)</td><td>989</td><td>1008</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <fig id="fig1">
          <label>Fig. 1</label>
          <caption>
            <p>ROC curve of each model (x-axis: false positive rate).</p>
          </caption>
        </fig>
        <sec id="sec-3-2-18">
          <p>The Hosmer-Lemeshow test found no significant differences between the observed
outcomes and the logistic regression or Decision Tree predictions.
The test found significant differences between the observed outcomes and the
predictions from the Naïve Bayes and all of the Random Forest models.</p>
          <p>The models had mean absolute calibration errors ranging from 3.48% to
6.40%, with the Random Forest model built with 100 trees having the
highest calibration error. The Decision Tree just exceeds the accepted limit
of 5% of deviance values outside the two standard deviation limit. The Naïve
Bayes and all of the Random Forest models were well above this limit. Some
evidence of a deviance pattern is seen in the Naïve Bayes deviance plot, while a
very clear pattern is observed for the Random Forest models.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Discussion</title>
      <p>The logistic regression model had the best calibration performance, with the
lowest calibration error and the most compact deviance spread. The model's
F-score was similar to those of the other models, but it had the highest precision and
lowest recall. Its Matthews correlation coefficient was the highest of the models.</p>
      <p>The Nave Bayes model failed the Hosmer-Lemeshow test of overall
goodnessof- t, and the calibration plot showed some points outside the 95% con dence
interval. With 20% of its deviance values outside two standard deviations of
0 and some observation of systematic deviance, it showed poor capability of
predicting the probability of conception. This was in spite of discrimination
performance comparable to the rest of the models.</p>
      <p>The probabilities predicted from the Decision Tree model had a very
narrow range; only four distinct probabilities were predicted, resulting in only two
probability groups with enough records to display on the calibration plot. This
also reduced the number of rows used to calculate the Hosmer-Lemeshow test
statistic. Although the discrimination evaluation of the Decision Tree did not
differ greatly from the other models, its poor calibration performance makes it
an unsuitable choice for predicting the outcome of service.</p>
      <p>
        Because each tree is grown to its maximum size on a bootstrap sample, the
Random Forests fitted the training data used to build them perfectly.
Although these models had the best test performance in terms of
discrimination, their calibration results were poor. The calibration plots show
significant bias, and the distinctly non-random deviance plots indicate that the
models are not capturing some important element related to the outcome [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        Data in which the outcomes are not well separated are
common in epidemiological applications, where probabilities close to 1 or 0 are
uncommon and most in-group probabilities tend to be centred close to 50%. The
benefit of modelling these outcomes is to identify events with probabilities
outside the norm. This can aid the decision making of farmers and their advisors
when selecting the best animals for costly insemination techniques such as sexed
semen [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Because the probability is the focus, rather than the ultimate
outcome, a predictive model with good calibration is key. Thus, of the models
compared, logistic regression is the most suitable for predicting service outcome.
Its easily interpretable coefficients or odds ratios may also be used to inform
farmers about the important risk factors for service outcome.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>
        This paper demonstrates a novel application of machine learning algorithms in
the context of Irish agriculture. Each technique was trained using data from
research herds and tested with an external dataset representing the typical
commercial dairy herd in Ireland. The methods implemented all show similar
discriminative ability, but logistic regression was found to be the most capable at
correctly predicting the probability of conception. Further improvements to the
model might be made using regression with ensemble methods such as bagging
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>This is, to the authors' knowledge, the first time comprehensive statistical
modelling of service outcome in Irish cows has been reported. Having a
generalisable predictive model of how various risk factors combine to influence the
probability of conception will help farmers to better understand the performance
potential of their animals when making management decisions, such as culling
or selection of herd replacements. In addition, because the model is based
on easily recordable and obtainable data, its practical utility as a decision
support tool should be further increased. As well as offering stand-alone
benefits, the model is being integrated into a detailed whole-farm model of
Irish dairy animals, which will simulate nutrition, reproduction, management
and economics in daily time-steps for the entire life of each animal.</p>
      <p>Acknowledgements. This research was supported by funding from the Dairy
Levy Research Fund. The authors would like to thank Anne Geoghegan in
Teagasc, Moorepark and the farmers involved in the UCD School of Veterinary
Medicine consultancy programme for assistance in gathering the data used in
this study. The suggestions of the AICS reviewers were gratefully received.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. <string-name><surname>Breiman</surname>, <given-names>L.</given-names></string-name>: <article-title>Bagging Predictors</article-title>. <source>Machine Learning</source> <volume>24</volume>(<issue>421</issue>), <fpage>123</fpage>-<lpage>140</lpage> (<year>1996</year>), http://link.springer.com/10.1007/BF00058655
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. <string-name><surname>Buckley</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>O'Sullivan</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Mee</surname>, <given-names>J.F.</given-names></string-name>, <string-name><surname>Evans</surname>, <given-names>R.D.</given-names></string-name>, <string-name><surname>Dillon</surname>, <given-names>P.</given-names></string-name>: <article-title>Relationships among milk yield, body condition, cow weight, and reproduction in spring-calved Holstein-Friesians</article-title>. <source>Journal of Dairy Science</source> <volume>86</volume>(<issue>7</issue>), <fpage>2308</fpage>-<lpage>2319</lpage> (<year>2003</year>), http://dx.doi.org/10.3168/jds.S0022-0302(03)73823-5
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. <string-name><surname>Butler</surname>, <given-names>S.T.</given-names></string-name>, <string-name><surname>Hutchinson</surname>, <given-names>I.A.</given-names></string-name>, <string-name><surname>Cromie</surname>, <given-names>A.R.</given-names></string-name>, <string-name><surname>Shalloo</surname>, <given-names>L.</given-names></string-name>: <article-title>Applications and cost benefits of sexed semen in pasture-based dairy production systems</article-title>. <source>Animal</source> <volume>8</volume> Suppl 1(<issue>s1</issue>), <fpage>165</fpage>-<lpage>172</lpage> (<year>2014</year>), http://journals.cambridge.org/abstract_S1751731114000664
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. <string-name><surname>Cohen</surname>, <given-names>I.</given-names></string-name>, <string-name><surname>Goldszmidt</surname>, <given-names>M.</given-names></string-name>: <article-title>Properties and Benefits of Calibrated Classifiers</article-title>. In: <source>Proceedings of ECML</source>, pp. <fpage>125</fpage>-<lpage>148</lpage>. Springer (<year>2004</year>)
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. <string-name><surname>Coleman</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Pierce</surname>, <given-names>K.M.</given-names></string-name>, <string-name><surname>Berry</surname>, <given-names>D.P.</given-names></string-name>, <string-name><surname>Brennan</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Horan</surname>, <given-names>B.</given-names></string-name>: <article-title>The influence of genetic selection and feed system on the reproductive performance of spring-calving dairy cows within future pasture-based production systems</article-title>. <source>Journal of Dairy Science</source> <volume>92</volume>(<issue>10</issue>), <fpage>5258</fpage>-<lpage>5269</lpage> (<year>2009</year>), http://dx.doi.org/10.3168/jds.2009-2108
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. <string-name><surname>Cox</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Snell</surname>, <given-names>E.J.</given-names></string-name>: <source>Analysis of Binary Data, Second Edition</source>. CRC Press, Boca Raton (<year>1989</year>), https://books.google.com/books?hl=en&amp;lr=&amp;id=0R8J71LCLXsC&amp;pgis=1
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. <string-name><surname>Cummins</surname>, <given-names>S.B.</given-names></string-name>, <string-name><surname>Lonergan</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Evans</surname>, <given-names>A.C.O.</given-names></string-name>, <string-name><surname>Berry</surname>, <given-names>D.P.</given-names></string-name>, <string-name><surname>Evans</surname>, <given-names>R.D.</given-names></string-name>, <string-name><surname>Butler</surname>, <given-names>S.T.</given-names></string-name>: <article-title>Genetic merit for fertility traits in Holstein cows: I. Production characteristics and reproductive efficiency in a pasture-based system</article-title>. <source>Journal of Dairy Science</source> <volume>95</volume>(<issue>3</issue>), <fpage>1310</fpage>-<lpage>1322</lpage> (<year>2012</year>), http://www.ncbi.nlm.nih.gov/pubmed/22365213
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. <string-name><surname>Freeman</surname>, <given-names>E.A.</given-names></string-name>, <string-name><surname>Moisen</surname>, <given-names>G.</given-names></string-name>: <article-title>PresenceAbsence: An R Package for Presence Absence Analysis</article-title> (<year>2008</year>), http://www.jstatsoft.org/v23/i11/paper
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. <string-name><surname>Gelman</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Hill</surname>, <given-names>J.</given-names></string-name>: <source>Data Analysis Using Regression and Multilevel/Hierarchical Models</source>. Cambridge University Press, Cambridge (<year>2006</year>)
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Harrell</surname>
            ,
            <given-names>F.E.</given-names>
          </string-name>
          :
          <article-title>rms: Regression Modeling Strategies</article-title>
          . R package version 4.3-1 (
          <year>2015</year>
          ), http://cran.r-project.org/package=rms
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Hosmer</surname>
            ,
            <given-names>D.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lemeshow</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Goodness of fit tests for the multiple logistic regression model</article-title>
          .
          <source>Communications in Statistics - Theory and Methods</source>
          <volume>9</volume>
          (
          <issue>10</issue>
          ),
          <fpage>1043</fpage>
          –
          <lpage>1069</lpage>
          (
          <year>1980</year>
          ), http://www.tandfonline.com/doi/abs/10.1080/03610928008827941
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Hosmer</surname>
            ,
            <given-names>D.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lemeshow</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sturdivant</surname>
            ,
            <given-names>R.X.</given-names>
          </string-name>
          : Applied Logistic Regression. Wiley, Hoboken (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nielsen</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Merrill</surname>
            ,
            <given-names>E.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McDonald</surname>
            ,
            <given-names>T.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boyce</surname>
            ,
            <given-names>M.S.</given-names>
          </string-name>
          :
          <article-title>Resource Selection Functions Based on Use-Availability Data: Theoretical Motivation and Evaluation Methods</article-title>
          .
          <source>The Journal of Wildlife Management</source>
          <volume>70</volume>
          (
          <issue>2</issue>
          ),
          <fpage>347</fpage>
          –
          <lpage>357</lpage>
          (
          <year>2006</year>
          ), http://dx.doi.org/10.2193/0022-541X(2006)70[347:RSFBOU]2.0.CO;2
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Liaw</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiener</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Classification and Regression by randomForest</article-title>
          .
          <source>R News</source>
          <volume>2</volume>
          (
          <issue>3</issue>
          ),
          <fpage>18</fpage>
          –
          <lpage>22</lpage>
          (
          <year>2002</year>
          ), http://cran.r-project.org/doc/Rnews/
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Matthews</surname>
            ,
            <given-names>B.W.</given-names>
          </string-name>
          :
          <article-title>Comparison of the predicted and observed secondary structure of T4 phage lysozyme</article-title>
          .
          <source>BBA - Protein Structure</source>
          <volume>405</volume>
          (
          <issue>2</issue>
          ),
          <fpage>442</fpage>
          –
          <lpage>451</lpage>
          (
          <year>1975</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Meyer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dimitriadou</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hornik</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weingessel</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leisch</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071)</article-title>
          , TU Wien (
          <year>2015</year>
          ). R package, pp.
          <fpage>1</fpage>
          –
          <lpage>6</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Olson</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Advanced data mining techniques</article-title>
          . Springer Science &amp; Business Media, New York (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>R Core Team</surname>
          </string-name>
          :
          <article-title>R: A Language and Environment for Statistical Computing</article-title>
          . R Foundation for Statistical Computing, Vienna, Austria (
          <year>2015</year>
          ), http://www.r-project.org/
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Shahinfar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Page</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guenther</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cabrera</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fricke</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weigel</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Prediction of insemination outcomes in Holstein dairy cattle using alternative machine learning algorithms</article-title>
          .
          <source>Journal of Dairy Science</source>
          <volume>97</volume>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Somers</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huxley</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lorenz</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doherty</surname>
            ,
            <given-names>M.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Grady</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>The effect of Lameness before and during the breeding season on fertility in 10 pasture-based Irish dairy herds</article-title>
          .
          <source>Irish Veterinary Journal</source>
          <volume>68</volume>
          (
          <issue>14</issue>
          ),
          <fpage>1</fpage>
          –
          <lpage>7</lpage>
          (
          <year>2015</year>
          ), http://www.irishvetjournal.org/content/68/1/14
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Tedeschi</surname>
            ,
            <given-names>L.O.</given-names>
          </string-name>
          :
          <article-title>Assessment of the adequacy of mathematical models</article-title>
          .
          <source>Agricultural Systems</source>
          <volume>89</volume>
          ,
          <fpage>225</fpage>
          –
          <lpage>247</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Therneau</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Atkinson</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ripley</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>rpart: Recursive Partitioning and Regression Trees</article-title>
          (
          <year>2015</year>
          ), http://cran.r-project.org/package=rpart
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>Tin Kam</given-names>
            <surname>Ho</surname>
          </string-name>
          :
          <article-title>Random decision forests</article-title>
          .
          <source>Proceedings of 3rd International Conference on Document Analysis and Recognition</source>
          <volume>1</volume>
          ,
          <fpage>278</fpage>
          –
          <lpage>282</lpage>
          (
          <year>1995</year>
          ), http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=598994
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>