<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Analysis and Enhancement of Conditional Random Fields Gene Mention Taggers in BioCreative II Challenge Evaluation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yu-Ming Chang</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cheng-Ju Kuo</string-name>
          <email>clarkkuo@iis.sinica.edu.tw</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Han-Shen Huang</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yu-Shi Lin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chun-Nan Hsu</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, National Taiwan University</institution>
          ,
          <addr-line>Taipei</addr-line>
          ,
          <country country="TW">Taiwan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Bioinformatics, National Yang-Ming University</institution>
          ,
          <addr-line>Taipei</addr-line>
          ,
          <country country="TW">Taiwan</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute of Information Science</institution>
          ,
          <addr-line>Academia Sinica, Taipei</addr-line>
          ,
          <country country="TW">Taiwan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Background: Tagging gene and gene product mentions in scientific text is an important initial step of literature mining. In the BioCreative 2 challenge, the conditional random fields (CRF) model was the most prevalent method in the gene mention task. In this paper, we analyze the two best-performing CRF-based systems in BioCreative 2. We examine their key claims and propose enhancements based on the analysis results. Results: We implemented their systems in MALLET as specified in their reports, and in CRF++, a different CRF package, to empirically analyze their claims. We found that their feature set is effective for models trained by MALLET, but a smaller set works better for those trained by CRF++. We confirmed the effectiveness of pairing parentheses as a post-processing step. We found that backward parsing is not always superior to forward parsing; the benefit of applying bidirectional parsing is the creation of a wider variety of complementary models. We elaborated the notion of divergent models by relating it to the difference between the increments of true positives and false positives of the union model. Conclusions: To further enhance the performance, we can integrate more models based on the elaborated notion of divergent models, which we derived to minimize the number of models required.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Background</title>
      <p>At present, scientific literature is still the largest and most reliable source of biomedical knowledge. A great
deal of effort has been devoted to literature mining in attempts to extract large volumes of biomedical
facts, such as protein-protein interactions and disease-gene associations, from the literature. Curation of
large-scale experimental data generated by high-throughput experimental methods also depends on
literature mining. Literature mining usually takes many complex steps. Tagging gene and gene product
mentions in scientific text is an important initial step. However, gene mention tagging is particularly
difficult because authors rarely use standardized gene names, and gene names naturally co-occur with other
entity types that have similar morphology and even similar context.</p>
      <p>
        The second BioCreative challenge (BioCreative 2) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is a recent competition for biological literature
mining systems. It took place in 2006 and was followed by a workshop in April 2007. This challenge
consisted of a gene mention task, a gene normalization task and protein-protein interaction tasks. The gene
mention task [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] evaluated how accurately a computer program can automatically tag gene names in
sentences extracted from MEDLINE abstracts. Participants were given a tagged training corpus to develop
their systems and an untagged test corpus on which to apply their systems for evaluation. The training corpus
contains 15,000 sentences and the test corpus 5,000 sentences. Each run submitted by a participant was
evaluated based on F-score, F := 2pr/(p + r) · 100%, where p is precision and r is recall. A total of 21 participants
each submitted up to 3 runs to the challenge. The highest achieved F-score was 87.21.
      </p>
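<p>The evaluation metric above can be computed directly from raw counts; the following sketch (with made-up counts, not challenge results) illustrates the formula:</p>

```python
def precision_recall_f(tp: int, fp: int, fn: int):
    """Compute precision, recall and F-score (all in percent) from raw counts."""
    p = tp / (tp + fp)       # precision
    r = tp / (tp + fn)       # recall
    f = 2 * p * r / (p + r)  # harmonic mean of precision and recall
    return 100 * p, 100 * r, 100 * f

# Illustrative counts only
p, r, f = precision_recall_f(tp=800, fp=150, fn=200)
```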
      <p>
        In BioCreative 1 held in 2004, the conditional random fields model (CRF) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] was applied in the gene
mention tagging task and achieved high F-scores [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Of the 21 participants in BioCreative 2, 11 chose the
CRF model. Apparently, CRF has become the most prevalent method for this task. In this paper, we
analyze two CRF-based systems in BioCreative 2. One of them is Kuo et al.’s system [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which is the best
performing system based on CRF in BioCreative 2 (ranked 2nd). Its performance is not statistically
significantly worse than any other system, and its performance is the best among all systems for a test
corpus re-weighted to reflect the distribution of a random sentence extracted from MEDLINE. The other is
Huang et al.’s system [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which combines CRF with support vector machines (SVM) to achieve one of the
best F-scores (ranked 3rd) in BioCreative 2. In fact, even the top performing system is not statistically
significantly better than this system. The high performance of this combo system reconfirms a well-known
strategy that combining multiple complementary models always improves the performance. Aside from their
performance, both systems are interesting because they were built mostly on top of open source software
packages for CRF and NLP, which makes it possible to duplicate their results.
      </p>
      <sec id="sec-1-1">
        <title>Kuo et al.’s System</title>
        <p>
          The key idea of Kuo et al.’s system [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] is to combine bidirectional parsing CRF models with a rich feature
set. They used MALLET [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] to implement their CRF models to take advantage of its feature induction
capability [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. The CRF model is trained to label each token in an input sentence as one of B (beginning of
a gene entity), I (in a gene entity), and O (outside a gene entity). Due to the special characteristics of
named entities of genes and gene products [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], a rich set of features is required to achieve satisfactory
F-scores. Table 1 shows their feature set. Moreover, to include contextual information, they used -2 to 2 as
the offsets to generate contextual features that apply to predicates including words, stemmed words and
word morphology predicates at each position. To extract features, the Genia Tagger [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] was applied for
stemming, tokenization and part-of-speech tagging. They modified the Genia Tagger slightly to tokenize
words with a higher granularity. For example, punctuation symbols within words were segmented. They
also applied a rule-based filter to clean up some easily fixed mistakes, such as entities with unpaired
parentheses or square brackets.
        </p>
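<p>The contextual feature generation described above (offsets -2 to 2 applied to token-level predicates) can be sketched as follows; this is a hypothetical simplification showing only the word predicate, whereas the actual set in Table 1 is much richer:</p>

```python
def contextual_features(tokens, i, window=2):
    """Generate word predicates at offsets -window..window around position i.

    Out-of-range offsets map to a boundary marker; this padding convention
    is an assumption, and the original system's handling may differ.
    """
    feats = []
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            feats.append(f"W[{offset}]={tokens[j]}")
        else:
            feats.append(f"W[{offset}]=__PAD__")
    return feats

feats = contextual_features(["the", "NEK6", "kinase"], 1)
```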
        <p>
          To further improve its performance, they combined the tagging results of forward and backward parsing.
In forward parsing, CRF reads and tags the input sentences from left to right, while in backward parsing,
CRF reads and tags the input sentences from right to left. Figure 1 illustrates different parsing directions.
We note that the training set and the “B,I,O” labels must both be reversed to train a backward parsing
CRF model. That is, their backward parsing is equivalent to applying “I,O,E” labelling to reversed
sentences [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] in named entity recognition. They tested the forward and backward parsing models and
found that backward parsing consistently outperformed forward parsing in both recall and precision, but the
reason was unclear. They assumed that some “signals” at the end of entities are more important for
accurately demarcating the boundaries of entities.
        </p>
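<p>The transformation needed for backward parsing can be sketched as follows: both the token sequence and the label sequence are reversed before training, so the original B label (entity beginning) effectively marks the end of the entity in the reversed sentence:</p>

```python
def to_backward(tokens, labels):
    """Reverse a tagged sentence for backward-parsing training.

    After reversal, the original B label (entity start) sits on what is now
    the last token of the entity when read left to right.
    """
    return list(reversed(tokens)), list(reversed(labels))

tokens = ["the", "NEK6", "kinase", "binds"]
labels = ["O", "B", "I", "O"]
rtoks, rlabs = to_backward(tokens, labels)
```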
        <p>Finally, they applied a special method based on likelihood scores and dictionary-filtering, which used a
dictionary-based filter to select entities from the union of the top ten tagging solutions obtained by
MALLET’s n-best option. In fact, the union of the top ten tagging solutions of bidirectional parsing
achieved a nearly perfect recall at 98.10 for the final test, but with 13.87 precision. That is, nearly all true
positives are in this union. They distilled real true positives from this union as follows.
1. Parse the input sentence in both directions to obtain the top ten solutions for each direction with
their output scores;
2. Compute the intersection of bidirectional parsing and select the solution in the intersection that
minimizes the sum of its output scores;
3. For the other 18 solutions, select the labeled terms that appear in the dictionary and whose length is
greater than three.</p>
        <p>Step 2 is derived from optimal model integration. Let x be a test sentence, y be a tagging, and m_i be a
model, where i = 1, 2, . . .. The optimal integration of the m_i's is to select y such that
y* = arg max_y ∏_i p(y|x, m_i) = arg min_y −∑_i log p(y|x, m_i).</p>
        <p>
Next, they used approved gene symbols and aliases obtained from HUGO [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] as their dictionary for the
final dictionary filtering.
        </p>
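<p>Step 2 of the procedure above can be sketched as follows, treating each direction's n-best list as a mapping from a candidate tagging to its output score (a negative log-probability, so smaller is better). The data structures here are hypothetical; MALLET's actual n-best output format differs:</p>

```python
def select_from_intersection(fwd_scores, bwd_scores):
    """Pick the tagging present in both n-best lists that minimizes the
    summed negative log-probability over the two models."""
    common = fwd_scores.keys() & bwd_scores.keys()
    if not common:
        return None
    return min(common, key=lambda y: fwd_scores[y] + bwd_scores[y])

# Candidate taggings keyed by their label strings, scored by -log p(y|x, m_i)
fwd = {"O B I O": 1.2, "O B O O": 2.5, "O O O O": 3.0}
bwd = {"O B I O": 1.5, "O O I O": 2.0, "O O O O": 2.8}
best = select_from_intersection(fwd, bwd)
```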
      </sec>
      <sec id="sec-1-2">
        <title>Huang et al.’s System</title>
        <p>
          Huang et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] considered the gene mention tagging task as a classification problem and applied support
vector machines (SVM) to solve it. They selected a large set of features as the input and trained two SVM
models with different multiclass extension methods. They found that backward parsing consistently
outperformed forward parsing regardless of the multiclass extension method, and obtained high precision
rates, but recall rates were not as satisfactory. To enhance recall rates, their approach is to construct
divergent but high performance models to cover different aspects of the feature space, and then combine
them into an ensemble. They also applied union and intersection to combine the outputs of SVM models
with that of a CRF model, which was trained with the same feature set, and successfully enhanced recall
rates without degrading too much precision. They chose Yet Another Multipurpose Chunk Annotator
(YamCha) [
          <xref ref-type="bibr" rid="ref11 ref13">11, 13</xref>
          ] to build their SVM models because it is tuned for named entity chunking tasks. They
designed their features based on the experience and previous works on named entity recognition [
          <xref ref-type="bibr" rid="ref14 ref4">4, 14</xref>
          ].
Table 3 shows the set of features. There are a total of 617,515 features in the feature set.
They also used an inside/outside representation for gene mention tagging with B, I, and O class labels.
Since SVM is intrinsically a binary classifier, extensions must be made to handle multiclass problems. They
applied two popular methods to extend a binary classifier to multiclass: one vs. all and one vs. one. They
also trained a conditional random field (CRF) model to increase the divergence of the ensemble.
Table 4 shows the final test results of this model, as well as the final results of the unions of CRF with the
two SVM models. The results show that the simple ensemble model significantly enhanced recall, with all
recall results ranked in the top quartile, while precision results dropped slightly. All F-score results were
ranked in the top quartile among 21 participants, too.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Results and Discussion</title>
      <p>
        In this section, we present the results of our investigation of the key claims of these systems with
discussions. We developed a Java feature extractor for MALLET to duplicate their results. To investigate
the portability, we also applied CRF++ [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], another free package for CRF training. In addition, we
applied CTJPGIS, a new algorithm for CRF training, to compare its performance with L-BFGS [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], the de
facto standard algorithm for CRF training. The derivation of CTJPGIS is presented in the Methods section.
Though our focus is mainly on CRF, we also include the YamCha implementation of SVM in our
investigation.
      </p>
      <sec id="sec-2-1">
        <title>Feature Selection</title>
        <p>To investigate the impact of feature selection, our plan was to remove the subset of features corresponding to
a feature type and observe its impact on the performance. This is expensive because training a CRF takes
time. We duplicated Kuo et al.’s feature set as shown in Table 1 for models trained by MALLET. This set
will be denoted as F1. We also duplicated Huang et al.’s feature set as shown in Table 3. This set will be
denoted as F2. The differences between these two sets of features include the N-grams in F1 and the prefix
and suffix features in F2. We changed Huang et al.’s set slightly to improve the performance of SVM
by removing orthographic features. We used this new set of features for models trained by CRF++
because YamCha and CRF++ share the same input format for the features. We compared the performance
of the CRF++ tagger with F2 including and excluding orthographic features. The performance was
improved slightly without orthographic features. Moreover, the F-score of 87.12 by CRF++ is already higher
than Kuo et al.’s system and this is achieved by a single CRF model without model integration.
We tried our MALLET model to see if the removal of orthographic features also helps. It turns out that
the F-score drops significantly by 4 percentage points. The results are shown in Table 5. We removed other
subsets from the feature set for MALLET and observed similar performance degradation. Therefore, F1, as
given in Table 1, is effective for models trained by MALLET, but a smaller set works better for those by
CRF++, and the selection of the best features depends on the CRF package used. In the following
discussion, models trained by MALLET will use F1 as the feature set while models trained by CRF++ will
use F2 with orthographic features excluded, denoted F2−O, which will also be used by SVM models trained by YamCha.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Post Processing</title>
        <p>We implemented the post-processing step that resolves problems caused by unpaired parentheses. For
example, suppose the input sentence is</p>
        <p>. . . implicated the NIMA (never in mitosis, gene A)-related kinase-6 (NEK6) . . .
but the model tags “gene A)-related kinase-6” as a gene mention. This tagged entity is obviously incorrect
because it includes only a right parenthesis. The post-processing program first finds the left parenthesis
in the sentence, and then checks whether there is a stop word or parenthesis to the left of that left parenthesis.
In this example, the second token “the” meets the condition. Finally, the program extends the
tagged entity to the token right after the stop word “the” and outputs the extended string “NIMA (never in
mitosis, gene A)-related kinase-6” as the tagged entity, which is a correct tagging. When the tagged entity
contains only a left parenthesis, the search direction for the parenthesis and stop word is reversed.
Table 6 shows the improvement from the post processing for two different models. Though the improvement
is less than a percentage point in F-score, both precision and recall are improved for all models, suggesting
no trade-off and confirming the effectiveness of pairing parentheses as a post-processing step.</p>
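<p>A minimal sketch of the parenthesis-pairing step for the unmatched-right-parenthesis case described above; the stop word list is an illustrative subset, and the mirrored left-parenthesis case and square brackets are omitted:</p>

```python
STOP_WORDS = {"the", "a", "an", "of", "in"}  # illustrative subset only

def extend_entity(tokens, start, end):
    """If tokens[start:end] contains ')' but no '(', extend `start` leftward
    past the unmatched '(' up to the token after the nearest stop word
    (or to the sentence start)."""
    span = " ".join(tokens[start:end])
    if ")" not in span or "(" in span:
        return start, end
    i = start - 1                     # find the token holding the '(' on the left
    while i >= 0 and "(" not in tokens[i]:
        i -= 1
    if i < 0:
        return start, end
    j = i                             # keep walking left until a stop word
    while j > 0 and tokens[j - 1].lower() not in STOP_WORDS:
        j -= 1
    return j, end

tokens = "implicated the NIMA (never in mitosis, gene A)-related kinase-6".split()
s, e = extend_entity(tokens, 6, 9)    # tagged entity: "gene A)-related kinase-6"
```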
      </sec>
      <sec id="sec-2-3">
        <title>Backward and Forward Parsing</title>
        <p>The most prominent feature of Kuo et al.’s and Huang et al.’s systems is the use of backward parsing.
However, the superiority of backward parsing is also their most speculative claim. Intuitively, it can be explained
that some “signals” at the end of gene entities enable the system to recognize them more accurately. For
example,</p>
        <p>. . . zinc finger and BTB domain-containing protein 39 . . .</p>
        <p>The words highlighted at the end appear to be the “signal.” The empirical results presented in their reports
appeared to support this claim, but it is not clear whether the claim is applicable to other data sets or whether
it is specific to the choice of feature sets.</p>
        <p>We started by investigating the difference of the trained models of the forward and backward parsing by
applying MALLET with the same feature set used in Kuo et al.’s system. We counted the percentage
differences between the forward and backward parsing models in nonzero predicates and in errors made at
each type of class transition position. Nonzero predicates are important because they are the only
predicates that will be considered by the CRF model when it tags a sentence. The result is shown in
Table 7, which shows that nonzero predicates are different between forward and backward parsing for
transitions involving B’s, but no significant difference in the errors made at those transitions. There appears
to be no correlation between the differences and the errors made.</p>
        <p>
          Then we applied CRF++ to see if the superiority of backward parsing is portable to another CRF package.
The feature set used by the CRF++ tagger is slightly different from the MALLET version, as discussed in
the sub-section about feature selection. In a nutshell, we compared forward and backward parsing for the
same BioCreative 2 data set but used different CRF packages with different sets of features. The result is
shown in Table 8. It turns out that as reported in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], the MALLET tagger performs better applying
backward parsing than forward parsing, by 0.47 percentage points in F-score; but, contrary to Kuo et
al.’s results, our CRF++ tagger performs worse with backward parsing, by 0.05 percentage points. The
difference column in Table 8 shows that the difference between forward and backward parsing models for
MALLET is ten times as large as that for CRF++, which leads to a significant improvement of precision by
intersection of forward and backward parsing models for MALLET but tiny improvement for CRF++. We
also compared the log-likelihood scores obtained by MALLET and CRF++ for both parsing directions and
found that the log-likelihood scores are almost the same for CRF++ but different for MALLET.
We conclude that backward parsing is not always superior to forward parsing. The benefit of applying
bidirectional parsing is the creation of a wider variety of complementary models.
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>Model Integration</title>
        <p>
          Another prominent feature of Kuo et al.’s and Huang et al.’s systems is the claim that combining divergent but high
performance models always improves the performance. To create divergent but high performance models,
we implemented the following four models:
• The first one is the intersection of forward and backward parsing models trained by MALLET with
its default training algorithm L-BFGS, denoted by MalletL-BFGSint. Intersection was applied to
boost its precision. The feature set used is F1.
• The second one is a forward parsing model trained by CRF++, denoted by CRF++L-BFGS, also using
its default training algorithm L-BFGS.
• We altered the training algorithm of CRF++ from L-BFGS to CTJPGIS to train the third model,
called CRF++CTJPGIS. This new algorithm is introduced to create a wider variety of models. Since
CTJPGIS is derived quite differently from L-BFGS [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], their search paths to the optimum are quite
different, too. Therefore, CTJPGIS can be applied to create a model to complement the model
trained by L-BFGS. Both CRF++ models use F2−O as their feature set.
• The fourth model is the forward parsing SVM model trained by YamCha. The input feature set is also
        </p>
        <p>F2−O as discussed in feature selection sub-section.</p>
        <p>Table 9 shows the precision, recall and F-score of the four models and their unions. The results show that
with a similar set of features, we can duplicate Kuo et al.’s performance with CRF models trained by a
different software package. In fact, both models trained by CRF++ outperform Kuo et al.’s best system,
and their union outperforms the rank 1 system in BioCreative 2, which achieved an F-score of 87.21. Our
best performing system is the union of the two models trained by L-BFGS, achieving an F-score of 87.67. We
note that no external data source other than the training corpus provided by BioCreative 2 was used, and
only a pair of integrated models was required to achieve this result.</p>
        <p>Other notable results include that the difference in F-scores between the YamCha model and the two CRF++
models is surprisingly large even though they share the same set of features, and that the unions of SVM
and CRF models do not perform as well as the unions of CRF model pairs. Intuitively, the variance between
the tagging results of an SVM model and a CRF model is supposed to be larger than that between CRF
models, and therefore the union of SVM and CRF models is supposed to complement each other’s
false negatives and perform better. But the results show otherwise. One explanation is that the
performance of the SVM model is not as good as that of its CRF counterparts. But the performance of
MalletL-BFGSint is not as good either, though its precision is very high. We sought the answer by
examining the difference in true positives and false positives before and after applying union to each pair of
models. Compared to the F-score results in Table 9, we can see that a larger increment of TP and a smaller
increment of FP bring a higher F-score. Figure 2 illustrates this finding with bar charts, which show that
the larger the difference between the increments of TP and FP, the larger the gain in F-score by union. If
the difference is negative, then union may degrade the performance when an individual tagger’s performance
is not sufficiently high. This explains why the union of MalletL-BFGSint and CRF++L-BFGS performs
much better than the unions of YamCha and other CRF models.</p>
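<p>The TP/FP increment analysis above can be sketched by representing each model's output as a set of (sentence id, start, end) spans; this representation is a simplification of the official BioCreative 2 scoring, which also accepts alternative gene mention boundaries:</p>

```python
def union_gain(model_a, model_b, gold):
    """Return the increments in true positives and false positives when
    taking the union of two models' tagged spans, relative to model_a alone."""
    union = model_a | model_b
    d_tp = len(union & gold) - len(model_a & gold)
    d_fp = len(union - gold) - len(model_a - gold)
    return d_tp, d_fp

# Toy spans: (sentence id, start token, end token); not real challenge data
gold = {(1, 0, 2), (1, 5, 7), (2, 3, 4)}
a = {(1, 0, 2), (2, 3, 4), (2, 8, 9)}  # two TPs, one FP, misses (1, 5, 7)
b = {(1, 5, 7), (2, 3, 4), (1, 9, 9)}  # recovers (1, 5, 7), adds one FP
d_tp, d_fp = union_gain(a, b, gold)
```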
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusions</title>
      <p>We have analyzed the key claims of the two best-performing CRF-based gene mention taggers in
BioCreative 2 and proposed simple enhancements that perform better than any system in BioCreative 2.
We showed that the set of features used by Kuo et al. is effective for MALLET, but when applied to
CRF++, it is not as effective as a set with orthographic features removed. Then we showed that balancing
parentheses as a post-processing step always improves the performance. We analyzed the prominent claim
that backward parsing models are superior for gene mention tagging. We found that it may apply to
MALLET models but not to CRF++. No significant difference between backward and forward
parsing was observed for CRF++ models. It is not clear why MALLET and CRF++ respond differently
with regard to feature selection and parsing direction. In theory, they implement the same L-BFGS
algorithm for the same CRF model. We suspect that approximation in model inference may play a role in
their different behavior. However, this cannot be elucidated unless we actually trace their code to figure
out the real cause. Finally, we confirmed that integrating divergent models improves the performance. We
elaborated the notion of divergent models by relating it to the difference between the increments of true positives
and false positives of the union model.</p>
      <p>
        To further enhance the performance, we can integrate more models based on the elaborated notion of
divergent models that we derived. A grand ensemble of all participating systems in BioCreative 2 achieves an
F-score of 90.6 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Our heuristic can be used to minimize the number of models required.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Methods</title>
      <sec id="sec-4-1">
        <title>CTJPGIS</title>
        <p>
          CTJPGIS is the abbreviation of “the componentwise triple jump method for penalized generalized iterative
scaling.” CTJPGIS is derived from the generalized iterative scaling (GIS) method [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], which is a classical
method to train exponential probabilistic models. However, GIS usually converges slowly, especially when
applied to train a CRF model for large-scale gene mention tagging tasks.
        </p>
        <p>
          Since GIS can also be considered a fixed-point iteration [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], we can apply the triple jump extrapolation
method to speed up its convergence. The triple jump method is an approximation of Aitken’s acceleration
for fixed-point iteration methods. It has been successfully applied to the EM algorithm [
          <xref ref-type="bibr" rid="ref19 ref20 ref21 ref22">19–22</xref>
          ]. The idea is
to estimate the extrapolation rate by considering the previous two consecutive estimates of the parameter
vectors. The triple jump extrapolation method can effectively accelerate the EM algorithm by substantially
reducing the number of iterations required for the EM algorithm to converge. Though the triple jump
method, as all variants of Aitken’s acceleration, may not monotonically increase the likelihood, we can
apply the idea proposed by [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] to resolve the issue. The idea is to discard the extrapolation if it fails to
improve the likelihood and use the estimate obtained without the extrapolation. In this way, convergence
can be guaranteed [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. CTJPGIS runs as fast as L-BFGS; therefore, we can create many models efficiently.
        </p>
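<p>The guarded extrapolation described above can be illustrated on a generic one-dimensional fixed-point map; this is a toy sketch of the idea only, not the CTJPGIS derivation, which operates on the full CRF parameter vector:</p>

```python
import math

def guarded_aitken(g, x0, objective, iters=20):
    """Accelerate the fixed-point iteration x <- g(x) with an Aitken-style
    jump, discarding any jump that fails to improve `objective`."""
    x = x0
    for _ in range(iters):
        x1 = g(x)
        x2 = g(x1)
        denom = x2 - 2 * x1 + x
        if abs(denom) > 1e-12:
            x_acc = x - (x1 - x) ** 2 / denom  # Aitken's delta-squared step
            # keep the jump only if it improves the objective, else fall back
            x = x_acc if objective(x_acc) < objective(x2) else x2
        else:
            x = x2
    return x

# Toy map: x <- cos(x) has a unique fixed point near 0.739085
root = guarded_aitken(math.cos, 1.0, objective=lambda x: abs(math.cos(x) - x))
```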
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Hirschman</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krallinger</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Valencia</surname>
            <given-names>A</given-names>
          </string-name>
          :
          <article-title>Proceedings of the second BioCreative challenge evaluation workshop</article-title>
          .
          <source>CINO Centro Nacional de Investigaciones Oncologicas</source>
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Wilbur</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tanabe</surname>
            <given-names>L</given-names>
          </string-name>
          :
          <article-title>BioCreative 2. Gene Mention Task</article-title>
          .
          <source>In Proceedings of the Second BioCreative Challenge Evaluation Workshop</source>
          <year>2007</year>
          :
          <fpage>7</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Lafferty</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCallum</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pereira</surname>
            <given-names>F</given-names>
          </string-name>
          :
          <article-title>Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data</article-title>
          .
          <source>In Proceedings of 18th International Conference on Machine Learning (ICML '03)</source>
          <year>2001</year>
          :
          <fpage>282</fpage>
          -
          <lpage>289</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>McDonald</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pereira</surname>
            <given-names>F</given-names>
          </string-name>
          :
          <article-title>Identifying gene and protein mentions in text using conditional random fields</article-title>
          .
          <source>BMC Bioinformatics</source>
          <year>2005</year>
          , 6:
          <fpage>S6</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Kuo</surname>
            <given-names>CJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            <given-names>YM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            <given-names>HS</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            <given-names>KT</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            <given-names>BH</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            <given-names>YS</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hsu</surname>
            <given-names>CN</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chung</surname>
            <given-names>IF</given-names>
          </string-name>
          :
          <article-title>Rich Feature Set, Unification of Bidirectional Parsing and Dictionary Filtering for High F-Score Gene Mention Tagging</article-title>
          .
          <source>In Proceedings of the Second BioCreative Challenge Evaluation Workshop</source>
          <year>2007</year>
          :
          <fpage>105</fpage>
          -
          <lpage>107</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Huang</surname>
            <given-names>HS</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            <given-names>YS</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            <given-names>KT</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuo</surname>
            <given-names>CJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            <given-names>YM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            <given-names>BH</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hsu</surname>
            <given-names>CN</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chung</surname>
            <given-names>IF</given-names>
          </string-name>
          :
          <article-title>High-Recall Gene Mention Recognition by Unification of Multiple Backward Parsing Models</article-title>
          .
          <source>In Proceedings of the Second BioCreative Challenge Evaluation Workshop</source>
          <year>2007</year>
          :
          <fpage>109</fpage>
          -
          <lpage>111</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>McCallum</surname>
            <given-names>AK</given-names>
          </string-name>
          :
          <source>MALLET: A Machine Learning for Language Toolkit</source>
          <year>2002</year>
          . [http://mallet.cs.umass.edu].
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>McCallum</surname>
            <given-names>A</given-names>
          </string-name>
          :
          <article-title>Efficiently inducing features of conditional random fields</article-title>
          .
          <source>In Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI '03)</source>
          <year>2003</year>
          . [citeseer.ist.psu.edu/mccallum03efficiently.html].
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Zhou</surname>
            <given-names>GD</given-names>
          </string-name>
          :
          <article-title>Recognizing names in biomedical texts using mutual information independence model and SVM plus sigmoid</article-title>
          .
          <source>International Journal of Medical Informatics</source>
          <year>2006</year>
          ,
          <volume>75</volume>
          :
          <fpage>456</fpage>
          -
          <lpage>467</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Tsuruoka</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tateishi</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            <given-names>JD</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ohta</surname>
            <given-names>T</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McNaught</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ananiadou</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsujii</surname>
            <given-names>J</given-names>
          </string-name>
          :
          <article-title>Developing a Robust Part-of-Speech Tagger for Biomedical Text</article-title>
          .
          <source>In Advances in Informatics - 10th Panhellenic Conference on Informatics</source>
          <year>2005</year>
          :
          <fpage>382</fpage>
          -
          <lpage>392</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kudo</surname>
            <given-names>T</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matsumoto</surname>
            <given-names>Y</given-names>
          </string-name>
          :
          <article-title>Chunking with support vector machines</article-title>
          <year>2001</year>
          . [citeseer.ist.psu.edu/kudo01chunking.html].
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Eyre</surname>
            <given-names>TA</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ducluzeau</surname>
            <given-names>F</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sneddon</surname>
            <given-names>TP</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Povey</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bruford</surname>
            <given-names>EA</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lush</surname>
            <given-names>MJ</given-names>
          </string-name>
          :
          <article-title>The HUGO Gene Nomenclature Database, 2006 updates</article-title>
          .
          <source>Nucleic Acids Research</source>
          <year>2006</year>
          ,
          <volume>34</volume>
          :
          <fpage>D319</fpage>
          -
          <lpage>D321</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Kudo</surname>
            <given-names>T</given-names>
          </string-name>
          :
          <article-title>YamCha: Yet Another Multipurpose CHunk Annotator</article-title>
          <year>2001</year>
          . [http://chasen.org/~taku/software/yamcha/].
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Mitsumori</surname>
            <given-names>T</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fation</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murata</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doi</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doi</surname>
            <given-names>H</given-names>
          </string-name>
          :
          <article-title>Gene/protein name recognition based on support vector machine using dictionary as features</article-title>
          .
          <source>BMC Bioinformatics</source>
          <year>2005</year>
          ,
          <volume>6</volume>
          :
          <fpage>S8</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Kudo</surname>
            <given-names>T</given-names>
          </string-name>
          :
          <article-title>CRF++: Yet Another CRF toolkit</article-title>
          <year>2005</year>
          . [http://crfpp.sourceforge.net/].
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Nocedal</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wright</surname>
            <given-names>SJ</given-names>
          </string-name>
          :
          <article-title>Numerical Optimization</article-title>
          .
          <source>Springer</source>
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Darroch</surname>
            <given-names>JN</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ratcliff</surname>
            <given-names>D</given-names>
          </string-name>
          :
          <article-title>Generalized iterative scaling for log-linear models</article-title>
          .
          <source>The Annals of Mathematical Statistics</source>
          <year>1972</year>
          ,
          <volume>43</volume>
          (
          <issue>5</issue>
          ):
          <fpage>1470</fpage>
          -
          <lpage>1480</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Burden</surname>
            <given-names>RL</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Faires</surname>
            <given-names>D</given-names>
          </string-name>
          :
          <article-title>Numerical Analysis</article-title>
          .
          <source>PWS-KENT Pub Co</source>
          .
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Hsu</surname>
            <given-names>CN</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            <given-names>HS</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            <given-names>BH</given-names>
          </string-name>
          :
          <article-title>Global and Componentwise Extrapolation for Accelerating Data Mining from Large Incomplete Data Set with the EM Algorithm</article-title>
          .
          <source>In Proceedings of the 6th IEEE International Conference on Data Mining (ICDM '06)</source>
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Huang</surname>
            <given-names>HS</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            <given-names>BH</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hsu</surname>
            <given-names>CN</given-names>
          </string-name>
          :
          <article-title>Triple-Jump Acceleration for the EM Algorithm</article-title>
          .
          <source>In Proceedings of the 5th IEEE International Conference on Data Mining (ICDM '05)</source>
          <year>2005</year>
          :
          <fpage>649</fpage>
          -
          <lpage>652</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Hesterberg</surname>
            <given-names>T</given-names>
          </string-name>
          :
          <article-title>Staggered Aitken Acceleration for EM</article-title>
          .
          <source>In Proceedings of the Statistical Computing Section of the American Statistical Association</source>
          , Minneapolis, Minnesota, USA,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Schafer</surname>
            <given-names>JL</given-names>
          </string-name>
          :
          <article-title>Analysis of Incomplete Multivariate Data</article-title>
          . London:
          <source>Chapman and Hall / CRC Press</source>
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Salakhutdinov</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roweis</surname>
            <given-names>S</given-names>
          </string-name>
          :
          <article-title>Adaptive overrelaxed bound optimization methods</article-title>
          .
          <source>In Proceedings of the Twentieth International Conference on Machine Learning</source>
          <year>2003</year>
          :
          <fpage>664</fpage>
          -
          <lpage>671</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>