<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>TagMiner: A Semisupervised Associative POS Tagger Effective for Resource Poor Languages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pratibha Rani</string-name>
          <email>rani@research.iiit.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vikram Pudi</string-name>
          <email>vikram@iiit.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dipti Misra Sharma</string-name>
          <email>dipti@iiit.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>International Institute of Information Technology</institution>
          ,
          <addr-line>Hyderabad</addr-line>
          ,
          <country>India</country>
        </aff>
      </contrib-group>
      <fpage>113</fpage>
      <lpage>128</lpage>
      <abstract>
        <p>We present TagMiner, a data mining approach for part-of-speech (POS) tagging, an important natural language processing (NLP) classification task. It is a semi-supervised associative classification method for POS tagging. Existing methods for building POS taggers require extensive domain and linguistic knowledge and resources. Our method uses a combination of a small POS tagged corpus and raw untagged text as training data to build the classifier model using association rules. Our tagger also works well with very little training data. The use of semi-supervised learning provides the advantage of not requiring a large, high quality tagged corpus. These properties make it especially suitable for resource poor languages. Our experiments on various resource-rich, resource-moderate and resource-poor languages show good performance without using any language specific linguistic information. We note that inclusion of such features in our method may further improve the performance. Results also show that for smaller training data sizes our tagger performs better than a state-of-the-art CRF tagger using the same features as our tagger.</p>
      </abstract>
      <kwd-group>
        <kwd>Part-of-Speech Tagging</kwd>
        <kwd>Associative Classification</kwd>
        <kwd>Association Rules</kwd>
        <kwd>Semi-supervised Classification</kwd>
        <kwd>NLP</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Part-of-speech (POS) tagging is an important NLP classification task that takes
a word or a sentence as input, assigns a POS tag or other lexical class marker
to a word or to each word in the sentence, and produces the tagged text as
output. For this task several rule based [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], stochastic supervised [
        <xref ref-type="bibr" rid="ref15 ref30 ref6">6, 15, 30</xref>
        ],
and unsupervised [
        <xref ref-type="bibr" rid="ref16 ref2 ref5">2, 5, 16</xref>
        ] approaches are available for a number of languages.
All of these approaches (including the state-of-the-art taggers) require training
data and linguistic resources such as dictionaries in large quantities. These taggers
do not perform well for languages that lack such resources and training
data, referred to as resource poor languages.
      </p>
      <p>The creation of linguistic resources is a time-consuming, expensive process
which requires expert linguistic knowledge. So, there is a need to develop
semi-supervised and generic POS tagging methods which take advantage of a raw
untagged corpus and require few or no lexical resources. A few such available
techniques are mentioned in Sect. 2. In order to perform well, these techniques
require a large raw untagged corpus. Unfortunately, for many resource poor
languages, even obtaining this is hard.</p>
      <p>This motivates us to explore data mining methods to build a generic POS
tagger. Data mining, being composed of data driven techniques, is a promising
direction for developing language/domain independent POS tagging
methods. However, direct application of data mining concepts to this task is not
feasible and requires handling various challenges: 1) mapping the POS tagging
task to an association rule mining problem, 2) developing semi-supervised methods
to extract association rules from a training set of tagged and raw untagged data
combined and 3) handling challenges of the POS tagging task (discussed in Sect. 4.2),
such as class imbalance, data sparsity and phrase boundary problems.</p>
      <p>
        Associative classification [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] is a well known data mining based
classification approach which uses association rules [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to build the classifier model. In
this work, we apply associative classification for POS tagging and present
TagMiner, a generic semi-supervised method for POS tagging. Our method uses a
combination of a small POS tagged corpus and raw untagged text as
training data to build a classifier model using a new concept of context based
association rule mining. These association rules work as context based tagging
rules. Our experiments demonstrate that it gives good performance even
without using any linguistic resources (except for a small POS tagged corpus) for
resource-rich English, resource-moderate Hindi and resource-poor Telugu, Tamil
and Bengali languages.
      </p>
      <p>Our method is generic in two aspects: (1) it does not use any language specific
linguistic information such as morphological features, and there is ample scope
to improve further by including such features; (2) it does not require a large,
high quality, tagged corpus and uses the POS tags of the tagged corpus only
to calculate scores of “context based lists” which are used to form association
rules. It can therefore be easily adapted for various languages. As an additional
benefit, the model built by our tagger is human understandable since it is based on
association rules.</p>
      <p>Our algorithm has the following advantages, especially suitable for resource poor
languages, arising from the use of raw untagged data: (1) it tags unknown
words without using smoothing techniques, (2) the coverage of words present in
the classifier model is increased, which in turn increases tagging accuracy and
(3) it creates additional linguistic resources from raw untagged data in the form
of word clusters.</p>
      <p>The remainder of this paper is organized as follows: Section 2 surveys related work.
Section 3 formally presents the problem. Sections 4, 5 and 6 present details of our
proposed approach. Section 7 gives details of the datasets and various experiments
and discusses the performance. Section 8 concludes our work.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Associative classifiers use association rules to build a classifier model. They
have been successfully applied for various classification tasks, for example, [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ]
presents an associative classifier for mammography image classification and [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]
uses it for predictive analysis in health care data mining. Some of the
associative classifiers worth mentioning are CBA [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], which integrates association rules
and classification by finding class association rules, CMAR [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], which uses the concept of
multiple class-association rules, CPAR [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ], which is based on predictive association
rules, and ACME [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ], which exploits the maximum entropy principle. A good review of
various associative classifiers and a detailed analysis of this method can be
found in [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. Among other association rule based approaches, [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] uses
association rules in a hybrid system of Naive Bayes and genetic classifier for text
classification and [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] presents a supervised, language specific hybrid algorithm
of statistical method and association rule mining to increase the POS tagging
accuracy of Chinese text. To the best of our knowledge, no semi-supervised method
exists for association rule mining from a training set of tagged and raw untagged
data combined.
      </p>
      <p>
        For POS tagging, one of the first semi-supervised methods was proposed
by [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] which uses raw untagged corpus by incorporating features obtained from
a small fraction of untagged data along with features obtained from a large
tagged data. A good overview of the existing semi-supervised POS tagging
methods and discussion on their limitations is provided by [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ], which uses graph as
a smoothness regularizer to train CRFs [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] in a semi-supervised manner from a
large untagged data and a small tagged data. In another approach, [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] presents a
condensed nearest neighbor method for semi-supervised POS tagging and reports
97.5% accuracy on the WSJ dataset of English. Most of the existing semi-supervised
POS tagging methods use a combination of complex learning methods and
existing supervised tagging methods to learn from large untagged data and moderate
sized tagged data. All these methods have been developed for resource rich
English and other European languages.
      </p>
      <p>To the best of our knowledge no semi-supervised tagging method has been
employed for resource moderate Hindi and resource poor Telugu and Tamil
languages. Also to the best of our knowledge no fully data mining based generic
POS tagger exists for any language. Baseline POS taggers for various languages
are discussed below. We note that all the reported accuracy values were obtained
for very small test sets. All the mentioned POS taggers use linguistic
(especially morphological) knowledge in one form or another, while our approach
uses only the POS tags of the tagged set in an indirect form and learns from the
raw untagged data.</p>
      <p>
        For Hindi language, [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] proposes a CRF model with Transformation Based
Learning (TBL) with morphological features and reports 78.67% accuracy on
SPSAL corpus. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] reports 92.36% accuracy on ISPC corpus using special
linguistic features in an HMM model. [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] proposes an HMM model with
morphological features and reports 93.05% accuracy. For Telugu language, [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] applies
Transformation Based Learning (TBL) on top of a CRF model and reports
77.37% accuracy on SPSAL corpus. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] uses various special linguistic features
in an HMM model and reports 91.23% accuracy on ISPC corpus.
      </p>
      <p>
        For Bengali language, [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] presents various supervised and semi-supervised
Maximum Entropy and HMM models using morphological features and reports
87.9% accuracy for semi-supervised HMM model on CIIL corpus. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] reports
92.35% accuracy using a voted approach among various models. For Tamil
language, [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] presents a linear programming based SVM model and reports 95.63%
accuracy.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Problem Definition</title>
      <p>Automated POS tagging is a classification task which takes a word or a sentence
as input, assigns a POS tag or other lexical class marker to a word or to each
word in the sentence, and produces the tagged text as output. In the semi-supervised
paradigm the POS tagger is built from a corpus of untagged sentences and a set
of tagged sentences. The POS tagging classification problem is formally defined
as follows:</p>
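      <p>Given a set of tags <inline-formula><tex-math>\Gamma = \{T_1, T_2, \ldots, T_n\}</tex-math></inline-formula>, an annotated set of tagged
sentences <inline-formula><tex-math>AS = \{St_1, St_2, \ldots, St_N\}</tex-math></inline-formula>, where <inline-formula><tex-math>St_i = \langle W_1/T_i, W_2/T_j, \ldots, W_n/T_k \rangle</tex-math></inline-formula> (where
<inline-formula><tex-math>W_i</tex-math></inline-formula> is a word and <inline-formula><tex-math>T_i</tex-math></inline-formula> is a tag from <inline-formula><tex-math>\Gamma</tex-math></inline-formula>) and a raw untagged training corpus of
sentences <inline-formula><tex-math>D = \{S_1, S_2, \ldots, S_M\}</tex-math></inline-formula>, where <inline-formula><tex-math>S_i = \langle W_1 W_2 \ldots W_m \rangle</tex-math></inline-formula>, the goal is to build a
classifier model which outputs the best tag sequence <inline-formula><tex-math>\langle T_1 T_2 \ldots T_l \rangle</tex-math></inline-formula> for an input
sequence of words <inline-formula><tex-math>\langle W_1 W_2 \ldots W_l \rangle</tex-math></inline-formula>.</p>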
    </sec>
    <sec id="sec-4">
      <title>TagMiner</title>
      <sec id="sec-4-1">
        <title>Mapping POS tagging task to Association Rule Mining problem</title>
        <p>
          According to the one sense per collocation [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ] hypothesis, the sense of a word
in a document is effectively determined by its context. The notion of context
has been used in various methods of POS tagging [
          <xref ref-type="bibr" rid="ref2 ref30">2, 30</xref>
          ]. A context can occur
in multiple places in the text. We refer to the list of occurrences of a context as
a context based list. We use this idea for building TagMiner. In our method, we
mine context based association rules from training data containing both tagged
and untagged text. Our method works as follows:
          <list list-type="bullet">
            <list-item><p>We collect all possible words occurring in the same context from the raw
untagged data into a list called context based list (formally defined later). In
this way we are able to find groups of words of similar categories from the
raw untagged data.</p></list-item>
            <list-item><p>Using the annotated set and the tag finding algorithm (in Fig. 1), we find
association rules of the form <inline-formula><tex-math>\mathit{Context} \Rightarrow \mathit{Tag}</tex-math></inline-formula> for the context based lists.
Each rule maps a context based list to a suitable POS tag. These association
rules work as the context based classification rules.</p></list-item>
            <list-item><p>Lastly, we group these context based association rules according to their
POS tags to form clusters. This set of clusters is used as the classifier model
to tag words using the method described in Sect. 6 and Fig. 2.</p></list-item>
          </list>
        </p>
        <p>By experimenting with two varieties of bi-gram (one with preceding word
and the other with succeeding word as context) and trigram as possible contexts
we found that trigram works best for our method. For a word instance Wi, we
fix its context as a trigram containing Wi in the middle and we use this context
to find the context based list. Any other notion of context can be used as long as
it fits into the formalism given below.</p>
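        <p>Context Based List: If <inline-formula><tex-math>\Psi</tex-math></inline-formula> is a function mapping from a word instance <inline-formula><tex-math>W_i</tex-math></inline-formula>
in the data to its context <inline-formula><tex-math>\Psi(W_i)</tex-math></inline-formula>, then <inline-formula><tex-math>\Psi^{-1}(\Psi(W_i))</tex-math></inline-formula> is a list of word instances
sharing the same context. We refer to this list as the context based list of <inline-formula><tex-math>\Psi(W_i)</tex-math></inline-formula>. It
denotes words of similar category or type as <inline-formula><tex-math>W_i</tex-math></inline-formula> in a specific context and can
store multiple instances of a word. For a given trigram <inline-formula><tex-math>(W_{i-1}\, W_i\, W_{i+1})</tex-math></inline-formula> of words,
<inline-formula><tex-math>\Psi(W_i) = (W_{i-1}, W_{i+1})</tex-math></inline-formula>. The preceding word <inline-formula><tex-math>W_{i-1}</tex-math></inline-formula> and succeeding word <inline-formula><tex-math>W_{i+1}</tex-math></inline-formula>
are called context words and <inline-formula><tex-math>\Psi(W_i)</tex-math></inline-formula> is called the context word pair of <inline-formula><tex-math>W_i</tex-math></inline-formula>.</p>
        <p>Context Based Association Rule: For each context based list L, our
approach finds an association rule of the form <inline-formula><tex-math>L \Rightarrow T</tex-math></inline-formula>. This rule maps the context
based list L to a POS tag T with support and confidence parameters defined
below. Since each list L is obtained from a unique context word pair, each
association rule uniquely associates a context to a POS tag and works as the
context based tagging rule.</p>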
        <p>In the following definitions and formulas we develop the intuition and the
method to compute the interestingness measures of the significant association
rules. The complexity in defining support is due to the presence of raw untagged
training data required for semi-supervised learning. The support is the frequency
(count) of occurrences of the context in the dataset. Context based lists are made
from raw untagged data D and we are interested in the words of this list for which
we know the tag in annotated set AS. Hence, we define the support of a context as
follows:</p>
        <p>AllTagContextSupport: The number of unique words of a context based list L
whose tags are available (in annotated set AS) is denoted as AllTagContextSupport(L).
This measure gives the number of tagged words of L.</p>
        <p>ContextSupport: For a list of words L in which duplicates may be present,
ContextSupport(L) is defined as the set of unique words present in L.</p>
        <p>Coverage: For a context based list L,
          <disp-formula id="eq-1">
            <label>(1)</label>
            <tex-math>\mathit{Coverage}(L) = \frac{\mathit{AllTagContextSupport}(L)}{|\mathit{ContextSupport}(L)|}</tex-math>
          </disp-formula>
This measure represents the confidence that a sufficient number of tagged samples
are present in L.</p>
        <p>ContextTagSupport: The number of unique words of a context based list L present
in annotated set AS with a particular tag T is denoted as ContextTagSupport(L, T).</p>
        <p>Confidence: For a context based list L and tag T,
          <disp-formula id="eq-2">
            <label>(2)</label>
            <tex-math>\mathit{Confidence}(L, T) = \frac{\mathit{ContextTagSupport}(L, T)}{|\mathit{ContextSupport}(L)|}</tex-math>
          </disp-formula>
This measure represents the confidence that a considerable number of words in
list L have a particular tag T and leads to rules of the form <inline-formula><tex-math>\mathit{Context} \Rightarrow \mathit{Tag}</tex-math></inline-formula>.</p>
        <p>WordTagSupport: The frequency of tag T for a word W in the annotated set AS
is denoted as WordTagSupport(T, W).</p>
        <p>WordTagScore: For a word W and tag T, WordTagScore is defined as:
          <disp-formula id="eq-3">
            <label>(3)</label>
            <tex-math>\mathit{WordTagScore}(W, T) = \frac{\mathit{WordTagSupport}(T, W)}{\max_{T_i \in \Gamma} \mathit{WordTagSupport}(T_i, W)}</tex-math>
          </disp-formula>
This represents how well the tag fits the word on a scale of 0 to 1.</p>
        <p>ListTagScore: For a tag T in context based list L, ListTagScore is defined as:
          <disp-formula id="eq-4">
            <label>(4)</label>
            <tex-math>\mathit{ListTagScore}(L, T) = \frac{\sum_{W_i \in \mathit{ContextSupport}(L)} \mathit{WordTagScore}(W_i, T)}{|\{W_i \in \mathit{ContextSupport}(L) : W_i/T \in AS\}|}</tex-math>
          </disp-formula>
where AS is the annotated set. This formula represents the average frequency
of tag T in context based list L. Intuitively, it represents how well the tag fits
the list. Unfortunately, this is not always indicative of the correct tag for the list.
For example, if a tag is overall very frequent, it can bias this score. Therefore, we
compare this with the following score, inspired by the notion of Conviction [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ].</p>
        <p>BackgroundTagScore: For a tag T in annotated set AS, BackgroundTagScore
is defined as:
          <disp-formula id="eq-5">
            <label>(5)</label>
            <tex-math>\mathit{BackgroundTagScore}(T) = \frac{\sum_{W_i \in \mathit{ContextSupport}(AS)} \mathit{WordTagScore}(W_i, T)}{|\{W_i \in \mathit{ContextSupport}(AS) : W_i/T \in AS\}|}</tex-math>
          </disp-formula>
This represents the average frequency of tag T in annotated set AS.</p>
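        <p>The measures defined in Equations (1) to (5) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy annotated set and the toy context based list are assumptions made for illustration, and the annotated set is assumed to be available as lists of (word, tag) pairs.</p>
        <preformat>
```python
from collections import Counter, defaultdict

def word_tag_support(annotated_sentences):
    """WordTagSupport: how often each word carries each tag in AS."""
    support = defaultdict(Counter)
    for sentence in annotated_sentences:
        for word, tag in sentence:
            support[word][tag] += 1
    return support

def words_with_tag(words, tag, support):
    """Unique words that occur with the given tag at least once in AS."""
    return {w for w in set(words) if support.get(w, {}).get(tag, 0) > 0}

def coverage(context_list, support):                 # Equation (1)
    unique = set(context_list)                       # ContextSupport(L)
    return sum(1 for w in unique if w in support) / len(unique)

def confidence(context_list, tag, support):          # Equation (2)
    carriers = words_with_tag(context_list, tag, support)
    return len(carriers) / len(set(context_list))

def word_tag_score(word, tag, support):              # Equation (3)
    counts = support.get(word)
    return counts[tag] / max(counts.values()) if counts else 0.0

def list_tag_score(words, tag, support):             # Equations (4) and (5)
    carriers = words_with_tag(words, tag, support)
    if not carriers:
        return 0.0
    total = sum(word_tag_score(w, tag, support) for w in set(words))
    return total / len(carriers)

# Toy annotated set AS and toy context based list L (hypothetical data).
annotated = [
    [("cat", "NN"), ("runs", "VB")],
    [("run", "VB"), ("run", "NN"), ("run", "VB")],
    [("dog", "NN")],
]
support = word_tag_support(annotated)
context_list = ["cat", "dog", "cat", "run", "bird"]
all_as_words = [word for sentence in annotated for word, _ in sentence]

cov = coverage(context_list, support)
conf = confidence(context_list, "NN", support)
list_score = list_tag_score(context_list, "NN", support)
background = list_tag_score(all_as_words, "NN", support)  # BackgroundTagScore
```
        </preformat>
        <p>In this sketch BackgroundTagScore is obtained by applying the same function to the unique words of AS, which mirrors the way Equation (5) reuses the form of Equation (4).</p>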
      </sec>
      <sec id="sec-4-2">
        <title>POS Tagging Challenges</title>
        <p>
          POS tagging, especially for resource poor languages, involves three major
challenges listed below. In our approach we handle each of them explicitly.
          <list list-type="order">
            <list-item><p>Data sparsity problem: Some POS tag classes are present in the annotated
set with very few representations. This is not enough to derive statistical
information about them. In our approach, the use of raw untagged data
reduces this problem (shown in Sect. 7.4).</p></list-item>
            <list-item><p>Class imbalance problem: POS tag classes are highly imbalanced in their
occurrence frequency. While selecting a tag this may lead to biasing towards
the most frequent tags. Existing solutions of the class imbalance problem
typically favor rare classes [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. However, while tagging the context based lists,
we need to find POS tags for them in such a way that we neither favor
frequent tags nor rare tags. We tackle this problem using a novel Minmax
approach to find the best preferred POS tag instead of the most frequent
one (described in Sect. 5.2).</p></list-item>
            <list-item><p>Phrase boundary problem: Some lists are formed at phrase boundaries
where the context comes from two different phrases. We need to filter out
those context based lists which do not contain words of similar categories. In
this case, the context of a word instance need not represent strong context
and so the context based list may contain unrelated words. We use suitable
parameters to handle this problem (explained in Sect. 5.3).</p></list-item>
          </list>
        </p>
        <preformat>
1. for each tag Ti ∈ Γ present in annotated set AS do:
2.     Find BackgroundTagScore(Ti)  // Use Equation (5)
3. for context based list L do:
4.     Find Coverage(L)  // Use Equation (1)
5.     if Coverage(L) ≥ MinCoverage:
6.         ContextTagSupport(L, Tmax) = max over Ti ∈ Γ of ContextTagSupport(L, Ti)
7.         Maxconf = Confidence(L, Tmax)  // Use Equation (2)
8.         if Maxconf &gt; MinConfidence:
9.             MaxTset = {Ti | ContextTagSupport(L, Ti) == ContextTagSupport(L, Tmax)}
10.            BestPrefTag = FindBestPrefTag(L, MaxTset)
11.            Return BestPrefTag
12.        else: Return NOTVALIST
13.    else: Return NOTVALIST
22.
23.
24.
        </preformat>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Building Classifier Model from Context Based Lists</title>
      <sec id="sec-5-1">
        <title>Finding Association Rule for a Context Based List</title>
        <p>The first step in our classifier model building method is to compute context based
lists from an untagged training corpus D. It may be noted that a context based
list can store multiple instances of a word. We use a sliding window of size three
to collect the context based lists from D, in a single iteration, taking care of
sentence boundaries.</p>
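        <p>The collection step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: sentences are given as lists of tokens, the function name is hypothetical, and the first and last word of a sentence are simply not used as middle words, which is one possible way of taking care of sentence boundaries.</p>
        <preformat>
```python
from collections import defaultdict

def build_context_lists(sentences):
    """Map each context word pair (previous, next) to its context based list."""
    context_lists = defaultdict(list)
    for sentence in sentences:
        # Sliding window of size three; the window never crosses a sentence
        # boundary, so boundary words only ever act as context words here.
        for i in range(1, len(sentence) - 1):
            context = (sentence[i - 1], sentence[i + 1])
            context_lists[context].append(sentence[i])  # duplicates are kept
    return dict(context_lists)

# Toy untagged corpus D (hypothetical data).
corpus = [
    ["the", "cat", "sat", "down"],
    ["the", "dog", "sat", "up"],
    ["the", "cat", "sat", "up"],
]
lists = build_context_lists(corpus)
print(lists[("the", "sat")])  # ['cat', 'dog', 'cat']
```
        </preformat>
        <p>The list for the context pair (the, sat) keeps both instances of "cat", in line with the remark that a context based list can store multiple instances of a word.</p>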
        <p>In the next step we use the algorithm shown in Fig. 1 to find association
rules for all the context based lists. In this algorithm, the BackgroundTagScore
of all the POS tags present in the annotated set AS (lines 1-2) is computed
first. Then for a context based list satisfying the threshold values of Coverage
and Confidence (lines 3-9), function FindBestPrefTag (described in Sect. 5.2)
finds the best preferred tag (lines 10-11, 14-24) from the set of tags with
maximum ContextTagSupport (lines 7-9).</p>
        <p>For a context based list L present as antecedent in association rule <inline-formula><tex-math>L \Rightarrow T</tex-math></inline-formula>, tag
T returned by this algorithm becomes the consequent. This algorithm outputs
best preferred tags for all the context based lists and hence finds association
rules for all of them.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Handling Class Imbalance Problem</title>
        <p>We handle the class imbalance problem by using a novel Minmax approach in the
function FindBestPrefTag (lines 14-24 in Fig. 1) and the parameters ListTagScore
and BackgroundTagScore. In the Minmax approach, the preferred tag Ti for
context based list L is the one which has maximum ContextTagSupport(L, Ti) but
minimum WordTagSupport(Ti, W) among those words of list L which have tag
Ti as the best tag in AS. This ensures that the selected tag is supported by
the majority of the words in the list and is not biased by the most frequent tag of
the annotated set.</p>
        <p>To find the best preferred tag in function FindBestPrefTag, from the set of
all the tags with maximum ContextTagSupport value (line 9), we first find
those tags which are best tags (having maximum WordTagSupport value) for
the words of list L in AS (lines 15-20). Next, from this set of preferred tags
we find the tag with minimum WordTagSupport value (line 21). Then the criterion
<inline-formula><tex-math>\mathit{ListTagScore}(L, T_i) \ge \mathit{BackgroundTagScore}(T_i)</tex-math></inline-formula> (lines 22-23) ensures that the
selected tag has above average support in both the annotated set and the context
based list. If none of the tags satisfies this criterion, then we tag the list as
“NOTVALIST” (line 24).</p>
      </sec>
      <sec id="sec-5-3">
        <title>Handling Phrase Boundary Problem</title>
        <p>To filter out context based lists with the phrase boundary problem (see Sect. 4.2)
we use two suitable threshold values for the parameters Confidence and Coverage.
Coverage ensures that a context based list has a considerable number
of words to map it to a tag and Confidence ensures that the tag found for the
list is the one which is supported by the majority of the words in the list.</p>
        <p>If context based list L has a Coverage or Confidence value less than the
corresponding threshold value MinCoverage or MinConfidence, we tag L
as “NOTVALIST” (lines 3-8, 12, 13 in Fig. 1). Only if L satisfies both of the threshold
values do we find the set of all the tags which have the maximum value of
ContextTagSupport(L, Ti) and use this set (lines 9-10) to find the best preferred
tag for the list (lines 14-24).</p>
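        <p>The threshold test of lines 3-9 of Fig. 1 can be sketched as follows. This is a hedged illustration rather than the original code: the function name and the tags_in_as mapping (each word of AS mapped to the set of tags it carries there) are assumptions, an empty result stands for “NOTVALIST”, and the default thresholds of 0.6 follow the 60% values reported in Sect. 7. The selection of the best preferred tag (FindBestPrefTag) is deliberately left out.</p>
        <preformat>
```python
from collections import Counter

def candidate_tags(context_list, tags_in_as, min_coverage=0.6, min_confidence=0.6):
    """Return MaxTset for a context based list, or [] if the list is rejected."""
    unique = set(context_list)                        # ContextSupport(L)
    tagged = [w for w in unique if w in tags_in_as]   # words with known tags
    if not tagged or min_coverage > len(tagged) / len(unique):
        return []                                     # NOTVALIST: Coverage too low
    # ContextTagSupport(L, T): unique words of L that carry tag T in AS
    tag_support = Counter(t for w in tagged for t in tags_in_as[w])
    if not tag_support:
        return []
    max_support = max(tag_support.values())
    if min_confidence >= max_support / len(unique):
        return []                                     # NOTVALIST: Confidence too low
    return sorted(t for t, c in tag_support.items() if c == max_support)

# Hypothetical annotated-set lookup and two toy context based lists.
tags_in_as = {"cat": {"NN"}, "dog": {"NN"}, "run": {"NN", "VB"}}
good_list = ["cat", "dog", "cat", "run", "bird"]
boundary_list = ["cat", "run", "of", "very", "the"]

accepted = candidate_tags(good_list, tags_in_as)
rejected = candidate_tags(boundary_list, tags_in_as)
```
        </preformat>
        <p>The second list mixes unrelated words, of which only two are tagged, so its Coverage of 0.4 falls below the threshold and the list is filtered out, which is the intended effect for lists formed at phrase boundaries.</p>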
      </sec>
      <sec id="sec-5-4">
        <title>POS tag wise grouping of Association Rules to form Clusters</title>
        <p>In the last step, we group context based lists according to their POS tags to
get clusters of context based lists as the classifier model. We exclude context based
lists with tag “NOTVALIST” from the grouping process. Then we process these
clusters to store word frequencies, corresponding context word pairs and their
frequencies in each cluster. We represent the set of clusters as Clustset.</p>
        <p>Since we are highly confident about the tags of the words present in the
annotated set AS, we improve cluster quality by applying a pruning strategy
to the words of the clusters that are present in AS: we remove from each
cluster those words which do not have a matching cluster tag in AS. Finally, we get a set
of clusters in which each cluster has a set of words with their frequencies and a
set of associated context word pairs with their frequencies. Each cluster has a
unique POS tag. These clusters are overlapping in nature and words can belong
to multiple clusters.</p>
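      <p>The grouping and pruning steps can be sketched as below. The data layout is an assumption made for illustration: rules maps a context word pair to its tag, context_lists maps the same pair to its context based list, as_tags maps each word of AS to its set of tags, and the frequency of a context word pair is taken to be the length of its list.</p>
      <preformat>
```python
from collections import Counter, defaultdict

def build_clusters(rules, context_lists, as_tags):
    """Group context based lists by POS tag into clusters (Clustset)."""
    clusters = defaultdict(lambda: {"words": Counter(), "contexts": Counter()})
    for context, tag in rules.items():
        if tag == "NOTVALIST":
            continue                      # rejected lists take no part in grouping
        cluster = clusters[tag]
        words = context_lists[context]
        cluster["contexts"][context] += len(words)
        for word in words:
            # Pruning: a word of AS stays only if it carries the cluster tag in AS.
            if word in as_tags and tag not in as_tags[word]:
                continue
            cluster["words"][word] += 1
    return dict(clusters)

# Toy inputs (hypothetical data).
context_lists = {
    ("the", "sat"): ["cat", "dog", "cat", "run"],
    ("a", "barks"): ["dog"],
    ("of", "very"): ["x"],
}
rules = {("the", "sat"): "NN", ("a", "barks"): "NN", ("of", "very"): "NOTVALIST"}
as_tags = {"cat": {"NN"}, "run": {"VB"}}

clustset = build_clusters(rules, context_lists, as_tags)
```
      </preformat>
      <p>Here "run" is removed from the NN cluster because AS only records it as VB, while "dog", which is absent from AS, is kept and thereby extends the coverage of the model.</p>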
      </sec>
    </sec>
    <sec id="sec-6">
      <title>POS tagging Method</title>
      <p>To tag the words of a test sentence we make use of the test word’s context word
pair, preceding word and the word frequency in a cluster to decide the tag of
the word (see Fig. 2). When a test word is found in only one cluster, we
output the cluster tag as the tag of the test word. But when a test word is found
in many clusters, the following priority order is followed to select the suitable clusters:
        <list list-type="order">
          <list-item><p>Criterion 1: Highest priority is given to the presence of a matching context
word pair of the test word in the clusters.</p></list-item>
          <list-item><p>Criterion 2: Second highest priority is given to the presence of a matching
preceding word of the test word as first word of the context word pairs in
clusters.</p></list-item>
          <list-item><p>Criterion 3: Last priority is given to the frequency of the test word in the
clusters.</p></list-item>
        </list>
      </p>
      <p>For test words not present in any cluster we use criteria 1 and 2 to select
appropriate clusters. Based on the priority order, only one of the criteria is
used to select the suitable clusters. If we are not able to find any suitable cluster
then we return “NOTAG” as the tag of the test word.</p>
      <p>Even when we find suitable clusters, to increase precision, our method finds
POS tags only for those cases where it is confident. It avoids misclassifying
non-confident cases and returns “NOTAG” for them. This is especially useful
when the cost of misclassification (false positive) is high. This also gives the opportunity
to integrate other language/domain specific POS taggers, as they can be used for
the non-confident cases.</p>
      <p>After selecting the suitable clusters we need to make sure that we have enough
confidence in the highest probability tag obtained from the clusters. To ensure
this we use the parameter TagProbDif, which gives the fractional difference
between the highest and the second highest cluster tag probabilities and is defined
as follows:
        <disp-formula id="eq-6">
          <label>(6)</label>
          <tex-math>\mathit{TagProbDif} = \frac{\mathit{TagProb}(C_{max}) - \mathit{TagProb}(C_{secmax})}{\mathit{TagProb}(C_{max})}</tex-math>
        </disp-formula>
where Cmax is the cluster with the highest TagProb(Ci) value and Csecmax is
the cluster with the second highest TagProb(Ci) value. TagProb(Ci) of a cluster is
defined as follows:
        <disp-formula id="eq-7">
          <label>(7)</label>
          <tex-math>\mathit{TagProb}(C_i) = \frac{\text{Frequency of } X \text{ in } C_i}{\sum_{\forall C_j \in \mathit{Clustset}} \text{Frequency of } X \text{ in } C_j}</tex-math>
        </disp-formula>
where X is set as follows: If the test word is present in cluster Ci then X = test
word. For a test word not present in any cluster, if the clusters are selected based
on the presence of the context word pair of the test word then X = context
word pair. If the clusters are selected based on the presence of the preceding
word of the test word as first word of the context word pairs in clusters then
X = preceding word of the test word. In this way we are also able to tag some
unseen/unknown words which are not present in the training data. This, in
a way, acts as an alternative to smoothing techniques for them.</p>
      <preformat>
for each word Wmid in sentence S with context word pair CWp and CWs do:
1. Initialize PredClustset = {}
2. if ∃ cluster Ci ∈ Clustset | Wmid ∈ Ci:
   (a) Find PClustset = {Ci | Wmid ∈ Ci}
   (b) if ∃ cluster Cj ∈ PClustset | CWp and CWs pair is present as context word pair in cluster Cj:
           Find all such clusters from PClustset and append to PredClustset  #Criteria 1
   (c) else:
           if ∃ cluster Cj ∈ PClustset | CWp is present as preceding word in a context word pair in cluster Cj:
               Find all such clusters from PClustset and append to PredClustset  #Criteria 2
           else: PredClustset = PredClustset ∪ PClustset  #Criteria 3
3. else:
   (a) if ∃ cluster Ci ∈ Clustset | CWp and CWs pair is present as context word pair in cluster Ci:
           Find all such clusters from Clustset and append to PredClustset  #Criteria 1
   (b) else:
           if ∃ cluster Ci ∈ Clustset | CWp is present as preceding word in a context word pair in cluster Ci:
               Find all such clusters from Clustset and append to PredClustset  #Criteria 2
           else: Return NOTAG
4. ∀ Ci ∈ PredClustset Find TagProb(Ci)  // Use Equation 7
5. Find Cmax = cluster with highest TagProb(Ci) value in PredClustset
6. Find Csecmax = cluster with second highest TagProb(Cj) value in PredClustset
7. Find TagProbDif  // Use Equation 6
8. if TagProbDif ≥ Minprobdif: Return PredTag = POS tag label of cluster Cmax
9. else: Return NOTAG
      </preformat>
      <p>After selecting the clusters (based on priority order) we compute their TagProb
values using (7) and then compute TagProbDif using (6). For a TagProbDif
value above a suitable threshold value Minprobdif we output the tag of the cluster
with the highest TagProb value as the tag of the test word, otherwise we return
“NOTAG” (see Fig. 2).</p>
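      <p>The final decision step, Equations (6) and (7) together with the Minprobdif threshold, can be sketched as follows. The function name and input layout are assumptions: freq_in_selected holds the frequency of X in each selected cluster, keyed by cluster tag, and total_freq is the frequency of X summed over all clusters of Clustset. When only one cluster is selected, the sketch treats the second highest probability as zero, which reproduces the rule that a single matching cluster directly gives the tag. The default threshold of 0.3 follows the 30% value reported in Sect. 7.</p>
      <preformat>
```python
def choose_tag(freq_in_selected, total_freq, min_prob_dif=0.3):
    """Return the tag of the most probable cluster, or NOTAG if not confident."""
    if not freq_in_selected or total_freq == 0:
        return "NOTAG"
    # TagProb(Ci) = frequency of X in Ci / frequency of X over all clusters
    probs = sorted(
        ((freq / total_freq, tag) for tag, freq in freq_in_selected.items()),
        reverse=True,
    )
    best_prob, best_tag = probs[0]
    if best_prob == 0:
        return "NOTAG"
    # Assumption: with a single selected cluster the runner-up probability is 0.
    second_prob = probs[1][0] if len(probs) > 1 else 0.0
    tag_prob_dif = (best_prob - second_prob) / best_prob   # Equation (6)
    return best_tag if tag_prob_dif >= min_prob_dif else "NOTAG"

# Toy frequencies (hypothetical data).
clear_case = choose_tag({"NN": 6, "VB": 3}, total_freq=10)
close_case = choose_tag({"NN": 5, "VB": 4}, total_freq=10)
single_case = choose_tag({"JJ": 2}, total_freq=2)
```
      </preformat>
      <p>In the first case the fractional difference is 0.5 and the tag NN is returned; in the second case it is about 0.2, below the threshold, so the word is left as “NOTAG” for a possible fallback tagger.</p>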
    </sec>
    <sec id="sec-7">
      <title>Experiments, Results and Observations</title>
      <sec id="sec-7-1">
        <title>Dataset Details</title>
        <p>
          We ran our experiments on resource-rich English¹ (which uses the Biber tag set [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]), resource-moderate Hindi [
          <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
          ] and resource-poor Telugu² [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], Tamil³ and Bengali⁴. Table 1 gives details of all the language datasets. All five
datasets use flat tag sets, without any hierarchy, in the annotated training and test sets,
and they contain a considerable number of lexical ambiguities. Except for English, all
the languages are morphologically rich and have free word order. The POS tag
distributions in the resource-moderate and resource-poor language datasets are highly
imbalanced and sparse.
        </p>
        <p>¹ New York Times dataset of the American National Corpus, available at http://americannationalcorpus.org/FirstRelease/contents.html</p>
        <p>² Provided by IIIT Hyderabad; the data is part of the IL-ILMT project sponsored by MC&amp;IT, Govt. of India, Reference No: 11(10)/2006-HCC(TDIL)</p>
        <p>³ Available at http://sanskrit.jnu.ac.in/ilci/index.jsp</p>
        <p>We observed that the threshold values MinConfidence = 60%,
MinCoverage = 60% and Minprobdif = 30% for the three parameters give the
best AverageAccuracy (defined below) values for all five languages. Tables 1
and 2 show the results for this set of parameter values.</p>
        <p>\[ \mathrm{AverageAccuracy} = \frac{\text{Number of correctly tagged test words}}{|\text{Test set}| - \text{No. of test words tagged as NOTAG}} \qquad (8) \]
where |Test set| = number of words in the test set.</p>
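        <p>As a small illustration of Equation 8, the metric can be computed as below. The function name average_accuracy and the choice to return 0.0 when every test word is tagged as NOTAG are assumptions made for this sketch; the text does not specify that edge case.</p>
        <preformat>
```python
from typing import Sequence

NOTAG = "NOTAG"


def average_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Correctly tagged words divided by the words that received a tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    notag = sum(1 for p in predicted if p == NOTAG)
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    attempted = len(gold) - notag  # |Test set| minus the NOTAG words
    # Assumption: report 0.0 when no word was tagged at all.
    return correct / attempted if attempted > 0 else 0.0
```
        </preformat>
        <p>A tagger that never returns NOTAG has attempted = |Test set|, so the same function then reduces to ordinary token accuracy.</p>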
        <p>For both known and unknown test words, in all five languages, the
largest number of correct taggings was obtained by giving the highest priority to the
presence of the context word pair in the cluster. Here, known words are test set words
that are present in the untagged training set, and unknown words are unseen
test set words that are not present in the untagged training set. Note that the
words of the annotated set are not included in the classifier model; only their tags
are used indirectly while building the model. In the results shown in Table 1,
around 46% of unknown English words, 60% of unknown Hindi words, 67% of unknown
Telugu words, 52% of unknown Bengali words and 57% of unknown Tamil words were
correctly tagged using their context word pair. This shows the ability of our
tagger to tag unknown words without using any of the smoothing techniques used by
other POS taggers.</p>
        <p>⁴ Available at http://sanskrit.jnu.ac.in/ilci/index.jsp</p>
        <p>
          In Table 2, we compare our results with a supervised CRF⁵ tagger [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ].
This tagger uses words, their POS tags and context word pair information from
annotated data, while our tagger uses words and their context word pair
information from untagged data and POS tag information from annotated data.
We observe that for annotated data sizes ≤ 25K words, our tagger gives
better AverageAccuracy than the CRF tagger. Our tagger also gives better POS tag
precision and better tagging accuracy than the CRF tagger for unknown words,
and its performance improves as the untagged data size increases, up to a certain
size. This shows that our tagger can be a better choice for resource-poor
languages. As an additional benefit, the model built by our tagger is more
human-understandable than the model built by the CRF tagger.
        </p>
      </sec>
      <sec id="sec-7-2">
        <title>Effect of Annotated (POS tagged) Data Size</title>
        <p>We varied the size of the annotated set of Tamil (see Table 3) while keeping the raw
untagged set constant. As the annotated data grows, the coverage of words by the clusters
in the classifier model increases, the tagging accuracy increases, and the number of
words missed by the model (tagged as “NOTAG”) decreases. For all languages we observed
that increasing the annotated training data size improves cluster quality, which increases
the AverageAccuracy values, but only up to a certain size. We also observed only a slight
decrease in AverageAccuracy as the annotated set shrinks, so performance does not degrade
drastically when the annotated set is made smaller. Our tagger gives above 70%
AverageAccuracy on all the languages for an annotated data size as low as 5K and a raw
untagged data size of 10K. This justifies the use of a small annotated set to build a
semi-supervised POS tagging model for resource-poor languages.</p>
      </sec>
      <sec id="sec-7-3">
        <title>Effect of Raw Untagged Data Size</title>
        <p>In Tables 1, 2 and 4, we observe that increasing the raw untagged training
data size initially increases the word coverage of the clusters, which in turn increases
the AverageAccuracy values, but the values stabilize after a certain size. For all languages
we observed that the coverage of words by the clusters in the classifier model
increases with the size of the untagged data (while keeping the size
of the annotated set constant). This accounts for the increase in tagging accuracy
and the decrease in the number of words missed by the model (tagged as NOTAG).
Another interesting observation is that AverageAccuracy does not vary much as
the untagged data size varies, so our algorithm performs well even with
a small untagged dataset.</p>
        <p>⁵ Available at http://crfpp.googlecode.com/svn/trunk/doc/index.html. The CRF
model outputs a tag for every test word, so for the CRF tagger AverageAccuracy = (No.
of correctly tagged test words)/(No. of test words).</p>
      </sec>
      <sec id="sec-7-4">
        <title>Effect of Various Parameters</title>
        <p>We made the following observations about the effect of parameter values: (1)
Increasing the MinConfidence threshold for the Confidence parameter
increases the quality of the clusters, but it also increases the number
of context based lists tagged as “NOTVALIST”, which decreases the word
coverage of the clusters. (2) Decreasing the MinCoverage threshold for the
Coverage parameter decreases the quality of the clusters, but it
increases their word coverage by decreasing the number of context based
lists tagged as “NOTVALIST”. (3) Varying the Minprobdif threshold
for the TagProbDif parameter from 5% to 30%, we found that increasing the
threshold increases the precision of POS tags but slightly decreases their
recall, because the number of words tagged as “NOTAG” increases. The practical
advantage of this parameter is that it avoids tagging ambiguous and
low-confidence cases. (4) The number of POS tag clusters obtained in
the classifier model is almost independent of the selected threshold values of the
parameters. For the datasets given in Table 1 and for the threshold
ranges MinConfidence = 60% to 90% and MinCoverage = 0% to 75%, the number
of POS tag clusters found was 100 to 101 for English, 29 to 31 for Hindi,
22 to 26 for Tamil, 25 for Bengali and 23 for Telugu. We noted that
the POS tags missing from the set of clusters were rare POS tags with
very low frequencies.</p>
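        <p>Observation (3) can be illustrated with a small threshold sweep. The sketch below is hypothetical: the function sweep_minprobdif and the toy (TagProbDif, is_correct) values are invented purely for illustration and are not taken from the experiments. It counts, for each Minprobdif threshold, how many words are tagged, how many of those are correct, and how many fall back to NOTAG.</p>
        <preformat>
```python
from typing import Dict, List, Tuple


def sweep_minprobdif(cases: List[Tuple[float, bool]],
                     thresholds: List[float]) -> Dict[float, Tuple[int, int, int]]:
    """Map each threshold to (tagged, correct, notag) counts."""
    summary = {}
    for threshold in thresholds:
        # A word is tagged only when its TagProbDif reaches the threshold.
        kept = [ok for dif, ok in cases if dif >= threshold]
        tagged = len(kept)
        correct = sum(1 for ok in kept if ok)
        summary[threshold] = (tagged, correct, len(cases) - tagged)
    return summary


# Made-up (TagProbDif, is_correct) values, for illustration only.
toy_cases = [(0.50, True), (0.35, True), (0.20, False), (0.10, True), (0.05, False)]
result = sweep_minprobdif(toy_cases, [0.05, 0.30])
```
        </preformat>
        <p>On these toy values, raising the threshold from 0.05 to 0.30 leaves 2 of the 5 words tagged (both correct) and returns NOTAG for the other 3, which mirrors the precision and recall trade-off described in observation (3).</p>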
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Conclusions and Future Work</title>
      <p>In this work we developed TagMiner, a semi-supervised associative classification
method for POS tagging. We used the concepts of context based lists and context
based association rule mining. We developed a method to compute the interestingness
measures required to find the association rules in a semi-supervised manner
from a training set that combines tagged and raw untagged data. We showed
that TagMiner performs well for resource-rich as well as resource-poor
languages without using extensive linguistic knowledge. It works well even with
little tagged training data and little untagged training data. It can also tag unknown
words. To some extent, it handles class imbalance and data sparsity problems
by using the untagged data and a special method to compute interestingness measures.
It handles the phrase boundary problem using a set of parameters. These advantages
make it well suited to resource-poor languages, and it can be used as an initial
POS tagger while developing linguistic resources for them.</p>
      <p>Future work includes (1) using contexts other than the trigram, (2) finding
methods to include linguistic features in the current approach, (3) mining tagging
patterns from the clusters to find the tag of a test word and (4) using this approach
for other lexical item classification tasks.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , Imieliński, T.,
          <string-name>
            <surname>Swami</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Mining Association Rules Between Sets of Items in Large Databases</article-title>
          .
          <source>In: Proc. of SIGMOD</source>
          . pp.
          <fpage>207</fpage>
          -
          <lpage>216</lpage>
          (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Banko</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>R.C.</given-names>
          </string-name>
          :
          <article-title>Part-of-Speech Tagging in Context</article-title>
          .
          <source>In: Proc. of COLING</source>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bharati</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Misra Sharma</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bai</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sangal</surname>
          </string-name>
          , R.:
          <article-title>AnnCorra : Annotating Corpora Guidelines For POS And Chunk Annotation For Indian Languages</article-title>
          .
          <source>Tech. Rep. TRLTRC-31</source>
          , Language Technologies Research Centre, IIIT Hyderabad
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bhatt</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Narasimhan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palmer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rambow</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sharma</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xia</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>A multi-representational and multi-layered treebank for Hindi/Urdu</article-title>
          . In
          <source>: Proc. of the Third Linguistic Annotation Workshop</source>
          . pp.
          <fpage>186</fpage>
          -
          <lpage>189</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Biemann</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Unsupervised Part-of-Speech Tagging Employing Efficient Graph Clustering</article-title>
          .
          <source>In: Proc. of ACL</source>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Brants</surname>
          </string-name>
          , T.:
          <article-title>TnT: a statistical part-of-speech tagger</article-title>
          .
          <source>In: Proc. of ANLP</source>
          . pp.
          <fpage>224</fpage>
          -
          <lpage>231</lpage>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Brill</surname>
          </string-name>
          , E.:
          <article-title>A Simple Rule-Based Part of Speech Tagger</article-title>
          .
          <source>In: Proc. of ANLP</source>
          . pp.
          <fpage>152</fpage>
          -
          <lpage>155</lpage>
          (
          <year>1992</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Brill</surname>
          </string-name>
          , E.:
          <article-title>Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging</article-title>
          .
          <source>Comput. Linguist</source>
          .
          <volume>21</volume>
          (
          <issue>4</issue>
          ),
          <fpage>543</fpage>
          -
          <lpage>565</lpage>
          (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Brin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motwani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ullman</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsur</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Dynamic Itemset Counting and Implication Rules for Market Basket Data</article-title>
          .
          <source>In: Proc. of SIGMOD</source>
          . pp.
          <fpage>255</fpage>
          -
          <lpage>264</lpage>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Cutting</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kupiec</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pedersen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sibun</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>A practical part-of-speech tagger</article-title>
          .
          <source>In: Proc. of the third conference on ANLP</source>
          . pp.
          <fpage>133</fpage>
          -
          <lpage>140</lpage>
          (
          <year>1992</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Dandapat</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sarkar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Basu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Automatic Part-of-speech Tagging for Bengali: An Approach for Morphologically Rich Languages in a Poor Resource Scenario</article-title>
          .
          <source>In: Proc. of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions</source>
          . pp.
          <fpage>221</fpage>
          -
          <lpage>224</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Dubey</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pudi</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Class Based Weighted K-Nearest Neighbor over Imbalance Dataset</article-title>
          .
          <source>In: Proc. of PAKDD (2)</source>
          . pp.
          <fpage>305</fpage>
          -
          <lpage>316</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Ekbal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hasanuzzaman</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bandyopadhyay</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Voted Approach for Part of Speech Tagging in Bengali</article-title>
          .
          <source>In: Proc. of PACLIC</source>
          . pp.
          <fpage>120</fpage>
          -
          <lpage>129</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Gadde</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yeleti</surname>
            ,
            <given-names>M.V.</given-names>
          </string-name>
          :
          <article-title>Improving statistical POS tagging using Linguistic feature for Hindi and Telugu</article-title>
          .
          <source>In: Proc. of ICON</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Gimenez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marquez</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>SVMTool: A general POS tagger generator based on Support Vector Machines</article-title>
          .
          <source>In: Proc. of LREC</source>
          . pp.
          <fpage>43</fpage>
          -
          <lpage>46</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Goldwater</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Griffiths</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>A fully Bayesian approach to unsupervised part-of-speech tagging</article-title>
          .
          <source>In: Proc. of the 45th Annual Meeting of the ACL</source>
          . pp.
          <fpage>744</fpage>
          -
          <lpage>751</lpage>
          (
          <year>June 2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Ide</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suderman</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>The American National Corpus First Release</article-title>
          .
          <source>In: Proc. of LREC</source>
          . pp.
          <fpage>1681</fpage>
          -
          <lpage>1684</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Kamruzzaman</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haider</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hasan</surname>
            ,
            <given-names>A.R.</given-names>
          </string-name>
          :
          <article-title>Text Classification using Association Rule with a Hybrid Concept of Naive Bayes Classifier and Genetic Algorithm</article-title>
          .
          <source>CoRR abs/1009.4976</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19. Lafferty, J.D.,
          <string-name>
            <surname>McCallum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pereira</surname>
            ,
            <given-names>F.C.N.</given-names>
          </string-name>
          :
          <article-title>Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data</article-title>
          .
          <source>In: Proc. of ICML</source>
          . pp.
          <fpage>282</fpage>
          -
          <lpage>289</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pei</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules</article-title>
          .
          <source>In: Proc. of ICDM</source>
          . pp.
          <fpage>369</fpage>
          -
          <lpage>376</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hsu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , Ma, Y.:
          <article-title>Integrating Classification and Association Rule Mining</article-title>
          .
          <source>In: Proc. of KDD</source>
          . pp.
          <fpage>80</fpage>
          -
          <lpage>86</lpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>P.V.S.</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.</surname>
          </string-name>
          :
          <article-title>Part Of Speech Tagging Using Conditional Random Fields and Transformation Based Learning</article-title>
          .
          <source>In: Proc. of IJCAI Workshop SPSAL</source>
          . pp.
          <fpage>21</fpage>
          -
          <lpage>24</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Shaohong</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guidan</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <source>Research of POS Tagging Rules Mining Algorithm. Applied Mechanics and Materials 347-350</source>
          ,
          <fpage>2836</fpage>
          -
          <lpage>2840</lpage>
          (
          <year>August 2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Shrivastava</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhattacharyya</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Hindi POS Tagger Using Naive Stemming: Harnessing Morphological Information Without Extensive Linguistic Knowledge</article-title>
          .
          <source>In: Proc. of ICON</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Søgaard</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Semisupervised condensed nearest neighbor for part-of-speech tagging</article-title>
          .
          <source>In: Proc. of ACL HLT: short papers - Volume</source>
          <volume>2</volume>
          . pp.
          <fpage>48</fpage>
          -
          <lpage>52</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Soni</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vyas</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Using Associative Classifiers For Predictive Analysis In Health Care Data Mining</article-title>
          .
          <source>Int. Journal Of Computer Application</source>
          <volume>4</volume>
          (
          <issue>5</issue>
          ),
          <fpage>33</fpage>
          -
          <lpage>37</lpage>
          (
          <year>July 2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Subramanya</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petrov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pereira</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Efficient graph-based semi-supervised learning of structured tagging models</article-title>
          .
          <source>In: Proc. of EMNLP</source>
          . pp.
          <fpage>167</fpage>
          -
          <lpage>176</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Thabtah</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>A Review of Associative Classification Mining</article-title>
          .
          <source>The Knowledge Engineering Review</source>
          <volume>22</volume>
          (
          <issue>1</issue>
          ),
          <fpage>37</fpage>
          -
          <lpage>65</lpage>
          (
          <year>Mar 2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Thonangi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pudi</surname>
          </string-name>
          , V.:
          <article-title>ACME: An Associative Classifier Based on Maximum Entropy Principle</article-title>
          .
          <source>In: Proc. of ALT</source>
          . pp.
          <fpage>122</fpage>
          -
          <lpage>134</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singer</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Feature-rich part-of-speech tagging with a cyclic dependency network</article-title>
          .
          <source>In: Proc. of NAACL HLT'03 - Volume 1</source>
          . pp.
          <fpage>173</fpage>
          -
          <lpage>180</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
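          31.
          <string-name>
            <surname>V.</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>G.</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>P.</surname>
            ,
            <given-names>S.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>S.</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :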
          <article-title>Tamil POS Tagging using Linear Programming</article-title>
          .
          <source>Int. Journal of Recent Trends in Engineering</source>
          <volume>1</volume>
          (
          <issue>2</issue>
          ),
          <fpage>166</fpage>
          -
          <lpage>169</lpage>
          (May
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Yarowsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>One sense per collocation</article-title>
          .
          <source>In: Proc. of the workshop on Human Language Technology</source>
          . pp.
          <fpage>266</fpage>
          -
          <lpage>271</lpage>
          (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>CPAR: Classification based on Predictive Association Rules</article-title>
          .
          <source>In: Proc. of SDM</source>
          . pp.
          <fpage>331</fpage>
          -
          <lpage>335</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <surname>Zaïane</surname>
            ,
            <given-names>O.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antonie</surname>
            ,
            <given-names>M.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Coman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Mammography Classification By an Association Rule-based Classifier</article-title>
          .
          <source>In: Proc. of MDM/KDD</source>
          . pp.
          <fpage>62</fpage>
          -
          <lpage>69</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>