Hierarchical Text Classification for Supporting Educational Programs

Qi Ju∗, Chiara Ravagni⋆†, Alessandro Moschitti∗, and Giampiero Vaschetto⋆

∗ DISI, University of Trento, Italy, {qi,moschitti}@disi.unitn.it
⋆ Centro Studi Erickson, Italy, {chiara.ravagni,giampiero.vaschetto}@erickson.it
† University of Nuremberg, Germany

Abstract. More than two decades have passed since the first design of the CONSTRUE system [2], a powerful rule-based model for the categorization of Reuters news. Nowadays, statistical approaches are well assessed and they allow for an easy design of text classification (TC) systems. Additionally, the Web has emphasized the need for approaches capable of digesting large amounts of textual information and making it more easily accessible, e.g., through hierarchical taxonomies such as Dmoz or the Yahoo! categories. Surprisingly, automated approaches have not yet proved indispensable for such categorization processes. This suggests that the role of TC might be different from simply routing documents to different topical categories. In this paper, we provide evidence of the promising use of TC as a support for an interesting and high-level human activity in the educational context, namely the selection and definition of educational programs tailored to the specific needs of pupils, who sometimes require particular attention and actions to solve their learning problems. TC in this context is exploited to automatically extract several aspects and properties of learning objects, i.e., didactic material, in terms of semantic labels. These can be used to organize the different pieces of material into specific didactic programs, which can address specific deficiencies of pupils. The TC experiments, carried out with state-of-the-art algorithms and a small set of training data, show that automatic classifiers can easily derive labels such as didactic context, school subject, pupil difficulties and type of educative solution.

Keywords: hierarchical text classification, information management applications, e-learning

1 Introduction

The last two decades have seen an impressive development of methods for automated text categorization (TC) [7]. This has been mainly due to the combination of two important factors: (i) the exponential growth of the Web, which requires effective methods for information access and management; and (ii) the advances in theory and practice of machine learning methods, which constitute the basis of TC.

Despite the success of TC research, it is still not clear whether such technology should be devoted to the design of topical categorization systems, since very famous Web hierarchical categorization systems, e.g., Dmoz or the Yahoo! categories, are still manually maintained. On the other hand, TC also concerns the association of semantic labels that go beyond the simple routing of information to the most appropriate user feeds. Indeed, this kind of task inevitably suffers from errors in Recall and/or in Precision. The approach and the results would be different if the outcome of the TC system were used cooperatively, as a tool to organize information in different and creative ways. In this respect, TC would be seen as a tool, similarly to search engines, rather than as an end-to-end system forced to demonstrate very high accuracy.

In this paper, we report on our experience with the e-Value project, whose aim is the reorganization or combination of educational materials in different pedagogical contexts.
The Erickson Research Centre has been cataloging a large set of published educational materials into smaller units, according to the SCORM (2004) standard, the Shareable Content Object Reference Model¹. These documents are used for the creation of novel and specific didactic products as follows: (i) school classes are evaluated with respect to target cognitive processes; (ii) processes in which pupils have difficulties are detected and recorded in a large database (DB) of normative data, along with the results of their elaboration; and (iii) the Decision Support System (DSS) chooses the proper didactic material for the class according to the DB content.

The above steps require: (a) identifying the cognitive processes involved in pupils' learning; (b) dividing the didactic materials into smaller parts (learning objects); and (c) classifying such objects according to their bibliographic characteristics and to the cognitive processes involved, which depend on the user context (e.g., age, class, special situations).

An automatic classifier can be used to ease and speed up the last step. It can provide a rough classification, which can constitute the starting point for the work of expert catalogers. The use of the classifier would reduce the cataloging costs, both in terms of time and of human resources. Indeed, any piece of educational material, be it part of a book, an article or a best practice, needs to be read and evaluated by experts before being assigned to the proper categories; this process takes a huge amount of time. As an alternative model, the classifier can perform a first approximate categorization, which the experts can then refine. The clear advantage is that materials pertaining to a certain subject can be directly assigned to the experts working in that field, thus improving the accuracy of classification and avoiding the burden of exchanging materials among the different experts.

However, the above scenario could be realized only if the adopted multi-class classifier (MCC) performed accurate hierarchical categorization. Given the novelty of the intended taxonomy, it is not simple to predict whether an MCC can achieve the needed accuracy. For this purpose, we have:
– designed a new taxonomy that meets the organizational needs of e-Value;
– defined an annotation procedure and produced an initial dataset of 122 documents, organized in 112 categories (of course, the documents are repeated in the hierarchy); and
– implemented an MCC, which exploits state-of-the-art TC models such as Support Vector Machines, structured as flat binary categorizers.

The preliminary experiments on the overall hierarchy of 112 nodes show promising results, ranging from a Micro-F1 above 95% for the first level to about 70% on the whole hierarchy. This outcome is rather promising and enables future research on the use of TC for the efficient implementation of educational programs.

In the remainder of this paper, Section 2 describes the tackled task in more detail, Section 3 reports our results and Section 4 draws the final conclusions.

2 Automatic Support to the e-Value project

The main objective of the e-Value project is to design, develop and test a multimedia platform (consisting of a set of web applications), which integrates the evaluation of various learning abilities and the application of didactic processes.
These can benefit from automatic methods for classifying the didactic material used in such processes. The next sections describe the problem in more detail and suggest how a TC system can be used in this context.

¹ http://www.adlnet.gov/capabilities/scorm/scorm-2004-4th

Fig. 1. Hierarchical categorization scheme of e-Value (only categories with at least one training document are shown). The tree spans four levels, from the macro-categories C1–C4 down to fourth-level nodes such as C2321–C2324.

2.1 e-Value Framework

The framework includes different interconnected processes:
– standard evaluation procedures and dynamic assessment of the learning abilities of pupils;
– collection of normative data, e.g., educational material and pupils' evaluations;
– continuous data flow, i.e., the related database is continuously updated and the currently available normative data is integrated with and compared against the newly arriving data; and
– qualitative and quantitative evaluation of the collected data.

The educational material is used for defining didactic products, which address specific actions (interventions). It consists of books, CD-ROMs, collections of articles, etc. The e-Value project aims at using the materials above both independently and jointly. Designing an intervention often requires the use of units taken from several books or CD-ROMs, but including the entire sources is very ineffective, considering that only some small parts will be used. To enable more flexibility in the creation of training programs, the material collections are divided into basic training units, called learning objects, which can be reassembled in a flexible way. This requires analyzing the materials to be used in the interventions and selecting the portions involved in the target cognitive processes.

2.2 A framework use-case

A use of the framework is illustrated by the following example. In a school context, some classes are evaluated with respect to targeted cognitive processes. The tests may reveal that some of the pupils have difficulties in certain processes. Thus, the test results are recorded (building a large database of normative data) along with some elaboration of them, i.e., basic data statistics. Then, the DSS chooses the proper didactic material for the class by proposing different material to pupils requiring attention and quick intervention. For this purpose, the educational team needs to:
– identify every cognitive process that can be involved in learning.
At the moment, this has been restricted to mathematics and reading-writing (with linguistic skills and metaphonetics);

Table 1. Description of the different categories of the hierarchy in Figure 1.

Fig. 2. Performance for the first level.

– divide the didactic materials into smaller parts (learning objects). This is because using the entire books or CD-ROMs would be unfeasible, considering that just a few exercises need to be applied. Thus, the whole material has to be checked by experts to be subdivided into learning objects.
The latter are then used to design the formative offer, in place of the entire material, obtaining more personalized and individualized learning.
– categorize the materials according to their bibliographic characteristics and, most importantly for the actual use of the materials, to features of the involved cognitive processes, e.g., the age, class and special situations of the target pupils; and
– port the material from paper or optical media to an electronic format (PDF or SWF), so that it can be reassembled online and offline.

In the last phase, the application of an automatic classifier can provide significant benefits to the whole process, as explained in the following section.

2.3 Classification Task

To meet the needs of the e-Value project, we have defined a new taxonomy as well as an annotation procedure and an initial dataset. Our hierarchical categorization scheme is shown in Figure 1, whose more descriptive labels are reported in Table 1. The materials have to be classified according to four macro-categories and then divided into a structure of sub-categories spanning four levels. Each category is meaningful for a correct description of the materials, from both an administrative perspective (e.g., the educational context in which the material should be applied) and a subject/cognitive-process viewpoint (e.g., Mathematics – Number – Lexical and semantic processes as opposed to Mathematics – Basic processes of calculus – Numerical facts). The macro-categories are:
– C1 – School and class (referring to ages 5–14);
– C2 – Subject/cognitive process (referring to the subjects of mathematics, linguistics, phonetics and reading-writing abilities);
– C3 – Pupils' situation (for the cases of special needs or particular situations); and
– C4 – Type of material (for normal didactic usage in the class, or for pupils with special situations or greater difficulties in the subject).

Such automatic classification could reduce the manual categorization costs, in terms of both time and human resources. Each piece of educational material, being part of a book, an article or a best practice, needs to be read and evaluated by experts before being assigned to the proper categories, and this process takes a huge amount of time. Therefore, the use of an automatic classifier could significantly reduce the time required to read and evaluate the materials. Of course, experts will need to read part of the material in any case to refine and validate the output of the classifier. However, the materials pertaining to a certain subject can be directly routed to the experts of that field, thus improving the categorization accuracy.
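To make the labeling scheme concrete, the sketch below shows how the category assignments of a learning object can be expanded along the hierarchy, exploiting the prefix convention of the codes in Figure 1, where each child code extends its parent's code by one digit. This is only an illustrative assumption about the data representation, not part of the e-Value platform: the helper functions are hypothetical, the codes are taken from Figure 1, and their assignment to a document is invented for the example.

```python
# Minimal sketch (not the e-Value implementation): expanding a learning
# object's category codes to all of their ancestors, assuming the prefix
# convention of Figure 1 (e.g., C2323 is a child of C232, which is a
# child of C23, which is a child of C2).

def ancestors(code: str) -> list[str]:
    """Return the ancestor codes of a category, excluding the root."""
    # "C2323" -> ["C2", "C23", "C232"]
    return [code[: i + 1] for i in range(1, len(code) - 1)]

def expand_labels(codes: set[str]) -> set[str]:
    """Propagate annotations upward: a document labeled with a deep node
    is also a positive example for every node on the path to the root."""
    expanded = set(codes)
    for code in codes:
        expanded.update(ancestors(code))
    return expanded

# Hypothetical annotation of one learning object with two nodes of Figure 1.
print(sorted(expand_labels({"C124", "C2323"})))
# ['C1', 'C12', 'C124', 'C2', 'C23', 'C232', 'C2323']
```

Such an expansion simply mirrors the fact, noted above, that documents are repeated along the hierarchy when they are annotated with deep nodes.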
Fig. 3. Performance for the second level.

3 Experiments

The aim of our evaluation is to demonstrate that state-of-the-art TC methods can be applied to learn hierarchical classifiers for our e-Value taxonomy. This task is made complex by two different aspects: (i) in addition to topical labels such as Euclidean Geometry, Problem Solving or Geometric Transformation, the taxonomy also contains semantic characterizations such as Story Development or Story Understanding, which seem harder to capture with simple terms; and (ii) given the novelty of the taxonomy, we could only produce a small dataset, which makes the learning of classification functions more difficult. To deal with and analyze such problems, we experimented with hierarchy subsets, defined according to the hierarchy's levels, ranging from 1 to 4 (the maximum depth of our hierarchy). The deeper the level, the more difficult TC is.

3.1 Setup

One major drawback of machine learning, and thus of TC based on it, is the need for training data, i.e., a set of documents manually classified into the reference taxonomy. Such data is difficult to find and/or to produce, as it requires human labor. Given the novelty of our taxonomy, defined in Figure 1, no previous data was available. Thus, we set up an annotation procedure (with only one annotator) for the didactic material available in the Erickson database. We randomly selected 60 documents and classified each of them according to all the 112 nodes of the taxonomy. This led to a dataset of 122 documents (counting repetitions). We randomly divided the above data into a training and a test set, taking care that, for each document, all its repetitions were put either in the training set or in the test set.

The training data was used to learn the set of 112 binary classifiers, one for each category, following the one-vs-all schema. The output of the multi-class classifier is the merged set of the individual binary classifier decisions. Although simple, this is considered a state-of-the-art approach [5, 3]. We used default SVM parameters, as the small amount of training data prevented any reasonable parameterization approach. We used a bag-of-terms representation (strings separated by spaces and punctuation) without applying any feature selection, stop list or lemmatization, although we are confident that the latter may significantly improve our models. We used the classical log(TF) * IDF weighting scheme and normalized vectors. The performance is measured by means of Micro- and Macro-Average F1, evaluated on our test data over all 112 categories.

Fig. 4. Performance for the third (a) and fourth (b) levels. Categories with no document in the test set and the categories of the upper levels are not reported.
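For illustration, the following minimal sketch reproduces this flat one-vs-all setup with scikit-learn. This is an assumption made for exposition only: the paper does not specify which SVM implementation was used, and the documents and labels below are toy placeholders rather than e-Value data.

```python
# Sketch of a flat one-vs-all multi-label setup: bag-of-terms features with
# sublinear (log) TF * IDF weighting and L2 normalization, one binary linear
# SVM per category with default parameters, and Micro/Macro-F1 evaluation.
# Toy data only; not the e-Value corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

train_docs = [
    "esercizi di calcolo a mente e tabelline",
    "prove di lettura e comprensione del testo",
    "problemi di geometria e misura",
    "schede di valutazione per la scuola primaria",
]
train_labels = [{"C2", "C24"}, {"C2", "C21"}, {"C2", "C24"}, {"C1", "C12"}]
test_docs = ["verifica di tabelline e calcolo scritto"]
test_labels = [{"C2", "C24"}]

# Bag-of-terms with log(TF) * IDF weighting and normalized vectors.
vectorizer = TfidfVectorizer(sublinear_tf=True)
X_train = vectorizer.fit_transform(train_docs)
X_test = vectorizer.transform(test_docs)

# One binary SVM per category (one-vs-all), default parameters.
binarizer = MultiLabelBinarizer()
Y_train = binarizer.fit_transform(train_labels)
Y_test = binarizer.transform(test_labels)
classifier = OneVsRestClassifier(LinearSVC())
classifier.fit(X_train, Y_train)
Y_pred = classifier.predict(X_test)

print("Micro-F1:", f1_score(Y_test, Y_pred, average="micro", zero_division=0))
print("Macro-F1:", f1_score(Y_test, Y_pred, average="macro", zero_division=0))
```

In the actual experiments, 112 such binary classifiers are trained, one per taxonomy node, and their decisions are merged into the multi-class output, as described above.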
The F1 of each individual binary classifier is also reported. For measuring the performance at the different hierarchical levels, only the nodes up to the target level are considered; e.g., for the first level, we only measure the Micro/Macro-F1 of C1, C2, C3 and C4.

3.2 Results and Discussion

Figure 2 reports the performance on the first level. We note that for each category there are about 40 training documents. These seem to be enough, as the accuracy of the individual categories, as well as the overall Micro/Macro-F1, is exceptionally high. This is not completely surprising, as most documents are repeated across the above four categories.

Figure 3 illustrates the results for the second level. We note that when there are more than 20 training documents, very good results can be achieved. Low performance is shown for C11 and C13, which are trained with fewer than 7 documents. Additionally, they have only one test document each, which means that their accuracy cannot really be estimated. The situation of C31 is even worse, as it has no test documents; in this case, we do not report any accuracy in the related row. It should also be noted that, since we use the one-vs-all schema, the accuracy of C1,...,C4 is the same as before. Thus, from now on, we do not report the accuracy of previously reported binary classifiers.

Figure 4 shows the performance on levels 3 and 4. Again, the few training documents available for the classifiers prevent them from achieving a reasonable F1. There are some good cases, such as C124 and C322, but also bad cases, such as C122 and C123. The latter two refer to Primaria Classe II and Primaria Classe III, respectively, which largely overlap with the other classes, i.e., I, IV and V. For separating such categories, a simple bag-of-words representation may not be enough.

4 Conclusions

In this paper, we have described an interesting and new semantic classification problem in the context of the educational framework of the e-Value project. We have defined a new hierarchical taxonomy, which is promising for improving the production cycle of educational systems. To test the feasibility of the approach, we have also built a corpus annotated according to the above taxonomy. Such data was used for training an MCC based on SVMs. The results show that, when there is a reasonable amount of training documents, the classifiers achieve remarkably high accuracy. On the other hand, the F1 of lower-level categories is highly affected by data scarcity. Some categories would probably require the definition of more expressive features to better model their separation.

Possible solutions are also provided by previous work on more advanced TC models, e.g., [6], in which global dependencies between hierarchical nodes are encoded in a gradient descent learning approach; the authors experimented with Reuters Volume 1 (RCV1)² on a sub-hierarchy containing only 34 nodes. Other relevant work, such as [4] and [1], uses rather different datasets and a different notion of dependencies, based on the feature distributions over the linked categories. Finally, [3] experiments with models similar to ours, achieving state-of-the-art results on RCV1.
Acknowledgements

The research described in this paper has been partially supported by the Italian Project e-Value (PAT) and by the European Community's Seventh Framework Programme (FP7/2007-2013) under the grants #231126: LivingKnowledge – Facts, Opinions and Bias in Time, #247758: EternalS – Trustworthy Eternal Systems via Evolving Software, Data and Knowledge, and #288024: LiMoSINe – Linguistically Motivated Semantic aggregation engiNes.

References

1. Dumais, S.T., Chen, H.: Hierarchical classification of web content. In: Belkin, N.J., Ingwersen, P., Leong, M.K. (eds.) Proceedings of SIGIR-00, 23rd ACM International Conference on Research and Development in Information Retrieval, pp. 256–263. ACM Press, New York, US, Athens, GR (2000), http://research.microsoft.com/~sdumais/sigir00.pdf
2. Hayes, P.J., Weinstein, S.P.: Construe/Tis: a system for content-based indexing of a database of news stories. In: Rappaport, A., Smith, R. (eds.) Proceedings of IAAI-90, 2nd Conference on Innovative Applications of Artificial Intelligence, pp. 49–66. AAAI Press, Menlo Park, US (1990)
3. Lewis, D.D., Yang, Y., Rose, T., Li, F.: RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research 5, 361–397 (2004)
4. McCallum, A., Rosenfeld, R., Mitchell, T.M., Ng, A.Y.: Improving text classification by shrinkage in a hierarchy of classes. In: ICML, pp. 359–367 (1998)
5. Rifkin, R., Klautau, A.: In defense of one-vs-all classification. Journal of Machine Learning Research 5, 101–141 (2004), http://dl.acm.org/citation.cfm?id=1005332.1005336
6. Rousu, J., Saunders, C., Szedmak, S., Shawe-Taylor, J.: Kernel-based learning of hierarchical multilabel classification models. Journal of Machine Learning Research 7, 1601–1626 (2006)
7. Sebastiani, F.: Machine learning in automated text categorization. ACM Computing Surveys 34(1), 1–47 (2002)

² trec.nist.gov/data/reuters/reuters.html