Proceedings of the conference Terminology and Artificial Intelligence 2015 (Granada, Spain)

Tracing Research Paradigm Change Using Terminological Methods
A Pilot Study on "Machine Translation" in the ACL Anthology Reference Corpus

Anne-Kathrin Schumann† and Behrang QasemiZadeh
† Applied Linguistics, Translation and Interpreting, Universität des Saarlandes, Campus A2.2, 66123 Saarbrücken, Germany
anne.schumann@mx.uni-saarland.de
behrang.qasemizadeh@insight-centre.org

Abstract

This paper explores the use of terminology extraction methods for detecting paradigmatic changes in scientific articles. We use a statistical method for identifying salient nouns and adjectives that signal these paradigmatic changes. We then employ the extracted lexical units for discovering terms that are assumed to be central in characterising paradigm shifts. To assess the method's performance, in this pilot study, we work on "machine translation" (MT) research articles sampled from the ACL Anthology Reference Corpus. We analyse this corpus to check whether the proposed approach can trace the dramatic changes that machine translation research has experienced in the last decades: from transformational rule-based methods to statistical machine learning-based techniques.

1 Introduction

Research in computational terminology traditionally focuses on static models of knowledge acquisition and representation. Corpus-based approaches have led to an increased interest in the automatic extraction and semantic categorisation of terms, with many successful applications. However, progress in the empirical description and computational modelling of terminological dynamics has been rather slow.

This paper suggests that terminological methods and principles can be employed in empirical investigations of diachronic knowledge evolution. In particular, terminological methods can provide new insights into problems of diachrony since they can be used to trace (a) how terminologies come into being, and (b) how they develop over time as the scientific field itself evolves. Empirical work on the creation and development of terminologies is especially relevant for investigations into the history of science. Furthermore, studies of this kind are also likely to benefit terminology as a discipline, since they might provide insights into the driving forces of terminological development and knowledge organization.

The method proposed here identifies lexical units the importance of which increases or decreases upon the transition from an earlier period to a more recent one. In other words, we approach the history of science in the form of a trend analysis task. Formally, this task consists of two sub-tasks, namely:

(a) the detection of those periods in time when a paradigm change is taking place (e.g., as signalled by terminological dynamics in a domain);

(b) the extraction of terms that are indicative of a declining or rising paradigm.

The pilot study described in this paper relates only to the extraction of terms signalling paradigm shift (i.e., sub-task (b)). The material for our analysis consists of research articles dealing with "machine translation". These articles are sampled from the ACL Anthology Reference Corpus (ACL ARC) introduced in Bird et al. (2008).

Linguistically, the proposed method is inspired by studies on register.[1] Register linguistics approaches linguistic variation as the description of changing configurations of linguistic features on the textual level. One of the relevant dimensions for this type of study certainly is the lexicon. Accordingly, we hypothesise that paradigmatic changes in a field of knowledge are the cause of terminological dynamics. These dynamics are expressed in the form of the rise or decline of not just isolated terms but whole groups of terms.

[1] See Cabré (1998) for an elaboration of terminological aspects of register. Also, see Teich et al. (2015) for an applied perspective.
We conclude that terms extracted by our method are salient if they are able to depict the paradigmatic change that the MT field has undergone in the last decades—that is, the advent of statistical methods in contrast to symbolic approaches that were in use earlier. The remainder of this paper is structured as follows. Section 2 briefly summarises relevant previous work. Section 3 outlines our extraction method. Section 4 reports the results of our pilot study, followed by an evaluation in Section 5. Section 6 discusses the obtained results and concludes this paper.

2 Related Work

The term "paradigm" in the sense intended here goes back to Kuhn (1962). According to Kuhn, a paradigm emerges from a generally acknowledged scientific contribution to a research field. The significance of the paradigm consists in its ability to propose research problems and solutions to these problems to the relevant community. Some of Kuhn's arguments can be traced back to Fleck (1935). Fleck describes scientific communities as communities of thought ("Denkkollektive") who share habits in their way of perceiving and solving scientific problems ("Denkstil", literally "style of thought"). What is important here for our research question is that paradigms are coupled not only with specific types of problems and research methods, but also with terminologies: they constitute the inventory of lexical units used to refer to concepts that are central for a given paradigm. Consequently, they are subject to change whenever the conceptual outline of the discipline changes.

Terminological dynamics have been approached by terminology proper from various perspectives. Relevant to our study are the articles by Kristiansen (2011) and Picton (2011). Kristiansen (2011) provides a detailed account of external motivating factors of conceptual and, eventually, terminological dynamics. Picton (2011) elaborates a typology for the description of short-term term evolution patterns such as neology–necrology (i.e., appearance–disappearance of terms), term migration, and topic centrality–disappearance. Unfortunately, neither paper provides a methodology for the automatic detection of these dynamics.

In computational linguistics, trend analysis is usually approached by computing topic centrality and/or community influence measures and plotting them on a timeline. An example is the work by Hall et al. (2008), who try to trace the "development of research ideas over time". They employ the standard Latent Dirichlet Allocation (LDA) algorithm (Blei et al., 2003)—a term-by-document model—for identifying "topic clusters". The method involves manual selection of relevant topics and seed words in multiple runs of the LDA algorithm. Probabilities derived from the LDA model are then used for the identification of rising and declining topics. Similar to our work, the authors report experiments over the ACL ARC, using publications from 1978–2006.

A term-based approach to topic and trend analysis is proposed by Mariani et al. (2014). The analysis is conducted on the ELRA Anthology of LREC publications starting in 1998. A term extraction method, namely TermoStat (Drouin, 2004), is employed to extract "topic keywords". For each year, terms and their variants are grouped into synsets and the most frequent terms are found. Finally, the authors study the rank development of the 50 most frequent terms in order to extract information on whether the topics designated by these terms have risen, declined, or stayed stable over the period under analysis.
Relevant co-occurrences of terms are also listed.

Gupta and Manning (2011) stress that, for the purpose of detailed investigations into the history of science, ". . . an understanding of more than just the 'topics' of discussion . . . " is necessary. They extract semantic information for the categories FOCUS (i.e., the main contribution of an article), TECHNIQUE, and DOMAIN from the title and abstract sentences of research papers using a set of bootstrapped patterns. They then identify communities using the LDA algorithm. An influence measure is defined and calculated for communities based on the number of times their FOCUS, DOMAIN, or TECHNIQUE have been adopted by other communities. Finally, results obtained from the ACL ARC are projected onto a timeline.

The work listed above has a number of shortcomings, amongst them:

• Approaches based on topic modeling do not always provide readily interpretable topics. While many of the induced topics are convincing in terms of their lexical outline, we believe that the use of terminology, as proposed by Mariani et al. (2014), can provide more targeted information.

• For any detailed understanding of the history of a given discipline, it is insufficient to measure how "central" or "popular" certain topics were at different periods in time. Instead, the internal, fine-grained dynamics of the field, such as paradigms and paradigm shifts, need to be understood. To our knowledge, the work by Mariani et al. (2014) is the only one that includes a study of the lexical context of terminological units; however, this analysis is not carried out systematically. We believe that a systematic study of how groups of terms change over time can provide rich information for users that are interested in the history of a given scientific discipline (e.g., see Figure 2).

3 Detection of Lexical Rank Shifts: The Method

Our work differs from previous studies in that we exploit the notion of rank shifts for detecting fine-grained shifts rather than measuring topic centrality or popularity. The comparison of rank shifts between two lists of sorted lexical items is an established research method in the field of quantitative historical linguistics (cf. Arapov and Cherc (1974)), and we believe that it can be adapted to our purposes.

In essence, our approach to the detection of terminological dynamics revealing a paradigm change is two-fold. Firstly, we extract lemmas that experience a change in their ranks upon the transition from older publications to more recent ones. We believe that these lemmas are either paradigmatic terms themselves or can be used to extract paradigmatic terms. We restrict word classes to nouns and adjectives since we believe that they are the most characteristic units for a given research paradigm. Secondly, we use the extracted lemmas for identifying paradigmatic terms.

The first step (i.e., extraction of lemmas) consists of three sub-processes:

1. extraction of frequency-per-document information for all nouns and adjectives in the two sub-corpora under analysis, and removal of strings containing non-alphanumeric characters;

2. ranking of the lexemes obtained for the two time periods using the method explained below;

3. comparison of the two ranked lists in order to identify those lexemes that have undergone relevant rank shifts.

Frequency and document-related information is extracted using the IMS Open Corpus Workbench (CWB) loaded with our data (Evert and Hardie, 2011).
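For illustration, the following minimal sketch shows how the first sub-process could be realised outside of CWB, assuming that every document is already available as a list of (lemma, PoS) pairs with Penn-Treebank-style tags; the function name and the exact filtering expression are our own illustrative choices, not the tooling used in the experiments.

    import re
    from collections import defaultdict

    def per_document_counts(documents):
        # documents: iterable of documents, each a list of (lemma, pos) pairs.
        # Returns one {lemma: frequency} dictionary per document, keeping only
        # nouns and adjectives and dropping strings that contain characters
        # other than letters, digits, and hyphens (so that items such as
        # "n-gram" survive the clean-up).
        counts = []
        for doc in documents:
            freq = defaultdict(int)
            for lemma, pos in doc:
                if not (pos.startswith("N") or pos.startswith("JJ")):
                    continue
                if not re.fullmatch(r"[A-Za-z0-9-]+", lemma):
                    continue
                freq[lemma.lower()] += 1
            counts.append(dict(freq))
        return counts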
For ranking, we employ the measure for calculating domain consensus proposed by Sclano and Velardi (2007). This measure, DC_{D_i}(t), is defined as follows:

    DC_{D_i}(t) = - \sum_{d_k \in D_i} nf(t, d_k) \log(nf(t, d_k)),    (1)

where d_k denotes the kth document in domain D_i, and nf(t, d_k) is the normalised frequency of term t in d_k \in D_i. DC_{D_i}(t) goes beyond the use of raw frequencies (e.g., as used by Mariani et al. (2014)). Instead, DC_{D_i}(t) favors lexemes that are evenly distributed over all the texts in the two sub-corpora, as opposed to candidates that are frequent in just a small number of texts. The process results in ranked lists of lexemes for the two time periods that we want to compare. Each lexeme occurs either in only one of the two lists or in both of them. To detect major rank shifts RS for a lexeme t that occurs in both lists, we use the following formula:

    RS(t) = \frac{1}{R_{New}(t)} - \frac{1}{R_{Old}(t)},    (2)

where R(t) denotes the rank of t in the two ranked lists New (recent publications) and Old (early publications).
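A minimal sketch of Equations 1 and 2, operating on per-document frequency dictionaries such as those produced by the previous sketch, might look as follows. It assumes that nf(t, d_k) normalises the frequency of t in d_k by the total frequency of t in the sub-corpus, so that DC_{D_i}(t) amounts to the entropy of the term's distribution over documents; the function names are ours and do not reflect the actual implementation.

    import math
    from collections import defaultdict

    def domain_consensus_ranks(per_doc_counts):
        # per_doc_counts: list of {lemma: frequency} dicts, one per document
        # of a sub-corpus (domain) D_i, e.g. the output of the sketch above.
        # Returns {lemma: rank}, where rank 1 carries the highest DC score.
        totals = defaultdict(int)
        for doc in per_doc_counts:
            for lemma, f in doc.items():
                totals[lemma] += f
        dc = defaultdict(float)
        for doc in per_doc_counts:
            for lemma, f in doc.items():
                nf = f / totals[lemma]          # normalised frequency nf(t, d_k)
                dc[lemma] -= nf * math.log(nf)  # summand of Equation 1
        ranked = sorted(dc, key=dc.get, reverse=True)
        return {lemma: rank for rank, lemma in enumerate(ranked, start=1)}

    def rank_shifts(ranks_new, ranks_old):
        # Equation 2 for lexemes that occur in both ranked lists; positive
        # values indicate rising (Up) lemmas, negative values falling (Down).
        shared = ranks_new.keys() & ranks_old.keys()
        return {t: 1.0 / ranks_new[t] - 1.0 / ranks_old[t] for t in shared}

Sorting the resulting shift scores in descending and ascending order then yields candidates for the Up and Down lists, respectively.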
In the next step, the lemmas with the highest rank shifts are employed to build partly lexicalised term extraction patterns for identifying paradigmatic terms. The PoS sequence patterns are taken from the multilingual term extraction tool TTC TermSuite (Daille and Blancafort, 2013).[2] Table 1 provides examples of these patterns.

[2] http://code.google.com/p/ttc-project/

    Pattern                        CWB query
    adjective + noun               [pos="JJ.*"] [lemma="lexicon"]
    past participle + noun         [pos="VVN"] [lemma="lexicon"]
    noun + noun                    [pos="N.*"] [lemma="lexicon"]
    noun + noun + noun             [pos="N.*"] [pos="N.*"] [lemma="lexicon"]
    noun + preposition + noun      [pos="N.*"] [pos="IN"] [lemma="lexicon"]
    adjective + adjective + noun   [pos="JJ.*"] [pos="JJ.*"] [lemma="lexicon"]

Table 1: Examples of partly lexicalised term extraction patterns.
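To illustrate how such patterns might be instantiated, the sketch below expands the templates of Table 1 into CQP-style query strings for a given list of candidate lemmas, on the assumption that "lexicon" in the example queries marks the lexicalised slot filled by a candidate lemma. The attribute names (pos, lemma) depend on how the corpus was encoded in CWB, and the helper function is a hypothetical convenience rather than part of TTC TermSuite or of our pipeline.

    # Query templates mirroring Table 1; "{lemma}" marks the lexicalised slot
    # that is filled with one of the top-ranked Up or Down lemmas.
    PATTERN_TEMPLATES = [
        '[pos="JJ.*"] [lemma="{lemma}"]',               # adjective + noun
        '[pos="VVN"] [lemma="{lemma}"]',                # past participle + noun
        '[pos="N.*"] [lemma="{lemma}"]',                # noun + noun
        '[pos="N.*"] [pos="N.*"] [lemma="{lemma}"]',    # noun + noun + noun
        '[pos="N.*"] [pos="IN"] [lemma="{lemma}"]',     # noun + preposition + noun
        '[pos="JJ.*"] [pos="JJ.*"] [lemma="{lemma}"]',  # adjective + adjective + noun
    ]

    def lexicalised_queries(lemmas):
        # Instantiate every template for every candidate lemma; the resulting
        # strings can be issued as queries in a CQP session against the New
        # or Old sub-corpus to retrieve candidate term occurrences.
        return [t.format(lemma=l) for l in lemmas for t in PATTERN_TEMPLATES]

For example, lexicalised_queries(["model"]) yields, among others, the query [pos="JJ.*"] [lemma="model"].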
4 Experiment

As stated earlier, we used the ACL ARC as our dataset. The corpus contains research articles on the topic of human language technology dating back as far as 1965. In our experiments, we use the preprocessed, segmented version of the ACL ARC (i.e., the ACL RD-TEC) provided by QasemiZadeh and Handschuh (2014). Our pilot study is limited to the research publications in the domain of MT. Given our knowledge that MT research has undergone a major paradigm shift since the late 1980s, we want to examine whether our method is able to capture and characterise this paradigm shift.

To prepare the data for the experiments, we extract nouns and adjectives from papers containing either the string "machine translation" or "automatic translation". We divide the corpus into two sets of articles: Old (1960s–70s) and New (1980s onwards). Since New is substantially larger than Old, we randomly reduce the size of the New set in order to make it more comparable to Old. Despite this effort, the two sub-corpora still differ in size and structure: New contains 290,337 nouns and adjectives, whereas Old contains only 79,247.

The extracted lemmas are weighted using Equations 1 and 2. Consequently, four sets of words are generated:

• words that occur only in New (ONLY NEW);
• words that occur only in Old (ONLY OLD);
• words whose rank increases upon the transition from Old to New (UP);
• words whose rank decreases upon the transition from Old to New (DOWN).

The first set—items that occur only in New—is comparatively large and contains 14,347 adjectives and nouns. Old, on the other hand, has 7,094 unique adjectives and nouns. 1,023 lemmas have an increased rank over time, and 2,880 words are subject to a rank decrease. Table 2 details the results by showing the top 15 items in each set of generated words. Table 2a shows words that occur only in New or only in Old. Table 2b, however, shows common words with the largest rank shifts. Note that ONLY NEW and ONLY OLD have been ranked by their assigned DC score (Equation 1), whereas Up and Down are sorted according to the score computed using Equation 2.

(a)
    ONLY NEW: alignment, tag, annotation, database, baseline, ontology, threshold, monolingual, multilingual, search, learning, architecture, engine, n-gram, decoder, tagger
    ONLY OLD: periphrasing, cannonical, transcodage, transcoded, pidgin, sjstem, descri ption, versinn, periphrasin, paragrapher, subroutine, Noninclusive, inclusiveness, quelques

(b)
    UP: word, translation, corpus, model, result, text, method, information, feature, system, approach, set, training, pair, source
    DOWN: language, sentence, structure, analysis, rule, form, problem, semantic, grammar, computer, program, theory, way, possible, dictionary

Table 2: The result obtained from processing and comparing the Old and New sub-corpora. Note that, due to the presence of noise introduced during pre-processing (e.g., OCR), the extracted lists of lexemes also contain invalid lexical units, as can be seen in Table 2a.

In the second step, we select the top 30 plausible noun lemmas from the UP list (shown in Table 2b) and use them for building term extraction patterns (as exemplified in Table 1). This process is also repeated for the top 30 nouns from the DOWN list. The two obtained sets of patterns are employed to extract terms from the New and the Old sub-corpora, respectively. Table 3 provides an overview of the 15 most frequent candidate terms extracted by this method.

    Up terms                          Down terms
    machine translation               natural language
    language model                    deep structure
    translation system                phrase structure
    word sense                        transformational rule
    training datum                    syntactic analysis
    test set                          surface structure
    mt system                         sentence structure
    translation model                 physics problem
    sentence pair                     semantic theory
    statistical machine translation   transformational grammar
    machine translation system        phrase structure grammar
    bleu score                        average number
    parallel corpus                   linguistic theory
    training set                      conversion rule
    english word                      source language

Table 3: Most frequent paradigmatic term candidates extracted using the proposed lexicalised PoS sequence patterns. We consider Up terms and Down terms as indicators of topics that are trending and un-trending, respectively.

Figure 1 reports the precision for the first 300 Up and Down paradigmatic term candidates, obtained by automatically comparing them to the terms annotated in the ACL RD-TEC by QasemiZadeh and Handschuh (2014).

Figure 1: Precision at n for the extracted list of terms using the lexicalised patterns for the Up and Down lemmas (y-axis: precision, 0–1; x-axis: top n terms, n = 50–300).
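The precision curve in Figure 1 can, in essence, be reproduced by checking the ranked candidate lists against a gold term inventory. The following is a minimal sketch, assuming the ACL RD-TEC annotations are available as a set of strings; the function name is ours.

    def precision_at_n(candidates, gold_terms, max_n=300):
        # candidates: extracted candidate terms, ordered by decreasing frequency.
        # gold_terms: set of terms annotated in the reference (here: ACL RD-TEC).
        # Returns a list of (n, precision) pairs for n = 1 .. max_n, i.e. the
        # fraction of the top-n candidates that are attested in the gold list.
        hits, curve = 0, []
        for n, term in enumerate(candidates[:max_n], start=1):
            if term in gold_terms:
                hits += 1
            curve.append((n, hits / n))
        return curve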
5 Evaluation

The 15 lemmas listed in Table 2b (i.e., the DC_{D_i}(t)-ranked lemmas) are presented to 5 researchers in the area of machine translation. The evaluators are asked whether

(a) the individual lemmas in Table 2b are salient for the period they are supposed to represent (New and Old); and,

(b) the lists as a whole contain words that are typical for the mainstream research paradigms in the respective periods.

To investigate (a), participants make binary distinctions (i.e., in each of the Up and Down lists, a lemma is marked either as relevant or irrelevant). To investigate (b), participants are asked to provide a grade indicating the relevance of the lists of terms on a scale from 1 ("list is irrelevant") to 5 ("relevant").

In order to assess whether the DC_{D_i}(t)-based ranking mechanism proposed in this paper (i.e., Equations 1 and 2) outperforms simpler ranking methods, we also construct a baseline data-set: nouns and adjectives in New and Old are sorted by their frequency and then evaluated by the differences in their ranks. The resulting baseline data is given in Table 4. The evaluators are asked to repeat the above-mentioned assessment for this baseline as well, without being aware of how either data-set was produced. Table 5 summarises the results of this evaluation.

    UP            DOWN
    training      transformational
    corpus        routine
    score         force
    probability   picture
    target        location
    pair          numeral
    evaluation    title
    task          reverse
    statistical   geometric
    source        physics
    performance   decimal
    bilingual     personal
    feature       intension
    error         Russian
    sense         storage

Table 4: The baseline lemma list: top 15 lemmas sorted by frequency and rank shifts.

Each row of the sub-tables of Table 5 summarises the input from one of the expert evaluators. The first and the second column in each sub-table show the sum of positively marked Up and Down items—that is, the sum of those lemmas (out of 15) that were found salient for either the 1960s–1970s or the 1980s–2000s (sub-task (a)). The third column presents the overall evaluation of the lists (i.e., sub-task (b)). Table 5a provides the results for the list of lexical items that are ranked using the DC_{D_i}(t) score (i.e., listed in Table 2b). Table 5b provides the assessments for the baseline list (i.e., listed in Table 4).

    (a)                      (b)
    Up   Down   Overall      Up   Down   Overall
    12   10     4:5          15   5      3:5
    12   10     4:5          11   2      3:5
    13   12     4:5          14   11     3:5
    10   10     3:5          11   6      4:5
    3    4      2:5          6    2      3:5

Table 5: Each row summarises the assessment of one of the evaluators. Table 5a shows the results for the sets of lexical items ranked by DC_{D_i}(t) (listed in Table 2b). Table 5b, in contrast, provides the results for the sets of lexical items that are sorted by their raw frequencies (listed in Table 4).

As can be observed in Table 5, the evaluators tend to prefer the DC_{D_i}(t)-ranked lexical items over the baseline data-set. Except for one of the annotators, who suggests that the baseline method provides more informative output (i.e., the last row of Tables 5a and 5b), the evaluators consistently prefer the ranking mechanism proposed in this paper, assigning an overall grade of 3–4 (out of 5) points to its output. However, the difference remains slight.

Table 6 shows the 15 most frequent terms in the Old and the New corpus, respectively. These terms were collected using the manual annotations in the ACL RD-TEC by QasemiZadeh and Handschuh (2014). By comparing these terms to the output of our method (Table 3), we observe considerable differences. Evidently, for the detection of paradigm shifts, terms extracted using semi-lexicalised part-of-speech (PoS) patterns based on our DC_{D_i}(t) method are better indicators of the paradigm shift than terms ranked by their raw frequencies.

    Sub-Corpus Old                Sub-Corpus New
    natural language              machine translation
    machine translation           natural language
    computational linguistics     language processing
    data base                     translation system
    artificial intelligence       target language
    language processing           computational linguistics
    phrase structure              natural language processing
    syntactic analysis            training data
    translation system            source language
    automatic translation         test set
    natural languages             information retrieval
    information retrieval         machine translation system
    noun phrase                   language model
    language understanding        training corpus
    noun phrases                  noun phrase

Table 6: The 15 most frequent terms (two tokens or longer) in the Old and the New sub-corpora. This list was collected using the manual annotations in the ACL RD-TEC and from the documents in the two Old and New sub-corpora.

Figure 2 exemplifies some of the dynamics detected by our method. For each year, the plot shows the frequencies of terms normalised by the sum of all term frequencies extracted from the publications in that year. All plotted terms were among the top items in our Up and Down lists. Up paradigmatic terms are given in blue, whereas Down paradigmatic terms are plotted in black.

Figure 2: Terms mapped onto a timeline. For each year, the y-axis shows the frequencies of terms normalised by the sum of the frequencies of all the terms extracted in that year (x-axis: publication year, 1965–2005; plotted terms: statistical machine translation, bleu score, automatic evaluation, test set, generative grammar, phrase structure grammar, linguistic theory).

Figure 2 illustrates what types of information can be drawn from the analysis conducted here. For example, we observe that "automatic evaluation" rises synchronously with "Bleu score" and is only slightly preceded by "statistical machine translation" itself. We also find that, during the 1980s, references to "linguistic theory" were rather frequent, but they have largely vanished since 1990. Themes such as generative grammar or phrase structure grammar were not dominant even in the earlier decades, but they exhibit a constant decline, at least since the 1990s. Evidently, the plot confirms that our attribution of terms to the categories Up and Down is justified. Moreover, this plot supports our hypothesis that paradigm shifts are lexically expressed by the dynamics of whole groups of related terms.
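The normalisation underlying Figure 2 can be expressed compactly; the following is a minimal sketch, assuming term frequencies have already been aggregated per publication year (the data structures and the function name are illustrative assumptions).

    from collections import defaultdict

    def relative_frequencies_by_year(term_counts_per_year, tracked_terms):
        # term_counts_per_year: {year: {term: frequency}} over all extracted terms.
        # tracked_terms: the Up and Down paradigmatic terms to be plotted.
        # Returns {term: {year: relative frequency}}, where every frequency is
        # divided by the summed frequency of all terms extracted in that year.
        series = defaultdict(dict)
        for year, counts in term_counts_per_year.items():
            total = sum(counts.values())
            if total == 0:
                continue
            for term in tracked_terms:
                series[term][year] = counts.get(term, 0) / total
        return dict(series)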
6 Discussion and future work

For a detailed understanding of the dynamics of science, it is insufficient to measure how "central" or "popular" certain topics are at different periods of time. Instead, those groups of terms that signal paradigm changes must be detected—this is the key idea that motivates the research presented in this paper. The pilot study described here, therefore, aims at showing that terminological methods can be employed to serve this purpose and to provide information for understanding what is going on in a scientific field at a given moment in time.

An inspection of our method's output indicates that the renewal of vocabulary (happening by some words falling from use and others being introduced) is considerable, given the relatively short time span under analysis in our experiments. We observe that the content words shared by the two data sets are, in fact, a minority. However, we also observe that Only New (Table 2a) clearly contains items that are indicative of more recent MT research, such as "alignment", "n-gram" or "decoder". The items that are specific to Only Old, on the other hand, seem to be rather spurious and of low frequency. These lexical units, rather unsurprisingly, disappear upon the transition from Old to New.

Our evaluation also indicates that the lemmas extracted by our method (Table 2b) are indicative of the respective time periods, at least as far as the top ranks are concerned. MT experts prefer the output of our proposed method over the output of the baseline method, perhaps due to the improved coverage of the relevant Down lemmas.

Moreover, the terminological evaluation of the extracted paradigmatic terms (Figure 1) shows that Up lemmas indeed help to extract valid computational linguistics terms. Performance for Down lemmas, however, is consistently worse. This difference in performance, in our opinion, is related to the higher productivity of the Up lemmas from Table 2b: Up lemmas are used in a growing number of more specific and more frequent terms, whereas Down lemmas do not experience a similar increase in frequency and specificity. That is, it is harder to distinguish irrelevant collocations containing Down terms from collocations with terminological value. Hence, term extraction performance for Down terms is worse. We believe that, if this property can be shown to hold in general, it is highly relevant, as it can be used for the extraction of emergent and semantically related terms. Term extraction performance itself can be further improved by integrating standard practices such as stop-word filtering.

Last but not least, a timeline plot of Up and Down paradigmatic terms indicates that Down terms, as expected, do not exhibit the same exponential growth as Up paradigmatic terms. However, we also observe that many relevant terms do not simply fall from use (e.g., the term "linguistic theory"). They may even increase their absolute frequency or become salient again in new or unforeseen contexts.

The local context of terms therefore remains an unexplored factor in trend analysis research. If we look more closely into our data, we find unexpected formulations such as "the language model in the human" or "translation model based on semantic interpretation". Future work will need to address these kinds of dynamics in superficially identical terms, which are even more fine-grained than the rank shifts observed in this pilot study.
Several measures can be taken into consideration for improving our current evaluation method. Future work will also strive for a comparison of multiple sub-corpora that represent time slices of different granularity, perhaps of more similar size and structure. The detection of time periods in which paradigm shifts occur, and a more precise modelling of their interplay with terminological dynamics, are also important topics for future research.

Finally, we would like to mention that an important observation about the dependence of lexical dynamics on frequency has already been made by Arapov and Cherc (1974), who explicitly refer to Zipf:

    The speed of decay . . . can, in a way, be understood as the probability of decay. The higher the ordinal number (rank) of a [word] group . . . , the lower the frequency of the words belonging to that group, the higher is the speed of decay of this group.[3]

[3] Translated from Russian.

It is no surprise that term frequency does play a role in term necrology. However, the formula that we currently use for rank comparison (i.e., Equation 2) does not account for this aspect. Furthermore, the question of how to compare terms whose frequencies differ by orders of magnitude is also as yet unresolved. Future work will address these shortcomings.

Acknowledgements

We thank Mihael Arcan, Iacer Calixto, Peyman Passban, Liling Tan and colleagues for evaluating our data. We would also like to thank Prof. Elke Teich for her comments and advice. This research has been supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) through the Cluster of Excellence 'Multimodal Computing and Interaction'.
References

M. V. Arapov and M. M. Cherc. 1974. Matematičeskie metody v istoričeskoj lingvistike. Nauka.

Steven Bird, Robert Dale, Bonnie Dorr, Bryan Gibson, Mark Joseph, Min-Yen Kan, Dongwon Lee, Brett Powley, Dragomir Radev, and Yee Fan Tan. 2008. The ACL Anthology Reference Corpus: A reference dataset for bibliographic research in computational linguistics. In Proceedings of LREC'08, Marrakech, Morocco, May. ELRA.

David Blei, Andrew Ng, and Michael Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research, (3).

Maria Teresa Cabré. 1998. Do we need an autonomous theory of terms? Terminology, 2(5).

Béatrice Daille and Helena Blancafort. 2013. Knowledge-poor and Knowledge-rich Approaches for Multilingual Terminology Extraction. In CICLing.

Patrick Drouin. 2004. Detection of Domain Specific Terminology Using Corpora Comparison. In LREC.

Stefan Evert and Andrew Hardie. 2011. Twenty-first century Corpus Workbench: Updating a query architecture for the new millennium. In Corpus Linguistics.

Ludwik Fleck. 1935. Entstehung und Entwicklung einer wissenschaftlichen Tatsache: Einführung in die Lehre vom Denkstil und Denkkollektiv. Schwabe.

Sonal Gupta and Christopher D. Manning. 2011. Analyzing the Dynamics of Research by Extracting Key Aspects of Scientific Papers. In IJCNLP.

David Hall, Daniel Jurafsky, and Christopher D. Manning. 2008. Studying the History of Ideas Using Topic Models. In EMNLP.

Marita Kristiansen. 2011. Domain dynamics in scholarly areas: How external pressure may cause concept and term changes. Terminology, 17(1).

Thomas S. Kuhn. 1962. The Structure of Scientific Revolutions. University of Chicago.

Joseph Mariani, Patrick Paroubek, Gil Francopoulo, and Olivier Hamon. 2014. Rediscovering 15 Years of Discoveries in Language Resources and Evaluation: The LREC Anthology Analysis. In LREC.

Aurelie Picton. 2011. Picturing short-period diachronic phenomena in specialised corpora: A textual terminology description of the dynamics of knowledge in space technologies. Terminology, 17(1).

Behrang QasemiZadeh and Siegfried Handschuh. 2014. The ACL RD-TEC: A Dataset for Benchmarking Terminology Extraction and Classification in Computational Linguistics. In Computerm.

Francesco Sclano and Paola Velardi. 2007. TermExtractor: A Web Application to Learn the Shared Terminology of Emergent Web Communities. In Enterprise Interoperability II: New Challenges and Approaches. Springer.

Elke Teich, Stefania Degaetano-Ortlieb, Peter Fankhauser, Hannah Kermes, and Ekaterina Lapshinova-Koltunski. 2015. The Linguistic Construal of Disciplinarity: A Data-Mining Approach Using Register Features. J. Assoc. Inf. Sci. Technol.