<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Role of Language Evolution in Digital Archives</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nina Tahmasebi</string-name>
          <email>ninat@chalmers.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Risse</string-name>
          <email>risse@L3S.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science &amp; Engineering Department, Chalmers University of Technology</institution>
          ,
          <addr-line>412 96 Gothenburg</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>L3S Research Center</institution>
          ,
          <addr-line>Appelstr. 9, 30167 Hannover</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <fpage>16</fpage>
      <lpage>27</lpage>
      <abstract>
        <p>With advancements in technology and culture, our language changes. We invent new words, add or change meanings of existing words and change names of existing things. Left untackled, these changes in language create a gap between the language known by users and the language stored in our digital archives. In particular, they affect our ability to, firstly, find content and, secondly, interpret that content. In this paper we discuss the limitations brought on by language evolution and existing methodology for automatically finding evolution. We discuss measures needed in the near future to ensure semantically accessible digital archives for long-term preservation.</p>
      </abstract>
      <kwd-group>
        <kwd>language evolution</kwd>
        <kwd>finding and understanding content</kwd>
        <kwd>digital archives</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>With advancements in technology, culture and through high impact events, our
language changes. We invent new words, add or change meanings of existing
words and change names of existing things. This results in a dynamic language
that keeps up with our needs and provides us the possibility to express ourselves
and describe the world around us. The resulting phenomenon is called language
evolution (or language change in linguistics).</p>
      <p>For all contemporary use, language evolution is trivial as we are constantly made
aware of the changes. At each point in time, we know the most current version of
our language and, possibly, some older changes. However, our language does not
carry a memory; words, expressions and meanings used in the past are forgotten
over time. Thus, as users, we are limited when we want to find and interpret
information about the past from content stored in digital archives.
In the past, published and preserved content was stored in repositories like
national libraries and access was simplified with the help of librarians. These
experts would read hundreds of books to help students, scholars or the interested
public to find relevant information expressed using any language, modern or old.
Today, because of the easy access to digital content, we are no longer limited
to physical hard copies stored in one library. Instead we can aggregate
information and resources from any online repository stored at any location. The sheer
volume of content prevents librarians from keeping up and thus there are no experts
to help us find and interpret information. The same applies to the
increasing number of national archives that are being created by libraries which crawl
and preserve their national Web. Language in user generated content is more
dynamic than language in traditional written media and, thus, is more likely to
change over shorter periods of time [TGR12].</p>
      <p>Much of our culture and history is documented in the form of written testimony.
Today, more and more effort and resources are spent digitizing and making
available historical resources that were previously available only as physical hard
copies, as well as gathering modern content. However, making the resources
available to the users has little value in itself; the broad public cannot fully
understand or utilize the content because the language used in the resources has
changed, or will change, over time. To fully utilize these efforts, this vast pool of
content should be made semantically accessible and interpretable to the public.
Modern words should be translated into their historical counterparts and words
should be represented with their past meanings and senses.</p>
      <p>In this paper we will discuss the role of language evolution in digital archives
and the problems that arise as a result. We will review the state of the art in
detecting language evolution and discuss future directions to make digital archives
semantically accessible and interpretable, thus ensuring useful archives also for
the future. The rest of the paper is organized as follows: In Sec. 2 we discuss
different types of evolution and the corresponding problems caused. In Sec. 3 we
discuss the differences between digitized, historical content and archives with
new content, e.g., Web archives. In Sec. 4 we provide a review of current
methods for detecting evolution and finally, in Sec. 5, we conclude and discuss future
directions.</p>
    </sec>
    <sec id="sec-2">
      <title>Evolution</title>
      <p>There are two major problems that we face when searching for information in
long-term archives: firstly, finding content and, secondly, interpreting that
content. When things, locations and people have different names in the archives
than those we are familiar with, we cannot find relevant documents by means
of simple string matching techniques. A query string for the modern name
will not match the names stored in the archive. The
resulting phenomenon is called named entity evolution and can be illustrated
with the following:
"The Germans are brought nearer to Stalingrad and the command of
the lower Volga."
The quote was published on July 18, 1942 in The Times [TT42] and refers to the
Russian city that often figures in the context of World War II. In reference to
World War II people speak of the city of Stalingrad or the Battle of Stalingrad;
however, the city cannot be found on a modern map. In 1961, Stalingrad was
renamed Volgograd and the old name has since been replaced on maps and in modern
resources. Not knowing of this change leads to several problems: (1) knowing only
about Volgograd means that the history of the city becomes inaccessible because
documents that describe its history only contain the name Stalingrad; (2) knowing
only about Stalingrad makes it difficult to find information about the current
state and location of the city.<sup>3</sup></p>
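      <p>The retrieval problem can be made concrete with a small Python sketch. It assumes a hand-built table of known name changes (NAME_CHANGES) and a two-document toy archive; both, like the function names, are hypothetical illustrations and not part of any method discussed in this paper. Exact string matching for the modern name misses the historical document, while expanding the query with linked names recovers it.</p>
      <preformat>
```python
# Hypothetical table of known name changes: (old name, new name, year of change).
NAME_CHANGES = [("Stalingrad", "Volgograd", 1961)]


def expand_query(term):
    """Return the query term plus every name linked to it by a known change."""
    names = {term}
    for old, new, _year in NAME_CHANGES:
        if term in (old, new):
            names.update((old, new))
    return names


def search(documents, term, expand=True):
    """Plain substring search, optionally over all known names of the entity."""
    names = expand_query(term) if expand else {term}
    return [doc for doc in documents if any(name in doc for name in names)]


# Toy archive: one historical and one modern sentence (illustration only).
archive = [
    "The Germans are brought nearer to Stalingrad and the command of the lower Volga.",
    "Volgograd is a city on the western bank of the Volga.",
]

print(len(search(archive, "Volgograd", expand=False)))  # exact matching only
print(len(search(archive, "Volgograd")))                # with name expansion
```
      </preformat>
      <p>The sketch only shows why a mapping between names is needed; detecting such mappings automatically is the harder problem reviewed in Sec. 4.</p>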
      <p>The second problem that we face is related to interpretation of content; words
and expressions reflect our culture and evolve over time. Without explicit
knowledge about the changes we risk placing modern meanings on these expressions,
which leads to wrong interpretations. This phenomenon is called word sense
evolution and can be illustrated with the following:
"Sestini's benefit last night at the Opera-House was overflowing with the
fashionable and gay."
The quote was published on April 27, 1787 in The Times [The87]. When read
today, the word gay will most likely be interpreted as homosexual. However, this
sense of the word was not introduced until the early 20th century and instead, in
this context, the word should be interpreted with the sense of happy.
Language evolution also occurs in shorter time spans; modern examples of named
entity evolution include company names (Andersen Consulting → Accenture)
and Popes (Jorge Mario Bergoglio → Pope Francis). Modern examples of word
sense evolution include words like Windows or surfing, which have acquired new meanings in the
past decades.</p>
      <p>In addition, there are many words and concepts that appear and stay in our
vocabulary for a short time period, like smartphone face, cli-fi and catfishing<sup>4</sup>,
which are examples of words that have not made it into, e.g., the Oxford English
Dictionary, and are unlikely to ever do so.</p>
      <p><sup>3</sup> Similar problems arise due to spelling variations that are not covered here.</p>
      <p><sup>4</sup> http://www.wordspy.com/</p>
      <p><bold>Formal problem definition.</bold>
Formally, the problems caused by language evolution (illustrated in Figure 1) can
be described as follows. Assume a digital archive where each document
d<sub>i</sub> in the archive is written at some time t<sub>i</sub> prior to the current time t<sub>now</sub>. The larger
the time gap between t<sub>i</sub> and t<sub>now</sub>, the more likely it is that current language
has experienced evolution compared to the language used in document d<sub>i</sub>. For
each word w and its intended sense s<sub>w</sub> at time t<sub>i</sub> in d<sub>i</sub> there are two possibilities:
(1) the word is still in use at time t<sub>now</sub>, or (2) the word is out of use
(outdated) at time t<sub>now</sub>.</p>
      <p>Each of the above options opens up a range of possibilities that correspond to
different types of language evolution that affect finding and interpreting content in digital
archives.</p>
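      <p>Fig. 1: Diagram of Word Evolution. [Figure not reproduced. Starting from a query word, the diagram branches on whether the word is in use at t<sub>now</sub> and on the state of its sense: same sense (No Evolution), sense different or evolved (Word Sense Evolution), sense outdated and word removed (detect past sense, Word Sense Evolution), and sense active with the word replaced (Term to Term Evolution).]</p>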
      <sec id="sec-2-1">
        <title>Word w at time t<sub>i</sub> in use at t<sub>now</sub></title>
        <p>No Evolution: The word is in use at time t<sub>now</sub> and has the same sense
s<sub>w</sub>; thus there has been no evolution for the word. The word and its
sense are stable in the time interval [t<sub>i</sub>, t<sub>now</sub>] and no action is necessary to
understand the meaning of the word or to find content.</p>
        <p>Word Sense Evolution: The word is still in use at time t<sub>now</sub> but with
a different sense s'<sub>w</sub>. The meaning of the word has changed, either to a
completely new sense or to a sense that can be seen as an evolution of
the sense at time t<sub>i</sub>. The change occurred at some point in the interval
(t<sub>i</sub>, t<sub>now</sub>). We consider this to be the manifestation of word sense evolution.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Word w from t<sub>i</sub> out of use at t<sub>now</sub></title>
        <p>Word Sense Evolution - Outdated Sense: The word is out of use
because the word sense is outdated and the word is no longer needed in the
language. This can follow as a consequence of, among others, technology,
diseases or occupations that are no longer present in our society. The word
w as well as the associated word sense s<sub>w</sub> have become outdated during
the interval (t<sub>i</sub>, t<sub>now</sub>). To be able to interpret the word in a document
from time t<sub>i</sub> it becomes necessary to detect the active sense s<sub>w</sub> at time t<sub>i</sub>.
Because it is necessary to recover a word sense that is not available at time
t<sub>now</sub>, we consider this to be a case of word sense evolution.</p>
        <p>Term to Term Evolution: The word w is outdated but the sense s<sub>w</sub> is
still active. Therefore, there must be another word w' with the same sense
s<sub>w</sub> that has replaced the word w. That is, different words, in this case
w and w', are used as representations of the sense s<sub>w</sub>, and the shift was
made somewhere in the time interval (t<sub>i</sub>, t<sub>now</sub>). We consider this to be term
to term evolution, where the same sense (or entity) is represented by
two different words. If the word w represents an entity, we consider it to
be named entity evolution.</p>
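        <p>The four cases of the formal definition form a small decision procedure, which the following Python sketch restates. The boolean inputs are assumed to come from some upstream detection step that is not shown, and all names (Evolution, classify and the parameters) are hypothetical.</p>
        <preformat>
```python
from enum import Enum


class Evolution(Enum):
    NONE = "no evolution"
    WORD_SENSE = "word sense evolution"
    WORD_SENSE_OUTDATED = "word sense evolution (outdated sense)"
    TERM_TO_TERM = "term to term evolution"


def classify(word_in_use, same_sense=False, sense_active=False):
    """Map the state of a word w at time t_now to a type of evolution.

    word_in_use:  is w still in use at t_now?
    same_sense:   if in use, does w still carry the sense s_w it had at t_i?
    sense_active: if out of use, is s_w still expressed by some other word?
    """
    if word_in_use:
        return Evolution.NONE if same_sense else Evolution.WORD_SENSE
    return Evolution.TERM_TO_TERM if sense_active else Evolution.WORD_SENSE_OUTDATED


# "gay" (1787 quote): still in use, but with a different primary sense.
print(classify(word_in_use=True, same_sense=False))
# "Stalingrad": name out of use, the city itself is now called Volgograd.
print(classify(word_in_use=False, sense_active=True))
```
        </preformat>
        <p>The hard part in practice is obtaining the three inputs automatically; the classification itself follows directly from the definitions.</p>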
        <p>In addition to the above types of evolution, there are also spelling variations that
can affect digital archives: historical variations with different spellings for the
same word or modern variations in the form of, e.g., abbreviations and symbols.
Spelling variations are not considered in this paper.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Historical Data vs. Modern Data –</title>
    </sec>
    <sec id="sec-4">
      <title>Old Content vs. New Content</title>
      <p>When working with language evolution from a computational point of view there
are two main perspectives available. The first considers today as the point of
reference and searches for all types of language evolution that have occurred
until today. In this perspective the language that we have today is considered
common knowledge, and understanding past language and knowledge is the
primary goal.</p>
      <p>In the second perspective the goal is to prepare today's language and
knowledge for interpretation in the future. We monitor the language for changes and
incrementally map each change to what we know today. We can assume that
knowledge banks and language resources are available and all new changes are
added to the resources. In the next paragraphs we will discuss the differences
between the two perspectives, and their effect on digital archives, in more
detail.</p>
      <p><bold>Looking to the Past – The Backward Perspective.</bold>
When looking to the past we assume the following scenario. A
user is accessing a long-term archive and wants to be able to find and interpret
information from the past. There are several problems which the user must face.
Firstly, there are few or no machine readable dictionaries or other resources like
Wikipedia which sufficiently cover language of the past. The user must rely on
his or her own knowledge or search extensively in other resources like
encyclopedias or the Web in order to find an appropriate reformulation for modern words.
Once the resource is found, the user must repeat the process to find the meanings
of words, phrases and names in the document. Because of the low coverage of
the past, the user can find only a limited amount of help in this process.</p>
      <p>In order to help users in their research of the past we need to automatically
find and handle language evolution. This can be done by making use of existing
algorithms and tools or by developing new ones. For both existing and new tools
there are severe limitations caused by the lack of digital, high quality,
long-term collections. Most existing tools have been designed and trained on modern
collections and can have difficulty with problems caused by language evolution.
For example, part-of-speech tagging, lemmatization and entity recognition can be
affected by the age of the collection and thus limit the accuracy and coverage of
language evolution detection, which relies on these technologies.</p>
      <p>Much work is currently being done to overcome this lack of resources by
digitizing historical documents by means of optical character recognition (OCR).
However, many older collections have been stored for a long time, which leads to
less than perfect quality of the resulting text. Degraded paper, wear or damage
as well as old fonts cause errors in the OCR process. This leads to problems in
the processing, for example in detecting word boundaries or recognizing characters,
as well as in verifying the results. If words cannot be understood by humans, then
the correctness of the algorithms cannot be judged. Because of the historical
nature of the language, it is also difficult to find people who are qualified to
verify, improve or help detect language evolution on such collections.</p>
      <p><bold>Looking to the Future – The Forward Perspective.</bold>
When looking to the future to find language evolution we have many advantages
compared to when looking to the past. The largest advantage is that most
resources are born digital today and thus many of the problems with degraded
paper quality and OCR errors are avoided. In addition, there is an abundance
of available data. Most concepts, senses and entities are described and
referenced over and over again, which makes it easier to gather evidence for each one
individually.</p>
      <p>In addition to the larger amount and higher quality of the text, there are plenty of tools
and resources available that can solve many smaller tasks automatically. Natural
language processing tools, machine readable dictionaries, and encyclopedias form
an army of resources which can be used to tackle current language. Changes in
our world are captured in resources like Wikipedia, and questions like "What is the
new name of the city XYZ?" can be answered using machine readable resources
like Yago [SKW07] or DBpedia [BLK+09]. To prevent information loss in the
future, resources like Wikipedia, WordNet and natural language processing tools
can be stored alongside the archives. This can significantly simplify finding and
verifying language evolution in the future.
In the perspective of looking to the future we assume that current language is
common knowledge and therefore we can employ humans to help detect
language evolution. Crowd sourcing [How06] is collaborative work performed by
large numbers of people and is the mechanism behind creating and maintaining
Wikipedia. Such mechanisms could be used to monitor language and detect
evolution. If models for representing and storing language evolution are provided,
crowd sourcing could be used to detect language evolution manually or to verify
automatically detected language evolution. It is important to note that crowd
sourcing is time sensitive and must be done together with the data harvesting,
before the crowd forgets.</p>
      <p>There are, however, several limitations. The first limitation is noisy data being
published on the Web. With increasing amounts of user generated text and a lack
of editorial control, there are increasing problems with grammar, misspellings,
abbreviations, etc. To what extent this can be considered real noise, as with
OCR errors, is debatable; however, it is clear that this noise reduces the
efficiency of tools and algorithms available today. This in turn limits the quality
of evolution detection as we depend on existing tools and their efficiency. The
second limitation is the restricted nature of resources like Wikipedia. As with
dictionaries, Wikipedia does not cover all entities, events and words that exist.
Instead, much is left out or only mentioned briefly, which limits the extent to which
we can depend exclusively on these resources.</p>
      <p>To spare future generations the problems that we face today,
we need to start thinking about these problems now, in particular
for Web archives that are continuously created and updated and that contain ephemeral
words, expressions and concepts. Otherwise we risk rendering a large portion
of our archives semantically inaccessible and failing to utilize the great power of
crowd sourcing.</p>
    </sec>
    <sec id="sec-5">
      <title>State-of-the-art</title>
      <p><bold>Word Sense Evolution.</bold> Automatic detection of changes and variations in word
senses over time is a topic of increasing interest. During the past
years researchers have investigated different parts of the problem,
mainly in the field of computational linguistics.
[SKC09] presented work on finding narrowing and broadening of senses over
time by applying semantic density analysis. Their work provides an indication of
semantic change, unfortunately without clues as to what has changed, but can be
used as an initial warning system.</p>
      <p>The work presented by [LCM+12] aims to detect word senses that are novel
in a later corpus compared to an earlier one and uses LDA topics to represent
word senses. Overall, the method shows promising results for detecting novel (or
outdated) word senses by means of topic modeling. However, alignment of word
senses over time and relations between senses are not covered in this work.
[WY11] report on automatic tracking of word senses over time by clustering
topics. A change in the meaning of a term is assumed to correspond to a change in
cluster for the corresponding topic. A few different words are analyzed and there
are indications that the method works and can find periods when words change
their primary meaning. In general, the work in this paper is preliminary but
shows promising indications.</p>
      <p>Our previous work presented in [Tah13] was the first to automatically track
individual word senses over time to determine changes in the meanings of terms. We
found narrowing and broadening as well as slow shifts in meaning in individual
senses, and relations between senses over time like splitting, merging, polysemy
and homonymy. For most of the evaluated terms, the automatically extracted
results corresponded well to the expected main
evolution. However, word senses were not assigned to individual word instances,
which is necessary to help users understand individual documents.
In general, word sense disambiguation methods are not sufficient to solve the
problem of word sense evolution because these methods (1) often rely
on an existing set of word senses and (2) do not map word senses to each other
over time.</p>
      <p><bold>Named Entity Evolution.</bold> Previous work on automatic detection of named
entity evolution has been very limited. The interest has largely been from an
information retrieval (IR) point of view, as named entity evolution makes
finding relevant documents more challenging. Unfortunately, no effort has been put
towards scalable methods and presentation of evolution to users.
Query reformulation is proposed in [BBSW09], where the degree of relatedness
between two terms is measured by comparing co-occurring terms from different
time periods. The approach requires recurrent computation for each query as it
depends on a target time specified by the user and is not well suited for large
datasets.</p>
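      <p>The general idea of comparing co-occurring terms from different time periods can be illustrated with a short Python sketch. It uses Jaccard overlap of context terms as a stand-in relatedness measure, which is an assumption made for illustration and not the measure defined in [BBSW09]; the two toy corpora and all function names are likewise hypothetical.</p>
      <preformat>
```python
from collections import Counter


def context_terms(sentences, target, top_k=5):
    """Most frequent terms that co-occur with `target` in the same sentence."""
    target = target.lower()
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        if target in tokens:
            counts.update(t for t in tokens if t != target)
    return {term for term, _count in counts.most_common(top_k)}


def relatedness(ctx_a, ctx_b):
    """Jaccard overlap of two context sets (0.0 when both are empty)."""
    union = ctx_a | ctx_b
    return len(ctx_a.intersection(ctx_b)) / len(union) if union else 0.0


# Toy corpora for two time periods (invented text, for illustration only).
old_period = ["stalingrad city volga river", "battle stalingrad volga"]
new_period = ["volgograd city volga river", "volgograd volga bridge"]

old_ctx = context_terms(old_period, "Stalingrad")
new_ctx = context_terms(new_period, "Volgograd")
print(relatedness(old_ctx, new_ctx))
```
      </preformat>
      <p>Because the context sets depend on which periods are compared, a real system has to recompute them for every target time, which is the scalability concern noted above.</p>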
      <p>Semantically identical concepts (nouns) used at different time periods are
discovered using association rule mining in [KVB+10]. Entities are associated to
events (verbs) and linked across time via the event. The method could be used
for shorter time spans but is less suited for longer time spans as verbs are more
likely to change over time than nouns [Sag10].</p>
      <p>Time-based synonyms (i.e., named entity evolution) are found in [KN10] by
utilizing link anchor texts in Wikipedia articles. Unfortunately, link information,
such as anchor text, is rarely available in historical archives but might be well
suited for Web data.</p>
      <p>In our previous work [Tah13, TGK+12], we proposed NEER, an unsupervised
method for named entity evolution recognition that is independent of external
knowledge sources. Using burst detection we find change periods, i.e., periods with a high
likelihood of name change, and search exclusively in these periods for changes.
We avoid comparing terms from arbitrary time periods and thus overcome a
severe limitation of existing methods: the need to compare co-occurring terms or
associated events from different time periods. The method still needs to be adapted
to Web data and data streams to avoid re-computation.</p>
      <p>In addition to detecting evolution, it is necessary to store evolution and to utilize
it for finding and interpreting content at query time. Though there is some work on
indexing and retrieval, e.g., [ABBS12, BMRV11], few approaches target the particularities
of language evolution.</p>
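      <p>The notion of a change period found by burst detection, as used in NEER, can be illustrated with a deliberately simple Python sketch. It flags periods whose mention count exceeds the mean by k standard deviations; this threshold rule is a stand-in chosen for brevity and is not claimed to be the burst detection algorithm used in NEER. The yearly counts are invented numbers, and the function name and the parameter k are hypothetical.</p>
      <preformat>
```python
from statistics import mean, pstdev


def burst_periods(counts, k=1.5):
    """Return the periods whose count exceeds mean + k * standard deviation."""
    values = list(counts.values())
    threshold = mean(values) + k * pstdev(values)
    return [period for period, count in counts.items() if count > threshold]


# Toy yearly mention counts for one entity (invented numbers, illustration only).
mentions = {1958: 10, 1959: 12, 1960: 11, 1961: 60, 1962: 13, 1963: 9}
print(burst_periods(mentions))
```
      </preformat>
      <p>Restricting the search for name changes to the flagged periods is what avoids comparing terms from arbitrary pairs of time periods.</p>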
    </sec>
    <sec id="sec-6">
      <title>Conclusions and Outlook</title>
      <p>Language evolves over time. This leads to a gap between language known to the
user and language stored in digital archives. To ensure that content can be found
and semantically interpreted in our digital archive, we must consider semantic
preservation and prepare our archives for future processing and long-term
storage. Automatic detection of language evolution is a first step towards offering
semantic access; however, several other measures need to be taken. Dictionaries,
natural language processing tools and other resources must be stored alongside
each archive to help processing in the future. Data structures and indexes that
respect temporal evolution are needed to utilize language evolution for searching,
browsing and understanding of content. To take full advantage of continuously
updated archives that do not require expensive, full re-computation with each
update, we must invest effort into transforming our digital archives into living
archives that continuously learn changes in language.</p>
      <p>There are methods for automatically finding language evolution; however, these
are at an early stage and have little focus on scalability. Effort needs to be invested into
finding large scale methods that provide high quality evolution detection. In
addition, the possibility of using crowd sourcing to improve detection of
language evolution should be investigated. Studies are needed to establish where
and in which format human input is most beneficial, in particular when the
input comes from a crowd without explicit domain expertise. If crowd
sourcing solutions are to be employed, the processing must take place at the
time of archiving, before the crowd forgets recent changes in the
language.</p>
      <p>To make the most out of our digital archives, language evolution must be given
a cultural dimension. For example, the term travel has had the same overall
meaning over time; transporting from location A to location B. However, this
does not tell the full story of the word or the concept represented by the word.
Today travel is mostly for business or as a happy occasion for holidays, without
any substantial risks involved. In the past, traveling contained great dangers
and was done at the risk of life. This inherent meaning of a word should be
communicated to the user to allow for a full interpretation of language and to
entail all dimensions of our language and culture. One possible solution is the
use of images that can better capture and more easily convey culture.
In addition to viewing language as variant over time, language can be considered
variant over demographics. When archiving the Web we have the possibilities to
gather knowledge of many subcultures and parts of the world. By continuously
detecting language evolution, we can better determine what content to harvest
and store for the future to ensure diverse archives.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [ABBS12]
          <string-name>
            <given-names>Avishek</given-names>
            <surname>Anand</surname>
          </string-name>
          , Srikanta Bedathur, Klaus Berberich, and
          <string-name>
            <given-names>Ralf</given-names>
            <surname>Schenkel</surname>
          </string-name>
          .
          <article-title>Index maintenance for time-travel text search</article-title>
          .
          <source>In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval</source>
          ,
          <source>SIGIR '12</source>
          , pages
          <fpage>235</fpage>
          –
          <fpage>244</fpage>
          , New York, NY, USA,
          <year>2012</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [BBSW09]
          <string-name>
            <given-names>Klaus</given-names>
            <surname>Berberich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Srikanta J.</given-names>
            <surname>Bedathur</surname>
          </string-name>
          , Mauro Sozio, and
          <string-name>
            <given-names>Gerhard</given-names>
            <surname>Weikum</surname>
          </string-name>
          .
          <article-title>Bridging the Terminology Gap in Web Archive Search</article-title>
          .
          <source>In 12th Int. Workshop on the Web and Databases (WebDB'09)</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [BLK+09]
          <string-name>
            <given-names>Christian</given-names>
            <surname>Bizer</surname>
          </string-name>
          , Jens Lehmann, Georgi Kobilarov, Sören Auer, Christian Becker, Richard Cyganiak, and
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Hellmann</surname>
          </string-name>
          .
          <article-title>DBpedia - A crystallization point for the Web of Data</article-title>
          .
          <source>Journal of Web Semantics</source>
          ,
          <volume>7</volume>
          (
          <issue>3</issue>
          ):
          <fpage>154</fpage>
          –
          <fpage>165</fpage>
          ,
          <year>September 2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [BMRV11]
          <string-name>
            <given-names>Siarhei</given-names>
            <surname>Bykau</surname>
          </string-name>
          , John Mylopoulos, Flavio Rizzolo, and
          <string-name>
            <given-names>Yannis</given-names>
            <surname>Velegrakis</surname>
          </string-name>
          .
          <article-title>Supporting queries spanning across phases of evolving artifacts using steiner forests</article-title>
          .
          <source>In Proceedings of the 20th ACM international conference on Information and knowledge management</source>
          ,
          <source>CIKM '11</source>
          , pages
          <fpage>1649</fpage>
          –
          <fpage>1658</fpage>
          , New York, NY, USA,
          <year>2011</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [How06]
          <string-name>
            <given-names>Jeff</given-names>
            <surname>Howe</surname>
          </string-name>
          .
          <article-title>The Rise of Crowdsourcing</article-title>
          .
          <source>Wired Magazine</source>
          ,
          <volume>14</volume>
          (
          <issue>6</issue>
          ),
          <year>06 2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [KN10]
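          <string-name>
            <given-names>Nattiya</given-names>
            <surname>Kanhabua</surname>
          </string-name>
          and
          <string-name>
            <given-names>Kjetil</given-names>
            <surname>Nørvåg</surname>
          </string-name>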
          .
          <article-title>Exploiting time-based synonyms in searching document archives</article-title>
          .
          <source>In Joint Conference on Digital Libraries (JCDL'10)</source>
          , pages
          <fpage>79</fpage>
          –
          <fpage>88</fpage>
          ,
          Australia
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [KVB+10]
          <string-name>
            <given-names>Amal Chaminda</given-names>
            <surname>Kaluarachchi</surname>
          </string-name>
          , Aparna S. Varde,
          <string-name>
            <given-names>Srikanta J.</given-names>
            <surname>Bedathur</surname>
          </string-name>
          , Gerhard Weikum, Jing Peng, and
          <string-name>
            <given-names>Anna</given-names>
            <surname>Feldman</surname>
          </string-name>
          .
          <article-title>Incorporating terminology evolution for query translation in text retrieval with association rules</article-title>
          .
          In
          <source>Proceedings of ACM Conf. on Information and Knowledge Management (CIKM'10)</source>
          , Canada, October 26-30, pages
          <fpage>1789</fpage>
          –
          <lpage>1792</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
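          [LCM+12]
          <string-name>
            <given-names>Jey Han</given-names>
            <surname>Lau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Paul</given-names>
            <surname>Cook</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Diana</given-names>
            <surname>McCarthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>David</given-names>
            <surname>Newman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Timothy</given-names>
            <surname>Baldwin</surname>
          </string-name>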
          .
          <article-title>Word Sense Induction for Novel Sense Detection</article-title>
          . In Walter Daelemans, Mirella Lapata, and Lluís Màrquez, editors,
          <source>EACL</source>
          , pages
          <fpage>591</fpage>
          –
          <lpage>601</lpage>
          . The Association for Computer Linguistics,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Sag10]
          <string-name>
            <given-names>Eyal</given-names>
            <surname>Sagi</surname>
          </string-name>
          .
          <article-title>Nouns are more stable than Verbs: Patterns of semantic change in 19th century English</article-title>
          .
          <source>The 32nd Annual Conference of the Cognitive Science Society</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [SKC09]
          <string-name>
            <given-names>Eyal</given-names>
            <surname>Sagi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Stefan</given-names>
            <surname>Kaufmann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Brady</given-names>
            <surname>Clark</surname>
          </string-name>
          .
          <article-title>Semantic density analysis: comparing word meaning across time and phonetic space</article-title>
          .
          In
          <source>Proc. of the Workshop on Geometrical Models of Natural Language Semantics</source>
          , GEMS '09, pages
          <fpage>104</fpage>
          –
          <lpage>111</lpage>
          .
          <publisher-name>ACL</publisher-name>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
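          [SKW07]
          <string-name>
            <given-names>Fabian M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Gjergji</given-names>
            <surname>Kasneci</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Gerhard</given-names>
            <surname>Weikum</surname>
          </string-name>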
          .
          <article-title>Yago: a core of semantic knowledge</article-title>
          .
          In
          <source>Proceedings of the 16th international conference on World Wide Web, WWW '07</source>
          , pages
          <fpage>697</fpage>
          –
          <lpage>706</lpage>
          , New York, NY, USA,
          <year>2007</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Tah13]
          <string-name>
            <given-names>Nina</given-names>
            <surname>Tahmasebi</surname>
          </string-name>
          .
          <article-title>Models and Algorithms for Automatic Detection of Language Evolution. Towards Finding and Interpreting of Content in Long-Term Archives</article-title>
          .
          <source>PhD thesis</source>
          , Leibniz Universitat Hannover, To be published
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [TGK+12]
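          <string-name>
            <given-names>Nina</given-names>
            <surname>Tahmasebi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Gerhard</given-names>
            <surname>Gossen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Nattiya</given-names>
            <surname>Kanhabua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Helge</given-names>
            <surname>Holzmann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Risse</surname>
          </string-name>
          .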
          <article-title>NEER: An Unsupervised Method for Named Entity Evolution Recognition</article-title>
          .
          In
          <source>Proceedings of COLING 2012</source>
          , pages
          <fpage>2553</fpage>
          –
          <lpage>2568</lpage>
          , Mumbai, India,
          <month>December</month>
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [TGR12]
          <string-name>
            <given-names>Nina</given-names>
            <surname>Tahmasebi</surname>
          </string-name>
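          ,
          <string-name>
            <given-names>Gerhard</given-names>
            <surname>Gossen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Risse</surname>
          </string-name>
          .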
          <article-title>Which Words Do You Remember? Temporal Properties of Language Use in Digital Archives</article-title>
          .
          In
          <source>TPDL</source>
          , volume
          <volume>7489</volume>
          , pages
          <fpage>32</fpage>
          –
          <lpage>37</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [The87]
          <source>The Times</source>
          .
          <article-title>Sestini's benefit last night at the Opera-House was overflowing with the fashionable and gay</article-title>
          . In London, England, Apr 27, 1787; pg. 3; Issue 736. Gale Doc. No.: CS50726043,
          <year>1787</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [TT42] Diplomatic Correspondent,
          <source>The Times</source>
          .
          <article-title>Menace To The Volga</article-title>
          . In London, England, Jul 17, 1942; pg. 3; Issue 49290. Gale Doc. No.: CS52116209,
          <year>1942</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>