<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>The case of InterCorp, a multilingual parallel corpus. In:
International Journal of Corpus Linguistics</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Corpus Methods and Semantic Fields: the Concept of Empire in English, Russian and Czech</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Natalya Semenova</string-name>
          <email>nathalja.v.semenova@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ekaterina Gvozdyova</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Svetlana Pivovarova</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Victor Zakharov</string-name>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Saint-Petersburg State University</institution>,
          <addr-line>Saint-Petersburg, Russian Federation</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>1997</year>
      </pub-date>
      <volume>17</volume>
      <issue>3</issue>
      <abstract>
        <p>The paper presents the results of ongoing research on building the semantic field of empire. A semantic field is a collection of words and word combinations that covers a certain area of human experience and forms a relatively autonomous microsystem with one or several centres. Relations within such microsystems are also called associations. We draw on data on syntagmatic collocability and on distributional analysis techniques to form a set of lexical units connected by systemic paradigmatic relations of various types and strengths. We have developed a methodology for filling a semantic field with lexical units, based on morphologically tagged corpora and the built-in statistical distributional analysis tools of Sketch Engine. The text material consists of our own corpora in the domain of empire. As part of the study, we have retrieved lists of items filling the semantic field of empire. Since our research focuses on the concept of empire in different languages, we also deal with translation equivalents in language pairs.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Languages tend to reflect what their native speakers think or know about the world. Such beliefs exist
not only in the mindset of an individual but are also characteristic of larger groups (families, nations, societies,
etc.), and together they form what is called a picture of the world. Language and the picture of the world are
inextricable: the picture of the world is described by means of language, which in turn creates a linguistic picture
of the world shaped by a nation’s culture and experience. Researchers are therefore interested in studying
linguocultural correlations between similar items within different linguistic pictures of the world.</p>
      <p>The linguistic picture of the world determines the most important features of each language: its vocabulary,
word formation and syntax [Weisgerber, 1993]. Studies of various linguistic pictures of the world contribute to
research on the native speaker’s mindset, but a proper study of the linguistic picture of the world and of linguistic
consciousness requires that they be presented in some tangible and researchable form.
Associative fields and thesauri might play this role [Ufimtseva, 2014].</p>
      <p>Associative fields are related to semantic fields, which can be defined as sets of language units (words and set
phrases) sharing a common meaning component. Groups of lexemes within a semantic field have both linguistic
and non-linguistic relations. Such links between the elements of a field are important because they reflect
dependencies between words in a language: words are not used in isolation, and taken together, lexemes form
the lexical-semantic system.</p>
    </sec>
    <sec id="sec-2">
      <title>Semantic Fields Description</title>
      <p>
        The term “field” was introduced by G. Ipsen to denote a group of words which forms a unit of meaning
[
        <xref ref-type="bibr" rid="ref3">Ipsen, 1924</xref>
        ]. The field theory of semantics (defined as the study of words with similar meaning) was developed
by W. Humboldt and H. Osthoff [
        <xref ref-type="bibr" rid="ref5">Shur, 1974</xref>
        ]. Semantic fields are considered to be semantic groups which consist
of lexemes and relations between them, such as synonymy, antonymy, hyponymy, meronymy, etc.
      </p>
      <p>
        A semantic field can be described by its features. Each field has several semantic properties, and its elements
are joined together by a common semantic property. V. G. Admoni argued that every semantic field has the
following structure: a nucleus, or core, and a periphery [
        <xref ref-type="bibr" rid="ref6">Admoni, 1973</xref>
        ]. All elements (lexemes) of a field are connected to its nucleus. The nucleus conveys the general meaning
of the field, so only lexemes with the simplest meaning components can form the nucleus, and the meaning
components of the nucleus must also be present in the other elements of the field. The further an element is
from the core, the more complex its meaning components are. Elements of the periphery usually combine
several simple meaning components and may therefore also be connected to neighbouring fields. The strength
of the connections between elements varies within a field and can be measured with the methods of
computational linguistics.
      </p>
      <p>
        Elements of a field have systemic syntagmatic (textual), paradigmatic (associative) and epigmatic
(lexically derived) relations [
        <xref ref-type="bibr" rid="ref7">Novikov, 2011</xref>
        ]. To place an element in the lexical system of a language and form a semantic field, all three types of
relations should, in principle, be determined. However, many studies suggest that the two main types (textual
and associative relations) are sufficient for this purpose. Forming a semantic field therefore involves two steps:
choosing the list of lexemes that will be the constituents of the field, and determining the relations between them.
      </p>
      <p>There are various ways of compiling a list of candidates. The most straightforward one is the monotonous
task of selecting items from all types of dictionaries (glossaries, semantic dictionaries, dictionaries of collocations,
etc.) that represent syntagmatic as well as paradigmatic relations. However, this method is time-consuming and
subjective, and it does not guarantee that the list of elements will be comprehensive. Alternatively, an
association experiment can be used, which might prove effective.</p>
      <p>We believe that the methods and approaches of computational linguistics provide the best way to form a
semantic field. All lexemes are represented in texts, which can be used to extract the necessary items. First,
we need to find or create a large set of texts (a corpus). The next step is to calculate the strength of the
connections between words using statistical measures, which provides enough data to extract the relevant texts
and lexemes. This methodology makes it possible to automate semantic field construction.</p>
    </sec>
    <sec id="sec-3">
      <title>Corpus-Based Semantic Research</title>
      <p>A corpus is a large collection of annotated and structured language data presented in electronic form and
designed for linguistic research [Zakharov, Bogdanova, 2019]. The number of corpus-based studies is growing
rapidly thanks to linguistic annotation within corpora and the large amount of language data available for
research. Moreover, corpora can easily be built for a specific purpose, which appeals to many scholars.</p>
      <p>As corpus linguistics developed and new, larger, semantically annotated corpora emerged, semantic
corpus-based studies gradually evolved as well. Arguing that semantics should be studied using corpora,
A. Kilgarriff stated that a word sense corresponds to a cluster of the word’s uses (citations) in a corpus
[Kilgarriff, 1997]. Corpora are widely used in quantitative semantics: word sense disambiguation often relies on
frequencies of words in context extracted from corpora, and the construction of a semantic space (semantic field)
relies on calculating semantic distances and determining relations between words within a corpus.</p>
      <p>Statistical scores can identify elements that tend to co-occur in corpora. Such co-occurring elements, which
form syntagmatic relations in texts, are called collocations. The strength of a collocation is calculated using
various statistical (association) measures implemented in modern corpus systems; the most widely known are
MI (mutual information), log-likelihood, t-score, minimum sensitivity and logDice.</p>
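      <p>The association measures listed above have standard definitions in terms of the co-occurrence frequency f_AB of a node word A and a collocate B, their individual frequencies f_A and f_B, and the corpus size N. The Python sketch below computes them for a single word pair. It is illustrative only: the function name and frequencies are invented, and real corpus managers (including Sketch Engine) may count co-occurrences within windows or grammatical relations, so their scores need not coincide with these textbook formulas.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def association_scores(f_ab: int, f_a: int, f_b: int, n: int) -> dict[str, float]:
    """Textbook association measures for a node A and a collocate B.

    f_ab: co-occurrence frequency of A and B
    f_a, f_b: frequencies of A and B
    n: corpus size in tokens
    """
    expected = f_a * f_b / n  # co-occurrences expected if A and B were independent
    mi = math.log2(f_ab / expected)
    t_score = (f_ab - expected) / math.sqrt(f_ab)
    log_dice = 14 + math.log2(2 * f_ab / (f_a + f_b))  # maximum value is 14
    min_sensitivity = min(f_ab / f_a, f_ab / f_b)

    # Log-likelihood (G2) over the 2x2 contingency table:
    # (A and B, A without B, B without A, neither).
    observed = (f_ab, f_a - f_ab, f_b - f_ab, n - f_a - f_b + f_ab)
    expected_cells = (
        f_a * f_b / n,
        f_a * (n - f_b) / n,
        (n - f_a) * f_b / n,
        (n - f_a) * (n - f_b) / n,
    )
    log_likelihood = 2 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected_cells) if o > 0
    )
    return {
        "MI": mi,
        "t-score": t_score,
        "logDice": log_dice,
        "minimum sensitivity": min_sensitivity,
        "log-likelihood": log_likelihood,
    }


if __name__ == "__main__":
    # Invented frequencies: node 1,000 times, collocate 500 times,
    # together 30 times, in a 1,000,000-token corpus.
    for name, value in association_scores(30, 1_000, 500, 1_000_000).items():
        print(f"{name}: {value:.3f}")
```
]]></preformat>
      <p>MI rewards rare but exclusive pairs, t-score and log-likelihood favour frequent pairs, and logDice, which depends only on f_AB, f_A and f_B, does not change with corpus size; this is one reason it is convenient for comparing corpora of different sizes.</p>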
      <p>Paradigmatic relations describe semantically similar elements: semantic similarity is based on the agreement
between the lexical neighbourhoods of words [Ruge, 1992]. Therefore, to establish paradigmatic relations, we
need to build a co-occurrence vector for each word and compute the similarity between these vectors. Comparing
two vectors amounts to measuring the distance between them: the smaller the distance, the closer the words
are in meaning.</p>
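      <p>A minimal Python sketch of this idea follows. It assumes plain window-based co-occurrence counts and uses cosine similarity, one common way of comparing such vectors (it is not prescribed by the sources cited above); the window size, function names and toy sentence are hypothetical, and real systems usually weight the counts, for example with an association score.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter


def context_vector(tokens: list[str], target: str, window: int = 2) -> Counter:
    """Count the words occurring within `window` positions of each occurrence of `target`."""
    vector: Counter = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        start, end = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                vector[tokens[j]] += 1
    return vector


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 means no shared context)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    tokens = "the emperor ruled the empire . the king ruled the kingdom .".split()
    emperor = context_vector(tokens, "emperor", window=1)
    king = context_vector(tokens, "king", window=1)
    print(cosine(emperor, king))  # identical one-word contexts give a cosine close to 1
```
]]></preformat>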
      <p>Paradigmatic relations can also be derived from syntagmatic ones: two words are considered
paradigmatically related if each of them is systematically connected to the same third text element.</p>
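      <p>The “shared third element” criterion can be sketched in Python as follows. The sketch assumes that each word has already been assigned a set of significant collocates (for instance, those above some association-score threshold); the function names, the min_shared threshold and the collocate sets are hypothetical, and Jaccard overlap is used here only as a simple illustrative measure of how much two words share.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from itertools import combinations


def paradigmatic_pairs(
    collocates: dict[str, set[str]], min_shared: int = 2
) -> list[tuple[str, str, list[str]]]:
    """Pair words that share at least `min_shared` significant collocates.

    `collocates` maps each word to the set of words it is strongly
    (syntagmatically) associated with.
    """
    pairs = []
    for w1, w2 in combinations(sorted(collocates), 2):
        shared = collocates[w1] & collocates[w2]
        if len(shared) >= min_shared:
            pairs.append((w1, w2, sorted(shared)))
    return pairs


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of common collocates among all collocates of the two words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Hypothetical collocate sets, not taken from the corpora described in the text.
    collocates = {
        "emperor": {"crown", "throne", "reign"},
        "king": {"crown", "throne", "court"},
        "fleet": {"sail", "ship"},
    }
    print(paradigmatic_pairs(collocates))  # [('emperor', 'king', ['crown', 'throne'])]
    print(jaccard(collocates["emperor"], collocates["king"]))  # 0.5
```
]]></preformat>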
      <p>Another way to compute paradigmatic relations is statistical distributional analysis, a set of algorithms for
describing a language solely on the basis of the distribution of elements within texts [Yartseva, 1976]; it
provides a clear indication of functional and semantic relations between field elements. Algorithms of statistical
distributional analysis are implemented in many corpus systems (e.g., Sketch Engine, Fig. 1).</p>
    </sec>
    <sec id="sec-4">
      <title>Formation of Semantic Fields Using Corpus Managers</title>
      <p>A corpus manager is a corpus analysis tool for searching and extracting data. The most widely known
are Sketch Engine, NoSketch Engine, AntConc, MonoConc, etc. Corpus managers allow users to perform
the statistical analysis needed for term extraction (the extracted terms form a semantic field) and for identifying
semantic relations between the terms. For a detailed description of the tools of corpus managers, refer to their
documentation [Sketch Engine documentation], [AntConc Software].</p>
      <p>In this research, we used Sketch Engine, a corpus analysis tool that allows users to create corpora by
uploading files or by downloading content from the web using WebBootCaT technology. Sketch Engine’s built-in
tools deal with lexical-semantic patterns (word sketches), statistical analysis, distributional thesauri, clustering,
keyword extraction, etc. The goal of our study, forming a semantic field, implies identifying syntagmatic
and paradigmatic relations between corpus items. For this purpose we use the following Sketch Engine tools:
“Collocations” for syntagmatic relations (using association measures) and “Thesaurus” for paradigmatic ones.</p>
      <p>The “Thesaurus” tool in Sketch Engine retrieves words whose distribution is similar to that of a given
word, which as a rule reflects their semantic proximity; in effect, the tool builds a single-term (uniterm)
semantic field. Distributional similarity is calculated statistically using the logDice association measure
[Rychlý, 2008] and lexical-syntactic patterns [Kilgarriff, Rychlý, 2007]. At the next step, the most typical word
co-occurrences identified with the “Collocations” tool are added to the semantic field.</p>
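      <p>The following Python sketch shows how a distributional thesaurus of this kind can be computed from word sketches. It assumes that each word’s contexts are (grammatical relation, collocate) pairs weighted by logDice, and it uses a similarity of the form described for the Sketch Engine thesaurus in [Kilgarriff, Rychlý, 2007] and the Sketch Engine documentation: each shared context contributes the sum of both scores minus a penalty (AS1 − AS2)²/50 for their difference, and the total is normalised by the score mass of both words. The exact formula and constant should be checked against the documentation; the function names, relation names and scores are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# A word's contexts: (grammatical relation, collocate) -> logDice score.
Contexts = dict[tuple[str, str], float]


def thesaurus_similarity(ctx1: Contexts, ctx2: Contexts) -> float:
    """Similarity of two words from their weighted contexts (1.0 for identical contexts)."""
    shared = ctx1.keys() & ctx2.keys()
    # A shared context counts fully when both words have similar scores;
    # a large score difference is penalised by (AS1 - AS2)^2 / 50.
    numerator = sum(ctx1[k] + ctx2[k] - (ctx1[k] - ctx2[k]) ** 2 / 50 for k in shared)
    denominator = sum(ctx1.values()) + sum(ctx2.values())
    return numerator / denominator if denominator else 0.0


def rank_neighbours(
    target: str, sketches: dict[str, Contexts], top_n: int = 15
) -> list[tuple[str, float]]:
    """Return the top_n words most similar to `target`, i.e. a uniterm semantic field."""
    scores = [
        (word, thesaurus_similarity(sketches[target], ctx))
        for word, ctx in sketches.items()
        if word != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


if __name__ == "__main__":
    sketches = {
        "empire": {("modifier", "roman"): 10.0, ("object_of", "rule"): 8.0},
        "kingdom": {("object_of", "rule"): 7.0, ("modifier", "ancient"): 6.0},
        "fleet": {("modifier", "naval"): 9.0},
    }
    print(rank_neighbours("empire", sketches))
```
]]></preformat>
      <p>The default top_n of 15 mirrors the limit on thesaurus size used for the Czech data in this study.</p>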
      <sec id="sec-4-1">
        <title>Material and Research Methodology</title>
        <p>As this study deals with three languages (Russian, English and Czech) and their linguistic pictures of the
world, we needed a concept equally significant for each of these linguocultures. The concept of empire seems
to suit our purposes. We undertook the following tasks: automatic formation and comparative analysis of the
semantic field of empire for the three languages, and compilation of a thesaurus for these semantic fields
specifying quantitative features and examples from the corpora.</p>
        <p>Our research was conducted on English, Russian and Czech text corpora in the domain of empire (the
Russian corpus was created earlier in collaboration with M. Khokhlova). The Russian and English text corpora
each consist of four subcorpora, each covering a different period: the XVIIIth century, the first half of the XIXth
century, the second half of the XIXth century, and the XXth century.</p>
        <p>Our methodology relies on the “Thesaurus” tool of Sketch Engine: we extract a ranked list for each
subcorpus and then retrieve the list of lexemes that are present in all thesauri. A detailed description of our
methodology for semantic field formation can be found in [Zakharov, 2018]. Following these steps, we obtained
summaries of the distributional thesauri for Russian and English (Tables 1 and 2).</p>
        <p>Due to the lack of Czech historical corpora, we could not conduct a diachronic study for Czech. The
research was carried out on corpora of modern texts: syn v7 (all synchronic written corpora of the Czech
National Corpus) and csTenTen 2017 (the Sketch Engine Czech web corpus).</p>
        <p>At the first stage, various lexicographic sources were used to describe the concept of empire in terms of
keywords. Having analysed definitions of empire in various Czech dictionaries, we identified the main meanings
and corresponding semantic attributes of the concept of empire:
1. monarchy, headed by the emperor;
2. large state, consisting of several parts, possibly colonies;
3. metaphoric meanings derived from one of the first two (e.g. a large enterprise, parts of the natural world,
etc.).</p>
        <p>In our analysis, we deal only with the vocabulary related to the first meaning.</p>
        <p>We carried out a definitional analysis of explanatory dictionaries and dictionaries of synonyms and
identified elementary units of the content plane (elementary meaning units). In doing so, we sought to make
these terms monosemous.</p>
        <p>Lexical identifiers of the concept of empire in Czech are as follows: císař (emperor), císařství (empire),
dynastie (dynasty), impérium (empire), král (king), mocnářství (monarchy), monarchie (monarchy), panovník
(ruler), říše (empire), vládce (ruler).</p>
        <p>Then, for each of these 10 identifiers, a distributional thesaurus was built in Sketch Engine based on the
csTenTen 2017 corpus (Fig. 2). To avoid retrieving irrelevant vocabulary, the size of each distributional
thesaurus was limited to 15 items.</p>
        <p>At the next stage, all 10 thesauri were merged into one dataset, and the average score was calculated for
each term. We made the following empirical assumption: if a lexeme occurs in at least N thesauri (we call N the
stability coefficient), it is a candidate for inclusion in the core of the semantic field. Lexemes that occur in fewer
than N thesauri form its periphery. Within both the core and the periphery, lexemes can be ranked by their
score.</p>
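      <p>The core/periphery split described above can be sketched in Python as follows. The sketch assumes that each thesaurus is a list of (lexeme, score) pairs and that the average score of a lexeme is taken over the thesauri in which it actually occurs, which is our reading rather than a documented detail. The optional always_core argument mirrors the Czech case, where three dictionary identifiers were kept in the core despite occurring fewer than N times. The function name and the data are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def build_field(
    thesauri: dict[str, list[tuple[str, float]]],
    n: int,
    always_core: tuple[str, ...] = (),
) -> tuple[list[str], list[str], dict[str, int]]:
    """Split the lexemes of several distributional thesauri into core and periphery.

    thesauri: identifier -> list of (lexeme, similarity score) pairs
    n: stability coefficient (minimum number of thesauri a core lexeme must occur in)
    always_core: lexemes placed in the core regardless of n
    """
    occurrences: dict[str, int] = defaultdict(int)
    scores: dict[str, list[float]] = defaultdict(list)
    for entries in thesauri.values():
        for lexeme, score in entries:
            occurrences[lexeme] += 1
            scores[lexeme].append(score)

    # Average over the thesauri in which the lexeme occurs (an assumption).
    average = {lexeme: sum(s) / len(s) for lexeme, s in scores.items()}
    by_score = sorted(average, key=average.get, reverse=True)
    core = [lx for lx in by_score if occurrences[lx] >= n or lx in always_core]
    core_set = set(core)
    periphery = [lx for lx in by_score if lx not in core_set]
    return core, periphery, dict(occurrences)


if __name__ == "__main__":
    thesauri = {  # invented scores
        "císař": [("král", 0.40), ("panovník", 0.30), ("vládce", 0.20)],
        "král": [("císař", 0.50), ("panovník", 0.37), ("královna", 0.30)],
        "panovník": [("vládce", 0.45), ("král", 0.30), ("císař", 0.25)],
    }
    core, periphery, occurrences = build_field(thesauri, n=2)
    print("core:", core)
    print("periphery:", periphery)
```
]]></preformat>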
        <p>Further, for each element of the field core, the most typical bigram collocations were identified using the
ČNK syn v7 corpus and the “Collocations” tool. Bigrams were sorted by MI.log_f, one of the most effective
association measures.</p>
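      <p>MI.log_f multiplies MI by the logarithm of the co-occurrence frequency, which offsets MI’s tendency to favour rare pairs. The Python sketch below ranks bigram candidates in this way, assuming the definition MI.log_f = MI × ln(f_AB + 1) given in Sketch Engine’s documentation of its statistics. The node, collocates and frequencies are invented, and the scores produced by the corpus tools themselves may differ depending on how frequencies are counted.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def mi_log_f(f_ab: int, f_a: int, f_b: int, n: int) -> float:
    """MI weighted by log frequency: MI * ln(f_AB + 1)."""
    mi = math.log2(f_ab * n / (f_a * f_b))
    return mi * math.log(f_ab + 1)


def rank_bigrams(
    node_freq: int, candidates: list[tuple[str, int, int]], n: int
) -> list[tuple[str, float]]:
    """Sort (collocate, collocate frequency, bigram frequency) triples by MI.log_f."""
    scored = [(coll, mi_log_f(f_ab, node_freq, f_b, n)) for coll, f_b, f_ab in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical frequencies for the node "císař" in a 100-million-token corpus.
    candidates = [("nový", 500_000, 300), ("římský", 20_000, 900), ("rakouský", 15_000, 400)]
    for collocate, score in rank_bigrams(5_000, candidates, 100_000_000):
        print(f"{collocate}\t{score:.2f}")
```
]]></preformat>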
      </sec>
      <sec id="sec-4-2">
        <title>Results</title>
        <sec id="sec-4-2-1">
          <title>Semantic field of empire in Russian</title>
          <p>The summary distributional thesaurus for the semantic field of empire in Russian includes 160 entries;
79 different words occur only once, and 33 different words occur in 2 or more minithesauri. We call these 33
words the nucleus (or core) of the semantic field. Their distribution across the subcorpora is as follows: XVIII: 8,
XIX-1: 24, XIX-2: 26, XX: 23. The full alphabetical list of the core lexemes is as follows:</p>
          <p>Англия (England), государственность (statehood), государство (state), держава (power),
Европа (Europe), император (emperor), искусство (art), история (history), культура (culture),
литература (literature), мир (world), монархия (monarchy), наука (science), нация (nation), общество (society),
община (community), политика (policy), правительство (government), просвещение (education),
революция (revolution), религия (religion), Рим (Rome), Россия (Russia), союз (union), страна (country),
традиция (tradition), учреждение (institution), философия (philosophy), Франция (France), христианство
(Christianity), царство (kingdom), церковь (church).</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>Semantic field of empire in English</title>
          <p>In this section, we present the results of semantic field formation for English. The full list of lexemes
includes 113 items, 46 of which occur only once.</p>
          <p>The alphabetical list of terms for the English semantic field includes the following lexemes: affair, Africa,
ally, America, arm, army, assembly, Austria, authority, body, Britain, camp, Canada, capital, church, city,
colony, commerce, community, conquest, Constantinople, constitution, corps, country, court, crown, dominion,
dominions, dynasty, East, emperor, Empire, enemy, England, Europe, family, fleet, force, fortune, France,
freedom, frontier, garrison, Gaul, Germany, government, group, happiness, history, house, India, industry,
interest, Ireland, island, Italy, king, kingdom, land, language, law, liberty, life, line, man, monarch, monarchy,
movement, name, nation, order, part, party, people, person, policy, population, position, possession, power,
prince, property, province, Prussia, question, race, religion, republic, Republic, revolution, right, Rome, rule,
Russia, service, settlement, ship, society, sovereign, Spain, state, States, subject, system, territory, throne,
time, town, trade, troop, war, work, world.</p>
          <p>Lexemes that occur in 2 or more thesauri form the core of the semantic field of empire. In English, the
core includes 67 lexemes, distributed across the subcorpora as follows: XVIII: 42, XIX-1: 12, XIX-2: 7, XX: 6.
Here is the full list of the core lexemes in alphabetical order:</p>
          <p>ally, army, Austria, authority, Britain, capital, church, city, colony, conquest, constitution, country,
court, crown, dominions, emperor, Empire, enemy, England, Europe, family, fleet, force, France, Germany,
government, history, interest, island, Italy, king, kingdom, land, law, liberty, life, line, man, monarch,
monarchy, name, nation, part, party, people, position, power, prince, province, race, religion, republic, Rome,
Russia, society, sovereign, Spain, state, system, territory, throne, town, trade, troop, war, work, world.</p>
        </sec>
        <sec id="sec-4-2-3">
          <title>Semantic field of empire in Czech</title>
          <p>The combined 10 thesauri (150 lexical units in total) yielded 88 unique lexemes, of which 13 occurred
3 or more times, 20 occurred twice and 55 occurred once. With the stability coefficient set to 3, 13 lexemes form
the core of the semantic field of empire for Czech. Interestingly, three of the original identifiers of the concept
of empire taken from Czech dictionaries (císařství, dynastie, mocnářství) occur in the combined distributional
thesaurus for Czech only twice. However, we included them in the core of the semantic field for Czech.</p>
          <p>The full list of the core of the semantic field of empire for Czech (in alphabetical order) is as follows:
císař (emperor), císařství (empire), dynastie (dynasty), generál (general), impérium (empire), kníže (prince),
král (king), královna (queen), království (kingdom), mocnářství (monarchy), monarchie (monarchy),
panovník (ruler), říše (empire), velitel (commander), vládce (ruler), vůdce (leader).</p>
          <p>The periphery of the field includes 72 lexemes.</p>
          <p>A list of bigram collocations (candidates for the semantic field of empire) can also be formed, but this
question is beyond the scope of this paper.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Translation Equivalents of Lexemes of the Semantic Field of Empire in Language Pairs</title>
      <p>We are also interested in how one and the same concept is translated into different languages, namely
English, Russian and Czech. This task can be accomplished with parallel corpora containing texts in different
languages but belonging to the same cultural and historical paradigm. Such an instrument can thus be used to
interpret our results from historical and cultural perspectives.</p>
      <sec id="sec-7-1">
        <title>Russian and English equivalents</title>
        <p>In this study, we set out to identify translation equivalents for the elements of the semantic field of
empire, with English as the source language and Russian as the target language.</p>
        <p>To this end, we compiled a test English-Russian parallel corpus of about 994,000 words. The corpus
currently comprises XVIIIth-century historical texts. LFAligner [LFAligner] was used for sentence alignment,
and word alignment was performed with GIZA++, the word alignment tool used in the Moses statistical
machine translation system [Moses].</p>
        <p>The Sketch Engine “Thesaurus” tool was used to create an English distributional thesaurus of 100 lexemes
for the word empire (Fig. 3).</p>
        <p>Each lexeme from the thesaurus was matched with a list of possible Russian translation equivalents
identified by GIZA++. Table 3 shows the Russian translation equivalents for the first ten elements of the English
distributional thesaurus for the word empire, together with each equivalent’s percentage of the total number of
Russian translation equivalents for that element (percentages are rounded). Erroneous translation equivalents
resulting from incorrect word alignment were discarded.</p>
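      <p>A minimal Python sketch of this step follows. It assumes that the word alignments have already been converted to the widely used “i-j” format (pairs of source and target token indices, as produced, for example, by the symmetrisation step of the Moses training pipeline) and that tokens are lemmatised. Mirroring the description of Table 3, each percentage is computed against all aligned equivalents of a lexeme, and equivalents judged erroneous are then dropped, so the remaining percentages need not add up to 100%. The function names and toy data are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter, defaultdict


def parse_alignment(line: str) -> list[tuple[int, int]]:
    """Parse 'i-j' links (source index, target index), e.g. '0-0 1-2 2-1'."""
    return [(int(i), int(j)) for i, j in (link.split("-") for link in line.split())]


def translation_counts(sentence_pairs, alignments) -> dict[str, Counter]:
    """Count, for each source word, the target words aligned to it."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for (source, target), line in zip(sentence_pairs, alignments):
        for i, j in parse_alignment(line):
            counts[source[i]][target[j]] += 1
    return counts


def equivalents(
    counts: dict[str, Counter], word: str, discard: frozenset = frozenset()
) -> list[tuple[str, int]]:
    """Equivalents of `word` with their rounded share (%) of all its aligned equivalents."""
    aligned = counts.get(word, Counter())
    total = sum(aligned.values())
    return [
        (tgt, round(100 * c / total))
        for tgt, c in aligned.most_common()
        if tgt not in discard  # e.g. equivalents caused by alignment errors
    ]


if __name__ == "__main__":
    pairs = [
        (["the", "monarchy", "fell"], ["монархия", "пала"]),
        (["a", "monarchy"], ["империя"]),
        (["the", "monarchy"], ["монархия"]),
        (["monarchy", "rules"], ["монархия", "правит"]),
    ]
    links = ["1-0 2-1", "1-0", "1-0", "0-0 1-1"]
    print(equivalents(translation_counts(pairs, links), "monarchy"))
```
]]></preformat>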
        <p>These results can be further interpreted using the “Parallel concordance” tool in Sketch Engine, which
shows translation equivalents in context (Fig. 4).</p>
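        <table-wrap>
          <label>Table 3</label>
          <caption>
            <p>Russian translation equivalents for the first ten elements of the English distributional thesaurus for the word empire and their percentage of the total number of Russian translation equivalents for the element (rounded)</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>English distributional thesaurus lexeme</th>
                <th>Lexeme frequency in the test corpus</th>
                <th>Russian translation equivalent</th>
                <th>Percentage of the equivalent</th>
              </tr>
            </thead>
            <tbody>
              <tr><td rowspan="5">monarchy</td><td rowspan="5">131</td><td>монархия (‘monarchy’)</td><td>75%</td></tr>
              <tr><td>империя (‘empire’)</td><td>10%</td></tr>
              <tr><td>владычество (‘dominion’)</td><td>3%</td></tr>
              <tr><td>держава (‘power’)</td><td>1%</td></tr>
              <tr><td>династия (‘dynasty’)</td><td>1%</td></tr>
              <tr><td rowspan="2">church</td><td rowspan="2">553</td><td>церковь (‘church’)</td><td>80%</td></tr>
              <tr><td>храм (‘temple’)</td><td>2%</td></tr>
              <tr><td rowspan="5">province</td><td rowspan="5">399</td><td>провинция (‘province’)</td><td>88%</td></tr>
              <tr><td>владение (‘domain’)</td><td>3%</td></tr>
              <tr><td>местность (‘region’)</td><td>2%</td></tr>
              <tr><td>страна (‘country’)</td><td>1%</td></tr>
              <tr><td>сфера (‘realm’)</td><td>1%</td></tr>
              <tr><td rowspan="3">city</td><td rowspan="3">864</td><td>город (‘city’)</td><td>87%</td></tr>
              <tr><td>столица (‘capital’)</td><td>7%</td></tr>
              <tr><td>горожане (‘city-dwellers’)</td><td>1%</td></tr>
              <tr><td>monarch</td><td>188</td><td>монарх (‘monarch’)</td><td>92%</td></tr>
              <tr><td rowspan="6">kingdom</td><td rowspan="6">275</td><td>королевство (‘kingdom’)</td><td>31%</td></tr>
              <tr><td>царство (‘realm’)</td><td>21%</td></tr>
              <tr><td>владение (‘domain’)</td><td>19%</td></tr>
              <tr><td>государство (‘state’)</td><td>10%</td></tr>
              <tr><td>страна (‘country’)</td><td>6%</td></tr>
              <tr><td>владычество (‘dominion’)</td><td>3%</td></tr>
              <tr><td>East</td><td>307</td><td>Восток (‘East’)</td><td>48%</td></tr>
              <tr><td rowspan="6">prince</td><td rowspan="6">688</td><td>монарх (‘monarch’)</td><td>49%</td></tr>
              <tr><td>князь (‘duke/prince’)</td><td>31%</td></tr>
              <tr><td>принц (‘prince’)</td><td>11%</td></tr>
              <tr><td>владетель (‘owner’)</td><td>4%</td></tr>
              <tr><td>государь (‘sovereign’)</td><td>2%</td></tr>
              <tr><td>император (‘emperor’)</td><td>1%</td></tr>
              <tr><td rowspan="6">power</td><td rowspan="6">688</td><td>власть (‘power’)</td><td>49%</td></tr>
              <tr><td>могущество (‘might’)</td><td>27%</td></tr>
              <tr><td>права (‘rights’)</td><td>4%</td></tr>
              <tr><td>полномочия (‘authority’)</td><td>3%</td></tr>
              <tr><td>держава (‘power’)</td><td>3%</td></tr>
              <tr><td>влияние (‘influence’)</td><td>3%</td></tr>
              <tr><td rowspan="2">world</td><td rowspan="2">265</td><td>мир (‘world’)</td><td>72%</td></tr>
              <tr><td>свет (‘world/society’)</td><td>1%</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Comparing the structure of the semantic field of empire in Russian and Czech, we made the following
observations. If we temporarily exclude from consideration the lexemes that mean roughly the same in Russian
and Czech, as well as the lexemes present in only one of the semantic fields (государственность (statehood),
папа (pope), князь (prince), цивилизация (civilization), generál (general), velitel (commander)), we can see
that the remaining lexemes belong to two microfields.</p>
        <sec id="sec-7-1-1">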
          <p>The first microfield contains different names for the concept of empire: in Russian, империя (empire),
царство (kingdom), держава (power) and, partly, монархия (monarchy); in Czech, impérium (empire),
říše (empire), království (kingdom), císařství (empire), mocnářství (monarchy) and, partly, monarchie (monarchy).
The second microfield contains different names for the concept of “emperor”: in Russian, монарх
(monarch), правитель (ruler), царь (tsar), владыка (ruler), государь (sovereign), император (emperor),
императрица (empress); in Czech, panovník (ruler), vládce (ruler), císař (emperor), král (king), královna
(queen).</p>
          <p>The last stage of our research deals with interlanguage equivalents. A preliminary assessment was carried
out on the basis of the two-volume dictionaries edited by L. V. Kopecký (Russian-Czech and Czech-Russian). The
dictionary equivalents are given in the left column of Table 4. Translation dictionaries, however, do not show
how likely each equivalent is to be used.</p>
          <p>It is interesting to see which equivalents prevail when the same concept is translated, and why. For
example, the Czech říše can be translated into Russian as империя (empire), королевство (kingdom),
царство (kingdom), рейх (Reich) or Германия (Germany). The Russian империя (empire) can be translated into
Czech as impérium, říše, císařství, država, etc. The same applies to other terms.</p>
          <p>Using the terms from our semantic field as an example, we attempted to evaluate these equivalents with
the InterCorp parallel corpus, which is part of the ČNK [Čermák, Rosen, 2012]. ČNK developers built the Treq tool
on top of InterCorp [Škrabal, Vavřín, 2017]; it retrieves all translations of a given word together with statistics
on the frequency of the translation equivalents found in the corpus.</p>
          <p>The results obtained (translations from Czech into Russian) are shown in Table 4. The left column contains
a word in the input language together with its dictionary translation, and the top row contains words of the
output language (translations). The cells show quantitative characteristics of the translation equivalents: the
upper number is the number of translations for a given pair of words found in InterCorp, and the lower number
is the percentage of this translation among all translations of the word (rounded). Rare and erroneous cases are
not included, so the percentages do not always add up to 100%. The most frequent translations are highlighted
in bold.</p>
          <p>Table 4 (left column, input words and their dictionary translations): říše (империя, царство); impérium
(империя); království (королевство); císařství (империя); mocnářství (монархия); država (владение);
carství (царство); monarchie (монархия).</p>
          <p>Conclusion</p>
          <p>Dictionaries usually give only the main translation, which as a rule is also the most frequent one in the
corpus. Real texts, however, contain more translation equivalents (see, for example, the translations of říše),
and the corpus also shows their relative frequencies.</p>
          <p>We can see that text corpora and “smart” corpus tools can be used to identify syntagmatic and
paradigmatic relations automatically and to populate a term system properly. In our research, we attempted
to form the semantic field of empire, and the retrieved word lists considerably expand the available lexicographic
resources.</p>
          <p>The lexemes extracted with Sketch Engine form and adequately describe the semantic field of empire and
could successfully complement data from other sources. For instance, the list of empire-related lexemes in the
WordNet thesaurus contains only a few items from our semantic fields and could therefore be expanded.</p>
          <p>A comparative analysis of the semantic fields of empire for Russian, English and Czech reveals some
interesting patterns. The English semantic field of empire includes more lexemes related to military action
(army, conquest, enemy, fleet, force, troop, war, etc.) and to power and statehood (authority, capital, church,
city, constitution, court, crown, law, liberty, etc.).</p>
          <p>Another peculiarity concerns the distribution of lexemes within the core of the semantic fields. In Russian,
the core lexemes are spread fairly evenly across the XIXth- and XXth-century subcorpora. In English, however,
the core lexemes are found mainly in the XVIIIth-century subcorpus, which may be a consequence of the
unbalanced English corpus.</p>
          <p>
            Finally, the task of building even one small semantic field reveals the peculiarities of the lexico-semantic
system of a language, as well as the opportunities and barriers in the automation of semantic processing.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>This work was financially supported by the Russian Foundation for Basic Research, Project No.
18-01200474 “Semantic field of empire in Russian, English and Czech”.</p>
      <p>[Kilgarriff, 1997] Kilgarriff A. (1997) I don't believe in word senses. Computers and the Humanities 31(2),
1997. Pp. 91–113.</p>
      <p>[Ruge, 1992] Ruge G. (1992) Experiments on Linguistically Based Term Associations. Information
Processing and Management 28(3), 1992. Pp. 317–332.</p>
      <p>[Yartseva, 1976] Yartseva V. N. (ed.) (1976) Principles and Methods of Semantic Research. Moscow, Nauka,
1976. 380 p. (In Rus.) = Yartseva V. N. (pod red.) Printsipy i metody semanticheskikh issledovanii. M.: Nauka,
1976. 380 s.</p>
      <p>[Sketch Engine documentation] Sketch Engine documentation. (2019) Available at
https://www.sketchengine.eu/documentation/</p>
      <p>[AntConc Software] AntConc Software. (2019) Available at https://www.laurenceanthony.net/software/antconc/</p>
      <p>[Rychlý, 2008] Rychlý P. (2008) A lexicographer-friendly association score. In: Proceedings of Recent
Advances in Slavonic Natural Language Processing, RASLAN 2008. Brno, 2008. Pp. 6–9.</p>
      <p>[Kilgarriff, Rychlý, 2007] Kilgarriff A., Rychlý P. (2007) An efficient algorithm for building a distributional
thesaurus (and other Sketch Engine developments). In: Proceedings of the 45th Annual Meeting of the ACL on
Interactive Poster and Demonstration Sessions. Czech Republic, June 2007. Pp. 41–44.</p>
      <p>[LFAligner] LFAligner. Available at https://sourceforge.net/p/aligner/wiki/Home/</p>
      <p>[Moses] Moses. (2019) Available at http://www.statmt.org/moses/index.php?n=Main.HomePage</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[Weisgerber, 1993] <string-name><surname>Weisgerber</surname> <given-names>L.</given-names></string-name> (<year>1993</year>) <source>Muttersprache und Geistesbildung</source>. Translation from German by <string-name><given-names>O. A.</given-names> <surname>Radchenko</surname></string-name>. Moscow, 1993. 170 p.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Ufimtseva, 2014]
          <string-name>
            <surname>Ufimtseva</surname>
            <given-names>N. V.</given-names>
          </string-name>
          (<year>2014</year>)
          <article-title>The Associative Dictionary as a Model of the Linguistic Picture of the World</article-title>.
          <source>Procedia - Social and Behavioral Sciences</source>
          <volume>154</volume>, 2014. Pp. <fpage>36</fpage>-<lpage>43</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Ipsen, 1924]
          <string-name>
            <surname>Ipsen</surname>
            <given-names>G.</given-names>
          </string-name>
          (<year>1924</year>)
          <article-title>The Ancient Orient and the Indo-Germans</article-title>.
          In: <source>Festschrift for W. Streitberg</source>. <publisher-loc>Heidelberg</publisher-loc>, 1924. Pp. <fpage>30</fpage>-<lpage>45</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Shur, 1974]
          <string-name>
            <surname>Shur</surname>
            <given-names>G. S.</given-names>
          </string-name>
          (<year>1974</year>)
          <article-title>Field theories in linguistics: a monograph</article-title>.
          Moscow, <publisher-name>Nauka</publisher-name>, 1974. 254 p. (In Rus.) = Shur G. S. Teorii polya v lingvistike: monografiya. M.: Nauka, 1974. 254 s.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Admoni, 1973]
          <string-name>
            <surname>Admoni</surname>
            <given-names>V. G.</given-names>
          </string-name>
          (<year>1973</year>)
          <article-title>Syntax of the modern German language: the system of relations and the system of construction</article-title>.
          Leningrad, <publisher-name>Nauka</publisher-name>, 1973. 366 p. (In Rus.) = Admoni V. G. Sintaksis sovremennogo nemetskogo iazyka. Sistema otnoshenii i sistema postroeniia. L.: Nauka, 1973. 366 s.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Novikov, 2011]
          <string-name>
            <surname>Novikov</surname>
            <given-names>A. L.</given-names>
          </string-name>
          (<year>2011</year>)
          <article-title>An Essay on Semantic Field</article-title>.
          <source>RUDN Journal of Language Studies, Semiotics and Semantics</source>. Moscow, 2011. Pp. <fpage>7</fpage>-<lpage>17</lpage>. (In Rus.) = Novikov A. L. Eskiz semanticheskogo polya. Vestnik Rossiiskogo universiteta druzhby narodov. Seriya: Teoriya yazyka. Semiotika. Semantika. M., 2011. S. 7-17.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Zakharov, Bogdanova, 2019]
          <string-name>
            <surname>Zakharov</surname>
            <given-names>V. P.</given-names>
          </string-name>,
          <string-name>
            <surname>Bogdanova</surname>
            <given-names>S. U.</given-names>
          </string-name>
          (<year>2019</year>)
          <article-title>Corpus Linguistics: Textbook for Students. 3rd edition, revised and extended</article-title>.
          <publisher-loc>Saint-Petersburg</publisher-loc>, 2019. 230 p. (In Rus.) = Zakharov V. P., Bogdanova S. U. Korpusnaya lingvistika: Uchebnik dlya studentov napravleniya Lingvistika i Pedagogicheskoe obrazovanie. 3-e izd., pererab. i dopoln. SPb., 2019. 230 s.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Zakharov, 2018]
          <string-name>
            <surname>Zakharov</surname>
            <given-names>V. P.</given-names>
          </string-name>
          (<year>2018</year>)
          <article-title>The Distributive and Statistical Analysis as a Tool to Automate the Formation of Semantic Fields (on the Example of the Linguocultural Concept of "Empire")</article-title>.
          <source>Proc. of Comp. Models in Language and Speech Workshop CMLS</source>, Vol. <volume>2</volume>, 2018. Pp. <fpage>163</fpage>-<lpage>180</lpage>. (In Rus.) = Zakharov V. P. Distributivno-statistichesky analyz kak instrument avtomatizacii formirovanya semanticheskyh poley (na primere polya "imperya"). V XV Mezhd. konf. po komp. i kogn. lingv. TEL 2018. Sb. trudov, Tom 2, s. 163-180.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>