<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semi-automatic Description of Named Rivers and Bays for their Representation in a Terminological Knowledge Base</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pamela Faber</string-name>
          <email>pfaber@ugr.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Translation and Interpreting, University of Granada</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Juan Rojas-Garcia</institution>
        </aff>
      </contrib-group>
      <fpage>57</fpage>
      <lpage>70</lpage>
      <abstract>
        <p>EcoLexicon (http://ecolexicon.ugr.es) is a terminological knowledge base on environmental science, whose design permits the geographic contextualization of data. For the geographic contextualization of landform concepts, this paper presents a semi-automatic method for extracting terms associated with named rivers (e.g., Pearl River) and named bays (e.g., San Francisco Bay). Terms were extracted from an English specialized corpus on Coastal Engineering, where named rivers and bays were automatically identified. Statistical procedures were applied for selecting terms, rivers, and bays in distributional semantic models to construct the conceptual structures underlying the usage of named rivers and bays in Coastal Engineering texts. The rivers sharing associated terms were also automatically clustered and represented in the same conceptual network. The same was done for named bays sharing associated terms. The results showed that the method successfully described the semantic frames for named rivers and bays with explanatory adequacy, according to the premises of Frame-based Terminology. Furthermore, the semantic networks unveiled that the named rivers and bays mentioned in the Coastal Engineering corpus are both thematically related to sediment concentration and sediment transport in rivers, sediment discharge into bays and seas, and the negative effects of sediment supply decrease on coastal erosion because of human activities.</p>
        <p>2012 ACM Subject Classification: Computing methodologies → Information extraction; Computing methodologies → Lexical semantics; Computing methodologies → Semantic networks</p>
      </abstract>
      <kwd-group>
        <kwd>Named river</kwd>
        <kwd>Named bay</kwd>
        <kwd>Conceptual information extraction</kwd>
        <kwd>Geographic contextualization</kwd>
        <kwd>Text mining</kwd>
        <kwd>Frame-based Terminology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Acknowledgements: This research was carried out as part of project FFI2017-89127-P,
Translation-Oriented Terminology Tools for Environmental Texts (TOTEM), funded by the Spanish Ministry of
Economy and Competitiveness. Funding was also provided by an FPU grant given by the Spanish
Ministry of Education to the first author.</p>
      <p>EcoLexicon is a multilingual terminological knowledge base on environmental science
(http://ecolexicon.ugr.es) that is the practical application of Frame-based Terminology ([12]).
Since most concepts designated by environmental terms are multidimensional ([11]), the
flexible design of EcoLexicon permits the contextualization of data so that they are more
relevant to specific subdomains, communicative situations, and geographic areas ([19]).
However, the geographic contextualization of landform concepts, namely, named landforms, is
barely addressed in terminological resources, in our opinion for two reasons: (1) named landforms
are considered mere instances of concepts such as river, bay, or beach, and their specific
relational behaviour with other concepts in a specialized knowledge domain is thus neglected
and not semantically described; (2) their semantic representation depends on knowing which
terms are related to each named landform, and how these terms are related to each other, a
time-consuming task given that terminologists do not often resort to natural
language processing systems beyond corpus tools such as Sketch Engine ([17]).</p>
      <p>Consequently, this paper presents a semi-automatic method of extracting terms associated
with named rivers (e.g., Omaru River) and named bays (e.g., Suisun Bay) as types of landform
from a corpus of English Coastal Engineering texts. The aim is to represent that knowledge
in semantic networks in EcoLexicon according to the theoretical premises of Frame-based
Terminology. Hence, on the hypothesis that named rivers and bays should be considered
concepts rather than instances in the Coastal Engineering domain, each named river and
bay should appear in the context of a specialized semantic frame that highlights both its
relation to other terms and the relations between those terms.</p>
      <p>These semantic frames, such as the one in Figure 1, which underlies the usage of the Escambia
and Pensacola bays in Coastal Engineering texts, provide the background knowledge about
named rivers and bays that is needed in communicative situations such as specialized translation,
where terms must be rendered appropriately into another language ([12]). Moreover, they make the semantic
and syntactic behavior of terms explicit by means of the description of conceptual relations
and term combinations ([10]).</p>
      <p>The rest of this paper is organized as follows. Section 2 provides motivations for the
research, and background on distributional semantic models and clustering techniques.
Section 3 explains the materials and methods applied in this study, namely, the automatic
identification of named rivers and bays, the selection procedures for terms, bays, and rivers
in distributional semantic models, and the clustering technique for bays and rivers sharing
associated terms. Section 4 shows the results obtained. Finally, Section 5 discusses the results
and presents the conclusions derived from this work as well as plans for future research.
</p>
    </sec>
    <sec id="sec-2">
      <title>Background and literature review</title>
    </sec>
    <sec id="sec-3">
      <title>Motivations for the research</title>
      <p>Despite the fact that named landforms, among other named entities, are frequently found in
specialized texts on the environment, their representation and inclusion in knowledge resources
have received little research attention, as evidenced by the lack of named landforms in
terminological resources for the environment such as DiCoEnviro<sup>2</sup>, GEMET<sup>3</sup> or FAO
Term Portal<sup>4</sup>. In contrast, AGROVOC<sup>5</sup> basically contains a list of named landforms with
hyponymic information, whereas ENVO<sup>6</sup> provides descriptions of the named landforms
with only geographic details, and minimal semantic information consisting of the relation
located_in (and tributary_of in the case of named rivers and bays).</p>
      <p>Up to the present, knowledge resources have limited themselves to representing concepts
such as bay, river or beach, on the assumption that the concepts linked to each of them
are also appropriate, respectively, to all instances of named bays, rivers and beaches in the
real world. This issue is evident in the following description of forcing mechanisms acting on
suspended sediment concentrations (SSC) in bays and rivers.</p>
      <p>According to [26], temporal variations in the SSC of bays and rivers are the result of a
variety of forcing mechanisms. River discharge is a primary controlling factor, as well as tides,
meteorological forcing (i.e., wind-wave resuspension, offshore winds, storms and precipitation),
and human activities. Several of these mechanisms tend to act simultaneously. Nonetheless,
the specific mix of active mechanisms is different in each bay and river. For example, SSC in
San Francisco Bay is controlled by spring-neap tidal variability, winds, freshwater runoff, and
longitudinal salinity differences, whereas precipitation and river discharge are the mechanisms
in Suisun Bay. In the Yangtze River, SSC is controlled by tides and wind forcing, whereas river
discharge, tides, circulation, and stratification are the active forcing mechanisms in the York
River.</p>
      <p>Consequently, in a knowledge resource, a list of forcing mechanism concepts semantically
linked to bay and river concepts would not represent the knowledge really transmitted in
specialized texts. To cope with this type of situation, terminological knowledge bases should
include the semantic representation of named landforms.</p>
      <p>To achieve that aim in EcoLexicon regarding named rivers and bays, the knowledge should
be represented in a semantic network according to the theoretical premises of Frame-based
Terminology ([12]), which propose knowledge representations with explanatory adequacy for
enhanced knowledge acquisition in communicative situations such as specialized translation
([10]). Hence, on the hypothesis that named rivers and bays should be considered concepts
rather than instances, each named river and bay should appear in the context of a specialized
semantic frame that highlights both its relation to other terms and the relations between those
terms. The construction of these semantic networks and the semi-automatic extraction of
terms from a specialized corpus are described in this paper. As far as we know, this framework
has not been studied in the context of specialized lexicography, which is an innovative aspect
of this work. The extraction and description of named landforms from
text corpora have, of course, been applied in the field of Geographic Information Retrieval ([8]; [33]),
but not for the purposes of Frame-based Terminology.</p>
      <p><sup>2</sup> http://olst.ling.umontreal.ca/cgi-bin/dicoenviro/search_enviro.cgi
<sup>3</sup> https://www.eionet.europa.eu/gemet/en/themes/
<sup>4</sup> http://www.fao.org/faoterm/en/
<sup>5</sup> http://aims.fao.org/en/agrovoc
<sup>6</sup> http://www.environmentontology.org/Browse-EnvO</p>
    </sec>
    <sec id="sec-4">
      <title>Distributional semantic models</title>
      <p>Distributional semantic models (DSMs) represent the meaning of a term as a vector, based on
its statistical co-occurrence with other terms in the corpus. According to the distributional
hypothesis, semantically similar terms tend to have similar contextual distributions ([24]).
The semantic relatedness of two terms is estimated by comparing their vectors with a
distance or similarity measure, such as Euclidean distance or cosine similarity, inter alia.</p>
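      <p>As a minimal sketch of the two measures just mentioned, the following Python fragment compares toy term vectors. The vectors and their three context dimensions are invented for illustration and do not come from the corpus; returning 0.0 for an all-zero vector is a convention assumed here.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for all-zero vectors
    return dot / (norm_u * norm_v)

def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Invented co-occurrence counts over three hypothetical context terms
river_a = [4, 0, 2]
river_b = [8, 0, 4]   # same profile as river_a, but twice as frequent
river_c = [0, 5, 0]   # no context terms in common with river_a

sim_ab = cosine_similarity(river_a, river_b)
sim_ac = cosine_similarity(river_a, river_c)
dist_ab = euclidean_distance(river_a, river_b)
```
]]></preformat>
      <p>In this toy case the cosine measure treats river_a and river_b as maximally similar because it ignores vector length, whereas the Euclidean distance between them is non-zero because it is sensitive to raw frequency.</p>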
      <p>Depending on the language model ([3]), DSMs are either count-based or prediction-based.
Count-based DSMs calculate the frequency of terms within a term’s context (i.e., a sentence,
paragraph, document, or a sliding context window spanning a given number of terms on either
side of the target term). Correlated Occurrence Analogue to Lexical Semantics (COALS)
([29]) is an example of this type of model.</p>
      <p>Prediction-based models exploit neural probabilistic language models, which represent
terms by predicting the next term on the basis of previous terms. Examples of predictive
models include continuous bag-of-words (CBOW) and skip-gram (SG) models ([23]).</p>
      <p>DSMs have been used in combination with clustering. Work on lexical semantics applying
DSMs and clustering techniques includes identification of semantic relations ([5]), word
sense discrimination and disambiguation ([28]), automatic metaphor identification ([31]), and
classification of verbs into semantic groups ([14]).
</p>
    </sec>
    <sec id="sec-5">
      <title>Materials and methods</title>
    </sec>
    <sec id="sec-6">
      <title>Materials</title>
    </sec>
    <sec id="sec-7">
      <title>Corpus data</title>
      <p>The terms related to named rivers and bays were extracted from a subcorpus of English texts
on Coastal Engineering, comprising roughly 7 million tokens and composed of specialized
texts (scientific articles and PhD dissertations) and semi-specialized texts (textbooks on
Coastal Engineering). This subcorpus is part of the English EcoLexicon corpus (23.1 million
tokens) (see [21] for a detailed description).
</p>
    </sec>
    <sec id="sec-8">
      <title>GeoNames geographic database</title>
      <p>The automatic detection of the named rivers and bays in the corpus was performed with
a GeoNames database dump. GeoNames (http://www.geonames.org) has over 10 million
proper names for 645 different types of geographic entities, such as bays, beaches, rivers, and mountains.
For each entity, information about its normalized designation, alternate designations,
latitude, longitude, and location name is stored. A daily GeoNames database dump is
publicly available as a worldwide text file.</p>
      <p>After their compilation and cleaning, the corpus texts were tokenized, tagged with parts
of speech, lemmatized, and lowercased in the R programming language. The multi-word terms
in EcoLexicon were then automatically matched in the lemmatized corpus and joined with
underscores.</p>
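      <p>The joining of multi-word terms with underscores can be illustrated with a short Python sketch. The study carried out this step in R and does not describe its matching strategy; the greedy longest-match approach, the function name join_multiword_terms, and the small term list below are assumptions made only for illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def join_multiword_terms(lemmas, mwt_list):
    """Greedy longest-match: replace known multi-word terms by one underscore-joined token."""
    mwts = {tuple(term.split()) for term in mwt_list}
    max_len = max((len(term) for term in mwts), default=1)
    out, i = [], 0
    while i < len(lemmas):
        # Try the longest candidate first so that nested terms do not win
        for n in range(min(max_len, len(lemmas) - i), 1, -1):
            if tuple(lemmas[i:i + n]) in mwts:
                out.append("_".join(lemmas[i:i + n]))
                i += n
                break
        else:
            out.append(lemmas[i])  # no multi-word term starts here
            i += 1
    return out

# Hypothetical lowercased token sequence and term list
lemmas = "suspended sediment concentration in the pearl river delta".split()
terms = [
    "suspended sediment concentration",
    "sediment concentration",
    "pearl river delta",
    "pearl river",
]
joined = join_multiword_terms(lemmas, terms)
```
]]></preformat>
      <p>Trying the longest candidate first is a design choice of this sketch: it keeps pearl_river_delta as a single token instead of splitting it into pearl_river plus delta.</p>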
    </sec>
    <sec id="sec-9">
      <title>Named landform recognition</title>
      <p>Both normalized and alternate names of the rivers and bays in GeoNames were searched
in the lemmatized corpus. A total of 681 designations for rivers and 306 for bays were
recognized and listed. Nevertheless, since various designations can refer to the same river or
bay because of syntactic variation (e.g., Mersey River and River Mersey; Bay of Ingleses and
Ingleses Bay), and orthographic variation (e.g., Yangtze and Yangtse River), a procedure
was created to identify variants and give them a single designation in the corpus. Because of
space constraints, the procedure is not described here.</p>
      <p>Once the variants were normalized in the lemmatized corpus and joined with underscores,
there were 662 named rivers and 294 named bays. The bays are shown on the map
in Figure 2, with color-coded rectangles that depict their frequency in the corpus. Their
latitudes and longitudes were retrieved from the GeoNames database dump.</p>
      <p>The occurrence frequency ranged from 1 to 118 mentions for the named rivers, and
from 1 to 127 mentions for the bays. In our study, only those rivers and bays with a
frequency greater than 9 were considered. Figure 3 shows the 55 named rivers that fulfilled
this condition, along with their number of mentions. In the case of the bays, 29 designations
fulfilled the condition.</p>
    </sec>
    <sec id="sec-9b">
      <title>Construction of two term-term matrices for named rivers and bays</title>
      <p>Two count-based DSMs for named rivers and bays, respectively, were selected to obtain term
vectors since this type of DSM outperforms prediction-based ones on small-sized corpora ([2];
[30]).</p>
      <p>For the construction of both DSMs, terms with fewer than 3 characters, numbers and
punctuation marks were removed. Additionally, the minimal occurrence frequency was set to
5 ([9]). The sliding context window spanned 30 terms on either side of the target term because
large windows improve the DSM performance for small corpora ([29]; [7]), and capture more
semantic relations ([15]). We followed standard practice and did not use stopwords (i.e.,
determiners, conjunctions, relative adverbs, and prepositions) as context words ([16]). Since
only nouns are represented in the semantic networks, adjectives, adverbs, and verbs were
also disregarded as context words.</p>
      <p>For the rivers, the resulting DSM was a 4,705 × 4,705 matrix, whose row vectors
represented the 55 named rivers plus the 4,650 terms inside the context windows of 30 terms
on either side of those rivers. For the bays, a 3,867 × 3,867 matrix was obtained, which
represented the 29 named bays plus the 3,838 terms inside their context windows.</p>
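      <p>The window-based counting behind a count-based DSM can be sketched as follows in Python. This is only an illustration under stated assumptions: the function name cooccurrence_counts, the toy sentence, and the stopword set are invented; the window is shortened to 5 so that the toy sentence shows an effect (the study used 30); and the minimum-frequency and part-of-speech filters described above are not reproduced.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter, defaultdict

def cooccurrence_counts(tokens, targets, window=30, stopwords=frozenset()):
    """Count context terms within `window` tokens on either side of each target."""
    counts = defaultdict(Counter)
    for i, tok in enumerate(tokens):
        if tok not in targets:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            ctx = tokens[j]
            # Skip the target itself, stopwords, and terms shorter than 3 characters
            if j == i or ctx in stopwords or len(ctx) < 3:
                continue
            counts[tok][ctx] += 1
    return counts

# Hypothetical lemmatized, lowercased tokens with multi-word terms already joined
tokens = ("sediment discharge from the pearl_river reach the estuary "
          "and sediment load decrease").split()
counts = cooccurrence_counts(
    tokens,
    targets={"pearl_river"},
    window=5,
    stopwords={"from", "the", "and"},
)
pearl_contexts = dict(counts["pearl_river"])
```
]]></preformat>
      <p>Each target then has a row of counts over its context terms; stacking such rows for all targets and context terms gives a term-term matrix of the kind described above.</p>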
    </sec>
    <sec id="sec-10">
      <title>Term selection procedure and weighting schemes</title>
      <p>Subsequently, for the rivers, a 55 × 4,650 submatrix was extracted, where the rows represented
the 55 named rivers, and the columns represented the 4,650 terms co-occurring with them.
For the bays, a 29 × 3,838 submatrix was extracted. To cluster rivers sharing associated terms,
the terms that best discriminated different groups of rivers were selected. This was done by
applying Moisl’s statistical criteria ([25]), whereby only the column vectors with the highest
values in raw frequency, variance, variance-to-mean ratio (vmr), and term frequency-inverse
document frequency (tf-idf) were retained. Figure 4 shows the co-plot of the four criteria
for the rivers in descending order of magnitude. A threshold of 2000 was set, and
only 1,858 column terms fulfilled all four criteria for the rivers. In the case of the bays, a
threshold of 1000 was set, and only 847 column terms thus fulfilled all criteria.</p>
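      <p>One possible reading of this selection step is sketched below in Python: each column is scored on the four criteria, and only columns ranked within the top k on every criterion are kept. The interpretation of the threshold as a rank cut-off, the tf-idf variant (total frequency times the log of rows over rows containing the term), the function name select_columns, and the toy matrix are all assumptions of this sketch rather than details confirmed by the text.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math

def select_columns(matrix, top_k):
    """Return indices of columns ranked within the top_k on all four criteria."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    scores = {"freq": [], "var": [], "vmr": [], "tfidf": []}
    for j in range(n_cols):
        col = [row[j] for row in matrix]
        total = sum(col)
        mean = total / n_rows
        var = sum((x - mean) ** 2 for x in col) / n_rows
        df = sum(1 for x in col if x > 0)  # rows in which the term occurs
        scores["freq"].append(total)
        scores["var"].append(var)
        scores["vmr"].append(var / mean if mean > 0 else 0.0)
        scores["tfidf"].append(total * math.log(n_rows / df) if df > 0 else 0.0)
    keep = set(range(n_cols))
    for values in scores.values():
        ranked = sorted(range(n_cols), key=lambda j: values[j], reverse=True)
        keep &= set(ranked[:top_k])  # a column must survive every criterion
    return sorted(keep)

# Invented 3 x 4 matrix: rows are named rivers, columns are context terms
toy = [
    [10, 3, 0, 1],
    [0, 3, 0, 1],
    [0, 3, 8, 0],
]
kept_top2 = select_columns(toy, top_k=2)
kept_top3 = select_columns(toy, top_k=3)
```
]]></preformat>
      <p>In the toy matrix, the second column is frequent but spread evenly over all rows, so it has zero variance and is discarded: it does not help to tell groups of rivers apart.</p>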
      <p>Accordingly, in the case of the rivers, a reduced matrix of 1,913 × 1,913 dimensions was
obtained (1,858 terms plus 55 named rivers). For the bays, the reduced matrix consisted of
876 × 876 dimensions (847 terms plus 29 named bays). Both matrices were then subjected to
two weighting schemes. First, the log-likelihood measure was used to calculate the association
score between all term pairs, since it captures syntagmatic and paradigmatic relations ([4];
[18]) and achieves better performance for small-sized corpora ([1]). Secondly, the scores were
transformed by applying the natural logarithm to reduce skewness ([18]).</p>
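      <p>The two weighting steps can be illustrated with the log-likelihood ratio computed from a 2 × 2 contingency table of co-occurrence counts, followed by a logarithmic transformation. This Python sketch rests on assumptions: the text does not specify how the contingency counts were derived from the context windows, nor the exact form of the logarithm, so log(1 + score) is used here to keep zero scores defined, and the counts are invented.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math

def log_likelihood(k11, k12, k21, k22):
    """Log-likelihood ratio (G2) for a 2x2 contingency table.

    k11: contexts containing both terms; k12: first term without the second;
    k21: second term without the first; k22: contexts containing neither.
    """
    n = k11 + k12 + k21 + k22
    row = (k11 + k12, k21 + k22)
    col = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = row[i] * col[j] / n
                g2 += o * math.log(o / expected)
    return 2.0 * g2

independent = log_likelihood(10, 10, 10, 10)   # terms co-occur exactly as expected by chance
associated = log_likelihood(20, 0, 0, 20)      # terms always occur together
damped = math.log1p(associated)                # assumed form of the log transformation
```
]]></preformat>
      <p>The score is zero when the observed counts match the expected ones and grows with the strength of the association; the logarithm then compresses the largest scores.</p>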
    </sec>
    <sec id="sec-11">
      <title>Clustering of named rivers and bays</title>
      <p>A hierarchical clustering technique was applied to both weighted, reduced matrices, using
cosine distance as the intervector distance measure, and Ward’s Method as the clustering
algorithm.</p>
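      <p>As a rough illustration of agglomerative clustering with cosine distance and a Ward-type criterion, the Python sketch below merges clusters using the Lance-Williams update commonly used to apply Ward's criterion to a precomputed dissimilarity matrix. It is not the implementation used in the study, which relied on R; the function names and the four toy vectors are invented, and the bootstrap p-values discussed next are not reproduced here.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math

def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def ward_clustering(vectors, labels):
    """Agglomerative clustering; returns merges as (members_a, members_b, height)."""
    clusters = {i: (labels[i],) for i in range(len(vectors))}
    dist = {}
    for i in clusters:
        for j in clusters:
            if i < j:
                dist[(i, j)] = cosine_distance(vectors[i], vectors[j])
    merges = []
    next_id = len(vectors)
    while len(clusters) > 1:
        (a, b), height = min(dist.items(), key=lambda kv: kv[1])
        na, nb = len(clusters[a]), len(clusters[b])
        new_dist = {}
        for k in clusters:
            if k in (a, b):
                continue
            nk = len(clusters[k])
            d_ak = dist[(min(a, k), max(a, k))]
            d_bk = dist[(min(b, k), max(b, k))]
            # Lance-Williams update for Ward's criterion on unsquared distances
            sq = ((na + nk) * d_ak ** 2 + (nb + nk) * d_bk ** 2
                  - nk * height ** 2) / (na + nb + nk)
            new_dist[(k, next_id)] = math.sqrt(max(sq, 0.0))
        # Drop distances involving the merged clusters, then add the new ones
        dist = {p: d for p, d in dist.items() if a not in p and b not in p}
        dist.update(new_dist)
        merges.append((clusters.pop(a), clusters.pop(b), height))
        clusters[next_id] = merges[-1][0] + merges[-1][1]
        next_id += 1
    return merges

# Invented weighted vectors over three hypothetical context terms
labels = ["pearl_river", "pearl_river_delta", "omaru_river", "mimigawa_river"]
vectors = [[5, 1, 0], [4, 1, 0], [0, 1, 5], [0, 2, 4]]
merges = ward_clustering(vectors, labels)
```
]]></preformat>
      <p>With these toy vectors, the two Pearl River designations merge first, the two Japanese rivers second, and the two resulting groups last, at a much greater height.</p>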
      <p>Since it is not clear how strongly a cluster is supported by the data, a means of assessing the
certainty of the existence of a cluster in the corpus data was devised. For this, probability values
(p-values) were computed for each hierarchical cluster using multiscale bootstrap resampling,
as implemented in the R package pvclust ([32]). For the rivers, thirteen groups with p-values
higher than 95% were strongly supported by corpus data, as marked by the red rectangles in
the dendrogram in Figure 5. For the bays, Figure 6 shows five clusters.</p>
    </sec>
    <sec id="sec-12">
      <title>Terms characterizing each cluster</title>
      <p>To ascertain the terms strongly associated with each of the clusters, the following procedure
was used:
1. For each of the named rivers and bays in their corresponding clusters, a set of the top-30
terms, most semantically related to each, was extracted from the corresponding DSM
using cosine similarity.
2. For each cluster, the mathematical operation set intersection was applied to the sets of
the top-30 terms, most semantically related to the rivers and bays in the same cluster.
Only the shared terms with a cosine similarity higher than 0.55 were selected.</p>
      <p>A reduced set of terms was thus obtained for each cluster to describe the named rivers
and bays.
</p>
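      <p>The two-step procedure above can be sketched in Python as follows. The similarity values are invented, the function names are hypothetical, and the sketch assumes that a shared term must exceed the 0.55 threshold for every member of the cluster, which the description leaves open.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def top_terms(similarities, n=30):
    """Return the n terms with the highest cosine similarity to one landform."""
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:n])

def cluster_terms(cluster, n=30, threshold=0.55):
    """Terms in every member's top-n list that exceed the threshold for every member."""
    tops = [top_terms(sims, n) for sims in cluster.values()]
    shared = set(tops[0])
    for top in tops[1:]:
        shared &= set(top)  # set intersection across cluster members
    return sorted(t for t in shared if all(top[t] > threshold for top in tops))

# Invented cosine similarities between each bay and candidate terms
cluster = {
    "san_francisco_bay": {
        "suspended_sediment_concentration": 0.81,
        "tide": 0.74,
        "wind": 0.60,
        "salinity": 0.58,
    },
    "suisun_bay": {
        "suspended_sediment_concentration": 0.77,
        "tide": 0.52,
        "river_discharge": 0.70,
        "salinity": 0.61,
    },
}
shared_top30 = cluster_terms(cluster)        # default n=30, threshold=0.55
shared_top3 = cluster_terms(cluster, n=3)    # stricter top-n list
```
]]></preformat>
      <p>In this toy example, tide is shared by both bays but falls below the threshold for one of them, so it is excluded from the cluster description.</p>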
    </sec>
    <sec id="sec-13">
      <title>Results</title>
      <p>Because of space constraints, only the results for some clusters are provided. Numbering
the clusters in Figures 5 and 6 from left to right, the second and twelfth cluster for rivers
(Figure 5), and the fourth cluster for bays (Figure 6) are described. As shown in Figure 5,
the second cluster is formed by the basin, delta, and estuary of the Pearl River, and the river
itself, located in China. The Omaru and the Mimigawa rivers, located in Japan, comprise the
twelfth cluster. The fourth cluster in Figure 6 consists of the San Francisco and the Suisun
bays, in California (USA). These clusters were selected because the named rivers and
bays, despite being different landforms and located in different world areas, are related to
the same topic as explained in the following subsections, namely, the sediment concentration
and sediment transport in rivers, sediment discharge into bays and seas, and the negative
effects of sediment supply decrease on coastal erosion because of human activities.</p>
      <p>For the description of the semantic networks, the semantic relations were manually
extracted by querying the corpus in Sketch Engine ([17]), and analysing knowledge-rich
contexts, namely, contexts indicating at least one item of domain knowledge that could be
useful for conceptual analysis ([22]). The query results were concordances of any elements
between the river/bay in a cluster and related terms in a ±40 span. The semantic relations
were those in EcoLexicon ([13]), with the addition of supplies, prevents, accumulates_in,
simulates, tributary_of, increases, decreases, belongs_to, uses, and instance_of, necessary for
the explanatory adequacy of the frames ([10]). Furthermore, the semantic frames shown in
the following subsections were validated by Coastal Engineering experts from the University of Granada
(Spain).</p>
    </sec>
    <sec id="sec-14">
      <title>Second cluster in Figure 5: Pearl River</title>
      <p>Predicting sediment load in a river system has long been a goal of earth scientists for
numerous reasons, including alteration of fish habitats, changes in the load from anthropogenic
effects, and the evolution of deltas, estuaries, and coastal environments. Hence, hydrologists
have made efforts to apply sediment rating curves that can empirically describe the
relationship between suspended sediment concentration (g/km<sup>3</sup>) and water discharge (m<sup>3</sup>/s)
for a certain location. Sediment rating curves also involve sediment rating parameters,
which are often associated with river bed morphology and soil erodibility. Engineers use
sediment rating curves for predicting the life span of a dam on a river, and earth scientists
use them to study the erosional and depositional environments.</p>
      <p>Dam and reservoir construction are regarded as the main cause of the decline in sediment
load. For that reason, the issue of sediment load in the Pearl River Delta was studied.
Attention was paid to the sediment rating parameters of the sediment rating curves. The
parameters reflected a temporal relationship between water discharge and suspended sediment
concentration due to human activities, such as land use and reservoir construction. These
activities are causing a decrease in sediment supply from the Pearl River, with grave
consequences on the coast (see Figure 7).
</p>
    </sec>
    <sec id="sec-14b">
      <title>Twelfth cluster in Figure 5: Omaru River and Mimigawa River</title>
      <p>Owing to the interruption of sediment flow at dams, riverbed degradation was observed
downstream on the Omaru, Mimigawa, Hitotsuse and Ooyodo rivers. Sediment
discharge through these four rivers is therefore considered to have decreased considerably, causing
coastal erosion on the Miyazaki Coast. Sumiyoshi Beach, located on this coast, is thus a
severely eroded beach because of the decrease in sediment supply from the four rivers, and the
blocking of longshore sand transport by the breakwater of the Miyazaki Port (see Figure 8).</p>
    </sec>
    <sec id="sec-14c">
      <title>Fourth cluster in Figure 6: San Francisco Bay and Suisun Bay</title>
      <p>San Francisco and Suisun bays are involved in research studies to determine whether the
timescale dependence of forcing mechanisms on suspended sediment concentration (SSC)
is typical in bays and estuaries, based on SSC data. Of the forcing mechanisms, several
tend to be concurrently active in bays and estuaries, rather than only one. Multiple active
forcing mechanisms have been observed in bays and estuaries, but the specific mix of active
mechanisms is different in each. This raises the question of whether named estuaries and bays
should be considered instances of the estuarine water concept or concepts in their
own right (see Figure 9).</p>
    </sec>
    <sec id="sec-15">
      <title>Conclusions</title>
      <p>To extract knowledge for the semantic frames or conceptual structures ([12]) that underlie
the usage of named rivers and bays in Coastal Engineering texts, a semi-automated method
for the extraction of terms and semantic relations was devised. The semantic relations linking
concepts in the semantic frames were manually extracted by querying the corpus in Sketch
Engine, and analysing knowledge-rich contexts. It was a time-consuming task, although
essential for the explanatory adequacy of frames ([10]). In future research, the knowledge
patterns by [20] for the automatic extraction of semantic relations will be tested.</p>
      <p>The method for the extraction of terms closely associated with named rivers and bays
combined, on the one hand, the use of a count-based DSM, weighted by the log-likelihood
association measure, and on the other hand, a selection procedure for terms based on four
statistical criteria. Although this term selection procedure offered successful results for
constructing the semantic frames, Topic Modelling ([6]), a domain-specific dimension reduction
technique for texts, will also be applied, and a comparison of both methods will be carried
out.</p>
      <p>The semantic frames in the previous section reflect that most terms related to named
rivers and bays are multi-word terms (MWT) since specialized language units are mostly
represented by such compound forms ([27]). The MWT extraction was possible because they
were previously matched and joined by means of underscoring in the lemmatized corpus,
thanks to the list of MWTs stored in EcoLexicon. This confirms that EcoLexicon is a
valuable resource for any natural language processing tasks related to specialized corpora on
environmental science.</p>
      <p>Finally, the conceptual structures also highlighted that Coastal Engineering texts attach
great importance to the study of the processes that each named river triggers, the processes
that affect a certain named river, the crucial role that a named river plays in preventing
coastal erosion, and the close relation between rivers and bays in sediment concentration and
transport. On the evidence of these findings, which support our working hypothesis, it would
be more appropriate for named rivers and bays in the Coastal Engineering domain to be
considered concepts in their own right, rather than mere instances of the river and bay concepts,
and to be semantically represented as such in terminological resources.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name><given-names>Maha</given-names> <surname>Alrabia</surname></string-name>,
          <string-name><given-names>Nawal</given-names> <surname>Alhelewh</surname></string-name>,
          <string-name><given-names>Abdul Malik</given-names> <surname>Al-Salman</surname></string-name>, and
          <string-name><given-names>Eric</given-names> <surname>Atwell</surname></string-name>.
          <article-title>An empirical study on the holy quran based on a large classical arabic corpus</article-title>.
          <source>International Journal of Computational Linguistics</source>,
          <volume>5</volume>(<issue>1</issue>):<fpage>1</fpage>-<lpage>13</lpage>,
          <year>2014</year>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name><given-names>Fatemeh</given-names> <surname>Ars</surname></string-name>,
          <string-name><given-names>Jon</given-names> <surname>Willits</surname></string-name>, and
          <string-name><given-names>Michael</given-names> <surname>Jones</surname></string-name>.
          <article-title>Comparing predictive and co-occurrence based models of lexical semantics trained on child directed speech</article-title>.
          <source>In 38th Annual Conference of the Cognitive Science Society</source>,
          Austin, Texas, USA, August 10-13, 2016, pages
          <fpage>1092</fpage>-<lpage>1097</lpage>,
          <year>2016</year>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name><given-names>Marco</given-names> <surname>Baroni</surname></string-name>,
          <string-name><given-names>Georgiana</given-names> <surname>Dinu</surname></string-name>, and
          <string-name><given-names>Germán</given-names> <surname>Kruszewski</surname></string-name>.
          <article-title>Don't count, predict! a systematic comparison of context-counting vs. context-predicting semantic vectors</article-title>.
          <source>In 52nd Annual Meeting of the Association for Computational Linguistics</source>,
          Baltimore, MD, USA, June 22-27, 2014, vol.
          <volume>1</volume>, pages
          <fpage>238</fpage>-<lpage>247</lpage>,
          <year>2014</year>. doi:
          <pub-id pub-id-type="doi">10.3115/v1/P14-1023</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name><given-names>Gabriel</given-names> <surname>Bernier-Colborne</surname></string-name> and
          <string-name><given-names>Patrick</given-names> <surname>Drouin</surname></string-name>.
          <article-title>Evaluation of distributional semantic models: a holistic approach</article-title>.
          <source>In 5th International Workshop on Computational Terminology (Computerm2016)</source>,
          Osaka, Japan, December 12, 2016, pages
          <fpage>52</fpage>-<lpage>61</lpage>,
          <year>2016</year>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name><given-names>Ann</given-names> <surname>Bertels</surname></string-name> and
          <string-name><given-names>Dirk</given-names> <surname>Speelman</surname></string-name>.
          <article-title>Clustering for semantic purposes: Exploration of semantic similarity in a technical corpus</article-title>.
          <source>Terminology</source>,
          <volume>20</volume>(<issue>2</issue>):<fpage>279</fpage>-<lpage>303</lpage>,
          <year>2014</year>. doi:
          <pub-id pub-id-type="doi">10.1075/term.20.2.07ber</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>David M.</given-names>
            <surname>Blei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Andrew Y.</given-names>
            <surname>Ng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Michael I.</given-names>
            <surname>Jordan</surname>
          </string-name>
          .
          <article-title>Latent Dirichlet allocation</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>3</volume>
          :
          <fpage>993</fpage>
          -
          <lpage>1022</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>John A.</given-names>
            <surname>Bullinaria</surname>
          </string-name>
          and
          <string-name>
            <given-names>Joseph P.</given-names>
            <surname>Levy</surname>
          </string-name>
          .
          <article-title>Extracting semantic representations from word co-occurrence statistics: A computational study</article-title>
          .
          <source>Behavior Research Methods</source>
          ,
          <volume>39</volume>
          (
          <issue>3</issue>
          ):
          <fpage>510</fpage>
          -
          <lpage>526</lpage>
          ,
          <year>2007</year>
          . doi: 10.3758/BF03193020.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Curdin</given-names>
            <surname>Derungs</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ross S.</given-names>
            <surname>Purves</surname>
          </string-name>
          .
          <article-title>From text to landscape: locating, identifying and mapping the use of landscape features in a Swiss Alpine corpus</article-title>
          .
          <source>International Journal of Geographical Information Science</source>
          ,
          <volume>28</volume>
          (
          <issue>6</issue>
          ):
          <fpage>1272</fpage>
          -
          <lpage>1293</lpage>
          ,
          <year>2014</year>
          . doi: 10.1080/13658816.2013.772184.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Patrick</given-names>
            <surname>Pantel</surname>
          </string-name>
          and
          <string-name>
            <given-names>Dekang</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <article-title>Discovering word senses from text</article-title>
          .
          <source>In ACM Conference on Knowledge Discovery and Data Mining (KDD-02)</source>
          , Edmonton, Canada, July 23-26,
          <year>2002</year>
          , pages
          <fpage>613</fpage>
          -
          <lpage>619</lpage>
          ,
          <year>2002</year>
          . doi: 10.1145/775047.775138.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Douglas</given-names>
            <surname>Rohde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Laura</given-names>
            <surname>Gonnerman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>David</given-names>
            <surname>Plaut</surname>
          </string-name>
          .
          <article-title>An improved model of semantic similarity based on lexical co-occurrence</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>8</volume>
          :
          <fpage>627</fpage>
          -
          <lpage>633</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Magnus</given-names>
            <surname>Sahlgren</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alessandro</given-names>
            <surname>Lenci</surname>
          </string-name>
          .
          <article-title>The effects of data size and frequency range on distributional semantic models</article-title>
          .
          <source>In 2016 Conference on Empirical Methods in Natural Language Processing</source>
          , Austin, Texas, USA, November 1-5,
          <year>2016</year>
          , pages
          <fpage>975</fpage>
          -
          <lpage>980</lpage>
          ,
          <year>2016</year>
          . doi: 10.18653/v1/D16-1099.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Ekaterina</given-names>
            <surname>Shutova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Lin</given-names>
            <surname>Sun</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Anna</given-names>
            <surname>Korhonen</surname>
          </string-name>
          .
          <article-title>Metaphor identification using verb and noun clustering</article-title>
          .
          <source>In 23rd International Conference on Computational Linguistics</source>
          , vol.
          <volume>2</volume>
          , Beijing, China, August 23-27,
          <year>2010</year>
          , pages
          <fpage>1002</fpage>
          -
          <lpage>1010</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Ryota</given-names>
            <surname>Suzuki</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hidetoshi</given-names>
            <surname>Shimodaira</surname>
          </string-name>
          .
          <article-title>Pvclust: An R package for assessing the uncertainty in hierarchical clustering</article-title>
          .
          <source>Bioinformatics</source>
          ,
          <volume>22</volume>
          (
          <issue>12</issue>
          ):
          <fpage>1540</fpage>
          -
          <lpage>1542</lpage>
          ,
          <year>2006</year>
          . doi: 10.1093/bioinformatics/btl117.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Flurina M.</given-names>
            <surname>Wartmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Elise</given-names>
            <surname>Acheson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ross S.</given-names>
            <surname>Purves</surname>
          </string-name>
          .
          <article-title>Describing and comparing landscapes using tags, texts, and free lists: an interdisciplinary approach</article-title>
          .
          <source>International Journal of Geographical Information Science</source>
          ,
          <volume>32</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1572</fpage>
          -
          <lpage>1592</lpage>
          ,
          <year>2018</year>
          . doi: 10.1080/13658816.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>