=Paper=
{{Paper
|id=Vol-1749/paper15
|storemode=property
|title=Increasing Information Accessibility on the Web: a Rating System for Specialized Dictionaries
|pdfUrl=https://ceur-ws.org/Vol-1749/paper15.pdf
|volume=Vol-1749
|authors=Valeria Caruso,Anna De Meo,Vincenzo Norman Vitale
|dblpUrl=https://dblp.org/rec/conf/clic-it/CarusoMV16
}}
==Increasing Information Accessibility on the Web: a Rating System for Specialized Dictionaries==
<pdf width="1500px">https://ceur-ws.org/Vol-1749/paper15.pdf</pdf>
<pre>
Increasing information accessibility on the Web: a rating system for
specialized dictionaries

Valeria Caruso*, Anna De Meo*, Vincenzo Norman Vitaleº
*Università degli Studi di Napoli ‘L’Orientale’,
ºUniversità degli Studi di Napoli Federico II
vcaruso@unior.it, ademeo@unior.it, vincenzon.vitale@studenti.unina.it

Abstract

English. The paper illustrates the features of the WLR (Web Linguistic
Resources) portal, which collects specialized online dictionaries and
assesses their suitability for different functions using a specifically
designed rating system. The contribution aims to demonstrate how the
existing tool has improved the usefulness of lexicographical portals and
how its effectiveness can be further increased by transforming the portal
into a collaborative resource.

Italiano. Questo contributo descrive le caratteristiche del portale WLR
(Web Linguistic Resources) che raccoglie dizionari specialistici della Rete
e ne stima l’utilizzabilità per diverse funzioni, avvalendosi di uno
specifico sistema di valutazione. Viene quindi mostrato come questo
strumento incrementi l’utilizzabilità dei portali lessicografici finora
sviluppati e come la sua efficacia possa essere ulteriormente migliorata
trasformandolo in risorsa collaborativa.

1    Introduction

This paper sketches out the current features and an upcoming new
application of a rating system designed to assess online specialized
dictionaries. The system's evaluative parameters are managed through a
relational database accessible for free online at the Web Linguistic
Resources (WLR) site. These parameters are used to identify the best
available dictionaries to satisfy the different types of information needs
experienced by Internet users, while the assessment procedure has been
designed to be flexible and can be readapted to estimate the supportive
value of other resources as well, like grammars or corpora. Moreover, once
the score assignment for each dictionary feature has been decided, grades
are given automatically by the database.

The assessment procedure is straightforward and strictly operationalized
(Swanepoel, 2008, 2013), and it can be used as a guided process to collect
data provided by the users themselves. The system is in fact going to be
updated and transformed into a collaborative (Carr, 1997) dictionary
portal, collecting forms that have been filled in by the Web users
themselves.

2    Information overload on the Internet

The WLR dictionary portal has been designed as a tool that helps solve the
different problems concerning specialized knowledge and lexicon that Web
users might experience on different occasions in their lives. A journalist,
for example, may need to acquire specific information about different
technical topics in the course of his/her professional activity, while
translators need both concise explanations of concepts and cross-linguistic
correspondences in order to understand specialized texts and translate
them. Dictionaries can in fact offer proper assistance on a wide variety of
occasions, provided that they are reliable and efficient tools. The
enormous inventory of specialized online dictionaries already includes
reference works for top professionals in one field, like the authoritative
The New Palgrave Dictionary of Economics, but also different hybrid¹ tools
addressed to school children, like the entertaining Math Spoken Here!,
which has been conceived to assist in learning and homework activities.

¹ For the concept of hybridization in electronic lexicography, see Granger
2012.
Surfing the Web, one experiences the tremendous number of specialized
dictionaries that are available for the most diverse fields. Compared to
these resources, the number of general language vocabularies is but a few
drops in the ocean. This state of affairs is however unsurprising, since
similar disproportions were the rule in the paper dictionary era (Tarp,
2010), when vocabularies were not so easily accessible and one could not
directly experience the real composition of the lexicographical production.
The availability of these resources on the Internet has however overturned
the proportion between the user, who is in need of lexicographical
assistance, and the number of specialized resources he/she can consult,
thus causing such an information overload that the user is forced either to
resort to one of the usual Wikipedia pages, or to abandon the search
completely. In both cases the user is stressed by the demanding activity of
finding a source of information, rather than filling his/her information
gaps.

3    Solutions for integrated information access

Information overproduction on the Web has become one of the concerns of
electronic lexicography since the advent of the first metalexicographical
sites, called ‘dictionary collections’ (Engelberg and Müller-Spitzer,
2013), offering lists of links to different dictionaries. This practice has
rapidly evolved into steadier solutions that have also served the opposite
aim of a controlled integration of lexicographical data, made possible by
the ‘dictionary portals’ (Engelberg and Müller-Spitzer, 2013) of
well-established publishing houses, which have integrated their
vocabularies in order to better meet the information needs of their users.
In the Pons or Cambridge dictionary sites, for example, it is possible to
access different vocabularies by filling in a single search mask and
selecting the desired resource from a menu.

According to Engelberg and Müller-Spitzer (2013), dictionary portals “have
followed [the] course from the single lexicographic product to the general
lexicographic information service” that was predicted by Arnold (1979) and
Kay (1983) as long as thirty years ago, thus creating a new type of
dictionary. The possibility to cross-link well-structured informative
resources, such as dictionaries, has in fact broadened the possibility for
users to be informed promptly, by querying a single search engine that
gives access to many dictionaries.

The right of ownership to the inventoried dictionaries is one of the major
restrictions determining the kind of access to the lexicographical
information, thus influencing the portal typology. In the classification
proposed by Engelberg and Müller-Spitzer (2013), dictionaries issued by the
same publishing house may form ‘integrated dictionary nets’, if every
vocabulary has been compiled with “a common concept of data modelling and
structuring”, thus allowing users to retrieve lemmata with similar
properties from the different dictionaries inventoried, such as in the
OWID. On the contrary, portals having no rights of ownership to the
dictionaries, called ‘dictionary collections’, generally offer simple lists
of links to external resources. Only a few of them are also provided with
query systems that carry out searches in the lemma lists or in the whole
text of the inventoried resources (see OneLook).

3.1   The WLR database assessment system

In addition to the types listed by Engelberg and Müller-Spitzer (2013), the
WLR site extends the typology of ‘dictionary collections’ by offering
inventories of vocabularies that have been evaluated on the basis of the
kind of data they contain (Caruso & De Meo, 2014). The assessment is
carried out by a multi-parametric searchable database, which inventories
dictionary features and assigns scores in order to display lists of the
resources that are best suited to two different types of parameters. It is
in fact possible to search for dictionaries assisting with specific tasks,
or ‘lexicographical functions’ that the dictionary should be able to fulfil
(Tarp 2008), like acquiring new knowledge on a specific topic, solving
communicative issues, or giving assistance with translations or learning
tasks. These parameters can be set in the WLR database by choosing the
corresponding option in the ‘Kind of assistance’ box of the search form.
Additionally, the user can set his/her level of expertise in the
specialized field considered, and thus select the layman, semi-expert or
expert profile in the ‘Expertise level’ box.

The rating system used in the WLR site is intended to increase the
effectiveness and efficiency of portals, making dictionary collections less
time wasting and more useful also for the less experienced dictionary
users, since it avoids the display of “long lists” that show “results from
trustworthy sources and downright amateurish concoctions all mixed up” (de
Schryver 2003: 157). The evaluation system relies in fact on the presence
or absence of 58 types of data, addressing all the component parts of
dictionaries (Caruso & De Meo, 2014): from the host site and the general
organization (or macrostructure), to the mediostructure and microstructure,
for which both linguistic and encyclopedic data are taken into
consideration.

Additionally, explicit guidelines are followed for the score assignment
system: characterizing data for a specific parameter receive one or two
points, according to their degree of relevance. Negative scores (-1, -2)
are instead given to contradictory data. Similarly, each lexicographical
parameter considered (‘Kind of assistance’ and ‘Expertise level’) can reach
the same maximum score: for example, each type of user profile can score no
more than 24 points. At the same time, for contradictory profiles, such as
laymen and experts, the score distribution cannot be the same.

All things considered, one can affirm that the WLR site aims to support
different types of users by decreasing the information overload that occurs
while consulting rich inventories of non-integrated resources, such as
dictionary collection sites. Additionally, the WLR rating system is in line
with the parameters identified by Swanepoel (2008; 2013) to carry out
dictionary evaluations that are scientifically grounded, i.e. assessments
that explicitly state the analytic principles they use and the way these
are applied, together with instructions to measure the compliance or
non-compliance with these same principles.

Additionally, the portal wires together fragments of the huge repository of
specialized knowledge available on the Internet (Caruso 2014), hosting
dictionaries of around 60 different fields, such as oenology, mathematics
and medicine.

4    How to make effective searches

Recent studies have underlined that electronic dictionaries are special
types of information systems (Tarp, 2008; Bothma, 2011; Gouws, 2011; Heid,
2011), and evaluative parameters borrowed from information science are used
in the literature on electronic lexicography. In particular, the quality of
a dictionary can be assessed on the basis of its usefulness for the
completion of a task, like finding a specific collocate while writing a
text. Therefore, the dictionary is considered to be effective if it
provides “the right data and the right amount of data to the user” (Heid
2011: 290). It is efficient, instead, if it gives quick access to the data
needed.

The WLR database developed so far ensures that the search for a dictionary
is less time wasting for the user, but it does not guarantee that the data
provided by a dictionary are correct or correctly stated. Yet the quality
of data is always paramount, and users’ searches would be more effective if
they could avoid consulting vocabularies whose data are unreliable.

For example, the following Spanish oenological dictionary (Infoagro.com -
Diccionario del vino) explains that ‘ácido’ is a “green wine” whose colour
seems to be a consequence of a bad fermentation:

    [1] “Ácido: Vino verde. Producto de una mala fermentación maloláctica,
        una uva en mal estado o recolectada antes de tiempo.”

On the contrary, many other dictionaries explain the same term as denoting
a sour wine, or a wine that is high in acidity, as in the following entry
(Diccionario del vino.com):

    [2] “Ácido: 1.- Vino cuya acidez sobrepasa la media de la región. La
        acidez puede ser debida a un exceso de ácidos organicos o a un
        desequilibrio entre los sabores del vino.
        2.- Vinos con PH inferior a 3,2”

In order to carry out more efficient searches using the current release of
the WLR database, one can look for dictionaries compiled exclusively by
authoritative institutions, thus restricting the search to ‘Institutional’
and ‘Specialized’ host sites, two features that users can select in the
database search form. However, even the dictionaries edited by the most
authoritative institutions offer examples of bad explanations that can be
misleading for the user, or even difficult to interpret (Caruso & De Meo,
2014). For example, the Talking Glossary of Genetics, published by the
National Human Genome Research Institute, explains in the Chromosome
definition that: “Humans have 23 pairs of Chromosomes (…), and one pair of
sex chromosomes, X and Y”. Stated this way the definition is incorrect,
since only male humans have an XY pair of chromosomes, while females have
an XX pair. Effective lexicographical definitions should obviously provide
more complete descriptions and should avoid incorrect generalizations like
this.

Assessing data quality, however, poses many methodological and theoretical
problems regarding the
terms and the definition features that must be rated (see Caruso & De Meo,
2014) by the system. For example, must the number of assessed lemmas remain
the same regardless of the number of dictionary entries? Which definition
features are suited to assessing any concept belonging to specialized
fields as different as, for example, figurative arts and finance?
Furthermore, at least one expert for each specialized field considered
should verify the information provided, which is probably the most serious
obstacle to future developments of the project. However, a completely
different solution has been imagined, as will be shown below.

4.1    The database as a data validation tool

The WLR database has been conceived as a flexible tool that allows its
administrators to add or change labels in the three component parts that
make up the repository system, which are called ‘categories’, ‘features’
and ‘rating system’. The first component, or ‘category’, lists the types of
inventoried linguistic resources: only dictionaries have been assessed so
far, but other supportive instruments to solve linguistic issues could be
added to the database, like corpora or grammars. To each category the
administrator assigns different descriptive features, which are the second
component of the repository system and can be either binary or multivalue.
The ‘dictionary’ category has 58 features (see Caruso, 2014 for a complete
list): some of them can only be present or not, and are thus binary, like
Cultural Notes; others are multivalue and thus need further specification,
like the Kind of Dictionary, which must be set by choosing among different
options: Monolingual dictionary, Monolingual word list, Multilingual
dictionary, Multilingual word list, Plurilingual dictionary². Lastly,
grades are assigned to each of these values according to the methodology
described above. The administrator can decide to set different evaluative
parameters for each category taken into account: for example, if grammars
were added to the repository, the language proficiency level could be a
suitable evaluation parameter for them.

Once the grade distribution has been decided, however, the database assigns
points automatically and independently of any actions performed by the
compiler of the evaluation forms, who can set only the values of the
different features. Likewise, if the score assignment is changed, the
evaluations of the inventoried dictionaries change immediately. The
automation of grade assignment guarantees that there are no errors in the
final score computation; however, the selection of the values that describe
the dictionary features is of crucial relevance for the accuracy of the
evaluation.

In this respect, the inventoried resources must be analysed carefully,
because most of the time specialized online dictionaries lack strict
lexicographical organization and display different data types
unsystematically: for example, basic information on the word form might be
given exclusively in some of the entries of one dictionary, independently
of any significant paradigmatic variation of the language considered. For
similar cases, the compiler must set the ‘sometimes’ value in the
corresponding feature of the evaluation form, and the record of the data
that are sporadically given by the dictionary will make the evaluation
procedure more reliable.

The current development of the project is in fact extending the existing
database components with an additional part that keeps track of where
unsystematic data, like those mentioned above, are present in the
dictionary. This addition will make the assessment procedure more reliable,
since the less evident features can be registered, making the accuracy of
the evaluation easily verifiable.

With this new database component, the evaluation forms will be fillable by
anyone and the WLR database will become a collaborative portal. This,
hopefully, will increase the number of inventoried resources, and it will
open the way to further developments.

While compiling the forms, in fact, users could also help verify the
quality of the data provided, signalling for each dictionary feature
whether any wrong information is given. For each inconsistency the user
should indicate one alternative datum and the source of information from
which it was drawn. The database, in turn, will offer warning signals that
indicate the presence of problematic data within one dictionary.

Acknowledgements

We wish to thank Gianluca Monti for managing the first version of the WLR
database and site. The present research has been sustained by academic
grants from the University of Naples ‘L’Orientale’.

² For the concept of Plurilingual Dictionary, see Caruso, 2011.
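
The score assignment of section 3.1 and the automatic grading of section
4.1 can be illustrated with a minimal sketch. It is not the WLR
implementation: the class and function names, the value labels 'present'
and 'sometimes' as stored strings, and every weight below are assumptions
made for illustration. Only the point range (-2 to 2), the 24-point
maximum for user profiles, and the feature names come from the text.

```python
from dataclasses import dataclass, field

MAX_PROFILE_SCORE = 24               # maximum stated in the text for user profiles
ALLOWED_POINTS = {-2, -1, 0, 1, 2}   # 1 or 2 for characterizing data, -1 or -2 for contradictory data


@dataclass
class RatingSystem:
    # (profile, feature, value) -> points; every weight is hypothetical
    weights: dict = field(default_factory=dict)

    def set_points(self, profile, feature, value, points):
        """Administrator action: decide the score for one feature value."""
        if points not in ALLOWED_POINTS:
            raise ValueError(f"points must be one of {sorted(ALLOWED_POINTS)}")
        self.weights[(profile, feature, value)] = points

    def max_score(self, profile):
        """Best score any description could reach for this profile."""
        best = {}
        for (p, feature, _value), points in self.weights.items():
            if p == profile:
                best[feature] = max(best.get(feature, 0), points)
        return sum(best.values())

    def score(self, profile, description):
        """Grade one dictionary description (feature -> value) for a profile."""
        if self.max_score(profile) > MAX_PROFILE_SCORE:
            raise ValueError("profile exceeds the maximum attainable score")
        return sum(self.weights.get((profile, feature, value), 0)
                   for feature, value in description.items())


rating = RatingSystem()
rating.set_points("layman", "Cultural Notes", "present", 2)
rating.set_points("layman", "Cultural Notes", "sometimes", 1)
rating.set_points("expert", "Cultural Notes", "present", -1)
rating.set_points("expert", "Kind of Dictionary", "Monolingual dictionary", 2)

# The compiler of the evaluation form only sets feature values.
wine_dictionary = {
    "Cultural Notes": "sometimes",
    "Kind of Dictionary": "Monolingual dictionary",
}
layman_score = rating.score("layman", wine_dictionary)   # 1
expert_score = rating.score("expert", wine_dictionary)   # 2

# Changing the score assignment regrades the dictionary without
# touching its description.
rating.set_points("layman", "Cultural Notes", "sometimes", 2)
regraded = rating.score("layman", wine_dictionary)       # 2
```

Keeping the weights separate from the dictionary descriptions is what lets
a change in the score assignment propagate to every inventoried resource
at once, as the text describes.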
References

Arnold, D. I., 1979, “Synonyms and the College-Level Dictionary”,
      Dictionaries, 1: 103–12.

Bothma, T.J.D., 2011, “Filtering and Adapting Data and Information in an
      Online Environment in Response to User Needs”, in Fuertes-Olivera,
      P.A., Bergenholtz, H. (Eds.), 71-102.

Carr, M., 1997, “Internet Dictionaries and Lexicography”, International
      Journal of Lexicography, 10/3: 209-230.

Caruso, V., 2011, “Online specialised dictionaries: a critical survey”, in
      Kosem, I., Kosem, K. (eds.), Electronic lexicography in the 21st
      century: new applications for new users. Proceedings of eLex 2011,
      Ljubljana: Trojina, Institute for Applied Slovene Studies, 66-75.

Caruso, V., 2014, “A Guide (not only) for Economics Dictionaries”, Hermes –
      Journal of Language and Communication in Business, 52: 75-91.

Caruso, V., De Meo, A., 2014, “A Dictionary Guide for Web Users”, in Abel,
      A., Vettori, C. & Ralli, N. (eds.), Proceedings of the XVI EURALEX
      International Congress: The User in Focus, Bolzano: EURAC, 1087-1098.

De Schryver, G. M., 2003, “Lexicographers’ Dreams in the
      Electronic-Dictionary Age”, International Journal of Lexicography,
      16/2: 143-199.

Engelberg, S., Müller-Spitzer, C., 2013, “Dictionary Portals”, in Gouws, R.
      H., Heid, U., Schweickard, W. & Wiegand, H. E. (eds.), Dictionaries.
      An international encyclopedia of lexicography. Supplementary volume:
      Recent developments with special focus on computational
      lexicography. Berlin/New York: de Gruyter, 1023-1035.

Fuertes-Olivera, P.A., Bergenholtz, H. (Eds.), 2011, e-Lexicography: The
      Internet, Digital Initiatives and Lexicography, London, New York:
      Continuum.

Gouws, R.H., 2011, “Learning, Unlearning and Innovation in the Planning of
      Electronic Dictionaries”, in Fuertes-Olivera, P.A., Bergenholtz, H.
      (Eds.), 17-29.

Granger, S., 2012, “Introduction: Electronic Lexicography from Challenge to
      Opportunity”, in Granger, S. & Paquot, M. (Eds.), Electronic
      Lexicography, Oxford: OUP, 1-11.

Heid, U., 2011, “Electronic Dictionaries as Tools: Towards an Assessment of
      Usability”, in Fuertes-Olivera, P.A., Bergenholtz, H. (Eds.),
      287-304.

Kay, M., 1983, “The dictionary of the future and the future of the
      dictionary”, in Zampolli, A. & Cappelli, A. (eds.), The Possibilities
      and Limits of the Computer in Producing and Publishing Dictionaries:
      Proceedings of the European Science Foundation Workshop, Pisa:
      Giardini Editori, 161–74.

Swanepoel, P., 2008, “Towards a Framework for the Description and
      Evaluation of Dictionary Evaluation Criteria”, Lexikos, 18: 207-231.

Swanepoel, P., 2013, “Evaluation of dictionaries”, in Gouws, R. H., Heid,
      U., Schweickard, W. & Wiegand, H. E. (eds.), Dictionaries. An
      international encyclopedia of lexicography. Supplementary volume:
      Recent developments with special focus on computational
      lexicography. Berlin/New York: de Gruyter, 587-596.

Tarp, S., 2008, Lexicography in the borderland between knowledge and
      non-knowledge, Tübingen: Niemeyer.

Tarp, S., 2010, “Beyond Lexicography: New Visions and Challenges in the
      Information Age”, in Bergenholtz, H., Nielsen, S. & Tarp, S. (eds.),
      Lexicography at a Crossroads. Dictionaries and Encyclopedias Today,
      Lexicographical Tools Tomorrow, Berlin et al.: Peter Lang, 17-32.

Online Dictionaries and resources

Cambridge Dictionaries, http://dictionary.cambridge.org/it/, accessed July
      2016.
Diccionario del vino.com,
      http://www.diccionariodelvino.com/index.php/letra/a/.
Infoagro.com - Diccionario del vino,
      http://www.infoagro.com/viticultura/diccionario/diccionario.htm,
      accessed July 2016.
Math Spoken Here! An Arithmetic and Algebra Dictionary,
      http://www.mathnstuff.com/math/spoken/here/, accessed July 2016.
OneLook, www.onelook.com, accessed July 2016.
OWID (Online-Wortschatz-Informationssystem Deutsch), http://www.owid.de/,
      accessed July 2016.
Pons, www.pons.eu, accessed July 2016.
Talking Glossary of Genetics, https://www.genome.gov/glossary/, accessed
      July 2016.
The New Palgrave Dictionary of Economics, Basingstoke and New York:
      Palgrave Macmillan, http://www.dictionaryofeconomics.com, accessed
      July 2016.
Web Linguistic Resources (WLR), www.weblinguisticresources.org, accessed
      July 2016.

</pre>