=Paper=
{{Paper
|id=Vol-1749/paper15
|storemode=property
|title=Increasing Information Accessibility on the Web: a Rating System for Specialized Dictionaries
|pdfUrl=https://ceur-ws.org/Vol-1749/paper15.pdf
|volume=Vol-1749
|authors=Valeria Caruso,Anna De Meo,Vincenzo Norman Vitale
|dblpUrl=https://dblp.org/rec/conf/clic-it/CarusoMV16
}}
==Increasing Information Accessibility on the Web: a Rating System for Specialized Dictionaries==
Valeria Caruso*, Anna De Meo*, Vincenzo Norman Vitaleº (*Università degli Studi di Napoli ‘L’Orientale’; ºUniversità degli Studi di Napoli Federico II)

vcaruso@unior.it, ademeo@unior.it, vincenzon.vitale@studenti.unina.it

===Abstract===

English. The paper illustrates the features of the WLR (Web Linguistic Resources) portal, which collects specialized online dictionaries and assesses their suitability for different functions using a specifically designed rating system. The contribution aims to demonstrate how the existing tool has improved the usefulness of lexicographical portals and how its effectiveness can be further increased by transforming the portal into a collaborative resource.

Italiano (English translation). This contribution describes the features of the WLR (Web Linguistic Resources) portal, which collects specialized dictionaries from the Web and estimates their usability for different functions by means of a specific evaluation system. It then shows how this tool increases the usability of the lexicographical portals developed so far, and how its effectiveness could be further improved by transforming it into a collaborative resource.

===1 Introduction===

This paper sketches out the current features and an upcoming new application of a rating system designed to assess online specialized dictionaries. The evaluative parameters of the system are managed through a relational database, freely accessible online at the Web Linguistic Resources (WLR) site. These parameters are used to identify the best available dictionaries for the different types of information needs experienced by Internet surfers. The assessment procedure has been designed to be flexible, so it can be adapted to estimate the supportive value of other resources as well, such as grammars or corpora. At the same time, once the score assignment for each dictionary feature has been decided, grades are given automatically by the database.

The assessment procedure is straightforward and strictly operationalized (Swanepoel, 2008, 2013), and it can be used as a guided process to collect data provided by the users themselves. The system is in fact going to be updated and transformed into a collaborative (Carr, 1997) dictionary portal, collecting forms filled in by the Web surfers themselves.

===2 Information overload on the Internet===

The WLR dictionary portal has been designed as a tool that can help solve different problems concerning specialized knowledge and lexicon that Web users might experience on different occasions in their lives. Examples include journalists who need to understand specific concepts belonging to some technical field in order to acquire information on different topics during their professional activity, and translators, who need both concise explanations of concepts and cross-linguistic correspondences in order to understand and translate specialized texts. Dictionaries can, in fact, offer proper assistance on a wide variety of occasions, provided that they are reliable and efficient tools. The enormous inventory of specialized online dictionaries already includes reference works for top professionals in a field, like the authoritative The New Palgrave Dictionary of Economics, but also different hybrid¹ tools addressed to school children, like the entertaining Math Spoken Here!, which has been conceived to assist in learning and homework activities.

¹ For the concept of hybridization in electronic lexicography, see Granger 2012.

Surfing the Web, one can experience the tremendous number of specialized dictionaries available for the most diverse fields. Compared to these resources, general language vocabularies are but a few drops in the ocean. This state of affairs is, however, unsurprising, since similar disproportions were the rule in the paper dictionary era (Tarp, 2010), when vocabularies were not so easily accessible and one could not directly experience the real composition of lexicographical production. The availability of these resources on the Internet has, however, overturned the proportion between the user in need of lexicographical assistance and the number of specialized resources he/she can consult. The result is such an information overload that the user is either forced to resort to the usual Wikipedia pages or to abandon the search completely. In both cases the user is stressed by the demanding activity of finding a source of information, rather than filling his/her information gaps.

===3 Solutions for integrated information access===

Dealing with information overproduction on the Web has been one of the tasks of electronic lexicography since the advent of the first metalexicographical sites, called ‘dictionary collections’ (Engelberg and Müller-Spitzer, 2013), which offered lists of links to different dictionaries. This practice rapidly evolved into more stable solutions that have also served the opposite aim of a controlled integration of lexicographical data. Such integration was made possible by the ‘dictionary portals’ (Engelberg and Müller-Spitzer, 2013) of well-established publishing houses, which have integrated their vocabularies in order to better meet the information needs of their users. On the Pons or Cambridge dictionary sites, for example, it is possible to access different vocabularies by filling in a single search mask and selecting the desired resource from a menu.

According to Engelberg and Müller-Spitzer (2013), dictionary portals “have followed [the] course from the single lexicographic product to the general lexicographic information service” that was predicted by Arnold (1979) and Kay (1983) over thirty years ago, thus creating a new type of dictionary. The possibility of cross-linking well-structured informative resources, such as dictionaries, has in fact broadened users’ possibilities of being informed promptly, by querying a single search engine that gives access to many dictionaries.

The right of ownership to the inventoried dictionaries is one of the major restrictions determining the kind of access to lexicographical information, and it thus influences the portal typology. In the classification proposed by Engelberg and Müller-Spitzer (2013), dictionaries issued by the same publishing house may form ‘integrated dictionary nets’ if every vocabulary has been compiled with “a common concept of data modelling and structuring”. This allows users to retrieve lemmata with similar properties from the different dictionaries inventoried, as in the OWID. By contrast, portals that have no rights of ownership to the dictionaries, called ‘dictionary collections’, generally offer simple lists of links to external resources. Only a few of them also provide query systems that search the lemma lists or the whole text of the inventoried resources (see OneLook).

====3.1 The WLR database assessment system====

In addition to the types listed by Engelberg and Müller-Spitzer (2013), the WLR site extends the typology of ‘dictionary collections’ by offering inventories of vocabularies that have been evaluated on the basis of the kind of data they contain (Caruso & De Meo, 2014). The assessment is carried out by a multi-parametric searchable database, which inventories dictionary features and assigns scores in order to display lists of the resources best suited to two different types of parameters. The first type covers dictionaries assisting with specific tasks, or ‘lexicographical functions’ that the dictionary should be able to fulfill (Tarp 2008). Examples include acquiring new knowledge on a specific topic, solving communicative issues, or assisting with translations or learning tasks. These parameters can be set in the WLR database by choosing the corresponding option in the ‘Kind of assistance’ box of the search form. The second type is the user’s level of expertise in the specialized field considered, which is set by selecting the layman, semi-expert or expert profile in the ‘Expertise level’ box.

The rating system used in the WLR site is intended to increase the effectiveness and efficiency of portals, making dictionary collections less time-consuming and more useful even for less experienced dictionary users. It does so by avoiding the display of “long lists” that show “results from trustworthy sources and downright amateurish concoctions all mixed up” (de Schryver 2003: 157). The evaluation system relies on the presence or absence of 58 types of data, addressing all the component parts of dictionaries (Caruso & De Meo, 2014). These range from the host site and the general organization (or macrostructure) to the mediostructure and microstructure, for which both linguistic and encyclopedic data are taken into consideration. Explicit guidelines are also followed in the score assignment: data characterizing a specific parameter receive one or two points, according to their degree of relevance, whereas negative scores (-1, -2) are given to contradictory data. Each lexicographical parameter considered (‘Kind of assistance’ and ‘Expertise level’) can reach the same maximum score: for example, each user profile can obtain no more than 24 points. At the same time, for contrasting profiles, such as laymen and experts, the score distribution cannot be the same.

All things considered, the WLR site aims to support different types of users by reducing the information overload that occurs when consulting rich inventories of non-integrated resources, such as dictionary collection sites. Moreover, the WLR rating system is in line with the parameters identified by Swanepoel (2008; 2013) for scientifically grounded dictionary evaluations. Such assessments explicitly state the analytic principles they use and the way these are applied, and they give instructions to measure compliance or non-compliance with these principles.

Additionally, the portal wires together fragments of the huge repository of specialized knowledge available on the Internet (Caruso 2014), hosting dictionaries from around 60 different fields, such as oenology, mathematics and medicine.

===4 How to make effective searches===

Recent studies have underlined that electronic dictionaries are special types of information systems (Tarp, 2008; Bothma, 2011; Gouws, 2011; Heid, 2011), and the literature on electronic lexicography uses evaluative parameters borrowed from Information Science. In particular, the quality of a dictionary can be assessed on the basis of its usefulness for completing a task, like finding a specific collocate while writing a text. A dictionary is thus considered effective if it provides “the right data and the right amount of data to the user” (Heid 2011: 290), and efficient if it gives quick access to the data needed.

The WLR database developed so far ensures that searching for a dictionary is less time-consuming for the user. It does not, however, guarantee that the data provided by a dictionary are correct or correctly stated. Yet the quality of data is always paramount, and users’ searches would be more effective if they could avoid consulting vocabularies whose data are unreliable.

For example, the following Spanish oenological dictionary (Infoagro.com - Diccionario del vino) explains that ‘ácido’ is a “green wine” whose colour seems to be a consequence of a bad fermentation:

[1] “Ácido: Vino verde. Producto de una mala fermentación maloláctica, una uva en mal estado o recolectada antes de tiempo.” (Acidic: Green wine. Product of a bad malolactic fermentation, of grapes in poor condition or harvested too early.)

By contrast, many other dictionaries explain the same term as denoting a sour wine, or a wine that is high in acidity, as in the following entry (Diccionario del vino.com):

[2] “Ácido: 1.- Vino cuya acidez sobrepasa la media de la región. La acidez puede ser debida a un exceso de ácidos organicos o a un desequilibrio entre los sabores del vino. 2.- Vinos con PH inferior a 3,2” (Acidic: 1. Wine whose acidity exceeds the average of the region. The acidity may be due to an excess of organic acids or to an imbalance among the flavours of the wine. 2. Wines with a pH below 3.2.)

To carry out more efficient searches with the current release of the WLR database, one can look for dictionaries compiled exclusively by authoritative institutions. This is done by restricting the search to ‘Institutional’ and ‘Specialized’ host sites, two features that users can select in the database search form. However, even dictionaries edited by the most authoritative institutions offer examples of bad explanations that can be misleading for the user, or even difficult to interpret (Caruso & De Meo, 2014). For example, in its definition of Chromosome, the Talking Glossary of Genetics, published by the National Human Genome Research Institute, explains that: “Humans have 23 pairs of Chromosomes (…), and one pair of sex chromosomes, X and Y”. Stated this way the definition is incorrect, since only male humans have an XY pair of chromosomes, while females have an XX pair. Effective lexicographical definitions should obviously provide more complete descriptions and avoid incorrect generalizations like this.
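The scoring rules of Section 3.1 and the host-site restriction mentioned above can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the WLR implementation, whose code the paper does not describe. The feature names, the point values, the host-site labels used as strings and the dictionaries ‘Dict A’ to ‘Dict C’ are all hypothetical. The sketch only mirrors the stated rules:

* features characterizing a profile earn +1 or +2;
* contradictory features earn -1 or -2;
* every profile must be able to reach the same maximum;
* no profile may exceed 24 points.

```python
from dataclasses import dataclass, field

# Hypothetical grade table (not WLR data): for each profile, the points a
# feature earns when present. +1/+2 reward characterizing data, -1/-2
# penalize data that contradict the profile, following the rules in the text.
GRADES = {
    "layman": {"encyclopedic_definition": 2, "illustrations": 1, "etymology": -1},
    "expert": {"encyclopedic_definition": 1, "bibliographic_references": 2,
               "illustrations": -1},
}
MAX_SCORE = 24  # no profile may exceed 24 points


def check_grade_table(grades):
    """Every profile must reach the same maximum, and at most MAX_SCORE."""
    maxima = {p: sum(v for v in pts.values() if v > 0) for p, pts in grades.items()}
    if len(set(maxima.values())) != 1 or max(maxima.values()) > MAX_SCORE:
        raise ValueError(f"unbalanced grade table: {maxima}")
    return maxima


@dataclass
class Dictionary:
    name: str
    host_site: str  # hypothetical labels, e.g. "Institutional", "Specialized"
    features: set = field(default_factory=set)  # features found in the dictionary


def score(dictionary, profile, grades=GRADES):
    """Sum the points of the features the dictionary actually contains."""
    points = grades[profile]
    return sum(points.get(f, 0) for f in dictionary.features)


def search(dictionaries, profile, host_sites=None, grades=GRADES):
    """Rank dictionaries for a profile, optionally keeping only some host sites."""
    kept = [d for d in dictionaries if host_sites is None or d.host_site in host_sites]
    ranked = sorted(kept, key=lambda d: (-score(d, profile, grades), d.name))
    return [(d.name, score(d, profile, grades)) for d in ranked]


check_grade_table(GRADES)
catalog = [
    Dictionary("Dict A", "Institutional", {"encyclopedic_definition", "illustrations"}),
    Dictionary("Dict B", "Commercial", {"encyclopedic_definition", "etymology"}),
    Dictionary("Dict C", "Specialized", {"bibliographic_references", "illustrations"}),
]
print(search(catalog, "layman", host_sites={"Institutional", "Specialized"}))
# [('Dict A', 3), ('Dict C', 1)]
```

In this sketch the compiler only records which features a dictionary contains, and scores are derived from the grade table at query time. This keeps the ranking consistent with whatever grades the administrator has set.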
Assessing data quality, however, poses many methodological and theoretical problems regarding the terms and the definition features that must be rated by the system (see Caruso & De Meo, 2014). For example, must the number of assessed lemmas remain the same regardless of the number of dictionary entries? Which definition features are suitable for evaluating concepts from specialized fields as different as, for example, figurative arts and finance? Furthermore, at least one expert for each specialized field considered should verify the information provided, which is probably the most serious obstacle to future developments of the project. However, a completely different solution has been devised, as will be shown in a moment.

====4.1 The database as a data validation tool====

The WLR database has been conceived as a flexible tool that allows its administrators to add or change labels in the three component parts of the repository system, called ‘categories’, ‘features’ and ‘rating system’.

The first component, the ‘category’, lists the types of inventoried linguistic resources. Only dictionaries have been assessed so far, but other supportive instruments for solving linguistic issues, like corpora or grammars, could be added to the database.

To each category the administrator assigns different descriptive features, the second component of the rating system, which can be either binary or multivalue. The ‘dictionary’ category has 58 features (see Caruso, 2014 for a complete list). Some of them can only be present or absent and are thus binary, like Cultural Notes. Others are multivalue and need further specification, like Kind of Dictionary, which must be set by choosing among different options: Monolingual dictionary, Monolingual word list, Multilingual dictionary, Multilingual word list, Plurilingual dictionary².

Lastly, grades are assigned to each of these values according to the methodology described above. The administrator can decide to set different evaluative parameters for each category: for example, if grammars were added to the repository, the language proficiency level could be a suitable evaluation parameter for them. Once the grade distribution has been decided, however, the database assigns points automatically and independently of any action performed by the compiler of the evaluation forms, who can only set the values of the different features. Likewise, if the score assignment is changed, the evaluations of the inventoried dictionaries change immediately. Automating grade assignment rules out errors in the final score computation. However, the selection of the values that describe the dictionary features is crucial for the accuracy of the evaluation.

² For the concept of Plurilingual Dictionary, see Caruso, 2011.

In this respect, the inventoried resources must be analysed carefully, because specialized online dictionaries most often lack a strict lexicographical organization and display different data types unsystematically. For example, basic information on the word form might be given only in some of the entries of a dictionary, independently of any significant paradigmatic variation in the language considered. In such cases, the compiler must set the ‘sometimes’ value in the corresponding feature of the evaluation form. Recording the data that a dictionary gives only sporadically makes the evaluation procedure more reliable.

The current development of the project is extending the existing database components with an additional part that keeps track of where unsystematic data, like those mentioned above, occur in the dictionary. This addition will make the assessment procedure extremely reliable, since less evident features can be registered, making the accuracy of the evaluation easily verifiable.

With this new database component, anyone will be able to fill in the evaluation forms, and the WLR database will become a collaborative portal. This will hopefully increase the number of inventoried resources and open the way to further developments.

While compiling the forms, in fact, users could also help verify the quality of the data provided, signalling for each dictionary feature whether any wrong information is given. For each inconsistency, the user should provide alternative data and the source of information from which they were derived. The database, in turn, will display warning signals indicating the presence of problematic data within a dictionary.

===Acknowledgements===

We wish to thank Gianluca Monti for managing the first version of the WLR database and site. The present research has been supported by academic grants from the University of Naples ‘L’Orientale’.

===References===

Arnold, D. I., 1979, “Synonyms and the College-Level Dictionary”, Dictionaries, 1: 103-112.

Bothma, T.J.D., 2011, “Filtering and Adapting Data and Information in an Online Environment in Response to User Needs”, in Fuertes-Olivera, P.A., Bergenholtz, H. (Eds.), 71-102.

Carr, M., 1997, “Internet Dictionaries and Lexicography”, International Journal of Lexicography, 10/3: 209-230.

Caruso, V., 2011, “Online specialised dictionaries: a critical survey”, in Kosem, I., Kosem, K. (Eds.), Electronic lexicography in the 21st century: new applications for new users. Proceedings of eLex 2011, Ljubljana: Trojina, Institute for Applied Slovene Studies, 66-75.

Caruso, V., 2014, “A Guide (not only) for Economics Dictionaries”, Hermes - Journal of Language and Communication in Business, 52: 75-91.

Caruso, V., De Meo, A., 2014, “A Dictionary Guide for Web Users”, in Abel, A., Vettori, C., Ralli, N. (Eds.), Proceedings of the XVI EURALEX International Congress: The User in Focus, Bolzano: EURAC, 1087-1098.

De Schryver, G. M., 2003, “Lexicographers’ Dreams in the Electronic-Dictionary Age”, International Journal of Lexicography, 16/2: 143-199.

Engelberg, S., Müller-Spitzer, C., 2013, “Dictionary Portals”, in Gouws, R. H., Heid, U., Schweickard, W., Wiegand, H. E. (Eds.), Dictionaries. An International Encyclopedia of Lexicography. Supplementary Volume: Recent Developments with Special Focus on Computational Lexicography, Berlin/New York: de Gruyter, 1023-1035.

Fuertes-Olivera, P.A., Bergenholtz, H. (Eds.), 2011, e-Lexicography: The Internet, Digital Initiatives and Lexicography, London/New York: Continuum.

Gouws, R.H., 2011, “Learning, Unlearning and Innovation in the Planning of Electronic Dictionaries”, in Fuertes-Olivera, P.A., Bergenholtz, H. (Eds.), 17-29.

Granger, S., 2012, “Introduction: Electronic Lexicography from Challenge to Opportunity”, in Granger, S., Paquot, M. (Eds.), Electronic Lexicography, Oxford: OUP, 1-11.

Heid, U., 2011, “Electronic Dictionaries as Tools: Towards an Assessment of Usability”, in Fuertes-Olivera, P.A., Bergenholtz, H. (Eds.), 287-304.

Kay, M., 1983, “The dictionary of the future and the future of the dictionary”, in Zampolli, A., Cappelli, A. (Eds.), The Possibilities and Limits of the Computer in Producing and Publishing Dictionaries: Proceedings of the European Science Foundation Workshop, Pisa: Giardini Editori, 161-174.

Swanepoel, P., 2008, “Towards a Framework for the Description and Evaluation of Dictionary Evaluation Criteria”, Lexikos, 18: 207-231.

Swanepoel, P., 2013, “Evaluation of dictionaries”, in Gouws, R. H., Heid, U., Schweickard, W., Wiegand, H. E. (Eds.), Dictionaries. An International Encyclopedia of Lexicography. Supplementary Volume: Recent Developments with Special Focus on Computational Lexicography, Berlin/New York: de Gruyter, 587-596.

Tarp, S., 2008, Lexicography in the Borderland between Knowledge and Non-knowledge, Tübingen: Niemeyer.

Tarp, S., 2010, “Beyond Lexicography: New Visions and Challenges in the Information Age”, in Bergenholtz, H., Nielsen, S., Tarp, S. (Eds.), Lexicography at a Crossroads. Dictionaries and Encyclopedias Today, Lexicographical Tools Tomorrow, Bern: Peter Lang, 17-32.

====Online dictionaries and resources====

Cambridge Dictionaries, http://dictionary.cambridge.org/it/, accessed July 2016.

Diccionario del vino.com, http://www.diccionariodelvino.com/index.php/letra/a/.

Infoagro.com - Diccionario del vino, http://www.infoagro.com/viticultura/diccionario/diccionario.htm, accessed July 2016.

Math Spoken Here! An Arithmetic and Algebra Dictionary, http://www.mathnstuff.com/math/spoken/here/, accessed July 2016.

OneLook, www.onelook.com, accessed July 2016.

OWID (Online-Wortschatz-Informationssystem Deutsch), http://www.owid.de/, accessed July 2016.

Pons, www.pons.eu, accessed July 2016.

Talking Glossary of Genetics, https://www.genome.gov/glossary/, accessed July 2016.

The New Palgrave Dictionary of Economics, Basingstoke and New York: Palgrave Macmillan, http://www.dictionaryofeconomics.com, accessed July 2016.

Web Linguistic Resources (WLR), www.weblinguisticresources.org, accessed July 2016.
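Section 4.1 describes the following elements of the WLR database:

* three components (categories, features, rating system);
* binary and multivalue features;
* the ‘sometimes’ value, together with a record of where sporadic data occur;
* automatic grading;
* the planned user reports that trigger warning signals.

A minimal Python sketch can show how these pieces might fit together. It is an illustration under loud assumptions, not the WLR schema. The class names `Feature`, `Category` and `Evaluation`, the function `grade`, the feature ‘Word form’, the point values, the entry labels and the report fields are all hypothetical. Only Cultural Notes and the Kind of Dictionary values come from the text. The sketch also simplifies by using a single rating table, whereas the paper grades each parameter (‘Kind of assistance’, ‘Expertise level’) separately.

```python
from dataclasses import dataclass, field

# Hypothetical schema (not the WLR database): a Category owns Features, and
# its rating table maps (feature, value) pairs to points. A real portal
# would keep one rating table per search parameter; one is used here.


@dataclass
class Feature:
    name: str
    values: tuple  # allowed values chosen by the administrator

    @property
    def binary(self):
        # Binary features can only be present or absent.
        return set(self.values) == {"yes", "no"}


@dataclass
class Category:
    name: str
    features: dict = field(default_factory=dict)  # feature name -> Feature
    rating: dict = field(default_factory=dict)    # (feature, value) -> points


@dataclass
class Evaluation:
    resource: str
    values: dict = field(default_factory=dict)    # feature name -> chosen value
    where: dict = field(default_factory=dict)     # feature -> entries with sporadic data
    reports: list = field(default_factory=list)   # user-reported inconsistencies

    def set_value(self, category, feature, value, entries=()):
        # Compilers only choose among the values the administrator defined.
        if value not in category.features[feature].values:
            raise ValueError(f"{value!r} is not a value of {feature!r}")
        self.values[feature] = value
        if value == "sometimes":
            self.where[feature] = list(entries)  # where the sporadic data occur
        else:
            self.where.pop(feature, None)

    def report(self, feature, alternative, source):
        # A user flags wrong data, giving alternative data and their source.
        self.reports.append({"feature": feature, "alternative": alternative,
                             "source": source})

    def warnings(self):
        # Features with at least one report are flagged as problematic.
        return sorted({r["feature"] for r in self.reports})


def grade(category, evaluation):
    # Recomputed from the current rating table on every call, so changing
    # the grades immediately changes every evaluation.
    return sum(category.rating.get((f, v), 0) for f, v in evaluation.values.items())


dictionary = Category("dictionary")
dictionary.features["Cultural Notes"] = Feature("Cultural Notes", ("yes", "no"))
dictionary.features["Word form"] = Feature("Word form", ("yes", "sometimes", "no"))
dictionary.features["Kind of Dictionary"] = Feature("Kind of Dictionary", (
    "Monolingual dictionary", "Monolingual word list", "Multilingual dictionary",
    "Multilingual word list", "Plurilingual dictionary"))
dictionary.rating = {("Cultural Notes", "yes"): 1, ("Word form", "yes"): 2,
                     ("Word form", "sometimes"): 1}

ev = Evaluation("Dict A")
ev.set_value(dictionary, "Cultural Notes", "yes")
ev.set_value(dictionary, "Word form", "sometimes", entries=["entry 17", "entry 42"])
print(grade(dictionary, ev))  # 2
dictionary.rating[("Word form", "sometimes")] = 0  # administrator revises grades
print(grade(dictionary, ev))  # 1
ev.report("Cultural Notes", "no", "hypothetical source")
print(ev.warnings())  # ['Cultural Notes']
```

Computing grades on demand rather than storing them is one way to obtain the behaviour described in Section 4.1, where a change in the score assignment is reflected at once in every evaluation. It is a design choice of this sketch, not necessarily of WLR.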