Towards a Knowledge Diversity Model

Rakebul Hasan, Katharina Siorpaes, Reto Krummenacher
Semantic Technology Institute (STI)
University of Innsbruck
A-6020 Innsbruck, Austria
firstname.lastname@sti2.at

Fabian Flöck
Institute of Applied Informatics and Formal Description Methods
Karlsruhe Institute of Technology
D-76131 Karlsruhe, Germany
fabian.floeck@kit.edu

Copyright is held by the author/owner(s).
DiversiWeb 2011, March 28, Hyderabad, India.

ABSTRACT
The Web is an unprecedented enabler for publishing, using and exchanging information at global scale. Virtually any topic is covered by an amazing diversity of opinions, viewpoints, mind sets and backgrounds. The research project RENDER works on methods and techniques to leverage diversity as a crucial source of innovation and creativity, and designs novel algorithms that exploit diversity for ranking, aggregating and presenting Web content. Essential in this respect is a knowledge model that makes the diversity in content accessible, cognitively to human users as well as computationally to the machine. In this paper, we present a glossary of relevant terms that serves as baseline to the specification of the Knowledge Diversity Model.

Categories and Subject Descriptors
A.1 [General Literature]: Introductory and Survey; I.2.4 [Computing Methodologies]: Artificial Intelligence - Knowledge Representation Formalisms and Methods

Keywords
Knowledge diversity, Glossary, Knowledge model

1. INTRODUCTION
The Web is a tremendous facilitator and catalyst for the publication, use and exchange of information, fostering a global network of news, stories and statements which represent an amazing diversity of opinions, viewpoints, mind sets and backgrounds. Its design principles and core technology have led to an unprecedented growth in mass collaboration; a trend that is also increasingly impacting business environments.

The RENDER project¹ aims at leveraging the diversity inherently unfolding through worldwide publishing and collaboration by developing methods, techniques, software and data sets that make diversity accessible as an important source of innovation and creativity, and by designing novel algorithms that reflect diversity in the ways information is selected, ranked, aggregated, presented and used.

An important component for capturing diversity in online documents is a comprehensive knowledge model for representing diversity that reflects the plurality of opinions and viewpoints on a particular topic. In a first step, the considered content, such as articles, blog entries or news feeds, is transformed into a semantic representation according to the knowledge model, which is accessible both cognitively to human users and computationally to the machine. The semantic representation is then leveraged for improving the selection and ranking of content, and the presentation to users. In RENDER, selection and ranking will go beyond widely adopted approaches based on popularity or personalization, and take opinions and viewpoints into account when computing the relevance of results.

In this paper we present a glossary of terms relevant in the scope of knowledge diversity. Creating a shared understanding of terms and relationships between terms is an essential first step towards the specification of a conceptual model for knowledge diversity. In that sense, this paper provides the necessary baseline for the definition of a knowledge diversity ontology, which allows for formalizing, gathering, evaluating and processing diversity in various (written) online media.

In Section 2 we provide three motivating scenarios for this work, which are derived from the project's showcases that are brought to RENDER by Google, Wikimedia, and Telefónica. Section 3 provides a glossary of terms such as diversity, opinion, sentiment, bias and many more. Section 4 presents a short overview of the related work. In Section 5 we take a quick look at next steps, at how the targeted knowledge model will be used and leveraged in the given scenarios and throughout the project, and conclude the paper.

¹ render-project.eu

2. MOTIVATING SCENARIOS
In the following we present three motivating business scenarios for the formalization of a knowledge diversity model.

2.1 Wikipedia
Despite efforts for a balanced coverage at Wikipedia, systemic biases influenced by the individual views of the more than 100,000 volunteer contributors have been introduced. The increasing complexity of the control processes for creating and editing articles, which were put in place to overcome the problem of biases, negatively impacts the growth of Wikipedia. Edit conflict resolution, arbitration committees, banning policies, and a complex hierarchy of contributors, editors and administrators are not sustainable. Indeed, recent statistics show that the number of new articles has been decreasing dramatically over the past years, while the number of edits is still growing steadily. Discovering content that is missing from one language version of Wikipedia but present in another, and detecting diverse viewpoints within a topic or article, are urgently needed forms of support for the editorial team in managing and encouraging large-scale participation and sustainable growth. Diversity-empowered services such as quality or reliability assessment of an article or a specific statement, conflict resolution, anomaly detection, and cross-lingual consistency checking are expected to considerably improve the way information is currently managed in Wikipedia.

2.2 Google News
The news aggregator service of Google (Google News) indexes tens of thousands of news Web sites, which are summarized into more than forty regional issues in more than 15 languages. The considered news content is created by professional journalists and by Web users, and as such offers a rich diversity of information. Current ranking algorithms result in news summaries that are dominated by popular viewpoints or opinion holders such as large news agencies. Alternative opinions, or arguments from smaller publishers, often disappear and do not reach the interested audience. Consequently, even though Google aims for wide and comprehensive news coverage, the presented viewpoints are highly biased. Manual processing is costly and impractical, and techniques to automatically discover diverse opinions, viewpoints and discussions surrounding a topic are required to fully leverage the richness in news content. Diversity-aware ranking of news posts for covering the most diverse viewpoints on a particular topic, and enriching these with data from other sources like blogs, tweets, and wiki pages, is expected to considerably increase the interconnection between diversifying news and discussions on the Web.

2.3 Customer Relationship Management
Telefónica is one of the world's largest telecommunications companies by market share, operating in 25 countries with a global customer base exceeding 280 million. The company maintains various communication channels, including call centers, Web sites, public forums and blogs, to collect customer feedback about its products and services. This offers a massive amount of valuable user opinions coming from diverse sources, countries and socio-demographic groups that are currently only marginally exploited, as the technical support for automation is missing and manual processing is not feasible to the desired extent. Discovering and automatically evaluating customer reactions and discussions is expected to allow Telefónica to react more efficiently and effectively to trends, to make more precise forecasts, and eventually to improve future business decisions.

3. KNOWLEDGE DIVERSITY GLOSSARY
The first step towards our knowledge diversity model is to create a shared understanding of the relevant terms and the relationships between them in the scope of knowledge diversity. In this section, we present a summary of definitions of possibly relevant terms to get a rough understanding of the key concepts in the scope of knowledge diversity. We do not attempt to define these concepts in this paper; instead we refer to the existing definitions of these concepts.

Agent is described in DOLCE+DnS Ultralite as an agentive object, either physical (e.g. a person), or social (e.g. a corporation, an institution, a community).² As an extension of this concept, an agent expressing an opinion of his own can be called an opinion holder.

Belief is given by Wikipedia as "the psychological state in which an individual holds a proposition or premise to be true".³ WordNet defines belief as "any cognitive content held as true", or alternatively as "a vague idea in which some confidence is placed".⁴

Bias is defined by Wikipedia as "an inclination to present or hold a partial perspective at the expense of (possibly equally valid) alternatives".⁵ The definition of bias by Giunchiglia et al. in [5] states that "bias is the degree of correlation between (a) the polarity of an opinion and (b) the context of the opinion holder". The context can be a variety of factors such as ideological, political, or educational background, ethnicity, race, profession, age, location, or time.

Data is defined by WordNet as "a collection of facts from which conclusions may be drawn".⁶ Wikipedia states that "the term data refers to qualitative or quantitative attributes of a variable or set of variables". Furthermore, data is the lowest level of abstraction from which first information and then knowledge are derived.⁷

Diversity is described in the philosophical sense, according to [3], as "the relation that holds between two entities when and only when they are not identical". In the Cambridge Advanced Learner's Dictionary diversity is defined as: "when many different types of things or people are included in something".⁸ In [5] diversity is given from a more knowledge diversity focused point of view as "the co-existence of contradictory opinions and/or statements (some typically non-factual or referring to opposing beliefs/opinions)". In the same paper, different dimensions of diversity are described such as: diversity of resources, diversity of topic, diversity of viewpoint, diversity of genre, diversity of language, geographical/spatial diversity, and temporal diversity.

Emotion is defined by Liu as "subjective feelings and thoughts" [7]. As Liu discusses, people use language expressions to describe their mental state (or feelings). According to [8], there are a large number of language expressions to depict the six types of emotions; i.e., love, joy, surprise, anger, sadness and fear. Similarly, people use a large number of opinion expressions to convey opinions with positive or negative sentiment.

Entity is described by Wikipedia as "something that has a distinct, separate existence, although it need not be a material existence".⁹ In entity-relationship modelling, an entity is defined as "a thing which is recognized as being capable of an independent existence and which can be uniquely identified".¹⁰

Event is described in DOLCE+DnS Ultralite as "any physical, social, or mental process, event, or state". DOLCE+DnS Ultralite classifies events based on 'aspect' (e.g., stative, continuous, accomplishment, achievement, etc.), on 'agentivity' (e.g., intentional, natural, etc.), or on 'typical participants' (e.g., human, physical, abstract, food, etc.).

Fact, according to Liu, is the "objective expressions about entities, events and their properties" [7]. Wikipedia states that facts "refer to verified information about past or present circumstances or events which are presented as objective reality".¹¹ The Merriam-Webster Online Dictionary defines fact, inter alia, as 1) "the quality of being actual", 2) "something that has actual existence" or "an actual occurrence", 3) "a piece of information presented as having objective reality".¹²

Information is defined in [4] in terms of data + meaning: σ is an instance of information, understood as semantic content, if and only if:
i) σ consists of n data, for n ≥ 1;
ii) the data are well formed;
iii) the well-formed data are meaningful.
According to this definition, information is made of data, and 'well formed' here means that data are rightly put together. Well-formed and meaningful data are also known as semantic content. Information, understood as semantic content, has two major types: (a) instructional information, conveying the need for a specific action, and (b) factual information.

Information Object is described by DOLCE+DnS Ultralite as "a piece of information, such as a musical composition, a text, a word, a picture, independently from how it is concretely realized".

Knowledge is informally described in [2]. In a sentence like "John knows that Sara will come to the party", knowledge is "a relation between a knower, like John, and a proposition, that is, the idea expressed by a simple declarative sentence", like "Sara will come to the party". Propositions here are abstract entities that can be true or false, right or wrong. More specifically, the sentences expressing the propositions, which are factual or non-factual, are true or false. The relationship between an agent and a proposition can take different propositional attitudes, denoted by verbs such as "know", "hope", "fear", "regret" and "doubt". Brachman and Levesque do not consider sentences involving knowledge that do not explicitly mention a proposition. For example, it is not clear if there is any useful proposition involved in sentences like "John knows how to play guitar" or "John knows Bob well". Brachman and Levesque also discuss that the notion of belief is related to the notion of knowledge. People use the notion of belief if they do not want to claim that the judgement of an agent about the world is necessarily accurate.

Metadata is defined by Wikipedia as the "data providing information about one or more aspects of the data";¹³ e.g., means of creation of the data, purpose of the data, time and date of creation, creator or author of data, placement on a computer network where the data was created, or standards used. WordNet simplifies the meaning of metadata as "data about data".¹⁴

Object is described in DOLCE+DnS Ultralite as "any physical, social, or mental object, or a substance". The definition of objects by Liu [7] states that "an object o is an entity which can be a product, person, event, organization, or topic. It is associated with a pair, o: (T, A), where T is a hierarchy of components (or parts), sub-components, and so on, and A is a set of attributes of o. Each component has its own set of sub-components and attributes".

Objectivity is the expression of facts [1]. Wikipedia moreover states that "a proposition is generally considered to be objectively true when its truth conditions are mind-independent – that is, not the result of any judgements made by a conscious entity or subject".¹⁵ WordNet defines it as the "judgment based on observable phenomena and uninfluenced by emotions or personal prejudices",¹⁶ while according to [7] objective sentences express factual information about the world.

Object Feature represents the components and attributes of objects [7]. The term object feature is also referred to simply as feature. Object features are used to simplify the complexity of the hierarchical representation of the components of objects.

Opinion is defined by Wikipedia as "a subjective statement or thought about an issue or topic, and is the result of emotion or interpretation of facts".¹⁷ Furthermore, "an opinion may be supported by an argument, although people may draw opposing opinions from the same set of facts". In [5], opinion is defined as "a statement, i.e. a minimum semantically self-contained linguistic unit, asserted by at least one actor, called the opinion holder, at some point in time, but which cannot be verified according to an established standard of evaluation. It may express a view, attitude, or appraisal on an entity. This view is subjective, with positive/neutral/negative polarity (i.e. support for, or opposition to, the statement)". Another definition of opinion, given by Liu [7], states that "an opinion on a feature f is a positive or negative view, attitude, emotion or appraisal on f from an opinion holder".

Opinion Expression is given by Liu as a subjective expression that describes sentiments, appraisals or feelings toward entities, events and their properties [7]. More generally speaking, it could be said that opinion expressions are individual statements that contain an assessment of reality from the point of view of the opinion holder.

Opinion Holder, according to Liu [7], is "the person or organization that expresses the opinion"; see Agent above.

Polarity of Opinion on a feature f indicates if the opinion is positive, negative or neutral [7]. [5] describes polarity as the degree to which a statement is positive, negative or neutral. The polarity of an opinion is also known as sentiment orientation or semantic orientation [7].

Sentiment is defined in the American Heritage Dictionary of the English Language as "a thought, view, or attitude, especially one based mainly on emotion instead of reason".¹⁸ Sentiments can be seen as a way to express opinions. Hence, sentiments, as much as opinions, can be negative, positive or neutral [7].

Subjectivity refers to the subject and the perspective, feelings, beliefs, and desires of the subject [6]. Liu defines subjective sentences as the sentences which "express some personal feelings or beliefs" [7].

Text is defined by Dictionary.com, in the linguistic sense, as "a unit of connected speech or writing, especially composed of more than one sentence, that forms a cohesive whole".¹⁹ The Free On-line Dictionary of Computing describes it as the "textual material in the mainstream sense", and in the computing sense as the "data in ordinary ASCII or EBCDIC representation", where ASCII and EBCDIC are computer codes for representing alphanumeric characters.²⁰

Topic has three definitions in Wikipedia: "a.) the phrase in a clause that the rest of the clause is understood to be about, b.) the phrase in a discourse that the rest of the discourse is understood to be about, c.) a special position in a clause (often at the right or left-edge of the clause) where topics typically appear".²¹ WordNet defines topic as "the subject matter of a conversation or discussion".²²

² ontologydesignpatterns.org/ont/dul/DUL.owl
³ en.wikipedia.org/wiki/Belief
⁴ wordnetweb.princeton.edu/perl/webwn?s=belief
⁵ en.wikipedia.org/wiki/Bias
⁶ wordnetweb.princeton.edu/perl/webwn?s=data
⁷ en.wikipedia.org/wiki/Data
⁸ dictionary.cambridge.org/dictionary/british/diversity
⁹ en.wikipedia.org/wiki/Entity
¹⁰ en.wikipedia.org/wiki/Entity-relationship_model
¹¹ en.wikipedia.org/wiki/Fact
¹² www.merriam-webster.com/dictionary/fact
¹³ en.wikipedia.org/wiki/Metadata
¹⁴ wordnetweb.princeton.edu/perl/webwn?s=metadata
¹⁵ en.wikipedia.org/wiki/Objectivity_(philosophy)
¹⁶ wordnetweb.princeton.edu/perl/webwn?s=objectivity
¹⁷ en.wikipedia.org/wiki/Opinion
¹⁸ www.houghtonmifflinbooks.com/ahd/
¹⁹ dictionary.reference.com/browse/text
²⁰ foldoc.org/text
²¹ en.wikipedia.org/wiki/Topic
²² wordnetweb.princeton.edu/perl/webwn?s=topic

4. RELATED WORK
Giunchiglia et al. consider knowledge diversity as an asset to improve navigation and search [5]; however, they do not provide a representation model for the knowledge gathered using their technology. Liu introduces the core topics in the field of sentiment analysis and opinion mining, such as sentiment and subjectivity classification, feature-based sentiment analysis, sentiment analysis of comparative sentences, opinion search and retrieval, opinion spam and utility of opinions [7]. Liu provides definitions of the relevant concepts, but the work is aimed at the processing of opinions, and not at representing opinions. Balahur and Steinberger provide their insight on sentiment analysis for the news domain [1], and as such argue the need for clearly defining the source and target of a sentiment. They provide guidelines on annotating news content with different sentiments; however, they do not discuss the representation of the captured knowledge either.

The listed works present technologies and methodologies to gather different aspects of diversity, but they do not provide any representation model for this gathered knowledge. In contrast, our aim is to work towards developing a knowledge diversity model to represent the different aspects of diversity.

5. FUTURE WORK AND CONCLUSIONS
The goal of this paper was to collect a comprehensive glossary of terms that are relevant in the context of knowledge diversity. Aspects such as opinion, sentiment or bias are essential in understanding the diversity of news posts, Wikipedia articles, or customer feedback. Only when diversity is computationally accessible to the machine can the capturing and interpretation of opinions and sentiments be automated and results extracted at larger scale.

The intention is to derive a knowledge diversity model from the glossary presented in this paper. In the next step it will be necessary to determine the concrete questions that will have to be answered for the showcase scenarios, and to extract the definitions that cover these relevant aspects. Another important piece of future work is to determine the relationships among the aforementioned concepts. As an example, based on the definitions presented in this paper we can conclude that sentiments are a way to express opinions. Subjectivity refers to the perspective, beliefs and feelings of a person. Bias is influenced by someone's personal opinion. A particular bias can influence the subjectivity of a sentence when it contains an opinion. Opinions are expressed by opinion expressions. Opinion expressions are subjective statements contained in information objects. The concepts and relationships can be seen as the baseline for the specification of the knowledge diversity ontology that yields the schema information for semantically capturing the diversity and context of the textual content considered. Context, although not part of the collected definitions above, is important to interpret diverse standpoints in view of their socio-demographic, spatio-temporal and historic relationship to each other. In many situations, taking customer relationship management as an example, it is not only relevant to interpret diverging opinions and sentiments of customers but also to understand the situation of the opinion holders, such as their country of residence. This allows for drawing further conclusions relevant for shaping the business.

Acknowledgments: The work presented in this paper is supported by the European Union's 7th Framework Programme (FP7/2007-2013) under Grant Agreement 257790.

6. REFERENCES
[1] A. Balahur and R. Steinberger. Rethinking Sentiment Analysis in the News: from Theory to Practice and back. In 1st Workshop on Opinion Mining and Sentiment Analysis, 2009.
[2] R. Brachman and H. Levesque. Knowledge representation and reasoning. Morgan Kaufmann Publishers, 2004.
[3] J. Butterfield. Collins English dictionary: Complete and unabridged. HarperCollins Publishers, 2003.
[4] L. Floridi. Information: A Very Short Introduction. Oxford University Press, 2010.
[5] F. Giunchiglia, V. Maltese, D. Madalli, A. Baldry, C. Wallner, P. Lewis, K. Denecke, D. Skoutas, and I. Marenzi. Foundations for the representation of diversity, evolution, opinion and bias. Technical Report DISI-09-063, University of Trento, 2009.
[6] T. Honderich. The Oxford Companion to Philosophy. Oxford University Press, 2005.
[7] B. Liu. Handbook of Natural Language Processing, chapter Sentiment Analysis and Subjectivity, pages 627–666. CRC Press, 2010.
[8] W. Parrott. Emotions in Social Psychology: Essential Readings. Psychology Press, 2001.
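
As an illustrative aside to the glossary in Section 3, the definitions of Object, Object Feature, Opinion and Polarity of Opinion attributed to Liu [7] can be sketched as a small data structure. This is a minimal sketch under stated assumptions, not the RENDER knowledge model: the names `Component`, `Opinion`, `Polarity` and `features` are hypothetical and introduced here only for illustration, and flattening the component hierarchy into a flat sequence of features is just one possible reading of the Object Feature entry. The example data is invented.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator, List, Set


class Polarity(Enum):
    """Polarity of an opinion: positive, negative or neutral."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


@dataclass
class Component:
    """An object o, or one of its (sub-)components, as a pair (T, A).

    `parts` plays the role of the hierarchy T and `attributes` the set A.
    Both names are assumptions made for this sketch.
    """
    name: str
    attributes: Set[str] = field(default_factory=set)
    parts: List["Component"] = field(default_factory=list)


def features(obj: Component) -> Iterator[str]:
    """Flatten the hierarchy: each component and each attribute is a feature."""
    yield obj.name
    yield from sorted(obj.attributes)
    for part in obj.parts:
        yield from features(part)


@dataclass(frozen=True)
class Opinion:
    """A view on `feature`, expressed by the opinion holder `holder`."""
    holder: str
    feature: str
    polarity: Polarity


# Hypothetical example data, not taken from the paper.
phone = Component(
    name="phone",
    attributes={"price"},
    parts=[Component(name="battery", attributes={"battery life"})],
)
opinion = Opinion(holder="customer-1", feature="battery life",
                  polarity=Polarity.NEGATIVE)
```

In this reading, an opinion refers to a feature by name, so checking that `opinion.feature` occurs in `features(phone)` is one simple way to tie an opinion back to the object it is about.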