=Paper= {{Paper |id=Vol-2773/wdw2 |storemode=property |title=Diversity in a Language-Independent Wiki: Six Design Requirements and Goals to Embed a Diversity Mindset |pdfUrl=https://ceur-ws.org/Vol-2773/paper-06.pdf |volume=Vol-2773 |authors=Marc Miquel Ribé |dblpUrl=https://dblp.org/rec/conf/semweb/Miquel-Ribe20 }} ==Diversity in a Language-Independent Wiki: Six Design Requirements and Goals to Embed a Diversity Mindset== https://ceur-ws.org/Vol-2773/paper-06.pdf
               Diversity in a Language-Independent Wiki:

Six Design Requirements and Goals to Embed a Diversity Mindset

                                          Marc Miquel-Ribé1
                           1
                               Universitat Pompeu Fabra, Barcelona, Catalonia
                                          marcmiquel@gmail.com



           Abstract. Abstract Wikipedia is a Wikimedia Foundation project comprised of
           two repositories, one of functions and another of abstract content. Its goal is to
           collaboratively create functions that convert content from multiple Wikipedia
           language editions into an abstract language, allowing editors to maintain it, and
           feeding it back to each of these language editions so they can expand more rapidly
           and efficiently. In this paper, we argue that it is necessary to design the functions
           and user interfaces so that they encourage contributors to actively work on
           content diversity across Wikipedia language editions. Based on different
           Wikipedia studies on editing dynamics and content contextualisation, we suggest
           six design goals and requirements: topic identification, gap bridging, perspective
           modelling, items representation, article composition, and dispute visualization.
           Finally, we discuss the importance of helping smaller languages grow and
           fostering knowledge diversity through the exchange of existing contents.

Keywords: Abstract Wikipedia, Diversity, Content Gaps, Online Collaboration.


1          Introduction
   Wikipedia exists in more than 300 language editions, but their distribution of content
is unequal. We know there is a gap between language editions as they do not cover each
other’s content. Even large language editions like English and German only overlap in
around a third of their articles. Yet, we also know that there exist dynamics of importing
and exporting content from one language to others, aimed at both expanding existing
articles and generating new ones.
   Based on this, a project named Abstract Wikipedia has been announced1 in July 2020
to enhance the creation and expansion of articles across Wikipedia language editions.
The project proposes a repository of functions that allow capturing content from
language editions, converting it to an abstract language, storing it into a language-
independent content repository, manipulating it, and finally translating it back to a
natural language to be integrated into a Wikipedia language edition [1].
   Having a set of functions to operate on multiple Wikipedia language editions and a
language-independent repository to store and draw content from seems like a logical
solution. A single repository may be less expensive to maintain, and its content can help

    1
        https://meta.wikimedia.org/wiki/Abstract_Wikipedia/July_2020_announcement

Copyright 2020 for this paper by its authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0)
2

grow multiple projects at the same time. However, formalising a global and language
independent repository could inherently limit the diversity emerging in each language
edition, as all the main operations of importing and exporting content would possibly
be based on the preferences and biases of its contributors, who might be a less diverse
subset of the entire Wikimedia movement.
    In this paper, we argue that a project with a multilingual scope like Abstract
Wikipedia should take a proactive stance towards diversity and design its repository of
functions and its repository of content to best address the existing content gaps and
biases in Wikipedia languages (e.g. gender, culture, etc.) at article and in-article level
(Section 1). In Section 2, we review the Wikipedia literature and propose six
requirements and six goals the system should be planned for, without going further into
its technical implementation. In Subsection 2.1, we deepen into article-level and explain
why it is necessary to encourage topic identification and gap bridging to export and
import articles. In Subsection 2.2, we deepen into in-article level to explain the need
for modelling and including, in every article, all the available perspectives or points of
view; allowing multiple compositions of the same article with different perspectives;
and also raising awareness on disputes in articles before every action. In Section 3, we
argue that if these goals were properly addressed (i.e. through some functions and user
interfaces), the new repositories could grow faster, fulfilling their unique opportunity
to increase the content diversity in Wikipedia language editions.


2         System Design Requirements
In the following two sections, we want to propose a set of six design requirements to
ensure that Abstract Wikipedia new repositories are more suited to improve content
diversity in all Wikimedia projects at both article and in-article level. Each subsection
is entitled according to a system design requirement that is motivated by some research
findings and aims at fulfilling a specific design goal.

2.1       Article-Level
More diversity at article-level means that each Wikimedia project, regardless of its size,
represents the existing variety of knowledge in Wikimedia. Hence the functions of
Abstract Wikipedia should help editors search for groups of articles on a particular
topic, to work on them in the language-independent repository, and later, if they wish,
to be able to create an article from scratch or improve an existing one in a Wikipedia
language edition. In the following Subsection 2.1.1, we will present the two design
requirements of topic identification and gap bridging that should enable both to import
and export articles from and to Wikipedia language editions.

2.1.1          Topic Identification and Gap Bridging
If we want Wikipedia to gather “the sum of human knowledge”2, we must inevitably
forget about an elitist selection of “universal content” that once motivated the first
encyclopaedias. Instead, we need to be able to collect content about every topic. With

    2
        https://en.wikipedia.org/wiki/Wikipedia:Prime_objective
                                                                                           3

this comes the challenge of identifying topics and writing articles about them. Even
though this is often done through manually generated lists of articles, browsing
categories, and navigating Wikidata, it is often challenging to locate a gap, understand
its importance and try to bridge it. Abstract Wikipedia as a project proposes to help
exchange content across language editions. But what kind of content? The most genuine
and unique contribution from each Wikipedia language edition to the global content
diversity in Wikimedia is “local content”. As it has been stated [2], it is “the group of
articles in a Wikipedia language edition that relates to the editors' geographical and
cultural context (places, traditions, language, politics, agriculture, biographies, etc.).”.
Local content takes in average a quarter of the 40 largest Wikipedia language editions
(e.g. half of English Wikipedia and 10% of Dutch Wikipedia). It tends to be more edited
and developed in terms of Bytes, references, images, among other features [3].
Interestingly, anonymous editors and administrators tend to dedicate a higher
proportion of their edits to this kind of content than regular registered editors [3].
   What is less known is the fact that the gap between language editions is mainly due
to local content [2]. Wikipedia language editions that cover a minimum of others’ local
content are those which are either very large or simply linguistically or geographically
close. Differently from the content gender gap, which exists on most language editions
with a proportion of 80-20%, and that can be bridged with the creation of more women
biographies for every male biography created, the content culture gap requires content
from the local content of every other language edition. [4] proposed a method and tool3
to select the local content of every language edition and create lists of 100 to 500 articles
through algorithms which rank them according to relevance features (i.e. number of
editors, number of pageviews, etc.) and specific topics (e.g. books, places, gender,
ethnicity, etc.). These “top priority” lists allow editors to have a reference point to start
bridging the gaps. Namely, they are centred on a topic and on content local to a
language edition and they show the articles’ availability in every other language. In a
similar vein, the new functions should ease the process of finding articles about a topic
and massively create them in other language editions. In other words, a good design
goal would be: “help editors in the process of finding relevant articles about topics
they do not know about and import them to a language edition.”
   But how about nurturing the other language editions with topics you are familiar
with? In many language editions’ communities, 50% of the editors who accumulate
between 101-1000 edits are multilingual or have contributed at least once to language
editions other than their primary language. Such percentage increases to more than 70%
for those who accumulated over 1000 edits, and reaches 90% for those who
accumulated over 10000 edits [3]. However, the proportion of edits any of these types
of editors dedicates to other language editions is always limited to 1-4% of their total
contributions. This is also true for some language editions like Catalan or Basque,
whose editors also speak another language, as editors generally focus on a primary
language. [2] saw that when editors contribute to other language editions, they
sometimes translate local content from their primary language to other language
editions. In fact, 90 to 100% of the editors with a flag (e.g. sysops) have acted as
exporters of their local content, while only 40% of the multilingual registered editors

   3
       https://wcdo.wmflabs.org/top_ccc_articles/
4

have done so. Most of the multilingual edits are not dedicated to exporting local content
but to creating content in general. Unsurprisingly, articles that receive more attention
by exporters tend to be focused on the country, the political or historical figures or
cities, rather than on an artist or a typical dish. By encouraging the exporting of specific
topics (e.g. local content, women biographies, ethnicity, etc.) we can ensure more
diversity in every language edition. Taking this into account, a design goal would be:
“help editors in the process of exporting a group of selected articles on a topic they
are familiar with into one or more Wikipedia language editions.”
    The process of exporting content from large language editions into smaller ones is
one of the main promises of Abstract Wikipedia. One would expect that such content
would be composed mainly of lists of Vital articles4 (i.e. “articles that every Wikipedia
should have”) or articles from a local content whose editors are especially interested in
making it available in multiple language editions. However, does it make sense to
export local content from a language edition into other language editions which have
not yet created their own? [4] shows us that in 145 language editions, local content
takes only 5 to 10% of all the articles and that there are relevant gaps in topics as
important as central political figures, or places, among others. In fact, 92 language
editions contain less than 100 geolocated articles in the territories where the language
is spoken. Moreover, the speakers of these languages are more likely to search and read
articles on local topics directly in a large language edition, even when they are also
available in their own language. This is perhaps due to the acceptance of another
language as of higher status.
    Since local content receives most of the pageviews, nothing would benefit these
smaller language editions more than receiving assistance in the creation of their own
local content articles (rather than creating articles on Julius Cesar or international pop-
stars). Local content articles can help these language editions reach higher positions in
the search engines, hence potentially attracting new contributors. Similarly to this tool5
proposed by [5], the Abstract Wikipedia new functions should allow searching for
articles about a language local content that do not exist in its related Wikipedia language
edition, so that local editors can use the content as starting point to create them.
    As seen, the motivation to ensure a global audience to an article drives editors to
export articles to multiple language editions. However, when the article does not yet
exist in any language edition some editors may wonder whether they should first create
it in their local language edition or create it directly in the English Wikipedia or, in this
case, in the new content repository. While it seems that in the English Wikipedia the
article may receive a wider audience geographically speaking, [2] shows that an article
belonging to local content tends to receive more pageviews in its related language
edition. Most of the editors might prefer the option of starting it locally, and then
translate it into English, which tends to be the preferred choice as a secondary language
by multilingual editors [5]. What will happen once the Abstract Wikipedia content
repository is settled and functioning is uncertain. The available linguistic resources in
order to transform abstract content into natural language and back to abstract might
probably determine the different paths contributors will take in order to create new
content and broadcast it across the Wikimedia projects.

    4
        https://en.wikipedia.org/wiki/Wikipedia:Vital_articles
    5
        https://wcdo.wmflabs.org/missing_ccc_articles/
                                                                                          5

2.2      In-Article Level
More diversity at in-article level means that articles contain a wide variety of
perspectives. Hence Abstract Wikipedia functions and interfaces should help editors
include all the existing points of views in an article and also make more conscious
decisions on which perspective is more widely represented in its text. In the following
Subsections 2.2.1 to 2.2.4, we will present four design requirements: perspective
modelling, items representation, article composition and dispute visualization.


2.2.1        Perspective Modelling
Neutral Point of View (NPOV) is one of Wikipedia’s core content policies. It requires
an article to be diverse to include “all the significant views that have been published by
reliable sources on a topic”, but at the same time, it also suggests that these views should
be fairly proportionate to the prominence of their sources. In practice, some
perspectives like those related to gender or some specific groups are present in some
language editions and underrepresented in others. For Abstract Wikipedia to increase
in-article diversity in all language editions, it would be necessary that editors recognize
the perspectives included in each version of an article. For this, a good design goal is
the following: “help editors to model all the perspectives of an article in all the
language editions where it exists.”
   But what is a perspective? A perspective is a unit of meaning. It includes one or more
statements that complement each other around a specific topic and expands the general
topic of the article. The statement is the smallest unit that compounds a perspective.
Following Wikidata’s definition of statement6, we understand it as a Subject-Predicate-
Object (item, property, value). Sentences usually include more than one statement. In
fact, depending on the number of statements, a perspective can take the form of a
sentence, a paragraph or even an entire section. You can only increase the usability of
content if you are able to accurately model each perspective as a group of statements,
which would be a process that could benefit from automated approaches but require
manual revision. In fact, the only way to ensure that all the perspectives of all the
versions of an article are included in the Abstract Wikipedia content repository, is if
they are well-modelled at a statement level. In the process of writing, editors identify
perspectives all time and question the validity or suitability of each statement.
   With this modularity, it is expected that some recurring perspectives of the articles
become standardised as sections or paragraphs, and always receive the same label. For
example, some articles would seem to benefit from standardised sections such as
“Notable books”, “Demographics of a city”, “Filmography”, among others. While these
would be more general perspectives, others could have the length of a sentence and go
further into details, e.g. a biography containing one statement which specifies a
person’s political affiliation or a paragraph with the main ideological affinities. After a
while developing perspectives in Abstract Wikipedia, it might lead to more aligned
sections of all the articles on the same topic and across language editions. One could
say it would eventually work in a similar way to Infoboxes and Wikidata, fact-based
and up-to-date. With more perspective modularity, it is also expected that combined

  6
      https://www.wikidata.org/wiki/Help:Statements
6

with topic identification and article management (see Subsection 2.1.2) it will be
possible to detect topic-based perspectives from one or more articles, as a prior step to
complementing them. For example, given a selected group of biographies of women,
we could search for gender-based perspectives in twenty-century historical articles in
the Abstract Wikipedia content repository. Using these perspectives, we could update
the articles of those language editions which lack them. Many topics but especially
those related to gender, geography, colonised nations or ethnic groups could benefit
from processes using the modularity that modelling statements and perspectives offers.

2.2.2          Items Representation
When importing content from a language edition into another, we do it with its biases
and gaps. In a comparative study of biographies of Americans and Poles, [6] showed
that their versions in the Polish and English Wikipedia differed not only at structural
level (i.e. number of categories or length) but also in the amount of personal
information, mentions to education and nationality, among other aspects. If the English
Wikipedia, which is edited by many multilingual editors, does not contain a more
complete set of points of view, who could guarantee us that Abstract Wikipedia content
repository will? After all, in the demographics of active editors in all Wikimedia
projects the English community is the largest, and we might expect that an Abstract
Wikipedia article is initially created after the English Wikipedia version. For this
reason, we must design the Abstract Wikipedia content repository in such a way that
editors are encouraged to make its articles as complete as possible. So, a design goal is
the following: “help editors to include all the perspectives available in all the
language editions’ versions of an article.” This could be done by showing gaps and
differences in the user interface. In some research studies [7, 8] article similarity is
computed by counting the number of links two versions of the same article have in
common. This idea has also been explored visually by tools7,8. One allows you to pick
two or more versions of the same article in different language editions and show the
articles in common in a cloud map [7], while the other shows those that are unique to
each language in a different colour [9]. Abstract Wikipedia content repository user
interface should remind editors which items (articles or Wikidata Qitems) are used and
unused in each article’s statements and perspectives, so they complete it with the
content from all language editions. Similar strategies should be explored to stimulate
the completion of the necessary linguistic resources to transform abstract content into
the natural languages.

2.2.3          Article Composition
Even if an Abstract Wikipedia article may contain all the perspectives from all the
language editions’ versions of it, we cannot escape the fact that some perspectives may
be more extensively represented than others. This is due to the process through which
this content has been created, the number of contributors from each community, as well
as their skills and capacity to contribute to Abstract Wikipedia content repository. If an
article that is exported to a language edition contains controversial perspectives to the

    7
        http://manypedia.com/
    8
        https://omnipedia.northwestern.edu/
                                                                                        7

eyes of the local community, it could be rejected. We know that some articles
accumulate many changes on the first lines, such as deletion of substantives, changes
in their order, and the alteration of the number of occurrences [10]. Therefore, we
should ask if an article of the Abstract Wikipedia content repository, even when it
includes all the language’s editions perspectives, can satisfy all the target Wikipedia
language edition’s communities. It seems inevitable that different human groups
speaking the same or a different language consider that perspectives on the same topic
deserve a different weight. Therefore, having a single arrangement of all the
perspectives in an abstract content article to be exported to other language editions may
not be the best strategy. For this reason, we propose the design goal: “allow editors to
compose multiple versions of an article to avoid spreading a specific group of
perspectives to all language editions.” The modelling of the different perspectives
explained in subsection 2.2.2 should facilitate the creation of multiple layouts in which
editors, for example, could select which statements they feel more comfortable with.
Since Abstract Wikipedia’s repository content is not meant to be read like that of any
other Wikipedia, it should be flexible to allow editors to choose which perspectives
they consider they deserve more representation in the article they are exporting, rather
than having a single representation of all perspectives to export.

2.2.4           Dispute Visualization
   The reason for allowing multiple arrangements and selections of the statements
available in a Wikipedia article is that it can prevent reverts after exporting content to
a language edition. But such a strategy comes with a trade-off in terms of efficiency,
namely the more potential different compositions are, the more difficult it is to
massively create articles. Editors may need to evaluate the different perspectives
contained in an article before deciding to export. For this reason, it would be
recommended to have a “default” composition. Although it may have an implicit bias,
it is nevertheless the most usable solution for editors when there are no relevant
disputes. Instead, when dealing with perspectives that have been previously in dispute
in the original language editions, the user interface could warn the editor before making
an export. Therefore, a good design goal is the following: “make editors aware of the
disagreements in articles’ perspectives when importing or exporting content to
language editions.” Based on previous research [10], it is expected that articles on
scientific discoveries, politics and historical figures are the ones which will require a
more thorough examination before exporting. Like in this tool9, the disputes could be
highlighted along with their language edition origin, as this could be useful information
for the user to choose whether to omit it or include it before exporting content to a
version of the article in another language edition.


3          Conclusions
Abstract Wikipedia repositories of functions and of abstract content will bring
efficiency and robustness to Wikimedia, as it guarantees growth through the exchange
of contents. Even though the project will be very beneficial to expand these Wikipedia

    9
        http://contropedia.net/
8

language editions whose editors strive to settle a community or even struggle to have
access, we need to include them in the decision-making10. Likewise, the approach of
Abstract Wikipedia to increase diversity is constrained to the existing content in
Wikimedia projects, and growing beyond these limits requires more diverse
communities. The imbalanced participation in different geographical regions or the
enormous gender gap in editors are challenges to be addressed by improving the user
experience and providing more safety to editors. Abstract Wikipedia cannot tackle these
issues and it should try not to become a technical barrier to existing or new contributors.
   We argue that it is necessary to embed diversity in Abstract Wikipedia core functions
and content repository interfaces, so that editors are compelled to work on it
effortlessly. In this paper we suggested six design goals to enable editors to monitor the
gaps, import and export content both at article and in-article level. We proposed six
design requirements aimed at fulfilling these goals (topic identification and gap
bridging, perspective modelling, items representation, article composition and dispute
visualization) based on research studies and practical tools that have been in use in the
past decade. We argued that by addressing diversity since the very beginning, the
project can become a valuable tool to all those groups of contributors that aim at more
knowledge equity, and guarantee an exchange of contents that is both strategic and
respectful to the diversity of topics and perspectives.

References
 1. Vrandečić, D. (2016). Architecture for a multilingual Wikipedia. arXiv preprint
    arXiv:2004.04733.
 2. Miquel-Ribé, M., Laniado, D. Wikipedia Culture Gap: Quantifying Content Imbalances
    Across 40 Language Editions. Frontiers in Physics, (2018).
 3. Miquel-Ribé, M. Identity-based motivation in digital engagement: the influence of
    community and cultural identity on participation in Wikipedia (Doctoral dissertation,
    Universitat Pompeu Fabra), (2017).
 4. Miquel-Ribé, M., Laniado, D. Wikipedia Diversity Observatory: A Project to Identify and
    Bridge Content Gaps in Wikipedia. In Proceedings of OpenSym, (2020).
 5. Hale, S. A. Multilinguals and Wikipedia editing (pp. 99–108). WS '14: Proceedings of the
    2014 ACM conference on Web science, (2014).
 6. Callahan, E. S., Herring, S. C. Cultural bias in Wikipedia content on famous persons. Journal
    of the American Society for information science and technology, 62(10), 1899-1915, (2011).
 7. Massa, P., Scrinzi, F. Manypedia: Comparing language points of view of Wikipedia
    communities. In Proceedings of the Eighth Annual International Symposium on Wikis and
    Open Collaboration (pp. 1-9) (2012).
 8. Hecht, B., Gergle, D. The tower of Babel meets web 2.0: user-generated content and its
    applications in a multilingual context. In Proceedings of the SIGCHI conference on human
    factors in computing systems (pp. 291-300), (2010).
 9. Bao, P., Hecht, B., Carton, S., Quaderi, M., Horn, M., & Gergle, D. Omnipedia: bridging
    the Wikipedia language gap. In Proceedings of the SIGCHI Conference on Human Factors
    in Computing Systems (pp. 1075-1084) (2012).
10. Borra, E., Weltevrede, E., Ciuccarelli, P., Kaltenbrunner, A., Laniado, D., Magni, G., ... &
    Venturini, T. Contropedia-the analysis and visualization of controversies in Wikipedia
    articles. In OpenSym (pp. 34-1) (2014).


    10
         https://en.wikipedia.org/wiki/Nothing_About_Us_Without_Us