Experiences with Multilingual Modeling in the
       Development of the International Classification of
               Traditional Medicine Ontology

             Csongor Nyulas, Tania Tudorache, Samson Tu, Mark A. Musen

         Stanford Center for Biomedical Informatics Research, Stanford University, US
               {nyulas, tudorache, swt, musen}@stanford.edu


        Abstract. The World Health Organization (WHO) in collaboration with sev-
        eral international stakeholders have started recently the work on the International
        Classification of Traditional Medicine (ICTM), which will provide a standardized
        system for encoding and collecting health statistics data related to Traditional
        Medicine practice throughout the world. ICTM is represented in OWL, and is
        developed by Traditional Medicine experts in a collaborative Semantic Web plat-
        form, called iCAT-TM. The content of ICTM is developed simultaneously in four
        languages (English, Chinese, Japanese and Korean). In this paper, we describe
        how we modeled the multilingual content, the Web platform used for editing, and
        some of the challenges we have encountered related to the multilingual aspects
        of the model and use of the platform.


1     The International Classification of Traditional Medicine (ICTM)
The World Health Organization (WHO) in collaboration with a large group of in-
ternational stakeholders is developing the International Classification of Traditional
Medicine (ICTM).1 ICTM will provide a standardized international system for classify-
ing Traditional Medicine (TM) related health concepts, such as disorder names, disease
patterns, signs and symptoms, causal factors, and interventions [9]. One of the goals
of the project is to be able to unify the data collection and monitoring for Traditional
Medicine systems with those of the conventional (i.e., “Western”) medicine, which will
be realized by integrating a relevant part of ICTM as Chapter 23 of the 11th revision
of the International Classification of Diseases (ICD).2 ICD is an essential classification
used in the United Nation countries for compiling basic health statistics, billing, and
clinical documentation [8].
    The content of ICTM is based on classifications of Traditional Medicine from three
countries, China, Japan and Korea. Even if these classifications have a common root,
they have diverged significantly over the years. The role of ICTM is to harmonize these
different efforts and come to a consensus classification that can be used in health sys-
tems around the world.
    With the information age revolution, WHO has changed significantly the way they
build classifications. To make them ready for electronic health records and enable easy
 1
     https://sites.google.com/site/whoictm/
 2
     http://www.who.int/classifications/icd11/browse/f/en
cross-linking between them, the classifications have now a formal underpinning. ICTM,
similarly to ICD-11, is represented as an OWL ontology and is developed using Seman-
tic Web technologies.
    Given the international nature of ICTM, tackling the multilinguality problem is one
of the main challenges in the project. Domain experts from the three countries and
the project coordinators in Geneva, Switzerland, are developing the content of ICTM
simultaneously in four languages: English, Chinese, Japanese, and Korean. Our group
has provided the ontology modeling support and the Web platform infrastructure used
for editing ICTM. In this paper, we describe our experiences in supporting multiple
languages in the ICTM ontology, including the model and tooling, and the challenges
we encountered.
    The rest of the paper is organized as follows: Section 2 describes the related work,
in Section 3, we describe how we modeled the multilingual content in ICTM, Section 4
presents the collaborative Semantic Web platform used by the domain experts to edit
ICTM, and finally, Section 5 presents the challenges and some lessons learned in the
project, and gives an overview of the future work.

2      Related Work
As the Semantic Web matures, there is an increasing body of research on localizing
ontologies. For example, the SKOS-XL extension [1] treats labels as first order re-
sources, thus enabling the definition of explicit links between labels associated to the
same concept. Montiel-Ponsoda et al. [4] try to overcome some of the limitations of the
SKOS-XL representation and propose a module for lemon [3] that supports different
types of translation relations and metadata, such as provenance and reliability scores.
Extensive work on ontology localization [2] has also been done in the NEON project3
that proposes guidelines and a tool to support this process.
    Silva et al. present conceptME [5], a collaboration framework that supports ontol-
ogy localization starting early in the conceptualization phase. Providing terminological
support so early in the development process proved to enhance the conceptualization
of the domain. conceptME has also support for sharing conceptual models, for content
negotiation and discussion.
    In this work we did not use any of the related approaches, as one of the main re-
quirements in the project (see Section 3) was to use and/or extend the ICD-11 ontology
to ensure that these two ontologies will be easily integrated at a later stage. We plan to
investigate the related approaches (such as SKOS-XL and the extensions to lemon) to
see if they would fit the requirements for ICTM, and if so, we will refactor our ontology
accordingly.

3      Multilingual Modeling in ICTM
As we mentioned before, one of the main requirements for ICTM is that it should fol-
low similar modeling patterns to the ICD-11 ontology [6], so that these two can be
easily integrated.4 In addition, all ICTM textual content should be available in four lan-
guages: English, Chinese, Japanese and Korean, which Traditional Medicine experts
 3
     http://www.neon-project.org
 4
     As we mentioned before, part of ICTM will be available as a separate chapter in ICD-11.
Fig. 1. Excerpt from the ICTM ontology. Language terms are modeled as instances of the rei-
fied class LanguageTerm. Subclasses of LanguageTerm represent different linguistic terms (title,
definition, synonym, and so on). The subclasses may have additional properties that represent dif-
ferent metadata of the language term. The class level is shown in boxes with white background.
We also show an example instantiation for the title terms for the Qi goiter disorder disease class
using the boxes with darker background.


from different countries will input during development. A further requirement, which
came later in the project, was to support transliteration of titles, i.e., converting the Chi-
nese, Japanese and Korean scripts into Latin script. For example, a common translit-
eration for converting Chinese characters into Latin script is Pinyin. Figures 1 and 2
show the transliteration of the simplified Chinese disease title 气瘿 (meaning, Qi goiter
disorder) into Pinyin as Qi ying. Other metadata will be attached to the label of a term
into a specific language, such as the source of the label (e.g., the Traditional Medicine
classification where the label originates from), and an internal id that is used by other
WHO software.
    We modeled ICTM in OWL 1.0. We used a reified class, LanguageTerm, to repre-
sent all linguistic terms in the ontology. We have created a taxonomy of language terms
as subclasses of the LanguageTerm class, as some of the term types have additional
properties attached to them. For example, the SynonymTerm has additional properties
that describe if and how it will be included in an electronic index for the classification.
In the current version of the model, there are eight subclasses of LanguageTerm that
represent among others, the title, the fully specified title, the definition, other external
definitions, the synonyms, and so on. The actual value for a language term is an instance
of a subclass of LanguageTerm.
    Figure 1 shows an excerpt of the class level modeling for language terms and how
different properties of a disease class (title, definition, synonym, etc.) have been reified.
The figure also shows an example for modeling the five title terms for the Qi goiter dis-
order disease class. The Qi goiter disorder class has a property title that has as values
five instances of the class TitleTerm that correspond to the titles in 5 languages (En-
glish, Japanese, Korean, simplified Chinese and traditional Chinese). Some TitleTerm
instances (e.g., TitleTerm 4788 for simplified Chinese) has in addition to the label and
language properties, also another property, alternativeLabel, to represent the transliter-
ation of the Chinese script to Latin characters. Other language terms (not shown in the
figure), such as the ExternalDefinitionTerm—used to reference textual definitions from
external resources — have additional properties that specify the source of the definition
in greater detail (e.g., the ontology name, the IRI of the source ontology entity, the URL
for the source ontology, etc.).
    As there might be confusion about the difference between synonyms and translitera-
tions, we would like to clarify this issue. A synonym is a term that has a similar meaning
to a another term (in our case, the title term). In ICTM, as in ICD-11, synonyms are also
used to store alternative titles for a disease, that are either found in scientific literature,
or have minor linguistic variations, or are used in the colloquial language (e.g, the syn-
onym for Roseola infantum is Sixth disease). The synonyms apply for terms in the same
language (e.g., an English title may have other English synonyms). A transliteration, on
the other hand, represents exactly the same term in the same language, but in a different
script. A term may have several transliterations (Korean has 4 different transliterations).

4   The iCAT-TM Platform
Traditional Medicine experts around the world are editing ICTM using the collaborative
iCAT-TM Web platform. iCAT-TM is a customization of the generic WebProtégé ontol-
ogy editor [7]. The user interface of iCAT-TM is tailored for domain experts, who are
not knowledgeable about ontologies or knowledge representation. iCAT-TM presents a
form-based interface shown in Figure 2 that is is less intimidating for the experts than
a generic ontology editor would be. The experts can edit the class taxonomy in the left
panel of the ICTM Content Tab, and the class details, including the language terms, in
the right panel.
    iCAT-TM has many collaboration features inherited from WebProtégé, such as the
support for simultaneous editing, change history of users’ actions, and notes and dis-
cussions attached to any entity in the ontology.
    We have created a generic widget that displays the content of reified individuals,
and we have reused it for displaying and editing the different language terms. In Fig-
ure 2, the ICTM Title uses this widget to display the values of the TitleTerm individuals
associated to the title property of a disease. A row in the widget table corresponds to one
of the reified individual values, and the columns display the properties of the respective
row individual value. The same widget is also used for displaying the short definition
of a disease (the transliteration column has been hidden from view).
Fig. 2. The iCAT-TM platform is used by domain experts from the three countries to develop
ICTM collaboratively on the Web. The panel on the left hand side shows the class tree, and the
right hand side panel shows the details of the selected class (in this case, the Qi goiter disorder).
The language terms for the title and short definition are also shown, as well as the transliteration
for the title.


    One “bonus” of using reified individual for language terms (which, as a conse-
quence, have identity) is that we can attach notes and discussion threads to a particular
individual. For example, in Figure 2, the second short definition of the disease has a
comment attached to it (shown as the number 1 next to the comment icon on the sec-
ond row). This feature enables domain experts to have focused discussions right in the
context in which they are editing. The contextual discussions are particularly useful
because each disease has several properties that need to be filled, in many cases by
different experts, and the overview and management of notes and discussions is much
easier.

5    Discussions and Future Work
The iCAT-TM has been in production use since February 2011 by 25 Traditional Medicine
experts. As a result, ICTM contains now more than 1,500 classes, 15,000 reified terms,
out of which, 10,000 are language terms. The users have created more than 60,000
changes in the ontology, and added more than 1,100 notes and discussions.
    Since the beginning of the project, we have encountered several challenges related
to the multilingual aspects in the modeling, tooling and use of the platform.
    Modeling. ICTM was developed using OWL 1.0 to make it compatible with the
ICD-11 ontology. For this reason, we had to use reified relations to model the language
terms. Reified relations, even though they have the advantages described earlier, have
several disadvantages, as well. First, the reified individuals clutter the domain ontol-
ogy, and increase its size significantly (in ICTM, almost all property values are reified).
Second, these anonymous individuals are used in reasoning (as part of the domain on-
tology) and can slow it down significantly. We plan to overcome these limitations by
upgrading the ontology to OWL 2.0 (ICD-11 will also upgrade), and rather than using
reified individuals, we plan to use annotations on axioms. We plan to change the mod-
eling in other aspects, too. For example, the transliterations are currently modeled as
a multiple cardinality datatype property that take string literals as values. Even if we
can now add more transliterations for the same label (e.g., Korean has four different
transliterations), we cannot specify to which script or alphabet a transliteration belongs
to. We plan to address this issue by using nested annotations on axioms in the OWL 2.0
modeling. Additionally, we plan to investigate if other approaches for ontology local-
izations, such as the ones we mentioned in the Related Work section, are suitable for
ICTM. If these approaches fit the requirements, we will refactor the ontology to use a
more standard approach. This undertaking will, however, require significant effort, as
we need to also change the modeling of ICD-11, as well as migrate all existing content
of two live production system (iCAT for ICD-11, and iCAT-TM for ICTM) to the new
structure.

    Tooling. We had to make sure that our tooling works well with international char-
acters. While these are not an issue for the Web application per se (Web browsers can
show pages in different encodings), we had to adjust our Lucene-based search mecha-
nism to work properly with multiple languages. One hurdle for the domain experts in
using iCAT-TM is that the user interface is presented in English, and many of them are
not very comfortable with it. We plan to redesign the user interface to better follow the
principles of internationalization, so that we can more easily provide language specific
user interfaces. We do expect that this step will involve a significant re-design effort.

    Use of the platform. We had several user related challenges that are not necessar-
ily of technical nature. For example, when we started the project, we used (wrongly)
the country codes to model the languages (ch, jp, and kr). Later, in the process, we
changed the language codes to the correct ones from the ISO 639-1 (en, zh, ja, ko),
however, some of the domain experts complained that the correct language codes are
less intuitive to use. Also, when we started the project, we did not anticipate that some
content will be entered in simplified Chinese, while other will be entered in traditional
Chinese, which created some confusion with the users. As a solution, we added also
the traditional Chinese language code (zh-Hant), so that at a later date the Chinese con-
tent can be easier curated and harmonized (it is expected that in the official distribution
only simplified Chinese will be used). Another challenge is related to the communi-
cation among the domain experts, as most of them speak only their native language,
and sometimes English, too. To improve the communication among the domain experts
and the WHO coordinators in Geneva, we have introduced the transliteration. Another
challenge related to the language barrier is that experts do not agree on the English
translation for a term, and “invent” new English translations. This fact also makes the
curation and verification of the entire classification content very challenging, because
finding Traditional Medicine experts who understand all languages and can verify that
the terms in different languages really mean the same thing, is very difficult.
    As future work, we plan to upgrade ICTM to OWL 2.0 to overcome the modeling
issues we described before. We will also create linkages between ICTM classes and
ICD-11 classes that will put into correspondence Traditional Medicine disorders with
“Western” diseases. As the project progresses, we will also provide a peer-reviewing
mechanism, in which external domain experts will review different aspects of ICTM.
    The iCAT-TM platform is currently in production use, and we expect that by 2015,
when the ICD-11 major revision is planned to end, the ICD-11 Chapter 23, containing
a part of ICTM, will be finalized as well. Even after 2015, ICTM will continue to be
developed as an independent classification that will address the needs of the Traditional
Medicine practices around the world.

Acknowledgments
We thank our WHO collaborators and the ICTM project members for developing the project re-
quirements and for the fruitful collaboration. The work presented in this paper is partly supported
by the NIGMS Grant 1R01GM086587.

References
1. SKOS Simple Knowledge Organization System eXtension for Labels (SKOS-XL) Names-
   pace Document - HTML Variant. http://www.w3.org/TR/skos-reference/
   skos-xl.html. Last accessed: August, 2012.
2. M. Espinoza, E. Montiel-Ponsoda, and A. Gómez-Pérez. Ontology localization. In Proceed-
   ings of the fifth international conference on Knowledge capture, pages 33–40. ACM, 2009.
3. J. McCrae, G. Aguado-de Cea, P. Buitelaar, P. Cimiano, T. Declerck, A. Gomez-Perez, J. Gar-
   cia, L. Hollink, E. Montiel-Ponsoda, D. Spohr, et al. Interchanging lexical resources on the
   semantic web. Language Resources and Evaluation, 2011.
4. E. Montiel-Ponsoda, J. Gracia, G. Aguado-de Cea, and A. Gómez-Pérez. Representing trans-
   lations on the semantic web. MSW 2011, page 25, 2011.
5. M. Silva, A. Soares, and R. Costa. Supporting collaboration in multilingual ontology specifi-
   cation: the conceptme approach. TKE 2012, page 27.
6. T. Tudorache, S. Falconer, C. Nyulas, N. Noy, and M. Musen. Will Semantic Web Technolo-
   gies Work for the Development of ICD-11? In The 9th Intl. Semantic Web Conference (ISWC
   2010), pages 257–272. Springer, 2010.
7. T. Tudorache, C. Nyulas, N. Noy, and M. Musen. Webprotégé: A collaborative ontology editor
   and knowledge acquisition tool for the web. Semantic Web Journal, pages 1–11, 2012.
8. World Health Organization. International Classification of Diseases (ICD). http://www.
   who.int/classifications/icd/. Last accessed: August, 2012.
9. World Health Organization. Traditional Medicine in Health Information Systems: Integrat-
   ing Traditional Medicine into the WHO Family of International Classifications. https:
   //sites.google.com/site/whoictm/home/ICTMProjectPlan.pdf. Last ac-
   cessed: August, 2012.