Jump-starting a Body-of-Knowledge with a
       Semantic Wiki on a Discipline Ontology

              Vı́ctor Codocedo, Claudia López, and Hernán Astudillo

                       Universidad Técnica Federico Santa Marı́a,
                        Avenida España 1680, Valparaı́so. Chile
                      {vcodocedo,clopez,hernan}@inf.utfsm.cl
                                 http://www.usm.cl/


        Abstract. Several communities have engaged recently in assembling a
        Body of Knowledge (BOK) to organize the discipline knowledge for learn-
        ing and sharing. BOK ideally represents the domain, contextualizes as-
        sets (e.g. literature), and exploits the Social Web potential to maintain
        and improve it. Semantic wikis are excellent tools to handle domain (on-
        tological) representations, to relate items, and to enable collaboration.
        Unfortunately, creating a whole BOK (structure, content and relations)
        from scratch may fall prey to the “white page syndrome”1 , given the
        size and complexity of the domain information. This article presents an
        approach to jump-start a BOK, by implementing it as a semantic wiki or-
        ganized around a domain ontology. Domain representation (structure and
        content) are initialized by automatically creating wiki pages for each on-
        tology concept and digital asset; the ontology itself is semi-automatically
        built using natural language processing (NLP) techniques. Contextual-
        ization is initialized by automatically linking concept- and asset-pages.
        The proposal’s feasibility is shown with a prototype for a Software Archi-
        tecture BOK, built from 1,000 articles indexed by a well-known scientific
        digital library and completed by volunteers. The proposed approach sep-
        arates the issues of domain representation, resources contextualization,
        and social elaboration, allowing communities to try on alternate solutions
        for each issue.


Key words: semantic wiki, body of knowledge, automated domain ontology,
digital assets contextualization


1     Introduction

In recent years, several professional and academic communities have undertaken
to organize and systematize their knowledge with a “Body Of Knowledge” (BOK
for short). BOK’s have been created most famously for project management
1
    Colloquial name for writers’ mental block when starting a new piece from scratch
2      Jump-starting a BOK with a stylized semantic wiki

(PMBOK 2 by the PMI 3 ) and for software engineering (SWEBOK 4 5 ), but also
for IT architecture (ITABOK 6 by IASA 7 8 9 ), and other related disciplines.
    Body-of-Knowledge (BOK) requirements typically include representing the
domain, contextualizing resources (e.g. literature), and relying on Social Web
members to maintain and improve it. Semantic wikis are excellent tools to han-
dle domain (ontological) representations, to relate items, and to enable collabo-
ration. Unfortunately, creating a whole BOK (structure, content and relations)
from scratch may easily lead to the “white page syndrome”, given the size and
complexity of the domain information.
    This article presents an approach that differs from most current BOK’s in ex-
ploiting a formal discipline description to maintain the knowledge organization.
It also presents several tools to automate the creation of a domain conceptu-
alization (in concepts of a populated ontology), a semantic wiki to manage the
domain representation and its assets, stylized wiki elements, and a timeline-based
browser to explore the domain.
    The reminder of the article is structured as follows: section 2 summarizes
earlier related work; section 3 introduces the proposed approach for building a
BOK; section 4 explains how the wiki structure, content and linking are ini-
tialized; section 5 describes the ConcepTion tools that implement the proposal;
section 6 suggests some future work; 7 summarizes and concludes.


2     Related Work

Several strands of work are directly related to this approach.


2.1   Semantic Wiki

Semantic Wikis are designed to allow collaborative creation of content using a
fixed syntax and semantics to improve searching and querying. In traditional
wikis it is possible to find basic building blocks to create content (on most wikis
only a set of pages each one with a set of links). Semantic wikis provides an
expanded set of building blocks such as relations, entity types and RDF or OWL
annotations [4].
2
  PMBOK - Project Management Body Of Knowledge: www.pmi.org/Resources/
  Pages/Library-of-PMI-Global-Standards.aspx
3
  PMI - Project Management Institute: www.pmi.org/
4
  SWEBOK - Software Engineering Body of Knowledge: www.computer.org/portal/
  web/swebok
5
  ACM - Association for Computing Machinery: www.acm.org/
6
  ITABOK - IT Architect Body of Knowledge: www.iasahome.org/web/home/
  skillset
7
  IASA - International Association of Software Architects: www.iasahome.org/
8
  EABOK - Enterprise Architecture Body Of Knowledge: www.mitre.org/work/tech_
  papers/tech_papers_04/04_0104/index.html
9
  CBK - Common Body Of Knowledge: www.cissp.com/
                         Jump-starting a BOK with a stylized semantic wiki       3

    Semantic Media Wiki [11] is a semantic wiki implementation that supports
semantic templates creation, allowing to create fixed representations for each
concept of the BOK. Semantic Media Wiki is an extension of the popular Media
Wiki project10 , the platform on which Wikipedia works on. By this reason it
provides a large set of useful extensions like SIMILE Timeline 11 , an interactive
Timeline browser.
    The Kiwi wiki [19] (a EU-funded project) is another semantic wiki imple-
mentation that provides some advanced semantic annotation features, allowing
a deeper granularity of the information (this feature was inherited from its prede-
cessor IkeWiki [18]). It also provides what they call “Content Versatility”, which
are different views over the same content implemented by different applications.
Unfortunately, Kiwi does not provides as many extensions as Semantic Media
Wiki does. By using Kiwi, we think that we will lose some time on building
them.


2.2   Semantic Digital Libraries and Ontology-based Approaches

Angelo di Iorio et al. [9] proposed WikiFactory to automatically create a domain
semantic wiki from a domain ontology. Their work is based on customizing a
semantic wiki from an ontology definition to add the content afterwards.
    Jerome DL [13] is a semantic digital library whose main requirements are:
provide user-oriented browsing features and allow efficient searching using se-
mantic tools. The description of resources is based on Dublin Core12 and FOAF13 .
Unfortunately, this two ontologies are quite simple on their specification. In that
way, documents cannot be contextualized to a domain specific categorization for
searching purposes.
    ScholOnto [20] is a discourse ontology for describing Digital Libraries de-
signed to support searching, tracking and analyzing concepts from academic
perspectives. It is focused on expressing the claims that authors make on their
documents. Although this is an interesting perspective we realize that such an
approach leads to the “white page syndrome” as authors lack on time and mo-
tivation to fill templates with this information.


2.3   Bodies of Knowledge

There is not a single, common structure for all BOK’s:

 – The SWEBOK [22] is organized into ten knowledge areas (KAs): require-
   ments, design, construction, testing, maintenance, configuration manage-
   ment, engineering management, engineering process, engineering tools and
10
   http://www.mediawiki.org
11
   SIMILE: www.simile-widgets.org/timeline/
12
   Dublin Core: www.dublincore.org/
13
   FOAF - Friend of a Friend Project: www.foaf-project.org/
4         Jump-starting a BOK with a stylized semantic wiki

      methods, and quality. The SWEBOK contents were authored under the guid-
      ance, coordination and editing of a committee, originally composed of mem-
      bers of several professional societies; and benefited from systematic revision
      by hundreds of individuals.
    – The PMBOK [17] identifies 44 processes, organized into five process groups
      and nine knowledge areas; the process groups are: Initiating, Planning, Ex-
      ecuting, Controlling and Monitoring, and Closing; and the knowledge areas
      are: Project Integration Management, Project Scope Management, Project
      Time Management, Project Cost Management, Project Quality Manage-
      ment, Project Human Resource Management, Project Communications Man-
      agement, Project Risk Management, and Project Procurement Management.
    – The ITABOK 14 , also called The Aspiring Architect Skills Library, is or-
      ganized around a taxonomy of IT architect skills, proposed by IASA as
      well; the taxonomy categories are: Bussiness Technology Strategy, Design,
      Human Dynamics, Infrastructure, IT Environment, Quality Attributes, and
      Software. The ITABOK holds several articles in each category; topics were
      defined by a Training Committee, and bid on by practitioners.

   Clearly, there are alternative notions of what a BOK is and how it should be
written. But some generalizations can be made:

    – A BOK is not just another textbook (an authoritative view by an individual
      or a committee); if so, it runs the risk of quickly becoming (or being born
      already) obsolete.
    – A BOK can be created from resource collections, but it is more than their
      sum; otherwise, an overall “big picture” does not emerge.

   Although digital assets (e.g. papers, learning objects, Web sites...) are im-
portant, a BOK cannot be just a search engine for assets.


3      Proposal

Building a body of knowledge (BOK) is expensive in human resources and time:
it demands not only defining concepts and relations among them, but also re-
quires a management system capable of support a whole community that will
collaborate to create knowledge and enable inexperienced members of the com-
munity to understand the domain. To simplify and speed-up these requirements,
we propose an ontology-based BOK which is semi-automatically populated from
authoritative documents (such as articles). The BOK is enriched socially us-
ing the wiki, and is presented on a timeline to help better understand topics
evolution in the community.

14
     www.iasahome.org/web/home/skillset
                         Jump-starting a BOK with a stylized semantic wiki       5

3.1   Ontology-based Body of Knowledge

There is a link between ontologies and BOK’s: an ontology is a knowledge rep-
resentation in which concepts are organized in hierarchies and are related to
each other through relations, and a BOK is also a knowledge organization in
which a discipline is presented through definitions of concepts. (REFERENCIA
A MAX VOLKEL). Both ontologies and BOK’s are knowledge organizations,
their difference being for whom they are constructed: ontologies are intended to
be machine-readable whereas BOK’s are intended to be used and understood by
humans. It is not only a format difference that arises here (structured informa-
tion v/s free text).
    Our approach tries to balance the trade-off between representation accuracy
and usability of the organization [1] by maintaining a simple ontology that rep-
resents the Software Architecture discipline. Thus, we benefit from the good
representation given by ontologies and the “good” user experience provided by
BOKs. The ontology is created from authoritative documents, and the BOK pre-
sented to the user is based on a software architecture thesaurus and the manual
organization provided by Software Architects.


3.2   An Ontology for Software Architecture from the Literature

From a very simplistic point of view, the more papers of a given domain a
researcher is able to read, the more understanding he will have of what is hap-
pening with that domain. It should be possible to aid this process by automating
the analysis of publications, using basic Information Extraction [6] techniques
and Concept frequency analysis. Although clearly the process of understanding
a discipline is not yet automatable, current technologies allow to jump-start the
creation of a knowledge model such as an ontology. For this work we used and
extended SKOS ontology 15 to model the Concepts of a domain. We added a new
Class called DigitalAsset that represents a digital artifact that contains explicit
knowledge about a Concept (REFERENCIA A VOLKEL DE NUEVO). The
simplicity of the ontology we chose owes much to the design criteria for Minimal
Ontological Commitment [8].
    The publication full body is not used for analysis since it would require a
much more complex and expensive process for extracting information. Instead,
we analyze publications’ metadata since simple, structured and also freely avail-
able on Internet from Web sites such as DBLP16 , CiteSeer17 or ScienceDirect18 .


15
   SKOS - Simple Knowledge Organization System: www.w3.org/2004/02/skos/
16
   www.dblp.org
17
   www.citeseer.org
18
   www.sciencedirect.org
6         Jump-starting a BOK with a stylized semantic wiki

                             Table 1. Papers per Concept

               no. Concept                 Digital Assets Set Frequency
               1   Architecture Rationale p1,p2,p3,p4,p5,p6,p7      7
               2   Reusability               p0,p2,p4,p5,p6         4


Mining digital assets metadata to extract Concepts The following ex-
cerpt is a typical Bibtex19 entry provided by ScienceDirect20 .

@article{Kazman2005511,
title = "From requirements negotiation to software architecture decisions",
year = "2005",
...
author = "Rick Kazman and Hoh Peter In and Hong-Mei Chen",
keywords = "Requirements negotiation", "Architecture analysis",...
abstract = "Architecture design and requirements..."}

    Three main fields may contain information of the Software Architecture dis-
cipline: keywords, title and abstract. We use keywords as a primary data source,
since it is the simplest information available (tags of no more than 3 words). The
analysis is based on two properties of the keywords:

    – Keyword Frequency: If a keyword is present on several papers (that is, a
      keyword was used to tag several papers) that keyword represent an important
      Concept for the discipline that is being analyzed.
    – Co-occurrence: If a subset of keywords is present on several papers, all the
      keywords in the subset are likely to be related to each other.

     We extended the analysis to the Abstract field, which contains a short text
comprising the main ideas of the content of the document. This text was used
as a search-base for the Keywords (processed with Named Entity Recognition
21
   ).
     This analysis yields a thesaurus with Concepts related to each other but with
no hierarchy among them.


Creating a hierarchy of Concepts Given two Concepts related by co-occurrence
analysis, we would like to know which Concept is broader and which one is
narrower in the discipline, to add semantics to their relation. We proposed to
identify and compare all digital assets associated to the Concepts. Table 1 shows
two Concepts, each with an associated collection of digital assets.
19
   Bibtex is a tool and file format to describe and process references - see www.bibtex.
   org
20
   ScienceDirect: www.sciencedirect.com
21
   Named Entity Recognition is an Information Extraction technique used to identify
   entities on texts
                         Jump-starting a BOK with a stylized semantic wiki       7

    Both Concepts co-occur on 4 different digital assets so we could say that they
are related by co-occurrence. However, an 80% of the digital assets of the Concept
#2 are contained on the set of concept #1, and only a 57% of the digital assets
of concept #1 are in the concept #2 set (we call these percentages co-ocurrence
factors). We can make the simple assumption that 80% of the literature of the
concept Reusability is part of the literature of the concept Architecture Rationale
and thus, Reusability represents something in the subdomain of Architecture
Rationale. Since we cannot know what is this “something” that it represents we
use a shallow relation stating only that Reusability is a narrower concept than
Architecture Rationale (actually, Reusability of design rationale documents is a
major goal of Architecture Rationale).
    Applying this technique to every pair of co-occurrent concepts yields a hi-
erarchy that emerges from the flat thesaurus built by mining the digital assets
metadata. We can choose the minimal co-ocurrence factor to create the “nar-
rower” relation between two concepts. We call this the co-ocurrence filter. Notice
that a concept is not constrained to be narrower of only one concept (Reusability
also is narrower than Non-functional requirement ).

Enriching Keywords with a thesaurus The ontology built is used as a
backbone of the BOK. That means that it should be as complete as possible to
cover all the main aspects of the discipline on research. Nevertheless, using only
the keywords provided by the authors of papers yields some drawbacks:
 – Ambiguous Concepts: Authors often get too creative to tag their documents.
   Ambiguity is a main problem of tagging as author s will tag using their own
   knowledge (different from shared knowledge) (architecture design, architec-
   tural design).
 – Too Generic Concepts: Some Concepts are too generic for the discipline and
   may not appear in the collection of Keywords since they do not represent a
   good tag for categorization. For instance, the word System is never used as
   a Keyword to tag a Software Architecture paper.
 – Too Specific Concepts: Many Keywords are too specific and do not add
   useful information that can be used on the BOK. For example, proper names,
   identificators, etc. These kind of Keywords add noise to the final ontology.
    To overcome these issues, the initial dictionary of concepts to search on ab-
stracts is created over a thesaurus (we use a Software Architecture thesaurus
presented by Fraga et al.[7]). The thesaurus plays a triple role in the process:
 – Using tools such as lemmatization, we can anchor different tags to a single
   concept within the thesaurus ({architecture design, architectural design} ⇒
   {Software Architecture Design}) reducing ambiguity.
 – It adds words that, for being too generic, will not appear as Keywords on
   papers (System is a main concept in the thesaurus).
   Too Specific Concepts need to be managed on a different way. We cannot just
simply ignore all Keywords from papers’ metadata and use only those on the
8      Jump-starting a BOK with a stylized semantic wiki

hand-made thesaurus because we would lose the capacity to discover information
or new trends and topics. Specific concepts that cause noise are avoided by
filtering them by the frequency they have. The idea is simple, the more specific
a concept is, the less frequency it will have. Only concepts that appear in more
than X papers will be used. We called X the frequency filter.


4     Use of Semantic Wiki for a BOK
The configuration of a wiki for the identified metamodel implies creating two
kinds of pages: those representing domain concepts, and those representing dig-
ital assets. Both kinds of pages make use of specialized Infoboxes, allowing a
standardized visual representation of the (concept or asset) attributes. The rela-
tionship between assets and concepts is represented by inter-pages referencing.

4.1   Discipline Exploration: Page per Concept
The ontology is later used on a semantic wiki, where a single wiki page is created
for each concept (a little program in java was used to do such labor). At this
point is necessary to understand that the we are providing a jump-start approach
for the SABOK, but of course, the definitions and contents of this knowledge
representation remains in the hands of the Software Architecture Community.
Of course, some information is provided on the semantic wiki for each concept:
Broader Concepts, Narrower Concepts, Associated Digital Assets, and Topic
Category. According to the properties of the concept, we have created specific
types of topics.
    The semantic wiki allows the community to create and maintain content
collaborative, populating and enriching the SABOK; explaining such tools is
out of the scope of this work. Searching concepts can be done either with the
free-text searching tool provided by the wiki framework, or by browsing the
thesaurus used to build the ontology.

4.2   Resource Contextualization: Page per Asset
Since each concept on the SABOK has several digital assets associated (the same
used to build the ontology), it can be used as a digital asset search tool as well.
The ontology behind the SABOK allow us to use inference on answering queries.
We have identified two inference levels: basic, and based on concepts.

Basic transitivity. Since the concepts are arranged on a hierarchy we can
provide transitivity inference level for digital assets associated on a branch of
concepts. For instance, all digital assets associated to Reusability will be an-
swered to the query “digital assets for Architecture Rationale”.
   As it can be seen, the SABOK besides from organizing the discipline knowl-
edge, provides a searching capability of Digital Assets associated to each concept
based on inference powered by its ontology.
                         Jump-starting a BOK with a stylized semantic wiki        9

4.3     Subject-based Exploration with Timelines

The generated BOK can be browsed with a timeline-based tool, which shows the
evolution of concepts and how they relate to each other. A timeline-based visu-
alization tool can show which concepts concite attention currently. Crosscuting
concepts can be visually identified because they have a constant participation in
the timeline over the years. Users can access the community-created information
of concepts and the wiki itself to edit and manage it.
    The information that tool requires resides as year of publication in the digital
assets information (see section 3.2). In the timeline, the concepts are presented
with the dates of the first and (currently) last publication that use it.
    The timeline can also be used to present Digital Assets evolution around a
concept. This should be really useful for researchers looking for the last publi-
cations according certain subject, for example.
    Finally, new Digital Assets can be added to the SABOK, such as lessons,
presentations, posters, video, etc.


5      Case Study: A Software Architecture BOK

The proposed approach has been implemented in a system named ConcepTion
22
   , composed of three main tools: a Miner, a Hierarchizer, and a Visualizer.
     The approach was validated with a case study for the Software Architecture
domain.


5.1     Software Architecture(s) Descriptions

Several efforts have been carried out to build a vocabulary for Software Ar-
chitecture (SA). However, most of them are not intended to describe the entire
Software Architecture discipline but systems and parts thereof (i.e. the discipline
subject matter, not the discipline itself).
    The SA community has recently focused on describing and recording archi-
tecture knowledge (AK) that supports the architecting process (e.g. adopted and
discarded decisions, rationale, tradeoffs), and several metamodels and ontologies
has been proposed to systematize it (PAKME [2], ADDSS [5], Archium [10],
AREL [21], NDR [16], [12], among others). Also, Liang et al. [15] tackled the
measuring of semantic distance among several proposals to describe AK, and
defined a set of characteristics to categorize all AK concepts.
    Unfortunately, only a couple of articles have proposed a broader descrip-
tion of the entire software architecture discipline. Babu et al. [14] introduced
ArchVoc, the most cited software architecture ontology, which was generated
with combined manual and semi-automatic techniques to identify software ar-
chitecture concepts. The manual technique used the back-of-the-book index of
major software architecture books, and the semi-automatic technique parsed
22
     www.toeska.cl/conception/wiki/
10     Jump-starting a BOK with a stylized semantic wiki

architecture-related Wikipedia23 pages. The first approach yield 480 concepts,
and the second one, 1650 concepts; they were organized into 9 overall categories,
which were also sorted according to architecting phases.
    Fraga et al. [7] also employed both an automatic and a manual technique
to generate a software architecture thesaurus. The corpus of both generation
techniques were the back-of-the-book index of major software architecture books
(in 2005). The manual process yield a 500-concept thesaurus, and the automatic
technique generated a 1200-concept thesaurus. Both thesauri were combined
yielding 27 top-level concepts.
    Although these two thesauri are good vocabularies to classify existing SA
knowledge, there are several challenges that have not been already tackled in
building a software architecture discipline vocabulary:

 – Both thesauri have been manually manipulated to better classify SA knowl-
   edge, so their hierarchies and relationships are usually very influenced by ex-
   isting conceptual frameworks present in the discipline. This aspect certainly
   helps to create good thesauri for information search, but it usually hampers
   its ability to describe real connections among concepts. For example, they
   group “fault-tolerance”, “performance” and “usability” into a single cate-
   gory (“Quality Requirements” or “Non-Functional Requirements”), but in
   practice all three concepts are rarely present in the same article; indeed, most
   papers (and communities) focus on only one of them. Also, “fault-tolerance”
   is more frequently related to “validation” and “formal methods” than to any
   other quality requirement.
 – The starting corpus of these thesauri did not include published research
   or industry articles, either; they used back-of-the-book indices and/or SA-
   related Wikipedia pages. This corpus selection reduces the vocabulary scope
   to those topics already published in books, omitting new trending topics
   or novel techniques that might be being discussed in major refereed SA
   conferences or journals. For example, none of these thesauri mention “design
   rationale” or “software architecture rationale”, both dealt with in several
   recent mainstream articles.


5.2   Mining

The ontology was populated using 1,000 Bibtex files (including abstract) re-
turned by ScienceDirect 24 for the “Software Architecture” search concept. Ex-
tracted metadata was stored in RDF 25 . Table 2 shows some statistics generated
by the Miner.
   Over 10% of the articles do not have an abstract in their Bibtex file, so we can
only rely on the keywords that the authors used to tag them. Interestingly, only
23
   Wikipedia: www.wikipedia.org
24
   ScienceDirect: www.sciencedirect.com
25
   RDF: Resource Description Framework, the industry standard to store Semantic
   Information; see www.w3.org/RDF/.
                         Jump-starting a BOK with a stylized semantic wiki      11

                    Table 2. Statistics from ConcepTion Miner

                      Name                             Value
                      Quantity of Papers               1000
                      Quantity of Papers with Abstract 886
                      Quantity of unique Concepts      2203
                      Concepts over 50%                 47


47 tags account for more than the 50% of the matches produced by comparing
searching dictionary concepts in abstracts. These are the most important and
which we focused on.


5.3   Hierarchizer

The Hierarchizer compares every pair of concepts and calculates a co-occurrence
factor between them, (see section 3.2). We can lower the co-ocurrence filter to
find more relations among concepts, but of course, the lower it is, the more false
positives we will find. We have found empirically that a co-ocurrence filter of
80% is appropriate to discover new relations and maintain false positives on a
low level.
    The co-occurrence filter and frequency filter (see section 3.2) are the two
parameters that can be used to adjust the quality of the hierarchy obtained, and
thus, the ontology’s instances.
    After creating the hierarchy, it can be visualizated with Graphviz26 to draw
the concepts and their relations, allowing Software Architecture experts to audit
it and manually filter false-positives. Some samples of hierarchies can be found
on Toeska’s Website27 .


5.4   The prototype SABOK

The prototype SABOK was implemented using the semantic wiki platform Se-
mantic Media Wiki 28 (SMW). A simple ad-hoc tool adds a wiki page for each
concept in the hierarchy.
    A timeline browser was also built with the MIT SIMILE Timeline29 allowing
to use HTML and JavaScript to use XML data, namely, a Knowledge Base with
the ontology created.
    Figure 1 shows a screenshot of the prototype SABOK. The evolution of the
Concept Architecture is shown. Each line represents a narrower Concept dis-
played from the year of the first paper published with this Concept to the last
paper. Figure 2 shows the wiki page for the Concept Reusability. Along with
26
   www.graphviz.org/
27
   Toeska Research Group, Universidad Técnica Federico Santa Marı́a: www.toeska.cl
28
   www.semantic-mediawiki.org
29
   SIMILE Project: http://simile.mit.edu/timeline/
12     Jump-starting a BOK with a stylized semantic wiki


Fig. 1. Screenshot of ConcepTion SABOK Timeline - Architecture Concept Evolution


the information of broader concepts and narrower Concepts a Timeline of the
publications using this Concept is provided. The Timeline is fully interactive
and allow user to browse research papers. Figure 3 shows two infoboxes: Digital
Asset and Concept. Digital Asset’s infobox displays useful information such as
title, author and Concepts used on this paper. It also provides information of
inferenced Concepts related to the paper. Concept’s infobox the upper and lower
concepts in the hierarchy. It also displays inferred concepts and Digital Assets
associated to the concept.


6    Further Work
Along with adding more advanced NLP tools and adding more papers to the
analysis to improve our hierarchy, we believe that there are two topics that
could add a lot of value to the SABOK presented.
 – Cluster Analysis: Through cluster analysis we can understand better which
   are the areas the discipline is divided into. Also, it should be possible to
   acknowledge some useful intersections of areas and define them as different
   elements in the ontology to improve searching capability. We think that
   using Formal Concept Analysis tools would allow us to find this clusters
   of information by identifying classes of concepts as shown on PACTOLE
   methodology [3].
 – Emerging Topics Tracking: With our approach is possible to find which are
   the most newer topics in the discipline and how they are related to each
   other. However, that does not mean that these are emerging topics. We think
   that emerging topics have a low frequency and thus, they will not emerge
   on our hierarchy. Besides that, we think that emerging topics appears on
   publications with a high impact factor and that’s how we think that they
                         Jump-starting a BOK with a stylized semantic wiki      13


                  Fig. 2. ConcepTion wiki - Concept Reusability


    should be identified. Though, that kind of information is not available on
    bibtex files and should be obtained on a different way.

   Although we think the best validation for our SABOK should be made by the
community, we are planning on making validation tests with Software Architects
and Software Engineering students in the following months.


7   Conclusions

This article has presented a novel method to jump-start the creation of an
ontology-based Body of Knowledge (BOK).
    Using authorative documents from a community, we can mine and extract
information about a discipline to hierarchize it and create an ontology. The ontol-
ogy is used to organize the BOK and search Digital Assets (research publications
in our example) using inference. The resulting BOK provides contextualization
allowing document discovering and search inference.
    The ConcepTion set of tools allows to extract, mine, hierarchize and display
a BOK using a semantic wiki to manage information and a timeline tool to show
evolution of topics in the discipline. The community is then asked to feed the
BOK with definitions and their own Digital Assets. Future work will be focused
on improving the quality of the resulting BOK and adding more features.


References

 1. H. Astudillo. Maximizing object reuse with a biological metaphor.     TAPOS,
    3(4):235–251, 1997.
14      Jump-starting a BOK with a stylized semantic wiki


                     Fig. 3. Screenshot of ConcepTion Infoboxes


 2. M. A. Babar, I. Gorton, and B. Kitchenham. Rationale management in software
    engineering. In A Framework for Supporting Architecture Knowledge and Rationale
    Management, pages 237–254. Springer Berlin Heidelberg, 2007.
 3. R. Bendaoud, Y. Toussaint, and A. Napoli. Pactole: A methodology and a system
    for semi-automatically enriching an ontology from a collection of texts. 5113:203–
    216, 2008.
 4. F. Bry, M. Eckert, J. Kotowski, and K. A. Weiand. What the user interacts with:
    Reflections on conceptual models for semantic wikis. In C. L. 0002, S. Schaffert,
    H. Skaf-Molli, and M. Völkel, editors, SemWiki, volume 464 of CEUR Workshop
    Proceedings. CEUR-WS.org, 2009.
 5. R. Capilla, F. Nava, S. Pérez, and J. C. Dueñas. A web-based tool for managing
    architectural design decisions. SIGSOFT Softw. Eng. Notes, 31(5):4, 2006.
 6. H. Cunningham. Encyclopedia of Language and Linguistics, chapter Information
    Extraction, Automatic, pages 665–677. 2nd edition, 2005.
 7. A. Fraga, S. Sánchez-Cuadrado, J. Lloréns, and H. Astudillo. Knowledge represen-
    tation for software architecture domain by manual and automatic methodologies.
    CLEI Electron. J., 9(1), 2006.
 8. T. R. Gruber. Toward principles for the design of ontologies used for knowledge
    sharing. Int. J. Hum.-Comput. Stud., 43(5-6):907–928, 1995.
 9. A. D. Iorio, V. Presutti, and F. Vitali. Wikifactory: An ontology-based application
    for creating domain-oriented wikis. In Y. Sure and J. Domingue, editors, ESWC,
    volume 4011 of Lecture Notes in Computer Science, pages 664–678. Springer, 2006.
10. A. Jansen, J. van der Ven, P. Avgeriou, and D. K. Hammer. Tool support for ar-
    chitectural decisions. In WICSA ’07: Proceedings of the Sixth Working IEEE/IFIP
                         Jump-starting a BOK with a stylized semantic wiki         15

    Conference on Software Architecture, page 4, Washington, DC, USA, 2007. IEEE
    Computer Society.
11. M. Krötzsch, D. Vrandecic, M. Völkel, H. Haller, and R. Studer. Semantic
    wikipedia. J. Web Sem., 5(4):251–261, 2007.
12. P. Kruchten, P. Lago, and H. van Vliet. Building up and exploiting architectural
    knowledge. In QoSA’05: Second International Conference on Quality of Software
    Architectures, pages 43–58. Springer Berlin / Heidelberg, 2006.
13. S. R. Kruk, M. Cygan, A. Gzella, T. Woroniecki, and M. Dabrowski. Jeromedl: The
    social semantic digital library. In S. R. Kruk and B. McDaniel, editors, Semantic
    Digital Libraries, pages 139–150. Springer, 2009.
14. B. T. Lenin, S. R. M., P. T. V., and R. D. ArchVoc-Towards an ontology for
    software architecture. In SHARK-ADI ’07: Proceedings of the Second Workshop on
    SHAring and Reusing architectural Knowledge Architecture, Rationale, and Design
    Intent, page 5, Washington, DC, USA, 2007. IEEE Computer Society.
15. P. Liang, A. Jansen, and P. Avgeriou. Selecting a high-quality central model for
    sharing architectural knowledge. Quality Software, International Conference on,
    0:357–365, 2008.
16. C. López, P. Inostroza, L. M. Cysneiros, and H. Astudillo. Visualization and
    comparison of architecture rationale with semantic web technologies. Journal of
    Systems and Software, 82(8):1198–1210, 2009.
17. Project Management Institute. A Guide to the Project Management Body of
    Knowledge (PMBOK Guide) - Third Edition, Paperback. Project Management
    Institute, 2004.
18. S. Schaffert. Ikewiki: A semantic wiki for collaborative knowledge management.
    In WETICE ’06: Proceedings of the 15th IEEE International Workshops on En-
    abling Technologies: Infrastructure for Collaborative Enterprises, pages 388–396,
    Washington, DC, USA, 2006. IEEE Computer Society.
19. S. Schaffert, J. Eder, S. Grünwald, T. Kurz, and M. Radulescu. Kiwi — a platform
    for semantic social software (demonstration). In ESWC 2009 Heraklion: Proceed-
    ings of the 6th European Semantic Web Conference on The Semantic Web, pages
    888–892, Berlin, Heidelberg, 2009. Springer-Verlag.
20. S. B. Shum, E. Motta, and J. Domingue. Scholonto: an ontology-based digital
    library server for research documents and discourse. Int. J. on Digital Libraries,
    3(3):237–248, 2000.
21. A. Tang, Y. Jin, and J. Han. A rationale-based architecture model for design
    traceability and reasoning. J. Syst. Softw., 80(6):918–934, 2007.
22. L. L. Tripp. Guide to the Software Engineering Body of Knowledge: 2004 Version.
    2005.