<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Jump-starting a Body-of-Knowledge with a Semantic Wiki on a Discipline Ontology</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Víctor Codocedo</string-name>
          <email>vcodocedo@inf.utfsm.cl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudia López</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hernán Astudillo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad Técnica Federico Santa María, Avenida España 1680</institution>
          ,
          <addr-line>Valparaíso.</addr-line>
          <country country="CL">Chile</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Several communities have recently engaged in assembling a Body of Knowledge (BOK) to organize their discipline's knowledge for learning and sharing. A BOK ideally represents the domain, contextualizes assets (e.g. literature), and exploits the potential of the Social Web to maintain and improve it. Semantic wikis are excellent tools to handle domain (ontological) representations, to relate items, and to enable collaboration. Unfortunately, creating a whole BOK (structure, content and relations) from scratch may fall prey to the “white page syndrome”<sup>1</sup>, given the size and complexity of the domain information. This article presents an approach to jump-start a BOK by implementing it as a semantic wiki organized around a domain ontology. The domain representation (structure and content) is initialized by automatically creating wiki pages for each ontology concept and digital asset; the ontology itself is built semi-automatically using natural language processing (NLP) techniques. Contextualization is initialized by automatically linking concept pages and asset pages. The proposal's feasibility is shown with a prototype for a Software Architecture BOK, built from 1,000 articles indexed by a well-known scientific digital library and completed by volunteers. The proposed approach separates the issues of domain representation, resource contextualization, and social elaboration, allowing communities to try out alternative solutions for each issue.</p>
      </abstract>
      <kwd-group>
        <kwd>semantic wiki</kwd>
        <kwd>body of knowledge</kwd>
        <kwd>automated domain ontology</kwd>
        <kwd>digital assets contextualization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In recent years, several professional and academic communities have undertaken
to organize and systematize their knowledge with a “Body of Knowledge” (BOK
for short). BOKs have been created most famously for project management
(PMBOK<sup>2</sup> by the PMI<sup>3</sup>) and for software engineering (SWEBOK<sup>4,5</sup>), but also
for IT architecture (ITABOK<sup>6</sup> by IASA<sup>7,8,9</sup>), and other related disciplines.</p>
      <p>1 Colloquial name for writers’ mental block when starting a new piece from scratch.</p>
      <p>Body-of-Knowledge (BOK) requirements typically include representing the
domain, contextualizing resources (e.g. literature), and relying on Social Web
members to maintain and improve it. Semantic wikis are excellent tools to
handle domain (ontological) representations, to relate items, and to enable
collaboration. Unfortunately, creating a whole BOK (structure, content and relations)
from scratch may easily lead to the “white page syndrome”, given the size and
complexity of the domain information.</p>
      <p>This article presents an approach that differs from most current BOKs in
exploiting a formal description of the discipline to maintain the knowledge organization.
It also presents several tools that automate the creation of a domain
conceptualization (expressed as the concepts of a populated ontology), a semantic wiki to manage the
domain representation and its assets, stylized wiki elements, and a timeline-based
browser to explore the domain.</p>
      <p>The remainder of the article is structured as follows: section 2 summarizes
earlier related work; section 3 introduces the proposed approach for building a
BOK; section 4 explains how the wiki structure, content and linking are
initialized; section 5 describes the ConcepTion tools that implement the proposal;
section 6 suggests some future work; and section 7 summarizes and concludes.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <sec id="sec-2-1">
        <title>Semantic Wiki</title>
        <p>Several strands of work are directly related to this approach.</p>
        <p>
          Semantic wikis are designed to allow collaborative creation of content using a
fixed syntax and semantics to improve searching and querying. Traditional
wikis offer only basic building blocks to create content (on most wikis,
just a set of pages, each with a set of links). Semantic wikis provide an
expanded set of building blocks such as relations, entity types and RDF or OWL
annotations [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>2 PMBOK - Project Management Body Of Knowledge: www.pmi.org/Resources/Pages/Library-of-PMI-Global-Standards.aspx
3 PMI - Project Management Institute: www.pmi.org/
4 SWEBOK - Software Engineering Body of Knowledge: www.computer.org/portal/web/swebok
5 ACM - Association for Computing Machinery: www.acm.org/
6 ITABOK - IT Architect Body of Knowledge: www.iasahome.org/web/home/skillset
7 IASA - International Association of Software Architects: www.iasahome.org/
8 EABOK - Enterprise Architecture Body Of Knowledge: www.mitre.org/work/tech_papers/tech_papers_04/04_0104/index.html
9 CBK - Common Body Of Knowledge: www.cissp.com/</p>
        <p>
          Semantic MediaWiki [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] is a semantic wiki implementation that supports the creation of
semantic templates, which makes it possible to define a fixed representation for each
concept of the BOK. Semantic MediaWiki is an extension of the popular MediaWiki
project<sup>10</sup>, the platform on which Wikipedia runs. For this reason it
provides a large set of useful extensions, such as SIMILE Timeline<sup>11</sup>, an interactive
timeline browser.
        </p>
        <p>
          The Kiwi wiki [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] (an EU-funded project) is another semantic wiki
implementation. It provides some advanced semantic annotation features that allow
a finer granularity of information (a feature inherited from its
predecessor IkeWiki [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]). It also provides what its authors call “Content Versatility”:
different views over the same content, implemented by different applications.
Unfortunately, Kiwi does not provide as many extensions as Semantic MediaWiki
does, so we expect that adopting Kiwi would cost us time spent building
them ourselves.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Semantic Digital Libraries and Ontology-based Approaches</title>
        <p>
          Angelo di Iorio et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] proposed WikiFactory to automatically create a domain
semantic wiki from a domain ontology. Their work is based on customizing a
semantic wiki from an ontology definition to add the content afterwards.
        </p>
        <p>
          Jerome DL [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] is a semantic digital library whose main requirements are to
provide user-oriented browsing features and to allow efficient searching using
semantic tools. The description of resources is based on Dublin Core<sup>12</sup> and FOAF<sup>13</sup>.
Unfortunately, these two ontologies are quite simple in their specification. As a
result, documents cannot be contextualized within a domain-specific categorization for
searching purposes.
        </p>
        <p>
          ScholOnto [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] is a discourse ontology for describing Digital Libraries,
designed to support searching, tracking and analyzing concepts from academic
perspectives. It focuses on expressing the claims that authors make in their
documents. Although this is an interesting perspective, we believe that such an
approach leads to the “white page syndrome”, as authors lack the time and
motivation to fill in templates with this information.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Bodies of Knowledge</title>
        <p>
          There is no single, common structure for all BOKs:
– The SWEBOK [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] is organized into ten knowledge areas (KAs):
requirements, design, construction, testing, maintenance, configuration
management, engineering management, engineering process, engineering tools and
methods, and quality. The SWEBOK contents were authored under the
guidance, coordination and editing of a committee, originally composed of
members of several professional societies, and benefited from systematic revision
by hundreds of individuals.
– The PMBOK [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] identifies 44 processes, organized into five process groups
and nine knowledge areas. The process groups are: Initiating, Planning,
Executing, Controlling and Monitoring, and Closing. The knowledge areas
are: Project Integration Management, Project Scope Management, Project
Time Management, Project Cost Management, Project Quality
Management, Project Human Resource Management, Project Communications
Management, Project Risk Management, and Project Procurement Management.
– The ITABOK<sup>14</sup>, also called The Aspiring Architect Skills Library, is
organized around a taxonomy of IT architect skills, also proposed by IASA;
the taxonomy categories are: Business Technology Strategy, Design,
Human Dynamics, Infrastructure, IT Environment, Quality Attributes, and
Software. The ITABOK holds several articles in each category; topics were
defined by a Training Committee, and bid on by practitioners.
        </p>
        <p>10 http://www.mediawiki.org
11 SIMILE: www.simile-widgets.org/timeline/
12 Dublin Core: www.dublincore.org/
13 FOAF - Friend of a Friend Project: www.foaf-project.org/</p>
        <p>Clearly, there are alternative notions of what a BOK is and how it should be
written. But some generalizations can be made:
– A BOK is not just another textbook (an authoritative view by an individual
or a committee); if it were, it would run the risk of quickly becoming (or already
being born) obsolete.
– A BOK can be created from resource collections, but it is more than their
sum; otherwise, an overall “big picture” does not emerge.</p>
        <p>Although digital assets (e.g. papers, learning objects, Web sites...) are
important, a BOK cannot be just a search engine for assets.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Proposal</title>
      <p>Building a body of knowledge (BOK) is expensive in human resources and time:
it demands not only defining concepts and the relations among them, but also
a management system capable of supporting a whole community that will
collaborate to create knowledge, and of enabling inexperienced members of the
community to understand the domain. To simplify and speed up this work,
we propose an ontology-based BOK that is semi-automatically populated from
authoritative documents (such as articles). The BOK is enriched socially
using the wiki, and is presented on a timeline to help users understand how topics
evolve in the community.</p>
      <p>14 www.iasahome.org/web/home/skillset</p>
      <sec id="sec-3-1">
        <title>Ontology-based Body of Knowledge</title>
        <p>There is a link between ontologies and BOKs: an ontology is a knowledge
representation in which concepts are organized in hierarchies and are related to
each other through relations, and a BOK is also a knowledge organization, in
which a discipline is presented through definitions of concepts [reference to
Max Völkel]. Both ontologies and BOKs are knowledge organizations;
they differ in whom they are constructed for: ontologies are intended to
be machine-readable, whereas BOKs are intended to be used and understood by
humans. The difference is therefore more than one of format (structured
information vs. free text).</p>
        <p>
          Our approach tries to balance the trade-off between representation accuracy
and usability of the organization [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] by maintaining a simple ontology that
represents the Software Architecture discipline. Thus, we benefit from both the good
representation given by ontologies and the “good” user experience provided by
BOKs. The ontology is created from authoritative documents, and the BOK
presented to the user is based on a software architecture thesaurus and the manual
organization provided by software architects.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>An Ontology for Software Architecture from the Literature</title>
        <p>
          From a very simplistic point of view, the more papers of a given domain a
researcher is able to read, the better they will understand what is
happening in that domain. It should be possible to aid this process by automating
the analysis of publications, using basic Information Extraction [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] techniques
and Concept frequency analysis. Although the process of understanding
a discipline clearly cannot yet be automated, current technologies make it possible to jump-start the
creation of a knowledge model such as an ontology. For this work we used and
extended the SKOS ontology<sup>15</sup> to model the Concepts of a domain. We added a new
Class called DigitalAsset, which represents a digital artifact that contains explicit
knowledge about a Concept [reference to Völkel again]. The
simplicity of the ontology we chose owes much to the design criteria for Minimal
Ontological Commitment [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
        <p>The full body of each publication is not used for analysis, since that would require a
much more complex and expensive information extraction process. Instead,
we analyze the publications’ metadata, which is simple, structured and freely
available on the Internet from Web sites such as DBLP<sup>16</sup>, CiteSeer<sup>17</sup> or ScienceDirect<sup>18</sup>.</p>
        <p>Mining digital assets metadata to extract Concepts. The following
excerpt is a typical BibTeX<sup>19</sup> entry provided by ScienceDirect<sup>20</sup>.</p>
        <preformat>@article{Kazman2005511,
title = "From requirements negotiation to software architecture decisions",
year = "2005",
...
author = "Rick Kazman and Hoh Peter In and Hong-Mei Chen",
keywords = "Requirements negotiation", "Architecture analysis",...
abstract = "Architecture design and requirements..."}</preformat>
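        <p>As a rough illustration of this mining step, the sketch below reads the quoted fields of one BibTeX record with a regular expression. It is not the ConcepTion Miner: the sample record is a hypothetical, shortened variant of the excerpt, the function name parse_entry is invented for illustration, and the keywords are assumed to arrive as a single comma-separated string.</p>
        <preformat><![CDATA[
```python
import re

# Hypothetical sample record, modelled on the excerpt (values shortened).
# Assumption: keywords are given as one comma-separated quoted string.
SAMPLE = '''@article{Kazman2005511,
title = "From requirements negotiation to software architecture decisions",
year = "2005",
author = "Rick Kazman and Hoh Peter In and Hong-Mei Chen",
keywords = "Requirements negotiation, Architecture analysis",
abstract = "Architecture design and requirements"
}'''

# Matches simple `name = "value"` pairs; nested braces or quotes are not handled.
FIELD = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def parse_entry(text):
    """Return a dict with the quoted fields of one BibTeX entry."""
    record = {name.lower(): value for name, value in FIELD.findall(text)}
    # Split the keyword string into individual tags.
    raw_keywords = record.get("keywords", "")
    record["keywords"] = [k.strip() for k in raw_keywords.split(",") if k.strip()]
    # BibTeX separates author names with the word "and".
    raw_authors = record.pop("author", "")
    record["authors"] = [a.strip() for a in raw_authors.split(" and ") if a.strip()]
    return record


entry = parse_entry(SAMPLE)
print(entry["year"], entry["keywords"], entry["authors"])
```
]]></preformat>
        <p>A production miner would more likely rely on a full BibTeX parser; the regular expression is only meant to show which fields feed the later analysis.</p>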
        <p>Three main fields may contain information about the Software Architecture
discipline: keywords, title and abstract. We use keywords as the primary data source,
since they are the simplest information available (tags of no more than 3 words). The
analysis is based on two properties of the keywords:
– Keyword Frequency: If a keyword is present in several papers (that is, the
keyword was used to tag several papers), that keyword represents an important
Concept for the discipline being analyzed.
– Co-occurrence: If a subset of keywords is present in several papers, all the
keywords in the subset are likely to be related to each other.</p>
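        <p>The two keyword properties can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the keyword sets are hypothetical, co-occurrence is counted for pairs only (the text speaks of subsets in general), and the function names are not part of ConcepTion.</p>
        <preformat><![CDATA[
```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword sets, one per paper.
papers = [
    {"architecture rationale", "reusability"},
    {"architecture rationale", "reusability", "design decision"},
    {"architecture rationale", "design decision"},
]


def keyword_frequency(papers):
    """Number of papers tagged with each keyword."""
    return Counter(keyword for keywords in papers for keyword in keywords)


def cooccurrence(papers):
    """Number of papers in which each (alphabetically ordered) keyword pair appears."""
    pairs = Counter()
    for keywords in papers:
        pairs.update(combinations(sorted(keywords), 2))
    return pairs


freq = keyword_frequency(papers)
pairs = cooccurrence(papers)
print(freq.most_common(1))
print(pairs[("architecture rationale", "reusability")])
```
]]></preformat>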
        <p>We extended the analysis to the Abstract field, which contains a short text
summarizing the main ideas of the document. This text was used
as a search base for the Keywords (processed with Named Entity
Recognition<sup>21</sup>).</p>
        <p>This analysis yields a thesaurus with Concepts related to each other but with
no hierarchy among them.</p>
        <p>Creating a hierarchy of Concepts. Given two Concepts related by co-occurrence
analysis, we would like to know which Concept is broader and which one is
narrower in the discipline, to add semantics to their relation. We propose to
identify and compare all digital assets associated with the Concepts. Table 1 shows
two Concepts, each with an associated collection of digital assets.</p>
        <p>19 BibTeX is a tool and file format to describe and process references - see www.bibtex.org
20 ScienceDirect: www.sciencedirect.com
21 Named Entity Recognition is an Information Extraction technique used to identify
entities in texts</p>
        <p>Both Concepts co-occur in 4 different digital assets, so we can say that they
are related by co-occurrence. However, 80% of the digital assets of Concept
#2 are contained in the set of Concept #1, whereas only 57% of the digital assets
of Concept #1 are in the set of Concept #2 (we call these percentages co-occurrence
factors). We can make the simple assumption that 80% of the literature on the
concept Reusability is part of the literature on the concept Architecture Rationale
and thus that Reusability represents something in the subdomain of Architecture
Rationale. Since we cannot know what this “something” is, we
use a shallow relation stating only that Reusability is a narrower concept than
Architecture Rationale (actually, reusability of design rationale documents is a
major goal of Architecture Rationale).</p>
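        <p>The co-occurrence factor can be written as the share of one concept's digital assets that also belong to the other concept. The sketch below reproduces the percentages quoted in the text; the asset identifiers are hypothetical, and the set sizes (7 and 5, with 4 shared) are merely one choice consistent with the 57% and 80% figures.</p>
        <preformat><![CDATA[
```python
def cooccurrence_factor(assets_a, assets_b):
    """Fraction of the digital assets of concept A that are also assets of concept B."""
    if not assets_a:
        return 0.0
    return len(assets_a & assets_b) / len(assets_a)


# Hypothetical asset identifiers: 7 assets for Concept #1, 5 for Concept #2, 4 shared.
architecture_rationale = {"d1", "d2", "d3", "d4", "d5", "d6", "d7"}
reusability = {"d4", "d5", "d6", "d7", "d8"}

factor_2_in_1 = cooccurrence_factor(reusability, architecture_rationale)
factor_1_in_2 = cooccurrence_factor(architecture_rationale, reusability)
print(f"{factor_2_in_1:.0%} of Reusability assets are Architecture Rationale assets")
print(f"{factor_1_in_2:.0%} of Architecture Rationale assets are Reusability assets")
```
]]></preformat>
        <p>The asymmetry of the two factors is what allows the direction of the “narrower” relation to be chosen.</p>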
        <p>Applying this technique to every pair of co-occurring concepts yields a
hierarchy that emerges from the flat thesaurus built by mining the digital assets’
metadata. We can choose the minimal co-occurrence factor required to create the
“narrower” relation between two concepts; we call this threshold the co-occurrence filter. Notice
that a concept is not constrained to be narrower than only one concept (Reusability
is also narrower than Non-functional requirement).</p>
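        <p>A minimal sketch of how the co-occurrence filter could turn pairwise factors into “narrower” relations follows. It assumes that a factor equal to the filter is accepted; the function name build_hierarchy and the asset sets are hypothetical and only illustrate the idea.</p>
        <preformat><![CDATA[
```python
from itertools import permutations


def build_hierarchy(assets_by_concept, cooccurrence_filter=0.8):
    """Return (narrower, broader) pairs whose co-occurrence factor reaches the filter."""
    relations = []
    for narrower, broader in permutations(assets_by_concept, 2):
        own = assets_by_concept[narrower]
        other = assets_by_concept[broader]
        # Factor: share of the candidate narrower concept's assets found in the other set.
        if own and len(own & other) / len(own) >= cooccurrence_filter:
            relations.append((narrower, broader))
    return relations


# Hypothetical digital asset sets.
assets_by_concept = {
    "Architecture Rationale": {"d1", "d2", "d3", "d4", "d5", "d6", "d7"},
    "Reusability": {"d4", "d5", "d6", "d7", "d8"},
    "Non-functional requirement": {"d4", "d5", "d6", "d7", "d8", "d9", "d10"},
}

relations = build_hierarchy(assets_by_concept)
for narrower, broader in relations:
    print(narrower, "is narrower than", broader)
```
]]></preformat>
        <p>Note that two concepts with nearly identical asset sets would be reported as narrower than each other; such cases would need a manual audit.</p>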
      </sec>
      <sec id="sec-3-3">
        <title>Enriching Keywords with a thesaurus</title>
        <p>The ontology built is used as the backbone of the BOK. This means that it should be as complete as possible, to
cover all the main aspects of the discipline under study. However, using only
the keywords provided by the authors of papers has some drawbacks:
– Ambiguous Concepts: Authors often get too creative when tagging their documents.
Ambiguity is a main problem of tagging, as authors tag using their own
knowledge (different from shared knowledge), e.g. architecture design vs.
architectural design.
– Too Generic Concepts: Some Concepts are too generic for the discipline and
may not appear in the collection of Keywords since they do not represent a
good tag for categorization. For instance, the word System is never used as
a Keyword to tag a Software Architecture paper.
– Too Specific Concepts: Many Keywords are too specific and do not add
useful information to the BOK, for example proper names,
identifiers, etc. This kind of Keyword adds noise to the final ontology.</p>
        <p>
          To overcome these issues, the initial dictionary of concepts to search for in
abstracts is built on a thesaurus (we use the Software Architecture thesaurus
presented by Fraga et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]). The thesaurus plays a triple role in the process:
– Using tools such as lemmatization, we can anchor different tags to a single
concept within the thesaurus ({architecture design, architectural design} ⇒
{Software Architecture Design}), reducing ambiguity.
– It adds words that, being too generic, will not appear as Keywords in
papers (System is a main concept in the thesaurus).
        </p>
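        <p>The anchoring of variant tags to a single thesaurus concept can be sketched as follows. The normalize function is a deliberately crude stand-in for real lemmatization (it only strips a few suffixes), and the two-entry thesaurus is hypothetical; the Fraga et al. thesaurus is not reproduced here.</p>
        <preformat><![CDATA[
```python
def normalize(tag):
    """Crude stand-in for lemmatization: lower-case and strip a few suffixes."""
    words = []
    for word in tag.lower().split():
        for suffix in ("ural", "ure", "s"):
            # Only strip when a reasonably long stem remains.
            if word.endswith(suffix) and len(word) > len(suffix) + 3:
                word = word[: -len(suffix)]
                break
        words.append(word)
    return " ".join(words)


# Hypothetical thesaurus: normalized form -> preferred concept name.
THESAURUS = {
    normalize("architecture design"): "Software Architecture Design",
    normalize("system"): "System",
}


def anchor(tag, thesaurus=THESAURUS):
    """Return the thesaurus concept for a tag, or None if it is not covered."""
    return thesaurus.get(normalize(tag))


for tag in ("architecture design", "Architectural Design", "Systems", "petri net"):
    print(tag, "=>", anchor(tag))
```
]]></preformat>
        <p>Tags that return None are not discarded by this step; they remain candidates for the frequency-based filtering of specific concepts.</p>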
        <p>Too Specific Concepts need to be managed in a different way. We cannot
simply ignore all Keywords from the papers’ metadata and use only those in the
hand-made thesaurus, because we would lose the capacity to discover information
or new trends and topics. Instead, specific concepts that cause noise are
filtered out by their frequency. The idea is simple: the more specific
a concept is, the lower its frequency will be. Only concepts that appear in more
than X papers are used. We call X the frequency filter.</p>
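        <p>The frequency filter amounts to a threshold on the number of papers per concept. The following sketch uses hypothetical keyword lists and a hypothetical function name; the value of X is left as a parameter, since the text does not fix it at this point.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def frequency_filter(papers, x):
    """Keep only the concepts that appear in more than x papers."""
    # set() ensures a concept is counted at most once per paper.
    counts = Counter(concept for concepts in papers for concept in set(concepts))
    return {concept for concept, n in counts.items() if n > x}


# Hypothetical keyword lists; "acme-42" plays the role of a too-specific identifier.
papers = [
    ["reusability", "acme-42"],
    ["reusability", "design decision"],
    ["design decision", "reusability"],
]

kept = frequency_filter(papers, x=1)
print(sorted(kept))
```
]]></preformat>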
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Use of Semantic Wiki for a BOK</title>
      <p>Configuring a wiki for the identified metamodel implies creating two
kinds of pages: those representing domain concepts, and those representing
digital assets. Both kinds of pages make use of specialized Infoboxes, allowing a
standardized visual representation of the (concept or asset) attributes. The
relationship between assets and concepts is represented by inter-page references.</p>
      <sec id="sec-4-1">
        <title>Discipline Exploration: Page per Concept</title>
        <p>The ontology is then used in a semantic wiki, where a single wiki page is created
for each concept (a small Java program was used for this task). At this
point it is necessary to understand that we are providing a jump-start approach
for the Software Architecture BOK (SABOK); the definitions and contents of this knowledge
representation remain in the hands of the Software Architecture community.
Nevertheless, some information is provided in the semantic wiki for each concept:
Broader Concepts, Narrower Concepts, Associated Digital Assets, and Topic
Category. According to the properties of the concept, we have created specific
types of topics.</p>
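        <p>To give an idea of what such a page generator might emit, the sketch below builds wikitext for one concept page using Semantic MediaWiki's in-text annotation syntax ([[Property::Value]]). It is written in Python rather than Java for consistency with the other examples, and the property names (Broader, Narrower, Has digital asset), the category name and the asset identifier are all assumptions, not the ones used in the prototype.</p>
        <preformat><![CDATA[
```python
def concept_page(name, broader, narrower, assets, category):
    """Build wikitext for one concept page with Semantic MediaWiki annotations."""
    lines = ["'''" + name + "'''", ""]
    # Property names below are hypothetical; each annotation links to another page.
    lines += ["* Broader concept: [[Broader::" + b + "]]" for b in broader]
    lines += ["* Narrower concept: [[Narrower::" + n + "]]" for n in narrower]
    lines += ["* Digital asset: [[Has digital asset::" + a + "]]" for a in assets]
    lines += ["", "[[Category:" + category + "]]"]
    return "\n".join(lines)


page = concept_page(
    "Reusability",
    broader=["Architecture Rationale", "Non-functional requirement"],
    narrower=[],
    assets=["Asset-001"],  # hypothetical asset page name
    category="Concept",
)
print(page)
```
]]></preformat>
        <p>In practice the annotations would more likely be wrapped in a wiki template so that the Infobox layout can be changed in one place.</p>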
        <p>The semantic wiki allows the community to create and maintain content
collaboratively, populating and enriching the SABOK; explaining the wiki tools is
out of the scope of this work. Concepts can be searched either with the
free-text search tool provided by the wiki framework, or by browsing the
thesaurus used to build the ontology.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Resource Contextualization: Page per Asset</title>
        <p>Since each concept in the SABOK has several associated digital assets (the same ones
used to build the ontology), the SABOK can be used as a digital asset search tool as well.
The ontology behind the SABOK allows us to use inference when answering queries.
We have identified two inference levels: basic, and based on concepts.
Basic transitivity. Since the concepts are arranged in a hierarchy, we can
provide a transitivity inference level for digital assets associated with a branch of
concepts. For instance, all digital assets associated with Reusability will be
returned for the query “digital assets for Architecture Rationale”.</p>
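        <p>Basic transitivity can be illustrated by collecting the assets of a concept together with those of every concept below it in the hierarchy. The sketch assumes the hierarchy is available as a mapping from each concept to its narrower concepts; the concept Component Reuse and the asset identifiers are hypothetical.</p>
        <preformat><![CDATA[
```python
def assets_for(concept, narrower_of, assets_by_concept):
    """Assets of a concept plus those of all its (transitively) narrower concepts."""
    seen, stack, result = set(), [concept], set()
    while stack:
        current = stack.pop()
        if current in seen:  # guards against cycles in the hierarchy
            continue
        seen.add(current)
        result |= assets_by_concept.get(current, set())
        stack.extend(narrower_of.get(current, ()))
    return result


# Hypothetical hierarchy and asset sets.
narrower_of = {
    "Architecture Rationale": ["Reusability"],
    "Reusability": ["Component Reuse"],
}
assets_by_concept = {
    "Architecture Rationale": {"d1"},
    "Reusability": {"d2"},
    "Component Reuse": {"d3"},
}

found = assets_for("Architecture Rationale", narrower_of, assets_by_concept)
print(sorted(found))
```
]]></preformat>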
        <p>As can be seen, the SABOK, besides organizing the discipline
knowledge, provides a search capability over the Digital Assets associated with each concept,
based on inference powered by its ontology.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Subject-based Exploration with Timelines</title>
        <p>The generated BOK can be browsed with a timeline-based tool, which shows the
evolution of concepts and how they relate to each other. A timeline-based
visualization tool can show which concepts currently attract attention. Crosscutting
concepts can be identified visually because they have a constant presence in
the timeline over the years. Users can access the community-created information
about concepts, and the wiki itself to edit and manage it.</p>
        <p>The information that the tool requires is the year of publication stored in the digital
asset metadata (see section 3.2). In the timeline, each concept is presented
with the dates of the first and (currently) last publications that use it.</p>
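        <p>The span shown for each concept can be derived directly from the publication years of its digital assets. The sketch below assumes each asset is a record with a year and a list of concepts; the records and the function name are hypothetical.</p>
        <preformat><![CDATA[
```python
def concept_spans(assets):
    """Map each concept to the (first, last) publication year of the assets using it."""
    spans = {}
    for asset in assets:
        year = asset["year"]
        for concept in asset["concepts"]:
            first, last = spans.get(concept, (year, year))
            spans[concept] = (min(first, year), max(last, year))
    return spans


# Hypothetical digital asset records.
assets = [
    {"year": 2005, "concepts": ["Reusability", "Architecture Rationale"]},
    {"year": 2001, "concepts": ["Reusability"]},
    {"year": 2008, "concepts": ["Architecture Rationale"]},
]

spans = concept_spans(assets)
for concept, (first, last) in sorted(spans.items()):
    print(concept, first, last)
```
]]></preformat>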
        <p>The timeline can also be used to present the evolution of Digital Assets around a
concept. This should be useful, for example, for researchers looking for the latest
publications on a certain subject.</p>
        <p>Finally, new Digital Assets can be added to the SABOK, such as lessons,
presentations, posters, video, etc.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Case Study: A Software Architecture BOK</title>
      <p>The proposed approach has been implemented in a system named ConcepTion<sup>22</sup>,
composed of three main tools: a Miner, a Hierarchizer, and a Visualizer.</p>
      <p>The approach was validated with a case study for the Software Architecture
domain.</p>
      <sec id="sec-5-1">
        <title>Software Architecture(s) Descriptions</title>
        <p>Several efforts have been carried out to build a vocabulary for Software
Architecture (SA). However, most of them are not intended to describe the entire
Software Architecture discipline but systems and parts thereof (i.e. the discipline
subject matter, not the discipline itself).</p>
        <p>
          The SA community has recently focused on describing and recording
architecture knowledge (AK) that supports the architecting process (e.g. adopted and
discarded decisions, rationale, tradeoffs), and several metamodels and ontologies
have been proposed to systematize it (PAKME [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], ADDSS [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], Archium [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ],
AREL [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], NDR [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], among others). Also, Liang et al. [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] tackled the
measuring of semantic distance among several proposals to describe AK, and
defined a set of characteristics to categorize all AK concepts.
        </p>
        <p>
          Unfortunately, only a couple of articles have proposed a broader
description of the entire software architecture discipline. Babu et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] introduced
ArchVoc, the most cited software architecture ontology, which was generated
with combined manual and semi-automatic techniques to identify software
architecture concepts. The manual technique used the back-of-the-book indices of
major software architecture books, and the semi-automatic technique parsed
architecture-related Wikipedia<sup>23</sup> pages. The first approach yielded 480 concepts,
and the second one 1650 concepts; they were organized into 9 overall categories,
which were also sorted according to architecting phases.
        </p>
        <p>22 www.toeska.cl/conception/wiki/</p>
        <p>
          Fraga et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] also employed both an automatic and a manual technique
to generate a software architecture thesaurus. The corpus for both generation
techniques was the back-of-the-book indices of major software architecture books
(in 2005). The manual process yielded a 500-concept thesaurus, and the automatic
technique generated a 1200-concept thesaurus. Both thesauri were combined,
yielding 27 top-level concepts.
        </p>
        <p>Although these two thesauri are good vocabularies for classifying existing SA
knowledge, several challenges have not yet been tackled in
building a software architecture discipline vocabulary:
– Both thesauri have been manually manipulated to better classify SA
knowledge, so their hierarchies and relationships are usually strongly influenced by
existing conceptual frameworks present in the discipline. This certainly
helps to create good thesauri for information search, but it usually hampers
their ability to describe real connections among concepts. For example, they
group “fault-tolerance”, “performance” and “usability” into a single
category (“Quality Requirements” or “Non-Functional Requirements”), but in
practice all three concepts are rarely present in the same article; indeed, most
papers (and communities) focus on only one of them. Also, “fault-tolerance”
is more frequently related to “validation” and “formal methods” than to any
other quality requirement.
– The starting corpus of these thesauri did not include published research
or industry articles either; they used back-of-the-book indices and/or
SA-related Wikipedia pages. This corpus selection reduces the vocabulary scope
to those topics already published in books, omitting new trending topics
or novel techniques that might be under discussion in major refereed SA
conferences or journals. For example, neither of these thesauri mentions “design
rationale” or “software architecture rationale”, both dealt with in several
recent mainstream articles.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Mining</title>
        <p>The ontology was populated using 1,000 BibTeX files (including abstracts)
returned by ScienceDirect<sup>24</sup> for the “Software Architecture” search concept.
Extracted metadata was stored in RDF<sup>25</sup>. Table 2 shows some statistics generated
by the Miner.</p>
        <p>Over 10% of the articles do not have an abstract in their BibTeX file, so for them we can
only rely on the keywords that the authors used to tag them. Interestingly, only
47 tags account for more than 50% of the matches produced by
searching for dictionary concepts in abstracts. These are the most important tags, and the ones
we focused on.</p>
        <p>The Hierarchizer compares every pair of concepts and calculates a co-occurrence
factor between them (see section 3.2). We can lower the co-occurrence filter to
find more relations among concepts but, of course, the lower it is, the more false
positives we will find. We have found empirically that a co-occurrence filter of
80% is appropriate to discover new relations while keeping false positives at a
low level.</p>
        <p>23 Wikipedia: www.wikipedia.org
24 ScienceDirect: www.sciencedirect.com
25 RDF: Resource Description Framework, the industry standard to store Semantic
Information; see www.w3.org/RDF/.</p>
        <p>The co-occurrence filter and frequency filter (see section 3.2) are the two
parameters that can be used to adjust the quality of the hierarchy obtained, and
thus, the ontology’s instances.</p>
        <p>After the hierarchy is created, it can be visualized with Graphviz<sup>26</sup>, which draws
the concepts and their relations, allowing Software Architecture experts to audit
it and manually filter out false positives. Some sample hierarchies can be found
on Toeska’s Website<sup>27</sup>.</p>
        <p>The prototype SABOK was implemented using the semantic wiki platform
Semantic MediaWiki<sup>28</sup> (SMW). A simple ad-hoc tool adds a wiki page for each
concept in the hierarchy.</p>
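        <p>For the Graphviz visualization of the hierarchy, the (narrower, broader) relations can be exported as a DOT digraph. This is only a sketch: the function name to_dot is hypothetical, and the bottom-to-top layout (rankdir=BT) is a presentation choice made here, not one taken from the prototype.</p>
        <preformat><![CDATA[
```python
def to_dot(relations):
    """Render (narrower, broader) pairs as a Graphviz DOT digraph."""
    lines = ["digraph hierarchy {", "  rankdir=BT;"]
    for narrower, broader in sorted(relations):
        # Edge direction: from the narrower concept up to the broader one.
        lines.append('  "' + narrower + '" -> "' + broader + '";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical relations, using the concept names from the running example.
relations = [
    ("Reusability", "Non-functional requirement"),
    ("Reusability", "Architecture Rationale"),
]

dot = to_dot(relations)
print(dot)
```
]]></preformat>
        <p>The resulting text can be saved to a file and rendered with the dot command-line tool, for example dot -Tpng hierarchy.dot -o hierarchy.png.</p>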
        <p>A timeline browser was also built with the MIT SIMILE Timeline<sup>29</sup>, which
uses HTML and JavaScript to display XML data, in this case a Knowledge Base with
the ontology created.</p>
        <p>Figure 1 shows a screenshot of the prototype SABOK, displaying the evolution of the
Concept Architecture. Each line represents a narrower Concept,
displayed from the year of the first paper published with this Concept to the year of the last
paper. Figure 2 shows the wiki page for the Concept Reusability. Along with
the information on broader and narrower Concepts, a Timeline of the
publications using this Concept is provided. The Timeline is fully interactive
and allows users to browse research papers. Figure 3 shows two infoboxes: Digital
Asset and Concept. The Digital Asset infobox displays useful information such as
title, author and the Concepts used in the paper. It also provides information on
inferred Concepts related to the paper. The Concept infobox displays the upper and lower
concepts in the hierarchy. It also displays inferred concepts and the Digital Assets
associated with the concept.
<fn><label>26</label><p>www.graphviz.org/</p></fn>
<fn><label>27</label><p>Toeska Research Group, Universidad Técnica Federico Santa María: www.toeska.cl</p></fn>
<fn><label>28</label><p>www.semantic-mediawiki.org</p></fn>
<fn><label>29</label><p>SIMILE Project: http://simile.mit.edu/timeline/</p></fn></p>
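        <p>The generation of one wiki page per concept, with the broader and narrower concepts recorded as semantic annotations, could be sketched as follows. The <monospace>[[Property::Value]]</monospace> annotation syntax is that of Semantic MediaWiki; the property names, the category name and the function name are hypothetical and are not taken from the prototype.</p>
        <preformatted><![CDATA[
```python
from collections import defaultdict


def concept_pages(relations):
    """Map each concept to the wikitext of its page.

    relations is an iterable of (broader, narrower) concept pairs.
    """
    broader = defaultdict(set)
    narrower = defaultdict(set)
    for broad, narrow in relations:
        broader[narrow].add(broad)
        narrower[broad].add(narrow)

    pages = {}
    for concept in sorted(set(broader) | set(narrower)):
        lines = [f"'''{concept}''' is a concept of the discipline ontology.", ""]
        # Hypothetical property names, written as SMW annotations.
        lines += [f"* Broader concept: [[Broader concept::{b}]]"
                  for b in sorted(broader[concept])]
        lines += [f"* Narrower concept: [[Narrower concept::{n}]]"
                  for n in sorted(narrower[concept])]
        lines += ["", "[[Category:Concept]]"]
        pages[concept] = "\n".join(lines)
    return pages


# Hypothetical sample relation.
pages = concept_pages([("Architecture", "Reusability")])
print(pages["Reusability"])
```
]]></preformatted>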
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Further Work</title>
      <p>Along with adding more advanced NLP tools and adding more papers to the
analysis to improve our hierarchy, we believe that there are two topics that
could add a lot of value to the SABOK presented.</p>
      <p>
        – Cluster Analysis: Cluster analysis can help us better understand which
areas the discipline is divided into. It should also make it possible to
recognize useful intersections of areas and define them as separate
elements in the ontology to improve search capability. We think that
Formal Concept Analysis tools would allow us to find these clusters
of information by identifying classes of concepts, as shown in the PACTOLE
methodology [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        – Emerging Topics Tracking: With our approach it is possible to find
the newest topics in the discipline and how they are related to each
other. However, that does not mean that these are emerging topics. We think
that emerging topics have a low frequency and thus will not appear
in our hierarchy. We also think that emerging topics appear in
publications with a high impact factor, and that this is how they
should be identified. However, that kind of information is not available in
BibTeX files and would have to be obtained in a different way.
      </p>
      <p>Although we think the best validation of our SABOK would come from the
community, we plan to run validation tests with Software Architects
and Software Engineering students in the coming months.</p>
    </sec>
    <sec id="sec-7">
      <title>Conclusions</title>
      <p>This article has presented a novel method to jump-start the creation of an
ontology-based Body of Knowledge (BOK).</p>
      <p>Using authoritative documents from a community, we can mine and extract
information about a discipline, arrange it into a hierarchy, and create an ontology. The
ontology is used to organize the BOK and to search Digital Assets (research publications
in our example) using inference. The resulting BOK provides contextualization,
enabling document discovery and inference-based search.</p>
      <p>The ConcepTion set of tools makes it possible to extract, mine, hierarchize and display
a BOK, using a semantic wiki to manage information and a timeline tool to show
the evolution of topics in the discipline. The community is then asked to feed the
BOK with definitions and their own Digital Assets. Future work will focus
on improving the quality of the resulting BOK and adding more features.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>H.</given-names>
            <surname>Astudillo</surname>
          </string-name>
          .
          <article-title>Maximizing object reuse with a biological metaphor</article-title>
          .
          <source>TAPOS</source>
          ,
          <volume>3</volume>
          (
          <issue>4</issue>
          ):
          <fpage>235</fpage>
          -
          <lpage>251</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Babar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gorton</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Kitchenham</surname>
          </string-name>
          .
          <article-title>Rationale management in software engineering. In A Framework for Supporting Architecture Knowledge and Rationale Management</article-title>
          , pages
          <fpage>237</fpage>
          -
          <lpage>254</lpage>
          . Springer Berlin Heidelberg,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>R.</given-names>
            <surname>Bendaoud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Toussaint</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Napoli</surname>
          </string-name>
          .
          <article-title>Pactole: A methodology and a system for semi-automatically enriching an ontology from a collection of texts</article-title>
          .
          <volume>5113</volume>
          :
          <fpage>203</fpage>
          -
          <lpage>216</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>F.</given-names>
            <surname>Bry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Eckert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kotowski</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K. A.</given-names>
            <surname>Weiand</surname>
          </string-name>
          .
          <article-title>What the user interacts with: Reflections on conceptual models for semantic wikis</article-title>
          . In C. Lange,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schaffert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Skaf-Molli</surname>
          </string-name>
          , and M. Völkel, editors,
          <source>SemWiki</source>
          , volume
          <volume>464</volume>
          <source>of CEUR Workshop Proceedings. CEUR-WS.org</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>R.</given-names>
            <surname>Capilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Nava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pérez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Dueñas</surname>
          </string-name>
          .
          <article-title>A web-based tool for managing architectural design decisions</article-title>
          .
          <source>SIGSOFT Softw. Eng. Notes</source>
          ,
          <volume>31</volume>
          (
          <issue>5</issue>
          ):
          <fpage>4</fpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>H.</given-names>
            <surname>Cunningham</surname>
          </string-name>
          .
          <source>Encyclopedia of Language and Linguistics</source>
          , chapter Information Extraction, Automatic, pages
          <fpage>665</fpage>
          -
          <lpage>677</lpage>
          . 2nd edition,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>A.</given-names>
            <surname>Fraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sánchez-Cuadrado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lloréns</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Astudillo</surname>
          </string-name>
          .
          <article-title>Knowledge representation for software architecture domain by manual and automatic methodologies</article-title>
          .
          <source>CLEI Electron. J.</source>
          ,
          <volume>9</volume>
          (
          <issue>1</issue>
          ),
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>T. R.</given-names>
            <surname>Gruber</surname>
          </string-name>
          .
          <article-title>Toward principles for the design of ontologies used for knowledge sharing</article-title>
          .
          <source>Int. J. Hum.-Comput. Stud.</source>
          ,
          <volume>43</volume>
          (
          <issue>5-6</issue>
          ):
          <fpage>907</fpage>
          -
          <lpage>928</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Iorio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Presutti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Vitali</surname>
          </string-name>
          .
          <article-title>Wikifactory: An ontology-based application for creating domain-oriented wikis</article-title>
          . In Y. Sure and J. Domingue, editors,
          <source>ESWC</source>
          , volume
          <volume>4011</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>664</fpage>
          -
          <lpage>678</lpage>
          . Springer,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>A.</given-names>
            <surname>Jansen</surname>
          </string-name>
          ,
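          <string-name>
            <given-names>J.</given-names>
            <surname>van der Ven</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Avgeriou</surname>
          </string-name>
          , and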
          <string-name>
            <given-names>D. K.</given-names>
            <surname>Hammer</surname>
          </string-name>
          .
          <article-title>Tool support for architectural decisions</article-title>
          .
          <source>In WICSA '07: Proceedings of the Sixth Working IEEE/IFIP Conference on Software Architecture, page 4</source>
          , Washington, DC, USA,
          <year>2007</year>
          . IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
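          11.
          <string-name>
            <given-names>M.</given-names>
            <surname>Krötzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrandecic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Völkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Haller</surname>
          </string-name>
          , and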
          <string-name>
            <given-names>R.</given-names>
            <surname>Studer</surname>
          </string-name>
          .
          <article-title>Semantic wikipedia</article-title>
          .
          <source>J. Web Sem.</source>
          ,
          <volume>5</volume>
          (
          <issue>4</issue>
          ):
          <fpage>251</fpage>
          -
          <lpage>261</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
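          12.
          <string-name>
            <given-names>P.</given-names>
            <surname>Kruchten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lago</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>van Vliet</surname>
          </string-name>
          .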
          <article-title>Building up and exploiting architectural knowledge</article-title>
          .
          <source>In QoSA'05: Second International Conference on Quality of Software Architectures</source>
          , pages
          <fpage>43</fpage>
          -
          <lpage>58</lpage>
          . Springer Berlin / Heidelberg,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Kruk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cygan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gzella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Woroniecki</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Dabrowski</surname>
          </string-name>
          . Jeromedl:
          <article-title>The social semantic digital library</article-title>
          . In S. R. Kruk and B. McDaniel, editors,
          <source>Semantic Digital Libraries</source>
          , pages
          <fpage>139</fpage>
          -
          <lpage>150</lpage>
          . Springer,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>B. T.</given-names>
            <surname>Lenin</surname>
          </string-name>
          ,
          <string-name>
            <surname>S. R. M.</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. T. V.</surname>
          </string-name>
          , and
          <string-name>
            <surname>R. D.</surname>
          </string-name>
          <article-title>ArchVoc-Towards an ontology for software architecture</article-title>
          .
          <source>In SHARK-ADI '07: Proceedings of the Second Workshop on SHAring and Reusing architectural Knowledge Architecture, Rationale, and Design Intent</source>
          , page 5, Washington, DC, USA,
          <year>2007</year>
          . IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jansen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Avgeriou</surname>
          </string-name>
          .
          <article-title>Selecting a high-quality central model for sharing architectural knowledge</article-title>
          .
          <source>Quality Software</source>
          , International Conference on,
          <volume>0</volume>
          :
          <fpage>357</fpage>
          -
          <lpage>365</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
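          16.
          <string-name>
            <given-names>C.</given-names>
            <surname>López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Inostroza</surname>
          </string-name>
          ,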
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Cysneiros</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Astudillo</surname>
          </string-name>
          .
          <article-title>Visualization and comparison of architecture rationale with semantic web technologies</article-title>
          .
          <source>Journal of Systems and Software</source>
          ,
          <volume>82</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1198</fpage>
          -
          <lpage>1210</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Project Management Institute</surname>
          </string-name>
          .
          <article-title>A Guide to the Project Management Body of Knowledge (PMBOK Guide) - Third Edition</article-title>
          , Paperback.
          <source>Project Management Institute</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>S.</given-names>
            <surname>Schaffert</surname>
          </string-name>
          .
          <article-title>Ikewiki: A semantic wiki for collaborative knowledge management</article-title>
          .
          <source>In WETICE '06: Proceedings of the 15th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises</source>
          , pages
          <fpage>388</fpage>
          -
          <lpage>396</lpage>
          , Washington, DC, USA,
          <year>2006</year>
          . IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>S.</given-names>
            <surname>Schaffert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Eder</surname>
          </string-name>
          , S. Grünwald, T. Kurz, and
          <string-name>
            <given-names>M.</given-names>
            <surname>Radulescu</surname>
          </string-name>
          .
          <article-title>Kiwi - a platform for semantic social software (demonstration)</article-title>
          .
          <source>In ESWC 2009 Heraklion: Proceedings of the 6th European Semantic Web Conference on The Semantic Web</source>
          , pages
          <fpage>888</fpage>
          -
          <lpage>892</lpage>
          , Berlin, Heidelberg,
          <year>2009</year>
          . Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
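          20.
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Shum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Motta</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Domingue</surname>
          </string-name>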
          .
          <article-title>Scholonto: an ontology-based digital library server for research documents and discourse</article-title>
          .
          <source>Int. J. on Digital Libraries</source>
          ,
          <volume>3</volume>
          (
          <issue>3</issue>
          ):
          <fpage>237</fpage>
          -
          <lpage>248</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>A.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          .
          <article-title>A rationale-based architecture model for design traceability and reasoning</article-title>
          .
          <source>J. Syst. Softw.</source>
          ,
          <volume>80</volume>
          (
          <issue>6</issue>
          ):
          <fpage>918</fpage>
          -
          <lpage>934</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>L. L.</given-names>
            <surname>Tripp</surname>
          </string-name>
          . Guide to the
          <source>Software Engineering Body of Knowledge: 2004 Version</source>
          .
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>