<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>ACM Computing Surveys</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Visualization of Large Datasets using Semantic Web Technologies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Suvodeep Mazumdar</string-name>
          <email>s.mazumdar@sheffield.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Studies, University of Sheffield</institution>
          <addr-line>Regent Court - 211 Portobello Street, S1 4DP, Sheffield</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2007</year>
      </pub-date>
      <volume>39</volume>
      <issue>4</issue>
      <abstract>
        <p>Visualization technologies provide means to comprehend, understand and explore data. Observing patterns and anomalies via visualization tools helps users understand issues and make informed decisions. Semantic web technologies, used to represent different data types in conformance with particular standards, can be exploited to provide meaningful and intuitive visualizations. In this paper, we describe how we intend to provide intuitive and interactive visualizations for large datasets formalized by multiple ontologies.</p>
      </abstract>
      <kwd-group>
        <kwd>Information Visualization</kwd>
        <kwd>semantic web</kwd>
        <kwd>dynamic queries</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>This research looks at highly complex domains such as aerospace engineering. A
jet engine’s life cycle can last up to 50 years, requiring regular maintenance,
overhauls, tests and services. Each of these activities involves documentation in the
form of text reports, numeric data, images, in-flight data, CAD drawings etc. The
volume of this information can easily exceed several terabytes, and some structuring is
needed for this very large, heterogeneous information set to be usable. Information
extraction and semantic web technologies can provide a standardized and structured
representation of the multimedia information. An overarching domain ontology is
essential to provide an overall view of the entire domain. Such an ontology is
effective for gaining homogeneity, but at the cost of losing details embedded in the
individual documents. Hence, each document type can be
formalized by its own representative ontology, thereby providing more detailed
information with respect to the global (overarching) representation. Different
ontologies thus provide different lenses through which to look at the same document type. It is therefore
possible to explore the data at different levels of granularity: a coarse view provided
by the domain ontology, and a fine-grained view that makes use of the document
ontology. How these two levels can be combined in an effective user interface,
and how users can effectively manipulate and explore them, is our main research
question.</p>
    </sec>
    <sec id="sec-2">
      <title>Related work and motivation</title>
      <p>
        Several tools for data visualization and exploration have been proposed. For
example, SemaPlorer [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] visualizes people, tags, photos etc. on a geographical map;
GapMinder<sup>1</sup> provides an exploratory tool for visualizing statistical trends in data over
time; ManyEyes<sup>2</sup> allows users to upload their own data and create visualizations.
However, most of these visualization tools do not address the main questions of this
research: generality and scalability for effective user interaction. For example,
current visualization techniques cannot handle very large volumes of data. A. Katifori
et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] looked at tools for visualizing ontologies, the available visualization methods
and the number of nodes they intend to support (Table 1). They found very few
visualization methods capable of handling more than 10,000 nodes. GRIDL<sup>3</sup> provides
a scalable approach by presenting each axis hierarchically and, for each axis
element, a statistical display (bar chart); GreenMax [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] provides tree
visualization for a million nodes on a representative smaller network of far fewer
clusters.
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] present a faceted searching and visualization interface for
heterogeneous data by mapping them to known vocabularies extracted from the web
and visualizing the data using pre-defined interface widgets. The presence of
interactive multiple visualizations is desirable since it helps in effectively exploring
the underlying data. One such example is Exhibit, part of the SIMILE Project<sup>4</sup>, that
allows swapping between different perspectives such as timeline or maps. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
proposes a more advanced approach as the multiple visualizations are available and
updated simultaneously. These visualizations, however, do not fuse different
document sets formalized by different ontologies, the first goal of the research.
Similarly, the work done by the information visualization community has been mainly
limited to homogeneous data. To overcome both these limitations, this research combines
Semantic Web technology, used to aggregate and structure dispersed and
heterogeneous data, with findings from the information visualization community, to
provide large-scale, intuitive visualizations and manipulation of heterogeneous data.
      </p>
      <p>
        Indeed, despite evidence that highly dynamic interaction tools effectively support
users in data exploration, very little has been done in the area of Semantic Data
visualization. This research builds upon our previous work [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] that implements the
concept of dynamic queries [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to provide highly interactive manipulation of multiple
visualizations, namely tables, timeline, geographical and topological plots.
      </p>
      <p>
        Excerpt from A. Katifori et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], page 10:38:
      </p>
      <p>[...] peripheral, which are small but distinguishable, and fringe, which are not individually
distinguishable but are useful to display the structure. The 3D Hyperbolic Browser
can show up to 50 main nodes, 500 hundred peripheral ones, and thousands of fringe
ones.</p>
      <p>In the user survey in Ernst and Storey [2003], five ontology size categories are
identified:
1. Fewer than 100 nodes,
2. Between 101 and 1,000 nodes,
3. Between 1,001 and 10,000 nodes,
4. Between 10,001 and 100,000 nodes,
5. More than 100,001 nodes.</p>
      <p>The number of nodes in this case includes both classes and instances.
Most users are anticipated to be working with the second category of ontologies,
whereas none is anticipated to be working with the last. In our case, we will use the three
categories in Table X, as a criterion for the classification of the ontology visualization
methods (the two first categories of Ernst and Storey [2003] are merged into a single
one, and so are the last two). In Table X each category lists the method that could be
effectively used, up to the number of mentioned nodes. The classification is based upon
the existing literature as presented in this section. When there was no information
regarding which category the method belongs to, an estimation was made comparing
it with the rest of its category. As seen from Table X, only three methods claim to provide support for more than
10,000 nodes. This fact shows that the issue of scalability in the visualization domain
is still an important one.</p>
      <p>Van Ham and Van Wijk [2002] propose three solutions to the problem of visualization
of many nodes:
1. Increase available display space, by either using three dimensional and/or hyperbolic
spaces.
2. Reduce the number of information elements by clustering or hiding nodes.
3. Use the given visualization space more efficiently by using every available pixel.</p>
      <p>Such solutions have been employed by most of the presented visualizations with
varying degrees of effectiveness.</p>
      <p>On the whole, as Munzner [1997] also states that information density should not
be the only metric in ontology visualization: when taken too far, it becomes a clutter.
Drawing for example all the links in a highly connected graph yields a picture that
can give a high level overview of the global structure but is useless for examining
the details. There is always a trade-off between maximum number of nodes displayed [...]</p>
      <p>1 GapMinder, http://www.gapminder.org/</p>
      <p>2 ManyEyes, http://manyeyes.alphaworks.ibm.com/manyeyes/</p>
      <p>3 Graphical Interface for Digital Libraries, www.cs.umd.edu/hcil/west-legal/gridl/</p>
      <p>4 The SIMILE Project: http://simile.mit.edu; Exhibit: http://simile.mit.edu/wiki/Exhibit</p>
    </sec>
    <sec id="sec-3">
      <title>Proposed Approach</title>
      <p>We aim to actively engage the user communities at Rolls-Royce throughout the
research period, following a process of iterative user-centered
design. Since there are different types of users from different areas of the Rolls-Royce
aerospace engineering domain (design, manufacturing and service), the target system
must be able to generate visualizations that are equally interactive, intuitive and
informative for all. We intend to interview users individually to understand
their daily jobs and the kinds of visualizations they are used to. We will then present
the users with use case scenarios supported by low-fidelity mockups and sketches of
the system that we believe will benefit them. We will repeat this process
iteratively to gain a sound understanding of the users’ requirements and expectations.</p>
      <p>We will also study the different ontologies, and their inter-dependencies, that
have been developed for the different sets of data currently in use at Rolls-Royce. This
will help generate a taxonomy that relates the data types of the concepts to their
corresponding visualizations and the interactions they can support. We will use the
results from the participatory design sessions to decide the best interactions for
different visualizations, so that the user can seamlessly explore the data in different
hierarchical layers.</p>
      <p>
        The usability of the semantic data visualization tools is a core concern. Applying
filters to millions of documents generates very large retrieved sets with thousands of
results, too much information for the user to process. Past proposals to mitigate this
problem include increasing the display area by using 3D plots instead of 2D, clustering or hiding
nodes, and utilizing every pixel in the visualization space [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Our approach is radically
different and uses classification, clustering and overlapping of data to provide
contextual layered visualizations, where each layer contains only the information relevant
to that layer. For example, consider a pie chart generated on the basis of the domain
ontology and intended to provide a generic overview of the distribution of the data
with respect to a specific concept; if the user clicks on a pie chart section which has more
detailed information formalized by another supported ontology, then further details
corresponding to that ontology will be displayed, providing a semantic zoom.
      </p>
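      <p>The pie chart example above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the planned implementation: the record layout, the concept names and the functions overview and semantic_zoom are all hypothetical. The coarse layer counts documents per domain-ontology concept (the pie sectors), and selecting a sector returns the finer details formalized by the document ontology.</p>
      <preformat>
```python
from collections import Counter

# Hypothetical records: each document carries a coarse concept from the
# domain ontology and finer-grained fields from its own document ontology.
DOCUMENTS = [
    {"id": "d1", "domain_concept": "Maintenance", "doc_ontology": "ServiceReport",
     "details": {"component": "fan blade", "action": "replaced"}},
    {"id": "d2", "domain_concept": "Maintenance", "doc_ontology": "ServiceReport",
     "details": {"component": "combustor", "action": "inspected"}},
    {"id": "d3", "domain_concept": "Testing", "doc_ontology": "TestReport",
     "details": {"rig": "bench", "outcome": "pass"}},
]


def overview(documents):
    """Coarse layer: count documents per domain-ontology concept (pie sectors)."""
    return Counter(doc["domain_concept"] for doc in documents)


def semantic_zoom(documents, concept):
    """Fine layer: document-ontology details behind one selected sector."""
    return [
        {"id": doc["id"], "ontology": doc["doc_ontology"], **doc["details"]}
        for doc in documents
        if doc["domain_concept"] == concept
    ]


sectors = overview(DOCUMENTS)                      # data for the overview pie chart
zoomed = semantic_zoom(DOCUMENTS, "Maintenance")   # user clicks the "Maintenance" sector
```
      </preformat>
      <p>Keeping the two layers as separate functions mirrors the idea that each layer shows only the information relevant to it; in a real system both would presumably be backed by queries against the triple store rather than an in-memory list.</p>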
    </sec>
    <sec id="sec-4">
      <title>Methodology</title>
      <p>Following the user-centered approach discussed above, a core part of the research is
to understand how Rolls-Royce engineers conduct their daily work and what tools
will be useful during data analysis. The starting point will be observations conducted
at Rolls-Royce premises, aimed at identifying current practices of data display and
analysis. By collecting examples of artifacts currently in use, we aim to find
inspiration for a design that will be naturally usable because it is already familiar. We
have already started a series of participatory design sessions with several potential
users from different areas of the Rolls-Royce aerospace engineering domain (design,
manufacturing and service). In these sessions we are discussing mock-ups of the
visualizations and related interactions so as to actively involve the user community in
selecting the potentially optimal solution(s). This requirements gathering is paired
with the system architecture design, to be completed in the first year of research.</p>
      <p>A series of exhaustive tests on the query response time, loading time, efficiency
etc. of the various triple stores will be conducted to select the most efficient system
architecture. Once a back-end system has been chosen, we will perform tests on
loading query results ‘on-the-fly’. Tests conducted in X-Media show that there is a
significant waiting time for the visualizations to be initialized. This is the baseline
against which we will work to improve display efficiency, a core issue in user
interaction. The software coding phase will run throughout the second year of the
research, when we will also prepare evaluation and trial materials based on the
use case scenarios developed in year one.</p>
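      <p>A response-time test of the kind described above could be sketched as follows. This is a minimal sketch under stated assumptions: time_query and fake_store are hypothetical names, and fake_store merely stands in for whichever triple store client is eventually evaluated. Only the wall-clock time of repeated query executions is measured.</p>
      <preformat>
```python
import statistics
import time


def time_query(run_query, query, repeats=5):
    """Run `query` several times and return the per-run wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(query)
        timings.append(time.perf_counter() - start)
    return timings


def fake_store(query):
    """Hypothetical stand-in for a real triple store client."""
    return [("subject", "predicate", "object")] * 1000


timings = time_query(fake_store, "SELECT * WHERE { ?s ?p ?o }", repeats=5)
summary = {"median_s": statistics.median(timings), "max_s": max(timings)}
```
      </preformat>
      <p>Reporting the median alongside the maximum is one reasonable choice here, since a single slow run (for example a cold cache) would otherwise dominate a mean.</p>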
      <p>
        The evaluation of the solution will be carried out with Rolls-Royce engineers at
their premises during the first two months of the third year. We will follow the
methodology we have used previously in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]: participants will be asked to carry
out specific tasks designed in partnership with Rolls-Royce experts; the interaction
will be logged and the screen activity recorded; participants will then be asked to
fill in a questionnaire and answer a few targeted questions in an interview. Results
from this user evaluation will be used to re-design and modify the application where
needed, after which we will conduct a long-term user trial. The
remainder of the third year will be dedicated to thesis writing and to providing bug
fixes and minor enhancements.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and Future Work</title>
      <p>The work already done in X-Media shows the importance and effectiveness of
multiple visualizations in a large, complex organization. The ability of a user to
visualize the same data in different dimensions, query them and identify patterns and
areas of interest is useful in providing or identifying possible solutions. The findings
from the X-Media project have been a good stepping stone for the research we intend
to conduct over the next few years.</p>
      <p>The research, although organized around the case of aerospace engineering, is
expected to be generic and applicable to different domains that share similar
characteristics and problems. Specifically, we will test our results with the data from
GrassPortal<sup>8</sup>, an online resource for accessing data related to grass species, global
environmental data, evolutionary relationships among grasses etc., to assess the
portability of the approach adopted. This will be a good way to measure how
successfully the semantic visualization technology can be ported to other domains
represented by their respective domain ontologies.</p>
      <p>8 GrassPortal, http://www.grassportal.org</p>
      <p>Acknowledgments. This research is supported by SAMULET, a project funded by Rolls-Royce and
DTI for knowledge management in the aerospace manufacturing domain.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ahlberg</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williamson</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shneiderman</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <article-title>Dynamic Queries for Information Exploration: An Implementation and Evaluation</article-title>
          .
          <source>CHI '92</source>
          ,
          <fpage>619</fpage>
          -
          <lpage>626</lpage>
          (
          <year>1992</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Katifori</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halatsis</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lepouras</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vassilakis</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giannopoulou</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>2007</year>
          .
          <article-title>Ontology visualization methods-a survey</article-title>
          .
          <source>ACM Comput. Surv</source>
          .
          <volume>39</volume>
          ,
          <issue>4</issue>
          (Nov.
          <year>2007</year>
          ),
          <fpage>10</fpage>
          . DOI= http://doi.acm.org/10.1145/1287620.1287621
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Petrelli</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mazumdar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dadzie</surname>
            ,
            <given-names>A.-S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ciravegna</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Multi Visualization and Dynamic Query for Effective Exploration of Semantic Data</article-title>
          . In
          <source>Proceedings of the 8th International Semantic Web Conference</source>
          , pp.
          <fpage>505</fpage>
          -
          <lpage>520</lpage>
          . Springer (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Schenk</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saathoff</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Staab</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scherp</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>SemaPlorer-Interactive semantic exploration of data and media based on a federated cloud infrastructure</article-title>
          .
          <source>Web Semant.</source>
          <volume>7</volume>
          ,
          <issue>4</issue>
          (Dec.
          <year>2009</year>
          ),
          <fpage>298</fpage>
          -
          <lpage>304</lpage>
          . DOI= http://dx.doi.org/10.1016/j.websem.2009.09.006
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Van Ham</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Wijk</surname>
            ,
            <given-names>J. J.</given-names>
          </string-name>
          <year>2002</year>
          .
          <article-title>Beamtrees: Compact visualization of large hierarchies</article-title>
          . In
          <source>Proceedings of the IEEE Conference on Information Visualization</source>
          . IEEE CS Press,
          <fpage>93</fpage>
          -
          <lpage>100</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Wong</surname>
            ,
            <given-names>P. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Foote</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mackey</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chin Jr.</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sofia</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Thomas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>A Dynamic Multiscale Magnifying Tool for Exploring Large Sparse Graphs</article-title>
          ,
          <source>Information Visualization</source>
          <volume>7</volume>
          ,
          <issue>2</issue>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>M.</given-names>
            <surname>Hildebrand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>van Ossenbruggen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hardman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Jacobs</surname>
          </string-name>
          .
          <article-title>Supporting subject matter annotation using heterogeneous thesauri: A user study in web data reuse</article-title>
          .
          <source>International Journal of Human-Computer Studies</source>
          ,
          <volume>67</volume>
          (
          <issue>10</issue>
          ):
          <fpage>887</fpage>
          -
          <lpage>902</lpage>
          ,
          <month>10</month>
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>M.</given-names>
            <surname>Hildebrand</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>van Ossenbruggen</surname>
          </string-name>
          .
          <article-title>Configuring semantic web interfaces by data mapping</article-title>
          . In S. Handschuh, T. Heath, and V. Thai, editors,
          <source>Visual Interfaces to the Social and the Semantic Web (VISSW 2009)</source>
          , volume
          <volume>443</volume>
          ,
          <month>February</month>
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>