<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Hildesheim, Oct.</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Visual Exploration of Patent Collections with IPC Clouds</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dominik Herr</string-name>
          <email>dominik.herr@vis.uni-stuttgart.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Qi Han</string-name>
          <email>qi.han@vis.uni-stuttgart.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Steffen Lohmann</string-name>
          <email>steffen.lohmann@vis.uni-stuttgart.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sören Brügmann</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Ertl</string-name>
          <email>thomas.ertl@vis.uni-stuttgart.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Brügmann Software</institution>
          ,
          <addr-line>Bokeler Straße 18, 26871 Papenburg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Graduate School of Excellence advanced Manufacturing Engineering (GSaME) University of Stuttgart</institution>
          ,
          <addr-line>Universitätsstraße 38, 70569 Stuttgart</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute for Visualization and Interactive Systems</institution>
          ,
          <addr-line>VIS</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <volume>7</volume>
      <issue>2014</issue>
      <abstract>
        <p>The International Patent Classification (IPC) is the most widely used system for the classification of patents. It is indispensable in patent retrieval, as it makes it possible to filter patents by their IPC classes, groups, and subgroups. However, selecting appropriate IPC symbols can be challenging, and important patents may be overlooked because relevant IPC symbols are not considered in the search. The identification of appropriate IPC symbols is therefore a crucial activity in patent retrieval that could benefit significantly from better IT support. This paper introduces IPC clouds, an interactive visualization technique that shows the relatedness of IPC symbols based on their co-use in the patent data. In contrast to the IPC hierarchy, IPC clouds allow users to explore the IPC space dynamically while taking into account how the IPC symbols are actually used in the patent data. They provide an alternative view of the IPC system and assist in identifying relevant IPC symbols and associated patents. The general visualization technique is not limited to the IPC system but can also be applied to similar classification systems or to keywords and concepts extracted from the patent documents.</p>
      </abstract>
      <kwd-group>
        <kwd>Patents</kwd>
        <kwd>retrieval</kwd>
        <kwd>mining</kwd>
        <kwd>IPC</kwd>
        <kwd>CPC</kwd>
        <kwd>classification</kwd>
        <kwd>visual analysis</kwd>
        <kwd>tag cloud</kwd>
        <kwd>visualization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Categories and Subject Descriptors</title>
      <p>H.2.8 [Information interfaces and presentation]: User
Interfaces; Graphical user interfaces (GUI)</p>
    </sec>
    <sec id="sec-2">
      <title>1. INTRODUCTION</title>
      <p>
        In today's industry, a technological advantage over competitors is often the key
to a superior market position. The protection of intellectual property
is therefore becoming increasingly important. At the same time, it is important
to know which patents are relevant in a given field. As
more than one million patents are issued each year [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], it
is increasingly challenging to find the relevant ones.
The International Patent Classification (IPC) is “one of the
most important tools available to people who want to search
patent databases” [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. It has been developed and maintained by the
World Intellectual Property Organization (WIPO) for more
than 40 years and is used by almost all patent offices for the
classification of patents. The IPC divides technology into
eight thematic sections with more than 70,000 subdivisions
that are hierarchically organized. The IPC symbols are
usually assigned to the patents by the national offices that
publish the patent documents.
      </p>
      <p>Copyright © 2014 for the individual papers by the papers'
authors. Copying permitted for private and academic purposes.
This volume is published and copyrighted by its editors.
Published at CEUR-WS.org.</p>
      <p>
        The IPC system can be very useful for navigating patent
databases and retrieving relevant patents. Its hierarchical
structure makes it possible to filter patents by their IPC classes,
subclasses, groups, or subgroups. Often, a set of IPC symbols
is used to retrieve patents of interest for a deeper analysis.
This bears the risk that relevant patents are overlooked
only because they are classified with other IPC symbols than
expected. An overview of the actual use, and particularly
the co-use, of IPC symbols would therefore be very helpful
for discovering related IPC symbols that could be relevant in a
certain retrieval context. Inspired by the tag cloud
visualization technique [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], we developed IPC clouds to visualize
the co-use of IPC symbols in patent data and to support
the identification of relevant relationships within the IPC
space. IPC symbols identified as related can belong to
very different classes or groups of the IPC hierarchy
but may fruitfully extend the set of IPC symbols already
used in patent retrieval.
      </p>
      <p>In this paper, we introduce IPC clouds in detail and describe
their creation from patent data. Our implementation uses a
NoSQL database containing bibliographic data for a large
number of patents. We first compute the similarity of each
pair of IPC symbols based on their co-use in the patent
documents. We then map the similarities onto a two-dimensional
plane to obtain a global representation of the IPC space. Based
on this mapping, we developed two different types of IPC
clouds, one giving a general overview of the IPC space and
another focusing on selected IPC symbols. Both
visualizations offer several interaction techniques to further support
the exploration of the IPC space.</p>
    </sec>
    <sec id="sec-3">
      <title>2. RELATED WORK</title>
      <p>
        Modern systems for patent retrieval and analysis
increasingly provide interactive visualizations to improve access to
patent data. As an example, PatAnalyse [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] shows weighted
links between applicants and other patent data in matrix
visualizations with histograms and color scales. The patent
documents themselves are often represented as high-dimensional
data objects using vector space models. Examples are
the “landscape maps” in Patent iNSIGHT Pro [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] or the
ThemeScape maps in Thomson Aureka [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        Node-link diagrams are another popular visualization technique in
the patent domain. They are often used in patent
citation analysis [
        <xref ref-type="bibr" rid="ref16 ref21">16, 21</xref>
        ] to show relationships between
patents based on citation links. A commercial system
incorporating such node-link diagrams is Delphion Citation
Link [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Other approaches use node-link diagrams to show
relations between patents and priority documents [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], or to
graphically depict networks of applicants or inventors [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
Node-link diagrams can be very useful to explore the patent
space and to identify important clusters in the patent data.
The IPC space is rarely visualized in related work. Usually,
it is shown in a tree view that the user can
navigate to find IPC symbols of interest. Kutz uses a sequence of
treemaps to visualize the evolution of the IPC system over
time [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. However, the treemaps are again structured
according to the IPC hierarchy without considering other IPC
relations in the patent data.
      </p>
      <p>
        IPC clouds, in contrast, do not make use of the IPC
hierarchy but visualize the relatedness of IPC symbols based
on their actual co-use in the patent data. Furthermore, this
relatedness is not visualized explicitly but implicitly through
the spatial arrangement of the symbols, similar to the idea of clustered tag
clouds [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Also, as in tag clouds, the labels are weighted
in the visualization so that their size reflects the usage
frequency of the corresponding IPC symbol.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. PATENT DATA</title>
      <p>
        We use the document-oriented NoSQL database Elastic
Search [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to store the patent data. A document-oriented
database has some advantages over a relational one in text
mining contexts. In particular, it is less rigid than a
relational database in that it does not require a fixed data
schema or a clear structure for every record. Different
records can have different fields, and semi-structured data
is usually not a problem. New information can easily be
added to a subset of records without the need to update
other records in the database or to use empty fields.
Another useful characteristic of document-oriented databases
is that they typically allow documents to be retrieved based on
their content. Elastic Search is based on Apache Lucene,
a powerful text search engine offering sophisticated
full-text indexing and searching. Both Elastic Search and
Apache Lucene are open source projects written in Java and
released under the Apache License. The patent data is
accessible via HTTP and exchanged in JSON format, i.e., it
can be retrieved over the web via a RESTful web service.
Moreover, we can directly access the Lucene repository to
preprocess the data and perform computationally expensive
tasks, such as the computation of similarities described later.
      </p>
      <p>
        The database comprises two repositories, a large one with
bibliographic information and a smaller one containing the
texts of the patent documents. The bibliographic
information was taken from the PatStat database [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] of the
European Patent Office. It includes the patent ID, title, abstract,
applicant, inventor, filing and application dates, IPC
symbols, as well as citations for more than 70 million patents.
We transformed the PatStat data into the JSON structure
of our Elastic Search database using MongoDB [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
The patent texts comprise the descriptions and claims of
88,000 arbitrarily chosen patents. They were retrieved from
Espacenet [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the European Patent Register [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and the
European Publication Server [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], using RESTful web services
of the Open Patent Services [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. All texts are indexed by
Lucene and linked to the bibliographic information via their
unique patent IDs. In this paper, we focus on how the
IPC symbols are used in the patent data.
      </p>
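      <p>As an illustration of this RESTful access, the following Python sketch prepares, but does not send, a search request for patents classified with all of a given set of IPC symbols. It is not the authors' code. The host, the index name <monospace>patents</monospace>, and the field name <monospace>ipc</monospace> are hypothetical. The field is assumed to be mapped as an unanalyzed keyword. The body uses the current Elasticsearch query DSL (a <monospace>bool</monospace> query with <monospace>term</monospace> filters), which may differ from the Elasticsearch version available in 2014.</p>
      <preformat preformat-type="code">
```python
import json
import urllib.request

# Hypothetical endpoint and field name; the paper does not specify them.
ES_URL = "http://localhost:9200/patents/_search"
IPC_FIELD = "ipc"


def build_ipc_query(ipc_symbols, size=20):
    """Query-DSL body matching patents classified with ALL given IPC symbols."""
    return {
        "size": size,
        "_source": ["title", IPC_FIELD],
        "query": {
            "bool": {
                # Every clause in the filter list must match (logical AND).
                "filter": [{"term": {IPC_FIELD: s}} for s in ipc_symbols]
            }
        },
    }


def build_request(ipc_symbols, url=ES_URL):
    """Prepare (but do not send) a JSON POST request to the _search endpoint."""
    body = json.dumps(build_ipc_query(ipc_symbols)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request(["F02N", "H02P"])
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To execute the search against a running server: urllib.request.urlopen(req)
```
      </preformat>
      <p>The clauses are placed in the filter context, so they restrict the result set without influencing relevance scoring. The matching documents would come back as JSON over HTTP.</p>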
    </sec>
    <sec id="sec-5">
      <title>4. DATA PREPROCESSING</title>
      <p>Before IPC clouds are generated, the patent data is
preprocessed. The preprocessing consists of two steps: we first
compute the pairwise similarities between the IPC symbols
and then map these similarities onto a 2D space.</p>
    </sec>
    <sec id="sec-6">
      <title>4.1 Computation of IPC Similarities</title>
      <p>Similarities can be computed on different levels of the IPC
hierarchy, i.e., on the class, subclass, group, or subgroup level.
In our work, we computed the similarities on the subclass level,
which is the third level of the IPC hierarchy and
comprises 638 subclasses (in the current version IPC-2014.01). The
IPC symbols on this level have four characters, starting with
a letter for the section, followed by a two-digit number for
the class and a letter for the subclass (e.g., “A01B”). This
four-character IPC symbol forms a common unit in patent
retrieval and provides a good classification granularity. That
is, the number of subclasses on this hierarchy level is well suited for
the generation of IPC clouds: the subclasses already carry a
good amount of detailed information about the classified technology
but still retain a generality that provides an overview of
potentially relevant IPC classes. However, the computation
and mapping could also be performed on other levels of the
IPC hierarchy.<fn id="fn1"><label>1</label><p>In the following, we will also use the term IPC symbol when we refer to the shortened four-character version of the IPC symbol for the sake of simplicity.</p></fn>
To compute the similarities between the IPC symbols, we
first build a vector space for the patent data. In our case, we
used the 88,000 patents from the second repository of our
database (see Section 3). We created a vector for each of the 615
IPC symbols contained in that dataset,<fn id="fn2"><label>2</label><p>23 of the 638 available IPC symbols were not used in the dataset.</p></fn> with the patents as
dimensions of the vector space: if the considered IPC symbol
is used to classify a patent, the corresponding dimension has
a positive value; otherwise it is zero. Then, we compute the
cosine similarity of each pair of IPC symbols to determine
their relatedness in the patent data. That is, given two IPC
symbols x and y, we first calculate the vectors V<sub>x</sub> and V<sub>y</sub>
and subsequently compute their similarity with the formula</p>
      <disp-formula id="eq1">
        <label>(1)</label>
        <tex-math>\mathrm{sim}(V_x, V_y) = \frac{V_x \cdot V_y}{\lvert V_x \rvert \, \lvert V_y \rvert}</tex-math>
      </disp-formula>
      <p>The cosine similarity is an efficient measure for sparse
vectors. This property is useful in our case, because each IPC symbol is
associated with only a small fraction of the patents. As a result,
each vector has only a small number of non-zero dimensions
compared to the total number of dimensions in the vector space,
so the vectors are sparse.</p>
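      <p>The following Python sketch illustrates both steps under simplifying assumptions. Full IPC symbols are truncated to their four-character subclass. Each dimension is 1 if the subclass classifies the patent and 0 otherwise; the text only requires a positive value. Each vector is stored as the set of its non-zero dimensions, which exploits the sparsity. With binary values, Equation (1) reduces to the number of shared patents divided by the square root of the product of the two patent counts (the Ochiai coefficient). The function names, patent IDs, and symbols are hypothetical toy data.</p>
      <preformat preformat-type="code">
```python
import math
from collections import defaultdict


def subclass(ipc_symbol):
    """Shorten a full IPC symbol such as 'F02N 11/08' to its subclass 'F02N'."""
    return ipc_symbol.strip()[:4].upper()


def build_ipc_vectors(patents):
    """Map each subclass to the set of IDs of the patents it classifies.

    The set is a sparse encoding of the binary vector V_x: the dimension of
    patent p is 1 if p is in the set and 0 otherwise.
    """
    vectors = defaultdict(set)
    for patent_id, symbols in patents.items():
        for symbol in symbols:
            vectors[subclass(symbol)].add(patent_id)
    return dict(vectors)


def cosine(vx, vy):
    """Equation (1) for binary vectors stored as sets of non-zero dimensions."""
    if not vx or not vy:
        return 0.0
    # V_x . V_y = number of shared patents; |V_x| = sqrt(number of patents of x).
    shared = len(vx.intersection(vy))
    return shared / math.sqrt(len(vx) * len(vy))


def similarity_matrix(vectors):
    """Pairwise similarities stored sparsely: pairs never co-used are omitted."""
    symbols = sorted(vectors)
    sims = {}
    for i, a in enumerate(symbols):
        for b in symbols[i + 1:]:
            s = cosine(vectors[a], vectors[b])
            if s > 0:
                sims[(a, b)] = sims[(b, a)] = s
    return sims


# Hypothetical toy data: patent ID -> full IPC symbols.
patents = {
    "EP1": ["F02N 11/08", "H02P 9/04"],
    "EP2": ["F02N 15/02"],
    "EP3": ["H02P 9/04", "H02K 7/18"],
}
vectors = build_ipc_vectors(patents)
sims = similarity_matrix(vectors)
print(sims[("F02N", "H02P")])  # 1 shared patent / sqrt(2 * 2) = 0.5
```
      </preformat>
      <p>Only the patents shared by two symbols are touched when the intersection is computed, so the cost per pair depends on the number of non-zero dimensions rather than on the 88,000 dimensions of the full space.</p>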
    </sec>
    <sec id="sec-7">
      <title>4.2 Dimensionality Reduction of IPC Space</title>
      <p>
        In the second step, we map the IPC symbols onto the 2D
plane required for the visualization. The goal of this step is
to find a 2D representation that approximates the similarity
matrix. That is, IPC symbols that are frequently co-used in
the patent data are ideally placed close to each other, while
those that never appear together are placed far apart.
Our implementation uses t-SNE [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] as mapping technique.
We first normalize the similarity matrix to obtain a
probability distribution P, where p<sub>ij</sub> represents the similarity
between IPC symbol i and IPC symbol j. The t-SNE
algorithm aims to find positions x<sub>1</sub>, ..., x<sub>n</sub> ∈ ℝ<sup>2</sup> that minimize
the Kullback-Leibler divergence between two distributions
P and Q:
      </p>
      <disp-formula id="eq2">
        <label>(2)</label>
        <tex-math>\mathrm{KL}(P \parallel Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}</tex-math>
      </disp-formula>
      <p>
        where q<sub>ij</sub> is defined as
      </p>
      <disp-formula id="eq3">
        <label>(3)</label>
        <tex-math>q_{ij} = \frac{\left(1 + \lVert x_i - x_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert x_k - x_l \rVert^2\right)^{-1}}</tex-math>
      </disp-formula>
      <p>
        and represents the similarity between the points x<sub>i</sub> and x<sub>j</sub>.
For the maximum number of iterations, we use the default
value of 1000 [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
      </p>
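      <p>The following dependency-free Python sketch makes Equations (2) and (3) concrete. It builds P by dividing the off-diagonal similarities by their sum. This is one reading of “normalize”; the exact normalization is not specified. It then computes Q and the Kullback-Leibler divergence for given 2D positions and evaluates the standard t-SNE gradient for a symmetric P. The sketch illustrates the objective only. A tuned t-SNE implementation would iterate such steps up to the maximum number of iterations and add momentum, early exaggeration, and an efficient approximation. All function names are hypothetical.</p>
      <preformat preformat-type="code">
```python
import math
import random


def normalize_similarities(sim):
    """P: zero diagonal; off-diagonal entries divided by their total so they sum to 1."""
    n = len(sim)
    total = sum(sim[i][j] for i in range(n) for j in range(n) if i != j)
    return [[sim[i][j] / total if i != j else 0.0 for j in range(n)] for i in range(n)]


def q_matrix(xs):
    """Equation (3). Returns Q and the unnormalized Student-t weights w_ij."""
    n = len(xs)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = (xs[i][0] - xs[j][0]) ** 2 + (xs[i][1] - xs[j][1]) ** 2
                w[i][j] = 1.0 / (1.0 + d2)
    z = sum(map(sum, w))
    return [[w[i][j] / z for j in range(n)] for i in range(n)], w


def kl_divergence(p, q):
    """Equation (2); terms with p_ij == 0 contribute nothing."""
    n = len(p)
    return sum(
        p[i][j] * math.log(p[i][j] / q[i][j])
        for i in range(n)
        for j in range(n)
        if i != j and p[i][j] > 0
    )


def kl_gradient(p, xs):
    """dKL/dx_i = 4 * sum_j (p_ij - q_ij) * w_ij * (x_i - x_j), valid for symmetric P."""
    q, w = q_matrix(xs)
    n = len(xs)
    grad = []
    for i in range(n):
        gx = gy = 0.0
        for j in range(n):
            if i != j:
                f = 4.0 * (p[i][j] - q[i][j]) * w[i][j]
                gx += f * (xs[i][0] - xs[j][0])
                gy += f * (xs[i][1] - xs[j][1])
        grad.append((gx, gy))
    return grad


def init_positions(n, seed=0, scale=1e-2):
    """Small random start positions around the origin, as is customary for t-SNE."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, scale), rng.gauss(0.0, scale)) for _ in range(n)]


def gradient_step(p, xs, learning_rate):
    """One plain gradient-descent step (no momentum or early exaggeration)."""
    g = kl_gradient(p, xs)
    return [(x - learning_rate * gx, y - learning_rate * gy)
            for (x, y), (gx, gy) in zip(xs, g)]
```
      </preformat>
      <p>Each step costs time quadratic in the number of symbols, which is acceptable for small examples. For the 615 subclasses, a library implementation is preferable.</p>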
    </sec>
    <sec id="sec-8">
      <title>5. IPC CLOUD VISUALIZATIONS</title>
      <p>
        The 2D mapping of the IPC space provides the basis for
the creation of IPC clouds. In particular, we developed two
different types of IPC clouds, which we call map view and
darts view and which are detailed in the following. While
the map view provides a global overview of the IPC space,
the darts view puts selected IPC symbols in focus and
supports the visual identification of IPC symbols that are
related to the selected ones. Both views follow the “visual
information seeking mantra” [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] by giving an overview first,
then allowing the user to zoom and filter, and finally showing details
on demand.
      </p>
    </sec>
    <sec id="sec-9">
      <title>5.1 Map View</title>
      <p>The map view is essentially a normalized and rescaled
depiction of the 2D representation obtained from the dimensionality
reduction. Additionally, the font sizes reflect the frequencies
with which the IPC symbols are used.</p>
      <p>If we directly visualized the previously computed 2D
representation of the IPC space, we would get many overlaps,
because the text labels (i.e., the IPC
symbols) have a non-zero width and height. As dimensionality
reduction techniques typically map the data to an arbitrary
Cartesian coordinate system, we first normalize and rescale
the mapping so that the distances between positions become large compared to the label sizes.
By doing so, we transform the mapping into
a coordinate system appropriate for visualization while
retaining the spatial distribution. In our case, a scaling factor
of 25,000 resulted in a good overview with only a few overlaps
of the text labels.</p>
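      <p>The text does not specify the normalization. One plausible reading is sketched below in Python. It applies a min-max normalization that uses a single factor for both axes and then multiplies by the scaling factor of 25,000. The common factor keeps ratios of distances, and thus the relatedness encoded by the layout, intact. The function name and the choice of a common factor are assumptions.</p>
      <preformat preformat-type="code">
```python
def normalize_and_rescale(points, scaling_factor=25_000.0):
    """Shift 2D points to the origin and scale them uniformly.

    After the transformation, the longer side of the bounding box has length
    `scaling_factor`. The same factor is applied to both axes, so ratios of
    distances between points are preserved.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    # One common span for both axes; fall back to 1.0 if all points coincide.
    span = max(max(xs) - x_min, max(ys) - y_min) or 1.0
    factor = scaling_factor / span
    return [((x - x_min) * factor, (y - y_min) * factor) for x, y in points]


print(normalize_and_rescale([(-1.0, 2.0), (1.0, 2.0), (0.0, 3.0)]))
```
      </preformat>
      <p>For the three points (-1, 2), (1, 2), and (0, 3), the sketch returns (0, 0), (25000, 0), and (12500, 12500). Normalizing each axis separately would stretch the shorter axis and distort the relative distances.</p>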
      <p>After the layout has been computed, the IPC symbols are
placed at the determined positions on the screen, as shown in
Figure 1(a). The font size of each IPC symbol correlates with
the number of associated patents, i.e., IPC symbols with a
large font size are used more often in the patent data than
those with a small font size. We use a logarithmic scaling for
the font sizes, as the frequencies of the IPC symbols roughly
follow a power law distribution (cf. Figure 2) and we do not
want to overemphasize the few very frequent IPC symbols. The resulting
map view shows the whole IPC space, with the IPC symbols
spatially arranged according to their relatedness and scaled
in size according to their usage frequency.</p>
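      <p>A minimal Python sketch of the logarithmic font scaling follows. It assumes a linear interpolation between a minimum and a maximum font size, applied to the logarithm of the frequencies. The size range of 10 to 48 points and the function name are hypothetical choices, not taken from the paper.</p>
      <preformat preformat-type="code">
```python
import math


def font_sizes(frequencies, min_size=10.0, max_size=48.0):
    """Map usage frequencies (patents per IPC symbol, each at least 1) to font sizes.

    Interpolating on log(frequency) compresses the wide value range of a
    power-law distribution, so a few very frequent symbols do not dwarf the rest.
    """
    logs = {s: math.log(f) for s, f in frequencies.items()}
    lo, hi = min(logs.values()), max(logs.values())
    if hi == lo:  # all symbols equally frequent: use a medium size
        return {s: (min_size + max_size) / 2 for s in logs}
    span = max_size - min_size
    return {s: min_size + (v - lo) / (hi - lo) * span for s, v in logs.items()}


sizes = font_sizes({"A01B": 1, "F02N": 10, "H02P": 100})
# F02N lies halfway between the extremes on the log scale (about 29 pt);
# a linear scale would have given it only about 13.5 pt.
print(sizes)
```
      </preformat>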
      <p>
        In addition, we offer the user the option to remove the
few remaining overlaps. We use
the push variant of the Force-Scan Algorithm (FSA) [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]
for this purpose, which preserves the general layout and, in
particular, the relative distances of the nodes. The algorithm
compares the label areas with each other and, if an overlap
is detected, the label that lies further to the upper left is
fixed and all other labels are moved in the direction in which
the overlap is resolved fastest.
      </p>
      <p>
        Keeping the relative distances of the labels roughly stable is
important, as they reflect the relatedness of the IPC
symbols. This disqualifies many other algorithms for overlap
removal that preserve the orthogonal ordering of the labels
but not their relative distances [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. A common drawback
of the push variant of FSA is the increased size of the
visualization. This is not a problem in our case, however, because
we usually expect only a few label overlaps and we added
zooming and panning to the IPC clouds.
      </p>
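      <p>The following Python sketch shows the push principle described above in simplified form; it is not the original FSA. Labels are processed from the upper left (smallest x + y) to the lower right, and each processed label stays fixed. A later label that overlaps a fixed one is pushed right or down, whichever requires the smaller displacement. The <monospace>Label</monospace> class and the function names are hypothetical.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass


@dataclass
class Label:
    text: str
    x: float  # left edge (screen coordinates, y grows downwards)
    y: float  # top edge
    w: float
    h: float


def overlaps(a, b):
    """True if the two label rectangles share a region of positive area."""
    return (a.x + a.w > b.x and b.x + b.w > a.x and
            a.y + a.h > b.y and b.y + b.h > a.y)


def remove_overlaps(labels):
    """Simplified push-only overlap removal (in the spirit of, not identical to, FSA).

    Labels are modified in place and the list is returned for convenience.
    """
    order = sorted(labels, key=lambda lab: (lab.x + lab.y, lab.x))
    placed = []  # labels that are already fixed
    for cur in order:
        moved = True
        while moved:
            moved = False
            for fixed in placed:
                if overlaps(fixed, cur):
                    push_right = fixed.x + fixed.w - cur.x
                    push_down = fixed.y + fixed.h - cur.y
                    # Resolve the overlap with the smaller displacement.
                    if push_down > push_right:
                        cur.x = fixed.x + fixed.w
                    else:
                        cur.y = fixed.y + fixed.h
                    moved = True
        placed.append(cur)
    return labels


labels = [Label("A", 0, 0, 10, 5), Label("B", 5, 2, 10, 5), Label("C", 6, 3, 10, 5)]
for lab in remove_overlaps(labels):
    print(lab.text, lab.x, lab.y)
```
      </preformat>
      <p>Labels only move right or down. A label that has been pushed past a fixed label can therefore never overlap that label again, so each fixed label causes at most one push and the procedure always terminates.</p>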
      <p>Panning and zooming are basic but important interaction
techniques that enable the user to explore different parts of
the map view in more detail. Furthermore, we added a
minimap that always shows the whole IPC cloud and indicates
which part of it is shown in the main view (Figure 1(c)).
The minimap can also be used to change the focused area
and to reset the zoom level. It therefore helps prevent
the user from getting lost in the IPC space.</p>
      <p>Since users are typically interested in specific IPC symbols,
they can filter the map view to show only a subset of IPC
symbols and those that are co-used with them. This can be done by
selecting any number of IPC symbols on the map and adding
them to a whitelist displayed on the right of the visualization
(Figure 1(b)). As it can be hard to spot specific IPC symbols
on the map, the IPC symbols can alternatively be entered in
a search field (equipped with an autocomplete feature). Once
all IPC symbols of interest have been added and the filter
is activated, IPC symbols that are not related to at least m
of the whitelisted ones are removed from the visualization
(with a variable m that is set to m = 1 by default).
If the user selects an IPC symbol in the visualization, the
titles of patents associated with that symbol are listed
beneath the main view (Figure 1(d)). If several IPC symbols
are selected, only titles of patents associated with all of the
symbols are listed (i.e., the symbols are combined with a logical
conjunction). More details on a patent, such as the
whole list of associated IPC symbols and its titles in
German and French, are shown in a tooltip when hovering over
the patent's title in the list.</p>
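      <p>The whitelist filter and the conjunctive patent listing can be sketched in Python as follows. The paper does not define when two IPC symbols count as related. The sketch assumes that a pair is related if its similarity exceeds a configurable threshold, by default 0 (i.e., the symbols are co-used at least once). It also keeps the whitelisted symbols themselves. All names are hypothetical.</p>
      <preformat preformat-type="code">
```python
def filter_by_whitelist(symbols, whitelist, sims, m=1, threshold=0.0):
    """Keep whitelisted symbols and symbols related to at least m whitelisted ones.

    `sims` maps (a, b) pairs to similarities; a pair counts as related if its
    similarity exceeds `threshold` (assumption: 0.0 means co-used at least once).
    """
    kept = []
    for s in symbols:
        if s in whitelist:
            kept.append(s)
            continue
        related = sum(1 for w in whitelist if sims.get((s, w), 0.0) > threshold)
        if related >= m:
            kept.append(s)
    return kept


def patents_with_all(selected, patents_by_symbol):
    """Patents classified with ALL selected symbols (logical conjunction)."""
    sets = [patents_by_symbol.get(s, set()) for s in selected]
    return set.intersection(*sets) if sets else set()


sims = {("H02P", "F02N"): 0.5, ("F02N", "H02P"): 0.5,
        ("H02K", "H02P"): 0.7, ("H02P", "H02K"): 0.7}
print(filter_by_whitelist(["F02N", "H02P", "H02K", "A01B"], {"F02N"}, sims))
print(patents_with_all(["F02N", "H02P"], {"F02N": {"EP1", "EP2"}, "H02P": {"EP1", "EP3"}}))
```
      </preformat>
      <p>Raising m narrows the view to symbols that are related to several whitelisted symbols at once.</p>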
    </sec>
    <sec id="sec-10">
      <title>5.2 Darts View</title>
      <p>The darts view provides another perspective on selected IPC
symbols using the metaphor of a dartboard. In contrast to
the map view, it does not provide a global overview of
the IPC space but focuses on specific IPC symbols and their
local context. IPC symbols selected in the map view or
entered in the search field are placed in the center of the darts
view (the bullseye), as they define what the user is
interested in. Related IPC symbols are concentrically arranged
around the bullseye at distances that reflect their
relatedness to the selected IPC symbols: while IPC symbols close
to the bullseye are strongly related, IPC symbols near the
border have a weaker relation. Figure 3 shows an example
where the IPC symbol “F02N” has been selected and hence
forms the bullseye.</p>
      <p>The darts view requires the definition of two key parameters:
1) a maximum number n of IPC symbols shown in the
visualization, and 2) a similarity threshold defining the minimum
similarity value a related IPC symbol must have to be shown in
the visualization. Both parameters are interrelated, and
suitable values depend on the application context, such
as the available screen space or the average font size of the
labels. We had good experiences with an n of 10 to 20, as
this number of IPC symbols can still be well perceived and
cognitively processed. A good threshold is more difficult to
choose, as the similarity values depend on the
considered patent data and IPC symbols. For our patent data,
a threshold of 0.5 to 0.7 has led to good results in most cases. For
instance, we used a threshold of 0.6 to generate the darts view
shown in Figure 3. However, for some
IPC symbols no results may be returned, as all similarity values
are below the given threshold.</p>
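      <p>A minimal Python sketch of this selection step follows. Candidates whose similarity to the selection reaches the threshold are ranked, and at most n of them are kept. The text does not specify how similarities are aggregated when several symbols are selected; the sketch assumes the mean. The default n = 15 is a hypothetical value within the reported range of 10 to 20, and 0.6 is the threshold used for Figure 3. The function name is hypothetical.</p>
      <preformat preformat-type="code">
```python
def darts_candidates(selected, sims, symbols, n=15, threshold=0.6):
    """Return at most n (symbol, similarity) pairs, most similar first.

    A symbol qualifies if its mean similarity to the selected symbols is at
    least `threshold`. The result may be empty if every candidate falls short.
    """
    scored = []
    for s in symbols:
        if s in selected:
            continue
        score = sum(sims.get((s, t), 0.0) for t in selected) / len(selected)
        if score >= threshold:
            scored.append((s, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


sims = {("H02P", "F02N"): 0.7, ("H02K", "F02N"): 0.62, ("A01B", "F02N"): 0.2}
print(darts_candidates({"F02N"}, sims, ["F02N", "H02P", "H02K", "A01B"]))
```
      </preformat>
      <p>An empty result corresponds to the case described above, in which all similarity values are below the threshold.</p>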
      <p>Another option would be to dynamically choose an
appropriate threshold based on the number of related IPC symbols that
are returned. For instance, the threshold could be adapted dynamically
so that the n most related IPC
symbols are always shown in the darts view. However, such an adaptive
approach bears the risk that the user does not notice the
variable threshold when analyzing different darts views. It
may also create a wrong impression, as the visualization
might include IPC symbols that are only very distantly
related to the selected ones if the threshold is low.</p>
      <p>After the related IPC symbols have been determined based
on the parameter n and the similarity threshold, their positions on the dartboard
are computed. Like the map view, the darts view makes use
of the 2D representation computed in Section 4: the
related IPC symbols are located in this representation,
and their angle relative to the selected IPC symbol is
determined. If multiple IPC symbols are selected, the average
of the angles is taken. The related IPC symbols are then
ordered by their angle. However, they are not drawn at
their original angle on the dartboard; instead, the angles are
normalized such that the symbols form a circle around the
selected IPC symbol(s).</p>
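      <p>The angular layout can be sketched in Python as follows. Two choices in it are assumptions. First, the angles for several selected symbols are averaged as a circular mean. A plain arithmetic mean breaks at the ±π boundary: for example, it averages an angle just below π and one just above −π to 0 instead of π. Second, “normalized to form a circle” is read as evenly spaced angles that keep the sorted order and start at the smallest original angle. The function names are hypothetical.</p>
      <preformat preformat-type="code">
```python
import math


def circular_mean(angles):
    """Mean direction of angles in radians; robust to the wrap-around at +/- pi."""
    return math.atan2(sum(math.sin(a) for a in angles),
                      sum(math.cos(a) for a in angles))


def dart_angles(related, selected, positions):
    """Order related symbols by their direction from the selection in the 2D map,
    then spread them evenly around the bullseye.

    positions: dict symbol -> (x, y) from the dimensionality reduction.
    Returns dict symbol -> angle (radians) on the dartboard.
    """
    raw = {}
    for r in related:
        rx, ry = positions[r]
        raw[r] = circular_mean(
            [math.atan2(ry - positions[s][1], rx - positions[s][0]) for s in selected]
        )
    ordered = sorted(related, key=lambda r: raw[r])
    if not ordered:
        return {}
    step = 2 * math.pi / len(ordered)
    start = raw[ordered[0]]
    return {r: start + i * step for i, r in enumerate(ordered)}


positions = {"F02N": (0.0, 0.0), "A": (1.0, 0.0), "B": (0.0, 1.0), "C": (-1.0, 0.1)}
print(dart_angles(["C", "A", "B"], ["F02N"], positions))
```
      </preformat>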
      <p>Apart from the angles, we also compute the distances of the
IPC symbols from the bullseye. We take the values
that resulted from the similarity computation (cf. Section 4)
and use a logarithmic scale to determine the final positions
of the IPC labels. We chose a logarithmic scale, as the
similarities of the IPC symbols again roughly follow a power law
distribution, i.e., the number of IPC symbols with a
high similarity value is much lower than the number of IPC
symbols with a lower similarity in nearly all cases. Finally,
the IPC symbols are placed at the determined positions on
the dartboard, while their font sizes indicate how often they
are used in the patent data, as in the map view.
Note that, by default, there is no fixed value separating the inner from
the outer circle of the dartboard. If such a separation is desired, we can simply define another threshold
for the inner circle (see Figure 3). This threshold sets
the borderline that separates IPC symbols in the inner circle
from those in the outer one. Likewise, we can add any number of
additional circles to the darts view, each with its own threshold.</p>
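      <p>The radial placement can be sketched in Python under the assumption that the radius is proportional to log(similarity) / log(threshold). This maps a similarity of 1 to an inner radius and the threshold to the outer border. The pixel radii and function names are hypothetical. The same function yields the radius of an additional circle for any extra threshold, such as the inner-circle threshold mentioned above.</p>
      <preformat preformat-type="code">
```python
import math


def radius(similarity, threshold, r_inner=20.0, r_outer=200.0):
    """Distance from the bullseye on a logarithmic scale.

    similarity == 1 maps to r_inner and similarity == threshold to r_outer.
    The threshold must lie strictly between 0 and 1, and the similarity
    between the threshold and 1 (inclusive).
    """
    if not (threshold > 0.0 and 1.0 > threshold):
        raise ValueError("threshold must lie strictly between 0 and 1")
    if similarity > 1.0 or threshold > similarity:
        raise ValueError("similarity must lie between threshold and 1")
    t = math.log(similarity) / math.log(threshold)  # 0 at sim 1, 1 at threshold
    return r_inner + t * (r_outer - r_inner)


def place(angle, r, center=(0.0, 0.0)):
    """Convert polar dartboard coordinates to screen coordinates."""
    return (center[0] + r * math.cos(angle), center[1] + r * math.sin(angle))


# Hypothetical inner-circle threshold of 0.8 on a board with threshold 0.6.
inner_ring = radius(0.8, 0.6)
print(inner_ring, place(0.0, radius(0.7, 0.6)))
```
      </preformat>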
    </sec>
    <sec id="sec-11">
      <title>5.3 Example of Use</title>
      <p>
        Let us assume we want to file a patent for a new technique to
start combustion engines. The IPC symbol “F02N” is ideally
suited to classify our invention, since it refers to the
“starting of combustion engines” [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. In the map view, we have
already spotted this IPC symbol and noticed that the IPC
symbol “H02P” is very close to it (as in Figure 1). It
classifies patents that describe the “control or regulation of electric
motors, generators, or dynamo-electric converters” [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
We can therefore assume that several technologies for
combustion engines are also used in electric motors. It seems
to be a good idea to analyze the patents related to
electrical engine starters, because there may already be a patent
that conflicts with our invention.
      </p>
      <p>After switching to the darts view, we realize that there seem
to be several other IPC symbols that are also strongly
related to the IPC symbol we are interested in. They lead us to
further technologies and patents that might be relevant
and should be considered before filing our patent.</p>
    </sec>
    <sec id="sec-12">
      <title>6. DISCUSSION OF SCALABILITY</title>
      <p>Due to the massive number of patents that are digitally
available nowadays, scalability is one of the main issues in
any patent visualization approach. A key challenge in our
approach lies in the 2D mapping of the IPC symbols.
Dimensionality reduction methods are usually not stable, i.e., the
algorithms may map data to very different locations on the
2D plane even if the data changes only slightly. Therefore,
we do not recompute the 2D mapping with every change in
the dataset but keep the mapping stable as long as it still
reflects the IPC distances sufficiently well. That is,
stability has a higher priority than precision in this particular
case, as the distances in the 2D representation only roughly
indicate the relatedness of the IPC symbols anyway.
Besides the scalability of the visualization, the scalability
of the data storage and the data model is crucial in patent
retrieval. The former is unproblematic in our approach, as new
patent records can simply be added to the Elastic Search
database. If new IPC symbols are added to the database,
only the patent records classified by these symbols need to be updated;
no other patent records are affected.</p>
      <p>The data model is robust to an increasing number of patents
in the sense that the similarities of the IPC symbols do not
need to be recomputed, since the initial mapping is usually based on a large number of
patent records. New patents will still be found if IPC symbols are selected in
the visualization, because the search for related patents queries
the database directly and does not depend on the data that was
used to build the data model. This robustness entails two
disadvantages: 1) it will be necessary to recompute the
similarity matrix at some point, which will also require a
remapping onto the 2D plane; 2) if a large number of patents
emerges in a specific field, such that the associated IPC
symbols become much more important, this approach
cannot detect this shift in the IPC space. To
represent new IPC symbols in the data model, it is necessary to
recompute the similarity matrix as well as the 2D mapping
of the IPC symbols.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr>
              <th></th>
              <th>Data storage</th>
              <th>Data model</th>
              <th>Mapping</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td># of patents</td>
              <td>+</td>
              <td>Search: +; Sim. accuracy: 0</td>
              <td>+</td>
            </tr>
            <tr>
              <td># of IPC symbols</td>
              <td>+</td>
              <td></td>
              <td></td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Table 1 summarizes how the various components of our approach scale.
It indicates how well the data
storage, data model, and mapping scale with an increasing
number of patents and IPC symbols after the initial
computation of the data model.</p>
    </sec>
    <sec id="sec-13">
      <title>7. CONCLUSION AND FUTURE WORK</title>
      <p>We presented IPC clouds, an interactive visualization for the
patent domain that is inspired by tag clouds and allows users to explore
the IPC space. In contrast to related work, IPC clouds do
not make use of the predefined IPC hierarchy but are based
on the actual co-use of IPC symbols in the patent data. They
provide an overview of the IPC space and enable the user
to ‘dive’ into it and find related IPC symbols that might be
relevant in a specific retrieval context.</p>
      <p>We presented two different types of IPC clouds: the map
view arranges the IPC symbols globally on a 2D plane, while
the darts view provides a local and focused layout for a
selected subset of IPC symbols. It uses the metaphor of a
dartboard with the selected IPC symbols in the bullseye and
related symbols concentrically arranged around it. Although
the visualizations look different, they are strongly related
and can efficiently be created from the same 2D
representation. As in tag clouds, the font sizes of the IPC symbols
are scaled according to their usage frequencies to
emphasize IPC symbols that occur very frequently in the analyzed
data. We added a simple search interface to the map view,
using a whitelist of IPC symbols for filtering. Both
visualizations are additionally equipped with several interaction
techniques that support the exploration of the IPC space
and provide more details about patents that are related
to selected IPC symbols.</p>
      <p>
        We are currently expanding our database
to contain data for all patents indexed in Espacenet, which
amounts to more than 80 million patents [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Once these patents have been
loaded into our database, we will investigate whether there are
distinguishable clusters or patterns of IPC symbols. We are
also planning to extract concepts and components from the
patent documents and visualize their relations in addition to
the IPC space. Finally, we aim to extend and combine the
map and darts views so that they are integrated into
one highly dynamic and interactive IPC cloud visualization.
      </p>
    </sec>
    <sec id="sec-14">
      <title>8. ACKNOWLEDGMENTS</title>
      <p>This work was partially supported by the EU-funded project
iPatDoc (grant no. 606163).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] Delphion Citation Link. http://www.delphion.com/products/research/products-citelink.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] Elasticsearch. http://www.elasticsearch.org.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] EPO – Espacenet. http://www.espacenet.com.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] EPO – European Publication Server. https://data.epo.org/publication-server.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] EPO Worldwide Patent Statistical Database (PATSTAT). http://www.epo.org/searching/subscription/raw/product-14-24_de.html.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] European Patent Register. https://register.epo.org.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] IPC (International Patent Classification). http://www.epo.org/searching/essentials/classification/ipc-reform.html.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] MongoDB. http://www.mongodb.org/.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] Open Patent Services (OPS). http://www.epo.org/searching/free/ops.html.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] PatAnalyse – Sample Patent Map. http://www.patanalyse.com/samplemap.html.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] Patent iNSIGHT Pro. http://www.patentinsightpro.com/.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] Thomson Innovation. http://thomsonreuters.com/thomson-innovation.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] WIPO – World Intellectual Property Organization. http://www.wipo.int.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Dwyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Marriott</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Stuckey</surname>
          </string-name>
          .
          <article-title>Fast node overlap removal</article-title>
          .
          In
          <source>Proceedings of the 13th Int. Conf. on Graph Drawing, GD'05</source>
          , pages
          <fpage>153</fpage>
          –
          <lpage>164</lpage>
          . Springer,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Giereth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Koch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rotard</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Ertl</surname>
          </string-name>
          .
          <article-title>Web based visual exploration of patent information</article-title>
          .
          In
          <source>Proceedings of the 11th Int. Conf. on Information Visualization, IV '07</source>
          , pages
          <fpage>150</fpage>
          –
          <lpage>155</lpage>
          . IEEE CS
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A. B.</given-names>
            <surname>Jaffe</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Trajtenberg</surname>
          </string-name>
          .
          <source>Patents, Citations &amp; Innovations: A Window on the Knowledge Economy</source>
          . MIT Press, revised edition,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>D. O.</given-names>
            <surname>Kutz</surname>
          </string-name>
          .
          <article-title>Examining the evolution and distribution of patent classifications</article-title>
          .
          In
          <source>Proceedings of the 8th Int. Conf. on Information Visualisation, IV '04</source>
          , pages
          <fpage>983</fpage>
          –
          <lpage>988</lpage>
          . IEEE CS
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lohmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Tetzlaff</surname>
          </string-name>
          .
          <article-title>Comparison of tag cloud layouts: Task-related performance and visual exploration</article-title>
          .
          In
          <source>Proceedings of the 12th IFIP TC 13 Int. Conf. on Human-Computer Interaction, Part I, INTERACT '09</source>
          , pages
          <fpage>392</fpage>
          –
          <lpage>404</lpage>
          . Springer,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>K.</given-names>
            <surname>Misue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Eades</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lai</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Sugiyama</surname>
          </string-name>
          .
          <article-title>Layout adjustment and the mental map</article-title>
          .
          <source>Journal of Visual Languages and Computing</source>
          ,
          <volume>6</volume>
          (
          <issue>2</issue>
          ):
          <fpage>183</fpage>
          –
          <lpage>210</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>B.</given-names>
            <surname>Shneiderman</surname>
          </string-name>
          .
          <article-title>The eyes have it: A task by data type taxonomy for information visualizations</article-title>
          .
          In
          <source>Proceedings of the 1996 IEEE Symposium on Visual Languages, VL '96</source>
          , pages
          <fpage>336</fpage>
          –
          <lpage>343</lpage>
          . IEEE CS
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>C.</given-names>
            <surname>Sternitzke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bartkowski</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Schramm</surname>
          </string-name>
          .
          <article-title>Visualizing patent statistics by means of social network analysis tools</article-title>
          .
          <source>World Patent Information</source>
          ,
          <volume>30</volume>
          (
          <issue>2</issue>
          ):
          <fpage>115</fpage>
          –
          <lpage>131</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>L.</given-names>
            <surname>Van der Maaten</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Hinton</surname>
          </string-name>
          .
          <article-title>Visualizing high-dimensional data using t-SNE</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>9</volume>
          :
          <fpage>2579</fpage>
          –
          <lpage>2605</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>F. B.</given-names>
            <surname>Viegas</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Wattenberg</surname>
          </string-name>
          .
          <article-title>Tag clouds and the case for vernacular visualization</article-title>
          . interactions,
          <volume>15</volume>
          (
          <issue>4</issue>
          ):
          <fpage>49</fpage>
          –
          <lpage>52</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>