=Paper= {{Paper |id=Vol-1292/ipamin2014_paper8 |storemode=property |title=Visual Exploration of Patent Collections with IPC Clouds |pdfUrl=https://ceur-ws.org/Vol-1292/ipamin2014_paper8.pdf |volume=Vol-1292 |dblpUrl=https://dblp.org/rec/conf/konvens/HerrHLBE14 }} ==Visual Exploration of Patent Collections with IPC Clouds== https://ceur-ws.org/Vol-1292/ipamin2014_paper8.pdf
     Visual Exploration of Patent Collections with IPC Clouds

            Dominik Herr¹,², Qi Han¹, Steffen Lohmann¹, Sören Brügmann³, Thomas Ertl¹
                  ¹ Institute for Visualization and Interactive Systems (VIS)
                  ² Graduate School of Excellence advanced Manufacturing Engineering (GSaME)
                          University of Stuttgart, Universitätsstraße 38, 70569 Stuttgart, Germany
                {dominik.herr, qi.han, steffen.lohmann, thomas.ertl}@vis.uni-stuttgart.de
                                       ³ Brügmann Software
                                      Bokeler Straße 18, 26871 Papenburg, Germany
                                              sb@bruegmann-software.eu

ABSTRACT                                                          more and more important. At the same time, it is important
The International Patent Classification (IPC) is the most         to know what the relevant patents in a certain field are. As
widely used system for the classification of patents. It is in-   more than one million patents are issued each year [13], it
dispensable in patent retrieval, as it allows filtering patents   is increasingly challenging to find the relevant ones.
by their IPC classes, groups, and subgroups. However, the
selection of appropriate IPC symbols can be challenging and       The International Patent Classification (IPC) is “one of the
there is the risk that important patents are overlooked be-       most important tools available to people who want to search
cause relevant IPC symbols are not considered in the search.      patent databases” [7]. It has been developed and maintained by the
Therefore, the identification of appropriate IPC symbols is       World Intellectual Property Organization (WIPO) for more
a crucial activity in patent retrieval that could significantly   than 40 years and used by almost all patent offices for the
benefit from better IT support. This paper introduces IPC         classification of patents. The IPC divides technology into
clouds, an interactive visualization technique that shows the     eight thematic sections with more than 70,000 subdivisions
relatedness of IPC symbols based on their co-use in the           that are hierarchically organized. The IPC symbols are usu-
patent data. In contrast to the IPC hierarchy, IPC clouds         ally assigned to the patents by the national offices that pub-
allow dynamic exploration of the IPC space while taking           lish the patent documents.
into account how the IPC symbols are actually used in the         The IPC system can be very useful in navigating the patent
patent data. They provide an alternative view on the IPC          database and retrieving relevant patents. Its hierarchical
system and assist in identifying relevant IPC symbols and         structure allows filtering patents by their IPC classes, sub-
associated patents. The general visualization technique is        classes, groups, or subgroups. Often, a set of IPC symbols
not limited to the IPC system but can also be applied to          is used to retrieve patents of interest for a deeper analysis.
similar classification systems or to keywords and concepts        This bears the risk that relevant patents are not considered
extracted from the patent documents.                              only because they are classified with other IPC symbols than
                                                                  expected. An overview on the actual use and particularly
Categories and Subject Descriptors                                the co-use of IPC symbols would therefore be most helpful
H.2.8 [Information interfaces and presentation]: User             to discover related IPC symbols that could be relevant in a
Interfaces—Graphical user interfaces (GUI)                        certain retrieval context. Inspired by the tag cloud visual-
                                                                  ization technique [23], we developed IPC clouds to visualize
                                                                  the co-use of IPC symbols in patent data and to support
Keywords                                                          the identification of relevant relationships within the IPC
Patents, retrieval, mining, IPC, CPC, classification, visual      space. IPC symbols that are identified to be related can be
analysis, tag cloud, visualization.                               from very different classes or groups of the IPC hierarchy
                                                                  but may fruitfully extend the set of IPC symbols already
1.    INTRODUCTION                                                used in patent retrieval.
A technological advantage over competitors is often the key
                                                                  In this paper, we introduce IPC clouds in detail and describe
to a superior positioning on the market in today’s industry.
                                                                  their creation from patent data. Our implementation uses a
Therefore, the protection of intellectual property becomes
                                                                  noSQL database containing bibliographic data for a large
                                                                  amount of patents. We first compute the similarities of each
                                                                  pair of IPC symbols based on their co-use in the patent doc-
                                                                  uments. We then map the similarities on a two-dimensional
Copyright © 2014 for the individual papers by the papers’ au-
thors. Copying permitted for private and academic purposes.       plane to get a global representation of the IPC space. Based
This volume is published and copyrighted by its editors. Pub-     on this mapping, we developed two different types of IPC
lished at CEUR-WS.org                                             clouds, one giving a general overview on the IPC space and
Proceedings of the 1st International Workshop on Patent Mining    another focusing on selected IPC symbols. Both visualiza-
and Its Applications (IPaMin 2014). Hildesheim, Oct. 7, 2014.     tions offer several interaction techniques to further support
At KONVENS 2014, Oct. 8–10, 2014, Hildesheim, Germany.            the exploration of the IPC space.
2.   RELATED WORK                                                 The database comprises two repositories, a large one with
Modern systems for patent retrieval and analysis increas-         bibliographic information and a smaller one containing the
ingly provide interactive visualizations to improve access to     texts from the patent documents. The bibliographic infor-
patent data. As an example, PatAnalyse [10] shows weighted        mation was taken from the PatStat database [5] of the Euro-
links between applicants and other patent data in matrix          pean Patent Office. It includes the patent ID, title, abstract,
visualizations with histograms and color scales. The patent       applicant, inventor, filing and application dates, IPC sym-
documents themselves are often represented as high dimen-         bols, as well as citations for more than 70 million patents.
sional data objects using vector space models. Examples are       We transformed the PatStat data into the JSON structure
the “landscape maps” in Patent iNSIGHT Pro [11] or the            of our Elastic Search database using MongoDB [8].
ThemeScape maps in Thomson Aureka [12].
                                                                  The patent texts comprise the descriptions and claims for
Another popular visualization technique in the patent do-         88,000 arbitrarily chosen patents. They were retrieved from
main is the node-link diagram. It is often used in patent         Espacenet [3], the European Patent Register [6], and the
citation analysis [16, 21] to show relationships between          European Publication Server [4], using RESTful web services
patents based on citation links. A commercial system in-          of the Open Patent Services [9]. All texts are indexed by
corporating such node-link diagrams is Delphion Citation          Lucene and linked to the bibliographic information via their
Link [1]. Other approaches use node-link diagrams to show         unique patent IDs. In this paper, we will focus on how the
relations between patents and priority documents [15], or to      IPC symbols are used in the patent data.
graphically depict networks of applicants or inventors [21].
Node-link diagrams can be very useful to explore the patent       4.    DATA PREPROCESSING
space and to identify important clusters in the patent data.      Before IPC clouds are generated, the patent data is pre-
                                                                  processed. The preprocessing consists of two steps: We first
The IPC space is rarely visualized in related work. Usually,      compute the pairwise similarities between the IPC symbols
it is shown in some kind of tree view that the user can navi-     and then map these similarities onto a 2D space.
gate to find IPC symbols of interest. Kutz uses a sequence of
treemaps to visualize the evolution of the IPC system over
time [17]. However, the treemaps are again structured ac-         4.1    Computation of IPC Similarities
cording to the IPC hierarchy without considering other IPC        Similarities can be computed on different levels of the IPC
relations in the patent data.                                     hierarchy, i.e. on the class, subclass, group, or subgroup level.
                                                                  We computed the similarities on the subclass level in our
IPC clouds, in contrast, do not make use of the IPC hier-         work, which is the third level of the IPC hierarchy compris-
archy but visualize the relatedness of IPC symbols based          ing 638 subclasses (in the current version IPC-2014.01). The
on their actual co-use in the patent data. Furthermore, the       IPC symbols on this level have four characters, starting with
IPC relatedness is not explicitly visualized but implicitly by    a letter for the section followed by a two-digit number for
their spatial arrangement, similar to the idea of clustered tag   the class and a letter for the subclass (e.g. “A01B”). This
clouds [18]. Also, like in tag clouds, the labels are weighted    four-character IPC symbol forms a common unit in patent
in the visualization so that their size reflects the usage fre-   retrieval and provides a good classification granularity. That
quency of the corresponding IPC symbol.                           is, the number of classes on this hierarchy level is ideal for
                                                                  the generation of IPC clouds, since they already contain a
                                                                  good amount of detailed information about the IPC class,
3.   PATENT DATA                                                  but still retain a generality that provides an overview of
We use the document-oriented NoSQL database Elastic               potentially relevant IPC classes. However, the computation
Search [2] to store the patent data. A document-oriented          and mapping could also be performed on other levels of the
database has some advantages over a relational one in text        IPC hierarchy.1
mining contexts. In particular, it is less rigid than a rela-
tional database in that it does not require a certain data        To compute the similarities between the IPC symbols, we
schema or a clear structuring for every record. Different         first build a vector space for the patent data. In our case, we
records can have different fields and semi-structured data        used the 88,000 patents from the second repository of our
is usually not a problem. New information can easily be           database (see above). We created a vector for each of the 615
added to a subset of records without the need to update           IPC symbols contained in that dataset2 , with the patents as
other records in the database or to use empty fields.             dimensions of the vector space: If the considered IPC symbol
                                                                  is used to classify a patent, the corresponding dimension has
Another useful characteristic of document-oriented databases      a positive value; otherwise it is zero. Then, we compute the
is that they typically allow retrieving documents based on        cosine similarity of each pair of IPC symbols to determine
their content. Elastic Search is based on Apache Lucene,          their relatedness in the patent data. That is, given two IPC
which is a powerful text search engine offering sophisticated     symbols x and y, we first calculate the vectors Vx and Vy
full-text indexing and searching. Both Elastic Search and         and subsequently compute their similarity with the formula
Apache Lucene are open source projects written in Java and
released under the Apache License. The patent data is ac-         sim(V_x, V_y) = \frac{V_x \cdot V_y}{|V_x| \cdot |V_y|}.    (1)
cessible via HTTP and exchanged in JSON format, i.e., it
can be retrieved over the web via a RESTful web service.          1
                                                                    In the following, we will also use the term IPC symbol when we
Moreover, we can directly access the Lucene repository to         refer to the shortened four-character version of the IPC symbol
preprocess the data and perform computationally expensive         for the sake of simplicity.
                                                                  2
tasks, such as the later described computation of similarities.     23 of the 638 available IPC symbols were not used in the dataset.
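
The similarity computation of Equation (1) can be illustrated with a minimal sketch. It assumes binary vectors (a dimension is 1 if the IPC symbol classifies the patent, 0 otherwise), whereas the text only requires a positive value. The function name, the input format, and the toy patent IDs are hypothetical and not taken from our implementation.

```python
import math
from collections import defaultdict
from itertools import combinations


def ipc_cosine_similarities(patents):
    """Return {(x, y): cosine similarity} for all co-used IPC symbol pairs.

    `patents` maps a patent ID to the set of four-character IPC symbols
    assigned to it. Each symbol is treated as a 0/1 vector over patents
    (an assumption; the text only asks for a positive value).
    """
    # Invert the mapping: IPC symbol -> set of patents classified with it.
    patents_by_symbol = defaultdict(set)
    for patent_id, symbols in patents.items():
        for symbol in symbols:
            patents_by_symbol[symbol].add(patent_id)

    similarities = {}
    for x, y in combinations(sorted(patents_by_symbol), 2):
        px, py = patents_by_symbol[x], patents_by_symbol[y]
        dot = len(px & py)  # V_x . V_y for 0/1 vectors
        if dot:  # pairs that are never co-used have similarity 0 and are skipped
            similarities[(x, y)] = dot / math.sqrt(len(px) * len(py))
    return similarities


# Toy data with hypothetical patent IDs.
toy = {
    "P1": {"F02N", "H02P"},
    "P2": {"F02N", "H02P"},
    "P3": {"F02N"},
    "P4": {"G06F"},
}
sims = ipc_cosine_similarities(toy)
```

Storing only the non-zero pairs keeps the result small, since most IPC symbol pairs are never co-used in the same patent.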
[Figure 2: bar chart. Vertical axis: thousands (ticks from 0 to 100). Horizontal axis labels (in extraction order): H04N, H01M, H04W, H01G, B60W, H04L, B60K, H02J, B60L, H01L, H04B, B01J, H04M, C07D, G06F, C07C, G01N, C12N, C08L, C08F, C01G, C01B, A61K, A61P, G02B.]
The cosine similarity is an efficient measure for sparse vec-
tors, which is useful in our case, as each IPC symbol is asso-
ciated with only a small fraction of the patents. This results
in a small number of non-zero dimensions per vector com-
pared to the total number of dimensions in the vector space,
and hence in sparse vectors.

4.2    Dimensionality Reduction of IPC Space
In the second step, we map the IPC symbols onto a 2D
plane required for the visualization. The goal of this step is
to find a 2D representation that approximates the similarity
matrix. That is, IPC symbols that are frequently co-used in
the patent data are ideally placed close to each other, while
those that never appear together are placed far apart.             Figure 2: The distribution of the IPC usage frequen-
                                                                   cies roughly follows a power law, as illustrated for
Our implementation uses t-SNE [22] as the mapping technique.       the 25 most often used IPC symbols in the 88,000
We first normalize the similarity matrix to get a probabil-        patent records that were analyzed (in thousands).
ity distribution P, where p_{ij} represents the similarity be-
tween IPC symbol i and IPC symbol j. The t-SNE algo-
rithm aims to find positions x_1, ..., x_n ∈ R^2 that minimize     of 25,000 resulted in a good overview and only few overlaps
the Kullback-Leibler divergence between two distributions          of the text labels.
P and Q:
     KL(P \| Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}    (2)      After the layout has been computed, the IPC symbols are
                                                                   placed at the determined positions on the screen, as shown in
                                                                   Figure 1 a . The font size of each IPC symbol correlates with
where q_{ij} is defined as:                                        the number of associated patents, i.e., IPC symbols with a
                                                                   large font size are used more often in the patent data than
     q_{ij} = \frac{(1 + \|x_i - x_j\|^2)^{-1}}{\sum_{k \neq l} (1 + \|x_k - x_l\|^2)^{-1}}    (3)      those with a small font size. We use a logarithmic scaling for
                                                                   the font sizes, as the frequencies of the IPC symbols roughly
representing the similarity between points x_i and x_j.            follow a power law distribution (cp. Figure 2) and we do not
                                                                   want to overemphasize certain IPC symbols. The resulting
For the maximum number of iterations, we use the default           map view shows the whole IPC space, with the IPC symbols
parameter of 1000 [22].                                            spatially arranged according to their relatedness and scaled
                                                                   in size according to their usage frequency.
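
To make the objective of the dimensionality reduction concrete, the following minimal sketch evaluates Equations (2) and (3) for a given 2D layout. It only computes the quantities that t-SNE minimizes and does not perform the optimization itself. The function names and the toy layout are hypothetical and are not part of our implementation.

```python
import math


def student_t_affinities(points):
    """Compute q_ij from Eq. (3) for a list of 2D points (x, y)."""
    weights = {}
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j:
                squared_distance = (xi - xj) ** 2 + (yi - yj) ** 2
                weights[(i, j)] = 1.0 / (1.0 + squared_distance)
    total = sum(weights.values())  # denominator: sum over all k != l
    return {pair: w / total for pair, w in weights.items()}


def kl_divergence(p, q):
    """Eq. (2): sum over i != j of p_ij * log(p_ij / q_ij).

    Terms with p_ij = 0 contribute nothing and are skipped.
    """
    return sum(pij * math.log(pij / q[pair]) for pair, pij in p.items() if pij > 0)


# Toy layout: points 0 and 1 are close, point 2 is far away.
layout = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
q = student_t_affinities(layout)

# A uniform P (every pair equally similar) does not match this layout,
# so the divergence is positive; a perfectly matching P would give 0.
uniform_p = {pair: 1.0 / len(q) for pair in q}
mismatch = kl_divergence(uniform_p, q)
```

A layout is better the smaller this divergence becomes, which is why frequently co-used IPC symbols end up close to each other.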
5.    IPC CLOUD VISUALIZATIONS                                     In addition, we offer the user the option to remove even the
The 2D mapping of the IPC space provides the basis for             few remaining overlaps, in case he or she wants to. We use
the creation of IPC clouds. In particular, we developed two        the push variant of the Force-Scan Algorithm (FSA) [19]
different types of IPC clouds that we call map view and            for this purpose, which preserves the general layout and, in
darts view and that will be detailed in the following. While       particular, the relative distances of the nodes. The algorithm
the map view provides a global overview on the IPC space,          compares the label areas with each other and, if an overlap
the darts view puts selected IPC symbols in the focus and          is detected, the label which is further to the upper left is
supports the visual identification of IPC symbols that are         fixed and all other labels are moved in the direction where
related to the selected ones. Both views follow the “visual        the overlap is resolved the fastest.
information seeking mantra” [20] by giving an overview first,
then allowing zooming and filtering, and finally showing details   Keeping the relative distances of the labels roughly stable is
on demand.                                                         important, as they reflect the relatedness of the IPC sym-
                                                                   bols. This disqualifies many other algorithms for overlap re-
5.1    Map View                                                    moval that preserve the orthogonal ordering of the labels
The map view is basically a normalized and rescaled depic-         but not their relative distances [14]. A common drawback
tion of the 2D representation we get after the dimensionality      of the push variant of FSA is the increased size of the visu-
reduction. Additionally, the font sizes reflect the frequencies    alization, which is, however, not a problem in our case, as
with which the IPC symbols are used.                               we usually expect only few label overlaps and as we added
                                                                   zooming and panning to the IPC clouds.
If we directly visualized the previously computed 2D
representation of the IPC space, we would get many overlaps        Panning and zooming are basic but important interaction
resulting from the fact that the text labels (i.e., the IPC sym-   techniques that enable the user to explore different parts of
bols) have a non-zero width and height. As dimensionality          the map view in more detail. Furthermore, we added a mini-
reduction techniques typically map the data to an arbitrary        map that always shows the whole IPC cloud and indicates
Cartesian coordinate system, we first normalize and rescale        which part of it is focused in the main view (Figure 1 c ).
the mapping. By doing so, we transform the mapping into            The minimap can also be used to change the focused area
a coordinate system appropriate for visualization, while we        and to reset the zoom level. It therefore helps to avoid that
retain the spatial distribution. In our case, a scaling factor     the user gets lost in the IPC space.
[Image: screenshot with annotated regions A, B, C, D, and E]
Figure 1: Map view of the IPC space where the user filtered four IPC symbols b . These IPC symbols and
their related ones are shown in the overview a . The minimap c indicates which part of the IPC space is
focused. The two highlighted IPC symbols have been selected by the user. The bottom part lists all patents
that are associated with the selected IPC symbols d . Further information about the patent, including all
associated IPC symbols, can be displayed on demand e .
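
The logarithmic font scaling used for the labels in the map view can be sketched as follows. This is an illustration under stated assumptions: the function name and the point sizes (8 to 36) are hypothetical, and the exact scaling in our implementation may differ.

```python
import math


def font_size(count, max_count, min_pt=8.0, max_pt=36.0):
    """Map a usage count (>= 1) to a font size on a logarithmic scale.

    `min_pt` and `max_pt` are assumed values chosen for illustration.
    """
    if max_count <= 1:
        return max_pt  # degenerate case: all symbols are used at most once
    ratio = math.log(count) / math.log(max_count)  # 0 for count=1, 1 for count=max_count
    return min_pt + ratio * (max_pt - min_pt)


# With a linear scale, a symbol used 100 times would be almost invisible
# next to one used 10,000 times; on the log scale it lands in the middle.
mid_size = font_size(100, 10000)
```

Because the usage frequencies roughly follow a power law, the logarithm spreads the label sizes more evenly than a linear mapping would.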


Since users are typically interested in specific IPC symbols,     ested in. Related IPC symbols are concentrically arranged
they can filter the map view to show only a subset of IPC         around the bullseye in distances that reflect their related-
symbols and those that are co-used. This can be done by           ness to the selected IPC symbols: While IPC symbols close
selecting any number of IPC symbols on the map and adding         to the bullseye are strongly related, IPC symbols near the
them to a whitelist displayed on the right of the visualization   border have a weaker relation. Figure 3 shows an example
(Figure 1 b ). As it can be hard to spot specific IPC symbols     where the IPC symbol “F02N” has been selected and hence
on the map, the IPC symbols can alternatively be entered in       forms the bullseye.
a search field (equipped with an autocomplete feature). Once
all IPC symbols of interest have been added and the filter        The darts view requires the definition of two key parameters:
is activated, IPC symbols that are not related to at least m      1) a maximum number n of IPC symbols shown in the visu-
of the whitelisted ones are removed from the visualization        alization, and 2) a threshold α defining the minimum simi-
(with a variable m that is set to m = 1 by default).              larity value a related IPC symbol must have to be shown in
                                                                  the visualization. Both parameters are interrelated and suit-
If the user selects an IPC symbol in the visualization, the       able values are dependent on the application context, such
titles of patents associated with that symbol are listed be-      as the available screen space or the average font size of the
neath the main view (Figure 1 d ). If several IPC symbols         labels. We had good experiences with an n of 10 to 20, as
are selected, only titles of patents associated with all of the   this number of IPC symbols can still be well perceived and
symbols are listed (i.e. they are connected by a logical con-     cognitively processed. A good α value is more difficult to
junction operator). More details on a patent, such as the         choose, as the similarity values are dependent on the con-
whole list of associated IPC symbols and its titles in Ger-       sidered patent data and IPC symbols. For our patent data,
man and French, are shown in a tooltip when hovering over         an α of 0.5 to 0.7 has led to good results in most cases. For
the patent’s title in the list.                                   instance, we used an α of 0.6 to generate the darts view
                                                                  shown in Figure 3. However, it could happen that for some
                                                                  IPC symbols no results are returned, as all similarity values
5.2    Darts View                                                 are below the given threshold α.
The darts view provides another perspective on selected IPC
symbols using the metaphor of a dartboard. In contrast to         Another option would be to dynamically choose an appro-
the map view, it does not provide a global overview on            priate α based on the number of related IPC symbols that
the IPC space but focuses on specific IPC symbols and their       are returned. For instance, α could be dynamically changed
local context. IPC symbols selected in the map view or en-        in a way that there are always the n most related IPC sym-
tered in the search field are placed in the center of the darts   bols shown in the darts view. However, such an adaptive
view (the bullseye), as they define what the user is inter-       approach bears the risk that the user does not recognize the
[Image: darts view; visible labels β and α]                      5.3    Example of Use
                                                                 Let us assume we want to file a patent for a new technique to
                                                                 start combustion engines. The IPC symbol “F02N” is ideally
                                                                 suited to classify our invention, since it refers to the “start-
                                                                 ing of combustion engines” [13]. In the map view, we have
                                                                 already spotted said IPC symbol and noticed that the IPC
                                                                 symbol “H02P” is very close to it (as in Figure 1). It classi-
                                                                 fies patents that describe a “control or regulation of electric
                                                                 motors, generators, or dynamo-electric converters” [13].

                                                                 We can therefore assume that several technologies for com-
                                                                 bustion engines are also used in electric motors. It seems
                                                                 to be a good idea to analyze the patents related to electri-
                                                                 cal engine starters, because there may already be a patent
                                                                 which is in conflict with our invention.

After switching to the darts view, we realize that there seem to be several other IPC symbols that are also strongly related to the IPC symbol we are interested in, leading us to further technologies and patents that might be of relevance and should be considered before filing our patent.

Figure 3: Darts view showing one selected IPC symbol in the bullseye and related IPC symbols concentrically arranged around it indicating their relatedness.

variable threshold when analyzing different darts views. It may also lead to a wrong impression, as the visualization might include IPC symbols that are only very distantly related to the selected ones in case of a low α.

After the related IPC symbols have been determined based on the parameters n and α, their positions on the dartboard are computed. Like the map view, the darts view makes use of the 2D representation we computed in Section 4, in that the related IPC symbols are located in the representation and their relative angle to the selected IPC symbol is determined. If multiple IPC symbols are selected, the average of the angles is taken. The related IPC symbols are then ordered by their angle. However, they are not drawn at their original angles on the dartboard; instead, the angles are normalized so that the symbols form a circle around the selected IPC symbol(s).

Apart from the angles, we also compute the distances of the IPC symbols in relation to the bullseye. We take the values that resulted from the similarity computation (cf. Section 4) and use a logarithmic scale to determine the final positions of the IPC labels. We chose a logarithmic scale, as the similarities of the IPC symbols again roughly follow a power law distribution, i.e. the number of IPC symbols with a high similarity value is much lower than the number of IPC symbols with a lower similarity in nearly all cases. Finally, the IPC symbols are placed at the determined positions on the dartboard, while their font sizes indicate how often they are used in the patent data, like in the map view.

Note that there is no fixed value separating the inner from the outer circle of the dartboard by default. If we want to have such a value, we can simply define another threshold β for the inner circle (see Figure 3). This threshold β sets the borderline that separates IPC symbols in the inner circle from those in the outer circle. Likewise, we can add any number of additional circles to the darts view, each with its own threshold.

6.    DISCUSSION OF SCALABILITY
Due to the massive number of patents that are digitally available nowadays, scalability is one of the main issues in any patent visualization approach. A key challenge in our approach lies in the 2D mapping of the IPC symbols. Dimensionality reduction methods are usually not stable, i.e. the algorithms may map data to very different locations on the 2D plane even if the data changes only slightly. Therefore, we do not recompute the 2D mapping with every change in the dataset but keep the mapping stable as long as it still reflects the IPC distances sufficiently well. That is, stability has a higher priority than precision in this particular case, as the distances in the 2D representation only roughly indicate the relatedness of the IPC symbols anyway.

Besides the scalability of the visualization, the scalability of the data storage and of the data model is crucial in patent retrieval. The former is unproblematic in our approach, as new patent records can simply be added to the Elastic Search database. If new IPC symbols are added to the database, only those patent records need to be updated that are classified by these symbols, without the need to update any other patent records.

The data model is robust to a growing number of patents in the sense that the similarities of the IPC symbols do not need to be recomputed, because a large number of patent records is usually processed in the initial mapping. New patents will still be found if IPC symbols are selected in the visualization because the search for related patents uses the database without actually considering the data that has been used by the data model. This robustness entails two disadvantages: 1) it will be necessary to recompute the similarity matrix at some point, which will also require a remapping onto the 2D plane; 2) if a large number of patents emerges in a specific field, making the associated IPC symbols considerably more important, this approach cannot detect this shift in the IPC space. To represent new IPC symbols in the data model, it is necessary to recompute the similarity matrix as well as the 2D mapping of the IPC symbols.
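
The scalability discussion keeps the 2D mapping fixed as long as it still reflects the IPC distances, but it does not state how this is checked. The following is a minimal sketch of one possible check, not the method used in this paper: it assumes that target distances between IPC symbols have been derived from the similarity matrix and compares them with the distances in the stored layout using a scale-invariant stress value. The function names, the dictionary layout, and the threshold of 0.3 are all hypothetical.

```python
import math
from itertools import combinations


def normalized_stress(target, coords):
    """Stress between target distances and distances in the stored 2D layout.

    target: dict mapping frozenset({a, b}) -> desired distance (assumed to be
            derived from the IPC similarity matrix; hypothetical format)
    coords: dict mapping IPC symbol -> (x, y) position in the stored mapping
    """
    pairs = [(a, b) for a, b in combinations(sorted(coords), 2)
             if frozenset((a, b)) in target]
    layout = [math.dist(coords[a], coords[b]) for a, b in pairs]
    wanted = [target[frozenset((a, b))] for a, b in pairs]

    # Rescale the layout distances by the least-squares factor so that only
    # relative distances matter, not the absolute size of the layout.
    denom = sum(d * d for d in layout)
    scale = sum(d * w for d, w in zip(layout, wanted)) / denom if denom else 0.0

    residual = sum((scale * d - w) ** 2 for d, w in zip(layout, wanted))
    total = sum(w * w for w in wanted)
    return math.sqrt(residual / total) if total else 0.0


def needs_remapping(target, coords, max_stress=0.3):
    """Return True if the stored layout no longer fits the target distances.

    max_stress is an illustrative threshold, not a value from the paper.
    """
    return normalized_stress(target, coords) > max_stress
```

With such a check, the layout would only be recomputed when the stress exceeds the chosen threshold, which mirrors the stated preference for stability over precision. A rank-based measure could be substituted if only the ordering of the distances is considered meaningful.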
                 # of patents        # of IPC symbols
Data storage     +                   +
Data model       Search: +           -
                 Sim. accuracy: 0
Mapping          +                   -

Table 1: Scalability of the data storage, data model, and mapping in relation to the number of patents and the number of IPC symbols.

Table 1 summarizes the discussed scalabilities of the various components of our approach. It indicates how well the data storage, data model, and mapping scale with an increasing number of patents and IPC symbols after the initial computation of the data model.

7.    CONCLUSION AND FUTURE WORK
We presented IPC clouds, an interactive visualization for the patent domain inspired by tag clouds that allows users to explore the IPC space. In contrast to related work, IPC clouds do not make use of the predefined IPC hierarchy but are based on the actual co-use of IPC symbols in the patent data. They provide an overview of the IPC space and enable the user to ‘dive’ into it and find related IPC symbols that might be relevant in a specific retrieval context.

We presented two different types of IPC clouds: The map view arranges the IPC symbols globally on a 2D plane, while the darts view provides a local and focused layout for a selected subset of IPC symbols. It uses the metaphor of a dartboard with the selected IPC symbols in the bullseye and related symbols concentrically arranged around it. Although the visualizations look different, they are strongly related and can efficiently be created from the same 2D representation. As in tag clouds, the font sizes of the IPC symbols are scaled according to their usage frequencies to emphasize IPC symbols that occur very frequently in the analyzed data. We added a simple search interface to the map view, using a whitelist of IPC symbols for filtering. Both visualizations are additionally equipped with several interaction techniques that support the exploration of the IPC space and provide more details about patents that are related to selected IPC symbols.

We are currently in the process of expanding our database to contain data for all patents indexed in Espacenet, which is more than 80 million [3]. Once these patents have been loaded into our database, we will investigate if there are distinguishable clusters or patterns of IPC symbols. We are also planning to extract concepts and components from the patent documents and visualize their relations in addition to the IPC space. Finally, we aim to extend and combine the map and darts view so that they are integrated into one highly dynamic and interactive IPC cloud visualization.

8.    ACKNOWLEDGMENTS
This work was partially supported by the EU-funded project iPatDoc (grant no. 606163).

9.    REFERENCES
 [1] Delphion Citation Link. http://www.delphion.com/products/research/products-citelink.
 [2] Elastic Search. http://www.elasticsearch.org.
 [3] EPO – Espacenet. http://www.espacenet.com.
 [4] EPO – European Publication Server. https://data.epo.org/publication-server.
 [5] EPO Worldwide Patent Statistical Database (PATSTAT). http://www.epo.org/searching/subscription/raw/product-14-24_de.html.
 [6] European Patent Register. https://register.epo.org.
 [7] IPC (International Patent Classification). http://www.epo.org/searching/essentials/classification/ipc-reform.html.
 [8] MongoDB. http://www.mongodb.org/.
 [9] Open Patent Services (OPS). http://www.epo.org/searching/free/ops.html.
[10] PatAnalyse – Sample Patent Map. http://www.patanalyse.com/samplemap.html.
[11] Patent iNSIGHT Pro. http://www.patentinsightpro.com/.
[12] Thomson Innovation. http://thomsonreuters.com/thomson-innovation.
[13] WIPO – World Intellectual Property Organization. http://www.wipo.int.
[14] T. Dwyer, K. Marriott, and P. J. Stuckey. Fast node overlap removal. In Proceedings of the 13th Int. Conf. on Graph Drawing, GD ’05, pages 153–164. Springer, 2006.
[15] M. Giereth, S. Koch, M. Rotard, and T. Ertl. Web based visual exploration of patent information. In Proceedings of the 11th Int. Conf. on Information Visualization, IV ’07, pages 150–155. IEEE CS, 2007.
[16] A. B. Jaffe and M. Trajtenberg. Patents, Citations & Innovations: A Window on the Knowledge Economy. MIT Press, revised edition, 2005.
[17] D. O. Kutz. Examining the evolution and distribution of patent classifications. In Proceedings of the 8th Int. Conf. on Information Visualisation, IV ’04, pages 983–988. IEEE CS, 2004.
[18] S. Lohmann, J. Ziegler, and L. Tetzlaff. Comparison of tag cloud layouts: Task-related performance and visual exploration. In Proceedings of the 12th IFIP TC 13 Int. Conf. on Human-Computer Interaction, Part I, INTERACT ’09, pages 392–404. Springer, 2009.
[19] K. Misue, P. Eades, W. Lai, and K. Sugiyama. Layout adjustment and the mental map. Journal of Visual Languages and Computing, 6(2):183–210, 1995.
[20] B. Shneiderman. The eyes have it: A task by data type taxonomy for information visualizations. In Proceedings of the 1996 IEEE Symposium on Visual Languages, VL ’96, pages 336–343. IEEE CS, 1996.
[21] C. Sternitzke, A. Bartkowski, and R. Schramm. Visualizing patent statistics by means of social network analysis tools. World Patent Information, 30(2):115–131, 2008.
[22] L. Van der Maaten and G. Hinton. Visualizing high-dimensional data using t-SNE. Journal of Machine Learning Research, 9:2579–2605, 2008.
[23] F. B. Viégas and M. Wattenberg. Tag clouds and the case for vernacular visualization. interactions, 15(4):49–52, 2008.