=Paper=
{{Paper
|id=Vol-1292/ipamin2014_paper8
|storemode=property
|title=Visual Exploration of Patent Collections with IPC Clouds
|pdfUrl=https://ceur-ws.org/Vol-1292/ipamin2014_paper8.pdf
|volume=Vol-1292
|dblpUrl=https://dblp.org/rec/conf/konvens/HerrHLBE14
}}
==Visual Exploration of Patent Collections with IPC Clouds==
Dominik Herr (1,2), Qi Han (1), Steffen Lohmann (1), Sören Brügmann (3), Thomas Ertl (1)

(1) Institute for Visualization and Interactive Systems (VIS), (2) Graduate School of Excellence advanced Manufacturing Engineering (GSaME), University of Stuttgart, Universitätsstraße 38, 70569 Stuttgart, Germany, {dominik.herr, qi.han, steffen.lohmann, thomas.ertl}@vis.uni-stuttgart.de

(3) Brügmann Software, Bokeler Straße 18, 26871 Papenburg, Germany, sb@bruegmann-software.eu

Copyright © 2014 for the individual papers by the papers’ authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors. Published at CEUR-WS.org. Proceedings of the 1st International Workshop on Patent Mining and Its Applications (IPaMin 2014). Hildesheim, Oct. 7, 2014. At KONVENS 2014, Oct. 8–10, 2014, Hildesheim, Germany.

===ABSTRACT===
The International Patent Classification (IPC) is the most widely used system for the classification of patents. It is indispensable in patent retrieval, as it allows filtering patents by their IPC classes, groups, and subgroups. However, selecting appropriate IPC symbols can be challenging, and there is a risk that important patents are overlooked because relevant IPC symbols are not considered in the search. Therefore, the identification of appropriate IPC symbols is a crucial activity in patent retrieval that could significantly benefit from better IT support. This paper introduces IPC clouds, an interactive visualization technique that shows the relatedness of IPC symbols based on their co-use in the patent data. In contrast to the IPC hierarchy, IPC clouds allow the user to dynamically explore the IPC space while taking into account how the IPC symbols are actually used in the patent data. They provide an alternative view on the IPC system and assist in identifying relevant IPC symbols and associated patents. The general visualization technique is not limited to the IPC system but can also be applied to similar classification systems or to keywords and concepts extracted from the patent documents.

===Categories and Subject Descriptors===
H.5.2 [Information interfaces and presentation]: User Interfaces - Graphical user interfaces (GUI)

===Keywords===
Patents, retrieval, mining, IPC, CPC, classification, visual analysis, tag cloud, visualization.

===1. INTRODUCTION===
A technological advantage over competitors is often the key to a superior positioning on the market in today’s industry. Therefore, the protection of intellectual property becomes more and more important. At the same time, it is important to know what the relevant patents in a certain field are. As more than one million patents are issued each year [13], it is increasingly challenging to find the relevant ones.

The International Patent Classification (IPC) is “one of the most important tools available to people who want to search patent databases” [7]. It has been developed and maintained by the World Intellectual Property Organization (WIPO) for more than 40 years and is used by almost all patent offices for the classification of patents. The IPC divides technology into eight thematic sections with more than 70,000 subdivisions that are hierarchically organized. The IPC symbols are usually assigned to the patents by the national offices that publish the patent documents.

The IPC system can be very useful in navigating the patent database and retrieving relevant patents. Its hierarchical structure allows filtering patents by their IPC classes, subclasses, groups, or subgroups. Often, a set of IPC symbols is used to retrieve patents of interest for a deeper analysis. This bears the risk that relevant patents are not considered only because they are classified with other IPC symbols than expected. An overview of the actual use and particularly the co-use of IPC symbols would therefore be most helpful to discover related IPC symbols that could be relevant in a certain retrieval context.

Inspired by the tag cloud visualization technique [23], we developed IPC clouds to visualize the co-use of IPC symbols in patent data and to support the identification of relevant relationships within the IPC space. IPC symbols that are identified to be related can be from very different classes or groups of the IPC hierarchy but may fruitfully extend the set of IPC symbols already used in patent retrieval.

In this paper, we introduce IPC clouds in detail and describe their creation from patent data. Our implementation uses a NoSQL database containing bibliographic data for a large number of patents. We first compute the similarities of each pair of IPC symbols based on their co-use in the patent documents. We then map the similarities onto a two-dimensional plane to get a global representation of the IPC space. Based on this mapping, we developed two different types of IPC clouds, one giving a general overview of the IPC space and another focusing on selected IPC symbols. Both visualizations offer several interaction techniques to further support the exploration of the IPC space.

===2. RELATED WORK===
Modern systems for patent retrieval and analysis increasingly provide interactive visualizations to improve access to patent data. As an example, PatAnalyse [10] shows weighted links between applicants and other patent data in matrix visualizations with histograms and color scales. The patent documents themselves are often represented as high-dimensional data objects using vector space models. Examples are the “landscape maps” in Patent iNSIGHT Pro [11] or the ThemeScape maps in Thomson Aureka [12].

Another popular visualization technique in the patent domain is the node-link diagram. Node-link diagrams are often used in patent citation analysis [16, 21] to show relationships between patents based on citation links. A commercial system incorporating such node-link diagrams is Delphion Citation Link [1]. Other approaches use node-link diagrams to show relations between patents and priority documents [15], or to graphically depict networks of applicants or inventors [21]. Node-link diagrams can be very useful to explore the patent space and to identify important clusters in the patent data.

The IPC space is rarely visualized in related work. Usually, it is shown in some kind of tree view that the user can navigate to find IPC symbols of interest. Kutz uses a sequence of treemaps to visualize the evolution of the IPC system over time [17]. However, the treemaps are again structured according to the IPC hierarchy without considering other IPC relations in the patent data.

IPC clouds, in contrast, do not make use of the IPC hierarchy but visualize the relatedness of IPC symbols based on their actual co-use in the patent data. Furthermore, the IPC relatedness is not visualized explicitly but implicitly by the spatial arrangement of the symbols, similar to the idea of clustered tag clouds [18]. Also, like in tag clouds, the labels are weighted in the visualization so that their size reflects the usage frequency of the corresponding IPC symbol.

===3. PATENT DATA===
We use the document-oriented NoSQL database Elastic Search [2] to store the patent data. A document-oriented database has some advantages over a relational one in text mining contexts. In particular, it is less rigid than a relational database in that it does not require a certain data schema or a clear structuring for every record. Different records can have different fields, and semi-structured data is usually not a problem. New information can easily be added to a subset of records without the need to update other records in the database or to use empty fields.

Another useful characteristic of document-oriented databases is that they typically allow retrieving documents based on their content. Elastic Search is based on Apache Lucene, which is a powerful text search engine offering sophisticated full-text indexing and searching. Both Elastic Search and Apache Lucene are open source projects written in Java and released under the Apache License. The patent data is accessible via HTTP and exchanged in JSON format, i.e., it can be retrieved over the web via a RESTful web service. Moreover, we can directly access the Lucene repository to preprocess the data and perform computationally expensive tasks, such as the computation of similarities described in Section 4.

The database comprises two repositories, a large one with bibliographic information and a smaller one containing the texts from the patent documents. The bibliographic information was taken from the PatStat database [5] of the European Patent Office. It includes the patent ID, title, abstract, applicant, inventor, filing and application dates, IPC symbols, as well as citations for more than 70 million patents. We transformed the PatStat data into the JSON structure of our Elastic Search database using MongoDB [8].

The patent texts comprise the descriptions and claims for 88,000 arbitrarily chosen patents. They were retrieved from Espacenet [3], the European Patent Register [6], and the European Publication Server [4], using RESTful web services of the Open Patent Services [9]. All texts are indexed by Lucene and linked to the bibliographic information via their unique patent IDs. In this paper, we will focus on how the IPC symbols are used in the patent data.

===4. DATA PREPROCESSING===
Before IPC clouds are generated, the patent data is preprocessed. The preprocessing consists of two steps: We first compute the pairwise similarities between the IPC symbols and then map these similarities onto a 2D space.

====4.1 Computation of IPC Similarities====
Similarities can be computed on different levels of the IPC hierarchy, i.e. on the class, subclass, group, or subgroup level. We computed the similarities on the subclass level in our work, which is the third level of the IPC hierarchy comprising 638 subclasses (in the current version IPC-2014.01). The IPC symbols on this level have four characters, starting with a letter for the section followed by a two-digit number for the class and a letter for the subclass (e.g. “A01B”). This four-character IPC symbol forms a common unit in patent retrieval and provides a good classification granularity. That is, the number of subclasses on this hierarchy level is well suited for the generation of IPC clouds, since they already contain a good amount of detailed information about the IPC class, but still retain a generality that provides an overview of potentially relevant IPC classes. However, the computation and mapping could also be performed on other levels of the IPC hierarchy. (Footnote 1: In the following, we will also use the term IPC symbol when we refer to the shortened four-character version of the IPC symbol for the sake of simplicity.)

To compute the similarities between the IPC symbols, we first build a vector space for the patent data. In our case, we used the 88,000 patents from the second repository of our database (see Section 3). We created a vector for each of the 615 IPC symbols contained in that dataset, with the patents as dimensions of the vector space: If the considered IPC symbol is used to classify a patent, the corresponding dimension has a positive value; otherwise it is zero. (Footnote 2: 23 of the 638 available IPC symbols were not used in the dataset.) Then, we compute the cosine similarity of each pair of IPC symbols to determine their relatedness in the patent data. That is, given two IPC symbols x and y, we first calculate the vectors V_x and V_y and subsequently compute their similarity with the formula

<math>\mathrm{sim}(V_x, V_y) = \frac{V_x \cdot V_y}{|V_x| \cdot |V_y|} \qquad (1)</math>

The cosine similarity is an efficient measure for sparse vectors, which is useful in our case, as each IPC symbol is associated with only a small fraction of the patents. This results in a small number of non-zero dimensions per vector compared to the total number of dimensions in the vector space, and hence in sparse vectors.

====4.2 Dimensionality Reduction of IPC Space====
In the second step, we map the IPC symbols onto the 2D plane required for the visualization. The goal of this step is to find a 2D representation that approximates the similarity matrix. That is, IPC symbols that are frequently co-used in the patent data are ideally placed close to each other, while those that never appear together are placed far apart.

Our implementation uses t-SNE [22] as mapping technique. We first normalize the similarity matrix to get a probability distribution P, where p_ij represents the similarity between IPC symbol i and IPC symbol j. The t-SNE algorithm aims to find positions x_1, ..., x_n ∈ R^2 which minimize the Kullback-Leibler divergence between two distributions P and Q:

<math>KL(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}} \qquad (2)</math>

where q_ij is defined as

<math>q_{ij} = \frac{(1 + \|x_i - x_j\|^2)^{-1}}{\sum_{k \neq l} (1 + \|x_k - x_l\|^2)^{-1}} \qquad (3)</math>

representing the similarity between the points x_i and x_j.

For the maximum number of iterations, we use the default parameter of 1000 [22].

===5. IPC CLOUD VISUALIZATIONS===
The 2D mapping of the IPC space provides the basis for the creation of IPC clouds. In particular, we developed two different types of IPC clouds that we call map view and darts view and that are detailed in the following. While the map view provides a global overview of the IPC space, the darts view puts selected IPC symbols in the focus and supports the visual identification of IPC symbols that are related to the selected ones. Both views follow the “visual information seeking mantra” [20] by giving an overview first, then allowing to zoom and filter, and finally showing details on demand.

====5.1 Map View====
The map view is basically a normalized and rescaled depiction of the 2D representation we get after the dimensionality reduction. Additionally, the font sizes reflect the frequencies with which the IPC symbols are used.

If we directly visualized the previously computed 2D representation of the IPC space, we would get many overlaps resulting from the fact that the text labels (i.e., the IPC symbols) have a non-zero width and height. As dimensionality reduction techniques typically map the data to an arbitrary Cartesian coordinate system, we first normalize and rescale the mapping. By doing so, we transform the mapping into a coordinate system appropriate for visualization, while we retain the spatial distribution. In our case, a scaling factor of 25,000 resulted in a good overview and only few overlaps of the text labels.

After the layout has been computed, the IPC symbols are placed at the determined positions on the screen, as shown in Figure 1 (a). The font size of each IPC symbol correlates with the number of associated patents, i.e., IPC symbols with a large font size are used more often in the patent data than those with a small font size. We use a logarithmic scaling for the font sizes, as the frequencies of the IPC symbols roughly follow a power law distribution (cp. Figure 2) and we do not want to overemphasize certain IPC symbols. The resulting map view shows the whole IPC space, with the IPC symbols spatially arranged according to their relatedness and scaled in size according to their usage frequency.

Figure 1: Map view of the IPC space where the user filtered four IPC symbols (b). These IPC symbols and their related ones are shown in the overview (a). The minimap (c) indicates which part of the IPC space is focused. The two highlighted IPC symbols have been selected by the user. The bottom part lists all patents that are associated with the selected IPC symbols (d). Further information about the patent, including all associated IPC symbols, can be displayed on demand (e).

Figure 2: The distribution of the IPC usage frequencies roughly follows a power law, as illustrated for the 25 most often used IPC symbols in the 88,000 patent records that were analyzed (in thousands). [Bar chart; vertical axis from 0 to 100 thousand patents; symbols shown, in extraction order: H04N, H01M, H04W, H01G, B60W, H04L, B60K, H02J, B60L, H01L, H04B, B01J, H04M, C07D, G06F, C07C, G01N, C12N, C08L, C08F, C01G, C01B, A61K, A61P, G02B.]

In addition, we offer the user the option to remove even the few remaining overlaps. We use the push variant of the Force-Scan Algorithm (FSA) [19] for this purpose, which preserves the general layout and, in particular, the relative distances of the nodes. The algorithm compares the label areas with each other and, if an overlap is detected, the label which is further to the upper left is fixed and all other labels are moved in the direction where the overlap is resolved the fastest.

Keeping the relative distances of the labels roughly stable is important, as they reflect the relatedness of the IPC symbols. This disqualifies many other algorithms for overlap removal that preserve the orthogonal ordering of the labels but not their relative distances [14]. A common drawback of the push variant of FSA is the increased size of the visualization, which is, however, not a problem in our case, as we usually expect only few label overlaps and as we added zooming and panning to the IPC clouds.

Panning and zooming are basic but important interaction techniques that enable the user to explore different parts of the map view in more detail. Furthermore, we added a minimap that always shows the whole IPC cloud and indicates which part of it is focused in the main view (Figure 1 (c)). The minimap can also be used to change the focused area and to reset the zoom level. It therefore helps to prevent the user from getting lost in the IPC space.

Since users are typically interested in specific IPC symbols, they can filter the map view to show only a subset of IPC symbols and those that are co-used. This can be done by selecting any number of IPC symbols on the map and adding them to a whitelist displayed on the right of the visualization (Figure 1 (b)). As it can be hard to spot specific IPC symbols on the map, the IPC symbols can alternatively be entered in a search field (equipped with an autocomplete feature). Once all IPC symbols of interest have been added and the filter is activated, IPC symbols that are not related to at least m of the whitelisted ones are removed from the visualization (with a variable m that is set to m = 1 by default).

If the user selects an IPC symbol in the visualization, the titles of patents associated with that symbol are listed beneath the main view (Figure 1 (d)). If several IPC symbols are selected, only titles of patents associated with all of the symbols are listed (i.e. they are connected by a logical conjunction operator). More details on a patent, such as the whole list of associated IPC symbols and its titles in German and French, are shown in a tooltip when hovering over the patent’s title in the list.

====5.2 Darts View====
The darts view provides another perspective on selected IPC symbols using the metaphor of a dartboard. In contrast to the map view, it does not provide a global overview of the IPC space but focuses on specific IPC symbols and their local context. IPC symbols selected in the map view or entered in the search field are placed in the center of the darts view (the bullseye), as they define what the user is interested in. Related IPC symbols are concentrically arranged around the bullseye in distances that reflect their relatedness to the selected IPC symbols: While IPC symbols close to the bullseye are strongly related, IPC symbols near the border have a weaker relation. Figure 3 shows an example where the IPC symbol “F02N” has been selected and hence forms the bullseye.

Figure 3: Darts view showing one selected IPC symbol in the bullseye and related IPC symbols concentrically arranged around it indicating their relatedness. [The figure marks the thresholds α and β.]

The darts view requires the definition of two key parameters: 1) a maximum number n of IPC symbols shown in the visualization, and 2) a threshold α defining the minimum similarity value a related IPC symbol must have to be shown in the visualization. Both parameters are interrelated and suitable values are dependent on the application context, such as the available screen space or the average font size of the labels. We had good experiences with an n of 10 to 20, as this number of IPC symbols can still be well perceived and cognitively processed. A good α value is more difficult to choose, as the similarity values are dependent on the considered patent data and IPC symbols. For our patent data, an α of 0.5 to 0.7 has led to good results in most cases. For instance, we used an α of 0.6 to generate the darts view shown in Figure 3. However, it could happen that for some IPC symbols no results are returned, as all similarity values are below the given threshold α.

Another option would be to dynamically choose an appropriate α based on the number of related IPC symbols that are returned. For instance, α could be dynamically changed in a way that there are always the n most related IPC symbols shown in the darts view. However, such an adaptive approach bears the risk that the user does not recognize the variable threshold when analyzing different darts views. It may also lead to a wrong impression, as the visualization might include IPC symbols that are only very distantly related to the selected ones in case of a low α.

After the related IPC symbols have been determined based on the parameters n and α, their positions on the dartboard are computed. Like the map view, the darts view makes use of the 2D representation we computed in Section 4, in that the related IPC symbols are located in the representation and their relative angle to the selected IPC symbol is determined. If multiple IPC symbols are selected, the average of the angles is taken. The related IPC symbols are then ordered by their angle. However, they are not drawn with their original angle on the dartboard; instead, the angles are normalized so that the symbols form a circle around the selected IPC symbol(s).

Apart from the angles, we also compute the distances of the IPC symbols in relation to the bullseye. We take the values that resulted from the similarity computation (cf. Section 4) and use a logarithmic scale to determine the final positions of the IPC labels. We decided on a logarithmic scale, as the similarities of the IPC symbols again roughly follow a power law distribution, i.e. the number of IPC symbols with a high similarity value is much lower than the number of IPC symbols with a lower similarity in nearly all cases. Finally, the IPC symbols are placed at the determined positions on the dartboard, while their font sizes indicate how often they are used in the patent data, like in the map view.

Note that there is no fixed value separating the inner from the outer circle of the dartboard by default. If we want to have such a value, we can simply define another threshold β for the inner circle (see Figure 3). This threshold β sets the borderline that separates IPC symbols in the inner circle from those in the outer circle. Likewise, we can add any number of additional circles to the darts view, each with its own threshold.

====5.3 Example of Use====
Let us assume we want to file a patent for a new technique to start combustion engines. The IPC symbol “F02N” is ideally suited to classify our invention, since it refers to the “starting of combustion engines” [13]. In the map view, we have already spotted said IPC symbol and noticed that the IPC symbol “H02P” is very close to it (as in Figure 1). It classifies patents that describe a “control or regulation of electric motors, generators, or dynamo-electric converters” [13].

We can therefore assume that several technologies for combustion engines are also used in electric motors. It seems to be a good idea to analyze the patents related to electrical engine starters, because there may already be a patent which is in conflict with our invention.

After switching to the darts view, we realize that there seem to be several other IPC symbols that are also strongly related to the IPC symbol we are interested in, leading us to further technologies and patents that might be of relevance and should be considered before filing our patent.

===6. DISCUSSION OF SCALABILITY===
Due to the massive number of patents that are digitally available nowadays, scalability is one of the main issues in any patent visualization approach. A key challenge in our approach lies with the 2D mapping of the IPC symbols. Dimensionality reduction methods are usually not stable, i.e. the algorithms may map data to very different locations on the 2D plane even if the data changes only slightly. Therefore, we do not recompute the 2D mapping with every change in the dataset but keep the mapping stable as long as it still reflects the IPC distances in a sufficient way. That is, stability has a higher priority than precision in this particular case, as the distances in the 2D representation only roughly indicate the relatedness of the IPC symbols anyway.

Besides the scalability of the visualization, the scalabilities of the data storage and data model are crucial in patent retrieval. The former is unproblematic in our approach, as new patent records can simply be added to the Elastic Search database. If new IPC symbols are added to the database, only those patent records need to be updated that are classified by these symbols, without the need to update any other patent records.

The data model is robust to an increasing number of patents in the sense that the similarities of the IPC symbols do not need to be recomputed, due to the usually large number of patent records that are processed in the initial mapping. New patents will still be found if IPC symbols are selected in the visualization because the search for related patents uses the database without actually considering the data that has been used by the data model. This robustness entails two disadvantages: 1) it will be necessary to recompute the similarity matrix at some point, which will also require a remapping onto the 2D plane; 2) if a large number of patents emerges in a specific field, such that the associated IPC symbols become a lot more important, this approach would not be able to detect this shift in the IPC space. To represent new IPC symbols in the data model, it is necessary to recompute the similarity matrix as well as the 2D mapping of the IPC symbols.

Table 1 summarizes the discussed scalabilities of the various components of our approach. It indicates how well the data storage, data model, and mapping scale with an increasing number of patents and IPC symbols after the initial computation of the data model.

{| class="wikitable"
|+ Table 1: Scalability of the data storage, data model, and mapping in relation to the number of patents and the number of IPC symbols.
! Component !! # of patents !! # of IPC symbols
|-
| Data storage || + || +
|-
| Data model || Search: +; Sim. accuracy: 0 || -
|-
| Mapping || + || -
|}

===7. CONCLUSION AND FUTURE WORK===
We presented IPC clouds, an interactive visualization for the patent domain inspired by tag clouds that allows exploring the IPC space. In contrast to related work, IPC clouds do not make use of the predefined IPC hierarchy but are based on the actual co-use of IPC symbols in the patent data. They provide an overview of the IPC space and enable the user to ‘dive’ into it and find related IPC symbols that might be relevant in a specific retrieval context.

We presented two different types of IPC clouds: The map view arranges the IPC symbols globally on a 2D plane, while the darts view provides a local and focused layout for a selected subset of IPC symbols. It uses the metaphor of a dartboard with the selected IPC symbols in the bullseye and related symbols concentrically arranged around it. Although the visualizations look different, they are strongly related and can efficiently be created from the same 2D representation. Like in tag clouds, the font sizes of the IPC symbols are scaled according to their usage frequencies to emphasize IPC symbols that occur very frequently in the analyzed data. We added a simple search interface to the map view, using a whitelist of IPC symbols for filtering. Both visualizations are additionally equipped with several interaction techniques that support the exploration of the IPC space and give access to more details about patents that are related to selected IPC symbols.

We are currently in the process of expanding our database to contain data for all patents indexed in Espacenet, which is more than 80 million [3]. Once these patents have been loaded into our database, we will investigate if there are distinguishable clusters or patterns of IPC symbols. We are also planning to extract concepts and components from the patent documents and visualize their relations in addition to the IPC space. Finally, we aim to extend and combine the map and darts view so that they are integrated into one highly dynamic and interactive IPC cloud visualization.

===8. ACKNOWLEDGMENTS===
This work was partially supported by the EU funded project iPatDoc (grant no. 606163).

===9. REFERENCES===
[1] Delphion Citation Link. http://www.delphion.com/products/research/products-citelink.

[2] Elastic Search. http://www.elasticsearch.org.

[3] EPO – Espacenet. http://www.espacenet.com.

[4] EPO – European Publication Server. https://data.epo.org/publication-server.

[5] EPO Worldwide Patent Statistical Database (PATSTAT). http://www.epo.org/searching/subscription/raw/product-14-24_de.html.

[6] European Patent Register. https://register.epo.org.

[7] IPC (International Patent Classification). http://www.epo.org/searching/essentials/classification/ipc-reform.html.

[8] MongoDB. http://www.mongodb.org/.

[9] Open Patent Services (OPS). http://www.epo.org/searching/free/ops.html.

[10] PatAnalyse – Sample Patent Map. http://www.patanalyse.com/samplemap.html.

[11] Patent iNSIGHT Pro. http://www.patentinsightpro.com/.

[12] Thomson Innovation. http://thomsonreuters.com/thomson-innovation.

[13] WIPO – World Intellectual Property Organization. http://www.wipo.int.

[14] T. Dwyer, K. Marriott, and P. J. Stuckey. Fast node overlap removal. In Proceedings of the 13th Int. Conf. on Graph Drawing, GD ’05, pages 153–164. Springer, 2006.

[15] M. Giereth, S. Koch, M. Rotard, and T. Ertl. Web based visual exploration of patent information. In Proceedings of the 11th Int. Conf. on Information Visualization, IV ’07, pages 150–155. IEEE CS, 2007.

[16] A. B. Jaffe and M. Trajtenberg. Patents, Citations & Innovations: A Window on the Knowledge Economy. MIT Press, revised edition, 2005.

[17] D. O. Kutz. Examining the evolution and distribution of patent classifications. In Proceedings of the 8th Int. Conf. on Information Visualisation, IV ’04, pages 983–988. IEEE CS, 2004.

[18] S. Lohmann, J. Ziegler, and L. Tetzlaff. Comparison of tag cloud layouts: Task-related performance and visual exploration. In Proceedings of the 12th IFIP TC 13 Int. Conf. on Human-Computer Interaction, Part I, INTERACT ’09, pages 392–404. Springer, 2009.

[19] K. Misue, P. Eades, W. Lai, and K. Sugiyama. Layout adjustment and the mental map. Journal of Visual Languages and Computing, 6(2):183–210, 1995.

[20] B. Shneiderman. The eyes have it: A task by data type taxonomy for information visualizations. In Proceedings of the 1996 IEEE Symposium on Visual Languages, VL ’96, pages 336–343. IEEE CS, 1996.

[21] C. Sternitzke, A. Bartkowski, and R. Schramm. Visualizing patent statistics by means of social network analysis tools. World Patent Information, 30(2):115–131, 2008.

[22] L. Van der Maaten and G. Hinton. Visualizing high-dimensional data using t-SNE. Journal of Machine Learning Research, 9:2579–2605, 2008.

[23] F. B. Viégas and M. Wattenberg. Tag clouds and the case for vernacular visualization. interactions, 15(4):49–52, 2008.
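Illustrative sketch for Section 4.1 (Computation of IPC Similarities). The paper only states that a dimension has "a positive value" when a symbol classifies a patent; the sketch below assumes that value is 1, so the cosine similarity of two symbols reduces to the number of shared patents divided by the geometric mean of their patent counts. The function name `ipc_cosine_similarities` and the toy patent IDs are hypothetical and not taken from the authors' implementation.

```python
import math
from itertools import combinations


def ipc_cosine_similarities(patent_to_symbols):
    """Return {(x, y): cosine similarity} for every co-used pair of IPC symbols."""
    # Invert the mapping: each IPC symbol gets the set of patents it classifies.
    # With binary vectors (assumption), this set is the sparse vector itself.
    symbol_to_patents = {}
    for patent_id, symbols in patent_to_symbols.items():
        for symbol in set(symbols):
            symbol_to_patents.setdefault(symbol, set()).add(patent_id)

    sims = {}
    for x, y in combinations(sorted(symbol_to_patents), 2):
        px, py = symbol_to_patents[x], symbol_to_patents[y]
        shared = len(px & py)  # dot product of two binary vectors
        if shared:  # pairs that are never co-used have similarity 0 and are skipped
            sims[(x, y)] = shared / math.sqrt(len(px) * len(py))
    return sims


# Toy data: four patents classified with four-character IPC symbols.
patents = {
    "P1": ["F02N", "H02P"],
    "P2": ["F02N", "H02P"],
    "P3": ["F02N"],
    "P4": ["A01B"],
}
sims = ipc_cosine_similarities(patents)
```

Storing only the patent sets keeps the computation sparse, which matches the observation in Section 4.1 that each IPC symbol is associated with only a small fraction of the patents.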
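Illustrative sketch for Section 4.2 (Dimensionality Reduction of IPC Space). It only evaluates the objective of Equations (2) and (3) for a given set of 2D positions; it does not perform the t-SNE optimization itself. The function name `tsne_kl_divergence` and the representation of P as a dictionary keyed by index pairs are assumptions made for this example.

```python
import math


def tsne_kl_divergence(P, points):
    """Evaluate KL(P || Q) for joint probabilities P and candidate 2D positions.

    P is a dict {(i, j): p_ij} with i != j whose values sum to 1;
    points is a list of (x, y) tuples, one per IPC symbol.
    """
    n = len(points)
    # Unnormalized Student-t affinities (1 + ||x_i - x_j||^2)^-1 for all i != j.
    weights = {}
    for i in range(n):
        for j in range(n):
            if i != j:
                dx = points[i][0] - points[j][0]
                dy = points[i][1] - points[j][1]
                weights[(i, j)] = 1.0 / (1.0 + dx * dx + dy * dy)
    total = sum(weights.values())  # denominator of Equation (3)

    kl = 0.0
    for pair, p in P.items():
        if p > 0:  # terms with p_ij = 0 contribute nothing
            q = weights[pair] / total
            kl += p * math.log(p / q)
    return kl


# Three points on an equilateral triangle: all pairwise distances are equal,
# so Q is uniform over the six ordered pairs.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
uniform_P = {(i, j): 1 / 6 for i in range(3) for j in range(3) if i != j}
skewed_P = {(0, 1): 0.4, (1, 0): 0.4, (0, 2): 0.05,
            (2, 0): 0.05, (1, 2): 0.05, (2, 1): 0.05}
kl_uniform = tsne_kl_divergence(uniform_P, triangle)
kl_skewed = tsne_kl_divergence(skewed_P, triangle)
```

The uniform case yields a divergence of (numerically) zero because the layout reproduces the similarities exactly, while the skewed case is penalized: symbols 0 and 1 are much more similar than the layout suggests.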
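Illustrative sketch for the logarithmic font scaling described in Section 5.1 (Map View). The paper does not give the exact mapping, so the linear interpolation on log-frequencies and the pixel bounds `min_px` and `max_px` below are assumptions chosen for this example.

```python
import math


def font_size(count, min_count, max_count, min_px=10.0, max_px=40.0):
    """Map a usage frequency to a font size on a logarithmic scale.

    count, min_count and max_count are positive patent counts;
    min_px and max_px are hypothetical display bounds.
    """
    if max_count == min_count:
        return max_px  # all symbols equally frequent: nothing to differentiate
    # Position of the count between the extremes, measured on a log scale.
    t = (math.log(count) - math.log(min_count)) / (
        math.log(max_count) - math.log(min_count))
    return min_px + t * (max_px - min_px)


smallest = font_size(1, 1, 10000)
middle = font_size(100, 1, 10000)
largest = font_size(10000, 1, 10000)
```

With a linear scale, a symbol used 100 times would be drawn almost as small as one used once when the maximum is 10,000; on the log scale it lands halfway between the bounds, which avoids overemphasizing the few very frequent symbols.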
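Illustrative sketch for the darts view layout described in Section 5.2. It covers a single selected IPC symbol only (the averaging of angles for multiple selections is omitted) and assumes 0 < α < 1. The radius formula, which places a similarity of 1 in the bullseye and a similarity of α on the border, is one possible logarithmic scale and not necessarily the one used by the authors; `darts_layout` and its arguments are hypothetical names.

```python
import math


def darts_layout(selected, positions, similarities, n=15, alpha=0.6):
    """Return {symbol: (angle, radius)} for the symbols related to `selected`.

    positions maps symbols to (x, y) in the 2D representation of Section 4;
    similarities maps symbols to their cosine similarity with `selected`.
    """
    # Keep symbols whose similarity reaches alpha, then the n most similar ones.
    related = [(s, sim) for s, sim in similarities.items()
               if s != selected and sim >= alpha]
    related = sorted(related, key=lambda item: item[1], reverse=True)[:n]
    if not related:
        return {}  # all similarities are below the threshold

    # Order by the angle each symbol has relative to the selected one.
    cx, cy = positions[selected]

    def angle(symbol):
        x, y = positions[symbol]
        return math.atan2(y - cy, x - cx)

    related.sort(key=lambda item: angle(item[0]))

    layout = {}
    for index, (symbol, sim) in enumerate(related):
        # Normalized angle: spread the ordered symbols evenly over the circle.
        theta = 2 * math.pi * index / len(related)
        # Assumed log scale: radius 0 at sim = 1, radius 1 at sim = alpha.
        radius = math.log(sim) / math.log(alpha)
        layout[symbol] = (theta, radius)
    return layout


# Toy data around the selected symbol "F02N".
positions = {"F02N": (0, 0), "H02P": (1, 0), "B60K": (0, 1),
             "F02D": (-1, 0), "A01B": (5, 5)}
similarities = {"H02P": 0.9, "B60K": 0.6, "F02D": 0.7, "A01B": 0.1}
layout = darts_layout("F02N", positions, similarities, n=10, alpha=0.6)
```

An additional inner circle with threshold β could be drawn at radius `math.log(beta) / math.log(alpha)` under the same assumed scale.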