=Paper=
{{Paper
|id=Vol-1437/ipamin2015_submission_5
|storemode=property
|title=Visualizing Query Comparisons in Patent Retrieval Systems
|pdfUrl=https://ceur-ws.org/Vol-1437/ipamin2015_paper5.pdf
|volume=Vol-1437
}}
==Visualizing Query Comparisons in Patent Retrieval Systems==
Julia J. Jürgens, Thomas Mandl, Christa Womser-Hacker

Dept. of Information Science & Natural Language Processing, University of Hildesheim, Universitätsplatz 1, 31141 Hildesheim, Germany. {juerge, mandl, womser}@uni-hildesheim.de

Copyright © 2015 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors. Published at CEUR-WS.org. Proceedings of the 2nd International Workshop on Patent Mining and Its Applications (IPaMin 2015). Beijing, May 27-28, 2015.

===ABSTRACT===
Patent retrieval is a very complex process in which users need tool support in order to finish their tasks efficiently and effectively. Many tasks in the process can benefit from such tools, one being the phase of query formulation. Since query formulation is a highly manual task, a system can only precompute potentially helpful data and then visualize it for the user. The process of querying and the pertaining results of information retrieval systems can be visualized in many ways. We present two prototypical system designs for comparing the queries in patent retrieval. The prototypes include elements of the query structure as well as the result set size. Both are crucial elements for patent experts to explore the effect of changes in a query. Our system supports the stepwise optimization of complex queries in patent searches. The design ideas are based on knowledge engineering with domain experts.

===Keywords===
Patent Retrieval, Information Visualization, Information Retrieval, User Centered Design.

===1. INTRODUCTION===
Patents are one of the most important sources for recent technology information. Over 2 million new patents are registered worldwide, with high growth rates especially in Asia nowadays. The retrieval of relevant information from patents is of crucial importance for investments of enterprises.

In this paper, we analyze the role of information visualization in patent retrieval and present how the field can benefit from visual tools. Two concrete prototypical visualizations are suggested. They were developed using a user-centered approach.

The paper is structured as follows. Section 2 gives a short introduction into patent information retrieval and explains the motivation for our prototypes. In Section 3, the field of information visualization is described and the potential for patent retrieval tasks is highlighted. Related work is presented in Section 4. Our prototypes are described in Section 5 before concluding the paper in Section 6.

===2. PATENT INFORMATION RETRIEVAL===
Patent retrieval differs from other retrieval processes in several ways [Lupu et al. 2011]. Of particular importance is the professional character of patent searches, which emphasizes diligence and leads to complex queries. Patent queries can be a page long, span many fields, and contain dozens of parameters. The development and maintenance of such a query strategy requires elaboration and iterative optimization [Bonino et al. 2010].

One way to help patent searchers cope with this complexity is the implementation and integration of more value-added components like trend analysis [Kim et al. 2009], network analysis [Han et al. 2014], advanced linguistic analysis [Becks 2013] or even forecasting and predictive analysis [Jung & Ha 2015].

Currently, approaches taking a broader view of search processes and information behavior [Widén et al. 2014] are also applied to patent retrieval. A behavior model was developed which takes into account the phases of patent retrieval processes by patent experts [Jürgens & Womser-Hacker 2014]. This model defines and explains the following seven sub-processes of patent retrieval: Recognize/Accept, Define Problem, Select Database, Formulate Query, Examine Results, Extract Info/Report, Reflect/Stop. The iterative character is clarified by the many arrows between the sub-phases. Jürgens & Womser-Hacker (2014) further highlight the difficulties in these steps. The query formulation phase, for example, is one of the most critical tasks in the process since the problem needs to be translated into a query. The quality of the query is highly dependent on the expertise and the experience of the patent searcher. This means that automatic approaches alone fall short during this step; they can only be a means for inspiration. Systems therefore need to deliver precomputed data which then has to be presented to the user so (s)he can further interact with it to be able to make better decisions. A field that is concerned with exactly such a scenario is information visualization.

===3. INFORMATION VISUALIZATION===
Visualization intends to make data more easily understandable for humans. By making use of the tremendous visual processing capabilities of human brains, system engineers can present more data than in textual or numerical modes. Visualizations can be applied either as a presentation tool to communicate ideas, explain data or provide support, or they can be used for analysis, where very complex data is illustrated and users can make use of a variety of interaction techniques. Especially this latter use of visualizations can lead to a dialog between the analyst and the data that promotes exploration and learning. Visualization is thus helpful in gaining insights, not only in the sense of spontaneous "aha" moments but also from the perspective of knowledge building [Chang et al. 2009].

In patent retrieval, both forms of visualization can be useful. In some search scenarios (like the state-of-the-art search), it is sufficient to get a general understanding of the field. Here, visualizations that give the user an overview, e.g. over the top inventors or technologies, can be valuable. In other situations (like the validity search), a large number of patents needs to be examined in depth to extract the relevant passages. Here, visual tools that support this analytical task could be applied. In critical scenarios, the visual exploration of similar patents is also imaginable. The use cases for visualizations during complex patent searches are numerous. Visualizations currently offered in patent search systems and discussed in research are described in the next section.

===4. RELATED WORK: VISUALIZATION IN PATENT RETRIEVAL===
Patent retrieval systems on the market integrate more and more visualization techniques. They mostly integrate classical diagrams and presentation techniques into the result analysis (see Figure 1). Some software products also contain more sophisticated visualizations such as 3D-landscapes (see Figure 2). Regardless of their specific visualizations, all systems focus on the presentation of result sets, so that the potential of visualization for the retrieval process is often not fully exploited.

Figure 1: Visualization of a result set based on publication countries [Questel]

Figure 2: Patent Landscape [STN Anavist]

On the one hand, research concerning the use of visualizations in patent systems is rather limited. On the other hand, very different applications for visualizations have been examined, ranging from the presentation of the whole patent space to result set visualization and visualizations that should ultimately help users with improving their search queries.

Kutz (2004) used treemaps to visualize all patents of the USPTO archive between 1976 and 2002 on the basis of their 466 IPC classes. The data set was examined in 5-year intervals. The colors of the classes correspond to the percentage change in the number of documents in comparison to the previous interval: green classes denote an increase in patents and red ones a decrease. A third color is introduced when it comes to the analysis of specific portfolios by assignees. Here, yellow rectangles signify that the applicants had not been granted patents in that specific class. The author also visualizes these treemaps on a timeline to better understand the evolution of the patent landscapes [Kutz 2004].

The close coupling of query formulation and result assessment has long been recognized in traditional information retrieval, and its effectiveness has been demonstrated in systems such as the alpha slider system by Ahlberg & Shneiderman (1994). The prototype by McLean (2000) follows exactly this idea and aims to "integrate retrieval with interaction". On the basis of requirements collected from patent searchers, he built a system where users can create "query stacks". The users start from a broader query and then refine it using certain filters. The results are immediately shown on a 2-dimensional plot, so that the consequences of changes in the query can be quickly viewed in the plot. Each patent is shown as a small rectangle; its position on the plot is determined by similarity measures. Certain attributes such as the IPC class can be colored as shown in Figure 3 [McLean 2000].

Figure 3: Query Stack and Result Visualization [McLean 2000]

The system PatViz by Koch et al. (2009) has the same goal. It also focuses on the integration of insights from the analysis of result sets into the reformulation of queries. The authors developed ten views (e.g. a patent graph and a geo-timeline) that show different perspectives on the current result set and that are linked so that users can make use of brushing. A further view called Filter Graph was developed to use different sets of results as building blocks to produce complex extraction strategies (see Figure 4). The different kinds of nodes allow the user to produce subsets of the result set using filters and other operators and to combine these in customized ways. Although this idea could be further adapted to query formulation, its application is currently restricted to result sets.

Figure 4: Filter Graph [Koch et al. 2009]

Another visualization by the same authors also picks up the idea by McLean (2000) of presenting the different query facets of a search. Since their tool PatViz is based on work in the PatExpert project, where different search functionalities like full text search, metadata search, image similarity search, semantic search, and document similarity search are provided, the authors constructed a visual tool that allowed the user to combine these different searches. As depicted in Figure 5, the various search types are all presented in unique colors (image similarity search in blue, semantic search in grey, keyword search in green, and metadata search in orange), making it easy and obvious for the user to see how a query is constructed.

Figure 5: Visual integration of different search facilities [Koch et al. 2009]

The system by Hackl (2009) also aspires to optimize the patent search query, although by a different approach, namely relevance feedback. The system PatentAide aims to make weighting and advanced scoring models more transparent for patent retrieval, where Boolean matching is still most widely used. PatentAide allows Boolean as well as probabilistic matching and ranking. The typical information behavior of stepwise optimization of a query was modeled by introducing relevance feedback for individual documents. The effects of the relevance decisions of the user were immediately interpreted by the system and the ranking was adapted. Here, visualization was used to increase the transparency of the ranking algorithm. As seen in Figure 6, the changes of positions compared to the last ranking were shown for each document. That way the user could explore extreme changes and find more interesting documents with potentially more relevant terms [Hackl 2009].

Figure 6: Dynamic Relevance Feedback [Hackl 2009]

The prototype by Herr et al. (2014) consists of two views that should support the user in identifying relevant IPCs to improve their search queries. The authors adapted tag clouds to visualize co-occurrences between IPC classes. They compute the pair-wise similarities of IPC subclasses based on their co-use in patents and map these onto a 2D-plane. Two different views are available to the user. In the first one, called map view, it is possible to gain a general overview of all IPC subclasses used in a patent set. The similarity between these classes is depicted by their distance, and the font size displays the overall frequency of the IPC subclass in the set. The darts view lets users specify a class as a focus. Like on a dartboard, co-occurring subclasses are presented on concentric circles.

As can be seen from the above literature, there have been some attempts to support patent searchers during query formulation. The users can learn from consequences on result sets or from metadata such as IPC classes. The first idea seems very logical, but the question arises if and how the searchers can abstract from the presentation of results to making the right decisions concerning query reformulations. Other visualizations may be able to support the users in making this task easier. This forms the starting point for the authors' research, which is described in detail in the next section.

===5. DESIGN OF QUERY COMPARISON SYSTEMS===
Our approach is based on intensive knowledge engineering with experts and a user centered design process with several design iterations.

Interviews with domain experts from several technical fields have shown that for the development of complex queries for typical patent information needs, it is crucial to compare the effects of different queries and find the optimal query for a certain information need [Struß et al. 2014]. The state of the art in patent search in general also stresses the importance of iterative query construction and query comparison.

The study by Joho et al. (2010) emphasizes the importance of search functionalities in the patent domain. The users differ very much from the typical web searcher in that they are willing to spend a lot of time and effort in constructing the queries and demand a high degree of control over them. They desire a wide variety of search possibilities and appreciate systems that take the special requirements into account.

We developed and designed two prototypes which allow the comparison of queries from two different points of view. The effect of changing parameters is shown to the user by different means. The prototypes are well suited to explore and optimize complex queries in interaction sequences. In the first case, different queries can be directly compared to enhance the user's understanding concerning the scope of result sets and their overlaps or differences. The view that was developed for this scenario is called Query Comparison. The second suggestion is to support the patent searcher in the development of query combinations. The view Query Combination should inspire the user to produce effective combinations of queries without having to undertake too many iterations of query formulation. By giving the user an immediate impression of result set sizes, unsuitable combinations of queries might be prevented, thereby making the process more transparent and efficient. Both concepts and prototypes are described in detail below.

Figure 7 shows the paper prototype of the Query Comparison view. On the left, the user can choose which queries (s)he would like to compare. These queries have been executed before and are now available in a history.

The selected queries are then depicted as symbols in the center of the screen. A query is represented by a circle, and a combination of queries (connected through Boolean operators) looks rather cloud-like to visually remind the user of its formation. The bars below contain the specified logic behind the comparisons of the queries. They can either be formulated manually or loaded from earlier comparisons. It is also possible to specify a group of default comparisons that is automatically loaded when the view opens. The result set that fulfills the Boolean logic is calculated upon clicking the "Execute comparison" button in the lower right and is then represented as a circle beneath the corresponding bar. The number of documents is shown in the circle's center, which provides the user with helpful information concerning the further development of the search strategy. To see a list of the patents in a new window, the user needs to double-click the circle. That way, the user can immediately check if, e.g., an expansion of a query led to more relevant results. These subsequent steps of query evaluation are especially important in patent retrieval since the result set needs to comprise all relevant documents but must at the same time be manageable.

Figure 7: Query Comparison view

The second visualization, Query Combination, is shown in Figure 8. Its goal is to let the user visually explore which query combinations might lead to manageable result sets. Patent searchers often formulate initial subqueries that describe parts of the search (e.g. certain materials or the use of a technology) and combine them later on to final queries that comprise all relevant aspects of the search. Since the first combination of queries usually does not produce the final result set, it would be advantageous to specify a few candidates for query parts and let the system calculate all combinations. The user can choose on the left which query parts should be included, thereby triggering the system to calculate all combinations. These are then depicted as circles where the color and the size redundantly represent the result set sizes. All document sets can be opened and assessed by double-clicking the particular circle. It must be noted that the calculation of all possibilities and their visual representation should be limited to a reasonable number. The immediate and direct visualization of the size allows the experts to easily optimize the size of their final result set.

Figure 8: Query Combination

We conducted an informal evaluation of these two prototypes with seven professional patent searchers. The patent searchers were recruited at the PatInfo 2014 in Ilmenau, Germany. Since this conference is highly domain-specific, all participants were familiar with the patent domain. The patent searchers were invited to take part in an interview that lasted about an hour. This was structured as follows: The experts were first asked to present their professional experience in patent retrieval so that we could learn something about their background. Then, they were given an introduction into the study and were afterwards confronted with the prototypes and the ideas behind them. The patent searchers were allowed to ask questions and were encouraged to give their opinion and to suggest possible improvements.

Out of the seven professionals, six experts commented favorably on the Query Comparison view. The visualization was evaluated as meaningful and more efficient compared to current search facilities. One expert mentioned that the idea offered more information than currently available in the systems; another one highlighted its use as an analytical tool for a better understanding of the result sets. Negative comments were the unclear use of color, the lack of a drag and drop interaction and the question whether such functionality would be helpful at that point of the research process.

The Query Combination view was rated positively by four experts. They saw value in the clear overview, liked the aesthetic design, and argued that one would not have to try out as many queries anymore. Also, one could see when a query would "crash", i.e. not deliver the anticipated amount of patents. Two professionals were not sure about the benefit; one described the size of the result set as being a "dangerous criterion" for the appropriateness of the result set. The meaning of the color scheme was again criticized by one expert, and the request for more information concerning each set was also expressed once.

In summary, the evaluation of the ideas was very encouraging and indicated that the ideas tackle real problems of patent searchers. The discussion with the professionals and their suggestions will be taken into account in the further development of the visualizations.

===6. CONCLUSION AND FUTURE WORK===
In this paper, we argued that patent retrieval and especially query formulation is a complex process that needs to be supported by tools. Our research aims to provide such tools on the basis of visualizations. We presented two prototypical visualizations that give users another perspective on query formulation and that were evaluated with seven professional patent searchers. Since the feedback was encouraging, the prototypes will be further developed and integrated into a fully functional system. One of the authors is currently working on the implementation, using JavaScript and the JS library D3 for the visualizations.

Apart from the sub-process of query formulation, there are other tasks during the patent retrieval process that can benefit from visualizations. For these scenarios, visual prototypes will be developed and further requirements of domain experts taken into account. The final prototype, which will consist of a number of visual tools for patent searchers, will be thoroughly evaluated in formal user test settings.

===7. ACKNOWLEDGEMENTS===
The authors would like to thank FIZ Karlsruhe for supporting this research through a Doctoral Fellowship to the first author.

===8. REFERENCES===
1. Ahlberg, C. & Shneiderman, B. (1994): Visual information seeking: tight coupling of dynamic query filters with starfield displays. In: Celebrating Interdependence. CHI '94 conference proceedings on Human Factors in Computing Systems. Boston, New York: ACM, 313-317.

2. Becks, D. (2013): Die Nutzung von Head-Modifier Phrasen für Patent-Retrieval. Fachinformationszentrum Karlsruhe, FIZ.

3. Bonino, D.; Ciaramella, A. & Corno, F. (2010): Review of the state-of-the-art in patent information and forthcoming evolutions in intelligent patent informatics. In: World Patent Information vol. 32, Issue 1, March 2010, 30-38.

4. Chang, R.; Ziemkiewicz, C.; Green, T. M. & Ribarsky, W. (2009): Defining insight for visual analytics. Computer Graphics and Applications, IEEE, 29(2), 14-17.

5. Hackl, R. (2009): Transparentes Ranking und Relevanz-Feedback im Patentretrieval. Fachinformationszentrum Karlsruhe, FIZ.

6. Han, H.; Xu, S.; Zhu, L.; Qiao, X.; Gui, J. & Zhang, Z. (2014): Mining Technical Topic Networks from Chinese Patents. In: Proceedings of the First International Workshop on Patent Mining and Its Applications (IPaMin 2014) co-located with Konvens 2014. Hildesheim, Germany, October 6-7, 2014. http://ceur-ws.org/Vol-1292/

7. Jung, H. & Ha, Y. (2015): InSciTe advisory: Prescriptive analytics service for enhancing research performance. In: Knowledge and Smart Technology (KST), 2015 7th International Conference on Knowledge and Smart Technology. Chonburi, Thailand. 28-31 Jan. 2015. http://dx.doi.org/10.1109/KST.2015.7051448

8. Herr, D.; Han, Q.; Lohmann, S.; Brügmann, S. & Ertl, T. (2014): Visual Exploration of Patent Collections with IPC Clouds. In: Proceedings of the First International Workshop on Patent Mining and Its Applications (IPaMin 2014) co-located with Konvens 2014. Hildesheim, Germany, October 6-7, 2014. http://ceur-ws.org/Vol-1292/

9. Joho, H.; Azzopardi, L.A. & Vanderbauwhede, W. (2010): A survey of patent users: an analysis of tasks, behavior, search functionality and system requirements. In: Proceedings of the third symposium on Information interaction in context. ACM, 13-24.

10. Jürgens, J.J.; Womser-Hacker, C. & Mandl, T. (2014): Modeling the interactive patent retrieval process: an adaptation of Marchionini's information seeking model. In: Proceedings of the 5th Information Interaction in Context Symposium (IIiX '14). New York, NY, USA: ACM, 247-250. http://doi.acm.org/10.1145/2637002.2637034

11. Jürgens, J.J. & Womser-Hacker, C. (2014): Limitations of Automatic Patent IR. In: Datenbank-Spektrum. March 2014, Volume 14, Issue 1, 5-17.

12. Kim, Y.; Tian, Y.; Jeong, Y.; Jihee, R. & Myaeng, S.-H. (2009): Automatic Discovery of Technology Trends from Patent Text. In: Proceedings of the 2009 ACM Symposium on Applied Computing. SAC. Honolulu, Hawaii, USA, March 8-12, 2009. New York, NY, USA: ACM, 1480-1487. Available online at http://doi.acm.org/10.1145/1529282.1529611

13. Koch, S.; Bosch, H.; Giereth, M. & Ertl, T. (2009): Iterative integration of visual insights during patent search and analysis. In: IEEE Symposium on Visual Analytics Science and Technology, VAST 2009, 203-210.

14. Kutz, D. O. (2004): Examining the evolution and distribution of patent classifications. In: Proceedings of the Eighth International Conference on Information Visualisation, IV 2004, IEEE, 983-988.

15. Lupu, M.; Mayer, K.; Tait, J. & Trippe, A. (2011): Current Challenges in Patent Information Retrieval. Springer.

16. McLean, A. W. (2000): Patent Space Visualization for Patent Retrieval. In: Proceedings of the ACM SIGIR 2000 Workshop on Patent Retrieval. Athens, Greece, July 28, 2000. http://research.nii.ac.jp/~ntcadm/sigir2000ws/

17. Questel: https://www.questel.com/

18. STN Anavist: http://www.stn-international.de/stn_anavist.html

19. Struß, J.M.; Mandl, T.; Schwantner, M. & Womser-Hacker, C. (2014): Understanding Trends in the Patent Domain. In: Proceedings of the First International Workshop on Patent Mining and Its Applications (IPaMin 2014) co-located with Konvens 2014. Hildesheim, Germany, October 6-7, 2014. http://ceur-ws.org/Vol-1292/

20. Widén, G.; Steinerová, J. & Voisey, P. (2014): Conceptual modelling of workplace information practices: a literature review. In: Proceedings of ISIC: the information behaviour conference, Leeds, 2-5 September, 2014: Part 1. In: Information Research vol. 19 no. 4, December, 2014. http://www.informationr.net/ir/19-4/isic/isic08.html#.VSO-POESqVA
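
As a supplementary illustration of the two views described in Section 5, the following minimal sketch shows how the numbers behind the circles could be precomputed. It is not the authors' implementation (which the paper says is being written in JavaScript with D3); Python is used here only for brevity. The query history, the patent identifiers, the function names `compare` and `combination_sizes`, and the choice to combine query parts with AND are all assumptions made for this example.

```python
from itertools import combinations

# Hypothetical query history: query label -> set of patent IDs it returned.
# In a real system these sets would come from previously executed queries.
history = {
    "Q1": {"EP1", "EP2", "EP3", "US7"},
    "Q2": {"EP2", "EP3", "US8"},
    "Q3": {"EP3", "US7", "US9"},
}


def compare(left, op, right):
    """Result set of one Boolean comparison between two executed queries
    (the logic held in one bar of the Query Comparison view)."""
    a, b = history[left], history[right]
    if op == "AND":
        return a & b  # documents found by both queries
    if op == "OR":
        return a | b  # documents found by either query
    if op == "NOT":
        return a - b  # documents found by the left query only
    raise ValueError(f"unsupported operator: {op}")


def combination_sizes(parts, max_combinations=50):
    """Result set sizes for AND-combinations of two or more query parts
    (the values a Query Combination view could map to circle size and color).
    The cap reflects the paper's remark that the number of calculated
    combinations should stay reasonable; the default of 50 is arbitrary."""
    sizes = {}
    for r in range(2, len(parts) + 1):
        for combo in combinations(parts, r):
            if len(sizes) >= max_combinations:
                return sizes
            result = set.intersection(*(history[p] for p in combo))
            sizes[combo] = len(result)
    return sizes


if __name__ == "__main__":
    print(len(compare("Q1", "AND", "Q2")))
    for combo, size in combination_sizes(["Q1", "Q2", "Q3"]).items():
        print(" AND ".join(combo), size)
```

Working on sets of document identifiers keeps overlaps and differences between queries cheap to compute once the individual queries have been executed, which fits the idea of precomputing data and leaving the interpretation to the searcher.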