=Paper= {{Paper |id=Vol-1437/ipamin2015_submission_5 |storemode=property |title=Visualizing Query Comparisons in Patent Retrieval Systems |pdfUrl=https://ceur-ws.org/Vol-1437/ipamin2015_paper5.pdf |volume=Vol-1437 }} ==Visualizing Query Comparisons in Patent Retrieval Systems== https://ceur-ws.org/Vol-1437/ipamin2015_paper5.pdf
                                 Visualizing Query Comparisons
                                   in Patent Retrieval Systems

                                                            Julia J. Jürgens
                                                            Thomas Mandl
                                                        Christa Womser-Hacker
                                  Dept. of Information Science & Natural Language Processing
                                                     University of Hildesheim
                                        Universitätsplatz 1 - 31141 Hildesheim, Germany
                                       {juerge, mandl, womser}@uni-hildesheim.de




Copyright © 2015 for the individual papers by the papers' authors.
Copying permitted for private and academic purposes.
This volume is published and copyrighted by its editors.
Published at CEUR-WS.org
Proceedings of the 2nd International Workshop on Patent Mining and Its
Applications (IPaMin 2015). Beijing, May 27-28, 2015.

ABSTRACT
Patent retrieval is a very complex process where users need to be supported in order to
finish their tasks efficiently and effectively. There are many tasks in the process that can
benefit from supporting tools, one being the phase of query formulation. Since query
formulation is a highly manual task, systems can only precompute potentially helpful data and
then visualize it for the user. The process of querying and the pertaining results of
information retrieval systems can be visualized in many ways. We present two prototypical
system designs for comparing queries in patent retrieval. The prototypes include elements of
the query structure as well as the result set size. Both are crucial elements for patent
experts to explore the effect of changes in a query. Our system supports the stepwise
optimization of complex queries in patent searches. The design ideas are based on knowledge
engineering with domain experts.

Keywords
Patent Retrieval, Information Visualization, Information Retrieval, User Centered Design.

1. INTRODUCTION
Patents are one of the most important sources for recent technology information. Over 2
million new patents are registered worldwide, with high growth rates especially in Asia.
The retrieval of relevant information from patents is of crucial importance for investments
of enterprises.
In this paper, we analyze the role of information visualization in patent retrieval and
present how the field can benefit from visual tools. Two concrete prototypical visualizations
are suggested. They were developed using a user-centered development approach.
The paper is structured as follows. Section 2 gives a short introduction to patent
information retrieval and explains the motivation for our prototypes. In Section 3, the field
of information visualization is described and the potential for patent retrieval tasks is
highlighted. Related work is presented in Section 4. Our prototypes are described in Section
5, and Section 6 concludes the paper.

2. PATENT INFORMATION RETRIEVAL
Patent retrieval differs from other retrieval processes in several ways [Lupu et al. 2011].
Of particular importance is the professional character of patent searches, which emphasizes
diligence and leads to complex queries. Patent queries can be one page long, may encompass
many fields and may contain dozens of parameters. The development and maintenance of such a
query strategy requires elaboration and iterative optimization [Bonino et al. 2010].
One way to help patent searchers cope with this complexity is the implementation and
integration of more value-added components like trend analysis [Kim et al. 2009] or network
analysis [Han et al. 2014], advanced linguistic analysis [Becks 2013] or even forecasting and
predictive analysis [Jung & Ha 2015].
Currently, approaches taking a broader view of search processes and information behavior
[Widén et al. 2014] are also applied to patent retrieval. A behavior model was developed
which takes into account the phases of patent retrieval processes by patent experts [Jürgens
& Womser-Hacker 2014].
This model defines and explains the following seven sub-processes of patent retrieval:
Recognize/Accept, Define Problem, Select Database, Formulate Query, Examine Results, Extract
Info/Report, Reflect/Stop. The iterative character is clarified by the many arrows between
the sub-phases. Jürgens & Womser-Hacker (2014) further highlight the difficulties in these
steps. The query formulation phase, for example, is one of the most critical tasks in the
process since the problem needs to be translated into a query. The quality of the query is
highly dependent on the expertise and the experience of the patent searcher. This means that
automatic approaches alone fall short during this step; they can only be a means for
inspiration. Systems therefore need to deliver precomputed data which then has to be
presented to the user so (s)he can further interact with it to be able to make better
decisions. A field that is concerned with exactly such a scenario is information
visualization.

3. INFORMATION VISUALIZATION
Visualization intends to make data more easily understandable for humans. By making use of
the tremendous visual processing capabilities of human brains, system engineers can present
more data than in textual or numerical modes.
Visualizations can be applied either as a presentation tool to communicate ideas, explain
data or provide support, or they can be used for analysis where very complex data is
illustrated and users can make use of a variety of interaction techniques. Especially this
latter use of visualizations can lead to a dialog between the analyst and the data that
promotes exploration and learning. Visualization is thus helpful in gaining insights, not
only in the sense of spontaneous “aha”-moments but also from the perspective of knowledge
building [Chang et al. 2009].
In patent retrieval, both forms of visualizations can be useful. In some search scenarios
(like the state-of-the-art search), it is sufficient to get a general understanding of the
field. Here, visualizations that give the user an overview, e.g. over the top inventors or
technologies, can be valuable. In other situations (like the validity search), a large number
of patents needs to be examined in depth to extract the relevant passages. Here, visual tools
that support this analytical task could be applied. In critical scenarios, the visual
exploration of similar patents is also imaginable. The use cases for visualizations during
complex patent searches are numerous. Visualizations currently offered in patent search
systems and discussed in research are described in the next section.

4. RELATED WORK: VISUALIZATION IN PATENT RETRIEVAL
Patent retrieval systems on the market integrate more and more visualization techniques. They
mostly integrate classical diagrams and presentation techniques into the result analysis (see
Figure 1). Some software products also contain more sophisticated visualizations such as
3D-landscapes (see Figure 2). Independent from their specific visualizations, all systems
focus on the presentation of result sets so that the potential of visualization for the
retrieval process is often not fully exploited.

Figure 1. Visualization of a result set based on publication countries [Questel]

Figure 2: Patent Landscape [STN Anavist]

On the one hand, research concerning the use of visualizations in patent systems is rather
limited. On the other hand, very different applications for visualizations have been
examined, ranging from the presentation of the whole patent space to result set visualization
and visualizations that should ultimately help users with improving their search queries.

Kutz (2004) used treemaps to visualize all patents of the USPTO archive between 1976 and 2002
on the basis of their 466 IPC classes. The data set was examined in 5 year intervals. The
colors of the classes correspond to the percentage change in the number of documents in
comparison to the previous interval: green classes denote an increase in patents and red ones
a decrease. A third color is introduced when it comes to the analysis of specific portfolios
by assignees. Here, yellow rectangles signify that the applicants had not been granted
patents in that specific class. The author also visualizes these treemaps on a timeline to
better understand the evolution of the patent landscapes [Kutz 2004].

The close coupling of query formulation and result assessment has long been recognized in
traditional information retrieval and its effectiveness has been demonstrated in systems such
as the alpha slider system by Ahlberg & Shneiderman (1994). The prototype by McLean (2000)
follows exactly this idea and aims to “integrate retrieval with interaction”. On the basis of
requirements collected from patent searchers, he built a system where users can create “query
stacks”. The users start from a broader query and then refine it using certain filters. The
results are immediately shown on a 2-dimensional plot of results so that the consequences of
changes in the query can be quickly viewed in the plot. Each patent is shown as a small
rectangle; its position on the plot is determined by similarity measures. Certain attributes
such as the IPC class can be colored as shown in Figure 3 [McLean 2000].

Figure 3: Query Stack and Result Visualization [McLean 2000]

The system PatViz by Koch et al. (2009) has the same goal. It also places its focus on the
integration of insights from the analysis of result sets into the reformulation of queries.
The authors developed ten views (e.g. a patent graph and a geo-timeline) that show different
perspectives on the current result set and that are linked so that users can make use of
brushing. A further view called Filter Graph was developed to use different sets of results
as building blocks to produce complex extraction strategies (see Figure 4). The different
kinds of nodes allow the user to produce subsets of the result set using filters and other
operators and to combine these in customized ways. Although this idea could be further
adapted to query formulation, its application is currently restricted to result sets.

Figure 4: Filter Graph [Koch et al. 2009]

Another visualization by the same authors also picks up the idea by McLean (2000) of
presenting the different query facets of a search. Since their tool PatViz is based on work
in the PatExpert project, where different search functionalities like full text search,
metadata search, image similarity search, semantic search, and document similarity search are
provided, the authors constructed a visual tool that allowed the user to combine these
different searches. As depicted in Figure 5, the various search types are all presented in
unique colors (image similarity search (blue), semantic search (grey), keyword search
(green), and metadata search (orange)), making it easy and obvious for the user to see how a
query is constructed.

Figure 5: Visual integration of different search facilities [Koch et al. 2009]

The system by Hackl (2009) also aspires to optimize the patent search query, although by a
different approach, namely relevance feedback. The system PatentAide aims to make weighting
and advanced scoring models more transparent for patent retrieval, where Boolean matching is
still most widely used. PatentAide allows Boolean as well as probabilistic matching and
ranking. The typical information behavior of stepwise optimization of a query was modeled by
introducing relevance feedback for individual documents. The effects of the relevance
decisions of the user were immediately interpreted by the system and the ranking was adapted.
Here, visualization was used to increase the transparency of the ranking algorithm. As seen
in Figure 6, the changes of positions compared to the last ranking were shown for each
document. That way the user could explore extreme changes and find more interesting documents
with potentially more relevant terms [Hackl 2009].
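
The position changes that PatentAide displays after a relevance feedback step can be illustrated with a minimal sketch. It is not taken from Hackl (2009): the function names `rank_changes` and `largest_movers` and the document identifiers are hypothetical, and the sketch only assumes that two consecutive rankings are available as ordered lists of document ids.

```python
def rank_changes(previous, current):
    """Map each document in `current` to the number of positions it moved.

    Positive values mean the document moved up compared to `previous`,
    negative values mean it moved down, None marks a document that was
    not part of the previous ranking.
    """
    previous_position = {doc: index for index, doc in enumerate(previous)}
    changes = {}
    for index, doc in enumerate(current):
        if doc in previous_position:
            changes[doc] = previous_position[doc] - index
        else:
            changes[doc] = None
    return changes


def largest_movers(changes, n):
    """Return the n documents with the largest absolute position change."""
    moved = [doc for doc, delta in changes.items() if delta is not None]
    return sorted(moved, key=lambda doc: -abs(changes[doc]))[:n]


# Hypothetical rankings before and after one relevance feedback step.
previous = ["EP1", "US2", "DE3", "WO4"]
current = ["DE3", "EP1", "WO4", "US2"]

changes = rank_changes(previous, current)
print(changes)
print(largest_movers(changes, 2))
```

A visualization in the spirit of Figure 6 would draw these deltas next to each document, so that extreme movers stand out.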
Figure 6: Dynamic Relevance Feedback [Hackl 2009]

The prototype by Herr et al. (2014) consists of two views that should support users in
identifying relevant IPC classes to improve their search queries. The authors adapted tag
clouds to visualize co-occurrences between IPC classes. They compute the pair-wise
similarities of IPC subclasses based on their co-use in patents and map these onto a
2D-plane. Two different views are available to the user. In the first one, called map view,
it is possible to gain a general overview of all IPC subclasses used in a patent set. The
similarity between these classes is depicted by their distance and the font size displays the
overall frequency of the IPC subclass in the set. The darts view lets users specify a class
as a focus. As on a dartboard, co-occurring subclasses are presented on concentric circles.
As can be seen from the above literature, there have been some attempts to support patent
searchers during query formulation. The users can learn from consequences on result sets or
from metadata such as IPC classes. The first idea seems very logical, but the question arises
if and how the searchers can abstract from the presentation of results to making the right
decisions concerning query reformulations. Other visualizations may make this task easier
for the users. This forms the starting point for the authors’ research, which is described in
detail in the next section.

5. DESIGN OF QUERY COMPARISON SYSTEMS
Our approach is based on intensive knowledge engineering with experts and a user centered
design process with several design iterations.

Interviews with domain experts from several technical fields have shown that for the
development of complex queries for typical patent information needs, it is crucial to compare
the effects of different queries and find the optimal query for a certain information need
[Struß et al. 2014]. The state of the art in patent search in general also stresses the
importance of iterative query construction and query comparison.
The study by Joho et al. (2010) emphasizes the importance of search functionalities in the
patent domain. The users differ very much from the typical web searcher in that they are
willing to spend a lot of time and effort in constructing the queries and demand a high
degree of control over them. They desire a wide variety of search possibilities and
appreciate systems that take the special requirements into account.
We developed and designed two prototypes which allow the comparison of queries from two
different points of view. The effect of changing parameters is shown to the user by different
means. The prototypes are well suited to explore and optimize complex queries in interaction
sequences.
In the first case, different queries can be directly compared to enhance the user’s
understanding concerning the scope of result sets and their overlaps or differences. The view
that was developed for this scenario is called Query Comparison. The second suggestion is to
support the patent searcher in the development of query combinations. The view Query
Combination should inspire the user to produce effective combinations of queries without
having to undertake too many iterations of query formulation. By giving the user an immediate
impression of result set sizes, unsuitable combinations of queries might be prevented,
thereby making the process more transparent and efficient. Both concepts and prototypes are
described in detail below.

Figure 7 shows the paper prototype of the Query Comparison view. On the left, the user can
choose which queries (s)he would like to compare. These queries have been executed before and
are now available in a history.
The selected queries are then depicted as symbols in the center of the screen. A query is
represented by a circle and a combination of queries (connected through Boolean operators)
looks rather cloud-like to visually remind the user of its formation. The bars below contain
the specified logic behind the comparisons of the queries. They can either be formulated
manually or loaded from earlier comparisons. It is also possible to specify a group of
default comparisons that is automatically loaded when the view opens. The result set that
fulfills the Boolean logic is calculated upon clicking the “Execute comparison” button in the
lower right and is then represented as a circle beneath the corresponding bar. The number of
documents is shown in each circle’s center, which provides the user with helpful information
concerning the further development of the search strategy. To see a list of the patents in a
new window, the user needs to double-click the circle. That way, the user can immediately
check if e.g. an expansion of a query led to more relevant results. These subsequent steps of
query evaluation are especially important in patent retrieval since the result set needs to
comprise all relevant documents but must at the same time be manageable.

Figure 7: Query Comparison view

The second visualization, Query Combination, is shown in Figure 8. Its goal is to let the
user visually explore which query combinations might lead to manageable result sets. Patent
searchers often formulate initial subqueries that describe parts of the search (e.g. certain
materials or the use of a technology) and combine them later on to final queries that
comprise all relevant aspects of the search. Since the first combination of queries usually
does not produce the final result set, it would be advantageous to specify a few candidates
for query parts and let the system calculate all combinations. The user can choose on the
left which query parts should be included, thereby triggering the system to calculate all
combinations. These are then depicted as circles where the color and the size redundantly
represent the result set sizes. All document sets can be opened and assessed by


double-clicking the particular circle. It must be noted that the calculation of all
possibilities and their visual representation should be limited to a reasonable number. The
immediate and direct visualization of the size allows the experts to easily optimize the size
of their final result set.
We conducted an informal evaluation of these two prototypes with seven professional patent
searchers. The patent searchers were recruited at the PatInfo 2014 in Ilmenau, Germany. Since
this conference is highly domain-specific, all participants were familiar with the patent
domain. The patent searchers were invited to take part in an interview that lasted about an
hour. This was structured as follows: The experts were first asked to present their
professional experience in patent retrieval so that we could learn about their background.
Then, they were given an introduction to the study and were afterwards confronted with the
prototypes and the ideas behind them. The patent searchers were allowed to ask questions and
were encouraged to give their opinion and to suggest possible improvements.

Out of the seven professionals, six experts commented favorably on the Query Comparison view.
The visualization was evaluated as meaningful and more efficient compared to current search
facilities. One expert mentioned that the idea offered more information than currently
available in the systems; another one highlighted its use as an analytical tool for a better
understanding of the result sets. Negative comments were the unclear use of color, the lack
of a drag and drop interaction and the question whether such functionality would be helpful
at that point of the research process.

The Query Combination view was rated positively by four experts. They saw value in the clear
overview, liked the aesthetic design, and argued that one would not have to try out as many
queries anymore. Also, one could see when a query would “crash”, i.e. not deliver the
anticipated number of patents. Two professionals were not sure about the benefit; one
described the size of the result set as being a “dangerous criterion” for the appropriateness
of the result set. The meaning of the color scheme was again criticized by one expert and the
request for more information concerning each set was also expressed once.
In summary, the evaluation of the ideas was very encouraging and indicated that the ideas
tackle real problems of patent searchers. The discussion with the professionals and their
suggestions will be taken into account in the further development of the visualizations.

6. CONCLUSION AND FUTURE WORK
In this paper, we argued that patent retrieval and especially query formulation is a complex
process that needs to be supported by tools. Our research aims to provide such tools on the
basis of visualizations. We presented two prototypical visualizations that give users another
perspective on query formulation and that were evaluated with seven professional patent
searchers. Since the feedback was encouraging, the prototypes will be further developed and
integrated into a fully functional system. One of the authors is currently working on the
implementation, using JavaScript and the JS library D3 for the visualizations.

Apart from the sub-process of query formulation, there are other tasks during the patent
retrieval process that can benefit from visualizations. For these scenarios, visual
prototypes will be developed and further requirements of domain experts taken into account.
The final prototype, which will consist of a number of visual tools for patent searchers,
will be thoroughly evaluated in formal user test settings.

Figure 8. Query Combination
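
The computation behind the Query Combination view of Figure 8 can be sketched as follows. This is an illustration only and not the authors' implementation, which is described as being in JavaScript with D3. The sketch assumes that every query part has already been executed and its result set is available as a set of document ids, and that parts are combined with Boolean AND; the names `combination_sizes`, `manageable` and `max_combinations` as well as the sample data are hypothetical.

```python
from itertools import combinations


def combination_sizes(query_parts, max_combinations=50):
    """Return the result set size for each AND-combination of query parts.

    `query_parts` maps a part name to the set of document ids it retrieves.
    Combinations of two or more parts are enumerated; the enumeration stops
    after `max_combinations` entries to keep the display manageable.
    """
    names = sorted(query_parts)
    sizes = {}
    for r in range(2, len(names) + 1):
        for combo in combinations(names, r):
            if len(sizes) >= max_combinations:
                return sizes
            documents = set.intersection(*(query_parts[name] for name in combo))
            sizes[combo] = len(documents)
    return sizes


def manageable(sizes, low, high):
    """Select the combinations whose result set size lies in [low, high]."""
    return [combo for combo, size in sizes.items() if low <= size <= high]


# Hypothetical query parts with the documents each one retrieves.
query_parts = {
    "material": {"P1", "P2", "P3", "P4"},
    "process": {"P2", "P3", "P5"},
    "application": {"P3", "P6"},
}

sizes = combination_sizes(query_parts)
for combo, size in sizes.items():
    print(" AND ".join(combo), size)
print(manageable(sizes, 2, 10))
```

In a visual front end, each entry of `sizes` would become one circle whose size and color encode the count, and the cap on the number of combinations reflects the remark that the calculation should be limited to a reasonable number.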


7. ACKNOWLEDGEMENTS
The authors would like to thank FIZ Karlsruhe for supporting this research through a Doctoral
Fellowship to the first author.

8. REFERENCES
1.  Ahlberg, C. & Shneiderman, B. (1994): Visual information seeking: tight coupling of
    dynamic query filters with starfield displays. In: Celebrating Interdependence. CHI’94
    conference proceedings on Human Factors in Computing Systems. Boston, New York: ACM,
    313-317.
2.  Becks, D. (2013): Die Nutzung von Head-Modifier Phrasen für Patent-Retrieval.
    Fachinformationszentrum Karlsruhe, FIZ.
3.  Bonino, D.; Ciaramella, A. & Corno, F. (2010): Review of the state-of-the-art in patent
    information and forthcoming evolutions in intelligent patent informatics. In: World
    Patent Information vol. 32, Issue 1, March 2010, 30-38.
4.  Chang, R.; Ziemkiewicz, C.; Green, T. M. & Ribarsky, W. (2009): Defining insight for
    visual analytics. Computer Graphics and Applications, IEEE, 29(2), 14-17.
5.  Hackl, R. (2009): Transparentes Ranking und Relevanz-Feedback im Patentretrieval.
    Fachinformationszentrum Karlsruhe, FIZ.
6.  Han, H.; Xu, S.; Zhu, L.; Qiao, X.; Gui, J. & Zhang, Z. (2014): Mining Technical Topic
    Networks from Chinese Patents. In: Proceedings of the First International Workshop on
    Patent Mining and Its Applications (IPaMin 2014) co-located with Konvens 2014.
    Hildesheim, Germany, October 6-7, 2014. http://ceur-ws.org/Vol-1292/
7.  Jung, H. & Ha, Y. (2015): InSciTe advisory: Prescriptive analytics service for enhancing
    research performance. In: Knowledge and Smart Technology (KST), 2015 7th International
    Conference on Knowledge and Smart Technology. Chonburi, Thailand. 28-31 Jan. 2015.
    http://dx.doi.org/10.1109/KST.2015.7051448
8.  Herr, D.; Han, Q.; Lohmann, S.; Brügmann, S. & Ertl, T. (2014): Visual Exploration of
    Patent Collections with IPC Clouds. In: Proceedings of the First International Workshop
    on Patent Mining and Its Applications (IPaMin 2014) co-located with Konvens 2014.
    Hildesheim, Germany, October 6-7, 2014. http://ceur-ws.org/Vol-1292/
9.  Joho, H.; Azzopardi, L.A. & Vanderbauwhede, W. (2010): A survey of patent users: an
    analysis of tasks, behavior, search functionality and system requirements. In:
    Proceedings of the third symposium on Information interaction in context. ACM, 13-24.
10. Jürgens, J.J.; Womser-Hacker, C. & Mandl, T. (2014): Modeling the interactive patent
    retrieval process: an adaptation of Marchionini's information seeking model. In:
    Proceedings of the 5th Information Interaction in Context Symposium (IIiX '14). New York,
    NY, USA: ACM, 247-250. http://doi.acm.org/10.1145/2637002.2637034
11. Jürgens, J.J. & Womser-Hacker, C. (2014): Limitations of Automatic Patent IR. In:
    Datenbank-Spektrum. March 2014, Volume 14, Issue 1, 5-17.
12. Kim, Y.; Tian, Y.; Jeong, Y.; Jihee, R. & Myaeng, S.-H. (2009): Automatic Discovery of
    Technology Trends from Patent Text. In: Proceedings of the 2009 ACM Symposium on Applied
    Computing. SAC. Honolulu, Hawaii, USA, March 8-12, 2009. New York, NY, USA: ACM,
    1480–1487. Available online at http://doi.acm.org/10.1145/1529282.1529611
13. Koch, S.; Bosch, H.; Giereth, M. & Ertl, T. (2009): Iterative integration of visual
    insights during patent search and analysis. In: IEEE Symposium on Visual Analytics
    Science and Technology, VAST 2009, 203-210.
14. Kutz, D. O. (2004): Examining the evolution and distribution of patent classifications.
    In: Proceedings of the Eighth International Conference on Information Visualisation, IV
    2004, IEEE, 983-988.
15. Lupu, M.; Mayer, K.; Tait, J. & Trippe, A. (2011): Current Challenges in Patent
    Information Retrieval. Springer.
16. McLean, A. W. (2000): Patent Space Visualization for Patent Retrieval. In: Proceedings of
    the ACM SIGIR 2000 Workshop on Patent Retrieval. Athens, Greece, July 28, 2000.
    http://research.nii.ac.jp/~ntcadm/sigir2000ws/
17. Questel: https://www.questel.com/
18. STN Anavist: http://www.stn-international.de/stn_anavist.html
19. Struß, J.M.; Mandl, T.; Schwantner, M. & Womser-Hacker, C. (2014): Understanding Trends
    in the Patent Domain. In: Proceedings of the First International Workshop on Patent
    Mining and Its Applications (IPaMin 2014) co-located with Konvens 2014. Hildesheim,
    Germany, October 6-7, 2014. http://ceur-ws.org/Vol-1292/
20. Widén, G.; Steinerová, J. & Voisey, P. (2014): Conceptual modelling of workplace
    information practices: a literature review. In: Proceedings of ISIC: the information
    behaviour conference, Leeds, 2-5 September, 2014: Part 1. In: Information Research vol.
    19 no. 4, December, 2014.
    http://www.informationr.net/ir/19-4/isic/isic08.html#.VSO-POESqVA