=Paper= {{Paper |id=Vol-1437/ipamin2015_submission_10 |storemode=property |title=Patent Technology Competitor Group Analysis Method Based on IPC |pdfUrl=https://ceur-ws.org/Vol-1437/ipamin2015_paper10.pdf |volume=Vol-1437 }} ==Patent Technology Competitor Group Analysis Method Based on IPC== https://ceur-ws.org/Vol-1437/ipamin2015_paper10.pdf
     Patent Technology Competitor Group Analysis Method
                        Based on IPC
                  Yuan Fu                                       Hongqi Han                                  Lijun Zhu
  Information Technology Supporting                Information Technology Supporting          Information Technology Supporting
    Center, Institute of Scientific and              Center, Institute of Scientific and        Center, Institute of Scientific and
     Technical Information of China                   Technical Information of China             Technical Information of China
   No. 15 Fuxing Rd., Haidian District,             No. 15 Fuxing Rd., Haidian District,       No. 15 Fuxing Rd., Haidian District,
      Beijing 100038, P.R. China                       Beijing 100038, P.R. China                 Beijing 100038, P.R. China
           +86 10 5888 2447                                 +86 10 5888 2447                           +86 10 5888 2447
     fuyuan2014@istic.ac.cn                                bithhq@163.com                             zhulj@istic.ac.cn


ABSTRACT
It is crucial to understand the technical groups within an industry and to master the competition in the field of technology. In order to provide valuable information for industry participants and policymakers, a process model for mining technical competitor groups based on IPC classification numbers is put forward. Firstly, the number of patents under each IPC is counted to build feature vectors for competitors. Then, the technical similarity between each pair of competitors is computed. Finally, the LinLog graph clustering algorithm is carried out to discover groups at three levels, i.e. institution, province and country. To obtain patent data for this research, an acquisition system for Chinese patent data is developed. Experiments in the field of fuel cells are conducted and the results show that the technique is helpful and effective.

Categories and Subject Descriptors
Information extraction from patent documents

General Terms
Experimentation

Keywords
LinLog; IPC classification number; Technology competitor group

Copyright © 2015 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors. Published at Ceur-ws.org
Proceedings of the Second International Workshop on Patent Mining and its Applications (IPAMIN). May 27–28, 2015, Beijing, China.

1. INTRODUCTION
Competitiveness is a typical characteristic of industrial technology (Yoon, 2008) [1]. In practice, in almost every emerging industry, some technologies become leading and predominant after developing over a period of time. Agglomeration is common in an industry. When the industrial technology agglomerates to the extent that it meets the needs of product functions well, the industry becomes mature and the industrial technology system is established. On the other hand, technology owners compete with one another and fall into different technical groups. According to Porter's theory of competitive advantage, the real competitors of a company inside an industry are the companies similar to it (Lee, 2006) [2]. These similar companies constitute a strategic group and become a sub-industry. A company faces barriers when entering a different strategic group. Therefore, companies which have very similar industrial technology are likely to be main competitors.

The clustering method of dividing data into several clusters can reflect the relational schema of the data and the knowledge hidden in the data. Competitor group analysis of industrial technology uses an appropriate clustering algorithm to divide competitors into several groups, and thus identifies similar competitors within an industry and their reciprocal influences. Technical competitor group analysis can be carried out at different levels, such as countries, provinces, and institutions. The purpose of the analysis is to understand the technical groups inside an industry, to master the competition in the field of technology from higher levels, and to provide valuable information for industry participants and policymakers.

Some common clustering algorithms can be used to identify the competitor groups of industrial technology, such as self-organizing mapping (SOM), K-means (Lee, 2009) [3], factor analysis, etc. In these models, each competitor is usually expressed as a feature vector which is measured by several technical characteristics. Similar objects are clustered into one group by calculating the distances between them. For example, (Pilkington, 2004) [4] used UPC numbers and IPC classifications respectively as the technical features for competitors and used the factor analysis model to cluster 52 companies in the field of fuel cells into five groups.

Literature studies found that many researchers have used visualization methods. Traditional clustering algorithms are based on unsupervised learning, so people often doubt the effectiveness of the analysis results. The visualization method can display abstract data using graphs or pictures because it combines computer technology and human cognitive ability effectively. Therefore, the visualization method enhances the user's confidence in the analysis results, and it has been widely accepted in recent years. Considering the advantages of visualization, the proposed method uses a graph clustering method to find technical competitor groups.

2. RELATED WORK
2.1 LinLog graph clustering methods
The LinLog algorithm was first put forward by (Noack, 2007) [5]. The aim of the algorithm is to produce ideal and visual clustering graphs. Figure 1 shows an example mentioned in Noack's paper (Noack, 2005) [6]. In the example, the Spring and LinLog algorithms were employed respectively for graph clustering using the same data. Comparatively, the LinLog algorithm clearly divided the data into two large clusters which are connected by two nodes, Dan and Upton, while the Spring algorithm positioned nodes with high degree in the center and nodes with low degree near the borders.

(a) Spring model
(b) LinLog model
Figure 1 Comparison of Spring and LinLog methods

The LinLog model does not conform to the traditional aesthetic standard; it aims to group closely connected nodes and to separate sparsely connected nodes. There are two kinds of LinLog models: the node-repulsion model and the edge-repulsion model (Coscia, 2009) [7]. The two models are based on two well-known clustering criteria respectively (Li, 2008) [8], namely density of cut and normalized cut. The normalized cut and the edge-repulsion model can produce unbiased results, so they are especially suitable for normally distributed data. In this paper, the LinLog algorithm with the Barnes and Hut hierarchical algorithm is used to draw clustered graphs (Stegmann, 2003) [9]. After the algorithm draws the graph, it also divides the nodes into several clusters.

2.2 IPC
IPC stands for the International Patent Classification. IPC is an international standard which is used by the patent offices of all countries or regions in the world. Although some countries or regions maintain their own patent classification systems, such as the CPC system of the USPTO and the ECLA system of the EPO, they also provide the IPC classification number. The Chinese patent classification system also uses the IPC system. A patent has at least one IPC classification number, but is not limited to one. In other words, some patents are assigned two or more IPC classification numbers. When there are multiple classification numbers, the first one is called the main classification number.

According to the characteristics of the technical topics of inventions, the technology fields in the IPC system are divided into 8 sections. Each section represents a kind of technology and is designated by one of the capital letters A through H, as shown in Table 1.

Table 1 Sections of technology in the IPC system
Section   Section Title
A         HUMAN NECESSITIES
B         PERFORMING OPERATIONS; TRANSPORTING
C         CHEMISTRY; METALLURGY
D         TEXTILES; PAPER
E         FIXED CONSTRUCTIONS
F         MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
G         PHYSICS
H         ELECTRICITY

The structure of the IPC classification system is hierarchical. Sections are the highest level of the hierarchy. Each section is subdivided into classes, which are the second hierarchical level. Each class comprises one or more subclasses, which are the third hierarchical level. Each subclass is broken down into subdivisions referred to as "groups", which are either main groups (the fourth hierarchical level) or subgroups (lower hierarchical levels dependent upon the main group level). A complete classification symbol comprises the combined symbols representing the section, class, subclass and main group or subgroup, as shown in Figure 2. Currently, there are approximately 70,000 subdivisions in the classification system. Figure 3 is a sample of the hierarchical structure.

Figure 2 Hierarchical structure of the IPC classification system
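The hierarchy of a complete classification symbol can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions: the function name `parse_ipc`, the regular expression, and the sample symbol "H01M 8/04" are hypothetical choices made for this example and are not part of the original method.

```python
import re
from typing import NamedTuple


class IPCSymbol(NamedTuple):
    """Hierarchical parts of a complete IPC classification symbol."""
    section: str     # first level, one letter A-H
    ipc_class: str   # second level, section + two digits
    subclass: str    # third level, class + one letter
    main_group: str  # fourth level, digits before the slash
    subgroup: str    # lower levels, digits after the slash


# Assumed textual layout: section letter, 2-digit class, subclass letter,
# optional whitespace, main group, "/", subgroup (e.g. "H01M 8/04").
_IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def parse_ipc(symbol: str) -> IPCSymbol:
    """Split a symbol into its hierarchical levels, or raise ValueError."""
    match = _IPC_PATTERN.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"not a complete IPC symbol: {symbol!r}")
    section, cls, sub, main_group, subgroup = match.groups()
    return IPCSymbol(
        section=section,
        ipc_class=section + cls,
        subclass=section + cls + sub,
        main_group=main_group,
        subgroup=subgroup,
    )


parsed = parse_ipc("H01M 8/04")
print(parsed.section, parsed.ipc_class, parsed.subclass,
      parsed.main_group, parsed.subgroup)
```

Keeping the prefix of every level (for example `subclass` as "H01M" rather than "M") makes it easy to count patents at whichever level of the hierarchy an analysis needs.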




Figure 3 A sample of IPC hierarchical structure

3. METHOD
An industrial technology field can be divided into several subfields, and each subfield may have smaller technology subfields. Technology competitors often have different research backgrounds, bases, objectives and priorities. Competitors with similar technology may be competitors or partners on the market, and they are likely to interact with each other. IPC classification codes are assigned by patent examiners with professional knowledge. Therefore, IPC provides an effective way to know the industrial hot spots and the research and development directions of technology competitors. A technology competitor tends to invest research in several technical subfields, so it is difficult to determine whether two competitors have similar research technology only from the IPC count statistics. Therefore, a graph clustering method based on the main IPC number is put forward to identify technology competitor groups within an industrial technology field. Figure 4 shows the process model of this method.

Figure 4 The process model of the graph clustering method

Firstly, a clustering level is selected from three categories: institutions, provinces and countries. Then, the number of patents under each main IPC classification number is counted for each technology competitor, and the association matrix between technology competitors and main IPC classification numbers is established (Dibattista, 1994) [10]. Each technology competitor is expressed as a feature vector whose attributes are IPC classification numbers. The value of each attribute is the number of patents under the corresponding main IPC classification number. Finally, the similarity between each pair of technology competitors is calculated using the cosine formula (Fruchterman, 1991) [11]. Let $|IPC|$ be the number of main IPC classification numbers covered by the industrial technology, and let $IPC_{ki}$ be the number of patents of competitor $i$ under the $k$-th IPC classification number. Equation (1) shows how to compute the technological similarity between competitors $i$ and $j$.

$$
sim(i, j) = \frac{\sum_{k=1}^{|IPC|} IPC_{ki} \cdot IPC_{kj}}{\sqrt{\sum_{k=1}^{|IPC|} IPC_{ki}^{2}} \; \sqrt{\sum_{k=1}^{|IPC|} IPC_{kj}^{2}}} \tag{1}
$$
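The counting step and the cosine similarity of Equation (1) can be sketched in Python as follows. This is a minimal sketch under assumptions: the input is taken to be a list of (competitor, main IPC number) pairs with one pair per patent, and the names `build_vectors`, `cosine_similarity` and the toy records are hypothetical, introduced only for illustration.

```python
import math
from collections import Counter


def build_vectors(records):
    """Count patents per main IPC number for each competitor.

    records: iterable of (competitor, main_ipc) pairs, one per patent.
    Returns {competitor: Counter({main_ipc: patent_count})}.
    """
    vectors = {}
    for competitor, main_ipc in records:
        vectors.setdefault(competitor, Counter())[main_ipc] += 1
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse count vectors (Equation 1)."""
    # A Counter returns 0 for a missing key, so IPC numbers held by only
    # one competitor contribute nothing to the dot product.
    dot = sum(count * v[ipc] for ipc, count in u.items())
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy data (invented for this example, not taken from the experiment).
records = [
    ("A", "H01M 8/04"), ("A", "H01M 8/04"), ("A", "H01M 8/10"),
    ("B", "H01M 8/04"), ("B", "H01M 8/10"),
    ("C", "C01B 3/38"),
]
vectors = build_vectors(records)
sim_ab = cosine_similarity(vectors["A"], vectors["B"])  # 3 / sqrt(10)
sim_ac = cosine_similarity(vectors["A"], vectors["C"])  # no shared IPC
print(round(sim_ab, 4), sim_ac)
```

Storing each feature vector as a sparse counter avoids building the full association matrix when most competitors hold patents under only a few IPC numbers.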



In order to obtain good visual graphics, a minimum similarity threshold (Noack, 2004) [12] should be set. Generally, the threshold is set to the mean of the similarities, yet it can also be determined by experiments. Two technology competitors are connected when the similarity between them is higher than the threshold. Using the technology competitors as nodes, the connections between them as edges, and the technological similarity values as edge weights, the LinLog graph clustering algorithm generates a visual map. The map shows the clusters for identifying competitor groups.
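The thresholding step can be sketched in Python as below. It is a hedged illustration, not the authors' implementation: it assumes the pairwise similarities are already available as a dictionary, and the function name `threshold_graph` and the toy values are hypothetical. The LinLog layout itself is not reproduced here; the sketch only prepares the weighted edge list that such an algorithm would take as input.

```python
from statistics import mean


def threshold_graph(similarities, threshold=None):
    """Keep only the edges whose similarity exceeds the threshold.

    similarities: {(node_a, node_b): similarity} for each unordered pair.
    threshold: minimum similarity; defaults to the mean of all similarities.
    Returns (edges, isolated_nodes, threshold_used).
    """
    if not similarities:
        return {}, set(), threshold
    if threshold is None:
        threshold = mean(similarities.values())
    edges = {pair: sim for pair, sim in similarities.items() if sim > threshold}
    all_nodes = {node for pair in similarities for node in pair}
    connected = {node for pair in edges for node in pair}
    # Nodes without any edge above the threshold drop out of the graph.
    return edges, all_nodes - connected, threshold


# Toy similarities (invented for this example).
sims = {
    ("A", "B"): 0.9, ("A", "C"): 0.6, ("B", "C"): 0.5,
    ("A", "D"): 0.1, ("B", "D"): 0.0, ("C", "D"): 0.2,
}
edges, isolated, used = threshold_graph(sims)
print(sorted(edges), sorted(isolated))
```

In this toy case node D has no similarity above the mean, so it is left without edges and would not appear in the clustered graph.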
4. DATA
4.1 Data acquisition
Nowadays, the patent offices of almost all major countries and regions provide patent databases on their official web sites. People can connect to these websites anytime and anywhere via the Internet to obtain patent data freely. In order to get patent data quickly, a patent data acquisition system (Laura, 2008) [13] is developed. The model of the system is shown in Figure 5. The acquisition system fetches HTML web pages which contain the patent description information from the official website of the State Intellectual Property Office of China (http://www.sipo.gov.cn/). After the patent information is collected, the system automatically obtains the description items and the legal status of patents through content analysis of the web pages and saves them into local databases.

Figure 5 The data acquisition system model

In order to test the effectiveness of the proposed method, the patent acquisition system is run to download patent data in the field of fuel cell technology. In total, 6346 patents are collected. The following preprocessing steps and the empirical analysis employ the downloaded patent data.

4.2 Data preprocess
The collected data often have some problems and must be preprocessed before the formal analysis. In the experiment, the patent data are preprocessed to meet the analysis requirements, including identifying the patent categories, the countries and provinces of applicants, and the categories of applicants.

If the first applicant is a Chinese individual or organization, the address of the applicant often contains the information of its province (Kayal, 1999) [14]. Generally, the first 6 digits of the address description are the applicant's postcode, so the province information can be obtained according to the postcode. If the first applicant is a foreign individual or organization, the priority item and the international publication item in the patent description contain the state information. For example, the priority item of a patent is "1999.8.27 JP 242132/1999", where JP means that the applicant is Japanese.

For the purpose of the research, applicants are divided into 5 categories: company, university, research institute, personal and others. The categories are identified by the keywords in the applicant names. The correspondence between keywords and categories is shown in Table 2. If there is more than one applicant in a patent description, only the first applicant is considered. For example, patent No. 00112136.7 has two applicants, Nanjing Normal University and Changchun Institute of Applied Chemistry Chinese Academy of Sciences, so the system assigns the "university" category to the patent.

Table 2 Keywords for identifying application category
Category     Key words
company      company, partnership
university   university, college
institute    research institution,
others       committee, association, foundation
personal

5. EXPERIMENTAL RESULTS
5.1 Research and development institutions
In order to have a clear visual map, we choose the top 20 research and development institutions for the graph clustering algorithm. The result is shown in Figure 6. In the map, the size of a node represents the number of granted invention patents, and the color of a node shows the group it belongs to (Reinhard, 2007) [15].

In this case, the LinLog algorithm identified two technology competitor groups (shown in Figure 6). The group with red nodes is the first group, including 10 institutions: Samsung (177), Chinese Academy of Sciences (128), Antiq (74), General Motors (56), Honda (52), Wuhan University of Technology (49), Shanghai Jiaotong University (38), Sanyo (37), BYD (32), and Harbin Institute of Technology (26). The group with orange nodes is the second group, including the 10 other institutions: Shanghai Shen-Li High Tech (194), Panasonic (154), Toyota (120), Tsinghua University (72), Nissan (62), Toshiba (48), Sunrise Power (26), Hitachi (24), LG (20) and UTC (19). The numbers in parentheses after the institution names are the numbers of their granted invention patents. Table 3 shows the English names corresponding to the Chinese names in Figure 6.

Figure 6 Clustering result of R&D institutions

Table 3 Corresponding English names of Chinese names of R&D institutions in Figure 6
Chinese name              English name
清华大学                  Tsinghua University
新源动力股份有限公司      Sunrise Power
上海神力科技有限公司      Shanghai Shen-Li High Tech
松下公司                  Panasonic
日产公司                  Nissan
丰田公司                  Toyota
日立公司                  Hitachi
东芝公司                  Toshiba
BTC 公司                  BTC
乐金电子电器有限公司      LG
上海交通大学              Shanghai Jiaotong University
中国科学院                Chinese Academy of Sciences
三星公司                  Samsung
三洋公司                  Sanyo
通用汽车公司              General Motors
胜光科技股份有限公司      Antiq
哈尔滨工业大学            Harbin Institute of Technology
比亚迪股份有限公司        BYD
本田株式会社              Honda
武汉大学                  Wuhan University of Technology

5.2 Provinces
In this case, 22 provinces in total are extracted from all fuel cell patents. The graph clustering result is shown in Figure 7. The biggest node in the picture is Shanghai, which means that the research strength of Shanghai is the strongest in China, while the smallest node is Hebei, which means that Hebei is the weakest in fuel cell research among these provinces.

Figure 7 The clustering results of provinces

Table 4 Corresponding English names of Chinese names of provinces in Figure 7
The Chinese Name   The English Name
上海               Shanghai
台湾               Taiwan
辽宁               Liaoning
江苏               Jiangsu
天津               Tianjin
山东               Shandong
安徽               Anhui
陕西               Shaanxi
四川               Sichuan
河北               Hebei
北京               Beijing
广东               Guangdong
湖北               Hubei
黑龙江             Heilongjiang
吉林               Jilin
重庆               Chongqing
湖南               Hunan
山西               Shanxi

At the province level, two technology competitor groups are identified. The group with red nodes is the first group, including 10 provinces: Shanghai (311), Taiwan (152), Liaoning (127), Jiangsu (41), Tianjin (40), Shandong (23), Shaanxi (13), Anhui (19), Sichuan (4) and Hebei (2). The group with orange nodes is the second group, including 8 provinces: Beijing (150), Guangdong (93), Hubei (58), Heilongjiang (29), Jilin (18), Chongqing (5), Hunan (4) and Shanxi (4). Because the technology similarity values of Zhejiang (16), Fujian (12), Yunnan (1) and Inner Mongolia (1) are lower than the set threshold, the clustering result does not include these provinces. As before, the number in parentheses is the number of granted patents of each province.

5.3 Countries
In this case, 17 countries or regions in total are extracted from all fuel cell patents. The graph clustering result is shown in Figure 8. The biggest node in the graph is China, with 1123 granted patents, while the smallest nodes are Denmark and Finland, with 3 granted patents each.
Figure 8 The clustering result of countries

Figure 9 The clustering figure of unconnected states

Table 5 Corresponding English names of Chinese names of R&D institutions in Figure 8
The Chinese Name   The English Name
中国               China
德国               Germany
英国               Britain
法国               France
欧洲专利局         EPO
瑞典               Sweden
荷兰               Netherlands
日本               Japan
美国               the United States
加拿大             Canada
澳大利亚           Australia
芬兰               Finland

Table 6 Corresponding English names of Chinese names of R&D institutions in Figure 9
The Chinese Name   The English Name
中国               China
德国               Germany
英国               Britain
法国               France
欧洲专利局         EPO
瑞典               Sweden
荷兰               Netherlands
日本               Japan
美国               the United States
加拿大             Canada
澳大利亚           Australia
芬兰               Finland
挪威               Norway
意大利             Italy
韩国               Korea

In the country level, four technology competitor groups are identified, containing 16 countries and regional organizations. The group with red node color represents the first group, including seven countries and regional organizations: China (1123), Germany (58), Britain (28), France (16), EPO (10), Sweden (6), and Netherlands (5). The group of orange node color represents the second group, including 5 countries: Japan (740),
the United States (292), Canada (33), Australia (4) and Finland         6. CONCLUSION
(3). The third group consists of Korea (202) and Denmark (3) two        In the paper, a graph clustering algorithm is used to obtain
countries. The fourth group includes Norway (3) and Italy (1).          technology competitor group analysis based on IPC. The
There is an edge between Norway and Italy, but there are no edges       proposed method consists of four stages. First, the clustering level
with other nodes (Figure 9), however Figure 8 can't show them           is determined. There are three levels for selected, i.e. institute,
because LinLog algorithm has problems to generate clusters with         province and country. Second, the numbers of patents are counted
unconnected graphs. The technology similarity of Austria (1) with       under each IPC for each object (competitor) in the selected level.
other countries is lower than the threshold, so the clustering figure   Third, each object is expressed with a vector, the attributes of
does not include Austria (1).                                           which are IPC classification codes, and the value of each attribute
                                                                        is corresponding patent count. Fourth, technology similarities are
                                                                        computed between each pair of competitors. Finally, Linlog
                                                                        algorithm is used to cluster competitors into groups and display
                                                                        them in a graph to improve the confidence of analysis results.
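The stages above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes cosine similarity as the technology similarity measure (the measure is not restated in this section), uses hypothetical competitors A to E with made-up patent counts, and replaces the LinLog layout with a plain connected-components search. The search is only meant to show why a competitor below the threshold drops out of the graph and why unconnected pairs end up in separate figures.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two sparse IPC-count vectors (assumed measure)."""
    dot = sum(count * v.get(ipc, 0) for ipc, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_edges(vectors, threshold):
    """Keep only competitor pairs whose similarity reaches the threshold."""
    edges = {}
    for a, b in combinations(sorted(vectors), 2):
        s = cosine(vectors[a], vectors[b])
        if s >= threshold:
            edges[(a, b)] = s
    return edges


def connected_components(nodes, edges):
    """Group nodes that are reachable from each other through kept edges."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Hypothetical competitors and patent counts per IPC code (illustrative only).
vectors = {
    "A": {"H01M8/04": 10, "H01M8/10": 5},
    "B": {"H01M8/04": 8, "H01M8/10": 6},
    "C": {"H01M4/86": 7},
    "D": {"H01M4/86": 3, "H01M4/88": 1},
    "E": {"C01B3/00": 2},
}

edges = similarity_edges(vectors, threshold=0.5)  # threshold value is assumed
components = connected_components(vectors, edges)
# Competitors with no edge above the threshold are left out of the graph.
groups = [c for c in components if len(c) > 1]
print(groups)
```

In this toy data, A and B form one component and C and D another with no edge between the two, while E has no similarity above the threshold with anyone and is dropped.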
Experimental results on fuel cells demonstrate the effectiveness of
the proposed method.

7. ACKNOWLEDGMENTS
This work is partially supported by the National Natural Science
Foundation of China (Project 71473237) and by the Key Work Project
of the Institute of Scientific and Technical Information of China
(ISTIC) (ZD2014-7-1). The authors are grateful to the National
Natural Science Foundation of China and the Ministry of Science and
Technology of China for financial support to carry out this work.

8. REFERENCES
[1] Yoon, B. and Lee, S. 2008. Patent analysis for technology
    forecasting: sector-specific applications. C. 2008 IEEE
    International Engineering Management Conference.
[2] Lee, C. K. and Ong, R. 2006. An analysis of the liquid
    crystal cell patents of LG and Samsung filed at the USPTO.
    C. 2006 IEEE International Conference on Management of
    Innovation and Technology.
[3] Lee, S., Yoon, B., and Park, Y. 2009. An approach to
    discovering new technology opportunities: Keyword-based
    patent map approach. J. Technovation. 29, 6, 481-497.
[4] Pilkington, A. 2004. Technology portfolio alignment
    commercialisation: an investigation of fuel cell patenting. J.
    Technovation. 24, 10, 761-771.
[5] Noack, A. 2007. Energy models for graph clustering. J.
    Journal of Graph Algorithms and Applications. 11, 2, 453-480.
[6] Noack, A. 2005. Energy-based clustering. C. 13th International
    Symposium on Graph Drawing.
[7] Coscia, M., Giannotti, F., and Pensa, R. 2009. Social
    network analysis as knowledge discovery process: a case
    study on digital bibliography. C. International Conference
    on Advances in Social Network Analysis and Mining
    (ASONAM).
[8] Li, Wanchun., Eades, P., and Nikolov, N. 2008. Using spring
    algorithms to remove node overlapping. C. Proceedings of
    the 2005 Asia-Pacific symposium on Information
    visualization.
[9] Stegmann, J. and Grohmann, G. 2003. Hypothesis generation
    guided by co-word clustering. J. Scientometrics. 56, 1, 111-135.
[10] Dibattista, G., Eades, P., Tamassia, R., and Tollis, I. G. 1994.
     Algorithms for drawing graphs: an annotated bibliography. J.
     Computational Geometry: Theory and Applications. 4, 5,
     235-282.
[11] Fruchterman, T. M. J. and Reingold, E. M. 1991. Graph
     drawing by force-directed placement. J. Software-Practice
     and Experience. 21, 11, 1129-1164.
[12] Noack, A. 2004. An energy model for visual graph clustering.
     C. 11th International Symposium on Graph Drawing.
[13] Laura, R. 2008. Data mining tools for technology and
     competitive intelligence. R. VTT TIEDOTTEITA Research
     notes 2451.
[14] Kayal, A. A. and Waters, R. C. 1999. An empirical
     evaluation of the technology cycle time indicator as a
     measure of the pace of technological progress in
     superconductor technology. J. IEEE Transactions on
     Engineering Management. 46, 2, 127-131.
[15] Reinhard, H., Martin, K., and Marcus, K. 2007. Patent
     indicators for the technology life cycle development. J.
     Research Policy. 36, 3, 387-398.