<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LDA v. LSA: A Comparison of Two Computational Text Analysis Tools for the Functional Categorization of Patents</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Toni Cvitanic</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bumsoo Lee</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hyeon Ik Song</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Katherine Fu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Rosen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Georgia Institute of Technology</institution>
          ,
          <addr-line>Atlanta, GA, USA (tcvitanic3, blee300, hyeoniksong) @gatech.edu (katherine.fu, david.rosen) @me.gatech.edu</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <fpage>41</fpage>
      <lpage>50</lpage>
      <abstract>
        <p>One means of supporting design-by-analogy (DbA) in practice involves giving designers efficient access to source analogies as inspiration to solve problems. The patent database has been used for many DbA support efforts, as it is a preexisting repository of catalogued technology. Latent Semantic Analysis (LSA) has been shown to be an effective computational text processing method for extracting meaningful similarities between patents for useful functional exploration during DbA. However, this has only been shown to be useful at a small scale (100 patents). Considering the vastness of the patent database and realistic exploration at a large scale, it is important to consider how these computational analyses change with orders of magnitude more data. We present an analysis of 1,000 random mechanical patents, comparing the ability of LSA with that of Latent Dirichlet Allocation (LDA) to categorize patents into meaningful groups. Resulting implications for large(r) scale data mining of patents for DbA support are detailed.</p>
      </abstract>
      <kwd-group>
        <kwd>Design-by-analogy</kwd>
        <kwd>Patent Analysis</kwd>
        <kwd>Latent Semantic Analysis</kwd>
        <kwd>Latent Dirichlet Allocation</kwd>
        <kwd>Function-based Analogy</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Exposure to appropriate analogies during early stage design has been shown to
increase the novelty, quality, and originality of generated solutions to a given
engineering design problem [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1-4</xref>
        ]. Finding appropriate analogies for a given design
problem is the largest challenge to practical implementation of DbA. There have been
numerous efforts to address this challenge with computational support for targeted
access to design repositories, which will be reviewed next. The major research gap is
in the scale of implementation, the size of the repository being accessed. To address
this gap, we compare two computational approaches to processing design repository
content (patents) for categorization and similarity judgment, with the goal of both (1)
evaluating the methods in direct juxtaposition to one another, and (2) developing a
method to examine the effectiveness of data synthesis techniques at a large scale. In
the context of the Case-Based Reasoning (CBR) Workshop on Computational
Analogy, this work directly addresses methods for identifying and retrieving
analogies, similarity measures for analogy, analogical distance metrics, and data
mining techniques for textual CBR.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>
        Patents are often used as input for design tools and repositories because of the large
amount of information captured by the patent database, already deemed novel and
useful in nature by its inherent patentability [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Patents have been used to develop
conceptual graphs of claims [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and dependency structures and content relationships
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Patents have been mapped to extract the implicit structure in the data to support
DbA [
        <xref ref-type="bibr" rid="ref10 ref8 ref9">8-10</xref>
        ], to understand overlap in IP portfolios for mergers and acquisitions [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ],
to search through patents for DbA support [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], to assist with patent infringement
analysis in the biomedical arena [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], and to build a taxonomy from the data [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
TRIZ, the Theory of Inventive Problem Solving, is one of the major efforts involving the
use of the patent database to support design [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], and a number of tools have been
developed based upon the original efforts of Genrich Altshuller [
        <xref ref-type="bibr" rid="ref15 ref16 ref17 ref18 ref19 ref20 ref21 ref22 ref23 ref24 ref25 ref26">15-26</xref>
        ]. The
computational analysis presented in this paper contributes to these efforts by showing
a direct comparison of two leading computational text analysis methods that can and do serve
as the basis of many of these and future patent-based design tools.
      </p>
      <sec id="sec-2-1">
        <title>Latent Dirichlet Allocation (LDA)</title>
        <p>
          Within the large field of “data mining,” a body of knowledge has emerged that
provides methods for managing large document archives (text corpus). Tools have
been developed that can summarize a corpus, classify articles into categories, identify
common themes, and help users find relevant articles. A specific class of methods,
called topic modeling, is particularly promising for its potential to form a readily
explorable database of patents, or other documents, for use in DbA. As one of the
leaders in this area notes [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ], “topic modeling algorithms are statistical methods that
analyze the words of the original texts to discover the themes that run through them,
how those themes are connected to each other, and how they change over time.”
        </p>
        <p>
          Topic modeling grew from LSA in several directions. Inputs to methods typically
include a word-document matrix that records the number of times a particular word is
included in one document. In the early 2000s, a different approach called Latent
Dirichlet Allocation (LDA) [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] was developed, where the basic idea is “that
documents are represented as random mixtures over latent topics, where each topic is
characterized by a distribution over words.”
        </p>
        <p>
          Many variants of LDA have been developed over the years. Of note, supervised
LDA methods enable the user to specify some topics and the corpus analysis seeks to
include these seeded topics in its overall probabilistic model [
          <xref ref-type="bibr" rid="ref29 ref30">29, 30</xref>
          ]. Another
extension is the use of nonparametric Bayesian methods to determine hierarchies of
topics from LDA results [
          <xref ref-type="bibr" rid="ref31 ref32">31, 32</xref>
          ]. More recently, several researchers have investigated
variants of PCA and other least-squares regression formulations for topic modeling,
including sparse matrix formulations. El Ghaoui et al. [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ] compared LASSO
regression [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ] and sparse PCA [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ] to LDA and found comparable efficacy at topic
modeling, but that LASSO and sparse PCA were significantly more efficient.
Another group investigated Non-negative Matrix Factorization (NMF) [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ] for
interactive topic modeling and found computational performance sufficiently fast [
          <xref ref-type="bibr" rid="ref37">37</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Latent Semantic Analysis (LSA)</title>
        <p>
          LSA is a computational text analysis tool that builds a semantic space from a corpus
of text. This semantic space is then used to compute the similarity between words,
sentences, paragraphs, or whole documents for a wide variety of purposes [
          <xref ref-type="bibr" rid="ref38 ref39 ref40 ref41">38-41</xref>
          ].
Note that this semantic space is a high-dimensional vector space (typically 300 or
more dimensions) with little inspectable value to humans; additional methods are
needed to create that inspectable structure. After performing LSA, the results can be
compared directly to LDA output, or can become input for further algorithmic
processing to understand the similarity values in a different way.
        </p>
        <p>
          In ref. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], the functional content (verbs) and surface content (nouns) of patents
were processed and mapped separately, yielding structures that have the potential to
develop a better understanding of the functional and surface similarity of patents, for
the sake of analogical knowledge transfer. Structures created with this methodology
yield spaces of patents that are meaningfully arranged into labeled clusters, and
labeled regions, based on their functional similarity or surface content similarity.
Examples show that cross-domain analogies and transfer of knowledge based on
functional similarity can be extracted from the function based structures, and even
from the surface content based structures as well.
        </p>
        <p>More generally, LSA has had a mixed reception due to its inability to match observed
data, for example in predicting human word associations. This is due to the nature of the
spatial representation that is intrinsic to LSA, which forces symmetry in the similarity of words
and imposes the triangle inequality, among other constraints. While these criticisms are
valuable, they apply at the word-to-word comparison level, and may or may not
become trivial with very large corpora and repository sizes.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Research Methods</title>
      <sec id="sec-3-1">
        <title>Theoretical Approach</title>
        <p>LSA gives a direct comparison between different patents in the form of a cosine
similarity matrix, where document similarities range from -1 (the two documents are
complete opposites) to 1 (the two documents are the same). LDA works
differently, in that it assigns the words of a document to different topics, and has
no output that directly compares documents. However, using a document vector
technique, described in a subsequent section on the implementation of LDA, it is
possible to use the data output from LDA to build a matrix of document similarities.</p>
        <p>For the purposes of comparison, the actual values within the document-similarity
matrices obtained from LSA and LDA are not important. In order to compare the two
methods, only the order of similarity between documents was used. This was done by
organizing the document-similarity matrices so that for a given column, every row
down, starting from the second, represents a document that is less similar to the
document in the first row than all of the documents above it (see Fig. 1).</p>
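        <p>As an illustration of this ordering step, the following minimal Python sketch converts a document-similarity matrix into ranked columns. It is not the code used in this study; the function name similarity_order, the higher_is_closer flag, and the example scores are assumptions introduced here for illustration.</p>
        <preformat>
```python
def similarity_order(scores, higher_is_closer=True):
    """Return one ranked column per reference document.

    scores[i][j] is the score between documents i and j. Column j of the
    result starts with j itself (the reference document) and then lists the
    other documents from most to least similar to it.
    """
    n = len(scores)
    columns = []
    for ref in range(n):
        others = [d for d in range(n) if d != ref]
        # The sort is stable, so tied documents keep their index order.
        others.sort(key=lambda d: scores[d][ref], reverse=higher_is_closer)
        columns.append([ref] + others)
    return columns


# Hypothetical cosine similarities for four documents (illustrative only).
cosine = [
    [1.0, 0.8, 0.1, 0.3],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.3, 0.1, 0.7, 1.0],
]
order = similarity_order(cosine)
print(order[0])  # reference document 0, then documents 1, 3, 2
```
        </preformat>
        <p>Setting higher_is_closer to False orders by ascending score, which suits distance-like scores where a lower value means greater similarity.</p>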
        <p>By comparing the order in which documents were rated on similarity between LSA
and LDA, it is possible to judge how similar or different the results of the two
methods are. In the case that the two methods yield substantially different results, a
qualitative analysis can be done to determine if one method better sorts based on
functionality. There are many ways to go about this, but one effective check is to look
at the top 50 rows in the document-similarity matrices, and count the average number
of patents with the same core functions (determined by the first author, not automated),
then see which method yielded a greater number.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Data Selection</title>
        <p>Patents were selected from a set of all US CPC patents found in the bulk data storage
system of the United States Patent and Trademark Office (USPTO) at
https://data.uspto.gov/data2/patent/classification/cpc/. For this study, only patents
from the CPC section F, the section for mechanical patents, were used. Any patents
that were cross-listed under multiple CPC sections were removed from the study’s
dataset in order to reduce the scope of the data for document matching, in an effort to
get more coherent results from both the LDA and LSA methods. In addition,
any withdrawn patents were removed from the dataset. Finally, patents numbered
below 3,930,270 are not accessible through the USPTO online search and
were removed.</p>
        <table-wrap>
          <label>Fig 1.</label>
          <caption>
            <p>Example of Document Comparison Matrix</p>
          </caption>
          <table>
            <thead>
              <tr><th>Document 1</th><th>Document 2</th><th>Document 3</th><th>Document 4</th></tr>
            </thead>
            <tbody>
              <tr><td>Most Similar</td><td>Most Similar</td><td>Most Similar</td><td>Most Similar</td></tr>
              <tr><td>2nd Most Similar</td><td>2nd Most Similar</td><td>2nd Most Similar</td><td>2nd Most Similar</td></tr>
              <tr><td>3rd Most Similar</td><td>3rd Most Similar</td><td>3rd Most Similar</td><td>3rd Most Similar</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Once the study dataset was finalized, four patents were selected manually for
a small-scale test. For the large-scale test, 996 patents were selected using a
pseudo-random number generator built into MATLAB. 996 patents were chosen to
have 1000 patents, including the four from the small-scale test.</p>
      </sec>
      <sec>
        <title>Data Pre-Processing</title>
        <p>Both LDA and LSA take a word-by-document matrix as an input. Each row represents
a word from the entire dataset, and each column represents a patent. Each location in
the matrix has a number that corresponds to the number of times the word designated
by the row appeared in the document designated by the column. Before this word-by-document
matrix was created, however, some pre-processing was done on the data.</p>
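        <p>The matrix layout described above can be sketched as follows, assuming the documents have already been split into word tokens. The function name word_document_matrix and the toy documents are hypothetical, and this is not the implementation used in the study.</p>
        <preformat>
```python
from collections import Counter


def word_document_matrix(documents):
    """Build a word-by-document count matrix from tokenized documents.

    documents is a list of token lists. Returns (vocabulary, matrix), where
    matrix[r][c] is the number of times vocabulary[r] occurs in documents[c].
    """
    counts = [Counter(tokens) for tokens in documents]
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for words that a document does not contain.
    matrix = [[c[word] for c in counts] for word in vocabulary]
    return vocabulary, matrix


# Toy tokenized "patents" (hypothetical text, not from the study dataset).
docs = [
    ["bow", "limb", "string", "bow"],
    ["flywheel", "engine", "beam"],
    ["bow", "string", "fold"],
]
vocab, wdm = word_document_matrix(docs)
for word, row in zip(vocab, wdm):
    print(word, row)
```
        </preformat>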
        <p>First, a program was created to read the patents and retain only words from the
abstract, description, and claims sections. These sections are the most representative
of the mechanical nature of a patent. In addition, symbols and numbers were removed
from the dataset. Next, the entire dataset was run through a spellchecker to remove
any misspelled words. Then, words contained in a list of “stop words” were removed,
which are words deemed to have no value in describing the mechanical qualities of a
patent. For even further reduction, any words common to 90% or more of the patents
were removed, further reducing words that do not distinguish one patent from
another. The 90% cutoff was chosen through experimentation. When the cutoff was lower than
~80%, words that are important mechanical descriptors were excluded. The cutoff was
set to 90% to include a margin of error.</p>
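        <p>A minimal sketch of the filtering steps above, assuming simple regular-expression tokenization and a short placeholder stop-word list. The study's actual stop-word list and spellchecker are not specified here, so the spell-checking step is omitted. The names tokenize, filter_tokens, and STOP_WORDS are illustrative.</p>
        <preformat>
```python
import re

# Assumed, abbreviated stop-word list; the study's actual list is not given.
STOP_WORDS = {"the", "a", "of", "and", "is", "to", "in"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words only (drops symbols and numbers)."""
    return re.findall(r"[a-z]+", text.lower())


def filter_tokens(documents, stop_words=STOP_WORDS, cutoff=0.9):
    """Remove stop words and words present in at least `cutoff` of the documents."""
    tokenized = [tokenize(text) for text in documents]
    n_docs = len(tokenized)
    doc_freq = {}
    for tokens in tokenized:
        for word in set(tokens):  # count each word once per document
            doc_freq[word] = doc_freq.get(word, 0) + 1
    too_common = {w for w, df in doc_freq.items() if df >= cutoff * n_docs}
    drop = set(stop_words) | too_common
    return [[w for w in tokens if w not in drop] for tokens in tokenized]


# Hypothetical snippets; "device" occurs in every document and is removed.
texts = [
    "The device is a bow with 2 limbs.",
    "A folding bow device.",
    "The device drives a flywheel.",
    "Engine device and beam.",
]
filtered = filter_tokens(texts)
print(filtered)
```
        </preformat>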
      </sec>
      <sec id="sec-3-3">
        <title>Latent Semantic Analysis (LSA)</title>
        <p>LSA gives a direct output of document similarities in the form of a cosine similarity
matrix. Values range from -1 to 1, where -1 represents two documents that are
complete opposites, and 1 represents two identical documents. This output is
sufficient to create a matrix whose columns each represent a document and whose
rows contain documents in their order of similarity to the document associated with
the column they are in. This matrix is the desired output for this study, and no further
processing is needed once it is obtained.</p>
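        <p>For illustration, cosine similarity between document vectors can be computed as in the sketch below. The singular value decomposition that LSA uses to build the semantic space is omitted; the example vectors are hypothetical stand-ins for the documents' coordinates in that space, and the function names are assumptions.</p>
        <preformat>
```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cosine_matrix(vectors):
    """Symmetric document-by-document cosine similarity matrix."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Hypothetical 3-dimensional document vectors; in LSA these would be the
# documents' coordinates in the reduced semantic space after the SVD step.
doc_vectors = [
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
sim = cosine_matrix(doc_vectors)
print(sim[0])
```
        </preformat>
        <p>In this toy example the second vector points in the same direction as the first, the third points in the opposite direction, and the fourth is orthogonal to it, which covers the two ends and the midpoint of the similarity range.</p>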
      </sec>
      <sec id="sec-3-4">
        <title>Latent Dirichlet Allocation (LDA)</title>
        <p>Unlike LSA, LDA does not directly output document similarities. Instead, LDA
outputs a matrix, z, whose rows represent all the words in the dataset and whose columns
represent all the documents. Each value in the matrix is the topic to which the LDA
algorithm assigns the word of that row in the document of that column. The user
specifies the total number of topics that the words are sorted into, and each value in
the matrix ranges between 0 and the user-defined number of topics.</p>
        <p>LDA was run with different numbers of topics until a good topic range was found
for the dataset. This range is determined by looking at the word-topic assignments
output for each number of topics. If individual topics are judged, subjectively, to
contain groups of words that should belong to more than one topic, then the algorithm
is run again with more topics. If there are many empty or sparsely populated topics,
the algorithm is run with fewer topics. For the small-scale test, experiments were run
with k = 2, 4, and 6 to see what number of topics is appropriate for the comparison.
For the large-scale test (1000 patents), 150 topics provided the best sorting.</p>
        <p>In order to compare documents, it is necessary to represent a document’s subject
matter by the topics found within that document. For this study, this was done using
the “document vector” method. In this method, each document is represented as a
vector whose length is equal to the total number of topics. Each component of the
vector represents a topic, so the first component represents topic 1, the second, topic 2,
and so on. Each component of the document vector is then assigned a value that is
equal to the number of words in the document that were assigned to that topic. So, if a
document had 20 words assigned to topic 3, the third component of the vector would
have a value of 20. Next, the vector is normalized by dividing it by the total number of
words in the document it represents. In order to compare two documents, one subtracts
their document vectors, then takes the magnitude of the resulting vector, the L2 norm.
The lower the magnitude of this resulting vector, the more similar the documents are.</p>
        <p>The magnitudes of the differences of these document vectors can be considered
similarity scores, where a lower score corresponds to a higher similarity. Having these
scores, it is possible to create a matrix which orders documents based on their
similarity, the same way it was done for the LSA output.</p>
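        <p>The document vector method can be sketched as follows, assuming the topic assignments for each document's words have been read from the LDA output as a list of topic labels numbered from 1. The function names and the example assignments are hypothetical.</p>
        <preformat>
```python
import math


def document_vector(topic_assignments, n_topics):
    """Normalized topic-count vector for one document.

    topic_assignments holds one topic label (assumed to run from 1 to
    n_topics) per word in the document.
    """
    if not topic_assignments:
        raise ValueError("document has no assigned words")
    counts = [0] * n_topics
    for topic in topic_assignments:
        if topic not in range(1, n_topics + 1):
            raise ValueError("topic label out of range")
        counts[topic - 1] += 1
    total = len(topic_assignments)
    return [c / total for c in counts]


def vector_distance(u, v):
    """L2 norm of the difference of two vectors; lower means more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical assignments: 20 words in topic 3 and 5 words in topic 1, etc.
doc_a = document_vector([3] * 20 + [1] * 5, n_topics=4)
doc_b = document_vector([3] * 16 + [1] * 4, n_topics=4)
doc_c = document_vector([2] * 10, n_topics=4)
print(vector_distance(doc_a, doc_b), vector_distance(doc_a, doc_c))
```
        </preformat>
        <p>Because each vector is divided by the document's word count, doc_a and doc_b have the same topic proportions despite their different lengths, so their distance is smaller than the distance between doc_a and doc_c.</p>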
      </sec>
      <sec id="sec-3-5">
        <title>Data Post-Processing</title>
        <p>The final step is to compare the document similarity matrices output by LSA and
LDA. If only minor differences can be found between them, it can be concluded that
LSA and LDA are more or less equal in their ability to sort mechanical patents.
However, if the two matrices differ significantly, the more effective algorithm is
determined by looking at the top 50 documents in each column of the matrices, and
counting the number of documents with the same core functions. The core functions
of a mechanical patent must be subjectively determined.</p>
      </sec>
      <sec id="sec-3-6">
        <title>Color Coded Comparison for Large Scale Test</title>
        <p>To compare the document similarity matrices output by the LSA and LDA
algorithms, columns with the same reference document in the LSA and LDA output
matrices were compared individually. Each column of the matrices was divided into
groups of 100, starting from most similar to least similar. The LSA group and the LDA
group at the same rank position are compared directly by counting how many
documents appear in both. The group is assigned a color
according to its percentage of matching documents, and each document in that group shows the
same color in the document similarity color matrix. The range of percentage match
for each color is shown in Fig. 4. Since the highest number of matches in any one group
was under 35, each color spans a 3 percent range, except for the last, dark green color.</p>
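        <p>The group-by-group comparison can be sketched as follows. The function name group_matches and the short example rankings (with a group size of 2 rather than 100) are assumptions for illustration; the mapping from percentages to the colors of Fig. 4 is not reproduced.</p>
        <preformat>
```python
def group_matches(column_a, column_b, group_size=100):
    """Percentage of shared documents in each pair of same-rank groups.

    column_a and column_b are two rankings of the same documents for the same
    reference document, ordered from most to least similar.
    """
    if len(column_a) != len(column_b):
        raise ValueError("both rankings must have the same length")
    percentages = []
    for start in range(0, len(column_a), group_size):
        group_a = set(column_a[start:start + group_size])
        group_b = set(column_b[start:start + group_size])
        shared = len(group_a.intersection(group_b))
        percentages.append(100.0 * shared / len(group_a))
    return percentages


# Hypothetical rankings of six documents from the two methods.
lsa_column = [1, 2, 3, 4, 5, 6]
lda_column = [2, 1, 3, 6, 5, 4]
matches = group_matches(lsa_column, lda_column, group_size=2)
print(matches)  # [100.0, 50.0, 50.0]
```
        </preformat>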
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <sec id="sec-4-1">
        <title>Small Scale Test Case</title>
        <p>For the small-scale test case, the LSA and LDA algorithms were run on the full patent
text, the functional (verb-based) patent text, and the surface (noun-based) patent text to
compare the results of the two methods. The patents chosen for this test case
were two pairs of functionally similar technologies, as shown in Fig. 2, with Docs 1 and
2 relating to archery, and Docs 3 and 4 relating to power generation. By performing
this very small-scale test case, we hoped to be able to dissect why LSA and LDA
might behave differently in their categorization of patents. The LDA algorithm was
run with three different numbers of topics: 2, 4, and 6. The result from LDA
with 4 topics was most similar to the result of LSA. There was no particular pattern
or similarity between the results from LSA and from LDA with 2 or 6 topics,
which indicates that the number of topics is a crucial parameter for the categorization.</p>
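        <table-wrap>
          <label>Fig 2.</label>
          <caption>
            <p>Patents Included in Small Scale Test Case</p>
          </caption>
          <table>
            <thead>
              <tr><th>Doc #</th><th>Patent #</th><th>Patent Title</th></tr>
            </thead>
            <tbody>
              <tr><td>1</td><td>3942506</td><td>Demountable archery bow</td></tr>
              <tr><td>2</td><td>3957027</td><td>Take-down and folding bow</td></tr>
              <tr><td>3</td><td>7174710</td><td>Nelson flywheel power plant improvement</td></tr>
              <tr><td>4</td><td>7363760</td><td>Thermodynamic free walking beam engine</td></tr>
            </tbody>
          </table>
        </table-wrap>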
        <p>The results from the small-scale test case are shown in Fig. 3A, 3B, and 3C. The
first row of each table is named “reference document” in this paper, as all the
subsequent documents are ordered below it depending on their similarity to that
reference document. The full patent text comparison between the two algorithms shows the
best match, with a minor discrepancy in the last two rows of the second column, as
shown in Fig. 3A. The order of Docs 3 and 4 is switched between the two methods, which,
given that they are both power generation technologies, is not alarming. The
functional patent text comparison in Fig. 3B shows the next best match. Although
there is a discrepancy in every column of the matrix, it is interesting to note that the
document ranked most similar in each column by one method is ranked least similar by the
other, while the two other documents are in the same order. In the first column of
the table, Doc 2 in LSA is the most similar text to Doc 1 while it is the least similar
text in LDA’s result (as shown by the red outlines in Fig. 3B). The same pattern
applies to all columns. The surface patent text comparison in Fig. 3C shows no
similarity or pattern between the results of the two methods. Although the LSA result
in Fig. 3C is identical to the result of the full patent text LSA results in Fig. 3A, there
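are too many dissimilarities to compare to the LDA results in Fig. 3C.</p>
        <table-wrap>
          <label>Fig 3A.</label>
          <caption>
            <p>Full Text</p>
          </caption>
          <table>
            <thead>
              <tr><th colspan="4">LSA</th><th colspan="4">LDA (4 Topics)</th></tr>
            </thead>
            <tbody>
              <tr><td>Doc 1</td><td>Doc 2</td><td>Doc 3</td><td>Doc 4</td><td>Doc 1</td><td>Doc 2</td><td>Doc 3</td><td>Doc 4</td></tr>
              <tr><td>Doc 2</td><td>Doc 1</td><td>Doc 4</td><td>Doc 3</td><td>Doc 2</td><td>Doc 1</td><td>Doc 4</td><td>Doc 3</td></tr>
              <tr><td>Doc 4</td><td>Doc 3</td><td>Doc 2</td><td>Doc 2</td><td>Doc 4</td><td>Doc 4</td><td>Doc 2</td><td>Doc 2</td></tr>
              <tr><td>Doc 3</td><td>Doc 4</td><td>Doc 1</td><td>Doc 1</td><td>Doc 3</td><td>Doc 3</td><td>Doc 1</td><td>Doc 1</td></tr>
            </tbody>
          </table>
        </table-wrap>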
      </sec>
      <sec id="sec-4-2">
        <title>Large Scale Test Case</title>
        <p>In the large-scale test case, 1000 random mechanical patent documents, including the 4
patent documents used earlier in the small-scale test case, were selected and processed with the LSA
and LDA algorithms. In the large-scale test, the LDA algorithm was run with
150 topics. The results are shown in Fig. 4A, B and C. For all three types of text
comparison, the results show more green (above a 30 percent match) in the
first group of 100 than in the rest. The first group in each column is the group
of the 100 documents that each algorithm ranked as most similar to the
reference document. Also, in all cases, more similarity appeared in the
first and the last groups, and less similarity appeared in the middle region.</p>
        <p>For in-depth analysis, the results for the large-scale test were analyzed to
determine whether they are consistent with those of the small-scale test. The reference
document of the fifth column is Doc 1 from the small-scale test. The LSA results
of the small and large-scale test agreed in terms of the order of the four selected
documents. However, the LSA result in the large-scale test was not as effective
in sorting the patents by functional similarity. Doc 2, which is thought to be
the most similar document to the reference document, was ranked only 231st in similarity
in the large-scale test.</p>
        <p>In contrast, the LDA result for the functional patent text was better at sorting the
functionally related documents into the first group. As in the first column of the LSA
result, the reference document is Doc 1, which describes the functional component of a
bow. The fifth column also includes two more bow-related documents in the first
group of 100, ranked 20th and 23rd. However, this was only true for this
column, and no similar pattern was observed in the other three examples.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <sec id="sec-5-1">
        <title>Comparison of LSA and LDA for Small Scale Test Results</title>
        <p>LDA requires a defined number of topics as an input parameter. For the small-scale
test, Fig. 3A and 3B indicate that LSA and LDA with 4 topics give similar results for
the full patent text and functional patent text, respectively. The consistency in both
cases suggests that using 4 topics is more appropriate than 2 or 6 topics as the input
parameter when LDA is performed with 4 patent documents. It is still unknown
whether LDA is effective in categorizing patent documents and how an appropriate
number of topics can be determined. Therefore, the empirical finding in the
small-scale test could be important in deciding whether LDA is appropriate for
analyzing patents. Given that Fu et al. succeeded in applying LSA as an effective
method for categorizing patents at a small scale, the underlying hypothesis is that it
could be more effective than LDA at a large scale.</p>
        <p>The functional text comparison in the small-scale test shows an interesting pattern
in the order of the doc-doc similarity matrix. Although all columns in the matrix
shown in Fig. 3B show discrepancies, the results resemble each other if the order
of the most similar document and least similar document in a column are switched.
The fact that the same rule applies to every column in Fig. 3B shows that there are
similar documents in the middle region of the comparison matrix, and completely
different documents in the regions farther away from the middle. When the
documents are analyzed by function, LSA is more accurate than LDA in sorting
them. For instance, Doc 2 should have matched functionally with Doc 1, as
they both describe the components of a bow. However, this is only true of the results for
LSA. Further research on small- and large-scale tests is required to draw conclusions
about these algorithms. Unlike the comparison of the functional text in the
small-scale test, the surface text does not show any similarity between the results of the
two methods.</p>
      </sec>
      <sec>
        <title>Comparison of LSA and LDA for Large Scale Test Results</title>
        <p>For all cases, the similar documents are more apparent in the top and bottom groups of
100 patent documents. The fact that both methods agree on the most and least similar
documents can help designers to look at the two groups for near-field or far-field
analogies. Depending on the goal of the designer, they could analyze similar
documents or dissimilar documents during design ideation. However, the conclusion
that the groups are internally similar among the patents contained within them is
tenuous, as the best percentage match is approximately 30% and the rest are mostly
below 10%. This may be due to the lack of well-established methods to choose the
number of LDA topics, or to the diverse nature of language and particularly of
articulation of technologies within patents. Especially for the large-scale test, it is
unrealistic to test different numbers of topics until the best result is achieved.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Future Directions</title>
        <p>Future work includes examining the data more closely to understand why and how
patents are categorized, and how that changes with scale. A method to determine the
best number of topics for the LDA algorithm is much needed. Ultimately, the goal is
to make a recommendation regarding the underlying method that should be used to
analyze and categorize patents based on their textual content, but further work must be
done prior to that recommendation.</p>
        <p>By mining the textual content of the patent database at an increasing scale, we can
start to access the wealth of knowledge contained in the historical records of invention
and technology. The computational techniques compared in this paper provide a way
to quantitatively evaluate similarity (and thus distance) between source analogies. In
the future, when deployed at a large scale with interactive data visualization, these
techniques will open up computationally supported analogy to a much larger audience.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>H.</given-names>
            <surname>Casakin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Goldschmidt</surname>
          </string-name>
          ,
          <article-title>"Expertise and the Use of Visual Analogy: Implications for Design Education,"</article-title>
          <source>Design Studies</source>
          , vol.
          <volume>20</volume>
          , pp.
          <fpage>13</fpage>
          -
          <lpage>175</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>B. T.</given-names>
            <surname>Christensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Schunn</surname>
          </string-name>
          ,
          <article-title>"The Relationship of Analogical Distance to Analogical Function and Preinventive Structure: The Case of Engineering Design,"</article-title>
          <source>Mem. &amp; Cog.</source>
          , vol.
          <volume>35</volume>
          , pp.
          <fpage>29</fpage>
          -
          <lpage>38</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>I.</given-names>
            <surname>Tseng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Moss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cagan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Kotovsky</surname>
          </string-name>
          ,
          <article-title>"The role of timing and analogical similarity in the stimulation of idea generation in design,"</article-title>
          <source>Design Stud.</source>
          , vol.
          <volume>29</volume>
          , pp.
          <fpage>203</fpage>
          -
          <lpage>221</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>P.</given-names>
            <surname>Leclercq</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Heylighen</surname>
          </string-name>
          ,
          <article-title>"5.8 Analogies per Hour,"</article-title>
          in
          <source>Art. Int. in Des. '02</source>
          ,
          <publisher-name>Springer Netherlands</publisher-name>
          ,
          <year>2002</year>
          , pp.
          <fpage>285</fpage>
          -
          <lpage>303</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>I.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Na</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>"Cluster-based Patent Retrieval,"</article-title>
          <source>Inf. Proc. &amp; Mgmt.</source>
          , vol.
          <volume>43</volume>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>S.-Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.-W.</given-names>
            <surname>Soo</surname>
          </string-name>
          ,
          <article-title>"Extract conceptual graphs from plain texts in patent claims,"</article-title>
          <source>Engineering Applications of Artificial Intelligence</source>
          , vol.
          <volume>25</volume>
          , pp.
          <fpage>874</fpage>
          -
          <lpage>887</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>G.</given-names>
            <surname>Ferraro</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Wanner</surname>
          </string-name>
          ,
          <article-title>"Towards the derivation of verbal content relations from patent claims using deep syntactic structures,"</article-title>
          <source>Knowledge-Based Sys.</source>
          , vol.
          <volume>24</volume>
          , pp.
          <fpage>1233</fpage>
          -
          <lpage>44</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>K.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <article-title>"Discovering and Exploring Structure in Design Databases and Its Role in Stimulating Design by Analogy,"</article-title>
          <source>Ph.D. Dissertation</source>
          , Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, PA, USA,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>K.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cagan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kotovsky</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Wood</surname>
          </string-name>
          ,
          <article-title>"Discovering Structure in Design Databases Through Function and Surface Based Mapping,"</article-title>
          <source>Journal of Mech. Design</source>
          , In Press,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>K.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cagan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kotovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Schunn</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Wood</surname>
          </string-name>
          ,
          <article-title>"The Meaning of "Near" and "Far": The Impact of Structuring Design Databases and the Effect of Distance of Analogy on Design Output,"</article-title>
          <source>ASME Journal of Mechanical Design</source>
          , In Press,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>M.</given-names>
            <surname>Moehrle</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Geritz</surname>
          </string-name>
          ,
          <article-title>"Developing acquisition strategies based on patent maps,"</article-title>
          presented at the
          <conf-name>13th IAMOT</conf-name>
          , Washington, D.C.,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>S.</given-names>
            <surname>Koch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bosch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Giereth</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Ertl</surname>
          </string-name>
          ,
          <article-title>"Iterative Integration of Visual Insights during Patent Search and Analysis,"</article-title>
          presented at the
          <conf-name>IEEE Symposium on Visual Analytics Science and Technology</conf-name>
          , Atlantic City, NJ, USA,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>S.</given-names>
            <surname>Mukherjea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bhuvan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Kankar</surname>
          </string-name>
          ,
          <article-title>"Information Retrieval and Knowledge Discovery Utilizing a BioMedical Patent Semantic Web,"</article-title>
          <source>IEEE Trans. on Know. &amp; Data Eng.</source>
          , vol.
          <volume>17</volume>
          , pp.
          <fpage>1099</fpage>
          -
          <lpage>1110</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>S.</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Raghavan</surname>
          </string-name>
          ,
          <article-title>"Scalable Feature Selection, Classification and Signature Generation for Organizing Large Text Databases into Hierarchical Topic Taxonomies,"</article-title>
          <source>The VLDB Journal</source>
          , vol.
          <volume>7</volume>
          , pp.
          <fpage>163</fpage>
          -
          <lpage>178</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Altshuller</surname>
          </string-name>
          and
          <string-name>
            <given-names>R. B.</given-names>
            <surname>Shapiro</surname>
          </string-name>
          ,
          <article-title>"On the psychology of inventive creation (in Russian),"</article-title>
          <source>The Psychological Issues</source>
          , vol.
          <volume>6</volume>
          , pp.
          <fpage>37</fpage>
          -
          <lpage>39</lpage>
          ,
          <year>1956</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>R.</given-names>
            <surname>Duran-Novoa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Leon-Rovira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Aguayo-Tellez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Said</surname>
          </string-name>
          ,
          <article-title>"Inventive Problem Solving Based on Dialectical Negation, Using Evolutionary Algorithms and TRIZ Heuristics,"</article-title>
          <source>Computers in Industry</source>
          , vol.
          <volume>62</volume>
          , pp.
          <fpage>437</fpage>
          -
          <lpage>445</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>N. V.</given-names>
            <surname>Hernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. C.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Okudan</surname>
          </string-name>
          ,
          <article-title>"Systematic Ideation Effectiveness Study of TRIZ,"</article-title>
          presented at the
          <conf-name>ASME IDETC/CIE</conf-name>
          , Chicago, IL, USA,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>N. V.</given-names>
            <surname>Hernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. C.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Okudan</surname>
          </string-name>
          ,
          <article-title>"Experimental Assessment of TRIZ Effectiveness in Idea Generation,"</article-title>
          presented at
          <conf-name>ASEE AC</conf-name>
          , San Antonio, TX, USA,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>V.</given-names>
            <surname>Krasnoslobodtsev</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Langevin</surname>
          </string-name>
          ,
          <article-title>"TRIZ Application in Development of Climbing Robots,"</article-title>
          presented at the
          <conf-name>First TRIZ Symposium</conf-name>
          , Japan,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <article-title>"Patent Analysis with Text Mining for TRIZ,"</article-title>
          presented at the
          <conf-name>IEEE ICMIT</conf-name>
          , Bangkok, Thailand,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>T.</given-names>
            <surname>Nakagawa</surname>
          </string-name>
          ,
          <article-title>"Creative Problem-Solving Methodologies TRIZ/USIT: Overview of my 14 Years in Research, Education, and Promotion,"</article-title>
          <source>The Bulletin of the Cultural and Natural Sciences in Osaka Gakuin University</source>
          , vol.
          <volume>64</volume>
          ,
          <year>March 2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Nix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sherret</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. B.</given-names>
            <surname>Stone</surname>
          </string-name>
          ,
          <article-title>"A Function Based Approach to TRIZ,"</article-title>
          presented at the
          <conf-name>ASME IDETC/CIE</conf-name>
          , Washington, D.C., USA,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cha</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>"A Conceptual Design Model Using Axiomatic Design, Functional Basis and TRIZ,"</article-title>
          in
          <source>Proceedings of the 2007 IEEE IEEM</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <given-names>R.</given-names>
            <surname>Houssin</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Coulibaly</surname>
          </string-name>
          ,
          <article-title>"An Approach to Solve Contradiction Problems for Safety Integration in Innovative Design Process,"</article-title>
          <source>Comp. in Industry</source>
          , vol.
          <volume>62</volume>
          , pp.
          <fpage>398</fpage>
          -
          <lpage>406</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Moreno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hernandez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K. L.</given-names>
            <surname>Wood</surname>
          </string-name>
          ,
          <article-title>"Creativity in Transactional Design Problems: Non-Intuitive Findings of an Expert Study Using Scamper,"</article-title>
          presented at the
          <conf-name>Int. Design Conference, Human Behav. and Des.</conf-name>
          , Dubrovnik, Croatia,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <given-names>A.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. W.</given-names>
            <surname>Hill</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Agogino</surname>
          </string-name>
          ,
          <article-title>"A Document Analysis Method for Characterizing Design Team Performance,"</article-title>
          <source>Journal of Mechanical Design</source>
          , vol.
          <volume>31</volume>
          , pp.
          <fpage>011010</fpage>
          -
          <lpage>8</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Blei</surname>
          </string-name>
          ,
          <article-title>"Probabilistic Topic Models,"</article-title>
          <source>Comm. of the ACM</source>
          , vol.
          <volume>55</volume>
          , pp.
          <fpage>77</fpage>
          -
          <lpage>84</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Blei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Ng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. I.</given-names>
            <surname>Jordan</surname>
          </string-name>
          ,
          <article-title>"Latent Dirichlet Allocation,"</article-title>
          <source>J. Mach. Learn. Res.</source>
          , vol.
          <volume>3</volume>
          , pp.
          <fpage>993</fpage>
          -
          <lpage>1022</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <given-names>J.</given-names>
            <surname>McAuliffe</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Blei</surname>
          </string-name>
          ,
          <article-title>"Supervised topic models,"</article-title>
          <source>Adv. Neur. Inf. Proc. Sys.</source>
          , pp.
          <fpage>121</fpage>
          -
          <lpage>8</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <given-names>J.</given-names>
            <surname>Jagarlamudi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Daume</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Udupa</surname>
          </string-name>
          ,
          <article-title>"Incorporating Lexical Priors into Topic Models,"</article-title>
          presented at the
          <conf-name>EACL '12</conf-name>
          , Avignon, France,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Blei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. L.</given-names>
            <surname>Griffiths</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. I.</given-names>
            <surname>Jordan</surname>
          </string-name>
          ,
          <article-title>"The Nested Chinese Restaurant Process and Bayesian Nonparametric Inference of Topic Hierarchies,"</article-title>
          <source>J. ACM</source>
          , vol.
          <volume>57</volume>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <given-names>D.</given-names>
            <surname>Newman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Asuncion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Smyth</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Welling</surname>
          </string-name>
          ,
          <article-title>"Distributed Algorithms for Topic Models,"</article-title>
          <source>J. Mach. Learn. Res.</source>
          , vol.
          <volume>10</volume>
          , pp.
          <fpage>1801</fpage>
          -
          <lpage>1828</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <given-names>L.</given-names>
            <surname>El Ghaoui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Pham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.-C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.-A.</given-names>
            <surname>Duong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Bhaduri</surname>
          </string-name>
          ,
          <article-title>"Understanding Large Text Corpora via Sparse Machine Learning,"</article-title>
          <source>Stat. Anal. &amp; Data Mining</source>
          , vol.
          <volume>6</volume>
          , pp.
          <fpage>221</fpage>
          -
          <lpage>242</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <given-names>R.</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          ,
          <article-title>"Regression shrinkage and selection via the LASSO,"</article-title>
          <source>J R Stat Soc Ser B</source>
          , vol.
          <volume>58</volume>
          , pp.
          <fpage>267</fpage>
          -
          <lpage>288</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <given-names>H.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hastie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          ,
          <article-title>"Sparse principal component analysis,"</article-title>
          <source>J Comp. Graph. Stat.</source>
          , vol.
          <volume>15</volume>
          , pp.
          <fpage>265</fpage>
          -
          <lpage>286</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
          <string-name>
            <given-names>D. D.</given-names>
            <surname>Lee</surname>
          </string-name>
          and
          <string-name>
            <given-names>H. S.</given-names>
            <surname>Seung</surname>
          </string-name>
          ,
          <article-title>"Learning the parts of objects by non-negative matrix factorization,"</article-title>
          <source>Nature</source>
          , vol.
          <volume>401</volume>
          , pp.
          <fpage>788</fpage>
          -
          <lpage>791</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          37.
          <string-name>
            <given-names>J.</given-names>
            <surname>Choo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. K.</given-names>
            <surname>Reddy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <article-title>"UTOPIAN: User-Driven Topic Modeling Based on Interactive Nonnegative Matrix Factorization,"</article-title>
          <source>IEEE Trans. Vis. and Comp. Graph.</source>
          , vol.
          <volume>19</volume>
          , pp.
          <fpage>1992</fpage>
          -
          <lpage>2001</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          38.
          <string-name>
            <given-names>S.</given-names>
            <surname>Deerwester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Dumais</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. W.</given-names>
            <surname>Furnas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T. K.</given-names>
            <surname>Landauer</surname>
          </string-name>
          ,
          <article-title>"Indexing by Latent Semantic Analysis,"</article-title>
          <source>J. of the Amer. Soc. for Inf. Sci.</source>
          , vol.
          <volume>41</volume>
          , pp.
          <fpage>391</fpage>
          -
          <lpage>407</lpage>
          ,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          39.
          <string-name>
            <given-names>P. W.</given-names>
            <surname>Foltz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Kintsch</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T. K.</given-names>
            <surname>Landauer</surname>
          </string-name>
          ,
          <article-title>"The Measurement of Textual Coherence with Latent Semantic Analysis,"</article-title>
          <source>Discourse Processes</source>
          , vol.
          <volume>25</volume>
          , pp.
          <fpage>285</fpage>
          -
          <lpage>307</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          40.
          <string-name>
            <given-names>T.</given-names>
            <surname>Landauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. W.</given-names>
            <surname>Foltz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Laham</surname>
          </string-name>
          ,
          <article-title>"An Introduction to Latent Semantic Analysis,"</article-title>
          <source>Discourse Processes</source>
          , vol.
          <volume>25</volume>
          , pp.
          <fpage>259</fpage>
          -
          <lpage>284</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          41.
          <string-name>
            <given-names>T.</given-names>
            <surname>Landauer</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Dumais</surname>
          </string-name>
          ,
          <article-title>"A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge,"</article-title>
          <source>Psych. Rev.</source>
          , pp.
          <fpage>211</fpage>
          -
          <lpage>40</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>