=Paper=
{{Paper
|id=Vol-1584/paper20
|storemode=property
|title=Exploring Web-based Visual Interfaces for Searching Research Articles on Digital Library Systems
|pdfUrl=https://ceur-ws.org/Vol-1584/paper20.pdf
|volume=Vol-1584
|authors=Maxwell Fowler,Chris Bellis,Christopher Perry,Beomjin Kim
|dblpUrl=https://dblp.org/rec/conf/maics/FowlerBPK16
}}
==Exploring Web-based Visual Interfaces for Searching Research Articles on Digital Library Systems==
<pdf width="1500px">https://ceur-ws.org/Vol-1584/paper20.pdf</pdf>
<pre>
 Maxwell Fowler et al.                                           MAICS 2016                                                    pp. 55–62


  Exploring Web-based Visual Interfaces for Searching Research Articles
                      on Digital Library Systems
                          Maxwell Fowler, Chris Bellis, Christopher Perry, Beomjin Kim
                                                       Department of Computer Science
                                                     Indiana University-Purdue University
                                                           Fort Wayne, IN, U.S.A.
                                                     maxfwlr@gmail.com, kimb@ipfw.edu


                             Abstract                                         authors, the publication the work appeared in if applicable,
Previous studies that present information archived in digital librar-         and other basic meta-data at first. No profile of underlying
ies have used either document meta-data or document content. The              document content is provided, which can make finding the
current search mechanisms commonly return text-based results                  best sources a tedious task which requires reading through
that were compiled from the meta-data without reflecting the un-              the plaintext directly. Further, some search systems do not
derlying content. Visual analytics is a possible solution for improv-
ing searches by presenting a large amount of information, includ-
                                                                              adequately search document content, instead relying upon
ing document content alongside meta-data, in a limited screen                 users to already know the document they wish to retrieve.
space. This paper introduces a multi-tiered visual interface for
searching research articles stored in Digital Library systems. The               The lack of search depth caused by not searching docu-
goals of this system are to allow users to find research papers about         ment content is exacerbated by the use of non-intuitive, text
their interests in a large work space, to see how document content            based results. This is not an effective form of data represen-
relates to search terms, and to refine their search queries using             tation. Displaying a large amount of text in a column does
document content. The current, under development pilot system
successfully presents graphical illustrations of search results pro-
                                                                              not provide an efficient way to traverse search results and
duced from both meta-data and underlying content in an intuitive              pinpoint desired content. At best, text based searches can
visual interface that will assist users’ search activities. With minor        prioritize results on the title that best matches the desired
modification, the proposed system can be applied to a variety of              search terms or upon a hidden document relevance score,
other text-based data repositories.                                           which does not help a user see why a given paper is the best
                                                                              choice. Further, many text based search systems on digital
Keywords - Digital libraries; Visualization; Unstructured text con-           libraries lack an intuitive way to determine the relationships
tent; Visual analytics                                                        between titles, the content in documents, and the relation-
                                                                              ships between different documents.
                         Introduction
                                                                                 Visualizations allow data to be presented in manners that
Academic paper writing leverages online corpora as one of                     are more interconnected and readily processable. This is ac-
the sources for references to prior work and to build upon                    complished by leveraging users’ perceptual cognition. Stud-
previous results. Most corpora are hosted on services aimed                   ies have already shown such leveraging leads to faster data
at easing the search process; digital libraries such as the                   consumption and a higher quality of understanding (Card
ACM Digital Library and the Library of Congress provide                       1999, Veerasamy 1997). Such visualization work has al-
books, articles, and other forms of media, while services                     ready been applied to some forms of digital libraries in the
such as Google Scholar focus on journal papers. While val-                    past. University of Maryland’s GRIDL, for example, pre-
uable as knowledge repositories, these services lack in their                 sents digital libraries using two hierarchical axes with topics
ability to present information in a way that helps lead to eas-               on one axis and publication years on another (Shneiderman
ier, more informed decisions when determining which aca-                      2000). The density of documents for that topic and publica-
demic papers to read and reference.                                           tion year are then displayed as bar graphs, split between the
                                                                              different kinds of digital media in the library. Visualizations,
   Current digital library systems suffer from limiting stand-                such as GRIDL, allow large quantities of data to be dis-
ards and provide only superficial information in their search                 played in a coherent format that is tailored for user ease and
results. Most archiving systems display the title of a work,                  document content exploration.




   Visualization has been used in the past in order to sim-                While the work already done is valuable, we see a place
plify searching document repositories. Most of these visual             for future development. The current visualization work can
approaches have used some form of graphical representation              be applied to other data domains, such as social networking
to better show links between papers within a document and               data and unstructured text content. Unstructured content
the overall document spread in a repository.                            presents a number of issues. Such documents can have dif-
                                                                        ferent layouts from one another. Even within a specific do-
  Some visualization work used a graphing approach with                 main, such as research articles, the structure can be different.
axes. ActiveGraph, developed by Marks et al, used scatter               While most articles contain similar sections, such as an in-
plots with customizable axes (Marks 2005). These axes, the              troduction and a method section, there is no guarantee arti-
X, Y, and Z axes, could be set to any of the kinds of meta-             cles use the same layout. Sacks-Davis and Ron et al. dis-
data discussed earlier. ActiveGraph took a repository wide              cussed the subject of structuring text content to be indexable
approach; it did not get into underlying document content,              and queryable, but did not consider visual approaches or
but did allow an at-a-glance look at the entire repository              building indexes for journal papers dynamically (Sacks-Da-
based on specific meta-data.                                            vis 1997). Development in this field, utilizing visual analytic
                                                                        techniques, will assist researchers in finding references for
   Others used different graphical representations. Rushall             their work.
et al. and Lin focused on self-organizing maps that could be
directed at a document repository or single book to display                To assist researchers, a visual search focused on research
the types of documents in a workspace or the topics con-                articles in Digital Library systems would be useful. This sys-
tained in a repository (Rushall 1996, Lin 1996). These maps             tem requires indices to exist for the content in the papers.
were useful for quickly searching for documents in a visual             These indices need to be searched in a way that will help
fashion. The search showed the contents of a workspace in               users make educated decisions on their paper selections. Pa-
a visual form, allowing the user to quickly parse out the               pers do not tend to have indices, which mandates that an in-
kinds of documents provided. This system still lacked a link            dex is built for papers in some fashion in order to be reason-
between the superficial meta-data and document content,                 ably searched. This is work previously undone, as prior sys-
though. While preferable to a text search, the work was still           tems that used indices used pre-built ones, such as Greg et
plagued by the limiting factor of judging a book by its cover           al.’s work, and is a topic we need to address.
- using the title, but not the actual content within the docu-
ment.                                                                      In addition, a good search system must support the ability
                                                                        for users’ queries to undergo search refinement. A search
   To address the limitations of only leveraging meta-data              should not only find documents related to given topics, but
and not document content, Short et al. developed a multi-               should allow the user to refine their search using different
tiered visual interface for digital libraries (Short 2014). This        terms they discovered during the search. Short et al. ap-
work used the indices of textbooks to index books based                 proached this subject by showing chapter content from
upon their overall content and content by chapter. The mul-             books. This is a limitation as current search refinement fo-
tiple tiers focused on different representations. The first tier        cuses specifically on content that is searched for. Related
compared books to each other based on desired search                    content and words that may be synonymous to desired con-
terms. This tier was similar to work such as ActiveGraph,               tent are currently unexplored angles for search refinement.
leveraging a similar overall interface with a more regi-
mented coordinate system rather than a scatterplot. This al-               We propose a system which will bring the visual aspect
lowed for document screening based on meta-data like be-                and automatic indexing aspect together into one, targeted at
fore. Clearly unrelated titles, works that were too old to be           assisting researchers in searching text corpora and refining
useful, or works with bad reviews could be safely ignored.              their searches through intermediate results. This paper will
                                                                        introduce an ongoing development of a system; a two-tiered
   Other tiers took a content based approach to visualization.          visualization web application that displays research articles
By leveraging the index of textbooks, Short et al. were able            with titles and associated content in a graphical format. The
to directly allow exploration of document content. The vis-             first tier will provide a high level profile of the kinds of doc-
ualizations showed the layout of the book’s index and the               uments in a repository and how related these documents are
presence of search terms on a by-chapter basis using the                to desired search terms. These relationships will also show
book’s index. Searching for topics showed not only books                a relationship between papers, by proxy. The second tier
on the subject, but also exposed the relevant content within.           has been designed with the idea of search refinement in
This allowed the search to be used to more easily select the            mind. It displays the frequency of search terms in the paper,
best sources, based on how much they covered the desired                as well as synonyms, terms related to the search terms, and
search topic.                                                           compound terms created by coexisting words. This paper




presents the current prototype of the system developed to             the blue term. Each search term is shown in the visualization
assist users’ searches on research articles in Digital Library        in a circle of the term’s color. From this point forward, all
systems.                                                              shapes in the visualization are referred to as nodes.

                      Methodology                                        The documents appear in the visualization as nodes as
                                                                      well. Only documents that include at least one of the afore-
The prototype system consists of three major components:              mentioned search terms are placed on the visualization.
index generation, query processing, and visualization. The            Node size is determined absolutely, with the largest docu-
index generation module analyzes the underlying content of            ment in a repository having the largest node and the smallest
a Digital Library’s research papers and constructs an index           document the smallest node. Node size is capped at 30 pix-
for each of them. Query processing is an underlying process           els, with any documents that would have a larger size being
that connects the indices with the visualizations. The visual-        set to 30. The number of documents displayed by the search
ization itself is implemented in two tiers. Tier 1 presents an        is a user defined number, using a slider to change for more
overview of the document base and the high level relation-            tightly focused or broader reaching searches.
ships among the documents and the query’s search terms to
guide users’ selection of documents. Tier 2 provides a con-              The nodes are positioned to show correlation between
tent analysis of a specific document from the Tier 1, show-           each document and the search terms in the query. A force
ing terms related to the search query for the sake of search          directed graph is used for the layout, specifically d3’s im-
refinement. The methodology section is designed around                plementation of Dwyer’s algorithm (Bostock 2011, Dwyer
looking for information about thread-based programming                2009). Each document has tension directed towards the
and architectures. We used the terms “thread,” “process,”             search query nodes. The tension force is directly linked to
and “cpu” as our search terms.                                        the relation between a document and a term. A search term
                                                                      that a document has no relation to will provide 0 tension.
                                                                      Documents that feature all three terms will tend to be pushed
                    Index Generation
                                                                      into the middle of the visualization, while documents that
                                                                      only feature two terms will appear between only those two
Our indexing system was developed using well known Lu-                terms and not appear in the middle. We also use a simple
cene libraries and is not a major focus of our research               collision algorithm to prevent node overlap. The document
(Apache 2015b). The documents are first extracted into                is placed in the triangle defined by the search term nodes.
plain text in order to ensure a consistent format. Using Lu-          Documents with all three search terms equally weighted
cene, common words and other characters deemed to be gar-             within it will be placed in the middle, equidistantly. Papers
bage are removed from the text. This is to prevent such               more related to a specific term will be placed closer to them,
words impeding the index searching process. The text is               as they have a higher tension toward that search term than
then stored into a data structure which maintains a word              the others.
count, as well as information on which sentences in each
document contain which words. Together, these structures                 Document relevancy is determined using Lucene’s Term
serve as a searchable index for the document base.                    Frequency-Inverse Document Frequency (TF-IDF) algo-
                                                                      rithm. TF-IDF is frequently used in data and text mining ap-
                  Tier 1 Visualization                                plications. The score for a term increases if a term appears
                                                                      often in a document or if that particular word is uncommon
The Tier 1 visualization provides a profile of the entire doc-        (Apache 2015a). Overall, documents are favored for having
ument base. The intent of Tier 1 is to show the best papers           a large number of desired words. The score for words is used
for a user’s search query in the digital library being used.          for both document relevancy to a single search term, for the
Tier 1’s search is based upon title info and the indices of           node positioning, and overall document relevancy.
each document. The aspects considered for each document
are the length of the document, the relevancy the documents              Overall document relevancy combines the relevancy
have to specific terms in a search query, and the relevancy           scores for all three search terms to give each document node
the documents have to the entire search as a whole.                   a color. The most relevant paper, determined by having the
                                                                      highest overall score for all search terms, will be black. Less
   The user’s search query is directly represented in the vis-        relevant papers appear white, with papers in the middle fall-
ualization. Each search query is three terms, with each term          ing somewhere on the grayscale in between. Black contrasts
going into a different colored box. The colored boxes are             well with lighter colored nodes around it, making it a good
red, green, and blue, which are the primary additive colors           color to indicate the best papers. Our basis for this decision
used in computer science. In our examples below, “thread,”            came from color theory and digital graphics design (Foley
is the green term, “process,” is the red term, and “cpu,” is          1996).

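As a rough illustration of the Index Generation step described above, the
sketch below builds a per-document word count and a map from each word to the
sentences that contain it. It is not the Lucene-based implementation; the
function name build_index, the small STOP_WORDS set, and the naive sentence
splitter are all assumptions made for this example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list for illustration; Lucene ships its own set.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on", "for"}


def build_index(text):
    """Return (word_counts, sentences, postings) for one plain-text document."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    word_counts = Counter()
    postings = defaultdict(set)  # word -> indices of sentences containing it
    for i, sentence in enumerate(sentences):
        for word in re.findall(r"[a-z0-9]+", sentence.lower()):
            if word in STOP_WORDS:
                continue  # drop common words before indexing
            word_counts[word] += 1
            postings[word].add(i)
    return word_counts, sentences, postings
```

The postings map is what would let a later step pull out only the sentences
that mention a given search term.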

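The Tier 1 node rules (size capped at 30 pixels, a white-to-black relevance
scale, and placement inside the triangle of search term nodes) can be sketched
as follows. The scale factor pixels_per_unit, the linear gray mapping, and the
use of a score-weighted centroid as a stand-in for the resting point of the
force layout are assumptions of this example, not details taken from the
system.

```python
MAX_NODE_SIZE = 30  # pixel cap stated in the text


def node_size(doc_length, pixels_per_unit=0.01):
    """Absolute sizing: length times a fixed (assumed) scale, clamped at the cap."""
    return min(MAX_NODE_SIZE, doc_length * pixels_per_unit)


def gray_level(score, best_score):
    """Map overall relevance to a gray value: 0 is black (best), 255 is white."""
    if best_score <= 0:
        return 255
    ratio = max(0.0, min(1.0, score / best_score))
    return round(255 * (1.0 - ratio))


def rest_position(term_positions, term_scores):
    """Score-weighted centroid of the term nodes.

    With spring-like tension proportional to each term's score and no
    collisions, this is where the pulls on a document node would balance.
    """
    total = sum(term_scores)
    if total == 0:
        return None  # no relation to any term: the document is not drawn
    x = sum(s * p[0] for s, p in zip(term_scores, term_positions)) / total
    y = sum(s * p[1] for s, p in zip(term_scores, term_positions)) / total
    return (x, y)
```

Under these assumptions, a document with equal scores for all three terms
rests at the center of the triangle, while a document with a zero score for
one term rests on the edge between the other two.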


   When a node is selected on the visualization, the node is                The formula scores the probability of a potential term be-
highlighted and the paper’s supplementary data is shown in               ing related to a base term by comparing how many hits the
a tooltip. Figure 2.1 shows a sample of the Tier 1 visualiza-            potential and the base have together over the number of hits
tion with the best paper selected. Note that the best paper is           only the base does in the document set. For each of the two
not necessarily the largest, as in our sample query one of the           terms that use the score, we will go into specific detail.
smaller papers has the best overall search results. When a
paper is selected, the title of the paper is shown, which acts              Related terms are defined as non-synonyms that appear in
as a link to the PDF. The author and conference are provided             the same sentence as a search query’s term. These terms can
as well. Finally, a link to the second tier visualization is pro-        help refine a user’s search query by showing them words
vided to link between the two tiers.                                     that commonly appear together.


                                                     Figure 2.1 Sample view of Tier 1

                   Tier 2 Visualization                                     This can then be used in a new search to refine the docu-
                                                                         ments returned in a specific direction. In order to determine
The Tier 2 visualization provides a closer look into specific            related terms, all the documents in the database are first
documents. It relates the search terms to the content in the             stripped down to just contain the sentences containing a spe-
document itself. This way, the user can see precisely how                cific term.
prevalent a given term is in a document. This serves to in-
crease user confidence in the document they have selected                   Each of the remaining words is scored as the potential,
as being a useful document. In addition to directly showing              with the search term as the base, using PMI-IR 3. The higher
term prevalence, the system provides coexisting words, re-               the score, the more relevant a specific related term is deemed
lated terms, and the synonyms for the search terms. The in-              to be. The current system allows all related terms with a
tent is to help users with the task of search refinement by              score higher than 0 to appear in the visualization.
selecting better words for their queries. Synonyms and com-
pound words are a new consideration in this research. While                 Compound terms are similar to related terms, but are spe-
prior studies did not consider them useful, both serve to al-            cifically terms made up of two words; a query term and ei-
low the user to phrase the same query in multiple ways to                ther the term directly before or directly after the query term
find the best results possible for their search.                         in a sentence. These terms are intended to expand a specific
                                                                         search term. For example, a search can be refined to use
   Both related terms and compound terms use a scoring                   “cloud computing,” rather than “cloud,” after finding the
algorithm called Pointwise Mutual Information-Information                former as a compound term of the latter. Given the query
Retrieval (PMI-IR). PMI-IR was developed by Peter D. Tur-                “machine,” one might get both, “machine learning,” and,
ney for developing automatic indices of non-structured con-              “autonomous machine.” The same PMI-IR 3 scoring is used
tent (Turney 2001). Our algorithm specifically implements                on compound terms as is used on related terms.
PMI-IR 3, with some modifications:
                                                                           Synonyms are generated by searching through a synonym
P(potential | base) = \frac{Hits(potential AND base)}{Hits(base)}        database. Our algorithm uses WordNet for collecting syno-
                                                                         nyms (Fellbaum, C). WordNet returns synsets of potential
                                                                         matches. As synonyms tend to be small in number, there is




no threshold number in place for limiting the number of syn-
onyms displayed.                                                                              Discussion
   The visualization is consistent between Tier 1 and Tier 2.         The proposed system has made progress towards reaching the
The same force graph rules still apply. However, data in this         goals set for it. The visualization successfully functions on
tier is only related to one term node. This means that docu-          unstructured text content, such as the journal papers used in
ment content nodes that tend toward the middle are weakly             this study, which has yet to be done in this way. The Tier 1
related to their term. Content with a high relatedness to a           visualization does provide a visually accessible look at the
search term, though, appears close to the term’s node. Like-          entire document repository. It manages to capture the legi-
wise, node size remains consistent in that it shows size, but         bility of previous systems while improving upon the visual-
the size is the count of specific terms, rather than document         ization’s ability to aid in selecting documents. Further, the
size.                                                                 Tier 2 visualization does make strides towards helping users
                                                                      refine their search in meaningful ways.
   The size of each object, including the search terms them-
selves, reflects how relevant it is to the overall paper. This           The Tier 1 visualization is quite strong at this point. It is
is the word count from Lucene’s index. It is possible to have         useful for finding papers that span across multiple related
a paper where a search term has relevance 0, which would              domains, as shown in the methodology section. The over-
make the shape have 0 size. Likewise, it is possible to have          view is scalable, allowing users to search for a large number
related, compound, or synonym terms be larger than the                of papers or select only a small subset refined to be the best
search query nodes if they appear in the current document             for a given search. Further, the white to black color scale for
more than the query terms do. The largest nodes in Tier 2             least to most relevant allows the most relevant paper to stand
are the terms that are most likely to help refine a search by         out easily, making finding the best options in any sized
replacing a search term.                                              search an easy task.

   Each of the term types is represented with its own node               Tier 1’s strength is obvious when we compare the visual
shape. Synonyms are given circular nodes, to show they are            search to a text based alternative. Figure 3.1 shows a search
directly related in meaning to the search query terms. Re-            for the terms “simulate”, “transform”, and “automata”. The
lated terms and compound terms, meanwhile, are squares                search provides the same information the visualization does,
and triangles respectively. This decision was made to draw            but the best paper’s relation to terms is shown as numeric
distinction between term types.                                       scores. This is less intuitive than the visualization’s black to
                                                                      white color scale and position algorithm.
   Figure 2.2 shows an example of a Tier 2 visualization,
specifically from the last figure’s best document. The size
of the three search term nodes shows that they are, in fact,
all three prevalent in the paper. Thread is the most relevant,
though, as shown by the size. We can see what the terms are
by hovering over their nodes. The term will appear in a
tooltip above the node.


                                                 Figure 2.2 Sample view of Tier 2

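The related-term scoring used in Tier 2, P(potential | base) =
Hits(potential AND base) / Hits(base), can be illustrated with the minimal
sketch below. It assumes that a "hit" is a sentence containing the word and
that the input is a flat list of sentences; the function name
related_term_scores and the optional stop_words argument are hypothetical.
Synonym filtering and the authors' modifications to PMI-IR 3 are not
reproduced here.

```python
import re
from collections import Counter


def related_term_scores(sentences, base, stop_words=frozenset()):
    """Score each word that shares a sentence with `base`.

    score = (sentences containing both words) / (sentences containing base)
    """
    tokenized = [set(re.findall(r"[a-z0-9]+", s.lower())) for s in sentences]
    # Keep only the sentences that mention the base term.
    base_hits = [words for words in tokenized if base in words]
    if not base_hits:
        return {}
    together = Counter()
    for words in base_hits:
        for word in words - {base} - stop_words:
            together[word] += 1
    return {word: n / len(base_hits) for word, n in together.items()}
```

Every word that co-occurs with the base term at least once gets a score
above 0, which matches the display threshold described for related terms.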

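Compound term candidates, as defined in the Tier 2 section, pair a query term
with the word directly before or directly after it in a sentence. The sketch
below only collects and counts such candidates; the name compound_terms and
the simple tokenizer are assumptions, and the PMI-IR 3 scoring step that the
system applies afterwards is left out.

```python
import re
from collections import Counter


def compound_terms(sentences, query):
    """Count two-word candidates formed by `query` and an adjacent word."""
    counts = Counter()
    for sentence in sentences:
        words = re.findall(r"[a-z0-9]+", sentence.lower())
        for i, word in enumerate(words):
            if word != query:
                continue
            if i > 0:  # word directly before the query term
                counts[f"{words[i - 1]} {query}"] += 1
            if i + 1 < len(words):  # word directly after the query term
                counts[f"{query} {words[i + 1]}"] += 1
    return counts
```

For the query "machine", sentences mentioning machine learning and an
autonomous machine would yield both "machine learning" and "autonomous
machine" as candidates, in line with the example given in the text.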


                                                                                   The Tier 2 visualization succeeds in the goal of providing
                                                                                potential search refinement. It shows all the potentially use-
                                                                                ful related terms each of the search results have. The figure
                                                                                above directly shows the benefit of search refinement, as
                                                                                previously discussed.

                                                                                   The choice of red, green, and blue for the search term
                                                                                nodes was retained for Tier 2 in order to allow RGB color
                                                                                combinations to show off terms related to multiple docu-
                                                                                ments. This was abandoned in practice, in part because
                                                                                meaningful terms related to two distinct, other terms were
                                                                                rare. This means the colorization here could be changed to
Figure 3.1 Sample view of text based search designed for usability tests        represent different information if a better way to show term
                                                                                relation is found. Further, some collision can occur in Tier 2
       Figure 3.2 shows two searches. The search on the left is                 term nodes, which needs to be addressed in future updates. This
    the same as the search in Figure 3.1. It becomes readily ap-                can be seen in Figure 3.3, especially around the term
    parent where the two best papers are and how they relate                    “thread”.
    to the terms. It also becomes apparent that the word simu-
    late is fairly useless. Using the Tier 2 visualization, we re-
    fined the search to use, “grammar,” rather than, “simulate,”


                                           Figure 3.2 The visual form of Figure 3.1 and a refined version


                                           Figure 3.3 The best paper from 3.2 shows some Tier 2 overlap
    giving us the image on the right. This search refinement
    gives documents of much higher quality, according to the
    color scale, and provides a better distribution in the visuali-
    zation’s center. Further, the refinement is easier to make in
    the visual system than in a text-based one, which would require
    reading the whole document to find useful terms.




                   FUTURE WORK                                             Another area for improvement is the clustering algo-
                                                                        rithm, especially when it comes to the overlapping related
The proposed system makes good strides toward reaching the              terms and the large clusters of low-use papers. Some form
goals set out in the introduction. Despite this, there are fu-          of blobbing algorithm that combines closely related pa-
ture work avenues to prove the system works, improve the                pers into one node, which can then be expanded into the full
system, and potentially apply the system in other ways.                 node set, should be considered to make the visualization sim-
Some discussion of those angles follows below.                          pler and more user-friendly in such instances.
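
The blobbing idea mentioned above could be sketched roughly as follows.
This is only an illustrative sketch, not the system's clustering algorithm:
the function name blob_nodes, the use of 2D layout coordinates, and the
distance threshold are all assumptions made for the example. Papers whose
layout positions lie within a chosen radius of each other (directly or
through a chain of neighbors) are grouped into one collapsible node.

```python
from math import dist


def blob_nodes(positions, radius):
    """Group papers whose layout positions lie within `radius` of each other.

    positions: dict mapping a paper id to its (x, y) layout coordinates
               (hypothetical input; the real layout data may differ).
    Returns a sorted list of sorted id lists; each list is one blob that
    could be drawn as a single node and expanded on demand.
    """
    # Union-find forest: every paper starts as its own group.
    parent = {pid: pid for pid in positions}

    def find(pid):
        # Walk to the group representative, shortening the path as we go.
        while parent[pid] != pid:
            parent[pid] = parent[parent[pid]]
            pid = parent[pid]
        return pid

    ids = list(positions)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            # Merge the two groups when the papers are close together.
            if dist(positions[a], positions[b]) <= radius:
                parent[find(a)] = find(b)

    blobs = {}
    for pid in ids:
        blobs.setdefault(find(pid), []).append(pid)
    return sorted(sorted(members) for members in blobs.values())
```

With three hypothetical papers at (0, 0), (1, 0), and (10, 10) and a radius
of 2, the first two would collapse into one blob and the third would remain
a node of its own. The pairwise comparison is quadratic in the number of
papers, which may be acceptable for a collection of roughly 1600 documents
but would need a spatial index for much larger ones.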

   This paper uses searches on a paper database of our own                 Finally, future work could include applying our system to
creation. It is built from papers freely accessible among               other domains. So long as an index can be constructed for
Google’s own research papers, approximately 1600 in total.              the desired data, any form of text-based data could be
In the future, we will apply our visualization to a paid-for            searched and visualized using the above system. For exam-
paper collection, such as the Text Retrieval Conference Pro-            ple, social network posts could be used as documents to
ceedings (TREC). This will give our system a more robust                search. This would allow the system to search blogs, discus-
collection to be tested against.                                        sion forums, and other forms of social media for the sake
                                                                        of determining user consensus or gathering data for market-
   Usability tests are needed to show that the visualization            ing purposes.
above is better than a text-based system. While we feel the
visualization system stands on its own merit, usability tests                               CONCLUSION
will add credence to that claim. Our next task will use a cus-
tom-made, text-based search system and compare it to our                This paper proposed a visual search interface for digital li-
visualization. We will use a metric-based approach to judge             brary systems, specifically targeting journal papers and other
effectiveness, as well as judge user preferences. This way,             research publications. The proposed system targets two goals.
the visualization’s greater effectiveness compared to tradi-            The system’s first goal is a visualization of an entire docu-
tional text-based interfaces can be demonstrated.                       ment base, to help the user more easily see the best papers
                                                                        available for a given search. The second goal is to aid in
   The visualization is not without room for improvement. It            search refinement, changing the original search to better suit
would be ideal to take into account more than just the pres-            the user’s needs. This unexplored element was addressed by
ence of search terms in Tier 1. Ideally, the document ranking           providing a visualization for various forms of related terms.
algorithm can be altered to take into account all attributes of         A preliminary indexing step allowed us to apply these visual
a document. This means the document’s title, reviews,                   elements to a collection of unstructured text data; while not
word count, presence of search terms, and other data will all           our primary research focus, this was still an interesting ele-
contribute to a document’s relevance score.                             ment and is in contrast to prior visualizations of structured
                                                                        document content which did not require an indexing step.
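
One possible way to fold several document attributes into a single
relevance score is a weighted average, sketched below. This is an
assumption made for illustration and not the ranking algorithm used by the
system: the function name relevance_score, the attribute names, the
weights, and the requirement that every attribute be normalized to the
range [0, 1] beforehand are all hypothetical.

```python
def relevance_score(attributes, weights):
    """Combine normalized document attributes into one relevance score.

    attributes: dict mapping an attribute name to a value in [0, 1],
                e.g. a normalized term-presence or title-match value.
    weights:    dict mapping an attribute name to a non-negative weight.
    Attributes missing from `attributes` contribute zero.
    Returns the weighted average, which also lies in [0, 1].
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        # No attribute is weighted, so no relevance can be expressed.
        return 0.0
    weighted_sum = sum(
        weight * attributes.get(name, 0.0) for name, weight in weights.items()
    )
    return weighted_sum / total_weight
```

Because the result stays in [0, 1], it could in principle be mapped
directly onto the existing paper quality color scale, with the weights
exposed as tunable parameters.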
   Currently, the search is limited to three terms. This is left
over from earlier work when we were considering using                      We developed a two-tier system to meet our goals. Tier 1
RGB color combinations for paper quality, rather than the               provides a visualization over the document base for a spe-
current black and white scale. We retained the color usage              cific set of three search terms. The papers are positioned and
for Tier 2 but did not find situations that would benefit               colored based on their relevance to terms and the overall
from color combination. We are considering allowing an ar-              search, respectively. Tier 2 provides a visualization of a doc-
bitrary N-gon, which will free up RGB colors to be used for             ument’s content. It shows how the search terms relate to the
different visual elements. Such an N-gon’s size will be de-             underlying content and shows other related terms for the sake
termined by the user, with a minimum size of two to allow               of search refinement.
the visualization to remain fully featured.
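
The N-gon generalization could be sketched as below. This is a minimal
illustration under stated assumptions, not the system's layout code: the
function names ngon_vertices and place_paper are hypothetical, one search
term is assumed to sit on each vertex, and a paper is assumed to be placed
at the average of the vertices weighted by its per-term relevance.

```python
from math import cos, sin, pi


def ngon_vertices(n, radius=1.0):
    """Return the vertices of a regular N-gon centered on the origin.

    One search term is assumed to sit on each vertex. The first vertex
    points straight up; the rest follow counterclockwise.
    """
    if n < 2:
        raise ValueError("at least two search terms are required")
    return [
        (radius * cos(2 * pi * k / n + pi / 2),
         radius * sin(2 * pi * k / n + pi / 2))
        for k in range(n)
    ]


def place_paper(term_scores, vertices):
    """Place a paper at the relevance-weighted average of the term vertices.

    term_scores: one non-negative relevance value per search term.
    A paper with no relevance to any term is placed at the center
    (an assumption made for this sketch).
    """
    if len(term_scores) != len(vertices):
        raise ValueError("one score per vertex is required")
    total = sum(term_scores)
    if total == 0:
        return (0.0, 0.0)
    x = sum(s * vx for s, (vx, _) in zip(term_scores, vertices)) / total
    y = sum(s * vy for s, (_, vy) in zip(term_scores, vertices)) / total
    return (x, y)
```

Under this scheme a paper relevant to only one term lands on that term's
vertex, while a paper equally relevant to all terms lands at the center.
With two terms the polygon degenerates to a line segment, which is
consistent with the minimum size of two mentioned above.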
                                                                           By providing these two tiers, we help users with multiple
   The potential exists for a third tier: specific term expan-          tasks. Tier 1 makes it easy to see if a given search is useful
sion. This tier would allow the user to select a term and see           or if a given search is skewed too far to one term. Tier 1 also
more information about it, including all related terms to that          makes it easy to find the best papers for a given search. Tier
term in the repository, synonyms both in the repository and             2 allows us to confirm the best paper shown includes the
outside of it, and definitional information. This has yet to be         search terms with a high frequency. Tier 2 also lets us refine
implemented as the usefulness is questionable. It may be                our searches, allowing users to turn bad searches into good
sufficient to augment the Tier 2 visualization with word def-           searches by changing a search term or two.
initions and leave it at that.
                                                                           By assisting users with these tasks, our system makes suf-
                                                                        ficient strides towards our goals. Our last step is fixing the




problems mentioned in the discussion section and investi-
gating the improvements mentioned in the future work sec-
tion. Once this is accomplished, our system will become
practical and serve users in their searching of unstructured
content in digital libraries.                                        Shneiderman, B., Feldman, D., and Rose, A. (2000). Visu-
                                                                     alizing Digital Library Search Results with Categorical and
                                                                     Hierarchical Axes in Proc. 5th ACM International Confer-
                                                                     ence on Digital Libraries, pp. 57-66.
                       References
                                                                     Short, G., and Kim, B. (2014). Multi-tiered Visual Interfaces
Apache Software Foundation (2015). Class TFIDFSimilar-               for Book Search with Digital Library Systems in Proceed-
ity, Available:                                                      ings of the 6th International Conference on Multimedia,
https://lucene.apache.org/core/5_2_1/core/org/apache/lu-             Computer Graphics and Broadcasting, pp.21-24.
cene/search/similarities/TFIDFSimilarity.html
                                                                     Turney, P. D. (2001). Mining the Web for Synonyms: PMI-
Apache Software Foundation (2015). Lucene 5.2.1 core
                                                                     IR versus LSA on TOEFL in Proc. of the 12th European
API, Available:
                                                                     Conference on Machine Learning (ECML '01), pp. 491-502.
https://lucene.apache.org/core/5_2_1/core/overview-sum-
mary.html#overview_description
                                                                     Veerasamy, A., and Heikes, R. (1997). Effectiveness of a
Bostock, M., Ogievetsky, V., Heer, J. (2011). D3: Data-              graphical display of retrieval results in Proc. of the 20th
Driven Documents in IEEE Trans. Visualization & Comp.                Annu. Int. ACM SIGIR Conf. on Research and Development
Graphics.                                                            of Information Retrieval, pp. 236-245.

Card, S. K., Mackinlay, J. D., and Shneiderman, B. (1999).
Information visualization in Readings in Information Visu-
alization: Using Vision to Think, pp. 1-34.

Dwyer, T. (2009). Scalable, Versatile and Simple Con-
strained Graph Layout in IEEE-VGTC Symposium on Visu-
alization.

Fellbaum, C. (2005). What is WordNet. Princeton Univer-
sity.

Foley, J., van Dam, A., Feiner, S., Hughes, J. (1996). Com-
puter Graphics: Principles and Practice, Addison-Wesley
Publishing Company.

Lin, X. (1996). Graphical table of contents in Proc. of the
first ACM Int. Conf. on Digital Libraries, pp. 45-53.

Marks, L., McMahon T., and Luce, R. (2005). ActiveGraph:
a digital library visualization tool in International Journal
on Digital Libraries, vol. 5, no. 1, pp. 57-69.

Rushall, D., and Ilgen, M. (1996). A context vector-based
self-organizing map for information visualization in TIP-
STER: Proc. of a Workshop held at Vienna, Virginia, pp.
159-166.

Sacks-Davis, R., Dao, T., Thom, J. A., Zobel J. (1997). In-
dexing documents for queries on structure, content and at-
tributes in Proc. of International Symposium on Digital Me-
dia Information Base (DMIB).



</pre>