<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LAK Explorer - A Fusion of Search Tools</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mike Sharkey</string-name>
          <email>mike@bluecanarydata.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohammed Ansari</string-name>
          <email>mohammed@bluecanarydata.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andy Nguyen</string-name>
          <email>andy@bluecanarydata.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Blue Canary</institution>
          ,
          <addr-line>6185 W. Detroit St., Chandler, AZ 85226</addr-line>
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The LAK Data Challenge asks the question “What do analytics on learning analytics tell us?” One approach to this challenge is not to answer the question, but to provide a simple, user-focused application that allows any user to easily draw their own conclusion. This was Blue Canary's driver for building the LAK Explorer (http://lakexplorer.bluecanarydata.com). Our team combined multiple tools to create a powerful search application. We extracted topics from the papers, used an autocomplete feature in the search bar, added topics as search result metadata, and provided links to similar papers all as part of the user search experience. The value is in the usability. In the same way that Google presents powerful results via a simple interface, LAK Explorer allows for seamless searching, reading, and comparing of over 400 documents. The application is also instrumented to capture user input (search terms, papers viewed) to provide closed-loop analytics in the future.</p>
      </abstract>
      <kwd-group>
        <kwd>Search</kwd>
        <kwd>natural language processing</kwd>
        <kwd>similarity</kwd>
        <kwd>document</kwd>
        <kwd>vector</kwd>
        <kwd>elastic search</kwd>
        <kwd>cosine</kwd>
        <kwd>autocomplete</kwd>
        <kwd>corpus</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        The LAK Data Challenge asks the question “What do analytics on
learning analytics tell us?” The Blue Canary team tackled this
question in 2014 by using topic modeling to describe trends in the
LAK Corpus [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Topic modeling is a technique used to distill a
large corpus of text into a manageable list of topics. While
repeating this approach for LAK15 would theoretically yield new
results, it wouldn’t do much to advance experimentation in the
spirit of the LAK Data Challenge.
      </p>
      <p>Building on previous LAK entries, the Blue Canary team took a
different approach. Instead of analyzing the corpus to look for
trends and threads, what if we made the corpus more easily
searchable so that the analytics community could browse the corpus
for meaning?</p>
      <p>This was the core of our approach for 2015. The result is the LAK
Explorer (http://lakexplorer.bluecanarydata.com). It is an intuitive
search application that allows users to search, browse, and find
content in the corpus of papers/articles provided by the LAK Data
Challenge. Our goal was to automate the processing so that the
LAK Explorer could be applied to any corpus rather than being
tuned specifically to the LAK data.</p>
    </sec>
    <sec id="sec-2">
      <title>1.1 Use of Turbo Topics</title>
      <p>
        As will be explained in this paper, the use of Turbo Topics is a key
thread in the Blue Canary team’s approach. Blei and Lafferty’s research [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
allowed the team to use programmatic techniques to extract n-gram
topics from the corpus, which turn out to be a much more user-friendly
way to digest corpus content.
      </p>
    </sec>
    <sec id="sec-3">
      <title>1.2 LAK Dataset Incomplete</title>
      <p>Blue Canary retrieved the LAK Dataset from the Challenge website
(http://lak.linkededucation.org/lak/LAK-DATASETDUMP.rdf.zip). Upon examination, the dataset
did not appear to contain all of the updated 2015 content. Of the 579
content tags (&lt;led:body&gt; and &lt;bibo:content&gt;), 108 were empty.
Blue Canary inquired about the gaps, but at the time of this project
submission the content had not been added to the dataset.</p>
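      <p>A check of this kind can be sketched in a few lines of Python. The sketch below is illustrative only and is not the script the team ran: the inline sample document, the led namespace URI, and the function name count_content_tags are assumptions, and elements are matched on their local names so that the sketch does not depend on the exact namespace used in the dump.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the RDF dump; the real file is far larger.
# The led namespace URI below is a placeholder assumption.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:led="http://example.org/led#"
    xmlns:bibo="http://purl.org/ontology/bibo/">
  <rdf:Description rdf:about="http://example.org/paper/1">
    <led:body>Full text of paper one.</led:body>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/paper/2">
    <led:body></led:body>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/paper/3">
    <bibo:content>   </bibo:content>
  </rdf:Description>
</rdf:RDF>"""

# Local names of the led:body and bibo:content elements.
CONTENT_TAGS = {"body", "content"}


def count_content_tags(xml_text):
    """Return (total, empty) counts for content elements in an RDF/XML string."""
    total = 0
    empty = 0
    for element in ET.fromstring(xml_text).iter():
        # ElementTree reports tags as "{namespace}localname".
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name in CONTENT_TAGS:
            total += 1
            # Treat missing or whitespace-only text as empty.
            if not (element.text or "").strip():
                empty += 1
    return total, empty


total, empty = count_content_tags(SAMPLE)
print(f"{empty} of {total} content tags are empty")
```
]]></preformat>
      <p>Matching on local names is a simplification; a stricter version would compare the fully qualified tag names once the namespaces in the dump are confirmed.</p>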
    </sec>
    <sec id="sec-4">
      <title>2. THE LAK EXPLORER COMPONENTS</title>
      <p>The LAK Explorer is a fusion of existing tools and techniques for
interacting with semantic data. From the simple home screen to the
detailed neighborhood of papers, each component was used to add
utility to the search process. Figure 1 shows how the different tools
were used at different points in the search process.
The context for LAK Explorer is similar to that of a search engine.
The user comes to the site knowing the universe in which they are
searching (corpus of papers) and some idea as to what they want to
find out (search terms). However, the user doesn’t know exactly
what they are looking for. In that way, the presentation of results
and related information is vital to improving the utility of the
application.</p>
    </sec>
    <sec id="sec-5">
      <title>2.1 Home Page</title>
      <p>As with any search engine, the usability of the tool starts with the
home page. For LAK Explorer, the Blue Canary team was inspired
by the simplicity of Google’s ubiquitous home page.
The home page is dominated by a large text entry box. The only
other significant feature on the page is a listing of topics that were
extracted from the corpus of papers.</p>
    </sec>
    <sec id="sec-7">
      <title>2.2 Autocomplete</title>
      <p>When a user starts entering text into the search box, the first thing
they will notice is the use of autocomplete.
Autocomplete is advantageous since the LAK Explorer deals with
a fixed corpus of knowledge. Instead of tying autocomplete to a
larger base of content (e.g. a dictionary or DBpedia), the team tied it
to the topics that were extracted from the corpus using Turbo
Topics – the same topics that appear under the search bar. Blue
Canary used the Typeahead feature from AngularStrap
(http://mgcrea.github.io/angular-strap/#/typeaheads#typeaheads)
to power this feature.</p>
    </sec>
    <sec id="sec-8">
      <title>2.3 Elastic Search</title>
      <p>Elastic Search (http://www.elasticsearch.org/) was used to drive the
search engine results in LAK Explorer. Blue Canary only used
basic features of this tool to drive search results. The text entered
in the search box is the input to a keyword search algorithm.
There are additional features in Elastic Search that could further
drive the efficacy of the search and leverage the linked aspect of
the LAK data. The search results could be weighted on content
found in the abstract, body, author(s), and citations.</p>
    </sec>
    <sec id="sec-6">
      <title>2.4 Results Filtering</title>
      <p>When a user hits the search button, LAK Explorer returns
results similar to the image in Figure 4.
The papers/articles are presented in a fashion similar to an engine
like Google Scholar. The title, author(s), and abstract are presented
for each result. The most prominent addition is the inclusion of the
color-coded topics. LAK Explorer uses the extracted topics as
another level of filtering. Upon viewing the results, the user can
see what topics are relevant for each paper, click the color-coded
topic, and get new search results that are sorted by the frequency of
that topic.</p>
    </sec>
    <sec id="sec-9">
      <title>2.5 Paper Display</title>
      <p>LAK Explorer is not only for browsing search results.
Blue Canary wanted to make the tool helpful for actually reading
the resulting papers and articles. Therefore, clicking on a paper
from the search results gives the display mode shown in Figure 5.
The paper is displayed in a clear, crisp format for easy online
reading. The display is split into three sections. First, the user sees
the topics that are most strongly associated with that paper. Then,
the abstract is presented followed by the body of the paper. Another
key usability point is that the topics are color-coded at the top and
the coloring remains intact throughout the body of the paper. This
helps draw the reader’s attention to topics that may be of particular
interest.</p>
    </sec>
    <sec id="sec-10">
      <title>2.6 Similar Papers</title>
      <p>
        Perhaps the strongest feature of LAK Explorer is the ability to find
similar papers. Keyword and topic searching limit similarity to
papers that use one particular term with similar frequency. The
similar papers feature uses the entirety of the paper to compare it to
other content.
Blue Canary used a technique called Doc2Vec [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to generate
similarity scores between two papers. This approach condenses the
paper into a single vector, and then a cosine similarity measure is
used to compare the vector of one paper to all others in the corpus
(http://en.wikipedia.org/wiki/Cosine_similarity). The results
(Figure 5) give a match score, expressed as a percentage, for the papers that
are most similar to the one currently being read.
      </p>
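      <p>The comparison step can be sketched in a few lines of Python. This is a minimal illustration rather than the team's implementation: the three-dimensional vectors and paper identifiers are made up (real Doc2Vec vectors have far more dimensions), and the helper names cosine_similarity and most_similar are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def most_similar(paper_id, vectors, top_n=3):
    """Rank every other paper by cosine similarity to paper_id."""
    target = vectors[paper_id]
    scores = [
        (other_id, cosine_similarity(target, vec))
        for other_id, vec in vectors.items()
        if other_id != paper_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Made-up document vectors standing in for Doc2Vec output.
vectors = {
    "paper-a": [1.0, 0.0, 0.0],
    "paper-b": [2.0, 0.0, 0.0],
    "paper-c": [0.0, 1.0, 0.0],
    "paper-d": [1.0, 1.0, 0.0],
}

# Show each score as a percentage, as in the similar papers list.
for other_id, score in most_similar("paper-a", vectors):
    print(f"{other_id}: {score:.0%} match")
```
]]></preformat>
      <p>Because cosine similarity depends only on the angle between vectors, paper-b scores a full match with paper-a even though its vector is twice as long.</p>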
      <p>
        Additionally, LAK Explorer displays this same vector similarity in
a neighborhood scatter plot (Figure 6).
The Paper Neighborhood graph takes the Doc2Vec vector data
and uses t-SNE (t-Distributed Stochastic Neighbor Embedding) to
give all papers two-dimensional Cartesian coordinates [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The
resulting graph shows the current paper (orange dot) in relation to
all other papers (blue dots). While the x- and y-coordinates have
no real meaning or definition, the spatial representation of each
paper allows the viewer to both browse similar papers and to see
how “close” or “far” the current paper is from the rest of the
corpus.
      </p>
    </sec>
    <sec id="sec-11">
      <title>3. BENEFITS OF LAK EXPLORER</title>
      <p>The Blue Canary team wanted to create an application that LAK
researchers and practitioners would find valuable. To that end, the
team focused on a few key aspects of LAK Explorer to maximize
its contribution to the field.</p>
    </sec>
    <sec id="sec-12">
      <title>3.1 Visual usability</title>
      <p>The Blue Canary product development team gives significant
weight to usability. The team might develop an incredibly useful
metric, but if the user can’t easily interpret that metric, it is useless.
A clean layout, color coding, and simple charts all contribute to the
usability of LAK Explorer. These features were consciously added
in order to improve ease of use.</p>
    </sec>
    <sec id="sec-13">
      <title>3.2 Leveraging Turbo Topics</title>
      <p>As the team discovered in our LAK14 submission, extracting topics
from the corpus turned out to be a simple yet effective way of
absorbing the content of papers. We continued this trend for
LAK15 by using the Turbo Topics to aid in the initial search and in
the meta-tagging of the search results.</p>
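      <p>One way extracted topics can aid the initial search is prefix-based suggestion over the fixed topic list. The Python sketch below illustrates the idea only; it is not the AngularStrap typeahead used in the application, and the topic list, the matching rule, and the function name suggest are assumptions made for illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Hypothetical topic list standing in for the Turbo Topics output.
TOPICS = [
    "learning analytics",
    "learning management system",
    "social network analysis",
    "student performance",
    "topic modeling",
]


def suggest(prefix, topics=TOPICS, limit=5):
    """Return up to `limit` topics that match the typed prefix.

    A topic matches when the whole topic, or any single word in it,
    starts with the prefix (case-insensitive).
    """
    needle = prefix.strip().lower()
    if not needle:
        return []
    matches = []
    for topic in topics:
        lowered = topic.lower()
        word_hit = any(word.startswith(needle) for word in lowered.split())
        if lowered.startswith(needle) or word_hit:
            matches.append(topic)
    return sorted(matches)[:limit]


print(suggest("lea"))
print(suggest("net"))
print(suggest("learning m"))
```
]]></preformat>
      <p>Restricting suggestions to topics drawn from the corpus means every suggestion is known to appear in at least one paper.</p>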
    </sec>
    <sec id="sec-14">
      <title>3.3 Search Results Plus Similarities</title>
      <p>The LAK Explorer is so named because users will likely
not be looking for a single result. They will look at the results
of their search and also explore other papers that are similar to their
top search result.</p>
      <p>The paper similarity tools allow LAK Explorer users to follow this
natural path of exploration:</p>
      <list list-type="order">
        <list-item>
          <p>I’m interested in papers related to Topic X</p>
        </list-item>
        <list-item>
          <p>Searching for Topic X gives me an ordered list of papers</p>
        </list-item>
        <list-item>
          <p>I read through Paper 1 that comes up in search results</p>
        </list-item>
        <list-item>
          <p>I am also shown Papers 2, 3, etc. that are like Paper 1</p>
        </list-item>
      </list>
    </sec>
    <sec id="sec-15">
      <title>4. FUTURE FEATURES</title>
      <p>After integrating the previously described components into the
LAK Explorer, the team realized that there were additional features
we could add to the tool to further increase its value.</p>
    </sec>
    <sec id="sec-16">
      <title>4.1 Tracking User Input</title>
      <p>The most impactful feature to add is tracking of user input to the
application. LAK Explorer is already instrumented to capture the
terms that users search for and the papers that they view. The
additional feature would be to expose these tracking metrics in the
application’s front end so that other users can see how the
community is interacting with the tool. For example, a simple
listing of “most viewed papers” would show other
LAK Explorer visitors which content the community finds most useful.</p>
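      <p>A “most viewed papers” listing of this kind could be derived from the captured events roughly as follows. The Python sketch is hypothetical: the event layout, the paper identifiers, and the function name most_viewed are assumptions and do not describe how the application stores its tracking data.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter

# Hypothetical event log of (event_type, value) pairs.
events = [
    ("search", "topic modeling"),
    ("view", "paper-12"),
    ("view", "paper-7"),
    ("search", "dropout prediction"),
    ("view", "paper-12"),
    ("view", "paper-3"),
    ("view", "paper-12"),
    ("view", "paper-7"),
]


def most_viewed(event_log, top_n=2):
    """Return the top_n (paper_id, view_count) pairs, most viewed first."""
    views = Counter(value for kind, value in event_log if kind == "view")
    return views.most_common(top_n)


print(most_viewed(events))
```
]]></preformat>
      <p>The same counting approach would apply to search terms by filtering on the "search" event type instead.</p>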
    </sec>
    <sec id="sec-17">
      <title>4.2 Linked Data</title>
      <p>The linked aspect of the corpus could be further exploited to
improve the search process of LAK Explorer. The linked data
distinguishes between abstracts, body, authors, citations, people, and
institutions. These parts can be exposed as metadata in the search
results (e.g. view more papers by this author), and the data can also
be used to drive search efficacy (e.g. weight hits in the abstract
higher than hits in the paper body).</p>
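      <p>The effect of weighting fields differently can be illustrated with a small Python sketch. It is a toy scoring function under stated assumptions, not the search engine's ranking: the weight values, the sample papers, and the function name weighted_score are all invented for illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Illustrative field weights; the values are assumptions, not tuned settings.
FIELD_WEIGHTS = {"abstract": 3.0, "authors": 2.0, "body": 1.0}


def weighted_score(query, paper, weights=FIELD_WEIGHTS):
    """Sum the per-field counts of the query term, scaled by field weight."""
    term = query.lower()
    score = 0.0
    for field, weight in weights.items():
        words = paper.get(field, "").lower().split()
        score += weight * words.count(term)
    return score


# Made-up papers; a missing field simply contributes nothing.
papers = [
    {"id": "p1", "abstract": "dropout prediction models", "body": "we predict dropout"},
    {"id": "p2", "abstract": "topic models", "body": "dropout dropout dropout"},
    {"id": "p3", "abstract": "dropout and dropout risk", "body": "no match here"},
]

ranked = sorted(papers, key=lambda p: weighted_score("dropout", p), reverse=True)
print([(p["id"], weighted_score("dropout", p)) for p in ranked])
```
]]></preformat>
      <p>In this example p2 contains the term three times but only in the body, so it ranks below p1, which has fewer hits overall but one in the abstract. A production search engine would typically provide per-field boosting natively rather than requiring custom scoring code.</p>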
    </sec>
    <sec id="sec-18">
      <title>5. SIMILAR INITIATIVES</title>
      <p>This is the third year of the LAK Data Challenge and all of the
participants continue to stand on the shoulders of previous
contributors. Blue Canary acknowledges that previous entrants
such as the ones listed in this section have developed toolsets in the
same vein as LAK Explorer.</p>
    </sec>
    <sec id="sec-19">
      <title>5.1 RecLAK</title>
      <p>
        RecLAK [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] was submitted by a team of researchers from PUC Rio
in Brazil for LAK14. RecLAK is a recommendation engine that
uses the linked nature of the data to recommend other data sources
that have feature similarities to the LAK dataset.
      </p>
    </sec>
    <sec id="sec-20">
      <title>5.2 DEKDIV</title>
      <p>
        DEKDIV [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is an interactive application that allows the user to
drill into different aspects of the LAK dataset. The ‘Publications’
section of DEKDIV was developed in the same spirit as LAK
Explorer – allow the user to look at a paper and understand some
of the key concepts.
      </p>
    </sec>
    <sec id="sec-21">
      <title>5.3 Visualizing the LAK/EDM Literature</title>
      <p>
        A team of famous LAK researchers submitted a paper to the first
LAK Data Challenge in 2013 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] in which, among other things, they
clustered the semantic content of the LAK corpus. While using a
different technique, this clustering process achieved the same goal
as the LAK Explorer’s similarity feature set.
      </p>
    </sec>
    <sec id="sec-22">
      <title>6. ACKNOWLEDGMENTS</title>
      <p>As with most initiatives at Blue Canary, this was a product of
teamwork. LAK Explorer was made possible by a team of players
who each brought additive skills to the table. In addition to the
named authors, we’d like to thank Satya Mudiam, Faiz
Mohammad, Avinash Narasingam, and Kiran Reddy for their
contributions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Sharkey</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Ansari</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Deconstruct and Reconstruct: Using Topic Modeling on an Analytics Corpus</article-title>
          .
          <source>In LAK Workshops.</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D. M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Lafferty</surname>
            ,
            <given-names>J. D.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Visualizing topics with multi-word expressions</article-title>
          .
          <source>arXiv preprint arXiv:0907.1013</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q. V.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Distributed representations of sentences and documents</article-title>
          .
          <source>arXiv preprint arXiv:1405.4053</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Van der Maaten</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Visualizing data using t-SNE</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>9</volume>
          ,
          <fpage>2579</fpage>
          -
          <lpage>2605</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Lopes</surname>
            ,
            <given-names>G. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leme</surname>
            ,
            <given-names>L. A. P. P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nunes</surname>
            ,
            <given-names>B. P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Casanova</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          <article-title>RecLAK: Analysis and Recommendation of Interlinking Datasets</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McKenzie</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>J. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abdalla</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Janowicz</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>A Linked-Data-Driven Web Portal for Learning Analytics: Data Enrichment, Interactive Visualization, and Knowledge Discovery</article-title>
          .
          <source>In LAK Workshops.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Taibi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sándor</surname>
            ,
            <given-names>Á.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simsek</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buckingham Shum</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Liddo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Ferguson</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Visualizing the LAK/EDM literature using combined concept and rhetorical sentence extraction</article-title>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>