<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>System RDC: Relevant Data Condenser A Knowledge System for a Cloud-immersed Culture</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eric Ciminelli</string-name>
          <email>eciminelli@rollins.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jennifer Seitzer</string-name>
          <email>jennifer.h.seitzer@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Mathematics and Computer Science Rollins College</institution>
          ,
          <addr-line>Winter Park, FL 32789</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Knowledge representation is an area of artificial intelligence that manages knowledge by most efficiently making the knowledge internally and externally representable, accessible, and usable. Among many other things, this management entails the acquisition, translation, longevity, and structural representation of the knowledge being represented. Our world has become one immersed in its knowledge by the plethora of physical repositories, sources, and subjects. This research addresses the ramifications of this immersion by asking the question of “how do these new characteristics of knowledge affect our computer system specifications and implementations?” In this work, we delineate the characteristics of knowledge in the 2010s and present our response to them. System “Relevant Data Condenser (RDC)” is a cloud-savvy app for a smartphone that symbolizes the new computer system that is globally informed, locally understandable, lean, and portable. This endeavor is part of an overall project that involves knowledge representation in the Cloud. In this paper, we present the development framework and system architecture of RDC, including the incremental steps of its development process.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>http://myweb.rollins.edu/jseitzer</p>
    </sec>
    <sec id="sec-2">
      <title>Introduction</title>
      <p>Frameworks for knowledge representation methodologies
have been proposed since the late 1960s. Logic, frames,
semantic networks, graphs, as well as many others, have all
been effective tools to accomplish the four requisites of
any knowledge representation technique: representational
and inferential adequacy, and representational and
inferential efficiency [Russell 2010]. If we fast-forward
50 years, we find ourselves in a cyber world where we are
literally immersed in our knowledge. The Von Neumann
cycle of fetch-decode-execute driving our computational
devices has seemingly been expanded to drive us at the
Copyright retained by the authors.
macro level. We now fetch somewhere in the Cloud,
decode into tweet-like morsels condensing unmanageable
amounts of data into ingestible portions, and along with
“executing” the knowledge in its problem-specific domain,
we also log meta-knowledge of the information’s time,
date, source, and user-base. Our work presented here
embraces this new paradigm by presenting a software
system that is portable, cloud-driven, and accessible to all.
One of the newest genres of software systems is the app for
the smartphone. Our research reported here integrates
various techniques of artificial intelligence into a mobile
app that is accessible to anybody who has a smartphone.
Artificial intelligence (AI) is the area of computer science
that attempts to put human thinking into computer
programs. Our application eliminates the need for users to
constantly “check the net” for new and relevant
information on any topic they choose. It generates “Alerts”
containing useful information that will come in a succinct
text-message-like fashion (phone notification). AI
techniques employed include smart web crawling
(scanning the Web for valuable patterns) and automatic
language generation related to natural language
understanding (such as what Siri does). We illustrate the
efficacy of RDC by using the topic of cryptocurrency
trading (digital money).</p>
      <p>The main producible from this work is a computer system
comprised of the following components: 1) a web crawler,
2) an AI inference engine that performs web page ranking,
parsing, and language generation, 3) a server that pushes
relevant information to 4) a smartphone client app that
displays the information. The topology of these
architectural components can be seen below in Figure 1.</p>
    </sec>
    <sec id="sec-3">
      <title>Characteristics of Knowledge in 2015</title>
      <p>Today’s knowledge is distributed, expansive, ephemeral,
elusive (search engines may or may not capture it) and
plentiful beyond belief. The great positive is the increased
richness of our world-perception; the negative, however, is
that contained in this plentitude, is the likelihood that
inconsistencies and contradictions along with defunct data,
is greatly increased.</p>
      <p>Like the “brittle” knowledge of the 1980s’ expert system
that inspired us to endow our intelligent systems with
machine learning capabilities, we are finding that the
immense, distributed, ephemeral knowledge of today is
also nudging us to augment our computer systems with
new capabilities such as custom client-server relationships,
GPS services, dedicated web-crawlers, and search engine
access. Responding to this nudge, we created RDC.</p>
      <p>System Relevant Data Condenser (RDC)</p>
      <p>The RDC application is grounded in an internal engine
that constantly scans through a repository of pre-specified
web sites for keywords. Once those keywords are found,
the page that contains the keyword is stored locally. Then,
using a number of AI methods including a context-based
ranking algorithm, the usefulness of each page based on
the keyword is determined. Then, a brief message
summarizing the information is generated and sent to the
smartphone user. This is called “pushing” information to
the user. In Figure 1 below, the overall system
architecture of RDC is presented. Using this architecture,
there are five basic steps that stay in a repetitive loop in the
behavioral framework as shown in Figure 2.</p>
    </sec>
    <sec id="sec-4">
      <title>Software components of RDC</title>
      <p>There are five main software components to the RDC that
are cyclically invoked in the general operation of the RDC
as shown in Figure 3 below. These activities do not include
the formidable development work that was done to
establish the underlying infrastructure of this operational
cycle. This infrastructure includes the definition and
maintenance of a user data base, the networking
conundrums experienced when establishing any
clientserver relationship (three of which are to be discussed), and
last, the considerable development required for GUI
presentation on the iPhone (which were required to be
coded in Objective-C).</p>
      <p>One of the fascinating aspects of software development for
the Cloud, is that the typical application assumes many
roles including the seemingly opposite roles of client and
server. RDC assumes the roles of client and server many
times in its cycle of operation. The cycle begins with RDC
crawling the web for information requested by its
subscribers.
(1) The Web Crawler requires that the AI inference
engine become a client to many prospective web servers. A
client is a program that, over a network, invokes a server
for some file, message, or action. As a crawler, the client
invokes many web servers requesting pertinent web pages.
The web crawler of RDC was written with Java using
wellknown crawling packages available in the JDC (Java
Development package).
(2) The Filter determines which of the procured web
pages are pertinent to the topic currently being pursued and
uses many of the techniques delineated in [Brin 1998] and
[Shestakov 2013]. This is where future links of pages are
either kept and enqueued on the crawler frontier, or
discarded.
(3) The Parser is a Java program that searches through
HTML and related code to find morsels of information
that can be used as small messages to be pushed to users.
(4) The Translator is responsible for transforming the
verbose to the succinct. In particular, it is the message
generator to be served to the clients of RDC.
(5) The Notifier is a message pusher that notifies users
of important information that was just found then delivers
the messages to the smartphone users.</p>
      <p>Subscribers to RDC choose a topic(s) of their choice. This
topic then forms one of the queries that RDC regularly
“checks.” We exemplify this notion by presenting how
one might use RDC to keep up to date on the topic of
cryptocurrency.</p>
    </sec>
    <sec id="sec-5">
      <title>Example: Cryptocurrency (aka Digital Money)</title>
      <p>Cryptocurrency is beginning to have a presence in the
economic and social world [Alstyne 2014]. Traders are
taking advantage of how volatile the cryptocurrency
market is right now. Cryptocurrencies are just beginning to
be used in the world. There are now automated teller
machines that allow withdrawals of cryptocurrency for
actual cash [Jervis 2014]. Until they gain a wide popularity
and recognition, however, their exchange rates will
continue to fluctuate greatly on a daily basis.</p>
      <p>In this example domain, our system is helpful because of
its ability to report these fluctuations by being the first to
know about any news that goes on in the world of
cryptocurrency. One must realize that the exchange rates
for cryptocurrencies normally change after news gets out
of a certain event that can affect the price. For instance,
when a popular business begins to accept a cryptocurrency,
it seems that as soon as the news spreads, the price rises.
Similar to stock market trading, a societal custom tells us
that those who are the first to know the news are the ones
who are most likely to succeed in the trading world.
An important part of this work is that the topic of interest
can be changed. We choose to exemplify our software
system using the topic of cryptocurrency. However, as in
many artificial intelligence applications, the underlying
software infrastructure can be extracted and applied to
virtually any other subject. For example, intelligent expert
systems originally written for medical diagnosis were
decomposed and then applied to computer system
configuration [Russell 2010]. Other areas of application
beyond trading include disaster alert systems,
institutionspecific local data such as ‘deadline to drop classes’
messages, and any other uses where immediate information
is important.</p>
    </sec>
    <sec id="sec-6">
      <title>The RDC User Experience</title>
      <p>The users of the application interact with the RDC via their
smartphones. Inside the application, the user selects the
type of cryptocurrency they want to monitor from a list.
Each currency selected will have its own tab where the
current price along with a graph displaying the price over
time can be seen. At the top of the interface there is a news
feed where any relevant news that the web-crawler finds is
displayed. If the user clicks on the news scroller it will
bring them to a page where a list of relevant articles found
will be displayed. Selecting an item will open an iPhone
browser and bring the user to the source web page where
the data was found by RDC.</p>
      <p>The most powerful feature of the app is the sending of
notifications to its subscribers. Depending on the type of
cryptocurrency the user selects to monitor, alerts
discovered by the crawler will appear on the home screen
in a form such as, “PC World says that LiteCoin is
expanding. Now is a good time to buy.” or “LiteCoin hit a
low for the week at $742.”
In future work, the app will also include notifications for
significant price drops or gains. The user will be able to
dictate the price at which they want to be alerted and the
app will do so accordingly.</p>
    </sec>
    <sec id="sec-7">
      <title>RDC User Client and Graphical User Interface</title>
      <p>The graphical user interface (GUI) and network client
communication software was written in objective-C. The
GUI is organized so that the top shows the news scroller
followed by the different types of currencies. Under this
top portion is the current price in green along with other
useful information including a price graph. Figure 4 shows
a rendering of the interface for the topic of cryptocurrency.
Fig. 4: RDC Smartphone GUI.</p>
      <p>A screen mockup of the ticker page in
the application. The top shows the news
scroller followed by the different types
of currencies. Under that is the current
price in green along with other useful
information including a price graph.</p>
      <p>Clients and Servers in System RDC
The process of getting operational communication of RDC
on the Internet was multi-faceted and one that entailed
writing many preliminary clients and servers using socket
calls and socket libraries for C, C++, Java, and
ObjectiveC. There were many versions that had to be discarded in
our path to a functional cloud-savvy app. The first author
(who was the main developer) built several different
versions of web-interfaces – experimenting with many
programming languages including PHP. The main AI
inference engine server is a customized server (that
evolved out of an Apache server) running on a computer
(the server machine) that the main developer built himself.</p>
    </sec>
    <sec id="sec-8">
      <title>Incremental Development of Cloud-savvy programs</title>
      <p>In [Freeman 2013], the author discusses the notion of
TSTMTCTMI: the simplest toy model that captures the
main idea of the problem. This was a guiding principle for
us in our pursuit. Because there are so many facets,
functional units, internal and external links in any one
system, we approached our endeavor incrementally. That
is, we built a fully functional (yet an extremely primitive
version) of RDC in a few months. Then, module by
module, we improved and replaced each part to attain a
more satisfying version of RDC.</p>
    </sec>
    <sec id="sec-9">
      <title>Current Version and Status of RDC</title>
      <p>The RDC is a fully functional system consisting of five
components: (1) the Crawler, (2) the Filter (3) the Parser
(4) the Translator, and (5) the Push-Server (Notifier).
The crawler has been revised many times and is now
designed to cooperate with Bing. The software method
implementing this cooperation is called bingResults. On
invocation to the crawler, the method returns the top three
news results (sorted by relevance from the past hour).
Once the HTML from the page returned is loaded into a
string, the program picks it apart to get the title, URL, and
a short description from each news story. This
information is stored in the News class. Moreover, this
latest version of the crawler has implemented
multithreading where it is possible for multiple crawler threads
simultaneously running for multiple search words at the
request of the user.</p>
      <p>The parser successfully performs both filtering and
translation. This involves condensing possibly verbose
information into succinct messages. The newest version of
the RDC parser effectively cooperates with the commercial
search engine, Bing although the material of [Meyerzon
2000] was used extensively in earlier versions. The current
version returns the top three news results (sorted by
relevance from the past hour). Once the HTML from the
returned page is loaded into a string data structure, the
program decomposes it to extract the title, URL, and a
short description from each news story. This information is
then inserted into the Java implementation “News Class”
(internally represented as a linked-list).</p>
      <p>Upon insertion to the News Class, the story is checked to
verify non-duplication. If it is a fresh story, the title, URL,
and description of the story are concatenated into a single
string separated by a unique character that enables the
string to be pushed to all of the clients in one message.
When the client receives this string, it performs processing
invisible to the iPhone user. In particular, it can extract the
title, URL, and description from each of the new stories
because it knows the unique separation character. Once the
client receives the message, it then displays the message in
the form of a notification and adds it to the news page of
the App.</p>
    </sec>
    <sec id="sec-10">
      <title>Conclusion</title>
      <p>Today’s knowledge is ubiquitous, ephemeral, and massive.
It is becoming apparent that most applications will involve
interaction involving many clients, servers, and web
services. The Relevant Data Condenser is one such
application that is able to access, represent, and deliver
knowledge both adequately and efficiently. Additionally,
we believe that the RDC system serves as a model for the
software application that is cloud-savvy, portable,
ubiquitous, connected, communicating, and intelligent.
These characteristics are lofty and convenient. And, alas,
they are difficult to implement. We have shown, by proof
of construction, that it is possible to create these successful
systems using an incremental and varied approach. In our
endeavor, we had many modules that failed or that were
not adequate before we found the language and/or
technique that worked.</p>
      <p>In future work, we intend to expand this system to the
Android smartphone platform. Additionally, we hope to
actively expand our iPhone user-base to test and augment
the RDC to scale up to a much larger number of clients.</p>
    </sec>
    <sec id="sec-11">
      <title>Acknowledgements</title>
      <p>We are greatly indebted to the reviewers of this paper, and
to the Rollins College Student Faculty Collaborative
Scholarship program for funding this project.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>[Alstyne</source>
          <year>2014</year>
          ]
          <article-title>“Why Bitcoin has Value”</article-title>
          <source>Communications of the ACM , 57 No. 5</source>
          , pages
          <fpage>30</fpage>
          -
          <lpage>32</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>[Brin</source>
          <year>1998</year>
          ]. Brin, Sergey, and
          <string-name>
            <given-names>Lawrence</given-names>
            <surname>Page</surname>
          </string-name>
          .
          <article-title>"The Anatomy of a Large-Scale Hypertextual Web Search Engine." The Anatomy of a Search Engine</article-title>
          .
          <source>Computer Networks &amp; ISDN Systems</source>
          ,
          <volume>30</volume>
          (
          <year>1998</year>
          , pages
          <fpage>107</fpage>
          -
          <lpage>117</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <source>Web. 13 Feb</source>
          .
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <source>[Freeman</source>
          <year>2013</year>
          ]. Freeman,
          <string-name>
            <surname>W.</surname>
          </string-name>
          "How to do Research." http://people.csail.mit.edu/billf/publications/How_To_Do_ Research.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <source>[Jervis</source>
          <year>2014</year>
          ]. Jervis, Rick.
          <article-title>"Bitcoin ATMs Come to USA." USA Today</article-title>
          . Gannett,
          <volume>20</volume>
          Feb.
          <year>2014</year>
          . Web.
          <volume>22</volume>
          Feb.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>[Meyerzon</source>
          <year>2000</year>
          ]. Meyerzon, Dmitriy, and Sankrant Sanu.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <source>"Method of Web Crawling Utilizing Address Mapping." Microsoft Corporation</source>
          , Patent granted 11/7/2000.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <source>Downloaded using Google Patents on</source>
          <volume>2</volume>
          /12/
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Russell 2010] Russell,
          <string-name>
            <surname>Stuart J.</surname>
          </string-name>
          ,
          <source>Peter Norvig, and Ernest Davis. “Intelligent Agents Artificial Intelligence: A Modern Approach” 3rd Edition</source>
          . Prentice Hall, Upper Saddle River NJ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <source>[Shestakov</source>
          <year>2013</year>
          ]. Shestakov, Denis.
          <source>Intelligent Web Crawling. IEEE. Intelligent Informatics Bulletin</source>
          , Dec.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          2013. Web.
          <volume>12</volume>
          Feb.
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>