<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Mobipedia: Mobile Applications Linked Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Primal Pappachan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberto Yus</string-name>
          <email>ryus@unizar.es</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Prajit Kumar Das</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sharad Mehrotra</string-name>
          <email>sharad@uci.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tim Finin</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anupam Joshi</string-name>
          <email>joshi@umbc.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of California</institution>
          ,
          <addr-line>Irvine</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Maryland</institution>
          ,
          <addr-line>Baltimore County, Baltimore</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Zaragoza</institution>
          ,
          <addr-line>Zaragoza</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present Mobipedia, an integrated knowledge base with information about 1 million mobile applications (apps), including their category, metadata (author, reviews, rating, release date), permissions and libraries used, and similar apps. The goal of Mobipedia is to integrate unstructured and semi-structured data about mobile apps from publicly available data sources and publish it as Linked Data using RDF. We describe the fact extraction process, the access mechanisms to the knowledge base, and an overview of applications facilitated by Mobipedia.</p>
      </abstract>
      <kwd-group>
        <kwd>Mobile applications</kwd>
        <kwd>Knowledge Base</kwd>
        <kwd>Semantic Web</kwd>
        <kwd>Linked Data</kwd>
        <kwd>SPARQL</kwd>
        <kwd>Android</kwd>
        <kwd>Privacy</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>The number of mobile applications (also called apps) available for various
platforms has grown exponentially in the last few years (for example, the
Google Play Store reached the 1 million apps milestone in 2013). This has
resulted in smartphones replacing other devices as the de facto medium for online
browsing, social networking, and other activities. Today's users have a wide
array of choices when finding apps for entertainment, utility, or education.</p>
      <p>However, this huge number of apps has also made choosing an
appropriate app difficult. Many parameters must be taken into account when
selecting an app: technical aspects (such as the version of the operating system
supported, the hardware required, or the installation size), user experience (such
as ratings and comments), and privacy concerns (such as the information that
the app would access or the third-party libraries used). Several
studies have been performed on app stores, and some of them have publicly
released their datasets and results. But these projects are mostly isolated from
one another and scattered across the Internet. In addition, the use of different
methods to release the datasets (e.g., websites, dumps, or databases) and
different formats (from unstructured to semi-structured data) has made accessing
them difficult.</p>
      <p>Through Mobipedia<sup>4</sup>, we envision an evolving knowledge base (KB)
containing information related to mobile apps. Mobipedia integrates information from
various sources such as official websites and research projects. In this paper
we introduce the current status of Mobipedia, describing the ontology created
to model knowledge about mobile apps, the different sources that have already
been integrated, the access mechanisms offered, and an overview of the
applications that can be developed using Mobipedia. We believe that an online
knowledge base integrating information about mobile apps would accelerate
research in various domains related to mobile apps, e.g., mobile privacy and
app search.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Mobipedia Dataset</title>
      <p>To create the dataset we used the information available for apps on the Google
Play Store. Each app's metadata includes information such as its category,
images, version, installation size, developer, comments, and permissions used. We
created classes, data properties, and object properties to model all this information.
We have also included additional information about mobile apps (beyond what is available on
the Play Store) from open datasets such as PlayDrone and
PrivacyGrade, described below (e.g., libraries used by each app and developer
metadata). Figure 1 shows an excerpt of the ontology including the most
important classes and the object properties that relate them<sup>5</sup>.
        <fn id="fn4"><label>4</label><p>http://mobipedia.link</p></fn>
        <fn id="fn5"><label>5</label><p>The figure has been generated using the Graffoo specification, http://www.essepuntato.it/graffoo/</p></fn>
      </p>
      <p>
Information Extraction. To populate the ontology with instances we extracted
facts from two research projects, PlayDrone [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and PrivacyGrade [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The
information in these sources is mainly unstructured (contained in HTML websites)
or semi-structured (in JSON format); therefore, we developed crawlers and parsers
based on the crawler4j library<sup>6</sup> and the OWL API<sup>7</sup> for the extraction
and semantic annotation, respectively. All the tools developed are available in
the GitHub repository of Mobipedia<sup>8</sup> to aid in the creation of additional parsers and crawlers
for other data sources. The sources currently included in Mobipedia are:
        <list list-type="bullet">
          <list-item><p>PlayDrone<sup>9</sup>: A scalable Google Play store crawler developed by researchers from Columbia University, which extracted information about over 1.4M apps in 24 categories.</p></list-item>
          <list-item><p>PrivacyGrade<sup>10</sup>: A dataset of Android apps graded using static code analysis and crowdsourcing. It currently covers over 1M apps, which use nearly 250 third-party libraries. It was compiled by researchers from Carnegie Mellon University.</p></list-item>
          <list-item><p>Android Permissions Website<sup>11</sup>: The website includes information about all the 152 official permissions that Android apps can request to access information from the user.</p></list-item>
        </list>
      </p>
      <p>From these sources we extracted information about more than 1M apps and
added it as RDF triples to the Mobipedia KB. Each entity is
described in the dataset by a URI of the form
http://mobipedia.link/ontology/<italic>entity</italic>, where <italic>entity</italic> corresponds
to an app, developer, permission, rating, and so on.</p>
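      <p>As a rough illustration of how one semi-structured record could become RDF triples under this URI scheme, the following Python sketch may help. It is not the Java tooling described above (crawler4j and the OWL API). The JSON keys (package, developer, permissions) and the predicate names (hasDeveloper, requestsPermission) are hypothetical, as is the assumption that local names are simply percent-encoded and appended to the namespace.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import json
from urllib.parse import quote

# Namespace taken from the URI form given in the text.
BASE = "http://mobipedia.link/ontology/"


def entity_uri(local_name):
    """Build an entity URI by appending a percent-encoded local name.

    Assumption: the real KB may mint local names differently.
    """
    if not local_name:
        raise ValueError("local_name must be non-empty")
    # safe="" also encodes "/" so the name stays a single path segment.
    return BASE + quote(local_name, safe="")


def record_to_ntriples(record):
    """Turn one app record (a dict) into a list of N-Triples lines.

    The keys and predicate names used here are hypothetical.
    """
    app = entity_uri(record["package"])
    developer = entity_uri(record["developer"])
    lines = [f"<{app}> <{entity_uri('hasDeveloper')}> <{developer}> ."]
    for permission in record.get("permissions", []):
        lines.append(
            f"<{app}> <{entity_uri('requestsPermission')}> "
            f"<{entity_uri(permission)}> ."
        )
    return lines


# Made-up record, standing in for one JSON document from a crawl.
EXAMPLE = json.loads(
    '{"package": "com.example.flashlight", "developer": "Example Dev",'
    ' "permissions": ["android.permission.CAMERA"]}'
)
```
]]></preformat>
      <p>Calling record_to_ntriples(EXAMPLE) yields one line per fact, each of which can be appended to an N-Triples dump or loaded into a triple store.</p>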
      <p>
        Accessing Mobipedia. Similarly to DBpedia [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], we provide three mechanisms to
access the Mobipedia dataset:
        <list list-type="bullet">
          <list-item><p>Linked Data: Entity information, comprising all the triples associated with the entity, is retrieved over HTTP. It can be accessed using web browsers, Semantic Web browsers, and crawlers.</p></list-item>
          <list-item><p>SPARQL endpoint: The endpoint has been set up using the open-source version of Virtuoso. It can be used to query the Mobipedia dataset with SPARQL at http://mobipedia.link/sparql.</p></list-item>
          <list-item><p>RDF dumps: Larger versions of the dataset, in the form of serialized triples, can be downloaded from the Mobipedia website.</p></list-item>
        </list>
      </p>
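      <p>The first two access mechanisms can be sketched in Python with the standard library alone. The sketch below only prepares HTTP requests and sends nothing. The endpoint URL comes from the text; the example query uses no Mobipedia vocabulary and simply lists the predicate/object pairs of one entity. Whether the server honours the Accept headers shown (SPARQL JSON results, Turtle) is an assumption.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from urllib.parse import urlencode
from urllib.request import Request

# SPARQL endpoint given in the text.
SPARQL_ENDPOINT = "http://mobipedia.link/sparql"


def describe_query(entity_uri, limit=20):
    """Build a SPARQL query listing predicate/object pairs of one entity."""
    if any(ch in entity_uri for ch in '<>" {}'):
        raise ValueError("entity_uri contains characters not allowed in an IRI")
    return f"SELECT ?p ?o WHERE {{ <{entity_uri}> ?p ?o }} LIMIT {int(limit)}"


def sparql_request(query):
    """Prepare (but do not send) a SPARQL Protocol GET request."""
    url = SPARQL_ENDPOINT + "?" + urlencode({"query": query})
    # Assumption: the endpoint can return SPARQL results as JSON.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def linked_data_request(entity_uri):
    """Prepare a Linked Data lookup of one entity URI.

    Assumption: the server supports content negotiation for Turtle.
    """
    return Request(entity_uri, headers={"Accept": "text/turtle"})
```
]]></preformat>
      <p>Either request object can be passed to urllib.request.urlopen to perform the actual lookup; the JSON results of a SELECT query can then be parsed with the json module.</p>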
      <p>Linking Mobipedia with other knowledge bases. We have linked Mobipedia with
DBpedia, specifically with instances of the DBpedia categories Mobile software
and Android (operating system) software.
        <fn id="fn6"><label>6</label><p>https://github.com/yasserg/crawler4j</p></fn>
        <fn id="fn7"><label>7</label><p>http://owlapi.sourceforge.net</p></fn>
        <fn id="fn8"><label>8</label><p>https://github.com/primalpop/MobipediaProject</p></fn>
        <fn id="fn9"><label>9</label><p>http://systems.cs.columbia.edu/projects/playdrone</p></fn>
        <fn id="fn10"><label>10</label><p>http://privacygrade.org</p></fn>
        <fn id="fn11"><label>11</label><p>http://developer.android.com/reference/android/Manifest.permission.html</p></fn>
      </p>
      <p>In Mobipedia, we have focused on creating a single point of access for Android
app related data, which can be easily accessed through the access mechanisms
mentioned earlier. We believe that a Linked Data cloud of mobile apps would make
it easier to develop applications that use app data, and we outline some
of them below.</p>
      <list list-type="bullet">
        <list-item><p>Semantic search portal for mobile apps: Enable users to find relevant apps based on semantic search criteria, for instance, sports games with parental controls, to-do lists with location reminders, or the flashlight app with the fewest required permissions.</p></list-item>
        <list-item><p>Detection of ad targeting: With the information that Mobipedia stores about the ad libraries used by apps, the permissions requested, and developer metadata, it could be possible to draw inferences about which app developers are "going rogue" with respect to targeting users for ads.</p></list-item>
        <list-item><p>Linking application user experiences: Information about users' experiences with an app, such as ratings, reviews, blog articles, and forum posts, is fragmented across various sources. Using Mobipedia, this information can be linked to the apps themselves, which could be leveraged to build smarter app recommendation systems.</p></list-item>
      </list>
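      <p>To make the semantic search scenario more concrete, the following toy Python sketch ranks the apps of one category by the number of permissions they request. The triples, app names, and predicate names are invented for illustration and are not taken from the Mobipedia ontology. Against the real knowledge base, the same idea would more likely be expressed as a SPARQL aggregate query sent to the endpoint.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Toy (subject, predicate, object) triples; all names are made up.
TRIPLES = [
    ("app:SimpleTorch", "hasCategory", "Tools"),
    ("app:SimpleTorch", "requestsPermission", "CAMERA"),
    ("app:BrightLight", "hasCategory", "Tools"),
    ("app:BrightLight", "requestsPermission", "CAMERA"),
    ("app:BrightLight", "requestsPermission", "ACCESS_FINE_LOCATION"),
    ("app:BrightLight", "requestsPermission", "READ_CONTACTS"),
    ("app:KickOff", "hasCategory", "Sports"),
    ("app:KickOff", "requestsPermission", "INTERNET"),
]


def rank_by_permissions(triples, category):
    """Return (app, permission_count) pairs for a category, fewest first."""
    in_category = {s for s, p, o in triples
                   if p == "hasCategory" and o == category}
    # Start every app at zero so apps without permissions are still listed.
    counts = dict.fromkeys(in_category, 0)
    for s, p, _ in triples:
        if p == "requestsPermission" and s in counts:
            counts[s] += 1
    # Sort by count, then by name to keep the order deterministic.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))
```
]]></preformat>
      <p>The first element of rank_by_permissions(TRIPLES, "Tools") is then the app in that category requesting the fewest permissions.</p>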
      <p>Mobipedia has to adapt to the dynamic nature of mobile app stores, with
new apps and new versions of existing apps being released almost daily.
Therefore, Mobipedia will be a continuously evolving knowledge base that incorporates
this new information. We also intend to link the entities in Mobipedia to other
open and popular datasets like Freebase.</p>
      <p>Acknowledgments. This research work has been supported by RADICLE project
CNS-1059436, CNS-1212943, CNS-1118127 and CNS-1450768, CICYT project
TIN2013-46238-C4-4-R, DGA FSE, and U.S. National Science Foundation awards 0910838 and
1228198.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kobilarov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ives</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>DBpedia: A nucleus for a web of open data</article-title>
          .
          <source>In: 6th International Semantic Web Conference</source>
          . pp.
          <fpage>722</fpage>
          –
          <lpage>735</lpage>
          . ISWC'07 (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
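          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sadeh</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hong</surname>
            ,
            <given-names>J.I.</given-names>
          </string-name>
          :
          <article-title>Modeling users' mobile app privacy preferences: Restoring usability in a sea of permission settings</article-title>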
          .
          <source>In: Symposium on Usable Privacy and Security (SOUPS)</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Viennot</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nieh</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>A measurement study of Google Play</article-title>
          .
          <source>In: The 2014 ACM International Conference on Measurement and Modeling of Computer Systems</source>
          . pp.
          <fpage>221</fpage>
          –
          <lpage>233</lpage>
          . SIGMETRICS '14 (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>