<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>NIM: Scalable Distributed Stream Processing System on Mobile Network Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Wei Fan</string-name>
          <email>weifan@us.ibm.com</email>
        </contrib>
        <aff>IBM T.J. Watson Research, Hawthorne</aff>
      </contrib-group>
      <abstract>
        <p>As a typical example of the New Moore's law, the volume of 3G mobile broadband (MBB) data has grown 15 to 20 times over the past two years (30 TB to 40 TB per day on average for a major city in China), so real-time processing and mining of these data are becoming increasingly necessary. The overhead of storage and file transfer to HDFS, together with processing delays, is making offline analysis of these datasets obsolete. Analyzing these datasets is non-trivial; example applications include personalized mobile recommendation, traffic anomaly detection, and network fault diagnosis. In this talk, we describe NIM (Network Intelligence Miner), a scalable and elastic streaming solution that analyzes MBB statistics and traffic patterns in real time and provides information for real-time decision making. NIM processes data at line rate, while the accuracy of its statistical analysis and pattern recognition is identical to that of offline analysis. The design and unique features of NIM (e.g., balanced data grouping and an aging strategy) will be helpful not only for network data analysis but also for other applications.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Biography</title>
      <p>analysis, bioinformatics, social network analysis, novel applications, and
commercial data mining systems. His co-authored paper received the ICDM 2006 Best
Application Paper Award, and he led the team that used Random Decision Tree
to win the 2008 ICDM Data Mining Cup Championship. He received the 2010 IBM
Outstanding Technical Achievement Award for his contribution to IBM
InfoSphere Streams. He is an associate editor of ACM Transactions on Knowledge
Discovery from Data (TKDD).</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>