<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Survey of Smart Cards Data Mining</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dmitry Namiot</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manfred Sneps-Sneppe</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University; National Competence Center for Digital Economy</institution>
          ,
          <addr-line>GSP-1, 1-52, Leninskiye Gory, Moscow, 119991, Russia</addr-line>
          ;
          <institution>Institute of Mathematics and Computer Science, University of Latvia</institution>
          ,
          <addr-line>Raina bulvaris 29, Riga, LV-1459</addr-line>
          ,
          <country country="LV">Latvia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Smart cards are used worldwide in transport applications as a payment tool. As a result, cities accumulate large and constantly updated collections of transaction data from card validation equipment. This data source can be used to study user behavior from the collected observations, to detect movement patterns, to plan new routes, etc. In this paper, we survey the data models used in this kind of analysis and describe practical problems that can be solved with the histories of payment transactions collected by cities and transport companies. In our opinion, this information is useful for smart city applications because such data are relatively easy to collect and transparent. In the context of smart cities, user mobility is one of the key components.</p>
      </abstract>
      <kwd-group>
        <kwd>smart card</kwd>
        <kwd>transport cards</kwd>
        <kwd>data mining</kwd>
        <kwd>mobility</kwd>
        <kwd>smart city</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Smart cards (transport cards) are a popular payment tool in many countries
(Fig. 1).</p>
      <p>Typically, it is a pre-paid card. Users can purchase a card with some initial
credit and later buy a new card or add credit to an existing one.
With such cards, users can pay for transport, parking slots and sometimes other
city-related services. In most cases, pre-paid smart cards are anonymous, so
there is no user-related information at all. Each card has a unique
ID, but this ID is not linked to any user data.</p>
      <p>
        Smart card validators are part of the fare collection system. They are devices
that read smart cards and support the fare applications stored on them, i.e.,
devices that deduct the credit recorded on the card [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Validator units
can be contact, contactless or both, and can be stand-alone or integrated with
the on-board computer. They can include some data storage.
      </p>
      <p>This storage can be downloaded via USB, Bluetooth or Wi-Fi. Validators
can include SIM cards and support GPRS. They can also be connected directly
to a central computer (cloud, external data storage) via a wireless LAN or
Ethernet connection. What matters for us is that validators log
transactions. Each transaction record contains the card ID, the validator ID and a timestamp.
It also contains the amount of credit charged, but this value is less important for our
tasks.</p>
      <p>
        For smart cities, validator logs are yet another dataset that can
be used in so-called digital urbanism [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In this paper, we discuss
data mining models for smart card logs. The aim of the work is to present
algorithms and models that are suitable for processing the transport
card data used in Moscow. We also present a new approach to
processing transport data that borrows ideas from the processing of web sessions
(web logs).
      </p>
      <p>The rest of the paper is organized as follows. In Section II, we discuss data
formats. In Section III, we describe data mining models; this section also
presents our own proposal for applying web statistics tools to transport
card logs. Section IV concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>On Data Formats</title>
      <p>
        In general, there are two types of payment models: flat fares and
distance-based fares [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Under a flat-rate model, users tap their smart cards on the
card reader when entering (only check-in scans are necessary). Under the
distance-based scheme, riders need to tap their smart cards twice: when checking in
and when checking out. Location information is a separate question. Depending on the device,
it can be saved too (for both check-in and check-out in the case of distance-based
fares).
      </p>
      <p>The key information stored in the database generally includes the smart card ID,
validator ID, transaction time, remaining balance, transaction amount, boarding
stop and alighting stop.</p>
      <p>
        Note that at any given moment each validator is
installed in a specific bus (subway station, etc.). Accordingly, instead of the
validator ID, we can use a bus ID or a metro station ID (in both cases there can
be several validators per ID). This model (Fig. 2) is used, for example, in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>Note that a particular route in the city is served by a group of buses.
So, if we want to analyze smart card transactions in terms of routes, we need
to maintain a mapping between buses and routes. And because each bus in our model
is a set of validators, this is effectively a mapping between validators and routes. This mapping
is external to the transaction log, because the assignments of buses to
routes, for example, are subject to change.</p>
      <p>
        Accordingly, prior to processing, we can combine (join) the mapping with
transactional data. This approach is used, for example, in the paper [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] (Fig. 3).
      </p>
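      <p>The join can be sketched in Python as below, assuming a hypothetical assignment table in which each row states that a validator served a route during a time window. The class and field names are illustrative only; they are not taken from the Troika system or any other real fare collection system.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Tap:
    card_id: str
    validator_id: str
    time: datetime


@dataclass
class Assignment:
    # Hypothetical row: validator_id served route during [valid_from, valid_to).
    validator_id: str
    route: str
    valid_from: datetime  # inclusive
    valid_to: datetime    # exclusive


def route_for(tap: Tap, assignments: List[Assignment]) -> Optional[str]:
    """Return the route the tap's validator served at the tap time, or None."""
    for a in assignments:
        if a.validator_id == tap.validator_id and a.valid_from <= tap.time < a.valid_to:
            return a.route
    return None


def join_taps_with_routes(taps: List[Tap],
                          assignments: List[Assignment]) -> List[Tuple[str, Optional[str], datetime]]:
    """Attach a route to every tap; taps without a matching assignment get None."""
    return [(t.card_id, route_for(t, assignments), t.time) for t in taps]
```
]]></preformat>
      <p>A linear scan keeps the example short; for a full log, the assignment rows would normally be indexed by validator ID first.</p>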
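      <sec id="sec-2-1">
        <p>
          In [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], the authors use the following data format (Quebec, Canada):
        </p>
        <list list-type="bullet">
          <list-item>
            <p>Date and time of the boarding transaction;</p>
          </list-item>
          <list-item>
            <p>Card number and fare type;</p>
          </list-item>
          <list-item>
            <p>Route number and direction;</p>
          </list-item>
          <list-item>
            <p>Vehicle and driver numbers;</p>
          </list-item>
          <list-item>
            <p>Stop number at boarding.</p>
          </list-item>
        </list>
      </sec>
      <sec id="sec-2-2">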
        <p>
          Several other papers discuss smart card logs and additional
attributes [
          <xref ref-type="bibr" rid="ref10 ref7 ref8 ref9">7-10</xref>
          ].
        </p>
        <p>In our work, we are interested in the data representation with a single mark
(check-in only). This is the typical setup for smart cards used in Moscow,
Russia (the Troika card).</p>
        <p>Location information associated with check-in marks is a separate question.
Technically, it can be recorded. In practice, requesting location
information may delay the check-in (check-out) process, so it is rarely
done. However, for any particular route we can assume that the bus, for
example, runs on schedule and assign an approximate location to each check-in according to its
timestamp.</p>
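        <p>A minimal sketch of this schedule-based approximation follows. It assumes a hypothetical timetable for one vehicle run, given as (scheduled time, stop ID) pairs sorted by time, and assumes that check-ins happen close to the scheduled stop time; each check-in is assigned to the stop whose scheduled time is nearest.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from bisect import bisect_left
from datetime import datetime
from typing import List, Tuple


def approximate_stop(schedule: List[Tuple[datetime, str]], tap_time: datetime) -> str:
    """Return the stop whose scheduled time is closest to tap_time.

    schedule: non-empty list of (scheduled_time, stop_id) for one vehicle run,
    sorted by time. Ties go to the earlier stop.
    """
    times = [t for t, _ in schedule]
    i = bisect_left(times, tap_time)  # index of the first scheduled time >= tap_time
    candidates = [k for k in (i - 1, i) if 0 <= k < len(schedule)]
    best = min(candidates, key=lambda k: abs((schedule[k][0] - tap_time).total_seconds()))
    return schedule[best][1]
```
]]></preformat>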
        <p>Note that we do not discuss data engineering here and, for this reason, do not
touch on the practical questions related to smart card databases.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>On mining smart cards data</title>
      <p>In this section, we discuss data processing issues. From the
literature analysis, the following problems were identified: traffic patterns, trip
generation, and route-based studies.</p>
      <sec id="sec-3-1">
        <title>A. Transit patterns</title>
        <p>
          A good review of transit smart card data processing is provided in
[
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. In [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], the authors support the idea that individual daily mobility
follows only a small number of movement patterns. According to their research, these
patterns, termed motifs, appear stable across many different cities, and
motif detection is rule-based [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Suppose C is some central node (a bus stop, etc.).
Let T(k) denote a cyclical tour that starts at C, visits k other locations and returns to C.
For a motif with n nodes (including C), a construction rule is the set of tours starting at C
that make up the motif. There are four predominant rules for the most frequent motifs:
          <list list-type="roman-upper">
            <list-item>
              <p>T(1), T(n-2);</p>
            </list-item>
            <list-item>
              <p>T(n-1);</p>
            </list-item>
            <list-item>
              <p>T(2), T(n-3);</p>
            </list-item>
            <list-item>
              <p>T(1), T(1), T(n-3).</p>
            </list-item>
          </list>
        </p>
        <p>
          In plain English, construction rule (I) consists of a long tour plus a short
tour to an additional node not contained in the long tour; rule (II) is a single
tour through all nodes, with no other tours; rule (III) has one long and one short tour
without shared locations; and rule (IV) features two back-and-forth
trips and a long cycle. According to [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], the most frequent motifs fall into these four
construction rules.
        </p>
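        <p>To make the rules concrete, the following sketch checks which of the four construction rules a day's tours fit. It assumes each tour is given as the list of distinct locations visited between leaving C and returning to it (C omitted); the function name is hypothetical. For small motifs the rules coincide (with n = 4, rules I and III both describe one single-stop tour plus one two-stop tour), so the function returns every rule that matches.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter
from typing import List

# Tour sizes prescribed by each rule for a motif with n nodes (central node C included).
RULES = {
    "I": lambda n: [1, n - 2],
    "II": lambda n: [n - 1],
    "III": lambda n: [2, n - 3],
    "IV": lambda n: [1, 1, n - 3],
}


def matching_rules(tours: List[List[str]]) -> List[str]:
    """Return the construction rules that a day's tours fit.

    Each tour lists the locations visited between leaving C and returning to it,
    with C itself omitted. Tours that share a location fall outside the four rules.
    """
    locations = [loc for tour in tours for loc in tour]
    if len(locations) != len(set(locations)):
        return []
    n = len(locations) + 1  # distinct visited locations plus C
    sizes = Counter(len(tour) for tour in tours)
    # A rule matches when its tour sizes (all at least 1) equal the observed multiset of sizes.
    return [name for name, rule in RULES.items()
            if min(rule(n)) >= 1 and Counter(rule(n)) == sizes]
```
]]></preformat>
        <p>For example, two disjoint two-stop tours give n = 5 and match rule III only.</p>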
        <p>
          In [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], the authors study the so-called spatial variability of transit use.
It is examined by enumerating all the bus stops used for boarding.
Then the frequency of use of these bus stops is studied in order to express a level
of regularity. This makes it possible to determine the number of bus stops that cover the main
proportion of observed transit paths. As the last step, the k-means algorithm
is used to partition the data set into a predefined number of clusters for the
different types of smart cards (students, adults, etc.).
        </p>
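        <p>The stop-frequency step can be sketched as below. The share of boardings to be covered (80%) is an illustrative assumption rather than a value from [14], and the final k-means step over card types is not shown.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter
from typing import Iterable, Tuple


def stop_regularity(boarding_stops: Iterable[str], share: float = 0.8) -> Tuple[Counter, int]:
    """Return boarding counts per stop and the smallest number of stops
    (most used first) that together cover `share` (0 < share <= 1) of all boardings."""
    counts = Counter(boarding_stops)
    total = sum(counts.values())
    covered = 0
    for needed, (_, c) in enumerate(counts.most_common(), start=1):
        covered += c
        if covered >= share * total:
            return counts, needed
    return counts, 0  # reached only when there are no boardings
```
]]></preformat>
        <p>A regular rider concentrates boardings on few stops, so the second value stays small.</p>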
        <p>
          In [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], the authors apply the Density-Based Spatial Clustering of
Applications with Noise (DBSCAN) algorithm to the identified trip chains in order
to detect historical travel patterns of transit riders. A trip chain is defined as
the series of trips a traveler makes over a day. A fixed
temporal threshold can be used to link several smart card transaction records into a trip
chain.
        </p>
        <p>The following algorithm is used for clustering:</p>
        <list list-type="order">
          <list-item>
            <p>Sort the trip chain records and add a visited/unvisited flag to each record.</p>
          </list-item>
          <list-item>
            <p>Randomly select an unvisited record, flag it as visited and form a
cluster for this record.</p>
          </list-item>
          <list-item>
            <p>
              Check the boarding time difference between the unvisited records and the last
visited record. If the difference is greater than a predefined interval (1
hour in that paper), repeat Step 2. This interval separates
connected trips from new, independent trips. We have used the same idea of a predefined time
interval, for example, in [
              <xref ref-type="bibr" rid="ref15">15</xref>
              ].
            </p>
          </list-item>
          <list-item>
            <p>Check the spatial relationship between the unvisited records and the last visited
record. If a record lies within a predefined radius (200
meters in the above-mentioned paper), it is included in the cluster
formed in Step 2 and flagged as visited.</p>
          </list-item>
          <list-item>
            <p>If the total number of trips in the cluster is less than a predefined threshold
(3 in the above-mentioned paper), drop the cluster and mark its records as
noise.</p>
          </list-item>
          <list-item>
            <p>Continue processing the unvisited records from Step 2 through Step 5 until all
records are flagged as visited or noise.</p>
          </list-item>
          <list-item>
            <p>The total number of clusters corresponds to the number of typical trip chains
per day. The recurring route, boarding/alighting stops and timings can be
obtained by counting the most frequent pattern within each cluster.</p>
          </list-item>
        </list>
        <p>
          The following set of features can be used to detect regularity [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]:
        </p>
        <list list-type="bullet">
          <list-item>
            <p>Number of travel days. The more days a transit rider travels, the more likely
it is that he is a frequent transit rider.</p>
          </list-item>
          <list-item>
            <p>Number of similar first boarding times. Boarding time represents a rider's
temporal characteristics. If a rider begins the first trip at a similar time of day every
weekday, he is more likely to be a regular transit rider.</p>
          </list-item>
          <list-item>
            <p>Number of similar route sequences. A route sequence represents a rider's general spatial
pattern. The number of similar route sequences followed during the
week may indicate a repetitive travel pattern (e.g., home to office).</p>
          </list-item>
          <list-item>
            <p>Number of similar stop ID (end point) sequences. The stop ID sequence may
contain detailed spatial similarity information. For example, two formally
different stop IDs might be spatially adjacent.</p>
          </list-item>
        </list>
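        <p>The clustering steps can be sketched in Python as follows. This is a simplified reading of the procedure, not the code of [4]: each trip chain is reduced to a hypothetical record holding its first boarding time (minutes after midnight) and boarding coordinates (metres), Step 2 takes the earliest unvisited record instead of a random one so that the result is reproducible, and Step 4 measures distance to the last record added to the cluster.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class TripChain:
    board_min: float  # first boarding time of the chain, minutes after midnight
    x: float          # boarding location in projected coordinates, metres
    y: float


def cluster_trip_chains(chains: List[TripChain], max_gap_min: float = 60,
                        radius_m: float = 200, min_size: int = 3) -> List[List[TripChain]]:
    """Simplified Steps 1-7; the defaults follow the values quoted from [4]."""
    records = sorted(chains, key=lambda c: c.board_min)   # Step 1: sort by boarding time
    state = ["unvisited"] * len(records)
    clusters = []
    # Step 6: keep seeding clusters until every record is visited or noise.
    for seed in range(len(records)):
        if state[seed] != "unvisited":
            continue
        state[seed] = "visited"                            # Step 2: start a new cluster
        members, last = [seed], seed
        for j in range(seed + 1, len(records)):
            if state[j] != "unvisited":
                continue
            if records[j].board_min - records[last].board_min > max_gap_min:
                break                                      # Step 3: time gap too large
            dist = math.hypot(records[j].x - records[last].x, records[j].y - records[last].y)
            if dist <= radius_m:                           # Step 4: close enough in space
                state[j] = "visited"
                members.append(j)
                last = j
        if len(members) < min_size:                        # Step 5: too small, mark as noise
            for k in members:
                state[k] = "noise"
        else:
            clusters.append([records[k] for k in members])
    return clusters                                        # Step 7: one cluster per typical chain
```
]]></preformat>
        <p>The returned list has one entry per typical trip chain; counting the most frequent route and stops inside each cluster would give the recurring pattern described in Step 7.</p>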
        <p>
          Another example of clustering is presented in [
          <xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>
          ]. In these studies,
passenger heterogeneity is investigated based on a longitudinal representation of
each user's multi-week activity sequence derived from smart card data. The authors
propose a methodology that leverages this representation to identify clusters of users
with similar activity sequence structure. In general, there are four categories of
travelers: non-exclusive commuters, exclusive commuters, non-commuter
residents and leisure travelers [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. In [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], the authors propose a simple and effective
algorithm for sample selection. In order to identify users whose activity
pattern can be inferred more completely from smart card data, cards were clustered
by their level of public transport usage. Each card is characterized by the
number of days it was observed traveling over the 29-day analysis period and by
the spread of days between the first and last day it was observed. Using just these
two variables and classical k-means clustering, three user clusters were identified: a
group of non-recurrent users who are seen traveling on a few days concentrated in a
short period, a group of occasional users who travel on a few days spread over the
analysis period, and a group of frequent users who travel on many days spanning
most of the analysis period. This simple approach is probably the best way to
get a quick snapshot of transit usage from smart card data.
        </p>
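        <p>A two-feature clustering of this kind can be sketched as follows. The code is a plain Lloyd's k-means written out for clarity, with a deterministic initialisation chosen here for reproducibility (the initialisation used in [17] is not stated in this text); the features are left unscaled, and the function names are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def card_features(days_by_card: Dict[str, Sequence[int]]) -> Dict[str, Tuple[int, int]]:
    """Map each card to (number of travel days, span in days from first to last trip)."""
    return {card: (len(set(days)), max(days) - min(days) + 1)
            for card, days in days_by_card.items() if days}


def kmeans(points: List[Point], k: int, iters: int = 100) -> Tuple[List[Point], List[List[Point]]]:
    """Plain Lloyd's k-means on 2-D points (needs k >= 2 and at least k points)."""
    ordered = sorted(points)
    # Deterministic start: centroids spread evenly over the sorted points.
    centroids = [ordered[round(i * (len(ordered) - 1) / (k - 1))] for i in range(k)]
    groups: List[List[Point]] = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:  # assignment step: nearest centroid by squared distance
            nearest = min(range(k), key=lambda c: ((p[0] - centroids[c][0]) ** 2
                                                   + (p[1] - centroids[c][1]) ** 2))
            groups[nearest].append(p)
        # Update step: move each centroid to its group mean (keep it if the group is empty).
        new = [tuple(sum(v) / len(g) for v in zip(*g)) if g else centroids[c]
               for c, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return centroids, groups
```
]]></preformat>
        <p>With k = 3, the cluster with few days and a short span would correspond to non-recurrent users, few days over a long span to occasional users, and many days over a long span to frequent users.</p>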
        <p>
          In [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], the authors describe a transit passenger segmentation method
based on a two-step DBSCAN algorithm. They also use the k-means algorithm to
distinguish frequent and infrequent transit users based on the number of travel
days and journeys made [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>B. Trip generation</title>
        <p>Trip generation is about extracting (discovering) trips from log files. Summarizing the papers,
the following technique can be used to extract trips from card data.</p>
        <p>A trip is composed of a sequence of activities for a particular purpose. In our
case, it is a sequence of transport card transactions. Classically, time thresholds
are used to link these transactions. As boundary values, we can use the
maximum transfer times for the different types of travelers' activities. An activity
here is a type of trip: e.g., metro (subway) only, metro then bus, bus
then metro, etc. Transaction time differences that fall within the maximum
transfer times are then treated as transfers within one trip.</p>
        <p>
          Some authors also suggest using the 95th percentile of the transaction time
difference for each transfer activity as the time threshold for forming a complete
trip [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
        <p>Now, for any individual transport card (for any individual traveler), if the
transaction time difference between two consecutive card records exceeds the
applicable threshold, the trip is split at that point. For an individual traveler, we can also merge
multiple trips of short duration into a single trip.</p>
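        <p>The threshold and splitting steps can be combined in a short sketch. It assumes that a transfer type is the pair (mode of previous tap, mode of next tap), uses the nearest-rank definition of the 95th percentile, and falls back to a hypothetical 60-minute default for transfer types without observations; the merging of short trips is not shown.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, List, Tuple

Transfer = Tuple[str, str]  # (mode of previous tap, mode of next tap), e.g. ("metro", "bus")


def percentile_95(values: List[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def transfer_thresholds(gaps: Dict[Transfer, List[float]]) -> Dict[Transfer, float]:
    """Turn observed transfer gaps (minutes) into one threshold per transfer type."""
    return {kind: percentile_95(values) for kind, values in gaps.items() if values}


def split_trips(taps: List[Tuple[float, str]], thresholds: Dict[Transfer, float],
                default_threshold: float = 60) -> List[List[Tuple[float, str]]]:
    """Split one card's time-sorted (minute, mode) taps into trips."""
    trips: List[List[Tuple[float, str]]] = []
    for minute, mode in taps:
        if trips:
            prev_minute, prev_mode = trips[-1][-1]
            limit = thresholds.get((prev_mode, mode), default_threshold)
            if minute - prev_minute <= limit:
                trips[-1].append((minute, mode))  # within the threshold: same trip
                continue
        trips.append([(minute, mode)])  # first tap or gap too large: new trip
    return trips
```
]]></preformat>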
        <p>After that, we can reasonably assume, for example, that for any individual
traveler the first trip of each day is a home-to-work commute and the last trip is
a work-to-home trip. This pair of trips can be used to detect activity
patterns.</p>
        <p>
          The regularity of commuting should be measured both spatially and temporally
[
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Temporal patterns can be detected from the similarity of departure
times and the number of traveling days. For spatial patterns, we can calculate
the frequency of the most visited stops as well as the number of recurring trips
on similar routes or lines.
        </p>
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>C. Routes-based study</title>
        <p>In this part, we consider models that are suitable for single-swipe
(check-in only) data. This is the most interesting direction for our goal. Here we can list
the following suggestions.</p>
        <p>First of all, we should calculate the distribution of check-ins by
boarding place. As the source information, we can use, for example, boarding counts for
every weekday in 1-hour steps.</p>
        <p>Such a distribution lets us compare routes. It also lets us detect
changes in load (number of travelers) on particular days, provided the difference is
statistically significant. Changes in the daily distribution can then be
linked to real-life events (e.g., a new mall opening, offices closing, etc.).
We can also build time series of route-related check-ins (e.g., 30-minute
totals of check-ins for a particular bus), and in these time series
we can look for outliers and trends.</p>
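        <p>A minimal sketch of these aggregations follows, assuming check-ins are available as timestamps (optionally tagged with a route). The outlier check is a simple z-score screen per 30-minute slot rather than a formal significance test, and the threshold z = 2 is an assumption.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean, pstdev
from typing import Iterable, List, Tuple


def hourly_profile(taps: Iterable[Tuple[str, datetime]]) -> Counter:
    """Count check-ins per (route, weekday, hour); weekday 0 is Monday."""
    return Counter((route, t.weekday(), t.hour) for route, t in taps)


def half_hour_series(times: Iterable[datetime]) -> Counter:
    """Count check-ins per (date, 30-minute slot), with slots numbered 0..47."""
    return Counter((t.date(), t.hour * 2 + t.minute // 30) for t in times)


def flag_unusual_days(series: Counter, z: float = 2.0) -> List[tuple]:
    """Flag (date, slot) cells that deviate from that slot's mean over all days
    by more than z population standard deviations."""
    by_slot = defaultdict(dict)
    for (day, slot), n in series.items():
        by_slot[slot][day] = n
    flagged = []
    for slot, per_day in by_slot.items():
        values = list(per_day.values())
        if len(values) < 2:
            continue  # a single day gives no baseline
        mu, sigma = mean(values), pstdev(values)
        for day, n in per_day.items():
            if sigma > 0 and abs(n - mu) > z * sigma:
                flagged.append((day, slot))
    return sorted(flagged)
```
]]></preformat>
        <p>Slots with no check-ins are absent from the counter, so a day on which service stopped entirely would not be flagged by this sketch. Also, with a population standard deviation a single deviating day among m days can reach a z-score of at most the square root of m - 1, so at least six days of history are needed before z = 2 can trigger.</p>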
        <p>
          In [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], the authors describe another idea for route-based studies: a
Markov model of the stochastic behavior of travelers in the day-to-day route
choice adjustment process. The model has two components: how
often a passenger reconsiders the route choice (the route
switching rate) and the probability of taking a certain route (the
route choice probability). Travelers make today's route choice depending only
on the limited road information available from yesterday (the Markov property). This is
illustrated in Fig. 4.
        </p>
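        <p>The two components can be illustrated with a small Markov chain. This is a simplified assumption, not the exact model of [21]: the choice probabilities are held fixed here, whereas in [21] they depend on the information available from the previous day. Each day a traveler keeps yesterday's route with probability 1 - s, where s is the switching rate, or otherwise re-chooses route j with probability p_j.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import List


def transition_matrix(switch_rate: float, choice_probs: List[float]) -> List[List[float]]:
    """P[i][j]: probability of taking route j today given route i yesterday.

    With probability 1 - switch_rate the traveller keeps yesterday's route; otherwise
    the route is re-chosen from choice_probs (which may pick the same route again).
    """
    k = len(choice_probs)
    return [[(1 - switch_rate) * (i == j) + switch_rate * choice_probs[j] for j in range(k)]
            for i in range(k)]


def next_day(shares: List[float], P: List[List[float]]) -> List[float]:
    """Propagate the share of travellers on each route by one day."""
    k = len(shares)
    return [sum(shares[i] * P[i][j] for i in range(k)) for j in range(k)]
```
]]></preformat>
        <p>With fixed choice probabilities, the route shares converge to p regardless of the starting distribution; the switching rate only controls how fast they get there.</p>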
        <p>
          The next idea (our own proposal) follows the model used in our papers
[
          <xref ref-type="bibr" rid="ref22 ref23 ref24">22-24</xref>
          ]. We can represent (simulate) our transport system as a web statistics
system. A card swipe device (validator) is a web page. The set of such devices on a
particular route is a web site. A card ID is the analogue of a visitor's IP address.
This approach offers a clear path to visualization: we should be able
to use many existing web statistics visualization and data mining applications
(Fig. 5).
        </p>
        <p>For testing, we can take a transport data set, build a mapping from card IDs to IP
addresses, present our dataset as a standard web log (the IP address corresponds to the
card ID) and use any available web log analyzer software as a proof of concept
(in our case, we used the free Deep Log Analyzer).</p>
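        <p>The card-to-IP re-coding can be sketched as follows. The sketch writes lines in the Apache Common Log Format, which web log analyzers generally accept; the address scheme (private 10.x.y.z addresses assigned in order of first appearance), the URL layout /route/validator, the status code 200 and the zero response size are placeholder assumptions, and the Moscow time zone in the example is illustrative.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from datetime import datetime, timedelta, timezone
from typing import Dict


def card_to_ip(card_id: str, registry: Dict[str, str]) -> str:
    """Give each card a stable private IPv4 address in 10.0.0.0/8 (up to 2**24 - 1 cards)."""
    if card_id not in registry:
        n = len(registry) + 1
        registry[card_id] = f"10.{(n >> 16) & 255}.{(n >> 8) & 255}.{n & 255}"
    return registry[card_id]


def to_clf(card_id: str, route: str, validator_id: str, time: datetime,
           registry: Dict[str, str]) -> str:
    """Render one tap as a Common Log Format line: route = site, validator = page."""
    ip = card_to_ip(card_id, registry)
    stamp = time.strftime("%d/%b/%Y:%H:%M:%S %z")  # time must be timezone-aware for %z
    return f'{ip} - - [{stamp}] "GET /{route}/{validator_id} HTTP/1.1" 200 0'


msk = timezone(timedelta(hours=3))  # illustrative: Moscow time, UTC+3
```
]]></preformat>
        <p>For example, the first card seen, CARD42, tapping validator V12 of route bus-7 at 08:05 Moscow time on 1 February 2017 becomes the line 10.0.0.1 - - [01/Feb/2017:08:05:00 +0300] "GET /bus-7/V12 HTTP/1.1" 200 0.</p>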
        <p>
          We can also use web mining algorithms for research and visualization
[
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]. This is so-called web log mining. Several techniques can be
reused here [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]. For example, association rules can be used to
discover routes that are used together, which helps reveal movement
patterns. The discovery of association rules in web logs is discussed, for example, in [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ].
Other methods are sequence mining, topology patterns, Markov
predictors and n-gram models.
        </p>
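        <p>As an illustration of the association-rule idea, the sketch below treats the set of routes used by each card as one basket and derives rules between pairs of routes. It is restricted to pairs for brevity (a full Apriori run would also grow larger itemsets), and the support and confidence thresholds are assumptions.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def pair_rules(routes_by_card: Dict[str, List[str]], min_support: float = 0.1,
               min_confidence: float = 0.5) -> List[Tuple[str, str, float, float]]:
    """Rules lhs -> rhs between pairs of routes, one basket (set of routes) per card.

    Returns (lhs, rhs, support, confidence) tuples sorted by route names.
    """
    baskets = [set(routes) for routes in routes_by_card.values()]
    n = len(baskets)
    single = Counter(route for basket in baskets for route in basket)
    pairs = Counter(pair for basket in baskets for pair in combinations(sorted(basket), 2))
    rules = []
    for (a, b), count in pairs.items():
        support = count / n  # share of cards that used both routes
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / single[lhs]  # share of lhs users who also used rhs
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)
```
]]></preformat>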
        <p>
          Originally, sequence mining of web logs is used to discover the
web pages that are accessed immediately after one another. In our simulation, the
result is a sequence of validators, i.e., an effective route. Typical examples
of such methods are presented in [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ].
        </p>
        <p>Following the sequence mining approach, we can preprocess our log and create a database
of sequences (routes). In this database, we collect for each card (card
ID; originally, each IP address) the sequence of visited validators
(originally, the sequence of web accesses). Technically, such a collection can
be represented as a key-value store where the key is the card ID.</p>
        <p>
          For the following explanation, we use the definitions in [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ]. An event is
a card's usage on some validator (originally, a visit to a web page). Given
a set of events E, a route sequence (originally, an access sequence) S can be
represented as S = e<sub>1</sub> e<sub>2</sub> ... e<sub>n</sub>, where each e<sub>i</sub> is some validator (originally, some
web page). That is, the access sequence is composed of a series of events
that are members of the event set E (the whole set of validators). Repetitions
of events are allowed in a sequence (e.g., a citizen used the same bus twice). A
route (access sequence) R = r<sub>1</sub> r<sub>2</sub> ... r<sub>l</sub> is called a sub-sequence of a route R<sub>1</sub> =
s<sub>1</sub> s<sub>2</sub> ... s<sub>n</sub>, and R<sub>1</sub> is a super-sequence of R, if and only if for every event r<sub>j</sub> in
R there is an equal event s<sub>k<sub>j</sub></sub> in R<sub>1</sub>, and these matching events occur in R<sub>1</sub>
in the same order as in R (k<sub>1</sub> &lt; k<sub>2</sub> &lt; ... &lt; k<sub>l</sub>).
        </p>
        <p>A frequent pattern is a route (originally, an access sequence) that is
discovered during the mining process. Frequency here is defined via the so-called
"support". The support of a pattern S in a database of sequences is defined as the
number of sequences S<sub>i</sub> that contain S as a subsequence, divided by the total number
of sequences in the database. Although events (card transactions) can
be repeated in a route (in an access sequence), a pattern gets at most one
support count from each access sequence. A pattern is frequent if its support
is at least the minimum support.</p>
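        <p>The sequence database and the support definition can be sketched as follows, assuming taps arrive as hypothetical (card ID, time, validator ID) tuples. The subsequence test keeps event order but allows gaps, and a sequence contributes at most one count to a pattern's support, as in the definition above.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def build_sequences(taps: Iterable[Tuple[str, float, str]]) -> Dict[str, List[str]]:
    """Key-value store: card ID -> validators in time order (taps are (card_id, time, validator_id))."""
    by_card: Dict[str, List[str]] = defaultdict(list)
    for card, _, validator in sorted(taps, key=lambda tap: (tap[0], tap[1])):
        by_card[card].append(validator)
    return dict(by_card)


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if the events of pattern occur in sequence in the same order (gaps allowed)."""
    it = iter(sequence)
    return all(event in it for event in pattern)  # 'in' consumes the iterator up to each match


def support(pattern: Sequence[str], sequences: Dict[str, List[str]]) -> float:
    """Share of sequences that contain the pattern; each sequence counts at most once."""
    if not sequences:
        return 0.0
    hits = sum(1 for seq in sequences.values() if is_subsequence(pattern, seq))
    return hits / len(sequences)
```
]]></preformat>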
        <p>The minimum support for sequential pattern mining is a fraction
between 0 and 1. It can be set empirically to identify the frequent sequences.
The problem of route mining (originally, web usage mining) is that of finding all
patterns whose support reaches some predefined minimum support
threshold d.</p>
        <p>
          Techniques for mining sequential patterns from web logs fall into two basic
groups: Apriori and non-Apriori [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]. Apriori algorithms use the fact that any
super-pattern of a non-frequent pattern is not frequent. Non-Apriori algorithms
divide the original database into smaller partitions and solve them recursively.
According to the academic literature, the most popular non-Apriori method
is WAP-tree mining [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ]. This approach stores the web access patterns in
a compact prefix tree (the WAP-tree). Since non-Apriori algorithms do
not need to scan the database multiple times, they are expected to be faster.
        </p>
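        <p>To illustrate the Apriori principle on route sequences, the sketch below grows candidate patterns one event at a time and prunes any candidate that has an infrequent sub-pattern. It is a simplified, GSP-style level-wise search written for this text, not the WAP-tree algorithm of [29], and it recounts support by scanning all sequences at every level, which is exactly the cost that non-Apriori methods try to avoid.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """Order-preserving containment with gaps allowed."""
    it = iter(sequence)
    return all(event in it for event in pattern)


def mine_frequent_routes(sequences: List[List[str]], min_support: float) -> Dict[Pattern, float]:
    """Level-wise search for all patterns with support >= min_support (0 < min_support <= 1)."""
    n = len(sequences)

    def supp(pattern: Pattern) -> float:
        return sum(1 for s in sequences if is_subsequence(pattern, s)) / n

    events = sorted({e for s in sequences for e in s})
    level = [(e,) for e in events if supp((e,)) >= min_support]
    result = {p: supp(p) for p in level}
    while level:
        frequent = set(level)
        candidates = set()
        for a in level:
            for b in level:
                if a[1:] == b[:-1]:  # a and b overlap in all but one event
                    cand = a + (b[-1],)
                    # Apriori pruning: dropping any single event must leave a frequent pattern.
                    if all(cand[:i] + cand[i + 1:] in frequent for i in range(len(cand))):
                        candidates.add(cand)
        level = sorted(c for c in candidates if supp(c) >= min_support)
        result.update({c: supp(c) for c in level})
    return result
```
]]></preformat>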
        <p>
          This approach has drawbacks too. Although the detected routes
(sequential patterns) capture the order of events, the time between
individual events is unknown. So, we should consider finding sequential patterns
with time intervals. For example, the authors of [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ] describe the time-interval
sequential pattern, which includes not only the order of events but also the
time intervals between successive events.
        </p>
        <p>
          A time-interval sequential pattern provides more valuable information than
a conventional sequential pattern. Take transport and mobility
as an example: with time-interval sequential patterns, the
smart city not only learns the mobility patterns but also links them to the time
of day. Several papers are devoted to time-interval sequential patterns
[
          <xref ref-type="bibr" rid="ref32 ref33">32, 33</xref>
          ].
        </p>
        <p>
          With time-interval sequences, the definitions of sub-sequence (super-sequence)
must change: they now require not only the inclusion of events but
also the inclusion of time intervals. Modifications of Apriori algorithms for
time-interval sequences are presented, for example, in [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ].
        </p>
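        <p>A time-interval pattern and its containment test can be sketched as follows. The interval labels and their ranges (under 15 minutes, 15 to 60 minutes, 1 to 24 hours) are hypothetical; a pattern alternates events and interval labels, and a sequence contains it only if the events appear in order and every gap between consecutively matched events falls in the labelled range.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical interval labels, in minutes: label -> [low, high)
INTERVALS: Dict[str, Tuple[float, float]] = {"I0": (0, 15), "I1": (15, 60), "I2": (60, 24 * 60)}


def contains(sequence: Sequence[Tuple[float, str]], events: Sequence[str],
             gaps: Sequence[str], intervals: Dict[str, Tuple[float, float]] = INTERVALS) -> bool:
    """Check whether a time-sorted list of (minute, event) contains the time-interval
    pattern events[0], gaps[0], events[1], ..., events[-1] (len(gaps) == len(events) - 1)."""

    def search(k: int, start: int, prev_time: Optional[float]) -> bool:
        if k == len(events):
            return True  # every event of the pattern has been matched
        for i in range(start, len(sequence)):
            t, e = sequence[i]
            if e != events[k]:
                continue
            if k > 0:
                low, high = intervals[gaps[k - 1]]
                if not (low <= t - prev_time < high):
                    continue  # gap to the previous matched event is outside the allowed range
            if search(k + 1, i + 1, t):
                return True
        return False  # backtrack: no valid match for events[k] from this start

    return search(0, 0, None)
```
]]></preformat>
        <p>The same pair of validators can thus support different patterns: a quick transfer matches A, I0, B, while a return trip hours later matches A, I2, B, a distinction a conventional sequential pattern cannot make.</p>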
        <p>
          The next issue is stream processing for sequence data
mining [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ]. In general, data stream processing has to satisfy several
constraints: new elements are generated continuously and should be processed as
soon as possible, the data can be examined only once, and memory usage is
restricted. The ultimate goal is to process data in real time. For smart city mobility,
for example, it can be important to detect unusual usage patterns in real time.
In [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ], the authors propose an algorithm for sequence mining in data
streams that is based on sequence alignment for mining approximate
sequential patterns. A similar task is studied in [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ] and
[
          <xref ref-type="bibr" rid="ref36">36</xref>
          ].
        </p>
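        <p>Under the one-pass and bounded-memory constraints listed above, a classic building block is the Misra-Gries frequent-items summary, sketched below over validator-to-validator transitions. This is a swapped-in technique for illustration, not the alignment-based algorithm of [34]; note also that the per-card state used to form transitions grows with the number of active cards, so a real deployment would need to expire idle cards.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, Hashable, Iterable, Iterator, Tuple


def transitions(taps: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Turn a time-ordered stream of (card_id, validator_id) taps into
    (previous validator, current validator) pairs for the same card."""
    last_seen: Dict[str, str] = {}  # grows with the number of active cards
    for card, validator in taps:
        if card in last_seen:
            yield (last_seen[card], validator)
        last_seen[card] = validator


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """One-pass frequent-items summary using at most k - 1 counters (k >= 2).

    Items occurring more than n / k times in a stream of n items are guaranteed
    to be kept; each stored count underestimates the true count by at most n / k.
    """
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            for key in list(counters):  # no free counter: decrement all, dropping zeros
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```
]]></preformat>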
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>
        In this paper, we present data models for transport card data processing. We
discuss transit pattern detection (rule-based), trip discovery (clustering
in various forms, e.g., modified DBSCAN) and route-based studies (statistics,
Markov models). Additionally, we present a new approach in which a transport card
data log is processed like a web server log. This makes it possible to use
many existing methods and tools for web statistics. To test this approach,
we re-coded an existing transport card dataset [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ] as a web log and
used a free web log analyzer as a proof of concept.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          Myki Travel Smart Card.
          <ext-link ext-link-type="uri" xlink:href="http://australiaforeveryone.com.au/melbourne/gettingaround.html">http://australiaforeveryone.com.au/melbourne/gettingaround.html</ext-link>
          Retrieved: Feb,
          <year>2017</year>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <article-title>Toolkit on Intelligent Transport Systems for Urban Transport</article-title>
          .
          <ext-link ext-link-type="uri" xlink:href="https://www.ssatp.org/sites/ssatp/files/publications/Toolkits/ITS%20Toolkit%20content/its-technologies/electronic-fare-collection/smart-card-validatorand-display.html">https://www.ssatp.org/sites/ssatp/files/publications/Toolkits/ITS%20Toolkit%20content/its-technologies/electronic-fare-collection/smart-card-validatorand-display.html</ext-link>
          Retrieved: Feb,
          <year>2017</year>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Namiot</surname>
            ,
            <given-names>Dmitry</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>Elena</given-names>
            <surname>Zubareva</surname>
          </string-name>
          .
          <article-title>"Data-driven Cities."</article-title>
          <source>International Journal of Open Information Technologies</source>
          <volume>4</volume>
          .12 (
          <year>2016</year>
          ):
          <fpage>79</fpage>
          -
          <lpage>85</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
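          4.
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>Xiaolei</given-names>
          </string-name>
          , et al.
          <article-title>"Mining smart card data for transit riders' travel patterns."</article-title>
          <source>Transportation Research Part C: Emerging Technologies</source>
          <volume>36</volume>
          (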
          <year>2013</year>
          ):
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>Juanjuan</given-names>
          </string-name>
          , et al.
          <article-title>"Understanding temporal and spatial travel patterns of individual passengers by mining smart card data."</article-title>
          <source>2014 IEEE 17th International Conference on Intelligent Transportation Systems (ITSC)</source>
          . IEEE
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Agard</surname>
          </string-name>
          ,
          <string-name>
            <surname>Bruno</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Partovi Nia</surname>
            , and
            <given-names>Martin</given-names>
          </string-name>
          <string-name>
            <surname>Trpanier</surname>
          </string-name>
          .
          <article-title>"Assessing public transport travel behaviour from smart card data with advanced data mining techniques</article-title>
          .
          <source>" World Conference on Transport Research</source>
          . Vol.
          <volume>13</volume>
          .
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Dinant</surname>
            ,
            <given-names>Jean-Marc</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>Ewout</given-names>
            <surname>Keuleers</surname>
          </string-name>
          .
          <article-title>"Data protection: multi-application smart cards: the use of global unique identifiers for cross-profiling purposes-Part II: towards a privacy enhancing smart card engineering."</article-title>
          <source>Computer Law &amp; Security Review</source>
          <volume>20</volume>
          .1
          (
          <year>2004</year>
          ):
          <fpage>22</fpage>
          -
          <lpage>28</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
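          <string-name>
            <surname>Trépanier</surname>
            ,
            <given-names>Martin</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Khandker M. N.</given-names>
            <surname>Habib</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Catherine</given-names>
            <surname>Morency</surname>
          </string-name>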
          .
          <article-title>"Are transit users loyal? Revelations from a hazard model based on smart card data."</article-title>
          <source>Canadian Journal of Civil Engineering</source>
          <volume>39</volume>
          .6 (
          <year>2012</year>
          ):
          <fpage>610</fpage>
          -
          <lpage>618</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>Sanggu</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>Mark D.</given-names>
            <surname>Hickman</surname>
          </string-name>
          .
          <article-title>"Travel pattern analysis using smart card data of regular users."</article-title>
          <source>Transportation Research Board 90th Annual Meeting</source>
          . No. 11-4258.
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Min</surname>
            ,
            <given-names>Yun-Hong</given-names>
          </string-name>
          , et al.
          <article-title>"Mining missing train logs from Smart Card data."</article-title>
          <source>Transportation Research Part C: Emerging Technologies</source>
          <volume>63</volume>
          (
          <year>2016</year>
          ):
          <fpage>170</fpage>
          -
          <lpage>181</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>Xiaolei</given-names>
          </string-name>
          , et al.
          <article-title>"Understanding commuting patterns using transit smart card data."</article-title>
          <source>Journal of Transport Geography</source>
          <volume>58</volume>
          (
          <year>2017</year>
          ):
          <fpage>135</fpage>
          -
          <lpage>145</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>Christian M.</given-names>
          </string-name>
          , et al.
          <article-title>"Daily travel behavior: Lessons from a week-long survey for the extraction of human mobility motifs related information."</article-title>
          <source>Proceedings of the 2nd ACM SIGKDD International Workshop on Urban Computing. ACM</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>Christian M.</given-names>
          </string-name>
          , et al.
          <article-title>"Unravelling daily human mobility motifs."</article-title>
          <source>Journal of The Royal Society Interface 10.84</source>
          (
          <year>2013</year>
          ):
          <fpage>20130246</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Morency</surname>
            ,
            <given-names>Catherine</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Martin</given-names>
            <surname>Trépanier</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Bruno</given-names>
            <surname>Agard</surname>
          </string-name>
          .
          <article-title>"Measuring transit use variability with smart-card data."</article-title>
          <source>Transport Policy 14.3</source>
          (
          <year>2007</year>
          ):
          <fpage>193</fpage>
          -
          <lpage>203</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Namiot</surname>
            ,
            <given-names>Dmitry</given-names>
          </string-name>
          .
          <article-title>"Mining groups of mobile users."</article-title>
          <source>International Journal of Wireless and Mobile Computing</source>
          <volume>9</volume>
          .3 (
          <year>2015</year>
          ):
          <fpage>211</fpage>
          -
          <lpage>217</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Goulet-Langlois</surname>
            ,
            <given-names>Gabriel</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Haris N.</given-names>
            <surname>Koutsopoulos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jinhua</given-names>
            <surname>Zhao</surname>
          </string-name>
          .
          <article-title>"Clustering the Multi-week Activity Sequences of Public Transport Users."</article-title>
          <source>Transportation Research Board 95th Annual Meeting</source>
          . No. 16-6162.
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Goulet-Langlois</surname>
            ,
            <given-names>Gabriel</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Haris N.</given-names>
            <surname>Koutsopoulos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jinhua</given-names>
            <surname>Zhao</surname>
          </string-name>
          .
          <article-title>"Inferring patterns in the multi-week activity sequences of public transport users."</article-title>
          <source>Transportation Research Part C: Emerging Technologies</source>
          <volume>64</volume>
          (
          <year>2016</year>
          ):
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Ortega-Tong</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <year>2013</year>
          .
          <article-title>Classification of London's Public Transport Users Using Smart Card Data</article-title>
          . Thesis, MIT. http://dspace.mit.edu/handle/1721.1/82844 Retrieved: Mar,
          <year>2017</year>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Bhaskar</surname>
            ,
            <given-names>Ashish</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>Edward</given-names>
            <surname>Chung</surname>
          </string-name>
          .
          <article-title>"Passenger segmentation using smart card data."</article-title>
          <source>IEEE Transactions on Intelligent Transportation Systems 16.3</source>
          (
          <year>2015</year>
          ):
          <fpage>1537</fpage>
          -
          <lpage>1548</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Kieu</surname>
            ,
            <given-names>Le-Minh</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Ashish</given-names>
            <surname>Bhaskar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Edward</given-names>
            <surname>Chung</surname>
          </string-name>
          .
          <article-title>"A modified Density-Based Scanning Algorithm with Noise for spatial travel pattern analysis from Smart Card AFC data."</article-title>
          <source>Transportation Research Part C: Emerging Technologies</source>
          <volume>58</volume>
          (
          <year>2015</year>
          ):
          <fpage>193</fpage>
          -
          <lpage>207</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Kurauchi</surname>
            ,
            <given-names>Fumitaka</given-names>
          </string-name>
          , et al.
          <article-title>"Variability of commuters' bus line choice: an analysis of oyster card data."</article-title>
          <source>Public Transport</source>
          <volume>6</volume>
          .
          <issue>1-2</issue>
          (
          <year>2014</year>
          ):
          <fpage>21</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Namiot</surname>
            ,
            <given-names>Dmitry</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>Manfred</given-names>
            <surname>Sneps-Sneppe</surname>
          </string-name>
          .
          <article-title>"On the analysis of statistics of mobile visitors."</article-title>
          <source>Automatic Control and Computer Sciences 48.3</source>
          (
          <year>2014</year>
          ):
          <fpage>150</fpage>
          -
          <lpage>158</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Namiot</surname>
            ,
            <given-names>Dmitry</given-names>
          </string-name>
          .
          <article-title>"Mining Relationships in Proximity Movements."</article-title>
          <source>Applied Mathematical Sciences</source>
          <volume>7</volume>
          .
          <issue>144</issue>
          (
          <year>2013</year>
          ):
          <fpage>7173</fpage>
          -
          <lpage>7177</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Namiot</surname>
            ,
            <given-names>Dmitry</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>Manfred</given-names>
            <surname>Sneps-Sneppe</surname>
          </string-name>
          .
          <article-title>"Customized check-in procedures."</article-title>
          <source>Smart Spaces and Next Generation Wired/Wireless Networking</source>
          (
          <year>2011</year>
          ):
          <fpage>160</fpage>
          -
          <lpage>164</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Deshmukh</surname>
            ,
            <given-names>Sana M.</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>Krishnakant P.</given-names>
            <surname>Adhiya</surname>
          </string-name>
          .
          <article-title>"A Review on Finding Users Navigation Behavior Using Web Mining Algorithm."</article-title>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Iváncsy</surname>
            ,
            <given-names>Renáta</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>István</given-names>
            <surname>Vajk</surname>
          </string-name>
          .
          <article-title>"Frequent pattern mining in web log data."</article-title>
          <source>Acta Polytechnica Hungarica</source>
          <volume>3</volume>
          .1 (
          <year>2006</year>
          ):
          <fpage>77</fpage>
          -
          <lpage>90</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Xiangji</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>Aijun</given-names>
            <surname>An</surname>
          </string-name>
          .
          <article-title>"Discovery of interesting association rules from Livelink web log data."</article-title>
          <source>Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002)</source>
          . IEEE,
          <year>2002</year>
          .
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Pei</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mortazavi-Asl</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhu</surname>
          </string-name>
          .
          <article-title>"Mining access patterns efficiently from web logs."</article-title>
          <source>Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD '00), Current Issues and New Applications</source>
          . London, UK: Springer-Verlag,
          <year>2000</year>
          , pp.
          <fpage>396</fpage>
          -
          <lpage>407</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Ezeife</surname>
            ,
            <given-names>Christie I.</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>Yi</given-names>
            <surname>Lu</surname>
          </string-name>
          .
          <article-title>"Mining web log sequential patterns with position coded pre-order linked WAP-tree."</article-title>
          <source>Data Mining and Knowledge Discovery 10.1</source>
          (
          <year>2005</year>
          ):
          <fpage>5</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>Rakesh</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>Ramakrishnan</given-names>
            <surname>Srikant</surname>
          </string-name>
          .
          <article-title>"Mining sequential patterns."</article-title>
          <source>Proceedings of the Eleventh International Conference on Data Engineering</source>
          . IEEE,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Yen-Liang</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Mei-Ching</given-names>
            <surname>Chiang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ming-Tat</given-names>
            <surname>Ko</surname>
          </string-name>
          .
          <article-title>"Discovering time-interval sequential patterns in sequence databases."</article-title>
          <source>Expert Systems with Applications 25.3</source>
          (
          <year>2003</year>
          ):
          <fpage>343</fpage>
          -
          <lpage>354</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>Joong Hyuk</given-names>
          </string-name>
          .
          <article-title>"Mining weighted sequential patterns in a sequence database with a time-interval weight."</article-title>
          <source>Knowledge-Based Systems 24.1</source>
          (
          <year>2011</year>
          ):
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Yen-Liang</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>T. C.-K.</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <article-title>"Discovering fuzzy time-interval sequential patterns in sequence databases."</article-title>
          <source>IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)</source>
          <volume>35</volume>
          .5 (
          <year>2005</year>
          ):
          <fpage>959</fpage>
          -
          <lpage>972</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <surname>Marascu</surname>
            ,
            <given-names>Alice</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>Florent</given-names>
            <surname>Masseglia</surname>
          </string-name>
          .
          <article-title>"Mining sequential patterns from temporal streaming data."</article-title>
          <source>Proceedings of the 1st ECML/PKDD Workshop on Mining Spatio-Temporal Data (MSTD 2005)</source>
          .
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <surname>Zihayat</surname>
            ,
            <given-names>Morteza</given-names>
          </string-name>
          , et al.
          <article-title>"Mining high utility sequential patterns from evolving data streams."</article-title>
          <source>Proceedings of the ASE BigData &amp; SocialInformatics 2015</source>
          . ACM,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
          <string-name>
            <surname>Jiang</surname>
            , Nan, and
            <given-names>Le</given-names>
          </string-name>
          <string-name>
            <surname>Gruenwald</surname>
          </string-name>
          .
          <article-title>"Research issues in data stream association rule mining</article-title>
          .
          <source>" ACM Sigmod Record 35.1</source>
          (
          <year>2006</year>
          ):
          <fpage>14</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          37.
          <source>Opal Trips - Bus</source>
          . https://opendata.transport.nsw.gov.au/dataset/opal-trips-bus Retrieved: May,
          <year>2017</year>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>