Mining the user profile from a smartphone: a multimodal agent framework

Giuseppe Loseto, Michele Ruta, Floriano Scioscia, Eugenio Di Sciascio, Marina Mongiello
DEI - Politecnico di Bari
via E. Orabona 4, I-70125, Bari, Italy
loseto@deemail.poliba.it, {m.ruta, f.scioscia, disciascio, mongiello}@poliba.it

Abstract—Nowadays smartphones play a significant role in gathering relevant data about their owners. Micro-devices embedded in Personal Digital Assistants (PDAs) perform continuous sensing, while phone call lists, the PIM (Personal Information Manager), text messages and so on make it possible to collect and mine enough data for a high-level description of the daily activities of a user. This paper proposes an agent able to perform an automated profile annotation by adopting Semantic Web languages. As a proof of concept, the devised agent has been tested in an Ambient Intelligence (AmI) scenario, i.e., a domotic environment where it interacts with its home counterpart to trigger the services best matching the user needs. A toy example is presented as a case study to better clarify the proposal, while an early experimental evaluation is reported to assess its effectiveness.

Keywords—Ambient Intelligence; Agent-based Data Mining; Semantic Web of Things; Home and Building Automation.

I. INTRODUCTION

Mobile phones are both pervasive and personal (they follow the user and have clues about everyday situations), which makes them extremely useful to infer a context. Embedded micro-devices (accelerometer, digital compass, gyroscope, GPS, microphone and camera) can be used to extract significant information about the user: GPS location traces, call and SMS lists, PIM (Personal Information Management) records including contacts and calendar, battery charging habits. By leveraging the smartphone processing capabilities, ever-expanding ways to investigate behavioral, spatial and temporal dimensions of everyday life can be provided. The personal nature of mobile phones suggests they are well suited for pervasive computing, and the data they are able to collect and process could be profitably used for a large set of context-aware applications, like the Ambient Intelligence (AmI) [1] ones.

This paper presents a smart profiling agent¹ which borrows languages and technologies from the Semantic Web experience to funnel inarticulate raw individual information toward a semantically rich glossary. A crawler agent runs on the user smartphone and performs a multimodal (i.e., involving several heterogeneous data sources) and continuous sensing [2], collecting and processing information without human intervention. Multimodality requires specialized analyses for each kind of collected data. The agent mines the user habits automatically and annotates them in a logic-based formalism to build a daily profile to be further exploited in context-aware knowledge-based applications. The main motivation for adopting an agent-based approach is that the mobile profiler must proactively modulate the amount and complexity of data capture and processing, in order to use energy efficiently. Smart Home and Building Automation (HBA) [3] was selected as proof scenario: the profiling agent sends the inferred preferences to its HBA counterpart so that a logic-based matchmaking session can finalize the adaptation of the environment to user needs.

The remainder of the paper is organized as follows. Section II contextualizes the overall multi-agent HBA system motivating the proposed approach, before Section III presents both architecture and algorithms of the profiler agent. The toy example in Section IV acts as a case study, while an early experimental evaluation is reported in Section V. Finally, the most relevant related work is discussed in Section VI, and concluding remarks and future research are in Section VII.

¹ Project home page: http://sisinflab.poliba.it/swottools/mobile-user-profiler/

II. SCENARIO: SEMANTIC-BASED HOME AUTOMATION

The user agent proposed in this paper is intended as a part of a more complex HBA Multi Agent System (MAS) [4] leveraging the semantic-based evolution of the KNX domotic protocol in [5]. That evolution introduced a semantic micro-layer on top of the protocol stack, enabling novel services and functions while keeping full backward compatibility with current domestic devices and HBA appliances. These enhancements make it possible to fully describe device features by means of annotations expressed in logic-based languages such as RDF² and OWL³. The knowledge domain of building automation was conceptualized in a shared ontological vocabulary enabling a rich characterization of home resources and services. The MAS was implemented in Java on a testbed composed of off-the-shelf KNX domotic equipment⁴.

The adopted multi-agent system comprises a home mediator agent as well as user and device agents. Each agent adopts the custom service-oriented model sketched in [4, Fig. 4]. Basically, the agent monitors its internal state and inputs; when a significant change occurs, it communicates with the other agents in order to discover suitable services that maximize its utility. The number of both resources/services and agents can vary unpredictably (as new users or devices join or leave the system at any time) without requiring the communication paradigm to be redefined.

– The Mediator Agent coordinates the explicit characterizations of available services, described w.r.t. a reference ontology modeling the conceptual knowledge for the building automation problem domain. Furthermore, it acts as a broker in order to discover the (set of) elementary services that cover (part of) the request coming from user or device agents.
– The Device Agents are meant to run on advanced devices, i.e., home appliances with some computational capabilities and memory availability. Each one can expose one or more semantic descriptions, i.e., functional profiles to be discovered by other agents; alternatively, each of them can issue semantic-based requests to the mediator agent when the device status changes and a home reconfiguration is required.
– KNX Device Interface Agents support semantic-based enhancements in case of legacy or elementary appliances, e.g., switches, lamps, and so on. In such cases, there is only a static interaction between agent and device.
– Finally, the User Agents, running on mobile clients, send requests toward the home environment, in order to satisfy user needs and preferences. W.r.t. the version in [4], the main contribution of this paper is an approach for the automated mining of a user profile, which is assigned to this kind of agent.

² RDF (Resource Description Framework) Primer, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/rdf-primer/
³ OWL 2 Web Ontology Language, W3C Recommendation, 11 December 2012, http://www.w3.org/TR/owl2-overview/
⁴ See the related project home page http://sisinflab.poliba.it/swottools/smartbuildingautomation/ for more details.

III. FRAMEWORK AND APPROACH

Figure 1 sketches the general architecture of the profiling agent. Raw data are extracted from smartphone embedded micro-devices, communication tools and PIM. The data mining life cycle consists of the following subsequent stages: (a) gathering; (b) feature extraction; (c) classification and interpretation; (d) semantic annotation. High-level information about user activities, whereabouts, mental and physical status is inferred and annotated w.r.t. an extension of the HBA ontology in [5]. The mined profile is finally used to trigger the activation or deactivation of the most appropriate home services. A modular architecture makes it possible to process the various data sources with specialized algorithms. In particular, as shown by the icons in Figure 1, three modules fully characterize the agent at the moment: (i) Points of Interest Recognition; (ii) Transportation Mode Recognition; (iii) User Activity Recognition.

Fig. 1. Reference architecture of the user profiling agent. [Diagram labels: User Profiling Agent; GPS Trace, Accelerometer, PIM/SMS/Call; Stay Points, Google POIs, Places, Overpass, Transp. Mode; SVM Model, SVM Features, User Activity; Data Processing, Mental Status, Mood; Semantic-based User Profile.]

1. Points of Interest Recognition. A mining algorithm analyzes the smartphone GPS data in order to:
a. identify Stay Points (SPs) through a slightly refined version of the algorithm in [6];
b. for each SP, retrieve the nearest Point Of Interest (POI) via reverse geocoding queries to the Google Places⁵ Web service;
c. associate a "place category" to each POI, so as to further infer the kind of user activity;
d. enrich the daily user profile conjoining all detected activities, described w.r.t. a proper HBA ontology.

An SP represents a narrow geographic region where a user stays for a while. In particular, given two subsequent detected GPS locations P1 and P2, an SP satisfies both the following constraints: (i) maximum distance d(P1, P2) < D_max; (ii) minimum time difference |T1 − T2| > T_min, where the thresholds were set to D_max = 200 m and T_min = 350 s. An empirical evaluation was carried out to select the threshold values granting the highest precision of the SP recognition algorithm.

Fig. 2. Screenshots of the GPS profiler: (a) Home POI; (b) POI Info; (c) Extracted Places; (d) Profile mining; (e) Food place detail; (f) Daily stay period and location visited before.

Figure 2 shows the GUI of the profiler prototype on the GPS side. The daily GPS trace is drawn on Google Maps together with the detected SPs, depicted as markers on the map in Figure 2(a). The Home and Workplace POIs are set by the user in a preliminary configuration step. As said, the SP classification leverages a Web-based reverse geocoding service: after comparing Google Places and LinkedGeoData (LGD) [7] (see Section V for further details), the former has been chosen for the time being, since it provides more POIs, even if LGD often seems to be more accurate. In the example reported in Figure 2(c), the agent selected an SP near the Politecnico di Bari and all the nearby POIs were retrieved by means of the Google Places API. The main category of the nearest POI is used as label of the retrieved location. Starting from the Google Places classification⁶, the reference ontology for domotics in [5] has been extended to include a places taxonomy. Finally, as reported in Figure 2(d), a profile is generated through the conjunction of location information. As shown in Figure 2(e), each SP description contains an ontology class related to the specific location the user visited, the overall time spent there (in seconds), the daily period and the place visited before, if present (Figure 2(f)).

⁵ http://developers.google.com/places/
⁶ http://developers.google.com/places/documentation/supported_types/

2. Transportation Mode Recognition. GPS data are also exploited to detect the transportation mode adopted by the user when moving during a day. Four transportation modes are supported: bus, train, car or walking. A pre-processing step splits the whole daily GPS trace P = {T_1, ..., T_n} into trajectories T_i. In turn, each trajectory T_i = Q{POI_i, POI_(i+1)} consists of a set of GPS points Q included between two subsequent POIs.

Fig. 3. Screenshots of the Transportation Mode profiler: (a) Overpass routes; (b) Train Mode; (c) Train Mode details.
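The stay-point constraints and the trajectory split described above can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the `Fix` record, the haversine distance used for d(P1, P2) and the anchor-based grouping loop are assumptions, and only the thresholds D_max = 200 m and T_min = 350 s come from the text.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import List, Tuple

D_MAX_M = 200.0  # maximum distance threshold D_max (from the text)
T_MIN_S = 350.0  # minimum time threshold T_min (from the text)


@dataclass(frozen=True)
class Fix:
    """One GPS sample; the field layout is hypothetical."""
    lat: float
    lon: float
    t: float  # timestamp in seconds


def haversine_m(a: Fix, b: Fix) -> float:
    """Great-circle distance in metres (assumed choice for d(P1, P2))."""
    p1, p2 = radians(a.lat), radians(b.lat)
    dlam = radians(b.lon - a.lon)
    h = sin((p2 - p1) / 2) ** 2 + cos(p1) * cos(p2) * sin(dlam / 2) ** 2
    return 2 * 6371000.0 * asin(sqrt(h))


def stay_points(trace: List[Fix]) -> List[Tuple[int, int]]:
    """Return (first, last) index pairs of runs of fixes that remain within
    D_MAX_M of the run's first fix for longer than T_MIN_S."""
    found = []
    i, n = 0, len(trace)
    while i < n:
        j = i + 1
        while j < n and haversine_m(trace[i], trace[j]) < D_MAX_M:
            j += 1
        if trace[j - 1].t - trace[i].t > T_MIN_S:
            found.append((i, j - 1))
            i = j  # continue after the stay
        else:
            i += 1  # slide the anchor by one fix
    return found


def trajectories(trace: List[Fix], sps: List[Tuple[int, int]]) -> List[List[Fix]]:
    """Fixes lying strictly between each pair of subsequent stay points."""
    return [trace[end + 1:start] for (_, end), (start, _) in zip(sps, sps[1:])]
```

Each stay point would then be labelled through the reverse geocoding step, and each returned trajectory would be passed on to the transportation mode detection.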
Starting from the trajectories set, the transportation mode detection is based on two reference parameters: (i) the walking speed threshold (WS_th), set to an average value of 2 m/s (i.e., 7.2 km/h); (ii) the minimum correspondence ratio (CR_min) between user trajectories and bus/train routes, set to 0.8 (i.e., at least an 80% correspondence is required). Also in this case, an experimental evaluation was performed to select the most suitable threshold values. The detection algorithm progresses along the following stages:
a. For each trajectory T_i, the average user speed is evaluated. If it is lower than WS_th, then walking mode is detected.
b. Otherwise, the algorithm queries OpenStreetMap⁷ (OSM) via the Overpass API⁸ to retrieve all available bus and train routes (R_s = R_bus ∪ R_train) in a bounding box covering the geographical coordinates of the GPS points in T_i. Figure 3(a) shows an example.
c. A comparison between the GPS points of the user trajectory and the retrieved routes is performed. In case of a correspondence ratio greater than CR_min with a bus or train path, the trajectory T_i is associated to a bus or train mode, respectively (Figure 3(b)).
d. Finally, if the detected means of transport is neither walking nor train nor bus, then the car mode is selected.

Each transportation mode is associated to a semantic-based annotation fragment which includes a given class of the ontology, further extended to include also concepts and properties about user movements. Moreover, the description includes the overall time (in seconds) the user spent moving during the day, the daily period and the means of transport possibly used before. Figure 3(c) shows the details of the user profile section related to a transfer by train.

⁷ http://www.openstreetmap.org/
⁸ http://wiki.openstreetmap.org/wiki/Overpass_API

3. User Activity Recognition. Beyond the above components, the profiling agent is completed by a module to detect some user activities. In particular, at the moment the following elementary actions can be discovered: sitting, standing, walking, walking upstairs and downstairs. Starting from data acquired from the smartphone accelerometer and gyroscope, a supervised Machine Learning (ML) approach is adopted, exploiting the Support Vector Machines (SVM) classifier in [8]. W.r.t. the original approach, the classifier was simplified to improve its efficiency on PDAs and to reduce the training time. The initial 568 features used on the dataset⁹ associated to [8] as input for the classifier were reduced to 16 (see Table I) by applying the Recursive Feature Elimination (RFE) algorithm proposed in [9].

TABLE I. FEATURES SUBSET FOR THE SVM CLASSIFIER

#    Feature description
1    tBodyAcc correlation(X,Y)
2    tGravityAcc mean(X)
3    tGravityAcc mean(Y)
4    tGravityAcc max(Z)
5    tGravityAcc min(X)
6    tGravityAcc energy(X)
7    tBodyGyro iqr(Z)
8    tBodyGyroJerk entropy(X)
9    tBodyGyroJerk entropy(Z)
10   tBodyAccJerkMag iqr(X,Y,Z)
11   tBodyGyroJerkMag energy(X,Y,Z)
12   fBodyGyro max(Y)
13   fBodyGyro max(Z)
14   fBodyGyro skewness(Z)
15   fBodyAccMag std(X,Y,Z)
16   fBodyAccMag energy(X,Y,Z)

t = time domain, f = frequency domain, Jerk = derived in time, Mag = Euclidean norm, iqr = interquartile range

A training set composed of raw sensor data has been used to let the classifier learn directly on the mobile device. The smartphone used for the experimental evaluation is equipped with an accelerometer and a gyroscope measuring both the 3-axial linear acceleration and the angular velocity (tAcc-XYZ and tGyro-XYZ, respectively) at a fixed sampling period of 25 ms (i.e., 40 Hz), which is adequate to identify human body motion. The collected data are subsequently processed through two first-order low-pass filters. The first one is used to reduce noise, while the second one splits the acceleration signal into body and gravity components (tBody and tGravity). The classifier has been implemented using Weka-for-Android¹⁰, an Android port of Weka [10]. The training set has been built fastening the smartphone in vertical position as reference; after the SVM training, the recognition process starts. Data are sampled in fixed-width sliding windows of 2.5 s (i.e., 100 samples) with 50% overlap, and processed as described above. From each window, a vector with the 16 features in Table I is obtained by processing the extracted accelerometer and gyroscope data in the time and frequency domain. Finally, an energy saving strategy is implemented to avoid unnecessary data capture: after each activity recognition AR_i, the agent waits for a pause WP_i, defined as:

\[
WP_i = \begin{cases}
0\ \mathrm{s} & \text{if } AR_i \neq AR_{i-1} \\
2.5\ \mathrm{s} & \text{if } AR_i = AR_{i-1} \\
2 \cdot WP_{i-1}\ \mathrm{s} & \text{if } AR_i = AR_{i-1} = AR_{i-2}
\end{cases}
\]

In this way, if the classifier consecutively detects two similar activities, then the data sampling is stopped for 2.5 seconds. This value is doubled in case of additional similar recognitions, up to a maximum value of WP_i = 80 s. Otherwise, the waiting period is reset to zero when a different action is detected. The rationale is that users usually perform similar activities in a short period (consider for example the case of sitting and walking), so continuous data gathering can often be avoided.

The vector containing the extracted features is then used as input of the trained SVM model. Finally, the user profile is enriched with the annotations related to the detected activities. For each of them, the overall stay time and the daily period are also considered.

⁹ http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
¹⁰ https://github.com/rjmarsan/Weka-for-Android

IV. CASE STUDY

In order to clarify the rationale behind the proposed approach and to highlight the goal of the profiling agent, the following daily scenario is considered as an example. The user leaves home early in the morning to go to work. He remains at the office until lunch, then reaches a bar for a quick meal. Afterward, he comes back to work, then goes to the gym in the evening and finally returns home late at night. The profiling agent extracts the daily location sequence reported in Table II. In particular, Home and Office POIs are mapped to the user profile directly as Home and Work activities; Bar is identified as a Food place; Gym is associated to the Sport place category. The agent also recognizes the adopted means of transport and the duration of each trajectory.

TABLE II. DAILY USER LOCATIONS AND ROUTES

Route             Type    Duration (min)
Home → Office     car     30
Office → Bar      walk    4
Bar → Office      walk    5
Office → Gym      car     11
Gym → Home        car     21

Along the day, the agent also detects the activities of the user: he was seated for about 6 hours (e.g., at work, in the car, during lunch), walked for 35 minutes (e.g., to reach the bar or for short strolls) and was standing for 15 minutes. As a result of the mining and annotation processes, the following profile is extracted (expressed in Description Logic [11] notation w.r.t. the reference ontology)¹¹:

User Daily Profile ≡ ∀ wasAtHome.HomeActivity ⊓ ∀ wasAtWork.WorkActivity ⊓ ∀ wasInFoodPlace.FoodActivity ⊓ ∀ wasInSportPlace.SportActivity ⊓ ∀ movedByCar.CarMode ⊓ ∀ movedByWalk.WalkMode ⊓ ∀ wasSitting.SittingActivity ⊓ ∀ wasWalking.WalkingActivity ⊓ ∀ wasStanding.StandingActivity
HomeActivity ≡ Home ⊓ ∀ during.(Morning ⊓ Night) ⊓ ∀ after.Gym ⊓ =1945 stayTime
WorkActivity ≡ Work ⊓ ∀ during.(Morning ⊓ Afternoon) ⊓ ∀ after.(Home ⊓ Bar) ⊓ =32470 stayTime
FoodActivity ≡ Bar ⊓ ∀ during.Afternoon ⊓ ∀ after.Work ⊓ =474 stayTime
SportActivity ≡ Gym ⊓ ∀ during.Evening ⊓ ∀ after.Work ⊓ =5362 stayTime
WalkMode ≡ Walk ⊓ =2115 moveTime ⊓ ∀ during.Afternoon ⊓ ∀ after.Car
SittingActivity ≡ Sitting ⊓ =21436 moveTime ⊓ ∀ during.(Morning ⊓ Afternoon ⊓ Evening)

The above generated profile will be adopted by the user agent to negotiate with the mediator agent at home the environmental situation best fitting the needs and mood of the inhabitant via a semantic-based matchmaking. The elementary services and appliances covering the mined user profile as much as possible are automatically activated (or deactivated, if needed) to increase the overall MAS utility. As an example of this phase, let us consider the following available home services/resources:

CookingService ≡ Service ⊓ ∀ wasInSportPlace.(≥1800 stayTime) ⊓ ∀ wasAtHome.(∀ after.(Sport ⊓ ¬Food)) ⊓ ∀ suggestedForFeeling.Hungry
SoftLightLevel ≡ LightLevelRegulation ⊓ ∀ wasAtWork.(≥10800 stayTime) ⊓ ∀ wasAtHome.(∀ after.¬Relax) ⊓ ∀ suggestedForStamina.MentallyTired ⊓ ∀ suggestedForDisease.Headache
PlayMusic ≡ Service ⊓ ∀ wasAtHome.(∀ after.(¬Work ⊓ Relax) ⊓ ∀ during.¬Night) ⊓ ∀ suggestedForStamina.Rested ⊓ ∀ suggestedForDisease.¬Headache

It should be noticed that service annotations are described in terms of both user features (such as physical status, mood and health) and daily events which cause the activation. In this way, a service/resource selection can be performed through the matchmaking against the user profile. For example, a cooking service is activated not only if the user explicitly declares he is hungry, but also if the user agent detects he comes back home after a sport activity, performed for more than 30 minutes (expressed in seconds), without eating anything before. In a similar way, a soft lighting setting is selected to improve the comfort at home in case the user is mentally tired and he spent more than 3 hours at work not followed by a restful activity. The extracted user profile can also lead to a deactivation of previously enabled services. For example, the music service is normally activated to welcome the owner at home, but it is unsuitable if the user comes back during the night and in that case it must be turned off.

The above case study is purposely simplified in order to make the presentation of the proposed approach clear and short. In real scenarios, more articulated user profiles and service descriptions can be used.

¹¹ Due to space constraints, some sections have been deliberately omitted.

V. EXPERIMENTS

An overall evaluation of the proposed approach has been carried out following a reference user for a period of 14 months. Results reported here refer to the first 60 days of observation. In particular, only the days (24 in the evaluated dataset excerpt) with at least one Stay Point different from Home or Workplace have been selected for further investigation. The profiling agent has been tested on a smartphone equipped with an ARM Cortex A8 CPU at 1 GHz, 512 MB RAM, an 8 GB internal storage memory, and Android 2.3.3 as operating system. The experiments aimed to measure: (i) the amount of data retrieved from services on the Web; (ii) the turnaround time (for which each test was repeated four times, taking the average of the last three runs); (iii) the memory usage (for which the final result was the average of three runs). This experimental analysis only focuses on the user profiling aspects: [4] reports on the evaluation of the remaining elements of the reference HBA MAS.

Figure 4 shows the total number of stay points detected with the mining algorithm compared with the overall GPS coordinates composing a daily trace. It can be noticed that the user agent collects 53 GPS points per day on average, detecting about 3 relevant SPs.

Fig. 4. GPS points, detected SPs and retrieved POIs. [Plot: number of points (axis ticks 1, 10, 100) for each of the 24 days; series: GPS Points, Stay Points, POI Google, POI LGD.]

Starting from detected SPs, the results of Google Places and LGD services have been compared in terms of number of retrieved POIs in the neighborhood of each SP. As shown in Figure 4, Google Places usually returns 16 POIs w.r.t. 5 POIs on average retrieved by LGD, so an accurate identification of the locations the user visited is more likely. Nevertheless, as reported in Figure 5, in some cases the LGD replies are longer even though LGD returns fewer POIs. This is due to the LGD response format including, for each point, information annotated according to Linked Data principles [12]: Google Places uses 830 B per POI on average, whereas LGD uses 1.56 kB.

Fig. 5. Retrieved Data. [Plot: retrieved data (byte) for each of the 24 days; series: Google Places, LGD, Overpass.]

The time required by the main processing steps for POIs recognition (GPS traces parsing; SPs detection; Google Places/LGD services querying; profile enrichment), transportation mode detection (Overpass service querying; traces comparison; profile enrichment) and activity recognition is reported in Figure 6. Google Places is slightly slower than LGD, but this is due to the greater amount of retrieved POIs. Considering Google Places as reference service, the agent spends about 1.2 s to retrieve the POIs from a detected SP. In particular, the last step (profile enrichment) took about 1.15 s (49% of total time) to parse the ontology and create the semantic-based annotation. The remaining steps require only 3% of the overall turnaround time, as these procedures use elementary data structures stored in the device main memory. For the transportation mode detection, only 1.7 s were spent to query the Overpass service, while traces comparison is one of the slowest operations, needing 3.4 s. The activity recognition process has a very short turnaround time. After a preliminary task (required to train the SVM classifier) taking about 5.6 s and performed when the profiling agent starts, this module needs only 45 ms to extract the 16 reference features for each window and 6 ms to detect the user activity. Finally, a daily profile was completely composed in about 1.2 seconds.

Fig. 6. Processing Time. [Plot: time (ms) per processing task; tasks: GPS Trace Parsing, SPs Detection, Google Query, LGD Query, Overpass Query, Traces Comparison, SVM Training, Features Extraction, Activity Recognition, Profile Creation.]

A further evaluation of the activity recognition module was required to measure precision and recall of the classifier. 100 datasets of activities containing a similar number of samples per class have been used. The confusion matrix shown in Table III reports the weighted precision of the classifier and the single precision and recall values for each activity. It refers to a single specific dataset with 779 sample vectors. However, all confusion matrices for different tests showed similar outputs, varying slightly in the classification results. It can be noticed that the classifier precision and recall are very high despite the usage of a small set of features.

TABLE III. CONFUSION MATRIX (rows: actual activity; columns: predicted activity)

Activity                   A      B      C      D      E      Recall %
A  Sitting                 340    0      0      0      0      100
B  Standing                0      98     0      1      0      98.9
C  Walking                 1      0      70     0      3      94.6
D  Walking Upstairs        0      0      2      125    5      94.7
E  Walking Downstairs      0      0      0      4      130    97.0
Precision %                99.7   100    97.2   96.2   94.2   98.0

RAM usage trend was also evaluated and results are shown in Figure 7, where memory peaks are reported. The profiler agent needs very little memory, only 4.2 MB on average, a satisfactory value for current mobile devices.

Fig. 7. Main memory usage trend. [Plot: memory usage (MB), axis ticks 2.0 to 7.0, for each of the 24 days; series: Memory Peak, AVG Memory Peak.]

VI. RELATED WORK

The recent popularization of smartphones equipped with a wide range of embedded sensors and adequate processing capabilities has attracted increasing research efforts toward mobile sensing. Lane et al. [2] proposed a survey on existing algorithms, applications, and systems. In addition, many pervasive frameworks were defined to collect and capture the user's context via cellphones in recent years: remarkable works are ContextPhone [13], UbiqLog [14] and LifeMap [15]. The agent proposed here aims to improve upon these works by leveraging the multimodality aspect: the implemented prototype retrieves information from richer data sources than the above systems, even though further mining modules have been planned but not integrated yet. A comparison should be carried out also with respect to commercial location and context-aware mobile software: trekking and fitness applications like Google MyTracks¹² and Endomondo Sportstracker¹³; personalized assistants like Google Now¹⁴ and Xme¹⁵. Nevertheless, these tools either require explicit user interaction or define context just by means of GPS location and time of day, hence they are quite far from the agent proposed here, which uses more parameters and automatically recognizes a larger variety of contexts.

Activity recognition from accelerometer data by means of machine learning is a frequent sensing application. Among other proposals, noteworthy are [16], [8], where smartphone accelerometer data are used to classify six common activities. With reference to context extraction via GPS data analysis, there are many approaches in the literature. For example, Zheng et al. [17] model multiple individuals' GPS trajectories with a tree-based hierarchical graph to mine location history and travel sequences in a given geospatial region. In [6] mobile phones are used as sensors to collect location information. Places are first grouped using a time-based clustering technique to discover stay points; then the stay points are clustered in stay regions through a grid-based algorithm. In [18] a large-scale dataset is collected from 114 users over 18 months.

In the above cited works, however, the knowledge gap between acquired data and the understanding of human behavior is still huge. Stay points and movement patterns need to be interpreted to extract a user profile, implicitly providing knowledge about the user habits. Noteworthy attempts to enrich movement trajectories with semantics are in [19] and [20]. An ontology-based approach for a semantic modeling of trajectories is also proposed in [21]. Trajectories are seen as composed of three main elements: stops, moves and begin-ends. Each part is described through an annotation referred to a domain ontology, and time information is also exploited to annotate activities, to enable rule-based queries and to help users validate and discover moving objects.

Although previous solutions add a machine-understandable meaning to data collected by smartphones, a subsequent exploitation in an articulated AmI framework is still missing. Usually, collected data are only used to indicate detected user conditions or activities through messages or alerts displayed on the mobile phone. On the contrary, in the approach proposed here, the ontology-based characterization of user activities is used as an input for a context-aware HBA MAS [4], enabling a direct environment adaptation and a negotiation between user and home agents. This feature is not available in any other current user profiler.

¹² http://www.google.com/mobile/mytracks/
¹³ http://www.endomondo.com
¹⁴ http://www.google.com/landing/now/
¹⁵ http://xndme.com/

VII. CONCLUSION AND FUTURE WORK

The paper presented a lightweight agent able to mine data collected by embedded micro-devices, logs and applications of a smartphone to build a semantic-based daily profile of its user. According to the AmI paradigm, such a description can be exploited to transparently adapt the environment to user preferences, implicitly inferred. In the case at hand, the agent interacts in a multi-agent framework for Home and Building Automation, grounded on knowledge representation theory and reasoning technologies. It has been designed and then implemented as an Android application, and experiments in a concrete case study proved its feasibility and effectiveness.

Future work will include a more extensive experimental campaign involving several different users to be profiled and new performance indicators. In particular, both battery drain and storage peaks will be taken into account to assess the feasibility of a continuous data collection and mining and to compare the provided framework with existing approaches. Also the exploitation of an agent-based framework w.r.t. classical approaches will be investigated to verify whether it results in a more accurate profiling action. Finally, future research will be also devoted to the integration of the current multimodal information. A fusion of information coming from data sources which are now distinct and independent will be pursued in order to reach a more accurate and precise user characterization.

ACKNOWLEDGMENT

The authors acknowledge partial support of Italian PON project Res Novae and EU PO Apulia region FESR project UbiCare.

REFERENCES

[1] D. J. Cook, J. C. Augusto, and V. R. Jakkula, “Ambient intelligence: Technologies, applications, and opportunities,” Pervasive and Mobile Computing, vol. 5, no. 4, pp. 277–298, 2009.
[2] N. D. Lane, E. Miluzzo, H. Lu, D. Peebles, T. Choudhury, and A. T. Campbell, “A survey of mobile phone sensing,” IEEE Communications Magazine, vol. 48, no. 9, pp. 140–150, Sep. 2010.
[3] G. Loseto, F. Scioscia, M. Ruta, and E. Di Sciascio, “Semantic-based Smart Homes: a Multi-Agent Approach,” in 13th Workshop on Objects and Agents (WOA 2012), ser. CEUR Workshop Proceedings, F. De Paoli and G. Vizzari, Eds., vol. 892, Sep. 2012, pp. 49–55.
[4] M. Ruta, F. Scioscia, G. Loseto, and E. Di Sciascio, “Semantic-based resource discovery and orchestration in home and building automation: a multi-agent approach,” IEEE Transactions on Industrial Informatics, 2013, to appear.
[5] M. Ruta, F. Scioscia, E. Di Sciascio, and G. Loseto, “Semantic-based Enhancement of ISO/IEC 14543-3 EIB/KNX Standard for Building Automation,” IEEE Transactions on Industrial Informatics, vol. 7, no. 4, pp. 731–739, 2011.
[6] R. Montoliu, J. Blom, and D. Gatica-Perez, “Discovering places of interest in everyday life from smartphone data,” Multimedia Tools and Applications, pp. 1–29, 2012.
[7] C. Stadler, J. Lehmann, K. Höffner, and S. Auer, “LinkedGeoData: A Core for a Web of Spatial Open Data,” Semantic Web Journal, vol. 3, no. 4, pp. 333–354, 2012.
[8] D. Anguita, A. Ghio, L. Oneto, X. Parra, and J. L. Reyes-Ortiz, “Human Activity Recognition on Smartphones using a Multiclass Hardware-Friendly Support Vector Machine,” in Workshop of Ambient Assisted Living (IWAAL 2012), 2012.
[9] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, “Gene selection for cancer classification using support vector machines,” Machine Learning, vol. 46, pp. 389–422, 2002.
[10] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, “The WEKA data mining software: an update,” SIGKDD Explor. Newsl., vol. 11, no. 1, pp. 10–18, 2009.
[11] F. Baader, D. Calvanese, D. Mc Guinness, D. Nardi, and P. Patel-Schneider, The Description Logic Handbook. Cambridge University Press, 2002.
[12] C. Bizer, T. Heath, and T. Berners-Lee, “Linked Data - The Story So Far,” International Journal on Semantic Web and Information Systems, vol. 5, no. 3, pp. 1–22, 2009.
[13] M. Raento, A. Oulasvirta, R. Petit, and H. Toivonen, “Contextphone: A prototyping platform for context-aware mobile applications,” IEEE Pervasive Computing, vol. 4, no. 2, pp. 51–59, Apr. 2005.
[14] R. Rawassizadeh, M. Tomitsch, K. Wac, and A. Tjoa, “Ubiqlog: a generic mobile phone-based life-log framework,” Personal and Ubiquitous Computing, pp. 1–17, 2012.
[15] J. Chon and H. Cha, “LifeMap: A Smartphone-Based Context Provider for Location-Based Services,” IEEE Pervasive Computing, vol. 10, no. 2, pp. 58–67, Apr. 2011.
[16] J. R. Kwapisz, G. M. Weiss, and S. A. Moore, “Activity recognition using cell phone accelerometers,” ACM SIGKDD Explorations Newsletter, vol. 12, no. 2, pp. 74–82, 2011.
[17] Y. Zheng, L. Zhang, X. Xie, and W.-Y. Ma, “Mining Interesting Locations and Travel Sequences From GPS Trajectories,” in Proceedings of the 18th International Conference on World Wide Web, ser. WWW ’09. New York, NY, USA: ACM, 2009, pp. 791–800.
[18] T. M. T. Do and D. Gatica-Perez, “The Places of Our Lives: Visiting Patterns and Automatic Labeling from Longitudinal Smartphone Data,” IEEE Transactions on Mobile Computing, 2013, PrePrints.
[19] C. Renso, M. Baglioni, J. Macedo, R. Trasarti, and M. Wachowicz, “How you move reveals who you are: understanding human behavior by analyzing trajectory data,” Knowledge and Information Systems, pp. 1–32, 2012.
[20] C. Parent, S. Spaccapietra, C. Renso, G. Andrienko, N. Andrienko, V. Bogorny, M. L. Damiani, A. Gkoulalas-divanis, J. Macedo, N. Pelekis, Y. Theodoridis, and Z. Yan, “Semantic Trajectories Modeling and Analysis,” ACM Computing Surveys, vol. 45, no. 4, 2013.
[21] R. Wannous, J. Malki, A. Bouju, and C. Vincent, “Time Integration in Semantic Trajectories Using an Ontological Modelling Approach,” in New Trends in Databases and Information Systems, ser. Advances in Intelligent Systems and Computing, M. Pechenizkiy and M. Wojciechowski, Eds. Springer Berlin Heidelberg, 2013, vol. 185, pp. 187–198.