Knowledge-driven Analytics and Sensor Signal Processing in Human-centric Applications

Arijit Ukil, Tata Consultancy Services, Kolkata, India, arijit.ukil@tcs.com
Leandro Marin, University of Murcia, Spain, leandro@um.es
Antonio Jara, University of Applied Sciences Western Switzerland (HES-SO), Switzerland, jara@ieee.org
John Farserotu, CSEM, Switzerland, john.farserotu@csem.ch

Abstract

Technology disruption through knowledge-driven intelligent systems increasingly controls human life. Managing present and future knowledge-driven, artificial-intelligence-based technologies is of the highest importance to maximize their progressive influence on human life and society. Lifestyle diseases, social network affinity, impulsive financial decisions and technology abuse negatively affect our physical, emotional, social and mental health. Conversely, intelligent systems can have a positive impact on human life. This paper brings forward those positive applications and technologies, as well as the path towards transforming intelligent systems, through exemplary analyses that minimize the negative impact. The push is to promote the development of impactful, human-centric intelligent technologies such as precise and personalized medication and treatment plans, drug discovery for untreatable diseases, improved elderly care, minimization of private data theft, big data analytics for predicting macro- or micro-economic conditions, effective and fair trading practices, retail decision management, knowledge-driven energy and resource management, and deep learning and artificial intelligence based applications for risk prediction and augmented human capability generation. The main focus of this paper is to demonstrate knowledge-driven technologies, developments and applications that improve human quality of life. The impact would be at the micro level, where human life is affected on a daily basis, and at the macro level, where human life is affected in the long term, which eventually contributes to the betterment of human society.

Copyright © CIKM 2018 for the individual papers by the papers' authors. Copyright © CIKM 2018 for the volume as a collection by its editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

This paper is intended to demonstrate the capability of knowledge-driven analytics for building human-centric applications. We envisage that knowledge-driven human beings, knowledge-driven societies and knowledge-driven technologies should cooperatively co-exist to create a better knowledge-driven world. Our focus is to minimize the risks, conflicts and hazards of adapting to intelligent systems. This paper illustrates exemplary, impactful ideas and proposals to achieve the goal of a knowledge-driven life.

Technology advancements of the last few years have produced a number of exquisite applications with a penetrating influence on human life. The ubiquity of smartphones, large-scale deployment of the Internet of Things, high-end computing, big data and widespread engagement with social networks, along with the advent and promise of powerful artificial intelligence tools like deep learning algorithms, result in abundant information generation, dissemination of knowledge, and analytics-driven human decisions and choices. Such a conglomeration of technologies, applications and big data resources paves the way for knowledge-driven human life, society and economy.

We bring forward the applications and technologies that, through knowledge-driven analytics, bring positive outcomes to human life and to the world at large. For example, knowledge-managed learning techniques are capable of robust prediction of medical conditions, automated summarization, report generation, minimization of diagnosis errors and remote disease screening. They can predict suicidal trends or states of depression by analyzing Facebook posts, tweets or recently posted images. Prediction of psychiatric disorders like schizophrenia, which physicians find difficult to anticipate, would have an immense impact on millions of human lives. Traditional, coarse evidence-driven medical treatment needs to be more precise and personalized. Big data and the availability of vast information invite severe data privacy attacks, which can potentially ruin one's life and reputation. One of the challenging applications is the controlled release of private data without compromising its beneficial influence, together with the prediction and subsequent prevention of cyber-attacks and privacy breach incidents. Knowledge-driven analytics can restrain an individual from venturing into risky investments or the traps of false social requests.

The goal of this paper is to inculcate the realization of the long-term co-existence of human life with big data, artificial intelligence and deep analytics. Powerful tools, applications and ever-increasing knowledge sources will drive human life and its micro and macro conditions, augmenting human capabilities, minimizing the nuisances of infiltratory technologies and improving the overall human experience.

2 Knowledge Management for Human Quality of Life

We are at the crucial juncture of welcoming the knowledge-driven management of our lives, with the apparent arrival of the inflection point of big data analytics based industry solutions and research outcomes. Knowledge-driven technologies and applications for improving human quality of life will potentially enable the long-term, human-centric convergence of futuristic applications.

It is assumed that knowledge-driven analytics and information management will attempt to ensure a positive influence on society and quality of life. Broadly, the areas would be: managing and analyzing knowledge for improving human mental and physical health; maximizing the benefits of social network interactions while minimizing the ill effects; assisting human decision making in the financial domain; social network foot-printing, behavioral understanding and subsequent recommendation of necessary actions; and ensuring personal data privacy preservation. They would also attempt to address a few pertinent questions: Can machines understand how we are feeling and act accordingly? How will I be alerted before a devastating financial decision? How can a doctor be given augmented knowledge for diagnosis? All of us are different. Why are we not given personalized treatment instead of an average-case treatment plan? How can we use big data and knowledge mining to develop sustainable societies by optimizing energy, waste and perishable resource management? And many others.

The pertinent areas of human quality of life improvement through intelligent knowledge management would be:
• Macro-action analytics to identify cognitive dissonance.
• Computational methods of automated disease detection.
• Social network usage analytics to identify suicidal tendency and psychiatric abnormality.
• Finding the efficacy of prescription drugs in the presence of concept drift.
• Identifying wrong or ineffective economic decisions based on spending and requirement analysis.
• Recommendation of personalized retail and financial decisions and plans.
• Big data management by proactive control of data misuse and incorporation of proactive data privacy.
• Value alignment of highly automated intelligent systems to restrict greedy outcomes.
• Algorithmic fair trading.
• Deeper personalization by understanding retail behavior, prognosis trends, sentiment analysis, drug abuse, online surfing habits and other related personal studies.
• Patient-specific tailored medication and treatment plans.
• Virtual assistants for elderly and infant care.
• Knowledge-driven energy, waste and perishable resource management.
• Artificial intelligence for changing the responsibilities of human workers, where mundane, repetitive, stressful jobs would be done by robots or other humanoids.
• Game-theoretic investigation for conflict resolution of actions in knowledge-driven intelligent systems.
• Long-term prediction of knowledge-driven human life and society.
• Crowdsourcing for knowledge aggregation and exploiting the wisdom of the crowd.

In this paper, we illustrate two important case studies:
1. Analytics for unobtrusive cardiac condition identification and inference: ways to minimize loss of human life due to cardiac diseases.
2. Privacy preserving sensor signal mining: ways to minimize human value loss due to intended and unintended privacy breaching attempts.

3 Analytics For Unobtrusive Cardiac Condition Identification And Inference: Ways To Minimize Loss Of Human Life Due To Cardiac Diseases

It is estimated that more than 25% of worldwide deaths are due to cardiac ailments. Fortunately, cardiac diseases are preventable when early signs of cardiac health abnormality are captured.

With the advent of sophisticated body sensors, smartphones and the Internet of Things (IoT), we can affordably capture various fundamental physiological signals, which are definite markers of cardiac health [Fras14]. For example, the photoplethysmogram (PPG) can be reliably captured by smartphones, and the electrocardiogram (ECG) can be reliably captured by external sensors like AliveCor [Alive]. AliveCor has developed the Kardia heart monitor, which is capable of predicting fatal cardiac conditions like Atrial Fibrillation [Heart]. In the investigation by the concerned team, a total of 1001 persons in the age group vulnerable to cardiac diseases (65 years and above) were studied, and the disease detection of Kardia outperformed the doctors' capability [Jul]. It is well known that prognosis is significantly better when Atrial Fibrillation is detected early and treated with appropriate anticoagulation. Such proactive diagnosis has a high probability of decreasing stroke morbidity and mortality. We observe that the entire study and analysis were performed on smartphones, which encourages ubiquitous deployment and the building of a penetrative ecosystem of cardiac disease monitoring.

In a further study, researchers attempted to predict the presence of Atrial Fibrillation and other cardiac diseases, including arrhythmia and coronary artery disease, using a single-lead AliveCor ECG sensor attached to a smartphone [Ukil17A].

Formally, the analytics problem of disease prediction can be formulated as follows. Let the instance space be $\mathcal{X}$ and the label space be $\mathcal{Y} = \{y_1, y_2, \ldots, y_k\}$, where $y_1, \ldots, y_k$ are the different conditions (e.g., $y_1$ is Atrial Fibrillation, $y_2$ is Coronary Artery Disease, $y_3$ is the normal sinus rhythm). Let the prediction space be $\hat{\mathcal{Y}}$ and our model be $f: \mathcal{X} \rightarrow \hat{\mathcal{Y}}$, such that:

$$f^{*} = \arg\min_{f} \; \ell\big(f(x), y\big), \quad x \in \mathcal{X},\; y \in \mathcal{Y} \qquad (1)$$

where $\ell$ is a certain loss function.

Another vital aspect that needs considerable attention is identifying distortion and noise in the sensor-captured physiological signals. For example, single-lead ECG captured by AliveCor contains significant noise, particularly due to motion artifacts. Because the sensing applications must remain mobile and the smartphone is an integral part of the ecosystem, noise identification and removal play an important role in obtaining acceptably accurate clinical inference. In [Silv], the authors show that physiological signals captured even in a controlled setup like the ICU (Intensive Care Unit) require signal quality estimation and noise cleaning. We have to note that the presence of noise would invariably impact the prediction outcomes negatively and consequently increase the false alarm rate [Ukil17B]. The Heartmate scheme described in [Ukil17A] proposes a robust denoising algorithm that identifies and eliminates corruption in physiological signals like PPG. In [Ukil16], CardioFit, an integrated system for unobtrusive cardiac health management and remote monitoring, is proposed. The authors of [Ukil16] emphasize clinical utility enhancement through physiological signal cleaning and the removal of distortion and noise. The complete learning pipeline in data-driven clinical analytics consists of:
• Pre-processing and noise cleaning
• Feature listing and feature selection
• Model building

Apart from pre-processing, noise cleaning and model building, feature listing and feature selection play a major role in constructing a reliable, aptly fitted learning model, with the objective of avoiding overfitting on the training datasets.

Heart sound, or the phonocardiogram (PCG), is another vital marker of cardiac health, which can be conveniently captured using a digital stethoscope or smartphone acoustic sensors. The PCG signal is characterized by different markers: S1 and S2 are predominant, whereas murmurs, S3 and S4 indicate the presence of cardiac anomalies. The authors of [Ukil17A] have demonstrated that smart analysis of the PCG signal reveals the cardiac health condition and that cardiac abnormality can be predicted by studying PCG signals. Further, in [Ukil17B], noise reduction of the PCG signal is presented. It has been shown that a disease prediction model preceded by an appropriate noise cancellation and removal block results in better clinical utility and higher detection accuracy. One of the significant design requirements of clinical analytics is that the sensitivity of the model should be very high, which means that the presence of a cardiac anomaly will be captured with a negligible failure rate, while specificity is maintained at a decent rate, say > 0.8. We re-formulate equation (1) for practical model development purposes as:

$$f^{*} = \arg\min_{f} \; \ell\big(f(x), y\big) \qquad (2)$$

such that:

$$\mathrm{sensitivity}(f) \geq \theta$$

where $\theta$ is typically > 0.9.

However, the proposed predictive analytics for remote cardiac health management would be useful for the caregivers and adds only partial value for the patients. The main outcomes of predictive analytics, like the presence of a cardiac abnormality in a patient or the probability of recurrence of cardiac damage, are meaningful to the doctors, who can immediately take diagnostic actions. We envision that smartphones with body sensors in the form of smart bands and smart patches will extract physiological signals like ECG, PPG and PCG, and that analytics will be performed either locally on the smartphone or in the cloud. The prediction outcome, when found important (i.e., a cardiac anomaly is detected), is shared with the concerned stakeholders like doctors, hospitals or emergency service providers. Such an ecosystem only partially solves the problem of building cardiac health management. In such a predictive analytics model, patients provide the data, which, based on the action taken by the doctor, results in remote cardiac care. The main limitation of this system is its complete dependency on the actions rendered by the human in the loop. Circumstances may arise when timely action cannot be taken. We envisage prescriptive analytics, where the actions that need to be taken are also part of the analytics system, as illustrated in Figure 1.

Prescriptive analytics includes predictive analytics and descriptive analytics on the prediction to instruct the patient to take actions. The outcome of the prescriptive analytics engine directly provides the patient with advice. Prescriptive analytics systems that reliably deliver instructions in healthcare applications are yet to reach a deployable shape. The development of prescriptive analytics requires extensive involvement of domain experts (in remote cardiac health management, cardiologists are the domain experts) so that the knowledge is sufficiently captured and a resilient rule engine is generated. Natural language processing based techniques can also be employed to build the knowledge representation. However, research on definite methods and systems for constructing predictive analytics engines, particularly for cardiac health data analysis, and on patient-care instruction-based knowledge building would usher in the development of complete cardiac health management with prescriptive and predictive analytics engines.

Figure 1. Architecture sketch of the automated unobtrusive cardiac care ecosystem catering to both predictive and prescriptive analytics

4 Privacy Preserving Sensor Signal Mining: Ways To Minimize Human Value Loss Due To Intended And Unintended Privacy Breaching Attempts

Human quality of life improvement through knowledge management and analytics largely depends on sensor signals and data captured by sensing human activities. Such data often contains sensitive information. For example, energy consumption forecasting for optimal energy generation and carbon footprint minimization requires smart energy meter data. Smart energy meter data contains granular information about human activity inside the home, which is private and sensitive. Privacy breaching attacks that gain access through Non-Intrusive Load Monitoring (NILM) need to be minimized by detecting the sensitive content of the shared information [Ukil15]. In [Ukil14A], a 'Dynamic Privacy Analyzer' is proposed that controls involuntary leakage of smart meter data. The salient aspect of the proposed solution is that it is completely unsupervised and attempts to find the optimal privacy-utility trade-off while obfuscating the private smart meter data shared with third parties.

Traditionally, privacy-preserving data mining is implemented using k-anonymity [Swee02], l-diversity [Mach07] or other sensitive data anonymization techniques [Gentry09]. However, we need to consider a few specific aspects of the security and privacy of sensor data that capture human activity signatures. For example,
• Sensor devices, particularly body sensors, are constrained in energy resources. The energy cost of data transmission needs to be minimized to maximize the life span of such devices. Data transmission security with minimum energy consumption needs to be achieved using the Constrained Application Protocol (CoAP) [Ukil14B].
• Sensitive information requires secure storage and execution at the analytics engine of the analytics platform [Ukil10]. With the help of trusted computing (e.g. TrustZone), sensor data and computation are to be made secure and resistant to data stealing attacks [Ukil11].

We depict the architectural sketch of the security and privacy methods of sensor data analytics management in Figure 2. Firstly, sensor data captured by the sensing device is to be securely transmitted, with a lightweight security implementation, to the analytics platform, which may be in the cloud or locally available (smartphone). The captured sensor data is securely stored and executed by a trusted computing setup. Further, the sensor data is privacy protected by the required obfuscation and anonymity. The privacy-preserved data is securely transmitted to the users. In fact, there are mainly three aspects of the sensor data security-privacy framework:
• Data at transit:
  o Lightweight secure transmission from the sensing devices to the analytics platform [Ukil14B].
  o Secure transmission from the analytics platform to the clients (users) [Ukil10].
• Data at storage:
  o Storage security for secure storing and execution of sensitive sensor data [Ukil11].
  o Privacy preservation of sensitive data before sharing with the clients (users) [Ukil10].

Figure 2. Architecture sketch of the secure and privacy preserved sensor data analytics

Another significant sensor data privacy protection policy would be privacy-preserving computation, where the analytics function is computed over encrypted data, without the data being decrypted. Let $d_1$, $d_2$ be the data from sensors $s_1$ and $s_2$. The analytics function computes the mean of $d_1$, $d_2$. The analytics engine receives the encrypted data $e_1 = E(d_1)$, $e_2 = E(d_2)$, where $E$ is the encryption function. The analytics engine can compute mean($d_1$, $d_2$) from $e_1$, $e_2$ using a homomorphic encryption technique [Ukil10]. In practice, useful fundamental analytics functions like summation can be computed in real time with a simple computational setup [Gentry09].

5 Conclusion

Knowledge-driven technologies and applications for improving human quality of life will potentially enable the long-term, human-centric convergence of futuristic applications. We have demonstrated exemplary cases of analytics for unobtrusive cardiac health management and privacy-preserving data mining of sensitive sensor signals. We observe that human-centric applications work closely with human activities and capture human behavior or other related sensitive information. Owing to the sensitive nature of such applications, the security-privacy framework should be considered at initial design time, as an integral part of the entire application ecosystem. Another crucial aspect is to incorporate a larger network of analytics to fathom human actions and cognition. For instance, social networking posts, retail consumption patterns and the frequency of visits to physicians may be combined to derive a plan for personalized medication or cognition therapy. We envision that knowledge management, sensor signal processing and intelligent analytics systems will immensely impact human life, and that the thrust of human-centric applications will significantly improve human quality of life.

Acknowledgments

Leandro Marin is partially supported by Research Project TIN2017-86885-R from the Spanish Ministry of Economy, Industry and Competitiveness and FEDER (European Union).

References

[Alive] https://www.alivecor.com/
[Heart] https://spectrum.ieee.org/the-human-os/biomedical/diagnostics/heart-monitor-for-your-phone-beats-doctors-at-diagnosing-atrial-fibrillation
[Jul] Julian P.J. Halcox, Kathie Wareham, Antonia Cardew, Mark Gilmore, James P. Barry, Ceri Phillips, Michael B. Gravenor. Assessment of Remote Heart Rhythm Sampling Using the AliveCor Heart Monitor to Screen for Atrial Fibrillation: The REHEARSE-AF Study. Circulation, 136 no. 19 (2017): 1784-1794.
[Silv] Silva, Ikaro, Joon Lee, and Roger G. Mark. Signal quality estimation with multichannel adaptive filtering in intensive care settings. IEEE Transactions on Biomedical Engineering 59, no. 9 (2012): 2476-2485.
[Ukil17A] Arijit Ukil, Soma Bandyopadhyay, Chetanya Puri, Rituraj Singh, Arpan Pal, Ayan Mukherjee. Heartmate: automated integrated anomaly analysis for effective remote cardiac health management. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (2017): 6578-6579.
[Ukil16] Arijit Ukil, Soma Bandyopadhyay, Chetanya Puri, Rituraj Singh, Arpan Pal, KM Mandana. CardioFit: Affordable Cardiac Healthcare Analytics for Clinical Utility Enhancement. eHealth 360° (2016): 390-396.
[Puri17] Chetanya Puri, Rituraj Singh, Soma Bandyopadhyay, Arijit Ukil, Ayan Mukherjee. Analysis of phonocardiogram signals through proactive denoising using novel self-discriminant learner. 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), (2017): 2753-2756.
[Ukil17B] Arijit Ukil, Uttam Kumar Roy. Smart cardiac health management in IoT through heart sound signal analytics and robust noise filtering. IEEE 28th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC), (2017).
[Fras14] Fraser, Graham D., Adrian DC Chan, James R. Green, and Dawn T. MacIsaac. Automated biosignal quality analysis for electromyography using a one-class support vector machine. IEEE Transactions on Instrumentation and Measurement 63, no. 12 (2014): 2919-2930.
[Puri16] Chetanya Puri, Arijit Ukil, Soma Bandyopadhyay, Rituraj Singh, Arpan Pal, Kayapanda Mandana. iCarMa: Inexpensive Cardiac Arrhythmia Management--An IoT Healthcare Analytics Solution. First Workshop on IoT-enabled Healthcare and Wellness Technologies and Systems (2016): 3-8.
[Gim18] Jangwon Gim, Sukhoon Lee, and Wonkyun Joo. A Study of Prescriptive Analysis Framework for Human Care Services Based On CKAN Cloud. Journal of Sensors, (2018).
[Puri16] Puri Chetanya, Arijit Ukil, Soma Bandyopadhyay, Rituraj Singh, Arpan Pal, Ayan Mukherjee, and Debayan Mukherjee. Classification of Normal and Abnormal Heart Sound Recordings through Robust Feature Selection. IEEE Computing in Cardiology, Vol. 43 (2016).
[Thor13] Thornton, Chris, Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, (2013): 847-855.
[Ukil14A] Arijit Ukil, Soma Bandyopadhyay, Arpan Pal. Sensitivity inspector: Detecting privacy in smart energy applications. IEEE Symposium on Computers and Communication (ISCC), (2014): 1-6.
[Moli10] A. Molina-Markham, P. Shenoy, K. Fu, E. Cecchet, and D. Irwin. Private memoirs of a smart meter. ACM BuildSys (2010): 61-66.
[Ukil15] Arijit Ukil, Soma Bandyopadhyay, Arpan Pal. Privacy for IoT: Involuntary privacy enablement for smart energy systems. IEEE International Conference on Communications (ICC) (2015): 536-541.
[Swee02] L. Sweeney. Achieving k-anonymity Privacy Protection Using Generalization and Suppression. Int. J. of Unc. Fuzz. Know. Syst, (2002): 571-588.
[Mach07] A. Machanavajjhala, D. Kifer, J. Gehrke, and M. Venkitasubramanian. l-diversity: Privacy beyond k-anonymity. ACM Trans. Knowl. Disc. Data, vol. 1, issue 1 (2007).
[Ukil12] A. Ukil, J. Sen, and S. Ghosh. An Efficient Distribution Sensitive Privacy for Real-time Applications. Computer Science and Convergence, LNEE, vol. 114, (2012): 81-91.
[Ukil14B] Arijit Ukil, Soma Bandyopadhyay, Abhijan Bhattacharyya, Arpan Pal, Tulika Bose. Lightweight security scheme for IoT applications using CoAP. International Journal of Pervasive Computing and Communications, Volume 10, Issue 4 (2014): 372-392.
[Ukil11] Arijit Ukil, Jaydip Sen, Sripad Koilakonda. Embedded security for Internet of Things. IEEE National Conference on Emerging Trends and Applications in Computer Science (NCETACS), (2011): 1-6.
[Gentry09] Craig Gentry. Fully Homomorphic Encryption Using Ideal Lattices. ACM Symposium on Theory of Computing (STOC), (2009): 169-178.
[Ukil10] Arijit Ukil, Jaydip Sen. Secure multiparty privacy preserving data aggregation by modular arithmetic. IEEE International Conference on Parallel Distributed and Grid Computing (PDGC), (2010): 344-349.
[Sen11] J. Sen, S. Koilakonda, A. Ukil. A mechanism for detection of cooperative black hole attack in mobile ad hoc networks. IEEE International Conference on Intelligent Systems, Modelling and Simulation (ISMS), (2011): 338-343.
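The sensitivity requirement attached to the re-formulated equation (2) in Section 3 (sensitivity above roughly 0.9 while specificity stays at a decent rate) can be illustrated with a small sketch. It assumes a binary anomaly detector that outputs a score per recording, and it picks the decision threshold with the best specificity among those meeting the sensitivity floor. The function names, the score convention (higher means more likely anomalous) and the default of 0.9 are illustrative assumptions, not the procedure of the cited works.

```python
from typing import Optional, Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when score >= threshold means 'anomaly'."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_anomaly = score >= threshold
        if label == 1 and predicted_anomaly:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_anomaly:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def pick_threshold(
    scores: Sequence[float], labels: Sequence[int], min_sensitivity: float = 0.9
) -> Optional[Tuple[float, float, float]]:
    """Among thresholds meeting the sensitivity floor, keep the most specific one.

    Returns (threshold, sensitivity, specificity), or None if no threshold
    satisfies the constraint.
    """
    best = None
    for threshold in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, threshold)
        if sens >= min_sensitivity and (best is None or spec > best[2]):
            best = (threshold, sens, spec)
    return best
```

If the specificity returned for the chosen threshold falls below the desired level (for example 0.8), the constraint cannot be met by thresholding alone, which suggests revisiting the noise cleaning and feature selection stages rather than relaxing the sensitivity floor.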
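The privacy-preserving mean computation described in Section 4 can be illustrated with a toy sketch based on additive masking in modular arithmetic. This is an assumption made for illustration only: it is neither the scheme of [Ukil10] nor fully homomorphic encryption as in [Gentry09]. The sketch assumes non-negative integer readings, a public modulus larger than any possible sum, and masks that sum to zero modulo that modulus and are handed to the sensors out of band. All names (`MODULUS`, `make_masks`, `mask_reading`, `aggregate_mean`) are hypothetical.

```python
import secrets
from typing import List, Sequence

# Assumed public modulus; it must exceed the largest possible sum of readings.
MODULUS = 2**61 - 1


def make_masks(n: int, modulus: int = MODULUS) -> List[int]:
    """Generate n random masks whose sum is 0 modulo `modulus`."""
    masks = [secrets.randbelow(modulus) for _ in range(n - 1)]
    masks.append((-sum(masks)) % modulus)
    return masks


def mask_reading(reading: int, mask: int, modulus: int = MODULUS) -> int:
    """Hide one sensor reading; this is the only value the engine sees."""
    return (reading + mask) % modulus


def aggregate_mean(masked: Sequence[int], modulus: int = MODULUS) -> float:
    """Compute the mean of the hidden readings from masked values only.

    The masks cancel in the modular sum, so the individual readings are
    never recovered by the analytics engine.
    """
    total = sum(masked) % modulus
    return total / len(masked)
```

For two sensors, the engine would receive `mask_reading(d1, m1)` and `mask_reading(d2, m2)` and call `aggregate_mean` on them; the result equals the mean of `d1` and `d2` as long as their sum stays below the modulus. A deployed system would additionally need secure mask distribution and protection against colluding parties, which this sketch does not address.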