=Paper= {{Paper |id=Vol-1618/FuturePD_paper3 |storemode=property |title=Accuracy and Reliability of Personal Data Collection: An Autoethnographic Study |pdfUrl=https://ceur-ws.org/Vol-1618/FuturePD_paper3.pdf |volume=Vol-1618 |authors=Amon Rapp,Alessandro Marcengo,Federica Cena |dblpUrl=https://dblp.org/rec/conf/um/RappMC16 }} ==Accuracy and Reliability of Personal Data Collection: An Autoethnographic Study== https://ceur-ws.org/Vol-1618/FuturePD_paper3.pdf
Accuracy and Reliability of Personal Data Collection: An Autoethnographic Study

Amon Rapp, University of Torino, Computer Science Department, C.so Svizzera, 185, Torino Italy, amon.rapp@gmail.com
Alessandro Marcengo, Telecom Italia, Via Reiss Romoli, 274, Torino Italy, alessandro.marcengo@telecomitalia.it
Federica Cena, University of Torino, Computer Science Department, C.so Svizzera, 185, Torino Italy, cena@di.unito.it


ABSTRACT
Accuracy of self-tracking devices is a key problem when dealing with personal data. Different devices may report different measures, and this may affect users’ perceived reliability of the devices they use. We conducted an autoethnography to investigate how different devices collect data on specific parameters in order to highlight discrepancies in the measures reported. Results highlight that designers should account for the variability of activities that users may face during their daily practices, as each of them may affect the device’s capability of collecting accurate data.

CCS Concepts
• Human-centered computing➝Human computer interaction.

Keywords
Personal informatics; Quantified Self; Personalization; Autoethnography.

1. INTRODUCTION
Personal Informatics systems are currently appealing to a large number of users, spreading beyond the traditional user group of Quantified Selfers [6]. Quantified Selfers have a deep knowledge of tracking technologies and find solutions for the barriers that they may encounter during data collection and management. However, this is not true for all those people who are interested in and curious about Personal Informatics, and who may try this kind of technology for the first time [8].
One of the issues that this new user base may encounter is the bewilderment induced by the different possibilities for tracking the same parameter. Thanks to the spread of multiple wearable devices for personal data collection, users can now rely on different instruments to measure the same parameter. Each of them has its own physical structure, uses specific recognition algorithms, and is designed to be worn on a certain part of the body: all these elements may affect the reported measures and thus the data collected by the device. The differences in the data collected that may result from such diversity might affect the user’s perceived accuracy of the gathered data and the consequent perceived reliability of the instrument used.
We carried out a four-week autoethnographic study to investigate how different self-tracking tools may lead to different results in terms of the values of the collected data. The results of the study reveal that: i) the data collected for a specific target parameter differed depending on the tools used, and such differences were primarily due to the position in which these instruments were worn and the activities performed during the day by the ethnographer; ii) the discrepancies among the measures reported by the different tools affected their perceived reliability, pushing the ethnographer to seek strategies to account for the data collected.

2. RELATED WORK
Various studies have examined how users perceive the reliability and accuracy of self-tracking instruments. Kay et al. [3] found that users react negatively to the inaccuracies of their devices, while Lazar et al. [4] emphasized that users do care about the accuracy of the data collected, so that failing to produce accurate information is one of the main reasons for abandoning a specific device. Consolvo et al. [1] listed seven different types of errors that a fitness tracker may produce during its daily use, such as mistaking one activity for another, completely failing to detect an activity, or detecting an activity that did not occur: these kinds of errors produce frustration in users, directly affecting the instrument’s credibility. Mackinlay [5] highlighted that users put their devices’ accuracy to the test, but often find it difficult to calibrate them due to the scarce visibility of their status. Finally, Yang et al. [9] outlined the various techniques that users employ to evaluate trackers’ accuracy, emphasizing the different perceptions that they may have of accuracy and reliability.

3. METHOD
We used autoethnography to identify discrepancies among diverse trackers and analyze how they may affect the user’s experience. This method considers the ethnographer’s subjective experience worth analyzing and reporting, as valuable as that of other individuals. The autoethnographer continuously observes herself to account for the reality she is interested in explaining [2].
The second author self-examined the use of four different wearable devices to compare the data collected and identify possible critical issues due to discrepancies in their accuracy and/or reliability. The devices were chosen by taking into account the position in which they are worn, with the goal of exploring the differences in the measures they gathered.
We selected: Withings Activité on the right wrist; Misfit Shine worn as a necklace; Sony SWR30 on the left wrist; the Google Fit application running in the background on a Sony Xperia Z3.
The hypothesis was that the recorded data would not be affected by body positioning, with all devices recording approximately the same data. The self-observation session was carried out for four weeks. We provide here a brief summary of the study findings, referring to Marcengo et al. [7] for a more detailed description.

4. RESULTS AND DISCUSSION
Sleep data analysis showed interesting problems related to the personal style of “going to sleep” in relation to the device used. For instance, the total sleep amount recorded by the Misfit Shine (necklace) is always about thirty minutes higher. This is because the Shine treats the lying position as sleep, even if the user is reading a book or watching her tablet in bed. So the total sleep amount will always be increased by the activity performed before falling asleep. The device with the best accuracy turned out to be the one worn on the right wrist. This makes it possible to distinguish the activities performed with the right hand while lying in bed as something different from sleeping (for left-handed users the same principle will work for the left wrist).
Steps also showed interesting evidence of the relation between lifestyle and devices. The total step count is strongly biased by the interaction between the location on the body (if wearable) and the activities performed by the user. Indeed, considering the data collected by the Withings Activité (on the right wrist), it is clear that if the user did a lot of public speaking on a specific day (meetings, showing slides, etc.), steps tend towards high figures due to the gestures involved. Opposite results become evident under different life circumstances. In particular, data became surprisingly low under two conditions. The first one is when the user walks pushing a stroller. In this case the device does not log the alternate swinging of the hands and does not see the activity as walking. The second one occurs if the user carries a moderately heavy bag (e.g. a small suitcase), depending on which hand holds the bag.
If the steps are collected by a phone app, even more distortions toward low figures become evident because of all the occasions when the phone is not on the body (e.g. weekend, sports, home, etc.). This is also true, to a lesser extent, for wearable devices. On the weekend all data appear distorted by incomplete or peculiar usage of the device due to different life activities (e.g. working in the garden, playing with kids, etc.).
From this evidence some needs for personalization in the design of logging devices and apps emerge. Manufacturers need to consider different designs for the different lifestyles of different types of users with different life patterns (e.g. watching videos in bed, walking with a stroller, carrying a bag, gesturing a lot, etc.). These patterns could be condensed into a few personas that can lead to different versions of the same device or slightly different tracking algorithms on the same device. This personalization may be transferred directly into the user experience by collecting specific aspects and habits that affect the accuracy of the logging system. In certain cases it should also be possible to advise the user about the best body location to wear the device in relation to her personal lifestyle.

5. CONCLUSION
Our study emphasizes the need to consider the idiosyncratic activities that users carry out during their daily practices in order to produce more accurate and thus reliable trackers. Activity recognition algorithms should be tailored to the specific habits of the single individual, as these may be the main culprit for the inaccurate reporting of the target parameters. Personalization, thus, should be not only a matter of the services provided by the new personal informatics technologies, but also a key requirement for the design and implementation of the modalities for collecting the data.

6. REFERENCES
 1. Consolvo, S., McDonald, D. W., Toscos, T., Chen, M. Y., Froehlich, J., Harrison, B., Klasnja, P., LaMarca, A., LeGrand, L., Libby, R., Smith, I., Landay, J. A.: Activity sensing in the wild: a field trial of ubifit garden. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '08), 1797–1806 (2008)
 2. Ellis, C., Bochner, A.: Autoethnography, personal narrative, and personal reflexivity. In: Handbook of qualitative research (2nd. ed.), Norman K. Denzin and Yvonna S. Lincoln (eds.). Sage, Thousand Oaks, CA, 733–768 (2000)
 3. Kay, M., Morris, D., schraefel, mc, Kientz, J. A.: There’s no such thing as gaining a pound: reconsidering the bathroom scale user interface. In: ACM international joint conference on Pervasive and ubiquitous computing (UbiComp '13), 401–410 (2013)
 4. Lazar, A., Koehler, C., Tanenbaum, J., Nguyen, D. H.: Why we use and abandon smart devices. In: the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp '15). ACM, New York, NY, USA, 635–646 (2015)
 5. Mackinlay, M.: Phases of Accuracy Diagnosis: (In)visibility of System Status in the Fitbit. Intersect: The Stanford Journal of Science, Technology and Society 6, 2 (2013)
 6. Marcengo, A., Rapp, A.: Visualization of Human Behavior Data: The Quantified Self. In: Huang, L. H. and Huang, W. (Eds.) Innovative approaches of data visualization and visual analytics. IGI Global, Hershey, PA, 236–265 (2013)
 7. Marcengo, A., Rapp, A., Cena, F.: The Falsified Self: Complexities in Personal Data Collection. To appear in: Proceedings of HCI International ’16, Springer (2016)
 8. Rapp, A., Cena, F.: Self-monitoring and Technology: Challenges and Open Issues in Personal Informatics. In: HCI International. Universal Access in Human-Computer Interaction. Design for All and Accessibility Practice. Lecture Notes in Computer Science, Volume 8516, 613–622 (2014)
 9. Yang, R., Shin, E., Newman, M. N., Ackerman, M. S.: When fitness trackers don't 'fit': end-user difficulties in the assessment of personal tracking device accuracy. In: the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp '15). ACM, New York, NY, USA, 623–634 (2015)