=Paper=
{{Paper
|id=Vol-1584/paper11
|storemode=property
|title=Comparison of Biometric Authentication Software Techniques:
GEFE vs. Angle Based Metrics
|pdfUrl=https://ceur-ws.org/Vol-1584/paper11.pdf
|volume=Vol-1584
|authors=Robert Stokes,Angelica Willis,Kelvin Bryant,Zanetta Tyler,Anthony Dobson
|dblpUrl=https://dblp.org/rec/conf/maics/StokesWBTD16
}}
==Comparison of Biometric Authentication Software Techniques:
GEFE vs. Angle Based Metrics==
Robert Stokes et al. MAICS 2016 pp. 75–80 Comparison of Biometric Authentication Software Techniques: GEFE vs. Angle Based Metrics Robert Stokes, Angelica Willis, Kelvin Bryant, Zanetta Tyler, and Anthony Dobson Department of Computer Science North Carolina A&T State University 1601 E Market St. Greensboro, NC 27411 kingstokes@gmail.com, awillis@aggies.ncat.edu, ksbryant@ncat.edu, zrtyler@aggies.ncat.edu, amdobson@aggies.ncat.edu Abstract and touch screen interaction. The main benefits of biomet- rics is that they are difficult to mimic and they have an In this paper, we explore three alternatives for developing a bio- advantage over password authentication in that they are not metric authentication software system. The first approach we will susceptible to being cracked (via dictionary attacks or brute consider is a computer vision technique optimized by Genetic and force attacks), lost, or stolen [11]. Evolutionary Feature Extraction (GEFE); the second is Angle An emerging application of biometrics is active authen- Based Metrics (ABM); and the third is Angle Based Metrics tication (AA). Active authentication is a way of continu- combined with Genetic and Evolutionary Computation (ABM + ously authenticating or verifying a user’s identity during a GEC). Each of these techniques are research areas which show session. Typically, a user is only authenticated at the be- promise in regards to being able to authenticate users based on their natural mouse movements. When applied to the same data ginning of a session. If the user steps away from the com- set, the results of our experimentation indicate that both the ABM puter or if the session is hijacked then the secured assets and ABM + GEC techniques are more accurate than GEFE in are vulnerable to exploitation. Active authentication at- correctly verifying genuine users, as well as correctly rejecting tempts to continually verify that a user’s biometric patterns impostors. (human to computer interactions) are consistent with those demonstrated during their previous sessions [3]. The goal is to determine whether or not the current user is an im- Keywords – Biometrics, genetic and evolutionary feature extrac- poster or the original authenticated user. tion (GEFE), angle based metrics In this paper, we compare three different approaches to implementing biometric authentication using mouse movement. The first approach uses Genetic and Evolution- Introduction ary Feature Extraction (GEFE) [1] to optimize computer Biometric systems are able to authenticate or identify peo- vision and evolutionary computation techniques. The sec- ple based on physiological or behavioral characteristics ond approach, called Angle Based Metrics (ABM) [15], which are unique for each person [5]. As biometric systems uses angle analysis in order to extract features and distin- become increasingly accurate, they will be selected more guish between valid users and impostors. And the third often as the option of choice for authentication, intrusion approach, called ABM+GEC is an enhanced version of detection, or access control within software systems. One ABM which utilizes a genetic and evolutionary computa- of the most useful applications for biometrics is user au- tion (GEC) technique in order to reduce the size of the ex- thentication. Authentication is a way to prove that a user is tracted feature set. Though both GEFE and ABM+GEC who they claim to be. In most systems, authentication in- use evolutionary computation as a method of improving volves asking a person to prove who they are by what they the efficiency and success of their root techniques, they are know – such as a username and password combination [9]. completely independent approaches. Biometric authentication attempts to carry out the verifica- In addition to exploring how these three approaches tion process based on analysis of characteristics that are compare, we also present evidence that GEC is a valuable unique to a given individual. Physiological biometrics in- method of reducing the complexity of systems like ABM, clude analysis of characteristics such as fingerprint, iris, or by eliminating irrelevant data from consideration, thus in- facial features. Behavioral biometrics focus on the way in creasing the efficiency and feasibility of Active Authenti- which users interact with their computer device. Some cation. The true acceptance rate (TAR) and false ac- examples are mouse movements [8], keystroke rhythm, 75 Robert Stokes et al. MAICS 2016 pp. 75–80 ceptance rate (FAR) results for all three techniques were GEFE uses a genetic algorithm in order to select the computed using the same data set. The rest of the paper is best feature extractor no matter how many features are as follows. The next section describes GEFE. Following designated per patch [12]. This means that the genetic al- the GEFE section, ABM is introduced. Next, a discussion gorithm will be able to optimize the feature set to ensure of how the GEC was combined with ABM is presented, that only the significant features are included in the feature followed by a section that presents the advantages and dis- vector. The size of the patches, the center of each patch, advantages of AMB and GEFE. The last three sections and which patches should be included in the feature vector describe how the experiment was conducted, present a are all decided by the genetic algorithm which evolves the comparison of the results and, finally, present conclusions feature extractor as the algorithm is run repeatedly. In con- and future work. trast, the generic LBP method uses non-overlapping, uni- form sized patches for matching. The process of "evolving" a feature extractor is accom- GEFE plished via the Estimation of Distribution Algorithm The GEFE technique involves the use of algorithms which (EDA). An EDA will select a specified number of elites have been adopted from the fields of Evolutionary Compu- (candidate solutions with the best fitness) to be automati- tation and Computer Vision in order to be able to classify cally included in the next population iteration. The re- images [4]. The path of each mouse movement is recorded maining offspring in the population will be generated by using the (x, y) screen coordinates and then saved as an choosing a subset of the current population to be used to image file. The image is then analyzed in a similar bio- create a probability distribution function (PDF). The PDF metric manner as a facial image. Images are compared by is then sampled to generate the remaining offspring for the using image processing techniques to extract features. It is next population. important that the features extracted are useful in distin- The feature vectors for each mouse movement session guishing one image from another. GEFE uses Local Binary of a given user will be stored in a profile, and new move- Pattern (LBP) [10] for extracting features from the images ments can be compared to the profile of a user to determine and storing them into feature vectors/templates. These fea- if the distance is within a certain threshold. This technique ture vectors allow images to be mathematically compared allows users to be authenticated (based on their mouse to one another to determine how similar they are. Tradi- movements) with a fairly high accuracy rate. tionally, the comparison is accomplished by utilizing a distance metric (e.g. Euclidean Distance or Manhattan Dis- tance) to determine how close the images are to each other ABM [7]. Angle Based Metrics [15] is an approach to designing a LBP works by dividing an image canvas into rectangu- biometric system that focuses on the angles that are gener- lar regions called patches. Within each patch, the LBP al- ated by the mouse movements of a user. The angles are gorithm will iteratively select each interior pixel as a center used to derive useful features or metrics which may be pixel. Next, the intensity value of the center pixel is com- used to distinguish one user from another. The main ad- pared with its neighboring pixels in order to generate a vantage of this approach is that it works well even if the texture pattern (bit string) for a given pixel. For each user’s hardware or computing environment changes from neighboring pixel, if the grayscale value is greater than the one session to the next. center pixel's grayscale value then a 0 bit is generated; oth- erwise, a 1 bit is generated. For each center pixel, an 8 bit As with most biometric systems, the Angle Based Met- binary string is generated that denotes the relationship be- rics approach is comprised of four different components: tween the center pixel's grayscale value and that of the 8 Recorder, Preprocessor (feature extractor), Classifier, and neighbors (top, top right, right, bottom right, bottom, bot- Decision Maker. The Recorder is the simplest of these tom left, left, top left). Each patch is then treated as a his- components and is positioned on the client side of an ap- togram where the different bins consist of all the texture plication to capture user mouse movement events and send patterns or bit strings that are possible. The strings for each that data to the Pre-processor. The Preprocessor executes patch are concatenated in order to form feature sets or fea- on the server side and is responsible for translating the data ture vectors. it receives from the Recorder into valuable metrics. There It is possible to designate the number of features that are 3 metrics which our Pre-processor calculates from the are included in the extracted feature set of a given mouse mouse coordinates and mouse clicks: the direction angle, movement session. For example, GEFE-56 uses feature curvature angle, and the curvature distance ratio. These sets of size 56 (per patch) while GEFE-256 uses feature metrics are calculated by examining groupings of 3 points - sets of size 256 for each patch. - in the order in which those points were visited by the user’s mouse movement. Thus point A is visited before 76 Robert Stokes et al. MAICS 2016 pp. 75–80 point B, and point B is visited before point C (See figure tures of that data in order to determine whether the move- 1). ments belong in the same grouping/classification with the x The direction angle (1) is the angle measured from other movements in the user’s training model/profile. a horizontal line to the line AB. Line AB is Another classification technique utilizes the Normalized formed by traveling from the first point in the Manhattan Distance (NMD). NMD is calculated by taking group of 3 to the second point. the sum of the differences between two feature-sets (where x The curvature angle (2) is the angle ABC where A, each feature set is simply a list of percentages or floating B, and C are consecutive points read into the Pre- point numbers) divided by the total number of features. For processor from the Recorder. the purposes of our own analysis, NMD was the chosen method for comparison and classification. The NMD val- x The curvature distance (r) is as follows: for a line ue represents how close mathematically a template is to AC, let point Z be the point located from B to AC those in a user’s profile/training set. That value is sent to that is perpendicular to AC. Then the curvature the Decision Maker component. distance is the ratio BZ/AC. The Decision Maker is the component that is tasked with deciding whether the actions being generated by a user’s session are similar enough to those movements C saved under the user’s profile to be considered a match. B One way to do this is to establish a threshold value in order to be able to accept or reject a feature set based on the 2 NMD value. Another approach is to utilize a SVM to de- termine whether or not a feature set may be classified with the other feature sets known to belong to a given user. The Z SVM will output a decision value to accept or reject, and 1 that information may be utilized by the security mecha- nisms within a larger system in order to determine if a user needs to be prompted to re-authenticate or not. A ABM + GEC Figure 1: Illustration of Angle Based Metrics All of the main components of the ABM + GEC approach are consistent with that of ABM. In fact, ABM + GEC can The metrics calculated in the Preprocessor are orga- be considered an optimized version of ABM. Upon the nized as a cumulative distribution function (CDF), with initial implementation of the ABM system, it was observed intervals of direction angle (x), curvature angle (y), and the that the greatest experimental results were achieved when curvature distance/ratio (r). The CDF is a mathematical the CDF bin sizes for the x, y, and r metrics were set to model that illustrates which percentage of a user’s metrics very small values. However, this presented a practicality fall within a given range of values. The percentage values problem because decreasing the bin sizes results in an in- within each CDF bin (interval) are what help to distinguish crease in the number of features. This is due to an idea one user from another and are referred to as “features”. The known as the curse of dimensionality, where it can be said collection of all the features for a given session of user that, as the number of dimensions in a vector problem in- action is referred to as a feature set or template. The feature creases, so does the complexity of the problem, and there- sets are used as input to the Classifier component of the fore, the time devoted to solve the problem increases as ABM system. well. The natural relationship between the interval sizes The main task of the Classifier is to be able to tell and the magnitude of the feature set is an inversely propor- whether or not a feature set or group of feature sets belong tional relationship, and so, as the size of the intervals de- to a given user or not. There is more than one way to im- creased, the size of the feature set grew profoundly. For plement the Classifier. One way is to utilize a support vec- example, when using x and y intervals of .05, the feature tor machine (SVM). A support vector machine is a ma- set contained 2683 features. Because features represent the chine learning component often used for classification vector dimensionality of the authentication problem, this [14]. A SVM will take in a group of feature sets derived meant the system incorporated 2683 dimensions, and cre- from a user and utilize them to create a training model of ated an authentication environment that was very slow and the user’s mouse movement characteristics. Then, whenev- difficult to manage. To solve this problem, a genetic algo- er new mouse data arrives, the SVM can compare the fea- rithm toolset called X-TOOLSS [13] was used. The objec- 77 Robert Stokes et al. MAICS 2016 pp. 75–80 tive for using X-TOOLSS was to optimize the system by Pros and Cons of ABM and GEFE evolving new, smaller feature sets with larger intervals that could produce similar results -- in terms of authentication One of the major benefits of both the ABM (including accuracy -- as the .05 intervals. In addition, X-TOOLSS ABM+GEC) and GEFE approach to software biometrics eliminated redundant features which were non-essential to and active authentication is that these techniques are able authentication. This process is called feature masking. to effectively verify a user’s mouse movements across dif- ferent platforms without losing a significant amount of X-TOOLSS uses genetic algorithms (GAs) that, based accuracy due to differences in hardware devices [15]. This on the “survival of the fitness” concept, develop optimal is a major benefit over other metric approaches, such as solutions for many types of parametric software systems. speed and acceleration that are affected by the user’s oper- In this case, the feature masks and interval (bin size) com- ating system as well as the mouse or the screen resolution binations were designated as candidates. The GA evolves [6]. Speed and acceleration are also poor metric choices a population of candidate solutions by first generating ran- due to the endless possibilities of situational diversity. For dom candidates and assigning fitness values to feature ex- example, a user may quickly make a decision to advance tractors implementing different versions of those candi- toward and click a submit button, yet the same user may dates. Depending on the type of genetic algorithm being slowly advance and then pause before clicking a hyperlink used, different methods are employed to create offspring on a text-rich web page such as a wiki article. from high-fitness “parent” candidates, and introduce those offspring into the next generation of the candidate popula- Another benefit of the ABM authentication approach tion as a whole. Fitness values were calculated using the over other authentication techniques lies in its generated authentication accuracy of the candidate system (explained data’s minimal impact on user privacy. In the hands of a further as the Cumulative Match Curve (CMC) in the malicious culprit, mouse movement data would be of little Comparisons and Results section). For the ABM + GEC use, as such data would not lend itself to reproduction. The system, a Steady-State GA was used, which stipulates that mouse dynamics of a user can be compared to a signature; adding the offspring candidates to the population can only however, unlike the forging of a signature, where authenti- occur when those children have a higher fitness value than cation is carried out once, an impostor would be required their parents. Therefore, the population size remains con- to continuously mimic the genuine user’s biometric behav- stant, or steady, throughout the evolution process. ior throughout the duration of the session [2]. The x, y and r intervals were evolved using double- One possible hindrance that could be encountered by precision 64-bit floating point values, between a range .5 ABM authentication involves genuine users who undergo and (large enough intervals to produce a more manageable sudden biometric behavioral changes that render them un- volume of features), a population size of 20 individuals, a able to match up to their former biometric profiles. For Crossover Usage Rate of 1.0, a Mutation Usage Rate of example, a user could sustain a wrist fracture, causing a 1.0, a Mutation Range of .2, with 1000 total evaluations. sudden change in mouse movement dynamics. Such occur- These settings evolved new x, y, and r intervals of 6.024, rences, though rare, would possibly require intervention by 1.0, and 20.0 respectively. As for the feature mask evolu- system administrators to ensure the user is not falsely re- tion, the range was limited to the integers 0 and 1, and was jected from the system. applied to each feature in the template, representing either “on” (1) or “off” (0) for that corresponding feature. All Experiment other parameters for the Steady-State GA were the same as the interval optimization, save the number of total evalua- The experiment that we developed was closely related to tions, which was 1000. The average results are based off of the experiment performed by J. Shelton et al. [12]. The 10 runs of the GEC. mouse pointer was automatically centered on the screen The evolution of the ABM system produced a remarka- and users were instructed to move the mouse in order to ble complexity reduction from a 2683-dimensional system bring up the login box. The subjects were unaware of the to a 283-dimensional system, using interval evolution, and purpose of the experiment. then even further to a 150-dimensional one using the We obtained and utilized the same data set used by evolved feature mask. This resulted in an overall decrease Shelton. The data consisted of mouse movements collect- in complexity of about 94.4%. The evolved system is far ed for 16 unique subjects. Each subject had a “profile” faster and more practical for real-world implementation; comprised of 10 different sessions or sequences of mouse not only did the efficiency of the authentication system movements. Our experiment was to take a sequence (tem- show improvement, the overall accuracy of the authentica- plate) from any user and compare it with the profiles of all tion improved as well (See Comparisons and Results sec- other users including the “self” profile to see if we could tion). authenticate or verify a user based solely on their move- 78 Robert Stokes et al. MAICS 2016 pp. 75–80 ment pattern. The comparison was based on calculating the etc. The percentages on the CMC chart were calculated by NMD between a single sequence and all of the other se- letting every template in the population serve as the probe quences in each profile. And based on a certain threshold exactly one time. For a given rank, the percentage includes value that we set for the NMD we were able to accept or all the matches which were produced using x number of reject each sequence as belonging to the owner of a certain probes where x is less than or equal to the rank number. So profile or not. We were able to analyze the TAR and the rank 3, for example, includes the percentage of probes that FAR for ABM, ABM+GEC, and GEFE. found a match within 1, 2, or 3 attempts. A match occurs when a probe is compared with the population gallery and the template discovered to be closest in distance from the Comparison and Results probe belongs to the same subject as the probe. If any at- Our experimental results consist of the following catego- tempt to find a match results in discovering a template that ries: FAR, FRR, TAR, and the threshold. Note that the is closest in distance to the probe but belonging to a differ- threshold is the independent variable but the results are ent subject, this is a “miss”. After any miss, we removed also influenced by the interval that we utilized for the x, y, all the templates from the population which belong to the and r bins (representing the direction angle, curvature an- subject which caused the miss. gle, and curvature distances respectively) in the CDF that generates the feature vectors. We selected a single tem- plate which we designated as a probe and we used all the remaining templates as our gallery set. The probe was then compared to every template in the gallery and if the NMD for probe and gallery member was less than or equal to the threshold value then this would count as an acceptance. True acceptances were those cases where both templates being compared belonged to the same subject and the NMD was below the threshold. A false acceptance oc- curred if the NMD for probe and gallery template was be- low the threshold but the templates did not belong to the Figure 2: ROC results for ABM, GEFE, and ABM +GEC same subject. And a false reject occurred if the NMD value was above the threshold but the templates were both from the same subject. We iterated through and allowed each of the 160 templates in our data set to have their chance to act as the probe and then designated the remaining 159 tem- plates as our gallery set for each iteration. As we increased the threshold, the TAR value continued to increase towards 100%. Our best results were the ones that minimized FAR and FRR while maximizing TAR. When we set the threshold at .081, it yielded a TAR of approximately 70%, a FAR of approximately 42% and FRR of 30%. Likewise, while using a threshold of .0161 we calculated TAR of 90%, a FAR of 74% and a FRR of 10% (See Figure 2). These results are significantly better than what was Figure 3: CMC results for ABM, GEFE, and ABM +GEC achieved with GEFE. When the TAR for GEFE (specifi- cally GEFE-256) approaches 80%, it yields a FAR 76%, and when the TAR reaches 90% it yields a FAR which is The CMC results show that though GEFE has a consid- close to 90% as well. erably higher rank 1 accuracy of 43.75%, compared to We also computed a Cumulative Match Characteristic ABM’s 25.0% rank 1 accuracy, ABM begins to substan- (CMC) in order to analyze the ABM technique. The CMC tially outrank GEFE from rank 3, and beyond, including uses a single template as a probe and the remainder of the double digit differences in accuracy beginning with rank 4. templates from all subjects (including self) in the popula- (See Figure 3 CMC Chart). ABM + GEC further widens tion as the gallery. The CMC applies a rank for each probe the accuracy gap, by matching GEFE’s 43.75% rank 1 ac- to determine the percentage of templates which are able to curacy and greatly outperforming every other rank for GE- find a match which belongs to the same subject on the first FE, including double digit percentage leading from rank 3 probe (rank 1), second probe (rank 2), third probe (rank 3), and on. 79 Robert Stokes et al. MAICS 2016 pp. 75–80 Conclusions and Future Work (Chapter 17), Intelligent Control Systems Using Soft Computing Methodologies, A. Zilouchian & M. Jam- Based on the results we have tabulated and displayed in the shidi (Eds.), pp. 365-380, CRC press. ROC and CMC curves, it appears that the ABM + GEC 5. K. Jain, L. Hong, and S. Pankanti, “Biometric Identifi- cation” Commun. ACM 43, 90-98, 2, 2000. technique is more accurate and more effective as a soft- 6. Z. Jorgensen and T. Yu. On mouse dynamics as a be- ware biometric approach when compared to the GEFE. In havioral biometric for authentication. In Proceedings of addition, ABM+GEC is able to accomplish higher accura- the 6th ACM Symposium on Information, Computer cies than standard ABM although using a significantly and Communications Security, ASIACCS ’11, pages lower number of features. 476–482, 2011. 7. Malkauthekar, M. D. "Analysis of euclidean distance Future work needs to be done in order to improve both and Manhattan Distance measure in face recognition", the GEFE and ABM + GEC techniques if either strategy is International Journal of Computer Science and Engi- going to become applicable to the mainstream authentica- neering (IJCSE) ISSN(P): 2278-9960; ISSN(E): 2278- tion. Each approach will have to decrease the FAR while 9979 Vol. 3, Issue 4, July 2014, 89- 98. maintaining a high TAR. Also, the entire system needs to 8. Nazirah Abd Hamid; Suhailan Safei; Siti Dhalila and be modified and tested in a real time environment in order Mohd Satar. “Mouse Movement Behavioral Biometric Systems”; Kuala Terengganu, Malaysia. to better evaluate the feasibility of the technique for de- 9. L. O'Gorman, “Comparing Passwords, Tokens, and Bi- ployment in a production setting. The evolutionary com- ometrics for User Authentication,” Proc. IEEE, Vol. 91, putation that GEFE and ABM+GEC undergo can both take No. 12, 2019-2040, 2003. hours to run depending on the algorithm parameters. How- 10. Timo Ojala; Matti Pietikainen; Topi Maenpaa. “Multi- ever, each system can be viewed as a feature "update" al- resolution Gray-Scale and Rotation Invariant Texture gorithm which would run as a background component to Classification with Local Binary Patterns”. 11. Douglas A. Schulz, MOUSE CURVE BIOMETRICS, an AA system, as new data becomes available, to maintain Pacific Northwest National Laboratory, U.S. Depart- optimal accuracy. Therefore, there should be little impact ment of Energy on user experience due to the speed of completion. 12. Joseph Shelton; Joshua Adams; Derrick Leflore and Furthermore, we would like to test the system on a larger Gerry Dozier. “Mouse Tracking, Behavioral Biomet- pool of users in order to see how that affects the accuracy rics, and GEFE”; In Proceedings of IEEE Southeastcon; 2013, p1-6, 6p. measurements. Some things to consider in a real time ac- 13. Tinker, M. L., Dozier, G. & Garrett, A. (2010). The ex- tive authentication (AA) system also include: how many ploratory toolset for the optimization of launch and templates should be stored in a user’s profile during train- space systems (X-TOOLSS). Available online: ing phase; and how long should each template remain in http://nxt.ncat.edu/. profile before being “aged out” by new templates. 14. S. Tong. Support vector machine active learning for image retrieval. In Proceedings of the ninth ACM inter- national conference on Multimedia, 2001. Acknowledgements 15. Nan Zheng; Aaron Paloski; Haining Wang. “An Effi- cient User Verification System via Mouse Movements”; We would like to thank Dr. Gerry Dozier and Joseph Shel- Williamsburg, VA, USA. ton for their consultation on the technical methodology behind prior GEFE research at North Carolina A&T State University. References 1. J. Adams, D. L. Woodard, G. Dozier, P. Miller, G. Glenn, K. Bryant. "GEFE: Genetic & Evolutionary Fea- ture Extraction for Periocular Based Biometric Recog- nition," Proceedings 2010 ACM Southeast Conference, April 15-17, 2010, Oxford, MS. 2. A. A. E. Ahmed and I. Traore. A new biometric tech- nology based on mouse dynamics. IEEE Transactions on Dependable and Secure Computing, 4(3):165–179, 2007. 3. Ingo Deutschmann and Johan Lindholm. “Behavioral biometrics for DARPA’s active authentication pro- gram”. BIOSIG 2013: 225-232. 4. Dozier, G., Homaifar, A., Tunstel, E., and Battle, D. (2001). “An Introduction to Evolutionary Computation” 80