<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">The Enhancement of Low-Level Classifications for Ambient Assisted Living</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Rachel</forename><surname>Goshorn</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Systems Engineering Department</orgName>
								<orgName type="institution">Naval Postgraduate School</orgName>
								<address>
									<country key="US">U.S.A</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Deborah</forename><surname>Goshorn</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Computer Science and Engineering Department</orgName>
								<orgName type="institution">University of California</orgName>
								<address>
									<settlement>San Diego</settlement>
									<country key="US">U.S.A</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Mathias</forename><surname>Kölsch</surname></persName>
							<affiliation key="aff2">
								<orgName type="department">Computer Science Department and MOVES Institute</orgName>
								<orgName type="institution">Naval Postgraduate School</orgName>
								<address>
									<country key="US">U.S.A</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">The Enhancement of Low-Level Classifications for Ambient Assisted Living</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">1B0E7713506E294C82B19E446D75B5D3</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T04:07+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>human-computer interaction</term>
					<term>hand postures</term>
					<term>hand gestures</term>
					<term>user interface</term>
					<term>ambient intelligence</term>
					<term>posture recognition</term>
					<term>gesture recognition</term>
					<term>smart environments</term>
					<term>computer vision</term>
					<term>human body tracking</term>
					<term>hand gesture behaviors</term>
					<term>Ambient Assisted Living (AAL)</term>
					<term>interpreted computer commands</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Assisted living means providing the assisted with custom services, specific to their needs and capabilities. Computer monitoring can supply some of these services, be it through attached devices or "smart" environments. In this paper, we describe an ambient system that we have built to facilitate non-verbal interaction that is not bound to the traditional input means of a keyboard and mouse. We investigated the reliability of hand gesture behavior recognition, from which computer commands for AAL communications are interpreted. Our findings show that hand gesture behavioral analysis reduces false classifications while at the same time more than doubling the available vocabulary. These results will influence the design of gestural and multimodal user interfaces.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In a variety of situations, gestural communication is either preferable to verbal communication or advantageous when combined multimodally with voice. For example, noisy environments might render voice recognition unreliable, and people with speech impairments might have difficulty communicating verbally. Some intentions are best communicated multimodally, as illustrated in Bolt's "Put That There" work <ref type="bibr" target="#b0">[1]</ref>. In addition, people who are bedridden, or elderly and living alone at home, may need assistance in communicating and in controlling various devices.</p><p>Through behavior analysis of human hand movements over time, observed through vision sensor data, communication commands can be interpreted and passed on for human-computer interaction, enabling smart environments. A great need for smart environments exists among those needing assistance, such as the elderly at home or the bedridden. Vision sensor data provides the means to become aware of the surrounding environment ("ambient intelligence"), and hand gestures provide the human-computer interaction that enables smart environments. Combining these two worlds of "ambient intelligence" and "smart environments" assists those needing help at home; in other words, "ambient assisted living" (AAL) is provided. In AAL, people can, for example, be assisted in turning devices on and off, making phone calls (e.g. emergency calls), or changing the television channel. Fig. <ref type="figure" target="#fig_0">1</ref> shows the overall systems view of AAL discussed in this paper. If people needing assistance could communicate commands to devices using only their hands, the need for tools and remote controls would be removed, and communication would not depend on any handheld device. 
If a remote control were required and it fell, for example, under the bed of a bedridden person, that person would need some other way to communicate. This paper demonstrates the high-level classification of hand gesture behaviors, based on sequences of hand postures over time. The hand gesture behaviors are then interpreted as control commands for various computer devices. The levels of analysis for AAL are shown as a pyramid process in Fig. <ref type="figure" target="#fig_1">2</ref>. In this paper, we demonstrate the use of hand gestures as a robust input for AAL. Based on an alphabet of atomic hand postures illustrated in Fig. <ref type="figure" target="#fig_2">3</ref>, we build a small "gesture" vocabulary composed of posture sequences. The postures are observed with ceiling-mounted cameras and recognized with the "HandVu" library. We then use a robust hand gesture behavior classification method <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b3">4]</ref> to distinguish the gestures. In an AAL system, these hand gesture behaviors are then interpreted as computer commands to control various devices of interest. In an experiment, we demonstrate improved recognition performance over individual hand postures and provide additional computer commands per gesture (versus being limited to one command per posture in the fixed set), thus making the system robust for application in AAL. An overview of the focus of the AAL systems experiment can be seen in Fig. <ref type="figure" target="#fig_3">4</ref>. After the following section reviews related work, Sec. 3 introduces the posture recognition library, Sec. 4 describes the robust hand gesture classification method for syntactic analysis, and Sec. 5 presents the experimental design and results. The last two sections cover the summary and conclusions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Smart environments have long attracted people's interest. In many ways, rooms have become more aware of their inhabitants: consider motion-activated doors, lights that turn off automatically if no one is present, and even the thermostat that "senses" the environment. For active user interaction with the environment, there is the clap switch, which turns electricity to a household appliance on and off at the clap of a hand. However, networked systems that can continuously monitor and react to a person's state are still the dreams of researchers.</p><p>One of the earliest demonstrations of gestural and multimodal human-computer interaction was Bolt's 1980 "Put That There" article <ref type="bibr" target="#b0">[1]</ref>. An early user interface implementation using temporal gestures was shown by Pausch and Williams <ref type="bibr" target="#b16">[17]</ref> in 1990, making use of a tracked data glove. Various researchers, and now also commercial products, employ handheld devices instead of bare-hand gestures. The XWand <ref type="bibr" target="#b23">[24]</ref>, a stick the size of a remote control, enables natural interaction with consumer devices through gestures and speech: one can point at, for example, a TV or stereo and gesture or speak commands to it. Earlier, Kohtake et al. <ref type="bibr" target="#b9">[10]</ref> showed a similar wand-like device, the "InfoPoint", that enabled data transfer between consumer appliances such as a digital camera, a computer, and a printer by pointing it at them. Probably the most popular device for gestural HCI as of late is Nintendo's "Wii Remote", a game controller whose built-in sensors estimate its position and orientation.</p><p>Computer vision as a sensing technology has advantages over sensors embedded in handheld devices due to its silent, unobtrusive operation that enables unencumbered interaction. 
(Your hands are not likely to get lost between the sofa cushions.) There is a vast body of research on hand detection, posture recognition, hand tracking, and trajectory-based gesture analysis. Early work on utilizing computer vision to facilitate human-computer interaction in relatively unconstrained environments includes that of Freeman et al. <ref type="bibr" target="#b2">[3]</ref>. In their implementation, a variety of image observations including edges and optical flow allow distinction of hand gestures and full-body motions. Ong and Bowden <ref type="bibr" target="#b15">[16]</ref> show methods for real-time hand detection and posture classification. 3D pointing directions as discussed in <ref type="bibr" target="#b0">[1]</ref> can be determined with methods such as those described by Nickel et al. <ref type="bibr" target="#b14">[15]</ref>. Recognition of a select vocabulary of the American Sign Language using temporal hand motion aspects has been demonstrated by <ref type="bibr">Starner and Pentland [19]</ref>. Wachs et al.'s Gestix <ref type="bibr" target="#b21">[22,</ref><ref type="bibr" target="#b22">23]</ref> combines many of these technologies to create a user interface suited to interaction in antiseptic surgical environments where keyboards cannot be used. An analysis of and recommendations for using gesture recognition for user interaction can be found in a book chapter by Turk <ref type="bibr" target="#b19">[20]</ref>.</p><p>Behavior classification has grown to be a popular area of research in computer vision. There are several approaches to classifying high-level behaviors in vision-based data. 
Suppose behavioral classification is decomposed into two stages: (1) a low-level event classifier based on features of raw video data, and (2) a high-level behavioral classifier based on the sequences of events output by the first stage. A behavior classifier can then be thought of as a classifier of sequences of symbols (events). Sequential classification has been of interest in several applications such as genetic algorithms, natural language processing, speech recognition, and compilers. In computer vision applications, behavior classification has mostly relied on state-space models such as Hidden Markov Models (HMMs). However, when certain sequences of low-level data inherently fall into meaningful behaviors, Ivanov and Bobick <ref type="bibr" target="#b6">[7]</ref> conclude that syntactic (grammar-based) structural approaches can outperform more statistically based approaches such as HMMs. Ivanov and Bobick <ref type="bibr" target="#b7">[8]</ref> use a stochastic context-free grammar for behavior modeling (hand gestures and behavior in car parking).</p><p>Others have designed specific deterministic finite state machines for classifying behaviors, such as in airborne surveillance scenarios <ref type="bibr" target="#b1">[2]</ref>, but these cannot handle noisy data. To address this problem, augmented finite state machines can be used, so that noisy data can still be parsed and accepted by each finite state machine representing a behavior. In <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b4">5]</ref>, such a robust sequential classifier is developed and shown to classify behaviors on noisy data in various applications, such as modeling behaviors in freeway traffic, human behavioral patterns in a lab room, and signal processing patterns seen in communications channels for distinguishing types of transmitted signals. 
An extended version of this classifier <ref type="bibr" target="#b5">[6]</ref> is used to classify hand gestures based on three hand postures, assuming that the hand postures were classified ahead of time and were heavily mislabeled. The data was simulated to be heavily noisy in order to demonstrate the classifier's robustness and its ability to correct errors from a poor low-level posture classifier. This paper further improves this extended classifier (as the high-level gesture behavior classifier), as described in Sec. 4, and runs it on real data (posture labels) output by the real-time posture recognition classifier discussed in Sec. 3.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Hand Posture Recognition</head><p>This section introduces HandVu, a library and program to recognize a set of six hand postures in real time in video streams. Its three main components are described in the following subsections: hand detection, 2D hand tracking, and posture recognition. HandVu's vision processing methods typically require less than 100 milliseconds of combined processing time per frame. They are mostly robust to different environmental conditions such as lighting changes, color temperature, and lens distortion. HandVu is fast, accurate, and robust enough to meet the quality and usability demands of a user interface. HandVu's output includes which of the six postures was recognized or, if none was recognized, an "unknown posture" identifier. This data was fed into the classification method described in Sec. 4.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Hand Detection</head><p>HandVu's hand detection algorithm detects the hand in the closed posture based on appearance and color. It uses a customized method based on the Viola-Jones detection method <ref type="bibr" target="#b20">[21]</ref> to find the hand in this posture and view-dependent configuration. This posture/view combination is advantageous because it can be distinguished rather reliably from background noise <ref type="bibr" target="#b11">[12]</ref>.</p><p>Upon detection of a hand area based on gray-level texture, the area's color is compared against a user-independent histogram-based statistical model of skin color, built from a large collection of hand-segmented pictures from many imaging sources (similar to Jones and Rehg's approach <ref type="bibr" target="#b8">[9]</ref>). If the number of skin pixels falls below a threshold, the detection is rejected. Combined, these two image cues reduce the number of false detections to about a dozen per hour of video.</p></div>
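<div xmlns="http://www.tei-c.org/ns/1.0"><p>The color verification step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not HandVu's actual code: the bin quantization, the two thresholds, the lookup table skin_prob, and all function names are hypothetical, since the text specifies only that a histogram-based skin color model is used and that detections with too few skin pixels are rejected.</p>

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0..255
Bin = Tuple[int, int, int]

BINS_PER_CHANNEL = 8  # hypothetical quantization; the paper does not state one


def color_bin(pixel: Pixel) -> Bin:
    """Quantize an RGB pixel into a coarse histogram bin."""
    step = 256 // BINS_PER_CHANNEL
    r, g, b = pixel
    return (r // step, g // step, b // step)


def skin_fraction(area: List[Pixel], skin_prob: Dict[Bin, float],
                  pixel_threshold: float = 0.5) -> float:
    """Fraction of pixels in the area whose bin the model rates as skin."""
    if not area:
        return 0.0
    skin = sum(1 for p in area if skin_prob.get(color_bin(p), 0.0) >= pixel_threshold)
    return skin / len(area)


def accept_detection(area: List[Pixel], skin_prob: Dict[Bin, float],
                     area_threshold: float = 0.3) -> bool:
    """Keep a texture-based detection only if enough of its pixels look like skin."""
    return skin_fraction(area, skin_prob) >= area_threshold


# Toy model with a single skin-colored bin (made-up values).
toy_model = {color_bin((200, 150, 120)): 0.9}
hand_like = [(200, 150, 120)] * 3 + [(0, 0, 255)]
background = [(0, 0, 255)] * 4
print(accept_detection(hand_like, toy_model), accept_detection(background, toy_model))
```

<p>A real skin model would be learned from labeled images rather than filled in by hand; the sketch only shows how a second, independent cue can veto a texture-based detection.</p></div>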
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Hand Tracking</head><p>Next, the hand's motion is tracked in the video stream. To that end, the system learns the observed hand color in a histogram, thereby adjusting to user skin-color variation, lighting differences, and camera color temperature settings. Hand tracking uses the "Flock of Features" approach <ref type="bibr" target="#b12">[13]</ref>, which calculates the optical flow for small patches and occasionally resorts to local color information as a backup. This multi-cue integration of gray-level texture with textureless color information increases the algorithm's robustness, permitting hand tracking despite vast and rapid appearance changes. It further alleviates the interdependency problems of staged-cue approaches and increases confidence in the tracking results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Posture Classification</head><p>The algorithm's last stage attempts to recognize various predefined postures. A posture in our sense is a combination of a hand/finger configuration and a view direction, which makes it possible to distinguish two different views of the same finger configuration, such as Lback and Lpalm. The focus of the recognition method is on reliability, not expressiveness. That is, it distinguishes a few postures reliably and does not attempt less consistent recognition of a larger number of postures. HandVu's recognition method uses a texture-based approach to fairly reliably classify image areas into seven classes: six postures and "no known hand posture." The confusion matrix of a video-based experiment is shown in Fig. <ref type="figure" target="#fig_10">7</ref> and described in more detail in <ref type="bibr" target="#b10">[11]</ref>.</p><p>A two-stage hierarchy achieves both accuracy and good speed performance. In the first step, a detector looks for any of the six hand postures without distinguishing between them. This is faster than executing six separate detectors because different postures' appearances share common features, so non-hand areas can be eliminated in one classification. In the second step, only those areas that passed the first step are investigated further. Each of the second-step detectors for the individual postures was trained on the result of the combined classifier, which had already eliminated 99.999879% of image areas in a validation set <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b12">13]</ref>. After a successful classification, the tracking stage is initialized again (the Flock of Features locations and the observed skin color model). 
HandVu is largely user-independent and not negatively influenced by different cameras or lenses.</p><p>Results from HandVu are sent as "Gesture Events" to the gesture classification module in a unidirectional TCP/IP stream of ASCII messages in the following format: The two identifiers "tracked" and "recognized" are Boolean values indicating the tracking and recognition state of HandVu.</p><p>More detailed descriptions of HandVu's architecture <ref type="bibr" target="#b13">[14]</ref>, robust hand detection <ref type="bibr" target="#b11">[12]</ref>, and hand tracking <ref type="bibr" target="#b12">[13]</ref> are available elsewhere.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Hand Gesture Behavior Recognition Classifier</head><p>Sequences of hand postures over time compose a hand gesture behavior, which is then interpreted as a computer command (as represented in the overall system described in Sec. 1 and Fig. <ref type="figure" target="#fig_3">4</ref>). This section describes the theory, implementation, and cost automation method of the hand gesture behavior recognition classifier.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Hand Gesture Behavior Recognition Theory</head><p>Sequences of events or features, over time and space, compose a behavior. The events/features are the low-level classifications; in this paper, they are the detected hand postures (as described in Sec. 3). The detected hand postures are concatenated in a temporal sequence to form a behavior, which is a hand gesture. As sequences of hand postures are detected, a method is needed to read these sequences and classify which hand gesture each sequence is most similar to.</p><p>In order to read and classify sequences, we use a syntactic, grammar-based approach <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b3">4]</ref>. Before sequences can be classified, the various hand gesture behavior structures need to be defined a priori. Each hand gesture behavior can be seen as an infinite set of sequences of hand postures of similar temporal structure; this infinite set is known as a language. Different hand gesture behaviors have different temporal structures of hand postures. These sequence structures are defined with syntax rules, specifically regular grammars <ref type="bibr" target="#b17">[18]</ref>, where the alphabet consists of the various hand postures (also known as symbols); the syntax rules then define the ways these postures can be combined to form the temporal structures of interest. A set of syntax rules defining a hand gesture behavior is also called a grammar. The grammar is implemented through a finite state machine (FSM). In other words, the FSM reads the sequence of hand postures. If a sequence of hand postures matches a certain hand gesture behavior, the corresponding FSM will accept this sequence. Therefore, the sequence of hand postures is classified into the hand gesture behavior whose corresponding FSM accepts the sequence. 
Since systems are not one hundred percent predictable and reliable, it is likely that some sequence will not be accepted by any of the predefined hand gesture behaviors. This could be due to errors in the low-level classification of the hand postures, or to a user error, i.e. making the hand gesture with an incorrect hand posture in the sequence of postures. Therefore, the sequence of hand postures must be classified as the hand gesture behavior to which it is most similar. To do so, a distance metric between a sequence of hand postures and a hand gesture behavior is defined. Some preliminary definitions are needed first.</p><p>Let an alphabet Σ be the set of predefined hand postures. An example alphabet is Σ = {a, b, c, d, e, f}, where each letter represents a detected hand posture. More details on the hand posture symbols are given in Sec. 5. A hand gesture behavior is then a set of syntax rules combining the elements of Σ.</p><p>If an infinite set of sequences of postures is the following k-th language:</p><formula xml:id="formula_0">L(B_k) = \{ab,\ aab,\ \ldots,\ a \cdots ab\}<label>(1)</label></formula><p>then the hand gesture behavior (grammar) that generated this language is:</p><formula xml:id="formula_1">B_k = \{\, S \to a Q_1,\ \ Q_1 \to a Q_1,\ \ Q_1 \to b F \,\}<label>(2)</label></formula><p>Let a hand gesture, a temporal sequence of detected hand postures, be denoted by s = s_1 s_2 … s_n, where each s_j is a hand posture from Σ. If s matches one of the sequences in L(B_k), it is the k-th hand gesture behavior, and its corresponding FSM M_k will accept this sequence. Let there be K predefined hand gesture behaviors B_1, B_2, …, B_K, with K corresponding finite state machines M_1, M_2, …, M_K that implement each hand gesture behavior. The sequence s will be classified into the hand gesture behavior whose corresponding FSM accepts it. 
If s is not accepted by any M_l, sequence s will then be classified as the hand gesture behavior to which it is most similar. To this end, as M_l parses sequence s, it edits the sequence so that M_l accepts it, incurring a cost per edit and hence a total cost. The distance between a sequence s and a hand gesture behavior B_l, denoted by d(s, B_l), is therefore determined by the cost-weighted number of edits required to transform s into a sequence in L(B_l). The possible posture symbol edits are substitution and deletion, where each edit has an a priori assigned cost. As s is being parsed, M_l carries out the minimum number of edits required to transform s into a sequence in L(B_l). In order to allow edits with an associated cost in a hand gesture behavior, the original set of syntax rules per behavior and the corresponding FSM must be augmented. Let the augmented k-th hand gesture behavior and corresponding FSM be denoted by B'_k and M'_k. For the example B_k, let the augmented set of syntax rules be denoted by</p><formula xml:id="formula_2">B'_k = \left\{ \begin{array}{l} S \to a Q_1,\ \ 0 \\ Q_1 \to a Q_1,\ \ 0 \\ Q_1 \to b F,\ \ 0 \\ S \to b Q_1,\ \ C_{S(b,a)} \\ Q_1 \to b Q_1,\ \ C_{S(b,a)} \\ Q_1 \to a F,\ \ C_{S(a,b)} \\ S \to \varepsilon Q_1,\ \ C_{D(b)} \\ Q_1 \to \varepsilon Q_1,\ \ C_{D(b)} \\ Q_1 \to \varepsilon F,\ \ C_{D(a)} \end{array} \right\} \quad (3)</formula><p>Let S(a, b) denote substituting the true posture b for the mislabeled posture a, with associated cost C_S(a,b), and let D(a) denote deleting a mislabeled posture a, with cost C_D(a). The corresponding modified FSM M'_k is shown in Fig. <ref type="figure" target="#fig_5">5</ref>.</p><p>In order to calculate the distance between a sequence of hand postures and a hand gesture behavior, let the possible hand postures be Σ = {r_1, r_2, …, r_N}, where N is the total number of hand postures. 
With this, the distance between a sequence of hand postures s and a hand gesture behavior B_l is given by</p><formula xml:id="formula_3">d(s, B_l) = \sum_{i=1}^{|\Sigma|} \sum_{j=1}^{|\Sigma|} C_{S(r_i, r_j)} \, n_{S(r_i, r_j)} + \sum_{i=1}^{|\Sigma|} C_{D(r_i)} \, n_{D(r_i)}<label>(4)</label></formula><p>where n_S(r_i, r_j) is the number of substitutions of the true hand posture r_j for the mislabeled hand posture r_i, and n_D(r_i) is the number of deletions of the mislabeled hand posture r_i.</p><p>With a distance metric between a sequence of hand postures and an a priori defined hand gesture behavior, the classification definition can be elaborated upon. Assuming K hand gesture behaviors, each behavior and its associated FSM are augmented a priori so that any sequence of hand postures is accepted by each hand gesture behavior, but with a total cost. The augmented hand gesture behaviors are then denoted by B'_1, B'_2, …, B'_K, and their K corresponding augmented finite state machines are denoted by M'_1, M'_2, …, M'_K. An unknown sequence s of hand postures is then parsed by each M'_l, with a cost d(s, B_l). The sequence s is then classified as the hand gesture B_g, where g = arg min over l = 1, 2, …, K of d(s, B_l); that is, B_g is the hand gesture behavior to which s is most similar. Therefore, sequences of hand postures are classified based upon Maximum Similarity Classification (MSC), as seen in Fig. <ref type="figure" target="#fig_6">6</ref>.</p><p>The implementation of the hand gesture behavior classifier is generalized so that the overall classification structure stays the same and can operate on any number of hand gesture behaviors, any number of syntax rules per hand gesture behavior, any number of sequences of hand postures to classify, and any number of hand postures per sequence. The hand gesture classifier is also easily scalable to additional hand postures and hand gesture behaviors.</p></div>
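<div xmlns="http://www.tei-c.org/ns/1.0"><p>The minimum-cost parse and the Maximum Similarity Classification described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names parse_cost and classify, the dictionary encoding of the FSMs, the second behavior B2, and the uniform costs of 10 and 100 are all hypothetical. Deletion is modeled here as discarding an observed symbol without changing state, which is one common reading of the edit operations and may differ in detail from the augmented rules of Eq. (3).</p>

```python
import math
from typing import Dict, Set, Tuple

# (state, true symbol) -> next state; a hypothetical encoding of an FSM
Transitions = Dict[Tuple[str, str], str]
Machine = Tuple[Transitions, str, Set[str]]  # (transitions, start, accepting)


def parse_cost(seq: str, machine: Machine,
               sub_cost: Dict[Tuple[str, str], float],
               del_cost: Dict[str, float]) -> float:
    """Minimum total edit cost for the machine to accept seq (inf if none)."""
    trans, start, accepting = machine
    best = {start: 0.0}  # cheapest known cost of being in each state
    for observed in seq:
        nxt: Dict[str, float] = {}
        for state, cost in best.items():
            # Deletion: discard the observed label and stay in this state.
            candidates = [(state, cost + del_cost[observed])]
            for (src, true_sym), dst in trans.items():
                if src == state:
                    # Exact match costs 0; otherwise the observed (mislabeled)
                    # symbol is replaced by the true symbol of the rule.
                    edit = 0.0 if true_sym == observed else sub_cost[(observed, true_sym)]
                    candidates.append((dst, cost + edit))
            for target, total in candidates:
                nxt[target] = min(total, nxt.get(target, math.inf))
        best = nxt
    return min((c for q, c in best.items() if q in accepting), default=math.inf)


def classify(seq: str, machines: Dict[str, Machine],
             sub_cost: Dict[Tuple[str, str], float],
             del_cost: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Maximum Similarity Classification: the behavior with the lowest cost wins."""
    costs = {name: parse_cost(seq, m, sub_cost, del_cost) for name, m in machines.items()}
    return min(costs, key=lambda name: costs[name]), costs


# Example behavior from Eq. (2): one or more a's followed by a single b.
B1: Machine = ({("S", "a"): "Q1", ("Q1", "a"): "Q1", ("Q1", "b"): "F"}, "S", {"F"})
# Hypothetical second behavior: one or more b's followed by a single a.
B2: Machine = ({("S", "b"): "Q1", ("Q1", "b"): "Q1", ("Q1", "a"): "F"}, "S", {"F"})

ALPHABET = "abc"
SUB = {(x, y): 10.0 for x in ALPHABET for y in ALPHABET if x != y}  # made-up costs
DEL = {x: 100.0 for x in ALPHABET}                                  # made-up costs

label, costs = classify("acb", {"B1": B1, "B2": B2}, SUB, DEL)
print(label, costs)
```

<p>With these made-up costs, the noisy sequence acb is assigned to B1 at a cost of 10 (one substitution of the mislabeled c), whereas B2 would need three substitutions at a cost of 30. Replacing the uniform costs by costs derived from a confusion matrix, as in Sec. 4.2, changes only the two cost tables.</p></div>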
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Hand Gesture Recognition Cost Automation</head><p>In low-level classification of hand postures, hand postures can be mislabeled at times. The probability of misclassifying certain hand postures as other hand postures is known a priori. For example, the probability of mislabeling the true hand posture b as hand posture a is known a priori. This knowledge is used in classifying sequences of hand postures; in other words, these probabilities are used in automating the costs per syntax rule in a hand gesture behavior. The probabilities of mislabeling certain hand postures as other hand postures are extracted from the confusion matrix, also known as the recognition summary in Sec. <ref type="bibr" target="#b2">3</ref>. The probability that the low-level hand posture recognition classifier mislabeled hand postures is then calculated from the confusion matrix for the posture recognition classifier, as seen in Table <ref type="table" target="#tab_0">1</ref>. Let N be the total number of hand postures classified by the low-level hand posture recognition classifier (also the size |Σ|). With Count(b, a) denoting the number of times the true posture b was labeled as posture a, the conditional probability for an example with three postures a, b, and c is estimated as</p><formula xml:id="formula_4">P(a \mid b) = \frac{\mathrm{Count}(b, a)/N}{\left(\mathrm{Count}(b, a) + \mathrm{Count}(b, b) + \mathrm{Count}(b, c)\right)/N} = \frac{\mathrm{Count}(b, a)}{\mathrm{Count}(b, a) + \mathrm{Count}(b, b) + \mathrm{Count}(b, c)}<label>(6)</label></formula><p>The conditional probability estimates for all possible mislabeled hand postures can now be determined. The substitution costs are defined from the inverse of the conditional probability. For example, let the cost for substituting the true posture b for the mislabeled posture a be derived from the inverse of the probability that the low-level hand posture classifier mislabeled posture b as posture a. 
Denoting the conditional probability P(labeled posture = a | true posture = b) by P(a|b), the cost for substituting the true posture b for the mislabeled posture a is:</p><formula xml:id="formula_6">C_{S(a,b)} = 10 \log_{10}\left(\frac{1}{P(a \mid b)}\right)<label>(9)</label></formula><p>Intuitively, if there is a high probability that the low-level hand posture recognition classifier mislabels the true posture b as posture a, i.e. P(a|b) is high, then the cost C_S(a,b) of substituting the true posture b for the mislabeled posture a is low.</p><p>Additionally, if this probability P(a|b) is nonzero, then P(b|b) is less than one, since</p><formula xml:id="formula_7">P(b \mid b) = 1 - P(a \mid b) - P(c \mid b)<label>(10)</label></formula><p>To give an understanding of the range of potential costs per edit, let x be an entry in the confusion matrix, i.e. a probability such as P(a|b); the cost is normalized into a range by taking 10 log_10(1/x). In addition, to avoid infinite costs when a probability is zero, a small constant ε is added to zero probabilities, giving 0 + ε, where ε = 2 × 10^−16. Table <ref type="table" target="#tab_1">2</ref> lists the resulting range of costs from the probabilities of misclassifying hand postures.</p><p>For this data set, we set the cost of deleting a posture instance, for example D(b), to 20 log_10(1/(0 + ε)). This constrains the hand gesture classifier to choose substitutions as the minimum-cost edits whenever possible; scaling deletions to other applications is left to future work. The infrastructure is in place. Deletion could be used for cases where an edit is needed not because of a low-level classification error, but because the user accidentally used the wrong hand posture.</p></div>
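<div xmlns="http://www.tei-c.org/ns/1.0"><p>The cost automation of this section can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the function names, the dictionary layout counts[(true, labeled)], and the example counts are hypothetical, while the formulas for the conditional probability, the substitution cost, the deletion cost, and the value of ε follow the text. The sketch assumes every true posture was observed at least once, so no row total is zero.</p>

```python
import math
from typing import Dict, Tuple

EPS = 2e-16  # constant added to zero probabilities, as stated in the text


def conditional_probs(counts: Dict[Tuple[str, str], int]) -> Dict[Tuple[str, str], float]:
    """Map counts[(true, labeled)] to P(labeled | true), keyed as (labeled, true)."""
    totals: Dict[str, int] = {}
    for (true, _labeled), n in counts.items():
        totals[true] = totals.get(true, 0) + n
    return {(labeled, true): n / totals[true]
            for (true, labeled), n in counts.items()}


def substitution_cost(p: float) -> float:
    """Eq. (9): 10 log10(1 / P); a zero probability is replaced by 0 + EPS."""
    return 10.0 * math.log10(1.0 / (p if p > 0.0 else EPS))


def deletion_cost() -> float:
    """Deletion cost used for this data set: 20 log10(1 / (0 + EPS))."""
    return 20.0 * math.log10(1.0 / EPS)


# Made-up confusion counts for the true posture b in a three-posture example.
counts = {("b", "a"): 20, ("b", "b"): 70, ("b", "c"): 10}
probs = conditional_probs(counts)
cost_ab = substitution_cost(probs[("a", "b")])  # cost C_S(a,b)
print(round(probs[("a", "b")], 3), round(cost_ab, 2), round(deletion_cost(), 2))
```

<p>Because the deletion cost is twice the largest possible substitution cost, a minimum-cost parser using these tables prefers substitutions over deletions, which matches the constraint described above.</p></div>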
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Experimental Design and Results</head><p>This section demonstrates that hand gesture behaviors enhance the low-level classifications of hand postures and scale up the number of computer commands available for AAL. Since low-level hand posture classifications can have errors, sequencing the hand postures together over time can enhance the low-level classifications for ambient assisted living environments. In addition, if the computer commands are limited to the fixed set of hand postures, the number of possible computer commands equals the number of hand postures; if various hand postures are sequenced together over time, additional computer commands become possible. The experimental results of this paper therefore show that sequencing hand postures together enhances low-level classifications and that sequencing various hand postures together allows for more possible computer commands. For each classification, the accuracy of the hand gesture behavior classification is also shown. The overall focus of the experimental design can be seen in Fig. <ref type="figure" target="#fig_3">4</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Experimental Design</head><p>This section describes the experimental design.</p><p>As discussed in Sec. 3, the possible hand postures are Closed, Open, Lback, Lpalm, Victory, and Sidepoint. To define the various hand gesture behaviors, a symbol is assigned to each hand posture, as seen in Table <ref type="table" target="#tab_2">3</ref>, giving the alphabet of hand postures Σ = {a, b, c, d, e, f}. Various data sets were created, each with six hand gesture behaviors, in order to compare six hand gesture behaviors with the six individual hand postures, as seen in Table <ref type="table" target="#tab_3">4</ref>. Data Set 1 sequences the same hand posture together; classifying such a hand gesture is therefore similar to a weighted average of the detected hand posture over time (e.g., a low-pass filter that smooths out the noise of posture mislabeling). In addition, combining various hand postures extends the number of potential computer commands usable in an AAL environment. Therefore, various combinations of hand postures were defined per hand gesture behavior: Data Set 2 uses the most reliable pairs of hand postures, Data Set 3 uses the least reliable pairs of hand postures, and Data Set 4 uses a combination of the most reliable and least reliable hand posture labels.</p></div>
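<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the symbolic definitions concrete, the following Python sketch encodes the posture symbols of Table 3 and writes Behavior 1 of each data set from Table 4 as a regular expression. It is only an illustration under stated assumptions: the names POSTURE_SYMBOL, BEHAVIOR_1, encode, and is_behavior_1 are hypothetical, and exact pattern matching is used here, whereas the classifier in this paper tolerates mislabeled postures through edit costs.</p>
```python
import re

# Posture symbols from Table 3.
POSTURE_SYMBOL = {
    "Closed": "a", "Open": "b", "Lback": "c",
    "Lpalm": "d", "Victory": "e", "Sidepoint": "f",
}

# Behavior 1 of each data set as listed in Table 4. The exponents n and m
# are greater than 1, written here as "at least two repetitions".
BEHAVIOR_1 = {
    "Data Set 1": re.compile(r"a{2,}"),
    "Data Set 2": re.compile(r"a{2,}b{2,}"),
    "Data Set 3": re.compile(r"c{2,}d{2,}"),
    "Data Set 4": re.compile(r"a{2,}f{2,}"),
}


def encode(postures):
    """Map a list of posture names to a string over the alphabet a..f."""
    return "".join(POSTURE_SYMBOL[name] for name in postures)


def is_behavior_1(data_set, postures):
    """True if the posture sequence exactly matches Behavior 1 of the data set."""
    return BEHAVIOR_1[data_set].fullmatch(encode(postures)) is not None
```
<p>For example, the sequence Closed, Closed, Open, Open encodes to the string aabb, which matches Behavior 1 of Data Set 2, while Closed, Open, Open does not, because the Closed posture is not repeated.</p></div>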
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Experimental Results</head><p>This section shows the results of hand gesture behavior classification, based on sequences of lower-level classifications of hand postures. The low-level hand posture detection results are shown in Fig. <ref type="figure" target="#fig_10">7</ref>, which plots the probability of classifying a certain hand posture given the true hand posture. The x-axis is the true posture label; the y-axis is the probability of each observed label (distinguished by color, as given in the legend) given the true label. The six hand postures from Table <ref type="table" target="#tab_2">3</ref> are shown. Note the inherently high probability of the low-level posture classifier mislabeling the true Sidepoint hand posture, f, as the Lpalm hand posture, d.</p><p>The hand gesture behavior detection results are shown in Fig. <ref type="figure" target="#fig_11">8</ref>, which plots the probability of classifying a certain hand gesture behavior given the true hand gesture behavior. The x-axis is the true hand gesture behavior label; the y-axis is the probability of each observed label (distinguished by color, as given in the legend) given the true label. Six hand gesture behaviors per data set, from Table <ref type="table" target="#tab_3">4</ref>, are shown. These results portray the robustness of the high-level hand gesture classifier. The first data set, Data Set 1, in which each gesture is made up of a single repeated posture, was designed to show the error-correcting capability of the high-level behavior classifier. Notice the 100% accuracy in classifying the sequences of Sidepoint postures, even though the underlying raw posture symbols were mostly mislabeled.
In addition, the other data sets show various hand gesture behavior classifications that enhance the low-level hand posture recognition and increase the vocabulary size, thereby increasing the number of computer commands that can be interpreted for AAL.</p></div>
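<div xmlns="http://www.tei-c.org/ns/1.0"><p>The error-correcting effect described for Data Set 1 can be illustrated with a simplified minimum-cost classifier in Python. This is a sketch under strong assumptions and not the augmented finite state machine classifier of this paper: the confusion probabilities below are hypothetical values chosen for illustration (they are not the values behind Fig. 7), only substitution edits are considered, and only three single-posture behaviors (repeated d, e, or f) are compared.</p>
```python
import math
import sys

EPS = sys.float_info.epsilon  # assumption: eps is the machine epsilon

# HYPOTHETICAL confusion probabilities CONFUSION[true][labeled], i.e.
# P(labeled | true), for three postures only. Each row sums to one.
# The true Sidepoint posture f is mostly mislabeled as Lpalm d.
CONFUSION = {
    "d": {"d": 0.9, "e": 0.1, "f": 0.0},
    "e": {"d": 0.1, "e": 0.9, "f": 0.0},
    "f": {"d": 0.6, "e": 0.1, "f": 0.3},
}


def substitution_cost(p):
    """Cost 10*log10(1/p), with a zero probability replaced by 0 + EPS."""
    return 10.0 * math.log10(1.0 / (p if p != 0.0 else EPS))


def sequence_cost(observed, true_symbol):
    """Total cost of reading every observed label as the given true posture."""
    return sum(substitution_cost(CONFUSION[true_symbol][o]) for o in observed)


def classify(observed):
    """Pick the single-posture behavior with the minimum total cost."""
    return min(CONFUSION, key=lambda t: sequence_cost(observed, t))


if __name__ == "__main__":
    print(classify("ddfdd"))
```
<p>With these illustrative numbers, the sequence ddfdd is assigned to the Sidepoint behavior even though four of its five labels read Lpalm: a single f label is very costly to explain under the other two behaviors, while d labels are cheap to explain under a true f.</p></div>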
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Future Work</head><p>Future work can be divided into improved algorithm development for usability and scaling to additional applications. For improved algorithm development, we plan to incorporate usability factors into the costs (fusing these factors with the costs from the low-level classifier). For example, the answer to "how often does a user misuse a hand posture in a hand gesture?" should additionally be incorporated into the hand gesture behavior costs. In addition, the approach of this paper can scale to various applications, such as increased-awareness and environment-enabled AAL, where a house would have a network of cameras throughout that can interpret hand gestures as commands. Another potential application is enabling smart environments and communications using a surveillance system infrastructure. In addition to surveillance, people with certain features (e.g., airport security) can communicate signals to a central control station, conveying information such as unusual behavior they observed or the status of certain suspicious persons. In this case, the start and end of a hand gesture would be defined, e.g., specific hand postures would be defined to initiate and terminate a hand gesture communication. In addition, the structure of hand gesture behavior classification can be scaled to classifying sequences of body postures for human gesture behavior classification in surveillance applications.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Summary</head><p>We built a vision-based user interface to support intentional interaction with a smart assisted-living environment. Our experiments show that sequencing hand postures over time into hand gesture behaviors improves the recognition rates and increases the available vocabulary, two very important considerations for a user interface. The low-level classifications of hand postures are thus enhanced through hand gesture behaviors for more robust human-computer interaction; in addition, with the increased vocabulary that hand gesture behaviors make possible, the number of computer commands that can be interpreted also increases. Overall, ambient assisted living is enabled through computer commands interpreted from hand gesture behaviors, which result from low-level posture recognitions over time. The vision-based intentional interface is robust in that it can be scaled to a range of hand posture types and hand gesture behavior types, so that the additional applications discussed in Sec. 6 can be carried out.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 .</head><label>1</label><figDesc>Figure 1. Overall systems view of an Ambient Assisted Living (AAL) system.</figDesc><graphic coords="2,148.59,279.99,298.10,133.77" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 .</head><label>2</label><figDesc>Figure 2. Overall pyramid process of behavior analysis and interpretation, from video to hand postures, to hand gesture behaviors, to computer commands interpreted to enable Ambient Assisted Living (AAL).</figDesc><graphic coords="2,183.66,500.64,227.95,156.12" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 .</head><label>3</label><figDesc>Figure 3. Close-up view of the six hand postures, from left to right: closed, open, Lback, Lpalm, Victory, sidepoint.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 .</head><label>4</label><figDesc>Figure 4. Technical Overview of Ambient Assisted Living (AAL) system, with the experiment focus highlighted.</figDesc><graphic coords="3,166.13,456.39,263.02,231.83" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>1. 2</head><label>2</label><figDesc>timestamp obj_id: tracked, recognized,... ..., "posture" (xpos, ypos) [scale, unused]\r\n</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 5 .</head><label>5</label><figDesc>Figure 5. Augmented FSM M 1 of hand gesture behavior B k . The augmented syntax rules are in red, and original syntax rules in blue with zero cost.</figDesc><graphic coords="9,209.96,150.94,175.35,120.98" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 6 .</head><label>6</label><figDesc>Figure 6. Maximum similarity classification (MSC), with the input sequences of hand postures and output the hand gesture behavior.</figDesc><graphic coords="9,183.66,320.09,227.96,156.65" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head></head><label></label><figDesc>Confusion matrix of counts, Count(true, labeled), with one row per true posture. True: a: Count(a, a), Count(a, b), Count(a, c). True: b: Count(b, a), Count(b, b), Count(b, c). True: c: Count(c, a), Count(c, b), Count(c, c).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head></head><label></label><figDesc>P(\text{labeled posture} = a \mid \text{true posture} = b) = \frac{P(\text{labeled posture} = a, \text{true posture} = b)}{P(\text{true posture} = b)} \quad (5)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_10"><head>Figure 7 .</head><label>7</label><figDesc>Figure 7. Probabilities of hand posture classification (y-axis) given the true hand posture (x-axis).</figDesc><graphic coords="13,211.72,150.94,171.83,88.24" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_11"><head>Figure 8 .</head><label>8</label><figDesc>Figure 8. Probabilities of hand gesture behavior classification (y-axis) given the true hand gesture behaviors (x-axis).</figDesc><graphic coords="14,298.89,242.60,171.83,90.66" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Confusion Matrix for Posture Recognition Classifier</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>Costs - Confusion Matrix for Posture Recognition Classifier</figDesc><table><row><cell>Labeled: x</cell><cell>1/x</cell><cell>10 log_10(1/x)</cell></row><row><cell>0+eps</cell><cell>4.5×10^15</cell><cell>156.5356</cell></row><row><cell>0.1</cell><cell>10</cell><cell>10</cell></row><row><cell>0.2</cell><cell>5</cell><cell>6.9897</cell></row><row><cell>0.3</cell><cell>3.3333</cell><cell>5.2288</cell></row><row><cell>0.4</cell><cell>2.5</cell><cell>3.9794</cell></row><row><cell>0.5</cell><cell>2</cell><cell>3.0103</cell></row><row><cell>0.6</cell><cell>1.6667</cell><cell>2.2185</cell></row><row><cell>0.7</cell><cell>1.4286</cell><cell>1.5490</cell></row><row><cell>0.8</cell><cell>1.25</cell><cell>0.9691</cell></row><row><cell>0.9</cell><cell>1.1111</cell><cell>0.4576</cell></row><row><cell>1.0</cell><cell>1</cell><cell>0</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 .</head><label>3</label><figDesc>Hand Postures and Corresponding Symbols</figDesc><table><row><cell>Hand Posture</cell><cell>Symbol</cell></row><row><cell>Closed</cell><cell>a</cell></row><row><cell>Open</cell><cell>b</cell></row><row><cell>Lback</cell><cell>c</cell></row><row><cell>Lpalm</cell><cell>d</cell></row><row><cell>Victory</cell><cell>e</cell></row><row><cell>Sidepoint</cell><cell>f</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 .</head><label>4</label><figDesc>Hand Gesture Behaviors (Six Per Data Set), where n, m, k &gt; 1 for all Hand Gesture Behaviors.</figDesc><table><row><cell>Hand Gesture Behavior</cell><cell>Data Set 1</cell><cell>Data Set 2</cell><cell>Data Set 3</cell><cell>Data Set 4</cell></row><row><cell>Behavior 1</cell><cell>a^n</cell><cell>a^n b^m</cell><cell>c^n d^m</cell><cell>a^n f^m</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Put-That-There: Voice and Gesture in the Graphics Interface</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Bolt</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computer Graphics, ACM SIGGRAPH</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="262" to="270" />
			<date type="published" when="1980">1980</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Scenario recognition in airborne video imagery</title>
		<author>
			<persName><forename type="first">F</forename><surname>Bremond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Medioni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">DARPA98</title>
				<imprint>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page" from="211" to="216" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Computer Vision for Interactive Computer Graphics</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">T</forename><surname>Freeman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">B</forename><surname>Anderson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">A</forename><surname>Beardsley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">N</forename><surname>Dodge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Roth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Weissman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">S</forename><surname>Yerazunis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Computer Graphics and Applications</title>
		<imprint>
			<biblScope unit="page" from="42" to="53" />
			<date type="published" when="1998-06">May-June 1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Sequential Behavior Classification Using Augmented Grammars</title>
		<author>
			<persName><forename type="first">R</forename><surname>Goshorn</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2001-06">June 2001</date>
			<pubPlace>San Diego</pubPlace>
		</imprint>
		<respStmt>
			<orgName>University of California</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Master&apos;s thesis</note>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Syntactical Classification of Extracted Sequential Spectral Features Adapted to Priming Selected Interference Cancelers</title>
		<author>
			<persName><forename type="first">R</forename><surname>Goshorn</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2005-06">June 2005</date>
			<pubPlace>San Diego</pubPlace>
		</imprint>
		<respStmt>
			<orgName>University of California</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">PhD thesis</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Vision-based syntactical classification of hand gestures to enable robust human computer interaction</title>
		<author>
			<persName><forename type="first">R</forename><surname>Goshorn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Goshorn</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">3rd Workshop on AI Techniques for Ambient Intelligence, co-located with European Conference on Ambient Intelligence (ECAI08)</title>
				<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="211" to="216" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Probabilistic parsing in action recognition</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Ivanov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bobick</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Recognition of visual activities and interactions by stochastic parsing</title>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">A</forename><surname>Ivanov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">F</forename><surname>Bobick</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Trans. Pattern Anal. Mach. Intell</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="852" to="872" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Statistical Color Models with Application to Skin Detection</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Rehg</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Int. Journal of Computer Vision</title>
		<imprint>
			<biblScope unit="volume">46</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="81" to="96" />
			<date type="published" when="2002-01">Jan 2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">InfoPoint: A Device that Provides a Uniform User Interface to Allow Appliances to Work Together over a Network</title>
		<author>
			<persName><forename type="first">N</forename><surname>Kohtake</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Rekimoto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Anzai</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Personal and Ubiquitous Computing</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="264" to="274" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Vision Based Hand Gesture Interfaces for Wearable Computing and Virtual Environments</title>
		<author>
			<persName><forename type="first">M</forename><surname>Kölsch</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2004-09">September 2004</date>
		</imprint>
		<respStmt>
			<orgName>Computer Science Department, University of California, Santa Barbara</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">PhD thesis</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Robust Hand Detection</title>
		<author>
			<persName><forename type="first">M</forename><surname>Kölsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Turk</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. IEEE Intl. Conference on Automatic Face and Gesture Recognition</title>
				<meeting>IEEE Intl. Conference on Automatic Face and Gesture Recognition</meeting>
		<imprint>
			<date type="published" when="2004-05">May 2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Hand Tracking with Flocks of Features</title>
		<author>
			<persName><forename type="first">M</forename><surname>Kölsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Turk</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Video Proc. CVPR IEEE Conference on Computer Vision and Pattern Recognition</title>
				<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Vision-Based Interfaces for Mobility</title>
		<author>
			<persName><forename type="first">M</forename><surname>Kölsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Turk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Höllerer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Intl. Conference on Mobile and Ubiquitous Systems (MobiQuitous)</title>
				<imprint>
			<date type="published" when="2004-08">August 2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">3D-tracking of Head and Hands for Pointing Gesture Recognition in a Human-Robot Interaction Scenario</title>
		<author>
			<persName><forename type="first">K</forename><surname>Nickel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Seemann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Stiefelhagen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. IEEE Intl. Conference on Automatic Face and Gesture Recognition</title>
				<meeting>IEEE Intl. Conference on Automatic Face and Gesture Recognition</meeting>
		<imprint>
			<date type="published" when="2004-05">May 2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">A Boosted Classifier Tree for Hand Shape Detection</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">J</forename><surname>Ong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Bowden</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. IEEE Intl. Conference on Automatic Face and Gesture Recognition</title>
				<meeting>IEEE Intl. Conference on Automatic Face and Gesture Recognition</meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page" from="889" to="894" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Tailor: creating custom user interfaces based on gesture</title>
		<author>
			<persName><forename type="first">R</forename><surname>Pausch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">D</forename><surname>Williams</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the third annual ACM SIGGRAPH symposium on User interface software and technology</title>
				<meeting>the third annual ACM SIGGRAPH symposium on User interface software and technology</meeting>
		<imprint>
			<date type="published" when="1990">1990</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">Theory of Computation</title>
		<author>
			<persName><forename type="first">M</forename><surname>Sipser</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1997">1997</date>
			<publisher>PWS Publishing Company</publisher>
			<pubPlace>Massachusetts</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<title level="m" type="main">Visual Recognition of American Sign Language Using Hidden Markov Models</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">E</forename><surname>Starner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pentland</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1995">1995</date>
			<publisher>AFGR</publisher>
			<pubPlace>Zurich</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Gesture recognition</title>
		<author>
			<persName><forename type="first">M</forename><surname>Turk</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Handbook of Virtual Environments: Design, Implementation and Applications</title>
				<editor>
			<persName><forename type="first">K</forename><surname>Stanney</surname></persName>
		</editor>
		<imprint>
			<publisher>Lawrence Erlbaum Associates Inc</publisher>
			<date type="published" when="2001-12">December 2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Robust Real-time Object Detection</title>
		<author>
			<persName><forename type="first">P</forename><surname>Viola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jones</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Int. Journal of Computer Vision</title>
		<imprint>
			<date type="published" when="2004-05">May 2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Gestix: a doctor-computer sterile gesture interface for dynamic environments</title>
		<author>
			<persName><forename type="first">J</forename><surname>Wachs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Stern</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Edan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gillam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Feied</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Smith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Handler</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Soft Computing in Industrial Applications: Recent and Emerging Methods and Techniques</title>
				<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="30" to="39" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Cluster labeling and parameter estimation for the automated setup of a hand-gesture recognition system</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Wachs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Stern</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Edan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Systems, Man, and Cybernetics</title>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="932" to="944" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">XWand: UI for Intelligent Spaces</title>
		<author>
			<persName><forename type="first">A</forename><surname>Wilson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shafer</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2003">2003</date>
			<publisher>ACM CHI</publisher>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
