<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Video based human smoking event detection method</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Anna</forename><forename type="middle">V</forename><surname>Pyataeva</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Siberian Federal University</orgName>
								<address>
									<settlement>Krasnoyarsk</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Reshetnev Siberian State University of Science and Technology</orgName>
								<address>
									<settlement>Krasnoyarsk</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Maria</forename><forename type="middle">S</forename><surname>Eliseeva</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Siberian Federal University</orgName>
								<address>
									<settlement>Krasnoyarsk</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Video based human smoking event detection method</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">16A07CE99DF2164959F91F4D03B9AC22</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T21:25+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Smoking event detection</term>
					<term>convolutional neural network</term>
<term>spatio-temporal features</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The paper proposes a method for detecting smoking events in visual data. The method uses a three-dimensional convolutional neural network based on ResNet, which operates on spatio-temporal features extracted from video.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>According to the WHO Framework Convention on Tobacco Control <ref type="bibr" target="#b0">[1]</ref>, there is no safe level of tobacco smoke exposure. Creating a completely smoke-free environment is the only way to protect people from the harmful effects of breathing even second-hand smoke. Human action analysis based on visual processing is important for many applications, such as intelligent video surveillance and analysis of employee and customer behavior. Recognizing a driver's smoking can significantly increase road safety <ref type="bibr" target="#b1">[2]</ref>. Smoking activity can also be recognized from smartwatch sensors using a state-transition model that consists of the mini-gestures hand-to-lip, hand-on-lip, and hand-off-lip <ref type="bibr" target="#b2">[3]</ref>. Wu et al. <ref type="bibr" target="#b3">[4]</ref> proposed a color-based ratio histogram analysis that extracts visual clues from the appearance interactions between a lighted cigarette and its human holder; color re-projection and Gaussian mixture models enable cigarette segmentation and tracking over the background pixels. Smoke detection in the area around human faces and hands can also be applied to recognition of the smoking action <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b6">7]</ref>. Reliable smoke detection is difficult due to the great variability of shape, color, transparency, turbulence, non-stable motion, boundary roughness, and the time-varying flicker effect at the boundaries of smoke, as well as shooting artifacts such as low resolution, blurring, and poor weather conditions. The key problem of smoking behavior recognition is this irregularity: different ways of holding a cigarette, different types of tobacco products, and bad weather and shooting conditions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Smoking event detection method</head><p>This paper proposes a spatio-temporal feature based smoking activity detection algorithm, which recognizes human smoking activity regardless of the person's appearance, the way of holding a cigarette, the type of cigarette, the distance to the object of interest, and movement patterns.</p><p>SDM-2021: All-Russian conference, August 24-27, 2021, Novosibirsk, Russia anna4u@list.ru (A. V. Pyataeva)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Spatio-temporal features of smoking activity</head><p>Smoking activity belongs to a group of atomic actions that can be recognized only from a certain set of spatio-temporal features. Four atomic action groups are considered:</p><p>• arm position changes, with the sequence: the hand rises to the level of the lips, pauses, falls down, pauses, rises again;</p><p>• lip movement in close-up scenes;</p><p>• lighting a cigarette: ∘ tilting the head; ∘ using a cigarette lighter, which involves a sequence of actions: -bringing the lighter to the face with one hand; -the thumb of this hand starts the mechanism (the action may be repeated several times); -the other hand may shield the cigarette from going out and block the view of the previous actions (in this case, both hands are at the level of the lips); -the hands are lowered; ∘ lighting up with matches, which consists of the following actions: -the cigarette is clamped between the teeth; -both hands are at chest level or just below; -one hand makes a small wave (the action may be repeated); -one hand remains at chest level while the second moves higher, to the chin or lips; -a wave of the hand (to extinguish the match); -the hands are lowered;</p><p>• flicking the ash from the cigarette (the action may not be present in the frame), which consists of lowering the hand with the cigarette and a characteristic movement of the hand or its fingers.</p><p>Smoking activity recognition is implemented using a three-dimensional neural network based on the spatio-temporal features of the entire video sequence.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Image pre-processing</head><p>Visual information captured in real time may include objects with dynamic behavior, noise from the hardware or transmission lines, as well as artifacts caused by weather conditions (for example, rain or snow, or poor luminance in the morning or evening). Because of this, the quality of smoking action recognition significantly degrades. Therefore, scaling and mean subtraction <ref type="bibr" target="#b7">[8]</ref> are used to solve this problem. The preprocessing algorithms are implemented with the computer vision library OpenCV (Open Source Computer Vision Library) <ref type="bibr" target="#b8">[9]</ref>. Thus, the video sequence preprocessing is performed according to the expressions:</p><formula xml:id="formula_0">𝑅 = (𝑅 − 𝜇_𝑅)/𝜎, 𝐺 = (𝐺 − 𝜇_𝐺)/𝜎, 𝐵 = (𝐵 − 𝜇_𝐵)/𝜎,<label>(1)</label></formula><p>where 𝑅, 𝐺, 𝐵 are the values of the red, green, and blue channels of the image, respectively; 𝜇 = {𝜇_𝑅, 𝜇_𝐺, 𝜇_𝐵} is the average color intensity for each image channel; and 𝜎 is the scaling coefficient.</p><p>The 𝜎 value can be the standard deviation over the training set; however, 𝜎 can also be set manually to scale the input image space to a specific range.</p></div>
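The per-channel preprocessing in expression (1) can be sketched as follows; this is a minimal NumPy illustration, and the function name, the example mean values, and the sigma value are illustrative, not taken from the paper:

```python
import numpy as np

def preprocess_frame(frame_bgr, mean_rgb, sigma):
    """Per-channel mean subtraction and scaling, as in expression (1).

    frame_bgr : uint8 frame as read by OpenCV (BGR channel order)
    mean_rgb  : (mu_R, mu_G, mu_B), average intensities over the training set
    sigma     : scaling coefficient (e.g. the training-set standard deviation)
    """
    rgb = frame_bgr[..., ::-1].astype(np.float32)  # BGR -> RGB
    mean = np.asarray(mean_rgb, dtype=np.float32)
    return (rgb - mean) / sigma                    # (R - mu_R)/sigma, etc.

# Toy example: a uniform gray frame with assumed channel means and sigma
frame = np.full((112, 112, 3), 128, dtype=np.uint8)
out = preprocess_frame(frame, (110.0, 115.0, 120.0), 58.0)
```

In practice the same effect can be obtained with OpenCV's `cv2.dnn.blobFromImage`, which also performs mean subtraction and scaling.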
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Neural network architecture</head><p>AlexNet <ref type="bibr" target="#b9">[10]</ref>, VGG <ref type="bibr" target="#b10">[11]</ref> and ResNet <ref type="bibr" target="#b8">[9]</ref> neural networks are most often used to classify images and video sequences. The ResNet network is fully convolutional, so it is well suited for extracting a space-time volume, unlike many architectures with fully connected layers, such as AlexNet and VGG-16, which contain several max-pooling levels that can degrade the estimation of actions. ResNet contains only one pooling layer, immediately after the conv1 layer. The reduced number of pooling layers makes ResNet more suitable for visual recognition of smoking, since spatial details must be preserved to recognize this process.</p><p>In this work, the 34-layer ResNet was used, as it is computationally efficient in classification problems <ref type="bibr" target="#b11">[12]</ref>. To use ResNet for estimating multi-frame optical flow, the architecture must be extended by replacing all 𝑘×𝑘 two-dimensional convolutional kernels with 𝑘 × 𝑘 × 3 kernels that add a time dimension, as described in <ref type="bibr" target="#b12">[13]</ref>. The pooling layers in the decoder are expanded in a similar way. The neural network transformed in this way is called ResNetM in this paper; its composition is presented in Table <ref type="table" target="#tab_0">1</ref>.</p><p>In Table <ref type="table" target="#tab_0">1</ref> the residual blocks are grouped in square brackets. Batch normalization is applied after each convolutional layer. 
The main difference between this architecture and ResNet is the use of 3D kernels and a modified downsampling operation, whereby feature maps in a convolutional layer are combined with several adjacent frames in the previous layer, thereby capturing motion information.</p><p>The dimensions of the convolutional kernels are 3 × 3 × 3. The network uses 16-frame RGB clips as inputs; the dimensions of the input clips are 3 × 16 × 112 × 112. Downsampling of the inputs is performed periodically in steps of 2. Convolutional layers 2-5 consist of residual blocks of two 3 × 3 × 3 kernels, repeated 3, 4, 6, and 3 times with 64, 128, 256, and 512 feature maps, respectively.</p></div>
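A residual block of the kind listed in Table 1 can be sketched in PyTorch as follows. This is an illustrative reconstruction, not the authors' code; the class name and the projection shortcut follow the standard ResNet design, with the 2D kernels replaced by 3 × 3 × 3 ones:

```python
import torch
import torch.nn as nn

class BasicBlock3D(nn.Module):
    """Residual block of two 3x3x3 convolutions with batch normalization."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn1 = nn.BatchNorm3d(out_ch)   # batch norm after each conv layer
        self.conv2 = nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1,
                               bias=False)
        self.bn2 = nn.BatchNorm3d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # 1x1x1 projection so the skip connection matches when shapes change
        self.down = None
        if stride != 1 or in_ch != out_ch:
            self.down = nn.Sequential(
                nn.Conv3d(in_ch, out_ch, kernel_size=1, stride=stride,
                          bias=False),
                nn.BatchNorm3d(out_ch))

    def forward(self, x):
        identity = x if self.down is None else self.down(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)

# A 16-frame RGB clip: (batch, channels, frames, height, width)
clip = torch.randn(1, 3, 16, 112, 112)
feat = BasicBlock3D(3, 64, stride=2)(clip)  # downsampling in steps of 2
```

With stride 2 the block halves the temporal and spatial dimensions, so the clip above comes out with shape (1, 64, 8, 56, 56).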
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Smoking activity detection algorithm</head><p>The proposed method uses a deep neural network to detect smoking by recognizing actions that are characteristic of a person in the process of smoking. The block diagram of the smoking activity detection algorithm is shown in Figure <ref type="figure" target="#fig_0">1</ref>. Stochastic gradient descent (SGD) with momentum is used to train the neural network. Training samples are randomly generated from the videos in the training set: time positions are selected uniformly, and 32-frame clips are extracted around them. If a video is shorter than 32 frames, it is looped as many times as necessary to reach the set duration. The spatial positions are randomly selected from the four corners or the center, and the spatial scale of each sample is also selected for multi-scale cropping. The frame is cropped at these space-time positions. The size of each sample is 3 channels × 32 frames × 112 pixels × 112 pixels, and each sample is flipped horizontally with probability 1/2. The average of the dataset is also subtracted from each color channel of the sample. All created samples retain the class labels of their original videos. Training uses cross-entropy as the loss function, with a weight decay of 0.001 and a momentum of 0.9. The learning rate starts at 0.1 and is divided by 10 when the validation loss saturates. Fine-tuning is performed with a learning rate of 0.001 and a weight decay of 1e−5.</p><p>At the first stage, the neural network is initialized and its parameters are set, after which the video sequence is fed to the input. The classes used to label the data are initialized: "smoking" and "no smoking". 
The sample duration, that is, the number of frames for classification, is 32, and the spatial size of the sample is 112 × 112. To create input clips, a sliding window is used, in which only the oldest frame in the list is discarded, making room for the newest frame. Each video is then split into non-overlapping 32-frame clips. A loop reads frames from the video stream and checks whether a frame was captured. If a frame is captured, each clip is cropped around the center position at the maximum scale, mean subtraction is performed, and the new frame is added to the queue; otherwise the loop exits. A further check determines whether the queue is full, and at the end of this cycle a blob object is created. A "blob object", or "blob", is a collection of frames with the same spatial dimensions (width and height) and the same depth (number of channels) that must be preprocessed in the same way. The blob object has dimensions (3, 32, 112, 112): 3 is the number of channels in the input frames, 32 is the total number of frames in the blob, and the remaining numbers are the height and width, respectively.</p><p>Next, in order to extract the space-time characteristics, each instance is passed through the 3D convolutional neural network. Smoking is recognized from multi-frame optical flow: the optical flow is calculated at each point, and then a motion map is formed. Each feature map of a convolutional layer is associated with several consecutive adjacent frames in the upper layer. The next step is to estimate the probability of smoking in the clips. 
The network "scans" the sequence of thirty-two frames, generates motion paths, analyzes the similarity to a known smoking pattern, and finds the probability of smoking in each frame; the probabilities are then averaged over all clips. The class with the highest score indicates the action in the given video sequence. If the probability is greater than or equal to 0.5, smoking is recognized in these frames.</p></div>
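The clip-level classification described above can be sketched as follows. This is an illustrative helper, not the authors' implementation: `predict_clip` stands in for the 3D CNN forward pass, and the sketch uses the overlapping sliding window, one of the two clip-forming schemes the text mentions:

```python
from collections import deque
import numpy as np

def detect_smoking(frames, predict_clip, clip_len=32, threshold=0.5):
    """Slide a clip_len-frame window over the stream and average clip scores.

    frames       : iterable of preprocessed frames (H x W x 3 arrays)
    predict_clip : callable returning P(smoking) for a stacked clip
    """
    queue = deque(maxlen=clip_len)  # the oldest frame is dropped automatically
    scores = []
    for frame in frames:
        queue.append(frame)
        if len(queue) == clip_len:            # a full clip ("blob") is ready
            clip = np.stack(list(queue))      # (clip_len, H, W, 3)
            scores.append(predict_clip(clip))
    if not scores:
        return None                           # stream shorter than one clip
    mean_score = float(np.mean(scores))       # average over all clips
    return "smoking" if mean_score >= threshold else "no smoking"

# Toy run with a constant-score stand-in for the network
frames = [np.zeros((112, 112, 3), np.float32) for _ in range(40)]
label = detect_smoking(frames, predict_clip=lambda clip: 0.7)
```

Here the 0.5 threshold on the averaged probability decides between the "smoking" and "no smoking" labels, as in the text.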
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Experiments and results</head><p>For the video-based smoking detection model to work, the following configuration is required: an NVIDIA graphics card with at least 2 GB of memory, and installed CUDA and cuDNN. The model uses Anaconda and Python packages including OpenCV, matplotlib, and PyTorch. Experimental studies were carried out on a laptop with an Intel(R) Core(TM) i7-6700HQ processor (2.60 GHz clock), 8 GB RAM, the Windows 10 operating system, and an NVIDIA GeForce GTX 960M graphics processor with 2 GB of dedicated memory. The modified neural network was trained on 6766 videos from the HMDB51 dataset <ref type="bibr" target="#b13">[14]</ref>. The videos show actions that can be grouped into five groups:</p><p>(1) general facial actions: smile, laugh, chew, speak;</p><p>(2) actions with object manipulation: smoking, eating, drinking;</p><p>(3) general body movements: do a cartwheel, applaud, climb, climb stairs, dive, fall to the floor, put the hands back, do a handstand, jump, pull up, push up, run, sit down, climb down from something, do somersaults, get up, turn around, walk, make a wave;</p><p>(4) body movements with object interaction: comb hair, catch, draw a sword, dribble a ball, play golf, hit a ball, pick, pour, push something, ride a bicycle, ride a horse, shoot a ball, shoot a bow, shoot a gun, throw a ball;</p><p>(5) body movements for human interaction: fencing, hugging, kicking, kissing, punching, shaking hands, sword fighting.</p><p>Actions of categories (1)-(5) are combined into one class "no smoking" for the experimental research. 
For the experimental studies, 70 "smoking" videos were used, featuring people of different ages, body types, genders, and races, holding cigarettes of different shapes and types in different ways, together with 6766 "no smoking" videos. To ensure consistency, at least two observers reviewed each clip. The algorithm results are shown in Table <ref type="table" target="#tab_1">2</ref>.</p><p>Tables <ref type="table">3 and 4</ref> show frames of some of the video sequences used and the results of smoking recognition, marked with the labels "smoking" and "no smoking".</p><p>The test video data is supplemented with videos in which the action is visually similar to smoking; thanks to the spatio-temporal features of the neural network and the identified pattern of characteristic smoking movements, the method is able to distinguish these actions from smoking. In video 9 a girl eats a lollipop, in video 11 a girl bites a pen, and in video 15 a man eats ice cream. The training sample was 80% and the test sample 20% of the total sample. To evaluate the effectiveness of the human smoking detection and recognition algorithms, the detection accuracy (TR), false positive rate (FAR), and false negative rate (FRR) were used. The results of smoking detection for the ResNet architecture and the modified ResNetM network are shown in Table <ref type="table" target="#tab_2">5</ref>. Experimental studies conducted on 20 video sequences obtained in real-world shooting conditions confirm the efficiency of the proposed smoking recognition method. The ResNet architecture, modified into a three-dimensional neural network, takes the spatio-temporal signs of smoking into account and shows, on average, 15% higher accuracy in recognizing smoking actions compared to the basic architecture. 
The developed software implementation of the smoking recognition method provides real-time operation.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Smoking event detection algorithm.</figDesc><graphic coords="4,89.29,273.71,416.69,365.61" type="bitmap" /></figure>
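The quality indicators reported in Table 5 can be computed as below. The definitions are the conventional frame-level ones and are an assumption on my part, since the paper does not give explicit formulas:

```python
def detection_rates(predictions, ground_truth):
    """TR: detected smoking among true smoking frames; FAR: false alarms
    among non-smoking frames; FRR: missed smoking frames. All in percent."""
    pairs = list(zip(predictions, ground_truth))
    tp = sum(p == "smoking" and g == "smoking" for p, g in pairs)
    fp = sum(p == "smoking" and g == "no smoking" for p, g in pairs)
    fn = sum(p == "no smoking" and g == "smoking" for p, g in pairs)
    tn = sum(p == "no smoking" and g == "no smoking" for p, g in pairs)
    tr = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    far = 100.0 * fp / (fp + tn) if fp + tn else 0.0
    frr = 100.0 * fn / (tp + fn) if tp + fn else 0.0
    return tr, far, frr

# Toy labels: one hit, one false alarm, one miss, one correct rejection
tr, far, frr = detection_rates(
    ["smoking", "smoking", "no smoking", "no smoking"],
    ["smoking", "no smoking", "smoking", "no smoking"])
```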
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Architecture of the ResNetM neural network.</figDesc><table><row><cell>Layer name</cell><cell>Activation function</cell><cell>Core</cell><cell>Neuron count</cell></row><row><cell>Convolutional layer 1</cell><cell></cell><cell>7 × 7 × 7</cell><cell>64</cell></row><row><cell>Convolutional layer 2</cell><cell>ReLU</cell><cell>[3 × 3 × 3; 3 × 3 × 3] × 3</cell><cell>64</cell></row><row><cell>Convolutional layer 3</cell><cell>ReLU</cell><cell>[3 × 3 × 3; 3 × 3 × 3] × 4</cell><cell>128</cell></row><row><cell>Convolutional layer 4</cell><cell>ReLU</cell><cell>[3 × 3 × 3; 3 × 3 × 3] × 6</cell><cell>256</cell></row><row><cell>Convolutional layer 5</cell><cell>ReLU</cell><cell>[3 × 3 × 3; 3 × 3 × 3] × 3</cell><cell>512</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>The algorithm results.</figDesc><table><row><cell cols="5">Epoch Training loss Training accuracy Validation loss Validation accuracy</cell></row><row><cell>1</cell><cell>1.1552</cell><cell>0.4329</cell><cell>0.7308</cell><cell>0.6699</cell></row><row><cell>2</cell><cell>0.9412</cell><cell>0.5801</cell><cell>0.5987</cell><cell>0.7346</cell></row><row><cell>3</cell><cell>0.8054</cell><cell>0.6504</cell><cell>0.5181</cell><cell>0.7613</cell></row><row><cell>4</cell><cell>0.7215</cell><cell>0.6966</cell><cell>0.4497</cell><cell>0.7984</cell></row><row><cell>5</cell><cell>0.6253</cell><cell>0.7572</cell><cell>0.4530</cell><cell>0.7984</cell></row><row><cell></cell><cell></cell><cell>. . .</cell><cell></cell><cell></cell></row><row><cell>46</cell><cell>0.2325</cell><cell>0.9167</cell><cell>0.2024</cell><cell>0.9198</cell></row><row><cell>47</cell><cell>0.2284</cell><cell>0.9212</cell><cell>0.2058</cell><cell>0.9280</cell></row><row><cell>48</cell><cell>0.2261</cell><cell>0.9212</cell><cell>0.2448</cell><cell>0.9095</cell></row><row><cell>49</cell><cell>0.2170</cell><cell>0.9153</cell><cell>0.2259</cell><cell>0.9280</cell></row><row><cell>50</cell><cell>0.2109</cell><cell>0.9118</cell><cell>0.2267</cell><cell>0.9125</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 5</head><label>5</label><figDesc>Experimental results.</figDesc><table><row><cell>Video</cell><cell cols="3">ResNet: TR, % / FAR, % / FRR, %</cell><cell cols="3">ResNetM: TR, % / FAR, % / FRR, %</cell></row><row><cell>Video 1</cell><cell>80.2</cell><cell>19.8</cell><cell>32.1</cell><cell>87.8</cell><cell>12.0</cell><cell>12.2</cell></row><row><cell>Video 3</cell><cell>81.5</cell><cell>18.5</cell><cell>15.6</cell><cell>90.7</cell><cell>9.20</cell><cell>9.32</cell></row><row><cell>Video 5</cell><cell>78.0</cell><cell>22.0</cell><cell>31.2</cell><cell>88.8</cell><cell>11.1</cell><cell>11.2</cell></row><row><cell>Video 7</cell><cell>86.1</cell><cell>13.9</cell><cell>12.9</cell><cell>97.4</cell><cell>2.41</cell><cell>2.59</cell></row><row><cell>Video 8</cell><cell>80.9</cell><cell>19.1</cell><cell>17.9</cell><cell>92.4</cell><cell>7.52</cell><cell>7.57</cell></row><row><cell>Video 10</cell><cell>78.7</cell><cell>21.3</cell><cell>20.1</cell><cell>85.9</cell><cell>14.5</cell><cell>14.1</cell></row><row><cell>Video 12</cell><cell>90.0</cell><cell>10.0</cell><cell>11.1</cell><cell>98.2</cell><cell>1.74</cell><cell>1.81</cell></row><row><cell>Video 14</cell><cell>96.0</cell><cell>4.00</cell><cell>15.0</cell><cell>100.0</cell><cell>0.0</cell><cell>0.0</cell></row><row><cell>Video 16</cell><cell>84.5</cell><cell>15.5</cell><cell>16.1</cell><cell>95.4</cell><cell>4.51</cell><cell>4.61</cell></row><row><cell>Video 18</cell><cell>81.2</cell><cell>18.8</cell><cell>14.9</cell><cell>92.5</cell><cell>7.42</cell><cell>7.49</cell></row><row><cell>Video 20</cell><cell>80.9</cell><cell>19.1</cell><cell>25.7</cell><cell>84.3</cell><cell>1.53</cell><cell>15.7</cell></row></table></figure>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 3</head><p>Description and results of some used videos.</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<ptr target="https://www.who.int/fctc/text_download/en" />
		<title level="m">WHO framework convention on tobacco control</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Deep learning based driver smoking behavior detection for driving safety</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">C</forename><surname>Chien</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">C</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">P</forename><surname>Fan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Image and Graphics</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="15" to="20" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">State transition modeling of the smoking behavior using LSTM recurrent neural networks</title>
		<author>
			<persName><forename type="first">C</forename><surname>Odhiambo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">A</forename><surname>Cole</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Torkjazi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Valafar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Computational Science and Computational Intelligence (CSCI)</title>
				<imprint>
			<date type="published" when="2019">2019. 2019</date>
			<biblScope unit="page" from="898" to="904" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
<title level="a" type="main">Human smoking event detection using visual interaction clues</title>
		<author>
			<persName><forename type="first">P</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hsieh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tseng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">20th International Conference on Pattern Recognition</title>
				<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="4344" to="4347" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Smoking detection in video footage</title>
		<author>
			<persName><forename type="first">É</forename><surname>Dunne</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">A Dissertation Submitted in Partial Fulfilment of the Requirements for the Degree of MAI</title>
				<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page">43</biblScope>
		</imprint>
		<respStmt>
			<orgName>Computer Engineering ; Submitted to the University of Dublin, Trinity College</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
<title level="a" type="main">Cigarette smoke detection from captured image sequences</title>
		<author>
			<persName><forename type="first">K</forename><surname>Iwamoto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Inoue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Matsubara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Tanaka</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Image Processing: Machine Vision Applications III</title>
		<imprint>
			<publisher>International Society for Optics and Photonics</publisher>
			<biblScope unit="volume">7538</biblScope>
			<biblScope unit="page" from="82" to="87" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Cigarette smoke detection using feature values based on the kernel LMS algorithm</title>
		<author>
			<persName><forename type="first">K</forename><surname>Iwamoto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Inoue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Matsubara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Tanaka</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEICE Technical Report</title>
		<imprint>
			<biblScope unit="volume">109</biblScope>
			<biblScope unit="issue">434</biblScope>
			<biblScope unit="page" from="237" to="248" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
	<note>Circuits and Systems</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Real time detection of speed hump/bump and distance estimation with deep learning using GPU and ZED stereo camera</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">S</forename><surname>Varma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">P</forename><surname>Sasidharan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">I</forename><surname>Ramachandran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Nair</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Procedia Computer Science</title>
		<imprint>
			<biblScope unit="volume">143</biblScope>
			<biblScope unit="page" from="988" to="997" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<ptr target="https://opencv.org" />
		<title level="m">OpenCV (Open source computer vision library</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">ImageNet classification with deep convolutional neural networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Krizhevsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">E</forename><surname>Hinton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="page" from="1097" to="1105" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Very deep convolutional networks for large-scale image recognition</title>
		<author>
			<persName><forename type="first">K</forename><surname>Simonyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zisserman</surname></persName>
		</author>
<idno type="arXiv">arXiv:1409.1556</idno>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">3D convolutional neural networks for human action recognition</title>
<author>
			<persName><forename type="first">S</forename><surname>Ji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Yu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Pattern Analysis &amp; Machine Intelligence</title>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="221" to="231" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
<analytic>
		<title level="a" type="main">Recognition of human continuous action with 3D CNN</title>
		<author>
			<persName><forename type="first">G</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Computer Vision Systems</title>
		<imprint>
			<date type="published" when="2017">2017</date>
			<publisher>Springer</publisher>
			<biblScope unit="page" from="314" to="322" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
<title level="m" type="main">HMDB: a large human motion database</title>
		<ptr target="https://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database" />
		<imprint/>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
