=Paper=
{{Paper
|id=Vol-2476/short4
|storemode=property
|title=COSMIC Sizing of Machine Learning Image Classifier Software Using Neural Networks
|pdfUrl=https://ceur-ws.org/Vol-2476/short4.pdf
|volume=Vol-2476
|authors=Arlan Lesterhuis,Alain Abran
|dblpUrl=https://dblp.org/rec/conf/iwsm/LesterhuisA19
}}
==COSMIC Sizing of Machine Learning Image Classifier Software Using Neural Networks==
Arlan Lesterhuis¹ and Alain Abran²

¹ COSMIC Measurement Practices Committee
² École de Technologie Supérieure – ETS, University of Québec, Canada
lesterhuisa@kpnplanet.nl, alain.abran@etsmtl.ca

Abstract. Development of machine learning software has now penetrated a large diversity of domains, in both academia and industry. From the initial realm of research with a focus on innovation and creativity, its scaling up in industry requires improved planning, monitoring and control of the development and implementation process. Such industry planning and monitoring is difficult without relevant measurement techniques adapted to the problem at hand. This paper illustrates how generic software functions can be extracted from machine learning (ML) system requirements and their functional size measured in COSMIC function points – ISO 19761. An application of these concepts is presented using an example of an ML image classifier software with a feedforward neural network.

Keywords: Machine learning, neural networks, COSMIC, function points, ISO 19761

1 Introduction

The development of machine learning (ML) software has now penetrated a large diversity of domains, both in academia and industry. From the initial realm of research with a focus on innovation and creativity, the scaling up of ML software in industry requires improved planning, monitoring and control of its development and implementation process. Such industry planning and monitoring is difficult without relevant measurement tools adapted to the problem at hand.
While the ML literature contains a large body of knowledge on the mathematical aspects of the variety of ML algorithms and on the analysis of their performance with various datasets, there is very little in the literature about the software itself that needs to be developed in order to implement the ML algorithms in specific industry contexts.

Generally, the literature describes the ‘system viewpoint’ of ML, bundling together all the tasks carried out by ML researchers, including the design of the ML system, its coding, its operation and the data analysis. Considering that ML expertise is very specialized, in very high demand and in short supply, it would be valuable to segregate the ML-specific tasks from the software development tasks for which expertise is more widely available and less specialized.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

These coding tasks could then be delegated to staff with programming expertise, freeing up the ML software developers and data analysts to address additional ML challenges. Of course, the prerequisite for such delegation is that software tasks, such as the specification of the functions allocated to software, are untangled from the ML-specific analytical tasks and well described prior to delegation. This means that generic software functions must be segregated from ML-specific functionalities. Success in this endeavor may allow parallelism of tasks in ML projects, with the possibility of shortening their development cycle. While it is expected that every software product will have unique aspects making it distinct from any other, there are also functionalities that are common and generic throughout.
This genericity is at the basis of software functional size measurement methods such as COSMIC Function Points – ISO 19761. In this paper, the functional principles underlying the COSMIC measurement method are used to address two objectives: the segregation of the generic (classical) functionality to be allocated to software and not specific to ML, and the use of the international standard to size the generic functionality identified. Success in both will facilitate delegating tasks to data analysts, as well as collecting data in a standardized fashion in order to develop estimation models for planning purposes and for the on-going monitoring of the software tasks within an ML development project.

Section 2 presents overviews of machine learning and of COSMIC function points. Section 3 presents the system view of the ML image classifier case study used in this paper and its generic functions allocated to software, followed by their measurement with COSMIC function points. Section 4 presents a summary and suggestions for future work.

2 Related work

2.1 Machine learning and neural networks

A neural network is an ML application that can ‘learn’ to classify input data with the help of training examples of that input data. The example selected here from [1] is a neural network to be trained to classify handwritten digits. The neural network can learn to assign to a handwritten image its digit with an accuracy of over 97%, depending on the network. During training, each training example is input together with its desired value, the latter being used to determine the error between the desired and actual output. Through this training, a cost (‘error’) function C of the neural network quantifies the average error over all individual training examples in a mini-batch.
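This mini-batch averaging can be sketched in a few lines of Python. This is a hedged illustration, not code from the paper: the function name and the toy output vectors are assumptions, while the formula matches the quadratic cost C(w, b) given next.

```python
def quadratic_cost(desired, actual):
    """Average the errors of all training examples in a mini-batch:
    C = (1 / 2n) * sum over examples x of (y(x) - a(x))^2,
    where y(x) is the desired output vector and a(x) the actual one."""
    n = len(desired)
    total = 0.0
    for y, a in zip(desired, actual):
        total += sum((yi - ai) ** 2 for yi, ai in zip(y, a))
    return total / (2 * n)

# Toy mini-batch of two examples with 3-unit output vectors
# (illustrative values, not taken from the MNIST case study).
desired = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
actual = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]]
print(quadratic_cost(desired, actual))  # close to 0.05
```

Training then repeatedly adjusts the weights and biases to reduce this averaged error, which is exactly the learning process described in the text.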
An example of such a function is

C(w, b) = (1/2n) · Σₓ (y(x) − a(x))²,

where n indicates the number of training examples in a mini-batch, x is a training example, y(x) its desired output, a(x) its actual output, and w and b represent the weights and biases in the hidden layers.

In the backpropagation algorithm the ‘overall error’ C(w, b) is reduced by systematically and repeatedly adapting the weights and biases so that the output a(x) of the network approximates y(x) for all training inputs x: the neural network ‘learns’ [1, 2].

Learning takes place on the basis of three sub-sets: the training set, used for training the network; the test set, used for testing the result of training; and the validation set, used to determine the values of the three hyperparameters: the learning rate (indicated by η), the size of the mini-batches to be used and the number of epochs of training.

The purpose of the backpropagation algorithm is to find a global minimum of the function C, or at least a minimum for which the error is acceptable for the purpose of the application. In practice it is useful to experiment with the size of the changes to the weights and biases supplied by the backpropagation algorithm. The size of the changes is then multiplied by a positive factor η (eta), the learning rate. To prevent overfitting, a regularization parameter (indicated by λ) is sometimes added.

2.2 Software functional size with COSMIC function points

Function points quantify the functional requirements of software and are used for various purposes in software project management, including effort estimation, project planning, project monitoring, productivity studies and benchmarking [3-5]. In the COSMIC functional size measurement method [3, 4] there are four types of data movements:
- Entries and Exits each move a data group into or out of the software, from/to its functional users.
- Reads and Writes each move a data group from/to persistent storage.
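The way these movement types drive measurement can be sketched as follows. This is a hypothetical helper, not part of the COSMIC standard: it relies on the rule, defined in the next paragraph, that each data movement of one data group counts as one unit of size.

```python
# Hypothetical sketch: a functional process represented as its list of
# COSMIC data movements; each movement of one data group counts as 1 CFP.
MOVEMENT_TYPES = {"Entry", "Exit", "Read", "Write"}

def size_in_cfp(movements):
    """Size of one functional process = its number of data movements."""
    assert all(kind in MOVEMENT_TYPES for kind, _ in movements)
    return len(movements)

# Example: a process that receives an instruction, reads stored images,
# writes a new image and reports an error/confirmation message.
process = [("Entry", "Expansion instruction"),
           ("Read", "Image data"),
           ("Write", "Training image"),
           ("Exit", "Error/confirmation message")]
print(size_in_cfp(process))  # -> 4
```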
A data group is a set of attributes of interest to a functional user of the software being measured, i.e. a ‘thing’ in the real world of the functional users about which the software must enter, store, or output data. The unit of measurement of the COSMIC FSM method is one data movement of one data group, referred to as one COSMIC function point (CFP).

3 Case study: an ML image classifier of manuscript digits

3.1 System view: ML image classifier functions

Given a file of images of separate individual handwritten digits, each image must be classified, i.e. assigned its correct digit. A file of randomly selected training images is available, each image showing the handwritten digit and the corresponding digit it represents. An image of a digit consists of 28x28 pixels. The pixels are greyscale, with a value of 0.0 representing white, a value of 1.0 representing black, and in-between values representing shades of grey.

On the basis of functional re-use of a pre-programmed feedforward neural network algorithm, including its cost function, a neural network must be developed that can learn to classify the images of the file. To initialize the network, it is therefore assumed that specifying the number of layers and the numbers of weights and biases per layer suffices. Also, to initialize the values of the weights and biases, it suffices to specify the mean and standard deviation.

Learning takes place on the basis of three sub-sets of the training images, called the training set, the test set and the validation set. The training set is used for training, the test set for testing the result of training. The validation set is used to determine the values of the hyperparameters: the learning rate (indicated by η), the size of the mini-batches to be used and the number of epochs of training. The size of the mini-batches to be applied must be determined. All images are stored with their sub-set name (‘training’, ‘test’, or ‘validation’) for re-use.
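The division into the three tagged sub-sets can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the dictionary layout, and the use of Python's `random.shuffle` are illustrative choices, not from the paper.

```python
import random

def divide_into_subsets(images, n_train, n_test, n_validation, seed=0):
    """Randomly assign each image to the 'training', 'test' or
    'validation' sub-set and tag it with its sub-set name for re-use."""
    assert n_train + n_test + n_validation == len(images)
    rng = random.Random(seed)
    shuffled = list(images)
    rng.shuffle(shuffled)
    tagged = []
    for name, count in (("training", n_train), ("test", n_test),
                        ("validation", n_validation)):
        for image in shuffled[:count]:
            tagged.append({"image": image, "sub-set": name})
        shuffled = shuffled[count:]
    return tagged

# 100 toy 'images', split 70/20/10 (illustrative proportions).
tagged = divide_into_subsets(list(range(100)), 70, 20, 10)
print(sum(1 for t in tagged if t["sub-set"] == "test"))  # -> 20
```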
The learning performance is monitored by printing and displaying the classification accuracy and the cost (error) per epoch of training. To detect overtraining, two graphs are required: one displaying the accuracy per epoch, the other the cost per epoch. The training set is enlarged by adding one elastically distorted copy of each image in the training set.

For acceptance by the client, the classification accuracy is required to be not less than 98%, verified on the basis of the test set. It must be possible to tune the network, i.e. to investigate the accuracy and speed by varying the main parameters of the network structure (the number of its layers, the number of units per layer and the hyperparameters to be applied).

3.2 Functional view of the requirements allocated to generic software

The functional users of the generic software to be measured from the system requirements in section 3.1 are (Fig. 1):
- the data analysts of the neural network, i.e. those who tune the network so as to meet the accuracy requirement;
- the reused neural network algorithm (the feedforward algorithm in Fig. 1): this reused software does not have to be measured in this example; it is considered a functional user rather than a software component to be measured.

Fig. 1. Context diagram of the generic software

For each image of separate individual handwritten digits, the image classifier software (the generic software plus the algorithm software) assigns and records its correct digit.

The context for this case study explicitly states that there is functional reuse of a pre-programmed feedforward neural network algorithm, including its cost function. Consequently, to create the feedforward neural network the data analyst needs only specify a sequence of numbers in which:
- its length indicates the number of layers,
- each number indicates the number of units in its layer, and
- each unit in the hidden layers has one weight per input and one bias.
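Under the stated assumption that a layer-size sequence plus a mean and standard deviation suffice, the initialization can be sketched as follows; the function name and the use of Python's `random.gauss` are illustrative choices, not from the paper.

```python
import random

def init_network(layer_sizes, mean=0.0, std=1.0, seed=0):
    """Initialize a feedforward network from a sequence of numbers: the
    length of the sequence is the number of layers, each number the units
    in that layer. Each unit after the input layer gets one weight per
    input and one bias, drawn from a Gaussian with the given mean/std."""
    rng = random.Random(seed)
    weights = [[[rng.gauss(mean, std) for _ in range(n_in)]
                for _ in range(n_out)]
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
    biases = [[rng.gauss(mean, std) for _ in range(n_out)]
              for n_out in layer_sizes[1:]]
    return weights, biases

# 784 input pixels (28x28), one hidden layer of 30 units, 10 output digits.
w, b = init_network([784, 30, 10])
print(len(w), len(w[0]), len(w[0][0]))  # -> 2 30 784
```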
The feedforward algorithm software receives the training parameters, including the hyperparameters, and then:
1. assigns random values to the weights and biases on the basis of the mean and standard deviation;
2. groups all training images randomly into mini-batches, each consisting of a fixed number of training images;
3. forward-propagates the training images of a mini-batch and determines the average cost (deviation, error) between the actual and desired output values of the training images in the mini-batch;
4. backpropagates the changes, which the algorithm determines on the basis of the average error of the mini-batch just processed, to all weights and biases backwards through the layers of the network and stores the (updated) values;
5. processes all mini-batches of training examples (finishing an epoch of training);
6. repeats steps 2) to 5) for the specified number of training epochs.

It is assumed that the feedforward algorithm software stores the training session parameters so that it is possible to train anew with one or more changed parameters, the other parameters remaining the same. It also stores all the data needed to meet the requirements of the generic software.

Requirement 0 - Pre-processing. Images must be pre-processed. Note: since pre-processing is specific to each context (and not described or specified in the above system requirements), its measurement is not included in this case study.

Requirement 1 - Initialization of the feedforward network architecture. The network architecture is initialized by the parameters of the neural network architecture specified by the data analyst.

Requirement 2 - Preparing training.
1. The software enlarges the training set by adding one distorted copy of each image to the training set. The software receives the required expansion instruction from the data analyst. In the absence of details on the ‘expansion instruction’, an assumption is made here that it will consist of a single data group.
If in other cases there is more in the ‘expansion instructions’, including more than one data group, this could then lead to additional data movements, since there would be more data groups.
2. The software receives from the data analyst the number of images for the three sets of training, test and validation images, the members of which are randomly chosen; all images must be stored with their sub-set name.
3. Determining the learning rate η. The software receives the training parameters to produce the graph ‘Cost per epoch’ (with the validation images). By repeating training with different learning rates, the data analyst can determine a suitable value of the learning rate η with the help of these graphs by comparing the rates of decrease of the cost (i.e. error) – Fig. 2.

Fig. 2. Cost per epoch

4. Determining a suitable number of epochs. For each execution of the training step (with the validation images), the software must print the graph of Fig. 3. The data analyst determines a suitable number of epochs with the help of the graph ‘Accuracy per epoch’ by training with the validation images and differing numbers of epochs. The analyst selects the smallest number of epochs with which the required accuracy can be reached – Fig. 3.
5. Determining the mini-batch size. The data analyst determines the mini-batch size from the graph in Fig. 4. The software receives from the data analyst the following inputs to produce the graph of Fig. 4: a number of epochs, a plotting period in seconds, and a number of intended mini-batch sizes.

For execution of the software in the training step (with the validation images), the software must:
- read the classification accuracy per mini-batch size, and
- plot the four graphs of the classification accuracy of the mini-batch sizes versus time, one by one, in one continuous run, i.e. without interruption or stopping.

Fig. 3. Accuracy per epoch

Fig. 4. Speed per mini-batch size

Requirement 3 - Training.
The software receives from the data analyst the training session parameters (mean, standard deviation, number of epochs, number of images per mini-batch, learning rate (η), regularization parameter (λ) and the sub-set name). During training, the number ‘classification accuracy per epoch’ and the corresponding elapsed training time are printed to monitor the learning performance.

3.3 COSMIC size of the generic software

The measurement of the above functional processes using COSMIC function points, and the software data movements within each functional process, are presented in Table 1, together with their FP Id, sizes in CFP, data movements (DM) and data groups.

Functional process 1 - FP1: Initialize the feedforward network architecture. The software receives from the data analyst the parameters to create the feedforward architecture: a sequence of numbers, the length of which indicates the number of layers and each number of which indicates the number of units in the layer, and where each unit in the hidden layers has one weight per input and one bias.

Functional process 2 - FP2: Expand the images. The data analyst inputs the required expansion instruction, then this functional process copies each image, distorts it and adds the result to the training set.

Functional process 3 - FP3: Divide images into 3 sub-sets. The data analyst inputs the number of images within each set, then this functional process divides the images into the three sub-sets of training, test and validation images, adds the sub-set name attribute value to each image and stores the image data.

Functional process 4 - FP4: Display cost per epoch (Fig. 2). The data analyst requests to display the graph of the cost (‘error’) of (the last mini-batch of) each epoch.

Functional process 5 - FP5: Display classification accuracy per epoch (Fig. 3). The data analyst requests to display the graph of the classification accuracy of the images.

Functional process 6 - FP6: Determine mini-batch size (Fig. 4).
The data analyst inputs a number of epochs and a number of mini-batch sizes to be examined, and determines the desired mini-batch size visually on the basis of the graph. The software:
─ executes the neural network algorithm with the validation data,
─ graphically plots the last known epoch accuracy at each point of time.

Functional process 7 - FP7: Train the network. The data analyst inputs the training session parameters (mean, standard deviation, number of epochs, mini-batch size, learning rate η, regularization parameter λ) to train the network. For monitoring the training, the epoch ID, the number of correctly classified images, the total number of images and the elapsed time must be printed.

The total functional size of the generic software is the sum of the sizes of its functional processes FP1 to FP7, that is:

4 CFP + 4 CFP + 4 CFP + 6 CFP + 6 CFP + 10 CFP + 5 CFP = 39 CFP

Table 1. Functional sizes of FP1 to FP7 in CFP

FP1 (size = 4 CFP):
- Entry: Number of units, from the data analyst
- Exit: Number of units, to the feedforward algorithm
- Entry: Result of initialization, from the feedforward algorithm
- Exit: Error/confirmation message, to the data analyst

FP2 (size = 4 CFP):
- Entry: Expansion instruction
- Read: Image data
- Write: Training image
- Exit: Error/confirmation message

FP3 (size = 4 CFP):
- Entry: Sub-set of images (sub-set name, number of images)
- Read: Image data
- Write: Image data with sub-set name
- Exit: Error/confirmation message

FP4 (size = 6 CFP):
- Entry: Epoch ID range
- Read: Epoch ID, epoch cost stored by the feedforward algorithm
- Exit: Epoch ID (x-axis, multiples of 50)
- Exit: Cost (y-axis, multiples of 0.001)
- Exit: Epoch cost
- Exit: Error/confirmation message

FP5 (size = 6 CFP):
- Entry: Epoch ID range
- Read: Epoch ID, epoch accuracy stored by the feedforward algorithm
- Exit: Epoch ID (x-axis, multiples of 10)
- Exit: Epoch accuracy (multiples of 0.5%)
- Exit: Epoch ID, epoch accuracy
- Exit: Error/confirmation message

FP6 (size = 10 CFP):
- Entry: Training session parameters (without the number of images per mini-batch)
- Exit: Training session parameters (without the number of images per mini-batch), to the feedforward algorithm
- Entry: Mini-batch sizes to be compared
- Exit: Mini-batch sizes to be compared, to the feedforward algorithm
- Read: Elapsed time, epoch accuracy per mini-batch size stored by the feedforward algorithm
- Exit: Elapsed time (x-axis, in seconds)
- Exit: Epoch accuracy (y-axis, multiples of 20%)
- Exit: Mini-batch denotation from the Entry above
- Exit: Epoch accuracy per mini-batch size at each point of time
- Exit: Error/confirmation message

FP7 (size = 5 CFP):
- Entry: Training session parameters (mean, standard deviation, number of epochs, number of images per mini-batch, learning rate (η), regularization parameter (λ), sub-set name)
- Exit: Training session parameters, sub-set name, to the feedforward algorithm
- Entry: Epoch ID, number of correctly classified images, total number of images, elapsed time, from the feedforward algorithm
- Exit: Print epoch ID, number of correctly classified images, total number of images, elapsed time
- Exit: Error/confirmation message

4 Summary and future work

In this paper an ML image classifier case study was used to illustrate how to move from a ‘system viewpoint’ of ML functionalities to the description of the generic software development tasks for which expertise is more widely available and less specialized. Since these coding tasks can then be delegated to staff with programming expertise, the ML experts can be freed up, allowing them more freedom to address additional ML challenges. The case study illustrates how the generic software functionality has been segregated from the ML-specific functionalities. In this paper, the COSMIC function points technique was used to carry out the segregation. While it is recognized that every software product will have unique aspects making it distinct from any other, there are functionalities that are common and generic throughout.
This genericity is at the basis of software functional size measurement methods such as COSMIC function points – ISO 19761, and has been used to address two objectives: segregation of the generic functionality to be allocated to software and not specific to ML, and use of the international standard to size the generic functionality identified.

Success in both facilitates delegating tasks to ML data analysts and software developers, as well as collecting data in a standardized fashion to develop estimation models for planning purposes, and for the on-going monitoring of the generic software tasks within an ML development project. It is to be noted that success in this endeavor will allow parallelism of tasks in ML projects, with the possibility of shortening their development cycle.

Future work will include additional steps to verify the breadth and depth of the generic software functions described in the set of ML requirements used, the identification of ambiguities, and updates with corresponding size adjustments. Further validation will include verification with actual ML software already developed by industry. Additional empirical research work is also required to consolidate the insights developed in the research reported here. In particular, additional case studies from other domains may provide additional types and sources of generic functionality that could then be considered for scaling purposes.

References

1. Nielsen, M. A.: Neural Networks and Deep Learning, Determination Press, available at www.neuralnetworksanddeeplearning.com (2015).
2. Graupe, D.: Deep Learning Neural Networks, World Scientific (2016).
3. COSMIC Group: The COSMIC Functional Size Measurement Method – Measurement Manual, version 4.0.2, available at https://cosmic-sizing.org/publications/measurement-manual-v4-0-2/ (2017).
4. Abran, A. and Dumke, R. (Eds.): COSMIC Function Points: Theory and Advanced Practices, CRC Press, ISBN 978-1-4398-4486-1 (2011).
5.
Abran, A.: Software Metrics and Software Metrology, John Wiley & Sons and IEEE-CS Press, New Jersey, p. 328 (2010).