Scoring System Based on Neural Networks for Identification of Factors in Image Perception

Inna Khortiuk[0000-0001-7525-8362], Galyna Kondratenko[0000-0002-8446-5096], Ievgen Sidenko[0000-0001-6496-2469], Yuriy Kondratenko[0000-0001-7736-883X]

Intelligent Information Systems Department, Petro Mohyla Black Sea National University, 68th Desantnykiv Str., 10, Mykolaiv, 54003, Ukraine
innamysnyk@gmail.com, halyna.kondratenko@chmnu.edu.ua, ievgen.sidenko@chmnu.edu.ua, yuriy.kondratenko@chmnu.edu.ua

Abstract. In this paper, the authors analyze image memorability prediction using memorability experiments (tests) and neural networks. With the development of modern technologies for processing and presenting data, images make it easy to visualize results, materials, design features, etc. Image memorability prediction (image perception) has high scientific and practical importance and can be used in various fields, in particular business, education, and design. Recently, deep convolutional neural networks (CNNs) have been attracting considerable attention in image restoration. The factors that influence image memorability and the technologies and approaches for data processing are analyzed. The paper shows the effectiveness of applying the considered technologies and approaches to the experimental data of the public LaMem set when calculating a quantitative assessment of image memorability. Photographers, designers, and advertisers will benefit directly from the results of this study. The peculiarities of implementing the "Deep Learning" technology are discussed in detail. The results confirm the efficiency of the considered approach for image memorability recognition. In the case studies, different neural network architectures were compared and a scoring system for image memorability was developed. To achieve this target, the authors use PHP as a programming language.
Keywords: image memorability, image perception dataset, CNN, deep learning, neural networks.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

People remember thousands of images and an amazing number of their details [1]. While some pictures are remembered, others are overlooked or rapidly forgotten. Designers, promoters, clip makers, and photographers are regularly confronted with the question "what makes an image memorable?". How, then, can an image be created that is memorable to the viewer? While experts investigated the ability of a person to memorize visual stimuli [2], in [3], 2222 photographs were memorized, and the concept of image memorability was introduced as the rate at which viewers detect a repeated presentation of an image some time after its first demonstration on the screen. The memorability of these images has been found to be the same for different subjects in different contexts, making some of the images more memorable than others. Thus, while the memorability of an image might be considered an indicator that is difficult to calculate, recent publications show that this is not the case and that there is a clear pattern. The memorability of an image can potentially be changed by changing specific properties or components presented in the image. First of all, it is necessary to understand the parameters (factors) that influence image memorability. The paper [4] shows that image memorability cannot be determined by the usual properties used for image descriptions, such as "weirdness", "aesthetic enjoyment" or "fun", and, moreover, that people are not good at predicting the memorability of an image. At the same time, computer vision technologies have proven to be quite effective for solving this problem. The paper [5] describes appropriate technologies for determining the contribution of certain areas of an image to its memorability.
It is possible to change the level of image memorability by changing the areas with low or high memorability. This can have far-reaching results in a variety of areas, from advertising and games (for example, determining which commercials or company logos are more recognizable and memorable) to education (for example, changing an image without losing its primary meaning) and social networks (for example, determining the best scenes for perception and memorization in order to integrate interesting objects into them in the future). There is another task related to visual perception: checking how well people remember images by showing them an image that they have never seen before. The question is how much an image must be changed for a person to believe that he or she has never seen it before. The paper [6] provides an overview of recent work on image memorability showing that memorability is a measurable and inherent image property. Next, it is necessary to define in more detail the elements that influence image memorability. The development and globalization of the market contribute to increased competition in it. That is why many companies have to expand their sphere of influence by expanding the sales area of their goods or services. Along with this, the delivery process is becoming more expensive.

2 Related Works and Problem Statement

Recent studies on image memorability [7] reported a high degree of consistency in the factors due to which participants in the experiment remember and forget images. This may indicate that memorability is very closely related to the images themselves, despite the personal differences between the participants who observe them. Modeling of this high degree of consistency was carried out for images of more than 100 different groups [8], and afterwards it was also demonstrated on more specialized groups of images, in particular people's faces [9] and visual scenes [10].
In these works, it was demonstrated that this consistency is not a peculiar property of particular factors and that it cannot be explained only by differences between groups of images (for example, indoor images being more memorable and outdoor images less so). It has been shown that high consistency is maintained in 21 different categories of indoor and outdoor scenes, each of which consists of hundreds of images. Previous studies have shown that objects are better remembered when they stand out from the main scene or background [10]. For example, in [5, 11], the authors demonstrated results on long-term memory for images depicting strange things. Konkle et al. have shown that categories of objects with conceptually distinctive samples showed less memory interference as the number of samples increased [8]. In addition, for specific groups of images, in particular people's faces, the results showed that a distinctive or unusual (non-trivial) face is remembered better than a typical one [12, 13]. Borkin et al. found that unique categories of visualizations have much higher memorability than regular graphics, and that new and unique visualizations are better remembered [3, 14, 15].

3 Methods for Solving the Problem of Image Recognition

Image recognition tasks are characterized by a number of difficulties. These are external factors, which arise because images of the same object can vary greatly under different conditions and depend on the point of observation, lighting, scale, overlap, and background complexity, and internal factors, namely variability within a class and deformation (various forms of an object) [16, 17]. Depending on these factors, different groups of methods are used to solve the problem of recognizing objects in an image.
The color/brightness filter method can be used when the object differs substantially from the background and the illumination is uniform and does not change. After separating such objects from the background, geometric analysis can be used to make classification decisions [18]. If the object does not stand out significantly from the background or has a complex color, filtering by color and brightness characteristics will not give good results. In this case, contour selection and analysis is more efficient. To do this, the border is highlighted in the image, that is, the place of a sharp change in brightness (a large brightness gradient), for example, using the Canny method. The highlighted border is then checked against the geometric contours of the object; this can be done using the Hough transform to search for circles, ellipses, and lines, or by applying contour analysis methods [2]. If the image has many small details, contour analysis can become complicated. In this case, the template matching (pattern mapping) method can be applied. This method is used to find the areas of an image that are most similar to a given template, that is, a specific object of one class. Template matching looks for an exact point-by-point correspondence between the template and the image [4]. However, if the image is rotated or resized with respect to the template, this method does not work well. Methods based on key points are used to overcome these limitations. A key point is a small area that stands out significantly in an image. There are several methods for determining such points, for example the Harris corner detector or blob detection, where a blob is a small area of equal brightness with a fairly clear border that stands out against the general background. For each key point a descriptor is calculated, that is, a characteristic of the key point. The descriptor is calculated over a given neighborhood of the point from the directions of the brightness gradients in different parts of this neighborhood.
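The idea of a descriptor built from brightness-gradient directions can be illustrated with a minimal sketch. It is a deliberately simplified stand-in and not SIFT, SURF, or ORB: the function name `orientation_histogram`, the list-of-rows patch format, the central-difference gradients, and the choice of 8 direction bins are all assumptions made for illustration.

```python
import math


def orientation_histogram(patch, bins=8):
    """Histogram of gradient directions, weighted by gradient magnitude.

    `patch` is a list of rows of grayscale values around a key point
    (a hypothetical input format chosen for this sketch).
    """
    height, width = len(patch), len(patch[0])
    hist = [0.0] * bins
    # Central differences are only defined for interior pixels.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat region: no direction to record
            angle = math.atan2(gy, gx) % (2 * math.pi)  # mapped to [0, 2*pi)
            index = int(angle / (2 * math.pi) * bins) % bins
            hist[index] += magnitude
    total = sum(hist)
    # Normalise so the descriptor does not depend on overall contrast.
    return [h / total for h in hist] if total else hist


# Brightness grows from left to right, so all gradients point along +x (bin 0).
ramp = [[10 * x for x in range(5)] for _ in range(5)]
descriptor = orientation_histogram(ramp)
```

Real descriptors additionally split the neighborhood into cells and compensate for rotation and scale, which this sketch does not attempt.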
There are several methods for calculating descriptors of key points: SIFT, SURF, ORB, etc. Key points can be used to find an object in an image. To do this, we need to take an image of the desired object and do the following [5, 7, 15]:
─ find the key points of the object in the image of the object and calculate their descriptors;
─ find the key points in the image being analyzed and calculate their descriptors;
─ compare the descriptors of the key points of the object with the descriptors of the key points found in the image;
─ if a sufficient number of matches is found, mark the area with the corresponding points.

The scope of the above methods is limited because they all generalize poorly. This means that the model describes one particular object or a very narrow class of objects. In order to successfully recognize not one object but large and diverse classes of objects, machine learning methods can be applied [12].

Deep learning is a set of feature/representation learning methods within machine learning. In the context of image recognition, the main deep learning algorithm is the convolutional neural network.

4 Image Memorability Experiments

A new dataset was created by sampling images from 21 different categories of indoor and outdoor scenes. A category was chosen only if it had at least 300 images of the required size. Duplicate images were removed manually. Images were cropped to 700x700 px; 25% of the images in each scene category were randomly selected as targets, and the other images became fillers. The targets are the images for which memorability ratings were obtained [19, 20].

Amazon Mechanical Turk (MTurk) was used to collect the experimental data. MTurk is a site that allows individuals and businesses to coordinate the use of human intelligence to solve problems that computers cannot currently solve. It is one of the Amazon Web Services sites.
On this site, employers can post Human Intelligence Tasks, for example choosing the best available photo, adding product descriptions, or identifying artists on music CDs. Workers can then browse the currently available tasks and complete them for a payment set by the employer.

AMT 1 experiment. The experiment was organized as follows:
─ AMT memory games were set up in which sequences of 120 images (a combination of target images and filler images selected from one group) were displayed for 1 second each; the original image and its repetition were 91-109 images apart, and consecutive images were separated by a special sign (a cross) shown for 1.4 s;
─ some filler images were repeated at shorter intervals of 1-7 images and were used as attentiveness tests to detect when a participant was not paying the necessary attention to the game;
─ participants pressed a key when they saw a repeated image and then received feedback (crosses of different colors); each image was repeated only once;
─ participants could take the memory test several times, but a different group of images was displayed each time;
─ a false alarm occurs when a participant indicates at the first presentation that the image has been repeated. No keystroke during the first presentation is a correct rejection. An answer is considered correct when the repeated image is correctly recognized; otherwise, the answer is considered incorrect (a miss).

The new dataset for AMT 1 includes 1754 targets and 7674 filler images. A partial set of the corresponding data is presented in Table 1.

Table 1. The partial set of corresponding data for AMT 1.

Category        Targets  Fillers  Mean HR (%)      Mean FAR (%)     HR cons. (ρ)    FAR cons. (ρ)   Datapoints/Target
Amusement park  68       296      64.2 (SD: 15.5)  10.2 (SD: 9.7)   0.85 (SD: 0.3)  0.80 (SD: 0.3)  84.8 (SD: 3.2)
Playground      74       330      63.3 (SD: 14.4)  14.7 (SD: 12.7)  0.78 (SD: 0.4)  0.84 (SD: 0.3)  86.4 (SD: 2.7)
…               …        …
Cockpit         68       320      49.5 (SD: 17.2)  18.2 (SD: 14.7)  0.70 (SD: 0.5)  0.88 (SD: 0.2)  80.6 (SD: 3.5)

The hit rate (HR) and false alarm rate (FAR) values are calculated over the targets, for which an average of 85 data points per image was collected.

AMT 2 experiment. The AMT 2 experiment was conducted on the combined set of target and filler images from all categories; a new set of memorability data was collected following the same protocol as before. The new dataset statistics for AMT 2 are given in Table 2.

Table 2. New data for AMT 2 experiment

Category   Targets  Fillers  Mean HR (%)      Mean FAR (%)    HR cons. (ρ)    FAR cons. (ρ)   Datapoints/Target
21 scenes  1754     7296     66.0 (SD: 13.9)  11.1 (SD: 9.5)  0.74 (SD: 0.2)  0.72 (SD: 0.1)  74.3 (SD: 7.5)

Comparing memory performance across datasets shows consistency of results and stability of memory performance. For images in the 21-scene context (AMT 2), the image memorability value is higher than in the 7-scene context (in the laboratory) and even higher than in the 1-scene context (AMT 1). At the same time, the overall trend remains the same.

5 Practical Implementation

Since the main purpose of our practical implementation was to simplify and speed up obtaining results, the system was divided into three main parts: the user interface, the client service, and the REST API of the scoring system itself [21, 22].

The REST API service was deployed on the IBM AI network and exposes a single call that accepts image links. Given that the neural network model can analyze only one image per call, the entire call interface consists of a single image parameter, which takes a direct link to the image to be analyzed. Given this restriction, the simultaneous processing of multiple images was delegated to the client service (Fig. 1).

Fig. 1.
Schematic representation of the organizational structure of the scoring system [21]

The client service is implemented as an application in the PHP programming language, using the Laravel web framework [10] and following the MVC web development paradigm [23-26]. Uploaded image data and analysis results are stored in a small SQLite database to simplify deployment and save computing resources. The client service interface consists of four separate REST API calls, namely:
─ analyze accepts n images and sends them to an asynchronous Redis queue for background processing;
─ index returns a list of uploaded images and data about them, such as upload date, file name, image processing step, and memorability rating;
─ view returns a description of a particular image;
─ compare generates a comparative characteristic of two different, previously uploaded and analyzed images in the system.

The part of the system visible to the end user is implemented as an interactive web interface built with a combination of Vue.js, HTML, and CSS. The entire user interface is designed using the SPA approach to maximize comfort and speed when analyzing image memorability scores [27].

The service suite was deployed using Docker Compose [28-30]. The open-source Nginx server was used as the web server. When the implemented system is opened for the first time, the user sees a window asking to upload images for further analysis (Fig. 2).

Fig. 2. Image upload form

The user can drag and drop the desired images through the standard operating system window. After the images have been selected, the system prompts the user to delete unnecessary images or add more. When the user is ready to send the images for analysis, it is enough to click the Analyze button (Fig. 3).

Fig. 3. Image analysis interface

After all the images have been uploaded successfully, the system queues each image for analysis.
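The per-image queuing described above can be sketched as follows. This is only an in-process Python analogue under stated assumptions, not the authors' PHP/Laravel code: a `queue.Queue` with worker threads stands in for the Redis queue, and `score_image` is a hypothetical placeholder for the scoring REST API, which accepts one image link per call.

```python
import queue
import threading


def score_image(image_url):
    """Hypothetical stand-in for the one-image-per-call scoring API."""
    return float(len(image_url) % 10)  # placeholder value, not a real prediction


def analyze(image_urls, workers=2):
    """Queue every image as its own job and process the jobs in the background."""
    jobs = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            url = jobs.get()
            if url is None:  # sentinel: no more jobs for this worker
                jobs.task_done()
                break
            score = score_image(url)  # one image per call, as the API requires
            with lock:
                results[url] = score
            jobs.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for url in image_urls:
        jobs.put(url)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    jobs.join()
    for thread in threads:
        thread.join()
    return results


scores = analyze(["a.jpg", "bb.jpg"])
```

In this sketch the caller blocks until all jobs finish; in a deployment with a real queue, the request would return immediately and the results would be stored for later retrieval.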
The user is not required to wait for completion, as the analysis runs asynchronously and the system sends a message on successful completion. After the results are obtained, the user can view all the estimates (for example, the score of the first image is 8.81) and sort them (Fig. 4).

Fig. 4. Scoring table

By clicking on a specific image, the user can see a heat map that reflects the impact of image elements on the image's memorability rating (Fig. 5).

Fig. 5. Comparison table with heat maps

To compare an image with other analyzed results, the user can click the compare button and select the images of interest. As a result, the system generates a comparative table with visualized heat maps for each image (Fig. 5). A heat map is a tool that uses a color palette to visualize data in an image. The heat map uses the color spectrum from warm to cool tones to show the areas that attract users' attention (warm tones mark the areas that attract the most attention, cold tones those that attract the least).

6 Conclusions

In the paper, the authors analyze image memorability prediction using memorability experiments (tests) and neural networks. The developed scoring system not only makes it possible to evaluate and compare the image memorability results of the automatically trained system with real experimental data, but also has practical application, as it is a full-featured application with a user-friendly interface, which widens the circle of users. The system can be used by designers, SMM specialists, teachers and students, advertising industry workers, and regular users who need to choose the image that the audience will pay attention to. The studies presented in the paper offer great potential for future use, as they provide tools to determine the conditions in which images can be best remembered (and how image memorability may vary in different contexts).
As a result, the existing technologies and approaches to image recognition were analyzed; existing studies on the detection of image memorability factors were considered; a scoring system using the PHP programming language and convolutional neural networks for identification of factors in image perception was developed; and experiments and analysis of results using Amazon Mechanical Turk were performed.

References

1. Bartlett, J., Hurry, S., Thorley, W.: Typicality and familiarity of faces. Memory & Cognition 12(3), 219-228 (1984).
2. Borji, A., Itti, L.: Defending Yarbus: Eye movements reveal observers' task. Journal of Vision 14(3), 29-29 (2014).
3. Borkin, M., Vo, A., Bylinskii, Z., Isola, P., Sunkavalli, S., Oliva, A., Pfister, H.: What makes a visualization memorable? IEEE Transactions on Visualization and Computer Graphics 19(12), 2306-2315 (2013).
4. Brady, T., Konkle, T., Alvarez, G.: A review of visual memory capacity: Beyond individual items and toward structured representations. Journal of Vision 11(5), 4-4 (2011).
5. Standing, L.: Learning 10000 pictures. The Quarterly Journal of Experimental Psychology 25(2), 207-222 (1973).
6. Wiseman, S., Neisser, U.: Perceptual organization as a determinant of visual recognition memory. The American Journal of Psychology, 675-681 (1974).
7. Bainbridge, W., Isola, P., Oliva, A.: The intrinsic memorability of face photographs. Journal of Experimental Psychology: General 142(4), 1323-1334 (2013).
8. Konkle, T., Brady, T., Alvarez, G., Oliva, A.: Conceptual distinctiveness supports detailed visual long-term memory for real-world objects. Journal of Experimental Psychology: General 139, 558-578 (2010).
9. Xiao, J., Hays, J., Ehinger, K., Oliva, A., Torralba, A.: Sun database: Large-scale scene recognition from abbey to zoo. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 3485-3492, San Francisco, CA (2010).
10. Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Object detectors emerge in deep scene CNNs. arXiv preprint arXiv:1412.6856 (2014).
11. Razavian, A., Azizpour, H., Sullivan, J., Carlsson, S.: CNN features off-the-shelf: an astounding baseline for recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 806-813, Columbus, OH (2014).
12. Basturk, B., Karaboga, D.: An Artificial Bee Colony (ABC) Algorithm for Numeric function Optimization. In: IEEE Swarm Intelligence Symposium, Indianapolis, Indiana, USA (2006).
13. Bylinskii, Z., Isola, P., Bainbridge, C., Torralba, A., Oliva, A.: Intrinsic and extrinsic effects on image memorability. Vision Research 116, 165-178 (2015).
14. Khosla, A., Bainbridge, W., Torralba, A., Oliva, A.: Modifying the memorability of face photographs. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3200-3207 (2013).
15. Hunt, R. R., Worthen, J. B. (eds.): Distinctiveness and memory. Oxford University Press, Oxford (2006).
16. Mikhov, D., Kondratenko, Y., Kondratenko, G., Sidenko, I.: Fuzzy logic approach to improving the digital images contrast. In: IEEE 2nd Ukraine Conference on Electrical and Computer Engineering, UKRCON, pp. 1183-1188, Lviv, Ukraine (2019).
17. Pomanysochka, Y., Kondratenko, Y., Kondratenko, G., Sidenko, I.: Soft computing techniques for noise filtration in the image recognition processes. In: IEEE 2nd Ukraine Conference on Electrical and Computer Engineering, UKRCON, pp. 1189-1195, Lviv, Ukraine (2019).
18. Wang, W., Sun, J., Li, J., Wu, Q., Liu, J.: Investigation on the Influence of Visual Attention on Image Memorability. In: Zhang, Y. (eds) Image and Graphics. ICIG 2015. Lecture Notes in Computer Science, vol. 9219, pp. 573-582. Springer, Cham (2015).
19. Basavaraju, S., Sur, A.: Multiple instance learning based deep CNN for image memorability prediction. Multimed Tools Appl 78, 35511-35535 (2019). DOI: 10.1007/s11042-019-08202-y.
20. Leonardi, M. et al.: Image Memorability Using Diverse Visual Features and Soft Attention. In: Ricci, E., Rota Bulò, S., Snoek, C., Lanz, O., Messelodi, S., Sebe, N. (eds) Image Analysis and Processing. Lecture Notes in Computer Science, vol. 11752, pp. 171-180. Springer, Cham (2019).
21. Glanzer, M., Adams, J. K.: The mirror effect in recognition memory: data and theory. Journal of Experimental Psychology: Learning, Memory, and Cognition 16(1), 5-16 (1990).
22. Ji, J., Chen, G., Sun, L.: A novel Hough transform method for line detection by enhancing accumulator array. Pattern Recognition Letters 32(11), 1503-1510 (2011).
23. Pomanysochka, Y., Kondratenko, Y., Sidenko, I.: Noise filtration in the digital images using fuzzy sets and fuzzy logic. In: 15th International Conference on ICT in Education, Research, and Industrial Applications: PhD Symposium (ICTERI 2019: PhD Symposium), vol. 2403, pp. 63-72, Kherson, Ukraine (2019).
24. Watanabe, K., Imamura, M., Asami, K., Amanuma, T.: A Web Application Development Framework Using Code Generation from MVC-Based UI Model. In: Omatu, S. et al. (eds) Distributed Computing, Artificial Intelligence, Bioinformatics, Soft Computing, and Ambient Assisted Living. IWANN 2009. Lecture Notes in Computer Science, vol. 5518, pp. 404-411. Springer, Berlin, Heidelberg (2009). DOI: 10.1007/978-3-642-02481-8_57.
25. Kondratenko, Y., Gordienko, E.: Neural Networks for Adaptive Control System of Caterpillar Turn. In: Annals of DAAAM for 2011 & Proceedings of the 22nd Int. DAAAM Symp. "Intelligent Manufacturing and Automation", pp. 0305-0306 (2011).
26. Sidenko, I., Kondratenko, G., Kushneryk, P., Kondratenko, Y.: Peculiarities of Human Machine Interaction for Synthesis of the Intelligent Dialogue Chatbot. In: 10th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS), pp. 1056-1061, Metz, France (2019).
27. Amengual, X., Bosch, A., de la Rosa, J. L.: Review of Methods to Predict Social Image Interestingness and Memorability. In: Azzopardi, G., Petkov, N. (eds) Computer Analysis of Images and Patterns. CAIP 2015. Lecture Notes in Computer Science, vol. 9256, pp. 64-76. Springer, Cham (2015).
28. Cook, J.: Docker for Data Science. Apress, Berkeley, CA (2017).
29. Bhat, S.: Practical Docker with Python. Apress, Berkeley, CA (2018).
30. Freeman, A.: Essential Docker for ASP.NET Core MVC. Apress, Berkeley, CA (2017).