<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Research of Methods for Image Sharpness Evaluation in Photos of People</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Victoria</forename><surname>Vysotska</surname></persName>
							<email>victoria.a.vysotska@lpnu.ua</email>
							<idno type="ORCID">0000-0001-6417-3689</idno>
							<affiliation key="aff0">
								<orgName type="institution">Lviv Polytechnic National University</orgName>
								<address>
									<addrLine>Stepan Bandera</addrLine>
									<postCode>79013</postCode>
									<settlement>Lviv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Nataliia</forename><surname>Sharonova</surname></persName>
							<email>nvsharonova@ukr.net</email>
							<idno type="ORCID">0000-0002-8161-552X</idno>
							<affiliation key="aff1">
								<orgName type="institution">National Technical University &quot;KhPI&quot;</orgName>
								<address>
									<addrLine>Kyrpychova str. 2</addrLine>
									<postCode>61002</postCode>
									<settlement>Kharkiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Mariya</forename><surname>Shirokopetleva</surname></persName>
							<email>marija.shirokopetleva@nure.ua</email>
							<idno type="ORCID">0000-0002-7472-6045</idno>
							<affiliation key="aff2">
								<orgName type="institution">Kharkiv National University of Radio Electronics</orgName>
								<address>
									<addrLine>Nauky Ave. 14</addrLine>
									<postCode>61166</postCode>
									<settlement>Kharkiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Oleksandr</forename><surname>Dolhanenko</surname></persName>
							<email>oleksandr.dolhanenko@nure.ua</email>
							<idno type="ORCID">0000-0002-3996-6940</idno>
							<affiliation key="aff2">
								<orgName type="institution">Kharkiv National University of Radio Electronics</orgName>
								<address>
									<addrLine>Nauky Ave. 14</addrLine>
									<postCode>61166</postCode>
									<settlement>Kharkiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Anastasiya</forename><surname>Chupryna</surname></persName>
							<email>anastasiya.chupryna@nure.ua</email>
							<idno type="ORCID">0000-0003-0394-9900</idno>
							<affiliation key="aff2">
								<orgName type="institution">Kharkiv National University of Radio Electronics</orgName>
								<address>
									<addrLine>Nauky Ave. 14</addrLine>
									<postCode>61166</postCode>
									<settlement>Kharkiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Serhii</forename><surname>Smelyakov</surname></persName>
							<email>serhii.smeliakov@nure.ua</email>
							<idno type="ORCID">0000-0002-5791-2479</idno>
							<affiliation key="aff2">
								<orgName type="institution">Kharkiv National University of Radio Electronics</orgName>
								<address>
									<addrLine>Nauky Ave. 14</addrLine>
									<postCode>61166</postCode>
									<settlement>Kharkiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Research of Methods for Image Sharpness Evaluation in Photos of People</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">6E103C93206BC9F3058FB6DA83F32D9D</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:23+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Sharpness</term>
					<term>image</term>
					<term>aperture</term>
					<term>depth of field</term>
					<term>focus</term>
					<term>FFT</term>
					<term>variance of the Laplacian</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The subject matter of the article is image sharpness evaluation in photos of people. The goal of the work is to analyse the existing methods of image sharpness evaluation, compare their performance and quality of results, and suggest improvements for the use case of sharpness classification of photos of people, where large amounts of background blur are present due to the aperture effect. In this article the methods for image sharpness evaluation were described and tested on a set of selected images. The images contained different subject sizes and types of blur. The following methods were used: Fast Fourier Transform (FFT), variance of the Laplacian, appearance-based face detection algorithms, metadata analysis, and linear trendline analysis. As a result, the problem of a naturally blurred background was demonstrated and conclusions were made. An alternative method of sharpness evaluation was described, which solves the mentioned problem. The suggested improved algorithm was tested to determine whether it satisfies expectations and solves the identified problems. To implement the Fast Fourier Transform and variance of the Laplacian methods, the OpenCV library was used. The following results were obtained: with the default implementation, the FFT and variance of the Laplacian methods are not reliable for evaluating the sharpness of images containing large and varying amounts of naturally blurred background (due to the open aperture), or when different lens and camera settings are used. The following conclusions were made: steps need to be taken to eliminate the factors of varying amounts of naturally blurred background and of camera preferences, and in this way improve the accuracy and reliability of sharpness evaluation. This means evaluating the sharpness of parts of the images as well as the images as a whole.
These steps include, but are not limited to, face position detection, identifying the faces that were supposed to be in focus when the photo was taken, and sharpness evaluation of only the areas that were intended to be in focus. Plans were set for further research and improvements of the suggested algorithm.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Photography is very subjective. There are many ways to take a photo, since all of us see the world in very different ways. There are, however, some basic principles for achieving a generally well composed photograph. You may or may not follow these principles and still get a great photo that tells a story, but statistically speaking, the photos most widely recognised as professional and pleasing are the ones which have the subject in critical sharp focus.</p><p>Achieving focus in a photo is a complicated process with multiple factors influencing the final result. A typical smartphone user takes this process for granted, as the phone does all the decision making and photo processing in real time, most of the time achieving good results for general social media use. The algorithms used there are fine-tuned and subject oriented. Most of the time our phones use machine learning to detect the scene and tune the settings to achieve better results.</p><p>Our smartphones are getting better every year in terms of photography; there is even the ability to imitate expensive lenses by adding a fake depth of field to photos <ref type="bibr" target="#b0">[1]</ref>. However, any well-established professional photographer will say how important it is to be in control of the manual settings of the camera, being the decision maker and scene establisher, so that the creator gets the exact result they want rather than what the tool produces on its own. Moreover, professionals use dedicated hardware, which may contain some "smart" features but generally is very open to manual overriding, giving the operator more flexibility.
However, wherever humans are involved, mistakes happen.</p><p>Having photographed an event, say a wedding, the photographer usually spends hours and sometimes days looking through more than 2000 photos, filtering out the ones to be deleted and highlighting the best ones for further editing. This is a long and complicated process of comparing photographs that seem identical at first glance (but actually differ in slight ways), which can be a big factor in the final result. The most important thing the photographer looks at is the sharpness of the subjects. It is a general rule that, when photographing humans or animals, the eyes need to be in critical sharp focus. Everything else in the photo can be changed: the lighting can be increased or decreased, some elements can be added or removed, but sharpening a blurry subject is a very destructive process which is highly discouraged.</p><p>The goal of this work is to analyse and compare different algorithms and software solutions for automatically identifying the sharpness of a photo (containing humans as subjects) and to propose improvements to the existing algorithms that solve the problems described further on.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Works</head><p>When it comes down to solving a scientific problem, it is required to operate with objective terms and calculations. That is why it is important to define what a sharp photo is. The common way of measuring sharpness is by the "rise distance" of edges within the image: the distance (in pixels) over which the pixel level rises from 10% to 90% of its final value. Therefore, it can be stated that sharpness is measured by analysing the intensity of edge gradients <ref type="bibr" target="#b1">[2]</ref>. However, the threshold level of intensity for classifying a photo as sharp or not is very circumstantial and cannot be standardised. During the following experiments, a set of reference photos will be used to determine the minimum and maximum values of sharpness and thus provide the threshold. Apart from that, the problem of image quality assessment <ref type="bibr" target="#b2">[3]</ref> is well known, and many have attempted to solve it in a general way.</p><p>In order to start detecting and classifying the sharpness of photos, it is first necessary to clarify which camera and lens settings influence it. The camera in combination with the lens has 3 main settings: aperture, shutter speed and ISO.</p><p>Aperture is the main setting of the lens. It is an opening that can be made bigger or smaller and thus let in more or less light. It is usually preferred to let in as much light as possible, therefore "opening up", or "making the aperture wide open", but this has a side effect that can directly influence the sharpness of the subject. The more open the aperture, the shallower the depth of field. For cameras that can only focus on one object distance at a time, depth of field is the distance between the nearest and the furthest objects that are in acceptably sharp focus.</p><p>By knowing the DOF (Depth Of Field) we can understand what depth of the image had to be in focus.
To give a clearer understanding of why this is important, an example image is provided, shot with a very shallow DOF (see Figure <ref type="figure" target="#fig_0">1</ref>).</p><p>Figure <ref type="figure" target="#fig_0">1</ref> (photo by Dolhanenko O.) demonstrates a shot with a very open aperture of f/2.8 (the lower the number, the more open the aperture). The subject that lies within the shallow depth of field is in critical sharp focus; however, the secondary subjects which are before and after the field boundaries are not in focus at all (in this case only before the subject, on the right).</p><p>By the standards of traditional photo sharpness detection algorithms, this is not a sharp photo, as the area in critical sharp focus is very small compared to the blurry part (this will be experimentally tested further on). However, if the subject is correctly identified among the two and is exclusively checked, then the photo is in fact in critical sharp focus and is acceptable for further editing. Another setting that influences the sharpness of a photo is the shutter speed. The rule for general photography is "the faster, the better". The quicker the curtain closes in front of the sensor, the less blur there will be in the photo, as any movement of the subjects during the shot will decrease their sharpness.
This motion blur is especially noticeable when shooting in low-light conditions, when fast shutter speeds are not available (otherwise the photo would be too dark), since even minor movements cause it.</p><p>The pursuit of accurate image sharpness assessment has led to the exploration of various approaches, encompassing spatial domain-based methods, spectral domain-based methods, learning-based methods <ref type="bibr" target="#b3">[4]</ref>, and combinations of these techniques, each presenting unique advantages and challenges. These include methods such as Local Phase Coherence <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6]</ref>, the Edge Information analysis approach <ref type="bibr" target="#b6">[7]</ref>, the Normal-Gradient-Based approach <ref type="bibr" target="#b7">[8]</ref>, the Gradient Neighbourhood-Weighted approach <ref type="bibr" target="#b8">[9]</ref> and others.</p><p>Zhu et al. (2023) <ref type="bibr" target="#b9">[10]</ref> conducted a comprehensive review, offering insight into the current trends and performance comparisons of notable algorithms, revealing a landscape of ongoing innovation aimed at overcoming the shortcomings of existing methods.</p><p>Bielievtsov et al. (2018) <ref type="bibr" target="#b10">[11]</ref> investigated network technology for the transmission of visual information, highlighting the importance of maintaining image quality in the context of digital communication and storage. This work is foundational, setting the stage for further research into image quality assessment methods that are critical for various applications, including, but not limited to, facial recognition, social media, and professional photography.</p><p>In the domain of image search and retrieval, Smelyakov et al.
(2020) <ref type="bibr" target="#b11">[12]</ref> introduced an innovative approach to image engine search for big data warehouses, emphasizing the necessity of high-quality image processing for efficient and accurate image retrieval. This development is particularly relevant to our research as it underscores the significance of image sharpness in enhancing the performance of search engines, which often rely on visual content analysis to function effectively.</p><p>Furthermore, the effectiveness of preprocessing algorithms for natural language processing applications was explored by Smelyakov et al. (2020) <ref type="bibr" target="#b12">[13]</ref>, illustrating the broad applications of image and signal processing techniques across various fields of computer science. Although focused on natural language processing, the principles of preprocessing and quality enhancement are applicable to the domain of image processing, providing insights into methods that could potentially improve image sharpness evaluation algorithms.</p><p>The development of no-reference (NR) <ref type="bibr" target="#b13">[14]</ref> sharpness metrics has been particularly noteworthy, with <ref type="bibr" target="#b14">Duan et al. (2021)</ref>  <ref type="bibr" target="#b14">[15]</ref> introducing an efficient NR objective sharpness assessment metric designed for images with shallow depth of field, a common characteristic in portraits and photos emphasizing human subjects. This metric, which calculates sharpness based on bidirectional pixel intensity differences, addresses the limitations of traditional sharpness assessment tools when applied to such images.</p><p>Research by <ref type="bibr" target="#b15">Her and Yang (2019)</ref>  <ref type="bibr" target="#b15">[16]</ref> on image sharpness assessment algorithms for autofocus systems further exemplifies the field's evolution. 
Their work evaluates the performance of several spatial domain functions, highlighting the scene adaptability and anti-jamming capabilities of the Brenner algorithm and the sensitivity of the Laplace algorithm, among others. This research underscores the critical role of sharpness evaluation functions in enhancing the quality of images captured by various imaging systems.</p><p>The development of advanced artificial intelligence systems, as explored by Kyrychenko, Tereshchenko, Proniuk, and Geseleva (2023) <ref type="bibr" target="#b16">[17]</ref>, through the use of predicate clustering methods, presents potential avenues for refining image sharpness evaluation techniques. Moreover, advancements in image quality assessment for zoom photos, as investigated by <ref type="bibr" target="#b17">Han et al. (2023)</ref> <ref type="bibr" target="#b17">[18]</ref>, reveal the challenges posed by small sensor sizes and fixed focal lengths in smartphones. Their novel no-reference zoom quality metric incorporates traditional sharpness estimation with image naturalness concepts, demonstrating significant improvements in assessing image quality over traditional metrics.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methods</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Existing methods</head><p>The Fast Fourier Transform is a convenient mathematical algorithm for computing the Discrete Fourier Transform (DFT). It is used for converting a signal from one domain into another <ref type="bibr" target="#b18">[19]</ref>.</p><p>The FFT is used in different areas, such as mathematics, music, engineering, etc. This method is widely used, as calculations are often much easier to perform when time-series signals are converted into the frequency domain. The method can also be used to convert the frequency domain back to the original format. When talking about the FFT in image processing and computer vision, it is important to note that the image is represented in both the Fourier and spatial domains, i.e. with both imaginary and real components.</p><p>The obtained values can be analysed to perform blurring or blur detection, edge detection, analysis of textures, etc.</p><p>There is a sampled Fourier Transform, called the DFT, which contains only the set of image samples that is enough to fully represent the spatial domain image <ref type="bibr" target="#b19">[20]</ref> (and is often used for further quality metrics extraction).</p><p>Given an image of size N×N, the resulting DFT matrix can be defined as follows:</p><formula xml:id="formula_0">F(k, l) = ∑_{i=0}^{N−1} ∑_{j=0}^{N−1} f(i, j) e^{−i2π(ki/N + lj/N)}<label>(1)</label></formula><p>where f(i,j) is the image in the spatial domain and the exponential term is the basis function corresponding to each point F(k,l) in the Fourier space.</p><p>The equation can be interpreted as follows: the value of each point F(k,l) is obtained by multiplying the spatial image with the corresponding basis function and summing the result. The basis functions are sine and cosine waves of increasing frequency.
For example, F(0,0) represents the DC component of the image (the average image brightness) and F(N−1,N−1) is the highest frequency of the image.</p><p>The ordinary one-dimensional DFT has O(N²) complexity. If the Fast Fourier Transform (FFT) is used, the complexity can be reduced to O(N log2 N). For computing large images this improvement is crucial. However, some forms of the FFT may restrict the maximum size of the input to N = 2^n.</p><p>The result is an output image represented with complex numbers. This image can be displayed in two ways: either as the real and imaginary parts or as magnitude and phase (see Fig. <ref type="figure">2</ref>).</p><p>When solving problems in the area of image processing, commonly only the magnitude of the Fourier Transform is displayed. An example of the result of such a transformation is shown in Figure 2.</p><p>For clear and reliable results, a contrast detection threshold must be calculated beforehand <ref type="bibr" target="#b20">[21]</ref> in order to understand whether the image has enough contrast for further analysis. If enough contrast is available and the FFT algorithm completes, a floating point value of the mean of the magnitude indicates the relative sharpness of the whole image. Of course, since this value is relative, conclusions cannot be made without a reference sharpness value.</p></div>
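As an illustration (not part of the paper's implementation), the DFT definition in formula (1) and the mean-magnitude sharpness score can be sketched in plain Java. The class and method names are hypothetical, and a real system would use an FFT (e.g. via OpenCV) rather than this naive O(N⁴) transform:

```java
// Naive 2D DFT per formula (1): F(k,l) = sum_i sum_j f(i,j) * e^{-i*2*pi*(ki/N + lj/N)}
public class DftSharpness {
    static double[][] magnitude(double[][] f) {
        int n = f.length;
        double[][] mag = new double[n][n];
        for (int k = 0; k < n; k++)
            for (int l = 0; l < n; l++) {
                double re = 0, im = 0;
                for (int i = 0; i < n; i++)
                    for (int j = 0; j < n; j++) {
                        double ang = -2 * Math.PI * ((double) k * i / n + (double) l * j / n);
                        re += f[i][j] * Math.cos(ang);
                        im += f[i][j] * Math.sin(ang);
                    }
                mag[k][l] = Math.hypot(re, im);  // |F(k,l)|
            }
        return mag;
    }

    // Mean magnitude excluding the DC term F(0,0): a relative sharpness proxy.
    static double meanHighFreq(double[][] mag) {
        double sum = 0;
        for (int k = 0; k < mag.length; k++)
            for (int l = 0; l < mag.length; l++)
                if (k != 0 || l != 0) sum += mag[k][l];
        return sum / (mag.length * mag.length - 1);
    }

    public static void main(String[] args) {
        // A hard vertical edge (sharp) vs a smooth ramp (blurred) on an 8x8 patch.
        int n = 8;
        double[][] sharp = new double[n][n], blurred = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                sharp[i][j] = j < n / 2 ? 0 : 255;
                blurred[i][j] = 255.0 * j / (n - 1);
            }
        System.out.println("sharp  : " + meanHighFreq(magnitude(sharp)));
        System.out.println("blurred: " + meanHighFreq(magnitude(blurred)));
    }
}
```

On this test patch the hard edge yields a larger mean non-DC magnitude than the smooth ramp, which illustrates the relative-sharpness signal described above, and why a score threshold still needs a reference image.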
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 2: The result of calculating and displaying the Fourier Transform magnitude of an input image</head><p>Another option is to use the variance of the Laplacian. The Laplacian of a function f at a point p is (up to a factor) the rate at which the average value of f over spheres centered at p deviates from f(p) as the radius of the sphere shrinks towards 0.</p><p>The Laplacian operator is defined as the divergence of the gradient of the function f, as shown in formula (2).</p><p>Δf(x, y) = div(grad(f)) (2)</p><p>In this definition, the gradient is the slope of steepest ascent, and it gives information about the point and direction of the steepest ascent at local maxima and, likewise, at local minima.</p><p>In the case of image sharpness detection, the divergence is the vector field quantity associated with blurriness from subject motion or natural background blur. The calculated Laplacian matrix is demonstrated in Figure <ref type="figure" target="#fig_1">3</ref>. Through convolution with the Laplacian kernel, the source image is transformed. This is used to find areas of rapid intensity change (but this works well only if there is no noise in the image <ref type="bibr" target="#b21">[22]</ref>).</p><p>Using the OpenCV library this method will be tested on sample images which contain both blurry areas and sharp subjects.
This partial blurriness was caused by a very open aperture, which made a very distinct background separation.</p><p>// Load the source image in grayscale
val imageMat = Highgui.imread(image.absolutePath, Highgui.CV_LOAD_IMAGE_GRAYSCALE)
// Apply the Laplacian operator; CV_64F keeps negative responses from being clipped
val destination = Mat()
Imgproc.Laplacian(imageMat, destination, CvType.CV_64F)
// The variance of the Laplacian response is the sharpness score
val mean = MatOfDouble()
val std = MatOfDouble()
Core.meanStdDev(destination, mean, std)
val variance = Math.pow(std.get(0, 0)[0], 2.0)</p><p>To get a single floating point number representing the overall sharpness of a photo, the source image is first loaded in grayscale, the Laplacian function is applied, and the variance of its response is computed.</p></div>
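For readers without OpenCV at hand, the same computation can be sketched in pure Java: convolve with the 3×3 Laplacian kernel and take the variance of the response. This is a simplified sketch (edge pixels are simply skipped rather than padded), not the OpenCV implementation:

```java
public class LaplacianVariance {
    // 3x3 Laplacian kernel applied to interior pixels, followed by the variance of
    // the response, mirroring Imgproc.Laplacian + Core.meanStdDev in spirit.
    static double varianceOfLaplacian(double[][] img) {
        int h = img.length, w = img[0].length;
        double sum = 0, sumSq = 0;
        int count = 0;
        for (int y = 1; y < h - 1; y++)
            for (int x = 1; x < w - 1; x++) {
                double lap = img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
                        - 4 * img[y][x];
                sum += lap;
                sumSq += lap * lap;
                count++;
            }
        double mean = sum / count;
        return sumSq / count - mean * mean;  // variance of the Laplacian response
    }

    public static void main(String[] args) {
        double[][] flat = new double[10][10];               // uniform patch: no edges
        double[][] edge = new double[10][10];
        for (int y = 0; y < 10; y++)
            for (int x = 5; x < 10; x++) edge[y][x] = 255;  // hard vertical edge
        System.out.println(varianceOfLaplacian(flat));      // 0.0
        System.out.println(varianceOfLaplacian(edge));      // large positive value
    }
}
```

A featureless (blurry or uniform) region produces a near-zero variance, while sharp edges produce a large one, which is exactly why the score collapses when most of the frame is naturally blurred background.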
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">The alternative algorithm</head><p>Looking at the results of the two popular algorithms, it can be stated that neither one is optimised enough for the problem stated at the beginning. During photo sessions most portrait photos will be taken with an open aperture, which will result in the background having many areas with low frequencies <ref type="bibr" target="#b22">[23]</ref>. If the area with naturally low frequencies is greater than the area with high frequencies, the image will be labelled as "not sharp", which is not necessarily true. The question arises: how can judgements be made about the sharpness of the photo subject if analysing the whole image is not optimal?</p><p>As a top level solution, the sharpness detection algorithm should be modified in such a way that only areas that are supposed to be in focus are evaluated and the background/foreground is ignored. This is the general principle of evaluating visible errors in specific areas of the photo <ref type="bibr" target="#b23">[24]</ref>. An example of such an approach can be found in the framework for measuring sharpness in natural images <ref type="bibr" target="#b24">[25]</ref>. There is also research on image quality assessment based on regions of interest, which are identified as features that are highly spatially nonstationary <ref type="bibr" target="#b25">[26]</ref>.</p><p>The solution will be developed based on the limitation that the subjects that need to be in focus are human faces. This limitation, however, can be eliminated by modifying the algorithm and providing support for more subject types.</p><p>The following list describes the steps of the algorithm:</p><p>1. Find the coordinates and bounding boxes of every face in the frame. 2. Extract the focus distance from the photo. 3. Calculate the distance to every face in the frame. 4. Calculate the ideal depth of field. 5. Select the faces that are within the intended focus plane (focus distance ± ideal depth of field). 6. Apply the sharpness detector only to the boxes containing the selected faces. 7. Calculate the average sharpness score based on the individual sharpness scores.</p><p>Next, the steps of the algorithm will be described in more detail with propositions and variants for implementation.</p></div>
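The steps above can be sketched as a skeleton in Java. The `Face` type, the distance values and the stub sharpness scorer are hypothetical placeholders for the outputs of steps 1-4 and the detector of step 6:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

public class SubjectSharpness {
    // Hypothetical face record produced by steps 1-3: bounding box plus estimated distance.
    static class Face {
        final int x, y, w, h;
        final double distanceM;
        Face(int x, int y, int w, int h, double distanceM) {
            this.x = x; this.y = y; this.w = w; this.h = h; this.distanceM = distanceM;
        }
    }

    // Steps 5-7: keep only faces inside the intended focus plane
    // (focus distance +- half the ideal DOF) and average their sharpness scores.
    static double averageSharpness(List<Face> faces, double focusDistM, double dofM,
                                   ToDoubleFunction<Face> sharpnessOf) {
        List<Face> inPlane = new ArrayList<>();
        for (Face f : faces)
            if (Math.abs(f.distanceM - focusDistM) <= dofM / 2) inPlane.add(f);
        double sum = 0;
        for (Face f : inPlane) sum += sharpnessOf.applyAsDouble(f);
        return inPlane.isEmpty() ? 0.0 : sum / inPlane.size();
    }

    public static void main(String[] args) {
        List<Face> faces = List.of(
                new Face(100, 80, 220, 260, 2.0),  // subject near the focus plane
                new Face(900, 120, 90, 110, 5.5)); // background face, naturally blurred
        // With focus at 2.0 m and ~0.3 m of DOF, only the first face is scored.
        double score = averageSharpness(faces, 2.0, 0.3, f -> (double) f.w); // stub scorer
        System.out.println(score);
    }
}
```

In a full implementation the stub scorer would be replaced by the variance-of-the-Laplacian (or FFT) score computed over each selected face's bounding box.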
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1.">Finding the boundaries and coordinates of faces in the frame</head><p>There are quite a few methods and options when it comes to face detection in images. Given an arbitrary image, the goal of face detection is to determine whether or not there are any faces in the image and, if present, return the image location and extent of each face. Some methods for detecting faces include:</p><p>• Knowledge-based top-down methods;</p><p>• Feature invariant approaches;</p><p>• Template matching methods;</p><p>• Appearance-based methods. Knowledge-based methods are developed with scientific knowledge about human faces as the primary source of information. The problem with this approach is the difficulty of translating human knowledge into well-defined rules. It is very hard to find a perfect balance of strictness in these rules. If the rules are defined too strictly, the method may fail to detect faces that do not pass all rules at once. On the other hand, if the rules are too general, the method may produce false positives.</p><p>The feature-based methods are based on the fact that humans have the natural ability to recognize a face in different lighting conditions, under different angles and circumstances, meaning we are trained to detect certain features. Many variants of this approach have been proposed, in which the features are first extracted and then analysed. One problem with these feature-based algorithms is that the image features can be severely corrupted due to illumination, noise, and occlusion.</p><p>In template matching methods a face pattern is manually predefined. When analysing an input image, the correlation is computed for separate parts of the presumed face: the contour, the nose, the eyes, the mouth. The result is positive if the mean correlation value of these components is above a certain threshold.
This method, however, is not ideal due to its lack of flexibility when it comes to different face shapes, poses and scales.</p><p>Lastly, the appearance-based methods are closer to real-life conditions than the previous ones. The templates are not generated by experts (as in template matching) but are instead taken from samples of actual image databases. This approach relies heavily on statistical analysis and machine learning.</p><p>Appearance-based methods have many different implementations, including distribution-based methods, support vector machines, hidden Markov models, cascade classifiers, cascaded convolutional networks <ref type="bibr" target="#b26">[27]</ref>, etc.</p><p>Taking into account the advantages and disadvantages of the abovementioned face detection methods, it was decided that appearance-based methods are well suited for the task. When implementing the improved sharpness detection algorithm, cascade classifiers can be used. OpenCV contains pre-trained classifiers that can be freely downloaded. To retrieve face coordinates using the OpenCV library, one can use the CascadeClassifier.detectMultiScale function. The arguments for this function are the input image (in grayscale), scaleFactor and minNeighbours. The scaleFactor specifies how much the image size is reduced at each image scale. minNeighbours specifies how many neighbouring detections each candidate rectangle must have for it to be retained. These parameters will be fine-tuned during the development process.</p><p>Of course, everything that involves image processing and subject recognition will always take up some valuable processing time, so there is another way of extracting the faces in the frame. The other method relies heavily on the camera's integrated ability to detect faces in real time and write this information to the metadata.
Of course, the algorithm should not be strictly reliant on this optional metadata, but it should definitely utilize it when available; that would dramatically improve the performance of the method.</p><p>As a result of image scanning with classifiers or face extraction using metadata, the coordinates and boundaries of all faces in the frame will be retrieved for further analysis.</p></div>
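As background for the cascade-classifier choice: Haar-like features in such classifiers are evaluated in constant time per rectangle using an integral image (summed-area table). The following is our own minimal sketch of that data structure, not OpenCV code:

```java
public class IntegralImage {
    // Summed-area table: s[y][x] holds the sum of all pixels above and to the
    // left of (x, y), exclusive; built in one pass over the image.
    static long[][] build(int[][] img) {
        int h = img.length, w = img[0].length;
        long[][] s = new long[h + 1][w + 1];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                s[y + 1][x + 1] = img[y][x] + s[y][x + 1] + s[y + 1][x] - s[y][x];
        return s;
    }

    // Sum of the rectangle [x, x+w) x [y, y+h) with just four table lookups.
    static long rectSum(long[][] s, int x, int y, int w, int h) {
        return s[y + h][x + w] - s[y][x + w] - s[y + h][x] + s[y][x];
    }

    public static void main(String[] args) {
        int[][] img = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        long[][] s = build(img);
        System.out.println(rectSum(s, 0, 0, 3, 3)); // 45: sum of the whole image
        System.out.println(rectSum(s, 1, 1, 2, 2)); // 28: 5 + 6 + 8 + 9
    }
}
```

This constant-time rectangle sum is what makes scanning thousands of candidate windows per frame feasible for cascade classifiers.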
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.2.">Extracting the focus distance and other useful parameters from the photo</head><p>The focusing plane is the plane of the camera's image sensor. The focusing distance is the distance from the focusing plane to the subject.</p><p>When using automatic lenses (lenses with autofocus, which do not require manual input to put the subject in critical sharp focus), complex algorithms run in order to produce optimal results in terms of focus. The camera constantly monitors the image, estimates the distance to the subject in real time and adjusts the focus motors accordingly. To achieve autofocus, the camera lens moves to a position where the clearest image is obtained. The maximum clarity is measured from the histograms of images to which an edge-highlighting filter was first applied. Knowing the focal length of the lens, the distances can be found from the lens equation <ref type="bibr" target="#b27">[28]</ref>. So, this is clearly possible, and cameras perform these calculations all the time. Moreover, cameras can utilize different focusing algorithms <ref type="bibr" target="#b28">[29]</ref> for accuracy. The distance that the focus motor travelled in the lens before the photo was taken corresponds to the estimated focusing distance, which is needed for the modified sharpness detection algorithm.</p><p>It is possible to retrieve the calculated focusing distance by looking at the photo metadata. EXIF (Exchangeable Image File Format) is a standard that allows adding information (metadata) to photos and videos. The format is quite flexible, meaning that any vendor can extend it by adding new data entries with original names. This means that not all camera bodies from different manufacturers produce the same metadata. That said, many cameras contain valuable information that can be used for the purpose of this research.
By viewing the metadata using an EXIF reader, we can find the following relevant information (Table <ref type="table" target="#tab_0">1</ref>). The parameter "Focus Distance 2" denotes the calculated distance (in meters) to the focused subject. This calculation was performed by the camera by analysing the rotation angle of the focus motor at the moment the edges of the image were in focus. This parameter can be used as a ready-made solution and will be very useful for further calculations. There is no other reliable way of retrieving or calculating the focus distance of a photo, especially when the photo is not guaranteed to be in critical focus initially. This is why the method relies heavily on this metadata parameter. Other useful parameters include the number of detected faces and the detected face positions. Having these parameters in the metadata is a "luxury", and only modern cameras provide such information. Relying heavily on their presence would limit the method to modern high-tech camera bodies. So, the method will utilise these metadata shortcuts if they are present, but will still be able to detect face boundaries as described in the section above.</p></div>
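As an aside, the thin-lens relation from which the camera derives its focusing distance can be sketched in a few lines (a minimal illustration with assumed example values, not the camera firmware's actual algorithm):

```python
import math

def subject_distance(focal_length_m: float, image_distance_m: float) -> float:
    """Thin-lens equation 1/f = 1/d_o + 1/d_i, solved for the subject
    (object) distance d_o given the focal length f and the lens-to-sensor
    (image) distance d_i, all in meters."""
    inv = 1.0 / focal_length_m - 1.0 / image_distance_m  # 1/d_o
    if inv <= 0:
        return math.inf  # focused at (or beyond) infinity
    return 1.0 / inv

# A 50 mm lens whose image plane sits 51.3 mm behind the optical centre
# is focused on a subject roughly 2 meters away.
d = subject_distance(0.050, 0.0513)
```

In a real camera the firmware knows the image distance from the focus motor position, which is presumably how values like the "Focus Distance 2" tag above are produced.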
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.3.">Calculating the distance to every face in the frame</head><p>It is required to find the distances to all faces in the frame because the method needs to determine which faces in particular need to be in focus. The "need to be in focus" criterion is described in the next subsection.</p><p>At this step the focus distance, as well as all face boundaries (and thus sizes), are already known. To find the distance to a human face in the photo, it is necessary to first estimate the height of the faces in real life.</p><p>To find the real-world height of each face in the frame, two methods can be used:</p><p>• the average statistical face height value (21.8 cm to 23.9 cm);</p><p>• a value calculated from a proportionality formula based on the distance between the eyes and the mouth. The second option is very complicated: it involves extra operations like edge detection to find the locations of the eyes and mouth, depends on the pose (it will not work for side portraits) and does not guarantee significantly better accuracy than the first option. This is why the first option, assuming that the height of the face is somewhere between 21.8 cm and 23.9 cm, will be used during the implementation.</p><p>When the real-world height of all faces in the frame is known, the real-world distance to the faces can be determined in two ways:</p><p>• the predefined proportions at a 1-meter distance method (see Figure <ref type="figure">4</ref>, photo by Dolhanenko O.); • using an alleged "focused face" from the focus point coordinates as a reference. The first method can be described as follows: practically or mathematically find the percentage height (relative to the frame height) of a 23.9 cm high object at a distance of 1 meter at a focal length of 50 mm (which gives 1.0 times magnification) on a 35 mm full-frame sensor.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 4:</head><p>The result of measuring a 23.9 cm high object at a distance of 1 meter with specific settings As seen from the photo, the height of the face on the image is 1857 px (46.4%) on a horizontal photo with 4000*6000 resolution. On a vertical photo of the same resolution, the facial height is 30.95% of the full height. This gives a point of reference which can be proportionally scaled based on the given (non-full-frame) sensor size and/or focal length.</p><p>To convert the reference measurement to a different focal length on a different format camera sensor, the following formula can be used (see formula 3):</p><formula xml:id="formula_1">h = H * f / 50,<label>(3)</label></formula><p>where h is the new reference height of the face, f is the 35 mm equivalent focal length of the selected lens, and H is the initial reference height of the face (taken via a 50 mm full-frame lens from 1 meter).</p><p>When the reference height of the 23.9 cm face is calculated taking into account the sensor format and the focal length, the distance D to a particular face in the frame can be calculated by comparing the height of the face to the reference height and determining the distance proportionally (see formula 4).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D = h / g (4)</head><p>where D is the distance (in meters) to the face in the frame, g is the height of the face of interest in pixels, and h is the reference height of the face.</p><p>The method described above is simplified. In future work the accuracy can be improved by adding support for lens distortions and other factors that may influence the measurements.</p><p>The second method relies heavily on metadata. The coordinates of the focus point in the metadata indicate the precise area of focus interest in the photo. Since the coordinates and boxes of all faces are available at this point, the face within the area of interest can be selected as the reference. Since the distance to this face is also available from the metadata, we can calculate the proportional distance to the other faces in the frame using formula (4).</p><p>The second method can be reliable if the image does not have stacked, slightly shifted faces, but since this is not guaranteed, the first method can be preferred. However, if the conditions are ideal, the second method will produce more precise results, since it uses more non-estimated measurements. There are other methods of object distance extraction that use reference targets <ref type="bibr" target="#b29">[30]</ref>; however, such methods are not suitable for general photography.</p></div>
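Formulas (3) and (4) can be combined into a short sketch (the 85 mm lens and the 1052 px measured face height are hypothetical example values):

```python
def reference_height_px(h_ref_50mm_px: float, focal_35mm_eq_mm: float) -> float:
    # Formula (3): scale the 1-meter reference face height, measured with
    # a 50 mm full-frame lens, to the actual 35 mm equivalent focal length.
    return h_ref_50mm_px * focal_35mm_eq_mm / 50.0

def face_distance_m(face_height_px: float, h_ref_px: float) -> float:
    # Formula (4): on-sensor face height is inversely proportional to
    # distance, and at 1 meter the face measures h_ref_px pixels.
    return h_ref_px / face_height_px

# Paper's reference: a 23.9 cm face is 1857 px tall at 1 m with a 50 mm
# lens on a full-frame sensor.  With a hypothetical 85 mm lens, a face
# measured at 1052 px comes out at roughly 3 meters.
h = reference_height_px(1857, 85)
d = face_distance_m(1052, h)
```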
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.4.">Calculating the ideal depth of field</head><p>Depth of field (DOF) is the distance between the nearest and the farthest objects that are in acceptably sharp focus in an image.</p><p>To understand whether or not a subject at a specific distance is supposed to be sharp in the photo (based on the camera and lens settings), it is required to first calculate the depth of field using the captured camera and lens settings. The formulas for calculating the distance to the front plane of the focus area and the distance to the back plane of the focus area are as shown in formula <ref type="bibr" target="#b4">(5)</ref>:</p><formula xml:id="formula_2">R1 = R * f^2 / (f^2 − K * f * z + K * R * z); R2 = R * f^2 / (f^2 + K * f * z − K * R * z),<label>(5)</label></formula><p>where R1 is the distance to the front edge of the critical focus plane, R is the distance of focus (can be retrieved from metadata), R2 is the distance to the back edge of the critical focus plane, f is the focal length of the lens in meters (can be retrieved from metadata), K is the f-stop of the lens (can be retrieved from metadata), and z is the circle of confusion (can be retrieved from metadata).</p><p>By subtracting R1 from R2 the depth of field can be found. However, it will be easier to use the raw R1 and R2 values for further calculations.</p></div>
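Formula (5) can be sketched as follows (the f/2.8, 85 mm, 3 m and 0.03 mm circle-of-confusion values are assumed examples, not taken from the paper's data):

```python
import math

def dof_limits(R: float, f: float, K: float, z: float) -> tuple[float, float]:
    """Formula (5): distances to the front (R1) and back (R2) edges of the
    depth of field.  R = focus distance (m), f = focal length (m),
    K = f-stop number, z = circle of confusion (m)."""
    r1 = R * f * f / (f * f - K * f * z + K * R * z)   # front edge
    far_denom = f * f + K * f * z - K * R * z
    # Beyond the hyperfocal distance the back edge extends to infinity.
    r2 = R * f * f / far_denom if far_denom > 0 else math.inf
    return r1, r2

# 85 mm lens at f/2.8 focused at 3 m, full-frame CoC of 0.03 mm:
r1, r2 = dof_limits(3.0, 0.085, 2.8, 0.00003)
depth_of_field = r2 - r1   # roughly 0.2 m for these settings
```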
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.5.">Applying sharpness detection only for faces required to be in focus</head><p>To understand which faces to check for sharpness, it is first required to understand the criterion of expected portrait framing depth (EPFD).</p><p>The EPFD is a generalized assumption of the maximum intended distance between the subjects when taking photos of multiple rows of people at once. For example, a photographer is taking a group portrait photo with 10 people, who are placed in two rows. In this case, the expected portrait framing depth is within 0.8 meters, since there are two rows, each about 0.3 meters deep, plus a margin in between. For such a portrait framing depth, the photographer needs to set a particular aperture (f-stop), so that the depth of field is equal to or greater than the framing depth.</p><p>The EPFD is a very subjective parameter and cannot be calculated by analyzing the photo. That is because it is practically impossible for a computer to determine whether a photo of a group of people was intended to be shot with an open aperture (to have only the front subjects in focus), or whether it was intended to have both rows of people in focus and the low aperture value was chosen by mistake.</p><p>The subjective nature of this parameter leaves this part of the method to be fine-tuned by the end user, who selects one of two options for a collection of photos before the sharpness detection: single/couple styled photos or intended multi-row group photos.</p><p>For the second option, where a group photo (composed in two or more rows) is selected, the faces can be classified as "intended to be sharp" by the following sequence of formulas:</p><p>Given the measured light value (EV) from the metadata, it is required to calculate the maximum aperture at which the EV value would remain the same. The larger the f-stop number, the greater the depth of field, meaning more faces need to be in critically sharp focus. 
The photographer could make the mistake of setting a low aperture, which results in only one row of people being in focus. This is why the maximum aperture value needs to be determined and the photo needs to be analysed as if these "ideal" settings had been set in camera.</p><formula xml:id="formula_3">EV = log2(100 * K^2 / (I * S)),<label>(6)</label></formula><p>where EV is the exposure value, K is the f-stop value, I is the ISO and S is the shutter speed in seconds. Given the maximum acceptable ISO value of the photographer (the greater the ISO, the more noise there is in the photo, so it comes down to personal photographer preference), the maximum value of the aperture that achieves the same light value can be determined as follows:</p><formula xml:id="formula_4">Kmax = sqrt(2^EV * Imax * Smax / 100),<label>(7)</label></formula><p>where Kmax is the maximum f-stop number for the pre-defined EV and ISO, Imax is the maximum ISO value acceptable by the photographer, and Smax is the maximum duration of the shutter speed.</p><p>When operating a camera, the photographer may or may not use automatic settings. This is why in the formula above the shutter speed is not taken from the metadata, but rather calculated as well. The maximum acceptable shutter speed can be calculated as follows:</p><formula xml:id="formula_6">Smax = 1 / Fl,<label>(8)</label></formula><p>where Fl is the focal length used for the photo, taken from the metadata. However, there is an informal rule stating that when shooting portraits, one should not select shutter speeds slower than 0.008 of a second. This may also be an external fine-tuning setting available to the end user.</p><p>Next, the maximum depth of field needs to be calculated. To achieve this, the formulas (5) can be used, inserting the maximum aperture f-stop value (Kmax). 
As a result, R1 (distance to the front edge of the critical focus plane) and R2 (distance to the back edge of the critical focus plane) with the "ideal" camera settings can be found.</p><p>Given the calculated distance to the subject's face, the criterion of focus intention can be formulated as follows: the face of a subject was intended to be in critically sharp focus if the distance D satisfies the condition:</p><formula xml:id="formula_7">D ≤ R2<label>(9)</label></formula></div>
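Formulas (6)-(8) can be chained as in the following sketch (the metered settings and the ISO limit of 3200 are hypothetical examples; the 1/125 s portrait floor is applied on top of formula (8)):

```python
import math

def exposure_value(K: float, I: float, S: float) -> float:
    # Formula (6): EV = log2(100 * K^2 / (I * S)).
    return math.log2(100.0 * K * K / (I * S))

def max_shutter(focal_length_mm: float) -> float:
    # Formula (8), capped by the informal portrait rule of 0.008 s.
    return min(1.0 / focal_length_mm, 0.008)

def max_fstop(ev: float, iso_max: float, s_max: float) -> float:
    # Formula (7): largest f-stop number that preserves the metered EV.
    return math.sqrt(2.0 ** ev * iso_max * s_max / 100.0)

# Metered: f/2.8, ISO 200, 1/250 s on an 85 mm lens.  Allowing ISO 3200
# and a 1/125 s shutter lets the photographer stop down to about f/16.
ev = exposure_value(2.8, 200, 1 / 250)
k_max = max_fstop(ev, 3200, max_shutter(85))
```

The resulting k_max is then substituted into formula (5) to obtain the widest "ideal" depth of field, and every face whose distance D satisfies condition (9) is flagged as intended to be sharp.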
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.6.">Sharpness detection only for regions containing the selected faces</head><p>At this stage of the algorithm, the most valuable information is already available: the understanding of which faces in the frame were most likely intended to be in focus. Having this information, the sharpness detection algorithm of choice can be applied to these regions exclusively. This yields very accurate results, as the background and other subjects are not evaluated.</p><p>It is worth mentioning that, having this information, a whole window of possibilities for photo categorization opens up. Not only can the sharpness be evaluated, but also the open state of the eyes (open/half-open/closed) and the preferred facial expressions of focused subjects; even the poses can be classified as appealing or not.</p></div>
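As a sketch of this step, a variance-of-Laplacian score computed over a single cropped face region might look as follows (pure Python for illustration; an OpenCV pipeline would instead call cv2.Laplacian(crop, cv2.CV_64F).var() on each face crop):

```python
def variance_of_laplacian(region: list[list[float]]) -> float:
    """Sharpness score of one grayscale region (e.g. a face bounding box):
    convolve with the 4-neighbour Laplacian kernel and take the variance.
    Background pixels outside the region never influence the score."""
    h, w = len(region), len(region[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Kernel [[0,1,0],[1,-4,1],[0,1,0]] applied at (x, y).
            responses.append(region[y - 1][x] + region[y + 1][x]
                             + region[y][x - 1] + region[y][x + 1]
                             - 4 * region[y][x])
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

# A crisp vertical edge scores high; a flat (defocused) patch scores 0.
sharp_patch = [[0, 0, 255, 255]] * 4
flat_patch = [[128] * 4] * 4
```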
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experiment</head><p>The purpose of the following experiment is to demonstrate how the Laplacian and FFT sharpness evaluation algorithms relate to different factors of the photo. Revealing these dependencies will lead to conclusions about the steps needed to improve the effectiveness of the algorithms.</p><p>In this experiment the following set of images will be used: 1. a reference representation of a completely blurry photo; 2. a reference representation of a photo that is very sharp in all parts (no background blurriness); 3. photos which contain sharp and blurred parts due to the aperture effect. The first and second images are used to establish the extreme maximum and minimum sharpness results both algorithms provide.</p><p>The other 7 test photos contain human subjects and have the same resolution and identical lighting conditions for optimal evaluation results. No colour or sharpness corrections were applied before the experiment. All photos from the experiment are similar to Figure <ref type="figure" target="#fig_0">1</ref> and contain the same subjects and scenes, but in different configurations.</p><p>The following Table <ref type="table" target="#tab_1">2</ref> verbally describes the photos that were used for the experiment.</p><p>The first photo was made with a very closed aperture (f22) and has an extremely "busy" foreground, which leads to extreme sharpness values by the Laplacian method. The second photo is fully out of focus and results in very low values from both algorithms. No conclusions can be made from these results yet. The following Table <ref type="table" target="#tab_2">3</ref> presents the analysis results for the main 7 photos (provided in the Laplacian and FFT columns), alongside additional calculated parameters that will be needed for the dependency analysis. 
It should be noted that the Laplacian and FFT methods produce values that do not relate to each other and thus should not be compared directly. The "background quantity" is the percentage value of how much of the photo is taken up by the background. The "total subject size" is the opposite value, describing how much of the photo's area is taken up by the subjects.</p><p>The next phase of the experiment includes utilizing parts of the improved algorithm that was described in the previous section. Within this experiment, the dependencies which lead to unstable results were determined. One of the main parts of the described algorithm is face detection. The following test was conducted on a set of images that were previously cut to contain only faces. Images that contained two subjects are represented by two individual cut images.</p><p>The experiment conditions and results are displayed in Table <ref type="table" target="#tab_2">3</ref>. The test images are faces extracted from the frame (see Figure <ref type="figure" target="#fig_2">5</ref>, photos by O. Dolhanenko) and the test results are stated in Table <ref type="table" target="#tab_3">4</ref>. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Results</head><p>By plotting the sharpness values obtained during the experiment against the background quantity, the following chart can be obtained (see Figure <ref type="figure" target="#fig_3">6</ref>). As seen from the chart, a clear linear trendline describes the relation of the quantity of background to the calculated sharpness (for both the Laplacian and FFT methods). This means that the more background is visible in a photo (even if the subject is in critically sharp focus), the less likely the photo will be classified as sharp.</p><p>By plotting the sharpness values obtained during the experiment against the f-stop values (see Figure <ref type="figure" target="#fig_4">7</ref>), the following chart can be obtained. The relation on this chart is not as obvious as the previous one, since the data set is rather small, but even here a linear trendline for both the Laplacian and DFT methods is visible. This trend implies that the higher the f-stop value, the more likely the photo will be classified as sharp.</p><p>Having analyzed both relations, it can be stated that the results of the DFT and Laplacian methods are dependent on the scene and camera settings, which means that they are not reliable for generic automatic photo filtering. Furthermore, comparing the average Laplacian sharpness value for the subject photos (45.12) to the average sharpness value for the 2 reference photos (325.54) reveals a major difference. This difference demonstrates the source of errors that can appear from selecting an incorrect reference value. The "Image(s) name" is the name of the analyzed photo. If the photo contained multiple subjects, it was split into separate images that are visually merged in the table.</p><p>The "Expectation" is an unbiased subjective rating of image quality and usability given by the photographer, where 0 is unusable (blurry) and 1 is usable (sharp). 
The "Laplacian" and "FFT" columns are the actual results of the sharpness evaluation. The "Laplacian normalized" and "FFT normalized" columns are conversions of the actual results to a 0-1 scale, where 1 is the maximum value in the algorithm's output. This way the normalized values of different algorithms are comparable to the "Expectation" values. The "time" columns represent the time taken to process the image in milliseconds.</p><p>After the data was collected, the normalized columns were analyzed for visual trends. To fulfil the first objective of the experiment, the sorted input values trend (the expectation) must match the trend of the sorted normalized output values. By viewing the plotted results (see Figure <ref type="figure" target="#fig_5">8</ref>) it can be stated that both algorithms generally fulfil this requirement.</p><p>Only one data point is an outlier: either it was unnaturally classified as sharp (by both algorithms), or its subjective usability value was defined inaccurately.</p><p>The second objective of the experiment was to determine which algorithm is more stable and should be used in the actual implementation. Comparing the two charts (see Figure <ref type="figure" target="#fig_5">8</ref>), the FFT algorithm yields a more linear trendline than the Laplacian one. Moreover, the Laplacian method produces very unbalanced results. The third and final objective was to determine which algorithm works faster. By comparing the average speed of both algorithms (17.7 ms for the Laplacian and 241.25 ms for the FFT), it can be concluded that the FFT method is roughly 14 times slower.</p></div>
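The 0-1 normalization used in Table 4 is a straightforward scaling by the maximum output value; a minimal sketch using the Laplacian averages from the table:

```python
def normalize(scores: list[float]) -> list[float]:
    # Scale raw sharpness scores so the maximum maps to 1.0, making the
    # outputs of different algorithms comparable to the 0-1 expectations.
    top = max(scores)
    return [s / top for s in scores]

# Laplacian average scores from Table 4, sorted by expectation:
laplacian_avg = [23.2832, 78.0437, 92.2698, 72.2417,
                 203.0986, 213.9522, 221.8970, 259.2589]
normalized = normalize(laplacian_avg)
# The first value maps to ~0.0898 and the last to 1.0, as in the table.
```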
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Discussions</head><p>Two methods for image sharpness evaluation were analyzed and compared: FFT (Fast Fourier Transform) and the variance of Laplacian. During the first experiment it was determined that both algorithms produce results that are highly dependent on the photo scene, composition and camera settings; photos with large quantities of background (produced by a low f-stop value) were classified as blurry.</p><p>An improved algorithm was proposed which is based on subject detection <ref type="bibr" target="#b30">[31,</ref><ref type="bibr" target="#b31">32]</ref> and further sharpness evaluation performed exclusively on the subject box. This way the dependencies were eliminated and more natural evaluation results were obtained.</p><p>The improved algorithm uses the photo metadata to simplify calculations, if the relevant data "shortcuts" are available. In some cases, the metadata may contain the detected face coordinates, which greatly optimizes the performance of the algorithm. The method is built on the principle of data analysis and does not require much user input to function properly. The only parameter that cannot be obtained automatically is the intended photography style: whether the user was shooting groups of people intending to have everyone in focus, or intended to focus on only one or two subjects in the frame while isolating the rest of the background. Another parameter that the user may input is the maximum acceptable ISO value. This parameter has a default value but is very subjective and so depends on user preference.</p><p>The goals of the experiment were to identify whether the improved algorithm results meet the expectation and to determine which of the sharpness evaluation methods works best for the task. As a result, it was found that the improved algorithm meets the expectations and works best with the FFT sharpness evaluation method, as it produces more linear trending results. 
The Laplacian method, however, is 14 times faster and may be preferred for very large datasets, where accuracy is not as important as time efficiency.</p><p>The results of the research were used to implement a software solution prototype for automated image file sorting by subject sharpness (see Figure <ref type="figure" target="#fig_6">9</ref>).</p><p>The resulting software was configured to operate with JPG files, built primarily using Kotlin and OpenCV, and deployed and run in a macOS environment. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Conclusions</head><p>The purpose of this research was to analyze the existing methods of image sharpness evaluation and suggest improvements for the use case of sharpness evaluation of photos of people. The intended end result is an automated photo sorting solution, which classifies images from sharp and usable to blurry and unusable.</p><p>The comparison of the FFT and variance of Laplacian methods for image sharpness evaluation showed that their accuracy is affected by the photo's scene, composition, and camera settings, leading to misclassification of photos with large background areas as blurry.</p><p>An enhanced algorithm leveraging subject detection for focused sharpness evaluation showed significant improvements by removing biases and yielding more accurate results. It utilizes photo metadata to streamline processing, especially when such metadata includes the coordinates of detected faces, enhancing performance efficiency. The method minimizes user input, requiring only the photography style and an optional maximum acceptable ISO value to adjust for personal preference. Final testing confirmed the algorithm's effectiveness, particularly in conjunction with the FFT method for its linear results, though the faster Laplacian method may be chosen for large datasets where speed trumps precision.</p><p>The results of the research and experiments have been used to implement a prototype of an automation system for sorting photos by subject sharpness, detecting unwanted motion within the image and subject blurriness.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Example of a photo with a shallow DOF</figDesc><graphic coords="3,218.73,149.38,159.02,101.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: The Laplacian kernel for convolution with the source image</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Some examples of the test images used for the experiment</figDesc><graphic coords="13,156.45,72.00,297.62,63.10" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Results of both algorithms plotted against the background quantity</figDesc><graphic coords="13,175.25,268.07,245.75,191.45" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: Results of both algorithms plotted against the f-stop value</figDesc><graphic coords="13,185.52,565.82,225.44,189.65" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 8 :</head><label>8</label><figDesc>Figure 8: The result of the second experiment with sharpness expectations</figDesc><graphic coords="15,62.97,72.00,470.55,199.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 9 :</head><label>9</label><figDesc>Figure 9: A software solution for automatic sorting based on subject sharpness</figDesc><graphic coords="16,86.20,72.00,452.50,156.30" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 Partial, most relevant information extracted from the test photo metadata</head><label>1</label><figDesc></figDesc><table><row><cell>Parameter Name</cell><cell>Value</cell></row><row><cell>Lens Spec</cell><cell>FE 70-200mm F4 G OSS</cell></row><row><cell>Min Focal Length</cell><cell>70.0 mm</cell></row><row><cell>Max Focal Length</cell><cell>200.0 mm</cell></row><row><cell>Focal Length</cell><cell>159.0 mm (35 mm equivalent: 159.0 mm)</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 Description of the test images</head><label>2</label><figDesc></figDesc><table><row><cell>Image</cell><cell>Verbal description</cell></row><row><cell>name</cell><cell></cell></row><row><cell>1.JPG</cell><cell>A sharp photo containing one in-focus subject which takes up 24.7% of the frame</cell></row><row><cell>2.JPG</cell><cell>A sharp photo containing one in-focus subject which takes up 37.5% of the frame</cell></row><row><cell>3.JPG</cell><cell>A sharp photo containing two in-focus subjects, high aperture (background blurriness is</cell></row><row><cell></cell><cell>high), collectively taking up 38.38% of the frame.</cell></row><row><cell>4.JPG</cell><cell>A sharp photo containing one in-focus and one out-of-focus subject, collectively taking</cell></row><row><cell></cell><cell>up 36.09% of the frame</cell></row><row><cell>5.JPG</cell><cell>A sharp photo containing two in-focus subjects, low aperture (background blurriness is</cell></row><row><cell></cell><cell>low), taking up 33.13% of the frame</cell></row><row><cell>6.JPG</cell><cell>A sharp photo containing one in-focus and one in-motion subject, collectively taking up</cell></row><row><cell></cell><cell>37.87% of the frame</cell></row><row><cell>7.JPG</cell><cell>A sharp photo containing one in-focus and one out-of-focus subject, collectively taking</cell></row><row><cell></cell><cell>up 57.24% of the frame</cell></row><row><cell>8.JPG</cell><cell>A very "busy" and sharp photo, shot with a closed aperture (f22)</cell></row><row><cell>9.JPG</cell><cell>A completely blurry, out-of-focus photo of the same scene as in 8.JPG</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 Parameters and results of the experiment with the main photos</head><label>3</label><figDesc></figDesc><table><row><cell cols="3">Image Name Laplacian FFT</cell><cell>Background</cell><cell>Total subjects size</cell><cell>f-stop value</cell></row><row><cell></cell><cell></cell><cell></cell><cell>quantity (%)</cell><cell>from photo (%)</cell><cell></cell></row><row><cell>1.JPG</cell><cell>33.84</cell><cell>20.89</cell><cell>75.22</cell><cell>24.78</cell><cell>4</cell></row><row><cell>2.JPG</cell><cell>77.03</cell><cell>13.39</cell><cell>62.475</cell><cell>37.525</cell><cell>4</cell></row><row><cell>3.JPG</cell><cell>60.63</cell><cell>40.95</cell><cell>61.62</cell><cell>38.38</cell><cell>20</cell></row><row><cell>4.JPG</cell><cell>23.97</cell><cell>25.53</cell><cell>63.91</cell><cell>36.09</cell><cell>4</cell></row><row><cell>5.JPG</cell><cell>36.76</cell><cell>35.02</cell><cell>66.87</cell><cell>33.13</cell><cell>4</cell></row><row><cell>6.JPG</cell><cell>44.06</cell><cell>32.12</cell><cell>62.129</cell><cell>37.871</cell><cell>4</cell></row><row><cell>7.JPG</cell><cell>39.57</cell><cell>21.24</cell><cell>42.76</cell><cell>57.24</cell><cell>2.8</cell></row><row><cell>8.JPG</cell><cell>647.4</cell><cell>64.066</cell><cell>7.1</cell><cell>92.9</cell><cell>22</cell></row><row><cell>9.JPG</cell><cell>3.71</cell><cell>-6.80</cell><cell>100</cell><cell>0</cell><cell>2.8</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 Results of the experiment for face-only evaluation</head><label>4</label><figDesc></figDesc><table><row><cell>Image(s) name</cell><cell cols="2">Expectation (1 = sharp, 0 = blurry) Laplacian</cell><cell>Laplacian avg</cell><cell>Laplacian normalized</cell><cell>Laplacian time (ms)</cell><cell>FFT</cell><cell>FFT avg</cell><cell>FFT normalized</cell><cell>FFT time (ms)</cell></row><row><cell cols="2">10-2.JPG 0 10.JPG</cell><cell>7.1056 39.4608</cell><cell cols="2">23.2832 0.0898</cell><cell>7 10</cell><cell cols="2">-4.2154 7.1930 18.6013</cell><cell>0.2517</cell><cell>132 155</cell></row><row><cell>4-2.JPG 4.JPG</cell><cell>0.2</cell><cell>7.0557 149.0316</cell><cell cols="2">78.0437 0.3010</cell><cell>12 10</cell><cell>2.1674 22.1968</cell><cell cols="2">12.1821 0.4263</cell><cell>233 96</cell></row><row><cell>3-2.JPG 3.JPG</cell><cell>0.3</cell><cell cols="3">45.4098 92.2698 0.3559 139.1298</cell><cell>5 13</cell><cell cols="3">13.8346 18.2867 0.6400 22.7388</cell><cell>75 201</cell></row><row><cell>7-2.JPG 7.JPG</cell><cell>0.4</cell><cell>6.5819 137.9015</cell><cell cols="2">72.2417 0.2786</cell><cell>18 40</cell><cell>9.8695 26.3808</cell><cell cols="2">18.1252 0.6343</cell><cell>353 589</cell></row><row><cell>6-2.JPG 6.JPG</cell><cell>0.4</cell><cell cols="3">373.5180 203.0986 0.7834 32.6791</cell><cell>11 7</cell><cell cols="3">26.3782 19.9107 0.6968 13.4432</cell><cell>104 102</cell></row><row><cell>5-2.JPG 5.JPG</cell><cell>0.7</cell><cell cols="3">80.4655 213.9522 0.8252 347.4389</cell><cell>6 8</cell><cell cols="3">17.0262 21.8001 0.7630 26.5740</cell><cell>50 133</cell></row><row><cell>1.JPG</cell><cell>0.8</cell><cell cols="3">221.8970 221.8970 0.8559</cell><cell>52</cell><cell cols="3">26.2262 26.2262 0.9179</cell><cell>277</cell></row><row><cell>2.JPG</cell><cell>1</cell><cell cols="3">259.2589 259.2589 1.0000</cell><cell>49</cell><cell cols="3">28.5732 28.5732 1.0000</cell><cell>879</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Synthetic depth-of-field with a single-camera mobile phone</title>
		<author>
			<persName><forename type="first">N</forename><surname>Wadhwa</surname></persName>
		</author>
		<idno type="DOI">10.1145/3197517.3201329</idno>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Graphics</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page">64</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">No-reference image blur assessment using multiscale gradient</title>
		<author>
			<persName><forename type="first">M.-J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C</forename><surname>Bovik</surname></persName>
		</author>
		<idno type="DOI">10.1186/1687-5281-2011-3</idno>
	</analytic>
	<monogr>
		<title level="j">J Image Video Proc</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<date type="published" when="2011">2011. 2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Image information and visual quality</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">R</forename><surname>Sheikh</surname></persName>
		</author>
		<idno type="DOI">10.1109/tip.2005.859378</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Image Processing</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="430" to="444" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Blind Image Quality Assessment Using a Deep Bilinear Convolutional Neural Network</title>
		<author>
			<persName><forename type="first">W</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<idno type="DOI">10.1109/TCSVT.2018.2886771</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Circuits and Systems for Video Technology</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="36" to="47" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Image Sharpness Assessment Based on Local Phase Coherence</title>
		<author>
			<persName><forename type="first">S</forename><surname>Chattopadhyay</surname></persName>
		</author>
		<idno type="DOI">10.48047/nq.2022.20.5.nq22800</idno>
	</analytic>
	<monogr>
		<title level="j">NeuroQuantology</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">5</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Image Sharpness Assessment Based on Local Phase Coherence</title>
		<author>
			<persName><forename type="first">R</forename><surname>Hassen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M A</forename><surname>Salama</surname></persName>
		</author>
		<idno type="DOI">10.1109/TIP.2013.2251643</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Image Processing</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="issue">7</biblScope>
			<biblScope unit="page" from="2798" to="2810" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">An Improved Method for Evaluating Image Sharpness Based on Edge Information</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Hong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Gan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chen</surname></persName>
		</author>
		<idno type="DOI">10.3390/app12136712</idno>
	</analytic>
	<monogr>
		<title level="j">Applied Sciences</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page">6712</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Image sharpness evaluation method based on normal gradient feature</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<idno type="DOI">10.1109/ISRIMT53730.2021.9596808</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 3rd International Symposium on Robotics &amp; Intelligent Manufacturing Technology (ISRIMT)</title>
				<meeting>the 3rd International Symposium on Robotics &amp; Intelligent Manufacturing Technology (ISRIMT)<address><addrLine>Changzhou, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="308" to="314" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Multidirectional gradient neighborhood-weighted image sharpness evaluation algorithm</title>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">Y</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhao</surname></persName>
		</author>
		<idno type="DOI">10.1155/2020/7864024</idno>
	</analytic>
	<monogr>
		<title level="j">Mathematical Problems in Engineering</title>
		<imprint>
			<biblScope unit="volume">2020</biblScope>
			<biblScope unit="page">7864024</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Review: A Survey on Objective Evaluation of Image Sharpness</title>
		<author>
			<persName><forename type="first">M</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Ke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zhi</surname></persName>
		</author>
		<idno type="DOI">10.3390/app13042652</idno>
	</analytic>
	<monogr>
		<title level="j">Applied Sciences</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page">2652</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Network technology for transmission of visual information</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bielievtsov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Ruban</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Smelyakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Sumtsov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Selected Papers of the XVIII International Scientific and Practical Conference &quot;Information Technologies and Security&quot; (ITS 2018)</title>
				<meeting><address><addrLine>Kyiv, Ukraine</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018-11-27">November 27, 2018</date>
			<biblScope unit="volume">2318</biblScope>
			<biblScope unit="page" from="160" to="175" />
		</imprint>
	</monogr>
	<note>CEUR Workshop Proceedings</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Search by Image Engine for Big Data Warehouse</title>
		<author>
			<persName><forename type="first">K</forename><surname>Smelyakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chupryna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Sandrkin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kolisnyk</surname></persName>
		</author>
		<idno type="DOI">10.1109/eStream50540.2020.9108782</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 IEEE Open Conference of Electrical, Electronic and Information Sciences (eStream)</title>
				<meeting>the 2020 IEEE Open Conference of Electrical, Electronic and Information Sciences (eStream)<address><addrLine>Vilnius, Lithuania</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1" to="4" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Effectiveness of Preprocessing Algorithms for Natural Language Processing Applications</title>
		<author>
			<persName><forename type="first">K</forename><surname>Smelyakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Karachevtsev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kulemza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Samoilenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Patlan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chupryna</surname></persName>
		</author>
		<idno type="DOI">10.1109/PICST51311.2020.9467919</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 IEEE International Conference on Problems of Infocommunications. Science and Technology (PIC S&amp;T)</title>
				<meeting>the 2020 IEEE International Conference on Problems of Infocommunications. Science and Technology (PIC S&amp;T)<address><addrLine>Kharkiv, Ukraine</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="187" to="191" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">No-reference image sharpness assessment via difference quotients</title>
		<author>
			<persName><forename type="first">Jiye</forename><surname>Qian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hengjun</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jin</forename><surname>Fu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wei</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jide</forename><surname>Qian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Xiao</surname></persName>
		</author>
		<idno type="DOI">10.1117/1.JEI.28.1.013032</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Electronic Imaging</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="issue">1</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">An Effective Sharpness Assessment Method For Shallow Depth-Of-Field Images</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Duan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Fan</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICIP42928.2021.9506498</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP)</title>
				<meeting>the 2021 IEEE International Conference on Image Processing (ICIP)<address><addrLine>Anchorage, AK, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="1449" to="1453" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Research of Image Sharpness Assessment Algorithm for Autofocus</title>
		<author>
			<persName><forename type="first">L</forename><surname>Her</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Yang</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICIVC47709.2019.8980980</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 IEEE 4th International Conference on Image, Vision and Computing (ICIVC)</title>
				<meeting>the 2019 IEEE 4th International Conference on Image, Vision and Computing (ICIVC)<address><addrLine>Xiamen, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="93" to="98" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Predicate Clustering Method and its Application in the System of Artificial Intelligence</title>
		<author>
			<persName><forename type="first">I</forename><surname>Kyrychenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Tereshchenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Proniuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Geseleva</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 7th International Conference on Computational Linguistics and Intelligent Systems</title>
				<meeting>the 7th International Conference on Computational Linguistics and Intelligent Systems (COLINS 2023)</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="395" to="406" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Image Quality Assessment for Realistic Zoom Photos</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Zhai</surname></persName>
		</author>
		<idno type="DOI">10.3390/s23104724</idno>
	</analytic>
	<monogr>
		<title level="j">Sensors</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="issue">10</biblScope>
			<biblScope unit="page">4724</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Automatic detection of blurred images in UAV image sets</title>
		<author>
			<persName><forename type="first">T</forename><surname>Sieberth</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.isprsjprs.2016.09.010</idno>
	</analytic>
	<monogr>
		<title level="j">ISPRS Journal of Photogrammetry and Remote Sensing</title>
		<imprint>
			<biblScope unit="volume">122</biblScope>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Bridging the Gap Between Imaging Performance and Image Quality Measures</title>
		<author>
			<persName><forename type="first">E</forename><surname>Fry</surname></persName>
		</author>
		<idno type="DOI">10.2352/issn.2470-1173.2018.12.iqsp-231</idno>
	</analytic>
	<monogr>
		<title level="j">Electronic Imaging</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Contrast sensitivity in images of natural scenes</title>
		<author>
			<persName><forename type="first">S</forename><surname>Triantaphillidou</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.image.2019.03.002</idno>
	</analytic>
	<monogr>
		<title level="j">Signal Processing: Image Communication</title>
		<imprint>
			<biblScope unit="volume">75</biblScope>
			<biblScope unit="page">67</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Edge Detection Techniques for Quantifying Spatial Imaging System Performance and Image Quality</title>
		<author>
			<persName><forename type="first">O</forename><surname>Van Zwanenberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Triantaphillidou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Jenkin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Psarrou</surname></persName>
		</author>
		<idno type="DOI">10.1109/cvprw.2019.00238</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)</title>
				<meeting>the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)<address><addrLine>Long Beach, CA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="1871" to="1879" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Perceptual blur and ringing metrics: application to JPEG2000</title>
		<author>
			<persName><forename type="first">P</forename><surname>Marziliano</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.image.2003.08.003</idno>
	</analytic>
	<monogr>
		<title level="j">Signal Processing: Image Communication</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="163" to="172" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Image Quality Assessment: From Error Visibility to Structural Similarity</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<idno type="DOI">10.1109/tip.2003.819861</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Image Processing</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="600" to="612" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">A framework for measuring sharpness in natural images captured by digital cameras based on reference image and local areas</title>
		<author>
			<persName><forename type="first">M</forename><surname>Nuutinen</surname></persName>
		</author>
		<idno type="DOI">10.1186/1687-5281-2012-8</idno>
	</analytic>
	<monogr>
		<title level="j">EURASIP Journal on Image and Video Processing</title>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Method of image quality assessment based on region of interest</title>
		<author>
			<persName><forename type="first">W</forename><surname>Yang</surname></persName>
		</author>
		<idno type="DOI">10.3724/sp.j.1087.2008.01310</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Computer Applications</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="1310" to="1312" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Face Detection Method Based on Cascaded Convolutional Networks</title>
		<author>
			<persName><forename type="first">R</forename><surname>Qi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R. -S</forename><surname>Jia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q. -C</forename><surname>Mao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H. -M</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L. -Q</forename><surname>Zuo</surname></persName>
		</author>
		<idno type="DOI">10.1109/access.2019.2934563</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="110740" to="110748" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Video distance measurement based on focus</title>
		<author>
			<persName><forename type="first">H</forename><surname>Eugen</surname></persName>
		</author>
		<idno type="DOI">10.1109/epe50722.2020.9305593</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 International Conference and Exposition on Electrical And Power Engineering (EPE)</title>
				<meeting>the 2020 International Conference and Exposition on Electrical And Power Engineering (EPE)<address><addrLine>Iasi, Romania</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page">65</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Analysis of focus measure operators for shape-from-focus</title>
		<author>
			<persName><forename type="first">S</forename><surname>Pertuz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Puig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Garcia</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.patcog.2012.11.011</idno>
	</analytic>
	<monogr>
		<title level="j">Pattern Recognition</title>
		<imprint>
			<biblScope unit="volume">46</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="1415" to="1432" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">A Method for Distance Measurement of Moving Objects in a Monocular Image</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<idno type="DOI">10.1109/SIPROCESS.2018.8600495</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 3rd International Conference on Signal and Image Processing (ICSIP)</title>
				<meeting>the 3rd International Conference on Signal and Image Processing (ICSIP)<address><addrLine>Shenzhen, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="245" to="249" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Local Feature Extraction in Images</title>
		<author>
			<persName><forename type="first">A</forename><surname>Sergiyenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Serhiienko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Orlova</surname></persName>
		</author>
		<idno type="DOI">10.20535/2708-4930.2.2021.244191</idno>
	</analytic>
	<monogr>
		<title level="j">Information, Computing and Intelligent systems</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Improved covariant local feature detector</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Huo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.patrec.2020.03.027</idno>
	</analytic>
	<monogr>
		<title level="j">Pattern Recognition Letters</title>
		<imprint>
			<biblScope unit="volume">135</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
