<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Web Framework for Improving Deepfake Images</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oleh Pitsun</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>West Ukrainian National University</institution>
          ,
          <addr-line>11 Lvivska st., Ternopil, 46001</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Improving the quality of deepfake images is an important condition for their effective detection, since artifacts, noise, and blur can hide key signs of forgery. Optimizing visual quality preserves critical textural and structural features, which increases the accuracy of detection algorithms. OpenForensics was chosen as the dataset. The training sample was divided into training, testing, and validation subsets; the sample size is 500 images. In this paper, a web framework is proposed for improving the quality of human face images, with the ability to generate dataset parameters, which allows a universal approach to forming training parameters. Improving image quality helps to better detect deepfake images. The accuracy of improving image quality using neural networks is 91%.</p>
      </abstract>
      <kwd-group>
        <kwd>Deepfake</kwd>
        <kwd>image</kwd>
        <kwd>quality improvement</kwd>
        <kwd>CNN</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The aim of the research is to develop, implement, and evaluate the effectiveness of a framework
that improves the quality of deepfake images through adaptive reconstruction of
structural and textural characteristics using convolutional neural networks.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Problem statements</title>
      <p>To develop a comprehensive system, the following tasks must be implemented:
- develop the main framework modules for generating dataset parameters and improving image
quality;</p>
      <p>- develop and implement a convolutional network architecture with recurrent connections to
improve the quality of deepfake images;</p>
      <p>- conduct research and analyze the results by metrics.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Literature review</title>
      <p>The influence of scientists on the development and analysis of image quality enhancement
technologies using neural networks is fundamental and multifaceted. Thanks to scientific research,
this field has rapidly evolved from simple filtering methods to complex deep architectures capable of
restoring images with high accuracy. In work [1], to achieve the goal of the study, a dataset of
1063 ultrasound images was created by artificially degrading the quality of the original high-quality
images. The images were evaluated by doctors, and the obtained scores were averaged to form labels.
After data cleaning, 478 images were selected for training and testing models. To build a quality
assessment system, a deep convolutional neural network and a residual network were used, and a
transfer learning strategy was also applied, which made it possible to improve training under conditions of
limited data volume. The test results demonstrated that the proposed CNN approach is effective and
suitable for quantitative assessment of the quality of ultrasound images.</p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref1">2</xref>
        ], a new algorithm for assessing the quality of screen images is proposed: instead of
using all patches, it selects only those closest to subjective assessments, which increases the
accuracy of the model compared to traditional CNN approaches at the patch level. In [
        <xref ref-type="bibr" rid="ref2">3</xref>
        ], a new
architecture Distortion-Aware CNN (DACNN) is proposed for blind image quality assessment, which
effectively predicts the quality of both synthetically and authentically distorted images, thanks to
the distortion type recognition module, adaptive fusion of their features and an accurate quality
prediction module, which is confirmed by experiments on eight different databases. In [
        <xref ref-type="bibr" rid="ref3">4</xref>
        ], the
PW360IQA model is proposed - a multi-channel convolutional neural network with perceptual
weighting for assessing the quality of 360-degree images, which takes into account the properties of
human vision, viewing trajectories and the probability of distortions, providing high accuracy with
reduced computational complexity compared to current approaches. In [
        <xref ref-type="bibr" rid="ref4">5</xref>
        ], the Image Quality
Transformer (IQT) model is presented, which combines a CNN backbone with a transformer
architecture for full-reference image quality assessment, providing high accuracy on standard and
generative datasets, and showing the best results at the NTIRE 2021 competition. In [
        <xref ref-type="bibr" rid="ref5">6</xref>
        ], the TRIQ
architecture is proposed, which combines a convolutional neural network with a lightweight
transformer encoder and adaptive position coding for image quality assessment, demonstrating high
results on public datasets. In [
        <xref ref-type="bibr" rid="ref6">7</xref>
        ], an improved deep convolutional neural network (D-CNN)
architecture for deepfake detection is presented, which effectively processes inter-frame differences,
demonstrates high accuracy (up to 99.33%) on seven different datasets, and provides good
generalization by training on images from different sources.
      </p>
      <p>
        The study [
        <xref ref-type="bibr" rid="ref7">8</xref>
        ] presents an overview of modern deepfake detection methods based on deep
learning, with a focus on video, image, audio and multi-format recognition, where the most common
approach remains the use of convolutional neural networks, and the main focus of most works is on
increasing accuracy.
      </p>
      <p>
        The paper [
        <xref ref-type="bibr" rid="ref8">9</xref>
        ] proposes an efficient CNN model for detecting deepfake videos, which combines
manual feature distillation, target region extraction, augmentation and frame ensemble, which
provides high accuracy with reduced model complexity and demonstrates competitive results on the
DFDC and Celeb-DF v2 datasets. The paper [
        <xref ref-type="bibr" rid="ref9">10</xref>
        ] reviews modern technologies for creating and
detecting deepfake videos, in particular with facial manipulation in videos, and proposes a
generalized approach to their detection using machine learning and computer vision, using current
tools and large datasets such as DFDC. The performance of using GPUs is considered in the paper
[
        <xref ref-type="bibr" rid="ref10">11</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Dataset</title>
      <p>
        The dataset selected is OpenForensics [
        <xref ref-type="bibr" rid="ref11">12</xref>
        ] (https://zenodo.org/records/5528418#.YpdlS2hBzDd).
This dataset contains real and fake images with a high level of processing complexity. It
is designed with multi-face annotations specifically for face forgery detection. The dataset size is
more than 5000 images of 256 × 256 pixels. An example of the dataset is shown in Figure 1.
      </p>
      <p>Figure 1: a) real image; b) fake image.</p>
      <p>The training sample was divided into training, testing, and validation subsets. The sample size is
500 images.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Formation of dataset parameters</title>
      <p>One of the key aspects is the process of generating a dataset, so our task was to develop an approach
for generating dataset parameters for image data.</p>
      <p>We use the JSON format as a way to store parameters and transfer data over the Internet via a
separate API. JSON is a common format for transferring data over the network between individual
servers.</p>
      <p>The main page consists of a block displaying all studies (i.e. datasets) and a button for creating a
new dataset (Figure 2).
The main elements are:
- Path to the folder with images;
- Image height;
- Image width;
- Image type;
- Crop;
- Single image selection;</p>
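      <p>For illustration, the parameter set above can be serialized as a JSON payload. The Python sketch below uses hypothetical field names; the framework's actual schema is not specified in this paper.</p>

```python
import json

# Hypothetical dataset-parameter payload; the field names below are
# illustrative assumptions, not the framework's actual schema.
params = {
    "path": "https://example.com/datasets/faces.zip",  # folder or archive URL
    "height": 256,           # target image height, px
    "width": 256,            # target image width, px
    "image_type": "png",
    "crop": True,            # split large images into patches
    "single_image": False,   # process a single image instead of a folder
}

payload = json.dumps(params)    # string sent over the API
restored = json.loads(payload)  # parameters recovered on the server side
```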
      <p>The path to the image provides the ability to enter the URL of the folder with files or of the archive
with files.</p>
      <p>The height and width of the images are an important aspect and one of the key factors when choosing
the architecture of the neural network; bringing all the images in the sample to the same format
allows classification without errors and exceptions.</p>
      <p>The "Crop" operation is often used when it is necessary to divide a large image into smaller
portions according to the requirements of the neural network architecture and the capabilities of the
execution environment (computer, cloud storage).</p>
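      <p>The "Crop" operation described above can be sketched in Python as follows; the handling of partial edge patches is an assumption, since the paper does not specify it.</p>

```python
def crop_into_patches(image, patch_h, patch_w):
    # Split a 2-D image (list of rows) into non-overlapping patches.
    # Edge regions smaller than the patch are dropped; this is one common
    # convention and an assumption, not the framework's documented behavior.
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h - patch_h + 1, patch_h):
        for left in range(0, w - patch_w + 1, patch_w):
            patches.append([row[left:left + patch_w]
                            for row in image[top:top + patch_h]])
    return patches

# A 4x4 image split into 2x2 patches yields four patches.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = crop_into_patches(img, 2, 2)
```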
      <p>The software module is implemented using the PHP programming language and the Laravel
framework.</p>
      <p>The module is based on the use of the MVC pattern, which allows you to clearly separate the
controller, model and view of the project.</p>
      <p>A feature of this project is the presence of the FaceDetector class, which is responsible for
processing and highlighting faces in the image, for example, for the task of searching for deepfakes.</p>
      <p>The config.get method has been developed, which allows you to get specific parameter values.
This approach also simplifies the process of training a neural network and retraining by different
scientists, since in this case the training sample will be of the same type, which simplifies the setup
process.</p>
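      <p>A minimal Python sketch of such a config.get mechanism, assuming dot-separated keys; the framework's actual PHP/Laravel implementation may differ.</p>

```python
class Config:
    # Minimal sketch of a parameter store with a get method; dot-separated
    # keys are an assumption, the framework's implementation may differ.
    def __init__(self, values):
        self._values = values

    def get(self, key, default=None):
        # Walk the nested dict one key segment at a time.
        node = self._values
        for part in key.split("."):
            if not isinstance(node, dict) or part not in node:
                return default
            node = node[part]
        return node

config = Config({"dataset": {"height": 256, "width": 256, "crop": True}})
```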
    </sec>
    <sec id="sec-6">
      <title>6. Framework</title>
      <p>Web-based frameworks are structured software platforms that provide a set of tools, libraries, and
templates to facilitate the development of web applications. They define a standard approach to
building applications, reduce the amount of routine code, and facilitate rapid and unified creation of
functionality. The structure of the modules of a web-based deepfake detection framework is shown
in Figure 4.</p>
      <p>The input image processing module is responsible for processing the input image, including specifying
the path to the image folder and other parameters.</p>
      <p>The input image parameter determination module determines the input image parameters and stores
them in the database.</p>
      <p>The architecture and CNN model storage module stores the convolutional neural
network architecture configuration files and the models themselves as files. Ready-made models can
then be used for a specific type of task.</p>
      <p>The dataset parameter generation module is implemented as a web page with fields for specifying
dataset parameters, built using the Laravel framework.</p>
      <p>The automatic image quality improvement module is responsible for directly improving the quality
of images using convolutional neural networks.</p>
      <p>The face detection module highlights faces in images, which allows selecting
only the necessary area of the image and reduces the number of computations.</p>
      <p>Thanks to frameworks, web development becomes more productive, safer and organized, which
makes them an integral part of modern software engineering.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Formation of poor-quality images</title>
      <p>To generate possible poor-quality images, blurring, shadow correction, and salt-and-pepper noise
algorithms were used. An example of a noisy image is shown in Figure 5.
Noise is a key tool in improving the robustness of computer vision models. Different types of noise
affect different aspects of an image: geometry, color, brightness, or texture. Noise analysis and
modeling allow us to assess the robustness of algorithms, improve the quality of preprocessing, and
build more adaptive systems.</p>
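      <p>The salt-and-pepper noise described above can be sketched in Python; the noise probability and seed below are illustrative.</p>

```python
import random

def salt_and_pepper(image, amount, seed=0):
    # Corrupt a grayscale image (list of rows, values 0-255): each pixel is
    # independently replaced by 0 or 255 with probability `amount`.
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    noisy = []
    for row in image:
        out = []
        for px in row:
            if rng.random() >= amount:
                out.append(px)                          # pixel kept unchanged
            else:
                out.append(255 if rng.random() >= 0.5 else 0)  # salt or pepper
        noisy.append(out)
    return noisy

noisy = salt_and_pepper([[128] * 8 for _ in range(8)], amount=0.2)
```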
    </sec>
    <sec id="sec-8">
      <title>8. Convolutional neural network architecture</title>
      <p>Image enhancement is one of the key tasks in computer vision, covering detail restoration, noise
removal, resolution enhancement, and artifact correction.</p>
      <p>Convolutional neural networks (CNNs) have proven to be a powerful tool for solving these tasks
due to their ability to automatically extract and generalize relevant spatial features. Over the past
decade, a number of architectures have been proposed that are specifically adapted to processing
low-quality or corrupted images, including autoencoders, U-Nets, Residual Networks, GAN-based
solutions, etc. Convolutional neural networks fundamentally differ from traditional (classical) image
processing algorithms in terms of their approach to data analysis, flexibility, learning ability, and
adaptation to complex tasks. Traditional algorithms require manual tuning of filters and features
(e.g., edge detectors, gradient histograms, color histograms, etc.), while CNNs learn to extract
relevant features from data by training on a large number of images.</p>
      <p>The proposed architecture of the convolutional neural network is shown in Figure 6. The model has
134,067 parameters in total; all of them are trainable, with no non-trainable parameters.</p>
      <p>The image sample is divided into two parts: test and training. A feature of U-Net networks, and of
convolutional networks in general, is the presence of repeated blocks. Therefore, we represent them as
a set of operations:</p>
      <p>S = {⟨C⟩; ⟨B⟩; ⟨A⟩; ⟨Add⟩}, (1)</p>
      <p>where ⟨C⟩ is the set of convolution functions; ⟨B⟩ is the batch normalization operation; ⟨A⟩ is the set
of activation functions; ⟨Add⟩ is the set of layers of the recurrent block.</p>
      <p>Let us consider the key blocks in more detail.</p>
      <p>Convolution block.</p>
      <p>Formally, the convolution process can be represented as follows:</p>
      <p>(I × h)(x, y) = Σ_{i = −m/2 .. m/2} Σ_{j = −n/2 .. n/2} I(x − i, y − j) · h(i, j), (2)</p>
      <p>where m/2 is half the filter height, n/2 is half the filter length, x is the pixel column position, y is
the pixel row position, I is the input image, and h is the convolution kernel.</p>
      <p>GELU activation function:</p>
      <p>GELU(x) = 0.5x (1 + tanh(√(2/π) · (x + 0.044715x³))). (3)</p>
      <p>GELU allows small negative values when the input is less than zero, providing a richer gradient for
backpropagation.</p>
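      <p>For illustration, the convolution and GELU formulas above can be implemented directly in Python; "valid" border handling is an assumption, since the paper does not specify padding.</p>

```python
import math

def conv2d(image, kernel):
    # Direct 2-D convolution: output pixel (x, y) is the sum of
    # I(x - i, y - j) * h(i, j) over the kernel support. "Valid" borders
    # are assumed (an assumption; the paper does not specify padding).
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for y in range(kh - 1, h):
        row = []
        for x in range(kw - 1, w):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    acc += image[y - j][x - i] * kernel[j][i]
            row.append(acc)
        out.append(row)
    return out

def gelu(x):
    # Tanh approximation of GELU.
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))

# Averaging a constant image leaves it constant.
smoothed = conv2d([[1.0] * 3 for _ in range(3)], [[0.25, 0.25], [0.25, 0.25]])
```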
    </sec>
    <sec id="sec-9">
      <title>9. Experiments</title>
      <p>The performance assessment of convolutional neural networks (CNN), particularly in classification
tasks, is based on quantitative metrics that reflect the accuracy of the model's decision-making.</p>
      <p>Training parameters: number of epochs: 45; steps_per_epoch: 3.</p>
      <p>Accuracy. The Accuracy metric is the basic metric for balanced classes.</p>
      <p>Accuracy = (TP + TN) / (TP + TN + FP + FN), (4)</p>
      <p>where TP is true positive, TN is true negative, FP is false positive, and FN is false negative.</p>
      <p>Accuracy is one of the simplest and most commonly used metrics for evaluating the quality of a
classification model. It shows what proportion of the model's predictions were correct. The value
of Accuracy ranges from 0 to 1 (or from 0% to 100%). The network performance estimates based on
the loss criterion are shown in Figure 8.</p>
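      <p>The Accuracy computation by formula (4) can be sketched in Python; the confusion counts in the example are illustrative, not the experimental values from this paper.</p>

```python
def accuracy(tp, tn, fp, fn):
    # Accuracy by formula (4): the share of correct predictions.
    return (tp + tn) / (tp + tn + fp + fn)

# Illustrative confusion counts (not the paper's experimental values):
score = accuracy(tp=45, tn=46, fp=5, fn=4)  # 91 correct out of 100
```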
      <p>
        An important element for integrating deep learning methods into cloud services is the use of
MLOps [
        <xref ref-type="bibr" rid="ref12">13</xref>
        ] practices. An example of the structure of a Terraform configuration file is shown in Figure
9.
      </p>
      <p>Loss is a key metric in the training process of neural networks, measuring how much the model's
predictions differ from the true labels (correct answers). Unlike metrics such as accuracy or precision,
which are used to evaluate the finished model, loss is used during training to update the network's
weights.</p>
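      <p>As an illustration only, a common reconstruction loss is the mean squared error; the paper does not specify which loss function was used.</p>

```python
def mse_loss(predictions, targets):
    # Mean squared error: the average squared difference between predicted
    # and true values, a common training loss for image reconstruction.
    assert len(predictions) == len(targets)
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)
```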
    </sec>
    <sec id="sec-10">
      <title>Conclusions</title>
      <p>Based on the analytical approach and the principles of developing software modules for web
systems, the main framework modules have been developed to generate dataset parameters and
improve image quality.</p>
      <p>Artificially noisy images have been generated for experiments. A convolutional network
architecture with recurrent connections has been implemented to improve the quality of deepfake
images.</p>
      <p>The accuracy of improving image quality using a neural network is 91%.</p>
    </sec>
    <sec id="sec-11">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref0">
        <mixed-citation>[1] "…based medical ultrasound image quality assessment." Complexity 2021, no. 1 (2021): 9938367.</mixed-citation>
      </ref>
      <ref id="ref1">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Jiang</surname>
            , Xuhao, Liquan Shen, Guorui Feng,
            <given-names>Liangwei</given-names>
          </string-name>
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>and Ping</given-names>
          </string-name>
          <string-name>
            <surname>An</surname>
          </string-name>
          .
          <article-title>"An optimized CNN-based quality assessment model for screen content image."</article-title>
          <source>Signal Processing: Image Communication</source>
          <volume>94</volume>
          (
          <year>2021</year>
          ):
          <fpage>116181</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Pan</surname>
          </string-name>
          et al.,
          <article-title>"DACNN: Blind Image Quality Assessment via a Distortion-Aware Convolutional Neural Network," in IEEE Transactions on Circuits and Systems for Video Technology</article-title>
          , vol.
          <volume>32</volume>
          , no.
          <issue>11</issue>
          , pp.
          <fpage>7518</fpage>
          -
          <lpage>7531</lpage>
          , Nov.
          <year>2022</year>
          , doi: 10.1109/TCSVT.2022.3188991.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Sendjasni</surname>
          </string-name>
          , Abderrezzaq, and
          <string-name>
            <surname>Mohamed-Chaker Larabi</surname>
          </string-name>
          .
          <year>2023</year>
          .
          <article-title>"PW-360IQA: PerceptuallyWeighted Multichannel CNN for Blind 360-Degree Image Quality Assessment" Sensors 23, no. 9: 4242</article-title>
          . https://doi.org/10.3390/s23094242
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Cheon</surname>
          </string-name>
          , Manri,
          <string-name>
            <surname>Sung-Jun</surname>
            <given-names>Yoon</given-names>
          </string-name>
          , Byungyeon Kang, and
          <string-name>
            <given-names>Junwoo</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>"Perceptual image quality assessment with transformers."</article-title>
          <source>In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition</source>
          , pp.
          <fpage>433</fpage>
          -
          <lpage>442</lpage>
          .
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>You</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Korhonen</surname>
          </string-name>
          ,
          <article-title>"Transformer For Image Quality Assessment,"</article-title>
          <source>2021 IEEE International Conference on Image Processing (ICIP)</source>
          , Anchorage, AK, USA,
          <year>2021</year>
          , pp.
          <fpage>1389</fpage>
          -
          <lpage>1393</lpage>
          , doi: 10.1109/ICIP42928.2021.9506075.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Patel</surname>
          </string-name>
          et al.,
          <article-title>"An Improved Dense CNN Architecture for Deepfake Image Detection,"</article-title>
          <source>in IEEE Access</source>
          , vol.
          <volume>11</volume>
          , pp.
          <fpage>22081</fpage>
          -
          <lpage>22095</lpage>
          ,
          <year>2023</year>
          , doi: 10.1109/ACCESS.2023.3251417.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Heidari</surname>
            , Arash, Nima Jafari Navimipour, Hasan Dag, and
            <given-names>Mehmet</given-names>
          </string-name>
          <string-name>
            <surname>Unal</surname>
          </string-name>
          .
          <article-title>"Deepfake detection using deep learning methods: A systematic and comprehensive review."</article-title>
          <source>Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery</source>
          <volume>14</volume>
          , no.
          <issue>2</issue>
          (
          <year>2024</year>
          ):
          <fpage>e1520</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <surname>Van-Nhan</surname>
          </string-name>
          ,
          <string-name>
            <surname>Suk-Hwan</surname>
            <given-names>Lee</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoanh-Su Le</surname>
          </string-name>
          , and
          <string-name>
            <surname>Ki-Ryong Kwon</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>"High Performance DeepFake Video Detection on CNN-Based with Attention Target-Specific Regions and Manual Distillation Extraction"</article-title>
          <source>Applied Sciences</source>
          <volume>11</volume>
          , no.
          <issue>16</issue>
          : 7678. https://doi.org/10.3390/app11167678
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Pitsun</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poperechna</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Savanets</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Melnyk</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halunka</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          .
          <article-title>Comparative Analysis of CNN Architecture for Emotion Classification on Human Faces</article-title>
          .
          <source>Ceur Workshop Proceedings</source>
          ,
          <year>2024</year>
          ,
          <volume>3716</volume>
          , pp.
          <fpage>46</fpage>
          <lpage>55</lpage>
          https://ceur-ws.org/Vol-3716/paper4.pdf
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>O.</given-names>
            <surname>Berezsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Pitsun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dubchak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liashchynskyi</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Liashchynskyi</surname>
          </string-name>
          ,
          <article-title>"GPU-based biomedical image processing," 2018 XIV-th International Conference on Perspective Technologies and Methods in MEMS Design (MEMSTECH), Lviv</article-title>
          , Ukraine,
          <year>2018</year>
          , pp.
          <fpage>96</fpage>
          -
          <lpage>99</lpage>
          , doi: 10.1109/MEMSTECH.2018.8365710.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Trung-Nghia</given-names>
          </string-name>
          et al.
          <article-title>"OpenForensics: Large-Scale Challenging Dataset For Multi-Face Forgery Detection And Segmentation In-The-Wild."</article-title>
          <source>Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)</source>
          ,
          <year>2021</year>
          . https://doi.org/10.5281/zenodo.5528418
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Berezsky</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pitsun</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Melnyk</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , ...
          <string-name>
            <surname>Derysh</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liashchynskyi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <article-title>Application Of MLOps Practices For Biomedical Image Classification</article-title>
          .
          <source>Ceur Workshop Proceedings</source>
          ,
          <year>2022</year>
          ,
          <volume>3302</volume>
          , pp.
          <fpage>69</fpage>
          <lpage>77</lpage>
          https://ceur-ws.org/Vol-3302/short3.pdf
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>