Hiding data in images using a pseudo-random sequence

Alexandr Kuznetsov 1 [0000-0003-2331-6326], Oleksii Smirnov 2 [0000-0001-9543-874X], Ludmila Gorbacheva 1 [0000-0002-6053-7235] and Vitalina Babenko 1 [0000-0002-4816-4579]

1 V. N. Karazin Kharkiv National University, Svobody sq., 4, Kharkiv, 61022, Ukraine
kuznetsov@karazin.ua
2 Central Ukrainian National Technical University, avenue University, 8, Kropivnitskiy, 25006, Ukraine
dr.smirnovoa@gmail.com

Abstract. This article discusses techniques for hiding information messages in a cover image using direct spectrum spreading technology. This technology is based on poorly correlated pseudorandom (noise) sequences. Modulating the information data with such signals gives the message a noise-like form, which makes it very difficult to detect. Hiding means adding the modulated message to the cover image. If the image is interpreted as noise in a communication channel, then the task of hiding the user's data is equivalent to transmitting a noise-like modulated message over a noisy communication channel. It is assumed that the noise-like signals are poorly correlated both with each other and with the cover image (or its fragment). However, the latter assumption may not hold, because a realistic image is not a realization of a random process; its pixels are strongly correlated. The selection of pseudo-random spreading signals must therefore take this feature into account. We investigate various ways of forming spreading sequences, assessing the Bit Error Rate (BER) of the information data as well as the cover image distortion by mean squared error (MSE) and peak signal-to-noise ratio (PSNR). The purpose of our work is to justify a choice of spreading sequences that reduces BER and MSE (increases PSNR).
Keywords: information concealment, steganography, direct spectrum spreading technology, pseudorandom sequences, spreading signals

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Steganographic techniques are traditionally used to hide the fact of transmission and the very existence of the information message [1-4]. With the development of computer science and digital methods of information processing, steganographic hiding of messages has become very common; it is used in the processing of images, audio, text documents, etc. It is a very effective and reliable way to organize secret channels. To a third-party observer, the covers that are transmitted (e.g. via e-mail) and contain hidden information messages are no different from ordinary user files. This makes it possible to organize a secret communication channel without arousing suspicion about the intentions of the parties, and such channels are extremely difficult to detect [1, 2].

One of the promising trends in the development of modern steganography is the technique of embedding data in a cover image based on direct spectrum spreading technology [5-17]. This technology is traditionally used in communication systems to increase the covertness of data transmission over a noisy channel [12-17]. The information data are modulated by a spectrum-spreading pseudo-random (noise) sequence. During transmission the received signals are statistically indistinguishable from natural noise, which increases the covertness of communication. In addition, correlation reception makes it possible to correct errors that occur, which increases the noise immunity of communication. These and many other advantages of direct spectrum spreading technology allow building reliable and secure communication systems.
For example, communication with significantly lower transmitter power can be arranged, which ensures environmentally friendly communication; the use of large ensembles (sets) of spreading sequences increases the subscriber capacity of multiple access, etc. [18-21].

The same approach can be applied to the computer processing of digital images. By interpreting the image as noise in a communication channel and using direct spectrum spreading technology, it is possible to hide information messages without visible distortion of the container. Such techniques are the subject of our article.

2 Literature review and research objective

The first works on direct spectrum spreading in digital steganography put forward the idea of using pseudo-random (noise) sequences as a "carrier" of information messages [5-11]. For example, in the binary case, a modulated message $S$ is obtained by multiplying the separate information bits $b_i$ (represented in polar form, $b_i \in \{-1, 1\}$) by spreading noise signals $\varphi_i$ and summing the products:

$$S = \sum_i b_i \varphi_i. \tag{1}$$

Moreover, $\varphi_i$ belongs to an ensemble (set) of poorly correlated pseudo-random sequences (PRS):

$$\varphi_i \in \Phi = \{\varphi_0, \varphi_1, \dots, \varphi_{M-1}\}.$$

This means that the correlation coefficient of two different signals (calculated as the scalar product of the sequences) is approximately zero:

$$\forall i \ne j: \rho(\varphi_i, \varphi_j) \approx 0.$$

Expression (1), which describes the modulation of the information bits $b_i \in \{-1, 1\}$ by the spreading signals $\varphi_i$, is traditionally used in broadband communication systems with direct spectrum spreading. Since the statistical properties of the spreading signal $\varphi_i$ are similar to those of noise, the modulated message $S$ differs only slightly from the noise in the communication channel, which makes hidden transmission possible.
Indeed, the transmitted messages take the form of noise-like sequences, and the large cardinality of the set $\Phi$ together with the direct spreading of the frequency spectrum provides high secrecy and imitation resistance of the communication channels [18-21]. In systems with Code Division Multiple Access (CDMA), each signal $\varphi_i$ is assigned to a separate pair of subscribers; in other words, increasing the cardinality $M$ of the set $\Phi = \{\varphi_0, \varphi_1, \dots, \varphi_{M-1}\}$ increases the subscriber capacity of the communication system, which makes data transfer cheaper [18, 19].

Steganography with direct spectrum spreading applies these techniques to various cover files. For example, by interpreting a cover image $I$ as natural noise in a communication channel, it is possible to organize the transmission of information messages "inside" the image [5-17]. Works [5-11] suggested using sequences formed by PRS generators as the signals $\varphi_i$, after which the signal $S$ formed by rule (1) is added element by element to the cover image $I$:

$$N = I + S. \tag{2}$$

Thus, the resulting stego image (2) is formed by adding the modulated message $S$ from (1) to the original image $I$. This is similar to how, in communication systems, the transmitted modulated message $S$ is summed with natural noise.

On the receiving side, as in communication systems, the information message is restored using correlation reception. In the binary case, to extract the $j$-th bit, the correlation coefficient between the signal $\varphi_j$ and the received $N$ is calculated:

$$\rho(N, \varphi_j) = I \varphi_j + \varphi_j \sum_i b_i \varphi_i. \tag{3}$$

In communication systems, natural noise and the noise signal $\varphi_i$ are statistically independent (uncorrelated). Following our interpretation, it is logical to assume that the analogue of noise, the cover image $I$, is also uncorrelated with the spreading signals, in other words $\rho(I, \varphi_j) = I \varphi_j \approx 0$. Different noise signals are also uncorrelated with each other, i.e. $\forall j \ne i: \varphi_j \varphi_i \approx 0$.
In this case $\rho(N, \varphi_j) \approx b_j \varphi_j \varphi_j$, i.e. the value $b_j$ can be recovered from the sign of $\rho(N, \varphi_j)$:

$$b_j = \operatorname{sign}(\rho(N, \varphi_j)). \tag{4}$$

Unfortunately, in steganographic applications the assumption $\rho(I, \varphi_j) = I \varphi_j \approx 0$ used in (3) may not hold for the cover image $I$. Indeed, if a realistic image, which is not a realization of some random number generator, is used to hide an information message, then a significant correlation between $I$ and $\varphi_i$ can be observed. In this case, the recovery of information bits by formula (4) may be erroneous.

In this paper we investigate different ways of generating the set $\Phi = \{\varphi_0, \varphi_1, \dots, \varphi_{M-1}\}$ and estimate the Bit Error Rate (BER) when extracting a message from the stego images $N$. In particular, we investigate the nonlinear method of forming sequences with a normal (Gaussian) distribution proposed in [7, 8, 10, 11], as well as orthogonal Walsh sequences [22] and pseudo-random sequences with elements uniformly distributed on the interval (-1, 1). We also estimate cover image distortion by mean squared error (MSE) and peak signal-to-noise ratio (PSNR). These two important characteristics (BER and PSNR) clearly demonstrate the possibilities for reliable (error-free) and hidden (no significant cover distortion) transmission of information messages using direct spectrum spreading technology.

3 Research Methodology

Several performance indicators are used to investigate various ways of hiding information in cover images. BER [23] is used to evaluate the correctness of the recovered data (their reliability, absence of errors). BER is the number of bit errors $N_{error}$ divided by the total number of transferred bits $N_{total}$:

$$BER = \frac{N_{error}}{N_{total}}. \tag{5}$$

BER is a unitless performance measure, often expressed as a percentage [23]. We estimated BER in absolute values, i.e. directly by (5).

MSE and PSNR are used to estimate cover image distortion [23-25].
For a monochrome $m \times n$ image $I$ and its distorted approximation $N$, the MSE is determined by the formula:

$$MSE = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} [I_{i,j} - N_{i,j}]^2. \tag{6}$$

PSNR characterizes the ratio between the maximum signal power and the power of the distortion noise. PSNR is usually expressed on a logarithmic scale, i.e. in decibels:

$$PSNR = 10 \log_{10}\left(\frac{I_{max}^2}{MSE}\right) = 20 \log_{10}\left(\frac{I_{max}}{\sqrt{MSE}}\right) = 20 \log_{10}(I_{max}) - 10 \log_{10}(MSE), \tag{7}$$

where $I_{max}$ is the maximum possible value of an image pixel. If the pixels are encoded with $q$-bit values, then $I_{max} = 2^q - 1$. For example, in the simplest case $q = 8$ we have $I_{max} = 255$, and the PSNR value is calculated by the formula:

$$PSNR = 20 \log_{10}(255) - 10 \log_{10}(MSE). \tag{8}$$

For our experiments we used different $256 \times 256$ images, as in [7, 8, 10, 11], with each monochrome pixel encoded by one byte. In particular, we used the standard Lenna test image (256x256 pixels). The results given below are averaged values obtained from several different images. To average the results we used quadratic regression with interpolation of the obtained values (the built-in functions regress and interp of the MathCad system).

It should be noted that the results given here correspond to the use of different spreading sequences, but without error correction coding. For example, in work [8] block codes are used to reduce BER in forward error correction mode. The same work (e.g. [8, Table 2]) also provides BER estimates obtained without error correction codes. In this sense, our results can be compared with already available data.

4 Results of research

In our research, we have implemented several options for forming the set of spreading signals $\Phi = \{\varphi_0, \varphi_1, \dots, \varphi_{M-1}\}$. For each method we hid information in various cover images and evaluated BER, MSE and PSNR by (5)-(8).
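The three indicators (5)-(8) are simple enough to state as code. The following is a minimal sketch in pure Python, not the MathCad procedure used in the experiments; the function names and the list-of-rows image representation are assumptions made for illustration only.

```python
import math

def ber(sent_bits, received_bits):
    """Bit error rate (5): number of differing bits divided by the total."""
    if len(sent_bits) != len(received_bits) or not sent_bits:
        raise ValueError("bit sequences must be non-empty and of equal length")
    errors = sum(1 for s, r in zip(sent_bits, received_bits) if s != r)
    return errors / len(sent_bits)

def mse(cover, stego):
    """Mean squared error (6) between two equally sized images (lists of rows)."""
    rows, cols = len(cover), len(cover[0])
    total = sum((cover[i][j] - stego[i][j]) ** 2
                for i in range(rows) for j in range(cols))
    return total / (rows * cols)

def psnr(cover, stego, bits_per_pixel=8):
    """PSNR in decibels by (7); infinite when the two images are identical."""
    err = mse(cover, stego)
    if err == 0:
        return math.inf
    i_max = 2 ** bits_per_pixel - 1          # 255 for one-byte pixels, as in (8)
    return 20 * math.log10(i_max) - 10 * math.log10(err)
```

For a 2x2 toy image with two pixels changed by 1 and 2, `mse` returns (1 + 4) / 4 = 1.25, and `psnr` evaluates (8) with that value.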
Our focus was on comparing the obtained results in order to choose the best method for forming the sequences $\varphi_i$.

4.1 Using non-linear modulation

The works of Lisa M. Marvel et al. [7, 8, 10, 11], which apply direct spectrum spreading, suggested a nonlinear rule for forming the set $\Phi = \{\varphi_0, \varphi_1, \dots, \varphi_{M-1}\}$:

$$(\varphi_i)_j = \begin{cases} F^{-1}((u_i)_j), & b_i = 1; \\ F^{-1}((u'_i)_j), & b_i = -1, \end{cases} \tag{9}$$

where

$$(u'_i)_j = \begin{cases} (u_i)_j + 0.5, & (u_i)_j < 0.5; \\ (u_i)_j - 0.5, & (u_i)_j \ge 0.5, \end{cases} \tag{10}$$

$(u_i)_j$ is a random variable uniformly distributed over the interval (0, 1), and $F^{-1}$ is the inverse cumulative distribution function of a standard Gaussian random variable.

Thus, each spreading sequence of the set $\Phi = \{\varphi_0, \varphi_1, \dots, \varphi_{M-1}\}$ is a realization of a random variable distributed according to the normal law with zero mean and unit standard deviation. This realization is calculated by formula (9), i.e. using the inverse transformation method [26].

For the practical implementation of the non-linear rule (9)-(10) we used the built-in functions $rnd(x)$ and $dnorm(p, \mu, \sigma)$ of the MathCad system:

$$F^{-1}(x) = dnorm(x, 0, 1); \quad (u_i)_j = rnd(1).$$

With this method of forming the set $\Phi = \{\varphi_0, \varphi_1, \dots, \varphi_{M-1}\}$, the value of the bit $b_i$ is already encoded in the choice between $u_i$ and $u'_i$, so rule (1) for calculating the modulated signal $S$ should be written in the form

$$S = \sum_i \varphi_i. \tag{11}$$

Works [7, 8, 10, 11] indicate that using the spreading sequences directly to hide information data in cover images leads to many bit errors in the extracted data. To address this, it was suggested to increase the power of the spreading signals, i.e. to write formula (11) in the form

$$S = P \sum_i \varphi_i, \tag{12}$$

where $P$ is a positive value that multiplies the "power" of the sequences $\varphi_i$.

In our experiments we hid data in cover images using formulas (12), (9) and (10). The results obtained for different values of $P$ are shown in Figure 1.
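The non-linear rule (9)-(10) can be sketched with the Python standard library as follows. This is only an illustration of the inverse transformation method, not the MathCad implementation; the function names are hypothetical, `NormalDist.inv_cdf` stands in for the inverse Gaussian distribution function, and the assignment of $b_i = 1$ to $u$ and $b_i = -1$ to $u'$ is an assumption of this sketch.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def shift_uniform(u):
    """Rule (10): shift u by half of the unit interval, staying inside (0, 1)."""
    return u + 0.5 if u < 0.5 else u - 0.5

def nonlinear_sequence(bit, length, rng):
    """Rule (9): Gaussian samples by inverse transform; the bit selects u or u'.

    Assumption of this sketch: bit == 1 uses u, bit == -1 uses u'.
    """
    seq = []
    for _ in range(length):
        u = rng.random()
        while u == 0.0 or u == 0.5:      # keep both u and u' strictly inside (0, 1)
            u = rng.random()
        p = u if bit == 1 else shift_uniform(u)
        seq.append(_STD_NORMAL.inv_cdf(p))
    return seq

def modulate(sequences, power):
    """Rule (12): S = P * sum_i phi_i, computed element by element."""
    return [power * sum(column) for column in zip(*sequences)]
```

Because $u$ and $u'$ always lie on opposite sides of 0.5, the two sequences generated from the same uniform samples for opposite bit values have opposite signs in every position, which is the property a receiver can exploit.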
The number of terms in (1) and (12) is determined by the number of information bits hidden in one cover image (or a fragment thereof). Figure 1 shows the cases $P = 2^i$, $i = 0, 1, \dots, 6$, for different values of $k$. The following notation is used in the figures: 1) $k = 1$; 2) $k = 2$; 3) $k = 4$; 4) $k = 8$; 5) $k = 16$.

If the set of signals $\Phi = \{\varphi_0, \varphi_1, \dots, \varphi_{M-1}\}$ is generated using a simplified scheme, such as

$$(\varphi_i)_j = F^{-1}((u'_i)_j), \quad (u'_i)_j = \begin{cases} (u_i)_j + 0.5, & (u_i)_j < 0.5; \\ (u_i)_j - 0.5, & (u_i)_j \ge 0.5, \end{cases} \tag{13}$$

then we can use the analogue of formula (1) to hide data in the form

$$S = P \sum_i b_i \varphi_i. \tag{14}$$

Fig. 1. Results of experimental studies in hiding data using expressions (12), (9) and (10)

We have also investigated this method of generating spreading signals; the results are shown in Figure 2.

Analyzing Figures 1 and 2, we see that both methods of forming spreading sequences (according to formulas (9), (10) and according to (13)) give practically equal results. In our experiments, rule (9), (10) was only slightly better in terms of PSNR (in the figures, because of the logarithmic scale, the difference is almost invisible). We should also note the high BER value. For example, even with amplified spreading signals, the BER value was in most cases in the range 0.01 ... 0.1, which is on the threshold of the possible use of error-correcting coding.

4.2 Using random numbers uniformly distributed over an interval of (-1,1).

Another way of forming the set $\Phi$ that we investigated was to use random numbers uniformly distributed over the interval (-1, 1). For this purpose we used the built-in function $rnd(x)$ of the MathCad system, i.e. the rule for forming the sequences had the form:

$$(\varphi_i)_j = rnd(2) - 1. \tag{15}$$

The results of our studies on the effectiveness of hiding information by rule (15) for different ratios of $k$ and $P$ are shown in Figure 3.
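To make the whole chain concrete, the sketch below generates sequences by rule (15), embeds bits by (14) and (2), and extracts them by the correlation receiver (3)-(4). It is a simplified illustration under stated assumptions: the cover is a flat list of pixel values, the stego values are neither rounded nor clipped to the pixel range, and all function names are hypothetical.

```python
import random

def uniform_sequence(length, rng):
    """Rule (15): elements uniformly distributed on the interval (-1, 1)."""
    return [2.0 * rng.random() - 1.0 for _ in range(length)]

def embed(cover, bits, sequences, power):
    """Rules (14) and (2): N = I + P * sum_i b_i * phi_i."""
    stego = list(cover)
    for b, phi in zip(bits, sequences):
        for j, value in enumerate(phi):
            stego[j] += power * b * value
    return stego

def extract(stego, sequences):
    """Rules (3) and (4): b_j = sign of the scalar product of N and phi_j."""
    bits = []
    for phi in sequences:
        corr = sum(n * p for n, p in zip(stego, phi))
        bits.append(1 if corr >= 0 else -1)
    return bits
```

With an all-zero cover the term $I \varphi_j$ in (3) vanishes and a small $P$ is enough for correct extraction. With a realistic cover and small $P$ that term can dominate the useful term $b_j \varphi_j \varphi_j$, which is the source of the bit errors discussed in the text; increasing $P$ restores correct extraction at the cost of larger distortion.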
The results shown in Figure 3 are practically comparable with the results obtained for the nonlinear modulation (9), (10) and for the simplified version by formula (13). We observed a slight increase in BER, but PSNR also increased at the same time. In general, we can argue that the revealed differences are small and that these methods of forming spreading sequences are almost equivalent.

4.3 Use of Walsh orthogonal sequences

In our research we also used discrete orthogonal Walsh sequences. Such signals are generated from the rows of the Hadamard matrix $H_{2^i}$ formed by the recurrence rule:

$$H_{2^i} = \begin{pmatrix} H_{2^{i-1}} & H_{2^{i-1}} \\ H_{2^{i-1}} & -H_{2^{i-1}} \end{pmatrix}, \quad H_1 = 1. \tag{16}$$

Iterating rule (16) allows a Hadamard matrix $H_{2^i}$ of any order $2^i$, $i = 1, 2, \dots$, to be formed. The rows (or columns) of the resulting matrices are mutually orthogonal, i.e. their scalar product is zero. In our studies we used rule (16), and the rows of the matrix $H_{2^i}$ were interpreted as elements of the set $\Phi = \{\varphi_0, \varphi_1, \dots, \varphi_{M-1}\}$. The results of hiding information by formula (14) are shown in Figure 4.

Fig. 2. Results of experimental studies in hiding data using expressions (13) and (14).

Fig. 3. Results of experimental studies in hiding data using expressions (15) and (14).

Fig. 4. Results of experimental studies in hiding data using Walsh orthogonal sequences and expressions (14)

Figure 4 clearly shows the advantage of using Walsh sequences: they gave the lowest BER values in our studies. Even at small values of $P$ (about 5), the BER was in most cases less than 0.01, which is the best result among all the considered variants of forming spreading sequences. The PSNR values obtained with Walsh sequences are in most cases comparable to those of the variants considered earlier. However, for a fixed PSNR value, using Walsh sequences leads to significantly lower BER values.
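The recurrence (16) can be written in a few lines. The following pure-Python sketch builds the Hadamard matrix by repeated doubling and provides a helper for checking the orthogonality of its rows; the function names are chosen here for illustration and are not taken from the experiments.

```python
def hadamard(order):
    """Rule (16): H_1 = [1], H_2n = [[H_n, H_n], [H_n, -H_n]]."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    h = [[1]]
    while len(h) < order:
        top = [row + row for row in h]                      # [H, H]
        bottom = [row + [-x for x in row] for row in h]     # [H, -H]
        h = top + bottom
    return h

def dot(a, b):
    """Scalar product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))
```

For order 4 this yields the rows (1, 1, 1, 1), (1, -1, 1, -1), (1, 1, -1, -1) and (1, -1, -1, 1). The scalar product of any two different rows is exactly zero, so the mutual interference term in (3) disappears and only the correlation with the cover remains as a source of errors. Note that row 0 is constant, so its scalar product with a cover equals the sum of the pixel values.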
5 Discussion of the results and brief conclusions

The obtained empirical dependencies show that direct spectrum spreading technology can indeed be an interesting solution to the problem of hiding information messages in cover images. By interpreting an image as noise in a communication channel, it is possible to organize a hidden data channel while keeping image distortion small. At the same time, the basic assumption that the spreading sequences are uncorrelated with the cover (or its separate part) may be incorrect. In this case a high error rate will be obtained when restoring the information data. Consequently, an important element of such a steganographic system is the correct choice of spreading sequences.

In our work we have analyzed several variants of constructing spreading sequences for hiding data in cover images. In particular, we have considered one of the first known algorithms, with non-linear modulation by rules (9), (10). A US patent was obtained for this method [11], and we investigated the effectiveness of such hiding in terms of BER, MSE and PSNR. The obtained data partially coincide with the known results from [7, 8, 10, 11], which may indirectly confirm the adequacy of our results.

We also investigated other ways of forming spreading sequences for hiding data in cover images. For example, we have shown that applying the simplified rule (13), and even using sequences with values uniformly distributed on the interval (-1, 1), does not significantly worsen the results: the BER and PSNR values do not differ significantly.

Finally, we studied the use of Walsh spreading sequences. This variant turned out to be the most successful, because a much smaller percentage of errors is achieved at comparable PSNR. Indeed, the BER value, as follows from our results, is much lower than for the other spreading sequences.
The results can be used to improve techniques for hiding information in digital images [5-12], as well as in other computer science applications [27-32]. In particular, our results suggest that the use of orthogonal discrete signals is preferable.

In our opinion, an interesting direction for further research is the use of adaptively formed discrete sequences. For example, if the rule for forming spreading signals takes into account the statistical properties of the cover, it will be possible to significantly reduce the BER, or even to obtain error-free transmission. Another useful result would be an increase in PSNR at a fixed BER value (for example, at a preset value).

In addition, we plan to use other ways of forming spreading sequences in future studies. For example, discrete sequences with multilevel correlation functions were proposed in [33-35]. The use of these signals, in our opinion, will be effective in steganographic techniques with direct spectrum spreading technology.

References

1. Digital Watermarking and Steganography. Elsevier (2008). doi:10.1016/b978-0-12-372585-1.x5001-3
2. Shih, F.Y.: Digital Watermarking and Steganography. CRC Press (2017). doi:10.1201/9781315219783
3. Johnson, N.F., Jajodia, S.: Exploring steganography: Seeing the unseen. Computer. 31, 26–34 (1998). doi:10.1109/MC.1998.4655281
4. Manoj, I.V.S.: Cryptography and Steganography. International Journal of Computer Applications. 1, 63–68 (2010). doi:10.5120/257-414
5. Tirkel, A.Z., Osborne, C.F., Van Schyndel, R.G.: Image watermarking-a spread spectrum application. In: Proceedings of ISSSTA'95 International Symposium on Spread Spectrum Techniques and Applications. IEEE. doi:10.1109/isssta.1996.563231
6. Smith, J.R., Comiskey, B.O.: Modulation and information hiding in images. In: Information Hiding. pp. 207–226. Springer Berlin Heidelberg (1996). doi:10.1007/3-540-61996-8_42
7.
Marvel, L.M., Boncelet, C.G., Jr., Retter, C.T.: Methodology of Spread-Spectrum Image Steganography. Defense Technical Information Center (1998). doi:10.21236/ada349102
8. Marvel, L.M., Boncelet, C.G., Retter, C.T.: Spread spectrum image steganography. IEEE Transactions on Image Processing. 8, 1075–1083 (1999). doi:10.1109/83.777088
9. Kutter, M.: Performance Improvement of Spread Spectrum Based Image Watermarking Schemes through M-ary Modulation. In: Information Hiding. pp. 237–252. Springer Berlin Heidelberg (2000). doi:10.1007/10719724_17
10. Brundick, F.S., Marvel, L.M.: Implementation of Spread Spectrum Image Steganography. Defense Technical Information Center (2001). doi:10.21236/ada392155
11. Patent No.: US 6,557,103 B1, Int.Cl. G06F 11/30. Spread Spectrum Image Steganography. (2003)
12. Fan Zhang, Bin Xu, Xinhong Zhang: Digital Image Watermarking algorithm Based on CDMA Spread Spectrum. In: 2006 12th International Multi-Media Modelling Conference. IEEE. doi:10.1109/MMMC.2006.1651359
13. Nguyen, T.T., Taubman, D.: Optimal linear detector for spread spectrum based multidimensional signal watermarking. In: 2009 16th IEEE International Conference on Image Processing (ICIP). IEEE (2009). doi:10.1109/ICIP.2009.5414121
14. Nezhadarya, E., Wang, Z.J., Ward, R.K.: Image quality monitoring using spread spectrum watermarking. In: 2009 16th IEEE International Conference on Image Processing (ICIP). IEEE (2009). doi:10.1109/ICIP.2009.5413955
15. Ghosh, S., Ray, P., Maity, S.P., Rahaman, H.: Spread Spectrum Image Watermarking with Digital Design. In: 2009 IEEE International Advance Computing Conference. IEEE (2009). doi:10.1109/IADCC.2009.4809129
16. Altun, H.O., Orsdemir, A., Sharma, G., Bocko, M.F.: Optimal Spread Spectrum Watermark Embedding via a Multistep Feasibility Formulation. IEEE Transactions on Image Processing. 18, 371–387 (2009). doi:10.1109/TIP.2008.2008222
17.
Samcovic, A., Milovanovic, M.: Robust digital image watermarking based on wavelet transform and spread spectrum techniques. In: 2015 23rd Telecommunications Forum Telfor (TELFOR). IEEE (2015). doi:10.1109/TELFOR.2015.7377589
18. Ipatov, V.P.: Spread Spectrum and CDMA. John Wiley & Sons, Ltd (2005). doi:10.1002/0470091800
19. Introduction to CDMA Wireless Communications. Elsevier (2007). doi:10.1016/b978-0-7506-5252-0.x5001-7
20. Gerakoulis, D., Geraniotis, E.: CDMA: Access and Switching, http://dx.doi.org/10.1002/0470841699, (2001). doi:10.1002/0470841699
21. Hara, S., Prasad, R.: DS-CDMA, MC-CDMA and MT-CDMA for mobile multi-media communications. In: Proceedings of Vehicular Technology Conference - VTC. IEEE. doi:10.1109/VETEC.1996.501483
22. Agaian, S.S., Sarukhanyan, H.G., Egiazarian, K.O., Astola, J.: Hadamard Transforms. SPIE (2011). doi:10.1117/3.890094
23. Probability Theory of Bit Error Rate. In: Optical Bit Error Rate. IEEE (2009). doi:10.1109/9780470545430.ch7
24. Korhonen, J., You, J.: Peak signal-to-noise ratio revisited: Is simple beautiful? In: 2012 Fourth International Workshop on Quality of Multimedia Experience. IEEE (2012). doi:10.1109/QoMEX.2012.6263880
25. Data Compression. Springer London (2007). doi:10.1007/978-1-84628-603-2
26. Devroye, L.: Non-Uniform Random Variate Generation. Springer New York (1986). doi:10.1007/978-1-4613-8643-8
27. Attari, A.A., Shirazi, A.A.B.: Robust and Transparent Audio Watermarking based on Spread Spectrum in Wavelet Domain. In: 2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology (JEEIT). IEEE (2019). doi:10.1109/jeeit.2019.8717415
28. Runovski, K., Schmeisser, H.: On the convergence of fourier means and interpolation means. Journal of Computational Analysis and Applications. 6(3), 211-227 (2004)
29. Hua, G.: Over-Complete-Dictionary-Based Improved Spread Spectrum Watermarking Security. IEEE Signal Processing Letters. 1–1 (2020).
doi:10.1109/lsp.2020.2986154
30. Chornei, R.K., Daduna V.M., H., Knopov, P.S.: Controlled Markov Fields with Finite State Space on Graphs. Stochastic Models. 21, 847–874 (2005). doi:10.1080/15326340500294520
31. Huang, Y., Niu, B., Guan, H., Zhang, S.: Enhancing Image Watermarking With Adaptive Embedding Parameter and PSNR Guarantee. IEEE Transactions on Multimedia. 21, 2447–2460 (2019). doi:10.1109/tmm.2019.2907475
32. Bondarenko, S., Liliya, B., Oksana, K., Inna, G.: Modelling instruments in risk management. International Journal of Civil Engineering and Technology. 10(1), 1561-1568 (2019)
33. Stasev, Y., Kuznetsov, A., Karpenko, O., Sai, V.: Discrete signals with multi-level correlation function. Telecommunications and Radio Engineering. 71, 91–98 (2012). doi:10.1615/TelecomRadEng.v71.i1.100
34. Kuznetsov, A., Smirnov, O., Kovalchuk, D., Averchev, A., Pastukhov, M., Kuznetsova, K.: Formation of Pseudorandom Sequences with Special Correlation Properties. In: 2019 3rd International Conference on Advanced Information and Communications Technologies (AICT). IEEE (2019). doi:10.1109/AIACT.2019.8847861
35. Kuznetsov, A., Smirnov, O., Reshetniak, O., Ivko, T., Kuznetsova, T., Katkova, T.: Generators of Pseudorandom Sequence with Multilevel Function of Correlation. In: 2019 IEEE International Scientific-Practical Conference Problems of Infocommunications, Science and Technology (PIC S&T). IEEE (2019). doi:10.1109/PICST47496.2019.9061530