Securing Technique Using Pattern-Based LSB Audio Steganography and Intensity-Based Visual Cryptography

Abstract: With the growing need to transmit sensitive or secret data over public networks, security based on cryptography and steganography has become an active research area in the last few years. These two techniques can be merged to provide the stronger security that is now required. The proposed system provides a novel method of information security that combines audio steganography with visual cryptography. In this system, we take a secret image and divide it into several incomprehensible sub-images using visual cryptography. Each sub-image is then hidden within an individual cover audio file using audio steganographic techniques. The cover audios are sent to the required destinations, where reverse steganography schemes are applied to recover the incomprehensible component images. Finally, all the sub-images are superimposed to obtain the actual secret image. This method is very secure because it uses a two-step security mechanism to maintain secrecy. The possibility of interception is lower in this technique because an attacker must hold every correct sub-image to regenerate the secret image; without superimposing every one of the sub-images, a meaningful secret image cannot be formed. Audio files are composed of densely packed bits. This high density of data makes it hard for a listener to detect the manipulation introduced by the proposed time-domain audio steganographic method.


Introduction
Secure transmission of confidential digital data via shared networks is a challenging task in itself. Shared channels such as the internet and other local or wide area networks are often considerably fast and cost-effective means of data transmission. The amount of data that changes hands each day through these media is huge as well. Instead of sending confidential data in works using an image as media for hiding secrets. Steganography techniques are nowadays used in various document-marking strategies that protect against removal. Detecting intellectual property theft is still a very challenging task due to the bulk of data that circulates through the World Wide Web.
A simple scheme of digital watermarking or fingerprinting can ensure protection against the removal or misuse of copyrighted data [19,20]. In digital watermarking, an identity symbol or signature is embedded in a copyrighted document using steganographic methods. This invisible piece of author information authenticates the data. Digital fingerprinting, on the other hand, embeds a serial number or other serial data in the document [21][22][23][24][25][26][27][28][29][30][31][32][33]. If the data is duplicated without consent, the fingerprint information does not get copied, which makes pirated documents easier to detect.
In this work, cryptography and steganography are combined to produce a two-layer system that is more secure than a purely cryptographic or purely steganographic system. The increased complexity reduces the chances of interception or detection of the hidden data. Visual cryptography and audio steganography are the two fundamental blocks of the proposed system, and a brief idea of both techniques is necessary to understand the system under consideration. Visual cryptography is a special type of cryptographic method which considers images as matrices of binary octets in terms of their pixel intensity.

Literature Survey
This section surveys prior work that mainly uses visual cryptography and audio steganography.

Visual Cryptography
Visual cryptography is a special type of cryptographic method that works only on visually comprehensible secret images. This method considers images as matrices of binary octets in terms of their pixel intensity. By arithmetic means, the pixel values are split into integer values, and from these values several meaningless component images are formed. When all of these images are combined by mathematical superposition, the original image is retrieved. In Fig. 1, we see a simple grayscale image that has only 8 bits of data per pixel. Using a visual cryptographic scheme, this image is divided into two meaningless components. If we combine these two by superposition, the original image is regenerated.
Depending on the type of image to which visual cryptography is applied, visual encryption schemes are classified into three categories: binary image encryption, grayscale image encryption, and color image encryption [10]. The basic scheme of visual cryptography is a secret sharing method first proposed by Naor and Shamir in [11]. Their model demonstrated a k out of n, or (k, n), combinatoric system in which n share images of identical size are constructed from a binary secret image. The original image can be reconstructed if and only if at least k of the n shares are combined through visual superpositioning. This particular scheme can hide only one secret image, but there are multiple-secret hiding schemes for binary images as well [12]. Fig. 2 depicts how Naor and Shamir's algorithm produces share images from the secret image pixel by pixel.
Shyu et al. [13] first introduced visual cryptography for RGB color images. Their method was based on pixel division: each pixel of a c-color image is divided into several sub-pixels, where each sub-pixel has c components. One of the components is used for encryption and the other components are filled with the value of black. This method had the disadvantage of a pixel expansion of c × 3, so the recovered image was distorted to a certain extent.
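As an illustration of the pixel-by-pixel share construction described above, the following is a minimal sketch of a 2-out-of-2 scheme with two subpixels per secret pixel. It is a simplified textbook variant and not necessarily the exact construction shown in Fig. 2; the function names `make_shares` and `stack` and the list-of-rows image representation are assumptions made for this example.

```python
import random

# The two complementary subpixel patterns (1 = black, 0 = white).
PATTERNS = [(0, 1), (1, 0)]

def make_shares(secret, rng=None):
    """Split a binary image (list of rows, 1 = black) into two shares.

    Every secret pixel expands into two horizontal subpixels per share,
    so each share is twice as wide as the secret image.
    """
    rng = rng or random.Random()
    share1, share2 = [], []
    for row in secret:
        r1, r2 = [], []
        for pixel in row:
            p = rng.choice(PATTERNS)
            # White pixel: both shares carry the same pattern.
            # Black pixel: the second share carries the complement.
            q = p if pixel == 0 else (1 - p[0], 1 - p[1])
            r1.extend(p)
            r2.extend(q)
        share1.append(r1)
        share2.append(r2)
    return share1, share2

def stack(share1, share2):
    """Simulate stacking two transparencies: black wins (logical OR)."""
    return [[a | b for a, b in zip(r1, r2)] for r1, r2 in zip(share1, share2)]
```

In the stacked result, a black secret pixel yields two black subpixels and a white secret pixel yields one black and one white subpixel, while each share on its own always shows exactly one black subpixel per secret pixel and therefore reveals nothing. The doubling of the image width is the pixel expansion discussed in this section.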

Figure 1: Visual cryptography on a grayscale image
Among the other kinds of visual cryptography for color images, region-based visual cryptographic methods [14] can produce a distortion-free reconstructed image at decryption time. In this scheme, the same arithmetic function is applied to each pixel of the secret image. Our proposed system is based on this technique.

Audio Steganography
In audio steganography, compressed or uncompressed audio files are used as cover media to hide the secret message. Audio files are a collection of binary samples captured from continuous audio signals. These bit streams can be manipulated in the time domain, the transform domain, or the codec domain.

Time Domain Audio Steganography
Time-domain methods are generally based on least significant bit schemes, silence removal schemes, or echo hiding methods. Least Significant Bit (LSB) steganography schemes manipulate the lower-order bits of audio samples, because changing those bits does not cause any significant change in the overall audio quality. Fig. 3 explains the working principle of a simple LSB-based audio stego scheme in which a 16-bit message is hidden in the lower-order 8 bits of the audio's sampled data values. First, the LSBs are cleared so that the positions become blank, and then they are filled with message data. Audio files generally contain a huge number of samples, so the manipulation process becomes lengthy; still, audio is considered a better cover medium than images due to its data bulk. This work employs a pattern-based LSB steganography scheme in one of its layers.
Silence removal is another kind of time-domain audio steganography scheme. It is based on the fact that audio signals such as music or voice recordings generally contain small pauses or silent zones. Shortening those pauses by small amounts is not easily detected by the human auditory system. This method applies to small secret data only, and uneven pauses increase the chance of detection.
The echo hiding method creates an inaudible echo from the original audio and hides data within it. This scheme loses its relevance if the echo becomes audible due to steganographic manipulation [15][16][17], so it is not a very secure technique.
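The basic LSB replacement idea described above (clear the low-order bits of a sample, then fill them with message bits) can be sketched as follows. This is a minimal illustration with a fixed number of replaced bits per sample, assuming the samples are plain Python integers such as 16-bit PCM values; the names `embed_lsb` and `extract_lsb` are hypothetical and not taken from the cited schemes.

```python
def embed_lsb(samples, bits, k=1):
    """Replace the k lowest bits of successive samples with message bits.

    `bits` is a list of 0/1 integers; the last chunk is zero-padded.
    """
    if len(bits) > k * len(samples):
        raise ValueError("cover audio too short for the message")
    mask = (1 << k) - 1
    out = list(samples)
    for i in range(0, len(bits), k):
        chunk = bits[i:i + k]
        chunk = chunk + [0] * (k - len(chunk))      # pad the final chunk
        value = int("".join(str(b) for b in chunk), 2)
        idx = i // k
        out[idx] = (out[idx] & ~mask) | value       # clear, then fill
    return out

def extract_lsb(samples, n_bits, k=1):
    """Read back the first n_bits message bits from the k lowest bits."""
    mask = (1 << k) - 1
    bits = []
    for sample in samples:
        if len(bits) >= n_bits:
            break
        low = sample & mask
        bits.extend((low >> shift) & 1 for shift in range(k - 1, -1, -1))
    return bits[:n_bits]
```

With k replaced bits, each sample changes by at most 2^k - 1 quantization steps, which is why small values of k are hard to hear.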

Transform Domain Audio Steganography
Transform domain techniques consider an audio signal as a collection of frequencies and manipulate those frequency components to hide the data. The human auditory system cannot detect the presence of a weak frequency in the neighbourhood of strong frequencies. If small changes are made to those weaker frequencies, they have a high chance of remaining undetected. This type of technique falls broadly into six categories: spread spectrum, phase coding, discrete wavelet transform, amplitude coding, tone insertion, and cepstral domain steganography [18,19].
Spread spectrum methods distribute the parts of the hidden data throughout the entire frequency spectrum of the cover audio using M-sequence codes and the direct sequence spread spectrum (DSSS) method. Phase and amplitude coding introduce phase and amplitude modifications in the original audio. Small phase shifts are very hard to detect, and the provision of redundancy increases fault tolerance. Discrete wavelet transform methods manipulate the LSBs of the wavelet coefficients of the cover audio. To ensure the inaudibility of the introduced noise or the hidden data, a minimum hearing threshold has to be maintained. Among all the frequency domain audio steganography techniques, cepstral domain or log spectral domain methods are the most efficient in terms of embedding, fault tolerance, and protection against detection. In this method, data streams are concealed in a few selected cepstral coefficients.

Codec Domain Audio Steganography
Codec domain steganography schemes are employed at the time of data transmission by the sender. Here we manipulate the data rate at amplitude modulation and thus small differences are created in the sending and receiving rate of data. This technique shows high detection tolerance.

Motivation
There are several schemes available to implement visual cryptography on images, but the 'k out of n' scheme is perhaps the most explored among them. The scheme of Naor et al. [11] and its extensions fall into this category [12,13]. These schemes mostly work on binary black-and-white or grey-tone images and suffer from pixel expansion. The purpose of this work is to propose a general visual cryptographic scheme that can be applied to any kind of uncompressed image, be it binary, grayscale, or color. To overcome the pixel expansion problem, we have used a 'region-based' visual cryptographic technique [14]. Region-based visual cryptography is a comparatively new strategy. It works by dividing an image into various inherent subparts and applying similar or different encryption to each part. In principle, region-based visual encryption can be applied only to images in which the object in focus can be separated from the background. For most real-life images this is not possible, which is one of the reasons why region-based schemes are not much explored although their efficacy is quite satisfactory for real-life usage. Our proposed system treats the entire secret image as a single region and applies the same encryption scheme to each unit of the region. This way, evenness is maintained throughout the image and the recovered image is free from pixel expansion.
As for the steganography part of our work, we have used audio as our medium. Several works on audio steganography exist [24][25][26][27][28][29][30][31]. Bit-manipulation steganography methods generally use images as the medium, but due to the higher sensitivity of the human visual system, the chances of detection are quite high. Audio signals, on the other hand, have a much higher number of samples per unit of runtime, which ensures better scattering of a secret message within the cover.
Here our secret is in the form of an uncompressed color image. Each pixel of the image consists of 24 bits of data. To hide such a big message, audio media is a far better choice than an image in both the time and frequency domain.

Proposed System
On the sender side, this security system generates 8 meaningless shares from a secret image using visual cryptography and hides each share in a separate audio file (8 audio files in total) using a numeric pattern-based LSB audio steganography scheme. These 8 stego-audios can then be transmitted in 8 different ways. Each of the sub-images contains a 1/8th share of the secret. The original secret image can be revealed if every share is extracted from its respective stego-audio using the LSB stego-extraction process and the shares are then combined using the visual decryption algorithm.

Proposed Visual Cryptography Scheme
This particular visual cryptography scheme uses a simple pixel intensity division technique to produce the intended number of basis matrices, whose size is the same as that of the image to be encrypted. The original image intensity is distributed according to the number of basis matrices. If the number of basis matrices is n, the algorithm will generate $2^n$ share images. For this algorithm to work, the value of n should be more than 2. Increasing the number of shares increases the security as well as the computational complexity of the scheme. To design this particular system, we have taken n as 3, which is why the total number of image shares is 8.
At the beginning of the algorithm, we initialize the basis matrices $B_1$, $B_2$, and $B_3$ with (original image's pixel value)/n, which for this case becomes (original image's pixel value)/3. At this point, we create a matrix K of the same size as the original image and fill it with random numbers drawn from a uniform distribution. Finally, we construct the 8 share images using the bitwise XOR operation (⊕), where $S_1, \ldots, S_8$ are the matrices for the share images.
The same procedure is applied for red, green, and blue parts of the RGB image pixel. The decryption is symmetric but exactly opposite to the encryption scheme. We will have to perform simple bitwise XOR operation on all 8 sub-images and the random value-filled K matrix. Here bitwise XOR behaves like superposition operation. It combines all the shares and cancels out the randomness introduced in the encryption by the random matrix K.
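The way bitwise XOR cancels the randomness of K at decryption can be illustrated with a small sketch. The exact share equations of the proposed scheme are not reproduced in this text, so the construction below is a generic XOR-based sharing chosen for illustration only and is not the authors' basis-matrix construction: seven shares are random, and the eighth is chosen so that XOR-ing all eight shares with K returns the original values. The names `make_xor_shares`, `recover`, and `xor_all`, and the flat list-of-intensities representation of one color channel, are assumptions.

```python
import random
from functools import reduce

def xor_all(matrices):
    """Element-wise XOR of equally sized flat lists of pixel values."""
    return [reduce(lambda a, b: a ^ b, values) for values in zip(*matrices)]

def make_xor_shares(pixels, n_shares=8, rng=None):
    """Split 8-bit pixel values into n_shares shares plus a random key K.

    Illustrative construction only (an assumption, see the text above).
    """
    rng = rng or random.SystemRandom()
    size = len(pixels)
    key = [rng.randrange(256) for _ in range(size)]
    shares = [[rng.randrange(256) for _ in range(size)]
              for _ in range(n_shares - 1)]
    # The last share absorbs the image, the key and all random shares.
    shares.append(xor_all(shares + [key, list(pixels)]))
    return shares, key

def recover(shares, key):
    """XOR all shares with K; the random contributions cancel out."""
    return xor_all(list(shares) + [key])
```

For an RGB image the same two functions would be applied to the red, green, and blue channels separately. Because XOR is its own inverse, the recovery is exact, and leaving out any single share yields values that look random.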

Proposed Audio Steganography Scheme
Once the secret image has been encrypted into component images using the visual cryptography method, the system proceeds to the audio steganography scheme. This layer encodes each of the 8 sub-images into one of 8 uncompressed audio files chosen by the user. Audio data are stored in groups of 16-bit pulse code modulated samples. Just like the r, g, and b components of an RGB image, a stereo audio sample consists of 16 bits of left and 16 bits of right components. The algorithm starts hiding information in the left component of each sample and traverses the samples sequentially. Once all the left components are used, it moves on to the right components. This algorithm introduces a novel audio steganography method using the least significant bit (LSB) replacement strategy. Instead of replacing the same number of LSBs in each sample, we vary the number of replaced bits using a predefined pattern. The pattern implemented here can be called 4-2-2-4. Within every group of four audio samples, the first and fourth samples have 4 LSBs replaced, while the 2nd and 3rd samples have 2 LSBs replaced each. The pattern repeats itself as 4-2-2-4-4-2-2-4 and so on. Successful extraction of the secret images requires prior knowledge of the predefined pattern.
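The 4-2-2-4 replacement pattern can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the payload is a byte string, the samples of a single channel are given as a list of integers, and the width and height header as well as the left-then-right channel traversal are omitted. The names `embed_pattern` and `extract_pattern` are hypothetical.

```python
from itertools import cycle

PATTERN = (4, 2, 2, 4)  # number of LSBs replaced in successive samples

def embed_pattern(samples, payload, pattern=PATTERN):
    """Hide payload bytes in PCM samples; sample i carries pattern[i % 4] bits."""
    bits = "".join(f"{byte:08b}" for byte in payload)
    out = list(samples)
    pos = 0
    for i, width in zip(range(len(out)), cycle(pattern)):
        if pos >= len(bits):
            break
        chunk = bits[pos:pos + width].ljust(width, "0")  # pad a short tail
        pos += width
        # Clear the `width` lowest bits, then write the message chunk.
        out[i] = (out[i] >> width << width) | int(chunk, 2)
    if pos < len(bits):
        raise ValueError("cover audio too short for payload")
    return out

def extract_pattern(samples, n_bytes, pattern=PATTERN):
    """Recover n_bytes from samples written by embed_pattern."""
    needed = 8 * n_bytes
    bits = ""
    for sample, width in zip(samples, cycle(pattern)):
        if len(bits) >= needed:
            break
        bits += f"{sample & ((1 << width) - 1):0{width}b}"
    if len(bits) < needed:
        raise ValueError("not enough samples for the requested payload")
    bits = bits[:needed]
    return bytes(int(bits[i:i + 8], 2) for i in range(0, needed, 8))
```

A receiver that does not know the pattern would group the low-order bits incorrectly and recover garbage, which is the role the predefined pattern plays as shared knowledge between sender and receiver. The shift-based masking also works for signed sample values in Python.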

Receiver Side Algorithm
Algorithm 4: Visual decryption
1. Select all 8 image-shares $S_1, S_2, \ldots, S_8$.
2. Construct a 3-dimensional matrix K with the same size as the secret image, using random numbers in a uniform distribution.
3. Generate the secret image using bitwise XOR (⊕) over all 8 shares and K.

Figs. 5 and 6 depict the sender and receiver windows of the proposed system. It is a standalone system, so the sender-side and receiver-side software can be deployed on any computer system that meets the basic requirements (Windows 7 and above, 2 GB RAM, etc.). The stego audios produced by the algorithm are also shown diagrammatically in the respective windows for the convenience of the users.

Result Analysis
The spatial requirement and the efficiency of the proposed system depend mainly on the efficiency and size requirements of its two main components, namely the visual cryptography scheme and the audio steganography scheme.

Spatial Analysis of the Visual Cryptography Scheme
This system uses a region-based visual cryptography scheme. This particular scheme produces 8 component images from a secret image. Each sub-part pixel bears a 1/8th share of the actual data of the original image pixel. This 1/8th part is not a 1/8th portion of the original image's bit values; it is a 1/8th portion of the data regarding that specific pixel's intensity and contrast. That means a simple extraction and combination procedure on the visually encrypted images will not reproduce the original image. The participation of every sub-image is an essential criterion for the extraction.
If the secret image is of size m × n, each resultant cryptic image will be of the same size. That means the m × n pixels of the secret image produce 8 × m × n cryptic pixels across the 8 sub-images. In a greyscale image, each pixel is composed of 8 bits of data, so the resultant data amount to $8^2 \times m \times n$ bits. In an RGB image, each pixel consists of 24 bits of data, so the total becomes $24 \times 8 \times m \times n$ or $3 \times 8^2 \times m \times n$ bits. Since the resultant data are far greater in size, the chosen cover media should have a greater number of data bits to hide these secret data beyond suspicion. That is the reason we have chosen audio as the medium, where each sample consists of 16 bits of message data and the number of samples per second is on the order of $10^2$.

Figure 6: Implementation of the proposed algorithm on the receiver side

Spatial Analysis of the Audio Steganography Scheme
Here we have used an LSB-based audio stego scheme that works in the time domain. In the time domain, the audio signal is sampled using 16 bits of binary data. This algorithm generates a simple repetitive number sequence, 4-2-2-4-4-2-2-4 and so on. Data is embedded in each sample of the cover audio according to the current value of this sequence. That means, if the sequence is currently at 4, the sample will be stuffed with 4 bits of the secret message. To hide 8 bits of data we will require either 2 or 3 samples, depending on the current position in the sequence. If the secret image being hidden is a greyscale image with 8 bits of data per pixel, each pixel will therefore require at least 2 and at most 3 samples. If the entire image has m × n pixels, the lower bound of the required samples will be (2 × m × n + 2) and the upper bound will be (3 × m × n + 2). Here 2 is added because two more samples are needed to specify the width (m) and height (n) of the image.
If the image is an RGB color image (as is the case for our implementation), each pixel will have 3 × 8 = 24 bits of data. To hide each pixel in the audio samples, we will need 3 samples for the 8 bits of the red component (sequence values 4-2-2), 2 samples for the 8 bits of the green component (sequence values 4-4), and again 3 samples for the 8 bits of the blue component (sequence values 2-2-4). As the sequence repeats itself as 4-2-2-4 after that, the sequence values for the red component of the next pixel return to 4-2-2 as before. So, for this case, the lower bound = upper bound = 8 samples per pixel. For the entire image of size m × n, a total of (8 × m × n + 2) samples is needed to completely hide the image within the audio.
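The sample counts above can be checked with a short calculation that walks through the repeating 4-2-2-4 sequence until enough bits are covered. The function names `samples_for_bits` and `samples_for_image` are hypothetical, and the two header samples for width and height follow the description in the text.

```python
from itertools import cycle

def samples_for_bits(n_bits, pattern=(4, 2, 2, 4)):
    """Count the samples needed to carry n_bits under the repeating pattern."""
    covered = 0
    count = 0
    for width in cycle(pattern):
        if covered >= n_bits:
            return count
        covered += width
        count += 1

def samples_for_image(width, height, bits_per_pixel, header_samples=2):
    """Samples needed for one image, plus the width/height header samples."""
    return samples_for_bits(width * height * bits_per_pixel) + header_samples
```

For a 24-bit RGB pixel the walk consumes exactly two periods of the pattern (4+2+2+4+4+2+2+4 = 24 bits), which gives the 8 samples per pixel stated above. For greyscale images the count falls between the stated lower and upper bounds because pixels alternate between needing 2 and 3 samples.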

Efficiency of the Proposed Scheme
The efficiency of a data hiding algorithm is measured in terms of its chances of being detected. In the proposed system, the secret is concealed as noise in the lowermost bits of the time-domain samples of the uncompressed audio signal. If the listener's ear can detect the slight distortions produced by the hidden data, it will cause suspicion and the algorithm will fail.
Fig. 7 presents a comparative view of the original audio signal (before encoding with message data) and the encoded audio signal, that is, the unmanipulated cover audio and the stego-audio (with a portion of the image hidden in it) in the time domain. Here we have taken the PCM sampling of an uncompressed .wav audio. Of the left and right components of our stereo audio, only the left components are plotted, as the left component bears the largest portion of the hidden data. As we can see, both signals look almost identical in the time-domain plot; the minor changes in the stego-audio are almost invisible even in such an amplified view. Our auditory system is far less sensitive than this fine-grained view of the cover audio, so we can conclude that the steganography scheme is not susceptible to detection by listening alone. Our main goal was to minimize the difference between the original audio and the stego audio so that the noise remains low. Cryptanalytic attacks are based on time-domain analysis. Our algorithm keeps the noise margin low so that a common listener does not become suspicious by just hearing the stego audio or by running standard cryptanalysis attacks on it.
Fig. 8 shows the fast Fourier transform of the time-domain audio signals shown in Fig. 7: Fig. 8a is the frequency-domain plot of the original audio, while Fig. 8b is that of the stego audio. Fig. 9 represents the cover audio in the frequency domain. Stego audios are formed after LSB manipulation, and their frequency-domain representation is shown in Fig. 10.
Hiding the presence of noise or secret data is easier in the time domain, whereas a frequency analysis of the same data can easily reveal the presence of foreign data in the media. In Fig. 8, however, the original signal and the stego signal look identical; we do not see any significant difference even after the steganographic manipulation. This further increases the security of our algorithm.
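The size of the distortion can also be bounded with simple arithmetic: replacing the b lowest bits of a sample changes it by at most 2^b - 1 quantization steps. The sketch below expresses that worst case relative to the full scale of a signed 16-bit sample (32768 steps). The function name `worst_case_error_dbfs` is hypothetical, and whether a given level is audible depends on the cover audio, so this is only a rough indication and not a measurement from the experiments.

```python
import math

def worst_case_error_dbfs(replaced_bits, sample_bits=16):
    """Largest possible LSB-replacement error, in dB relative to full scale."""
    max_error = (1 << replaced_bits) - 1      # e.g. 15 steps for 4 bits
    full_scale = 1 << (sample_bits - 1)       # 32768 for signed 16-bit PCM
    return 20 * math.log10(max_error / full_scale)
```

For the 4-bit positions of the pattern the worst case is about -66.8 dB relative to full scale, and for the 2-bit positions it is about -80.8 dB, which is consistent with the nearly identical waveforms in the time-domain plots.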

Comparative Analysis
This section compares the proposed method with other methods. First, Tab. 1 gives an attribute-wise comparison of the different characteristics of our work with those of other state-of-the-art works.
• Quantitative Analysis: Peak Signal to Noise Ratio (PSNR) relates the maximum signal value to the noise introduced by embedding and is given as

$$\mathrm{PSNR} = 10 \log_{10} \frac{C_{\max}^{2}}{\mathrm{MSE}}, \qquad \mathrm{MSE} = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} \left(S_{xy} - C_{xy}\right)^{2}$$

where C acts as the host image, S represents the stego image, $C_{\max}$ is the maximum value of a pixel in both the original and the stego image, x and y are subscripted variables, and M and N indicate the image resolution in pixels. Tab. 3 compares our proposed method with the SCC Method, PIT, ST-FMM, Karim's Method, and CISSKA-LSB using PSNR and shows the better efficiency of the proposed method.
Fig. 11 shows the same comparison graphically; the PSNR value of the proposed method is greater than that of the other schemes. This result justifies the motivation of our work.
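A PSNR value of this kind can be computed with a few lines of code using the usual definition, PSNR = 10 log10(C_max^2 / MSE), where MSE is the mean squared difference between host and stego values. The sketch below assumes both signals are given as flat, equally long sequences of numbers; the function name `psnr` and the choice to return infinity for identical signals are assumptions of this example.

```python
import math

def psnr(cover, stego, max_value):
    """Peak signal-to-noise ratio between a cover and its stego version."""
    if len(cover) != len(stego) or len(cover) == 0:
        raise ValueError("cover and stego must be non-empty and equally long")
    mse = sum((c - s) ** 2 for c, s in zip(cover, stego)) / len(cover)
    if mse == 0:
        return math.inf  # identical signals: no noise at all
    return 10 * math.log10(max_value ** 2 / mse)
```

Here `max_value` plays the role of $C_{\max}$, for example 255 for 8-bit pixel data. A higher PSNR means that the stego signal deviates less from the cover.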

Conclusion
In this work, an intensity-based visual cryptography scheme is used along with an additional layer of audio steganography in the time domain. Here, the intensity of each secret image pixel is distributed among several basis matrices. Superimposition of various combinations of those basis matrices is used to generate 8 distinct share images. The shares are then hidden in the audio samples of uncompressed stereo files. The method is quite secure, as the data image is completely hidden within the audio. The change in sound quality due to the manipulation is insignificant, so listening to the audio would not arouse suspicion. Visual cryptography schemes often cause data loss due to pixel expansion [32][33][34][35][36], but this algorithm can reconstruct the secret image without data loss. Although encryption and decryption take more time due to the large number of data bits in both the secret image and the hiding medium, this scheme works for both colour and greyscale images in uncompressed form. The PSNR value of 48.0843 is also quite high compared to some other current works. The cover media can be in either mono or stereo form. This work can be improved by using transform domain techniques in place of the LSB-based time-domain methods.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.