|Computers, Materials & Continua
An Optimal Lempel Ziv Markov Based Microarray Image Compression Algorithm
1Department of Electronics and Communication Engineering, University College of Engineering, BIT Campus, Anna University, Tiruchirappalli, 620024, India
2Department of Medical Equipment Technology, College of Applied Medical Sciences, Majmaah University, Al Majmaah, 11952, Saudi Arabia
3Department of Computer Engineering, College of Engineering and Computing, Al Ghurair University, Dubai, 37374, United Arab Emirates
4Automotive Research Center, Vellore Institute of Technology, Vellore, 632014, India
5Federal University of Piauí, Teresina, 64049-550, PI, Brazil
*Corresponding Author: R. Sowmyalakshmi. Email: firstname.lastname@example.org
Received: 15 March 2021; Accepted: 20 April 2021
Abstract: In the recent years, microarray technology gained attention for concurrent monitoring of numerous microarray images. It remains a major challenge to process, store and transmit such huge volumes of microarray images. So, image compression techniques are used in the reduction of number of bits so that it can be stored and the images can be shared easily. Various techniques have been proposed in the past with applications in different domains. The current research paper presents a novel image compression technique i.e., optimized Linde–Buzo–Gray (OLBG) with Lempel Ziv Markov Algorithm (LZMA) coding technique called OLBG-LZMA for compressing microarray images without any loss of quality. LBG model is generally used in designing a local optimal codebook for image compression. Codebook construction is treated as an optimization issue and can be resolved with the help of Grey Wolf Optimization (GWO) algorithm. Once the codebook is constructed by LBG-GWO algorithm, LZMA is employed for the compression of index table and raise its compression efficiency additionally. Experiments were performed on high resolution Tissue Microarray (TMA) image dataset of 50 prostate tissue samples collected from prostate cancer patients. The compression performance of the proposed coding esd compared with recently proposed techniques. The simulation results infer that OLBG-LZMA coding achieved a significant compression performance compared to other techniques.
Keywords: Arithmetic coding; dictionary based coding; Lempel-Ziv Markov chain algorithm; Lempel-Ziv-Welch coding; tissue microarray
Microarray analysis is a technology which enables the analysis and classification of genes in a rapid manner. At present, microarray is the main tool for gene-based investigations . Microarray technique is used to monitor large number of tissue array images in a concurrent manner. Every microarray experiment produces numerous large-sized images which are hard to store or share . Such massive number of microarray images imposes a new challenge for storage space and bandwidth resources. In the absence of high speed internet, it is difficult or sometimes impossible to share microarray images from some parts of the world. Several studies have been conducted to handle the storage of massive number of microarray image datasets in an efficient manner. Image compression is one of the ways to handle such huge volume of images.
In general, the intention of image compression is to transfer an image with fewer bits. Image compression has three major components such as recognition of redundant data in image, appropriate coding method and transformation method. The most important image compression standard is JPEG and its quantization can be divided into two kinds namely, scalar and vector (SQ and VQ). It is a non-convertible compression method and is commonly used in the compression of image with some loss of information. The foremost intention of VQ is to generate an optimal codeword that has a collection of codewords, where an input image vector is allocated based on minimum Euclidean distance. The popular VQ method is Linde–Buzo–Gray (LBG) model. LBG method offers simplicity, adaptability and flexibility. Further, the method depends on lower Euclidean distance between image vectors and respective codewords. It can generate local optimal solutions and in other terms, it fails in providing the best global solutions. The concluding solution of LBG algorithm is based on arbitrarily-produced codebook at the initial stage.
An enhanced form of LBG algorithm termed ‘ELBG’ algorithm was introduced in the literature which enhances its local optimal solution . ELBG is mainly applied in the optimal exploitation of codewords, an efficient tool to resolve the difficulties of clustering techniques. The outcomes pointed out that ELBG outperformed LBG and is independent from initial codebook. Projection VQ (PVQ) was presented based on the adoption of quadtree decomposition to segment the images into different block sizes. This technique was defined by Single Orientation Reconstruction (SOR) and enhanced results were produced in terms of objective as well subjective ways in comparison with usage of predefined block sizes . Object-relied VQ is carried out in three levels namely, initialization, iteration and finalization. Initially, it applies Max–Min algorithm while at the second level, it makes use of adaptive LBG algorithm. The third level removes the redundant data from codebook .
To overcome the limitations of existing algorithms found in the literature, the authors propose a GWO model in current research paper to construct a novel codebook that results in better VQ in terms of high PSNR and good quality reconstructed image. This paper presents a novel image compression technique i.e., Optimized Linde–Buzo–Gray (OLBG) with Lempel Ziv Markov Algorithm (LZMA) coding technique called OLBG-LZMA for microarray images without any loss of quality. The proposed OLBG-LZMA model is used to compress high resolution Tissue Microarray (TMA) images. LBG model is generally used in designing a local optimal codebook for image compression. Codebook construction is treated as an optimization issue and is solved with the help of Grey Wolf Optimization (GWO) algorithm. After the codebook is constructed by LBG-GWO algorithm, LZMA is employed in the compression of index table and to increase the compression efficiency. Experiments were performed on high resolution Tissue Microarray (TMA) image dataset containing 50 prostate tissue samples acquired from prostate cancer patients. The compression performance of the proposed method was compared with recently proposed techniques.
2 Literature Review
Quad Tree (QT) degradation is composed of VQ with different sized blocks under the examination of local sites . At the same time, the researchers in the literature  noticed that the complexity level of local regions is highly important, when compared with homogeneity. In Vector Quantization (VQ) of images with different block sizes, tedious regions of an image are quantified using Local Fractal Dimension (LFD). A fuzzy-based VQ, which makes use of competing accumulation and new codeword migration approach, was introduced to perform image compression . Likewise, the authors in the literature  employed a learning process which methodologically develops fast fuzzy clustering-based VQ techniques by integrating three learning modules . On the other hand, a Multivariate VQ (MVQ) method was developed using Fuzzy C-Mean (FCM) to compress Hyperspectral Imagery (HSI). The experimentation inferred that the MVQ is superior to traditional VQ in terms of Mean Square Error (MSE) and the quality of recreated image . In the literature , it is noted that the images can be compressed using the transformed VQ, in which the quantized image can be transformed using Discrete Wavelet Transform (DWT) .
Recently, evolutionary algorithms have been introduced to solve many engineering and technological problems. The study conducted earlier  employed Ant Colony Optimization (ACO) algorithm for codebook construction. A codebook was developed using ACO which mimics the wavelet coefficients in a bidirectional graph and defines an appropriate method to place the edges on graph. The method was found to be efficient than LBG though it requires more CT. A fast ACO algorithm was introduced in the literature  which generates the codebook based on the observation of repetitive computations present in ACO algorithm. Since the repetitive computations are identified in codebook design, the convergence time got reduced than the classical ACO algorithm. Furthermore, a PSO-based VQ was presented, which updates the particle global best (gbest) and local best (pbest) solutions which surpassed the LBG method .
A fuzzy-based PSO algorithm, proposed in the study , was able to provide efficient codebook globally. The enhanced results were attained in comparison with PSO and LBG methods. QPSO was introduced in the literature  to resolve the knapsack problem and maximize the PSO results. The attained simulation outcome of QPSO method was optimal than the traditional PSO. A tree-based VQ was presented for fast codebook design by applying the triangle inequality which in-turn offered better codewords. However, the method required high Computation Time (CT) . Hence, in the study , the researchers presented a fast codebook search method which applies two test criteria to enhance the image encoding process with no additional image distortion. From the results, it is noted that the CT got reduced to an average of 95.23% for a codebook of 256 codewords . A near optimal codebook was developed using Bacterial Foraging Optimization Algorithm (BFOA) to attain better reconstructed image quality with high PSNR .
Fuzzy Membership Functions (FMF) were selected as objective functions and were optimized using the modified BFOA. A HBMO algorithm-based VQ technique was presented in one of the investigations conducted earlier . It achieved better image superiority and codebook with minimal distortion than PSO, QPSO and LBG algorithms altogether. The researchers  used FA to design the codebook for VQ. FF algorithm was stimulated by the nature of flashing patterns for communication and mating purposes. FFs, with high brightness, are attracted by low bright FFs or else random search takes place, which reduces the exploration part. Hence, FA was altered in the literature  which provided a particular mechanism to follow in case of unavailability of brighter FFs in the search space. A bat algorithm -based codebook design was devised with a proper choice of tuning variables and the study ensured better performance with higher PSNR value and convergence time in comparison with traditional FF algorithm. An efficient VQ method was proposed by the researchers in  to encode the wavelet decomposed color image using Modified Artificial Bee Colony (ABC) optimization technology. The proposed model was compared with Genetic Algorithm (GA) and classical ABC. The model attained better PSNR and good reformed image supremacy . In spite of the established models in literature, there is a gap still exists to be fulfilled.
3 The Proposed Methodology
The proposed model involves two stages such as OLBG-based codebook construction and LZMA-based codebook compression. The working processes are detailed in the succeeding sections.
3.1 Vector Quantization (VQ)
VQ is defined as a block coding model which is used to compress images at some loss of information. In VQ, codebook construction is an essential process. Let
The above equation is subjected to constraints given in Eqs. (2) and (3)
3.2 LBG Algorithm
LBG is defined as a Scalar Quantization (SQ) algorithm and was devised in the year 1957 by Lloyd. By 1980, it was generalized as VQ. It employs two predefined conditions for input vectors to determine the codebook. Let
• Divide the input vector into different groups using minimum distance rules. The resultant block is saved in a
• Identify the centroid of every portion. The previous codewords can be replaced with any available centroids:
• Go to step 1 till no modifications occur in
3.3 The Proposed OLBG-LZMA Algorithm
The overall operation of the proposed OLBG-LZMA algorithm is illustrated in Fig. 1. The input image should be divided into non-overlapping blocks which then undergo quantization by LBG model. The codebook, which has been deployed using LBG approach, is trained first using GWO method in order to satisfy the needs of global convergence and assures the phenomenon. The index numbers are sent over transmission medium and reconstructed at the destination using decoder. Both reformed index and the corresponding codewords are arranged properly in order to generate the decompressed image size which is almost equal to the given input image. GWO algorithm was developed by Mirjalili et al.  in 2016 as a simulation from the hunting behavior of grey wolves. Grey wolves are assumed to be superior predators which often reside in group. Based on its hunting nature, the wolves are divided into alpha (α), beta (β), delta (δ), and omega (ω). The leader wolves are known to be α wolves which decide the place of sleep, hunting destination etc. The decisions taken by α are followed by the remaining members in the group. β is the second level of grey wolves which is referred to be subordinate wolves that guide the α wolves in decision making process. The subordinates can make decision in the absence of dominant wolves. Third level grey wolves are grouped under ω wolves which act as a scapegoat. When the wolves do not come under α, β, or ω, they are known to be subordinate or δ wolves.
These wolves do not dominate α and β wolves, but dominate over ω. Different levels of grey wolf hunting are listed below.
• Approaching prey
• Encircling prey
• Attacking the prey.
GWO algorithm is composed of two phases namely, exploration and exploitation. Initially, it is applied to explore the best solutions in local search area. The grey wolf surrounds and attacks the prey when searching for reliable solutions. Secondly, prey is identified by exploring the local space area. While encircling prey, the wolves finds the location of a victim and surrounds it. Hence, encircling prey is expressed as given herewith.
where the elements of ~a have been reduced in a linear manner from 2 to 0 during various iterations whereas r1 and r2 define the random vectors in [0, 1].
The default nature of grey wolves is to identify the locations of prey and encircle it. Hunting operation is carried out by α wolf, while β and δ involves in a few scenarios. Also, the numerical simulation of grey wolves is assumed to be α, β and δ which embeds massive aspects in terms of prey location. Thus, the top three solutions are saved which result in improving the position of search agents and is described as depicted in Eq. (10).
Step 1: Initializing parameters: Here, the codebook constructed using LBG approach is declared to be the initial solution whereas the remaining solutions are developed in a random manner. The attained solution refers to a codebook of NC codewords.
Step 2: (Selecting the present best solution): The fitness of each solution is processed using Eq. (1) and the higher fitness location is selected as an optimal result.
Step 3: (Generating novel solution): The position of grey wolves is updated using the location of prey. When an arbitrarily-created number (K) is higher than ~a, then the bad locations are replaced with newly-identified positions and the best location is kept unaltered.
Step 4: Rank the solutions under the application of fitness function and select the better solution.
Step 5: (Termination criteria): Follow the Steps 2 and 3 until the stopping condition is reached.
3.4 LZMA-Based Compression Process
LZMA coding is used in the compression of generated codebooks. It is used to compress real time data which is generated rapidly. Initially, the sensor nodes observe the physical environment. The sensed value is then tested for anomalies and the label value is appended. Label value is appended by the sensors to every individual sensed data. The labelled value ‘1’ is appended to the sensed data, when the latter differs from actual data i.e., an abnormal value is found. Likewise, the labelled value ‘0’ is appended to the sensed data, when the latter has no deviation from the actual data, i.e., normal value is found. The sensor node appends the label value to sensed data after which the compression is performed. Sensor node runs the LZMA algorithm whereas the compression algorithm compress the labelled data efficiently, irrespective of the label value. LZMA algorithm uses the dictionary, sliding window concept and range encoder to efficiently compress the labelled data. The compressed data is then transmitted to BS. BS receives the compressed data and performs decompression process. As LZMA is a lossless compression technique, the reconstructed data remains the exact replica of original data with no loss of information.
4 Performance Validation
In order to understand the effectiveness of the proposed model, TMA image dataset was used. The dataset contains 50 prostate tissue samples collected from prostate cancer patients. In addition, the results were compared with existing methods under different measures.
4.1 Dataset Details
For experimentation, a publicly-available microarray image dataset was employed from Zhong 2017. Besides, the particulars of the dataset is given in Fig. 3 and the sample TMA images are shown in Fig. 4. The dataset contains high resolution Tissue Microarray (TMA) image dataset containing 71 prostate tissue samples of cancer patients digitized by Carl Zeiss Axio Scan.Z1 scanner. The resolution of the image is 7000
Tab. 1 and Figs. 5–8 show the results of PSNR analysis for the proposed algorithm against existing models on the applied images 1–4.
Fig. 9 shows the average PSNR values obtained by different methods under different bit rates. In addition, an average of all the PSNR values attained for the applied test images is tabulated in Tab. 2. For the applied test_image 1, under the least bit rate of 0.1875, the methods used for comparison such as LBG, PSO, QPSO and HBMO attained a PSNR value of 24.0 which was lower than other methods. At the same time, both FF and CS algorithms obtained a PSNR value of 24.1 and 24.3 respectively. However, the newly developed OLBG-LZMA model reached a maximum PSNR value of 25.3. Under a moderate bit rate of 0.3750, the presented algorithm attained the highest PSNR value of 27.1 whereas the CS, FF, HBMO, QPSO, PSO and LBG models achieved the least PSNR values of 26.2, 26.0, 25.9, 25.7, 25.5 and 25.3 respectively. Under the highest bit rate of 0.625, the OLBG-LZMA method yielded significant results with a PSNR value of 33.2 which was much higher than the compared algorithms.
Similarly, for all the applied test images, the proposed OLBG-LZMA algorithm outperformed the existing algorithms in an efficient way. In table values, it is noted that the PSNR values start to rise with an increase in bit rate. This phenomenon can be observed from the PSNR values obtained for the least bit rate of 0.1875 and highest bit rate of 0.625 bit rate. The simulation results portray that the proposed OLBG-LZMA model is an efficient one compared to other algorithms under different codebook sizes.
Experimental validation was performed in Windows 10 operating system while its specifications were Intel(R) Core(TM) i7-7500U and 2.70 GHz CPU with 8 GB RAM. In addition, the execution code was developed and simulated in MATLAB. Tab. 3 and Fig. 10 shows the obtained average CT of various algorithms under different bit rates. The authors used 30 codebooks with 20 iterations. For a codebook of size 8 and a bit rate of 0.1875, the average CT required is shown in the table. From the table, for applied image 1, the highest CT of 977.200 was demanded by CS algorithm. In line with this, both HBMO and FF algorithms showed slightly lower CT values such as 890.254 and 877.123 respectively. The proposed OLBG-LZMA managed well with a moderate CT of 685.246.
But, the existing LBG, PSO and QPSO algorithms achieved effective CT performance with least values of 3.3715, 254.357 and 261.299 respectively. It is clear that the LBG model needs a least CT of 3.3715 s which is highly appreciable. But, its poor performance in PSNR value limits its applications. However, the proposed OLBG-LZMA model has moderate CT and it yielded significant PSNR value which makes it preferable than the CT.
Image compression techniques are generally used to minimize the number of bits needed to store and share microarray images. The current research paper proposed an effective OGWO-LZMA model to compress microarray images. The proposed model involves two stages such as OLBG-based codebook construction and LZMA-based codebook compression. After the construction of codebook using LBG-GWO algorithm, LZMA is employed to compress the index table and increase the compression efficiency in addition. Experiments were performed on high resolution microarray dataset and the compression performance of OGWO-LZMA was compared with other methods. The simulation results inferred that the OGWO-LZMA algorithm achieves significant compression performance. At the same time, the OLBG-LZMA achieved a moderate CT and provided significant PSNR value which makes the CT, unnoticeable. In future, the performance of the OGWO-LZMA model can be improved by incorporating hybrid metaheuristic algorithms.
Funding Statement: The authors received no specific funding for this study.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
|This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.