A Weighted Spatially Constrained Finite Mixture Model for Image Segmentation

: Spatially Constrained Mixture Model (SCMM) is an image segmentation model that works over the framework of maximum a-posteriori and Markov Random Field (MAP-MRF). It developed its own maximization step to be used within this framework. This research has proposed an improvement in the SCMM’s maximization step for segmenting simulated brain Magnetic Resonance Images (MRIs). The improved model is named as the Weighted Spatially Constrained Finite Mixture Model (WSCFMM). To compare the performance of SCMM and WSCFMM, simulated T1-Weighted normal MRIs were segmented. A region of interest (ROI) was extracted from segmented images. The similarity level between the extracted ROI and the ground truth (GT) was found by using the Jaccard and Dice similarity measuring method. According to the Jaccard similarity measuring method, WSCFMM showed an overall improvement of 4.72%, whereas the Dice similarity measuring method provided an overall improvement of 2.65% against the SCMM. Besides, WSCFMM significantly stabilized and reduced the execution time by showing an improvement of 83.71%. The study concludes that WSCFMM is a stable model and performs better as compared to the SCMM in noisy and noise-free environments.


Introduction
Image segmentation is a cardinal area in the domain of computer vision [1][2][3][4][5]. Its employment is found in almost all walks of life, ranging from environment-machine interaction to computer-aided diagnosis [6]. It has numerous applications in neurological diagnostics such as in functional imaging, operational planning and quantitative analysis. The segmentation task becomes challenging due to the presence of noise, low contrast, blur boundaries, poor spatial resolution, partial volume effect and inhomogeneity [7]. Usually, the issues are encountered in an unsupervised segmentation, which is inevitable since gathering larger sets of manually segmented images is practically impossible [6].
A number of robust image segmentation algorithms such as level sets, mean shift, normalized cuts, region splitting and merging, were introduced to deal with this issue. The statistical algorithms were also introduced for the purpose of clustering. Out of all these methods, a statistical method Finite Mixture Model (FMM) is frequently relied upon.
FMM assumes that the density of the observed data is a weighted sum of a parametric template distribution. It belongs to the family of unsupervised learning methods and is found highly suitable for solving unsupervised clustering problems. It groups the data by investigating the statistical coherence. In actual practice, it was observed that the topology of the data plays a vital role in providing improved segmentation results. For example, it was seen that spatially neighboring pixels are combined to form the same cluster by augmenting the generative model with ad hoc topology or by heuristic smoothing [8].
The model is tractable and also became a source for producing important variants, like Student's-t Mixture Model (SMM) and Gaussian Mixture Model. The limitation of these models is that they are noise sensitive models and produce unacceptable results with noisy images. Since FMM is constructed over the assumption that image pixels are not statistically dependent on each other, therefore, spatial information about pixels forming various objects in an image is unavailable. Later research compensated this de ciency by including Markov Random Field (MRF) model into the framework. This new arrangement turned out to be an impressive element for de ning the relationship among image pixels. These relationships are de ned by considering the corresponding labels of the image pixels. The inclusion of MRF facilitated in producing appreciative results, but its computational cost was found to be higher. This computational cost was subsequently reduced by approximating Gibbs energy function.
Spatially Variant Finite Mixture Model (SVFMM) also belongs to the family of FMM. The model brought a convincing improvement over the conventional FMM by imposing spatial smoothness prior to contextual mixing proportion. The model assumes that the contextual mixing prior is a random variable. In SVFMM's con guration, the contextual mixing proportion is a probabilistic vector due to which it becomes essential that its value should be positive and secondly the total probability of the system should be equal to 1. By adding a gradient projection algorithm in the M-Step of EM-algorithm, a close form solution for contextual mixing proportion is achieved.
An improvement over SVFMM was proposed by Spatially Constrained Finite Mixture Model (SCFMM) which replaces the gradient projection algorithm with the convex quadratic programming. However, both of these models were unable to directly obtain contextual mixing proportions from the given data. To address this issue, a prior distribution is imposed on contextual mixing proportions. This prior distribution depends upon MRF. SCFMM turned out to be a more ef cient model when compared with the SVFMM because it facilitates in obtaining closed-form equation through the EM algorithm. To improve computational ef ciency, contextual mixing proportion is explicitly modeled as a probabilistic vector. The model is comparatively ef cient but its application is limited to grayscale images. Likewise, a detail-preserving mixture model was also proposed. The model picks up a pixel, investigates its neighborhood, and then assigns different weight values to different pixels according to the contextual arrangement of the surrounding pixels. Due to this arrangement, the log-likelihood function of the model becomes complex and enhances the computational complexity [9]. Fig. 1 taken from [10] provides an overview of the proposed model. SVFMM assumes that x i is an observation at ith pixel of an image (i ∈ {1, 2, 3, . . . , N}) modeled by supposing that its data is independent and is identically distributed (i.i.d). It proposed a modi cation for pixel labeling approach in conventional FMM. SVFMM supposes a mixture model which is characterized by K number of components such that each component possesses its vector of density parameters θ j . The model also includes π i j = P j | x i into the list of previously existing parameters. π i j are the probabilities or labels which describe the belonging of i th pixel to the j th cluster. The model works under the in uence of two constraints, i.e., 0 ≤ π i j ≤1 and K j=1 π i j = 1. Let us suppose that = {π 1 , π 2 , π 3 , . . . , π N } represents the probability vector for pixel 'i' and the set of component parameters, i.e., mean and variance {µ, σ } are represented by = {θ 1 , θ 2 , θ 3 , . . . , θ K }, then according to the proposed SVFMM the probability density function at the observation x i can be described in Eq. (1).

Mathematical Foundations for Spatially Variant Finite Mixture Model
where ∅ x i | θ j follows Gaussian distribution as shown in Eq. (2) under the in uence of the given parameters, i.e., the mean and the variance. The parameters of the model are updated by using the Expectation Maximization (EM) algorithm. The algorithm does not takes into account the spatial information of pixels. This is because the data is assumed to be independent and identically distributed. The gap was addressed by considering Maximum a Posteriori (MAP) approach and by integrating the Markov Random Field (MRF) model into the framework. The proposed SVFMM manages to have a prior distribution for the set of parameters which make use of Gibbs function as shown in Eq. (3) for incorporating a spatial description of pixels.
Primarily, the Gibbs function as described according to Eq. (3) facilitates in making probabilistic calculations about a pixel's label for its relationship with a speci c cluster. Fig. 2 taken from [11] illustrates the idea of pixel labels representing distinct clusters of image.
where E( ) = β N i=1 VN i ( ), −β represents regularization parameter whereas, Z serves as partition function and is described according to the Eq. (4). For pixel label vectors π m , VN i ( ) describes the clique potential by looking into the neighborhood characterized by N i where 'i' represents the i th pixel. The N i was described with the help of Eq. (5).

How Gibbs Distribution Ensures Spatial Continuity?
According to [12], a digital image is de ned by the nite rectangular grid of size M × N. Formally, it can be described according to Eq. (6).
Usually, these images consist of k number of nite classes. These classes are represented by Q = {q 1 , q 2 , q 3 , . . . , q k }. The classi cation process, i.e., X = X ij : X ij ∈ Q aims to allocate a corresponding unique index from the user-de ned class to each pixel in the image, i.e., X ij = q k where q k ∈ Q, which means that each pixel at (i, j) belongs to class q k . If at location (i, j), the neighborhood of a pixel is denoted by η i,j then the following two conditions are taken into account.
Here Eq. (7) describes that a pixel cannot be a neighbor to itself whereas, Eq. (8) de nes criteria for de ning neighborhood of a speci c pixel within the above mentioned 2D grid and additionally it represents a collection of subsets of L. With this understanding, the formal description of neighborhood system η of L can be given according to Eq. (9).
Neighborhood system has its roots in cliques. Formally, a clique C is a subset of the 2D grid, such that C ⊆ L. The set of clique C may consist of a single pixel or a group of pixels which are neighbors to each other under the in uence of a speci c neighborhood system. Therefore, for the pair (L, η) the set of all such cliques is de ned by Eq. (10). For image segmentation, a homogeneous neighborhood system plays an important role. This system can be described with the help Eq. (10).
The values m, n, i, j, d and c are of type integer. Therefore, the rst order neighborhood, η 1 which takes into account the four nearest neighbors can be de ned by putting c = 1. Similarly, the second-order neighborhood which considers eight nearest neighbors can be obtained by putting c = 2. The growing order of neighborhood supports the growth of corresponding clique types. However, generally, the rst or second-order neighborhood system is employed. This con guration is found suf cient for ensuring spatial continuity among the image pixels. Each clique type holds a positive parameter. The clique potential for one point clique is described by The two-point clique is de ned as Both α k , β are positive constants and x is the index of the input image class to be determined for the pixel being considered and may take any value from Q. For making a practical decision, the entire set of probabilities for a pixel belonging to each class is computed and from this set a maximum probability is picked up for associating a speci c pixel with a speci c class.
Z represents normalization, constant T is a temperature parameter and C c V (x) | x=q k is a potential function.
At this stage, by observing the events and the given model, there is a need to have the probability that can describe the unknown model parameter. For obtaining this description, a posterior probability is computed by using Eq. (13).
The unknown parameters of the mixture model are estimated by using the EM algorithm. The algorithm consists of E-step and M-step. The values of hidden variables are computed at E-step according to the Eq. (14).
The expected values according to SVFMM are maximized by using M-Step as described by Eq. (15). This step primarily aims to compute the log-likelihood function and provides a complete dataset.
where g (u i , m) = π i j − π m j 2 , t describes the iteration step. The parameter of the model, i.e., the mean and the variance µ are updated according to Eqs. (16) and (17) Though Q MAP with respect to label parameter {π i j } can be maximized but the model has certain limitations. It is primarily unable to provide a closed-form of update equations. Secondly, the maximization procedure was supposed to work under the in uence of certain constraints, i.e., 0 ≤ π i j ≤1 and K j=1 π i j = 1. To overcome this dif culty, SVFMM makes use of the Gradient Projection Algorithm for nding the label parameters which maximizes Q MAP function [13]. The authors in [13] and [14] further improved the maximization step by differentiating Eq. (15) with respect to π i j , and equating the result to zero reached to Eq. (18) [14].
According to [15], the segmentation results are inconsistent if the value of the regularization parameter is close to 0, but with the value more than or equal to 2, the results are consistent. Therefore, the second half of the Eq. (18) suggests that the part of the equation  is twice as big as that of the part proposed in SCMM. This is a signi cant difference which notably impacted the segmentation results. The accuracy level was raised by using a weight factor with the new M-step. This weight factor was included according to the multiplicative property of equality. According to this property, suppose that if v = w, then αv = αw, which implies that v = αw α . For segmenting T1-Weighted simulated brain MR Images, this weighting factor, through trial and error method was 0.25. Therefore, the proposed weighted maximization step can be given according to the Eq. (19).
The above-shown maximization step was used in WSCFMM. This model outperformed the conventional SCMM and ensured reliability both in terms of time and precision.

Algorithm for SCFMM
The algorithm used in this study consists of the following steps [16].

E-Step
The latent variable z is computed at E-step with the help of Eq. (14).

Evaluating Posterior Probability Eq. (13)
Here, convergence for parameter values and log-likelihood function is investigated. If the observed error is not less than 0.0001, then return to step 2 and set t = t + 1.

Experimental Data
The proposed model was used for segmenting T1-Weighted simulated brain MRIs. These MRIs are the specimen of complex multispectral images and can be downloaded from [17]. These images contain different levels of noise and different levels of intensity non-uniformity (INU). The noise levels vary from 0% to 9% comprising of only odd levels, whereas, the INU level varies from 0% to 40% with an increment of 20% respectively. For making different possible combinations of data, one factor was kept constant and the other was gradually varied. For example, by keeping the noise at 0%, INU was varied from 0% to 20% and then to 40%. This arrangement provided three images. Subsequently, the noise level was raised to 1% and INU level was again varied from 0% to 20% and then to 40%. This step provided another set of three images. The process was repeated for all levels of noise and all levels of INU. Eventually, a dataset with six different combinations of T1-Weighted normal brain MRIs was collected. Afterwards, this dataset was segmented through both models, i.e., SCMM and WSCFMM, one by one. To investigate the preciseness in the segmented images, ROIs from these segmented images were extracted. The Dice [18] and Jaccard [19] similarity measuring methods were used to compute the degree of similarity between the extracted ROI and GT segment. A higher degree of similarity indicates a higher level of accuracy and a lower degree of similarity indicates a lower level of accuracy.

Results
Both the models, i.e., SCMM and WSCFMM were used to segment the test data. The results produced by both the models were analyzed from three different angles including the visual inspection of the results, the similarity between segmented ROI and the corresponding GT segment. The GT is obtained from the image with 0% noise and 0% INU.

Human Visual Inspection
The visual comparison between the results produced by both the models showed that WSCFMM was able to control the outliers and gave an appreciative level of performance in ideal conditions and in conditions where the candidate image is affected from intensity INU and/or noise.
Figs. 3 and 4 show the performance of WSCMM in comparison to SCMM. The results shown in the gures were obtained from the images affected with a higher level of noise, i.e., 7% and 9%, combined with different levels of INU. The visual investigations of these results show that besides outliers, SCMM is also unable to provide a well-de ned segment. The circled region in Figs. 3 and 4 provides important observations which led us to the conclusion that a substantial portion of the information is missing. However, at the same settings of noise and INU, WSCFMM not only controlled the outliers but it also captured that missing information.  Figure 4: Visual comparison of images with 9% noise and INU varying from 0% to 40% segmented by SCMM and WSCFMM

Computational Comparison with Ground Truth Segment
An ROI was extracted by using both the models one by one. The extracted ROI by each model was compared with the GT. The degree of similarity of this extracted ROI was found by using Jaccard and Dice similarity measuring methods. These methods can be described with the help of the Eqs. Results presented in Tab. 1 con rmed that WSCFMM outperformed SCMM in the noisy and noise-free environment. It can be observed that the performance of WSCFMM is appreciative either in a situation where there is no noise and/or in a situation where there is a mild noise.
When the accuracy of the results was observed slice by slice, it was found that the Jaccard gave a higher average accuracy as compared to the Dice similarity measuring method, as shown in Fig. 5.  Similarly, to see an overall contribution of WSCFMM, an overall average accuracy of both the models was computed. This computation highlighted the signi cance of WSCFMM. It conrmed that according to the Jaccard similarity measuring method, WSCFMM has contributed an overall improvement of 4.76%, whereas, according to the Dice similarity measuring method, WSCFMM introduced an overall improvement of 2.65%, as shown in Fig. 6. This is a substantial improvement and it is expected to signi cantly affect the segmentation accuracy during the process of MRI analysis. 4

Time Optimization
During MRI's examination, 'time' is a critical issue. As chances for the accumulation of a large MRI dataset for a single subject cannot be completely ruled out, so it is highly desirable to work with a model that has a predictable nature. WSCFMM is one such model that uses a small amount of processing time without compromising the accuracy.
The case of SCMM is weak in terms of its processing time. It utilizes a variable amount of time for processing the same dataset under similar experimental conditions. By introducing the derivational xture and by adding a weight factor, WSCFMM has ef ciently controlled the issue of variable execution time. Besides, it has stabilized the model and has signi cantly enhanced the segmentation accuracy.
The mutual comparison of SCMM and WSCFMM for the processing time is presented in Fig. 7. The results were obtained by considering all levels of noise, i.e., from 0-9%, and all level of INU, i.e., from 0-40%. According to the results presented in Fig. 7, WSCFMM has displayed a predictable behavior by utilizing the approximately constant amount of processing time for each image. Whereas, SCMM processes the same data by using a different amount of time. Fig. 8 presents the mutual comparison of the average performance of both the models for segmenting every single combination of data. As mentioned earlier, this dataset consists of six different combinations. Every single combination, which in turn consists of three different images, is represented by one anchor point. Fig. 8 shows a mutual comparison of the average time used by both the models for segmenting every single combination of data. In terms of ef ciency, it has depicted a marked improvement of WSCFMM over the SCMM. The overall contribution of WSCFMM in terms of processing time utilization is presented in Fig. 9. The WSCFMM not only made the model predictable and reliable but was also instrumental in bringing down the execution time from 1272.04 s to 207.17 s. This leads to WSCFMM's achievement of 83.71% improvement. This is an encouraging improvement and broadens the chances for comfortably handling large MRI datasets.  The segmentation results produced by the WSCFMM enjoy an excellent superiority when compared to the segmentation results produced by the SCMM. The comparison made over the basis of human visual system undoubtedly con rmed that the WSCFMM has displayed an appreciative level of stability and has produced convincing results even under abnormal conditions, i.e., in the presence of various levels of noise and INU. The strength of the proposed WSCFMM was judged by nding the similarity of the segmented ROI with the GT. For that matter, Dice similarity and Jaccard similarity measuring methods were used. Both of these methods supported the claim that WSCFMM is a stable, reliable and ef cient method. According to the Jaccard similarity measuring method, the proposed model has shown an overall improvement of 4.76%, and according to the Dice similarity measuring method, it has shown an overall improvement of 2.65%. Moreover, the proposed WSCFMM was found to be exceptionally ef cient as it reliably brought down the processing time from 1272.04 s to 207.17 s. This leads to an overall ef ciency of 83.71%. The proposed WSCFMM would be bene cial for the eld of medical imaging, mainly due to its ability to maintain consistency and reliability in the segmentation procedures. It is expected to be highly desirable by the experts especially working in the areas of brain imaging as any loss of the relevant details or inaccurate results can be extremely fatal for the patients under treatment.
Funding Statement: The author(s) received no speci c funding for this study.

Con icts of Interest:
The authors declare that they have no con icts of interest to report regarding the present study.