One of the most complex tasks in computer-aided diagnosis within intelligent decision support systems is the segmentation of lesions. Thus, this study proposes a new fully automated method for segmenting ovarian and breast ultrasound images. The main contribution of this research is a novel Viola–Jones model capable of segmenting ultrasound images of breast and ovarian cancer cases. In addition, we propose an approach that efficiently generates a region of interest (ROI) and new features that can be used to characterize lesion boundaries. This study uses two databases to train and test the proposed segmentation approach. The breast cancer database contains 250 images, while the ovarian tumor database contains 100 images obtained from several hospitals in Iraq. The experimental results showed that the proposed approach performs better than other methods used for segmenting breast and ovarian ultrasound images. The successful segmentation rate of the proposed system, which exceeded those of the compared techniques, was 78.8% on the breast cancer data set and 79.2% on the ovarian tumor data set. In classification, we achieved 95.43% accuracy, 92.20% sensitivity, and 97.5% specificity on the breast cancer data set, and 94.84% accuracy, 96.96% sensitivity, and 90.32% specificity on the ovarian tumor data set.
The analysis of images involves the extraction of their relevant details. Machines are typically used in this extraction process, with minimal human effort required [
To date, ultrasound has been used to assess the risks of numerous tumors. This study uses ultrasound to detect gynaecological abnormalities, particularly ovarian tumors and breast cancer. These two cases were selected because they share similar problems and their lesions may have similar shapes. Accordingly, this study investigates both cases to find an effective solution that, it is hoped, will assist experts in their diagnoses. To avoid human error in the quantification and diagnosis stages, computer-based image processing and analysis tools should be developed to help reduce the rates of false positives and false negatives. Beyond being developed, such tools should also be tested and integrated into medical diagnosis. Further investigation and improvement of multidisciplinary technology are needed to address the challenges of detecting and classifying gynaecological abnormalities. Such technology must combine machine learning, medical image classification methods, and pattern recognition methods developed in collaboration with domain experts. Note that computer-based tools cannot replace human expertise, because specialists have a wealth of knowledge gained through lifelong training. The most reliable way to achieve high accuracy in abnormality detection is a hybrid approach, which can also improve patient care and management [
Computer-based tools with decision-making capabilities that are more sophisticated than those of conventional expert systems should also be developed. In addition, machine learning techniques must be integrated into such tools to enable the extraction of features that experts seldom use. Designing such sophisticated computerized tools poses many challenges; however, they can be designed successfully by selecting the most appropriate data representation and data analysis methods. The current study focuses on developing computerized tools of this second, more sophisticated type. Given that the domain knowledge of experts continues to grow and evolve, such tools will continue to be improved and designed according to the facts and needs of field specialists [
The current research aims to build an intelligent decision support system model that can segment ultrasound images and identify the risk of malignant breast cancer and ovarian tumors at an early stage. To achieve this objective, we first identify the limitations of existing methods and attempt to reduce their effects. The limitations of ultrasound images fall into three categories. First, speckle noise is a major limitation that affects segmentation and feature extraction. Speckle noise reduces segmentation accuracy by increasing the number of false detections and blurring the ROI edge. It also obscures the ROI texture information, which then cannot be used to identify malignancy risk. Therefore, building or using a good filter to reduce speckle noise makes the segmentation and feature extraction tasks considerably easier. Second, artifacts produced by the machine can make segmentation difficult. Lastly, it may be difficult to find powerful features for identifying the risk of malignancy. Therefore, we propose a model that can:
Enhance images, making them suitable for segmentation, and enhance the texture features for the diagnosis stage. We then propose a new cascade model to segment and extract the ROI from ultrasound images. In the filtering stage, the proposed model combines the Wiener filter with a wavelet filter to highlight the ROI against the rest of the image and sharpen the ROI edge for the segmentation task. In addition, the Wiener filter used alone enhances the texture for the classification stage. For the segmentation stage, we modified the Viola–Jones model to make it suitable for segmentation rather than object detection. Traditionally, the Viola–Jones model is used for object detection by scanning images with windows of different sizes. When we applied it in this way, the model generated false positives and could not capture the entire ROI, which led to under- and over-segmentation problems. Our proposed model adapts the Viola–Jones model to the segmentation task by scanning images pixel by pixel and using the local details of each pixel.
The remainder of this paper is organized as follows. Section 2 presents existing studies on automated ultrasound image segmentation. Section 3 explains the proposed Viola–Jones model for segmenting ultrasound images of ovarian and breast cancer cases. Section 4 presents the segmentation and classification results. Lastly, Section 5 concludes the study.
One of the main challenges in detecting abnormalities in human organs is extracting relevant information from ultrasound images, and manual methods carry a high risk of human error. Such inaccuracies can lead to intra- and inter-observer variability. Moreover, interpreting such information requires a high level of experience and expertise. Thus, automatic image processing and analysis methods should be developed. These methods will enable gynaecologists and sonographers to diagnose diseases accurately [
Segmentation is a common challenge in image processing and related areas (e.g., image analysis, pattern recognition, and scene analysis) [
At present, image segmentation is applied in image compression, content retrieval, image editing, and machine vision (e.g., guiding versatile robots). Other applications include computer-aided fingerprint and facial recognition, satellite imaging, and remote sensing. Four main categories of image segmentation techniques are commonly used, as described below [
Edge-based methods: These methods are among the most essential approaches to medical image processing in computer vision; their main aim is to detect contours that represent the boundaries of image objects. Such algorithms are advantageous because their computational cost is low. However, the edge grouping process poses major challenges, including setting the right thresholds and producing connected, one-pixel-wide contours. Edge detection typically involves three steps. First, noise is reduced using smoothing techniques. Second, local operators are applied to detect edge points. Finally, spurious pixels are eliminated from the edges [
Clustering-based methods: In these methods, image pixels are grouped into clusters according to their intensity values, as summarized in a histogram. Examples include K-means and fuzzy c-means (FCM). This approach is advantageous because its iterative process avoids the problems associated with threshold setting. Moreover, the segmented contours are always continuous. However, over-segmentation occurs when pixels belonging to the same cluster are not adjacent [
Region-based approaches: These approaches detect regions according to a predefined homogeneity threshold. This approach is widely used because the segmentation contour is usually uninterrupted and one pixel wide. In addition, it requires less computation time. Nevertheless, the segmentation results may vary with the similarity threshold setting, possibly resulting in over-segmentation [
Split-merge methods: In these approaches, the image is first partitioned into homogeneous initial regions using FCM or K-means (the split step). Neighbouring regions that are similar to one another are then merged according to a specific decision rule (the merge step) [
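The three edge-detection steps can be sketched in NumPy as follows. This is an illustrative sketch, not the method of any cited study: the 3x3 binomial smoothing kernel, the Sobel operators, the gradient threshold `thresh`, and the rule that an edge pixel is spurious when none of its eight neighbours is an edge pixel are all assumptions chosen for simplicity.

```python
import numpy as np

def _shift_sum(p, kernel):
    """Correlate a 1-pixel-padded image p with a 3x3 kernel using array slicing."""
    h, w = p.shape[0] - 2, p.shape[1] - 2
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * p[dy:dy + h, dx:dx + w]
    return out

def detect_edges(img, thresh):
    img = np.asarray(img, dtype=float)
    # Step 1: reduce noise with a 3x3 binomial (Gaussian-like) smoothing kernel.
    smooth_k = np.outer([1, 2, 1], [1, 2, 1]) / 16.0
    smooth = _shift_sum(np.pad(img, 1, mode="edge"), smooth_k)
    # Step 2: local Sobel operators estimate the intensity gradient at each pixel.
    p = np.pad(smooth, 1, mode="edge")
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    gx, gy = _shift_sum(p, kx), _shift_sum(p, kx.T)
    edges = np.hypot(gx, gy) > thresh
    # Step 3: drop spurious isolated edge pixels (no edge among the 8 neighbours).
    ring = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]])
    neighbours = _shift_sum(np.pad(edges, 1).astype(int), ring)
    return edges & (neighbours > 0)
```

The fixed `thresh` illustrates the threshold-setting difficulty noted above: too low a value keeps noise, too high a value breaks contours.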
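A minimal sketch of clustering-based segmentation, assuming plain K-means on grey-level intensities only (FCM would replace the hard assignment with fuzzy memberships). The quantile-based initialization and the fixed iteration count `iters` are illustrative assumptions.

```python
import numpy as np

def kmeans_intensity(img, k=2, iters=20):
    """Cluster pixels by grey level with K-means; returns (label image, cluster centres)."""
    x = np.asarray(img, dtype=float).ravel()
    # Spread the initial centres over the intensity distribution.
    centres = np.quantile(x, np.linspace(0, 1, k + 2)[1:-1])
    for _ in range(iters):
        # Assignment step: each pixel joins the nearest centre in intensity.
        labels = np.argmin(np.abs(x[:, None] - centres[None, :]), axis=1)
        # Update step: each centre moves to the mean intensity of its pixels.
        for j in range(k):
            if np.any(labels == j):
                centres[j] = x[labels == j].mean()
    return labels.reshape(np.shape(img)), centres
```

Because pixel positions are ignored, one cluster can form several disconnected regions, which is the source of the over-segmentation mentioned above.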
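Region growing, one common region-based method, can be sketched as a breadth-first search from a seed pixel. The homogeneity rule used here (a neighbour joins the region if its intensity is within `tol` of the seed intensity) and 4-connectivity are assumptions; the sketch shows how the threshold setting alone changes the result.

```python
from collections import deque

import numpy as np

def region_grow(img, seed, tol):
    """Grow a 4-connected region from seed; a pixel joins if |value - seed value| <= tol."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    mask = np.zeros((h, w), dtype=bool)
    seed_val = img[seed]
    mask[seed] = True
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w and not mask[ny, nx]
                    and abs(img[ny, nx] - seed_val) <= tol):
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask
```

With a bright 4x4 square on a dark background, a small `tol` returns just the square, while a large `tol` leaks into the whole image.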
Segmentation of US images is one of the most critical and difficult steps in image processing and analysis; it aims to divide an image into meaningful parts and is used to distinguish objects of interest from the background. Numerous conventional techniques have been used for ultrasound image segmentation. Among them, binary segmentation is the simplest to implement and runs quickly. The studies reviewed in this section have been subjected to thorough clinical validation.
A novel automatic method of follicle segmentation was proposed. In that study, images of human ovaries were smoothed using an adaptive neighbourhood median filter, and geometric active contour methods were used for the initial segmentation of dark regions. Only some of these segmented dark regions are true objects of interest, so an SVM classifier was used to determine whether each dark region is a follicle or not [
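Otsu's threshold, which serves as a baseline in the segmentation comparison reported later, is a typical binary segmentation method. The sketch below picks the histogram split that maximizes the between-class variance; the 256-bin histogram is an assumption.

```python
import numpy as np

def otsu_threshold(img, bins=256):
    """Return the grey level that maximizes the between-class variance of the histogram."""
    hist, edges = np.histogram(np.asarray(img, dtype=float).ravel(), bins=bins)
    centres = (edges[:-1] + edges[1:]) / 2
    n0 = np.cumsum(hist)                   # pixels at or below each candidate split
    n1 = hist.sum() - n0                   # pixels above it
    s0 = np.cumsum(hist * centres)         # intensity sums of the two classes
    s1 = s0[-1] - s0
    valid = (n0 > 0) & (n1 > 0)            # both classes must be non-empty
    mu0 = s0 / np.maximum(n0, 1)
    mu1 = s1 / np.maximum(n1, 1)
    # Between-class variance up to a constant factor: n0 * n1 * (mu0 - mu1)^2.
    sigma_b = np.where(valid, n0 * n1 * (mu0 - mu1) ** 2, -1.0)
    k = int(np.argmax(sigma_b))
    return edges[k + 1]                    # upper edge of the last "background" bin
```

A binary mask is then `img > otsu_threshold(img)`. A single global threshold of this kind is what struggles when intensity distributions vary between data sets.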
An active contour approach was proposed by [
In [
However, the main limitations of US images fall into three categories. First, speckle noise is a major limitation that affects segmentation and feature extraction. Second, artifacts produced by the machine can make segmentation difficult. Lastly, it may be difficult to find powerful features for identifying the risk of malignancy. The proposed model aims to serve as an intelligent decision support system that can segment US images and identify the risk of malignant breast cancer and ovarian tumors at an early stage.
Ultrasound produces images of various types of tissue. These images are characterized by darkness, low contrast, blurry ROI edges, and objects with nearly identical characteristics. Moreover, image quality depends on the type of machine used and its resolution. These factors complicate both the segmentation process and the identification of malignancy risk. To clarify the role of each technique used, we begin by describing the basic idea of the proposed method. The framework presented in this study comprises three basic components.
The proposed system includes the following main steps:
The pre-processing phase enhances the image, highlights the ROI, and makes the US image clear for the segmentation task. In this stage, we apply a Wiener filter followed by a wavelet transform to highlight the region of interest; the Wiener filter alone is used to enhance texture features. In the second stage, the image is segmented and the ROI is extracted from the rest of the US image; for this stage, we built a trainable cascade model. Lastly, to assess the proposed segmentation, we extract LBP features from the segmented ROI and feed them to an SVM classifier to identify the risk of malignancy at an early stage.
Artifacts and noise degrade images and signals in several clinical modalities, and the type of degradation depends on the modality. The most common degradation in radiographs is low contrast, whereas speckle noise is typical of images formed using coherent energy, such as ultrasound. Image degradation strongly affects image quality and therefore how humans interpret the image; it can also reduce the accuracy of automated systems. Low-quality images make quantitative measurements complex and unreliable, and they reduce the reliability of image analysis, segmentation, and feature extraction. Accordingly, images should be despeckled to improve their quality, and further research in this area of medical imaging is needed.
Noisy images must therefore be despeckled so that the quality of ultrasound images is enhanced while the boundaries of salient tissues are preserved. This stage aims to locate structure boundaries accurately, thereby improving the visual representation of structure locations and enabling the quantification of morphology. Most previous studies have identified speckle noise as a key challenge in analyzing and segmenting US images and have used pre-processing techniques to eliminate it. The current study likewise reduces speckle noise while preserving the object of interest.
The proposed speckle-reduction and ROI-highlighting stage applies a Wiener filter whose output is passed to a wavelet transform; the low-frequency sub-band is extracted and used for segmentation. The Wiener filter serves two objectives. First, it smooths the ROI before the wavelet transform, and the result is used for segmentation (
Our review indicates that many studies have focused on eliminating speckle noise in ultrasound images. Although speckle noise elimination is important, it does not guarantee accurate segmentation, because segmentation accuracy does not depend solely on removing speckle noise; other factors also play a role. Nevertheless, one aim of this study is to minimize speckle noise while highlighting the ROI for segmentation. Thus, this research proposes an approach that smooths images and produces a clear ROI, making segmentation easier to implement.
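The filtering stage can be sketched in NumPy as follows. This is a sketch under stated assumptions: the adaptive (local-statistics) Wiener filter follows the same formula as `scipy.signal.wiener` but uses reflective rather than zero padding, the 5x5 window is arbitrary, and the wavelet is assumed to be a one-level Haar transform, since the text names neither the wavelet family nor the decomposition level.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_mean(img, size=5):
    """Mean over the size x size window centred on each pixel (reflective padding)."""
    pad = size // 2
    windows = sliding_window_view(np.pad(img, pad, mode="reflect"), (size, size))
    return windows.mean(axis=(-2, -1))

def wiener_filter(img, size=5, noise=None):
    """Adaptive Wiener filter: shrink each pixel toward its local mean."""
    img = np.asarray(img, dtype=float)
    mean = local_mean(img, size)
    var = local_mean(img ** 2, size) - mean ** 2
    if noise is None:
        noise = var.mean()  # estimate noise power as the average local variance
    # Where the local variance does not exceed the noise estimate, output the local mean.
    gain = np.where(var > noise, (var - noise) / np.maximum(var, 1e-12), 0.0)
    return mean + gain * (img - mean)

def haar_approximation(img):
    """Low-frequency (LL) sub-band of a one-level orthonormal 2-D Haar transform."""
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    x = img[:h, :w]
    return (x[0::2, 0::2] + x[0::2, 1::2] + x[1::2, 0::2] + x[1::2, 1::2]) / 2.0

def preprocess(img, size=5):
    smoothed = wiener_filter(img, size)   # Wiener output alone: texture path (LBP)
    ll = haar_approximation(smoothed)     # Wiener + wavelet LL band: segmentation path
    return smoothed, ll
```

The two return values mirror the two uses of the Wiener filter described above; note that the LL band has half the resolution of the input.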
Objects can be extracted from images using several methods, including edge-, threshold-, and region-based segmentation and clustering techniques. Some of these methods may be inappropriate for certain images because intensity values vary between data sets. For this reason, such complex cases were handled in this study using the Viola–Jones model [
The method begins by selecting a number of US images as training samples. To train the model for the segmentation task (binarization), we took small windows (
A set of Haar features was extracted from the windows cropped from the training images.
To obtain a trained model, the five types of Haar features, together with the window labels (ROI or non-ROI), were used as input to the weak classifiers. The output of this stage is a trained cascade model (i.e., the modified Viola–Jones model).
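Haar-like features are normally computed from an integral image, so that any rectangle sum costs four look-ups. The sketch below assumes the five classic feature types (left/right and top/bottom two-rectangle features, vertical and horizontal three-rectangle features, and a four-rectangle diagonal feature), each spanning the whole window; which five types the authors used is not stated here, so this set is an assumption.

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of img[:y, :x]; a leading zero row and column simplify indexing."""
    img = np.asarray(img, dtype=float)
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, y, x, h, w):
    """Sum of the h x w rectangle with top-left corner (y, x), in O(1)."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_features(ii, y, x, h, w):
    """Five Haar-like feature values for the h x w window at (y, x)."""
    h2, w2, h3, w3 = h // 2, w // 2, h // 3, w // 3

    def s(yy, xx, hh, ww):  # rectangle sum relative to the window origin
        return box_sum(ii, y + yy, x + xx, hh, ww)

    return np.array([
        s(0, 0, h, w2) - s(0, w2, h, w2),                          # left vs right
        s(0, 0, h2, w) - s(h2, 0, h2, w),                          # top vs bottom
        s(0, 0, h, w3) - s(0, w3, h, w3) + s(0, 2 * w3, h, w3),    # vertical stripes
        s(0, 0, h3, w) - s(h3, 0, h3, w) + s(2 * h3, 0, h3, w),    # horizontal stripes
        s(0, 0, h2, w2) - s(0, w2, h2, w2)
        - s(h2, 0, h2, w2) + s(h2, w2, h2, w2),                    # diagonal
    ])
```

Because each feature needs only a handful of `box_sum` calls, its cost is independent of the window size, which is what makes dense scanning affordable.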
During the training stage, the best Haar features will be selected to identify ROI from non-ROI.
Each type of Haar feature is used as input to one of the weak classifiers, and combining the weak classifiers yields a strong classifier.
The testing stage scans each image in the testing set pixel by pixel. For every pixel in the US image to be segmented, a square area of the same size as the training window, centred on that pixel, is built (
AdaBoost is a widely accepted and widely used boosting algorithm. It has been shown that combining weak classifiers with this algorithm produces a strong classifier. Furthermore, AdaBoost efficiently combines simple statistical learners and minimizes training errors, including errors related to ambiguous decisions [
The cascaded classifier comprises stages, each of which contains a strong classifier. Each stage determines whether a given sub-window is ROI or non-ROI. If a stage classifies a sub-window as non-ROI, the window is discarded immediately; if it classifies the sub-window as ROI, the window is forwarded to the next stage of the cascade. This design follows the principle that the more stages a sub-window passes, the more likely it is to be an ROI sub-window. A common problem with single-stage classifiers is that lowering the false positive rate comes at the cost of accepting more false negatives. In a staged classifier, however, false positives in the early stages are not a concern because they are expected to be removed in later stages. Thus, the Viola–Jones model allows several false positives to be accepted in the early stages of such classifiers, which keeps false negatives low, and the remaining false positives are expected to be eliminated in the later stages.
The experiments in this study used ovarian and breast cancer ultrasound images. Because the entire ROI could not be extracted from some images, an active contour was applied as a post-segmentation step to extract the complete ROI with the correct boundary. Two types of active contour were employed in the proposed model. The first type starts from a single seed point and requires many iterations, so detecting an object’s border with it takes additional time. The second type is initialized from the largest mask, using two points detected in the breast and ovarian cases. This reduces the number of iterations required and lets the active contour grow from the outer part of the sac according to the mask.
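The pixel-by-pixel testing stage can be sketched as follows: every pixel is labelled by classifying the window centred on it. `classify_window` is a hypothetical callable standing in for the trained cascade; the 15-pixel default window and the reflective border padding are assumptions, since the window size is not given in this section.

```python
import numpy as np

def segment_pixelwise(img, classify_window, win=15):
    """Boolean ROI mask: pixel (r, c) is ROI if its centred window is classified as ROI."""
    if win % 2 == 0:
        raise ValueError("use an odd window size so each window has a centre pixel")
    img = np.asarray(img, dtype=float)
    pad = win // 2
    padded = np.pad(img, pad, mode="reflect")  # gives border pixels full windows
    mask = np.zeros(img.shape, dtype=bool)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            # padded[r:r+win, c:c+win] is centred on original pixel (r, c)
            mask[r, c] = bool(classify_window(padded[r:r + win, c:c + win]))
    return mask
```

This performs one classification per pixel, so the early rejection provided by the cascade matters for run time.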
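The boosting step can be sketched as discrete AdaBoost over decision stumps, one threshold on one feature per weak classifier. Using stumps as the weak learners and the fixed number of `rounds` are assumptions; feature columns in `X` could be, for example, Haar feature values of each window, with labels +1 for ROI and -1 for non-ROI.

```python
import numpy as np

def fit_stump(X, y, w):
    """Weak classifier: best (feature, threshold, polarity) under sample weights w."""
    best = (np.inf, 0, 0.0, 1)  # (weighted error, feature, threshold, polarity)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best

def stump_predict(X, j, thr, pol):
    return np.where(pol * (X[:, j] - thr) >= 0, 1, -1)

def adaboost(X, y, rounds=10):
    """Return a list of (alpha, feature, threshold, polarity); y must be in {-1, +1}."""
    w = np.full(len(y), 1.0 / len(y))
    model = []
    for _ in range(rounds):
        err, j, thr, pol = fit_stump(X, y, w)
        err = max(err, 1e-10)                    # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)    # vote weight of this weak classifier
        pred = stump_predict(X, j, thr, pol)
        w *= np.exp(-alpha * y * pred)           # up-weight misclassified samples
        w /= w.sum()
        model.append((alpha, j, thr, pol))
    return model

def adaboost_score(X, model):
    """Strong classifier score: weighted vote of the weak classifiers."""
    return sum(a * stump_predict(X, j, thr, pol) for a, j, thr, pol in model)
```

`np.sign(adaboost_score(X, model))` gives the strong classifier's decision; in a cascade, each stage compares this score with its own threshold instead.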
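The early-rejection logic of the cascade can be sketched in a few lines. Each stage is represented by a hypothetical `(score_fn, threshold)` pair, for example a boosted score and its stage threshold. As in Viola and Jones's analysis, if the stages are assumed to act independently, the overall detection and false positive rates are the products of the per-stage rates.

```python
def cascade_is_roi(window, stages):
    """stages: list of (score_fn, threshold) pairs, each acting as one boosted stage."""
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False  # rejected as non-ROI; later stages are never evaluated
    return True  # passed every stage

def cascade_rates(stage_rates):
    """Overall (detection rate, false positive rate) from per-stage (d_i, f_i) pairs,
    assuming independent stages: both overall rates are products over the stages."""
    d, f = 1.0, 1.0
    for d_i, f_i in stage_rates:
        d *= d_i
        f *= f_i
    return d, f
```

For example, ten stages that each keep 99% of ROI windows while passing 50% of non-ROI windows give an overall detection rate of about 0.904 and a false positive rate of about 0.001. This illustrates why tolerating false positives early is affordable.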
Local binary patterns (LBPs) are among the features used for texture classification [
The essential LBP presented by [
where
A basic LBP is inefficient at computing the pattern code of each pixel from its neighborhood, so an enhanced LBP is considered to improve computational efficiency.
A standard LBP technique performs eight comparisons to compute the LBP of a single pixel. The proposed quasi-LBP halves the number of comparisons and reduces the code range from 0–255 to 0–15, which significantly speeds up histogram computation. Although this enhancement is not rotation invariant, it can still be used for background modeling with a monocular camera, because raw video from a single camera contains only minute rotations. To obtain the label of the center pixel, each pixel in the block is thresholded by the center pixel value, the binary results are multiplied by powers of two, and the products are summed. The eight pixels of a neighborhood thus yield 2^8 = 256 distinct labels, depending on the relative gray values of the neighborhood and center pixels. LBP descriptors are widely adopted in texture analysis because of their computational efficiency and robustness to illumination changes. Nevertheless, LBP descriptors may not capture discriminative information completely because they use only the sign of the differences within a local region.
This study uses a support vector machine (SVM) classifier because it is widely used in breast cancer diagnosis. This classifier classifies tumors as malignant or benign using a reduced feature set [
The main advantage of SVM classifiers is that they find the decision boundary with the maximum margin between the classes. The standard SVM is formulated for linearly separable problems and is extended to non-linear cases (e.g., through soft margins and kernels).
The SVM classifies a new sample by computing the ratio (
Data collection is a critical research activity. At this stage, researchers must ensure the accuracy of the collected data to guarantee the reliability and integrity of the study. The data used in this study are ultrasound images. Choosing an appropriate data set is crucial for testing any automatic image classification system, computer-based diagnosis method, or medical image segmentation model. Sample images of the breast cancer classes are provided so that readers can judge whether automated pattern recognition was achieved. The source of the data sets should also be stated so that their relevance and reliability can be assessed. The experiments were carried out using 250 ultrasound images, 150 of which are malignant and 100 benign. The data set used by previous studies can be accessed from [
The performance of the proposed segmentation method was also compared with current methods used to segment diverse types of US cases; see
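The basic 8-neighbour LBP described above, and a 4-comparison variant with codes 0–15, can be sketched as follows. For the reduced variant, the sketch uses the center-symmetric LBP (CS-LBP), which compares the four pairs of opposite neighbours. Whether the quasi-LBP described above uses exactly these pairs is an assumption. Border pixels are skipped for simplicity.

```python
import numpy as np

# Neighbour offsets, clockwise from the top-left; neighbour i contributes bit weight 2**i.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def _neighbour(img, dy, dx):
    """Neighbour values aligned with the interior (non-border) pixels."""
    h, w = img.shape
    return img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]

def lbp8(img):
    """Basic LBP: 8 comparisons with the center pixel, codes in 0..255."""
    img = np.asarray(img, dtype=float)
    center = img[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(OFFSETS):
        codes |= (_neighbour(img, dy, dx) >= center).astype(np.int64) << bit
    return codes

def cs_lbp4(img):
    """Center-symmetric variant: 4 comparisons of opposite neighbours, codes in 0..15."""
    img = np.asarray(img, dtype=float)
    codes = np.zeros((img.shape[0] - 2, img.shape[1] - 2), dtype=np.int64)
    for bit in range(4):
        a = _neighbour(img, *OFFSETS[bit])
        b = _neighbour(img, *OFFSETS[bit + 4])
        codes |= (a > b).astype(np.int64) << bit
    return codes

def lbp_histogram(codes, n_codes):
    """Normalized code histogram: the texture feature vector passed to the classifier."""
    return np.bincount(codes.ravel(), minlength=n_codes) / codes.size
```

The feature vector is `lbp_histogram(lbp8(roi), 256)` or `lbp_histogram(cs_lbp4(roi), 16)`. The shorter histogram is where the computational saving comes from.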
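The paper does not state the SVM kernel or solver. As an assumption for illustration, the sketch below trains a linear soft-margin SVM with the Pegasos stochastic sub-gradient method (hinge loss plus an L2 penalty `lam`), with the bias handled by appending a constant feature. In practice a library implementation, possibly with a non-linear kernel, would normally be used.

```python
import numpy as np

def _augment(X):
    """Append a constant 1 so the last weight acts as the bias term."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos: minimize lam/2*||w||^2 + mean(max(0, 1 - y * w.x)); y in {-1, +1}."""
    Xa = _augment(np.asarray(X, dtype=float))
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    w = np.zeros(Xa.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)                 # decreasing step size
            violates = y[i] * (Xa[i] @ w) < 1     # inside the margin or misclassified
            w *= 1.0 - eta * lam                  # shrink (gradient of the L2 term)
            if violates:
                w += eta * y[i] * Xa[i]           # hinge-loss sub-gradient step
    return w

def predict(X, w):
    """+1 (e.g., malignant) or -1 (e.g., benign) by side of the separating hyperplane."""
    return np.where(_augment(np.asarray(X, dtype=float)) @ w >= 0, 1, -1)
```

Here each row of `X` would be one ROI's LBP histogram. The label coding of +1 for malignant and -1 for benign is hypothetical.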
Data set | Metric | Otsu’s threshold | Otsu’s threshold + active contour | Proposed method |
---|---|---|---|---|
Breast | Number of images | 250 | 250 | 250 |
Breast | Captured ROI | 157/250 | 177/250 | 204/250 |
Breast | Accuracy | 62.8% | 69.14% | 81.6% |
Breast | Successful segmentation (avoiding under-segmentation) | 154/250 | 173/250 | 197/250 |
Ovarian | Number of images | 125 | 125 | 125 |
Ovarian | Captured ROI | 79/125 | 84/125 | 103/125 |
Ovarian | Accuracy | 63.2% | 67.2% | 82.4% |
Ovarian | Successful segmentation | 73/125 | 77/125 | 99/125 |
This stage uses the correctly segmented images. The breast data set contains 250 ultrasound images, 100 of which are benign and the remaining 150 malignant. Of these, 197 were segmented correctly, including 123 malignant and 74 benign images. For the ovarian images, 99 of the 125 were segmented correctly, 67 of which are benign and 30 malignant. The data set is randomly partitioned into 5 rounds with equal numbers of benign and malignant cases. Each round contains 50 images divided equally between the two classes (25 benign and 25 malignant). In each round (M), the testing portion is examined, while the remaining data (subset-M) are used as the training set. Thus, each round is divided into K folds
Training and testing are repeated K times with different testing sets, and the classifier's performance is taken as the average over the K tests. In this way, all data are used for both training and testing, although the computational cost is high because testing is performed K times. The same experimental strategy was applied to the ovarian ultrasound images to identify malignancy risk at an early stage. The model is evaluated by classifying the LBP features with the SVM classifier.
This section presents the results of the experiments on the two data sets. The proposed model was evaluated based on two experiments. In the first experiment, the class imbalance between malignant and benign cases was addressed using the sampling strategy described in this section. In each round, 25 benign images were randomly selected and merged with 25 malignant images to form the sample. Testing was repeated for 5 rounds. In each round, two images were used as the testing set and the remaining 48 as the training set, so that every image was used for both training and testing. The system was evaluated using the average of 20 rounds of tests.
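The sampling protocol can be sketched as follows. Each round draws 25 benign and 25 malignant images at random, and K-fold splitting with K = 25 yields the two-image test sets and 48-image training sets described above. The function names are hypothetical, and passing a seeded `numpy.random.Generator` is an assumption for reproducibility.

```python
import numpy as np

def balanced_round(benign_ids, malignant_ids, per_class=25, rng=None):
    """Draw one class-balanced round: per_class images from each class, without replacement."""
    rng = np.random.default_rng() if rng is None else rng
    ids = np.concatenate([rng.choice(benign_ids, per_class, replace=False),
                          rng.choice(malignant_ids, per_class, replace=False)])
    labels = np.array([0] * per_class + [1] * per_class)  # 0 = benign, 1 = malignant
    return ids, labels

def k_fold_splits(n, k, rng=None):
    """Yield (train_idx, test_idx) pairs; every sample is in exactly one test fold."""
    rng = np.random.default_rng() if rng is None else rng
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]
```

Usage: `ids, labels = balanced_round(benign_ids, malignant_ids)`, then iterate over `k_fold_splits(50, 25)`. Train on `ids[train]` and test on `ids[test]`, then average the per-fold results.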
Data set | Round | Benign | Malignant | TN | TP | FN | FP | Accuracy (%) | Sensitivity (%) | Specificity (%) |
---|---|---|---|---|---|---|---|---|---|---|
Breast images | Round 1 | 25 | 25 | 23 | 24 | 2 | 1 | 94 | 92.3 | 95.8 |
Breast images | Round 2 | 25 | 25 | 24 | 24 | 1 | 1 | 96 | 96 | 96 |
Breast images | Round 3 | 25 | 25 | 25 | 25 | 0 | 0 | 100 | 100 | 100 |
Breast images | Round 4 | 25 | 25 | 22 | 21 | 3 | 4 | 86 | 87.5 | 84.6 |
Breast images | Round 5 | 25 | 25 | 24 | 24 | 1 | 1 | 96 | 96 | 96 |
Ovarian images | Round 1 | 25 | 25 | 25 | 25 | 0 | 0 | 100 | 100 | 100 |
Ovarian images | Round 2 | 25 | 25 | 25 | 25 | 0 | 0 | 100 | 100 | 100 |
Ovarian images | Round 3 | 25 | 25 | 23 | 24 | 2 | 1 | 94 | 92.3 | 95.8 |
Ovarian images | Round 4 | 25 | 25 | 24 | 21 | 1 | 4 | 90 | 95.4 | 85.7 |
Ovarian images | Round 5 | 25 | 25 | 24 | 25 | 1 | 0 | 98 | 96.1 | 100 |
In the second experiment, we used 50% of the images for testing and 50% for training, and then swapped the roles of the two halves. In this case, we achieved 95.43% accuracy, 92.20% sensitivity, and 97.5% specificity on the breast cancer images, and 94.84% accuracy, 96.96% sensitivity, and 90.32% specificity on the ovarian tumor images, as shown in
Data set | Benign | Malignant | TN | TP | FN | FP | Accuracy (%) | Sensitivity (%) | Specificity (%) |
---|---|---|---|---|---|---|---|---|---|
Breast images | 74 | 123 | 117 | 71 | 6 | 3 | 95.43 | 92.20 | 97.5 |
Ovarian images | 67 | 30 | 28 | 64 | 2 | 3 | 94.84 | 96.96 | 90.32 |
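The reported metrics follow from the confusion-matrix counts through the standard definitions. The helper name below is hypothetical.

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity (in %) from confusion-matrix counts."""
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    sensitivity = 100.0 * tp / (tp + fn)   # true positive rate
    specificity = 100.0 * tn / (tn + fp)   # true negative rate
    return accuracy, sensitivity, specificity
```

For the breast images, `metrics(tp=71, tn=117, fp=3, fn=6)` gives 188/197, 71/77 and 117/120. For the ovarian images, `metrics(tp=64, tn=28, fp=3, fn=2)` gives 92/97, 64/66 and 28/31. Both sets agree with the table to within 0.01 percentage points. The same formulas reproduce the per-round values in the previous table.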
Benchmarking is normally carried out using either a standard data set or methods applied to the same problem domain or application. Here, the benchmark uses the best and most recent Viola–Jones-based approaches in the literature for segmenting and classifying the cancers in our case studies. Our proposed method achieved better accuracy than the other methods.
Method | Accuracy (%) | Sensitivity (%) | Specificity (%) |
---|---|---|---|
Our proposed method | 95.43 | 92.20 | 97.5 |
Uniform LBP ( | 80 | 72.32 | 86.23 |
Uniform LBP ( | 84.8 | 77.67 | 90.57 |
Uniform LBP (Sign Lower) [15] | 86.8 | 82.52 | 89.79 |
Uniform LBP (Sign Upper) [15] | 86.8 | 80.18 | 92.08 |
The main factors affecting the output of region growing in US images are full automation of the segmentation and the choice of the same measurements in the initial segmentation. Fully automated segmentation of ovarian tumors remains a difficult and challenging task. Many ultrasound images in our database are particularly challenging in this respect, and such cases may corrupt the extracted features. Two limitations related to the tumor edge (i.e., missing and poor borders) were addressed in our study by developing a new Viola–Jones model to segment ultrasound images of ovarian and breast cancer cases and to compute the number of malignant cases. This model handles US cases in which the background texture is distinguishable from the ROI.
This study developed a Viola–Jones model for segmenting breast and ovarian cancer ultrasound images. The results showed that the new approach achieved significant improvements in tumor segmentation for benign and malignant cases with varying tumor shapes and sizes. Accurate and automated tumor segmentation is one of the most important steps in many image analysis procedures, particularly in the detection and diagnosis of ovarian and breast cancers. It is hoped that the proposed approach will draw more attention to these areas. In particular, the objective is to develop a new automated, computerized approach for analyzing gynaecological ultrasound images to detect and classify abnormal cases or objects that affect the health of women, and to evaluate its efficiency. The computational experiments in this study are motivated by advances in imaging technologies, state-of-the-art data mining and machine learning models, and image processing and analysis theories. The successful segmentation rate of the proposed system, which exceeded those of the other existing techniques, was 78.8% on the breast cancer data set and 79.2% on the ovarian tumor data set. In classification, we achieved 95.43% accuracy, 92.20% sensitivity, and 97.5% specificity on the breast cancer images, and 94.84% accuracy, 96.96% sensitivity, and 90.32% specificity on the ovarian tumor images. In future work, the data set should be enlarged with additional difficult cases, particularly ovarian tumor cases, and deep learning approaches should be explored.
The authors would like to acknowledge Fakulti Teknologi Maklumat dan Komunikasi, Centre for Research and Innovation Management, Universiti Teknikal Malaysia Melaka and Ministry of Education Malaysia for providing all facilities and support for this study.