Mammogram Learning System for Breast Cancer Diagnosis Using Deep Learning SVM

Breast cancer is the most common form of cancer in women. Recent advances in medical imaging technologies have increased the use of digital mammograms to diagnose breast cancer. Thus, an automated computerized system with high accuracy is needed. In this study, an efficient Deep Learning Architecture (DLA) combined with a Support Vector Machine (SVM) is designed for breast cancer diagnosis. The state-of-the-art Visual Geometry Group (VGG) architecture with 16 layers is employed because its small 3 × 3 convolution filters reduce system complexity. The softmax layer in VGG assumes that each training sample belongs to exactly one class, which is not valid in real situations such as medical image diagnosis. To overcome this limitation, SVM is employed instead of the softmax layer in VGG. Data augmentation is also employed, as a DLA usually requires a large number of samples. VGG models with different SVM kernels are built to classify the mammograms. Results show that the VGG-SVM model has good potential for classifying Mammographic Image Analysis Society (MIAS) database images, with an accuracy of 98.67%, sensitivity of 99.32%, and specificity of 98.34%.


Introduction
Breast cancer is the leading cause of death in India, accountable for 9% of non-communicable diseases [1]. The incidence of breast cancer in metropolitan cities increases every year, whereas that of cervical cancer is on the decline. Thus, a multidisciplinary approach that includes screening and awareness programs is required to diagnose breast cancer early and reduce mortality. The best available screening method is mammography, which is now inexpensive and, owing to the digital revolution of recent years, captures breast images quickly.
former model is also known as supervised learning, a classification technique that uses training samples paired with their target outputs. The latter model is also known as unsupervised learning, which is based on observing the correlation between samples in similar clusters.
Recent advances in DLA have greatly helped the design of accurate classification systems in many research areas, such as image classification, text classification, speech recognition, natural language processing, and mammogram image classification [15][16]. State-of-the-art DLAs such as VGG [17], DenseNet [18], AlexNet [19], and GoogleNet [20] have achieved great success in classifying thousands of natural objects. These pre-trained models can be applied effectively to mammogram classification through transfer learning, which reuses the pre-trained weights for the classification of mammograms. DenseNet, AlexNet, and GoogleNet have 201, 8, and 22 layers, respectively. Among these models, only VGG uses small (3 × 3) convolution filters throughout, and it is available with 11, 16, or 19 layers from input to output. Thus, VGG with 16 layers is employed in this study to extract deep features.
This study proposes a hybrid model based on DLA and SVM for diagnosing breast cancer. The VGG model is utilized to extract the highly dominant features for breast cancer classification, and SVM is then incorporated at the output layer to improve the classification results. This article consists of four sections. Section 2 describes the design of the VGG-SVM model to deliver high accuracy for breast cancer diagnosis. Section 3 describes the experimental studies that support the system, their results, and the evaluation of the system through comparative studies with four commonly used SVM kernels. The last section concludes the proposed work.

Methods and Materials
A mammogram learning system is a pattern recognition system that makes decisions based on previous experience, i.e., on training samples with known class labels. In general, the classification process assigns a given input to one class or another. The input to the classification process may include sensor readings or features of the objects to be classified, such as signals and images. The learning system classifies the testing samples by adapting a general classifier model to the inputs. In this study, mammogram images are fed to the system, which classifies them as either normal or abnormal. The proposed breast cancer diagnostic system operates in two modes: Local Classification Mode (LCM) and Global Classification Mode (GCM). Both modes classify the input mammograms into two classes, but the former uses Region of Interest (ROI) images as inputs, whereas the latter uses the whole mammograms.

Preprocessing
Generally, preprocessing is employed in medical images to remove undesirable information, ensure data integrity, and improve classification performance. In this stage, the patient information in the digital X-ray is removed, and the contrast of the mammograms is enhanced. Simple morphological operations are employed to remove the information embedded in the X-ray image. Following the previous studies in [2,4,9,11,12], Contrast Limited Adaptive Histogram Equalization (CLAHE) is chosen as the contrast enhancement technique. CLAHE avoids the over-enhancement produced by Adaptive Histogram Equalization (AHE) by applying a contrast limit to the local intensity histograms on which the enhancement is computed. The window size used for building a local intensity histogram is 8 × 8, with a contrast limit of 0.01. Fig. 1 shows the contrast-enhanced image obtained by CLAHE. These images are utilized by the GCM of the proposed system for breast cancer diagnosis. The ground truth data [21][22], available in the MIAS database, is used to extract the ROI. The ground truth gives seven items of information for each image, such as the reference number, type of background tissue, type of abnormality, severity of the abnormality, center of the abnormality, and size of the abnormality. Based on the center of the abnormality, an ROI of 256 × 256 pixels is extracted. These ROI images are utilized by the LCM of the proposed system, whereas the GCM uses the whole mammogram of 1024 × 1024 pixels. Based on the abnormality recorded in the ground truth data, the images are split into two groups, normal and abnormal, for classification. Fig. 2 shows the ROI extracted from the original mammograms shown in Fig. 1.
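The ROI extraction step can be illustrated with a minimal sketch. It assumes the mammogram is held as a list of pixel rows and that the abnormality center has already been converted to (row, column) indices; the function name `extract_roi` and the choice to shift the window at the image border (rather than pad it) are assumptions made for illustration, not details given in the text.

```python
def extract_roi(image, center_row, center_col, size=256):
    """Crop a size x size window centred on (center_row, center_col).

    `image` is a list of rows (each row a list of pixel values). The window
    is shifted, not shrunk, when the centre lies too close to a border
    (an assumed policy; the text does not describe border handling).
    """
    height, width = len(image), len(image[0])
    if size > height or size > width:
        raise ValueError("ROI is larger than the image")
    half = size // 2
    # Clamp the top-left corner so the window stays inside the image.
    top = min(max(center_row - half, 0), height - size)
    left = min(max(center_col - half, 0), width - size)
    return [row[left:left + size] for row in image[top:top + size]]


# Toy 1024 x 1024 image whose pixel value encodes its own position.
image = [[r * 1024 + c for c in range(1024)] for r in range(1024)]
roi = extract_roi(image, center_row=500, center_col=40)
print(len(roi), len(roi[0]))                    # ROI dimensions
print(roi[0][0] // 1024, roi[0][0] % 1024)      # (row, col) of the top-left pixel
```

Shifting the window keeps every ROI at the full 256 × 256 size, which is convenient when all inputs to the network must share one shape.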

VGG-DLA System
The DLA allows a single model to perform both feature extraction and classification of the input data. Features are extracted by a sequence of two or more convolution layers, followed by pooling layers that reduce the size of the feature map. A Fully Connected Layer (FCL) interprets the features, and an output layer with an activation function produces the predictions. A simple DLA is shown in Fig. 3.
DLA has become more popular in the last decade in the pattern recognition and computer vision research areas. Many DLAs have been developed, such as the Visual Geometry Group (VGG) network, AlexNet, and GoogleNet. Among these DLAs, VGG is employed in this study due to its simplicity. Fig. 4 shows the VGG-DLA with 16 layers, where C-(SxS)-N represents a convolution operation C with N filters of size S × S, and MPL represents a Max Pooling Layer. The main advantage of the VGG model with 16 layers is the use of small (3 × 3) convolution filters throughout the architecture, which reduces the computational complexity.
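The layer count and the benefit of small filters can be checked with a short sketch. The block list below is the standard VGG-16 configuration; the 256-pixel input side is an assumption (the ROI size used by the LCM), since the input resolution fed to the network is not stated here, and the helper names are hypothetical.

```python
# (number of 3x3 convolution layers, number of filters) for the five blocks
# of the standard VGG-16 configuration; each block ends with max pooling.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
NUM_FC_LAYERS = 3


def count_weight_layers(blocks, num_fc):
    """Convolution layers plus fully connected layers."""
    return sum(n_conv for n_conv, _ in blocks) + num_fc


def feature_map_side(input_side, blocks):
    """Spatial side length after each block's 2x2, stride-2 max pooling."""
    side = input_side
    for _ in blocks:
        side //= 2
    return side


def conv_weights(kernel, in_ch, out_ch):
    """Weights (without biases) of one kernel x kernel convolution layer."""
    return kernel * kernel * in_ch * out_ch


print(count_weight_layers(VGG16_BLOCKS, NUM_FC_LAYERS))
print(feature_map_side(256, VGG16_BLOCKS))  # 256 is an assumed input side
# Two stacked 3x3 layers cover a 5x5 receptive field with fewer weights
# than a single 5x5 layer.
print(2 * conv_weights(3, 64, 64), conv_weights(5, 64, 64))
```

The last line compares two stacked 3 × 3 layers with a single 5 × 5 layer of the same receptive field, which is one way to see why small filters lower the parameter count.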
A stack of convolution filters in 5 blocks is employed for feature extraction, followed by three FCLs. Across the blocks, the number of channels is increased from 64 to 512 by a multiplication factor of 2. Fig. 5 shows the feature map obtained at the first block of filters for the extracted ROI image shown in Fig. 1 (b). The activation function named Rectified Linear Unit (ReLU) is used in the hidden layers to overcome the vanishing gradient problem. Its output is linear for positive inputs and zero otherwise, a desirable property for the backpropagation neural network. The softmax layer is defined as

$$o_i = \frac{e^{y_i}}{\sum_{j=1}^{n} e^{y_j}}$$

where $y_i$ is the value obtained from the $i$-th input for the corresponding output $o_i$.
This process gives $n$ output values, and the output with the highest value is taken as the predicted class. The main drawback of the softmax layer is that its probabilistic approach assumes that each training sample belongs to exactly one class, i.e., that the features are independent of each other and all have equal importance in predicting the outcome. However, this assumption is not valid in real situations such as medical image diagnosis. In this study, the abnormal images contain different abnormalities, such as masses of different types and micro-calcifications, and thus VGG-SVM is designed for breast cancer diagnosis.
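The ReLU activation and the softmax output described above can be sketched in a few lines. The score values are hypothetical, and subtracting the maximum before exponentiation is a common numerical safeguard rather than something stated in the text.

```python
import math


def relu(x):
    """Rectified Linear Unit: passes positive inputs, maps the rest to zero."""
    return x if x > 0 else 0.0


def softmax(values):
    """Normalised exponentials of `values`; the outputs sum to one."""
    peak = max(values)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 0.5]  # hypothetical outputs for (abnormal, normal)
probs = softmax(scores)
predicted = probs.index(max(probs))  # index of the highest output value
print(probs, predicted)
```

Because the outputs always sum to one, every sample is forced toward a single class, which is the limitation that motivates replacing this layer with an SVM.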

VGG-SVM System
Let $f_i$ denote the features obtained from the FCL for the $i$-th sample of a training set with $m$ samples, i.e., the computed finite feature space is $T = \{(f_i, c_i)\}_{i=1}^{m}$, where the class $c_i \in \{-1, 1\}$. The SVM separates the feature space into two classes by constructing a hyperplane of the form

$$w^T t + b = 0$$

where $t$ denotes a feature vector. For the linear case, the aim is to determine a weight vector $w$ and a scalar constant $b$ that satisfy the constraints

$$c_i (w^T t + b) \geq 1$$

and for the non-linear case, the constraints are

$$c_i (w^T t + b) \geq 1 - \xi_i$$

where $\xi_i$ is a measure of the classification error, which is non-negative for all $i$. Though many hyperplanes can be constructed, a maximum margin hyperplane should be chosen, i.e., the one with the maximum distance between the hyperplane and the nearest data point of each class. It is given by

$$\min_{w, b, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i$$

subject to $c_i (w^T t + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$, $i = 1, 2, \ldots, n$, where $C$ is the factor that controls the trade-off between model complexity and empirical risk. The formulation in Eq. (2) can be rewritten for the non-linear case as

$$\sum_{i=1}^{N_s} \alpha_i c_i K(s_i, t) + b = 0$$

where $s_i$, $i = 1, 2, \ldots, N_s$, are the support vectors computed via structural risk minimization, which form a subset of $T$, $\alpha_i$ are the corresponding multipliers, and $K$ is the kernel function. Fig. 6 shows the proposed VGG-SVM system. The SVM classifier replaces the softmax layer in the conventional VGG model.
Four different kernels are used in this study, and their performances are computed. They are summarized in Tab. 1. The RBF-SVM parameters, namely $C$ in Eq. 7 and $\sigma$ (the standard deviation of the RBF kernel), are tuned to obtain better performance. A grid search algorithm obtains the optimal values of these parameters from ten different values each of $C$ and $\sigma$; thus, the system performance is evaluated for each pair $(C, \sigma)$. Among the $\sigma$ values $[2^{3}, 2^{2}, 2^{1}, \ldots, 2^{-6}]$ and the $C$ values $[2^{4}, 2^{3}, 2^{2}, \ldots, 2^{-5}]$, the best performance, obtained from the pair $(2^{3}, 2^{1})$, is discussed in the next section. For P-SVM, a polynomial degree of three is used for performance evaluation.
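To make the role of the slack variables and of C concrete, the following minimal sketch evaluates the standard soft-margin objective, 0.5‖w‖² plus C times the sum of slacks, for a candidate hyperplane. The two-dimensional features, the hyperplane parameters, and the function names are hypothetical stand-ins for the deep features from the FCL; the sketch evaluates the objective only and does not solve the optimization problem.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def slack(w, b, feature, label):
    """Smallest slack with label * (w.t + b) >= 1 - slack and slack >= 0."""
    return max(0.0, 1.0 - label * (dot(w, feature) + b))


def soft_margin_objective(w, b, features, labels, C):
    """0.5 * ||w||^2 + C * (sum of slacks) for a candidate hyperplane."""
    total_slack = sum(slack(w, b, f, c) for f, c in zip(features, labels))
    return 0.5 * dot(w, w) + C * total_slack


# Hypothetical 2-D features standing in for the deep features from the FCL.
features = [[2.0, 2.0], [1.5, 0.5], [-1.0, -1.0], [0.2, 0.1]]
labels = [1, 1, -1, -1]
w, b = [1.0, 0.0], -0.5  # hypothetical candidate hyperplane

print([slack(w, b, f, c) for f, c in zip(features, labels)])
print(soft_margin_objective(w, b, features, labels, C=8.0))
```

Only the last sample falls inside the margin and receives a positive slack; a larger C penalizes such violations more heavily, at the cost of a narrower margin.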
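The kernels and the search grid can be sketched as follows. The kernel expressions are the common textbook forms and are an assumption here, since Tab. 1 is not reproduced in this text; in particular, the polynomial offset of 1 and the 2σ² scaling of the RBF kernel may differ from the authors' definitions. The grid reproduces the ten values quoted for each of C and σ.

```python
import math
from itertools import product


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, degree, offset=1.0):
    # offset=1.0 is an assumed constant; degree 2 gives a quadratic kernel.
    return (dot(x, y) + offset) ** degree


def rbf_kernel(x, y, sigma):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


# Search grid quoted in the text: ten values each for C and sigma.
C_VALUES = [2.0 ** e for e in range(4, -6, -1)]      # 2^4 ... 2^-5
SIGMA_VALUES = [2.0 ** e for e in range(3, -7, -1)]  # 2^3 ... 2^-6
GRID = list(product(C_VALUES, SIGMA_VALUES))         # every (C, sigma) pair

x, y = [1.0, 2.0], [2.0, 0.0]
print(linear_kernel(x, y), polynomial_kernel(x, y, 2), polynomial_kernel(x, y, 3))
print(rbf_kernel(x, y, sigma=2.0))
print(len(C_VALUES), len(SIGMA_VALUES), len(GRID))
```

In a full grid search, each pair in `GRID` would be used to train and cross-validate one RBF-SVM, and the pair with the best validation performance would be retained.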

Results and Discussions
The prototype of the VGG-SVM system for mammogram classification is explained in the previous section. It extracts deep features and classifies them using SVM with different kernels at the output layer. This section presents the experimental setup, performance metrics, and findings of the prototype with discussions of the experimental results.

Experimental Setup
The proposed VGG-SVM breast cancer diagnostic system is analyzed using the MIAS database [21][22]. It has 322 mammograms that include 207 normal images and 105 abnormal mammograms of different classes such as masses and micro-calcifications. Sample mammograms are shown in Fig. 1. It is well known that a DLA usually requires a large number of samples to provide better classification. Data augmentation [23] is therefore employed to increase the number of samples in the MIAS database. The samples are increased sevenfold by flipping and by rotating the samples through angles of 90°, 180°, and 270°. Data augmentation increases the normal samples from 207 to 1449 and the abnormal samples from 105 to 735. The standard parameter settings for the VGG architecture are shown in Tab. 2. The same settings are used for the VGG-SVM architectures, and k-fold (10-fold) cross-validation is employed to validate the DLAs.
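The augmentation operations and the resulting sample counts can be illustrated with a minimal sketch. The helper names are hypothetical, and the exact combination of flips and rotations that yields the sevenfold increase is not specified in the text, so the sketch shows only the individual transformations and the count arithmetic.

```python
def rotate90(image):
    """Rotate a list-of-rows image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotations(image):
    """The 90, 180 and 270 degree rotations of `image`."""
    r90 = rotate90(image)
    r180 = rotate90(r90)
    r270 = rotate90(r180)
    return [r90, r180, r270]


FACTOR = 7  # sevenfold increase stated in the text
print(207 * FACTOR, 105 * FACTOR)  # augmented normal and abnormal counts

tile = [[1, 2], [3, 4]]  # toy 2 x 2 image
print(rotations(tile), flip_horizontal(tile))
```

Rotations by multiples of 90° and flips only permute pixels, so they add samples without interpolating or altering the intensity values of the mammogram.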

Performance Metrics
The performance of the VGG-SVM system is analyzed in terms of sensitivity, specificity, and accuracy. The definitions are as follows.
Sensitivity: It gives the correct classification rate of abnormal mammograms and is given by

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

where True Positive (TP) and False Negative (FN) are the numbers of correctly classified and misclassified abnormal mammograms.
Specificity: It gives the correct classification rate of normal mammograms and is given by

$$\text{Specificity} = \frac{TN}{TN + FP}$$

where True Negative (TN) and False Positive (FP) are the numbers of correctly classified and misclassified normal mammograms.
Accuracy: It gives the overall classification rate of the system and is given by

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

To show the trade-off between sensitivity and specificity, a plot called the Receiver Operating Characteristic (ROC) is drawn, in which the x-axis represents the false positive ratio (1 − specificity) and the y-axis represents the true positive ratio (sensitivity). It can be used to visualize the system performance with ease.
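The three metrics follow directly from the confusion counts, as the sketch below illustrates. The counts are hypothetical and are not taken from the tables of this study; `roc_point` is an illustrative helper that returns one point of an ROC curve in the usual (false positive rate, true positive rate) order.

```python
def sensitivity(tp, fn):
    """Correct classification rate of abnormal mammograms."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Correct classification rate of normal mammograms."""
    return tn / (tn + fp)


def accuracy(tp, tn, fp, fn):
    """Overall correct classification rate."""
    return (tp + tn) / (tp + tn + fp + fn)


def roc_point(tp, tn, fp, fn):
    """(false positive rate, true positive rate) for one decision threshold."""
    return 1.0 - specificity(tn, fp), sensitivity(tp, fn)


# Hypothetical confusion counts, not taken from the paper's tables.
tp, fn, tn, fp = 90, 10, 180, 20
print(sensitivity(tp, fn), specificity(tn, fp), accuracy(tp, tn, fp, fn))
print(roc_point(tp, tn, fp, fn))
```

Sweeping the decision threshold of the classifier and collecting one such point per threshold traces out the full ROC curve.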

Performance Analysis
The performances of the VGG-SVM architectures and the VGG-DLA are analyzed with and without the preprocessing stage to demonstrate the effect of preprocessing on the mammograms. Tab. 3 shows the performances of GCM for the different architectures without the preprocessing stage. It is observed from Tab. 3 that the VGG-RBF-SVM architecture provides an accuracy of 81.27%, the highest among the VGG-SVM architectures. The VGG-16 model provides only 65.48% accuracy. Using SVM in the output layer instead of the softmax layer in the VGG architecture increases the accuracy of mammogram classification by about 15%. Since there is no preprocessing, the accuracy of every architecture remains below 82%. CLAHE enhances the micro-calcifications, masses, and other tissues in the mammograms, and the performances are evaluated again using the same set of images. Tab. 4 shows the performances of GCM for the different architectures with the preprocessing stage.
It is evident from Tabs. 3 and 4 that preprocessing the mammograms improves the classification accuracy of all architectures by at least approximately 5%. The classification accuracy of the VGG architecture increases from 65.48% to 72.21%, while that of VGG-RBF-SVM increases to 86.77%. After preprocessing the whole mammogram, a maximum sensitivity of 87.76% and specificity of 86.27% are achieved by VGG-RBF-SVM. The GCM provides only 86.77% accuracy, and its complexity is high because it uses the whole image. To further increase the accuracy of mammogram classification, LCM is developed. Tab. 5 shows the performances of LCM for the different architectures without the preprocessing stage.
Tab. 5 shows that the LCM achieves a significant improvement over GCM without preprocessing. When operated under LCM, the VGG-RBF-SVM architecture improves by 13.5% over GCM. This is because the whole image contains not only breast tissue but also the pectoral muscle. Thus, the LCM, which uses only the ROI of the abnormalities, can classify more accurately than GCM. Tab. 6 shows the performances of LCM for the different architectures with the preprocessing stage. It is noted from Tab. 6 that the best performance, with an accuracy of 98.67%, sensitivity of 99.32%, and specificity of 98.34%, is obtained when the mammograms are classified using LCM with preprocessing. The performances of the architectures are in the order VGG-RBF-SVM > VGG-L-SVM > VGG-P-SVM > VGG-Q-SVM > VGG-16. The RBF kernel performs better than the others because it projects the feature space into an infinite-dimensional feature space. It is also isotropic and invariant to translation. The system is further analyzed through a comparison with existing systems in the literature. Tab. 7 illustrates the comparative analysis.
It is noted from Tab. 7 that the VGG-RBF-SVM provides better performance than existing approaches in the literature. To visualize the performance, ROC curves of two modes, GCM and LCM are drawn and shown in Fig. 7.
The ROCs in Fig. 7 show how the performance of the different architectures improves from GCM to LCM and with the preprocessing step. The ROCs move gradually toward the y-axis, which indicates better performance of the system. The ROC curve of the VGG-RBF-SVM architecture under LCM with preprocessing of the mammograms has the largest area under the curve, as it lies very close to the y-axis and the top border of the graph.

Conclusions
This paper presents an intelligent mammogram learning system based on DLA and SVM for breast cancer diagnosis. Deep features are extracted from mammograms using the standard parameter settings of VGG with 16 layers. A series of preprocessing steps is applied before the deep features are extracted. The VGG-SVM system uses an SVM classifier in the output layer instead of the softmax layer and operates in two modes, LCM and GCM. Several VGG-SVM models with different SVM kernels are trained on the deep features for performance evaluation. Among the four kernels, the RBF kernel is the most effective and yields the highest performance. Results show that the combination of DLA and SVM works efficiently for breast cancer diagnosis.
Funding Statement: The authors received no specific funding for this study.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.