Intelligent Automation & Soft Computing
Breast Cancer Classification Using Deep Convolution Neural Network with Transfer Learning
Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, 11047, KSA
*Corresponding Author: Hanan A. Hosni Mahmoud. Email: email@example.com
Received: 13 March 2021; Accepted: 18 April 2021
Abstract: In this paper, we aim to apply deep learning convolution neural network (Deep-CNN) technology to classify breast masses in mammograms. We develop a Deep-CNN combined with multi-feature extraction and transfer learning to detect breast cancer. The Deep-CNN is utilized to extract features from mammograms. A support vector machine (SVM) is then trained on the Deep-CNN features to classify normal, benign, and cancer cases. The scoring features from the Deep-CNN are coupled with texture features and used as inputs to the final classifier. Two texture features are included: texture features of spatial dependency and gradient-based histograms. Both are employed to locate breast masses in mammograms. Next we apply transfer learning to the classifier of the SVM. Four techniques are devised for the experimental evaluation of the proposed system. The fourth technique combines the Deep-CNN with texture features and local features extracted by the scale-invariant feature transform (SIFT) algorithm. Experiments are designed to measure the performance of the various techniques. The results demonstrate that the proposed CNN coupled with the texture features and the SIFT outperforms the other models and performs best with transfer learning embedded. The accuracy of this model is 97.8%, with a true positive rate of 98.45% and a true negative rate of 96%.
Keywords: Breast cancer; Classification; Neural network; Texture features; Transfer learning
1 Introduction
Breast cancer is one of the most common cancers in the female population. Over two million women are affected by breast cancer each year. Breast cancer is a severe illness that affects the breast tissues and can spread to nearby organs. The American Cancer Society reported 277,580 invasive cases and 48,640 non-invasive cases of breast cancer in 2020.
The death rate is very high for patients with breast cancer, so women over the age of forty are advised to undergo mammogram screening regularly. Mammography is used as an imaging tool for breast tumor examination because it allows for a precise diagnosis even in the early stages of cancer. Early discovery is important to improve the breast cancer diagnosis rate and the chances of recovery. The wide use of mammography-based diagnosis can reveal asymptomatic breast cancer, which can reduce the mortality rate from this disease. Radiologists look for essential properties such as microscopic calcifications and tissue distortions, which indicate the presence of breast cancer.
Masses found in breast mammograms are important indications of malignant breast tumors. Tumor recognition is a difficult job because of the subtle differences between tumor tissue and adjacent healthy tissues. Mass discovery in the breast, that is, locating boundaries between mass and healthy regions, is also difficult due to low-quality images and high noise levels. In addition, masses closely resemble the adjacent healthy regions. However, identifying the tumor shape and type is vital to proper diagnosis. These problems lead to less-than-optimal sensitivity and specificity of mammogram screening. Additionally, the inspection of mammograms is tiresome and can be biased.
Therefore, several computer-aided diagnosis and image analysis tools are used by medical experts to increase diagnostic accuracy. These tools employ feature extraction and heuristic techniques. Classifiers then use the extracted features to differentiate masses from normal tissues. Feature extraction relies heavily on geometrical and morphological properties. Computerized tools can enhance identification of suspicious masses and other abnormalities, such as calcifications, through learning methods. Several previous studies described various approaches for computerized detection of abnormalities in mammograms. For example, in Giger et al., regions of interest were segmented as suspicious areas and classified using texture features through linear discriminant analysis. The main problems facing such approaches are the noise in the images and the variations in size, texture, and appearance of breast tumors.
The performance of automated breast cancer diagnosis can be enhanced using deep learning methods. Machine learning techniques can be used to identify representative features. Texture can also be used by identifying local statistical aspects of image intensity maps. Texture analysis techniques can be used to analyze lesions caused by masses in mammograms and can differentiate masses from normal tissues.
In Oliver et al., the authors extracted texture features from suspected breast regions via the fuzzy C-means algorithm. In Chen et al., the authors modeled tissue patterns in mammograms using tissue appearances.
CAD tools for medical image analysis rely on deep learning techniques [26–29]. Classification processes can use statistical methods, artificial intelligence techniques, or support vector machines (SVM) to predict breast cancer. Deep learning approaches can be used to generate semantic information through adaptive learning [31,32].
A CNN can classify masses from breast mammograms by extracting texture features. The texture features are fed as inputs to a CNN classifier. However, texture features alone are not adequate to classify cancer masses from mammograms. Therefore, morphological features of the tumor shape are also used in categorizing cancer masses.
In this paper, we present a Deep-CNN method for feature extraction of three kinds of breast cancer masses. Multiple properties of the gray level co-occurrence matrix (GLCM) and the histogram of oriented gradient (HOG) [34,35] are examined to emphasize the texture features of the region of interest (ROI). Texture scoring features analogous to each breast mammogram are pooled into multi-features. The pooled features are presented to diverse numbers of classifiers and grouped into the anticipated classes. We utilize the SVM classification algorithm to define the ROI as normal, benign tumor, or cancer [36–38]. The proposed system is composed of three main phases: feature extraction, texture extraction, and tumor classification.
In this section, we will describe the dataset used and the methodology.
We use the dataset from the digital database of mammography in Nascimento et al. The dataset consists of mammograms from 2200 cases. Each mammogram has clinical data such as shapes, sizes, and densities of the breast masses as well as diagnoses annotated by radiologists [40–43].
Mammograms in the dataset include the pectoral muscle and the background, both of which include many artifacts. The classification is performed only on ROIs that might contain abnormalities. The ROI has to cover the whole abnormality and minimal normal tissue surrounding the abnormality. Mammograms are cropped, and the parts that affect classification are removed. A simulation study of extracting ROIs from mammograms in the dataset is conducted. The extracted ROIs include benign and cancer masses following the process in [44–46]. The data are partitioned into a training set and a testing set with the proportions of 80% and 20%, respectively, as depicted in Tab. 1.
The experimental dataset is composed of 7500 images. The training set contains 6000 images, among which 1890 are cancer, 1780 are benign, and 2330 are normal. Random selections of 500 cancer regions, 400 benign regions, and 600 normal regions, 1500 in total, are designated for testing. The regions are taken from different mammograms of different cases to avoid data bias.
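The stated counts can be checked against the 80%/20% partition with a few lines of arithmetic. This is a minimal sketch that assumes the 1500 randomly selected regions form the held-out 20%; the variable names are illustrative only.

```python
# Class counts for the training set, as stated in the text.
train_counts = {"cancer": 1890, "benign": 1780, "normal": 2330}
# Assumption: the 1500 randomly selected regions are the held-out partition.
held_out_counts = {"cancer": 500, "benign": 400, "normal": 600}

n_train = sum(train_counts.values())
n_held_out = sum(held_out_counts.values())
n_total = n_train + n_held_out

train_fraction = n_train / n_total
held_out_fraction = n_held_out / n_total

print(n_total, train_fraction, held_out_fraction)
```

Under that assumption the two partitions add up to the 7500 images and reproduce the 80%/20% proportions.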
The Inception module was first used in GoogLeNet. This module approximates an optimal sparse configuration in a CNN by using dense components. It searches for the optimal local configuration and duplicates it, building a multi-layer network from the convolutional network blocks.
In Arora et al., the authors reported that every unit from the previous layer represents an input image region, and they concluded that these units are assembled in filter banks. Correlated units in lower layers focus on local regions and can be used as inputs to 1×1 convolution layers in the succeeding layer. Smaller numbers of more spatially spread clusters can be covered by convolutions over larger patches, and the number of patches decreases as the regions grow larger. To avoid patch-alignment problems, the Inception architecture is limited to filter sizes of 1×1, 3×3, and 5×5. For better results, the authors added a parallel pooling path [51,52].
Adding parallel pooling layers can result in a high cost, especially when there are convolutional layers with many filters. When pooling layers are added, the cost becomes more problematic. Even when the architecture covers the optimal structure, it is highly inefficient and can lead to a huge computational cost within a few stages. Dimension reduction can be applied to reduce the computational cost. In one study, the author suggested using (1×1) convolutions instead of (3×3) or (5×5) convolutions to reduce the computational cost.
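The effect of dimension reduction can be illustrated by counting multiply-accumulate operations. The sketch below assumes the reduction works as in the Inception design, where a 1×1 convolution shrinks the channel count before the expensive 5×5 convolution. The feature-map size and channel counts are hypothetical values chosen only for illustration.

```python
def conv_macs(height, width, in_channels, out_channels, kernel):
    """Multiply-accumulate count for a stride-1, same-padded convolution."""
    return height * width * in_channels * out_channels * kernel * kernel

# Hypothetical layer sizes, not taken from the paper.
H, W = 28, 28
C_IN, C_OUT, C_REDUCED = 192, 32, 16

# A 5x5 convolution applied directly to all input channels.
direct = conv_macs(H, W, C_IN, C_OUT, 5)

# A 1x1 convolution reduces the channels first, then the 5x5 runs on fewer channels.
reduced = conv_macs(H, W, C_IN, C_REDUCED, 1) + conv_macs(H, W, C_REDUCED, C_OUT, 5)

saving = 1 - reduced / direct
print(direct, reduced, round(saving, 3))
```

With these illustrative sizes the reduced path needs roughly a tenth of the operations of the direct path. The actual saving depends on the channel counts chosen.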
Some authors suggested factorizing filters into smaller ones. Asymmetric (n×1) convolutions can offer better performance than symmetric convolutions. In Fedorov et al., the authors described an (n×1) convolution followed by a (1×n) convolution and found that this two-layer arrangement covers the same receptive field as a single (n×n) convolution, as shown in Fig. 1. For n = 3, the two-layer solution costs 33% less for the same number of output filters. A (1×n) convolution followed by an (n×1) convolution can therefore replace the (n×n) convolution with significant cost savings.
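The 33% figure follows from counting kernel weights. The sketch below compares one n×n kernel with an n×1 kernel followed by a 1×n kernel, assuming both layers keep the same number of input and output channels so that the channel factor cancels. The function names are illustrative.

```python
def weights_full(n):
    """Weights in one n x n kernel (per input/output channel pair)."""
    return n * n

def weights_factorized(n):
    """Weights in an n x 1 kernel followed by a 1 x n kernel."""
    return 2 * n

def saving(n):
    """Fraction of weights saved by the factorized form."""
    return 1 - weights_factorized(n) / weights_full(n)

for n in (3, 5, 7):
    print(n, weights_full(n), weights_factorized(n), round(saving(n), 3))
```

The saving equals 1 − 2/n, so it is one third for n = 3 and grows with the kernel size.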
2.3 Texture Feature
2.3.1 Texture Features of Spatial Dependency (TFSD)
Texture features are often used to identify objects in an image. This is considerably helpful for breast cancer diagnosis because breast masses have different shapes in mammography. Mass edges can define the type of breast cancer as well as its degree. The length of the edge is long, and the different lumps develop irregular texture features that resemble a definite type of mass. We develop a technique to classify masses in mammograms by identifying texture features of spatial dependency (TFSD). TFSD is calculated from pixel pairs with specific frequencies and spatial associations.
In Fig. 2, a spatial cell has four nearest spatial cells in the vertical and horizontal directions. Texture is described by a gray intensity matrix (GIM) $F$, where $F_{i,j}$ is the relative frequency with which two spatial cells separated by a displacement $d$ occur on the image, one with gray intensity $i$ and the other with gray intensity $j$. The GIM is a function of the angular relationship between neighboring cells and the distance between them, giving $F_{i,j}(d, 0^\circ)$ and $F_{i,j}(d, 90^\circ)$.
We compute each feature from the elements of the GIM. The first two features are the mean and standard deviation over the various angles. The other features are correlation and homogeneity. These features are functions of angle and distance. If image I has features X and Y at angles of 0° and 90° (as depicted in Fig. 2), the features are represented by the functions of X and Y, along with their mean and variance. These features are used as the classifier's inputs.
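The construction of the GIM and the features derived from it can be sketched in plain Python. The sketch makes several assumptions that the text does not confirm: pairs are counted symmetrically, correlation and homogeneity use their standard co-occurrence definitions (with a squared-difference weight for homogeneity), and the toy image and all function names are hypothetical.

```python
from statistics import mean, pvariance

def gray_intensity_matrix(image, d, angle, levels):
    """Relative frequencies F[i][j] of gray-level pairs at displacement d.

    angle 0 pairs horizontal neighbours, angle 90 pairs vertical neighbours.
    Each pair is counted in both orders, so the matrix is symmetric.
    """
    offsets = {0: (0, d), 90: (d, 0)}  # (row step, column step)
    dr, dc = offsets[angle]
    counts = [[0] * levels for _ in range(levels)]
    for r in range(len(image) - dr):
        for c in range(len(image[0]) - dc):
            i, j = image[r][c], image[r + dr][c + dc]
            counts[i][j] += 1
            counts[j][i] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def correlation(F):
    """Standard co-occurrence correlation (undefined for a constant image)."""
    n = range(len(F))
    mu_i = sum(i * F[i][j] for i in n for j in n)
    mu_j = sum(j * F[i][j] for i in n for j in n)
    var_i = sum((i - mu_i) ** 2 * F[i][j] for i in n for j in n)
    var_j = sum((j - mu_j) ** 2 * F[i][j] for i in n for j in n)
    cov = sum((i - mu_i) * (j - mu_j) * F[i][j] for i in n for j in n)
    return cov / (var_i * var_j) ** 0.5

def homogeneity(F):
    """Entries weighted by closeness to the diagonal (assumed definition)."""
    n = range(len(F))
    return sum(F[i][j] / (1 + (i - j) ** 2) for i in n for j in n)

# Toy 4-level image used only for illustration.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

features = {}
for angle in (0, 90):
    F = gray_intensity_matrix(image, 1, angle, 4)
    features[angle] = (correlation(F), homogeneity(F))

correlations = [features[a][0] for a in (0, 90)]
homogeneities = [features[a][1] for a in (0, 90)]
feature_vector = [
    mean(correlations), pvariance(correlations),
    mean(homogeneities), pvariance(homogeneities),
]
F0 = gray_intensity_matrix(image, 1, 0, 4)
print(feature_vector)
```

Here `feature_vector` holds the mean and variance of each feature across the two angles, which is one way to read the description of the classifier inputs.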
The correlation, C, measures the linear dependency of gray intensities in the image and is calculated with Eq. (1).
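Assuming Eq. (1) takes the standard co-occurrence form of correlation, which is consistent with the means and standard deviations named in the text, it can be written as:

```latex
\[
C = \frac{\sum_{i}\sum_{j} (i - \mu_i)(j - \mu_j)\, F_{i,j}}{\sigma_i\, \sigma_j}
\tag{1}
\]
```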
where $\mu_i$ and $\mu_j$ represent the means of $F_{i,j}$, while $\sigma_i$ and $\sigma_j$ represent the standard deviations of $F_{i,j}$.
Homogeneity, H, is the distance difference of the texture of the image. H is a metric of the local change of the texture and is defined in Eq. (2).
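A common definition of homogeneity over the GIM, assumed here for Eq. (2), weights each entry by its closeness to the diagonal:

```latex
\[
H = \sum_{i}\sum_{j} \frac{F_{i,j}}{1 + (i - j)^{2}}
\tag{2}
\]
```

Some implementations use $1 + |i - j|$ in the denominator instead. Which variant the authors used is not stated.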
2.3.2 Gradient-Based Histogram
The gradient-based histogram can describe features such as local object structure, extract accurate gradient and edge information, and calculate the horizontal as well as the vertical gradients of an image. The algorithm partitions the image into non-overlapping units of equal size, here 8 × 8 pixels. For each unit, a histogram is computed over the direction of the pixels' gradients, with each pixel contributing according to its gradient. Blocks of four (2 × 2) units are then used to normalize the histogram of a given unit, adapting it to the energy in each of these blocks.
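The unit histograms and block normalization can be sketched as follows. This is a minimal illustration, not the authors' implementation: the number of orientation bins (9), the use of unsigned orientations, central-difference gradients, and L2 normalization are all assumptions, and the function names are hypothetical.

```python
import math

def cell_histograms(image, cell=8, bins=9):
    """Gradient-orientation histogram for each non-overlapping cell x cell unit."""
    rows, cols = len(image), len(image[0])
    n_r, n_c = rows // cell, cols // cell
    hists = [[[0.0] * bins for _ in range(n_c)] for _ in range(n_r)]
    for r in range(1, rows - 1):          # border pixels are skipped
        for c in range(1, cols - 1):
            gx = image[r][c + 1] - image[r][c - 1]   # horizontal gradient
            gy = image[r + 1][c] - image[r - 1][c]   # vertical gradient
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            b = min(int(angle / (180.0 / bins)), bins - 1)
            ur, uc = r // cell, c // cell
            if ur < n_r and uc < n_c:
                hists[ur][uc][b] += magnitude  # vote weighted by gradient magnitude
    return hists

def block_normalize(hists, eps=1e-6):
    """L2-normalize every 2 x 2 block of units, sliding by one unit."""
    n_r, n_c = len(hists), len(hists[0])
    descriptor = []
    for r in range(n_r - 1):
        for c in range(n_c - 1):
            block = hists[r][c] + hists[r][c + 1] + hists[r + 1][c] + hists[r + 1][c + 1]
            norm = math.sqrt(sum(v * v for v in block) + eps)
            descriptor.extend(v / norm for v in block)
    return descriptor

# Toy 16 x 16 image with a single vertical edge, for illustration only.
image = [[0 if c < 8 else 255 for c in range(16)] for r in range(16)]
descriptor = block_normalize(cell_histograms(image))
print(len(descriptor))
```

For the toy image all gradient energy falls into the first orientation bin of each unit, because the only intensity change is horizontal.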
3 Experiment Design
Experiments are designed to evaluate the abilities of our methodology to detect breast cancer tumors and classify them in the selected dataset. A Deep-CNN network combined with transfer learning and a Deep-CNN combined with SVM are tested for their ability to enhance the classifier performance. The main function of the classifier is to classify an ROI as normal, benign, or cancer.
Technique 1 (T1): The Deep-CNN is designed to be coupled with transfer learning. The CNN is trained with the mammogram datasets (DS) with previous transfer learning from ImageNet. The performance of the classifier is tested by inputting images into the Deep-CNN via runs of convolution, pooling, and classification layers. This technique is designed to evaluate the classification ability of Deep-CNN with transfer learning. The block diagram of T1 is shown in Fig. 3.
Technique 2 (T2): The Deep-CNN is designed to be combined with transfer learning (TF) and SVM. The CNN is trained with the mammogram datasets. The classifier score is the output feature vector through inputting images into the SVM for training. This technique is designed to compare the classification abilities with and without transfer learning. The block diagram of T2 is shown in Fig. 4.
Technique 3 (T3): This technique uses texture to identify the ROI in the mammogram image. The texture features are computed using texture features of spatial dependency (TFSD) and the gradient-based histogram (GBH). This technique evaluates the classifier ability without transfer learning. The texture features are computed from the ROIs in the dataset DS and are used for the training of the Deep-CNN. The block diagram of T3 is shown in Fig. 5.
Technique 4 (T4): This technique integrates the technique in T3 with the SIFT local features. The integrated features are used as input vectors to the SVM. The block diagram of T4 is shown in Fig. 6.
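One simple way to integrate feature groups into a single SVM input vector is concatenation. The sketch below is an assumption about how the integration could be done, not a description of the authors' pipeline. The per-group L2 normalization, the function names, and the example values are all hypothetical, and the SIFT descriptors are assumed to have been pooled into a fixed-length vector beforehand.

```python
import math

def l2_normalize(vector, eps=1e-12):
    """Scale a feature group to unit length so no group dominates by magnitude."""
    norm = math.sqrt(sum(v * v for v in vector)) + eps
    return [v / norm for v in vector]

def fuse_features(*groups):
    """Concatenate normalized feature groups into one input vector."""
    fused = []
    for group in groups:
        fused.extend(l2_normalize(group))
    return fused

# Made-up feature values, for illustration only.
cnn_scores = [0.7, 0.2, 0.1]
tfsd = [0.45, 0.02, 0.61, 0.01]
gbh = [3.0, 0.0, 4.0]
sift_pooled = [1.0, 2.0, 2.0]

svm_input = fuse_features(cnn_scores, tfsd, gbh, sift_pooled)
print(len(svm_input))
```

The resulting vector has one entry per feature across all groups and could then be passed to any SVM implementation.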
Fig. 7 depicts a mammogram image of a breast with normal tissues from the dataset DS, while Fig. 8 depicts a mammogram image from the DS dataset with cancer ROI that has been detected using T3. In Fig. 9, a mammogram is shown after applying texture features coupled with SIFT local features using T4.
4 Experimental Results
4.1 Results of the Deep-CNN
The Deep-CNN network classifies breast mammogram ROIs as cancer, benign, or normal using the Softmax layer. The classifier performance is defined as the capability of the classifier to correctly detect the breast region type. We use the accuracy, sensitivity, and specificity metrics to measure the four classification techniques described in Section 3.
The confusion matrix for T4 is illustrated in Tab. 2. The results of medical diagnosis with the classifier described in T4 are depicted. Our proposed technique T4 is shown to have the capability to classify three types of breast ROI mammogram images as normal, benign, or cancer.
The four techniques are then compared according to the chosen accuracy metrics in Tab. 3.
The Deep-CNN classifier in the defined four techniques, T1 through T4, is assessed using the accuracy, sensitivity, and specificity metrics, as depicted in Eqs. (3)–(5), respectively.
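Assuming Eqs. (3)–(5) follow the standard definitions of these three metrics, they can be written as:

```latex
\begin{align}
\text{Accuracy}    &= \frac{TP + TN}{TP + TN + FP + FN} \tag{3} \\
\text{Sensitivity} &= \frac{TP}{TP + FN} \tag{4} \\
\text{Specificity} &= \frac{TN}{TN + FP} \tag{5}
\end{align}
```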
TP, TN, FP, and FN indicate true positive, true negative, false positive, and false negative cases, respectively. Sensitivity indicates the true positive rate, and Specificity indicates the true negative rate.
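For a three-class problem, these counts are usually obtained by treating one class as positive and the remaining classes as negative. The sketch below assumes that one-vs-rest reduction, which the text does not spell out. The confusion matrix contains made-up numbers and is not taken from Tab. 2.

```python
def binary_counts(confusion, positive):
    """Collapse a multi-class confusion matrix (rows = actual, columns =
    predicted) into TP, TN, FP, FN with one class treated as positive."""
    n = len(confusion)
    tp = confusion[positive][positive]
    fn = sum(confusion[positive]) - tp
    fp = sum(confusion[r][positive] for r in range(n)) - tp
    tn = sum(map(sum, confusion)) - tp - fn - fp
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity (true positive rate), specificity (true negative rate)."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical counts; class order is normal, benign, cancer.
confusion = [
    [50, 3, 2],
    [4, 40, 6],
    [1, 2, 47],
]

counts = binary_counts(confusion, positive=2)  # cancer vs. the rest
result = metrics(*counts)
print(counts, result)
```

Repeating the call with `positive=0` or `positive=1` gives the corresponding figures for the normal and benign classes.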
Tab. 3 depicts the comparison of results of the four experimental techniques adopted in this research. The k-Fold Cross-Validation testing method is used, where the training set is 80% and the testing set is 20% of the whole dataset.
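An 80%/20% split in every fold corresponds to k = 5. The value of k is not stated in the text, so k = 5 is an assumption in the sketch below. The generator and its name are hypothetical, and shuffling and per-class stratification are omitted for brevity.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

# 7500 images as stated in the text; k = 5 is assumed.
folds = list(k_fold_indices(7500, 5))
for train, test in folds:
    print(len(train), len(test))
```

Each image appears in the test partition of exactly one fold, and the reported metric would be the average over the five folds.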
4.2 Results of Transfer Learning
In this section, we will discuss the impact of transfer learning on the performance of the different classifiers enhanced by T1 through T4.
The Deep-CNN with transfer learning provides much better classification accuracy and sensitivity than the same Deep-CNN without transfer learning. Figs. 10 and 11 show the comparison of the four techniques (T1 through T4) with and without transfer learning at 100,000 and 200,000 epochs.
5 Conclusions
In this research, we proposed a method for classifying breast ROIs into normal, benign, and cancerous classes. We used mammograms in which ROIs are classified using several features, through four techniques coupled with a Deep-CNN. Our study indicated that models using several features outperform models using a single feature. We coupled our Deep-CNN with transfer learning in the proposed four techniques. Transfer learning helped the model achieve higher accuracy than without it. Features were extracted and used as input to the Deep-CNN. Our proposed method integrated the Deep-CNN with texture features and SIFT local features. An SVM is trained on the Deep-CNN features to classify breast masses. The features from the Deep-CNN are combined with texture features and used in the final classifier. We applied transfer learning to the classifier of the SVM. We devised four techniques for the experimental evaluation of the proposed system. The fourth technique combined all the features suggested in this research: the Deep-CNN with texture features and local features extracted by the scale-invariant feature transform (SIFT) algorithm. Experiments were designed to measure the performance of the various techniques. The results demonstrated that the proposed CNN coupled with the texture features and SIFT outperforms the other models and performs best with transfer learning embedded. The accuracy of this model is 97.8%, with a true positive rate of 98.45% and a true negative rate of 96%.
Funding Statement: This research was funded by the Deanship of Scientific Research at Princess Nourah bint Abdulrahman University through the Fast-track Research Funding Program.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.