Lung tumors typically first appear as nodules on CT scans, and statistical studies report that 30% to 40% of these nodules are malignant. Early detection and classification of lung nodules are therefore crucial to the treatment of lung cancer. As the prevalence of lung cancer rises, the large number of CT images awaiting diagnosis places a heavy burden on doctors, who may miss or misidentify abnormalities because of fatigue. Methods: In this study, we propose a novel lung nodule detection method based on the YOLOv3 deep learning algorithm that requires only one preprocessing step. To address the shortage of training data at the start of a new Computer Aided Diagnosis (CAD) study, we first pick a small number of diseased regions to simulate training on a limited dataset: 5 nodule patterns are selected and deformed into 110 nodules by random geometric transformations, then fused into 10 normal lung CT images using Poisson image editing. In our experiments, the Poisson fusion method achieves a detection rate of about 65.24% on 100 new test images. Second, 419 slices from the public RIDER database are used to train and test our YOLOv3 network. Lung nodule detection with YOLOv3 is 2–3 times faster than with mainstream algorithms, with a detection accuracy of 95.17%. Finally, the configuration of YOLOv3 is optimized for the training data sets. The results show that YOLOv3 offers high speed and high accuracy in lung nodule detection and can process a large amount of CT image data within a short time, meeting the heavy demand of clinical practice. In addition, using Poisson image editing to generate data sets can reduce the need for raw training data and improve training efficiency.
In recent years, lung cancer has had the highest incidence and mortality rate of all cancers in China. Worldwide, lung cancer is also the most common and most lethal malignant tumor, and its incidence is increasing exponentially. Radiologists must interpret a large number of CT images in routine examinations, but manual detection has many drawbacks. According to statistics from the Boston Research Group (Hopkinton, USA), 60.5% of doctors had missed a diagnosis, with a misdiagnosis rate of around 10% [
At present, methods for detecting lung nodules fall into two groups: traditional image processing methods and machine learning methods. Traditional image processing methods mainly comprise segmentation of the lung parenchyma, extraction of regions of interest, feature extraction, and classification or recognition [
Machine learning methods use computers to simulate the human learning behaviors of acquiring new knowledge and skills. The mainstream machine learning algorithms currently include: 1) the SVM classifier; 2) CNN (Convolutional Neural Network); 3) ANN (Artificial Neural Network); 4) Fast R-CNN (Fast Region-based Convolutional Neural Network). Boroczky et al. [
Among the methods mentioned, Fast R-CNN has recently been regarded as the better choice for detecting lung nodules because of its higher accuracy. However, processing a large number of CT images requires not only high accuracy but also fast detection. Fast R-CNN uses a two-stage method, which can slow detection because candidates must first be selected from a large number of boxes. In addition, because medical image data are private, clinical images are often difficult to collect. How to improve recognition accuracy when only a small amount of training data is available is therefore an important issue for CNNs. In this paper, we apply the YOLOv3 deep learning neural network, which uses a one-stage method with additional modules for bounding box prediction, class prediction, cross-scale prediction and feature extraction. YOLOv3 has been shown to offer high accuracy and fast detection, and it detects small targets better by using FPN (Feature Pyramid Networks).
In general, a large training data set is needed for a network to learn fully, which is one of the keys to deep learning. In medical image processing, however, a shortage of original training data is a common problem. This paper proposes a method that generates new lesions and fuses them into normal cases by Poisson image editing. The algorithm takes a small number of diseased areas collected from cancer cases, randomly deforms, displaces and rotates them, and seamlessly integrates them into the normal lung area of CT images. This procedure not only produces a large amount of training data, but also distributes the lesions over the whole lung area, so that the network can adjust its weights to detect nodules at any position.
The proposed method consists of five steps: 1) Preprocessing of the lung CT images; 2) Calculating the gradient field and the divergence field of the fused image; 3) Marking the lung nodule image and configuring the optimized YOLOv3 deep neural network; 4) Randomly generating a part of the training datasets by using Poisson editing algorithm; 5) Training and evaluating the training datasets.
As a medical image, CT is usually stored in DICOM (Digital Imaging and Communications in Medicine) format. However, these 16-bit images cannot be used directly as a training set for YOLOv3. It is therefore necessary to preprocess each image and convert it from DICOM to JPEG format.
The 16-bit CT values in a DICOM file are downscaled to the 8-bit JPEG format. The wide lung window (window width 1600 HU, window level -550 HU) is a better display setting for lesions in the lung, so the window level and window width need to be set during preprocessing. After preprocessing, the CT image clearly shows the contour of the lung, its internal structure and the texture of the lung nodules, which reveals more detail and makes the image easier to inspect. As shown in
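The windowing step can be illustrated with a minimal sketch. It assumes the pixel data have already been converted to Hounsfield units (HU); the function name `apply_lung_window` and the linear mapping with rounding are illustrative choices, not necessarily the exact conversion used in this study.

```python
import numpy as np

def apply_lung_window(hu, level=-550.0, width=1600.0):
    """Map CT values in Hounsfield units to 8-bit grey levels.

    Assumption: `hu` already holds Hounsfield units. Values below the
    window are clipped to 0 and values above it to 255.
    """
    low = level - width / 2.0    # -1350 HU for the wide lung window
    high = level + width / 2.0   # +250 HU for the wide lung window
    clipped = np.clip(np.asarray(hu, dtype=np.float64), low, high)
    scaled = (clipped - low) / (high - low) * 255.0
    return np.round(scaled).astype(np.uint8)
```

The resulting 8-bit array can then be written to a JPEG file with any image library.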
In an image, the gradient at a pixel describes the difference between that pixel and its neighboring pixels. It can be calculated from the first derivative at the pixel. A two-dimensional (2D) image can be regarded as a 2D digital matrix of size W × H, and its gradient can be expressed as
From the above formulas, the image gradient can also be approximately expressed as
The gradient reflects how the image changes, and regions where the gradient is large are the edge regions of the image. Calculating the gradient of the region m to be fused gives the “change path” of the region, that is, its relative information. Given the corresponding boundary condition, the absolute information can then be changed. In the image, this amounts to changing the pixel color of the region
The principle of calculating the gradient field of the target image
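As an illustration of the discrete gradient discussed above, the following sketch uses forward differences. The forward-difference scheme and the zero values in the last column and row are assumptions made for this example; the function name `forward_gradient` is hypothetical.

```python
import numpy as np

def forward_gradient(img):
    """Forward-difference gradient of a 2D image.

    gx[y, x] = I[y, x + 1] - I[y, x] and gy[y, x] = I[y + 1, x] - I[y, x].
    The last column of gx and the last row of gy are left at zero.
    """
    img = np.asarray(img, dtype=np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]
    gy[:-1, :] = img[1:, :] - img[:-1, :]
    return gx, gy
```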
The divergence adjusts each pixel value according to the difference between that pixel and its surrounding pixels, which smooths the image. Convolution with the Laplacian operator is often used to obtain the divergence of an image. The Laplacian operator template is shown in
In a more rigorous derivation, the divergence is obtained by partial derivative of
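The relation between the Laplacian template and the divergence of the gradient can be checked numerically. The sketch below assumes the common 4-neighbor template (center weight -4, the four direct neighbors weight 1), which may differ from the template used in this study, and compares it with the backward-difference divergence of a forward-difference gradient. Both function names are hypothetical.

```python
import numpy as np

def laplacian_5pt(img):
    """Apply the 4-neighbor Laplacian template to interior pixels."""
    img = np.asarray(img, dtype=np.float64)
    lap = np.zeros_like(img)
    lap[1:-1, 1:-1] = (img[1:-1, 2:] + img[1:-1, :-2]
                       + img[2:, 1:-1] + img[:-2, 1:-1]
                       - 4.0 * img[1:-1, 1:-1])
    return lap

def divergence_of_gradient(img):
    """Backward-difference divergence of the forward-difference gradient."""
    img = np.asarray(img, dtype=np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]
    gy[:-1, :] = img[1:, :] - img[:-1, :]
    div = np.zeros_like(img)
    # d(gx)/dx + d(gy)/dy on interior pixels only
    div[1:-1, 1:-1] = ((gx[1:-1, 1:-1] - gx[1:-1, :-2])
                       + (gy[1:-1, 1:-1] - gy[:-2, 1:-1]))
    return div
```

On interior pixels the two functions return the same values, which is why a single convolution with the template can stand in for the two differentiation steps.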
PASCAL VOC provides a set of standardized, high-quality datasets for image recognition and classification. Many algorithms are tested on VOC datasets, which are quick to label and whose annotation information is easy to store and parse. The VOC format is therefore used as the template for the datasets in this study; YOLOv3 also supports training on VOC datasets. The LabelImg tool is used for labeling. Following the detection criteria and the samples provided by a professional, a rectangular frame is drawn around the corresponding area on the image, as shown in
YOLOv3 predicts on feature maps at three scales: for a 416 × 416 input these are 13 × 13, 26 × 26 and 52 × 52. Because the largest down-sampling stride is 32, the length and width of the input image should be integer multiples of 32. The images in our datasets are 512 × 512, which satisfies this requirement, and the higher-resolution input helps the network detect smaller details.
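The relation between input size and feature-map size can be written out as a short sketch. It assumes the three YOLOv3 down-sampling strides of 32, 16 and 8; the function name `yolo_grid_sizes` is hypothetical.

```python
def yolo_grid_sizes(input_size, strides=(32, 16, 8)):
    """Return the feature-map side length at each detection scale."""
    if input_size % max(strides) != 0:
        raise ValueError("input size must be a multiple of the largest stride")
    return [input_size // s for s in strides]
```

For example, `yolo_grid_sizes(416)` gives `[13, 26, 52]`, and a 512 × 512 input gives `[16, 32, 64]`.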
When configuring the deep learning network structure, we should consider the requirements of the detection task itself as well as the hardware and software environment. The YOLOv3 configuration file provides parameter settings for the network structure. Adjusting the batch value helps to find a better direction for gradient descent. Because lung nodule is the only detection category, the filter parameter of the last convolutional layer of the network is adjusted as follows:
This allows the network to adapt to the target category and improves the convergence of the whole architecture.
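For reference, the Darknet configuration convention for YOLOv3 sets the number of filters in each convolutional layer that precedes a detection layer to 3 × (classes + 5): three anchor boxes per scale, each with four box coordinates, one objectness score and one score per class. The sketch below applies this convention to a single class; it is assumed, not confirmed, that this matches the adjustment made in this study, and the function name is hypothetical.

```python
def yolo_head_filters(num_classes, anchors_per_scale=3):
    """Filters in the conv layer before a YOLOv3 detection layer.

    Each anchor predicts 4 box coordinates, 1 objectness score and
    one score per class.
    """
    return anchors_per_scale * (num_classes + 5)
```

With lung nodule as the only class, `yolo_head_filters(1)` gives 18, compared with 75 for the 20 VOC classes.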
Poisson image fusion is a major tool in Poisson editing [
Mathematically, the algorithm is implemented by first finding the constraints, then constructing the Poisson equation, and finally solving it. The basic flowchart is shown in
The previous steps give the divergence field of the fused image, that is, the b term in the equation. In practice, the constraint condition of the Poisson equation is obtained from the pixel values on the boundary of the fused region. Finally, we only need to construct a sparse matrix from the constraint condition and solve the Poisson reconstruction equation to obtain the pixel values of the fused image. In this paper, an optimized FFT method is used to solve the Poisson reconstruction equation; it is 4 to 5 times faster than the SOR method at similar accuracy, and does not change with the vorticity field [
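The construction of the linear system can be sketched on a small patch. The example below is a minimal illustration that swaps in a dense direct solve (`numpy.linalg.solve`) for the FFT solver used in this paper, so it is only practical for small fusion regions. It assumes single-channel images of equal size and a mask that does not touch the image border; the function name `poisson_blend` is hypothetical.

```python
import numpy as np

def poisson_blend(source, target, mask):
    """Fuse `source` into `target` inside `mask` by solving A x = b.

    For every masked pixel p with 4-neighbors q:
        4 f(p) - sum of unknown f(q) = sum of known target(q)
                                       + sum of (source(p) - source(q))
    """
    source = np.asarray(source, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mask = np.asarray(mask, dtype=bool)
    if mask[0].any() or mask[-1].any() or mask[:, 0].any() or mask[:, -1].any():
        raise ValueError("mask must not touch the image border")
    ys, xs = np.nonzero(mask)
    index = {p: k for k, p in enumerate(zip(ys.tolist(), xs.tolist()))}
    n = len(index)
    A = np.zeros((n, n))
    b = np.zeros(n)
    for (y, x), k in index.items():
        A[k, k] = 4.0
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            q = (y + dy, x + dx)
            b[k] += source[y, x] - source[q]   # divergence (guidance) term
            if q in index:
                A[k, index[q]] = -1.0          # neighbor is also unknown
            else:
                b[k] += target[q]              # boundary constraint
    result = target.copy()
    if n:
        result[ys, xs] = np.linalg.solve(A, b)
    return result
```

Pixels outside the mask are left unchanged, and inside the mask the result keeps the gradients of the source while matching the target on the boundary.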
In practice, a single fusion cannot meet the data requirements: a fusion region with a fixed position, size and shape cannot provide sufficiently varied data. In further processing, the fusion region is therefore subjected to geometric transformations such as random rotation, scaling and displacement. The result is shown in
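A random rotation, scaling and displacement can be combined into a single affine matrix. The following sketch draws the parameters from uniform distributions; the parameter ranges and the function name `random_similarity_transform` are assumptions for illustration, not the settings used in this study.

```python
import numpy as np

def random_similarity_transform(rng, center, max_shift=20.0,
                                scale_range=(0.7, 1.3)):
    """Return a 3x3 matrix that rotates and scales about `center`,
    then shifts, together with the sampled parameters."""
    angle = rng.uniform(0.0, 2.0 * np.pi)
    scale = rng.uniform(scale_range[0], scale_range[1])
    shift = rng.uniform(-max_shift, max_shift, size=2)
    c = scale * np.cos(angle)
    s = scale * np.sin(angle)
    cx, cy = center
    matrix = np.array([
        [c, -s, cx - c * cx + s * cy + shift[0]],
        [s, c, cy - s * cx - c * cy + shift[1]],
        [0.0, 0.0, 1.0],
    ])
    return matrix, {"angle": angle, "scale": scale, "shift": shift}
```

Applying the matrix to the homogeneous pixel coordinates of a nodule patch and its mask gives a new fusion region; repeating the draw yields as many variants as needed.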
YOLOv3 is an improved version of the original YOLO and YOLOv2, and was proposed by Redmon et al. [
YOLOv3 continues the idea of earlier versions of YOLO: the input image is divided into S × S grid cells, and each cell is responsible for detecting the objects that “fall into” it. If the center of an object falls into a certain grid cell, that cell is responsible for detecting the object. The main steps of the algorithm are bounding box prediction, class prediction, cross-scale prediction and feature extraction. The specific principle is detailed in the literature [
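The assignment of an object to a grid cell can be sketched in a few lines. The function name `responsible_cell` and the pixel-coordinate convention (origin at the top-left corner) are assumptions for this example.

```python
def responsible_cell(cx, cy, image_size, grid_size):
    """Return (row, col) of the grid cell containing the box center."""
    col = min(int(cx * grid_size / image_size), grid_size - 1)
    row = min(int(cy * grid_size / image_size), grid_size - 1)
    return row, col
```

For a 512 × 512 image and a 16 × 16 grid, a nodule centered at (100, 300) is assigned to row 9, column 3.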
The optimized network is trained on the training data. Training uses mini-batch gradient descent with momentum, which improves the convergence speed. During training, the parameters used to evaluate progress are displayed in real time and the loss curve is plotted. In the YOLO network, the loss function is defined by the
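The weight update of gradient descent with a momentum term can be sketched as follows. The learning rate and momentum values are illustrative assumptions, not the settings used in this study, and the function name is hypothetical.

```python
import numpy as np

def sgd_momentum_step(weights, velocity, grad, lr=0.001, momentum=0.9):
    """One update: the velocity accumulates past gradients, which
    damps oscillations and speeds up convergence."""
    velocity = momentum * velocity - lr * grad
    return weights + velocity, velocity
```

In mini-batch training, `grad` is the loss gradient averaged over the images of one batch.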
The experiments in this paper are implemented in C and Python 3.6 using Visual Studio 2017. They are carried out on a computer with a 2.6 GHz CPU, a GTX 960M graphics card and 4 GB of memory.
Two kinds of data sets are used in this paper. One is a conventionally collected lung nodule data set; the other is a data set generated by Poisson fusion. The training results on the two data sets are analyzed and compared below to verify the validity of the Poisson fusion algorithm.
The first data set is trained with 419 images from the common database RIDER [
A selection of the recognition results is shown in
In this paper, the output of the model is evaluated using Recall, Precision and F1. Among them, Recall is obtained by the formula
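The three measures can be computed from the numbers of true positives (TP), false positives (FP) and false negatives (FN) using their standard definitions, as in the sketch below. The counts in the usage note are hypothetical and are not results from this study.

```python
def detection_metrics(tp, fp, fn):
    """Return (precision, recall, f1) from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With hypothetical counts TP = 90, FP = 10 and FN = 30, precision is 0.90, recall is 0.75 and F1 is about 0.82.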
The default threshold size for verification is 0.25, and the evaluation results of the test set are shown in
Evaluation item | Avg IoU | Recall | Precision | Accuracy | F1 |
---|---|---|---|---|---|
Datasets1 | 69.27% | 97.00% | 95.00% | 92.38% | 0.96 |
Datasets2 | 68.39% | 98.00% | 98.00% | 96.08% | 0.98 |
Datasets3 | 68.46% | 99.00% | 98.00% | 97.06% | 0.98 |
Average value | 68.71% | 98.00% | 97.00% | 95.17% | 0.97 |
Poisson fusion generates more data from a small amount of data by image processing. Therefore, when evaluating its effect, we used a validation data set that did not contain any of the diseased images used as source images for Poisson fusion. The results are shown in
Evaluation item | Avg IoU | Recall | Precision | Accuracy | F1 |
---|---|---|---|---|---|
Datasets1 | 52.25% | 79.00% | 75.00% | 62.30% | 0.77 |
Datasets2 | 49.66% | 80.00% | 77.00% | 64.86% | 0.79 |
Datasets3 | 51.31% | 83.00% | 80.00% | 68.57% | 0.81 |
Average value | 51.07% | 84.00% | 77.33% | 65.24% | 0.79 |
One advantage of the YOLOv3 network is its better real-time performance: its detection time is shorter than that of other algorithms. Detection can therefore be completed in a short time even on a computer with ordinary hardware. At a given accuracy, CT images can be checked efficiently and a large amount of patient data can be processed. This has important practical significance given the exponential growth of lung cancer incidence.
Algorithm | mAP-50 | Time (ms) |
---|---|---|
SSD321 | 45.4 | 61 |
DSSD321 | 46.1 | 85 |
R-FCN | 51.9 | 85 |
SSD513 | 50.4 | 125 |
DSSD513 | 53.3 | 156 |
FPN FRCN | 59.1 | 172 |
RetinaNet-50-500 | 50.9 | 73 |
RetinaNet-101-500 | 53.1 | 90 |
RetinaNet-101-800 | 57.5 | 198 |
YOLOv3-320 | 51.5 | 22 |
YOLOv3-416 | 55.3 | 29 |
YOLOv3-608 | 57.9 | 51 |
In terms of detection rate, we compare our method with the published results of several algorithms from recent years. The results are shown in
Algorithm | Experimental method | Detection rate |
---|---|---|
Gurcan et al. [ | Feature extractor | 84.00% |
Zhang et al. [ | Feature extractor | 82.98% |
Ye et al. [ | SVM | 90.20% |
Choi et al. [ | SVM | 95.28% |
Setio et al. [ | CNN | 90.10% |
Liu et al. [ | ANN | 89.40% |
Our algorithm | Deep learning network | 95.17% |
In view of the rising prevalence of lung cancer, we apply YOLOv3, an advanced object detection algorithm, to recognize lung nodules and thereby support the early diagnosis of lung cancer. We also propose a method of generating data sets with the Poisson fusion algorithm, which reduces the amount of original data needed for training. In our tests and evaluation, the detection rate of the proposed algorithm is higher than that of most of the compared algorithms, and its detection time is clearly shorter than that of other mainstream algorithms. This has important practical significance for processing a large number of CT images.
The verification of Poisson fusion also produced encouraging results. The method uses a small number of diseased areas as source images and normal lung images as target images, calculates the gradient and divergence of the fused image, and constructs the Poisson equation with the boundary pixels as the constraint condition. In practical verification, this method achieves a detection rate of about 65% on new images, which indicates that using data generated by Poisson fusion as part of the training data sets can reduce the need for original training data.