Traffic flow statistics have become a particularly important part of intelligent transportation. To address the poor real-time performance, robustness, and accuracy of traffic flow statistics, the Kalman filter (KF) in the DeepSort tracking algorithm, which is only suitable for linear problems, is replaced by the extended Kalman filter (EKF), which can effectively handle nonlinear problems, and the Histogram of Oriented Gradient (HOG) of the target is integrated into the algorithm. A multi-target tracking framework is constructed with the YOLO V5 target detection algorithm, and an efficient, long-running Traffic Flow Statistical Framework (TFSF) is established on top of it. Virtual lines are set up to record the movement direction of vehicles so that traffic flow statistics are more accurate and detailed. To verify the robustness and accuracy of the framework, traffic flow was collected in different scenes under actual road conditions. The experimental validation shows that the accuracy of the framework exceeds 93% and that its running speed on the detection data set in this paper is 32.7 FPS, which meets real-time requirements and is significant for the development of intelligent transportation.
With the continuous popularization of automobiles, traffic congestion, emission pollution, and traffic accidents are becoming more and more prominent in traffic, and people's demand for intelligent transportation [
TFSF is divided into detection, tracking, and statistics modules. Target detection [
Target detection based on deep learning [
At present, the mainstream tracking frameworks are all based on tracking-by-detection, that is, tracking based on target detection. Zeng et al. [
TFSF is built on the position of the target and the direction of its movement track. Using a deep learning target detection algorithm makes better use of the target features, so traffic flow statistics can be more accurate and detailed. In addition, the algorithm eliminates the interference of useless information and can run efficiently for a long time, meeting the real-time requirements of intelligent transportation. The framework is highly extensible and can integrate many functions at a later stage. It can provide a theoretical basis for the development of intelligent transportation systems.
In this part, the self-made Traffic Statistics Test Dataset (TSTD) is made public for the convenience of scholars. This data set is used to test the TFSF built in this paper, which can be downloaded from the following link:
The experimental scenes in this paper are one-way and two-way lanes, as shown in
The Vehicle Detection Dataset (VDD) was collected so that training reflects actual conditions more closely and detection accuracy improves. It consists of 4500 RGB images collected in urban road scenes. Each image has a corresponding annotation that contains the type of each target and its bounding-box coordinates. Of these images, 4000 are used as the training set and the remaining 500 as the validation set. According to the survey, there are mainly three types of vehicles in the urban area, namely cars, vans, and buses. The dataset can be downloaded from the following link:
The vehicle detection involved in this paper is an application of a target detection algorithm. At present, vehicle detection mainly uses traditional machine learning and deep learning methods. With the rapid development of computer technology, target detection methods based on deep learning have made great progress and are far superior to traditional algorithms in precision, speed, and application.
In deep learning, YOLO is the representative of the better-performing regression-based detection methods. Compared with previous generations of YOLO, the latest YOLO V5 algorithm is dramatically improved in flexibility and speed, which meets the real-time requirements of traffic flow statistics, although its accuracy is slightly weaker than that of YOLO V4 [
The YOLO V5 model is mainly divided into four parts: Input, Backbone, Neck, and Prediction. The Input stage consists of three parts: Mosaic data enhancement, adaptive anchor box calculation, and adaptive image scaling. The Backbone contains two structures: the Focus structure mainly performs a slicing operation on the image, and the CSP structure addresses, from the perspective of network structure design, the heavy computation required during inference. The Neck is an FPN+PAN structure that strengthens network feature fusion. The Prediction stage uses the GIOU_LOSS function to estimate the localization loss of the detection box.
The main task of target tracking is to associate the detected targets between two frames, number each target, obtain its trajectory, and provide the data basis for TFSF. Among the many target tracking algorithms, DeepSort has high robustness and real-time performance and is a classic representative of tracking-by-detection. The algorithm adds deep features and, through an appearance model, additional rules for box matching. This alleviates the occlusion problem to some extent and reduces the number of ID switches, which makes it well suited to multi-target tracking in complex traffic scenarios. The DeepSort algorithm is mainly composed of feature extraction, feature matching, and position prediction modules. In this paper, the HOG of vehicles is fused into the algorithm to strengthen the appearance modeling of vehicles, which improves tracking accuracy by reducing the target loss caused by lighting changes, overlapping occlusion, and turning. In addition, the position prediction module has an important impact on the performance of the tracking algorithm. The vehicle trajectory in traffic scenes is nonlinear, whereas the KF in the original algorithm is suited to linear systems and its prediction accuracy is low for nonlinear systems. The KF is therefore replaced with the EKF, which handles nonlinear systems at high computational speed.
The traffic scene is complex and changeable [
HOG is a feature descriptor used in computer vision and image processing for target detection. The feature is formed by computing histograms of the gradient directions over local regions of the image, with the gradients mainly distributed in the edge regions. The feature provides a good description of the appearance and shape of a local target. The HOG of three randomly selected images, one per vehicle type, can be computed and visualized with the Skimage and OpenCV packages in Python.
After cascade matching, the original DeepSort algorithm matches the remaining unmatched detections and tracks by IOU only, which is not enough to solve the matching problem of the same target between frames. To solve this problem, the HOG is fused into the algorithm to match targets between different frames. The experiment shows that extracting the HOG of a detection box costs only 1 ms on average, so the calculation is simple and adds almost no computational load during matching.
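The idea of comparing gradient-orientation features between two boxes can be illustrated with a minimal sketch. It is a deliberate simplification and not the implementation used in this paper: it computes a single-cell orientation histogram in pure Python, whereas a full HOG descriptor (for example from Skimage) also splits the patch into cells and normalizes over blocks. The function names, the use of cosine distance, and the interpretation of the 0.5 threshold as a distance threshold are all assumptions made for illustration.

```python
import math


def orientation_histogram(gray, bins=9):
    """Single-cell histogram of unsigned gradient orientations (0 to 180 degrees).

    `gray` is a list of rows of pixel intensities. Simplified on purpose:
    full HOG also uses multiple cells and block normalization.
    """
    hist = [0.0] * bins
    height, width = len(gray), len(gray[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]  # horizontal central difference
            gy = gray[y + 1][x] - gray[y - 1][x]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Each pixel votes for its orientation bin, weighted by magnitude.
            hist[int(angle / (180.0 / bins)) % bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]


def cosine_distance(a, b):
    # Inputs are L2-normalized, so the dot product is the cosine similarity.
    return 1.0 - sum(p * q for p, q in zip(a, b))


# Two toy 4x4 patches: one with a vertical edge, one with a horizontal edge.
vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
horizontal_edge = [[0] * 4, [0] * 4, [10] * 4, [10] * 4]

HOG_THRESHOLD = 0.5  # assumed to act on the distance; illustrative only
same = cosine_distance(orientation_histogram(vertical_edge),
                       orientation_histogram(vertical_edge))
different = cosine_distance(orientation_histogram(vertical_edge),
                            orientation_histogram(horizontal_edge))
print(same <= HOG_THRESHOLD, different <= HOG_THRESHOLD)
```

Patches with the same edge structure give a distance near zero, while patches whose edges run in different directions give a large distance, which is the property the matching step relies on.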
First, all targets in the initial frame are detected and their ID numbers are initialized. KF is then used to predict the state parameters for the next frame, and Hungarian matching is performed between the prediction boxes and the detection boxes of the next frame according to their CNN features, HOG, and IOU. An appropriate threshold is preset for each cue, and each feature distance is compared with its threshold. As long as one of the three cues matches, the track enters the update stage of KF. If a track is not successfully matched T times, it is considered to have left the video and is deleted.
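The acceptance rule and the deletion rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the Hungarian assignment itself is omitted, the names `Track`, `is_match`, and `update_tracks` are hypothetical, and the CNN distance threshold of 0.2 is a placeholder. Only the HOG and IOU thresholds of 0.5 come from the experimental settings.

```python
from dataclasses import dataclass


@dataclass
class Track:
    track_id: int
    misses: int = 0  # consecutive frames without a matched detection


def is_match(cnn_dist, hog_dist, iou,
             cnn_thresh=0.2, hog_thresh=0.5, iou_thresh=0.5):
    """A pair is accepted if any one of the three cues passes its threshold."""
    return cnn_dist <= cnn_thresh or hog_dist <= hog_thresh or iou >= iou_thresh


def update_tracks(tracks, matched_ids, max_misses):
    """Reset matched tracks, age unmatched ones, and drop tracks missed T times."""
    kept = []
    for track in tracks:
        if track.track_id in matched_ids:
            track.misses = 0
        else:
            track.misses += 1
        if track.misses < max_misses:
            kept.append(track)
    return kept


tracks = [Track(1), Track(2)]
for _ in range(3):  # track 2 is never matched in these three frames
    tracks = update_tracks(tracks, matched_ids={1}, max_misses=3)
print([t.track_id for t in tracks])
```

Accepting a pair when any single cue passes makes the association more tolerant of a temporary failure in one cue, for example a low IOU during a turn.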
KF is based on a linear state equation and estimates the optimal system state from input and output observation data. For nonlinear systems the prediction effect of KF is poor, so the KF in the DeepSort algorithm is replaced by EKF. EKF linearizes the nonlinear system: it uses the first-order Taylor series to construct an approximately linear function from the slope of the nonlinear function at the mean and then applies KF to the approximate function. The algorithm is therefore a suboptimal filter. Because EKF uses an approximate function, some error is inevitable, but for traffic statistics, which require high real-time performance, simple and fast computation is essential.
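The linearization idea can be seen in a one-dimensional sketch of a single EKF step. It assumes a scalar state for readability, whereas the tracker's state is a vector and the derivatives become Jacobian matrices. The function name `ekf_step`, its arguments, and the toy motion and observation models are hypothetical.

```python
import math


def ekf_step(x, p, z, f, f_jac, h, h_jac, q, r):
    """One predict/update cycle of a scalar extended Kalman filter.

    x, p  : previous state estimate and its variance
    z     : new measurement
    f, h  : nonlinear state-transition and observation functions
    f_jac, h_jac : their derivatives (first-order Taylor slopes)
    q, r  : process and measurement noise variances
    """
    # Predict: propagate the mean through f, the variance through the slope of f.
    x_pred = f(x)
    slope_f = f_jac(x)
    p_pred = slope_f * p * slope_f + q

    # Update: linearize h at the predicted mean, then apply the usual KF update.
    slope_h = h_jac(x_pred)
    gain = p_pred * slope_h / (slope_h * p_pred * slope_h + r)
    x_new = x_pred + gain * (z - h(x_pred))
    p_new = (1.0 - gain * slope_h) * p_pred
    return x_new, p_new


# Toy nonlinear models, chosen only for illustration.
x_est, p_est = ekf_step(
    x=1.0, p=1.0, z=1.5,
    f=lambda x: x + 0.1 * math.sin(x), f_jac=lambda x: 1.0 + 0.1 * math.cos(x),
    h=lambda x: x * x, h_jac=lambda x: 2.0 * x,
    q=0.01, r=0.1,
)
print(x_est, p_est)
```

When `f` and `h` are linear, the slopes are constants and the step reduces to the ordinary KF, which is why the EKF adds little computational cost.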
The state equation of the discretized EKF system is:
where
The system state equation obtained by the first-order Taylor expansion of the nonlinear function is as follows:
The observation equation is:
The process of EKF is as follows:
Initialize system states X(0), Y(0) and covariance matrix P(0);
Prediction of states and observations:
The first-order linearization equation is used to solve the state transition matrix and observation matrix:
Covariance matrix prediction:
Kalman gain:
State and covariance update:
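For reference, the steps listed above correspond to the standard textbook form of the EKF recursion, which can be written as follows. The states X, Y and the covariance P follow the initialization step above. The remaining symbols (f and h for the nonlinear state and observation functions, W and V for the noise terms, Φ and H for their Jacobians, Q and R for the noise covariances, K for the gain) are the conventional ones and are assumptions rather than this paper's own notation.

```latex
% System model
X(k+1) = f\bigl(X(k)\bigr) + W(k), \qquad Y(k) = h\bigl(X(k)\bigr) + V(k)

% Prediction of states and observations
\hat{X}(k+1 \mid k) = f\bigl(\hat{X}(k \mid k)\bigr), \qquad
\hat{Y}(k+1 \mid k) = h\bigl(\hat{X}(k+1 \mid k)\bigr)

% First-order linearization: state transition matrix and observation matrix
\Phi(k) = \left.\frac{\partial f}{\partial X}\right|_{X = \hat{X}(k \mid k)}, \qquad
H(k+1) = \left.\frac{\partial h}{\partial X}\right|_{X = \hat{X}(k+1 \mid k)}

% Covariance matrix prediction
P(k+1 \mid k) = \Phi(k)\, P(k \mid k)\, \Phi(k)^{T} + Q(k)

% Kalman gain
K(k+1) = P(k+1 \mid k)\, H(k+1)^{T}
         \bigl[ H(k+1)\, P(k+1 \mid k)\, H(k+1)^{T} + R(k+1) \bigr]^{-1}

% State and covariance update
\hat{X}(k+1 \mid k+1) = \hat{X}(k+1 \mid k)
         + K(k+1) \bigl[ Y(k+1) - \hat{Y}(k+1 \mid k) \bigr]

P(k+1 \mid k+1) = \bigl[ I - K(k+1)\, H(k+1) \bigr]\, P(k+1 \mid k)
```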
For the traffic environment in this paper, the EKF algorithm is applied to vehicle trajectory prediction in this scene to test its accuracy on a nonlinear system. The average accuracy of trajectory prediction is calculated to judge the quality of the algorithm, and the formula is as follows:
The TFSF built in this paper is suitable for several common scenes in the traffic field, as shown in
The principle of traffic statistics is shown in
When an ID appears for the first time, record the center point position
The overall methodology flow chart is shown in the figure. The method passes each frame of the video file or camera stream to the algorithm, which performs detection, tracking, and counting in turn until the counting task is completed.
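The counting principle, in which the first center position of each ID is recorded and the movement direction is derived when the track crosses a virtual line, can be sketched as follows. This is a minimal sketch under assumptions rather than the paper's implementation: it uses a single horizontal line, the class name `LineCounter` and its fields are hypothetical, and "up" is taken to mean movement toward smaller image y coordinates.

```python
from collections import defaultdict


class LineCounter:
    """Counts each track ID once when its center crosses a horizontal virtual line."""

    def __init__(self, line_y):
        self.line_y = line_y
        self.first_y = {}               # track ID -> center y when first seen
        self.counted = set()            # IDs already counted (no double counting)
        self.counts = defaultdict(int)  # (vehicle type, direction) -> count

    def update(self, track_id, vehicle_type, center_y):
        if track_id not in self.first_y:
            self.first_y[track_id] = center_y  # first appearance: only record
            return
        if track_id in self.counted:
            return
        start = self.first_y[track_id]
        # Image y grows downward, so a smaller y means the vehicle moved up.
        if start > self.line_y >= center_y:
            direction = "up"
        elif start < self.line_y <= center_y:
            direction = "down"
        else:
            return  # the line has not been crossed yet
        self.counted.add(track_id)
        self.counts[(vehicle_type, direction)] += 1


counter = LineCounter(line_y=100)
for y in (150, 120, 90, 80):      # a car driving up the frame
    counter.update(1, "car", y)
for y in (50, 130):               # a bus driving down the frame
    counter.update(2, "bus", y)
print(dict(counter.counts))
```

Keeping a set of counted IDs means a vehicle is counted once even if it stays near the line for several frames, but a lost and re-identified track would still be counted again under a new ID.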
The framework was built with Python 3.7, PyTorch 1.7.0, CUDA 10.1, and cuDNN 7.4.1. The hardware configuration is an i7-10700K CPU @ 3.00 GHz, a single NVIDIA GeForce RTX 2080 Ti with 12 GB of video memory and 4352 CUDA cores, and 32 GB of RAM, with Ubuntu as the operating system. The most effective threshold configuration was obtained through experimental testing: the confidence threshold was set to 0.4, the IOU matching threshold to 0.5, and the HOG matching threshold to 0.5.
TP (True Positives): samples classified as positive that are actually positive. TN (True Negatives): samples classified as negative that are actually negative. FP (False Positives): samples classified as positive that are actually negative. FN (False Negatives): samples classified as negative that are actually positive.
For a specific category, IOU thresholds for the prediction box are taken every 0.05 from 0.5 to 0.95. The recall and precision are calculated under all thresholds, and the P-R curve is drawn with recall as the abscissa and precision as the ordinate. The area under the curve is the AP value of the current category. The mAP value is the average of the AP values over all categories. The calculation method is as follows:
In the formula, i represents the index of the current category, n represents the total number of categories, and precision(recall) represents the precision at the current recall.
AP combines Precision and Recall. Precision represents the share of all predictions passing the threshold that actually hit a target, while Recall represents the ability to cover the real targets in the test set. Combining the two gives a better evaluation of the model. mAP is the mean of the AP values of all categories. FPS is the number of frames per second that the network can process.
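The quantities defined above can be computed as in the following sketch. Precision and recall follow their standard definitions, TP / (TP + FP) and TP / (TP + FN). Using the trapezoidal rule for the area under the P-R curve is an assumption made for brevity, since detection benchmarks often use an interpolated precision instead. The function names and sample values are illustrative.

```python
def precision(tp, fp):
    """Share of predicted positives that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Share of real positives that were found."""
    return tp / (tp + fn) if tp + fn else 0.0


def average_precision(recalls, precisions):
    """Area under the P-R curve by the trapezoidal rule."""
    points = sorted(zip(recalls, precisions))
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


def mean_average_precision(ap_per_class):
    """Mean of the per-category AP values."""
    return sum(ap_per_class) / len(ap_per_class)


ap_car = average_precision([0.0, 0.5, 1.0], [1.0, 1.0, 0.0])
ap_bus = average_precision([0.0, 1.0], [1.0, 1.0])
print(precision(8, 2), recall(8, 8), ap_car, mean_average_precision([ap_car, ap_bus]))
```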
To ensure the effectiveness of the detection algorithm, the trained model was tested on the validation set, and its speed and accuracy were measured. Using different deep learning algorithms, the current state-of-the-art Faster-RCNN, SSD-512 [
The number after mAP, as in mAP50, denotes the IOU threshold setting. In the NMS process, IOU is used to filter redundant boxes: detection boxes whose overlap with a higher-confidence box exceeds the IOU threshold are filtered out, leaving the detection box with the highest confidence. The comparison of detection results shows that Faster-RCNN has the best detection accuracy, but its speed is too low for the traffic field, which requires high real-time performance. The detection accuracy of SSD-512 (VGG16) and YOLO V4 is slightly higher than that of YOLO V5s, but the FPS of YOLO V5s is much higher than that of both. This paper also adds a tracking algorithm and a counting framework on top of the detection algorithm, which consume additional computing power, so SSD-512 (VGG16) and YOLO V4 may not meet the real-time requirements of traffic statistics. Centernet has a high FPS but the lowest mAP of the algorithms compared. In conclusion, YOLO V5s is the optimal detection algorithm for this scenario.
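The NMS filtering described above can be sketched in a few lines. This is a generic greedy NMS, not the specific routine inside any of the detectors compared here. Boxes are assumed to be (x1, y1, x2, y2) tuples, and the default threshold of 0.5 is illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep  # indices of the boxes that survive


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IOU of 81/119, about 0.68, so it is suppressed, while the distant third box is kept.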
Method | mAP (%) | mAP50 (%) | FPS |
---|---|---|---|
Faster-RCNN (Resnet) | 90.1 | 95.3 | 15 |
SSD-512 (VGG16) | 89.3 | 93.8 | 31 |
Centernet (Resnet18) | 87.5 | 93.6 | 71 |
YOLO V4 (CSPDarknet53) | 89.5 | 93.3 | 35 |
YOLO V5s (CSPDarknet53) | 89.0 | 93.1 | 67 |
In this section, the performance metrics of the DeepSort algorithm before and after the improvement are evaluated. According to the official evaluation method of the MOT Challenge competition, the most recognized metrics at present are MOTA and MOTP, which evaluate the robustness of the tracking algorithm, and FPS, which evaluates the real-time performance of the algorithm [
MOTP reflects the error in determining the target position, which is intuitively represented by the misalignment between the annotation box and the prediction box. It mainly evaluates the localization precision of the algorithm, and the formula is as follows:
MOTA is an intuitive representation of the accuracy of the algorithm matching and tracking to the same target, reflecting the accuracy index of the tracking algorithm. This method combines three error sources to illustrate the accuracy of the algorithm, and the formula is as follows:
FPS is the number of frames the algorithm processes per second, that is, the reciprocal of the time taken to process each frame. It reflects the processing speed of the algorithm, and the formula is as follows:
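The three metrics can be computed as in the sketch below, which follows the commonly used CLEAR MOT definitions rather than formulas taken from this paper. It assumes that the three error sources in MOTA are missed targets, false positives, and identity switches, and that MOTP is the mean overlap of matched box pairs, which is consistent with it being reported as a percentage. Function names and sample numbers are illustrative.

```python
def mota(false_negatives, false_positives, id_switches, ground_truth_objects):
    """1 minus the ratio of all tracking errors to all ground-truth objects."""
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / ground_truth_objects


def motp(overlaps):
    """Mean overlap (IOU) between matched annotation and prediction boxes."""
    return sum(overlaps) / len(overlaps)


def fps(num_frames, total_seconds):
    """Frames processed per second."""
    return num_frames / total_seconds


print(mota(20, 10, 5, 100))   # errors summed over all frames
print(motp([0.8, 0.9]))       # one overlap value per matched pair
print(fps(327, 10.0))
```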
Method | MOTA (%) | MOTP (%) | FPS (Hz) |
---|---|---|---|
DeepSort | 61.5 | 79.0 | 64.0 |
HOG-DeepSort | 64.8 | 82.2 | 57.6 |
EKF-DeepSort | 63.3 | 80.6 | 62.1 |
H.E-DeepSort | 66.2 | 84.0 | 55.7 |
According to the comparison in the table, both HOG feature fusion and the EKF algorithm improve the performance of the DeepSort algorithm. Both also increase the amount of computation, so real-time performance is reduced, but the decline is within the acceptable range (from 64.0 to 55.7 FPS with both changes applied). Considering robustness and real-time performance together, H.E-DeepSort was selected as the TFSF tracking framework. DeepSort and H.E-DeepSort were then compared on the self-made test set. Some of the results are as follows:
As seen from the results, the target is easily lost when it is turning or occluded. Frequent loss of the target during tracking causes the traffic statistics framework to count repeatedly, which reduces counting accuracy and wastes computing power. After integrating HOG and adopting EKF, tracking loss is reduced effectively and the matching accuracy of the target is improved.
The traffic counting framework is added on top of the detection and tracking algorithms and counts traffic separately for the three most common vehicle types. This section compares the accuracy of the original DeepSort algorithm and H.E-DeepSort for traffic statistics. Both are tested on this paper's test set, with OpenCV used to draw a virtual counter in the top left corner of the video. The partial results of the traffic statistics framework are shown
The TFSF built on the basis of the H.E-DeepSort tracking framework is used for the six test videos of this paper, in which there are three different scenes, sunny day, rainy day, and night. The test effect of TFSF is shown in
Manual counting was used to obtain the accurate traffic flow of the test videos. Repeated experiments verify that the algorithm in this paper maintains good accuracy and robustness and can replace inefficient manual counting. The accurate traffic flow results obtained by manual counting are as follows.
Error is used to evaluate the error rate of TFSF, that is, the error between the real traffic flow and the traffic flow counted by the framework. Its calculation formula is as follows:
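The error rate can be illustrated with a short calculation. It assumes the usual relative-error form, the absolute difference between the real and counted traffic divided by the real traffic, and it reads the first count table as the framework output and the second as the manual count. Both are assumptions, although together they reproduce the reported upward car error of 4.3%. The function name is hypothetical.

```python
def error_rate(true_count, framework_count):
    """Relative counting error in percent."""
    return abs(true_count - framework_count) / true_count * 100.0


# Upward car counts for the six test videos, transcribed from the count tables.
framework_car_up = [322, 0, 0, 281, 93, 0]
manual_car_up = [336, 0, 0, 290, 101, 0]

car_up_error = error_rate(sum(manual_car_up), sum(framework_car_up))
print(round(car_up_error, 1))  # 4.3
```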
From the comparative analysis in
Type | MOV_20_ (up) | MOV_20_ (down) | MOV_20_ (up) | MOV_20_ (down) | MOV_20_ (up) | MOV_20_ (down) | MOV_20_ (up) | MOV_20_ (down) | MOV_20_ (up) | MOV_20_ (down) | MOV_20_ (up) | MOV_20_ (down) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Car | 322 | 289 | 0 | 358 | 0 | 343 | 281 | 255 | 93 | 118 | 0 | 201 |
Truck | 16 | 12 | 0 | 8 | 0 | 12 | 11 | 10 | 8 | 9 | 0 | 12 |
Bus | 26 | 19 | 0 | 21 | 0 | 25 | 16 | 13 | 10 | 13 | 0 | 16 |
Total | 363 | 319 | 0 | 385 | 0 | 379 | 308 | 278 | 108 | 137 | 0 | 226 |
Type | MOV_20_ (up) | MOV_20_ (down) | MOV_20_ (up) | MOV_20_ (down) | MOV_20_ (up) | MOV_20_ (down) | MOV_20_ (up) | MOV_20_ (down) | MOV_20_ (up) | MOV_20_ (down) | MOV_20_ (up) | MOV_20_ (down) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Car | 336 | 301 | 0 | 367 | 0 | 355 | 290 | 264 | 101 | 133 | 0 | 243 |
Truck | 17 | 12 | 0 | 9 | 0 | 13 | 12 | 11 | 9 | 10 | 0 | 15 |
Bus | 27 | 20 | 0 | 22 | 0 | 27 | 18 | 15 | 11 | 15 | 0 | 20 |
Total | 380 | 333 | 0 | 398 | 0 | 395 | 320 | 290 | 121 | 158 | 0 | 278 |
Error (%) | Car | Truck | Bus | Total |
---|---|---|---|---|
Up | 4.3 | 7.9 | 7.1 | 4.6 |
Down | 6.0 | 10.0 | 10.1 | 6.4 |
Total | 5.4 | 9.3 | 9.1 |
The average accuracy was used to evaluate the overall accuracy of TFSF. After calculation, the average accuracy is 93.6%.
Because the framework targets traffic scenes, accuracy is not the only evaluation criterion; running speed is also an important parameter. In testing, the traffic statistics framework ran at 32.7 FPS, which meets the real-time requirement. The tests also verify that the framework is robust on video from various scenes and maintains its accuracy.
Compared with the existing traffic statistics system [
This paper builds a traffic statistics framework based on detection and tracking that is suitable for most traffic scenes. To balance detection accuracy and speed, the YOLO V5s network structure was adopted. The DeepSort tracking algorithm was also improved to raise tracking performance. The framework counts traffic by vehicle type and movement direction, which gives more detailed traffic flow statistics and makes it easier to integrate more functions, with an overall accuracy of more than 93%.
In future research, the decrease in the accuracy of traffic statistics caused by scene changes should be addressed further. It is believed that the framework can provide meaningful information to help traffic decision-makers alleviate current traffic problems.