Heart rate is an important vital sign that reflects physical and mental health. Conventional heart rate measurement instruments require direct contact with the skin, which is time-consuming and costly. The study of non-contact heart rate measurement methods is therefore of great importance. Based on the principles of photoplethysmography, we use a computer and a camera to capture facial images, detect the face regions, and track multiple faces with a multi-target tracking algorithm. After regional segmentation of each facial image, the signal is acquired from the region of interest. Finally, the frequency of the collected photoplethysmography (PPG) and electrocardiography (ECG) signals is determined using peak detection, Fourier analysis, and a wavelet filter. The experimental results show that the subject’s heart rate can be detected quickly and accurately even when multiple faces are monitored simultaneously.
The heart is one of the most important organs of the human body and is closely associated with health status. If the state of the heart can be monitored in real time, the early onset of heart disease may even be prevented. Conventional medical devices monitor heart rate and cardiac activity through electrocardiography (ECG), which requires connecting electrodes to the body to measure the electrical activity of the heart tissue. As a result of the heartbeat, a pressure wave also passes through the blood vessels and slightly changes their diameter. Because ECG instruments are expensive, they are usually installed in large hospitals and are not suitable for daily life or other specific scenarios.
An alternative to ECG measures light reflection. The hemoglobin concentration in the blood changes with each heartbeat pulse; when light passes through the skin and the blood beneath it, the resulting change in light absorption is recorded to form the imaging photoplethysmography (IPPG) signal [
Starting with early PPG technology in the 1980s, J. A. Nijboer studied the principle of photoplethysmography and observed that reflection and transmission values differ depending on the location where the measurement is acquired. Because erythrocytes absorb light, and light reflection and volume displacement are in opposite phase, the absorptive property manifests itself as a strong reflection from the surrounding tissue. In 2007, J. Allen et al. proposed using PPG technology to measure physiological signals such as heart rate and respiratory rate because of its simplicity and low cost [
Key developments in photoplethysmographic wave tracing are as follows. The heartbeat signal was first extracted from the single green channel to verify technical feasibility. Blind source separation was then proposed, which separates the three color channels into components and selects the appropriate one as the final heartbeat signal. Subsequent work sought to overcome motion artifacts, non-rigid movement, specular reflection, and changes in illumination. Later, researchers proposed dividing the face region into a grid and weighting the cells to define the region of interest more effectively. The application of these methods has achieved good results in heart rate measurement.
Using a light-emitting diode (LED) light source and an optical heart rate sensor, photoplethysmography measures the attenuated light reflected and absorbed by human blood vessels and tissues, traces the pulsation of the blood vessels, and thereby measures the pulse wave.
The essence of heart rate measurement with LED light and optical sensors is the conversion of optical signals into electrical signals.
That is, when the LED light irradiates the skin, the light reflected by the skin tissue is received by a photosensitive sensor and converted into an electrical signal, which is then converted into a digital signal by an analog-to-digital converter. Most optical sensors use green light (~500 nm) as the light source for the following reasons: (1) The skin’s melanin tends to absorb shorter wavelengths; (2) Moisture in the skin also tends to absorb UV and IR light; (3) Most of the yellow light (600 nm) is absorbed by red blood cells in the tissue; (4) Red and near-IR light passes through skin tissue more easily than other wavelengths of light; (5) Compared with red light, green (green-yellow) light is absorbed more readily by oxyhemoglobin and deoxyhemoglobin.
When the light source is directed onto the skin tissue and reflected to the optical sensor, the light intensity fluctuates. This fluctuation is mainly due to changes in arterial blood flow, because muscle, bone, and veins exhibit largely constant light absorption over time (assuming no large movement of the measurement site).
When light signals are converted into electrical signals, the measured signal can be divided into a DC component and an AC component precisely because the absorption of light by the arteries changes while that of other tissues remains essentially the same (this is the most important premise of the approach).
The AC signal, the part that reflects the characteristics of blood flow, is extracted. Because skin tissue absorbs red and IR light to different degrees, the DC component differs between wavelengths. Removing the DC component and analyzing the varying AC component is the basis of photoplethysmography (PPG).
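The DC/AC split described above can be illustrated with a short sketch. It assumes a moving average as the DC estimate, which is only one of several possible choices and is not necessarily the one used in this work; the function name `split_dc_ac`, the 2-second window, and the synthetic trace are all hypothetical.

```python
import numpy as np

def split_dc_ac(signal, fs, window_s=2.0):
    """Split a raw PPG trace into a slowly varying DC part and a pulsatile AC part.

    The DC part is estimated with a moving average (an assumption made for
    illustration); the AC part is whatever remains after subtracting it.
    """
    signal = np.asarray(signal, dtype=float)
    win = max(1, int(round(window_s * fs)))
    kernel = np.ones(win) / win
    # Pad with edge values so the moving average keeps the input length.
    pad_left = win // 2
    pad_right = win - 1 - pad_left
    padded = np.pad(signal, (pad_left, pad_right), mode="edge")
    dc = np.convolve(padded, kernel, mode="valid")
    ac = signal - dc
    return dc, ac

# Synthetic trace: constant tissue absorption (DC) plus a small 1.2 Hz pulse (AC).
fs = 30.0                      # assumed camera/sensor sampling rate in Hz
t = np.arange(0, 10, 1 / fs)
raw = 2.0 + 0.05 * np.sin(2 * np.pi * 1.2 * t)
dc, ac = split_dc_ac(raw, fs)
```

In this toy trace the DC estimate stays close to the constant baseline of 2.0, while the AC part carries the pulsatile oscillation that later processing would analyze.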
IPPG technology is based on PPG. Unlike earlier PPG technology, which requires dedicated sensor equipment, IPPG can obtain the human blood volume pulse (BVP) signal with smartphones and the cameras of various monitoring devices. It generally estimates heart rate by extracting changes in skin color (usually of the face or fingertip) from the video captured by the device. With the development of IPPG technology [
This section introduces the experimental process of a video-based multi-person heart rate detection system, which uses remote photoplethysmography (RPPG) to calculate heart rate values in real time. For the experimental setup, natural light was held constant, and real-time heart rate monitoring was conducted through a network camera. The experimental process of the whole system is divided into four parts: face detection, face tracking, region of interest (ROI) selection, and IPPG extraction. The experimental device is shown in
In this paper, face detection is based on a cascade classifier model of Haar features, an effective machine-learning object detection method in which a cascade function is trained on many positive and negative samples and then applied to detect objects, such as faces, in other images.
First, the algorithm requires many positive samples (pictures containing faces) and negative samples (pictures without faces) to train the classifier. Features are then extracted from these images. The way the Haar features are obtained is shown in
To reduce the extensive computation involved, the concept of the integral image is introduced, a matrix representation capable of describing global information, as shown in
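As an illustration of why the integral image reduces computation, the sketch below builds a summed-area table and evaluates a two-rectangle Haar-like feature from a handful of table lookups. The function names and the left-minus-right feature layout are assumptions chosen for the example, not details taken from this work.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row and column prepended."""
    img = np.asarray(img, dtype=np.int64)
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left (x, y), width w, height h.

    Only four lookups are needed, regardless of the rectangle size.
    """
    return int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])

def haar_two_rect_horizontal(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half minus right half (w assumed even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

img = np.arange(16).reshape(4, 4)   # toy 4x4 "image"
ii = integral_image(img)
inner = rect_sum(ii, 1, 1, 2, 2)    # same value as img[1:3, 1:3].sum()
feature = haar_two_rect_horizontal(ii, 0, 0, 4, 4)
```

Once the table is built in a single pass, every rectangle sum costs the same four lookups, which is what makes evaluating many Haar features per detection window affordable.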
The face detection process is divided into a training stage and a test stage. In the training stage, a prepared training set is used, which includes 5,250 images containing 11,931 faces. Each image has a corresponding annotation file that records the coordinates of the upper left corner of the bounding box, the width and height of the box, and attributes such as blur, lighting condition, occlusion, validity, and pose. Model parameters for face detection were obtained from this training set and saved locally. During the test stage, we experimentally validated the accuracy of the face detection algorithm based on the cascade classifier model with Haar features, and the face of each subject participating in the experiment could be accurately detected.
After face detection has identified all subjects, each subject’s real-time heart rate data can be determined. We used a face tracking algorithm for real-time tracking to prevent data from different subjects being confused and contaminating the experimental results. Real-time tracking is possible because the algorithm is executed on every new frame. The procedure therefore runs in one direction only: forward. Only one frame is needed to provide the necessary information for the subsequent tracking step.
The OpenCV library is widely used in computer vision research and provides many algorithms that can track a small number of targets. These algorithms fall into two categories. The first category tracks the trajectory of the target by predicting its future location, which requires correction for inter-frame errors. Its disadvantage is that accuracy is lost when the network camera moves too fast or the target suddenly changes direction, in which case the tracker may lock onto another object at the predicted location. Examples include MIL (multiple-instance learning), Boosting, and Median-Flow. The second category comprises algorithms developed for complex tracking problems that can adapt to new conditions and correct major errors. Their disadvantage is that they need more memory and processing power. Several target tracking algorithms are described below.
(1) MOSSE: The Minimum Output Sum of Squared Error (MOSSE) filter is initialized on the first frame of the video. It is robust to changes in brightness, scale, pose, and non-rigid deformation, and runs at 669 frames per second (fps). Filter-based tracking models the appearance of the target object with a filter trained on a template image. The target is initially selected with a small tracking window centered on it in the first frame. From then on, tracking and filter training proceed together. The target is tracked by applying the filter to the search window in the next frame; the location of the maximum filter response is the new position of the target. The filter is then updated online based on this new location.
(2) KCF: The Kernelized Correlation Filter (KCF) tracking algorithm is a discriminative tracking method. It trains a target detector during tracking, uses the trained detector to predict the position of the tracked object in the next frame, and then updates the detector with the new detection result. The selected target area is treated as a positive sample, while the surrounding area is treated as negative samples. The algorithm collects positive and negative samples using a circulant matrix built from the area around the target and trains the detector with ridge regression. Because circulant matrices are diagonalized in Fourier space, the matrix operations reduce to Hadamard (element-wise) products of vectors, which greatly reduces the amount of computation, increases speed, and allows the algorithm to meet real-time requirements.
(3) CSK: In the Circulant Structure of Tracking-by-detection with Kernels (CSK) method, as in most motion tracking, tracking is done by finding the correlation between two adjacent frames and then determining the motion direction of the target object; repeating this over successive frames tracks the object completely. After the tracking object is determined, the target window is cropped from the current frame and the next frame according to the target position, a fast Fourier transform (FFT) is applied to each, and the transformed windows are multiplied element-wise in the frequency domain. This process can be understood as finding the position where the two adjacent frames resonate in the frequency domain; the result is then mapped with a kernel function and used for training. The training process introduces the desired response Y. Y can be understood as the starting position of the object: because the object starts at the center of the tracking window in the first frame, Y is a Gaussian function constructed according to the size of the tracking window. CSK uses the circulant structure for correlation detection between adjacent frames. The circulant structure means that two frames are multiplied element-wise in the frequency domain, which is equivalent to convolving them in the spatial domain. Earlier motion tracking methods performed correlation detection with a sliding window; with a step length of 1, this amounts to a convolution between two frames. However, convolution in the spatial domain is computationally very expensive, whereas element-wise multiplication in the frequency domain is much cheaper, so the circulant structure used by CSK greatly accelerates the computation.
(4) CN: Color Names (CN) is a color representation, like RGB and Hue, Saturation, Value (HSV). Research in the CN article shows that the CN space performs better than the other color spaces, so the CN color space is used to extend CSK with color. CN is simple: first map the RGB image to the CN space (which has 11 channels: black, blue, brown, grey, green, orange, pink, purple, red, white, and yellow), apply the FFT to each channel, perform the kernel mapping, sum the 11-channel frequency-domain signals linearly, and then complete the CSK calculation (α calculation, training, detection, and so on). However, the operation is numerically intensive, so principal component analysis (PCA) is used to reduce the dimensionality. Some of the colors in the 11 channels of CN are not significant, and PCA condenses the 11-channel information down to 2 dimensions that retain most of the meaningful information, which results in much faster calculations.
(5) SAMF: The Scale Adaptive with Multiple Features (SAMF) tracker improves on KCF by fusing multiple features (grayscale, HOG, and CN). The HOG and CN features are complementary (gradient and color). A multiscale search strategy is also adopted.
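The frequency-domain idea shared by the correlation-filter trackers above can be sketched in a few lines. The example below is deliberately simplified: it correlates a raw template with the next frame through an element-wise product of their Fourier transforms and takes the peak of the response as the new position. It does not train a MOSSE or KCF filter, and the function name `locate_shift` and the synthetic frames are hypothetical.

```python
import numpy as np

def locate_shift(template, frame):
    """Return the (dy, dx) circular shift of `frame` relative to `template`.

    The element-wise (Hadamard) product of the two spectra corresponds to a
    circular cross-correlation in the spatial domain.
    """
    F = np.fft.fft2(frame)
    T = np.fft.fft2(template)
    response = np.real(np.fft.ifft2(F * np.conj(T)))
    # The location of the maximum response is the estimated displacement.
    dy, dx = np.unravel_index(np.argmax(response), response.shape)
    return int(dy), int(dx)

rng = np.random.default_rng(0)
template = rng.random((32, 32))                         # stand-in for the target window
frame = np.roll(template, shift=(3, 5), axis=(0, 1))    # target moved by 3 rows, 5 columns
shift = locate_shift(template, frame)
```

A sliding-window search would evaluate the correlation at every offset separately, whereas here two forward transforms, one product, and one inverse transform produce the full response map at once.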
Because the algorithms described above suit different situations, it is difficult to compare them comprehensively and objectively. Tracking algorithms have been developed to overcome various specific problems. Some are more effective when the target rotates, while others better handle changes in lighting conditions or sudden movement of the web camera. Sometimes the tracker fails to initialize, or the target is lost while the algorithm is running. A complete comparison would therefore likely require a very large data set. In the comparative experiment carried out in the test stage, the SAMF algorithm had the best overall performance, so it was chosen to track the target images in the video in real time in this experiment.
Unlike traditional PPG technology, IPPG does not require contact sensors. IPPG captures facial images through a network camera and then extracts heart rate information from the video sequence. The human heartbeat has a periodic rhythm, and the systolic and diastolic phases of the heart cause periodic fluctuations in arterial pressure, so that the arteries periodically expand and contract, resulting in periodic changes in blood volume within the blood vessels. This feature underlies IPPG technology, which functions as described in
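Because the blood volume signal is periodic, its dominant frequency can be read off a spectrum. The following sketch estimates heart rate from a one-dimensional trace (for example, the mean green value of a skin region per frame) by picking the strongest FFT peak. The 0.7 to 4.0 Hz search band, the Hann window, the function name, and the synthetic trace are assumptions made for illustration and are not taken from the experiments reported here.

```python
import numpy as np

def estimate_heart_rate_bpm(trace, fs, low_hz=0.7, high_hz=4.0):
    """Estimate heart rate in beats per minute from the dominant spectral peak."""
    trace = np.asarray(trace, dtype=float)
    trace = trace - trace.mean()                  # remove the DC offset
    windowed = trace * np.hanning(trace.size)     # reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(trace.size, d=1.0 / fs)
    band = (freqs >= low_hz) & (freqs <= high_hz)  # assumed plausible pulse range
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak_hz

# Synthetic 20 s trace at 30 frames per second with a 1.25 Hz pulse plus noise.
fs = 30.0
t = np.arange(600) / fs
rng = np.random.default_rng(1)
trace = 0.5 * np.sin(2 * np.pi * 1.25 * t) + 0.1 * rng.standard_normal(t.size)
bpm = estimate_heart_rate_bpm(trace, fs)
```

The frequency resolution of this estimate is the sampling rate divided by the number of samples (0.05 Hz, or 3 beats per minute, in the toy example), so longer observation windows give finer heart rate estimates at the cost of slower updates.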
To enable IPPG technology, areas of the body rich in blood vessels should be selected as the regions of interest. Because the forehead has abundant capillaries and smooth, evenly distributed skin, the forehead of the subject is ideal as the region of interest, as shown in
First, each face was modeled as a rectangular detection box with the parameters X, Y, W, and H, where X and Y are the coordinates of the upper left corner of the detection box, and W and H are its width and height, respectively. The corresponding rectangular ROI detection box was set inside the face detection box of each subject, with the parameters X1, Y1, W1, and H1. The calculation is shown in
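A minimal sketch of deriving an ROI box from a face box is given below. The fractional offsets (`x_frac`, `y_frac`, `w_frac`, `h_frac`) are placeholder values chosen only for illustration; they are not the coefficients used in this work, and the `Box` type and function name are likewise hypothetical.

```python
from typing import NamedTuple

class Box(NamedTuple):
    x: int  # upper left corner, horizontal coordinate
    y: int  # upper left corner, vertical coordinate
    w: int  # width
    h: int  # height

def forehead_roi(face: Box, x_frac: float = 0.25, y_frac: float = 0.1,
                 w_frac: float = 0.5, h_frac: float = 0.2) -> Box:
    """Place a forehead ROI (X1, Y1, W1, H1) inside a face box (X, Y, W, H).

    All four fractions are illustrative assumptions, not values from the paper.
    """
    x1 = face.x + round(face.w * x_frac)
    y1 = face.y + round(face.h * y_frac)
    w1 = round(face.w * w_frac)
    h1 = round(face.h * h_frac)
    return Box(x1, y1, w1, h1)

face = Box(x=100, y=50, w=200, h=200)
roi = forehead_roi(face)
```

Expressing the ROI as fractions of the face box lets it scale with the detected face, so the same rule applies to subjects at different distances from the camera.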
The experimental flow chart for the work described in this paper is shown in
A wavelet filter [
The test subjects and their corresponding real-time heart rate data are shown in
Algorithm | MOSSE | KCF | CSK | CN | SAMF |
---|---|---|---|---|---|
Set1 | 1.60 | 2.40 | 1.80 | 1.80 | 0.60 |
Set2 | 3.00 | 1.90 | 2.30 | 1.00 | 1.70 |
Set3 | 2.70 | 0.80 | 2.70 | 0.50 | 0.80 |
Set4 | 2.40 | 2.10 | 1.60 | 2.10 | 1.00 |
Set5 | 2.80 | 0.90 | 1.40 | 1.30 | 0.80 |
Average | 2.50 | 1.62 | 1.96 | 1.34 | 0.82 |
The experiment used natural light sources and two devices, a pulse oximeter and a web camera, to collect two types of physiological information, namely PPG and IPPG data. The results from 120 experiments are shown in
The consistency between the contact and non-contact measurements was analyzed using Bland-Altman analysis.
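For readers unfamiliar with Bland-Altman analysis, the sketch below computes its two standard quantities: the mean difference between methods (bias) and the 95% limits of agreement (bias ± 1.96 standard deviations of the differences). The heart rate values are made-up placeholders, not data from this study, and the function name is hypothetical.

```python
from statistics import mean, stdev

def bland_altman(reference, test):
    """Return (bias, lower limit, upper limit) of agreement between two methods."""
    diffs = [t - r for r, t in zip(reference, test)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Placeholder readings in beats per minute (illustrative only).
contact_bpm = [72, 80, 65, 90, 77]   # e.g. pulse oximeter
camera_bpm = [73, 79, 66, 92, 76]    # e.g. camera-based estimate
bias, lower, upper = bland_altman(contact_bpm, camera_bpm)
```

Two methods are usually considered to agree well when the bias is close to zero and nearly all differences fall within limits that are narrow enough for the intended use.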
From
Currently, widely used heart rate measurement instruments (such as ECG monitors) require direct contact with the skin of the test subject. Contact measurement equipment is expensive, typically not portable, and cannot be deployed during daily activities or in certain specific scenarios. For these reasons there is significant demand for contactless pulse wave extraction techniques. This paper presents a multi-person real-time heart rate detection system that operates under natural light, together with a detailed introduction and evaluation of the entire experimental process. The system detects and tracks the ROI in acquired video frames using machine learning-based face detection and face tracking technology. The image sequence of the ROI was preprocessed, signal separation techniques were used to obtain the RGB color vector signal containing pulse wave information, and band-pass filtering was applied for noise reduction to obtain pulse waveforms with a high signal-to-noise ratio. Finally, the heart rate was calculated using the FFT. Experimental results show that the non-contact measurement method and associated algorithms presented herein can accurately determine the heart rate of multiple persons simultaneously in real time. The results showed high agreement with the accepted contact measurement method, which uses a pulse oximeter for heart rate detection. As contactless heart rate measurement technology develops, it will be applicable to more scenarios in medical testing and in assessing the physical condition of test subjects.
The authors gratefully acknowledge the laboratory of electronic and information engineering, Multimedia Computing Laboratory, Guizhou Normal University, Guizhou Province, China.