Enhance Egocentric Grasp Recognition Based Flex Sensor Under Low Illumination

Egocentric recognition is an active area of computer vision research that acquires images and video from a first-person viewpoint. However, images become noisy and dark under low illumination, making subsequent hand detection difficult. Image enhancement is therefore necessary to make buried detail more visible. This article addresses the challenge of egocentric hand grasp recognition in low-light conditions by combining a flex sensor with an image enhancement algorithm based on adaptive gamma correction with weighting distribution. First, a flex sensor is attached to the thumb for object manipulation. Because the thumb rests at a different position on the object in each grasp, the voltage of the flex sensor circuit changes with the grasp. The average voltages are used to configure the weighting parameter in the image enhancement stage. In addition, contrast and gamma functions are used to vary the low-light condition. The grasp images are then split into training and testing sets and used with pre-trained deep neural networks as the feature extractor in a YOLOv2 detection network for grasp recognition. The proposed use of a flex sensor significantly improves the grasp recognition rate in low-light conditions.


Introduction
Hands are essential to humans: they allow us to interact with objects and our surroundings, communicate with others, and carry out daily activities such as dining, cleaning, and dressing. Given their significance, computer vision researchers have analyzed hands from various aspects: determining the position of the hand in the image [1], localizing hands in images across different types of actions [2][3][4], and interacting with computers and robots [5][6][7]. Wearable cameras allow hands to be examined from a first-person perspective, known in computer vision as egocentric vision or First-Person Vision (FPV) [8][9][10][11][12], for challenging tasks such as object detection and activity identification. The essential characteristic of egocentric vision is that it provides a first-person view of the scene from a forward-facing wearable camera placed on the chest or head. This wearable camera offers a person-centric view and is well placed to capture the information that is arguably most relevant to the camera wearer [13]. Egocentric vision has since been applied to a growing range of applications, including video summarization [14,15] and healthcare [16]. Egocentric vision has several advantages over a third-person point of view, where the camera position is often fixed arbitrarily by the user. The wearer's movements, attention, and activities determine what is recorded, since the camera captures whatever is in front of the user. Hands and manipulated objects tend to appear in the center of the image, and hand occlusion is reduced. These advantages make egocentric vision attractive for developing new approaches to studying hands. Modeling the relationship between hand gestures and object characteristics can provide additional information through a model that captures the link between hands and objects.
Interaction between hands and objects has also been examined with other neural network models, which infer 3D objects and actions from RGB images and recognize the user's objects and actions [17]. A temporal network that incorporates bi-directional long short-term memory has been used to model long-range dependencies and predict actions in object manipulation tasks [18]. However, egocentric vision still faces a significant problem: the camera is not fixed but moves with the human body. This produces rapid motion and abrupt changes that can significantly degrade the quality of the recording. Sudden illumination changes can also significantly reduce image quality. Blurred images and increased camera sensor noise make it difficult to detect and recognize a hand grasp and to distinguish the subject from the background. Hence, restoring image content in low-light conditions is a difficult task. No previous egocentric research has addressed low-light detection and recognition. This article investigates hand grasp recognition in first-person vision in dim or nighttime environments. To improve the detection and recognition of hand gestures, the proposed system adds a flex sensor, which is easy to use and low cost, as an additional parameter for Adaptive Gamma Correction with Weighting Distribution (AGCWD) [19]. The flex sensor is placed on the thumb to track finger movement, and its output is fused with AGCWD. The proposed system then recognizes the grasp with Deep Convolutional Neural Networks (DCNN), which have emerged as a valuable tool for computer vision tasks, as the feature extractor and detector.

Method and Proposed Algorithm
The objective of this article is to recognize hand grasps in routine manual tasks. It presents an egocentric vision system that detects different hand gestures and automatically learns the image structure from the large amount of data captured by the wearable camera and the flex sensor attached to the thumb. The images are captured with a head-mounted camera. The hand performs 18 different grasp postures, each of which involves handling an object. The grasp postures are performed with five particular objects. Some postures share the same object, and the output voltage of the flex sensor uniquely describes the grasp of an object. The dataset consists of 3600 images for grasp training and covers 18 grasp postures whose actions overlap in daily life. The proposed process consists of four stages: image enhancement, grasp factor computation, the YOLOv2 network, and grasp evaluation. The architecture of the proposed method is shown in Fig. 1. The method combines flex sensor information acquired through the Arduino with the first-person image of the scene and then enhances the input image with AGCWD. Finding hands in egocentric frames is then an instance of an object detection task. The real-time object detection algorithm "You Only Look Once" (YOLO) is one of the most effective techniques [20,21], offering high speed with high accuracy among deep-learning algorithms. YOLOv2 [22,23] is used to detect and recognize the grasp posture in each frame. The experiments were run on 64-bit Windows 10 with an Intel Core i7-8700 processor, 16 GB of memory, and an NVIDIA GeForce GTX 1070 GPU. YOLOv2 was trained on the MATLAB R2021a platform.

Grasp Type
The hand grasp type is vital for interpreting the hand gesture because it reflects how the hand holds an object during manipulation. Numerous studies have classified grasps into discrete sets of types to help study manual grasping. We train the classifiers to categorize eighteen different grasp types selected from the most widely used grasp taxonomy [24]. The grasp types were chosen to cover different standard classification criteria based on function, object shape, and finger articulation. According to previous work, all eighteen types have a high frequency of daily use [25,26]. The selected grasps can therefore be used to analyze a large number of manipulation tasks and are amenable to automatic recognition from image appearance. The grasp types provide information about how the hands hold objects during manipulation. The grasp alone cannot discriminate fine-grained actions without details of the object being handled. In this research, four volunteers were used to grasp objects in 18

Flex Sensor
A flex sensor changes its resistance according to the amount of bending applied to it and is mainly used for angular displacement measurement. When the sensor is bent, it produces a resistance output related to the bending radius. The greater the deflection, the larger the resistance variation [27]. The flex sensor can be applied to the thumb as one of the features for grasp recognition because the thumb is used in every grasp [28]. The thumb can be either abducted or adducted. In the abducted position, the thumb can oppose the fingertips. The adducted position allows the thumb either to apply forces on the side of the fingers or to move out of the way. Therefore, when the thumb must act against the fingertips, it has to be abducted, and it bends the flex sensor according to how the hand manipulates the object, which changes the resistance. The resulting voltage output is read through the analog input ports of the Arduino MEGA 2560 and digitized by its internal analog-to-digital converter. Although numerous hardware options are available, the Arduino is the most popular because of its flexibility and user-friendly interface at a low cost.
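As a rough illustration of how the digitized readings relate to the sensor voltage, the sketch below converts raw ADC counts to volts and averages them over a stable window. It assumes the 10-bit ADC of the Arduino MEGA 2560 (counts from 0 to 1023) and the board's default 5 V analog reference; the function names and the sample readings are hypothetical and are not taken from the experiments.

```python
ADC_MAX = 1023   # full-scale count of a 10-bit ADC
V_REF = 5.0      # assumed analog reference voltage (board default)


def counts_to_volts(counts, v_ref=V_REF):
    """Convert one raw ADC count (0..1023) to volts.

    Maps the full-scale count to v_ref, a common convention for this board.
    """
    if not 0 <= counts <= ADC_MAX:
        raise ValueError("ADC count out of range")
    return counts * v_ref / ADC_MAX


def average_voltage(samples, v_ref=V_REF):
    """Mean voltage over a list of raw ADC counts from a stable grasp."""
    if not samples:
        raise ValueError("no samples to average")
    return sum(counts_to_volts(c, v_ref) for c in samples) / len(samples)


# Hypothetical counts logged from the serial monitor while a grasp is held.
readings = [510, 512, 511, 513, 512]
v_avg = average_voltage(readings)
print(f"average voltage: {v_avg:.3f} V")
```

Averaging only after the signal has settled keeps the transient from the reaching motion out of the estimate.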

The Combination Image Enhancement
The flex sensor is attached to the thumb to provide information on the movement and position of the finger. Its signal is acquired and converted to digital data by the Arduino. This information is used to configure the weighting parameter in the image enhancement stage. The sensor signal was recorded to show the response of the sensor, as in Fig. 3. The hand starts in the rest posture with the fingers relaxed and then grasps the object. Once the fingers are seated on the object and the voltage output is stable, the average voltage is calculated. The voltage output of a grasp varies with the gripping posture and the shape of the object. The flex sensor output shown on the serial monitor is returned as an integer from 0 to 1023, in proportion to the voltage received from the circuit, because the built-in ADC of the Arduino MEGA 2560 has 10-bit resolution. The average voltage is used to calculate the grasp factor (gf) for each grasp posture as
$$gf = \frac{V_{avg}}{V_{fist}} \tag{1}$$
where $V_{avg}$ is the average voltage while the object is grasped and $V_{fist}$ is the average voltage in the clenched fist posture, which allows the greatest finger bend because no object supports the grasp.
The adaptive gamma correction method progressively increases low intensities and avoids significant reductions of high intensities. The Weighting Distribution (WD) function is also used to modify the statistical histogram and limit the adverse effects of the modification. The WD function is calculated as
$$pdf_w(l) = pdf_{max}\left(\frac{pdf(l) - pdf_{min}}{pdf_{max} - pdf_{min}}\right)^{gf} \tag{2}$$
where gf is the adjusted parameter from the grasp factor, $pdf_{max}$ is the maximum of the probability density function of the statistical histogram, and $pdf_{min}$ is its minimum. Based on Eq. (2), the modified cdf is approximated by
$$cdf_w(l) = \sum_{l'=0}^{l} \frac{pdf_w(l')}{\sum pdf_w} \tag{3}$$
where the sum of $pdf_w$ is calculated as
$$\sum pdf_w = \sum_{l=0}^{l_{max}} pdf_w(l) \tag{4}$$
Finally, the gamma parameter is modified as
$$\gamma = 1 - cdf_w(l) \tag{5}$$
The image is then processed with AGCWD. The weighting parameter depends on the average voltage output of the flex sensor circuit for each posture, normalized by the voltage output of the clenched fist posture so that it lies between 0 and 1. AGCWD is an automatic image transformation technique that enhances the brightness of dimmed images through gamma correction and the probability distribution of luminance pixels. The technique uses temporal information on the differences between frames to reduce computation. The flowchart of the procedure is shown in Fig. 4.
Experiments are first conducted on synthetically darkened data to demonstrate the advantages of the proposed flex sensor approach over traditional learning methods. The AGCWD algorithm is then applied to the grasp dataset to show its effectiveness for recognizing hand activities. The test images cover the various low-light conditions needed to test the proposed method and are imported into the system at the start of each test. Each image is then processed with AGCWD, with the weighting parameter set by the grasp factor (gf). The experiment considers three types of lighting conditions: 1) gamma adjustment with γ from 1 to 6, 2) contrast adjustment with α from 0.5 to 1, and 3) combined contrast and gamma adjustment, as shown in Fig. 5.
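The enhancement step described above can be sketched for a single luminance channel as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the function names `grasp_factor` and `agcwd`, the sample voltages, and the toy pixel list are hypothetical, and the grasp factor is assumed to be the ratio of the average grasp voltage to the clenched-fist voltage, as the normalization described in the text suggests. The cumulative distribution and the final mapping $l_{max}(l/l_{max})^{\gamma}$ follow the standard AGCWD formulation [19], with gf used as the weighting exponent.

```python
def grasp_factor(v_avg, v_fist):
    """Assumed grasp factor: grasp voltage normalized by the clenched-fist
    voltage and clipped to [0, 1]."""
    if v_fist <= 0:
        raise ValueError("clenched-fist voltage must be positive")
    return min(max(v_avg / v_fist, 0.0), 1.0)


def agcwd(pixels, gf, l_max=255):
    """Apply AGCWD to a flat list of integer intensities in 0..l_max."""
    if not pixels:
        return []
    n = len(pixels)

    # Statistical histogram and its probability density function.
    hist = [0] * (l_max + 1)
    for value in pixels:
        hist[value] += 1
    pdf = [h / n for h in hist]
    pdf_min, pdf_max = min(pdf), max(pdf)
    if pdf_max == pdf_min:
        return list(pixels)  # flat histogram: nothing to reweight

    # Weighting distribution with gf as the exponent.
    pdf_w = [pdf_max * ((p - pdf_min) / (pdf_max - pdf_min)) ** gf
             for p in pdf]
    total = sum(pdf_w)

    # Build a lookup table: gamma = 1 - cdf_w(l) for every intensity level.
    lut, cumulative = [], 0.0
    for level in range(l_max + 1):
        cumulative += pdf_w[level]
        gamma = max(0.0, 1.0 - cumulative / total)
        mapped = 0 if level == 0 else round(l_max * (level / l_max) ** gamma)
        lut.append(mapped)
    return [lut[value] for value in pixels]


# Hypothetical voltages and a toy dark image (flattened luminance values).
gf = grasp_factor(v_avg=1.8, v_fist=3.0)
dark = [10] * 50 + [20] * 30 + [40] * 20
enhanced = agcwd(dark, gf)
print(gf, sum(dark) / len(dark), sum(enhanced) / len(enhanced))
```

Because the gamma value never exceeds 1 and intensities are normalized to [0, 1], the mapping can only brighten a pixel or leave it unchanged, and the ordering of intensity levels is preserved. For color images, the same mapping would typically be applied to a luminance channel only.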

Image Testing Generation Method
The grasp dataset consists of daytime images, which serve as ground truth, recorded by the egocentric camera at a resolution of 640 × 480 pixels. The volunteer wears the camera on the head with a strap mount. Low-light images are then synthesized from the daytime images for each subject [29]. The details of both methods are described as follows.

Gamma Transform
The gamma transform describes the relationship between digital sensitivity and human eye sensitivity, which provides many advantages but also adds complexity. The output $I_{out}$ is defined as
$$I_{out} = A \times I_{in}^{\gamma} \tag{6}$$
where A is a constant determined by the maximum pixel intensity in the input image. Intuitively, when γ > 1, the mapping is weighted toward lower (darker) grayscale pixel intensity values. Values of γ distributed uniformly from 1 to 6 are used to darken the daytime images. In the experiment, the darkened images were produced with different values of γ. Daylight images become darker after the gamma transform. Gamma conversion with γ > 1 improves image contrast; however, it may not correspond to the effect of naturally falling light.
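A minimal sketch of this darkening step is given below. It assumes 8-bit intensities that are normalized to [0, 1] before the power is applied, so that the constant A reduces to a rescaling by the maximum value; the function name `gamma_transform` and the sample pixel values are illustrative only.

```python
def gamma_transform(pixels, gamma, max_value=255):
    """Darken (gamma > 1) or brighten (gamma < 1) a flat list of intensities.

    Intensities are normalized by max_value, raised to gamma, and rescaled.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return [round(max_value * (p / max_value) ** gamma) for p in pixels]


# Hypothetical daytime pixel values darkened with gamma = 2.
daytime = [0, 128, 255]
print(gamma_transform(daytime, 2))
```

Black and white are fixed points of the mapping, while mid-tones are pushed toward darker values as γ grows.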

Contrast Adjustment
Contrast is the difference between the light and dark parts of an image, which makes objects and details within the image more apparent. Reducing the contrast keeps the bright and dark areas close to the original, but the overall image becomes flatter and starts to look washed out. The output $I_{out}$ is defined by Eq. (7), where α is a ratio between 0.5 and 1 that controls the contrast. Examples of the resulting test images are shown in Fig. 5.

Image Enhancement Stage
Before being passed to YOLOv2, the images adjusted to the various lighting conditions are enhanced with AGCWD, using the grasp factor to set a different weighting distribution parameter for each hand posture. The grasp factor depends on the voltage output of the flex sensor circuit, which produces a distinct pattern for each posture. The pre-trained CNN is used as the feature extractor to separate these distinct characteristics for each grasp posture. The AGCWD method computes the gamma parameter from the probability density, combining the simple form of transform-based gamma correction with traditional histogram equalization. The original histogram is not used directly to build the image transformation function and improve image contrast. The AGCWD method can improve the brightness and produce acceptably clear images without restricting contrast, as shown in Fig. 6.

Testing Stage
The proposed method is tested under lighting conditions divided into two parts: the first is the normal light condition, γ = 1 and α = 1, and the second is the adjusted lighting condition (set by the gamma and contrast parameters), which tests the robustness of the proposed technique. The lighting is simulated in three cases. First, the contrast is fixed at α = 1 and gamma is increased in steps of 1 until γ = 6. Second, the gamma is fixed at γ = 1 and the contrast is adjusted in steps of 0.1 from 0.5 to 1. Third, both parameters are adjusted simultaneously.
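The three test cases can be enumerated as in the sketch below. Several details are assumptions made only for illustration: contrast reduction is modeled as a simple scaling of the intensity by α, the scaling is applied before the gamma transform, and the third case is taken as the full grid of (γ, α) pairs. The names `degrade` and `test_conditions` are hypothetical.

```python
def degrade(pixels, gamma=1.0, alpha=1.0, max_value=255):
    """Simulate low light on a flat list of intensities.

    Assumed model: scale by alpha (contrast), then apply the gamma transform
    on intensities normalized to [0, 1].
    """
    return [round(max_value * (alpha * p / max_value) ** gamma)
            for p in pixels]


def test_conditions():
    """Return the (gamma, alpha) pairs for the three simulated cases."""
    gammas = [1, 2, 3, 4, 5, 6]                        # steps of 1
    alphas = [round(0.5 + 0.1 * i, 1) for i in range(6)]  # 0.5 .. 1.0
    case1 = [(g, 1.0) for g in gammas]                 # contrast fixed
    case2 = [(1, a) for a in alphas]                   # gamma fixed
    case3 = [(g, a) for g in gammas for a in alphas]   # assumed full grid
    return case1, case2, case3


case1, case2, case3 = test_conditions()
print(len(case1), len(case2), len(case3))
print(degrade([200], gamma=2, alpha=0.8))
```

The normal light condition (γ = 1, α = 1) appears in every case and serves as the common baseline.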

Results and Discussion
In this section, the proposed algorithm, which combines AGCWD with flex sensor information, is trained and tested under various lighting conditions and then evaluated for detection and recognition with deep learning. In the experiment, YOLOv2 is used with three pre-trained CNNs for feature extraction: VGG16 [30], ResNet 50 [31], and GoogleNet [32]. The network is trained using stochastic gradient descent with a momentum of 0.9, an initial learning rate of 0.001, a mini-batch size of 16, a learning rate drop factor of 0.1, a learning rate drop period of 10, and a maximum of 20 training epochs. The input image is fed into the network, which extracts features to separate the attributes of the grasps. The ground-truth labels and extracted hand features are used as inputs to supervised learning to train grasp classifiers for the different grasp classes. The performance of the proposed method is evaluated and compared with traditional learning methods, which train on the grasp images only. The Intersection over Union (IoU) is used to measure the overlap between the predicted and actual bounding boxes for the hand grasp [33]. The confidence is obtained by multiplying two terms. The first indicates whether an object is in the grid cell, and the second is the intersection ratio of the predicted box and the ground truth. If there is an object in the grid cell, the first term is 1; otherwise, it is 0. The second term is the usual intersection ratio. The class conditional probability $\Pr(\text{Class}_i \mid \text{Object})$ is the probability of a particular class given that there is an object in the grid cell.
$$\text{Probability} = \Pr(\text{Class}_i \mid \text{Object}) \times \Pr(\text{Object}) \times \text{IoU}^{truth}_{pred} \tag{8}$$
$$\text{Probability} = \Pr(\text{Class}_i) \times \text{IoU}^{truth}_{pred} \tag{9}$$
where $\Pr(\text{Object})$ represents the probability of an object existing in the current grid cell and $\text{IoU}^{truth}_{pred}$ represents the IoU between the predicted box and the actual box. Most bounding boxes below the threshold are removed.
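The IoU and class confidence score described above can be illustrated with the short sketch below. Boxes are assumed to be given as corner coordinates (x_min, y_min, x_max, y_max), which is a choice made for this example and not necessarily the format used in the experiments; the function names, the sample boxes, and the scores are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def class_confidence(p_class_given_object, p_object, iou_value):
    """Class-specific confidence: Pr(Class|Object) * Pr(Object) * IoU."""
    return p_class_given_object * p_object * iou_value


def keep_above(scores, threshold):
    """Indices of detections whose confidence reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Hypothetical predicted and ground-truth boxes for one hand grasp.
predicted = (0, 0, 10, 10)
truth = (5, 0, 15, 10)
overlap = iou(predicted, truth)
score = class_confidence(0.9, 1.0, overlap)
print(overlap, score, keep_above([score, 0.72], 0.5))
```

In this example the two boxes share half of their area, so the IoU is one third and the detection falls below an IoU threshold of 0.5.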
After testing under the various lighting conditions, the results at IoU = 0.5 are used to judge the efficiency of the proposed method. The evaluation [34] found that VGG16 provides the best results, achieving a mean Average Precision (mAP) of 0.856, computed from the average precision of each grasp class under the condition γ = 1, α = 1. Under the conditions with adjusted gamma and contrast parameters, the efficiency decreases progressively. Increasing the gamma has a more significant impact on performance than adjusting the contrast, which changes the image only slightly compared with the original and therefore has less impact on recognition ability, as shown in Figs. 7 and 8. The results showed that the γ adjustment had a more significant effect on recognition efficiency than α, with a linear decrease as γ increased, because the γ adjustment alters the characteristics of the image: the brightness level of each RGB channel changes more than with the α adjustment. However, using the grasp factor (gf) to determine the AGCWD weighting parameter for each hand grasp creates distinctive image characteristics that allow the DCNN to improve recognition efficiency, which is noticeable in comparison with a grasp factor fixed at 1 for all hand gestures. The proposed method had a higher recognition performance. Furthermore, with contrast and gamma adjustment, the proposed method retains recognition performance close to that under the normal light condition γ = 1, α = 1, even when α is reduced to 0.5, as shown in Fig. 9. The results of testing on the dataset are given in Tab. 1. The recognition ability for each grasp in the dataset is calculated first. The overall grasp score is then calculated as the simple sum of the scores for each object.
High accuracy was obtained for grasps such as Large Diameter, Power Sphere, and Tripod because the arrangement of the fingers and the shapes of the gripped objects are very different, so these hand gestures are recognized with scores above 0.9. These grasps were detected and recognized well by all three pre-trained CNNs. In contrast, grasps such as Thumb-2 Finger, Fixed Hook, Tripod, and Lateral Tripod are difficult to recognize because the arrangement of the fingers and the shapes of the gripped objects look the same from some viewpoints. The flex sensor parameter addresses this problem. The flex sensor is attached to the thumb to provide information on its movement, which is an additional parameter that enhances the ability to differentiate each hand grasp. The improvement in grasp recognition obtained with the flex sensor is shown in Figs. 8 and 9. Under the normal condition γ = 1, α = 1, recognition using YOLOv2 achieved high performance, above 0.8, an improvement of about 6% over traditional learning, with the VGG16 pre-trained CNN model giving the highest recognition result. When the image becomes darker by increasing γ, recognition declines faster than when the contrast is adjusted. For example, the testing condition γ = 2, α = 1 decreases recognition by about 10% in traditional learning, but the proposed method was able to keep the result at a satisfactory 75%. Comparing the reductions caused by gamma and contrast adjustments, the image histogram under contrast adjustment keeps a similar shape but is shifted to the left, which only makes the image darker. The image characteristics are thus preserved much better than under gamma adjustment. An obvious example is the condition γ = 1, α = 0.6, where the recognition result is still close to that of the condition γ = 1, α = 1.
Furthermore, when both parameters are adjusted, as in the condition γ = 2, α = 0.8, the recognition result of VGG16 was 70%, better than traditional learning. However, when γ was increased by one step to γ = 3, α = 0.8, the recognition rate declined sharply. The evidence indicates that the proposed method increases recognition efficiency by approximately 6%. It works well at γ < 2 and α > 0.5. The VGG16 pre-trained CNN gives the best recognition results compared with GoogleNet and ResNet 50.

Conclusion
This paper showed how to detect and recognize hand grasps in egocentric vision by combining a flex sensor with image enhancement and the YOLOv2 architecture. The technique combines AGCWD with the voltage output of the flex sensor circuit to address low illumination, which makes detection and recognition of hand grasps very difficult. A further difficulty is that the arrangement of the fingers and the shapes of the gripped objects look the same from some viewpoints, which makes the hand hard to detect in the scene. The flex sensor is attached to the thumb to provide information that serves as a parameter for differentiating each grasp posture. The proposed method improves the grasp recognition rate under various lighting conditions. The experimental results demonstrate that the proposed method can effectively recognize grasps of a wide range of different objects. The results also show that the technique improves hand grasp recognition compared with traditional learning methods, increasing the recognition efficiency by approximately 6%, with the best results at γ < 2 and α > 0.5. This research is the first to utilize a flex sensor for hand grasp detection in egocentric systems; however, the proposed method has a limited operating range. As future work on egocentric systems, we are investigating a methodology to transform the flex sensor signal into an image for a multi-input CNN, which should improve performance and extend the operating range to a wider range of luminance.