Multiple ocular region segmentation plays an important role in applications such as biometrics, liveness detection, healthcare, and gaze estimation. Typically, segmentation techniques focus on a single region of the eye at a time. Despite several obvious advantages, very limited research has focused on multiple regions of the eye. Moreover, accurate segmentation of multiple eye regions is necessary in challenging scenarios involving blur, ghost effects, low resolution, off-angles, and unusual glints. Currently available segmentation methods cannot address these constraints. In this paper, to achieve accurate segmentation of multiple eye regions in unconstrained scenarios, a lightweight outer residual encoder-decoder network suitable for various sensor images is proposed. The proposed method can determine the true boundaries of the eye regions from inferior-quality images using the high-frequency information flow of the outer residual encoder-decoder deep convolutional neural network (called ORED-Net). Moreover, ORED-Net achieves this performance without increasing the model complexity, number of parameters, or network depth, and is considerably lighter than previous state-of-the-art models. Comprehensive experiments were performed, and optimal performance was achieved on the SBVPI and UBIRIS.v2 datasets containing images of the eye region. The proposed ORED-Net obtained mean intersection over union (mIoU) scores of 89.25 and 85.12 on the challenging SBVPI and UBIRIS.v2 datasets, respectively.
In the last few decades, researchers have made significant contributions to biometrics, liveness detection, and gaze estimation systems that rely on traits such as the iris, sclera, pupil, or other periocular regions [
A majority of previous studies on eye region segmentation were restricted to a single ocular region at a time, e.g., focusing only on the iris, pupil, sclera, or retina. In multi-class segmentation, more than one eye region is segmented from the given input image using a single segmentation network. Surprisingly, very few researchers have developed multi-class segmentation techniques for the eye regions, despite their advantages in different applications. Notably, under challenging conditions, the segmentation performance can be maintained or even enhanced when multiple regions are segmented together, as the targeted region can provide useful contextual information about neighboring regions [
In this work, we attempt to address the research gaps in the segmentation of multiple eye regions using a single network, as shown in
ORED-Net is novel in the following four ways:
• ORED-Net is a semantic segmentation network without a preprocessing overhead and does not employ conventional image processing schemes.
• ORED-Net is a standalone network for the multi-class segmentation of ocular regions.
• ORED-Net uses residual skip connections from the encoder to the decoder to reduce information loss, which allows high-frequency information to flow through the model, thus achieving higher accuracy with fewer layers.
• The performance of the proposed ORED-Net model was tested on public datasets collected under various environments.
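The outer residual skip path named in the contributions above can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation: the encoder feature map is projected by a non-identity 1 × 1 convolution (here a per-pixel matrix product with hypothetical weights `w`) and added element-wise to a decoder feature map of the same spatial size, so fine spatial detail bypasses the pooling bottleneck.

```python
import numpy as np

def conv1x1(x, w):
    """Non-identity mapping: a 1x1 convolution that projects the encoder
    feature map to the decoder's channel count (per-pixel matmul)."""
    # x: (H, W, C_in), w: (C_in, C_out)
    return x @ w

def outer_residual(encoder_feat, decoder_feat, w):
    """Outer residual skip path: projected encoder features are added
    element-wise to decoder features of the same spatial size."""
    assert encoder_feat.shape[:2] == decoder_feat.shape[:2]
    return decoder_feat + conv1x1(encoder_feat, w)

rng = np.random.default_rng(0)
enc = rng.standard_normal((8, 8, 64))   # encoder feature map
dec = rng.standard_normal((8, 8, 128))  # decoder feature map
w = rng.standard_normal((64, 128))      # hypothetical projection weights
out = outer_residual(enc, dec, w)
print(out.shape)  # (8, 8, 128)
```

The projection is needed because the paper describes these outer connections as non-identity: encoder and decoder feature maps generally differ in channel count.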
In this study, the results obtained with the SBVPI [
The rest of the paper is structured as follows. Section 2 provides a brief overview of the related literature. Section 3 describes the proposed approach and working procedure. Section 4 discusses the evaluation results and analysis. Finally, Section 5 presents the conclusions and future work.
Very few studies have focused on multi-class eye segmentation, particularly for segmenting multiple eye regions from the given images using a single segmentation model. Recently, Rot et al. [
The Eye Segmentation challenge for the segmentation of key eye regions was organized by Facebook Research with the purpose of developing a generalized model under the constraint of least complexity in terms of model parameters. Experiments were conducted on the OpenEDS dataset, a large-scale dataset of eye images captured by a head-mounted display with two synchronized eye-facing cameras [
Methods | Strengths | Weaknesses |
---|---|---|
Deep multi-class eye segmentation based on the SegNet architecture [ | —A single model is used for the segmentation of multiple eye regions | —A major part of the training data is artificially created. |
Lighter residual encoder-decoder network, Ocular-Net [ | —Residual connectivity between adjacent convolutional layers is involved | —The method is trained separately for each region |
Joint semantic segmentation of eye regions, SIP-SegNet [ | —DnCNN is used for denoising the original images | —Considerable preprocessing of the original image is involved. |
Encoder-decoder structure based on | —Can be run on any hardware for real-time implementation with low computational cost | —Post-processing is performed via heuristic filtering |
Outer residual encoder-decoder network, termed as ORED-Net ( | —Information loss is reduced by using outer residual skip paths from the encoder to the decoder. | —Rigorous training is required. |
The flowchart of the proposed ORED-Net for semantic segmentation of multiple eye regions is shown in
In typical encoder-decoder networks, the image is downsampled and represented by very small feature maps, which degrades the high-frequency contextual information. This results in the vanishing gradient problem for the classification of image pixels as the image is broken down into 7 × 7 patches [
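The 7 × 7 figure above follows from repeated halving: assuming a 224 × 224 input and five 2 × 2 pooling stages with stride 2 (as in VGG-style encoders), the spatial grid shrinks by a factor of 32. A quick check:

```python
def spatial_size(input_size, num_poolings, pool_stride=2):
    """Spatial width/height after repeated 2x2 pooling with stride 2."""
    size = input_size
    for _ in range(num_poolings):
        size //= pool_stride  # each pooling stage halves the grid
    return size

# Trace the grid size after 0..5 pooling stages for a 224x224 input.
sizes = [spatial_size(224, n) for n in range(6)]
print(sizes)  # [224, 112, 56, 28, 14, 7]
```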
The proposed ORED-Net is developed in different stages to perform the multi-class segmentation task with higher accuracy than the basic encoder-decoder networks. In the first stage, a well-known segmentation network, i.e., SegNet-Basic, is employed [
ResNet [ | Sclera-Net [ | ORED-Net |
---|---|---|
ResNet uses a large number of identity-mapping and a small number of non-identity-mapping residual connections. | Convolutional layers in the encoder and decoder have identity- and non-identity-based residual connectivity. | Convolutional layers in the encoder and decoder do not have internal residual connectivity. |
ResNet uses skip path connections only between adjacent layers. | There are no outer skip path connections from the encoder to the decoder. | The outer skip path connections from the encoder to the decoder are non-identity residual connections. |
Different variants of ResNet, such as ResNet-50/101/152, have a 1 × 1 convolutional layer in each block. | There are 6 identity and 8 non-identity residual connections in the overall encoder-decoder network. | There are 4 non-identity residual paths from the encoder to the decoder. |
Different variants of ResNet, such as ResNet-18/34/50/101, are based on post-activation, as a ReLU is used after the elementwise addition. | In the overall network, a ReLU is used after the elementwise addition; hence, Sclera-Net uses post-activation. | On the decoder side, a ReLU is used before the elementwise addition; hence, ORED-Net uses pre-activation. |
At the end of all the convolutional layers, average pooling is involved. | Residual connections are introduced immediately after max pooling and unpooling in the encoder and decoder networks, respectively. | The max-pooling layer is used in all the convolutional blocks to provide index information to the decoder. |
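The activation-ordering distinction drawn in the comparison above can be made concrete with a toy sketch (illustrative only; which operand receives the ReLU is a simplification): post-activation applies the ReLU after the element-wise addition, whereas pre-activation activates the incoming branch before the addition, so negative values in the skip path survive unchanged.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def post_activation(skip, branch):
    # ResNet / Sclera-Net style: add first, then activate.
    return relu(skip + branch)

def pre_activation(skip, branch):
    # ORED-Net style (decoder side): activate the branch, then add,
    # so negative skip values pass through the addition untouched.
    return skip + relu(branch)

skip = np.array([-1.0, 2.0])
branch = np.array([0.5, -3.0])
print(post_activation(skip, branch))  # both entries clipped to zero
print(pre_activation(skip, branch))   # negative skip value survives
```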
The overall structure of ORED-Net is shown in
Based on
Here,
where
It can be seen from
The encoder structure of ORED-Net is presented in
Group | Size/Name | No. of filters | Output (w × h × ch) |
---|---|---|---|
EC-G-1 | 3 × 3 × 3/E-Conv-1_1†† | 64 | 224 × 224 × 64 |
 | 3 × 3 × 64/E-Conv-1_2†† | 64 | |
Pool-1 | 2 × 2/Pool-1 | – | 112 × 112 × 64 |
EC-G-2 | 3 × 3 × 64/E-Conv-2_1†† | 128 | 112 × 112 × 128 |
 | 3 × 3 × 128/E-Conv-2_2† | 128 | |
Pool-2 | 2 × 2/Pool-2 | – | 64 × 64 × 128 |
EC-G-3 | 3 × 3 × 128/E-Conv-3_1†† | 256 | 64 × 64 × 256 |
 | 3 × 3 × 256/E-Conv-3_2† | 256 | |
Pool-3 | 2 × 2/Pool-3 | – | 32 × 32 × 256 |
EC-G-4 | 3 × 3 × 256/E-Conv-4_1†† | 512 | 32 × 32 × 512 |
 | 3 × 3 × 512/E-Conv-4_2†† | 512 | |
Pool-4 | 2 × 2/Pool-4 | – | 16 × 16 × 512 |
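The output column of the encoder table follows a simple rule: 'same'-padded 3 × 3 convolutions preserve the spatial size, and each 2 × 2 max pooling halves it. The sketch below traces this rule; it assumes a 256 × 256 input, which is consistent with the Pool-2 through Pool-4 sizes listed in the table (the channel widths per group are taken from the table).

```python
def encoder_shapes(size, group_channels):
    """Trace (stage, spatial size, channels) through conv groups and
    pools: convs keep the spatial size, each 2x2 pooling halves it."""
    trace = []
    for ch in group_channels:
        trace.append(("conv group", size, ch))
        size //= 2
        trace.append(("pool", size, ch))
    return trace

# Assumed 256x256 input; channel widths 64/128/256/512 from the table.
for step in encoder_shapes(256, [64, 128, 256, 512]):
    print(step)
```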
The architecture of the ORED-Net decoder is shown in
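The index-based pooling/unpooling pairing attributed to ORED-Net in the comparison table can be sketched as follows. This is an illustrative NumPy sketch of the SegNet-style mechanism, not the paper's implementation: the encoder's max pooling records the flat position of each window maximum, and the decoder writes each pooled value back to that position, preserving where the strong responses occurred.

```python
import numpy as np

def max_pool_with_indices(x):
    """2x2 max pooling over a (H, W) map that also records the flat
    position of each window maximum, as a SegNet-style encoder does."""
    h, w = x.shape
    pooled = np.zeros((h // 2, w // 2))
    indices = np.zeros((h // 2, w // 2), dtype=int)
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            window = x[i:i + 2, j:j + 2]
            k = int(np.argmax(window))            # 0..3 within the window
            pooled[i // 2, j // 2] = window.flat[k]
            indices[i // 2, j // 2] = (i + k // 2) * w + (j + k % 2)
    return pooled, indices

def max_unpool(pooled, indices, shape):
    """Decoder-side unpooling: each pooled value is written back to the
    position its maximum came from; all other positions stay zero."""
    out = np.zeros(shape)
    out.flat[indices.ravel()] = pooled.ravel()
    return out

x = np.array([[1., 5., 2., 0.],
              [3., 4., 1., 7.],
              [0., 2., 8., 3.],
              [6., 1., 2., 4.]])
pooled, idx = max_pool_with_indices(x)
restored = max_unpool(pooled, idx, x.shape)
print(pooled)
print(restored)
```

In a real network the unpooled map is sparse, and subsequent convolutions densify it; the point of passing indices is that spatial locations survive the bottleneck.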
In this work, two-fold cross-validation was performed for training and testing the proposed model. To this end, the collected database was randomly divided into two subsets. From the images of 55 participants, the data from 28 participants were used for training and those from the remaining 27 participants were used for testing. To avoid overfitting, data augmentation of the training data was performed. To train and test ORED-Net, a desktop computer with an Intel® Core™ (Santa Clara, CA, USA) i7-8700 CPU @3.20 GHz, 16 GB memory, and an NVIDIA GeForce RTX 2060 Super (2176 CUDA cores and 8 GB GDDR6 memory) graphics card was employed. The experiments were conducted using MATLAB R2019b.
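The subject-disjoint split described above can be sketched as follows (a hypothetical helper, shown in Python for brevity; the paper's experiments used MATLAB). The key property is that all images of one participant fall entirely into a single fold, so the test fold contains only unseen eyes.

```python
import random

def two_fold_subject_split(subject_ids, seed=0):
    """Subject-disjoint two-fold split: every participant's images land
    entirely in one fold, never in both."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)   # deterministic shuffle for the demo
    half = (len(ids) + 1) // 2
    return ids[:half], ids[half:]

fold1, fold2 = two_fold_subject_split(range(55))
print(len(fold1), len(fold2))  # 28 27
```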
ORED-Net is based on outer residual paths that transfer spatial information from the encoder side to the decoder side. High-frequency information therefore travels through the convolutional network, which enables training on this information without a preprocessing overhead. To train ORED-Net, original images without any enhancement or preprocessing were employed, and the classical stochastic gradient descent (SGD) method was used as the optimizer. SGD minimizes the difference between the actual and predicted outputs. During network training, the model passed over the entire dataset 25 times, i.e., 25 epochs, and a mini-batch size of 5 was selected for the ORED-Net design owing to its low memory requirement; the mini-batch size was determined by the size of the database. One epoch was counted each time training covered the entire dataset, as shown in
In
The ORED-Net model converges very quickly because of the outer residual connections from the encoder to the decoder; therefore, it was trained for only 25 epochs. The mini-batch size was kept at 5 images throughout, with shuffling after each epoch. The training loss was calculated over the image pixels in the mini-batch using the cross-entropy loss reported [
Here,
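A generic pixel-wise cross-entropy of the kind referenced above can be sketched as follows; this is the standard unweighted formulation, and the cited loss may differ in its exact weighting. `probs` holds softmax class probabilities per pixel and `labels` the integer ground-truth class per pixel.

```python
import numpy as np

def pixel_cross_entropy(probs, labels, eps=1e-12):
    """Mean cross-entropy over all pixels.
    probs: (H, W, K) softmax class probabilities per pixel.
    labels: (H, W) integer ground-truth class per pixel."""
    h, w, _ = probs.shape
    rows, cols = np.indices((h, w))
    true_p = probs[rows, cols, labels]       # probability of the true class
    return float(-np.mean(np.log(true_p + eps)))

# Toy 2x2 image with 3 classes (e.g., iris / sclera / pupil).
probs = np.array([[[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]],
                  [[0.2, 0.2, 0.6],
                   [0.9, 0.05, 0.05]]])
labels = np.array([[0, 1],
                   [2, 0]])
loss = pixel_cross_entropy(probs, labels)
print(round(loss, 4))  # ≈ 0.299
```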
To validate and compare ORED-Net with previous models, the average segmentation error (
Here,
Here
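Under the usual definitions, the segmentation error is the fraction of misclassified pixels and the mIoU averages the per-class intersection over union; the sketch below uses these standard forms, which may differ in detail from the paper's exact formulas.

```python
import numpy as np

def segmentation_error(pred, gt):
    """Fraction of pixels where the predicted labels disagree with the
    ground truth (standard per-image error definition)."""
    return float(np.mean(pred != gt))

def iou(pred, gt, cls):
    """Intersection over union for a single class label."""
    p, g = pred == cls, gt == cls
    union = np.logical_or(p, g).sum()
    return float(np.logical_and(p, g).sum() / union) if union else 1.0

def mean_iou(pred, gt, num_classes):
    """mIoU: IoU averaged over all classes."""
    return sum(iou(pred, gt, c) for c in range(num_classes)) / num_classes

# Toy 3x3 label maps with 3 classes.
pred = np.array([[0, 1, 1],
                 [0, 2, 2],
                 [0, 0, 2]])
gt   = np.array([[0, 1, 1],
                 [0, 1, 2],
                 [0, 0, 0]])
print(segmentation_error(pred, gt))  # 2 mismatched pixels out of 9
print(mean_iou(pred, gt, 3))
```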
In
The segmentation performance of ORED-Net was compared with previous methods in terms of the
Evaluation Metrics | Classes | SegNet [ | ScleraNet [ | ORED-Net | ||||||
---|---|---|---|---|---|---|---|---|---|---|
Fold 1 | Fold 2 | Average | Fold 1 | Fold 2 | Average | Fold 1 | Fold 2 | Average | ||
3.34 | 1.84 | 2.59 | 3.15 | 1.57 | 2.36 | 2.16 | 1.40 | 1.78 | ||
1.54 | 0.89 | 1.22 | 1.90 | 0.68 | 1.29 | 1.12 | 0.62 | 0.87 | ||
2.69 | 1.79 | 2.24 | 1.93 | 1.51 | 1.72 | 1.67 | 1.34 | 1.51 | ||
0.19 | 0.20 | 0.20 | 0.31 | 0.18 | 0.25 | 0.33 | 0.14 | 0.24 | ||
1.94 | 1.18 | 1.82 | 0.99 | 1.32 | 0.88 | |||||
95.84 | 97.67 | 96.76 | 96.07 | 98.12 | 97.10 | 97.30 | 98.24 | 97.77 | ||
82.99 | 86.15 | 84.57 | 82.59 | 88.62 | 85.61 | 86.80 | 89.65 | 88.23 | ||
81.05 | 86.44 | 83.75 | 85.37 | 88.58 | 86.98 | 87.39 | 89.49 | 88.44 | ||
79.89 | 79.92 | 79.91 | 79.9 | 84.48 | 82.19 | 78.74 | 86.35 | 82.55 | ||
84.94 | 87.55 | 85.98 | 89.95 | 87.56 | 90.93 | |||||
99.72 | 99.79 | 99.76 | 99.73 | 99.78 | 99.76 | 99.70 | 99.75 | 99.73 | ||
85.27 | 89.85 | 87.56 | 85.62 | 92.43 | 89.03 | 90.97 | 93.52 | 92.25 | ||
83.53 | 88.52 | 86.03 | 88.35 | 90.49 | 89.42 | 89.95 | 91.52 | 90.74 | ||
92.83 | 85.16 | 89.00 | 80.15 | 88.57 | 84.36 | 79.25 | 87.87 | 83.56 | ||
90.34 | 90.83 | 88.46 | 92.82 | 89.97 | 93.17 | |||||
96.09 | 97.88 | 96.99 | 96.31 | 98.23 | 97.27 | 97.59 | 98.48 | 98.04 | ||
96.90 | 95.44 | 96.17 | 95.85 | 95.24 | 95.55 | 94.93 | 95.28 | 95.11 | ||
96.20 | 97.34 | 96.77 | 96.04 | 97.68 | 96.86 | 96.81 | 97.61 | 97.21 | ||
85.51 | 94.01 | 89.76 | 99.66 | 95.49 | 97.58 | 99.19 | 98.24 | 98.72 | ||
93.68 | 96.17 | 96.97 | 96.66 | 97.13 | 97.40 | |||||
97.82 | 98.80 | 98.31 | 97.92 | 98.99 | 98.46 | 98.59 | 99.11 | 98.85 | ||
90.05 | 92.13 | 91.09 | 89.19 | 93.58 | 91.39 | 92.39 | 94.25 | 93.32 | ||
89.05 | 92.66 | 90.86 | 91.88 | 93.88 | 92.88 | 93.03 | 94.41 | 93.72 | ||
88.27 | 87.55 | 87.91 | 88.62 | 90.79 | 89.71 | 88.08 | 92.05 | 90.07 | ||
91.30 | 92.79 | 91.90 | 94.31 | 93.02 | 94.96 |
To evaluate the segmentation performance of ORED-Net under different image acquisition conditions, experiments with another publicly available dataset for eye region segmentation, i.e., the UBIRIS.v2 dataset, were included in this study [
In
Evaluation Metrics | Classes | SegNet [ | ScleraNet [ | ORED-Net | ||||||
---|---|---|---|---|---|---|---|---|---|---|
Fold 1 | Fold 2 | Average | Fold 1 | Fold 2 | Average | Fold 1 | Fold 2 | Average | ||
2.73 | 1.28 | 2.01 | 2.47 | 1.47 | 1.97 | 2.36 | 1.30 | 1.83 | ||
1.38 | 0.69 | 1.04 | 1.36 | 0.87 | 1.12 | 1.61 | 0.77 | 1.19 | ||
2.19 | 1.03 | 1.61 | 1.42 | 1.10 | 1.26 | 1.16 | 0.92 | 1.04 | ||
0.42 | 0.21 | 0.32 | 0.30 | 0.24 | 0.27 | 0.27 | 0.18 | 0.23 | ||
1.68 | 0.80 | 1.39 | 0.92 | 1.35 | 0.79 | |||||
96.79 | 98.46 | 97.63 | 97.03 | 98.23 | 97.63 | 97.29 | 98.44 | 97.87 | ||
78.43 | 89.02 | 83.73 | 77.99 | 87.14 | 82.57 | 79.98 | 88.42 | 84.20 | ||
64.01 | 81.06 | 72.54 | 73.43 | 79.76 | 76.60 | 77.54 | 82.54 | 80.04 | ||
63.45 | 78.62 | 71.04 | 71.69 | 78.29 | 74.99 | 74.92 | 81.86 | 78.39 | ||
75.67 | 86.79 | 80.04 | 85.86 | 82.43 | 87.82 | |||||
99.65 | 99.88 | 99.77 | 99.59 | 99.89 | 99.74 | 97.99 | 99.86 | 98.93 | ||
87.02 | 92.92 | 89.97 | 84.59 | 91.64 | 88.12 | 87.20 | 92.21 | 89.71 | ||
66.63 | 83.03 | 74.83 | 76.27 | 81.49 | 78.88 | 79.98 | 84.40 | 82.19 | ||
68.43 | 81.89 | 75.16 | 73.76 | 82.09 | 77.93 | 77.90 | 85.42 | 81.66 | ||
80.43 | 89.43 | 83.55 | 88.78 | 85.77 | 90.47 | |||||
97.12 | 98.56 | 97.84 | 97.42 | 98.34 | 97.88 | 99.22 | 98.58 | 98.90 | ||
88.60 | 95.48 | 92.04 | 90.95 | 94.74 | 92.85 | 89.14 | 95.55 | 92.35 | ||
94.36 | 97.24 | 95.80 | 95.07 | 97.48 | 96.28 | 95.90 | 97.45 | 96.68 | ||
92.43 | 95.81 | 94.12 | 96.12 | 95.41 | 95.77 | 92.54 | 95.50 | 94.02 | ||
93.13 | 96.77 | 94.89 | 96.49 | 94.20 | 96.77 | |||||
98.36 | 99.22 | 98.79 | 98.49 | 99.10 | 98.80 | 98.60 | 99.21 | 98.91 | ||
87.97 | 94.12 | 91.05 | 87.01 | 92.99 | 90.00 | 87.33 | 93.52 | 90.43 | ||
77.45 | 89.47 | 83.46 | 84.08 | 88.59 | 86.34 | 86.77 | 90.32 | 88.55 | ||
76.39 | 87.61 | 82.00 | 82.77 | 87.44 | 85.11 | 84.23 | 89.46 | 86.85 | ||
85.04 | 92.61 | 88.09 | 92.03 | 89.23 | 93.13 |
Based on the results presented in
In this paper, a novel multi-class semantic segmentation network called ORED-Net was proposed for the segmentation of eye regions such as the iris, sclera, pupil, and background. ORED-Net is based on the concept of outer residual connections that transfer spatial edge information directly from the initial layers of the encoder to the decoder layers. This framework enhances the performance of the network on low-quality images. ORED-Net has fewer layers, which reduces the number of parameters along with the computation time. The most notable aspects of the proposed network are that it achieves high accuracy with a lighter architecture and converges in considerably fewer epochs owing to the direct flow of edge information, resulting in faster training. In ORED-Net, the original image is used for both training and testing, as no extra overhead is required in the form of preprocessing. ORED-Net is the first network of its kind that simultaneously segments three important eye regions, namely the iris, sclera, and pupil, without any preprocessing overhead. The robustness and effectiveness of the proposed method were tested on publicly available databases for eye region segmentation, including the SBVPI and UBIRIS.v2 datasets. In future studies, this work will be extended to a robust multimodal biometric identification system based on multiple eye regions.