Computers, Materials & Continua
Automated Identification Algorithm Using CNN for Computer Vision in Smart Refrigerators
1Chandigarh University, Mohali, 140413, India
2Department of Computer Science, College of Computers and Information Technology, Taif University, Taif, 21944, Saudi Arabia
3School of Electronics & Communication Engineering, Shri Mata Vaishno Devi University, Katra, 182320, India
*Corresponding Author: Mehedi Masud. Email: email@example.com
Received: 26 August 2021; Accepted: 18 October 2021
Abstract: Machine Learning has evolved with a variety of algorithms that enable state-of-the-art computer vision applications. In particular, the need to automate real-time food item identification has driven a surge of research into smarter refrigerators. According to a survey by the Food and Agriculture Organization of the United Nations (FAO), 1.3 billion tons of food is wasted by consumers around the world due to spoilage or expiry, and a large share of this waste comes from homes and restaurants. Smart refrigerators have played a pivotal role in mitigating this problem. However, available smart refrigerators are expensive, and there is a lack of accurate design algorithms that can bring computer vision to an ordinary refrigerator. To address these issues, this work proposes an automated identification algorithm for computer vision in smart refrigerators using the InceptionV3 and MobileNet Convolutional Neural Network (CNN) architectures. The designed module and algorithm are described in detail and extensively evaluated for accuracy using test images from standard fruit and vegetable datasets. A total of eight test cases are considered, with accuracy and training time as the performance metrics. Finally, real-time testing results are presented that validate the system's performance.
Keywords: CNN; computer vision; Internet of Things (IoT); radio frequency identification (RFID); graphical user interface (GUI)
‘Smart home’ is no longer a new concept. The Internet of Things (IoT) is revolutionizing the way people live at home: homes are filled with sensors, and electronic appliances can communicate with one another wirelessly. IoT allows the appliances in our homes to be controlled and monitored through speech, text and many other input methods from anywhere in the world, with the help of cloud platforms and smartphone applications. The kitchen is one of the most important spaces in a smart home, as it contains numerous appliances that serve the household. Incorporating IoT technology into kitchen appliances has therefore brought significant changes, leading to an easier and more modern lifestyle and assisting household members.
Because kitchen appliances are used for many years, people are willing to invest in them, which has created competition among manufacturers to make these appliances ever smarter. The refrigerator is one such appliance, used in every household to preserve perishable food items over long periods. Since the modern lifestyle leads people to spend less effort preparing healthy food at home, an appliance such as a ‘Smart Refrigerator’ can support a more enjoyable and healthier way of living. The refrigerator has undergone several changes over the last two decades, evolving from a cooling device into a smart device with computer-like capabilities. A few decades ago, a fridge that uses Radio Frequency Identification (RFID) labels to identify the items it contains and check their expiry seemed almost impossible; technological advances have completely changed this scenario.
One of the most crucial tasks for any smart refrigerator is scanning food items and identifying them correctly. The literature shows that many smart refrigerators have been developed [2–13] for this core functionality, using technologies such as RFID scanning, Quick Response (QR) code scanning, image capture and processing, fuzzy logic, Artificial Intelligence (AI) and computer vision. The major objective is to avoid food wastage by identifying food items that are near expiry early and notifying the user in time through a Graphical User Interface (GUI), Short Message Service (SMS), email, etc. Hsu et al. developed a 3C smart system that uses image processing for food item identification and speech recognition and speech broadcasting for control. Its main features included a speech control system and an auto-dial system for ordering scarce items directly from vendors, and it also supported wireless control of other interconnected home appliances. Zhang et al. proposed a fruit recognition approach based on data fusion from multiple sources, in which weight information and the outputs of multiple CNN models were fused to improve recognition accuracy. With the advent of IoT technology, several authors have introduced novel ways to alert the user by email and to use cloud servers that send data to dedicated mobile applications for remote access to the database. A similar smart refrigerator system proposed by Nasir et al. focused on expiry checking using weight information and odour detection, with sensors such as the MQ3 and DHT11. It also used the ThingSpeak cloud platform for remote data access and the Pushbullet application for notifications and alerts. Tab. 1 summarizes the literature review in terms of parameters such as the scanning mechanism and whether a food expiry check and a cloud platform are provided.
The major issues with the smart refrigerators available on the market today are their high cost and the availability of only brand-specific applications for remote access to the database of items kept inside the fridge. What is needed are algorithms for intelligent, cost-effective systems that can add smartness to existing conventional refrigerators. The review in Tab. 1 covers the various scanning techniques researchers have used so far in the design of smart refrigerators. Apart from these techniques, many researchers have used CNNs for the automated classification of fruits and vegetables. Kodors et al. used CNN models such as MobileNet versions 1 and 2 on the FRUITS360 dataset to recognize apples and pears. Basri et al. used the TensorFlow platform to detect mango and pitaya fruits with a MobileNet CNN model tested on a self-created dataset. Huang et al. tested the InceptionV3 model on the FRUITS360 dataset, comprising 81 classes of fruits and vegetables, using the Adam optimizer and achieved an accuracy of 96.5%. Similarly, Femling et al. used a Raspberry Pi (RPi), a load cell and a camera module to train Inception and MobileNet models on a self-created dataset of 400 images for each of 10 selected fruit classes. Ashraf et al. tested the InceptionV3 model and presented a detailed comparison of the accuracy obtained with different loss and optimization functions; the maximum accuracy obtained was 87.08%, using the cross-entropy loss function and the Adagrad optimizer.
In our literature review, we found no paper that describes the design of an intelligent module that can turn any ordinary refrigerator into a smart refrigerator. Moreover, no research article mentions placing the weight measurement system and cameras outside the fridge, which avoids the mess of wiring inside the refrigerator. To address these challenges and fill this research gap, this paper proposes an automated identification algorithm for computer vision in smart refrigerators using standard CNN architectures. It extends previous work on CNN-based fruit and vegetable classification by using the improved CNN models InceptionV3 and MobileNetV3 on standard datasets. The paper is organized as follows. Section 2 describes the design of an intelligent module that converts an ordinary refrigerator into a smart one and automatically recognizes fruits and vegetables. The proposed module is portable and cost-effective, with the fruit and vegetable image scanning and weight sensing mechanisms located outside the refrigerator. Sections 3 and 4 describe the standard datasets selected for training the system and the InceptionV3 and MobileNetV3 CNN models, respectively. Section 5 presents the experimental results, followed by the conclusion and future scope of the work.
2 Intelligent Module Design and Working
To avoid a mess of wiring inside the refrigerator, the module was designed so that no wiring or modification is required inside any compartment of the refrigerator. The block diagram of the entire system is depicted in Fig. 1. It comprises three major blocks: the intelligent module, the refrigerator with an attached display screen, and the cloud server. The intelligent module, built as a portable trolley, scans food items with a camera to identify them and records their weight via a load cell (label ‘D’) attached at the bottom of the weight sensing area (label ‘F’). The camera sensing sub-module consists of an RPi Camera (label ‘A’) mounted on an L-shaped arm which, upon power-up, moves into the position shown in Fig. 1 under the control of two servo motors (labels ‘B’ and ‘E’). The camera module captures images of a food item when it is placed on the weight sensing area depicted in Fig. 1. The Central Processing Unit (CPU) and weight sensing sub-module consists of an RPi (label ‘C’), which acts as the CPU of the system, and a 200 kg load cell for sensing the weight of the placed item. Using these two sub-modules, the name of the recognized food item and its weight reading are obtained and sent to the cloud server as well as to the display screen attached to the refrigerator.
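The data flow of one scan cycle described above (capture an image, identify the item, read the load cell, then forward the result to the cloud server and the door display) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the callables `capture`, `classify` and `read_weight_g` and the `ItemRecord` structure are hypothetical stand-ins for the RPi camera, the trained CNN and the load cell, and the sinks stand in for the cloud database and the touch screen.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List


@dataclass
class ItemRecord:
    """One scanned item as it might be stored in the cloud database (hypothetical schema)."""
    name: str
    weight_g: float
    timestamp: datetime


def scan_item(capture: Callable[[], bytes],
              classify: Callable[[bytes], str],
              read_weight_g: Callable[[], float],
              sinks: List[Callable[[ItemRecord], None]],
              now: Callable[[], datetime] = datetime.now) -> ItemRecord:
    """Run one scan cycle: image -> label, load cell -> weight, then publish the record."""
    image = capture()            # stand-in for grabbing a frame from the RPi camera
    name = classify(image)       # stand-in for the trained CNN classifier
    weight = read_weight_g()     # stand-in for a calibrated load-cell reading in grams
    record = ItemRecord(name=name, weight_g=weight, timestamp=now())
    for publish in sinks:        # e.g. cloud database and door-mounted display
        publish(record)
    return record


if __name__ == "__main__":
    cloud, display = [], []
    rec = scan_item(capture=lambda: b"jpeg-bytes",
                    classify=lambda img: "apple",
                    read_weight_g=lambda: 182.5,
                    sinks=[cloud.append, display.append])
    print(rec.name, rec.weight_g)  # apple 182.5
```

Passing the hardware and network steps in as callables keeps the control flow testable without a camera or load cell attached.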
The trolley has stopper wheels (label ‘I’), and its height can be adjusted up and down as required using a screw arrangement (label ‘G’). An ultraviolet-C (UVC) disinfection box (label ‘H’) can also be attached at the bottom of the trolley. It works in standalone mode to disinfect food items and other daily-use items such as keys, wallets and mobile phones, providing protection against the spread of viruses and bacteria. The data containing the food item name and weight can be passed to a cloud server, namely Google Firebase, and accessed remotely on any smartphone through the Android application developed for this work, ‘Fridge Assistant’, as depicted in Fig. 2. The same database can also be shown on a touch screen that is easily attached to the front door of the refrigerator and needs only a single connection to the intelligent module via the touch screen connector shown in Fig. 1. The ‘Fridge Assistant’ Android application (Fig. 2b) receives real-time updates over IoT as items are stored in the refrigerator. The weight reading is recorded together with a date and time stamp, which makes it possible to track the expiry of items and alert the user to consume an item before a fixed stipulated time. Notes can also be added, as depicted in Fig. 2c. Moreover, scarce items are automatically added to the shopping list tab of the application.
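The expiry check and the automatic shopping list can be sketched as below, assuming each item is stored with its time stamp and latest weight. The per-item shelf lives, the one-day warning margin and the minimum-weight thresholds are hypothetical values chosen for illustration; the paper only states that a fixed stipulated time is used.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Hypothetical shelf lives; the paper only states that a fixed stipulated time is used.
SHELF_LIFE = {"banana": timedelta(days=5), "apple": timedelta(days=14)}
DEFAULT_SHELF_LIFE = timedelta(days=7)


def expiry_alerts(stored_at: Dict[str, datetime], now: datetime,
                  warn_before: timedelta = timedelta(days=1)) -> List[str]:
    """Names of items whose deadline (time stored + shelf life) is within warn_before of now."""
    alerts = []
    for name, t in stored_at.items():
        deadline = t + SHELF_LIFE.get(name, DEFAULT_SHELF_LIFE)
        if now >= deadline - warn_before:
            alerts.append(name)
    return sorted(alerts)


def shopping_list(weights_g: Dict[str, float],
                  min_weight_g: Dict[str, float]) -> List[str]:
    """Items whose remaining weight has fallen below a hypothetical per-item minimum."""
    return sorted(name for name, w in weights_g.items()
                  if w < min_weight_g.get(name, 0.0))


if __name__ == "__main__":
    stored = {"banana": datetime(2021, 10, 1, 8), "apple": datetime(2021, 10, 1, 8)}
    print(expiry_alerts(stored, now=datetime(2021, 10, 5, 9)))   # ['banana']
    print(shopping_list({"apple": 150.0, "banana": 900.0},
                        {"apple": 200.0, "banana": 300.0}))      # ['apple']
```

Items without a configured shelf life fall back to a default, and items without a configured minimum weight never enter the shopping list.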
3 Selected Dataset
There are many datasets available as open source to train the module. The standard datasets considered for this work are explained below.
The FIDS30 dataset is a small dataset of 971 images covering 30 different classes of fruits. Each fruit class contains about 32 highly diverse images in Joint Photographic Experts Group (JPEG) format, including images of a single fruit, images of multiple fruits of the same kind, and images with noise such as leaves, plates, hands, trees and other cluttered backgrounds. The classes include apples, bananas, cherries, coconuts, grapes, lemons, guava, oranges, kiwifruit, tomatoes, pomegranates, watermelons and strawberries, among others. It is provided by the Visual Cognitive Systems Lab and is publicly available for use and download.
FRUITS360 is one of the most popular large datasets openly available on the Kaggle platform. It consists of 100 × 100 pixel color images covering 131 fruit and vegetable classes, with 67,692 training set images and 22,688 test set images, for a total of 90,483 images (the remaining 103 images each contain multiple fruits).
4 CNN Models
CNNs are among the most widely used models for food item identification problems of this kind. They have been applied to numerous complex problems, including image classification in the medical field and design and optimization problems for reconfigurable Radio Frequency circuits [21,22], and they now play a major role in almost every object detection and related computer vision task. To understand CNNs in detail, one must first have a general idea of a single-layer CNN, explained as follows. If layer l is a convolutional layer, the size of its output, given by Eq. (3), can be calculated from the applied filter and the input using Eqs. (1) and (2).
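Eqs. (1)–(3) do not appear in this version of the text. As an assumption, the sketch below uses the standard relation for a convolutional layer, n_out = floor((n_in + 2p - f) / s) + 1 for input size n_in, filter size f, padding p and stride s, applied to both height and width, with the output depth equal to the number of filters. A naive single-channel convolution confirms that the computed output shape matches the formula.

```python
from typing import List

Matrix = List[List[float]]


def conv_output_size(n_in: int, f: int, p: int = 0, s: int = 1) -> int:
    """Spatial output size of a convolution: floor((n_in + 2p - f) / s) + 1."""
    return (n_in + 2 * p - f) // s + 1


def conv2d_single(x: Matrix, k: Matrix, p: int = 0, s: int = 1) -> Matrix:
    """Naive single-channel 2-D convolution (cross-correlation, as used in CNN layers).

    Assumes a square input x and a square kernel k; zero padding of width p on every side.
    """
    n, f = len(x), len(k)
    padded = [[0.0] * (n + 2 * p) for _ in range(n + 2 * p)]
    for i in range(n):
        for j in range(n):
            padded[i + p][j + p] = x[i][j]
    m = conv_output_size(n, f, p, s)
    out = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            # slide the kernel with stride s and sum the element-wise products
            out[i][j] = sum(padded[i * s + a][j * s + b] * k[a][b]
                            for a in range(f) for b in range(f))
    return out


if __name__ == "__main__":
    ones6 = [[1.0] * 6 for _ in range(6)]
    ones3 = [[1.0] * 3 for _ in range(3)]
    out = conv2d_single(ones6, ones3)
    print(len(out), len(out[0]), out[0][0])   # 4 4 9.0
```

A real convolutional layer repeats this over every input channel and every filter; stacking the per-filter outputs gives an n_out × n_out × (number of filters) volume.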
Apart from the input and output layers, a complete CNN consists of numerous hidden layers, such as convolution, pooling and fully connected layers, with a softmax layer producing the class probabilities. Two of the most preferred CNN families for image recognition are Inception and MobileNet, both of which are available as pre-trained networks. These two models are described in the following subsections.
4.1 Inception CNN
GoogLeNet, or InceptionV1, is a widely used, pre-trained deep convolutional neural network for image recognition applications. The heart of the Inception network is the inception module, depicted in Fig. 3. The complete InceptionV1 network comprises nine repetitions of this module, along with fully connected and softmax layers added at intermediate stages. Within an inception module, the previous layer's activation is first passed through a bottleneck layer of 1 × 1 convolutions that reduces the number of channels; this is where the major computational savings are achieved, before the expensive 3 × 3 and 5 × 5 convolutions are applied. At the end, the outputs of all branches are stacked using channel concatenation. InceptionV2 further reduced the computational cost and improved accuracy through factorized convolutions and expanded filter banks. Further upgrades resulted in the more accurate InceptionV3 network, which is used in this paper because of its better performance and low error rates.
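The savings from the 1 × 1 bottleneck can be checked with a short calculation. The layer sizes below (a 28 × 28 × 192 input mapped to 32 channels by 5 × 5 filters, with a 16-channel bottleneck) are illustrative assumptions, not values from Fig. 3; each cost counts output positions × filters × parameters per filter, assuming 'same' padding so the spatial size stays 28 × 28.

```python
def conv_mults(h_out: int, w_out: int, c_out: int, f: int, c_in: int) -> int:
    """Multiplications = output positions x number of filters x parameters per filter (f*f*c_in)."""
    return h_out * w_out * c_out * (f * f * c_in)


# Illustrative sizes (assumed, not from the paper); 'same' padding keeps 28 x 28.
direct = conv_mults(28, 28, 32, 5, 192)           # 5x5 conv applied straight to 192 channels
bottleneck = (conv_mults(28, 28, 16, 1, 192)      # 1x1 conv squeezes 192 -> 16 channels
              + conv_mults(28, 28, 32, 5, 16))    # 5x5 conv on the 16-channel bottleneck

if __name__ == "__main__":
    print(f"direct: {direct:,}  with bottleneck: {bottleneck:,}  "
          f"ratio: {direct / bottleneck:.1f}x")
```

For these sizes the bottleneck cuts the cost from about 120.4 million to about 12.4 million multiplications, roughly a tenfold saving.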
4.2 MobileNet CNN
Many classic neural networks, from LeNet-5, AlexNet and VGG-16 to powerful networks such as the Residual Neural Network (ResNet) and InceptionV3, are computationally expensive. To run a neural network model on the system proposed in this paper, which has a less powerful CPU or Graphics Processing Unit (GPU), MobileNet is the most suitable choice. It is widely preferred for mobile and computer vision applications in embedded systems. The development of MobileNetV1 in 2017 opened a new research area in deep learning for machine vision: designing similar models that can run even on resource-constrained embedded systems.
The computational cost of a convolution is the product of the number of filter parameters, the number of filter positions and the number of filters. Using this formula, one can compute the cost of both convolution approaches, i.e., normal convolution and depthwise separable convolution, for the parameter values depicted in Fig. 4. For these parameters, the normal convolution requires 2,160 multiplications, whereas the depthwise separable convolution used by the MobileNet architecture requires only 672 (432 depthwise and 240 pointwise). The depthwise separable approach involves two main steps, depthwise convolution and pointwise convolution. It can be designed to have the same input and output dimensions as a normal convolution at a much lower computational cost: its cost relative to a normal convolution is 1/n_c' + 1/f², where n_c' is the number of filters and f the filter size, so for 3 × 3 filters and many output channels it needs roughly 10 times fewer computations.
MobileNetV2, developed in 2018, further reduced the computational cost by adding a bottleneck block. This block comprises a residual connection similar to ResNet and a non-residual part consisting of an additional expansion layer followed by a depthwise separable convolution. The expansion layer increases the size of the representation, allowing the network to learn richer features. At the end of the block, since the model must be deployed on mobile devices with memory constraints, the representation is compressed back to a smaller size using a projection (pointwise convolution) operation. The latest version, MobileNetV3, is used in this work; it further improves performance by adding squeeze-and-excitation layers to the basic MobileNetV2 design.
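The 2,160 versus 672 comparison for normal and depthwise separable convolution can be reproduced with the cost formula stated earlier. The Fig. 4 parameters are not given in the text; the values assumed here (a 6 × 6 × 3 input, five 3 × 3 filters, stride 1 and no padding, giving a 4 × 4 × 5 output) are inferred because they reproduce exactly 2,160, 432 and 240 multiplications.

```python
from typing import Tuple


def standard_conv_cost(n_out: int, f: int, c_in: int, c_out: int) -> int:
    """Filter parameters (f*f*c_in) x filter positions (n_out^2) x number of filters (c_out)."""
    return (f * f * c_in) * (n_out * n_out) * c_out


def depthwise_separable_cost(n_out: int, f: int, c_in: int, c_out: int) -> Tuple[int, int]:
    """(depthwise, pointwise) multiplications for the same input and output shape."""
    depthwise = (f * f) * (n_out * n_out) * c_in           # one f x f filter per input channel
    pointwise = (1 * 1 * c_in) * (n_out * n_out) * c_out   # c_out filters of size 1 x 1 x c_in
    return depthwise, pointwise


if __name__ == "__main__":
    # Assumed Fig. 4 setting: 6x6x3 input, five 3x3 filters, stride 1, no padding -> 4x4x5 output.
    normal = standard_conv_cost(n_out=4, f=3, c_in=3, c_out=5)
    dw, pw = depthwise_separable_cost(n_out=4, f=3, c_in=3, c_out=5)
    print(normal, dw, pw, dw + pw)        # 2160 432 240 672
    print(round((dw + pw) / normal, 3))   # 0.311, i.e. 1/5 + 1/9
```

For this small example the saving is only about 3×, since the ratio is 1/5 + 1/9. With 512 filters of size 3 × 3 the ratio drops to 1/512 + 1/9 ≈ 0.113, which is where the roughly tenfold saving comes from.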
5 Experimental Results
Tab. 2 summarizes the training accuracy reported by previous related works using various CNN models and datasets.
A total of eight test cases were considered using the InceptionV3 and MobileNetV3 CNN models, as listed in Tab. 3. The ‘FIDS30-selected’ test case contains only the 9 most common and easily available fruit classes: apples, bananas, lemons, mangoes, oranges, pomegranates, strawberries, tomatoes and watermelons. Similarly, the FRUITS360-selected dataset contains only the 35 most common fruit and vegetable classes, such as apples, bananas, onions, cauliflower, ginger, lemon, mangoes, tomato, strawberry and watermelon. These test cases were chosen to observe how accuracy and training time vary when only selected items from the entire dataset are used. The models were trained by running a Python script on the Google Colab platform, which provides time-limited GPU support.
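Building a ‘-selected’ test case and its training/validation split might look like the sketch below, assuming the dataset has been indexed as (image path, class label) pairs, as in a class-per-folder layout. Splitting each class separately with a fixed random seed is an assumption; the paper states only an 80%/20% ratio. The function name `select_and_split` and the file names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Collection, Dict, Iterable, List, Tuple

Sample = Tuple[str, str]  # (image path, class label)


def select_and_split(samples: Iterable[Sample], keep: Collection[str],
                     train_frac: float = 0.8,
                     seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Keep only the chosen classes, then split each class so it appears in both sets."""
    by_class: Dict[str, List[str]] = defaultdict(list)
    for path, label in samples:
        if label in keep:
            by_class[label].append(path)
    rng = random.Random(seed)                 # fixed seed -> reproducible split
    train: List[Sample] = []
    val: List[Sample] = []
    for label in sorted(by_class):
        paths = sorted(by_class[label])       # sort first so shuffling is deterministic
        rng.shuffle(paths)
        cut = round(train_frac * len(paths))
        train += [(p, label) for p in paths[:cut]]
        val += [(p, label) for p in paths[cut:]]
    return train, val


if __name__ == "__main__":
    # Hypothetical index: 32 apples, 30 bananas and 32 kiwifruit images.
    index = ([(f"apples/{i}.jpg", "apples") for i in range(32)]
             + [(f"bananas/{i}.jpg", "bananas") for i in range(30)]
             + [(f"kiwifruit/{i}.jpg", "kiwifruit") for i in range(32)])
    train, val = select_and_split(index, keep={"apples", "bananas"})
    print(len(train), len(val))   # 50 12
```

Splitting per class keeps rare classes represented in the validation set, which matters for small datasets such as FIDS30.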
In all test cases, the images are split into 80% training and 20% validation sets. Both models are trained using the cross-entropy loss function, the gradient descent optimizer and the Rectified Linear Unit (ReLU) activation function. The accuracy values for each test case listed in Tab. 3 are read from the corresponding accuracy graphs. The accuracy vs. number of iterations graph in Fig. 5a shows the training and validation accuracy of the InceptionV3 model on the FIDS30 dataset. The final validation accuracy, shown by the blue line, is 89.3%. Similarly, the loss (cross-entropy) in Fig. 5b decreases steadily and approaches zero as the number of iterations increases.
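To make the named training components concrete, the sketch below trains a tiny linear softmax layer with cross-entropy loss and full-batch gradient descent on toy feature vectors. It is not the authors' training script (which retrained InceptionV3 and MobileNetV3 on Google Colab); it only shows how the softmax output, the cross-entropy loss and its gradient p - y fit together, and that the loss falls towards zero as the iterations increase, as in Fig. 5b. The data and learning rate are invented for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(z: Sequence[float]) -> Vector:
    m = max(z)                                  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def cross_entropy(p: Sequence[float], y: int) -> float:
    return -math.log(p[y])                      # y is the index of the true class


def logits(W: List[Vector], x: Sequence[float]) -> Vector:
    return [sum(wj * xj for wj, xj in zip(w, x)) for w in W]


def train_softmax(X: List[Vector], Y: List[int], n_classes: int,
                  lr: float = 0.5, epochs: int = 100) -> Tuple[List[Vector], Vector]:
    """Full-batch gradient descent on a linear softmax layer; returns weights and mean loss per epoch."""
    d = len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]
    history = []
    for _ in range(epochs):
        grad = [[0.0] * d for _ in range(n_classes)]
        loss = 0.0
        for x, y in zip(X, Y):
            p = softmax(logits(W, x))
            loss += cross_entropy(p, y)
            for k in range(n_classes):
                g = p[k] - (1.0 if k == y else 0.0)   # dLoss/dlogit_k = p_k - y_k
                for j in range(d):
                    grad[k][j] += g * x[j]
        for k in range(n_classes):
            for j in range(d):
                W[k][j] -= lr * grad[k][j] / len(X)   # gradient descent step
        history.append(loss / len(X))                 # loss measured before this epoch's update
    return W, history


def predict(W: List[Vector], x: Sequence[float]) -> int:
    z = logits(W, x)
    return max(range(len(z)), key=z.__getitem__)


if __name__ == "__main__":
    # Toy 2-class data; the last feature is a constant bias input.
    X = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 0.2, 1.0], [0.1, 1.0, 1.0]]
    Y = [0, 1, 0, 1]
    W, history = train_softmax(X, Y, n_classes=2)
    print(round(history[0], 4), round(history[-1], 4))   # starts at ln 2 and falls
    print([predict(W, x) for x in X])                     # [0, 1, 0, 1]
```

With all weights initialized to zero, the first recorded loss equals ln(number of classes), which is a quick sanity check for any softmax classifier.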
In the second test case, using FIDS30 with selected data items, the training and validation accuracy curves fluctuate considerably, resulting in a higher loss and a validation accuracy of 92.9%, as shown in Fig. 6.
The graphs in Fig. 7 show smooth variation in both accuracy and loss as the number of iterations increases. The training and validation accuracy curves also stay very close to each other, resulting in a high accuracy of 94.9% on the FRUITS360 dataset. A similar graph, with slight variations, is obtained in Fig. 8, giving an accuracy of 98.4% on the FRUITS360-selected dataset.
The last four test cases use the MobileNetV3 CNN model. The graphs in Fig. 9 show a large gap between validation and training accuracy, with a final validation accuracy of 89.7%. In test case 6, shown in Fig. 10 and using FIDS30 with selected data items, the accuracy achieved is 96.4%, which is better than the 92.9% obtained with the InceptionV3 model.
The graphs in Fig. 11 show smooth variation in both accuracy and loss as the number of iterations increases. The training and validation accuracy curves track each other closely, resulting in the highest accuracy of 99.9% on the FRUITS360 dataset. A similar graph with a similar accuracy, but with slightly more variation in the initial iterations, is obtained in Fig. 12 using the FRUITS360-selected dataset.
Fig. 13 presents the data tabulated in Tab. 3 graphically. All the test cases are labeled on the x-axis. Fig. 13a shows the training time (in minutes) against the number of classes and total images in each test case, while Fig. 13b compares the accuracy (in %) of all test cases.
The graphs show that the MobileNet model gives better results in all four of its test cases, with shorter training times and higher accuracy. Tab. 4 compares the accuracy obtained in the present work with that reported in previous related works. The accuracy obtained with the approach and models of the present work is higher than that reported in previous related works: both InceptionV3 and MobileNetV3 outperform the other CNN models used in previous works, such as ResNet, MobileNetV1 and MobileNetV2. Even the accuracies listed in Tab. 2 for previous works on self-created datasets are lower than those obtained in the current work. Although the FRUITS360 dataset used in the current work is more diverse, a maximum accuracy of 99.9% is obtained with MobileNetV3, which indicates that the current work performs considerably better than previous similar studies.
Several test images from the datasets were used to evaluate the two trained CNN models. Tab. 5 shows each test image, the top five identification results with their scores, and the identified item with its score in %.
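A top-five list like those in Tab. 5 can be produced from a model's softmax output as sketched below. The class names and probabilities in the usage are invented for illustration. That the per-image scores are softmax probabilities expressed in % is an assumption about how the table was generated.

```python
import heapq
from typing import Dict, List, Tuple


def top_k(probs: Dict[str, float], k: int = 5) -> List[Tuple[str, float]]:
    """The k most probable labels with their scores in %, highest first."""
    best = heapq.nlargest(k, probs.items(), key=lambda item: item[1])
    return [(label, round(100.0 * p, 1)) for label, p in best]


if __name__ == "__main__":
    # Invented softmax output for one test image (illustration only).
    probs = {"apple": 0.90, "tomato": 0.04, "pomegranate": 0.03,
             "cherry": 0.02, "strawberry": 0.007, "onion": 0.003}
    for label, score in top_k(probs):
        print(f"{label}: {score}%")
```

`heapq.nlargest` avoids sorting the full class list, which is convenient when a model has many output classes, such as the 131 in FRUITS360.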
Real-time classification of fruits and vegetables from images was carried out on the proposed intelligent module, with the RPi as the CPU. Tab. 6 shows the real-time images captured with the 5-megapixel RPi camera module, snapshots of the identification results on the RPi console, and the name of each identified food item. All the tested items were correctly identified by the designed algorithm.
This work proposes the design of an intelligent module for the automated identification of food items, in particular fruits and vegetables, to achieve computer vision in smart refrigerators. The designed module and algorithm have been extensively evaluated for accuracy using pre-trained InceptionV3 and MobileNetV3 CNN models on standard fruit and vegetable datasets. The results show that MobileNetV3 clearly outperformed InceptionV3 in terms of both training time and the accuracy obtained on test images. Using a lightweight CNN such as MobileNetV3 saves approximately 45 min of training time on average. Moreover, a very high accuracy of about 99.9% is achieved, even on a larger dataset such as FRUITS360. Finally, the results of real-time testing with fruits and vegetables validate the performance of the proposed system. The proposed design includes a touch screen display for updating the user about the stored items. In future, this can be implemented in a real system, and more test cases can be added by enhancing the dataset to further validate the system's performance.
Funding Statement: This work was supported by Taif University Researchers Supporting Project (TURSP) under number (TURSP-2020/10), Taif University, Taif, Saudi Arabia.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.