Smart Devices Based Multisensory Approach for Complex Human Activity Recognition

Sensor-based Human Activity Recognition (HAR) has numerous applications in eHealth, sports, fitness assessment, ambient assisted living (AAL), human-computer interaction and many more. Human physical activity can be monitored using wearable sensors or external devices. External devices have disadvantages in terms of cost, hardware installation, storage, computational time and dependence on lighting conditions. Therefore, most researchers use smart devices such as smart phones, smart bands and watches, which contain various sensors (accelerometer, gyroscope, GPS, etc.) and adequate processing capabilities. For the task of recognition, human activities can be broadly categorized as basic and complex. Recognition of complex activities has received little attention from researchers because the problem is difficult to solve using either smart phones or smart watches alone. Another reason is the lack of labeled sensor-based datasets covering several complex activities of daily life. Some researchers have used the smart phone's inertial sensors to perform human activity recognition, whereas a few have used both pocket and wrist positions. In this research, we propose a novel framework that is capable of recognizing both basic and complex human activities using the built-in sensors of a smart phone and a smart watch. We have considered 25 physical activities, including 20 complex ones, using the smart devices' built-in sensors. To the best of our knowledge, the existing literature considers only up to 15 activities of daily life. In this paper, a multisensory learning approach for complex human activity recognition is proposed that provides better recognition performance for a large number of complex human physical activities of daily life. Neural network (NN), Naive Bayes (NB), K-Nearest Neighbor (KNN) and ensemble AdaBoost classifiers are used along with a proposed mathematical model of preprocessing and feature extraction. The neural network and KNN classifiers outperformed the other classifiers. It is further concluded that smart-device-based multisensory human activity recognition is a more cost-effective and practical solution than vision-based or dedicated-sensor-based approaches. Furthermore, in this work a new dataset covering 26 human physical activities of daily life is formulated using the built-in sensors of a smart phone and a smart watch, which will be helpful for future research in this field. The smartphone and smartwatch data for a large number of complex human physical activities will serve as a benchmark.


Introduction
Sensor-based Human Activity Recognition (HAR) is an emerging field of machine learning with several advantages over vision-based human activity recognition [1]. Vision-based HAR is less preferable due to the installation of hardware setups, cost, data storage requirements and computational time [2,3]. The constraints imposed by variable lighting conditions also affect the system's compatibility, which leads to limited portability [4]. In contrast, wearable sensor-based human activity recognition systems do not suffer from the aforementioned problems. Initially, the wearable sensors used for human activity recognition were not portable due to their bulky structure, cost, and additional power setup. These sensors were also impractical for round-the-clock use due to resource constraints. These days, the majority of people use smart phones, watches and fitness bands that contain various built-in, energy-efficient smart sensors [5]. Generally, sensors such as the accelerometer, gyroscope, magnetometer, global positioning system (GPS), temperature sensor, proximity sensor, barometer and others are present in smart devices. Due to the rich features and easy availability of these sensors, sensor-based human activity recognition systems have become more popular [6]. These sensors are used in various ways. To observe less repetitive activities (i.e., hand/arm movements), the smart watch sensors are more effective. Similarly, to observe more repetitive activities (i.e., overall human body movements), the smart phone in the pocket gives better results. This implies that the sensor position also plays an important role in activity recognition. Recent studies showed that the combined use of smart phone and smart watch sensors produces better results [7]. The basic human activities (e.g., sitting, walking, running, stair up and down, etc.) are recognized using the smart phone (pocket position), whereas the complex human activities (eating, smoking, talking, etc.) are easily recognizable using wrist wearable sensors, as hand movements are involved in these activities. In this research, the complex activities overlap with the basic activities, e.g., smoking while walking or sitting, or eating ice-cream while walking. These activities are recognized using multiple sensors. The combination of wrist wearable sensors and smart phone sensors efficiently recognizes the type of activity being performed by a person. Reliable recognition of complex human activities opens a new direction for HAR applications, including tracking bad habits and providing coaching to individuals [8]. According to the World Health Organization (WHO) report, smoking, alcohol usage, lack of physical activity and poor nutrition are the main causes of early deaths. The main purpose of this research is to identify these harmful activities using multiple smart sensors. Furthermore, the recognition and tracking of these activities can be useful for awareness feedback or fitness assessment [9]. The major contribution is to combine smart watches and smart phones to efficiently recognize those complex human activities which are less repetitive in nature. The multi-sensor data fusion enables accurate recognition of human activity [10].
The proposed sensor-based human physical activity recognition is divided into four main steps: (1) data acquisition, (2) pre-processing, (3) feature extraction and (4) classification of various activities. Firstly, we have formulated a dataset of more than 20 complex human physical activities using the built-in sensors of a smart phone and a smart watch. The sensor data fusion is followed by the preprocessing stage, which removes noise from the raw data and divides it into windows or segments. The dataset is then divided into test and train data. After this, feature extraction is performed using the proposed feature extraction method. At the final stage, the various activities are classified on the basis of the extracted features. The main contributions of this work are listed below:
• The effect of two devices, i.e., a wrist wearable device and a smart phone in the pocket position, is studied. The results of fusing the data from these devices are presented for a better understanding of basic and complex human physical activities, which is not possible using the sensors of a single device.
• A dataset of several overlapped basic and complex human physical activities is formulated. These activities are recognized using the proposed methodology.
• A machine learning approach is presented that provides better performance for the recognition of more than 25 human physical activities.

Related Work
Human activity recognition is an important research area in computer vision [11,12]. Many techniques have been introduced in the literature for action recognition, using both classical techniques [13,14] and deep learning based techniques [15]. The major applications of activity recognition are video surveillance [16] and biometrics [17,18]. Sensor-based human physical activity recognition using a smartphone and a smartwatch (wrist wearable device) has been widely studied over the last few years due to its various applications in daily life, especially in healthcare for elderly, disabled or dependent people, sports coaching, fitness assessment, ambient assisted living (AAL), human-computer interaction, bad habit monitoring, exercise tracking, quality of life monitoring, entertainment, etc. In previous studies, researchers mostly used wearable motion sensors to recognize a few basic human activities. When the number of human physical activities increases, especially for similar (overlapped) activities, the requirement for additional sensors increases. Furthermore, recognizing several complex human activities also requires that sensors be placed at more than one position on the human body. To overcome the limitations of wearable sensors, researchers have also explored the sensors built into smartphones. With the growing use of smart phones and smartwatches, complex human activities can be recognized using the built-in sensors of both devices and data fusion, followed by efficient feature extraction and machine learning methods. In [19], the authors used smartphone accelerometer data to classify fast versus slow walking and aerobic dancing, activities that had not been studied previously. In that study, the authors used a combination of classifiers (MLP, Logit Boost, SVM) to classify six human activities (slow walking, fast walking, running, stair-up, stair-down, dancing) and achieved an accuracy of 91.15%. In [20], the authors used an on-body accelerometer and recognized seven human activities (sitting, walking, stair up, stair down, standing, walking, lying, running, cycling) using hidden Markov models (HMMs). In [21], the authors used smart phone and smart watch built-in sensors to recognize nine human activities (standing, sitting, walking, running, stair up, stair down, cycling, elevator up and elevator down) using five different classifiers. However, in that study the smartphone and smart watch sensor data were used independently. In [22], the authors used two motion sensors, one at the wrist and the other at the hip, to recognize five human physical activities. However, they also treated both sensors independently. They used a logistic regression classifier and detected activities such as standing, sitting, walking, lying, and running. In [23], a wrist-worn motion sensor was used to detect eight human activities, including the additional activity of working on a computer. In [24], the authors recognized a complex activity, namely eating or drinking, using a wrist-worn accelerometer and gyroscope. The authors used an HMM to classify the activities and achieved 84.3% accuracy. They detected the eating activity by splitting it into sub-activities such as eating, drinking, resting, etc. In [25], the authors detected the same eating activity as in [24] using a wrist-worn accelerometer and gyroscope. In that work they identified non-eating and eating periods and achieved 81% accuracy. In [26], similar work to [25] was done, where the authors recognized eating and non-eating activities using a smartwatch with a built-in accelerometer and gyroscope, achieving 70% accuracy. In [27], the authors used accelerometers at the wrist and foot positions and recognized the smoking activity independently of other activities such as running and walking, reporting 70% accuracy. In [24], some complex activities, including smoking and eating, are detected using three accelerometers and gyroscope sensors. In that study, the authors used naive Bayes, decision tree and k-nearest neighbor classifiers to classify the activities. The sensors were placed at the pocket and wrists, which is a more practical approach than using the foot or other body parts. The authors treated the complex activities separately from the basic activities, for example smoking while running or standing, or eating while talking. In real-world scenarios complex activities do overlap with basic activities, an aspect ignored in that study. In [28], the authors proposed a framework for sensor-based human physical activity recognition. They used central server-based data collection from multiple devices such as a smartphone and a smart watch via Bluetooth, and transferred the data using the advanced message queuing protocol (AMQP). In that work the authors used three different types of feature fusion (early fusion, late fusion and dynamic fusion) and classified the fused features using an SVM. Fusion is an important research domain and many techniques have been introduced in the literature [29,30]. The authors recognized human physical activities such as lying, walking, standing, sitting, bedding, getting up, lying down, standing up, standing down, putting a hand back and stretching a hand. They reported 87.4% accuracy. Furthermore, they investigated the effectiveness of smartphone and smartwatch sensors on different activities. In this paper, we propose a system that considers the fusion of both smart phone and smart watch sensor data. Moreover, we classify the overlapped basic and complex activities based on statistical feature analysis.

Proposed Methodology
Complex human activities that overlap with basic human activities add to the complexity of a daily life activity recognition system. To overcome these complications, we have designed a mathematical model for multisensory learning and multilocational placement in human activity recognition. The designed approach efficiently uses the data from a smart phone and a smart watch. This system provides advantages in terms of lower computational time, reduced installation cost and power efficiency; it is also more appealing and contemporary than conventional methods. The system effectively monitors human activities over a 24-h interval. The main objective of the proposed system is to classify various complex human activities using the built-in sensors of a smartphone and a smartwatch with better accuracy and lower computational requirements. Fig. 1 depicts the proposed methodology of the multisensory learning approach for complex human activity recognition. The overall performance of the complex human activity recognition system depends on the following components: (i) sensing of human activities, (ii) pre-processing of raw sensor data, (iii) feature extraction and selection, and (iv) classification algorithms.

Sensing of Human Activities
For complex human activity recognition, the selection and positioning of sensors is one of the most challenging steps. Appropriate placement of the sensors on the human body leads to better performance of the human activity recognition system. Basic daily life activities, e.g., walking, sleeping, running, sitting, stair down, stair up, dancing, etc., are recognized using the smartphone's built-in sensors (i.e., pocket position) only. In contrast, complex human activities such as eating, drinking coffee, typing, playing, smoking and talking are not easily identified using smartphone sensors because hand movements are involved. Therefore, wrist-worn sensors and smart watches are used for better recognition of these activities. In this work, we have used a multisensory and multilocational approach which provides better results for the identification of several complex human activities. It involves overlapped positioning, with a smart phone in the pocket/hand and a smart watch at the wrist position. After sensing the human activity, the data fusion of the corresponding sensors of both smart devices provides more useful information about different human physical activities. We have used the following major components for better sensing of more complex human activities.

Multisensory Data Collection
Data for 13 basic human activities from [23,26] is integrated with the publicly available dataset for 13 complex activities, forming a complete dataset of 26 basic and complex activities. An Android application named "Linear data collector V2" [20] was used to log sensor data for each human activity performed by multiple individuals of different ages. Data is collected through the accelerometer, gyroscope, linear acceleration sensor and magnetometer at a rate of 50 samples per second.
The application easily interfaces with the sensors for the recording and collection of data. Before recording sensor data, the application requires the name and ID of each participant in order to keep track of different participants. During the activity, the Android device running the application is placed at a specific body position. The data is collected as a comma-separated values (CSV) file on the smart device, which can be processed later. The system has also been tested on 14 overlapped basic and complex activities. Hence, the final dataset comprises more than 25 human activities. Each physical activity has been performed by multiple individuals for more than 60 s with known labels and timestamps. Fifty samples per second (50 Hz for each sensor output) have been collected for the different human physical activities performed by multiple individuals. Thus, for each device, which consists of four triaxial sensors, a total of 4 * 3 = 12 dimensions of data is collected at a single instant. In particular, for a one-second period of a single human activity, 50 samples of 12-dimensional data are collected from each device. We have collected hundreds of samples for each human activity performed by a single participant and finally concatenated the samples of all participants in series. A large dataset of similar complex human activities increases the complexity of the recognition system and leads to misclassification. To overcome these barriers and form an accurate activity recognition system, preprocessing and feature extraction techniques are used.

Sensor Data Fusion
12-dimensional data has been collected from each device (smart phone, smart watch) at a frequency of 50 Hz, where each sample of data is labeled and time-stamped. For a better understanding of complex human activities, we have collected the data from the smart phone and the smart watch at the same instant. Afterwards, the data from both devices is concatenated in parallel. Hence, the sample space becomes 24-dimensional for each human activity.
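The parallel concatenation described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `fuse_streams`, the nearest-timestamp pairing rule and the tolerance `tol` are hypothetical and are not taken from the original implementation, which may align the two streams differently.

```python
import numpy as np

def fuse_streams(phone_t, phone_x, watch_t, watch_x, tol=0.01):
    """Pair each phone sample with the nearest-in-time watch sample.

    phone_t, watch_t: 1-D arrays of timestamps in seconds, sorted ascending
                      (watch_t needs at least two entries).
    phone_x, watch_x: arrays of shape (n, 12), one row per sample.
    tol: largest accepted time gap in seconds (hypothetical value, half of
         the 20 ms sampling period at 50 Hz).
    Returns (timestamps, fused), where fused has shape (m, 24).
    """
    # Index of the first watch timestamp >= each phone timestamp.
    idx = np.searchsorted(watch_t, phone_t)
    idx = np.clip(idx, 1, len(watch_t) - 1)
    # Choose whichever neighbour (left or right) is closer in time.
    left_closer = (phone_t - watch_t[idx - 1]) <= (watch_t[idx] - phone_t)
    idx = np.where(left_closer, idx - 1, idx)
    # Keep only the pairs whose timestamps agree within the tolerance.
    keep = np.abs(watch_t[idx] - phone_t) <= tol
    fused = np.hstack([phone_x[keep], watch_x[idx[keep]]])
    return phone_t[keep], fused
```

Phone samples with no watch sample within the tolerance are dropped, so every fused row holds readings taken at approximately the same instant.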

Preprocessing and Feature Extraction
Human physical activities are not classified directly from the raw sensor data. The classification task is performed on a structured data representation (feature vector) obtained through several pre-processing and feature extraction techniques. Each sensor generates three time series, along the x-axis, y-axis and z-axis. After preprocessing, the three-dimensional (x, y, z) raw data from the four sensors of each smart device forms a 12-dimensional vector. Subsequently, the fusion of both devices' corresponding sensors makes the data 24-dimensional. Afterwards, feature extraction increases the feature vector size to 72 dimensions, i.e., 9 features from each sensor of each device. The feature vectors comprise rotational, time-domain and frequency-domain features.
The instantaneous rotational features are derived from the orientation of the device: the pitch (θ) feature is the rotation about the x-axis, the roll (φ) feature is the rotation about the z-axis, and α is the modulus of the acceleration vector. Of these, the modulus of the acceleration vector is calculated as

$$\alpha_i = \sqrt{s_{ix}^2 + s_{iy}^2 + s_{iz}^2}$$

where $s_{ix}$, $s_{iy}$, $s_{iz}$ are the x, y, z readings of sensor i = 1, 2, 3, 4 of each device, i.e., smartphone and smartwatch. These features are normalized using the mat2gray() command in Matlab. Furthermore, we have derived some statistical features over a defined period. Six features of each sensor have been extracted using a windowing method. For given input data $X_t$, a window of size k is calculated as

$$X_t = [x(t), x(t-1), x(t-2), \ldots, x(t-k)].$$

The window is filtered using an average filter of size q, using the fspecial() and imfilter() functions in Matlab, where $q = \lfloor k^{2/3} \rfloor$. Once the averaged window for a particular time period t has been obtained, the next six features for each sensor are computed. Features 4-6 are the variances of the pitch, roll and acceleration magnitude over the window, calculated as the unbiased variance

$$\sigma_\beta^2 = \frac{1}{n-1}\sum_{j=1}^{n}\left(\beta_j - \bar{u}\right)^2$$

where β can be θ, φ or α, $\bar{u}$ is the mean of the input feature over the window, and n is the number of sensor readings in the window. Features 7-9 are components of the fast Fourier transform (FFT) over the window. The preceding steps provide all nine features of each sensor; hence, a set of 72 features for each time sample is obtained. The feature space information is given below. To find the time series components, the data from the four sensors of the smartphone, $s_1, s_2, s_3, s_4$, is collected, where $s_1$ is the accelerometer data, $s_2$ is the linear acceleration sensor data, $s_3$ is the gyroscope data and $s_4$ is the magnetometer data. Each sensor provides time series data along the (x, y, z) axes, so the data of sensor $s_1$ is represented as $(s_{1x}, s_{1y}, s_{1z})$, the x-axis, y-axis and z-axis data of the first sensor. Similarly, the $s_2$, $s_3$ and $s_4$ data of the smartphone is denoted by $(s_{2x}, s_{2y}, s_{2z})$, $(s_{3x}, s_{3y}, s_{3z})$ and $(s_{4x}, s_{4y}, s_{4z})$ respectively. The overall data is represented in Tab. 1.
The vector dimension of $X_1(t)$ is 12 in every sample of each human activity. Similarly, the smartwatch sensor time series data is represented as a second 12-dimensional vector. After combining the sensor data of both devices in time series, we obtain $X(t)$, where $X(t)$ is the 24-dimensional raw data in every sample of each human activity. Further, we have calculated the rotational features $V(t)$, i.e., pitch (θ), roll (φ) and acceleration magnitude (α), of the raw data.
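As an illustration of the acceleration-magnitude feature and its normalization, the following Python sketch reads the magnitude as the Euclidean norm of the three axis readings and uses min-max scaling to [0, 1], which is the default behavior of mat2gray(). The function names are hypothetical, the handling of a constant input is an assumption, and pitch and roll are left out because their formulas are not stated in the text.

```python
import numpy as np

def acceleration_magnitude(xyz):
    """Euclidean norm of each (x, y, z) row; xyz has shape (n, 3)."""
    xyz = np.asarray(xyz, dtype=float)
    return np.sqrt(np.sum(xyz ** 2, axis=1))

def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # Assumption: a constant signal is mapped to all zeros.
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)
```

For example, `min_max_normalize(acceleration_magnitude(readings))` gives one normalized magnitude value per time sample for a single triaxial sensor.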
The next 48 statistical features are obtained using the windowing method over a small defined period. We calculate the variance in pitch $\sigma^2_{\theta_i}$, roll $\sigma^2_{\phi_i}$ and acceleration magnitude $\sigma^2_{\alpha_i}$. In the next step, we calculate 24 frequency-domain features $V_f$ using the fast Fourier transform (FFT) over the window having size (3 × 3).
[Table: entries per sensor are accelerometer (x, y, z), linear acceleration sensor (x, y, z), gyroscope (x, y, z) and magnetometer (x, y, z).]
Finally, the whole sensor data has been fused using Eqs. (10)-(12) in order to obtain the final feature space. Let Z be the total number of features.
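The windowed variance and FFT features can be sketched as below. Several points are assumptions made for illustration only: the filter size is read as q = floor(k^(2/3)), the three FFT features are taken to be the magnitudes of the three lowest-frequency components, and a zero-padded moving average stands in for the fspecial()/imfilter() averaging step. The function names are hypothetical.

```python
import numpy as np

def filter_size(k):
    """Assumed reading of the filter size: q = floor(k^(2/3)), at least 1."""
    # The small offset guards against floating-point results such as 3.9999999.
    return max(1, int(np.floor(k ** (2.0 / 3.0) + 1e-9)))

def window_features(signal, k, n_fft=3):
    """Unbiased variance and leading FFT magnitudes of the latest window.

    signal: 1-D array of one derived feature (pitch, roll or magnitude).
    k: the window covers x(t), x(t-1), ..., x(t-k), i.e. k + 1 samples.
    n_fft: number of FFT components kept (assumed to be the first three).
    """
    window = np.asarray(signal, dtype=float)[-(k + 1):]
    # Moving-average filter of size q with zero padding at the edges.
    q = filter_size(k)
    smoothed = np.convolve(window, np.ones(q) / q, mode="same")
    # Unbiased sample variance (divides by n - 1).
    variance = np.var(smoothed, ddof=1)
    # Magnitudes of the lowest-frequency FFT components.
    spectrum = np.abs(np.fft.rfft(smoothed))[:n_fft]
    return variance, spectrum
```

Applying `window_features` to the pitch, roll and magnitude series of one sensor yields three variances and, under the assumptions above, the frequency-domain values for that sensor.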
Hence, the whole dataset is formed, which is then used for classification by splitting the data into training and testing sets with a ratio of 8:2. Finally, different classification algorithms are applied to classify the human physical activities. Fig. 2 summarizes the proposed methodology of the multisensory learning approach for complex human activity recognition.
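A minimal sketch of the 8:2 split is shown below. Shuffling the samples before the split and the fixed random seed are assumptions; the text does not state how the samples were assigned to the two sets, and the function name is hypothetical.

```python
import numpy as np

def split_80_20(features, labels, seed=0):
    """Shuffle the samples and split them 8:2 into training and testing sets."""
    features = np.asarray(features)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    # Integer arithmetic avoids floating-point rounding of 0.8 * n.
    cut = (len(features) * 8) // 10
    train, test = order[:cut], order[cut:]
    return features[train], labels[train], features[test], labels[test]
```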

Results and Analysis
In the literature, researchers have used different approaches to classify human activities using sensor-based data. In this work, we have used Naive Bayes (NB), K-Nearest Neighbors (KNN) and Neural Network (NN) classifiers.

Experimental Configuration
To perform Naive Bayes classification, we split the data into two groups: 80% of the data is used for training the classifier and the remaining 20% is reserved as testing data. Let P be the number of sensor readings in a single class and J the total number of classes, and let $x_p$ denote the pth sensor reading from the training dataset. The elements of $x_p$ are the 12 sensor readings (3 axes for each of 4 sensors).
We collect the whole training data and convert it into a single vector. Then, we count the frequency of each value and use it to construct the probability of each value in the class. This gives us $p(x_i \mid y = C_j)$ for each value $x_i$ and each class $C_j$. To determine whether a new sensor reading belongs to class a or class b, we take all the elements $x_{\mathrm{test},i}$ of the new reading and evaluate the class score for a (Eq. (14)) and for b (Eq. (15)). If (14) > (15), we assume that the new reading belongs to class a; otherwise the new reading belongs to class b. The classification is readily extended to the comparison of multiple classes by taking the maximum.
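The frequency-based classification described above can be sketched as follows. This is not the authors' code: the class name, the discretization of continuous readings into `n_bins` bins, the add-one smoothing and the inclusion of a class prior are all assumptions introduced so that the example is runnable. Log-probabilities are summed instead of multiplying probabilities, which gives the same maximum while avoiding numerical underflow.

```python
import numpy as np

class FrequencyNaiveBayes:
    """Naive Bayes over discretized readings with add-one smoothing."""

    def __init__(self, n_bins=10):
        self.n_bins = n_bins  # hypothetical number of bins per feature

    def _discretize(self, X):
        # Map each value to a bin index in [0, n_bins - 1].
        span = np.where(self.hi > self.lo, self.hi - self.lo, 1.0)
        scaled = (X - self.lo) / span
        return np.clip((scaled * self.n_bins).astype(int), 0, self.n_bins - 1)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.lo, self.hi = X.min(axis=0), X.max(axis=0)
        self.classes = np.unique(y)
        bins = self._discretize(X)
        n_features = X.shape[1]
        # log_cond[c, f, b] = log p(feature f falls in bin b | class c)
        self.log_cond = np.empty((len(self.classes), n_features, self.n_bins))
        self.log_prior = np.empty(len(self.classes))
        for c, label in enumerate(self.classes):
            rows = bins[y == label]
            self.log_prior[c] = np.log(len(rows) / len(bins))
            for f in range(n_features):
                counts = np.bincount(rows[:, f], minlength=self.n_bins) + 1
                self.log_cond[c, f] = np.log(counts / counts.sum())
        return self

    def predict(self, X):
        bins = self._discretize(np.asarray(X, dtype=float))
        feature_idx = np.arange(bins.shape[1])
        # One score per class and sample: log prior plus summed log likelihoods.
        scores = self.log_prior[:, None] + np.stack(
            [self.log_cond[c][feature_idx, bins].sum(axis=1)
             for c in range(len(self.classes))]
        )
        # Taking the maximum over classes extends the two-class comparison.
        return self.classes[np.argmax(scores, axis=0)]
```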
Here the maximum is taken over all classes $C_j$ being compared. For human activity recognition using KNN, the number of neighbors K is set according to the number of human physical activities. In this experiment, the 26 activities are assigned as the "K number of neighbors." Since the system recognizes several human activities, KNN is used as a multiclass classifier. We have calculated the Euclidean distance between the data points. The Euclidean distance between two points p and q with n features is calculated as

$$d(p, q) = \sqrt{\sum_{i=1}^{n}\left(p_i - q_i\right)^2}$$

For multiple points, we repeat the above procedure. We have calculated the KNN Euclidean distance from the mean value of each neighbor group. After re-assigning each data point to the new class of minimum distance, we have calculated the centroids of these neighbor groups. We have repeated the procedure until the data points settle into a fixed number of groups.
The parameters of the neural network (NN), i.e., the number of neurons and the number of hidden layers, have been selected experimentally until the cross-entropy error on the testing data reached its minimum value. The simulation has been implemented using the neural network toolkit of MATLAB with custom coding to overcome the limitations of the MATLAB GUI. We have used the following configuration for the artificial neural network: input features = 72, output classes = 26, two hidden layers [110, 75 neurons], sigmoid activation function, conjugate gradient training function, error backpropagation, 500 epochs.
We have analyzed our proposed methodology using multiple classifiers, including Neural Network, K-Nearest Neighbor, Naive Bayes and the ensemble method AdaBoost Decision Tree. For the ensemble, we have used the MATLAB learner GUI application. The classification performance of these classifiers is given in the next sections.
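For reference, a textbook k-nearest-neighbor classifier with Euclidean distance and majority voting is sketched below. It is the standard formulation rather than a reproduction of the centroid-based procedure described above, and the default `k=5` is an illustrative value only; the function name is hypothetical.

```python
import numpy as np

def knn_predict(train_X, train_y, query_X, k=5):
    """Label each query point by majority vote among its k nearest neighbors."""
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y)
    query_X = np.asarray(query_X, dtype=float)
    # Pairwise Euclidean distances, shape (n_query, n_train).
    diff = query_X[:, None, :] - train_X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Indices of the k closest training points for each query.
    nearest = np.argsort(dist, axis=1)[:, :k]
    predictions = []
    for neighbor_labels in train_y[nearest]:
        labels, counts = np.unique(neighbor_labels, return_counts=True)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)
```

The pairwise distance array grows with the product of the query and training set sizes, so for a large dataset the queries would need to be processed in batches.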

Simulation Results
As explained earlier, four sensors of each device, i.e., smartphone and smartwatch, are used for sensing human physical activities. Nine features are derived from the raw data of each sensor. Although various sensors have been used for the identification of human activities, each feature derived from a sensor plays an independent role in sensing and recognizing a specific activity. The classification algorithm recognizes the 26 basic and complex activities listed previously. Two types of datasets, i.e., the pocket position of the smart phone and the wrist position of the smart watch, have been considered, which are subsequently classified by various classification methods. These methods include Naive Bayes, K-nearest neighbor and Neural Network (NN).
A neural network with 2 hidden layers is implemented for activity recognition. The MATLAB pattern recognition (PR) neural network (NN) tool has been used. In this configuration, the number of inputs is 72 and the number of classified outputs is 26. The 72 inputs pass through a first hidden layer of 75 neurons and a second hidden layer of 50 neurons, and the network finally classifies the 26 human activities. A sigmoid activation function is used, together with a conjugate gradient training function and backpropagation for error minimization.
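The shape of the 72-75-50-26 network described above can be illustrated with a NumPy forward pass. This sketch does not reproduce the MATLAB tool or the conjugate gradient training; the random weight initialization, the softmax output layer and all function names are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_network(sizes, seed=0):
    """Create one (weights, bias) pair per layer with small random weights."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    """Sigmoid hidden layers followed by an (assumed) softmax output layer."""
    a = np.asarray(X, dtype=float)
    for W, b in params[:-1]:
        a = sigmoid(a @ W + b)
    W, b = params[-1]
    logits = a @ W + b
    # Subtract the row maximum for numerical stability before exponentiating.
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    labels = np.asarray(labels)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

# 72 input features, hidden layers of 75 and 50 neurons, 26 output classes.
params = init_network([72, 75, 50, 26])
n_parameters = sum(W.size + b.size for W, b in params)
```

With these layer sizes the network has 10601 trainable parameters (weights plus biases).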
The NN is trained for 500 epochs, and the results show that the NN with 2 hidden layers and KNN perform better than the other algorithms. The accuracy of a classifier as a percentage is calculated as: Percentage Correct Classification = 100 * (1 − C_r) and Percentage Incorrect Classification = 100 * C_r, where C_r is the confusion value, i.e., the misclassification rate, obtained from the MATLAB function confusion, which also returns false positive, false negative, true positive and true negative rate information. The neural network provides an accuracy of 99.340162%, which is the highest accuracy among all the algorithms. The confusion matrices of Naive Bayes, KNN and NN are given in Figs. 3-5 respectively.
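The two percentages can be reproduced from a confusion matrix as in the sketch below, where C_r is computed as the fraction of misclassified samples. The function names and the convention that rows hold true classes and columns hold predicted classes are choices made for this example.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Count samples per (true class, predicted class) pair."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly when the same cell occurs repeatedly.
    np.add.at(cm, (true_labels, predicted_labels), 1)
    return cm

def classification_percentages(cm):
    """Return (percentage correct, percentage incorrect)."""
    # C_r: fraction of samples that fall off the diagonal.
    c_r = 1.0 - np.trace(cm) / cm.sum()
    return 100.0 * (1.0 - c_r), 100.0 * c_r
```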

Conclusion
Reliable recognition of human physical activities of daily life can be very helpful for many applications such as eHealth, remote monitoring and tracking of people for awareness feedback, coaching, human-machine interaction, bad habit monitoring, etc. In this paper, a multisensory learning approach for complex human activity recognition is proposed that provides better recognition performance for a large number of complex human physical activities of daily life. Neural network (NN), Naive Bayes (NB), K-Nearest Neighbor (KNN) and ensemble AdaBoost classifiers are used along with a proposed mathematical model of preprocessing and feature extraction. The neural network and KNN classifiers outperformed the other classifiers. It is further concluded that smart-device-based multisensory human activity recognition is a more cost-effective and practical solution than vision-based or dedicated-sensor-based approaches. Furthermore, in this work a new dataset covering 26 human physical activities of daily life is formulated using the built-in sensors of a smart phone and a smart watch, which will be helpful for future research in this field. The smartphone and smartwatch data for a large number of complex human physical activities will serve as a benchmark.