As nearly half of the incidents in enterprise security are triggered by insiders, it is important to deploy a more intelligent defense system to assist enterprises in pinpointing and resolving incidents caused by insiders or malicious software (malware) in real time. Failing to do so may cause a serious loss of reputation as well as business. At the same time, modern network traffic has dynamic patterns, high complexity, and large volumes that make early malware detection more difficult. The ability to learn tasks sequentially is crucial to the development of artificial intelligence. Existing neurogenetic computation models with deep-learning techniques are able to detect complex patterns; however, these models have limitations, including catastrophic forgetting, and require intensive computational resources. As defense systems using deep-learning models require more time to learn new traffic patterns, they cannot perform fully online (on-the-fly) learning. Hence, an intelligent attack/malware detection system with on-the-fly learning capability is required. In this paper, a memory-prediction framework is adopted, and a simplified single cell assembled sequential hierarchical memory (s.SCASHM) model is proposed in place of the hierarchical temporal memory (HTM) model to speed up learning convergence and achieve on-the-fly learning. The s.SCASHM consists of a Single Neuronal Cell (SNC) model and a simplified Sequential Hierarchical Superset (SHS) platform. The s.SCASHM is implemented as the prediction engine of a user behavior analysis tool to detect insider attacks/anomalies. The experimental results show that the proposed memory model can predict users’ traffic behavior with accuracy levels ranging from 72% to 83% while performing on-the-fly learning.
Nearly half of the incidents in enterprise security have been triggered by insiders. Sophisticated insider threat attackers include leavers, outsiders, and unknowing innocents. A conventional solution to this problem,
A technique called user behavior analytics has become particularly useful, providing flexible pattern recognition where fixed rules are unsuitable and where humans simply cannot cope with the extremely large amount of data involved. It is difficult to distinguish anomalous/malware behavior from normal behavior due to sophisticated attack techniques. A compromised machine/node may pretend to be a normal user requesting a service from a server to gain classified information. The main issue with traditional intrusion detection systems (IDSs) is their use of signature-based files, which grow in size over time and degrade detection accuracy and detection time. Thus, researchers have begun to use intelligent techniques to overcome this issue. As modern network traffic has dynamic patterns and a huge volume, detection systems that use deep-learning techniques require more time to be retrained for new traffic patterns, as they cannot fully perform online (on-the-fly) learning. Therefore, an intelligent detection system with an on-the-fly learning capability is required.
First, a dataset generated from real traffic and combined with malware traffic was created as an alternative for benchmarking anomaly detection systems. Second, a memory-prediction framework model has been proposed as a basis for user behavior traffic analysis in real time. A new type of memory system, consisting of a Sequential Hierarchical Superset (SHS) platform and a Single Neuronal Cell (SNC) model, has been introduced. Lastly, the memory-prediction model is implemented as an anomaly detection system. The model will help enterprises defend their networks from insider attacks, as it has an on-the-fly learning capability.
Nowadays, Internet of Things (IoT) and cloud computing networks are dominant in the Internet. The fast development of wireless sensor network technology has made the IoT network more complex in terms of applications [
Stiawan et al. [
Karbab et al. [
Cui et al. [
Pasha et al. [
This section discusses the dataset and the proposed model along with its implementation on user traffic behavior analysis.
The Numenta Anomaly Benchmark is an existing benchmark dataset used for real-time network traffic monitoring; however, the current Numenta Anomaly Benchmark is limited to data streams containing a single metric plus a timestamp, and thus the error analysis from the benchmark indicates that the errors across various detection algorithms are not always correlated [
The s.SCASHM is inspired by neuroscience’s memory-prediction framework theory on the human neocortex [
The SNC model is designed to have two different managers: the cell manager and the genetic manager. The cell manager is the core part of the SNC model. It manages the creation, parameters, and operation of every cell. The purpose of having a separate genetic manager is to provide a manageable gene distribution among the cells for performance improvement. A cell produced by the cell manager is arranged so that it interconnects with other cells in a hierarchy inside the SHS platform. The nature of the SNC model is fundamentally different from the neurons commonly known in artificial cell assembly models or neural networks. Data are not stored in synapses, whose weights would need to be strengthened or retrained whenever a stored value changes. Instead, the connections between cells are examined to determine which cell may be activated next when one cell in a particular sequenced cell assembly is activated. The cell itself is a container of data. Thus, when the data need to be updated, the required training time can be significantly reduced, as no weights need to be retrained.
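As an illustration of this design, the following is a minimal sketch, not the authors' implementation; all class and method names are hypothetical. Cells hold their data directly, and a connection map records which cell may fire next, so updating stored data requires no weight retraining:

```python
class Cell:
    """A container of data; unlike a neural-network unit, it has no trainable weights."""
    def __init__(self, cell_id, data):
        self.cell_id = cell_id
        self.data = data


class CellManager:
    """Hypothetical sketch of an SNC-style cell manager: creates cells and
    keeps a connection map telling which cell may activate after which."""
    def __init__(self):
        self.cells = {}           # cell_id -> Cell
        self.connection_map = {}  # cell_id -> set of possible next cell_ids
        self._next_id = 0

    def create_cell(self, data):
        cell = Cell(self._next_id, data)
        self.cells[cell.cell_id] = cell
        self.connection_map[cell.cell_id] = set()
        self._next_id += 1
        return cell

    def connect(self, from_cell, to_cell):
        # Examining this link later tells us which cell may fire next in a sequence.
        self.connection_map[from_cell.cell_id].add(to_cell.cell_id)

    def update_data(self, cell_id, new_data):
        # The cell is a data container: updating it needs no retraining of weights.
        self.cells[cell_id].data = new_data
```

The key design choice mirrored here is that changing a value is a dictionary write rather than a training pass, which is what allows the claimed reduction in update time.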
The SHS platform consists of five modules: the sequence manager, the prediction modulator, the input-preprocessing, the main control center, and the six-layer hierarchical columnar container. Among these five modules, the main control center is the core module that controls the assembled single neuronal cell model inside the six-layer hierarchical columnar container and manages the data flow between each layer [
The six-layer hierarchical columnar container is the actual platform where an assembled SNC model is placed. Similar to the human neocortex, each layer contains a region, and the region has a further layered columnar architecture. The proposed SNC model forms an assembled cell in a sequential manner inside the s.SCASHM memory model without the complexity of training and learning computations. Thus, both the SNC model and the SHS platform are integrated and complement each other with the aim to have a working cell assembly-based memory implementation following the human neocortex columnar pyramidal cell architecture inside its six-layered hierarchical memory structure.
The column’s first row and last row are where the input-output handler is placed. Both the column’s first row and last row are connected to the main control center module. In addition, the column’s first row is connected to the prediction modulator. The column’s third row is the SNC holder’s row, which implements a standard SNC cell that contains the data; the cell’s input gate is connected to the input-output handler in the fourth row as well as to other cells’ input gates in the same row, as defined in the SNC cell manager’s connection map. The column’s second row is the inhibitory layer, which implements an inhibitory cell. An inhibitory cell is an SNC cell without data. An inhibitory cell’s input and output gates are not connected to any cell, as it holds no data; instead, it receives an inhibitory signal on its third interface and sends inhibitory signals to other inhibitory cells inside the region where the column is located. The purpose of this inhibitory cell is to prevent cells in other columns that are not part of the currently active set of the sequence from being activated.
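The inhibition mechanism described above can be sketched as follows (a simplified illustration under assumed semantics; `Region` and `Column` are hypothetical names, and the six-row column internals are omitted). When one column activates as part of the current sequence, the inhibitory signal suppresses all other columns in the region:

```python
class Column:
    """One column of a region; tracks only activation/inhibition state."""
    def __init__(self, name):
        self.name = name
        self.active = False
        self.inhibited = False


class Region:
    """When a column activates as part of the current sequence, its inhibitory
    cell broadcasts a signal that suppresses every other column in the region,
    so cells outside the active sequence cannot fire."""
    def __init__(self, column_names):
        self.columns = {name: Column(name) for name in column_names}

    def activate(self, name):
        for col in self.columns.values():
            if col.name == name:
                col.active, col.inhibited = True, False
            else:
                col.active, col.inhibited = False, True  # inhibitory broadcast
```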
To understand the fundamental activities of the s.SCASHM with its SNC and SHS platform,
These two sets of inputs are then sent to the SHS’ main control center in sequential order. The SNC cell manager is then called to compute similarity matching on each of the inputs. Because this input is new and has never been presented before, the SNC cell manager returns no match, triggering the SHS main control center to initiate a new memory creation process as part of the SHS’ common main process. The new memory creation process includes initiating the cell manager to create new cells to hold the new memory, putting these newly created cells into the SHS platform’s lowest layer (Layer 6), and creating the connections with default weights between cells based on their sequence.
The newly created links between cells are stored in the SNC cell manager’s connection map table, and the cell’s location in the SHS platform is stored in SNC cell manager’s information details table. A set of cells that is linked together to form a sequence is then placed in the upper layer (Layer 5) and automatically added to the SHS sequence manager’s naming convention table with an auto-assigned name, which in this case is “Unknown-1” and “Unknown-2” (see
Upon completion, the s.SCASHM has knowledge of a few letters, two words, and one short sentence. The next time it is presented with the string “Brain Research,” the same process occurs again; however, this time, after the letter “b” is presented, the s.SCASHM, using the SHS prediction modulator, makes a prediction that the next letter might be “r”. When the next input letter “r” is presented, the prediction is matched, and the set of the sequence {“b”, “r”, “a”, “i”, “n”} becomes a predicted set. The same process occurs in the higher layer (Layer 5), and the superset of sequence {“Unknown-1,” “Unknown-2”} is then used by the SHS prediction modulator to predict what the next set might be after the set “Unknown-1.” This time, the prediction fails because the next input presented is the word “Research” and not “Model.” In the case of prediction failure, the s.SCASHM goes through a new memory creation process again for the word “Research,” and a new link in the SNC cell manager’s connection map table is created to form the new sequence {“brain,” “research”}. Generally, these are the typical processes that occur in the s.SCASHM.
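The “Brain Model” / “Brain Research” walkthrough can be condensed into a small sketch (all names hypothetical; the real s.SCASHM operates on letters, words, and sentences across six layers, which this single-level toy omits). New words receive auto-assigned names, predictions are checked against the known successors of the previous word, and a prediction failure triggers new memory creation:

```python
class SequenceMemory:
    """Toy word-level memory: auto-names new words ('Unknown-n') and learns
    which word may follow which, creating a new link on prediction failure."""
    def __init__(self):
        self.known_words = {}  # word -> auto-assigned name, e.g. "Unknown-1"
        self.successors = {}   # word -> set of words seen to follow it
        self._counter = 0

    def _name_for(self, word):
        if word not in self.known_words:
            self._counter += 1
            self.known_words[word] = f"Unknown-{self._counter}"  # new memory
        return self.known_words[word]

    def present(self, sentence):
        """Feed one sentence; return a (word, was_predicted) flag per word."""
        results, prev = [], None
        for word in sentence.lower().split():
            predicted = prev is not None and word in self.successors.get(prev, set())
            self._name_for(word)
            if prev is not None and not predicted:
                # Prediction failure: add a new link to the connection map.
                self.successors.setdefault(prev, set()).add(word)
            results.append((word, predicted))
            prev = word
        return results


memory = SequenceMemory()
memory.present("Brain Model")     # all new: auto-named Unknown-1, Unknown-2
memory.present("Brain Research")  # "brain" is recognized; "research" fails the
                                  # prediction, so a new {brain -> research} link is made
```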
The s.SCASHM was implemented as a memory prediction framework-based online UBA tool for analyzing users’ traffic behavior. In the online UBA tool implementation, the captured raw traffic stream is preprocessed into a sequence of individual network packets, and each byte inside the network packet is represented in its smallest form as a sequence of bits. A 2048-bit vector is then used as the Sparse Distributed Representation (SDR) implementation to represent this sequence of bits. At each point in time, this 2048-bit vector is fed to the s.SCASHM memory network for analysis. On the output of the s.SCASHM network, additional computations,
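The text does not specify the exact byte-to-SDR encoding, but the shape of the transformation can be sketched as follows (a naive dense bit expansion for illustration only; a real SDR encoder would also enforce sparsity and semantic overlap between similar inputs):

```python
SDR_WIDTH = 2048  # size of the sparse distributed representation vector

def packet_to_sdr(packet_bytes: bytes) -> list:
    """Expand each packet byte into its 8 bits (most significant first),
    then zero-pad or truncate to a fixed SDR_WIDTH-bit vector."""
    bits = []
    for byte in packet_bytes:
        bits.extend((byte >> shift) & 1 for shift in range(7, -1, -1))
    if len(bits) < SDR_WIDTH:
        bits.extend([0] * (SDR_WIDTH - len(bits)))  # zero-pad short packets
    return bits[:SDR_WIDTH]                          # truncate long packets
```

For example, a packet beginning with the EtherType bytes `{08, 00}` maps to the bit prefix `0000 1000 0000 0000`, followed by padding.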
For the purpose of comparison with deep learning, the VGG-16 deep-learning model is used. VGG-16 is a convolutional neural network model introduced by Simonyan et al. [
The experimental setup for playing back the captured traffic to create the simulation consists of three computers connected to a network switch. PC-1 is used for crafting traffic packets, and a packet generator injects traffic into the network segment. The server is configured to host four virtual servers. The s.SCASHM (as the detection engine) is installed on PC-2. PC-2 is connected to a mirrored port of the switch, and thus the detection engine can see all the traffic in the network segment. PC-1 and PC-2 are computers with an Intel Core i5-8250U processor, 4 GB of RAM, and 500 GB of storage capacity. The server has an Intel Core i7-10510Y processor, 8 GB of RAM, and 1 TB of storage capacity. All computers run Windows 10 as their operating system. The memory prediction engine is implemented in the Python programming language, and the deep-learning engine is implemented using the TensorFlow framework.
Once a small local area network with a data mirroring port is set up, the simulated one-month dataset is created as per the following steps:
1. Design and plan a one-month traffic simulation (when and what anomalous traffic appears, etc.).
2. Set up eight users/nodes (four server nodes and four user nodes) for the simulation.
3. Craft the required traffic packets manually, including anomalous/malware packets.
4. Begin injecting traffic into the network while capturing the traffic through the mirroring port.
5. After the planned one month of traffic has been injected and captured, manually label the simulated anomalies as well as specific application traffic (these labels were used during the experiments to verify the results).
6. Save the captured traffic into a file in .pcap format as the dataset for the UBA experiments.
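The manual labeling step amounts to tagging each captured packet whose timestamp falls inside a planned anomaly window. A minimal sketch of that step (the window boundaries and packet tuples below are hypothetical, not the dataset's actual values):

```python
def label_packets(packets, anomaly_windows):
    """packets: list of (timestamp, payload) tuples;
    anomaly_windows: list of (start, end) timestamp pairs from the simulation plan.
    Returns (timestamp, payload, label) tuples, label in {"normal", "anomaly"}."""
    labeled = []
    for ts, payload in packets:
        is_anomaly = any(start <= ts <= end for start, end in anomaly_windows)
        labeled.append((ts, payload, "anomaly" if is_anomaly else "normal"))
    return labeled
```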
Experiments on the proposed s.SCASHM were carried out by feeding the raw data traffic flow for continuous learning and detection. In deep-learning experiments, for each month of data, the first 20 days of decoded traffic are used as training data, and the last 10 days are used as testing data. The average is used for the final result.
The analysis of individual traffic flows and their contents is essential for a complete understanding of network usage. In this experiment, a reconstructive traffic analysis was carried out. The archived three-month traffic flows were analyzed to identify behavior patterns. Each captured raw packet was converted into a data string before the data were sent to the s.SCASHM.
| Parameter | Value |
|---|---|
| | “SCASHM” = Activated |
| | 0.75 |
| | 5000 (msec.) |
| | SHS = 1000, SNC = 5 |
| | SCASHM |
| | SCASHM |
The training phase is a supervised period for the s.SCASHM’s memory. The s.SCASHM implements auto-assigned naming for every object it sees. This labeling process is useful for a better evaluation of how accurately the s.SCASHM recognizes a memorized object.
ID | Content | Name | Type | Timestamp |
---|---|---|---|---|
1 | {08, 00} | ethIP | Std | 13:04:45.112 |
2 | {46, 06} | ipver-tcp | Std | 13:04:46.046 |
3 | {00, 55} | httpPort | Std | 13:04:46.072 |
4 | {1, 2} | tcp | Std-ID | 13:06:21.489 |
5 | {1, 2, 3} | http | Std-ID | 13:07:33.048 |
6 | {86, dd} | ethIp6 | Std | 13:07:34.481 |
7 | {6e, fe} | ipv6ver-icmp | Std | 13:08:13.653 |
8 | {6, 7} | icmp | Std-ID | 13:08:57.203 |
9 | {60, 06} | ipv6ver-tcp | Std | 13:10:28.322 |
10 | {6, 9, 3} | http6 | Std-ID | 13:10:29.406 |
11 | {08, 06} | ethArp | Std-ID | 13:13:42.811 |
12 | {00, 01} | arpOptCode | Std | 13:15:09.678 |
13 | {11, 2, 12} | arp | Std-ID | 13:15:55.576 |
14 | {13, 13, 13, 13} | arp-flow | Std-ID | 13:32:19.864 |
Supervision | Layer1 | Layer2 | Layer3 | Layer4 | Layer5 | Layer6 |
---|---|---|---|---|---|---|
Without | 708 | 3111 | 41766 | 283857 | 947886 | 3184944 |
With | 111 | 834 | 4257 | 58922 | 11823 | 369526 |
In this set of experiments, we fed the trained s.SCASHM with the simulated one-month dataset (without decoding it first) to learn and recognize the traffic of various users. The observation focused on the number of packets of each user as a feature. The result shows that the proposed s.SCASHM successfully recognizes the traffic of all top eight users in the dataset.
The aim of this experiment was to compare the ability of the proposed s.SCASHM and the deep-learning technique to predict traffic flow. The result showed that the proposed s.SCASHM is able to maintain consistent prediction accuracy even during the fluctuating period (from 9 am to 5 pm) of daily activity. The s.SCASHM has the ability to learn on-the-fly during the observation, while deep learning has lower prediction accuracy due to a rather small dataset during training.
The purpose of this experiment was to evaluate the performance of the proposed memory prediction model in profiling users’ application usage behaviors (as part of the UBA) by observing the number of packets exchanged as a feature. The outcome of this profiling is useful for identifying any anomaly in the use of any services/applications by legacy users. The results shown in
In this experiment, the performance of the proposed memory prediction model was compared to that of the deep-learning model in detecting anomalous Google Drive access. The deep-learning technique uses decoded data for this purpose and relies on the Google Drive packet decoder to recognize Google Drive traffic. One peak-day traffic flow was randomly chosen and observed from 12:00 noon to 00:00 midnight. The true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) of the detections of the simulated anomalies were recorded, and the results are shown in
| Observation window time | TP (s.SCASHM) | TP (DL) | TN (s.SCASHM) | TN (DL) | FP (s.SCASHM) | FP (DL) | FN (s.SCASHM) | FN (DL) | Total |
|---|---|---|---|---|---|---|---|---|---|
| 12:00–14:00 | 491 | 429 | 119 | 113 | 83 | 137 | 7 | 21 | 700 |
| 14:00–16:00 | 591 | 472 | 32 | 39 | 78 | 160 | 9 | 39 | 710 |
| 16:00–18:00 | 526 | 392 | 51 | 57 | 37 | 143 | 7 | 29 | 621 |
| 18:00–20:00 | 617 | 513 | 58 | 53 | 46 | 134 | 6 | 27 | 727 |
| 20:00–22:00 | 647 | 541 | 77 | 85 | 53 | 134 | 5 | 22 | 782 |
| 22:00–24:00 | 594 | 491 | 81 | 90 | 53 | 131 | 4 | 20 | 732 |

(The total is identical for both models in every window, so it is shown once.)
| Observation window time | Accuracy (s.SCASHM) | Accuracy (DL) | Precision (s.SCASHM) | Precision (DL) | Sensitivity (s.SCASHM) | Sensitivity (DL) | F1-score (s.SCASHM) | F1-score (DL) | Specificity (s.SCASHM) | Specificity (DL) |
|---|---|---|---|---|---|---|---|---|---|---|
| 12:00–14:00 | 87.14 | 77.42 | 85.54 | 75.79 | 98.59 | 95.33 | 91.60 | 84.44 | 58.91 | 45.20 |
| 14:00–16:00 | 87.74 | 71.97 | 88.34 | 74.68 | 98.50 | 92.36 | 93.14 | 82.58 | 29.00 | 19.59 |
| 16:00–18:00 | 92.91 | 72.30 | 93.42 | 73.27 | 98.68 | 93.11 | 95.98 | 82.00 | 57.59 | 28.50 |
| 18:00–20:00 | 92.84 | 77.85 | 93.06 | 79.28 | 99.03 | 95.00 | 95.95 | 86.43 | 55.76 | 28.34 |
| 20:00–22:00 | 92.58 | 80.05 | 92.42 | 80.14 | 99.23 | 96.09 | 95.71 | 87.39 | 59.23 | 38.80 |
| 22:00–24:00 | 92.21 | 79.37 | 91.80 | 78.93 | 99.33 | 96.08 | 95.42 | 86.67 | 60.44 | 40.72 |

(All values are percentages.)
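These metrics follow from the TP/TN/FP/FN counts via the standard confusion-matrix formulas; for instance, the 12:00–14:00 s.SCASHM window (TP = 491, TN = 119, FP = 83, FN = 7) reproduces the corresponding table row:

```python
def detection_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix metrics, as percentages rounded to 2 decimals."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    precision   = tp / (tp + fp)
    sensitivity = tp / (tp + fn)           # recall / true positive rate
    f1          = 2 * precision * sensitivity / (precision + sensitivity)
    specificity = tn / (tn + fp)           # true negative rate
    return [round(100 * m, 2) for m in
            (accuracy, precision, sensitivity, f1, specificity)]

# detection_metrics(491, 119, 83, 7) reproduces the 12:00-14:00 s.SCASHM row
# (accuracy, precision, sensitivity, F1, specificity).
```

Other rows may differ from a direct recomputation by ±0.01 due to rounding in the reported values.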
In this experiment, the performance of the proposed s.SCASHM in detecting anomalous activity in the overall traffic at the gateway was compared to that of the deep-learning model. Observing overall traffic at the gateway is important as an initial indication of whether there is an anomaly/attack. If there was an indication of an anomaly, we continued with a detailed observation of individual traffic flows. The observation was done over a period of two weeks, and the results are shown in
The aim of this experiment was to evaluate the proposed s.SCASHM’s performance on other UBA functions,
| NodeID | Application | # of detected anomalies (DL) | # of detected anomalies (s.SCASHM) | NodeID | Application | # of detected anomalies (DL) | # of detected anomalies (s.SCASHM) |
|---|---|---|---|---|---|---|---|
| 10.10.1.6 | Email services | 0 | 0 | 10.10.1.8 | Email services | 0 | 1 |
| | Samba file server | 0 | 0 | | Samba file server | 0 | 2 |
| | Google hangout | 0 | 0 | | Google hangout | 0 | 0 |
| | WhatsApp services | 0 | 0 | | WhatsApp services | 0 | 0 |
| | Dropbox services | 0 | 0 | | Dropbox services | 0 | 0 |
| | Google Drive services | 0 | 0 | | Google Drive services | 0 | 2 |
| 10.10.1.7 | Email services | 10 | 14 | 10.10.1.9 | Email services | 9 | 10 |
| | Samba file server | 23 | 31 | | Samba file server | 14 | 17 |
| | Google hangout | 7 | 9 | | Google hangout | 3 | 4 |
| | WhatsApp services | 11 | 11 | | WhatsApp services | 4 | 4 |
| | Dropbox services | 18 | 22 | | Dropbox services | 12 | 13 |
| | Google Drive services | 26 | 34 | | Google Drive services | 21 | 25 |
| | Experiment 1 | Experiment 2 | Experiment 3 | Experiment 4 | Experiment 5 | Experiment 6 |
|---|---|---|---|---|---|---|
| Active cells (%) | 85 | 45 | 78 | 81 | 53 | 72 |
| Inactive cells (%) | 12 | 35 | 15 | 10 | 38 | 21 |
| Inhibited cells (%) | 3 | 20 | 7 | 9 | 9 | 7 |
Finally,
| Node ID | Prof. (s.SCASHM) | Det. (s.SCASHM) | Prof. (DL) | Det. (DL) |
|---|---|---|---|---|
| 10.10.1.2 | 0.12 | 0.01 | N/A | 0.03 |
| 10.10.1.3 | 0.14 | 0.01 | N/A | 0.04 |
| 10.10.1.4 | 0.14 | 0.01 | N/A | 0.04 |
| 10.10.1.5 | 0.15 | 0.01 | N/A | 0.04 |
| 10.10.1.6 | 0.27 | 0.01 | N/A | 0.06 |
| 10.10.1.7 | 0.44 | 0.03 | N/A | 0.08 |
| 10.10.1.8 | 0.51 | 0.06 | N/A | 0.14 |
| 10.10.1.9 | 0.49 | 0.05 | N/A | 0.11 |
Note: Prof. = Profiling, Det. = Detection.
A practical performance metric for tasks requiring an immediate prediction just after a few training examples is the online accuracy [
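Online accuracy can be computed with a predict-then-learn loop: at each step the model must predict the next item before observing it, and accuracy accumulates over the whole stream. A minimal sketch (the `predict`/`learn` interface and the toy bigram model are hypothetical stand-ins, not the s.SCASHM engine):

```python
def online_accuracy(stream, model):
    """Fraction of correct next-item predictions over the stream, where the
    model predicts each item before seeing it and then learns from it."""
    correct = total = 0
    history = []
    for item in stream:
        if history:                       # need at least one item to predict from
            total += 1
            if model.predict(history) == item:
                correct += 1
        model.learn(item)                 # learn on-the-fly, after predicting
        history.append(item)
    return correct / total if total else 0.0


class BigramModel:
    """Toy model: remembers the last observed successor of each item."""
    def __init__(self):
        self.table, self.prev = {}, None

    def predict(self, history):
        return self.table.get(history[-1])

    def learn(self, item):
        if self.prev is not None:
            self.table[self.prev] = item
        self.prev = item

# online_accuracy(list("ababab"), BigramModel()) -> 0.6
# (the first two transitions are unseen; the last three are predicted correctly)
```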
With regard to learning convergence speed, the proposed memory model clearly outperforms the deep-learning model, as it does not need to update multiple cells across the set of activated cells. It can be observed in
While both are inspired by the memory prediction framework, the main difference between Hierarchical Temporal Memory (HTM) [
Shaukat et al. [
Second, we considered the threat to external validity when evaluating the results of our experiments. We used a simulated dataset in our experiments due to the inability to inject anomalous traffic into our real university network. This limited our options for comparing the results with other approaches that use real network traffic. We addressed this issue by also replicating similar experiments using the deep-learning model for comparison.
No machine-learning technique is free of limitations in terms of data or processes (
Real-time sequence learning from data streams presents unique challenges for machine learning algorithms. Besides prediction accuracy, the development of the following is considered a set of future research directions:

- An algorithm that can learn the Markov order automatically and efficiently to make accurate high-order predictions, since real-world sequences contain contextual dependencies that span multiple time steps.
- An algorithm that can recognize and learn new patterns rapidly, in order to perform continuous learning for real-time data stream analysis.
- A sequence learning algorithm that can make multiple predictions simultaneously and evaluate the likelihood of each prediction online, because in a given temporal context there could be multiple possible future outcomes.
- An algorithm that achieves acceptable performance on various problems without any task-specific hyper-parameter tuning. Learning in the cortex is enormously robust across a wide range of tasks; in contrast, most machine learning algorithms require optimizing a set of hyper-parameters for each task. Hyper-parameter tuning presents a major challenge for applications that require a high degree of automation, such as data stream mining.
In this paper, the authors have introduced a new model of a sequential hierarchical superset of a memory-prediction framework. The model was used as an unsupervised learning engine for a real-time user behavior anomaly detection system. The experimental results show that the s.SCASHM performed well in recognizing spatial as well as temporal anomalies in the network security domain. It was able to detect user traffic behavior with an accuracy ranging from 72% to 83%. Thus, the detection system meets the requirements of real-time, continuous, online detection without supervision.
As a memory prediction model, the proposed s.SCASHM has an advantage compared to the HTM model in terms of a simpler and faster execution for forecasting data without the need to update multiple cells across the set of activated mini-columns. Compared to conventional machine-learning and deep-learning models, the proposed s.SCASHM model has an advantage in terms of its capability to perform on-the-fly learning.
Researchers in neuroscience and artificial intelligence (AI) are now focusing on understanding how the brain works and thus on predicting behavior. The better we understand the brain, the better we may design AI algorithms. For the learning process, the neuroscience-AI collaboration can be developed further: a precise learning process characterized by neuroscience can be used to inform the design of learning processes for AI. Correspondingly, if AI discovers patterns from large datasets and creates a learning model, neuroscientists could carry out wet experiments to confirm the model. This research work materializes the idea of a partnership between the two disciplines.
Considering the limitations and future directions discussed in Section 4.4, for future work, the authors plan to focus on three aims. First, we plan to extend the data representation model and improve the proposed s.SCASHM architecture further by implementing an advanced cortical micro-circuitry region that enhances the connection between the hierarchical layers and also allows for storing more data. An empirical study is planned to validate the proposed s.SCASHM model by utilizing functional magnetic resonance imaging (fMRI) to identify cells’ activities when dealing with an episodic and declarative memory within the neocortex’s cortical column. The goal is to measure the growth rate of the s.SCASHM model when presented with new information compared to the actual cell growth in the neocortex and also to investigate specific properties of the human neocortex that have not been incorporated into the proposed model,