Virtualization is the backbone of cloud computing, a developing and widely used paradigm. By finding and merging identical memory pages, memory deduplication improves memory efficiency in virtualized systems. Kernel Same Page Merging (KSM) is a Linux service for sharing memory pages in virtualized environments. Memory deduplication is vulnerable to memory disclosure attacks, which establish a covert channel to reveal the contents of other colocated virtual machines. To avoid such attacks, sharing of identical pages within a single user’s virtual machines is permitted, but sharing of contents between different users is forbidden. In our proposed approach, the active domains in a node whose virtual machines run similar operating systems are recognised and organised into a homogeneous batch, and memory deduplication is performed inside that batch to improve the efficiency of memory page sharing. Compared with memory deduplication applied to the entire host, the implementation shows a significant increase in the number of pages shared when memory deduplication is applied batch-wise; CPU (central processing unit) consumption also increases.
Cloud computing is becoming increasingly popular in a variety of sectors. Virtual machines (VMs) have resurfaced, presenting a huge opportunity for cluster, parallel, cloud, grid, and distributed computing. Virtualization technology serves the majority of IT (Information Technology) and computer-related sectors by allowing users to share expensive hardware resources by executing VMs on the same set of server hosts. Virtualization is a computer architecture concept that allows the execution of many VMs on the same host machine. The concept of virtualization dates back to the 1960s. The goal of a VM is to facilitate resource sharing among multiple users while also increasing computer efficiency in terms of resource utilisation and application flexibility. At various layers of the cloud architecture, hardware resources such as the central processing unit (CPU), memory, and input/output devices, as well as software resources such as operating systems and software libraries, can be virtualized. Cloud computing is increasingly based on a virtualized architecture, in which massive programmes run inside virtual servers that are assigned to each physical server. Virtual machines are installed on top of virtual machine monitors, which are in charge of allocating physical resources such as CPU and memory to each virtual machine separately. The number of pages that can be shared is high if the guest operating systems of the virtual machines contain similar applications or data. If these sharing opportunities are appropriately exploited, the virtual machine monitor can supply significantly more memory to the virtual machines, resulting in a higher level of server consolidation [
Memory deduplication was established primarily in Disco [
Despite the fact that memory deduplication reduces memory footprints, existing techniques lack isolation and trustworthiness in addition to their efficiency [
The rest of the paper is organized as below. In
The background and motivation of this research are given below.
Memory page sharing can be classified into intra-virtual machine sharing, inter-virtual machine sharing, homogeneous sharing, and heterogeneous sharing. Intra-virtual machine sharing refers to the sharing of identical memory pages within the same virtual machine. Inter-virtual machine sharing refers to the sharing of identical pages between distinct virtual machines. Homogeneous sharing refers to the sharing of memory pages across virtual machines that run identical operating systems. Heterogeneous sharing refers to the sharing of memory pages across virtual machines that run different operating system platforms [
Kernel Same Page Merging is a memory deduplication implementation that first appeared in Linux kernel version 2.6.32. A kernel daemon, ksmd, searches user memory for pages that can be shared among users. Rather than scanning the full region of memory, which is time consuming and CPU intensive, it scans only the prospective candidates and computes signatures for them. These signatures are kept in the deduplication table, and pages with matching signatures are compared to verify that they are identical. KSM scans at a 20 millisecond interval and at a rate of 25% of the candidate memory pages at a time. KSM distinguishes three sorts of memory pages: 1. volatile pages, which change frequently and are therefore unsuitable for memory sharing; 2. unshared pages, also known as deduplication candidate pages, which lie in the regions that madvise() has asked ksmd to merge; and 3. shared pages, which have been deduplicated and are shared by processes or users [
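The scan interval, scan rate, and page counters discussed above are exposed by the Linux kernel as small text files under `/sys/kernel/mm/ksm` (for example `sleep_millisecs`, `pages_to_scan`, `pages_shared`, `pages_unshared`, and `pages_volatile`). The following is a minimal sketch, not part of the original work, of how these values could be read; the helper name `read_ksm_counters` is hypothetical.

```python
from pathlib import Path

# Standard location of the KSM control and statistics files on Linux.
KSM_SYSFS = Path("/sys/kernel/mm/ksm")


def read_ksm_counters(root=KSM_SYSFS):
    """Return every integer-valued file under `root` as a {name: value} dict.

    `read_ksm_counters` is a hypothetical helper written for illustration.
    """
    counters = {}
    root = Path(root)
    if not root.is_dir():
        return counters  # KSM is not available on this system
    for entry in sorted(root.iterdir()):
        if not entry.is_file():
            continue
        try:
            counters[entry.name] = int(entry.read_text().strip())
        except (OSError, ValueError):
            continue  # skip unreadable or non-integer files
    return counters
```

Calling `read_ksm_counters()` on a KSM-enabled host returns the current settings and counters; on a host without KSM it returns an empty dictionary.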
For candidate pages of deduplication, KSM uses two red-black (RB) trees: a stable tree and an unstable tree. Operations on an RB tree take O(log n) time per tree, and its height is never greater than 2 log₂(n + 1), where n is the number of nodes. KSM scans the registered memory locations one by one in round-robin fashion. When a page is scanned, KSM first searches the stable tree and merges the page with the matching node if the contents are identical. Otherwise, it searches the unstable tree, and if a match is found, it merges the two pages, removes the matching page from the unstable tree, and adds the merged page to the stable tree [
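The height bound quoted above follows from the standard textbook argument, sketched here for completeness. Let h be the height of the tree and bh the black-height of its root (both symbols are introduced here, not in the original text). Because a red node cannot have a red child, at least half of the nodes on any root-to-leaf path are black, so bh ≥ h/2. A subtree whose root has black-height bh contains at least 2^bh − 1 internal nodes, which gives:

```latex
n \;\ge\; 2^{bh} - 1 \;\ge\; 2^{h/2} - 1
\quad\Longrightarrow\quad
h \;\le\; 2\log_2(n + 1)
```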
Before memory sharing can start, allocated memory must be registered with KSM as potentially shareable. The stable tree comprises all of KSM’s shared, write-protected pages. The unstable tree holds pages that have the potential to be shared and whose contents have not changed for some time. The contents of memory pages are used to index the nodes of both trees. Nodes in the stable tree point to memory pages that are shared, whereas nodes in the unstable tree point to pages that are good candidates for sharing but are not yet shared. Both trees are initially empty. As long as the stable tree is empty, scanned pages are checked for matches only in the unstable tree. A page is added to the unstable tree if it has no match there [
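The lookup order described above (stable tree first, then unstable tree, then insertion) can be illustrated with a toy model. This is a simplified sketch and not the kernel implementation: Python dictionaries keyed by page content stand in for the two red-black trees, and the class name `KsmSketch` and its methods are hypothetical. The real KSM additionally uses checksums to skip volatile pages and rebuilds the unstable tree on every full scan, both of which are omitted here.

```python
class KsmSketch:
    """Toy model of KSM's two-index scan (dicts replace the red-black trees)."""

    def __init__(self):
        self.stable = {}    # content -> ids of pages merged into one shared page
        self.unstable = {}  # content -> id of a single unshared candidate page

    def scan(self, page_id, content):
        """Process one candidate page and report what happened to it."""
        if content in self.stable:
            # Match in the stable index: merge with the existing shared page.
            self.stable[content].append(page_id)
            return "merged-stable"
        if content in self.unstable:
            # Match in the unstable index: merge and promote to the stable index.
            partner = self.unstable.pop(content)
            self.stable[content] = [partner, page_id]
            return "promoted"
        # No match anywhere: remember the page as a sharing candidate.
        self.unstable[content] = page_id
        return "unstable"

    def pages_saved(self):
        """Number of duplicate pages eliminated by merging."""
        return sum(len(ids) - 1 for ids in self.stable.values())


ksm = KsmSketch()
outcomes = [ksm.scan(i, c) for i, c in enumerate([b"A", b"B", b"A", b"A"])]
```

In this run the first two pages enter the unstable index, the third promotes content `A` to the stable index, and the fourth merges directly with the stable entry.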
KVM (Kernel-based Virtual Machine) is a full virtualization framework that enables hardware virtualization on x86 CPUs (Intel VT or AMD-V). It is made up of two primary parts: a group of kernel modules (kvm.ko, kvm-intel.ko, and kvm-amd.ko) that provide the core virtualization framework and processor-specific drivers, and a userspace programme (qemu-kvm) that provides virtual device emulation and mechanisms for managing virtual machines. The term KVM strictly refers to the virtualization functionality at the kernel level, but it is commonly used to refer to the userspace component as well. Libvirt-based and QEMU-based tools can be used to manage VM Guests (virtual machines), virtual storage, and networks. libvirt is a library that provides an API (application programming interface) for managing VM Guests across various virtualization solutions, including KVM and Xen. Tools built on it offer both a graphical user interface and a command-line program. The QEMU (Quick Emulator) tools are specific to KVM/QEMU and are available only on the command line [
QEMU is a fast, cross-platform, open-source machine emulator that can emulate a broad range of hardware architectures. QEMU allows users to run a fully functional operating system (VM Guest) on top of the current system (VM Host Server). QEMU is composed of several components: a processor emulator, emulated devices, generic devices that connect the emulated devices to the corresponding host devices, a debugger, and a user interface for interacting with the emulator. QEMU can be used in conjunction with the KVM kernel module to provide a virtualization solution. QEMU can make use of KVM acceleration if the VM Guest hardware architecture is the same as the VM Host Server’s architecture. Tools based on libvirt, such as virt-manager and vm-install, provide simple interfaces for creating and managing virtual machines [
Libvirt is a virtualization management toolkit that is accessible from C, Python, Perl, and Go, among many other languages, and is distributed under open-source licences. It supports KVM, QEMU (Quick EMUlator), Virtuozzo, VMware ESX (Elastic Sky X), LXC (Linux Containers), bhyve (BSD hypervisor), and other virtualization technologies. It is intended for use on Linux, FreeBSD, Windows, and macOS. Virsh is a shell included with libvirt that gives access to libvirt functionality on platforms that support virtualization. It is a command-line and batch-scriptable tool for managing all libvirt-managed domains, networks, and storage, and it is included in the libvirt core distribution. libvirt-host is a libvirt module that provides several APIs. It offers functions for querying the host node and for getting and setting its memory parameters, such as virNodeGetInfo, virNodeGetMemoryParameters, virNodeGetMemoryStats, and virNodeSetMemoryParameters [
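As an illustration of the Python binding mentioned above, the sketch below lists the name, UUID, and activity state of every domain on a connection. It is a minimal example under stated assumptions rather than the code used in this work: the helper names `list_domains` and `open_local_readonly` are hypothetical, the URI `qemu:///system` assumes a local KVM/QEMU host, and the `libvirt` module is imported lazily so that `list_domains` can be exercised without the bindings installed.

```python
def list_domains(conn):
    """Return (name, uuid, active) for every domain known to a libvirt connection."""
    rows = []
    for dom in conn.listAllDomains():
        rows.append((dom.name(), dom.UUIDString(), bool(dom.isActive())))
    return rows


def open_local_readonly(uri="qemu:///system"):
    """Open a read-only libvirt connection (requires the libvirt Python bindings)."""
    import libvirt  # lazy import: only needed when talking to a real hypervisor
    return libvirt.openReadOnly(uri)
```

On a host with libvirt installed, `list_domains(open_local_readonly())` yields one tuple per defined domain, which is the information needed to identify the active domains before grouping them.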
In a multi-tenant cloud computing environment, a single physical server can host several virtual machines used by many different users. A public cloud shares identical memory pages among different users to maximise resource utilisation, which can lead to a memory disclosure attack. Malicious users can exploit a timing difference: a write to a deduplicated page triggers copy-on-write, which re-creates a private copy of the page, so the write takes longer than a write to a page that was not shared. To protect against memory disclosure attacks, sharing can be enabled among the virtual machines of a single user but disabled across users.
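The effect of restricting sharing to a single user can be illustrated with a small counting sketch. It is a toy model, not the proposed system: the function `count_merged` and the sample pages are hypothetical, and pages are represented simply as (user, content) pairs.

```python
from collections import defaultdict


def count_merged(pages, per_user=True):
    """Count duplicate pages that merging would eliminate.

    pages    -- iterable of (user, content) pairs
    per_user -- if True, pages are merged only when they belong to the same user
    """
    groups = defaultdict(int)
    for user, content in pages:
        key = (user, content) if per_user else content
        groups[key] += 1
    # Each group of n identical pages collapses to one page, saving n - 1.
    return sum(n - 1 for n in groups.values())


sample = [("u1", b"X"), ("u1", b"X"), ("u2", b"X"), ("u2", b"Y")]
isolated = count_merged(sample, per_user=True)     # u2's copy of X stays private
host_wide = count_merged(sample, per_user=False)   # X is shared across users
```

The per-user policy merges fewer pages than host-wide sharing, but no page is ever shared between two users, so a copy-on-write delay cannot reveal another user's contents.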
Hierarchical clustering is a widely used unsupervised machine learning technique for data clustering. Its two broad categories are agglomerative clustering and divisive clustering. The agglomerative variant involves the following steps: 1. Each data point in the dataset is initially treated as a separate cluster. 2. The data points that are closest to each other are joined to create a cluster. 3. Nearby clusters are joined to form new clusters. 4. The resulting dendrogram is used to split a large cluster into several smaller ones. Its distinctive features are: 1. The number of clusters does not need to be specified in advance. 2. Dendrograms make it easy to understand how the data has been grouped.
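The steps above can be sketched in a few lines of plain Python. This is an illustrative single-linkage implementation on one-dimensional points, not the clustering code used in this work; the function name `agglomerate` and the `max_distance` cut-off (which plays the role of cutting the dendrogram) are assumptions made for the example.

```python
def agglomerate(points, max_distance):
    """Single-linkage agglomerative clustering of 1-D points.

    Repeatedly merges the two closest clusters until the closest pair is
    farther apart than max_distance. Returns (clusters, merges), where
    merges records each join in order, i.e. the dendrogram.
    """
    clusters = [[p] for p in points]  # step 1: every point is its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_distance:
            break  # cutting the dendrogram at this height
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]  # steps 2 and 3: join closest
        del clusters[j]
    return [sorted(c) for c in clusters], merges


clusters, merges = agglomerate([1, 2, 10, 11, 12], max_distance=3)
```

No cluster count is supplied; the number of clusters falls out of where the dendrogram is cut, which matches the first distinctive feature listed above.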
Gu et al. [
Jia et al. [
Wang et al. [
Elghamrawy et al. [
Garoa et al. [
Patel et al. [
Zhu et al. [
Lindermann et al. [
Lindermann et al. [
You et al. [
Shiba [
The Cgroups mechanism was used by Goa et al. [
Garcia et al. [
Garcia et al. [
This approach provides a mechanism that supports multiple deduplication threads, each of which is dedicated to a batch of virtual machines with similar operating systems. Batches are formed on the basis of similar OS. When the Kernel Virtual Machine (KVM) instantiates a new virtual machine, it registers the memory region of that virtual machine with KSM. Once KSM is started, a global ksmd daemon process starts automatically and performs memory deduplication. In this approach, the global ksmd daemon is split into several similar threads, each performing memory deduplication for one homogeneous batch. Linux cgroups are used to allocate CPU and memory resources to each user process. An overview of the architecture of homogeneous batch memory deduplication using clustering of virtual machines is shown in
Implementation: The work described above was divided into two modules for implementation.
Module 1: Using Hierarchical Agglomerative Clustering, virtual machines are clustered depending on the guest operating system.
Module 2: Memory Deduplication is applied separately to each homogeneous batch.
Module 1 clusters virtual machines according to the installed guest operating system and involves the actions below.
User | Domain name | UUID | OS type | OS variant |
---|---|---|---|---|
User 1 | centos7.0 | e9a601b8-f56a-44ee-943f-6ddf640ce28f | Linux | Cent OS |
User 1 | win7 | 9986cc97-5cae-4767-a003-9b1337558b06 | Windows | Windows 7 |
User 1 | Ubu1 | aacd3461-4e67-7890-bc02-7b1337558b10 | Linux | Ubuntu14 |
User 2 | generic1 | ea7bcc59-e22d-4a7c-a5b1-028e17cf45b4 | Linux | Red Hat |
User 2 | generic2 | d43fc653-7b2d-4edf-98f3-96ebb78544b3 | Linux | Fedora |
User 3 | Winx | 38972079-3bc4-4dff-a4b0-ed33da19b9ee | Windows | Windows XP |
User 3 | Win7a | 5678098ea-4bcc-5bbf-a134-de54ad20acff | Windows | Windows 7 |
Domain Id | Domain name | UUID | Status |
---|---|---|---|
1 | centos7.0 | e9a601b8-f56a-44ee-943f-6ddf640ce28f | Running |
2 | win7 | 9986cc97-5cae-4767-a003-9b1337558b06 | Shut Off |
3 | generic1 | ea7bcc59-e22d-4a7c-a5b1-028e17cf45b4 | Running |
4 | generic2 | d43fc653-7b2d-4edf-98f3-96ebb78544b3 | Running |
5 | Ubu1 | aacd3461-4e67-7890-bc02-7b1337558b10 | Running |
6 | Winx | 38972079-3bc4-4dff-a4b0-ed33da19b9ee | Running |
7 | Win7a | 5678098ea-4bcc-5bbf-a134-de54ad20acff | Running |
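Using the domain names, OS types, and statuses listed in the two tables above, the grouping of active domains into homogeneous batches can be sketched as follows. This is a simplification for illustration: it groups running domains directly by OS type, which is assumed here to be the result of cutting the clustering at the OS-type level, and the function name `homogeneous_batches` is hypothetical.

```python
from collections import defaultdict

# (domain name, OS type, status) taken from the two tables above.
DOMAINS = [
    ("centos7.0", "Linux", "Running"),
    ("win7", "Windows", "Shut Off"),
    ("generic1", "Linux", "Running"),
    ("generic2", "Linux", "Running"),
    ("Ubu1", "Linux", "Running"),
    ("Winx", "Windows", "Running"),
    ("Win7a", "Windows", "Running"),
]


def homogeneous_batches(domains):
    """Group running domains by OS type; inactive domains are ignored."""
    batches = defaultdict(list)
    for name, os_type, status in domains:
        if status == "Running":
            batches[os_type].append(name)
    return dict(batches)


batches = homogeneous_batches(DOMAINS)
```

With this data the Linux batch contains centos7.0, generic1, generic2, and Ubu1, the Windows batch contains Winx and Win7a, and the shut-off domain win7 is left out.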
Once the active domains are classified, a dendrogram, shown as a snapshot, represents the clusters of the various active domains. In
After the active domains have been clustered, the next phase begins. A deduplication thread is assigned to each batch, and memory deduplication is performed within the memory allotted to that batch. Each batch can be given its own scan rate; for example, the scan rate of a batch running CPU-intensive workloads, such as games, can be set low. Each batch has its own KSM daemon thread and its own separate stable tree and unstable tree, the two data structures used by KSM. When a new page in a batch is scanned, KSM checks it against the stable tree and merges it with the matching page if a match is found. If no match is found, the unstable tree is searched. If a match is found in the unstable tree, the page is moved from the unstable tree to the stable tree. If no match is found there either, a new entry for the page is created in the unstable tree, and the scan moves on to the next page. A flow chart of homogeneous batch memory deduplication is shown in
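The per-batch behaviour described above, with private indexes and an individual scan rate for each batch, can be modelled in user space as follows. This is a hedged sketch and not the modified kernel code: the class `BatchDeduplicator`, its `pages_per_pass` parameter, and the sample page contents are all hypothetical, and dictionaries again stand in for the stable and unstable trees.

```python
class BatchDeduplicator:
    """One instance per homogeneous batch: private indexes, private scan rate."""

    def __init__(self, pages_per_pass):
        self.pages_per_pass = pages_per_pass  # scan rate of this batch
        self.stable = {}    # content -> list of (vm, page index) sharing a page
        self.unstable = {}  # content -> (vm, page index) of one candidate
        self.queue = []     # registered pages waiting to be scanned

    def register(self, vm, pages):
        """Register the memory pages of a VM that belongs to this batch."""
        self.queue.extend((vm, idx, content) for idx, content in enumerate(pages))

    def run_pass(self):
        """Scan at most pages_per_pass queued pages; return pages merged away."""
        merged = 0
        todo = self.queue[:self.pages_per_pass]
        self.queue = self.queue[self.pages_per_pass:]
        for vm, idx, content in todo:
            if content in self.stable:
                self.stable[content].append((vm, idx))
                merged += 1
            elif content in self.unstable:
                first = self.unstable.pop(content)
                self.stable[content] = [first, (vm, idx)]
                merged += 1
            else:
                self.unstable[content] = (vm, idx)
        return merged


linux = BatchDeduplicator(pages_per_pass=4)
windows = BatchDeduplicator(pages_per_pass=2)  # lower scan rate for this batch
linux.register("centos7.0", [b"libc", b"k1"])
linux.register("generic1", [b"libc", b"k2"])
windows.register("Winx", [b"ntdll", b"libc"])
windows.register("Win7a", [b"ntdll"])
```

Because each batch owns its indexes, the `libc` page registered by the Windows batch in this example is never merged with the identical page in the Linux batch, and the Windows batch needs two passes to find its duplicate because of its lower scan rate.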
The experimental results are shown below:
The following
Variables | Characteristics |
---|---|
CPU Processor | Intel Core i5 Processor, 8th generation |
RAM (Random-access memory) | 4 GB |
Hypervisor | KVM with QEMU |
OS for VMs | Fedora, Ubuntu, Cent OS, Win 7, Win XP, Red Hat |
API | Libvirt (virsh) |
Three trials were performed for three users, and the following
Trial / Batch | User 1 | User 2 | User 3 |
---|---|---|---|
Trial 1: | | | |
Batch 1 | VM1: RedHat | VM1: RedHat | VM1: CentOS |
Batch 2 | VM1: Win7 | VM1: WinXP | VM1: Win7 |
Trial 2: | | | |
Batch 1 | VM1: Fedora | VM1: CentOS | VM1: RedHat |
Batch 2 | VM1: Win10 | VM1: WinXP | VM1: WinXP |
Trial 3: | | | |
Batch 1 | VM1: RedHat | VM1: RedHat | VM1: CentOS |
Batch 2 | VM1: Win7 | VM1: WinXP | VM1: Win7 |
In
Memory deduplication statistics: number of pages shared

No of | Memory deduplication applied to entire host | User 1, Trial 1 | User 1, Trial 2 | User 1, Trial 3 | User 2, Trial 1 | User 2, Trial 2 | User 2, Trial 3 | User 3, Trial 1 | User 3, Trial 2 | User 3, Trial 3 |
---|---|---|---|---|---|---|---|---|---|---|
1 | 190 | 765 | 15427 | 8104 | 7622 | 8166 | 824 | 831 | 1031 | 8124 |
2 | 1054 | 919 | 17805 | 12686 | 10059 | 12396 | 7258 | 2783 | 4891 | 8224 |
3 | 2909 | 32143 | 24373 | 15762 | 11470 | 24766 | 16136 | 11003 | 17207 | 10642 |
4 | 3591 | 52639 | 26471 | 32330 | 19812 | 28870 | 34710 | 27413 | 21203 | 25046 |
5 | 4244 | 52835 | 26589 | 46116 | 26895 | 29388 | 48896 | 57219 | 21295 | 54452 |
6 | 7227 | 50977 | 26521 | 58714 | 33156 | 29392 | 61566 | 58887 | 21663 | 56048 |
7 | 8993 | 53079 | 26683 | 58848 | 33158 | 31342 | 63488 | 60887 | 21575 | 56260 |
8 | 12932 | 52399 | 26521 | 58798 | 33257 | 31412 | 63670 | 59687 | 21611 | 54828 |
9 | 14703 | 52119 | 26433 | 58788 | 33009 | 30882 | 63218 | 59595 | 21287 | 55178 |
10 | 15700 | 52515 | 25963 | 58856 | 33229 | 30172 | 63046 | 59265 | 20059 | 55088 |
11 | 16748 | 51589 | 24897 | 58898 | 33204 | 28088 | 62070 | 58327 | 17655 | 55168 |
Memory deduplication is particularly vulnerable to memory disclosure attacks. To prevent this attack, memory deduplication can be restricted to the virtual machines belonging to a single user group. When user virtual machines are grouped suitably and memory deduplication is performed batch-wise on homogeneous batches, the proportion of memory pages shared increases compared with memory deduplication applied to the entire host, and CPU utilization is also higher. In future work, identical applications running in the virtual machines of a homogeneous batch will be categorised further before memory deduplication is performed.
The author, with a deep sense of gratitude, thanks the supervisor for his guidance and constant support during this research.