Table of Content

Open Access

ARTICLE

A Scalable Method of Maintaining Order Statistics for Big Data Stream

Zhaohui Zhang*,1,2,3, Jian Chen1, Ligong Chen1, Qiuwen Liu1, Lijun Yang1, Pengwei Wang1,2,3, Yongjun Zheng4
School of Computer Science and Technology, Donghua University, 201620, China.
The Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, 201804, China.
Shanghai Engineering Research Center of Network Information Services, 201804, China.
School of Electronics, Computing and Mathematics, University of Derby, Derby, United Kingdom.
* Corresponding Author: Zhaohui Zhang. Email: .

Computers, Materials & Continua 2019, 60(1), 117-132. https://doi.org/10.32604/cmc.2019.05325

Abstract

Recently, there are some online quantile algorithms that work on how to analyze the order statistics about the high-volume and high-velocity data stream, but the drawback of these algorithms is not scalable because they take the GK algorithm as the subroutine, which is not known to be mergeable. Another drawback is that they can’t maintain the correctness, which means the error will increase during the process of the window sliding. In this paper, we use a novel data structure to store the sketch that maintains the order statistics over sliding windows. Therefore three algorithms have been proposed based on the data structure. And the fixed-size window algorithm can keep the sketch of the last W elements. It is also scalable because of the mergeable property. The time-based window algorithm can always keep the sketch of the data in the last T time units. Finally, we provide the window aggregation algorithm which can help extend our algorithm into the distributed system. This provides a speed performance boost and makes it more suitable for modern applications such as system/network monitoring and anomaly detection. The experimental results show that our algorithm can not only achieve acceptable performance but also can actually maintain the correctness and be mergeable.

Keywords

Big data stream, online analytical processing, sliding windows, mergeable data sketches.

Cite This Article

Z. Zhang, J. Chen, L. Chen, Q. Liu, L. Yang et al., "A scalable method of maintaining order statistics for big data stream," Computers, Materials & Continua, vol. 60, no.1, pp. 117–132, 2019.



This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
  • 1148

    View

  • 746

    Download

  • 0

    Like

Related articles

Share Link

WeChat scan