Tech Science Press - Publisher of Open Access Journals


Search Results (16)

Open Access

ARTICLE

SMConf: One-Size-Fit-Bunch, Automated Memory Capacity Configuration for In-Memory Data Analytic Platform

Yi Liang¹,*, Shaokang Zeng¹, Xiaoxian Xu², Shilu Chang¹, Xing Su¹

CMC-Computers, Materials & Continua, Vol.66, No.2, pp. 1697-1717, 2021, DOI:10.32604/cmc.2020.012513

Abstract: Spark is the most popular in-memory processing framework for big data analytics. Memory is the crucial resource for workloads to achieve performance acceleration on Spark. The existing memory capacity configuration approach in Spark is to statically configure the memory capacity for workloads based on the user’s specifications. However, without deep knowledge of the workload’s system-level characteristics, users in practice often conservatively overestimate the memory utilization of their workloads and require the resource manager to grant a larger memory share than they actually need, which leads to severe waste of memory resources. To address the above issue, SMConf, an automated memory…

Views: 1606 | Downloads: 1157 | Likes: 0

Open Access

ARTICLE

Case Study: Spark GPU-Enabled Framework to Control COVID-19 Spread Using Cell-Phone Spatio-Temporal Data

Hussein Shahata Abdallah¹,*, Mohamed H. Khafagy¹, Fatma A. Omara²

CMC-Computers, Materials & Continua, Vol.65, No.2, pp. 1303-1320, 2020, DOI:10.32604/cmc.2020.011313

Abstract: Nowadays, the world is fighting a dangerous form of Coronavirus that represents an emerging pandemic. Since its early appearance in Wuhan, China, many countries have imposed strict regulations, including lockdowns and social distancing measures. Unfortunately, these procedures have badly impacted the world economy. Detecting and isolating positive/probable virus-infected cases using a tree tracking mechanism constitutes a backbone for containing and resisting such a fast-spreading disease. To support this effort, this research presents an innovative case study based on big data processing techniques to build a complete tracking system able to identify the central areas of infected/suspected people,…

Views: 3259 | Downloads: 1949 | Likes: 0 | Cited by: 5

Open Access

ARTICLE

News Text Topic Clustering Optimized Method Based on TF-IDF Algorithm on Spark

Zhuo Zhou¹, Jiaohua Qin¹,*, Xuyu Xiang¹, Yun Tan¹, Qiang Liu¹, Neal N. Xiong²

CMC-Computers, Materials & Continua, Vol.62, No.1, pp. 217-231, 2020, DOI:10.32604/cmc.2020.06431

Abstract: Because text topic clustering is slow on a stand-alone architecture in the big data setting, this paper takes news text as the research object and proposes an LDA text topic clustering algorithm based on the Spark big data platform. Since the TF-IDF (term frequency-inverse document frequency) algorithm under Spark maps words irreversibly, the mapped word indexes cannot be traced back to the original words. In this paper, an optimized TF-IDF method under Spark is proposed to ensure that the text words can be restored. Firstly, the text feature is extracted by the TF-IDF algorithm combined with CountVectorizer…
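
The irreversibility mentioned in this abstract can be illustrated with a minimal plain-Python sketch. It is not the paper's method and does not call Spark. It only contrasts a hashing-based term index, which cannot be mapped back to a word, with a vocabulary-based index of the kind a CountVectorizer-style approach keeps. The function names `hashed_index`, `build_vocabulary`, and `count_vector`, the CRC32 hash, and the bucket count of 16 are all assumptions chosen for illustration.

```python
import zlib


def hashed_index(term, num_features=16):
    """Map a term to a bucket by hashing (illustrative stand-in for a hashing TF).

    The mapping is one-way: several terms can share a bucket, and the bucket
    number alone does not tell us which word produced it.
    """
    return zlib.crc32(term.encode("utf-8")) % num_features


def build_vocabulary(docs):
    """Collect every distinct term and fix its position in a sorted list."""
    return sorted({term for doc in docs for term in doc})


def count_vector(doc, vocab):
    """Count terms by vocabulary position; vocab[i] recovers the word for index i."""
    index = {term: i for i, term in enumerate(vocab)}
    counts = [0] * len(vocab)
    for term in doc:
        position = index.get(term)
        if position is not None:  # terms outside the vocabulary are skipped
            counts[position] += 1
    return counts


docs = [["spark", "news"], ["news", "topic"]]
vocab = build_vocabulary(docs)
vector = count_vector(["news", "news", "spark"], vocab)
# Indexes with non-zero counts can be traced back to words through the vocabulary.
restored = [vocab[i] for i, count in enumerate(vector) if count > 0]
```

With the vocabulary kept alongside the vectors, any feature index can be translated back to its word, which is the property the abstract says the hashing-based mapping lacks.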

Views: 3191 | Downloads: 1651 | Likes: 0 | Cited by: 20

Open Access

ARTICLE

A Dynamic Memory Allocation Optimization Mechanism Based on Spark

Suzhen Wang¹, Shanshan Geng¹, Zhanfeng Zhang¹, Anshan Ye², Keming Chen², Zhaosheng Xu², Huimin Luo², Gangshan Wu³,*, Lina Xu⁴, Ning Cao⁵

CMC-Computers, Materials & Continua, Vol.61, No.2, pp. 739-757, 2019, DOI:10.32604/cmc.2019.06097

Abstract: Spark is a memory-based distributed data processing framework. Memory allocation is a central question in Spark research. A good memory allocation scheme can effectively improve the efficiency of task execution and the memory resource utilization of Spark. Aiming at the memory allocation problem in Spark 2.x, this paper optimizes the memory allocation strategy by analyzing the Spark memory model, the existing cache replacement algorithms, and the memory allocation methods, on the basis of minimizing the storage area and allocating the execution area according to demand. It mainly includes two parts: cache replacement optimization and…

Views: 1647 | Downloads: 1665 | Likes: 0

Open Access

ARTICLE

An Improved Memory Cache Management Study Based on Spark

Suzhen Wang¹, Yanpiao Zhang¹, Lu Zhang¹, Ning Cao²,*, Chaoyi Pang³

CMC-Computers, Materials & Continua, Vol.56, No.3, pp. 415-431, 2018, DOI:10.3970/cmc.2018.03716

Abstract: Spark is a fast unified analysis engine for big data and machine learning, in which memory is a crucial resource. Resilient Distributed Datasets (RDDs) are parallel data structures that allow users to explicitly persist intermediate results in memory or on disk, and each one can be divided into several partitions. During task execution, Spark automatically monitors cache usage on each node. When an RDD needs to be stored in a cache that has insufficient space, the system drops old data partitions in a least recently used (LRU) fashion to release more space. However,…
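
The LRU eviction behaviour described in this abstract can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not Spark's implementation and not the improved strategy the paper goes on to propose: the class name `LRUPartitionCache` is hypothetical, and capacity is counted in partitions rather than bytes for simplicity.

```python
from collections import OrderedDict


class LRUPartitionCache:
    """Toy cache that holds at most `capacity` partitions (hypothetical name)."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Keys are ordered from least recently used (front) to most recent (back).
        self._store = OrderedDict()

    def get(self, partition_id):
        """Return the cached data, or None on a miss; a hit refreshes recency."""
        if partition_id not in self._store:
            return None
        self._store.move_to_end(partition_id)
        return self._store[partition_id]

    def put(self, partition_id, data):
        """Cache a partition and return the ids evicted to make room."""
        if partition_id in self._store:
            self._store.move_to_end(partition_id)
        self._store[partition_id] = data
        evicted = []
        while len(self._store) > self.capacity:
            old_id, _ = self._store.popitem(last=False)  # drop least recently used
            evicted.append(old_id)
        return evicted


cache = LRUPartitionCache(capacity=2)
cache.put("rdd0_p0", [1])
cache.put("rdd0_p1", [2])
cache.get("rdd0_p0")                      # touch p0, so p1 becomes the oldest
evicted = cache.put("rdd1_p0", [3])       # cache is full: p1 is dropped
```

The sketch shows why plain LRU can be a poor fit: the evicted partition is chosen only by access time, regardless of how expensive it would be to recompute.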

Views: 1776 | Downloads: 955 | Likes: 0

Open Access

ARTICLE

A Spark Scheduling Strategy for Heterogeneous Cluster

Xuewen Zhang¹, Zhonghao Li¹, Gongshen Liu¹,*, Jiajun Xu¹, Tiankai Xie², Jan Pan Nees¹

CMC-Computers, Materials & Continua, Vol.55, No.3, pp. 405-417, 2018, DOI:10.3970/cmc.2018.02527

Abstract: As a mainstream distributed computing system, Spark has been used to solve problems with more and more complex tasks. However, the native scheduling strategy of Spark assumes that it works on a homogeneous cluster, which is not so effective on a heterogeneous cluster. The aim of this study is to find a more effective strategy for scheduling tasks and to add it to the source code of Spark. After investigating Spark scheduling principles and mechanisms, we propose a stratifying algorithm and a node scheduling algorithm to optimize the native scheduling strategy of Spark. In this…

Views: 2129 | Downloads: 1352 | Likes: 0
