Search Results (14)
  • Open Access

    ARTICLE

    Performance Improvement through Novel Adaptive Node and Container Aware Scheduler with Resource Availability Control in Hadoop YARN

    J. S. Manjaly, T. Subbulakshmi*

    Computer Systems Science and Engineering, Vol.47, No.3, pp. 3083-3108, 2023, DOI:10.32604/csse.2023.036320

    Abstract The default scheduler of Apache Hadoop demonstrates operational inefficiencies when connecting external sources and processing transformation jobs. This paper proposes a novel scheduler, called the Adaptive Node and Container Aware Scheduler (ANACRAC), that enhances the performance of the Hadoop Yet Another Resource Negotiator (YARN) scheduler by aligning cluster resources to the demands of real-world applications. The approach leverages user-provided configurations to apportion nodes, or containers within the nodes, to application thresholds. Additionally, it gives applications the flexibility to select which node’s resources they…

  • Open Access

    ARTICLE

    Enhanced Best Fit Algorithm for Merging Small Files

    Adnan Ali1, Nada Masood Mirza1,2, Mohamad Khairi Ishak1,*

    Computer Systems Science and Engineering, Vol.46, No.1, pp. 913-928, 2023, DOI:10.32604/csse.2023.036400

    Abstract In the Big Data era, numerous sources and environments generate massive amounts of data. This enormous amount of data necessitates specialized advanced tools and procedures that effectively evaluate the information and anticipate decisions for future changes. Hadoop is used to process this kind of data. It is known to handle vast volumes of data more efficiently than tiny amounts, which makes the framework inefficient when data arrives in small pieces. This study proposes a novel solution to the problem by applying the Enhanced Best Fit Merging algorithm (EBFM), which merges files depending on predefined parameters (type and size). Implementing this algorithm will ensure that…

  • Open Access

    ARTICLE

    New Spam Filtering Method with Hadoop Tuning-Based MapReduce Naïve Bayes

    Keungyeup Ji, Youngmi Kwon*

    Computer Systems Science and Engineering, Vol.45, No.1, pp. 201-214, 2023, DOI:10.32604/csse.2023.031270

    Abstract As the importance of email increases, the amount of malicious email is also increasing, so the need for malicious email filtering is growing. Since it is more economical to combine commodity hardware consisting of a medium server or PC with a virtual environment to use as a single server resource and filter malicious email using machine learning techniques, we used a Hadoop MapReduce framework and, among machine learning methods, Naïve Bayes for malicious email filtering. Naïve Bayes was selected because it is one of the top machine learning methods (Support Vector Machine (SVM), Naïve Bayes, K-Nearest Neighbor (KNN), and Decision Tree) in…

  • Open Access

    ARTICLE

    Twitter Data Analysis Using Hadoop and ‘R’ and Emotional Analysis Using Optimized SVNN

    K. Sailaja Kumar*, H. K. Manoj, D. Evangelin Geetha

    Computer Systems Science and Engineering, Vol.44, No.1, pp. 485-499, 2023, DOI:10.32604/csse.2023.025390

    Abstract Standalone systems cannot handle the giant traffic loads generated by Twitter due to memory constraints. A parallel computational environment provided by Apache Hadoop can distribute and process the data over different destination systems. In this paper, a Hadoop cluster with four nodes, integrated with RHadoop, Flume, and Hive, is created to analyze the tweets gathered from the Twitter stream. Twitter stream data relevant to an event/topic such as IPL-2015, cricket, Royal Challengers Bangalore, Kohli, or Modi is collected from May 24 to 30, 2016, using Flume. Hive is used as a data warehouse to store the streamed tweets. Twitter analytics like…

  • Open Access

    ARTICLE

    Research on ABAC Access Control Based on Big Data Platform

    Kun Yang1, Xuanxu Jin2, Xingyu Zeng1,*

    Journal of Cyber Security, Vol.3, No.4, pp. 187-199, 2021, DOI:10.32604/jcs.2021.026735

    Abstract In big data environments, traditional access control lacks an effective and flexible access mechanism. Based on attribute-based access control, this paper proposes an HBMC-ABAC big data access control framework. It solves the problems of difficult authority change, complex management, over-authorization, and lack of authorization in big data environments. At the same time, binary mapping codes are proposed to solve the problem of low policy-retrieval efficiency in traditional ABAC. Experimental analysis shows that the proposed HBMC-ABAC model can meet the needs of the current large and complex big data environment.

  • Open Access

    ARTICLE

    Hybrid Deep Learning Framework for Privacy Preservation in Geo-Distributed Data Centre

    S. Nithyanantham1,*, G. Singaravel2

    Intelligent Automation & Soft Computing, Vol.32, No.3, pp. 1905-1919, 2022, DOI:10.32604/iasc.2022.022499

    Abstract In recent times, a huge amount of data is being created from different sources, and the size of the data generated on the Internet has already surpassed two exabytes. Big Data processing and analysis can be employed in many disciplines, where it can aid the decision-making process while preserving the privacy of users’ private data. To store large quantities of data, Geo-Distributed Data Centres (GDDC) have been developed. In recent times, several applications comprising data analytics and machine learning have been designed for GDDC. In this context, this paper presents a hybrid deep learning framework for privacy preservation in distributed DCs. The proposed…

  • Open Access

    ARTICLE

    BitmapAligner: Bit-Parallelism String Matching with MapReduce and Hadoop

    Mary Aksa1, Junaid Rashid2,*, Muhammad Wasif Nisar1, Toqeer Mahmood3, Hyuk-Yoon Kwon4, Amir Hussain5

    CMC-Computers, Materials & Continua, Vol.68, No.3, pp. 3931-3946, 2021, DOI:10.32604/cmc.2021.016081

    Abstract Advancements in next-generation sequencer (NGS) platforms have improved NGS sequence data production and reduced the cost involved, which has resulted in the production of a large amount of genome data. The downstream analysis of multiple associated sequences has become a bottleneck for the growing genomic data due to storage and space utilization issues in the domain of bioinformatics. Traditional string-matching algorithms are efficient for small data sequences but cannot process large amounts of data for downstream analysis. This study proposes a novel bit-parallelism algorithm called BitmapAligner to overcome the issues faced due to a large number of sequences…

  • Open Access

    ARTICLE

    Residential Electricity Classification Method Based On Cloud Computing Platform and Random Forest

    Ming Li1, Zhong Fang2, Wanwan Cao1, Yong Ma1,*, Shang Wu1, Yang Guo1, Yu Xue3, Romany F. Mansour4

    Computer Systems Science and Engineering, Vol.38, No.1, pp. 39-46, 2021, DOI:10.32604/csse.2021.016189

    Abstract With the rapid development and popularization of new-generation technologies such as cloud computing, big data, and artificial intelligence, the construction of smart grids has become more diversified. Accurate and quick reading and classification of the electricity consumption of residential users can provide a more in-depth perception of the actual power consumption of residents, which is essential to ensure the normal operation of the power system, energy management, and planning. Based on the distributed architecture of cloud computing, this paper designs an improved random forest residential electricity classification method. It uses the unique out-of-bag error of random forest and combines the Drosophila…

  • Open Access

    ARTICLE

    Design and Implementation of Log Data Analysis Management System Based on Hadoop

    Dunhong Yao1,2,3,*, Yu Chen4

    Journal of Information Hiding and Privacy Protection, Vol.2, No.2, pp. 59-65, 2020, DOI:10.32604/jihpp.2020.010223

    Abstract With the rapid development of the Internet, many enterprises have launched their network platforms. When users browse, search, and click the products of these platforms, most platforms keep records of these network behaviors. These records are often heterogeneous and are called log data. The goal is to analyze and manage these heterogeneous log data effectively, so that enterprises can grasp the behavior characteristics of their platform users in time, make targeted recommendations to users, increase the sales volume of their products, and accelerate their development. Firstly, we follow the process of big data collection, storage, analysis, and…

  • Open Access

    ARTICLE

    A Survey and Systematic Categorization of Parallel K-Means and Fuzzy-C-Means Algorithms

    Ahmed A. M. Jamel1,∗, Bahriye Akay2,†

    Computer Systems Science and Engineering, Vol.34, No.5, pp. 259-281, 2019, DOI:10.32604/csse.2019.34.259

    Abstract Parallel processing has become one of the emerging fields of machine learning because it provides consistent work by performing several tasks simultaneously, enhances reliability (the presence of more than one device keeps the workflow running even if some devices are disrupted), saves processing time, and introduces low-cost, high-performance computation units. This research study presents a survey of parallel K-means and Fuzzy-c-means clustering algorithms based on their implementations in parallel environments such as Hadoop, MapReduce, Graphical Processing Units, and multi-core systems. Additionally, the enhancement in parallel clustering algorithms is investigated as hybrid approaches in which K-means and Fuzzy-c-means clustering algorithms…

Displaying results 1-10 of 14 (page 1).