SwCS: Section-Wise Content Similarity Approach to Exploit Scientific Big Data

Kashif Irshad; Muhammad Afzal; Sanam Rizvi; Abdul Shahid; Rabia Riaz; Tae-Sun Chung

doi:10.32604/cmc.2021.014156

Open Access

ARTICLE

Kashif Irshad¹, Muhammad Tanvir Afzal², Sanam Shahla Rizvi³, Abdul Shahid⁴, Rabia Riaz⁵, Tae-Sun Chung⁶,*

1 Department of Computer Science, Capital University of Science and Technology, Islamabad, Pakistan
2 Department of Computer Science, NAMAL Institute, Mianwali, 42250, Pakistan
3 Raptor Interactive (Pty) Ltd., Eco Boulevard, Witch Hazel Ave, Centurion, 0157, South Africa
4 Institute of Computing, Kohat University of Science and Technology, Pakistan
5 Department of CS&IT, University of Azad Jammu and Kashmir, Muzaffarabad, 13100, Pakistan
6 Department of Artificial Intelligence, Ajou University, Korea

* Corresponding Author: Tae-Sun Chung.

(This article belongs to the Special Issue: Artificial Intelligence and Big Data in Entrepreneurship)

Computers, Materials & Continua 2021, 67(1), 877-894. https://doi.org/10.32604/cmc.2021.014156

Received 02 September 2020; Accepted 28 October 2020; Issue published 12 January 2021

Abstract

The growing collection of scientific data in various web repositories is referred to as Scientific Big Data, as it fulfills the four “V’s” of Big Data: volume, variety, velocity, and veracity. This phenomenon has created new opportunities for startups; for instance, extracting pertinent research papers from enormous knowledge repositories using innovative methods has become an important task for researchers and entrepreneurs. Traditionally, the content of the papers is compared to list the relevant papers from a repository. This conventional method produces a long list of papers that is often impossible to interpret productively. Therefore, a novel approach that intelligently utilizes the available data is urgently needed. The primary element of the scientific knowledge base is the research article, which consists of logical sections such as the Abstract, Introduction, Related Work, Methodology, Results, and Conclusion. This study utilizes these logical sections because they hold significant potential for finding relevant papers. Comprehensive experiments were performed to determine the role of a logical-section-based term indexing method in improving the quality of results (i.e., retrieving relevant papers). To this end, we proposed, implemented, and evaluated a logical-section-based content comparison method and compared it against a standard method of indexing terms. The section-based approach outperformed the standard content-based approach in identifying relevant documents from all classified topics of computer science. Overall, the proposed approach extracted 14% more relevant results from the entire dataset. Because the experimental results suggest that a finer-grained content similarity technique improves the quality of results, the proposed approach lays a foundation for knowledge-based startups.

Keywords

Scientific big data; ACM classification; term indexing; content similarity; cosine similarity
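
The contrast drawn in the abstract between whole-document comparison and section-wise comparison can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the term weighting, the preprocessing, or how per-section scores are combined, so the raw term counts, the regex tokenizer, the unweighted mean over shared sections, and every name below (`SECTIONS`, `term_vector`, `cosine`, `whole_document_similarity`, `section_wise_similarity`) are assumptions made purely for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical section keys, mirroring the logical sections named in the abstract.
SECTIONS = ("abstract", "introduction", "related_work",
            "methodology", "results", "conclusion")


def term_vector(text):
    """Lowercased word counts; an assumed stand-in for the paper's term indexing."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def whole_document_similarity(p, q):
    """Baseline: compare the concatenated text of both papers."""
    return cosine(term_vector(" ".join(p.values())),
                  term_vector(" ".join(q.values())))


def section_wise_similarity(p, q):
    """Compare matching sections only, then take an unweighted mean (assumed)."""
    shared = [s for s in SECTIONS if s in p and s in q]
    scores = {s: cosine(term_vector(p[s]), term_vector(q[s])) for s in shared}
    mean = sum(scores.values()) / len(scores) if scores else 0.0
    return mean, scores


# Toy papers: identical vocabulary overall, but used in different sections.
paper_a = {"abstract": "citation graph ranking",
           "methodology": "cosine term indexing"}
paper_b = {"abstract": "cosine term indexing",
           "methodology": "citation graph ranking"}

baseline = whole_document_similarity(paper_a, paper_b)
mean_score, per_section = section_wise_similarity(paper_a, paper_b)
print(f"whole-document: {baseline:.2f}, section-wise mean: {mean_score:.2f}")
```

In this toy case the two papers share their entire vocabulary, so the whole-document baseline treats them as near-identical, whereas the section-wise score is zero because no terms match within the same section. Real collections will not separate this cleanly; the example only shows why comparing like with like can shorten the list of candidate papers.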

Cite This Article

APA Style

Irshad, K., Afzal, M.T., Rizvi, S.S., Shahid, A., Riaz, R. et al. (2021). SwCS: Section-Wise Content Similarity Approach to Exploit Scientific Big Data. Computers, Materials & Continua, 67(1), 877–894. https://doi.org/10.32604/cmc.2021.014156

Vancouver Style

Irshad K, Afzal MT, Rizvi SS, Shahid A, Riaz R, Chung T. SwCS: Section-Wise Content Similarity Approach to Exploit Scientific Big Data. Comput Mater Contin. 2021;67(1):877–894. https://doi.org/10.32604/cmc.2021.014156

IEEE Style

K. Irshad, M. T. Afzal, S. S. Rizvi, A. Shahid, R. Riaz, and T. Chung, “SwCS: Section-Wise Content Similarity Approach to Exploit Scientific Big Data,” Comput. Mater. Contin., vol. 67, no. 1, pp. 877–894, 2021. https://doi.org/10.32604/cmc.2021.014156

Copyright © 2021 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.