Open Access

ARTICLE


Data Warehouse Design for Big Data in Academia

Alex Rudniy*

Department of Computing Sciences, University of Scranton, Scranton, 18510, PA, USA

* Corresponding Author: Alex Rudniy. Email: email

Computers, Materials & Continua 2022, 71(1), 979-992. https://doi.org/10.32604/cmc.2022.016676

Abstract

This paper describes the design and construction of a data warehouse (“DW”) for an online learning platform using three prominent technologies: Microsoft SQL Server, MongoDB and Apache Hive. The three systems are evaluated for corpus construction and descriptive analytics. The case also demonstrates the value of evidence-centered design principles for building a data warehouse that is sustainable enough to adapt to the demands of handling big data in a variety of contexts. Additionally, the paper addresses the maintainability-performance tradeoff, storage considerations and the accessibility of big data corpora. In this NSF-sponsored work, the data were processed, transformed, and stored in three versions of a data warehouse in search of a better performing and more suitable platform. The data warehouse engines (a relational database, a NoSQL database, and a big data technology for parallel computations) were subjected to principled analysis. The design, construction and evaluation of a data warehouse were scrutinized to find improved ways of storing, organizing and extracting information. The work also examines building corpora, performing ad-hoc extractions, and ensuring confidentiality. Apache Hive demonstrated the best processing time, followed by SQL Server and MongoDB. For analytical queries, SQL Server was the top performer, followed by MongoDB and Hive. This paper also discusses a novel process for rendering students anonymous in compliance with Family Educational Rights and Privacy Act (FERPA) regulations. Five phases for DW design are recommended: 1) establishing goals at the outset based on Evidence-Centered Design principles; 2) recognizing the unique demands of student data and use; 3) adopting a model that integrates cost with technical considerations; 4) designing a comparative database; and 5) planning for a DW design that is sustainable.
Recommendations for future research include attempting DW design in contexts involving larger data sets and more refined operations, while ensuring that attention is paid to the sustainability of operations.
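The abstract does not describe how the anonymization process works. As a purely illustrative sketch, and not the author's method, one common building block for anonymizing student records is keyed-hash pseudonymization: each student identifier is replaced by an HMAC-SHA256 digest computed with a secret key, so the same student maps to the same pseudonym across tables while the original ID cannot be recovered without the key. The function names, the `student_id` field, the dropped fields (`name`, `email`) and the 16-character truncation are hypothetical choices made for this example.

```python
import hashlib
import hmac


def pseudonymize(student_id: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for a student ID (hypothetical helper).

    HMAC-SHA256 keyed with a secret means the mapping is deterministic
    (useful for joins across warehouse tables) but not reversible or
    recomputable by anyone who lacks the key.
    """
    digest = hmac.new(secret_key, student_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncation to 16 hex characters is an illustrative assumption.
    return "S" + digest[:16]


def anonymize_records(records, secret_key, id_field="student_id",
                      drop_fields=("name", "email")):
    """Drop direct identifiers and replace the ID with a pseudonym.

    The field names are assumptions for this sketch, not the paper's schema.
    """
    anonymized = []
    for record in records:
        # Copy every field except the direct identifiers we want removed.
        clean = {k: v for k, v in record.items() if k not in drop_fields}
        clean[id_field] = pseudonymize(str(record[id_field]), secret_key)
        anonymized.append(clean)
    return anonymized


if __name__ == "__main__":
    key = b"replace-with-a-securely-stored-secret"
    rows = [
        {"student_id": "1001", "name": "Ann", "email": "ann@example.edu", "score": 90},
        {"student_id": "1001", "name": "Ann", "email": "ann@example.edu", "score": 85},
        {"student_id": "1002", "name": "Bob", "email": "bob@example.edu", "score": 77},
    ]
    for row in anonymize_records(rows, key):
        print(row)
```

Pseudonymization of this kind keeps records linkable for descriptive analytics, but on its own it does not address re-identification through remaining quasi-identifiers, so it would be one step within a broader de-identification process rather than a complete one.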


Cite This Article

APA Style
Rudniy, A. (2022). Data warehouse design for big data in academia. Computers, Materials & Continua, 71(1), 979-992. https://doi.org/10.32604/cmc.2022.016676
Vancouver Style
Rudniy A. Data warehouse design for big data in academia. Comput Mater Contin. 2022;71(1):979-992. https://doi.org/10.32604/cmc.2022.016676
IEEE Style
A. Rudniy, "Data Warehouse Design for Big Data in Academia," Comput. Mater. Contin., vol. 71, no. 1, pp. 979-992, 2022. https://doi.org/10.32604/cmc.2022.016676



This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.