Open Access

ARTICLE

An Integrated Framework of Feature Engineering and Machine Learning for Large-Scale Energy Anomaly Detection

Thanyapisit Buaprakhong1, Varintorn Sithisint1, Awirut Phusaensaart1, Sinthon Wilke1, Thatsamaphon Boonchuntuk1, Thittaporn Ganokratanaa1,*, Mahasak Ketcham2
1 Applied Computer Science Programme, King Mongkut’s University of Technology Thonburi, Bangkok, 10140, Thailand
2 Department of Information Technology Management, King Mongkut’s University of Technology North Bangkok, Bangkok, 10800, Thailand
* Corresponding Author: Thittaporn Ganokratanaa.
(This article belongs to the Special Issue: AI in Green Energy Technologies and Their Applications)

Energy Engineering https://doi.org/10.32604/ee.2026.069004

Received 11 June 2025; Accepted 17 December 2025; Published online 22 January 2026

Abstract

The rapid digitalization of the energy sector has led to the deployment of large-scale smart metering systems that generate high-frequency time series data, creating new opportunities and challenges for energy anomaly detection. Accurate identification of anomalous patterns in building energy consumption is essential for optimizing operations, improving energy efficiency, and supporting grid reliability. This study investigates advanced feature engineering and machine learning modeling techniques for large-scale time series anomaly detection in building energy systems. Expanding upon previous benchmark frameworks, we introduce additional features such as oil price indices and solar cycle indicators, including sunset and sunrise times, to enhance the contextual understanding of consumption patterns. Our comparative modeling approach encompasses an extensive suite of algorithms, including KNeighborsUnif, KNeighborsDist, LightGBMXT, LightGBM, RandomForestMSE, CatBoost, ExtraTreesMSE, NeuralNetFastAI, XGBoost, NeuralNetTorch, and LightGBMLarge. Data preprocessing includes rigorous handling of missing values and normalization, while feature engineering focuses on temporal, environmental, and value-change attributes. The models are evaluated on a comprehensive dataset of smart meter readings, with performance assessed using metrics such as the Area Under the Receiver Operating Characteristic Curve (AUC-ROC). The results demonstrate that the integration of diverse exogenous variables and a hybrid ensemble of traditional tree-based and neural network models can significantly improve anomaly detection performance. This work provides new insights into the design of robust, scalable, and generalizable frameworks for energy anomaly detection in complex, real-world settings.
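As a rough illustration of two ideas the abstract names, temporal and value-change features and evaluation by AUC-ROC, the following minimal sketch uses only the Python standard library. It is not the authors' pipeline: the function names, the choice of features (hour, weekday, difference from the previous reading), and the toy inputs are all assumptions made for illustration.

```python
from datetime import datetime


def temporal_features(ts):
    """Hypothetical temporal features derived from a reading's timestamp."""
    return {
        "hour": ts.hour,
        "weekday": ts.weekday(),              # Monday = 0 ... Sunday = 6
        "is_weekend": int(ts.weekday() >= 5),
    }


def value_change_features(readings):
    """Hypothetical value-change features: difference from the previous reading.

    The first reading has no predecessor, so its difference is set to 0.0.
    """
    features = []
    for i, value in enumerate(readings):
        previous = readings[i - 1] if i > 0 else value
        delta = value - previous
        features.append({"diff_prev": delta, "abs_diff_prev": abs(delta)})
    return features


def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (anomalous, normal) pairs ranked correctly.

    Ties count as half a correct pair. This pairwise form is quadratic in the
    number of samples, so it suits small examples only.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC-ROC needs at least one sample of each class")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))
```

On a dataset of the scale the abstract describes, a sorted, rank-based AUC computation or a library routine would replace the pairwise loop; the version here is chosen only because its definition is easy to read.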

Keywords

Building energy; smart meter; anomaly detection; supervised learning; classification