
Open Access

ARTICLE

Layered Feature Engineering for E-Commerce Purchase Prediction: A Hierarchical Evaluation on Taobao User Behavior Datasets

Liqiu Suo¹, Lin Xia¹, Yoona Chung¹, Eunchan Kim¹,²,*
¹ Department of Information Systems, Hanyang University, Seoul, 04763, Republic of Korea
² Department of Artificial Intelligence, Hanyang University, Seoul, 04763, Republic of Korea
* Corresponding Author: Eunchan Kim.
(This article belongs to the Special Issue: Big Data Technologies and Applications for a Data-Driven World)

Computers, Materials & Continua https://doi.org/10.32604/cmc.2025.076329

Received 18 November 2025; Accepted 10 December 2025; Published online 29 December 2025

Abstract

Accurate purchase prediction in e-commerce depends critically on the quality of behavioral features. This paper proposes a layered, interpretable feature engineering framework that organizes user signals into three layers: Basic, Conversion & Stability (efficiency and volatility across actions), and Advanced Interactions & Activity (cross-behavior synergies and intensity). Using real logs from Taobao, Alibaba’s primary e-commerce platform (57,976 records for 10,203 users; 25 November–3 December 2017), we conducted a hierarchical, layer-wise evaluation that holds data splits and hyperparameters fixed and varies only the feature set, so that each layer’s marginal contribution can be quantified. Across logistic regression (LR), decision tree, random forest, XGBoost, and CatBoost models with stratified 5-fold cross-validation, performance improved monotonically from Basic to Conversion & Stability to Advanced features. With LR, the F1 score increased from 0.613 (Basic) to 0.962 (Advanced); the boosted models achieved high discrimination (AUC of 0.995) and an F1 score of up to 0.983. Calibration and precision–recall analyses indicated strong ranking quality; we acknowledge potential dataset and period biases given the short (9-day) window. By making feature contributions measurable and reproducible, the framework complements model-centric advances and offers a transparent blueprint for production-grade behavioral modeling. The code and processed artifacts are publicly available. Future work will extend the validation to longer, seasonal datasets and to hybrid approaches that combine automated feature learning with domain-driven design.
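
The layer-wise protocol described in the abstract (same folds for every layer, only the feature columns change) can be illustrated with a minimal sketch. It is not the authors' released code: the column names in `LAYERS`, the helper names, and the pluggable `fit_predict` callable are all hypothetical, and the sketch uses only the Python standard library so that any model can be dropped in.

```python
import random
from collections import defaultdict

# Hypothetical cumulative layers: each one extends the previous column list.
# These names are illustrative only, not the paper's actual features.
LAYERS = {
    "basic": ["pv", "cart", "fav"],
    "conversion_stability": ["pv", "cart", "fav", "cart_per_pv"],
    "advanced": ["pv", "cart", "fav", "cart_per_pv", "cart_x_fav"],
}

def stratified_folds(labels, k=5, seed=0):
    """Assign each row index to one of k folds, preserving class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    fold_of = [0] * len(labels)
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):
            fold_of[i] = pos % k  # deal rows of each class round-robin
    return fold_of

def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1); 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def evaluate_layers(rows, labels, fit_predict, k=5, seed=0):
    """Mean F1 per layer; folds are computed once and reused for every layer."""
    fold_of = stratified_folds(labels, k, seed)
    results = {}
    for name, cols in LAYERS.items():
        scores = []
        for fold in range(k):
            train = [i for i in range(len(rows)) if fold_of[i] != fold]
            test = [i for i in range(len(rows)) if fold_of[i] == fold]
            x_train = [[rows[i][c] for c in cols] for i in train]
            x_test = [[rows[i][c] for c in cols] for i in test]
            y_train = [labels[i] for i in train]
            y_pred = fit_predict(x_train, y_train, x_test)
            scores.append(f1_score([labels[i] for i in test], y_pred))
        results[name] = sum(scores) / k
    return results
```

Here `rows` is a list of dicts keyed by column name and `fit_predict(x_train, y_train, x_test)` is any function that trains a model with fixed hyperparameters and returns predicted labels for `x_test`. Because the fold assignment is computed once, any difference between the scores for two layers is attributable to the added columns rather than to a different split.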

Keywords

Hierarchical feature engineering; purchase prediction; user behavior dataset; feature importance; e-commerce platform; Taobao