CIT-Rec: Enhancing Sequential Recommendation System with Large Language Models
Ziyu Li1, Zhen Chen2, Xuejing Fu2, Tong Mo1,*, Weiping Li1
1 School of Software and Microelectronics, Peking University, Beijing, 100871, China
2 Information Application Research Center of Shanghai Municipal Administration for Market Regulation, Shanghai, 200032, China
* Corresponding Author: Tong Mo. Email:
Computers, Materials & Continua https://doi.org/10.32604/cmc.2025.071994
Received 17 August 2025; Accepted 02 December 2025; Published online 29 December 2025
Abstract
Recommendation systems are key to boosting user engagement, satisfaction, and retention, particularly on media platforms where personalized content is vital. Sequential recommendation systems learn from user-item interactions to predict the items a user will be interested in next. However, many current methods rely on unique user and item IDs, limiting their ability to represent users and items effectively, especially in zero-shot scenarios where training data is scarce. With the rapid development of Large Language Models (LLMs), researchers are exploring their potential to enhance recommendation systems. Yet a semantic gap remains between the linguistic semantics of LLMs and the collaborative semantics of recommendation systems, where items are typically indexed by IDs. Moreover, most research focuses on item representations while neglecting personalized user modeling. To address these issues, we propose CIT-Rec, an LLM-based sequential recommendation framework that integrates Collaborative semantics for user representation with Image and Text information for item representation to enhance Recommendations. Specifically, by aligning intuitive image information with semantically rich text, we represent items more accurately and thus improve item representation quality. Beyond item representations, we also focus on user representations: to capture users' personalized preferences more precisely, we train traditional sequential recommendation models on users' historical interaction data, effectively capturing behavioral patterns. Finally, by combining LLMs with traditional sequential recommendation models, the LLM can understand linguistic semantics while also capturing collaborative semantics. Extensive evaluations on real-world datasets show that our model outperforms baseline methods, effectively combining user interaction history with item visual and textual modalities to provide personalized recommendations.
Keywords
Large language models; vision language models; sequential recommendation; instruction tuning