Abstract
With the rapid growth of digital fashion consumption, understanding and modeling consumers' visual preferences has become a key issue in making intelligent recommender systems more personalized and human-centered. Unlike functional products, fashion purchases depend heavily on users' subjective visual perception. Existing recommender systems, however, mostly rely on behavior-level collaborative signals, which struggle to capture users' latent preferences for visual style. Practical conditions such as cold-start users, sparse data, and diverse aesthetic tastes further limit their performance. It is therefore important to build a recommendation mechanism that understands and exploits consumers' subjective visual judgments, both to improve recommendation quality and to make the system more interpretable.

This thesis addresses "the mechanism of consumers' subjective visual preference in fashion consumption and its modeling in recommender systems" through two complementary studies. The first study, grounded in behavioral economics and perceptual aesthetics, examines how differences in perceived visual style affect users' choices of fashion products. The second study builds a multi-modal fashion recommender system based on a large language model (LLM) that integrates user behavior, product attributes, and image style information, improving personalized recommendation in cold-start and few-shot settings.
The first study uses actual sales data from a fashion brand to analyze, from a consumer-behavior perspective, how differences in perceived visual style shape purchasing decisions. Specifically, we compute Gram-matrix feature representations from clothing product images to measure style differences between products. Using consumers' real purchase sequences, we then construct a style difference index between items and propose a pairwise, item-based loss function to measure how visual variation affects purchase behavior. To separate these effects from individual preferences and from external factors such as category and time, we use a fixed-effects model for causal analysis, testing whether differences in visual style significantly affect an item's probability of being chosen in real purchase scenarios. The empirical results show that, after controlling for price, category, and other factors, consumers tend to choose products whose style differs from their past purchases, and that this tendency depends on how much each consumer's own style preferences vary. This finding provides an empirical basis for understanding consumers' style preferences and a way to quantify style information for subsequent recommendation modeling.
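A minimal sketch of a Gram-matrix style distance follows, assuming each product image has already been passed through a convolutional network to obtain a (C, H, W) feature map. The network and layer used in the thesis are not specified here, so random arrays stand in for real features. The helper names, the normalization constant (which varies across implementations), and the definition of the style difference index as the mean distance from a candidate to a user's past purchases are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Normalized Gram matrix of a (C, H, W) feature map: inner products between channels."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)      # flatten spatial positions, one row per channel
    return (f @ f.T) / (c * h * w)      # (C, C), normalized by feature-map size

def style_distance(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    """Squared Frobenius distance between the Gram matrices of two feature maps."""
    diff = gram_matrix(feat_a) - gram_matrix(feat_b)
    return float(np.sum(diff ** 2))

def style_difference_index(candidate: np.ndarray, history: list[np.ndarray]) -> float:
    """Hypothetical index: mean style distance from a candidate item to past purchases."""
    return float(np.mean([style_distance(candidate, h) for h in history]))

# Random stand-ins for CNN feature maps of three past purchases and one candidate.
rng = np.random.default_rng(0)
hist = [rng.random((8, 4, 4)) for _ in range(3)]
cand = rng.random((8, 4, 4))
print(style_difference_index(cand, hist))
```

Because the Gram matrix sums over spatial positions, it records which feature channels co-activate (texture, color, pattern) while discarding where they occur, which is why it is commonly used as a style rather than content descriptor.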
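The fixed-effects idea can be illustrated with a one-way within estimator. This is a minimal sketch, assuming a linear probability model with user fixed effects only; the thesis also controls for category and time, which would require additional fixed effects, and its exact specification is not reproduced here. The function name `within_fe_ols` and all data below are hypothetical and synthetic.

```python
import numpy as np

def within_fe_ols(y, X, groups):
    """OLS slopes for y = X @ beta + alpha_g + e, absorbing alpha_g by within-group demeaning."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    y_dm, X_dm = y.copy(), X.copy()
    for g in np.unique(groups):
        mask = groups == g
        y_dm[mask] -= y[mask].mean()          # remove the group (e.g. user) mean of the outcome
        X_dm[mask] -= X[mask].mean(axis=0)    # and of every regressor
    beta, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return beta

# Synthetic check: 50 users x 20 observations each, no noise.
rng = np.random.default_rng(42)
users = np.repeat(np.arange(50), 20)
style_diff = rng.random(users.size)           # style difference index of each observed item
log_price = rng.normal(5.0, 1.0, users.size)  # control variable
alpha = rng.normal(0.0, 1.0, 50)              # unobserved user-specific baseline
y = 0.8 * style_diff - 0.3 * log_price + alpha[users]
beta_hat = within_fe_ols(y, np.column_stack([style_diff, log_price]), users)
print(beta_hat)  # recovers roughly [0.8, -0.3] since the data contain no noise
```

Demeaning within each user removes any time-invariant user effect, so the estimated style coefficient is identified only from variation inside each user's own purchase record.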
In the second study, we design and implement a multi-modal fashion recommender system based on the large language model LLaMA. The goal is a personalized recommendation framework that understands both textual and visual information, can generate natural language, and supports zero-shot and few-shot recommendation. First, we use SVD-based collaborative filtering (SVD-CF) to generate a set of candidate products. We then build a natural-language prompt that combines the user's purchase history (product category, price, and image style description) with information on the candidate products, and feed it to the LLM to generate a top-k recommendation list. When constructing the input, we include image style descriptions generated by the image-understanding model Gemma3, so that the LLM can perceive each product's stylistic semantics and better match recommendations to user preferences. We further propose a "two-step recommendation" mechanism: the model first infers the user's preferences (e.g., preferred categories, price range, and style keywords) and then selects the recommendation list from the candidate set based on those preferences. Finally, we apply the parameter-efficient fine-tuning method Low-Rank Adaptation (LoRA) to adapt LLaMA to the task, improving generation quality, especially when only a few training samples are available.
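Candidate generation with SVD-based collaborative filtering can be sketched as a truncated SVD of the user-item matrix, assuming a small dense binary purchase matrix; a production system would typically use a sparse or regularized factorization instead. The function `svd_candidates` and its parameters are hypothetical.

```python
import numpy as np

def svd_candidates(R: np.ndarray, user: int, k_factors: int, n_candidates: int) -> list[int]:
    """Top-n unseen items for `user`, scored by a rank-k SVD reconstruction of R."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    R_hat = U[:, :k_factors] @ np.diag(s[:k_factors]) @ Vt[:k_factors, :]
    scores = R_hat[user].copy()
    scores[R[user] > 0] = -np.inf             # exclude items the user has already bought
    order = np.argsort(-scores)               # highest predicted score first
    return [int(i) for i in order[:n_candidates] if np.isfinite(scores[i])]

# Toy purchase matrix: 4 users x 5 items (1 = purchased).
R = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 0, 0, 1, 1],
], dtype=float)
print(svd_candidates(R, user=0, k_factors=2, n_candidates=2))
```

In a pipeline like the one described, this step only narrows the catalog to a manageable candidate set; the final ranking is left to the language model.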
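The two-step mechanism amounts to two prompts chained through the model's own output. The sketch below only builds the prompts and parses the reply. The template wording, the `Item` fields, and all helper names are hypothetical, and the actual LLaMA call is omitted; in the usage lines, the step-1 answer is a hand-written placeholder standing in for what the model would return.

```python
from dataclasses import dataclass

@dataclass
class Item:
    item_id: str
    category: str
    price: float
    style: str  # style description, e.g. produced by an image-understanding model

def describe(item: Item) -> str:
    return f"[{item.item_id}] {item.category}, price {item.price:.2f}, style: {item.style}"

def preference_prompt(history: list[Item]) -> str:
    """Step 1: ask the LLM to summarize the user's preferences from purchase history."""
    lines = "\n".join(describe(i) for i in history)
    return ("A customer bought these items:\n" + lines +
            "\nSummarize their preferred categories, price range and style keywords.")

def recommendation_prompt(preferences: str, candidates: list[Item], k: int) -> str:
    """Step 2: ask the LLM to pick k items from the candidate set given the preference summary."""
    lines = "\n".join(describe(i) for i in candidates)
    return (f"Customer preferences: {preferences}\n"
            f"Candidate items:\n{lines}\n"
            f"Return the IDs of the {k} best-matching candidates, one per line.")

def parse_recommendations(output: str, candidates: list[Item], k: int) -> list[str]:
    """Keep only IDs that exist in the candidate set, in order, without duplicates."""
    valid = {c.item_id for c in candidates}
    ids: list[str] = []
    for token in output.replace(",", " ").split():
        tok = token.strip("[]().")
        if tok in valid and tok not in ids:
            ids.append(tok)
    return ids[:k]

history = [Item("H1", "dress", 59.0, "floral, pastel, romantic"),
           Item("H2", "blouse", 39.5, "lace, light, feminine")]
candidates = [Item("A1", "skirt", 45.0, "pleated, pastel"),
              Item("B7", "dress", 65.0, "floral print, flowing"),
              Item("C3", "jacket", 120.0, "leather, dark, edgy")]
step1 = preference_prompt(history)
prefs = "dresses and tops, 40 to 70, floral and pastel styles"  # placeholder for the step-1 answer
step2 = recommendation_prompt(prefs, candidates, k=2)
print(parse_recommendations("1. [B7]\n2. A1\n3. ZZ9", candidates, k=2))  # ['B7', 'A1']
```

Filtering the reply against the candidate set is one simple guard against the model naming items that were never offered.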
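The LoRA reparameterization itself is compact. The NumPy sketch below shows a single adapted linear layer, assuming the standard formulation h = Wx + (alpha / r) BAx with W frozen, A randomly initialized and B initialized to zero. It illustrates only the forward pass and weight merging, not training or the thesis's fine-tuning setup; the class name and the hyperparameters are illustrative.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update scaled by alpha / r."""

    def __init__(self, W: np.ndarray, r: int, alpha: float, rng: np.random.Generator):
        d_out, d_in = W.shape
        self.W = W                                      # pretrained weight, kept frozen
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))  # trainable down-projection, random init
        self.B = np.zeros((d_out, r))                   # trainable up-projection, zero init
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out); the low-rank path never forms a d_out x d_in matrix
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self) -> np.ndarray:
        # after training, fold the update into W so inference costs the same as the base layer
        return self.W + self.scale * self.B @ self.A

    def trainable_params(self) -> int:
        return self.A.size + self.B.size

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 32))
layer = LoRALinear(W, r=4, alpha=8.0, rng=rng)
x = rng.normal(size=(3, 32))
same_at_init = np.allclose(layer(x), x @ W.T)  # True: B = 0, so the adapter starts as a no-op
layer.B = rng.normal(size=layer.B.shape)      # stand-in for weights learned during fine-tuning
print(layer.trainable_params(), W.size)       # 192 512
```

Only r * (d_in + d_out) parameters are trained per adapted matrix, which is what makes the method attractive when few labeled samples are available.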
We compare the proposed recommender systems experimentally under multiple evaluation settings. The results show that the LLM-based recommender achieves comparable recommendation performance in both zero-shot and few-shot scenarios. Adding the two-step recommendation mechanism significantly improves Hit Rate, Recall, and NDCG. Incorporating image information and LoRA fine-tuning further enhances the model's interpretability and stability. We also conduct a sensitivity analysis on the length of users' behavior sequences to examine how performance varies with sequence length, and we analyze error cases to identify why the model sometimes infers inaccurate preferences and how this could be improved.
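For reference, the three ranking metrics can be computed per user as below, assuming binary relevance (an item counts as relevant if the user actually purchased it in the held-out period) and a top-k list without duplicates. Definitions of Hit Rate vary across papers, and the thesis's exact evaluation protocol is not restated here; per-user scores are typically averaged over all test users.

```python
import math

def hit_rate_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """1.0 if at least one relevant item appears in the top k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0

def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)

def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """DCG of the top k with log2 position discounts, divided by the best achievable DCG."""
    dcg = sum(1.0 / math.log2(pos + 2) for pos, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(pos + 2) for pos in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

ranked = ["a", "b", "c", "d"]
relevant = {"c", "x"}
print(hit_rate_at_k(ranked, relevant, 3), recall_at_k(ranked, relevant, 3))  # 1.0 0.5
print(round(ndcg_at_k(ranked, relevant, 3), 4))  # 0.3066
```

Hit Rate and Recall ignore where in the list a hit occurs, whereas NDCG rewards placing relevant items near the top, which is why the three are usually reported together.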
In conclusion, this thesis contributes to both theory and methodology. Theoretically, using real purchase data, it reveals how visual style operates in consumer choice behavior and highlights the important influence of subjective visual perception on consumer decision-making. Methodologically, it proposes an LLM-based recommendation framework that integrates images, text, and structured behavioral data, unifying style perception, language generation, and recommendation optimization in low-resource scenarios. The two studies, one modeling consumers' visual perception and the other implementing recommendation technology, complement each other and form a closed research loop, providing empirical evidence and a methodological basis for building recommender systems that better understand users and show greater aesthetic intelligence.
| Date of Award | 6 Aug 2025 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Shaoyi Stephen LIAO (Supervisor) |
Keywords
- subjective visual preference
- fashion recommendation system
- multi-modal learning
- large language model
- parameter efficient fine-tuning