LLMs for RecSys

  • As mentioned in Sumit’s Diary, LLMs can make recommendations based on their understanding of natural language, even in the absence of specific behavioral data. For instance, an LLM can recommend buying a turkey for Thanksgiving without any explicit click data related to turkeys or Thanksgiving.
  • Researchers have proposed various approaches to leverage LLMs in recommender systems. These approaches involve transforming recommendation tasks into language understanding or generation frameworks. The article highlights notable research conducted in this area, focusing on the use of LLMs to enhance recommender systems.
  • LLMs offer several advantages for building recommendation systems, making them an appealing choice in this domain.
  • One key advantage is that LLMs can utilize user behavior data by incorporating it into the task descriptions or prompts. This allows the knowledge stored in the LLM parameters to generate personalized recommendations. The reasoning capability of LLMs enables them to infer user interests based on the contextual information provided through prompts.
  • LLMs have also demonstrated effectiveness in zero and few-shot domain adaptation. This means that even with limited task-specific data, startups and businesses can leverage LLMs to expand into new domains and deploy recommendation applications. This adaptability of LLMs enables flexibility and scalability in building recommender systems.
  • Traditional large-scale recommender systems often involve multi-step cascade pipelines. However, using a single LLM framework for recommendations can unify and streamline common improvements, such as bias reduction, which are typically fine-tuned at each step. Moreover, using a single LLM model for multiple recommendation tasks can reduce the carbon footprint by eliminating the need to train separate models for each task.
  • Since different recommendation tasks often share a common user-item pool and have overlapping contextual features, employing a unified LLM framework can benefit from joint learning and representation of inputs. This leads to improved generalization on unseen tasks and efficient utilization of shared information.
  • LLMs also possess interactive capabilities, which can aid in model explainability. The ability to provide explanations or insights into the recommendations enhances transparency and helps users understand the rationale behind the system’s suggestions.
  • Additionally, LLMs’ feedback mechanism allows for enhancing the overall user experience. By incorporating user feedback and iteratively refining the recommendations, LLM-based recommender systems can adapt and improve over time, leading to more relevant and satisfying user experiences.
  • Overall, the characteristics of LLMs make them well-suited for recommendation systems, offering benefits such as personalization, adaptability, unified frameworks, generalization, explainability, and enhanced user experience.
  • As Sumit’s Diary mentions, here are a few advantages of using LLMs for recommender systems:
    • LLMs excel in addressing sparsity scenarios and cold start problems. Even in scenarios where there is limited data available, LLMs with large parameter sizes have shown promising results comparable to heuristic-based baselines.
    • LLMs have the ability to adapt to new information without requiring changes to the model architecture or retraining. This adaptability allows for dynamic updates and incorporation of real-time data.
    • LLMs enable users to express their needs freely and precisely through natural language instructions, often via chat-based interfaces. This active involvement of users in the recommendation process goes beyond passive feedback, leading to more personalized and accurate recommendations.
    • Traditional recommendation algorithms are usually task-specific and require specific user-item interaction data for training. In contrast, LLMs can leverage user interaction data represented as sequences, allowing them to supplement their extensive world knowledge with user behavior information.
    • LLMs inherently possess a vast amount of world knowledge, reducing the dependency on large volumes of training data that traditional methods like collaborative filtering rely on. This makes LLMs more data-efficient and capable of providing recommendations with less training data.
    • LLMs simplify the complex feature processing methods used in traditional approaches. With prompts, feature processing and modeling steps are streamlined, reducing the complexity of recommendation systems.
    • LLMs have the ability to generate explanations for their recommendations. Through intermediate chain-of-thought reasoning, LLMs can provide natural language justifications, increasing the transparency and interpretability of the recommender system.
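  • The idea of placing behavior data and a handful of demonstrations directly in the prompt can be sketched as follows. This is a minimal illustration, not a method from any of the cited works: the function name `build_prompt`, the instruction wording, and the sample items are all assumptions made for the example.

```python
from typing import Sequence, Tuple


def build_prompt(
    history: Sequence[str],
    context: str = "",
    examples: Sequence[Tuple[Sequence[str], str]] = (),
) -> str:
    """Assemble a recommendation prompt from behavior data.

    `examples` holds optional (past_items, next_item) demonstrations.
    Leaving it empty yields a zero-shot prompt; adding a few yields a
    few-shot prompt. The wording below is illustrative only.
    """
    lines = [
        "You are a shopping assistant. "
        "Recommend one item the user is likely to want next."
    ]
    # Few-shot demonstrations, if any, come before the real query.
    for past_items, next_item in examples:
        lines.append(f"History: {', '.join(past_items)}")
        lines.append(f"Recommendation: {next_item}")
    # Free-text context lets the model use world knowledge (e.g. a holiday).
    if context:
        lines.append(f"Context: {context}")
    lines.append(f"History: {', '.join(history)}")
    lines.append("Recommendation:")
    return "\n".join(lines)


prompt = build_prompt(
    history=["cranberry sauce", "stuffing mix"],
    context="Today is Thanksgiving Day in the United States.",
    examples=[(["tent", "sleeping bag"], "camping stove")],
)
print(prompt)
```

  • The resulting string can be sent to any text-generation model; no behavioral training data for the target domain is needed beyond what appears in the prompt.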


Transformers4Rec: Bridging the Gap between NLP and Sequential / Session-Based Recommendation

  • Transformers4Rec is introduced as an open-source library built upon HuggingFace’s Transformers library, aiming to bring the advancements of Natural Language Processing (NLP) based Transformers to the field of recommender systems. The library is designed to be extensible, user-friendly, and suitable for both research and industrial applications.
  • To showcase the effectiveness of Transformer architectures in sequential and session-based recommendation tasks, the library was used to win two recent session-based recommendation competitions. Furthermore, a comprehensive empirical analysis comparing various Transformer architectures and training approaches was conducted for session-based recommendation. The study demonstrates that the best Transformer architectures outperform baselines on e-commerce datasets and perform similarly on news datasets.
  • The effectiveness of different training techniques, including causal language modeling, masked language modeling, permutation language modeling, and replacement token detection, was evaluated using the XLNet Transformer architecture. It was found that training XLNet with replacement token detection yields good results across all datasets.
  • Additionally, techniques for incorporating side information such as item and user context features were explored. The study establishes that including side information consistently improves recommendation performance.
  • The research highlights the potential of Transformers in recommendation systems and provides insights into the best practices and performance of various Transformer architectures and training techniques.
  • Transformers4Rec addresses the problem of leveraging the advancements in Transformer architectures from the field of Natural Language Processing (NLP) for sequential and session-based recommendation tasks. While Transformers have been highly successful in NLP, their application in recommender systems has been relatively limited.
    • NLP advancements have inspired researchers in the recommender systems (RecSys) field to adapt NLP architectures for sequential and session-based recommendation.
  • Early neural language models focused on learning representations for words, sentences, and paragraphs, which were then adapted in RecSys to learn item, user, or context embeddings based on co-occurrence within user interactions.
  • RNNs were also utilized for sequential and session-based recommendation, considering the sequential nature of item interactions. GRU4REC introduced pairwise loss functions for efficient training, and additional features (side information) were incorporated in subsequent works.
  • Attention mechanisms, introduced in 2016, showed effectiveness in handling long sequences in NLP. NARM incorporated attention into an RNN architecture for recommender systems, capturing sequential behavior and user intent in the current session. Attentional FM used attention to learn the importance of feature interactions in non-sequential models.
  • Transformers were introduced in 2017 as an efficient alternative to RNN-based sequential encoders for NLP tasks, offering benefits such as parallel processing and scalability for long sequences.
  • Transformer architectures like GPT-2, BERT, XLNet, and Transformer-XL proposed novel pre-training approaches and adapted self-attention mechanisms to address language modeling specificities.
  • Transformers have been adapted for sequential recommendation, with models like AttRec, SASRec, BERT4Rec, and SSE-PT utilizing self-attention to infer item-item relationships and capture user preferences in historical interactions.
  • Time elapsed between user interactions is important for predicting current interests, and approaches like discretizing elapsed time and representing it as categorical feature embeddings have been explored.
  • Side information, such as user contextual features and heterogeneous user behaviors, has been incorporated in Transformer models for recommendation tasks.
  • Transformers have also been applied to session-based recommendation, outperforming RNNs even for shorter sessions. Techniques like preference-aware masks and modified self-attention mechanisms have been proposed to improve session modeling.
  • Auto-regressive (CLM) and autoencoding (MLM) approaches have been used for session-based recommendation with Transformers, with some works incorporating future in-session interactions during training.
  • This work aims to perform a comprehensive analysis of different Transformer architectures and training approaches for session-based recommendation, and explores techniques to leverage side information for improved accuracy. Additionally, it focuses on Transformers for news recommendation, addressing specific challenges like shorter sessions and item relevance decay.
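  • The difference between the auto-regressive (CLM) and autoencoding (MLM) training approaches comes down to how inputs and targets are derived from a session. The sketch below shows the general idea in plain Python; it is not the Transformers4Rec implementation, and the `[MASK]` placeholder, function names, and masking probability are assumptions chosen for illustration.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"  # hypothetical placeholder for a hidden item


def causal_lm_example(session: List[str]) -> Tuple[List[str], List[str]]:
    """CLM: position t is trained to predict item t+1 from earlier items only."""
    return session[:-1], session[1:]


def masked_lm_example(
    session: List[str],
    mask_prob: float = 0.3,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """MLM: hide random positions; the model predicts them from both sides.

    Assumes a non-empty session. Targets are None where nothing is predicted.
    """
    rng = rng or random.Random(0)
    masked = [rng.random() < mask_prob for _ in session]
    if not any(masked):
        masked[-1] = True  # guarantee at least one training target
    if all(masked) and len(session) > 1:
        masked[0] = False  # keep at least one visible item as context
    inputs = [MASK if m else item for item, m in zip(session, masked)]
    targets = [item if m else None for item, m in zip(session, masked)]
    return inputs, targets


session = ["shoes", "socks", "laces", "polish"]
print(causal_lm_example(session))
print(masked_lm_example(session))
```

  • Because MLM targets can sit in the middle of a session, the model sees later interactions while predicting earlier ones, which is the "future in-session interactions" effect noted above.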

  • Transformers4Rec’s Meta-Architecture consists of several modules: Features Processing, Sequence Masking, Sequence Processing (with configurable Transformer blocks), and Prediction head.
  • Input features, such as sparse categorical or continuous numerical features, are normalized and combined by the Features Processing module to generate the interaction embedding.
  • The sequence of interaction embeddings is masked by the Sequence Masking module based on the training approach (e.g., Causal LM, Masked LM).
  • The masked sequence is then processed by the Sequence Processing module, which contains stacked Transformer blocks. The number of blocks and the architecture type (e.g., GPT-2, Transformer-XL, XLNet, Electra) can be configured.
  • The Sequence Processing module outputs a vector for each position in the sequence, representing a sequence embedding.
  • The Prediction head module can be configured for different tasks, such as item prediction for item recommendation or sequence-level predictions for classification or regression.
  • The item prediction head used in the experiments consists of an output layer that uses the tying embeddings technique, followed by a softmax layer that predicts relevance scores for all items.
  • Transformers4Rec supports multiple interaction-level features, which can be normalized and combined in different ways.
  • Two aggregation functions are available: concatenation merge and element-wise merge.
  • The tying embeddings technique ties the input embedding weights to the output projection layer matrix, which reduces the number of model parameters and introduces a matrix factorization operation.
  • The Meta-Architecture supports various regularization techniques, including Dropout, Weight Decay, Softmax Temperature Scaling, Layer Normalization, Stochastic Shared Embeddings, and Label Smoothing.
  • Different loss functions, such as cross-entropy and pairwise losses, can be used for training.
  • The Meta-Architecture modules are regular PyTorch modules, allowing customization and extensibility, such as combining multiple input sequences or enabling multi-task learning.
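  • To make the tied-embedding prediction head concrete, the sketch below scores every item by taking the dot product between a sequence embedding and the same item embedding table used on the input side, then applies a temperature-scaled softmax. It uses plain Python lists instead of PyTorch so it runs anywhere; the function names and the toy two-dimensional embeddings are assumptions for illustration, not the library's code.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def score_items(
    sequence_embedding: List[float],
    item_embeddings: List[List[float]],
    temperature: float = 1.0,
) -> List[float]:
    """Tied embeddings: the input item table doubles as the output projection.

    Each logit is the dot product of the sequence embedding with one item
    embedding, which is the matrix-factorization-style operation.
    """
    logits = [
        sum(h * e for h, e in zip(sequence_embedding, item))
        for item in item_embeddings
    ]
    return softmax(logits, temperature)


# Toy item table (3 items, 2 dimensions) and one sequence embedding.
item_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
sequence_embedding = [0.9, 0.1]
scores = score_items(sequence_embedding, item_embeddings)
print(scores)
```

  • Raising `temperature` above 1 flattens the score distribution, which is why softmax temperature scaling can act as a regularizer.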

Reasoning with LLM

  • LLMs like GPT-4 are designed to learn contextual meaning through their word embeddings. These models utilize sophisticated techniques such as transformer architectures to capture the relationships between words and their context in a given sentence or text.
  • As a result, LLMs can distinguish between multiple meanings of a word based on the context in which it appears. For example, LLMs can understand that the word “bank” can refer to a financial institution or a riverbank, depending on the surrounding words and the overall context of the sentence.
  • By considering the words and phrases nearby, LLMs can make more accurate interpretations and generate appropriate responses or predictions.
  • This contextual understanding is achieved through the training process of LLMs, where they learn to associate word embeddings with the context in which they occur. The models capture the statistical patterns and relationships between words and their surrounding words, allowing them to grasp subtle nuances and meanings.
  • The image below (source) shows how an LLM can infer from a user’s purchase history that they are throwing a party. Thus, instead of only recommending items similar to the ones the user requested, the LLM can recommend plates and other party favors.

Ranking with LLM

  • The following post by Srijan Kumar, which I found fascinating, summarizes the paper Large Language Models are Zero-Shot Rankers for Recommender Systems. Serving recommendations in a zero-shot manner through prompt engineering alone would be a significant addition to current recommender systems.
  • “Can LLMs (e.g., ChatGPT, Google’s PaLM 2, Meta’s LLaMA, etc.) be used as recommender systems only using prompt engineering and no training? Yes! I came across this interesting paper leveraging LLMs’ zero-shot generalization capabilities for sequential recommendations. #AI
  • The recommendation task is formalized as a conditional ranking task. The prompt includes a user’s past sequential interactions as “conditions” and a set of “candidate” items. The LLMs are instructed to rank the candidate set for recommendation in the order of interaction likelihood.
  • 👉The following prompts were used: “[pattern that contains sequential historical interactions H] [pattern that contains retrieved candidate items C] Please rank these movies by measuring the possibilities that I would like to watch next most, according to my watching history. You MUST rank the given candidate movies. You cannot generate movies that are not in the given candidate list.”
  • Three prompting strategies provide the sequential historical interactions:
    1. Sequential prompting: “I’ve watched the following movies in the past in order: ‘0. Multiplicity’, ‘1. Jurassic Park’…”
    2. Recency-focused prompting: “I’ve watched the following movies in the past in order: ‘0. Multiplicity’, ‘1. Jurassic Park’,… Note that my most recently watched movie is Dead Presidents…”
    3. In-context learning: “If I’ve watched the following movies in the past in order: ‘0. Multiplicity’, ‘1. Jurassic Park’,… then you should recommend Dead Presidents to me and now that I’ve watched Dead Presidents, then…”
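  • A rough sketch of how such a ranking prompt could be assembled is shown below. The instruction text and the history wording follow the quoted prompts; the candidate wording in `candidate_pattern`, all function names, and the candidate titles are assumptions, since the post does not spell out the candidate pattern. The in-context learning variant is left out because its template is elided in the quote. `keep_valid_ranking` is an added safeguard that drops any title the model returns from outside the candidate list.

```python
from typing import List

INSTRUCTION = (
    "Please rank these movies by measuring the possibilities that I would "
    "like to watch next most, according to my watching history. You MUST "
    "rank the given candidate movies. You cannot generate movies that are "
    "not in the given candidate list."
)


def history_pattern(history: List[str], recency_focused: bool = False) -> str:
    """Sequential prompting, optionally with the recency-focused note."""
    numbered = ", ".join(f"'{i}. {t}'" for i, t in enumerate(history))
    text = f"I've watched the following movies in the past in order: {numbered}."
    if recency_focused:
        text += f" Note that my most recently watched movie is {history[-1]}."
    return text


def candidate_pattern(candidates: List[str]) -> str:
    """Hypothetical wording for the candidate list (not given in the post)."""
    numbered = ", ".join(f"'{i}. {t}'" for i, t in enumerate(candidates))
    return f"There are {len(candidates)} candidate movies I can watch next: {numbered}."


def build_ranking_prompt(
    history: List[str], candidates: List[str], recency_focused: bool = False
) -> str:
    parts = [history_pattern(history, recency_focused),
             candidate_pattern(candidates), INSTRUCTION]
    return " ".join(parts)


def keep_valid_ranking(ranked: List[str], candidates: List[str]) -> List[str]:
    """Drop titles outside the candidate list and duplicates, keeping order."""
    allowed, seen, result = set(candidates), set(), []
    for title in ranked:
        if title in allowed and title not in seen:
            seen.add(title)
            result.append(title)
    return result


history = ["Multiplicity", "Jurassic Park", "Dead Presidents"]
candidates = ["Candidate A", "Candidate B", "Candidate C"]  # placeholders
print(build_ranking_prompt(history, candidates, recency_focused=True))
```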

👉Findings: (1) LLMs have promising zero-shot ranking abilities compared to prior zero-shot ranking models. (2) Simple prompts lead the LLM to ignore the interaction order. (3) Recency-focused prompts & in-context learning force the LLM to make order-aware recommendations. (4) Larger LLM models gives better performance: GPT3.5-turbo > text-davinci-003 > LLaMA-65B (5) However, there is a HUGE gap between zero-shot LLMs and the standard fully-trained recommender system models (e.g., SASRec).

👉Why do LLMs work as recommendation systems? LLMs have general information about the products present in the datasets about movies and games. LLMs are able to leverage the past products present in the sequence to present similar products as recommendations, akin to collaborative filtering.

👉My take:

  • I do not expect recommender systems to be replaced by LLMs any time soon. That said, the future of recommender systems is bright by combining the power of LLMs with more traditional recommender systems.

  • Several ways to improve such hybrid systems: augmenting LLMs with product knowledge graphs will improve the inherent knowledge that LLMs leverage to generate more reasonable recommendations. Other ways are to use better prompting and few-shot learning methods.
  • In the ranking phase of recommendation engines, the goal is to sort a list of candidate items based on specific criteria or preferences. This sorting process helps determine the order in which the items will be presented to the users. Traditionally, learning-to-rank libraries like TensorFlow Ranking have been used to train models that can predict the ranking or ordering of items.
  • With the PaLM API (PaLM stands for Pathways Language Model), you can now use large language models to perform the ranking task as well. For example, consider the scenario of predicting movie ratings. Using the PaLM API, you can provide a list of candidate movies and ask the model to predict a rating for each movie individually. You can then sort the movies by predicted rating in descending order to obtain the final ranking.
  • This ranking approach, where the model predicts a score or rating for each item independently and the items are then sorted based on these scores, is known as “pointwise ranking.” It is a straightforward method where each item is considered in isolation during the ranking process.
  • Alternatively, you can also leverage the PaLM API for other ranking strategies such as pairwise ranking or listwise ranking. Pairwise ranking involves comparing items in pairs and predicting which item is preferred. Listwise ranking considers the entire list of items as a single entity and predicts the optimal order for the entire list.
  • By adjusting the prompt or input format to the PaLM API, you can adapt it for different ranking strategies beyond pointwise ranking.
  • To delve deeper into the topic of rating prediction using large language models, you can refer to the referenced paper from Google, which provides a comprehensive study on the subject.
  • In summary, the PaLM API allows you to leverage the power of large language models for the ranking phase of recommendation engines. It enables you to predict ratings or scores for candidate items, sort them based on these predictions, and ultimately generate personalized and relevant recommendations for users.
  • The example below (source) displays this in code.
import google.generativeai as palm  # assumes palm.configure(api_key=...) was called earlier

prompt = """You are a movie recommender and your job is to predict a user's rating (ranging from 1 to 5, with 5 being the highest) on a movie, based on that user's previous ratings.

User 42 has rated the following movies:
"Moneyball" 4.5
"The Martian" 4
"Pitch Black" 3.5
"12 Angry Men" 5

Predict the user's rating on "The Matrix". Output the rating score only. Do not include other text."""
response = palm.generate_text(model="models/text-bison-001", prompt=prompt)
print(response.result)
# 4.5
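  • The pointwise and pairwise strategies described above can be sketched independently of any particular API. In the example below, `predict_rating` and `prefers` are hypothetical callables standing in for LLM calls, and the ratings in `fake_ratings` are made-up placeholders so that the code runs offline.

```python
from functools import cmp_to_key
from typing import Callable, Dict, List


def parse_rating(text: str) -> float:
    """Turn a model reply such as '4.5' into a float clamped to the 1-5 scale."""
    return min(5.0, max(1.0, float(text.strip())))


def pointwise_rank(
    candidates: List[str], predict_rating: Callable[[str], float]
) -> List[str]:
    """Score each item in isolation, then sort by score, highest first."""
    return sorted(candidates, key=predict_rating, reverse=True)


def pairwise_rank(
    candidates: List[str], prefers: Callable[[str, str], bool]
) -> List[str]:
    """Order items with a comparator; prefers(a, b) means a ranks above b."""
    def compare(a: str, b: str) -> int:
        if prefers(a, b):
            return -1
        if prefers(b, a):
            return 1
        return 0
    return sorted(candidates, key=cmp_to_key(compare))


# Made-up predicted ratings that stand in for per-item LLM calls.
fake_ratings: Dict[str, float] = {
    "The Matrix": 4.5,
    "Candidate B": 3.0,
    "Candidate C": 4.0,
}
ranked = pointwise_rank(list(fake_ratings), fake_ratings.get)
print(ranked)
```

  • Pointwise ranking needs one model call per item, while pairwise ranking needs one per comparison. Sorting with a comparator also assumes the model's preferences are consistent, which is not guaranteed for LLM judgments.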

Next Recommendation Prediction

  • The research paper “Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5)”, by researchers at Rutgers University, identifies several use cases for LLMs in recommender systems, namely:
    • Sequential recommendation
    • Rating prediction
    • Explanation generation
    • Review summarization
    • Direct recommendation
  • The researchers assigned unique IDs to users and items in order to train a large language model (LLM) for building a recommendation system. Here’s an explanation of the key concepts:
    1. Unique IDs for Users and Items: Each user and item in the training set is assigned a unique ID. These IDs serve as identifiers that the LLM uses to learn patterns and relationships between users and items based on their interactions, such as purchase histories.
    2. Learning Similarity and User Preferences: By analyzing the training set with thousands of users and their purchase histories, the LLM learns that certain items are similar to each other and that certain users tend to have preferences towards specific items. This understanding is achieved through the self-attention mechanism, a key component of the LLM architecture that allows the model to focus on different parts of the input sequence and capture relationships between different elements.
    3. Collaborative Filtering: During the pre-training process of the LLM, it effectively undergoes a form of collaborative filtering. Collaborative filtering involves analyzing user-item interactions to identify patterns of similarity or co-occurrence. In this case, the LLM learns from observing which users have purchased the same items and which items tend to be purchased together. This collaborative filtering information is incorporated into the model’s knowledge.
    4. Contextual Embeddings: The LLM’s ability to produce contextual embeddings is crucial for the recommendation system. Contextual embeddings capture the meaning and relationships between words or entities within a given context. In this case, the LLM generates contextual embeddings for the user and item IDs, enabling it to understand the associations and preferences based on the purchase sequences it has learned during training.
    5. Leveraging T5 Architecture: The architecture described, referred to as “P5,” utilizes the pretrained T5 checkpoints as its backbone. T5 is another large language model developed by Google, specifically designed to handle various sequence-to-sequence tasks. Leveraging T5 as a starting point for this recommendation system allows the model to benefit from its capabilities in understanding and generating sequences, which can be adapted for the specific task of recommendations.
  • By combining the power of contextual embeddings, collaborative filtering, and the underlying architecture of the T5 LLM, the recommendation system can infer associations between items and users based on observed purchase patterns. This enables the system to make personalized recommendations and suggest items that users might be interested in, even if the specific correspondence between item IDs and their real-world representations is not known.
Input: "I find the purchase history list of user_15466:
4110 -> 4467 -> 4468 -> 4472
I wonder what is the next item to recommend to the user. Can you help
me decide?"
Output: "1581"
Input: ITEMS PURCHASED: {Soccer Goal Post, Soccer Ball, Soccer Cleats, Goalie Gloves}  CANDIDATES FOR RECOMMENDATION: {Soccer Jersey, Basketball Jersey, Football Jersey, Baseball Jersey, Tennis Shirt, Hockey Jersey, Basketball, Football, Baseball, Tennis Ball, Hockey Puck, Basketball Shoes, Football Cleats, Baseball Cleats, Tennis Shoes, Hockey Helmet, Basketball Arm Sleeve, Football Shoulder Pads, Baseball Cap, Tennis Racket, Hockey Skates, Basketball Hoop, Football Helmet, Baseball Bat, Hockey Stick, Soccer Cones, Basketball Shorts, Baseball Glove, Hockey Pads, Soccer Shin Guards, Soccer Shorts}  RECOMMENDATION: 
Target Output: Soccer Jersey
  • They found that after fine-tuning the T5 model using Hugging Face’s Trainer API (Seq2SeqTrainer for ~10 epochs), they were able to obtain good results! Some example evaluations they provided were:
Input: ITEMS PURCHASED: {Basketball Jersey, Basketball, Basketball Arm Sleeve}  CANDIDATES FOR RECOMMENDATION: {Soccer Jersey, Football Jersey, Baseball Jersey, Tennis Shirt, Hockey Jersey, Soccer Ball, Football, Baseball, Tennis Ball, Hockey Puck, Soccer Cleats, Basketball Shoes, Football Cleats, Baseball Cleats, Tennis Shoes, Hockey Helmet, Goalie Gloves, Football Shoulder Pads, Baseball Cap, Tennis Racket, Hockey Skates, Soccer Goal Post, Basketball Hoop, Football Helmet, Baseball Bat, Hockey Stick, Soccer Cones, Basketball Shorts, Baseball Glove, Hockey Pads, Soccer Shin Guards, Soccer Shorts}  RECOMMENDATION: 
Model Output: Basketball Shoes
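  • Input/target pairs in the "ITEMS PURCHASED / CANDIDATES FOR RECOMMENDATION" format shown above could be generated along the following lines. This is a guess at the data preparation, not the authors' code: holding out the last purchase as the target and using the rest of the catalog as candidates are assumptions inferred from the examples, and the function names are hypothetical.

```python
from typing import List, Tuple


def format_input(purchased: List[str], candidates: List[str]) -> str:
    """Render one model input in the format used by the examples above."""
    return (
        "ITEMS PURCHASED: {" + ", ".join(purchased) + "}  "
        "CANDIDATES FOR RECOMMENDATION: {" + ", ".join(candidates) + "}  "
        "RECOMMENDATION: "
    )


def make_example(basket: List[str], catalog: List[str]) -> Tuple[str, str]:
    """Hold out the last purchase as the target (an assumption).

    Candidates are every catalog item not already shown as purchased,
    so the target is always among them.
    """
    *purchased, target = basket
    candidates = [item for item in catalog if item not in purchased]
    return format_input(purchased, candidates), target


catalog = ["Soccer Ball", "Soccer Cleats", "Soccer Jersey", "Basketball"]
basket = ["Soccer Ball", "Soccer Cleats", "Soccer Jersey"]
source_text, target_text = make_example(basket, catalog)
print(source_text)
print(target_text)
```

  • Pairs like `(source_text, target_text)` are the kind of text-to-text data a sequence-to-sequence model such as T5 can be fine-tuned on.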

Sequential Recommendation

  • Sequential recommendations refer to the process of utilizing the historical activities or interactions of users with items to infer their preferences and make personalized recommendations. By analyzing the sequence of items that users have interacted with over time, a recommender system can identify patterns and make predictions about what items the users may be interested in next.
  • Traditionally, sequential recommendations have been implemented using machine learning libraries specifically designed for this purpose, such as TensorFlow Recommenders. These libraries provide tools and algorithms to model the sequential nature of user-item interactions and make predictions based on that information.
  • However, with the emergence of powerful Large Language Models (LLMs), such as those available through the PaLM API Text service, it is now possible to use these models for sequential recommendations as well. LLMs are trained on vast amounts of textual data and can understand and generate sequences. With the PaLM API Text service, you can have an LLM analyze the sequence of user-item interactions and generate recommendations based on the inferred user preferences.
  • The advantage of using LLMs for sequential recommendations is that they have a deeper understanding of the textual context and can capture more nuanced patterns in the sequence of interactions. They can consider not only the immediate past interactions but also the broader context and semantic relationships between items. This can lead to more accurate and personalized recommendations for users based on their historical activities.
  • By combining the power of LLMs with sequential recommendation techniques, you can enhance the effectiveness of your recommender system and provide users with tailored recommendations based on their past behaviors and preferences.

  • The example below (source) shows how LLMs can make sequential recommendations from historical preferences.
import google.generativeai as palm  # assumes palm.configure(api_key=...) was called earlier

prompt = """You are a movie recommender and your job is to recommend new movies based on the sequence of movies that a user has watched. You pay special attention to the order of movies because it matters.

User 42 has watched the following movies sequentially:

"Margin Call",
"The Big Short",
"The Martian",

Recommend three movies and rank them in terms of priority. Titles only. Do not include any other text."""

response = palm.generate_text(
   model="models/text-bison-001", prompt=prompt, temperature=0
)
print(response.result)
# 1. The Wolf of Wall Street
# 2. The Social Network
# 3. Inside Job
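  • Since the prompt relies on watch order, the titles have to be sorted by interaction time before they are written into it. The sketch below shows one way to do that, assuming the interaction log is a list of `(title, timestamp)` tuples; that schema, the function names, and the `max_items` cutoff are hypothetical. The prompt wording mirrors the example above.

```python
from typing import List, Tuple


def recent_titles(events: List[Tuple[str, int]], max_items: int = 20) -> List[str]:
    """Sort interactions by timestamp and keep the most recent ones in order."""
    ordered = sorted(events, key=lambda event: event[1])
    return [title for title, _ in ordered[-max_items:]]


def build_sequential_prompt(titles: List[str]) -> str:
    """Write the ordered titles into the sequential-recommendation prompt."""
    watched = "\n".join(f'"{title}",' for title in titles)
    return (
        "You are a movie recommender and your job is to recommend new movies "
        "based on the sequence of movies that a user has watched. You pay "
        "special attention to the order of movies because it matters.\n\n"
        "The user has watched the following movies sequentially:\n\n"
        f"{watched}\n\n"
        "Recommend three movies and rank them in terms of priority. "
        "Titles only. Do not include any other text."
    )


# Unordered log with illustrative integer timestamps.
events = [("The Martian", 3), ("Margin Call", 1), ("The Big Short", 2)]
prompt = build_sequential_prompt(recent_titles(events))
print(prompt)
```

  • Truncating to the latest `max_items` interactions keeps long histories within the model's input limit while preserving the most recent signal.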

Conversational Recommendations

  • With prompt engineering, LLMs can be a powerful tool to augment recommender systems, as shown below with Bard (source).
prompt = """You are a movie recommender and your job is to recommend new movies based on user input.
So for user 42, he is in the mood for some drama movies with artistic elements tonight.
Could you recommend three? Output the titles only. Do not include other text."""
response = palm.chat(messages=prompt)

# Sure, here are three drama movies with artistic elements that I recommend for user 42:
# 1. The Tree of Life (2011)
# 2. 20th Century Women (2016)
# 3. The Florida Project (2017)
# I hope you enjoy these movies!
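  • Note that the chat reply above includes extra sentences even though the prompt asked for titles only. A downstream system would therefore need to extract the titles itself. The sketch below is one simple way to do so, assuming the model keeps the numbered-list layout; the function name and the regular expression are illustrative choices.

```python
import re
from typing import List

# A numbered line such as "1. The Tree of Life (2011)"; the year is optional.
NUMBERED_LINE = re.compile(r"^\s*\d+\.\s*(.+?)\s*(?:\((\d{4})\))?\s*$")


def extract_titles(reply: str) -> List[str]:
    """Return the titles from numbered lines, ignoring any surrounding chatter."""
    titles = []
    for line in reply.splitlines():
        match = NUMBERED_LINE.match(line)
        if match:
            titles.append(match.group(1))
    return titles


reply = """Sure, here are three drama movies with artistic elements that I recommend for user 42:
1. The Tree of Life (2011)
2. 20th Century Women (2016)
3. The Florida Project (2017)
I hope you enjoy these movies!"""

print(extract_titles(reply))
```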

Text embedding-based recommendation