• Moderation is a vital component of any recommender system, as it ensures the system’s quality, safety, and trustworthiness. The goal of content moderation in recsys is twofold.
    • Firstly, it involves establishing and enforcing policies and guidelines that define what content is acceptable and what is not. These policies cover a wide range of areas, such as hate speech, violence, nudity, misinformation, and copyright infringement, among others. By defining clear guidelines, recsys can create a safe and inclusive environment for users.
    • Secondly, content moderation in recsys involves implementing robust mechanisms to identify and filter out content that violates the established policies. This process may include various techniques, such as automated algorithms, machine learning models, and human moderation teams. These mechanisms aim to detect and remove inappropriate or harmful content, ensuring that only relevant and trustworthy recommendations are presented to users.
  • Leading digital platforms establish comprehensive moderation policies that define what content is permissible and what violates their terms of service. These policies serve as guidelines for determining which types of content should be allowed and which should be restricted. They cover various aspects, such as hate speech, violence, explicit material, harassment, or other forms of prohibited content.
  • Moderating content is a delicate balance between protecting the users of the platform and maintaining freedom of expression and diversity of opinion; it thus involves a continual effort to maintain and evolve policies and moderation techniques.
  • Content moderation in RecSys is a complex task that requires a combination of automated systems, human expertise, and user participation. By implementing robust moderation policies and processes, platforms strive to create safe and inclusive environments where users can interact with content that aligns with the platform’s guidelines and community standards.
  • In this article, we will delve deeper into different ways this is done.

Types of Moderation

  • First, let’s start by looking at the different types of moderation a recommender system would need.
  • Overall, in any recommender system platform, there are three types of moderation needed: Text, Image, and Video.
  • For each of these modalities, we have different methods of execution: we can either leverage existing APIs or custom-build a moderation filter using existing pretrained models. Below, we will delve deeper into each one.
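  • As a rough sketch of how a platform might route each modality to its own filter, consider the toy dispatcher below. All function names, thresholds, and payload fields are illustrative, not a real API:

```python
from enum import Enum

class Modality(Enum):
    TEXT = "text"
    IMAGE = "image"
    VIDEO = "video"

# Illustrative per-modality checks; a real system would call ML models or hosted APIs.
def moderate_text(payload):
    # Placeholder: block text containing a flagged term.
    return "badword" not in payload.lower()

def moderate_image(payload):
    # Placeholder: reject images whose (precomputed) explicit-content score is too high.
    return payload.get("nsfw_score", 0.0) < 0.8

def moderate_video(payload):
    # Placeholder: a video passes only if every frame passes.
    return all(frame.get("nsfw_score", 0.0) < 0.8 for frame in payload)

DISPATCH = {
    Modality.TEXT: moderate_text,
    Modality.IMAGE: moderate_image,
    Modality.VIDEO: moderate_video,
}

def is_allowed(modality, payload):
    """Route content to the filter for its modality; True means it may be recommended."""
    return DISPATCH[modality](payload)

print(is_allowed(Modality.TEXT, "a perfectly normal caption"))  # True
```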


  • Text moderation involves removal of hateful, overtly sexual, violent, or self-harm speech depending on the policies generated by the recommender system platform.
  • Text moderation can leverage NLP techniques: recurrent neural networks (RNNs), convolutional neural networks (CNNs), or transformer models like BERT or GPT can be used for text classification, sentiment analysis, and identifying offensive or harmful content. Additionally, word embeddings (such as GloVe, FastText, or Word2Vec) place similar words close together in vector space, a property that can be exploited to detect inappropriate language.
  • Moreover, techniques such as sentiment analysis, text classification, and named entity recognition can be implemented using libraries like NLTK (Natural Language Toolkit), spaCy, or TensorFlow to detect hate speech, spam, or misinformation.
  • Existing tooling includes OpenAI’s moderation endpoint, which offers a classifier, available over an API, for moderating hateful, self-harm, sexual, or violent text. Additionally, WebPurify, Microsoft Content Moderator, and the Google Perspective API offer profanity-filtering capabilities to identify and block explicit or offensive language.
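  • The embedding idea above can be illustrated with a toy sketch: words similar to a seed set of known-offensive words sit close to that set’s centroid in embedding space. The tiny hand-made 2-D vectors below are fabricated for illustration; a real system would load pretrained GloVe, FastText, or Word2Vec vectors:

```python
import math

# Toy 2-D "embeddings" made up for illustration; real systems load pretrained vectors.
EMBEDDINGS = {
    "hate":   (0.9, 0.1),
    "attack": (0.8, 0.2),
    "insult": (0.85, 0.15),
    "hello":  (0.1, 0.9),
    "thanks": (0.05, 0.95),
}

# Centroid of a small seed set of known-offensive words.
SEEDS = ["hate", "attack"]
centroid = tuple(sum(EMBEDDINGS[w][i] for w in SEEDS) / len(SEEDS) for i in range(2))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def offensive_score(word):
    """Similarity to the offensive centroid; unknown words score 0."""
    vec = EMBEDDINGS.get(word)
    return cosine(vec, centroid) if vec else 0.0

print(offensive_score("insult") > offensive_score("thanks"))  # True
```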


  • Image moderation includes removal of sexual, violent, or other policy breaking content.
  • To build a custom image moderation filter, YOLO, an object detection algorithm that can identify and localize objects within an image in real time, can be adapted to detect explicit content such as nudity and violence.
  • Additional models like VGG16, Inception, or ResNet can be used for image classification and object recognition tasks when trained on a large dataset of explicit image data.
  • Existing tooling includes Google Cloud Vision, Amazon Rekognition, and Microsoft Azure Computer Vision, which provide pre-trained models for explicit content detection, facial recognition, and object recognition to moderate images.


  • Video moderation includes the removal of violent, sexual, hateful, or dangerous content.
  • Video moderation can be broken down into tasks such as speech recognition and copyright infringement detection. Let’s look into each.
  • Speech recognition: If someone tries to use the platform to promote a dangerous or illegal activity, the platform can transcribe and analyze the speech content of that video and moderate it.
  • Copyright infringement: If a user tries to repost content that they do not own, such as a song or a movie, video moderation can help remove that content.
  • Tools and Techniques:
    • Services like Google Cloud Speech-to-Text, IBM Watson Speech to Text, or Microsoft Azure Speech to Text can transcribe and analyze audio content within videos to detect hate speech, profanity, or other forms of harmful language.
    • RNNs, particularly Long Short-Term Memory (LSTM) networks, can be used to process sequential data within videos, enabling tasks like speech recognition or identifying patterns of hate speech or offensive language.
    • Optical Character Recognition (OCR) techniques, combined with NLP models, can be used to extract text from video frames, enabling analysis and moderation of text content within videos.
    • Video frames can be treated as a sequence of images, and 3D CNNs, such as C3D or I3D, can be used to analyze temporal information and detect specific actions, objects, or explicit content within videos.
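  • The last bullet, treating a video as a sequence of frames, can be sketched as follows. The per-frame scores stand in for the output of an image classifier (e.g., a CNN); the sampling rate and threshold are illustrative:

```python
def sample_frames(frames, every_n=5):
    """Keep every n-th frame to cut classification cost."""
    return frames[::every_n]

def video_violation_score(frame_scores, pool="max"):
    """Pool per-frame explicit-content scores into one video-level score.
    Max pooling flags a video if any sampled frame looks explicit."""
    if not frame_scores:
        return 0.0
    if pool == "max":
        return max(frame_scores)
    return sum(frame_scores) / len(frame_scores)  # mean pooling

# These scores stand in for a per-frame image classifier's output.
scores = [0.02, 0.03, 0.91, 0.04]
print(video_violation_score(scores) >= 0.8)  # True -> flag for review
```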


  • Let’s look at the overall workflow of a recommender system and how it pertains to moderation. We will then look into specific items within this flow.
    1. Integrity Processes: Major digital platforms have implemented sophisticated moderation policies to govern the publication, sharing, and amplification of content. Once the inventory is compiled, it undergoes scanning to identify any content that violates the platform’s policies. Additionally, there is a focus on identifying “borderline” content, which may not explicitly breach the platform’s terms of service but could potentially be problematic or offensive.
    2. Candidate Generation: Following the integrity processes, recommender systems engage in candidate generation or retrieval. This step involves reducing the vast number of items in the inventory to a more manageable set. Instead of individually ranking each item, approximate nearest neighbor (ANN) searches are often utilized. These searches identify items that align with a user’s preferences and interests, creating a preliminary selection of potential candidates.
    3. Ranking: Once the candidate pool is formed, recommender systems employ deep learning recommendation models to estimate the likelihood of user engagement with each item. By training these models, the system ranks the candidates based on their predicted relevance to the user’s interests. This ranking process considers factors such as user behavior, preferences, and other relevant signals.
    4. Re-ranking: While ranking algorithms have significantly improved, they can still benefit from additional refinement. To avoid repetitive or biased recommendations, a re-ranking step is introduced. This post-ranking phase incorporates hand-coded rules to ensure a diverse selection of content types and authors within the final ranked list. By promoting variety, the re-ranking step enhances the overall user experience and prevents monotony.
  • By incorporating these steps into the recommender system’s workflow, content moderation, integrity, and user-centric ranking are addressed, contributing to responsible AI practices.
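  • The four steps above can be sketched as a toy pipeline. The scoring fields are placeholders, and a simple sort stands in for a real ANN search:

```python
def integrity_filter(items):
    # Step 1: drop items that automated integrity classifiers marked as violating.
    return [it for it in items if not it["violates_policy"]]

def candidate_generation(items, k=3):
    # Step 2: stand-in for ANN retrieval -- keep the k items with highest user affinity.
    return sorted(items, key=lambda it: it["user_affinity"], reverse=True)[:k]

def rank(items):
    # Step 3: order candidates by a predicted-engagement score.
    return sorted(items, key=lambda it: it["predicted_engagement"], reverse=True)

def rerank(items):
    # Step 4: crude diversity rule -- skip an item whose author just appeared.
    out, last_author = [], None
    for it in items:
        if it["author"] != last_author:
            out.append(it)
            last_author = it["author"]
    return out

def recommend(inventory):
    return rerank(rank(candidate_generation(integrity_filter(inventory))))

inventory = [
    {"id": 1, "violates_policy": True,  "user_affinity": 0.9, "predicted_engagement": 0.9,  "author": "a"},
    {"id": 2, "violates_policy": False, "user_affinity": 0.8, "predicted_engagement": 0.5,  "author": "a"},
    {"id": 3, "violates_policy": False, "user_affinity": 0.7, "predicted_engagement": 0.9,  "author": "b"},
    {"id": 4, "violates_policy": False, "user_affinity": 0.6, "predicted_engagement": 0.7,  "author": "b"},
    {"id": 5, "violates_policy": False, "user_affinity": 0.1, "predicted_engagement": 0.99, "author": "c"},
]
print([it["id"] for it in recommend(inventory)])  # [3, 2]
```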

Early and Late stage Moderation

  • Now, let’s look at which stages these moderation steps are executed in and how they differ. Both early stage and late stage moderation are important components of content filtering in recommender systems.
  • Early stage moderation focuses on automated techniques to quickly filter out obvious violations, while late stage moderation involves human intervention and review to handle more nuanced and complex cases.
  • The combination of these approaches helps maintain a safe and responsible user experience in recommender systems. Below we will look at both of these stages in more detail.

Early Stage Moderation:

  • The first stage of content moderation involves removing or flagging undesirable items from a pool of content. Moderation is a complex process that includes policy-making, human content raters, automated classifiers, and an appeals process.
  • In this context, moderation specifically refers to the automated processes that remove content from consideration for recommendation. Companies can be held responsible for hosting content related to various issues like copyright infringement, defamation, child sexual abuse material (CSAM), or hate speech. Platforms also have policies to filter out harmful content such as nudity, coordinated inauthentic behavior, or public health misinformation.
  • Automated filters play a significant role in moderating content, catching different categories of undesirable content. The actions taken on filtered items depend on the category and platform policies. They may be removed from consideration for recommendation or flagged for down-ranking at a later stage.
  • It’s important to first understand the platform’s and company’s policies and to identify whether the action to take is to eliminate this content, moderate it for an age group, or allow it for everyone.
  • At the early stage, basic pre-processing filters can be applied to the content to remove obvious violations or low-quality items. Below we will look at a few techniques that can work for early stage moderation.
    1. Content Scanning: Once the inventory of available content is compiled, it undergoes scanning processes to identify any violations of the moderation policies. This scanning can involve automated techniques, such as machine learning algorithms, natural language processing, image recognition, or audio analysis. These methods help to detect explicit or harmful content that clearly violates the platform’s policies.
      • This can include filtering out explicit or inappropriate language, spam, or irrelevant content.
    2. Rule-based Filtering is another technique that can be used earlier in the pipeline. Rule-based filters are simple, predefined rules that are applied to the content. These rules can target specific types of content or behavior that violate platform guidelines. For example, rules can be set to flag or remove content that contains hate speech, personal attacks, or copyrighted material.
    3. Automated content analysis techniques, such as keyword matching or pattern recognition, can be used to identify potentially problematic content. This can involve scanning text, images, or metadata for specific keywords, symbols, or patterns associated with violations.
  • The image below (source) depicts this idea visually.
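  • A minimal sketch of rule-based keyword matching with light normalization, so simple character-substitution obfuscation does not slip through. The blocklist and substitution map are illustrative:

```python
import re

# Illustrative blocklist; real platforms maintain these per policy area.
BLOCKLIST = {"spamword", "scamlink"}

# Map common character substitutions back to letters before matching.
LEET = str.maketrans({"0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "$": "s", "@": "a"})

def normalize(text):
    text = text.lower().translate(LEET)
    return re.sub(r"[^a-z\s]", "", text)  # strip remaining punctuation/digits

def violates_keyword_rules(text):
    """True if any normalized token is on the blocklist."""
    return any(tok in BLOCKLIST for tok in normalize(text).split())

print(violates_keyword_rules("buy SP4MW0RD now!"))  # True
```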

Late Stage Moderation

  • Late stage moderation happens right before the user is served their recommendations, usually at the last or fine ranking layer. At this stage, the recommender system has filtered out most of the candidates and can therefore run more sophisticated moderation models, since the data volume is much smaller.
  • The image below (source) shows how, at the last layer, a few more items have been removed from the recommendation list.
  • Late stage moderation leverages machine learning algorithms to identify and classify questionable content. The system can train models to detect patterns, explicit content, hate speech, or other forms of harmful or misleading information and can continuously improve these algorithms by incorporating user feedback and expert annotations.
  • It’s important at this stage to modify the recommendation algorithms to consider factors beyond engagement metrics alone. Including content quality, credibility, user preferences, and diversity in the recommendations can be highly beneficial. Balancing engagement with responsible content curation ensures a healthy and trustworthy user experience.
  • Late stage moderation also involves considering user feedback and reports about content. Users can report content that they find offensive, inappropriate, or in violation of platform guidelines. These reports are reviewed by human moderators who make final decisions on whether the reported content should be removed or moderated.
    • Late stage moderation often involves human moderators who manually review flagged content to make nuanced decisions. Human moderators can apply context, judgment, and domain knowledge to evaluate the content and determine if it violates platform policies. They play a crucial role in handling complex cases that require subjective judgment or understanding of cultural nuances.
  • Late stage moderation requires iterative learning to keep the moderation fresh and current. Late stage moderation also benefits from the feedback loop created by user reports and human review. The data collected from user reports and moderator decisions can be used to improve the moderation algorithms and models in the early stage, enhancing the system’s ability to detect and filter out problematic content.
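  • The late stage idea, running a heavier risk model only on the short ranked slate and removing or down-ranking items by threshold, can be sketched as follows. The risk model and thresholds are placeholders:

```python
def late_stage_filter(ranked_items, risk_model, remove_at=0.9, downrank_at=0.6):
    """Run an expensive risk model only on the short ranked list.
    High-risk items are dropped; medium-risk items are pushed to the end."""
    keep, demoted = [], []
    for item in ranked_items:
        risk = risk_model(item)
        if risk >= remove_at:
            continue              # drop outright
        elif risk >= downrank_at:
            demoted.append(item)  # keep, but after safer items
        else:
            keep.append(item)
    return keep + demoted

# Placeholder risk model: reads a precomputed score off the item.
risky = lambda item: item["risk"]
slate = [{"id": 1, "risk": 0.95}, {"id": 2, "risk": 0.7}, {"id": 3, "risk": 0.1}]
print([it["id"] for it in late_stage_filter(slate, risky)])  # [3, 2]
```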

Continuous Improvement

  • Late stage moderation will also regularly monitor the platform’s content and user feedback to identify emerging trends or loopholes that allow questionable content to surface. It will keep refining the strategies and algorithms based on insights and user behavior patterns.
  • Content moderation in recommender systems plays a vital role in maintaining the integrity of the platform and ensuring that users are presented with appropriate and safe content.
  • Content moderation in RecSys is an ongoing effort. Platforms continuously review and refine their moderation policies and techniques based on emerging challenges and user feedback.
  • Regular audits, evaluations, and updates are conducted to ensure that the moderation processes are effective, adaptive to evolving standards, and aligned with community guidelines.

Reactive Moderation

  • After content is published on the platform, the platform can enlist the help of its users to report any content that violates its policies.
  • The platform should encourage users to provide feedback and report any problematic videos they come across. This can be done by creating a user-friendly reporting system that allows users to easily flag inappropriate content.
  • User reports and feedback serve as valuable signals to identify potentially problematic content that may have slipped through the initial moderation processes. Platforms take user reports into consideration and investigate reported content to ensure a safe and inclusive user experience.
  • Techniques such as employing human moderators, using automated algorithms for content analysis, and providing reporting mechanisms for users to flag problematic content, can be beneficial for removing content once it’s been published.
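  • A minimal sketch of such a reporting mechanism: count distinct reporters per item and escalate to human review past a threshold. The class shape and threshold are illustrative:

```python
from collections import defaultdict

class ReportQueue:
    """Collect user reports and surface items needing human review."""

    def __init__(self, escalate_after=3):
        self.escalate_after = escalate_after
        self.reports = defaultdict(set)  # item_id -> set of reporter ids

    def report(self, item_id, user_id, reason=""):
        self.reports[item_id].add(user_id)  # distinct reporters only

    def needs_review(self):
        return [item for item, users in self.reports.items()
                if len(users) >= self.escalate_after]

q = ReportQueue()
for user in ("u1", "u2", "u2", "u3"):  # u2 reports twice, counted once
    q.report("video42", user, reason="hate speech")
print(q.needs_review())  # ['video42']
```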

Borderline Content

  • In addition to identifying content that directly violates the policies, content moderation also addresses “borderline” content, which may not explicitly breach the platform’s terms of service but could potentially be problematic or offensive.
  • Handling borderline content is a challenging task for content moderation systems, as it involves content that is close to violating platform guidelines but doesn’t clearly cross the line. The approach to handling borderline content can vary depending on the platform and its specific policies.
  • Borderline content could include content that may not be suitable for wide sharing or content that has the potential to cause controversy or harm and thus, these platforms employ additional measures to assess and handle such content appropriately.
  • One strategy used is to employ human reviewers. These reviewers possess expertise in understanding complex contextual nuances and can make judgments that automated systems may struggle with. Human review processes are crucial for evaluating borderline content and determining the appropriate course of action based on the platform’s policies.
  • The platform can choose to limit or remove borderline content completely, and it can leverage signals such as dislikes, comments, and reports to make those decisions.
  • The image below (source) shows how borderline content is filtered.
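  • A hedged sketch of how such signals might be combined: blend negative-signal rates into a borderline score and choose between allowing, limiting, or removing an item. The weights and thresholds are made up for illustration:

```python
def borderline_score(item, weights=(0.5, 0.3, 0.2)):
    """Weighted blend of negative-signal rates; all inputs assumed in [0, 1]."""
    w_report, w_dislike, w_neg = weights
    return (w_report * item["report_rate"]
            + w_dislike * item["dislike_rate"]
            + w_neg * item["negative_comment_rate"])

def apply_borderline_policy(item, score_fn=borderline_score,
                            limit_at=0.4, remove_at=0.8):
    s = score_fn(item)
    if s >= remove_at:
        return "remove"
    if s >= limit_at:
        return "limit"   # keep on the platform but stop recommending widely
    return "allow"

item = {"report_rate": 0.6, "dislike_rate": 0.5, "negative_comment_rate": 0.4}
print(apply_borderline_policy(item))  # 'limit'
```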

Responsible AI

  • Content moderation in recommender systems is related to responsible AI. Responsible AI encompasses the ethical and accountable use of artificial intelligence technologies, ensuring they are designed and deployed in a manner that considers the potential impact on users, society, and various stakeholders. Content moderation in RecSys aligns with the principles of responsible AI by addressing the following aspects:
    1. User Safety: Content moderation aims to create a safe environment for users by removing or restricting access to harmful or inappropriate content. This helps protect users from potential risks such as exposure to explicit or offensive material, hate speech, or other forms of harmful content.
    2. Fairness and Bias: Content moderation processes should be designed to ensure fairness and avoid biases in the treatment of different types of content or user groups. It is important to establish clear guidelines and standards to prevent discriminatory practices and ensure equal treatment of all users.
    3. Transparency and Accountability: Responsible AI requires transparency and accountability in decision-making processes. Content moderation should provide clear explanations for the reasons behind removing or restricting certain content, and platforms should be accountable for the actions taken. Users should have the ability to understand and question the moderation decisions made by the platform.
    4. User Empowerment and Privacy: Responsible AI promotes user empowerment and respect for privacy. Users should have control over their content and be able to report objectionable material. Privacy considerations should also be taken into account in the moderation processes, ensuring that user data is handled securely and confidentially.
    5. Continuous Evaluation and Improvement: Responsible AI involves continuous evaluation and improvement of AI systems. Platforms should regularly assess the effectiveness and impact of content moderation processes, seeking feedback from users and stakeholders. This iterative approach helps identify and address potential biases, shortcomings, or unintended consequences in the moderation system.
  • By integrating responsible AI practices into content moderation processes, recommender systems can strive to create platforms that prioritize user safety, fairness, transparency, and accountability.

YouTube’s Moderation Process

  • Let’s look at the steps of how YouTube handles content moderation, as explained in their blog.
    1. Developing Policies for a Global Platform:
    • YouTube strives to strike the right balance between preserving free expression and ensuring a safe and vibrant community. To achieve this, they have a dedicated policy development team that regularly reviews and updates their policies. Updates often involve clarifications and addressing areas that may be vague or confusing to the community. When dealing with complex issues, YouTube consults external experts and creators to understand shortcomings in current policies and considers regional differences to ensure fairness worldwide.
    2. Using Machines to Flag Bad Content:
    • Once a policy is defined, YouTube relies on a combination of people and technology to identify and flag content for review. Digital fingerprints or hashes are utilized to detect known violative content even before it is made available. Machine learning technology plays a crucial role in identifying potentially violative content by detecting patterns and similarities to previously removed content. While machines are effective at detecting certain types of content, such as spam or adult content, categories like hate speech require human review for nuanced decision-making. Automated systems are constantly updated and improved to enhance detection accuracy. In the second quarter of 2019, over 87% of the 9 million removed videos were initially flagged by automated systems.
    3. Removing Content Before It’s Widely Viewed:
    • YouTube prioritizes removing content that violates their rules before it gains significant viewership. Their automated flagging systems have been improved to detect and review content even before community flagging occurs. In the second quarter of 2019, more than 80% of auto-flagged videos were removed before receiving any views. YouTube’s Intelligence Desk monitors news, social media, and user reports to proactively identify emerging trends related to inappropriate content, enabling them to address issues promptly.
    4. Reducing Exposure to Violative Videos:
    • YouTube is committed to reducing exposure to videos that violate their policies. They have a substantial workforce of over 10,000 people dedicated to detecting, reviewing, and removing content that violates guidelines across Google. It is important to note that videos violating policies generate only a fraction of the overall views on YouTube.
    5. Balancing Technology and Human Expertise:
    • YouTube’s Community Guidelines Enforcement Report showcases how their technological advancements have facilitated faster removal of harmful content. However, human expertise remains essential in policy development, content review, and responsible deployment of machine learning technology.
  • YouTube continually develops and updates policies, utilizes machine learning for content detection, proactively removes content before widespread viewership, and combines technology with human expertise to ensure a safe and responsible platform for its users.
  • Addressing borderline content and harmful misinformation is a priority for YouTube. While such content represents only a small fraction of what users watch, even that percentage is considered too much. To combat this issue, YouTube has taken steps to reduce recommendations of borderline content and videos that can potentially misinform users. This initiative is gradually expanding to more countries, including non-English-language markets.
  • The process involves relying on external evaluators worldwide who assess video quality based on public guidelines. Each video is reviewed by multiple evaluators, including certified experts in specialized areas like medicine. Their consensus input helps develop well-tested machine learning models that analyze hundreds of thousands of hours of videos daily. These models play a crucial role in identifying and limiting the spread of borderline content. As time progresses, the accuracy of these systems will improve further.
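  • The “digital fingerprints or hashes” mentioned above can be sketched with exact cryptographic hashing. Note that production systems typically use perceptual or robust hashes that survive re-encoding and editing; plain SHA-256, as below, only catches byte-identical re-uploads:

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Exact content fingerprint; real systems use perceptual hashes
    that tolerate re-encoding, cropping, or compression."""
    return hashlib.sha256(data).hexdigest()

# Fingerprints of content already removed for policy violations.
known_violative = {fingerprint(b"previously removed clip bytes")}

def is_known_violative(upload: bytes) -> bool:
    """Check a new upload against the known-violative fingerprint set."""
    return fingerprint(upload) in known_violative

print(is_known_violative(b"previously removed clip bytes"))  # True
print(is_known_violative(b"a brand new upload"))             # False
```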