
Philomath - Comprehensive Infrastructure for Training and Productionizing Music Recommendation Models

Data Integration:

  • Data Sources: The system ingests data from various teams and sources, including the personalization team that maintains user profiles, demographic information, and preference data. The core music data team provides metadata about the music catalog, such as genres, artists, and track features.
  • Coordination & Synchronization: A well-coordinated process ensures that the data from different teams is aligned, time-stamped, and made available in a consistent format. Regular meetings and communication channels are established with data providers to review and align data requirements.
  • Data Merging & Transformation: An ETL (Extract, Transform, Load) pipeline is implemented to merge and transform raw data into a unified schema. This involves careful mapping of fields, handling of missing values, data normalization, and ensuring data quality.
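  • As a rough illustration of the merge-and-transform step (a sketch only; bucket paths, dataset layouts, and column names are all hypothetical), a minimal PySpark version could look like this:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("philomath-etl").getOrCreate()

# Raw inputs from the contributing teams (assumed layouts).
profiles = spark.read.parquet("s3://raw-bucket/user_profiles/")   # personalization team
tracks = spark.read.parquet("s3://raw-bucket/track_metadata/")    # core music data team
plays = spark.read.parquet("s3://raw-bucket/play_events/")        # interaction log

# Merge into a unified schema: fill missing genres, normalize play counts.
max_plays = plays.agg(F.max("play_count")).first()[0]
unified = (
    plays.join(profiles, "user_id", "left")
         .join(tracks, "track_id", "left")
         .fillna({"genre": "unknown"})
         .withColumn("play_count_norm", F.col("play_count") / F.lit(max_plays))
)

unified.write.mode("overwrite").parquet("s3://curated-bucket/unified/")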

Model Training:

  • Data Preparation: Preprocessed data is split into training, validation, and test sets. Feature engineering is performed to create meaningful attributes that represent the users and music items.
  • Model Selection & Training: Various recommendation algorithms are explored, such as collaborative filtering, matrix factorization, and deep learning methods. Hyperparameter tuning and cross-validation techniques are applied to optimize the models.
  • Model Evaluation: Rigorous evaluation metrics such as precision, recall, NDCG, and user satisfaction scores are used to assess the model’s performance. Iterative development ensures that the models are aligned with the business objectives.
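  • Since NDCG is one of the headline metrics above, here is a small self-contained sketch of NDCG@k (pure Python; the relevance scores in the example are made up):

import math

def ndcg_at_k(ranked_relevances, k):
    # ranked_relevances: true relevance of each item, in model-ranked order.
    def dcg(scores):
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(scores[:k]))
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal_dcg if ideal_dcg > 0 else 0.0

# The model ranked four items whose true relevances are [3, 1, 0, 2];
# the ideal order would be [3, 2, 1, 0], so NDCG@4 comes out below 1.0.
print(ndcg_at_k([3, 1, 0, 2], k=4))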

Inference and Storage:

  • Batch Inference: The trained models are used to compute predictions for all users and items in the catalog. These predictions, or “inference results,” are computed periodically or triggered by specific events.
  • Storage: Inference results are stored in a dedicated database, with careful consideration of the indexing, partitioning, and retrieval needs. This enables efficient serving of personalized recommendations.

Serving & User Experience:

  • Serving Architecture: A robust serving layer is designed to retrieve and present recommendations to users in real-time. This involves caching strategies, load balancing, and fault-tolerance mechanisms.
  • Monitoring & Observability: Comprehensive monitoring and logging are implemented to track system performance, user engagement, and model efficacy. Alerts and dashboards provide insights into system health and areas for improvement.
  • Optimization: Continuous A/B testing and user feedback loops are set up to optimize the user experience and ensure that the recommendations are resonating with the users.

BIST multi-armed bandit to rerank
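A minimal sketch of bandit reranking (Thompson sampling over Beta posteriors of per-item play-through rates; the item IDs and reward bookkeeping are hypothetical, and this is one bandit strategy among several):

import random
from collections import defaultdict

# Beta(alpha, beta) posterior per item: alpha counts plays, beta counts skips.
alpha = defaultdict(lambda: 1.0)
beta = defaultdict(lambda: 1.0)

def rerank(candidates):
    # Draw one sample per item from its posterior and sort descending.
    return sorted(candidates,
                  key=lambda item: random.betavariate(alpha[item], beta[item]),
                  reverse=True)

def record_feedback(item, played_through):
    if played_through:
        alpha[item] += 1.0
    else:
        beta[item] += 1.0

# Usage: rerank the recommender's output, then feed user feedback back in.
print(rerank(["song_a", "song_b", "song_c"]))
record_feedback("song_a", played_through=True)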

End-to-End Automation:

  • The entire process from data ingestion to serving recommendations is automated, allowing for rapid iterations and go-to-market efficiency.
  • Modular components of the system allow for flexibility, enabling different teams or projects to utilize specific parts of the infrastructure as needed.

Below is a text-based diagram outlining the architecture of the Philomath system using AWS components, along with a description of each step to show how the components interact.

  1. Data Integration & Preprocessing:
    • S3 Buckets: Raw data storage from various teams.
    • AWS Glue: ETL service to transform and load data into a common schema.
    • AWS Lambda: Custom data merging and transformation functions.
    • Amazon Redshift: Storing transformed data for analytics.
     Teams' Data --> S3 Buckets --> AWS Glue --> AWS Lambda --> Amazon Redshift
  2. Model Training:
    • Amazon SageMaker: Training recommendation models with various algorithms.
    • Amazon EMR: Distributed processing for feature engineering and large-scale data processing.
     Amazon Redshift --> Amazon EMR --> Amazon SageMaker
  3. Inference & Storage:
    • Amazon SageMaker Endpoint: Model endpoint for generating batch inferences.
    • Amazon DynamoDB: Storing inference results for efficient retrieval.
     Amazon SageMaker --> SageMaker Endpoint --> Amazon DynamoDB
  4. Serving & User Experience:
    • Amazon API Gateway: Handling API requests for recommendations.
    • AWS Lambda: Business logic to serve recommendations.
    • ElastiCache: Caching popular recommendations.
    • Application Load Balancer: Distributing traffic across instances.
    Client Requests --> API Gateway --> AWS Lambda --> ElastiCache/DynamoDB
    (the Application Load Balancer applies when serving from EC2/ECS instances rather than Lambda)
  5. Monitoring & Observability:
    • Amazon CloudWatch: Monitoring system performance and setting alerts.
    • AWS X-Ray: Tracing user requests to diagnose and troubleshoot issues.
    Entire System --> Amazon CloudWatch + AWS X-Ray
  6. End-to-End Automation & Coordination:
    • AWS Step Functions: Orchestrating the workflow of data processing, training, and serving.
    • AWS CodePipeline: Continuous integration and deployment pipeline.
    Entire Workflow --> AWS Step Functions --> AWS CodePipeline
  • Extract, Transform, Load (ETL) is a process used to collect data from various sources, transform it into a usable format, and then load it into a data warehouse or database. With Amazon Web Services (AWS), there are several components and services that can be used to create an ETL pipeline. Here’s an overview of how ETL can be set up using AWS tools:
  1. Extract
    • Amazon S3: Raw data can be collected from various sources and stored in Amazon S3 buckets.
    • Amazon RDS/Aurora/DynamoDB: You can also extract data from these databases depending on your data source.
    • AWS Glue Crawlers: Crawlers connect to your source data store, extract metadata, and create table definitions in the AWS Glue Data Catalog.
  2. Transform
    • AWS Glue: This is a fully managed ETL service that makes it easy to move data between data stores. You can use Glue for data cleaning, transformation, enrichment, and more. It uses Apache Spark behind the scenes to do the heavy lifting.
    • Amazon EMR: Amazon Elastic MapReduce provides a managed Hadoop framework that can be used to process large amounts of data. You can use tools like Apache Hive and Apache Pig for transformation.
    • AWS Lambda: For lighter-weight transformations, you can use Lambda functions; they can be triggered when new data lands in an S3 bucket, for example (see the Lambda sketch after this list).
    • Amazon Redshift Spectrum: If you’re using Redshift as your data warehouse, Spectrum allows you to run complex queries on data that resides in S3 without needing to load it into Redshift first.
  3. Load
    • Amazon Redshift: A popular choice for a data warehouse. You can load transformed data into Redshift for analytics.
    • Amazon RDS/Aurora: You can also load data into these relational databases if they better fit your use case.
    • AWS Data Pipeline: This service can help you define data workflows for moving and transforming data and then loading it into the required destination.
  4. Monitoring & Management
    • AWS CloudWatch: Monitor the operation of your ETL pipeline and set up alerts for any issues.
    • AWS CloudTrail: Keep an audit trail of all actions related to your ETL processes.
  5. Security
    • AWS Identity and Access Management (IAM): Control who has access to various parts of your ETL process.
  6. Analytics & Reporting
    • Amazon QuickSight: Once your data is loaded into your data warehouse or database, you can use QuickSight for visualizing and analyzing the data.
  7. Orchestration
    • AWS Step Functions: Helps to coordinate multiple AWS services into serverless workflows so you can build and update apps quickly.
  • By combining these tools and services, you can create a powerful and scalable ETL process within the AWS ecosystem, custom-tailored to your specific needs and data types.
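  • As a sketch of the S3-triggered Lambda transformation mentioned above (bucket names, the JSON layout, and the cleaning rules are all placeholders):

import json
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Standard S3 event notification: one record per newly landed object.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        rows = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())

        # Illustrative transformation: drop rows lacking user_id, normalize genre.
        cleaned = [
            {**row, "genre": row.get("genre", "unknown").lower()}
            for row in rows
            if row.get("user_id")
        ]

        s3.put_object(
            Bucket="curated-bucket",          # hypothetical destination bucket
            Key=f"cleaned/{key}",
            Body=json.dumps(cleaned).encode(),
        )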

Overview

  • My project deep dive
  • Infra + recommender system
  • Complexity
  • Challenges
  • Trade-offs in tech/architecture
  • Build or buy
  • Focus on details: number of users, number of artists, number of songs

  • Amazon Music runs like a startup within Amazon. We are not a very big org; we move really fast, so we often have to wear many hats. It’s a fast-paced environment, great for learning.

    Non-functional requirements:

  • Streaming should be very low-latency: music should begin playing within 200 ms of the user pressing play.
  • The system should support a catalog of 100 million songs.
  • 5 million active users
  • 10 million active users at peak
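  • A back-of-envelope check of what those numbers imply (the per-user request rate and metadata row size are assumptions, not measured figures):

peak_users = 10_000_000
reqs_per_user_per_hour = 20          # assumed: play, skip, next-track, search
peak_qps = peak_users * reqs_per_user_per_hour / 3600
print(f"~{peak_qps:,.0f} requests/sec at peak")          # ~55,556 req/s

catalog_size = 100_000_000           # songs
bytes_per_metadata_row = 2_000       # assumed average metadata row size
print(f"~{catalog_size * bytes_per_metadata_row / 1e9:.0f} GB catalog metadata")  # ~200 GB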

Amazon Music recommender with Alexa

  Alexa Device      API Gateway       Recommendation       Data Storage          ML Architecture
    + User    --->   + Routing  --->   + Real-Time   --->   + User Profiles --->  + Feature Eng.
                     + Auth            + Batch               + Music Metadata     + Model Dev.
                                       + Caching             + Recommendations    + Model Training
                                       + Kafka                                    + Model Serving

Description:

  1. Alexa Device + User: Interaction with the user starts here.
  2. API Gateway:
    • Routing: Directs the user’s request to the appropriate service.
    • Authentication: Handles user authentication and authorization.
  3. Recommendation Processing Layer:
    • Real-Time: Real-time processing (e.g., Kafka Streams).
    • Batch: Batch processing (e.g., Apache Spark).
    • Caching: In-memory caching (e.g., Redis).
    • Kafka: Message queues for asynchronous communication.
  4. Data Storage:
    • User Profiles: Stores user-related data (e.g., MongoDB).
    • Music Metadata: Information about music tracks (e.g., PostgreSQL).
    • Recommendations: Storing pre-generated recommendations (e.g., Redshift).
  5. ML Architecture:
    • Feature Engineering: Tools for processing large datasets.
    • Model Development: Libraries for model creation.
    • Model Training: Frameworks for training the models.
    • Model Serving: Real-time model serving.
  • You would also have lines connecting to a Monitoring and Logging module (like Prometheus and ELK stack) from various parts of the architecture, ensuring that system health, performance, and errors are continuously monitored and logged.

In the recommendation layer, the choice between real-time processing, batch processing, caching, and using message queues like Kafka depends on various factors including the use case, data volume, latency requirements, and system complexity. Here’s an explanation of when and why you might choose each component:

  1. Real-Time Processing:
    • When: User demands immediate response, like personalized music suggestions based on the current song.
    • Why: Real-time processing enables instantaneous analysis of streaming data and can respond in near real-time to the user’s interactions.
  2. Batch Processing:
    • When: Data processing can be scheduled and doesn’t need to happen in real-time. E.g., generating daily or weekly personalized playlists for users.
    • Why: Batch processing is often more efficient and less costly for handling large datasets that don’t require immediate action.
  3. Caching:
    • When: Frequently accessed data that doesn’t change often, like top charts or popular playlists in a specific genre.
    • Why: Caching stores data in a way that allows quicker access. It helps in reducing the load on the database and improves response times for common queries.
  4. Kafka (or other Message Queues):
    • When: Need to handle high throughput of data or decouple different parts of the system, like streaming user activity for future analysis.
    • Why: Kafka can buffer messages, ensuring smooth handling of data spikes, and provides a robust way to connect different parts of a distributed system.

Here’s how they could fit together:

  • Real-Time: For immediate user interactions, like “Play something I’ll like” – analyze current user profile and activity and respond immediately.
  • Batch: Nightly/weekly processing to analyze users’ listening habits and generate new recommendations -> used for routines, e.g., playing meditation music every night.
  • Caching: For common requests or recently computed recommendations, use caching to speed up the response time.
  • Kafka: Stream user activity and other events that can be used for both real-time and batch processing.
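  • A sketch of that Kafka leg using kafka-python; the broker address, topic name, and event fields are assumptions:

import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",                       # placeholder broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def emit_play_event(user_id, track_id, action):
    # Keyed by user so a consumer sees one user's events in order per partition.
    producer.send(
        "user-activity",                                  # hypothetical topic
        key=str(user_id).encode(),
        value={"user_id": user_id, "track_id": track_id, "action": action},
    )

emit_play_event(42, "track_123", "skip")
producer.flush()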

  • In many cases, a hybrid approach is used, where real-time and batch processing are combined to leverage the strengths of both. Caching is almost always beneficial for read-heavy workloads, and Kafka or similar systems are often used in complex, distributed architectures to ensure robust data handling.
  • The decision will depend on the specific requirements of the music recommendation system, such as the need for real-time interaction, the volume of data being processed, the acceptable latency for different types of requests, and the overall complexity of the data processing workflows.

  • Estimating the exact time it would take to process a general request like “play something relaxing” through ASR, TTS, recommendation system, and finally returning the song would depend on various factors, such as the complexity of the recommendation algorithms, network latency, server load, and more. Here’s a high-level breakdown of the different stages and potential time frames:
  1. ASR (Automatic Speech Recognition):
    • Conversion of the spoken request into text.
    • Typical Time: 100-500 milliseconds (depending on the ASR model’s complexity and server load).
  2. Request Parsing and Routing:
    • Parsing the text request and identifying it as a general request.
    • Routing it to the appropriate recommendation engine.
    • Typical Time: 10-50 milliseconds.
  3. Recommendation System:
    • Identifying the user’s profile and preferences (if personalized recommendations are to be made).
    • Running the recommender algorithm (could be real-time or from a cached result).
    • Typical Time: 50 milliseconds to 200 milliseconds for real-time; less if cached.
  4. Fetching the Song:
    • Retrieving the song from storage or CDN.
    • Typical Time: 20-100 milliseconds (depending on network latency, location of the content, etc.).
  5. TTS (Text-to-Speech):
    • If a verbal response is required (e.g., “Now playing relaxing music by XYZ artist”), the TTS system will need to convert text to speech.
    • Typical Time: 100-300 milliseconds (depending on the TTS model’s complexity and server load).
  6. Streaming the Song to the User’s Device:
    • Buffering and initiating the stream.
    • Typical Time: 100-500 milliseconds.
  • Adding all these stages together, you might be looking at a total time frame of approximately 380 milliseconds to 1.65 seconds from the moment the user’s spoken request is received until the song begins playing. Again, these are rough estimates, and actual times could vary significantly based on the specifics of the system, algorithms, network conditions, and other variables. Efficiencies in each step could further reduce this time, providing a swift and seamless user experience.

Engineering

def get_recommendations(request, user_id):
    # Fall back to popularity when we have no history for this user.
    if is_new_user(user_id) or is_cold_start(user_id):
        recommendations = get_popular_recommendations()
    # Generic requests (e.g., "play top hits") blend popularity with the user's taste.
    elif is_popular_request(request):
        recommendations = get_popular_based_recommendations(user_id)
    # Otherwise use the fully personalized recommender.
    else:
        recommendations = get_personalized_recommendations(user_id)

    # Additional logic for combining, filtering, ranking...

    return recommendations
  • The get_popular_recommendations function would typically call a backend system where a specific model or algorithm generates popular recommendations, based on factors like overall popularity among all users, trending content, editor’s picks, etc.

  • Here’s an expanded view of how this function might look:

import json
import boto3

def get_popular_recommendations():
    # Create a SageMaker runtime client
    client = boto3.client('sagemaker-runtime')

    # Define the payload (e.g., user data, context)
    payload = {...}

    # Call the SageMaker endpoint with a JSON-encoded request
    response = client.invoke_endpoint(
        EndpointName='popular-recommendations-endpoint',
        ContentType='application/json',
        Body=json.dumps(payload)
    )

    # Decode the JSON response body into a list of recommendations
    recommendations = json.loads(response['Body'].read().decode())

    return recommendations

  • The client.invoke_endpoint call represents the request to the specific service or model responsible for generating popular recommendations. It could equally be a REST API call, a query to a database of pre-computed popular items, or a call to another machine learning model; the details depend on the architecture and technologies used in the particular system.

  • Now a deeper look at the low-level architecture of a music recommendation system on AWS, focusing on the data flow, infrastructure interactions, and potential tools used at each step.

  1. User Interaction Layer:
    • Amazon Alexa: Receives the user’s voice command.
    • AWS Lambda: On-demand computation to process the user’s request. Converts the raw voice signal to a structured request.
    • Amazon API Gateway: Exposes RESTful endpoints for fetching recommendations. Helps in rate limiting, caching responses, and endpoint security.
  2. NLU & ASR Processing:
    • Amazon Transcribe: Converts the voice input to text.
    • Amazon Comprehend or a Custom Model in SageMaker: Extracts intent from the text. For instance, “Play something relaxing” might be converted to {intent: "play_music", mood: "relaxing"}.
  3. Fetching Recommendations:
    • Amazon SageMaker: Holds the trained recommendation model. The endpoint here receives the structured request and returns a list of recommended song IDs or URIs. Personalization models need user context, which can be fetched from DynamoDB (next item).
    • Amazon DynamoDB: Fast key-value lookup to get user profiles, history, or cached recommendations (a lookup sketch follows this list).
  4. Fetching Song Metadata:
    • Amazon RDS (with PostgreSQL or MySQL): Contains details about each song, artist, album, etc. Once the song IDs are fetched, metadata can be retrieved for playback.
  5. User Feedback Loop:
    • Amazon Kinesis Data Streams: Captures real-time user activity data; for instance, if a user skips a song, this feedback is invaluable (a producer sketch follows the flow diagram below).
    • AWS Lambda & AWS Glue: Processes the Kinesis stream and updates user profiles, computes aggregates, or triggers model re-training in SageMaker.
  6. Storage and Data Management:
    • Amazon S3: Raw storage. Stores user interaction logs, batch processing results, and model training datasets.
    • AWS Glue Catalog: Manages metadata of the data stored, making it easily discoverable and queryable.
    • AWS Glue ETL: ETL jobs to transform, clean, and prepare data for model re-training or analytics.
  7. Infrastructure Management and Monitoring:
    • AWS CloudWatch: Monitors system health, logs, and sets up alerts.
    • AWS IAM: Manages user roles, permissions, and secure access to resources.
    • Amazon CloudFormation: Ensures that infrastructure provisioning is repeatable and versioned.
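  • A sketch of the DynamoDB profile lookup referenced in step 3; the table and attribute names are hypothetical:

import boto3

dynamodb = boto3.resource("dynamodb")
profiles = dynamodb.Table("user_profiles")    # assumed table name

def get_user_context(user_id):
    resp = profiles.get_item(Key={"user_id": user_id})
    # Fall back to an empty profile for brand-new (cold-start) users.
    return resp.get("Item", {"user_id": user_id, "top_genres": []})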

Low-Level Flow Diagram:

[User Voice Input]
       |
       V
[Amazon Alexa] --> [AWS Lambda] --> [Amazon Transcribe]
       |
       V
[Amazon Comprehend/SageMaker NLU]
       |
       V
[Structured Request (e.g., {intent: "play_music", mood: "relaxing"})]
       |
       V
[API Gateway] --> [Lambda (Handler)]
       |
       V
[SageMaker Recommendation Endpoint]
       |
       V
[DynamoDB (User Profiles) & RDS (Song Metadata)]
       |
       V
[Return structured song data to Alexa for Playback]
       |
       V
[Kinesis (User Interaction Stream)]
       |
       V
[Lambda & Glue (Processing & ETL)]
       |
       V
[S3 & DynamoDB (Data Storage & User Profile Update)]
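  • A sketch of the feedback-loop producer from step 5, pushing a skip event into Kinesis; the stream name and payload shape are assumptions:

import json
import boto3

kinesis = boto3.client("kinesis")

def record_skip(user_id, track_id):
    kinesis.put_record(
        StreamName="user-interactions",       # hypothetical stream
        PartitionKey=str(user_id),            # keeps one user's events ordered per shard
        Data=json.dumps({"user_id": user_id, "track_id": track_id, "event": "skip"}),
    )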


  • Please note that the above is a basic layout; in a real-world scenario there would be additional layers, error-handling mechanisms, data-backup strategies, etc. The setup can be further optimized with additional AWS services and configurations.
[User makes request]
        |
        V
[AWS Lambda retrieves user profile from DynamoDB]
        |
        V
[Enough data for personalization?]
        /          \
      No            Yes
       |             |
[Cold-Start         [Personalized
 Recommendation,     Recommendation,
 e.g., Top Hits]     e.g., user's genre preferences]

Cold Start

  • User cold start: a new user with little or no listening history.
  • Item cold start: a new track or artist with no interaction data yet.

In-app construction of playlists

  • The personalized recommender runs and emits a series of recommendations; invalid candidates are filtered out (sketched below).
  • Engineers take that output and display it in the app.
  • Mobile development.
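  • A sketch of the candidate-filtering step; the validity checks (licensing region, freshness, explicit-content setting) are illustrative, not the production rules:

def filter_candidates(recommendations, user):
    def is_valid(track):
        region_ok = user["region"] in track.get("available_in", set())   # licensing
        fresh = track["track_id"] not in user["recently_played"]         # avoid repeats
        explicit_ok = user["allow_explicit"] or not track.get("explicit", False)
        return region_ok and fresh and explicit_ok
    return [t for t in recommendations if is_valid(t)]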

Failed experiment: Next Best Action

Flow

The infrastructure for utilizing Apache Spark for data cleaning, followed by model training and then serving the model for predictions, involves several interconnected components. Here’s a breakdown:

1. Data Ingestion and Cleaning:

  • Data Source: Collect data from various sources such as logs, databases, or external APIs.
  • Data Storage: Store raw data in a distributed file system like Hadoop HDFS or cloud storage like Amazon S3.
  • Apache Spark: Use Apache Spark for distributed data processing.
    • Spark SQL: For data cleaning, transformation, and feature engineering.
    • Cluster Manager: Use a cluster manager like YARN or Mesos to manage resources.
    • Compute Resources: Provision compute nodes (either on-premises or in the cloud) for Spark cluster.
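A minimal Spark SQL cleaning sketch for this stage (paths and column names are placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("data-cleaning").getOrCreate()
spark.read.json("s3://raw-bucket/logs/").createOrReplaceTempView("raw_events")

# Deduplicate, drop incomplete rows, and normalize types/casing.
cleaned = spark.sql("""
    SELECT DISTINCT user_id, track_id,
           LOWER(TRIM(action)) AS action,
           CAST(ts AS TIMESTAMP) AS ts
    FROM raw_events
    WHERE user_id IS NOT NULL AND track_id IS NOT NULL
""")
cleaned.write.mode("overwrite").parquet("s3://clean-bucket/events/")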

2. Model Training:

  • Preprocessed Data: Use the cleaned and preprocessed data from Spark as input for model training.
  • Training Framework: Select a machine learning framework such as TensorFlow, PyTorch, or use Spark MLlib for training.
  • Training Infrastructure: Can be GPU-enabled machines for deep learning or CPU clusters.
  • Model Validation: Split data into training and validation sets, and use metrics for model evaluation.
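A sketch of collaborative-filtering training with Spark MLlib ALS, including the train/validation split; the input path, column names, and hyperparameters are illustrative:

from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS
from pyspark.ml.evaluation import RegressionEvaluator

spark = SparkSession.builder.appName("als-training").getOrCreate()
ratings = spark.read.parquet("s3://clean-bucket/ratings/")   # userId, trackId, rating
train, val = ratings.randomSplit([0.8, 0.2], seed=42)

als = ALS(userCol="userId", itemCol="trackId", ratingCol="rating",
          rank=32, regParam=0.1, coldStartStrategy="drop")
model = als.fit(train)

# Evaluate on the held-out split.
rmse = RegressionEvaluator(metricName="rmse", labelCol="rating",
                           predictionCol="prediction").evaluate(model.transform(val))
print(f"validation RMSE: {rmse:.3f}")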

3. Model Serving:

  • Model Export: Export the trained model in a format suitable for serving (e.g., TensorFlow SavedModel).
  • Model Server: Utilize a model serving solution like TensorFlow Serving or Amazon SageMaker.
  • Load Balancer: Implement a load balancer to distribute inference requests across multiple model servers.
  • Caching Layer: Optional caching (using tools like Redis) to store frequent predictions.

4. Monitoring and Maintenance:

  • Logging and Monitoring: Implement logging and monitoring to track system performance, errors, etc., using tools like Prometheus and Grafana.
  • Continuous Training: Set up continuous training pipelines to regularly retrain the model with new data, using tools like Apache Airflow or Kubernetes.
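A sketch of a continuous-training pipeline as an Apache Airflow DAG; the task callables and weekly schedule are placeholders:

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder task bodies; in practice these would kick off Spark/SageMaker jobs.
def extract_new_data(): ...
def retrain_model(): ...
def deploy_if_better(): ...

with DAG(dag_id="recsys_continuous_training",
         start_date=datetime(2023, 1, 1),
         schedule_interval="@weekly",
         catchup=False) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_new_data)
    train = PythonOperator(task_id="train", python_callable=retrain_model)
    deploy = PythonOperator(task_id="deploy", python_callable=deploy_if_better)
    extract >> train >> deploy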

5. Integration:

  • API Gateway: Expose the model serving endpoint through an API Gateway to provide secure and controlled access.
  • Integration with Applications: Provide SDKs or APIs to allow integration with web, mobile, or other client applications.

6. Compliance and Security:

  • Data Security: Ensure data encryption, access controls, and compliance with legal requirements.
  • Authentication & Authorization: Implement proper security controls for accessing the APIs.

7. Scalability and Performance Optimization:

  • Auto-Scaling: Implement auto-scaling to handle varying loads.
  • Optimized Resource Utilization: Use resource optimization techniques to ensure efficient utilization of hardware.

8. Technology Choices:

  • Orchestration & Containerization: Depending on the complexity, you may use containerization like Docker and orchestration with Kubernetes.
  • Cloud vs On-Premises: Depending on requirements, the whole pipeline could be on-premises, cloud-based, or a hybrid solution.

This end-to-end infrastructure facilitates seamless data cleaning with Spark, followed by model training and serving, forming an integrated machine learning pipeline. Depending on the specific use case and requirements, the individual components and technologies may vary.