Vinija's Notes • Pinterest

Overview

L15: 388,500 - 525,350
Hiring Manager: Scope & Impact, interest in Pinterest, and working well with others
- Perform at L6 level to get L5
- Large scope and large impact
- Lead people across a larger scope: collaborate outside your own team and lead those efforts
- Transformers, ML
- CPU vs GPU
- ReLU vs sigmoid
- L1 vs L2
- Convex vs non-convex optimization
- Explain, don’t spend too much time on ML
Coding:
- Medium Leetcode BFS, DFS
- Need a solution and it has to be optimal
- Graph-related problems in coding
- Write clean code
- Knowledge of algorithms and data structures
- Space and Time complexity
  - Issue last time: it was brute force
- Debugging your code and keeping it well-structured
- Writing mini test cases
Feedback:
- Up-leveled, communication very good
- Limited hands-on with recommendations, especially ranking
- Not L6, good for ATG - research group
- Good general ML deep learning skills
- Communication perfect, clear and concise
- Some hands-on with LLM
- Good at clarifying
- Didn’t reflect on the design
- Spent too much time, needed help staying on track
- Had good breadth, but not depth
- Issues with batch norm in ads ranking
  - Mention switch to layer norm
  - Moving average
  - Gradient clipping for batch norm, why, not explained
- Online feature loading, not described
- Proactive
- Knowledge depth for L6
- Didn’t give a definite no
- Maybe for L5
System Design:
- Weak yes from infra, no from ML
- Scaling system from few million to billions
- Petabytes of data, latency, speed, efficiency, tradeoffs, larger models, algorithmic scaling
- Read ML blog but needs more depth
- Data storage, how are they, model is fast, efficient, tradeoffs on platform
ML Practitioner:
- End-to-end ML
- Shu-feng: breadth and depth in end-to-end ML, modeling about Pinterest, beyond 2 tower models
- Data generation, positive and negative samples, feature engineering, finetuning, deep learning, training offline, online, production
- Models in production
- Model deployment, drift
Scaling:
- How to train a model at a large scale
- Hands-on experience person would know
- Shu-feng: no for L6, but she didn’t give an outright no, which is a big deal because she usually says a candidate is not a good fit for Pinterest. She got promoted, as did Shu Zhang

Overview

TL;DR: ML Practitioner focuses heavily on modeling, while ML System Design focuses more on ML systems. Questions, case studies, and the interview rubric will focus on ranking and recommendations.

ML Practitioner (60 min)

This portion of your onsite interview will consist of a 60-minute session assessing your ability to develop and own models from start to finish. During this interview, you will meet with an ML Engineering leader who will want to grasp your practical understanding of state-of-the-art ML architectures and modeling techniques (focus on features, models, training, model evaluation, etc.).
You will be evaluated on the following focus areas:
- Problem Exploration: Your ability to think from first principles, and articulate the problem you are trying to solve; framing it as a machine learning problem.
- Training Data/Data Set Generation: Identifying methods to collect training data, and conversations around constraints/risks. Identifying and describing the labels used, and justifying their choice.
- Model Selection: Describe the ML model(s) you want to use. Expect to answer knowledge questions around your selected models. This may include describing the loss function, and tradeoffs in the model(s) you choose.
- Feature Engineering: Identifying relevant ML features for your model and describing how to build these features.
- Evaluation: How you would measure the success of the model you intend to propose.

ML System Design (60 min)

This portion of your onsite interview will consist of a single 60-minute session. We are looking for a signal on whether you are comfortable working with ML in a real-world, internet-scale environment. Please note that even though this is not a deep architecture/infrastructure interview, there is still an expectation that you have the knowledge to build machine learning systems at scale.
In this session, you may be asked about how infrastructure choices affect modeling capabilities, etc. Some examples of ML-related systems or challenges that you may be tested on include recommender systems, model training, model serving, model lifecycle management, model monitoring, batch data processing, feature management, etc.
You will be evaluated in this session based on the following criteria:
- In a system design question, you will be given a real-world problem related to ML and be asked to design a system to solve this problem.
- We will see how you understand and set up the problem, abstract it, break it into components, and talk about how they all fit together.
- We will ask you about the details for different parts.
- We may change or add requirements and ask you to modify the system accordingly.
- We may ask you to compare alternative solutions and evaluate each of their pros and cons.
- We may ask you how to design a proper data pipeline and feedback loop.
- We may ask you to optimize the system for a certain goal.
If needed, the following is an ML Sys Design Interview Guide one of our very own ML Eng Managers created and may be a helpful resource: http://patrickhalina.com/posts/ml-systems-design-interview-guide/

ML Practitioner

Design Unsafe content detection system

Click probability

ML Domain: The question was how to predict click probability, and there were many follow-up questions. For example, when I mentioned that logistic regression requires scaling and normalization of the data, the follow-up was what else logistic regression needs in order to perform well that trees don’t require. There were also many questions about neural network parameters and about how to handle position bias. Finally, the training data is collected after the auction, but the model’s job is to provide a basis for the bid during the auction, so the training data and prediction data are not in the same distribution. How do you deal with that?
Data intuition: This was an added round. Given Day 0 impression and click-rate data for different ads and pins, the task was to optimize the click rate under a budget constraint on the total number of impressions, deciding how many impressions to allocate to each ad and pin. It took me a long time to figure out the distribution of the click rate: it is a beta distribution. I think this round of questions can be answered from the perspective of a multi-armed bandit (MAB).
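One way to read the bandit framing above is to treat each ad or pin as an arm whose click rate has a Beta posterior, and to allocate each impression by Thompson sampling. The sketch below is only an illustration: the Day 0 counts, the simulated "true" click rates, and the function name `thompson_allocate` are all made up, and this is not necessarily the answer the interviewer was looking for.

```python
import random

def thompson_allocate(day0_stats, budget, true_ctr, seed=0):
    """Allocate `budget` impressions across items via Thompson sampling.

    day0_stats: {item: (impressions, clicks)} observed on Day 0.
    true_ctr:   {item: click rate}, used only to simulate user feedback.
    """
    rng = random.Random(seed)
    # Beta(1 + clicks, 1 + non-clicks) posterior under a uniform prior.
    posterior = {
        item: [1 + clicks, 1 + imps - clicks]
        for item, (imps, clicks) in day0_stats.items()
    }
    shown = {item: 0 for item in day0_stats}
    for _ in range(budget):
        # Draw a plausible click rate per item; show the item with the highest draw.
        draws = {i: rng.betavariate(a, b) for i, (a, b) in posterior.items()}
        chosen = max(draws, key=draws.get)
        shown[chosen] += 1
        # Simulate the click and update that item's posterior.
        clicked = rng.random() < true_ctr[chosen]
        posterior[chosen][0 if clicked else 1] += 1
    return shown

# Hypothetical Day 0 data and hypothetical ground truth for the simulation.
day0 = {"ad_a": (100, 2), "pin_b": (100, 6), "pin_c": (20, 1)}
truth = {"ad_a": 0.02, "pin_b": 0.06, "pin_c": 0.04}
allocation = thompson_allocate(day0, budget=5000, true_ctr=truth)
print(allocation)
```

The impression budget is spent one draw at a time, so items with uncertain click rates still get explored while items that look good receive most of the traffic.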
To address the various facets of the ML Domain question:

Predicting Click Probability:

Beyond Scaling and Normalization: Additional steps to enhance the performance of a model like logistic regression might include:
- Feature engineering to create more informative variables.
- Feature selection to remove irrelevant or redundant data.
- Regularization (L1 or L2) to prevent overfitting; L1 additionally drives some weights to exactly zero, which acts as feature selection.
- Ensemble methods that combine the predictions of several models to improve generalization.
- Cross-validation to estimate how well the model performs on unseen data and to tune hyperparameters.

Trees Not Requiring Normalization:

Decision trees and their ensembles, such as Random Forests or Gradient Boosting Machines (GBM), don’t require feature scaling or normalization because each split compares a feature against a threshold, which depends only on the ordering of the values and not on their scale. These algorithms are therefore invariant to any strictly increasing rescaling of a feature.
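The invariance can be checked with a toy single-feature split search. This is a minimal sketch with made-up data and hypothetical helper names (`gini`, `best_split`), not a full tree implementation: it finds the Gini-optimal split and shows that standardizing or log-transforming the feature leaves the chosen partition unchanged.

```python
import math

def gini(labels):
    """Gini impurity of a list of binary labels."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(xs, ys):
    """Return the set of row indices sent left by the best Gini split on one feature."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_score, best_left = float("inf"), None
    for k in range(1, len(order)):
        if xs[order[k - 1]] == xs[order[k]]:
            continue  # cannot place a threshold between equal values
        left, right = order[:k], order[k:]
        # Size-weighted impurity of the two children.
        score = sum(len(part) * gini([ys[i] for i in part]) for part in (left, right))
        if score < best_score:
            best_score, best_left = score, frozenset(left)
    return best_left

# Hypothetical feature (user age) and click labels.
ages = [18, 25, 31, 40, 52, 67]
clicked = [0, 0, 1, 1, 1, 0]

raw = best_split(ages, clicked)
standardized = best_split([(a - 38.8) / 16.5 for a in ages], clicked)
logged = best_split([math.log(a) for a in ages], clicked)
print(raw == standardized == logged)  # True
```

Only the threshold value changes under rescaling; the rows that fall on each side of it do not, which is why no normalization step is needed.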

Neural Network Parameters and Position Bias:

To deal with position bias in neural networks:
- You could include position as a feature in the model so the model learns the effect of position on click probability.
- Use a re-ranking algorithm to mitigate the position bias. This can be done by learning a bias model which predicts the probability of a click just based on position, and then using it to adjust the prediction of the actual click model.
- Implement counterfactual methods like inverse propensity scoring to re-weight the training instances to correct for the bias in position.
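As a small illustration of the inverse propensity idea, the sketch below assumes the examination hypothesis (a click requires that the user looked at the position and that the item is relevant). The propensity values, the logs, and the function names are hypothetical; in practice the propensities would have to be estimated, for example from randomized position experiments.

```python
def naive_ctr(logs):
    """Fraction of impressions that were clicked, ignoring where they were shown."""
    return sum(clicked for _, clicked in logs) / len(logs)

def ips_relevance(logs, propensity):
    """Inverse-propensity estimate of relevance from (position, clicked) rows."""
    return sum(clicked / propensity[pos] for pos, clicked in logs) / len(logs)

# Assumed probability that a user examines each position (hypothetical values).
propensity = {1: 1.0, 2: 0.6, 3: 0.4}

# Hypothetical logs: pin A was always shown in position 1, pin B in position 3.
logs_a = [(1, 1)] * 100 + [(1, 0)] * 900
logs_b = [(3, 1)] * 60 + [(3, 0)] * 940

print(naive_ctr(logs_a), naive_ctr(logs_b))
print(ips_relevance(logs_a, propensity), ips_relevance(logs_b, propensity))
```

Under these assumptions the raw click rate ranks A above B, while the corrected estimate ranks B above A. The same 1/propensity values could serve as per-example weights on clicked rows when training the click model.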

Training on Post-Auction Data for Pre-Auction Prediction:

When the training data (post-auction) and prediction data (pre-auction) are not in the same distribution, you’re dealing with a classic dataset shift problem. Because only ads that won an auction appear in the training logs, it is more specifically a form of sample selection bias. Here are some strategies to address this issue:
- Covariate Shift Adjustment: Adjust the weights of the training samples to make the training data more closely resemble the distribution of test data.
- Domain Adaptation Techniques: Apply algorithms that can transfer knowledge from one domain to another, such as transfer learning or feature representation transfer.
- Model Regularization: Use regularization methods to prevent overfitting to the training data distribution, which may not be representative of the prediction data distribution.
- Synthetic Data Augmentation: Generate synthetic data that follows the distribution of the pre-auction data to augment the training set.
- Continuous Learning: Implement a system where the model can continuously learn from the new data coming from auctions to adapt to the shifts in data distribution.
- Feedback Loops: Incorporate real-time feedback into the model to adjust bids based on real-world performance.

In practice, you would likely need to combine several of these techniques to effectively handle the discrepancies between training and prediction data distributions. It’s also important to regularly evaluate the model on new data and recalibrate or retrain as necessary to account for changes over time.
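The covariate shift adjustment can be sketched for a single discrete feature as a ratio of frequencies. The bucket names and counts below are hypothetical, and `importance_weights` is an illustrative helper rather than a library function; with many or continuous features the ratio is usually estimated with a classifier instead.

```python
from collections import Counter

def importance_weights(train_buckets, serving_buckets):
    """Per-bucket weight w(x) = p_serving(x) / p_train(x) for a discrete feature."""
    train_freq = Counter(train_buckets)
    serve_freq = Counter(serving_buckets)
    n_train, n_serve = len(train_buckets), len(serving_buckets)
    return {
        bucket: (serve_freq[bucket] / n_serve) / (count / n_train)
        for bucket, count in train_freq.items()
    }

# Hypothetical data: training rows come from auction winners, so high-bid ads
# are over-represented compared with the candidates scored before the auction.
train = ["high_bid"] * 80 + ["low_bid"] * 20
serving = ["high_bid"] * 50 + ["low_bid"] * 50

weights = importance_weights(train, serving)
print(weights)

# Re-weighted training mass per bucket now matches the serving mix.
reweighted = {b: weights[b] * train.count(b) for b in weights}
print(reweighted)
```

Buckets that occur only at serving time receive no weight here because there are no training rows to re-weight, which is one reason this adjustment is usually combined with some exploration traffic.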

ML System Design

Design pin recommendation system

Search autocomplete

The question was about designing search autocomplete recommendations, which I have seen many times on forums, but the interviewer said they didn’t want to hear about the NLP modeling part. They first wanted a concrete solution for storing historical search queries. I was baffled for a moment and thought of using a trie, as I had seen on LeetCode, but before I could finish I was interrupted and asked to explain, in practical terms, how to store this unstructured data so that retrieving queries is efficient. My thoughts were in disarray: hashing? How to hash… I had prepared for this ML design question the day before and did not expect to spend the entire time on questions that seemed more like infrastructure (infra) issues.
Designing an efficient storage and retrieval system for a search autocomplete recommendation feature involves understanding data structures that can support quick look-up and retrieval of strings. Here’s how one might address the question:

Using a Trie for Storage:

Explanation of Trie:
- A trie, also known as a prefix tree, is a tree-like data structure that stores a dynamic set of strings and is optimized for retrieving keys by prefix (in this case, search queries). Each node represents one character, and traversing a path from the root spells out a whole string.
- Tries are particularly efficient for autocomplete systems because they allow for prefix searches, which are essential for autocompletion functionality.
Efficiency of Trie:
- Retrieval is efficient in a trie because a prefix shared by several strings is stored only once. This is particularly advantageous for an autocomplete system, where many queries share the same prefix. For example, if users search for “photo,” “photos,” and “photosynthesis,” the shared prefix “photo” is stored once, and the queries differ only in the nodes below it.
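A minimal in-memory sketch of the trie described above. The class and method names (`TrieNode`, `Trie`, `complete`) and the sample queries are illustrative assumptions; a production system would also need persistence, sharding, and a bound on how deep each lookup walks.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> child node
        self.count = 0      # times the query ending at this node was searched

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, query):
        """Record one occurrence of a full search query."""
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.count += 1

    def complete(self, prefix, k=3):
        """Return up to k stored queries starting with `prefix`, most frequent first."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        matches = []
        stack = [(node, prefix)]
        while stack:  # depth-first walk of the subtree under the prefix
            current, text = stack.pop()
            if current.count:
                matches.append((current.count, text))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        # Most frequent first, ties broken alphabetically.
        matches.sort(key=lambda pair: (-pair[0], pair[1]))
        return [text for _, text in matches[:k]]

trie = Trie()
for q in ["photo", "photos", "photosynthesis", "photos", "phone case", "pasta"]:
    trie.insert(q)
print(trie.complete("photo"))  # ['photos', 'photo', 'photosynthesis']
```

Walking the whole subtree on every keystroke gets expensive for short prefixes, so a common refinement is to cache the top-k completions at each node.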

Storing Unstructured Data:

Database Storage:
- If the search queries are to be stored in a database, you might use a NoSQL database like MongoDB, which can store unstructured data in a flexible, JSON-like format. This allows for efficient storage and retrieval of unstructured data like search queries which might have additional metadata associated with them.
Indexing for Efficiency:
- Indexing is crucial. An inverted index, where each unique term points to the list of stored queries (and optionally the positions) in which it appears, can speed up retrieval. For large-scale systems, a search engine such as Elasticsearch can be used, which stores data in a way that is optimized for quick search and retrieval.

Hashing for Quick Lookup:

Implementing Hashing:
- Hashing can be used to quickly check whether a search query has been seen before. A hash function converts the search string into a hash code, which is then used to look up the string in a hash table. A hash table only supports exact-match lookup, so for autocomplete the key would be the typed prefix itself, mapped to its stored completions.
- A deterministic hash function that spreads keys uniformly is key. This ensures that search queries are distributed evenly across the storage space.
Dealing with Collision:
- In a hash table, collision resolution is important. Techniques like chaining (where each bucket is a linked list of entries that hash to the same bucket) or open addressing (where collision resolution happens through probing for the next available slot) can be used.
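One way to apply hashing to autocomplete is to precompute, for every prefix, its most frequent completions and store them in a hash table keyed by the prefix. The sketch below uses a Python `dict` (itself a hash table that handles collisions internally) as a stand-in for a key-value store; `build_prefix_table` and the sample log are hypothetical.

```python
from collections import Counter, defaultdict

def build_prefix_table(query_log, k=3):
    """Map every prefix of every logged query to its k most frequent full queries."""
    counts = Counter(query_log)
    by_prefix = defaultdict(list)
    for query, count in counts.items():
        for end in range(1, len(query) + 1):
            by_prefix[query[:end]].append((count, query))
    return {
        # Most frequent first, ties broken alphabetically.
        prefix: [q for _, q in sorted(cands, key=lambda c: (-c[0], c[1]))[:k]]
        for prefix, cands in by_prefix.items()
    }

table = build_prefix_table(["cat", "cat", "car", "cart", "dog", "car", "cat"])
print(table["ca"])         # ['cat', 'car', 'cart']
print(table.get("z", []))  # []
```

This trades storage (one entry per distinct prefix) for a single exact-match lookup per keystroke, and the table can be rebuilt offline in batch as the query log grows.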

Practical Considerations:

Scalability:
- The system must be scalable to accommodate the growing number of search queries. Distributed databases and sharding strategies might be necessary for this.
Cache Frequently Used Queries:
- A caching layer, like Redis, can be used to store and retrieve the most frequent queries to reduce latency.
Handling Real-Time Data:
- The autocomplete system should also be able to handle real-time data. This could be facilitated by streaming data processing systems like Kafka, which can provide a real-time pipeline for new or trending queries.
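The caching idea can be sketched with a small in-process LRU cache placed in front of a slower lookup. This is only a stand-in for an external cache such as Redis: the class `LRUCache`, the function `suggest`, and the fake `backend` are hypothetical names used for illustration.

```python
from collections import OrderedDict

class LRUCache:
    """Keeps the most recently used prefixes, evicting the oldest when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, prefix):
        if prefix not in self.entries:
            return None
        self.entries.move_to_end(prefix)  # mark as most recently used
        return self.entries[prefix]

    def put(self, prefix, suggestions):
        self.entries[prefix] = suggestions
        self.entries.move_to_end(prefix)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

def suggest(prefix, cache, slow_lookup):
    """Serve from the cache when possible, otherwise query the backing store."""
    cached = cache.get(prefix)
    if cached is not None:
        return cached
    result = slow_lookup(prefix)
    cache.put(prefix, result)
    return result

calls = []  # records which prefixes actually reached the backing store

def backend(prefix):
    calls.append(prefix)
    return [prefix + "s"]  # placeholder suggestion list

cache = LRUCache(capacity=2)
for typed in ["pin", "pin", "ad", "board"]:
    suggest(typed, cache, backend)
print(calls)  # ['pin', 'ad', 'board']
```

The second request for "pin" never reaches the backend, and inserting "board" evicts "pin" because the cache holds only two entries.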

1) For example, I mentioned that logistic regression requires scaling and normalization of data. The follow-up question was what else logistic regression needs to perform well that trees do not require. One point is that LR models the log-odds as a linear function of the inputs and cannot capture feature interactions (cross products) on its own; it needs manual feature engineering, such as crossing features with ‘AND’. In contrast, trees are nonlinear models that can inherently capture such interactions. Another point is that trees can split on categorical features fairly directly, while LR needs them converted into numeric vectors (for example one-hot encodings) or embeddings.
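The point about cross features can be illustrated with a toy XOR-style click pattern. The sketch below is a hand-rolled logistic regression with made-up data and hypothetical function names, written only to show that the linear model cannot fit the pattern until a manual "a AND b" feature is added.

```python
import math

def predict(weights, x):
    """Sigmoid of bias + dot(weights, x); weights[0] is the bias."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(rows, labels, lr=1.0, steps=5000):
    """Fit logistic regression with batch gradient descent on the mean log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = predict(weights, x) - y
            grad[0] += error
            for j, value in enumerate(x):
                grad[j + 1] += error * value
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

# Click only when exactly one of two binary features is on (an XOR pattern).
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

plain = train_logreg(points, labels)
crossed_points = [(a, b, a * b) for a, b in points]  # add the manual "a AND b" cross
crossed = train_logreg(crossed_points, labels)

print([round(predict(plain, p), 2) for p in points])  # all 0.5: no linear fit exists
print([predict(crossed, p) > 0.5 for p in crossed_points])  # [False, True, True, False]
```

A depth-two decision tree would separate the same four points without any engineered feature, which is the contrast the follow-up question was probing.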

2) There were also many questions about neural network parameters, such as how to handle position bias. A further question arose because the training data is post-auction, while the model’s function is to provide a basis for bids during the auction. That is to say, the training data and prediction data are not in the same distribution. How do you deal with that? Position bias: when training, account for position either as an input feature or as a separate module. At inference, set the position to a fixed value, or remove the module. I can’t think of an answer to the second question; can someone more knowledgeable provide guidance?