• In similarity searching, there is often a query record that is compared against a stored database of records (documents or images etc). The main aim is to retrieve a set of database records that are similar to the query record.
  • Similarity search works with semantic representations of data and finds the similar items fast. This means when we represent images or pieces of text as vector embeddings, their semantic similarity is represented by how close their vectors are in the vector space. Hence, what we want to look at is the distance between vectors of the objects.
  • The following article has been written by Arpita Vats.


  • Image retrieval, document retrieval, recommender systems, image recognition, similar movies/items etc.

Hierarchical Navigable Small Worlds (HNSW)

  • HNSW is a data structure and algorithm used for approximate nearest neighbor search. It was introduced by Malkov and Yashunin in 2016 as an improvement over the traditional k-d tree and other methods for efficient nearest neighbor search in high-dimensional spaces.
  • HNSW is based on the concept of constructing a hierarchical graph where each node represents a point in the dataset.
  • The graph is built in a way that allows for efficient navigation through the data to find approximate nearest neighbors. The key idea is to maintain connections between nodes in such a way that the graph exhibits small-world properties, meaning that even in high-dimensional spaces, the average distance between nodes remains relatively small.

Approximate Nearest Neighbors

  • Approximate Nearest Neighbors (ANN)-based search techniques like Hierarchical Navigable Small Worlds (HNSW) are commonly deployed to carry out similarity search.
  • Approximate Nearest Neighbor (ANN) methods are commonly used on top of collaborative filtering or content-based filtering for candidate generation due to the following reasons:
    1. Scalability: Collaborative filtering and content-based filtering techniques often involve computing similarity scores between users/items, which can become computationally expensive as the number of users/items grows. ANN methods provide efficient algorithms for searching and retrieving nearest neighbors, allowing for faster processing of large-scale datasets. ANN enables the system to handle millions or even billions of users/items efficiently.
  1. Real-time Recommendations: Recommender systems often need to generate recommendations in real-time, responding quickly to user requests. ANN methods provide fast retrieval of nearest neighbors, allowing for efficient candidate generation on the fly. This is especially important in scenarios where low latency is crucial, such as online platforms or streaming services.

  2. Diversity and Serendipity: Collaborative filtering and content-based filtering techniques have a tendency to recommend items that are similar to the user’s past preferences or content attributes. This can lead to a lack of diversity in recommendations. By incorporating ANN methods, the system can introduce diversity by identifying items that are slightly different or less similar but still relevant to the user’s preferences. ANN helps strike a balance between personalized recommendations and the exploration of new and diverse items.

  3. Cold Start Problem: Collaborative filtering and content-based filtering can face challenges in scenarios where there is limited or no historical data for new users or items (cold start problem). ANN methods can be useful in these situations by leveraging the available features or metadata of new users/items to identify similar existing users/items. This allows for the generation of initial recommendations even in the absence of extensive user-item interactions.

By utilizing ANN methods on top of collaborative filtering or content-based filtering, recommender systems can enhance candidate generation by efficiently searching for nearest neighbors, improving scalability, real-time recommendation capabilities, diversity, and handling cold start scenarios. It helps in delivering more accurate and effective recommendations to users.


  • Facebook AI Similarity Search (FAISS) is a library developed by Facebook AI that enables efficient similarity search.
  • Given a set of vectors, we can index them using FAISS then using another vector (the query vector), we search for the most similar vectors within the index. FAISS not only allows us to build an index and search but it also speeds up search times to ludicrous performance levels. FAISS provides several similar search methods that span a broad spectrum of usage trade-offs.
  • FAISS is optimized for memory usage and speed.
  • FAISS offers a state-of-the-art GPU implementation for the most relevant indexing methods.
  • To know more about the different and efficient indexing done by FAISS, please refer Efficient Indexing of Billion-Scale datasets of deep descriptors by Babenko and Lempitsky (2016).
  • The figure below shows the internal mechanism of FAISS.

  • Once the vectors are extracted by learning machinery (from images, videos, text documents, and elsewhere), they’re ready to feed into the similarity search library.
  • Similarity search can be made orders of magnitude faster if we’re willing to trade some accuracy; that is, deviate a bit from the reference result. For example, it may not matter much if the first and second results of an image similarity search are swapped, since they’re probably both correct results for a given query. Accelerating the search involves some pre-processing of the data set, an operation that we call indexing.
  • This brings us to the three metrics of interest:
    • Speed: How long does it take to find the 10 (or some other number) most similar vectors to the query? Hopefully less time than the brute-force algorithm needs; otherwise, what’s the point of indexing?
    • Memory usage: How much RAM does the method require? More or less than the original vectors? FAISS supports searching only from RAM, as disk databases are orders of magnitude slower. Yes, even with SSDs.
    • Accuracy: How well does the returned list of results match the brute-force search results? Accuracy can be evaluated by counting the number of queries for which the true nearest neighbor is returned first in the result list (a measure called 1-recall@1), or by measuring the average fraction of 10 nearest neighbors that are returned in the 10 first results (the “10-intersection” measure).


  • FAISS is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning.
  • FAISS optimizes through various indexing techniques.


  • Similarity search can be made orders of magnitude faster if we’re willing to trade some accuracy; that is, deviate a bit from the reference result.


  • ScaNN (Scalable Nearest Neighbors), by Google, is a method for efficient vector similarity search at scale.
  • Modern ML models can transform inputs such as text and images into embeddings, high dimensional vectors trained such that more similar inputs cluster closer together.
  • For a given query, we can therefore compute its embedding, and find the literary works whose embeddings are closest to the query’s. computational challenge remains: for a given query embedding, how does one quickly find the nearest dataset embeddings? The set of embeddings is often too large for exhaustive search and its high dimensionality makes pruning difficult.
  • Embedding-based search is a technique that is effective at answering queries that rely on semantic understanding rather than simple indexable properties.
  • In this technique, machine learning models are trained to map the queries and database items to a common vector embedding space, such that the distance between embeddings carries semantic meaning, i.e., similar items are closer together.
  • To answer a query with this approach, the system must first map the query to the embedding space. It then must find, among all database embeddings, the ones closest to the query; this is the nearest neighbor search problem.
  • One of the most common ways to define the query-database embedding similarity is by their inner product; this type of nearest neighbor search is known as maximum inner-product search (MIPS).
  • Referring to the figure below source: Announcing ScaNN: Efficient Vector Similarity Search, the two-tower neural network model, illustrated below, is a specific type of embedding-based search where queries and database items are mapped to the embedding space by two respective neural networks. In this example the model responds to natural-language queries for a hypothetical literary database.

Anisotropic Vector Quantization in ScaNN

  • Anisotropic Vector Quantization (AVQ) is an indexing technique used for efficient similarity search. AVQ is designed to handle high-dimensional vector embeddings and improve the search performance in scenarios where the data distribution is anisotropic, meaning that the density of vectors is not uniform across the embedding space.
  • The AVQ approach begins by dividing the embedding space into a grid-like structure called a quantization grid. Each cell in the grid represents a region in the embedding space. During the indexing phase, AVQ assigns each vector to the cell that is closest to it based on Euclidean distance. This process quantizes the vectors and maps them to discrete grid cells.
  • To handle anisotropic data distributions, AVQ introduces an additional step called anisotropic quantization. This step adjusts the shape and size of the quantization grid cells to better align with the underlying data distribution. By adapting the cell shape and size to the data, AVQ can achieve more balanced and accurate indexing.
  • During the search phase, given a query vector, AVQ identifies the corresponding quantization grid cell and retrieves the vectors stored in that cell as potential nearest neighbors. These candidate vectors are then further refined using distance calculations to determine the actual nearest neighbors.
  • The AVQ technique in ScaNN helps improve search efficiency by reducing the number of vectors that need to be compared for similarity. By grouping similar vectors into the same grid cells, AVQ enables fast retrieval of potential nearest neighbors while maintaining accuracy. This is particularly beneficial in scenarios with high-dimensional data and non-uniform data distributions, where traditional indexing methods may struggle to provide efficient search results.
  • Anisotropic vector quantization allows ScaNN to better estimate inner products that are likely to be in the top-k MIPS results and therefore achieve higher accuracy.
  • On the glove-100-angular benchmark from ann-benchmarks.com, ScaNN outperformed eleven other carefully tuned vector similarity search libraries, handling roughly twice as many queries per second for a given accuracy as the next-fastest library (NGT-onng).

  • ScaNN is open-source software and you can try it yourself at GitHub. The library can be directly installed via Pip and has interfaces for both TensorFlow and Numpy inputs.


  • Handles roughly twice as many queries per second for a given accuracy as the next-fastest library.
  • Anisotropic vector quantization allows ScaNN to better estimate inner products that are likely to be in the top-k MIPS results and therefore achieve higher accuracy.


  • Database size can easily be in the millions or even billions, MIPS is often the computational bottleneck to inference speed, and exhaustive search is impractical. This necessitates the use of approximate MIPS algorithms that exchange some accuracy for a significant speedup over brute-force search.


  • Annoy from Spotify uses a bunch of trees to enable Spotify’s music recommendations.
  • In order to construct the index we create a forest (i.e., a bunch of trees). Each tree is constructed by picking two points at random and splitting the space into two by their hyperplane. We keep recursively splitting into subspaces until at most K items are left in each node.
    • It has the ability to use static files as indexes. In particular, this means you can share index across processes.
    • Annoy also decouples creating indexes from loading them, so you can pass around indexes as files and map them into memory quickly.
    • It tries to minimize memory footprint so the indexes are quite small.
  • To understand the data structures and algorithms that Annoy uses to do approximate nearest neighbor queries, please refer Erik Bernhardsson’s Nearest neighbors and vector models – part 2 – algorithms and data structures.
  • There are just two main parameters neede to tune Annoy: the number of trees n_trees and the number of nodes to inspect during searching search_k.
    • n_trees is provided during build time and affects the build time and the index size. A larger value will give more accurate results, but larger indexes.
    • search_k is provided in runtime and affects the search performance. A larger value will give more accurate results, but will take longer time to return.
    • If search_k is not provided, it will default to n * n_trees where n is the number of approximate nearest neighbors. Otherwise, search_k and n_trees are roughly independent, i.e. the value of n_trees will not affect search time if search_k is held constant and vice versa.
    • It’s recommended to set n_trees as large as possible given the amount of memory you can afford, and it’s recommended to set search_k as large as possible given the time constraints you have for the queries.
  • From Erik Bernhardsson’s Nearest neighbors and vector models – part 2 – algorithms and data structures, the diagram below shows the binary tree (left) when the subspaces have been recursively split such that there are at most \(K\) items left in each node (right).

  • In order to search the constructed index, the forest is traversed in order to obtain a set of candidate points from which the closest to the query point is returned.


  • Decouple index creation from loading them, so you can pass around indexes as files and map them into memory quickly.
  • We can tune the parameters to change the accuracy/speed tradeoff.
  • It has the ability to use static files as indexes, this means you can share indexes across processes.


  • The exact nearest neighbor might be across the boundary to one of the neighboring cells.
  • No support for GPU processing.
  • No support for batch processing, so in order to increase throughput “further hacking is required”.
  • Cant incrementally add points to it (annoy2 tries to fix this).

Fast Inference for Graph-based Approximate Nearest Neighbor Search (FINGER) Graph based

  • Graph-based approximation is one such technique where the data points are organized into a graph, and the search algorithm traverses the graph to find the nearest neighbors of the query. The algorithm maintains a list of candidate solutions and regularly updates it as it encounters points closer to the query.
  • The FINGER technique focuses on optimizing the search process itself and can be applied to any graph construction method. It takes advantage of the observation that when calculating the distance between the query and points that are farther away than any of the candidates currently on the list, an approximate distance measure is usually sufficient. By efficiently computing the approximate distance using certain manipulations of node vectors’ values, the time required for approximate nearest neighbor search can be reduced by 20% to 60%.
  • The approach of FINGER involves representing the query vector and the node vectors as sums of projections along a previously explored node and residual vectors orthogonal to it. By estimating the angle between the residual vectors of the explored node’s immediate neighbors, the angle between the residual vector of a neighbor node and the query’s residual vector can be approximated. This approximation provides an estimation of the distance between the query and the neighbor node.
  • Experimental results show that FINGER outperforms three prior graph-based approximation methods on different datasets, achieving more efficient search with higher recall rates. The improvement in efficiency ranges from 50% to almost 88% compared to the previous methods.



If you found our work useful, please cite it as:

  title   = {Similarity Search},
  author  = {Vats, Arpita and Chadha, Aman},
  journal = {Distilled AI},
  year    = {2022},
  note    = {\url{https://aman.ai}}