Overview

  • We will have torch and numpy implementations of some common ML algorithms here along with test cases to run them.
  • Please feel free to give input on what else you’d like to see here!

This is a comprehensive list, so we’ll break it down into a few rounds. Let’s start with the first few algorithms:

1. Scaled Dot-Product

  • Description: Used in attention mechanisms, scales the dot product of two vectors by the inverse square root of the dimension (Equation: \(\text{Attention}(Q, K) = \frac{QK^T}{\sqrt{d_k}}\)).
  • NumPy Implementation:
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V definition: [batch_size, seq_len, feature_dim], where:
    batch_size is the number of sequences processed at a time,
    seq_len is the length of each sequence (like the number of words in a sentence),
    feature_dim is the dimensionality of the feature vectors (queries, keys, or values)."""

    # d_k: Dimension of the key vectors. It's assumed that Q, K, and V have the same dimensionality at the last axis.
    d_k = K.shape[-1]

    # Compute the dot product between Q and the transpose of K, then scale it by the square root of d_k.
    scores = np.matmul(Q, K.transpose(-2, -1)) / np.sqrt(d_k)

    # Apply softmax to the scores over the last dimension to obtain attention weights.
    attention_weights = np.softmax(scores, axis=-1)

    # Multiply the attention weights with V to get the final output.
    return np.matmul(attention_weights, V)


def test_scaled_dot_product_attention_numpy_easy():
    # Simple and small matrices for Q, K, and V
    Q = np.array([[[1, 0], [0, 1]]])
    K = np.array([[[1, 2], [2, 1]]])
    V = np.array([[[1, 0], [0, 1]]])

    # Expected output calculated manually
    expected_output = np.array([[[0.11920292, 0.88079708],
                                 [0.88079708, 0.11920292]]])

    # Call the attention function and get the result.
    result = scaled_dot_product_attention(Q, K, V)

    # Assert that the result is close to the expected output.
    np.testing.assert_almost_equal(result, expected_output)

  • PyTorch Implementation:
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    # d_k: Dimension of the key vectors. It's assumed that Q, K, and V have the same dimensionality at the last axis.
    d_k = K.size(-1)

    # Compute the dot product between Q and the transpose of K, then scale it by the square root of d_k.
    scores = torch.matmul(Q, K.transpose(-2, -1)) / torch.sqrt(d_k)

    # Apply softmax to the scores along the last dimension to obtain attention weights.
    attention_weights = F.softmax(scores, dim=-1)

    # Multiply the attention weights with V to get the final output.
    return torch.matmul(attention_weights, V)

def test_scaled_dot_product_attention_pytorch_easy():
    # Simple and small tensors for Q, K, and V
    Q = torch.tensor([[[1., 0.], [0., 1.]]])
    K = torch.tensor([[[1., 2.], [2., 1.]]])
    V = torch.tensor([[[1., 0.], [0., 1.]]])

    # Expected output calculated manually
    expected_output = torch.tensor([[[0.1192, 0.8808],
                                     [0.8808, 0.1192]]])

    # Call the attention function and get the result.
    result = scaled_dot_product_attention(Q, K, V)

    # Assert that the result is close to the expected output.
    assert torch.allclose(result, expected_output, atol=1e-4)

2. K-Means Clustering

  • Clustering algorithm for clustering data into predefined k groups/cluster with the nearest mean and recalculates the clusters center as the mean is assigned points.

  • Numpy Implementation

import numpy as np

def kmeans_clustering(data, k, num_iterations=100):
    # Randomly initialize k centroids from the data points
    centroids = data[np.random.choice(data.shape[0], k, replace=False)]

    for _ in range(num_iterations):
        # Assign each data point to the closest centroid
        distances = np.sqrt(((data - centroids[:, np.newaxis])**2).sum(axis=2))
        closest_centroids = np.argmin(distances, axis=0)

        # Update centroids to be the mean of points in each cluster
        for i in range(k):
            centroids[i] = data[closest_centroids == i].mean(axis=0)

    return centroids, closest_centroids

def test_kmeans_clustering_numpy():
    np.random.seed(0)  # For reproducibility
    data = np.random.rand(100, 2)  # Generate some random data
    k = 3  # Number of clusters

    centroids, assignments = kmeans_clustering(data, k)
    assert len(centroids) == k
    assert len(np.unique(assignments)) == k

  • Pytorch Implementation
import torch

def kmeans_clustering_torch(data, k, num_iterations=100):
    # Randomly initialize k centroids from the data points
    centroids = data[torch.randperm(data.size(0))[:k]]

    for _ in range(num_iterations):
        # Assign each data point to the closest centroid
        distances = torch.sqrt(((data[:, None] - centroids[None, :])**2).sum(dim=2))
        closest_centroids = torch.argmin(distances, dim=0)

        # Update centroids to be the mean of points in each cluster
        for i in range(k):
            centroids[i] = data[closest_centroids == i].mean(dim=0)

    return centroids, closest_centroids

# Test case for PyTorch
def test_kmeans_clustering_pytorch():
    torch.manual_seed(0)  # For reproducibility
    data = torch.rand(100, 2)  # Generate some random data
    k = 3  # Number of clusters

    centroids, assignments = kmeans_clustering_torch(data, k)
    assert centroids.size(0) == k
    assert len(assignments.unique()) == k

# Running the tests

3. KNN (2NN)

  • K-Nearest Neighbors (KNN) is a simple algorithm that stores all cases and classifies new cases based on a similarity measure (e.g., distance functions). Predicts the label (or value) of a data point by looking at the ‘k’ closest labeled data points and choosing the most common label (classification) or averaging the labels (regression) among them.

  • NumPy Implementation:

import numpy as np

def knn_find_neighbors(data, query, k):
    # Calculate Euclidean distances between query and all data points
    distances = np.sqrt(((data - query)**2).sum(axis=1))

    # Find the indices of the k smallest distances
    k_indices = np.argsort(distances)[:k]

    # Return the k nearest neighbors
    return data[k_indices], k_indices

import pytest

# Test case for Numpy
def test_knn_find_neighbors_numpy():
    data = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
    query = np.array([2.5, 3.5])
    k = 2

    neighbors, indices = knn_find_neighbors(data, query, k)
    expected_neighbors = np.array([[2, 3], [3, 4]])
    expected_indices = np.array([1, 2])

    np.testing.assert_array_equal(neighbors, expected_neighbors)
    np.testing.assert_array_equal(indices, expected_indices)

# Running the test
test_knn_find_neighbors_numpy()

  • PyTorch Implementation: Not typically implemented in PyTorch, as KNN is a non-parametric, instance-based learning method.
 import torch

def knn_find_neighbors_torch(data, query, k):
    # Calculate Euclidean distances between query and all data points
    distances = torch.sqrt(((data - query)**2).sum(dim=1))

    # Find the indices of the k smallest distances
    k_indices = torch.argsort(distances)[:k]

    # Return the k nearest neighbors and their indices
    return data[k_indices], k_indices

import pytest

# Test case for PyTorch
def test_knn_find_neighbors_pytorch():
    data = torch.tensor([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]])
    query = torch.tensor([2.5, 3.5])
    k = 2

    neighbors, indices = knn_find_neighbors_torch(data, query, k)
    expected_neighbors = torch.tensor([[2.0, 3.0], [3.0, 4.0]])
    expected_indices = torch.tensor([1, 2])

    assert torch.equal(neighbors, expected_neighbors)
    assert torch.equal(indices, expected_indices)

# Running the test
test_knn_find_neighbors_pytorch()


4. ANN (Artificial Neural Network)

  • A computational model inspired by the way biological neural networks in the human brain process information. Approximate Nearest Neighbors (ANN) refers to algorithms that efficiently find approximate nearest neighbors of points in a dataset when an exhaustive search is infeasible. This allows approximate nearest neighbor queries to be answered quickly in large datasets.
  • Uses data structures like k-d trees, ball trees, VP trees to organize data points for faster search.
  • Approximates the true nearest neighbors by only searching part of the dataset or pruning branches.
  • Provides probabilistic guarantees on the approximation factor. Neighbors found are guardedly close to true NNs.
  • Much faster query times compared to exhaustive search, enabling large scale high-dimensional applications.
  • Popular methods include locality-sensitive hashing, hierarchical navigable small world graphs.
  • Widely used for tasks like similarity search, recommendation systems, object retrieval and more.

  • NumPy Implementation: Implementing an ANN in pure NumPy is complex due to the need for backpropagation and optimization algorithms.
import numpy as np

class SimpleANN:
    def __init__(self, input_size, hidden_size, num_classes):
        # Initialize weights and biases
        self.W1 = np.random.randn(input_size, hidden_size) * np.sqrt(2. / input_size)
        self.b1 = np.zeros(hidden_size)
        self.W2 = np.random.randn(hidden_size, num_classes) * np.sqrt(2. / hidden_size)
        self.b2 = np.zeros(num_classes)

    def relu(self, Z):
        return np.maximum(0, Z)

    def forward(self, X):
        # Forward pass: Input layer -> Hidden layer with ReLU -> Output layer
        self.Z1 = np.dot(X, self.W1) + self.b1
        self.A1 = self.relu(self.Z1)
        self.Z2 = np.dot(self.A1, self.W2) + self.b2
        return self.Z2  # Return the final linear output

# Example usage
# ann = SimpleANN(input_size=10, hidden_size=5, num_classes=3)
# output = ann.forward(X)  # X is the input data

  • PyTorch Implementation: PyTorch provides a more suitable environment for implementing ANNs with its automatic differentiation capabilities.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleANN(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(SimpleANN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)  # First fully connected layer
        self.relu = nn.ReLU()                          # ReLU activation function
        self.fc2 = nn.Linear(hidden_size, num_classes) # Second fully connected layer

    def forward(self, x):
        out = self.fc1(x)    # Pass input through the first layer
        out = self.relu(out) # Apply ReLU activation function
        out = self.fc2(out)  # Pass through the second layer
        return out

import pytest

# Test case for the ANN
def test_simple_ann():
    input_size = 10
    hidden_size = 5
    num_classes = 3
    model = SimpleANN(input_size, hidden_size, num_classes)

    # Create a dummy input tensor of appropriate size (e.g., batch_size = 1)
    dummy_input = torch.randn(1, input_size)

    # Forward pass
    output = model(dummy_input)

    # Check if output size matches the number of classes
    assert output.size() == (1, num_classes)

# Running the test
test_simple_ann()

import pytest

def test_simple_ann_forward():
    # Define the network architecture parameters
    input_size = 10
    hidden_size = 5
    num_classes = 3

    # Instantiate the ANN
    ann = SimpleANN(input_size, hidden_size, num_classes)

    # Create a dummy input array (e.g., batch_size = 1)
    dummy_input = np.random.randn(1, input_size)

    # Forward pass through the network
    output = ann.forward(dummy_input)

    # Check if output has the correct shape
    assert output.shape == (1, num_classes)

# Running the test
test_simple_ann_forward()

5. Linear Regression

  • Regression algorithm, predicts a continuous output based on one or more input feature but assumes linear relationship between input and target output.

Linear Regression is a fundamental algorithm in machine learning, used for predicting a continuous output based on one or more input features. It assumes a linear relationship between inputs and the target output.

One-Liner Description

Linear Regression: Models the relationship between a scalar dependent variable \(y\) and one or more independent variables (or explanatory variables) \(X\) by fitting a linear equation to observed data.

Equation

The equation for linear regression with multiple variables (multiple linear regression) is: \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n + \epsilon\) where \(\beta_0, \beta_1, \ldots, \beta_n\) are coefficients, \(x_1, x_2, \ldots, x_n\) are input features, and \(\epsilon\) is the error term.

  • Numpy Implementation ```python import numpy as np

def linear_regression_numpy(X, y): # Adding a column of ones to include the intercept (beta_0) X = np.append(np.ones((X.shape[0], 1)), X, axis=1)

# Calculating the coefficients: beta = (X'X)^(-1)X'y
beta = np.linalg.inv(X.T @ X) @ X.T @ y
return beta

def test_linear_regression_numpy(): X = np.array([[1, 2], [2, 3], [4, 5]]) y = np.array([1, 2, 3]) beta = linear_regression_numpy(X, y) assert beta.shape == (X.shape[1] + 1,)

Example usage

X = np.array([[feature1, feature2, …], […]])

y = np.array([target1, target2, …])

coefficients = linear_regression_numpy(X, y)


- **PyTorch Implementation**
```python
import torch

def linear_regression_pytorch(X, y):
    # Adding a column of ones to include the intercept (beta_0)
    X = torch.cat((torch.ones(X.shape[0], 1), X), 1)

    # Calculating the coefficients: beta = (X'X)^(-1)X'y
    beta = torch.inverse(X.T @ X) @ X.T @ y
    return beta

# Test case for PyTorch
def test_linear_regression_pytorch():
    X = torch.tensor([[1, 2], [2, 3], [4, 5]], dtype=torch.float32)
    y = torch.tensor([1, 2, 3], dtype=torch.float32)
    beta = linear_regression_pytorch(X, y)
    assert beta.shape == (X.shape[1] + 1,)
    
import torch
import torch.nn as nn

class LogisticRegressionPyTorch(nn.Module):
    def __init__(self, n_features):
        super(LogisticRegressionPyTorch, self).__init__()
        self.linear = nn.Linear(n_features, 1)

    def forward(self, x):
        return torch.sigmoid(self.linear(x))

# Example usage
# X = torch.tensor([[feature1, feature2, ...], [...]])
# y = torch.tensor([target1, target2, ...])
# coefficients = linear_regression_pytorch(X, y)

Explanation

  • In both implementations, a column of ones is added to X to accommodate the intercept (\(\beta_0\)) in the linear equation.
  • The coefficients (\(\beta\)) are calculated using the normal equation: \(\beta = (X'X)^{-1}X'y\).
    • @ symbolizes matrix multiplication.
    • .T or .transpose() is used for matrix transposition.
    • np.linalg.inv() and torch.inverse() calculate the matrix inverse.
  • The test cases create simple datasets and verify if the shapes of the calculated coefficient vectors are correct, considering the added intercept term.

6. Logistic Regression

  • Classification task
  • Logistic Regression is a statistical method used for binary classification. It models the probability of a binary response based on one or more predictor variables.

  • The logistic regression model is represented by the logistic function: \(P(y=1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \ldots + \beta_n x_n)}}\)
  • where \(P(y=1)\) is the probability that the dependent variable \(y\) is 1, \(\beta_0, \beta_1, \ldots, \beta_n\) are the coefficients, and \(x_1, x_2, \ldots, x_n\) are the predictor variables.

  • Numpy Implementation
import numpy as np

class LogisticRegressionNumpy:
    def __init__(self, learning_rate=0.01, num_iterations=1000):
        self.learning_rate = learning_rate
        self.num_iterations = num_iterations
        self.weights = None
        self.bias = None

    def _sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def fit(self, X, y):
        # Initialize weights and bias
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0

        # Gradient descent
        for _ in range(self.num_iterations):
            model = np.dot(X, self.weights) + self.bias
            predictions = self._sigmoid(model)

            # Compute gradients
            dw = (1 / n_samples) * np.dot(X.T, (predictions - y))
            db = (1 / n_samples) * np.sum(predictions - y)

            # Update parameters
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db

    def predict(self, X):
        model = np.dot(X, self.weights) + self.bias
        predictions = self._sigmoid(model)
        return np.where(predictions >= 0.5, 1, 0)

import pytest

def test_logistic_regression_numpy_init():
    logistic_regression = LogisticRegressionNumpy()
    logistic_regression.fit(np.array([[1, 2], [2, 3]]), np.array([0, 1]))
    assert logistic_regression.weights.shape == (2,)
    assert isinstance(logistic_regression.bias, float)

test_logistic_regression_numpy_init()
# Example usage
# logistic_regression = LogisticRegressionNumpy()
# logistic_regression.fit(X_train, y_train)
# predictions = logistic_regression.predict(X_test)
  • Explanation
  • In the Numpy implementation, logistic regression is performed using gradient descent.
    • _sigmoid: Sigmoid function, which maps any real-valued number into the range [0, 1], suitable for probability representation.
    • fit: Function for training the model using gradient descent. Updates weights (self.weights) and bias (self.bias) to minimize the loss.
    • predict: Function to predict binary outcomes (0 or 1) based on the learned weights and bias.
  • The PyTorch implementation uses built-in linear layers and sigmoid activation, abstracting away the details of weights and bias updates.
  • The test case for the Numpy implementation checks if the weights are initialized correctly and if the bias is a float. This test ensures that the fitting process begins with the correct parameter setup.

7. Logistic Regression loss function- Binary Cross Entropy

  • The loss function used in logistic regression, typically binary cross-entropy, measures the performance of a classification model whose output is a probability value between 0 and 1.

  • Numpy Implementation

def binary_cross_entropy_loss(y_true, y_pred):
    """
    Compute the binary cross-entropy loss
    y_true: array of true labels
    y_pred: array of predicted probabilities
    """
    epsilon = 1e-15  # Small constant to avoid log(0)
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def test_binary_cross_entropy_loss():
    y_true = np.array([1, 0, 1, 1])
    y_pred = np.array([0.9, 0.1, 0.8, 0.3])
    assert binary_cross_entropy_loss(y_true, y_pred) == pytest.approx(0.371, 0.01)

test_binary_cross_entropy_loss()


# Example usage
# loss = binary_cross_entropy_loss(y_true, y_pred)

  • Pytorch Implementation
import torch
import torch.nn as nn

# PyTorch has a built-in BCELoss function
loss_function = nn.BCELoss()

# Example usage
# y_true = torch.tensor([...], dtype=torch.float32)
# y_pred = torch.tensor([...], dtype=torch.float32)
# loss = loss_function(y_pred, y_true)

def test_binary_cross_entropy_loss_pytorch():
    y_true = torch.tensor([1, 0, 1, 1], dtype=torch.float32)
    y_pred = torch.tensor([0.9, 0.1, 0.8, 0.3], dtype=torch.float32)
    loss_function = nn.BCELoss()
    loss = loss_function(y_pred, y_true)
    assert loss.item() == pytest.approx(0.371, 0.01)

test_binary_cross_entropy_loss_pytorch()


8. Gradient Descent

  • An optimization algorithm used to minimize a function by iteratively moving towards the minimum value of the function.

  • Numpy Implementation ```python def gradient_descent(starting_point, learning_rate, num_iterations): “”” Perform gradient descent on a simple quadratic function f(x) = x^2 starting_point: initial value of x learning_rate: step size for each iteration num_iterations: number of iterations for the descent “”” x = starting_point for _ in range(num_iterations): grad = 2 * x # Derivative of x^2 x = x - learning_rate * grad return x

Example usage

minimum = gradient_descent(10, 0.1, 100)

def test_gradient_descent(): minimum = gradient_descent(10, 0.1, 100) assert minimum == pytest.approx(0, 0.01)

test_gradient_descent()


- **Pytorch Implementation**

```python
# Simple quadratic function example: f(x) = x^2
x = torch.tensor([10.0], requires_grad=True)
optimizer = torch.optim.SGD([x], lr=0.1)

for _ in range(100):
    optimizer.zero_grad()
    loss = x ** 2
    loss.backward()
    optimizer.step()

# Example usage
# The optimized value of x is now stored in x

def test_gradient_descent_pytorch():
    x = torch.tensor([10.0], requires_grad=True)
    optimizer = torch.optim.SGD([x], lr=0.1)

    for _ in range(100):
        optimizer.zero_grad()
        loss = x ** 2
        loss.backward()
        optimizer.step()

    assert x.item() == pytest.approx(0, 0.01)

test_gradient_descent_pytorch()

9. BatchNorm

  • Batch Normalization is a technique to improve the performance and stability of artificial neural networks.
  • Batch Normalization (Batch Norm): A method used in deep learning to normalize the inputs of each layer, for each mini-batch, by adjusting and scaling the activations.
  • The batch normalization process is defined by the equation: \(\text{BN}(x_i) = \gamma \left( \frac{x_i - \mu_{\text{B}}}{\sqrt{\sigma_{\text{B}}^2 + \epsilon}} \right) + \beta\)
  • where \(x_i\) is the input, \(\mu_{\text{B}}\) is the mini-batch mean, \(\sigma_{\text{B}}^2\) is the mini-batch variance, \(\gamma\) is the scale parameter, \(\beta\) is the shift parameter, and \(\epsilon\) is a small constant added for numerical stability.

  • Numpy Implementation: Batch Normalization ```python import numpy as np

def batch_norm(X, gamma, beta, epsilon=1e-5): “”” Apply batch normalization. X: Input data for a mini-batch (numpy array) gamma, beta: Scale and shift parameters epsilon: Small constant for numerical stability “”” mu = np.mean(X, axis=0) var = np.var(X, axis=0) X_norm = (X - mu) / np.sqrt(var + epsilon) out = gamma * X_norm + beta return out

Example usage

gamma, beta are parameters to be learned during training

X is the input data for a mini-batch

bn_output = batch_norm(X, gamma, beta)

import pytest

def test_batch_norm_numpy(): np.random.seed(0) X = np.random.randn(100, 10) # 100 samples, 10 features gamma = np.ones(10) beta = np.zeros(10) bn_output = batch_norm(X, gamma, beta)

# Check if the mean is close to 0 and variance is close to 1
assert np.allclose(np.mean(bn_output, axis=0), np.zeros(10), atol=0.1)
assert np.allclose(np.var(bn_output, axis=0), np.ones(10), atol=0.1)

test_batch_norm_numpy()


- **PyTorch Implementation**: Batch Normalization
PyTorch has a built-in `BatchNorm1d` for 1D inputs (e.g., fully connected layers) and `BatchNorm2d` for 2D inputs (e.g., convolutional layers).

```python
import torch
import torch.nn as nn

# For fully connected layers
bn = nn.BatchNorm1d(num_features=features_dim)

# For convolutional layers
# bn = nn.BatchNorm2d(num_features=features_dim)

# Example usage
# Apply batch norm to the output of a layer
# output = bn(layer_output)
  • Explanation
  • Numpy Implementation:
    • Calculates the mean (mu) and variance (var) for the mini-batch X.
    • Normalizes X using these statistics and the epsilon value for numerical stability.
    • Scales and shifts the normalized values using gamma and beta.
  • PyTorch Implementation: Utilizes PyTorch’s built-in batch normalization layers, which handle these computations internally.
  • Testing Batch Normalization:
    • The test case for the Numpy implementation checks if the batch normalized output has the desired properties: a mean of approximately 0 and a variance of approximately 1 for each feature across the mini-batch.
  • Batch normalization helps in reducing the internal covariate shift which can lead to faster training and reduced dependence on initialization.

10. LayerNorm

  • Layer Normalization is a technique used in neural networks to stabilize the learning process.
  • Layer Normalization:** Normalizes the inputs across the features instead of the batch dimension, widely used in recurrent and transformer models.

  • Equation
  • Layer normalization can be described by the following equation: \(\text{LN}(x_i) = \gamma \left( \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} \right) + \beta\)
  • where \(x_i\) is the input, \(\mu\) and \(\sigma^2\) are the mean and variance computed across the features, \(\gamma\) and \(\beta\) are learnable parameters, and \(\epsilon\) is a small constant for numerical stability.

  • Numpy Implementation: Layer Normalization
  • Numpy Implementation:**
    • Computes the mean and variance across the features of the input X.
    • Normalizes X using these statistics and epsilon.
    • Scales and shifts the normalized values using gamma and beta.
import numpy as np

def layer_norm(X, gamma, beta, epsilon=1e-5):
    """
    Apply layer normalization.
    X: Input data (numpy array)
    gamma, beta: Scale and shift parameters
    epsilon: Small constant for numerical stability
    """
    mu = np.mean(X, axis=1, keepdims=True)
    var = np.var(X, axis=1, keepdims=True)
    X_norm = (X - mu) / np.sqrt(var + epsilon)
    out = gamma * X_norm + beta
    return out

# Example usage
# gamma, beta are parameters to be learned during training
# X is the input data
# ln_output = layer_norm(X, gamma, beta)

import pytest

def test_layer_norm_numpy():
    np.random.seed(0)
    X = np.random.randn(10, 100)  # 10 samples, 100 features
    gamma = np.ones(100)
    beta = np.zeros(100)
    ln_output = layer_norm(X, gamma, beta)

    # Check if the mean and variance are close to 0 and 1, respectively, for each sample
    assert np.allclose(np.mean(ln_output, axis=1), np.zeros(10), atol=0.1)
    assert np.allclose(np.var(ln_output, axis=1), np.ones(10), atol=0.1)

test_layer_norm_numpy()
  • PyTorch Implementation: Layer Normalization PyTorch provides a built-in layer for layer normalization: torch.nn.LayerNorm.
import torch
import torch.nn as nn

# Define layer normalization
ln = nn.LayerNorm(normalized_shape=features_dim)

# Example usage
# Apply layer norm to a layer's output
# output = ln(layer_output)
  • PyTorch Implementation:** Uses PyTorch’s nn.LayerNorm for layer normalization.
  • Testing Layer Normalization:
    • Checks if the layer normalized output for each sample has a mean of approximately 0 and a variance of approximately 1.
  • Layer normalization is especially effective in recurrent neural networks and transformer models, where it helps in stabilizing the hidden state dynamics across timesteps or layers.

11. K fold cross validation

  • K-Fold Cross-Validation is a resampling procedure used to evaluate machine learning models on a limited data sample.
  • K-Fold Cross-Validation:** The process of dividing the dataset into ‘k’ subsets (folds), where the model is trained on ‘k-1’ folds and tested on the remaining one, repeated ‘k’ times with each fold used exactly once as the test set.
  • Implementing K-Fold Cross-Validation involves more about data manipulation than typical algorithmic functions. Here, we’ll implement a basic version of K-Fold Cross-Validation that splits data indices into ‘k’ folds.

  • Numpy Implementation: K-Fold Cross-Validation
import numpy as np

def k_fold_split(dataset_size, k_folds):
    """
    Splits dataset indices into k folds for cross-validation.
    dataset_size: Total number of samples in the dataset
    k_folds: Number of folds
    """
    indices = np.arange(dataset_size)
    np.random.shuffle(indices)
    fold_sizes = np.full(k_folds, dataset_size // k_folds, dtype=int)
    fold_sizes[:dataset_size % k_folds] += 1
    current = 0
    for fold_size in fold_sizes:
        start, stop = current, current + fold_size
        yield indices[start:stop]
        current = stop

# Example usage
# for fold in k_fold_split(dataset_size=100, k_folds=5):
#     # Use fold, which is a numpy array of indices
  • PyTorch Implementation: K-Fold Cross-Validation
  • In PyTorch, you can use the torch.utils.data.Subset class along with a dataset splitting approach similar to Numpy’s.
import torch
from torch.utils.data import Subset

def k_fold_split_torch(dataset_size, k_folds):
    """
    Splits dataset indices into k folds for cross-validation.
    dataset_size: Total number of samples in the dataset
    k_folds: Number of folds
    """
    indices = torch.randperm(dataset_size).tolist()
    fold_sizes = np.full(k_folds, dataset_size // k_folds, dtype=int)
    fold_sizes[:dataset_size % k_folds] += 1
    current = 0
    for fold_size in fold_sizes:
        start, stop = current, current + fold_size
        yield indices[start:stop]
        current = stop

# Example usage
# for fold in k_fold_split_torch(dataset_size=100, k_folds=5):
#     # Use fold, which is a list of indices

Pytest Test Case for K-Fold Cross-Validation

import pytest

def test_k_fold_split_numpy():
    dataset_size = 10
    k_folds = 5
    folds = list(k_fold_split(dataset_size, k_folds))
    assert len(folds) == k_folds
    # Check if each fold is mutually exclusive and collectively exhaustive
    unique_indices = np.unique(np.concatenate(folds))
    assert len(unique_indices) == dataset_size

def test_k_fold_split_torch():
    dataset_size = 10
    k_folds = 5
    folds = list(k_fold_split_torch(dataset_size, k_folds))
    assert len(folds) == k_folds
    # Check if each fold is mutually exclusive and collectively exhaustive
    unique_indices = torch.unique(torch.tensor(sum(folds, [])))
    assert len(unique_indices) == dataset_size

test_k_fold_split_numpy()
test_k_fold_split_torch()
  • Explanation
  • Numpy and PyTorch Implementations: Both implementations create indices for splitting the dataset into ‘k’ folds, ensuring each fold has roughly the same number of elements and every sample is used for validation exactly once.
  • Testing K-Fold Cross-Validation: The test cases verify that:
    • The number of created folds equals ‘k’.
    • All indices in the dataset are unique and accounted for across all folds.
  • This procedure is crucial in evaluating the performance of a model in a more robust and less biased way compared to a single train-test split, as it ensures that every data point is used for both training and testing.

12. Naive Bayes

  • Naive Bayes is a simple yet effective classification algorithm based on Bayes’ Theorem with the assumption of independence among predictors.
  • A classification algorithm based on Bayes’ Theorem, assuming independence among features, used for building classifiers by applying conditional probability.

  • Equation The Naive Bayes classifier uses Bayes’ Theorem, which is given by: [ P(y|x_1, …, x_n) = \frac{P(y) \prod_{i=1}^{n}P(x_i|y)}{P(x_1, …, x_n)} $$ where ( y ) is the class variable, ( x_1, …, x_n ) are the feature variables, ( P(y|x_1, …, x_n) ) is the probability of ( y ) given the features, ( P(y) ) is the prior probability of ( y ), and ( P(x_i|y) ) is the likelihood of feature ( i ) given class ( y ).

  • I’ll focus on the Gaussian Naive Bayes implementation which assumes that the features follow a normal distribution.

  • Numpy Implementation: Gaussian Naive Bayes ```python import numpy as np

class GaussianNaiveBayes: def fit(self, X, y): n_samples, n_features = X.shape self._classes = np.unique(y) n_classes = len(self._classes)

    # Initialize mean, var, and priors
    self._mean = np.zeros((n_classes, n_features), dtype=np.float64)
    self._var = np.zeros((n_classes, n_features), dtype=np.float64)
    self._priors =  np.zeros(n_classes, dtype=np.float64)

    for c in self._classes:
        X_c = X[y==c]
        self._mean[c, :] = X_c.mean(axis=0)
        self._var[c, :] = X_c.var(axis=0)
        self._priors[c] = X_c.shape[0] / float(n_samples)

def predict(self, X):
    y_pred = [self._predict(x) for x in X]
    return np.array(y_pred)

def _predict(self, x):
    posteriors = []

    for idx, c in enumerate(self._classes):
        prior = np.log(self._priors[idx])
        class_conditional = np.sum(np.log(self._pdf(idx, x)))
        posterior = prior + class_conditional
        posteriors.append(posterior)

    return self._classes[np.argmax(posteriors)]

def _pdf(self, class_idx, x):
    mean = self._mean[class_idx]
    var = self._var[class_idx]
    numerator = np.exp(- (x - mean) ** 2 / (2 * var))
    denominator = np.sqrt(2 * np.pi * var)
    return numerator / denominator

Example usage

model = GaussianNaiveBayes()

model.fit(X_train, y_train)

predictions = model.predict(X_test)

import pytest

def test_gaussian_naive_bayes(): X = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]) y = np.array([0, 0, 0, 1, 1]) model = GaussianNaiveBayes() model.fit(X, y) predictions = model.predict(X)

assert predictions.shape == y.shape

test_gaussian_naive_bayes()


#### PyTorch Implementation
Implementing Gaussian Naive Bayes in PyTorch is not typical, as PyTorch is more suited for neural network-based models. Naive Bayes calculations are straightforward and often more efficiently handled with libraries like Numpy or Scikit-learn.


- Explanation
- **Numpy Implementation:** 
  - `fit` method calculates the mean, variance, and prior probabilities for each class.
  - `predict` method computes the posterior probability for each class and chooses the class with the highest probability.
  - The probabilities are computed under the Gaussian (normal) distribution assumption for each feature.
- **Testing Gaussian Naive Bayes:** 
  - The test case verifies that the predictions have the same shape as the true labels, ensuring the model's compatibility with the data dimensions.

- Naive Bayes, particularly the Gaussian variant, is effective for classification problems, especially when feature independence is a reasonable assumption. Despite its simplicity, it can perform remarkably well on various tasks.

## 13. Principal Component Analysis (PCA)
- Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
- Principal Component Analysis (PCA): A dimensionality reduction technique that transforms data into a new coordinate system, reducing the number of dimensions without significant loss of information.
- Eigenvalues and eigenvectors are, respectively, the scalars that indicate how much a transformation stretches a vector, and the vectors that are only scaled, not rotated, by the transformation.

- Equation
PCA involves calculating the eigenvectors and eigenvalues of a dataset's covariance matrix to identify the principal components. The principal components are the directions where there is the most variance, the directions where the data is most spread out.

- **Numpy Implementation**: PCA

```python
import numpy as np

class PCA:
    def __init__(self, n_components):
        self.n_components = n_components
        self.components = None
        self.mean = None

    def fit(self, X):
        # Mean centering
        self.mean = np.mean(X, axis=0)
        X = X - self.mean
        
        # Calculate covariance
        cov = np.cov(X.T)
        
        # Eigen decomposition
        eigenvalues, eigenvectors = np.linalg.eig(cov)
        
        # Sort eigenvectors
        eigenvectors = eigenvectors.T
        idxs = np.argsort(eigenvalues)[::-1]
        eigenvalues = eigenvalues[idxs]
        eigenvectors = eigenvectors[idxs]

        # Store first n eigenvectors
        self.components = eigenvectors[0:self.n_components]

    def transform(self, X):
        # Project data
        X = X - self.mean
        return np.dot(X, self.components.T)

# Example usage
# pca = PCA(n_components=2)
# pca.fit(X_train)
# X_projected = pca.transform(X_train)

import pytest

def test_pca_numpy():
    X = np.array([[1, 2], [3, 4], [5, 6]])
    pca = PCA(n_components=1)
    pca.fit(X)
    X_projected = pca.transform(X)
    
    assert X_projected.shape == (3, 1)

test_pca_numpy()
  • PyTorch Implementation: PCA
  • In PyTorch, PCA is not directly implemented as a class or function, but the process can be implemented using PyTorch’s operations, particularly for GPU-accelerated computing.
import torch

def pca_torch(X, n_components):
    # Mean centering
    mean = torch.mean(X, 0)
    X = X - mean

    # Calculate covariance
    cov = torch.mm(X.T, X) / (X.shape[0] - 1)

    # Eigen decomposition
    eigenvalues, eigenvectors = torch.linalg.eig(cov)
    eigenvectors = eigenvectors.T

    # Sort eigenvectors
    idxs = torch.argsort(eigenvalues, descending=True)
    eigenvalues = eigenvalues[idxs]
    eigenvectors = eigenvectors[idxs]

    # Select the top n_components eigenvectors
    components = eigenvectors[:n_components]

    # Project the data onto these components
    return torch.mm(X, components.T)

# Example usage
# X_projected = pca_torch(torch.tensor(X_train, dtype=torch.float32), n_components=2)
  • Explanation
  • Numpy Implementation:
    • fit: Computes the covariance matrix of the data, performs eigen decomposition, and selects the top n_components principal components.
    • transform: Projects the data onto the principal components.
  • PyTorch Implementation:
    • The process is similar but uses PyTorch operations, which can be executed on GPU for larger datasets.
  • Testing PCA:
    • The test case for the Numpy implementation verifies if the transformed data has the reduced dimensionality as expected.
  • PCA is widely used in exploratory data analysis and for making predictive models. It’s most effective in scenarios where there’s high correlation among input features or when the dimensionality of the dataset is high.

14. Neural Networks (e.g., Multilayer Perceptron)

  • Multi-Layer Perceptron (MLP), a type of neural network, is a connected series of nodes, where each node represents a mathematical operation, organized in layers, including an input layer, one or more hidden layers, and an output layer.
  • Multi-Layer Perceptron (MLP):** A class of feedforward artificial neural network consisting of multiple layers of nodes, each layer fully connected to the next, used for tasks like classification and regression.

  • Below are basic implementations of an MLP in both Numpy and PyTorch for binary classification tasks.

  • Numpy Implementation: Simple MLP
  • This is a rudimentary implementation focusing on the forward pass.
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

class SimpleMLP:
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize weights and biases
        self.w1 = np.random.randn(input_size, hidden_size)
        self.b1 = np.zeros(hidden_size)
        self.w2 = np.random.randn(hidden_size, output_size)
        self.b2 = np.zeros(output_size)

    def forward(self, X):
        # Forward pass through the network
        z1 = np.dot(X, self.w1) + self.b1
        a1 = sigmoid(z1)
        z2 = np.dot(a1, self.w2) + self.b2
        a2 = sigmoid(z2)
        return a2

# Example usage
# mlp = SimpleMLP(input_size=3, hidden_size=5, output_size=1)
# output = mlp.forward(np.random.randn(1, 3))

import pytest

def test_simple_mlp_numpy():
    mlp = SimpleMLP(input_size=3, hidden_size=5, output_size=1)
    output = mlp.forward(np.random.randn(1, 3))
    assert output.shape == (1, 1)
  • PyTorch Implementation: Simple MLP
  • PyTorch provides a more straightforward way to create MLPs using its nn module.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMLPPyTorch(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleMLPPyTorch, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = F.sigmoid(self.fc1(x))
        x = F.sigmoid(self.fc2(x))
        return x

# Example usage
# mlp = SimpleMLPPyTorch(input_size=3, hidden_size=5, output_size=1)
# output = mlp.forward(torch.randn(1, 3))


def test_simple_mlp_pytorch():
    mlp = SimpleMLPPyTorch(input_size=3, hidden_size=5, output_size=1)
    output = mlp.forward(torch.randn(1, 3))
    assert output.shape == torch.Size([1, 1])
  • Explanation
  • Numpy Implementation:
    • __init__: Initializes weights (w1, w2) and biases (b1, b2) randomly.
    • forward: Conducts the forward pass, computing the linear transformations followed by the sigmoid activation function.
  • PyTorch Implementation:
    • PyTorch abstracts much of the details, allowing layers to be defined easily using nn.Linear and activations using torch.nn.functional.
  • Testing MLP:
    • The test cases ensure that the output of the MLP has the correct shape, indicating proper functioning of the forward pass.
  • These MLP implementations demonstrate the basic structure and forward pass computation of a neural network, highlighting the ease of using high-level libraries like PyTorch for such tasks.

20. Convolutional Neural Networks (CNN)

  • Convolutional Neural Networks (CNNs) are a class of deep neural networks, most commonly applied to analyzing visual imagery.
  • Convolutional Neural Network (CNN): A deep learning algorithm that can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image, and differentiate one from the other.
  • Implementing a basic CNN involves setting up convolutional layers, activation functions, and pooling layers. Here’s a simple CNN implementation in both Numpy and PyTorch for image classification tasks.

  • Numpy Implementation: Simple Convolution Operation
  • Implementing a full CNN in Numpy is complex and inefficient, but we can demonstrate a basic convolution operation, which is the core of CNNs.
import numpy as np

def convolve2D(image, kernel, padding=0, strides=1):
    # Add zero padding to the input image
    image_padded = np.pad(image, [(padding, padding), (padding, padding)], mode='constant', constant_values=0)

    kernel_height, kernel_width = kernel.shape
    padded_height, padded_width = image_padded.shape

    # Calculate the dimensions of the output image
    output_height = (padded_height - kernel_height) // strides + 1
    output_width = (padded_width - kernel_width) // strides + 1

    # Perform convolution
    output = np.zeros((output_height, output_width))
    for y in range(0, output_height):
        for x in range(0, output_width):
            output[y][x] = np.sum(image_padded[y * strides:y * strides + kernel_height, x * strides:x * strides + kernel_width] * kernel)
    
    return output

# Example usage
# image = np.array([...])  # Input image
# kernel = np.array([...])  # Convolutional kernel
# output = convolve2D(image, kernel)
  • PyTorch Implementation: Simple CNN
  • PyTorch provides a more straightforward way to create CNNs using its nn module.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNNPyTorch(nn.Module):
    def __init__(self):
        super(SimpleCNNPyTorch, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        return x

# Example usage
# cnn = SimpleCNNPyTorch()
# output = cnn.forward(torch.randn(1, 1, 28, 28))  # Example with a single 28x28 grayscale image

import pytest

def test_simple_cnn_pytorch():
    cnn = SimpleCNNPyTorch()
    output = cnn.forward(torch.randn(1, 1, 28, 28))  # Single 28x28 grayscale image
    assert output.shape == torch.Size([1, 16, 14, 14])  # Output shape after convolution and pooling

test_simple_cnn_pytorch()
  • Explanation
  • Numpy Implementation:
    • Demonstrates a basic 2D convolution operation.
    • Involves element-wise multiplication of the kernel with the image and summing up the results.
  • PyTorch Implementation:
    • Sets up a simple CNN with one convolutional layer followed by a max pooling layer.
    • Utilizes PyTorch’s nn.Conv2d for convolution and nn.MaxPool2d for pooling.
  • Testing CNN:
    • The PyTorch test case ensures that the output shape of the CNN is as expected after applying a convolution and pooling layer to an input image.
  • While the Numpy implementation provides a basic understanding of the convolution operation, CNNs in practice, especially for complex tasks like image recognition, are much more efficiently implemented using deep learning frameworks like PyTorch, which offer optimized operations and ease of use.

21. Recurrent Neural Networks (RNN)

  • Recurrent Neural Networks (RNNs) are a class of neural networks designed to recognize patterns in sequences of data, such as text, genomes, handwriting, or spoken words.
  • Recurrent Neural Network (RNN): A type of neural network where connections between nodes form a directed graph along a temporal sequence, allowing it to exhibit temporal dynamic behavior and use its internal state (memory) to process sequences of inputs.
  • Implementing a full RNN from scratch is complex, but we can illustrate a basic RNN unit’s operation. Here’s a simple RNN implementation in both Numpy and PyTorch for demonstration purposes.

  • Numpy Implementation: Simple RNN Unit ```python import numpy as np

def rnn_step_forward(x, prev_h, Wx, Wh, b): “”” A single time step forward of a vanilla RNN. x: Input data for this time step prev_h: Hidden state from the previous time step Wx: Weight matrix for input-to-hidden connections Wh: Weight matrix for hidden-to-hidden connections b: Bias term “”” h_next = np.tanh(np.dot(prev_h, Wh) + np.dot(x, Wx) + b) return h_next

Example usage

Initialize inputs, weights, and previous hidden state

x = np.array([…]) # Input vector

prev_h = np.array([…]) # Previous hidden state

Wx = np.array([…]) # Input-to-hidden weights

Wh = np.array([…]) # Hidden-to-hidden weights

b = np.array([…]) # Bias term

next_h = rnn_step_forward(x, prev_h, Wx, Wh, b)


- **PyTorch Implementation**: Simple RNN
- PyTorch provides an easy way to create RNNs with its `nn.RNN` module.

```python
import torch
import torch.nn as nn

class SimpleRNNPyTorch(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(SimpleRNNPyTorch, self).__init__()
        self.rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size)

    def forward(self, x):
        # Assuming x is of shape (seq_len, batch, input_size)
        out, hidden = self.rnn(x)
        return out, hidden

# Example usage
# rnn = SimpleRNNPyTorch(input_size=10, hidden_size=20)
# output, hidden = rnn.forward(torch.randn(5, 1, 10))  # Example with a sequence length of 5

import pytest

def test_simple_rnn_pytorch():
    seq_len, batch_size, input_size, hidden_size = 5, 1, 10, 20
    rnn = SimpleRNNPyTorch(input_size=input_size, hidden_size=hidden_size)
    output, hidden = rnn.forward(torch.randn(seq_len, batch_size, input_size))
    
    assert output.shape == torch.Size([seq_len, batch_size, hidden_size])
    assert hidden.shape == torch.Size([1, batch_size, hidden_size])

test_simple_rnn_pytorch()
  • Explanation
  • Numpy Implementation:
    • Implements a single step of a vanilla RNN.
    • Combines the current input (x) with the previous hidden state (prev_h) using weights (Wx, Wh) and bias (b), then applies a tanh activation function.
  • PyTorch Implementation:
    • Uses PyTorch’s nn.RNN module to create a simple RNN.
    • The forward method processes an input sequence and outputs both the final hidden states and the output for each step.
  • Testing RNN:
    • The PyTorch test case checks if the output and hidden state’s dimensions are as expected after passing a sequence through the RNN.
  • RNNs are particularly useful for processing sequential data and are foundational in applications like language modeling, translation, and speech recognition. However, they can suffer from issues like vanishing and exploding gradients, which are addressed in more advanced versions like LSTMs and GRUs.

22. Long Short-Term Memory Networks (LSTM)

  • Long Short-Term Memory (LSTM) networks are a type of Recurrent Neural Network (RNN) capable of learning long-term dependencies, specifically designed to avoid the long-term dependency problem.
  • Long Short-Term Memory (LSTM):** An advanced RNN architecture that includes memory cells and gates to control the flow of information, effectively learning long-term dependencies in sequence data.

  • LSTMs are complex to implement from scratch due to their intricate architecture. However, I’ll provide an example of a simple LSTM layer using PyTorch, which has built-in support for LSTMs. Implementing an LSTM in Numpy is impractical due to its complexity and computational inefficiency.

  • PyTorch Implementation: Simple LSTM ```python import torch import torch.nn as nn

class SimpleLSTM(nn.Module): def init(self, input_size, hidden_size): super(SimpleLSTM, self).init() self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size)

def forward(self, x):
    # Assuming x is of shape (seq_len, batch, input_size)
    output, (hidden, cell) = self.lstm(x)
    return output, hidden, cell

Example usage

lstm = SimpleLSTM(input_size=10, hidden_size=20)

output, hidden, cell = lstm(torch.randn(5, 1, 10)) # Example with a sequence length of 5

def test_simple_lstm_pytorch(): seq_len, batch_size, input_size, hidden_size = 5, 1, 10, 20 lstm = SimpleLSTM(input_size=input_size, hidden_size=hidden_size) output, hidden, cell = lstm(torch.randn(seq_len, batch_size, input_size))

assert output.shape == torch.Size([seq_len, batch_size, hidden_size])
assert hidden.shape == torch.Size([1, batch_size, hidden_size])
assert cell.shape == torch.Size([1, batch_size, hidden_size])

test_simple_lstm_pytorch()


- Explanation
- **PyTorch Implementation:**
  - Uses PyTorch's `nn.LSTM` module to create an LSTM layer.
  - The LSTM layer processes an input sequence and outputs the final hidden states, output for each step, and cell states.
- **Testing LSTM:**
  - The test case checks if the output, hidden state, and cell state's dimensions are as expected after passing a sequence through the LSTM.

- LSTMs are widely used in complex sequence modeling tasks like language translation, speech recognition, and time-series forecasting due to their ability to capture long-range dependencies and mitigate issues like vanishing gradients. The intricacies of LSTMs, including their gating mechanisms (forget gate, input gate, and output gate), make them particularly effective for these applications.

Implementing an LSTM from scratch in Numpy is a complex and intensive task, mainly because LSTMs involve intricate computations and state management that are not trivial to optimize without specialized libraries. However, for educational purposes, I can provide a simplified version of an LSTM cell's forward pass in Numpy. This implementation will focus on the key components of an LSTM - the forget gate, input gate, cell state, and output gate.

- **Numpy Implementation**: Simplified LSTM Cell Forward Pass

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

class SimpleLSTMCell:
    def __init__(self, input_size, hidden_size):
        # Initialize weights
        self.Wf = np.random.randn(hidden_size, hidden_size + input_size)
        self.Wi = np.random.randn(hidden_size, hidden_size + input_size)
        self.Wc = np.random.randn(hidden_size, hidden_size + input_size)
        self.Wo = np.random.randn(hidden_size, hidden_size + input_size)

        # Initialize biases
        self.bf = np.zeros(hidden_size)
        self.bi = np.zeros(hidden_size)
        self.bc = np.zeros(hidden_size)
        self.bo = np.zeros(hidden_size)

    def forward(self, x, h_prev, c_prev):
        # Concatenate h_prev and x
        combined = np.concatenate((h_prev, x), axis=1)

        # Forget gate
        ft = sigmoid(np.dot(self.Wf, combined.T) + self.bf[:, np.newaxis])

        # Input gate
        it = sigmoid(np.dot(self.Wi, combined.T) + self.bi[:, np.newaxis])
        ct_tilde = tanh(np.dot(self.Wc, combined.T) + self.bc[:, np.newaxis])

        # Cell state
        ct = ft * c_prev + it * ct_tilde

        # Output gate
        ot = sigmoid(np.dot(self.Wo, combined.T) + self.bo[:, np.newaxis])
        ht = ot * tanh(ct)

        return ht.T, ct

# Example usage
# lstm_cell = SimpleLSTMCell(input_size=10, hidden_size=20)
# h_prev = np.zeros((1, 20))
# c_prev = np.zeros((1, 20))
# x = np.random.randn(1, 10)
# h_next, c_next = lstm_cell.forward(x, h_prev, c_prev)
  • Explanation
  • SimpleLSTMCell Class:
    • Initializes weights (Wf, Wi, Wc, Wo) and biases (bf, bi, bc, bo) for the forget gate, input gate, cell state, and output gate.
    • The forward method computes the LSTM cell’s forward pass.
  • Forward Pass:
    • Combines the previous hidden state h_prev and the current input x.
    • Calculates the forget gate, input gate, cell state update, and output gate.
    • Outputs the new hidden state h_next and new cell state c_next.
  • This implementation provides a basic understanding of an LSTM’s internal mechanics. However, in practical applications, especially for large datasets or complex tasks, it is highly recommended to use optimized libraries like PyTorch or TensorFlow, which provide efficient, pre-built LSTM modules.

Prompt hold

  • For the below algorithms, give me a one liner description with the equation being implemented if applicable, implementation in numpy and pytorch, then test case with pytest.: Take multiple rounds of prompts if needed. Add detailed comments explaining variables and what is happening on each line: