AI Coding
 Overview
 1. Scaled Dot-Product
 2. K-Means Clustering
 3. KNN (2NN)
 4. ANN (Artificial Neural Network)
 5. Linear Regression
Overview
 We will have torch and numpy implementations of some common ML algorithms here along with test cases to run them.
 Please feel free to give input on what else you’d like to see here!
This is a comprehensive list, so we’ll break it down into a few rounds. Let’s start with the first few algorithms:
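1. Scaled Dot-Product
 Description: Used in attention mechanisms. The dot product of the queries and keys is scaled by the inverse square root of the key dimension \(d_k\), passed through a softmax, and used to weight the values (Equation: \(\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V\)).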
 NumPy Implementation:
import numpy as np
def scaled_dot_product_attention(Q, K, V):
    """Q, K, V definition: [batch_size, seq_len, feature_dim], where:
    batch_size is the number of sequences processed at a time,
    seq_len is the length of each sequence (like the number of words in a sentence),
    feature_dim is the dimensionality of the feature vectors (queries, keys, or values)."""
    # d_k: Dimension of the key vectors (the last axis). Q and K are assumed to share it.
    d_k = K.shape[-1]
    # Compute the dot product between Q and the transpose of K (swapping the last two axes),
    # then scale it by the square root of d_k.
    scores = np.matmul(Q, K.transpose(0, 2, 1)) / np.sqrt(d_k)
    # Apply softmax to the scores over the last dimension to obtain attention weights.
    # NumPy has no built-in softmax, so compute it directly (subtracting the max for stability).
    exp_scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attention_weights = exp_scores / exp_scores.sum(axis=-1, keepdims=True)
    # Multiply the attention weights with V to get the final output.
    return np.matmul(attention_weights, V)
def test_scaled_dot_product_attention_numpy_easy():
    # Simple and small matrices for Q, K, and V
    Q = np.array([[[1, 0], [0, 1]]])
    K = np.array([[[1, 2], [2, 1]]])
    V = np.array([[[1, 0], [0, 1]]])
    # Expected output: softmax([1, 2] / sqrt(2)) and softmax([2, 1] / sqrt(2)), since V is the identity
    expected_output = np.array([[[0.3302, 0.6698],
                                 [0.6698, 0.3302]]])
    # Call the attention function and get the result.
    result = scaled_dot_product_attention(Q, K, V)
    # Assert that the result is close to the expected output.
    np.testing.assert_almost_equal(result, expected_output, decimal=4)
 PyTorch Implementation:
import torch
import torch.nn.functional as F
def scaled_dot_product_attention(Q, K, V):
    # d_k: Dimension of the key vectors (the last axis). Q and K are assumed to share it.
    d_k = K.size(-1)
    # Compute the dot product between Q and the transpose of K, then scale it by the square root of d_k.
    # d_k is a Python int, so use ** 0.5 rather than torch.sqrt, which expects a tensor.
    scores = torch.matmul(Q, K.transpose(-2, -1)) / d_k ** 0.5
    # Apply softmax to the scores along the last dimension to obtain attention weights.
    attention_weights = F.softmax(scores, dim=-1)
    # Multiply the attention weights with V to get the final output.
    return torch.matmul(attention_weights, V)
def test_scaled_dot_product_attention_pytorch_easy():
    # Simple and small tensors for Q, K, and V
    Q = torch.tensor([[[1., 0.], [0., 1.]]])
    K = torch.tensor([[[1., 2.], [2., 1.]]])
    V = torch.tensor([[[1., 0.], [0., 1.]]])
    # Expected output: softmax([1, 2] / sqrt(2)) and softmax([2, 1] / sqrt(2)), since V is the identity
    expected_output = torch.tensor([[[0.3302, 0.6698],
                                     [0.6698, 0.3302]]])
    # Call the attention function and get the result.
    result = scaled_dot_product_attention(Q, K, V)
    # Assert that the result is close to the expected output.
    assert torch.allclose(result, expected_output, atol=1e-4)
2. K-Means Clustering

Clustering algorithm that partitions data into a predefined number k of clusters: each point is assigned to the cluster with the nearest mean (centroid), and each centroid is then recomputed as the mean of its assigned points.

Numpy Implementation
import numpy as np
def kmeans_clustering(data, k, num_iterations=100):
    # Randomly initialize k centroids from the data points
    centroids = data[np.random.choice(data.shape[0], k, replace=False)]
    for _ in range(num_iterations):
        # Assign each data point to the closest centroid (distances has shape [k, n_points])
        distances = np.sqrt(((data - centroids[:, np.newaxis])**2).sum(axis=2))
        closest_centroids = np.argmin(distances, axis=0)
        # Update centroids to be the mean of points in each cluster
        for i in range(k):
            centroids[i] = data[closest_centroids == i].mean(axis=0)
    return centroids, closest_centroids
def test_kmeans_clustering_numpy():
    np.random.seed(0) # For reproducibility
    data = np.random.rand(100, 2) # Generate some random data
    k = 3 # Number of clusters
    centroids, assignments = kmeans_clustering(data, k)
    assert len(centroids) == k
    assert len(np.unique(assignments)) == k
 Pytorch Implementation
import torch
def kmeans_clustering_torch(data, k, num_iterations=100):
    # Randomly initialize k centroids from the data points
    centroids = data[torch.randperm(data.size(0))[:k]]
    for _ in range(num_iterations):
        # Assign each data point to the closest centroid (distances has shape [n_points, k])
        distances = torch.sqrt(((data[:, None] - centroids[None, :])**2).sum(dim=2))
        # Take the argmin over the centroid axis so there is one assignment per data point
        closest_centroids = torch.argmin(distances, dim=1)
        # Update centroids to be the mean of points in each cluster
        for i in range(k):
            centroids[i] = data[closest_centroids == i].mean(dim=0)
    return centroids, closest_centroids
# Test case for PyTorch
def test_kmeans_clustering_pytorch():
    torch.manual_seed(0) # For reproducibility
    data = torch.rand(100, 2) # Generate some random data
    k = 3 # Number of clusters
    centroids, assignments = kmeans_clustering_torch(data, k)
    assert centroids.size(0) == k
    assert len(assignments.unique()) == k
# Running the test
test_kmeans_clustering_pytorch()
3. KNN (2NN)

K-Nearest Neighbors (KNN) is a simple algorithm that stores all cases and classifies new cases based on a similarity measure (e.g., a distance function). It predicts the label (or value) of a data point by looking at the ‘k’ closest labeled data points and choosing the most common label (classification) or averaging the labels (regression) among them.

NumPy Implementation:
import numpy as np
def knn_find_neighbors(data, query, k):
    # Calculate Euclidean distances between query and all data points
    distances = np.sqrt(((data - query)**2).sum(axis=1))
    # Find the indices of the k smallest distances (stable sort keeps ties in index order)
    k_indices = np.argsort(distances, kind="stable")[:k]
    # Return the k nearest neighbors and their indices
    return data[k_indices], k_indices
import pytest
# Test case for Numpy
def test_knn_find_neighbors_numpy():
    data = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
    query = np.array([2.5, 3.5])
    k = 2
    neighbors, indices = knn_find_neighbors(data, query, k)
    expected_neighbors = np.array([[2, 3], [3, 4]])
    expected_indices = np.array([1, 2])
    np.testing.assert_array_equal(neighbors, expected_neighbors)
    np.testing.assert_array_equal(indices, expected_indices)
# Running the test
test_knn_find_neighbors_numpy()
 PyTorch Implementation: Not typically implemented in PyTorch, as KNN is a non-parametric, instance-based learning method.
import torch
def knn_find_neighbors_torch(data, query, k):
    # Calculate Euclidean distances between query and all data points
    distances = torch.sqrt(((data - query)**2).sum(dim=1))
    # Find the indices of the k smallest distances
    k_indices = torch.argsort(distances)[:k]
    # Return the k nearest neighbors and their indices
    return data[k_indices], k_indices
import pytest
# Test case for PyTorch
def test_knn_find_neighbors_pytorch():
    data = torch.tensor([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]])
    query = torch.tensor([2.5, 3.5])
    k = 2
    neighbors, indices = knn_find_neighbors_torch(data, query, k)
    expected_neighbors = torch.tensor([[2.0, 3.0], [3.0, 4.0]])
    expected_indices = torch.tensor([1, 2])
    assert torch.equal(neighbors, expected_neighbors)
    assert torch.equal(indices, expected_indices)
# Running the test
test_knn_find_neighbors_pytorch()
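The functions above only return the nearest neighbors; the description also mentions choosing the most common label among them. A minimal sketch of that classification step follows. The name `knn_classify` and the `labels` array are assumptions introduced for this illustration and are not part of the original code.

```python
import numpy as np

def knn_classify(data, labels, query, k):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Euclidean distance from the query to every labeled point
    distances = np.sqrt(((data - query) ** 2).sum(axis=1))
    # Indices of the k closest points (stable sort keeps ties in index order)
    k_indices = np.argsort(distances, kind="stable")[:k]
    # Count how often each label occurs among the neighbors and pick the most frequent
    values, counts = np.unique(labels[k_indices], return_counts=True)
    return values[np.argmax(counts)]

# Illustrative data: two well-separated groups with labels 0 and 1
data = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [8.0, 9.0], [9.0, 10.0]])
labels = np.array([0, 0, 0, 1, 1])
prediction = knn_classify(data, labels, np.array([2.5, 3.5]), k=3)
```

For regression, the vote would be replaced by the mean of the neighbors' target values.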
4. ANN (Artificial Neural Network)
 A computational model inspired by the way biological neural networks in the human brain process information. The implementations in this section are for this kind of network.
 The same acronym is also used for Approximate Nearest Neighbors, which refers to algorithms that efficiently find approximate nearest neighbors of points in a dataset when an exhaustive search is infeasible, so that nearest neighbor queries can be answered quickly in large datasets:
 Uses data structures like k-d trees, ball trees, and VP trees to organize data points for faster search.
 Approximates the true nearest neighbors by only searching part of the dataset or pruning branches.
 Some methods provide probabilistic guarantees on the approximation factor, so the neighbors found are close to the true nearest neighbors with high probability.
 Much faster query times compared to exhaustive search, enabling large-scale, high-dimensional applications.
 Popular methods include locality-sensitive hashing and hierarchical navigable small world graphs.

Widely used for tasks like similarity search, recommendation systems, object retrieval and more.
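The approximate nearest neighbor idea described above can be sketched with random-hyperplane locality-sensitive hashing. This is a simplified illustration under stated assumptions, not a production index: the function names `build_lsh_index` and `lsh_query`, the single hash table, and the choice of 6 hyperplanes are all introduced here. Only the query's own bucket is searched, so true neighbors that fall in other buckets are missed, which is the approximation.

```python
import numpy as np

def build_lsh_index(data, num_planes, rng):
    """Hash each row of `data` into a bucket using random hyperplanes."""
    planes = rng.standard_normal((num_planes, data.shape[1]))
    # One bit per hyperplane: which side of the plane the point lies on
    bits = (data @ planes.T) > 0
    buckets = {}
    for idx, row in enumerate(bits):
        buckets.setdefault(row.tobytes(), []).append(idx)
    return planes, buckets

def lsh_query(data, planes, buckets, query, k):
    """Return indices of up to k approximate nearest neighbors of `query`."""
    key = ((planes @ query) > 0).tobytes()
    candidates = np.array(buckets.get(key, []), dtype=int)
    if candidates.size == 0:
        return candidates
    # Exact distances are computed only for points in the query's bucket
    distances = np.linalg.norm(data[candidates] - query, axis=1)
    return candidates[np.argsort(distances)[:k]]

rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 16))  # illustrative random dataset
planes, buckets = build_lsh_index(data, num_planes=6, rng=rng)
neighbors = lsh_query(data, planes, buckets, data[0], k=5)
```

Practical systems usually combine several hash tables or use graph-based indexes to trade memory for recall.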
 NumPy Implementation: Implementing an ANN in pure NumPy is complex due to the need for backpropagation and optimization algorithms.
import numpy as np
class SimpleANN:
    def __init__(self, input_size, hidden_size, num_classes):
        # Initialize weights and biases
        self.W1 = np.random.randn(input_size, hidden_size) * np.sqrt(2. / input_size)
        self.b1 = np.zeros(hidden_size)
        self.W2 = np.random.randn(hidden_size, num_classes) * np.sqrt(2. / hidden_size)
        self.b2 = np.zeros(num_classes)
    def relu(self, Z):
        return np.maximum(0, Z)
    def forward(self, X):
        # Forward pass: Input layer > Hidden layer with ReLU > Output layer
        self.Z1 = np.dot(X, self.W1) + self.b1
        self.A1 = self.relu(self.Z1)
        self.Z2 = np.dot(self.A1, self.W2) + self.b2
        return self.Z2 # Return the final linear output
# Example usage
# ann = SimpleANN(input_size=10, hidden_size=5, num_classes=3)
# output = ann.forward(X) # X is the input data
import pytest
# Test case for the NumPy ANN (placed before the PyTorch class, which reuses the name SimpleANN)
def test_simple_ann_forward():
    # Define the network architecture parameters
    input_size = 10
    hidden_size = 5
    num_classes = 3
    # Instantiate the ANN
    ann = SimpleANN(input_size, hidden_size, num_classes)
    # Create a dummy input array (e.g., batch_size = 1)
    dummy_input = np.random.randn(1, input_size)
    # Forward pass through the network
    output = ann.forward(dummy_input)
    # Check if output has the correct shape
    assert output.shape == (1, num_classes)
# Running the test
test_simple_ann_forward()
 PyTorch Implementation: PyTorch provides a more suitable environment for implementing ANNs with its automatic differentiation capabilities.
import torch
import torch.nn as nn
import torch.nn.functional as F
class SimpleANN(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(SimpleANN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size) # First fully connected layer
        self.relu = nn.ReLU() # ReLU activation function
        self.fc2 = nn.Linear(hidden_size, num_classes) # Second fully connected layer
    def forward(self, x):
        out = self.fc1(x) # Pass input through the first layer
        out = self.relu(out) # Apply ReLU activation function
        out = self.fc2(out) # Pass through the second layer
        return out
import pytest
# Test case for the PyTorch ANN
def test_simple_ann():
    input_size = 10
    hidden_size = 5
    num_classes = 3
    model = SimpleANN(input_size, hidden_size, num_classes)
    # Create a dummy input tensor of appropriate size (e.g., batch_size = 1)
    dummy_input = torch.randn(1, input_size)
    # Forward pass
    output = model(dummy_input)
    # Check if output size matches the number of classes
    assert output.size() == (1, num_classes)
# Running the test
test_simple_ann()
5. Linear Regression
 Linear Regression is a fundamental regression algorithm in machine learning, used for predicting a continuous output based on one or more input features. It assumes a linear relationship between the inputs and the target output.
One-Liner Description
Linear Regression: Models the relationship between a scalar dependent variable \(y\) and one or more independent variables (or explanatory variables) \(X\) by fitting a linear equation to observed data.
Equation
The equation for linear regression with multiple variables (multiple linear regression) is: \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n + \epsilon\) where \(\beta_0, \beta_1, \ldots, \beta_n\) are coefficients, \(x_1, x_2, \ldots, x_n\) are input features, and \(\epsilon\) is the error term.
 Numpy Implementation
```python
import numpy as np

def linear_regression_numpy(X, y):
    # Adding a column of ones to include the intercept (beta_0)
    X = np.append(np.ones((X.shape[0], 1)), X, axis=1)
    # Calculating the coefficients: beta = (X'X)^(-1)X'y
    beta = np.linalg.inv(X.T @ X) @ X.T @ y
    return beta

def test_linear_regression_numpy():
    # The features must not be collinear with each other or with the intercept column,
    # otherwise X'X is singular and cannot be inverted.
    X = np.array([[1, 2], [2, 3], [4, 6]])
    y = np.array([1, 2, 3])
    beta = linear_regression_numpy(X, y)
    assert beta.shape == (X.shape[1] + 1,)

# Example usage
# X = np.array([[feature1, feature2, …], […]])
# y = np.array([target1, target2, …])
# coefficients = linear_regression_numpy(X, y)
```
 **PyTorch Implementation**
```python
import torch
import torch.nn as nn

def linear_regression_pytorch(X, y):
    # Adding a column of ones to include the intercept (beta_0)
    X = torch.cat((torch.ones(X.shape[0], 1), X), 1)
    # Calculating the coefficients: beta = (X'X)^(-1)X'y
    beta = torch.inverse(X.T @ X) @ X.T @ y
    return beta

# Test case for PyTorch
def test_linear_regression_pytorch():
    # The features must not be collinear with each other or with the intercept column,
    # otherwise X'X is singular and cannot be inverted.
    X = torch.tensor([[1, 2], [2, 3], [4, 6]], dtype=torch.float32)
    y = torch.tensor([1, 2, 3], dtype=torch.float32)
    beta = linear_regression_pytorch(X, y)
    assert beta.shape == (X.shape[1] + 1,)

# PyTorch logistic regression model (discussed in the Logistic Regression section)
class LogisticRegressionPyTorch(nn.Module):
    def __init__(self, n_features):
        super(LogisticRegressionPyTorch, self).__init__()
        self.linear = nn.Linear(n_features, 1)

    def forward(self, x):
        return torch.sigmoid(self.linear(x))

# Example usage
# X = torch.tensor([[feature1, feature2, ...], [...]])
# y = torch.tensor([target1, target2, ...])
# coefficients = linear_regression_pytorch(X, y)
```
Explanation
 In both implementations, a column of ones is added to `X` to accommodate the intercept (\(\beta_0\)) in the linear equation.
 The coefficients (\(\beta\)) are calculated using the normal equation: \(\beta = (X'X)^{-1}X'y\).
 `@` symbolizes matrix multiplication.
 `.T` or `.transpose()` is used for matrix transposition.
 `np.linalg.inv()` and `torch.inverse()` calculate the matrix inverse.
 The test cases create simple datasets and verify if the shapes of the calculated coefficient vectors are correct, considering the added intercept term.
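The normal equation requires \(X'X\) to be invertible, which fails when features are collinear and loses precision when they are nearly so. As a hedged alternative sketch (a swapped-in technique, not the method used above), `np.linalg.lstsq` solves the same least-squares problem without forming an explicit inverse. The function name `linear_regression_lstsq` and the example data are assumptions made for this illustration.

```python
import numpy as np

def linear_regression_lstsq(X, y):
    """Least-squares fit with an intercept, solved without an explicit matrix inverse."""
    # Prepend a column of ones for the intercept (beta_0)
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
    # lstsq returns (solution, residuals, rank, singular values); only the solution is needed
    beta, _, _, _ = np.linalg.lstsq(X_aug, y, rcond=None)
    return beta

# Illustrative data generated from y = 1 + 2*x1 + 3*x2
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]
beta = linear_regression_lstsq(X, y)
```

When the design matrix is rank-deficient, `lstsq` returns the minimum-norm solution instead of raising an error.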
6. Logistic Regression
 Classification task

Logistic Regression is a statistical method used for binary classification. It models the probability of a binary response based on one or more predictor variables.
 The logistic regression model is represented by the logistic function: \(P(y=1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \ldots + \beta_n x_n)}}\)

where \(P(y=1)\) is the probability that the dependent variable \(y\) is 1, \(\beta_0, \beta_1, \ldots, \beta_n\) are the coefficients, and \(x_1, x_2, \ldots, x_n\) are the predictor variables.
 Numpy Implementation
import numpy as np
class LogisticRegressionNumpy:
    def __init__(self, learning_rate=0.01, num_iterations=1000):
        self.learning_rate = learning_rate
        self.num_iterations = num_iterations
        self.weights = None
        self.bias = None
    def _sigmoid(self, z):
        return 1 / (1 + np.exp(-z))
    def fit(self, X, y):
        # Initialize weights and bias
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0
        # Gradient descent
        for _ in range(self.num_iterations):
            model = np.dot(X, self.weights) + self.bias
            predictions = self._sigmoid(model)
            # Compute gradients of the mean binary cross-entropy loss
            dw = (1 / n_samples) * np.dot(X.T, (predictions - y))
            db = (1 / n_samples) * np.sum(predictions - y)
            # Update parameters by stepping against the gradient
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db
    def predict(self, X):
        model = np.dot(X, self.weights) + self.bias
        predictions = self._sigmoid(model)
        return np.where(predictions >= 0.5, 1, 0)
import pytest
def test_logistic_regression_numpy_init():
    logistic_regression = LogisticRegressionNumpy()
    logistic_regression.fit(np.array([[1, 2], [2, 3]]), np.array([0, 1]))
    assert logistic_regression.weights.shape == (2,)
    assert isinstance(logistic_regression.bias, float)
test_logistic_regression_numpy_init()
# Example usage
# logistic_regression = LogisticRegressionNumpy()
# logistic_regression.fit(X_train, y_train)
# predictions = logistic_regression.predict(X_test)
 Explanation
 In the Numpy implementation, logistic regression is performed using gradient descent.
 `_sigmoid`: Sigmoid function, which maps any real-valued number into the range (0, 1), suitable for probability representation.
 `fit`: Function for training the model using gradient descent. Updates weights (`self.weights`) and bias (`self.bias`) to minimize the loss.
 `predict`: Function to predict binary outcomes (0 or 1) based on the learned weights and bias.
 The PyTorch implementation uses built-in linear layers and sigmoid activation, abstracting away the details of weights and bias updates.
 The test case for the Numpy implementation checks that, after fitting, the weights have one entry per feature and the bias is a float.
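To make the PyTorch side concrete, here is a minimal training sketch for a linear-layer-plus-sigmoid model. The toy data, the learning rate of 0.5, and the 200 iterations are assumptions chosen only for illustration, and the class is restated so the block runs on its own.

```python
import torch
import torch.nn as nn

class LogisticRegressionPyTorch(nn.Module):
    def __init__(self, n_features):
        super().__init__()
        self.linear = nn.Linear(n_features, 1)

    def forward(self, x):
        return torch.sigmoid(self.linear(x))

torch.manual_seed(0)  # For reproducibility

# Toy, linearly separable data (illustrative only): the label is 1 when x > 0
X = torch.tensor([[-2.0], [-1.0], [1.0], [2.0]])
y = torch.tensor([[0.0], [0.0], [1.0], [1.0]])

model = LogisticRegressionPyTorch(n_features=1)
loss_function = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

for _ in range(200):
    optimizer.zero_grad()              # Clear gradients from the previous step
    loss = loss_function(model(X), y)  # Binary cross-entropy on predicted probabilities
    loss.backward()                    # Autograd computes the gradients
    optimizer.step()                   # Gradient descent update

with torch.no_grad():
    predictions = (model(X) >= 0.5).float()
```

The optimizer performs the same weight and bias updates that the NumPy `fit` method writes out by hand.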
7. Logistic Regression Loss Function: Binary Cross-Entropy

The loss function used in logistic regression, typically binary cross-entropy, measures the performance of a classification model whose output is a probability value between 0 and 1.
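For reference, a standard way to write this loss for \(N\) samples is given below. The symbols \(N\), \(y_i\) (true label), and \(\hat{y}_i\) (predicted probability) are notation introduced here for illustration.

```latex
L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]
```

Both logarithms are at most zero, so the leading minus sign makes the loss non-negative; it approaches zero as the predicted probabilities approach the true labels.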

Numpy Implementation
import numpy as np
import pytest
def binary_cross_entropy_loss(y_true, y_pred):
    """
    Compute the binary cross-entropy loss
    y_true: array of true labels
    y_pred: array of predicted probabilities
    """
    epsilon = 1e-15 # Small constant to avoid log(0)
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    # The loss is the negative mean log-likelihood
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
def test_binary_cross_entropy_loss():
    y_true = np.array([1, 0, 1, 1])
    y_pred = np.array([0.9, 0.1, 0.8, 0.3])
    # -(ln 0.9 + ln 0.9 + ln 0.8 + ln 0.3) / 4 is approximately 0.409
    assert binary_cross_entropy_loss(y_true, y_pred) == pytest.approx(0.409, rel=0.01)
test_binary_cross_entropy_loss()
# Example usage
# loss = binary_cross_entropy_loss(y_true, y_pred)
 Pytorch Implementation
import pytest
import torch
import torch.nn as nn
# PyTorch has a built-in BCELoss function
loss_function = nn.BCELoss()
# Example usage
# y_true = torch.tensor([...], dtype=torch.float32)
# y_pred = torch.tensor([...], dtype=torch.float32)
# loss = loss_function(y_pred, y_true)
def test_binary_cross_entropy_loss_pytorch():
    y_true = torch.tensor([1, 0, 1, 1], dtype=torch.float32)
    y_pred = torch.tensor([0.9, 0.1, 0.8, 0.3], dtype=torch.float32)
    loss_function = nn.BCELoss()
    loss = loss_function(y_pred, y_true)
    # -(ln 0.9 + ln 0.9 + ln 0.8 + ln 0.3) / 4 is approximately 0.409
    assert loss.item() == pytest.approx(0.409, rel=0.01)
test_binary_cross_entropy_loss_pytorch()
8. Gradient Descent

An optimization algorithm used to minimize a function by iteratively moving towards the minimum value of the function.

Numpy Implementation
```python
import pytest

def gradient_descent(starting_point, learning_rate, num_iterations):
    """
    Perform gradient descent on a simple quadratic function f(x) = x^2
    starting_point: initial value of x
    learning_rate: step size for each iteration
    num_iterations: number of iterations for the descent
    """
    x = starting_point
    for _ in range(num_iterations):
        grad = 2 * x  # Derivative of x^2
        x = x - learning_rate * grad
    return x

# Example usage
minimum = gradient_descent(10, 0.1, 100)

def test_gradient_descent():
    minimum = gradient_descent(10, 0.1, 100)
    # Use an absolute tolerance: a relative tolerance around 0 accepts almost nothing
    assert minimum == pytest.approx(0, abs=0.01)

test_gradient_descent()
```
 **Pytorch Implementation**
```python
import pytest
import torch

# Simple quadratic function example: f(x) = x^2
x = torch.tensor([10.0], requires_grad=True)
optimizer = torch.optim.SGD([x], lr=0.1)
for _ in range(100):
    optimizer.zero_grad()
    loss = x ** 2
    loss.backward()
    optimizer.step()

# Example usage
# The optimized value of x is now stored in x

def test_gradient_descent_pytorch():
    x = torch.tensor([10.0], requires_grad=True)
    optimizer = torch.optim.SGD([x], lr=0.1)
    for _ in range(100):
        optimizer.zero_grad()
        loss = x ** 2
        loss.backward()
        optimizer.step()
    # Use an absolute tolerance: a relative tolerance around 0 accepts almost nothing
    assert x.item() == pytest.approx(0, abs=0.01)

test_gradient_descent_pytorch()
```
9. BatchNorm
 Batch Normalization is a technique to improve the performance and stability of artificial neural networks.
 Batch Normalization (Batch Norm): A method used in deep learning to normalize the inputs of each layer, for each mini-batch, by adjusting and scaling the activations.
 The batch normalization process is defined by the equation: \(\text{BN}(x_i) = \gamma \left( \frac{x_i - \mu_{\text{B}}}{\sqrt{\sigma_{\text{B}}^2 + \epsilon}} \right) + \beta\)

where \(x_i\) is the input, \(\mu_{\text{B}}\) is the mini-batch mean, \(\sigma_{\text{B}}^2\) is the mini-batch variance, \(\gamma\) is the scale parameter, \(\beta\) is the shift parameter, and \(\epsilon\) is a small constant added for numerical stability.
 Numpy Implementation: Batch Normalization
```python
import numpy as np
import pytest

def batch_norm(X, gamma, beta, epsilon=1e-5):
    """
    Apply batch normalization.
    X: Input data for a mini-batch (numpy array)
    gamma, beta: Scale and shift parameters
    epsilon: Small constant for numerical stability
    """
    mu = np.mean(X, axis=0)
    var = np.var(X, axis=0)
    X_norm = (X - mu) / np.sqrt(var + epsilon)
    out = gamma * X_norm + beta
    return out

# Example usage
# gamma, beta are parameters to be learned during training
# X is the input data for a mini-batch
# bn_output = batch_norm(X, gamma, beta)

def test_batch_norm_numpy():
    np.random.seed(0)
    X = np.random.randn(100, 10)  # 100 samples, 10 features
    gamma = np.ones(10)
    beta = np.zeros(10)
    bn_output = batch_norm(X, gamma, beta)
    # Check if the mean is close to 0 and variance is close to 1
    assert np.allclose(np.mean(bn_output, axis=0), np.zeros(10), atol=0.1)
    assert np.allclose(np.var(bn_output, axis=0), np.ones(10), atol=0.1)

test_batch_norm_numpy()
```
 **PyTorch Implementation**: Batch Normalization
PyTorch has a built-in `BatchNorm1d` for 1D inputs (e.g., fully connected layers) and `BatchNorm2d` for 2D inputs (e.g., convolutional layers).
```python
import torch
import torch.nn as nn

features_dim = 10  # illustrative value; set this to the number of features (or channels)
# For fully connected layers
bn = nn.BatchNorm1d(num_features=features_dim)
# For convolutional layers
# bn = nn.BatchNorm2d(num_features=features_dim)
# Example usage
# Apply batch norm to the output of a layer
# output = bn(layer_output)
```
 Explanation
 Numpy Implementation:
 Calculates the mean (`mu`) and variance (`var`) for the mini-batch `X`.
 Normalizes `X` using these statistics and the `epsilon` value for numerical stability.
 Scales and shifts the normalized values using `gamma` and `beta`.
 PyTorch Implementation: Utilizes PyTorch’s built-in batch normalization layers, which handle these computations internally.
 Testing Batch Normalization:
 The test case for the Numpy implementation checks if the batch normalized output has the desired properties: a mean of approximately 0 and a variance of approximately 1 for each feature across the mini-batch.
 Batch normalization helps in reducing the internal covariate shift which can lead to faster training and reduced dependence on initialization.
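As a sanity check on the claim that the built-in layer performs the same computation, the sketch below compares a NumPy batch norm against `nn.BatchNorm1d` in training mode. It assumes the layer's default initialization (scale 1, shift 0) and default `eps=1e-5`; the NumPy function is restated so the block is self-contained.

```python
import numpy as np
import torch
import torch.nn as nn

def batch_norm(X, gamma, beta, epsilon=1e-5):
    # Normalize each feature (column) with the mini-batch mean and biased variance
    mu = np.mean(X, axis=0)
    var = np.var(X, axis=0)
    return gamma * (X - mu) / np.sqrt(var + epsilon) + beta

np.random.seed(0)
X = np.random.randn(100, 10).astype(np.float32)  # 100 samples, 10 features

bn = nn.BatchNorm1d(num_features=10)
bn.train()  # Training mode normalizes with the statistics of the current mini-batch
torch_output = bn(torch.from_numpy(X)).detach().numpy()

numpy_output = batch_norm(X, np.ones(10, dtype=np.float32), np.zeros(10, dtype=np.float32))
max_difference = np.abs(torch_output - numpy_output).max()
```

In evaluation mode (`bn.eval()`), the layer would instead use its running estimates of the mean and variance, so the two outputs would no longer match.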
10. LayerNorm
 Layer Normalization is a technique used in neural networks to stabilize the learning process.

Layer Normalization: Normalizes the inputs across the features instead of the batch dimension, widely used in recurrent and transformer models.
 Equation
 Layer normalization can be described by the following equation: \(\text{LN}(x_i) = \gamma \left( \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} \right) + \beta\)

where \(x_i\) is the input, \(\mu\) and \(\sigma^2\) are the mean and variance computed across the features, \(\gamma\) and \(\beta\) are learnable parameters, and \(\epsilon\) is a small constant for numerical stability.
 Numpy Implementation: Layer Normalization
 Computes the mean and variance across the features of the input `X`.
 Normalizes `X` using these statistics and `epsilon`.
 Scales and shifts the normalized values using `gamma` and `beta`.
import numpy as np
def layer_norm(X, gamma, beta, epsilon=1e-5):
    """
    Apply layer normalization.
    X: Input data (numpy array)
    gamma, beta: Scale and shift parameters
    epsilon: Small constant for numerical stability
    """
    mu = np.mean(X, axis=1, keepdims=True)
    var = np.var(X, axis=1, keepdims=True)
    X_norm = (X - mu) / np.sqrt(var + epsilon)
    out = gamma * X_norm + beta
    return out
# Example usage
# gamma, beta are parameters to be learned during training
# X is the input data
# ln_output = layer_norm(X, gamma, beta)
import pytest
def test_layer_norm_numpy():
    np.random.seed(0)
    X = np.random.randn(10, 100) # 10 samples, 100 features
    gamma = np.ones(100)
    beta = np.zeros(100)
    ln_output = layer_norm(X, gamma, beta)
    # Check if the mean and variance are close to 0 and 1, respectively, for each sample
    assert np.allclose(np.mean(ln_output, axis=1), np.zeros(10), atol=0.1)
    assert np.allclose(np.var(ln_output, axis=1), np.ones(10), atol=0.1)
test_layer_norm_numpy()
 PyTorch Implementation: Layer Normalization
PyTorch provides a built-in layer for layer normalization: `torch.nn.LayerNorm`.
import torch
import torch.nn as nn
features_dim = 100  # illustrative value; set this to the size of the normalized (last) dimension
# Define layer normalization
ln = nn.LayerNorm(normalized_shape=features_dim)
# Example usage
# Apply layer norm to a layer's output
# output = ln(layer_output)
 PyTorch Implementation: Uses PyTorch’s `nn.LayerNorm` for layer normalization.
 Testing Layer Normalization:
 Checks if the layer normalized output for each sample has a mean of approximately 0 and a variance of approximately 1.
 Layer normalization is especially effective in recurrent neural networks and transformer models, where it helps in stabilizing the hidden state dynamics across timesteps or layers.
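To illustrate that `nn.LayerNorm` normalizes across the features of each sample, the sketch below compares it with a NumPy version. It assumes the layer's default initialization (scale 1, shift 0) and default `eps=1e-5`; the NumPy function is restated so the block runs on its own.

```python
import numpy as np
import torch
import torch.nn as nn

def layer_norm(X, gamma, beta, epsilon=1e-5):
    # Normalize each sample (row) with its own mean and biased variance over the features
    mu = np.mean(X, axis=1, keepdims=True)
    var = np.var(X, axis=1, keepdims=True)
    return gamma * (X - mu) / np.sqrt(var + epsilon) + beta

np.random.seed(0)
X = np.random.randn(10, 100).astype(np.float32)  # 10 samples, 100 features

ln = nn.LayerNorm(normalized_shape=100)
torch_output = ln(torch.from_numpy(X)).detach().numpy()

numpy_output = layer_norm(X, np.ones(100, dtype=np.float32), np.zeros(100, dtype=np.float32))
max_difference = np.abs(torch_output - numpy_output).max()
```

Unlike batch normalization, the result does not depend on the other samples in the batch, so the layer behaves the same in training and evaluation mode.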
11. K fold cross validation
 K-Fold Cross-Validation is a resampling procedure used to evaluate machine learning models on a limited data sample.
 K-Fold Cross-Validation: The process of dividing the dataset into ‘k’ subsets (folds), where the model is trained on ‘k-1’ folds and tested on the remaining one, repeated ‘k’ times with each fold used exactly once as the test set.

Implementing K-Fold Cross-Validation is more about data manipulation than a typical algorithmic function. Here, we’ll implement a basic version that splits data indices into ‘k’ folds.
 Numpy Implementation: K-Fold Cross-Validation
import numpy as np
def k_fold_split(dataset_size, k_folds):
    """
    Splits dataset indices into k folds for cross-validation.
    dataset_size: Total number of samples in the dataset
    k_folds: Number of folds
    """
    indices = np.arange(dataset_size)
    np.random.shuffle(indices)
    fold_sizes = np.full(k_folds, dataset_size // k_folds, dtype=int)
    fold_sizes[:dataset_size % k_folds] += 1
    current = 0
    for fold_size in fold_sizes:
        start, stop = current, current + fold_size
        yield indices[start:stop]
        current = stop
# Example usage
# for fold in k_fold_split(dataset_size=100, k_folds=5):
#     # Use fold, which is a numpy array of indices
 PyTorch Implementation: K-Fold Cross-Validation
 In PyTorch, you can use the `torch.utils.data.Subset` class along with a dataset splitting approach similar to Numpy’s.
import numpy as np
import torch
from torch.utils.data import Subset
def k_fold_split_torch(dataset_size, k_folds):
    """
    Splits dataset indices into k folds for cross-validation.
    dataset_size: Total number of samples in the dataset
    k_folds: Number of folds
    """
    indices = torch.randperm(dataset_size).tolist()
    # NumPy is used only to compute the fold sizes
    fold_sizes = np.full(k_folds, dataset_size // k_folds, dtype=int)
    fold_sizes[:dataset_size % k_folds] += 1
    current = 0
    for fold_size in fold_sizes:
        start, stop = current, current + fold_size
        yield indices[start:stop]
        current = stop
# Example usage
# for fold in k_fold_split_torch(dataset_size=100, k_folds=5):
#     # Use fold, which is a list of indices
Pytest Test Case for K-Fold Cross-Validation
import pytest
def test_k_fold_split_numpy():
    dataset_size = 10
    k_folds = 5
    folds = list(k_fold_split(dataset_size, k_folds))
    assert len(folds) == k_folds
    # Check if each fold is mutually exclusive and collectively exhaustive
    unique_indices = np.unique(np.concatenate(folds))
    assert len(unique_indices) == dataset_size
def test_k_fold_split_torch():
    dataset_size = 10
    k_folds = 5
    folds = list(k_fold_split_torch(dataset_size, k_folds))
    assert len(folds) == k_folds
    # Check if each fold is mutually exclusive and collectively exhaustive
    unique_indices = torch.unique(torch.tensor(sum(folds, [])))
    assert len(unique_indices) == dataset_size
test_k_fold_split_numpy()
test_k_fold_split_torch()
 Explanation
 Numpy and PyTorch Implementations: Both implementations create indices for splitting the dataset into ‘k’ folds, ensuring each fold has roughly the same number of elements and every sample is used for validation exactly once.
 Testing K-Fold Cross-Validation: The test cases verify that:
 The number of created folds equals ‘k’.
 All indices in the dataset are unique and accounted for across all folds.
 This procedure evaluates a model in a more robust and less biased way than a single train-test split, because every data point is used for both training and testing.
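The splitters above yield only the held-out fold. A minimal sketch of how each fold pairs with the remaining indices for training is shown below. The function name `k_fold_train_val_indices` and the `seed` parameter are assumptions introduced for this illustration, and `np.array_split` is used as a shortcut that distributes any remainder to the first folds.

```python
import numpy as np

def k_fold_train_val_indices(dataset_size, k_folds, seed=0):
    """Yield (train_indices, val_indices) pairs, one pair per fold."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(dataset_size)
    # Split the shuffled indices into k nearly equal folds
    folds = np.array_split(indices, k_folds)
    for i in range(k_folds):
        val_indices = folds[i]
        # Train on every fold except the held-out one
        train_indices = np.concatenate(folds[:i] + folds[i + 1:])
        yield train_indices, val_indices

splits = list(k_fold_train_val_indices(dataset_size=10, k_folds=5))
for train_indices, val_indices in splits:
    # A model would be fit on train_indices and scored on val_indices here
    pass
```

Averaging the validation score over the `k` pairs gives the cross-validated estimate.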
12. Naive Bayes
 Naive Bayes is a simple yet effective classification algorithm based on Bayes’ Theorem with the assumption of independence among predictors.

A classification algorithm based on Bayes’ Theorem, assuming independence among features, used for building classifiers by applying conditional probability.

Equation: The Naive Bayes classifier uses Bayes’ Theorem, which, combined with the independence assumption, gives: \(P(y \mid x_1, \ldots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \ldots, x_n)}\) where \(y\) is the class variable, \(x_1, \ldots, x_n\) are the feature variables, \(P(y \mid x_1, \ldots, x_n)\) is the probability of \(y\) given the features, \(P(y)\) is the prior probability of \(y\), and \(P(x_i \mid y)\) is the likelihood of feature \(i\) given class \(y\).

I’ll focus on the Gaussian Naive Bayes implementation which assumes that the features follow a normal distribution.
 Numpy Implementation: Gaussian Naive Bayes
```python
import numpy as np
import pytest

class GaussianNaiveBayes:
    def fit(self, X, y):
        n_samples, n_features = X.shape
        self._classes = np.unique(y)
        n_classes = len(self._classes)
        # Initialize mean, var, and priors
        self._mean = np.zeros((n_classes, n_features), dtype=np.float64)
        self._var = np.zeros((n_classes, n_features), dtype=np.float64)
        self._priors = np.zeros(n_classes, dtype=np.float64)
        # Index by position so class labels need not be 0..n_classes-1
        for idx, c in enumerate(self._classes):
            X_c = X[y == c]
            self._mean[idx, :] = X_c.mean(axis=0)
            self._var[idx, :] = X_c.var(axis=0)
            self._priors[idx] = X_c.shape[0] / float(n_samples)

    def predict(self, X):
        y_pred = [self._predict(x) for x in X]
        return np.array(y_pred)

    def _predict(self, x):
        posteriors = []
        for idx, c in enumerate(self._classes):
            prior = np.log(self._priors[idx])
            class_conditional = np.sum(np.log(self._pdf(idx, x)))
            posterior = prior + class_conditional
            posteriors.append(posterior)
        return self._classes[np.argmax(posteriors)]

    def _pdf(self, class_idx, x):
        # Gaussian density of each feature under the given class
        mean = self._mean[class_idx]
        var = self._var[class_idx]
        numerator = np.exp(-((x - mean) ** 2) / (2 * var))
        denominator = np.sqrt(2 * np.pi * var)
        return numerator / denominator

# Example usage
# model = GaussianNaiveBayes()
# model.fit(X_train, y_train)
# predictions = model.predict(X_test)

def test_gaussian_naive_bayes():
    X = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])
    y = np.array([0, 0, 0, 1, 1])
    model = GaussianNaiveBayes()
    model.fit(X, y)
    predictions = model.predict(X)
    assert predictions.shape == y.shape

test_gaussian_naive_bayes()
```
#### PyTorch Implementation
Implementing Gaussian Naive Bayes in PyTorch is not typical, as PyTorch is more suited for neural network-based models. Naive Bayes calculations are straightforward and often more efficiently handled with libraries like Numpy or Scikit-learn.
 Explanation
 **Numpy Implementation:**
 `fit` method calculates the mean, variance, and prior probabilities for each class.
 `predict` method computes the posterior probability for each class and chooses the class with the highest probability.
 The probabilities are computed under the Gaussian (normal) distribution assumption for each feature.
 **Testing Gaussian Naive Bayes:**
 The test case verifies that the predictions have the same shape as the true labels, ensuring the model's compatibility with the data dimensions.
 Naive Bayes, particularly the Gaussian variant, is effective for classification problems, especially when feature independence is a reasonable assumption. Despite its simplicity, it can perform remarkably well on various tasks.
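One practical caveat when computing the posterior in log space: taking the log of a density that has already underflowed to zero yields negative infinity. The sketch below evaluates the Gaussian log-density directly instead. The function name `gaussian_log_pdf` and the variance floor `eps` are assumptions made for this illustration, not part of the implementation above.

```python
import numpy as np

def gaussian_log_pdf(x, mean, var, eps=1e-9):
    """Log of the univariate Gaussian density, evaluated per feature.

    eps is a hypothetical variance floor (an assumption of this sketch)
    that guards against division by zero for constant features.
    """
    var = var + eps
    return -0.5 * np.log(2.0 * np.pi * var) - (x - mean) ** 2 / (2.0 * var)

# A point far from the mean: the plain density underflows to 0, so its log
# is -inf, while the directly computed log-density stays finite.
x, mean, var = np.array([50.0]), np.array([0.0]), np.array([1.0])
with np.errstate(divide="ignore"):
    naive = np.log(np.exp(-((x - mean) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var))
stable = gaussian_log_pdf(x, mean, var)
```

Summing these log-densities over features and adding the log prior gives the same ranking of classes as the product form, without the underflow.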
## 13. Principal Component Analysis (PCA)
 Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
 Principal Component Analysis (PCA): A dimensionality reduction technique that transforms data into a new coordinate system, reducing the number of dimensions without significant loss of information.
 Eigenvalues and eigenvectors are, respectively, the scalars that indicate how much a transformation stretches a vector, and the vectors that are only scaled, not rotated, by the transformation.
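To make the eigenvalue and eigenvector definition concrete, here is a minimal sketch that checks the defining property on a small symmetric matrix. The matrix `A` is an arbitrary example chosen for illustration, not taken from the text.

```python
import numpy as np

# A small symmetric matrix chosen purely for illustration
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh handles symmetric matrices and returns eigenvalues in ascending order;
# column i of `eigenvectors` pairs with eigenvalues[i]
eigenvalues, eigenvectors = np.linalg.eigh(A)

for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    # Applying A only rescales v by its eigenvalue; the direction is unchanged
    assert np.allclose(A @ v, eigenvalues[i] * v)
```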
 Equation
PCA involves calculating the eigenvectors and eigenvalues of a dataset's covariance matrix to identify the principal components. The principal components are the directions where there is the most variance, the directions where the data is most spread out.
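As a sketch of the computation just described, the steps can be written out as follows. The notation is introduced here for illustration: \( X_c \) is the mean-centered data matrix with \( n \) rows, \( \Sigma \) is its covariance matrix, and \( W_k \) holds the \( k \) eigenvectors with the largest eigenvalues as columns.

```latex
\Sigma = \frac{1}{n - 1} X_c^{\top} X_c
\qquad
\Sigma v_j = \lambda_j v_j, \quad \lambda_1 \ge \lambda_2 \ge \dots
\qquad
Z = X_c W_k, \quad W_k = [\, v_1 \; \cdots \; v_k \,]
```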
 **Numpy Implementation**: PCA
```python
import numpy as np

class PCA:
    def __init__(self, n_components):
        self.n_components = n_components
        self.components = None
        self.mean = None

    def fit(self, X):
        # Mean centering
        self.mean = np.mean(X, axis=0)
        X = X - self.mean
        # Calculate covariance (np.cov expects variables as rows)
        cov = np.cov(X.T)
        # Eigen decomposition; eigh is used because the covariance matrix is
        # symmetric, which guarantees real eigenvalues and eigenvectors
        eigenvalues, eigenvectors = np.linalg.eigh(cov)
        # Sort eigenvectors by decreasing eigenvalue (one eigenvector per row)
        eigenvectors = eigenvectors.T
        idxs = np.argsort(eigenvalues)[::-1]
        eigenvalues = eigenvalues[idxs]
        eigenvectors = eigenvectors[idxs]
        # Store first n eigenvectors
        self.components = eigenvectors[0:self.n_components]

    def transform(self, X):
        # Project data
        X = X - self.mean
        return np.dot(X, self.components.T)

# Example usage
# pca = PCA(n_components=2)
# pca.fit(X_train)
# X_projected = pca.transform(X_train)

import pytest

def test_pca_numpy():
    X = np.array([[1, 2], [3, 4], [5, 6]])
    pca = PCA(n_components=1)
    pca.fit(X)
    X_projected = pca.transform(X)
    assert X_projected.shape == (3, 1)

test_pca_numpy()
```
 PyTorch Implementation: PCA
 PyTorch offers `torch.pca_lowrank` for a randomized low-rank PCA, but the exact eigen decomposition procedure can also be written directly with PyTorch's tensor operations, which is useful for GPU-accelerated computing.
import torch

def pca_torch(X, n_components):
    # Mean centering
    mean = torch.mean(X, 0)
    X = X - mean
    # Calculate covariance
    cov = torch.mm(X.T, X) / (X.shape[0] - 1)
    # Eigen decomposition; eigh returns real values for the symmetric covariance
    # matrix (torch.linalg.eig would return complex tensors, which cannot be sorted)
    eigenvalues, eigenvectors = torch.linalg.eigh(cov)
    eigenvectors = eigenvectors.T
    # Sort eigenvectors
    idxs = torch.argsort(eigenvalues, descending=True)
    eigenvalues = eigenvalues[idxs]
    eigenvectors = eigenvectors[idxs]
    # Select the top n_components eigenvectors
    components = eigenvectors[:n_components]
    # Project the data onto these components
    return torch.mm(X, components.T)

# Example usage
# X_projected = pca_torch(torch.tensor(X_train, dtype=torch.float32), n_components=2)
 Explanation
 Numpy Implementation:
 `fit`: Computes the covariance matrix of the data, performs eigen decomposition, and selects the top `n_components` principal components.
 `transform`: Projects the data onto the principal components.
 PyTorch Implementation:
 The process is similar but uses PyTorch operations, which can be executed on GPU for larger datasets.
 Testing PCA:
 The test case for the Numpy implementation verifies if the transformed data has the reduced dimensionality as expected.
 PCA is widely used in exploratory data analysis and for making predictive models. It’s most effective in scenarios where there’s high correlation among input features or when the dimensionality of the dataset is high.
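An alternative route to the same projection is the singular value decomposition of the centered data, which avoids forming the covariance matrix explicitly. The following is a minimal sketch of that technique, not the implementation used above. The function name `pca_svd` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def pca_svd(X, n_components):
    """Project X onto its top principal components using SVD (sketch)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance captured by each component (eigenvalues of the covariance matrix)
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    return X_centered @ components.T, explained_variance[:n_components]

# Synthetic, correlated data generated only for this example
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.5, 0.2]])
Z, var = pca_svd(X, n_components=2)
```

The projected coordinates agree with the eigen decomposition approach up to the sign of each component, since an eigenvector and its negation describe the same direction.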
14. Neural Networks (e.g., Multilayer Perceptron)
 Multi-Layer Perceptron (MLP), a type of neural network, is a connected series of nodes, where each node represents a mathematical operation, organized in layers, including an input layer, one or more hidden layers, and an output layer.

Multi-Layer Perceptron (MLP): A class of feedforward artificial neural network consisting of multiple layers of nodes, each layer fully connected to the next, used for tasks like classification and regression.

Below are basic implementations of an MLP in both Numpy and PyTorch for binary classification tasks.
 Numpy Implementation: Simple MLP
 This is a rudimentary implementation focusing on the forward pass.
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

class SimpleMLP:
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize weights randomly and biases to zero
        self.w1 = np.random.randn(input_size, hidden_size)
        self.b1 = np.zeros(hidden_size)
        self.w2 = np.random.randn(hidden_size, output_size)
        self.b2 = np.zeros(output_size)

    def forward(self, X):
        # Forward pass through the network
        z1 = np.dot(X, self.w1) + self.b1
        a1 = sigmoid(z1)
        z2 = np.dot(a1, self.w2) + self.b2
        a2 = sigmoid(z2)
        return a2

# Example usage
# mlp = SimpleMLP(input_size=3, hidden_size=5, output_size=1)
# output = mlp.forward(np.random.randn(1, 3))

import pytest

def test_simple_mlp_numpy():
    mlp = SimpleMLP(input_size=3, hidden_size=5, output_size=1)
    output = mlp.forward(np.random.randn(1, 3))
    assert output.shape == (1, 1)
 PyTorch Implementation: Simple MLP
 PyTorch provides a more straightforward way to create MLPs using its `nn` module.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMLPPyTorch(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleMLPPyTorch, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = F.sigmoid(self.fc1(x))
        x = F.sigmoid(self.fc2(x))
        return x

# Example usage
# mlp = SimpleMLPPyTorch(input_size=3, hidden_size=5, output_size=1)
# output = mlp.forward(torch.randn(1, 3))

def test_simple_mlp_pytorch():
    mlp = SimpleMLPPyTorch(input_size=3, hidden_size=5, output_size=1)
    output = mlp.forward(torch.randn(1, 3))
    assert output.shape == torch.Size([1, 1])
 Explanation
 Numpy Implementation:
 `__init__`: Initializes the weights (`w1`, `w2`) randomly and the biases (`b1`, `b2`) to zero.
 `forward`: Conducts the forward pass, computing the linear transformations followed by the sigmoid activation function.
 PyTorch Implementation:
 PyTorch abstracts away much of the detail, allowing layers to be defined easily using `nn.Linear` and activations using `torch.nn.functional`.
 Testing MLP:
 The test cases ensure that the output of the MLP has the correct shape, indicating proper functioning of the forward pass.
 These MLP implementations demonstrate the basic structure and forward pass computation of a neural network, highlighting the ease of using high-level libraries like PyTorch for such tasks.
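Since the Numpy version covers only the forward pass, it may help to see what training would add. The sketch below pairs the same two-layer sigmoid network with a binary cross-entropy loss and hand-derived gradients. The function names, the toy data, the learning rate, and the step count are all assumptions made for this illustration. It is one possible way to train such a network, not a definitive recipe.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, X):
    """Forward pass; returns hidden activations and output probabilities."""
    w1, b1, w2, b2 = params
    a1 = sigmoid(X @ w1 + b1)
    a2 = sigmoid(a1 @ w2 + b2)
    return a1, a2

def bce_loss(params, X, y):
    """Mean binary cross-entropy; y has shape (n_samples, 1)."""
    _, a2 = forward(params, X)
    return -np.mean(y * np.log(a2) + (1 - y) * np.log(1 - a2))

def backward(params, X, y):
    """Gradients of the mean BCE loss with respect to each parameter."""
    w1, b1, w2, b2 = params
    a1, a2 = forward(params, X)
    n = X.shape[0]
    dz2 = (a2 - y) / n                      # sigmoid output combined with BCE
    dw2, db2 = a1.T @ dz2, dz2.sum(axis=0)
    dz1 = (dz2 @ w2.T) * a1 * (1 - a1)      # chain rule through the hidden sigmoid
    dw1, db1 = X.T @ dz1, dz1.sum(axis=0)
    return dw1, db1, dw2, db2

def sgd_step(params, grads, lr=0.1):
    """One plain gradient descent update (lr is an illustrative value)."""
    return tuple(p - lr * g for p, g in zip(params, grads))

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = (X[:, :1] > 0).astype(float)            # toy labels: sign of the first feature
init_params = (rng.normal(size=(3, 5)), np.zeros(5),
               rng.normal(size=(5, 1)), np.zeros(1))

params = init_params
loss_before = bce_loss(params, X, y)
for _ in range(500):
    params = sgd_step(params, backward(params, X, y))
loss_after = bce_loss(params, X, y)
```

In PyTorch the `backward` function is unnecessary, because autograd derives the same gradients from the forward computation.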
20. Convolutional Neural Networks (CNN)
 Convolutional Neural Networks (CNNs) are a class of deep neural networks, most commonly applied to analyzing visual imagery.
 Convolutional Neural Network (CNN): A deep learning algorithm that can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image, and differentiate one from the other.

Implementing a basic CNN involves setting up convolutional layers, activation functions, and pooling layers. Here’s a simple CNN implementation in both Numpy and PyTorch for image classification tasks.
 Numpy Implementation: Simple Convolution Operation
 Implementing a full CNN in Numpy is complex and inefficient, but we can demonstrate a basic convolution operation, which is the core of CNNs.
import numpy as np

def convolve2D(image, kernel, padding=0, strides=1):
    # Add zero padding to the input image
    image_padded = np.pad(image, [(padding, padding), (padding, padding)], mode='constant', constant_values=0)
    kernel_height, kernel_width = kernel.shape
    padded_height, padded_width = image_padded.shape
    # Calculate the dimensions of the output image
    output_height = (padded_height - kernel_height) // strides + 1
    output_width = (padded_width - kernel_width) // strides + 1
    # Perform convolution (the kernel is not flipped, i.e. cross-correlation,
    # which is what deep learning frameworks compute)
    output = np.zeros((output_height, output_width))
    for y in range(0, output_height):
        for x in range(0, output_width):
            output[y][x] = np.sum(image_padded[y * strides:y * strides + kernel_height, x * strides:x * strides + kernel_width] * kernel)
    return output

# Example usage
# image = np.array([...]) # Input image
# kernel = np.array([...]) # Convolutional kernel
# output = convolve2D(image, kernel)
 PyTorch Implementation: Simple CNN
 PyTorch provides a more straightforward way to create CNNs using its `nn` module.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNNPyTorch(nn.Module):
    def __init__(self):
        super(SimpleCNNPyTorch, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        return x

# Example usage
# cnn = SimpleCNNPyTorch()
# output = cnn.forward(torch.randn(1, 1, 28, 28)) # Example with a single 28x28 grayscale image

import pytest

def test_simple_cnn_pytorch():
    cnn = SimpleCNNPyTorch()
    output = cnn.forward(torch.randn(1, 1, 28, 28)) # Single 28x28 grayscale image
    assert output.shape == torch.Size([1, 16, 14, 14]) # Output shape after convolution and pooling

test_simple_cnn_pytorch()
 Explanation
 Numpy Implementation:
 Demonstrates a basic 2D convolution operation.
 Involves element-wise multiplication of the kernel with the image and summing up the results.
 PyTorch Implementation:
 Sets up a simple CNN with one convolutional layer followed by a max pooling layer.
 Utilizes PyTorch's `nn.Conv2d` for convolution and `nn.MaxPool2d` for pooling.
 Testing CNN:
 The PyTorch test case ensures that the output shape of the CNN is as expected after applying a convolution and pooling layer to an input image.
 While the Numpy implementation provides a basic understanding of the convolution operation, CNNs in practice, especially for complex tasks like image recognition, are much more efficiently implemented using deep learning frameworks like PyTorch, which offer optimized operations and ease of use.
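The output shape asserted in the PyTorch test follows from the same size formula that the Numpy convolution uses for its output dimensions. As a small sketch (the helper name `conv_output_size` is made up for this example), the arithmetic for a 28x28 input can be reproduced like this:

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer along one axis."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# 28x28 input through a 3x3 convolution with stride 1 and padding 1
after_conv = conv_output_size(28, kernel_size=3, stride=1, padding=1)
# followed by 2x2 max pooling with stride 2 and no padding
after_pool = conv_output_size(after_conv, kernel_size=2, stride=2)
```

Here `after_conv` stays at 28 because a padding of 1 offsets the 3x3 kernel, and the pooling layer halves it to 14, matching the `[1, 16, 14, 14]` shape in the test.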
21. Recurrent Neural Networks (RNN)
 Recurrent Neural Networks (RNNs) are a class of neural networks designed to recognize patterns in sequences of data, such as text, genomes, handwriting, or spoken words.
 Recurrent Neural Network (RNN): A type of neural network where connections between nodes form a directed graph along a temporal sequence, allowing it to exhibit temporal dynamic behavior and use its internal state (memory) to process sequences of inputs.

Implementing a full RNN from scratch is complex, but we can illustrate a basic RNN unit’s operation. Here’s a simple RNN implementation in both Numpy and PyTorch for demonstration purposes.
 Numpy Implementation: Simple RNN Unit
```python
import numpy as np

def rnn_step_forward(x, prev_h, Wx, Wh, b):
    """
    A single time step forward of a vanilla RNN.
    x: Input data for this time step
    prev_h: Hidden state from the previous time step
    Wx: Weight matrix for input-to-hidden connections
    Wh: Weight matrix for hidden-to-hidden connections
    b: Bias term
    """
    h_next = np.tanh(np.dot(prev_h, Wh) + np.dot(x, Wx) + b)
    return h_next

# Example usage
# Initialize inputs, weights, and previous hidden state
# x = np.array([…]) # Input vector
# prev_h = np.array([…]) # Previous hidden state
# Wx = np.array([…]) # Input-to-hidden weights
# Wh = np.array([…]) # Hidden-to-hidden weights
# b = np.array([…]) # Bias term
# next_h = rnn_step_forward(x, prev_h, Wx, Wh, b)
```
 **PyTorch Implementation**: Simple RNN
 PyTorch provides an easy way to create RNNs with its `nn.RNN` module.
```python
import torch
import torch.nn as nn

class SimpleRNNPyTorch(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(SimpleRNNPyTorch, self).__init__()
        self.rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size)

    def forward(self, x):
        # Assuming x is of shape (seq_len, batch, input_size)
        out, hidden = self.rnn(x)
        return out, hidden

# Example usage
# rnn = SimpleRNNPyTorch(input_size=10, hidden_size=20)
# output, hidden = rnn.forward(torch.randn(5, 1, 10)) # Example with a sequence length of 5

import pytest

def test_simple_rnn_pytorch():
    seq_len, batch_size, input_size, hidden_size = 5, 1, 10, 20
    rnn = SimpleRNNPyTorch(input_size=input_size, hidden_size=hidden_size)
    output, hidden = rnn.forward(torch.randn(seq_len, batch_size, input_size))
    assert output.shape == torch.Size([seq_len, batch_size, hidden_size])
    assert hidden.shape == torch.Size([1, batch_size, hidden_size])

test_simple_rnn_pytorch()
```
 Explanation
 Numpy Implementation:
 Implements a single step of a vanilla RNN.
 Combines the current input (`x`) with the previous hidden state (`prev_h`) using weights (`Wx`, `Wh`) and bias (`b`), then applies a `tanh` activation function.
 PyTorch Implementation:
 Uses PyTorch's `nn.RNN` module to create a simple RNN.
 The `forward` method processes an input sequence and returns the output for each time step together with the final hidden state.
 Testing RNN:
 The PyTorch test case checks if the output and hidden state’s dimensions are as expected after passing a sequence through the RNN.
 RNNs are particularly useful for processing sequential data and are foundational in applications like language modeling, translation, and speech recognition. However, they can suffer from issues like vanishing and exploding gradients, which are addressed in more advanced versions like LSTMs and GRUs.
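To connect the single-step Numpy function with the sequence-level behavior of `nn.RNN`, the step can be applied in a loop over time. This is a minimal sketch under assumed shapes. The wrapper `rnn_forward`, the random inputs, and the 0.1 weight scale are illustrative choices, not part of the original code.

```python
import numpy as np

def rnn_step_forward(x, prev_h, Wx, Wh, b):
    """Single vanilla RNN step, same update rule as in the section above."""
    return np.tanh(np.dot(prev_h, Wh) + np.dot(x, Wx) + b)

def rnn_forward(xs, h0, Wx, Wh, b):
    """Apply the step function over a whole sequence.

    xs: inputs of shape (seq_len, batch, input_size)
    h0: initial hidden state of shape (batch, hidden_size)
    Returns all hidden states, shape (seq_len, batch, hidden_size).
    """
    h = h0
    states = []
    for x_t in xs:                       # iterate over time steps
        h = rnn_step_forward(x_t, h, Wx, Wh, b)
        states.append(h)                 # each state feeds the next step
    return np.stack(states)

rng = np.random.default_rng(0)
seq_len, batch, input_size, hidden_size = 5, 1, 10, 20
xs = rng.normal(size=(seq_len, batch, input_size))
Wx = rng.normal(size=(input_size, hidden_size)) * 0.1
Wh = rng.normal(size=(hidden_size, hidden_size)) * 0.1
b = np.zeros(hidden_size)
hs = rnn_forward(xs, np.zeros((batch, hidden_size)), Wx, Wh, b)
```

The stacked states play the role of the per-step output in the PyTorch version, and the last entry `hs[-1]` corresponds to the final hidden state.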
22. Long Short-Term Memory Networks (LSTM)
 Long Short-Term Memory (LSTM) networks are a type of Recurrent Neural Network (RNN) capable of learning long-term dependencies, specifically designed to avoid the long-term dependency problem.

Long Short-Term Memory (LSTM): An advanced RNN architecture that includes memory cells and gates to control the flow of information, effectively learning long-term dependencies in sequence data.

LSTMs are complex to implement from scratch due to their intricate architecture. However, I'll provide an example of a simple LSTM layer using PyTorch, which has built-in support for LSTMs. Implementing a full LSTM in Numpy is impractical for real workloads due to its complexity and computational inefficiency.
 PyTorch Implementation: Simple LSTM
```python
import torch
import torch.nn as nn

class SimpleLSTM(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(SimpleLSTM, self).__init__()
        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size)

    def forward(self, x):
        # Assuming x is of shape (seq_len, batch, input_size)
        output, (hidden, cell) = self.lstm(x)
        return output, hidden, cell

# Example usage
# lstm = SimpleLSTM(input_size=10, hidden_size=20)
# output, hidden, cell = lstm(torch.randn(5, 1, 10)) # Example with a sequence length of 5

def test_simple_lstm_pytorch():
    seq_len, batch_size, input_size, hidden_size = 5, 1, 10, 20
    lstm = SimpleLSTM(input_size=input_size, hidden_size=hidden_size)
    output, hidden, cell = lstm(torch.randn(seq_len, batch_size, input_size))
    assert output.shape == torch.Size([seq_len, batch_size, hidden_size])
    assert hidden.shape == torch.Size([1, batch_size, hidden_size])
    assert cell.shape == torch.Size([1, batch_size, hidden_size])

test_simple_lstm_pytorch()
```
 Explanation
 **PyTorch Implementation:**
 Uses PyTorch's `nn.LSTM` module to create an LSTM layer.
 The LSTM layer processes an input sequence and returns the output for each step along with the final hidden state and final cell state.
 **Testing LSTM:**
 The test case checks if the output, hidden state, and cell state's dimensions are as expected after passing a sequence through the LSTM.
 LSTMs are widely used in complex sequence modeling tasks like language translation, speech recognition, and time-series forecasting due to their ability to capture long-range dependencies and mitigate issues like vanishing gradients. The intricacies of LSTMs, including their gating mechanisms (forget gate, input gate, and output gate), make them particularly effective for these applications.
Implementing an LSTM from scratch in Numpy is a complex and intensive task, mainly because LSTMs involve intricate computations and state management that are not trivial to optimize without specialized libraries. However, for educational purposes, I can provide a simplified version of an LSTM cell's forward pass in Numpy. This implementation will focus on the key components of an LSTM: the forget gate, input gate, cell state, and output gate.
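For reference, the standard LSTM cell equations for those four components can be sketched as follows. The notation is introduced here for illustration: \( [h_{t-1}, x_t] \) denotes the concatenation of the previous hidden state and the current input, \( \sigma \) is the sigmoid function, and \( \odot \) is element-wise multiplication.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state)} \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```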
 **Numpy Implementation**: Simplified LSTM Cell Forward Pass
```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

class SimpleLSTMCell:
    def __init__(self, input_size, hidden_size):
        # Initialize weights; each matrix acts on the concatenation [h_prev, x]
        self.Wf = np.random.randn(hidden_size, hidden_size + input_size)
        self.Wi = np.random.randn(hidden_size, hidden_size + input_size)
        self.Wc = np.random.randn(hidden_size, hidden_size + input_size)
        self.Wo = np.random.randn(hidden_size, hidden_size + input_size)
        # Initialize biases
        self.bf = np.zeros(hidden_size)
        self.bi = np.zeros(hidden_size)
        self.bc = np.zeros(hidden_size)
        self.bo = np.zeros(hidden_size)

    def forward(self, x, h_prev, c_prev):
        # x: (batch, input_size); h_prev, c_prev: (batch, hidden_size)
        # Concatenate h_prev and x -> (batch, hidden_size + input_size)
        combined = np.concatenate((h_prev, x), axis=1)
        # Forget gate
        ft = sigmoid(np.dot(combined, self.Wf.T) + self.bf)
        # Input gate and candidate cell state
        it = sigmoid(np.dot(combined, self.Wi.T) + self.bi)
        ct_tilde = tanh(np.dot(combined, self.Wc.T) + self.bc)
        # Cell state (all terms share the shape (batch, hidden_size))
        ct = ft * c_prev + it * ct_tilde
        # Output gate
        ot = sigmoid(np.dot(combined, self.Wo.T) + self.bo)
        ht = ot * tanh(ct)
        return ht, ct

# Example usage
# lstm_cell = SimpleLSTMCell(input_size=10, hidden_size=20)
# h_prev = np.zeros((1, 20))
# c_prev = np.zeros((1, 20))
# x = np.random.randn(1, 10)
# h_next, c_next = lstm_cell.forward(x, h_prev, c_prev)
```
 Explanation
 `SimpleLSTMCell` Class:
 Initializes weights (`Wf`, `Wi`, `Wc`, `Wo`) and biases (`bf`, `bi`, `bc`, `bo`) for the forget gate, input gate, cell state, and output gate.
 The `forward` method computes the LSTM cell's forward pass.
 Forward Pass:
 Combines the previous hidden state `h_prev` and the current input `x`.
 Calculates the forget gate, input gate, cell state update, and output gate.
 Outputs the new hidden state `h_next` and new cell state `c_next`.
 This implementation provides a basic understanding of an LSTM's internal mechanics. However, in practical applications, especially for large datasets or complex tasks, it is highly recommended to use optimized libraries like PyTorch or TensorFlow, which provide efficient, pre-built LSTM modules.
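To see how a single cell step relates to the sequence-level output of `nn.LSTM`, the step can be unrolled over time. The sketch below is a compact functional variant written for illustration only. Stacking the four gate weights into one matrix `W`, the function names, and the 0.1 weight scale are assumptions of this example, and the layout does not match PyTorch's internal parameter layout.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step; W maps [h_prev, x] to four stacked gate pre-activations."""
    hidden = h_prev.shape[1]
    z = np.concatenate((h_prev, x), axis=1) @ W + b   # (batch, 4 * hidden)
    f = sigmoid(z[:, :hidden])                 # forget gate
    i = sigmoid(z[:, hidden:2 * hidden])       # input gate
    g = np.tanh(z[:, 2 * hidden:3 * hidden])   # candidate cell state
    o = sigmoid(z[:, 3 * hidden:])             # output gate
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c

def lstm_forward(xs, W, b, hidden):
    """Unroll the cell over xs of shape (seq_len, batch, input_size)."""
    batch = xs.shape[1]
    h = np.zeros((batch, hidden))
    c = np.zeros((batch, hidden))
    outputs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, b)
        outputs.append(h)
    return np.stack(outputs), h, c

rng = np.random.default_rng(0)
seq_len, batch, input_size, hidden = 5, 1, 10, 20
xs = rng.normal(size=(seq_len, batch, input_size))
W = rng.normal(size=(hidden + input_size, 4 * hidden)) * 0.1
b = np.zeros(4 * hidden)
outputs, h_last, c_last = lstm_forward(xs, W, b, hidden)
```

The per-step outputs have shape `(seq_len, batch, hidden)`, and the last of them equals the final hidden state, which mirrors what the PyTorch test checks apart from PyTorch's extra leading layer dimension.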
Prompt hold
 For the below algorithms, give me a one liner description with the equation being implemented if applicable, implementation in numpy and pytorch, then test case with pytest.: Take multiple rounds of prompts if needed. Add detailed comments explaining variables and what is happening on each line: