AI Coding
- Overview
- 1. Scaled Dot-Product
- 2. K-Means Clustering
- 3. KNN (2NN)
- 4. ANN (Artificial Neural Network)
- 5. Linear Regression
- Example usage
- X = np.array([[feature1, feature2, …], […]])
- y = np.array([target1, target2, …])
- coefficients = linear_regression_numpy(X, y)
- Example usage
- gamma, beta are parameters to be learned during training
- X is the input data for a mini-batch
- bn_output = batch_norm(X, gamma, beta)
- Example usage
- x = np.array([…]) # Input vector
- b = np.array([…]) # Bias term
- next_h = rnn_step_forward(x, prev_h, Wx, Wh, b)
- Example usage
Overview
- We will have torch and numpy implementations of some common ML algorithms here along with test cases to run them.
- Please feel free to give input on what else you’d like to see here!
This is a comprehensive list, so we’ll break it down into a few rounds. Let’s start with the first few algorithms:
1. Scaled Dot-Product
- Description: Used in attention mechanisms, scales the dot product of two vectors by the inverse square root of the dimension (Equation: \(\text{Attention}(Q, K) = \frac{QK^T}{\sqrt{d_k}}\)).
- NumPy Implementation:
import numpy as np
def scaled_dot_product_attention(Q, K, V):
"""Q, K, V definition: [batch_size, seq_len, feature_dim], where:
batch_size is the number of sequences processed at a time,
seq_len is the length of each sequence (like the number of words in a sentence),
feature_dim is the dimensionality of the feature vectors (queries, keys, or values)."""
# d_k: Dimension of the key vectors. It's assumed that Q, K, and V have the same dimensionality at the last axis.
d_k = K.shape[-1]
# Compute the dot product between Q and the transpose of K, then scale it by the square root of d_k.
scores = np.matmul(Q, K.transpose(-2, -1)) / np.sqrt(d_k)
# Apply softmax to the scores over the last dimension to obtain attention weights.
attention_weights = np.softmax(scores, axis=-1)
# Multiply the attention weights with V to get the final output.
return np.matmul(attention_weights, V)
def test_scaled_dot_product_attention_numpy_easy():
# Simple and small matrices for Q, K, and V
Q = np.array([[[1, 0], [0, 1]]])
K = np.array([[[1, 2], [2, 1]]])
V = np.array([[[1, 0], [0, 1]]])
# Expected output calculated manually
expected_output = np.array([[[0.11920292, 0.88079708],
[0.88079708, 0.11920292]]])
# Call the attention function and get the result.
result = scaled_dot_product_attention(Q, K, V)
# Assert that the result is close to the expected output.
np.testing.assert_almost_equal(result, expected_output)
- PyTorch Implementation:
import torch
import torch.nn.functional as F
def scaled_dot_product_attention(Q, K, V):
# d_k: Dimension of the key vectors. It's assumed that Q, K, and V have the same dimensionality at the last axis.
d_k = K.size(-1)
# Compute the dot product between Q and the transpose of K, then scale it by the square root of d_k.
scores = torch.matmul(Q, K.transpose(-2, -1)) / torch.sqrt(d_k)
# Apply softmax to the scores along the last dimension to obtain attention weights.
attention_weights = F.softmax(scores, dim=-1)
# Multiply the attention weights with V to get the final output.
return torch.matmul(attention_weights, V)
def test_scaled_dot_product_attention_pytorch_easy():
# Simple and small tensors for Q, K, and V
Q = torch.tensor([[[1., 0.], [0., 1.]]])
K = torch.tensor([[[1., 2.], [2., 1.]]])
V = torch.tensor([[[1., 0.], [0., 1.]]])
# Expected output calculated manually
expected_output = torch.tensor([[[0.1192, 0.8808],
[0.8808, 0.1192]]])
# Call the attention function and get the result.
result = scaled_dot_product_attention(Q, K, V)
# Assert that the result is close to the expected output.
assert torch.allclose(result, expected_output, atol=1e-4)
2. K-Means Clustering
-
Clustering algorithm for clustering data into predefined k groups/cluster with the nearest mean and recalculates the clusters center as the mean is assigned points.
-
Numpy Implementation
import numpy as np
def kmeans_clustering(data, k, num_iterations=100):
# Randomly initialize k centroids from the data points
centroids = data[np.random.choice(data.shape[0], k, replace=False)]
for _ in range(num_iterations):
# Assign each data point to the closest centroid
distances = np.sqrt(((data - centroids[:, np.newaxis])**2).sum(axis=2))
closest_centroids = np.argmin(distances, axis=0)
# Update centroids to be the mean of points in each cluster
for i in range(k):
centroids[i] = data[closest_centroids == i].mean(axis=0)
return centroids, closest_centroids
def test_kmeans_clustering_numpy():
np.random.seed(0) # For reproducibility
data = np.random.rand(100, 2) # Generate some random data
k = 3 # Number of clusters
centroids, assignments = kmeans_clustering(data, k)
assert len(centroids) == k
assert len(np.unique(assignments)) == k
- Pytorch Implementation
import torch
def kmeans_clustering_torch(data, k, num_iterations=100):
# Randomly initialize k centroids from the data points
centroids = data[torch.randperm(data.size(0))[:k]]
for _ in range(num_iterations):
# Assign each data point to the closest centroid
distances = torch.sqrt(((data[:, None] - centroids[None, :])**2).sum(dim=2))
closest_centroids = torch.argmin(distances, dim=0)
# Update centroids to be the mean of points in each cluster
for i in range(k):
centroids[i] = data[closest_centroids == i].mean(dim=0)
return centroids, closest_centroids
# Test case for PyTorch
def test_kmeans_clustering_pytorch():
torch.manual_seed(0) # For reproducibility
data = torch.rand(100, 2) # Generate some random data
k = 3 # Number of clusters
centroids, assignments = kmeans_clustering_torch(data, k)
assert centroids.size(0) == k
assert len(assignments.unique()) == k
# Running the tests
3. KNN (2NN)
-
K-Nearest Neighbors (KNN) is a simple algorithm that stores all cases and classifies new cases based on a similarity measure (e.g., distance functions). Predicts the label (or value) of a data point by looking at the ‘k’ closest labeled data points and choosing the most common label (classification) or averaging the labels (regression) among them.
-
NumPy Implementation:
import numpy as np
def knn_find_neighbors(data, query, k):
# Calculate Euclidean distances between query and all data points
distances = np.sqrt(((data - query)**2).sum(axis=1))
# Find the indices of the k smallest distances
k_indices = np.argsort(distances)[:k]
# Return the k nearest neighbors
return data[k_indices], k_indices
import pytest
# Test case for Numpy
def test_knn_find_neighbors_numpy():
data = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
query = np.array([2.5, 3.5])
k = 2
neighbors, indices = knn_find_neighbors(data, query, k)
expected_neighbors = np.array([[2, 3], [3, 4]])
expected_indices = np.array([1, 2])
np.testing.assert_array_equal(neighbors, expected_neighbors)
np.testing.assert_array_equal(indices, expected_indices)
# Running the test
test_knn_find_neighbors_numpy()
- PyTorch Implementation: Not typically implemented in PyTorch, as KNN is a non-parametric, instance-based learning method.
import torch
def knn_find_neighbors_torch(data, query, k):
# Calculate Euclidean distances between query and all data points
distances = torch.sqrt(((data - query)**2).sum(dim=1))
# Find the indices of the k smallest distances
k_indices = torch.argsort(distances)[:k]
# Return the k nearest neighbors and their indices
return data[k_indices], k_indices
import pytest
# Test case for PyTorch
def test_knn_find_neighbors_pytorch():
data = torch.tensor([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]])
query = torch.tensor([2.5, 3.5])
k = 2
neighbors, indices = knn_find_neighbors_torch(data, query, k)
expected_neighbors = torch.tensor([[2.0, 3.0], [3.0, 4.0]])
expected_indices = torch.tensor([1, 2])
assert torch.equal(neighbors, expected_neighbors)
assert torch.equal(indices, expected_indices)
# Running the test
test_knn_find_neighbors_pytorch()
4. ANN (Artificial Neural Network)
- A computational model inspired by the way biological neural networks in the human brain process information. Approximate Nearest Neighbors (ANN) refers to algorithms that efficiently find approximate nearest neighbors of points in a dataset when an exhaustive search is infeasible. This allows approximate nearest neighbor queries to be answered quickly in large datasets.
- Uses data structures like k-d trees, ball trees, VP trees to organize data points for faster search.
- Approximates the true nearest neighbors by only searching part of the dataset or pruning branches.
- Provides probabilistic guarantees on the approximation factor. Neighbors found are guardedly close to true NNs.
- Much faster query times compared to exhaustive search, enabling large scale high-dimensional applications.
- Popular methods include locality-sensitive hashing, hierarchical navigable small world graphs.
-
Widely used for tasks like similarity search, recommendation systems, object retrieval and more.
- NumPy Implementation: Implementing an ANN in pure NumPy is complex due to the need for backpropagation and optimization algorithms.
import numpy as np
class SimpleANN:
def __init__(self, input_size, hidden_size, num_classes):
# Initialize weights and biases
self.W1 = np.random.randn(input_size, hidden_size) * np.sqrt(2. / input_size)
self.b1 = np.zeros(hidden_size)
self.W2 = np.random.randn(hidden_size, num_classes) * np.sqrt(2. / hidden_size)
self.b2 = np.zeros(num_classes)
def relu(self, Z):
return np.maximum(0, Z)
def forward(self, X):
# Forward pass: Input layer -> Hidden layer with ReLU -> Output layer
self.Z1 = np.dot(X, self.W1) + self.b1
self.A1 = self.relu(self.Z1)
self.Z2 = np.dot(self.A1, self.W2) + self.b2
return self.Z2 # Return the final linear output
# Example usage
# ann = SimpleANN(input_size=10, hidden_size=5, num_classes=3)
# output = ann.forward(X) # X is the input data
- PyTorch Implementation: PyTorch provides a more suitable environment for implementing ANNs with its automatic differentiation capabilities.
import torch
import torch.nn as nn
import torch.nn.functional as F
class SimpleANN(nn.Module):
def __init__(self, input_size, hidden_size, num_classes):
super(SimpleANN, self).__init__()
self.fc1 = nn.Linear(input_size, hidden_size) # First fully connected layer
self.relu = nn.ReLU() # ReLU activation function
self.fc2 = nn.Linear(hidden_size, num_classes) # Second fully connected layer
def forward(self, x):
out = self.fc1(x) # Pass input through the first layer
out = self.relu(out) # Apply ReLU activation function
out = self.fc2(out) # Pass through the second layer
return out
import pytest
# Test case for the ANN
def test_simple_ann():
input_size = 10
hidden_size = 5
num_classes = 3
model = SimpleANN(input_size, hidden_size, num_classes)
# Create a dummy input tensor of appropriate size (e.g., batch_size = 1)
dummy_input = torch.randn(1, input_size)
# Forward pass
output = model(dummy_input)
# Check if output size matches the number of classes
assert output.size() == (1, num_classes)
# Running the test
test_simple_ann()
import pytest
def test_simple_ann_forward():
# Define the network architecture parameters
input_size = 10
hidden_size = 5
num_classes = 3
# Instantiate the ANN
ann = SimpleANN(input_size, hidden_size, num_classes)
# Create a dummy input array (e.g., batch_size = 1)
dummy_input = np.random.randn(1, input_size)
# Forward pass through the network
output = ann.forward(dummy_input)
# Check if output has the correct shape
assert output.shape == (1, num_classes)
# Running the test
test_simple_ann_forward()
5. Linear Regression
- Regression algorithm, predicts a continuous output based on one or more input feature but assumes linear relationship between input and target output.
Linear Regression is a fundamental algorithm in machine learning, used for predicting a continuous output based on one or more input features. It assumes a linear relationship between inputs and the target output.
One-Liner Description
Linear Regression: Models the relationship between a scalar dependent variable \(y\) and one or more independent variables (or explanatory variables) \(X\) by fitting a linear equation to observed data.
Equation
The equation for linear regression with multiple variables (multiple linear regression) is: \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n + \epsilon\) where \(\beta_0, \beta_1, \ldots, \beta_n\) are coefficients, \(x_1, x_2, \ldots, x_n\) are input features, and \(\epsilon\) is the error term.
- Numpy Implementation ```python import numpy as np
def linear_regression_numpy(X, y): # Adding a column of ones to include the intercept (beta_0) X = np.append(np.ones((X.shape[0], 1)), X, axis=1)
# Calculating the coefficients: beta = (X'X)^(-1)X'y
beta = np.linalg.inv(X.T @ X) @ X.T @ y
return beta
def test_linear_regression_numpy(): X = np.array([[1, 2], [2, 3], [4, 5]]) y = np.array([1, 2, 3]) beta = linear_regression_numpy(X, y) assert beta.shape == (X.shape[1] + 1,)
Example usage
X = np.array([[feature1, feature2, …], […]])
y = np.array([target1, target2, …])
coefficients = linear_regression_numpy(X, y)
- **PyTorch Implementation**
```python
import torch
def linear_regression_pytorch(X, y):
# Adding a column of ones to include the intercept (beta_0)
X = torch.cat((torch.ones(X.shape[0], 1), X), 1)
# Calculating the coefficients: beta = (X'X)^(-1)X'y
beta = torch.inverse(X.T @ X) @ X.T @ y
return beta
# Test case for PyTorch
def test_linear_regression_pytorch():
X = torch.tensor([[1, 2], [2, 3], [4, 5]], dtype=torch.float32)
y = torch.tensor([1, 2, 3], dtype=torch.float32)
beta = linear_regression_pytorch(X, y)
assert beta.shape == (X.shape[1] + 1,)
import torch
import torch.nn as nn
class LogisticRegressionPyTorch(nn.Module):
def __init__(self, n_features):
super(LogisticRegressionPyTorch, self).__init__()
self.linear = nn.Linear(n_features, 1)
def forward(self, x):
return torch.sigmoid(self.linear(x))
# Example usage
# X = torch.tensor([[feature1, feature2, ...], [...]])
# y = torch.tensor([target1, target2, ...])
# coefficients = linear_regression_pytorch(X, y)
Explanation
- In both implementations, a column of ones is added to
X
to accommodate the intercept (\(\beta_0\)) in the linear equation. - The coefficients (\(\beta\)) are calculated using the normal equation: \(\beta = (X'X)^{-1}X'y\).
@
symbolizes matrix multiplication..T
or.transpose()
is used for matrix transposition.np.linalg.inv()
andtorch.inverse()
calculate the matrix inverse.
- The test cases create simple datasets and verify if the shapes of the calculated coefficient vectors are correct, considering the added intercept term.
6. Logistic Regression
- Classification task
-
Logistic Regression is a statistical method used for binary classification. It models the probability of a binary response based on one or more predictor variables.
- The logistic regression model is represented by the logistic function: \(P(y=1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \ldots + \beta_n x_n)}}\)
-
where \(P(y=1)\) is the probability that the dependent variable \(y\) is 1, \(\beta_0, \beta_1, \ldots, \beta_n\) are the coefficients, and \(x_1, x_2, \ldots, x_n\) are the predictor variables.
- Numpy Implementation
import numpy as np
class LogisticRegressionNumpy:
def __init__(self, learning_rate=0.01, num_iterations=1000):
self.learning_rate = learning_rate
self.num_iterations = num_iterations
self.weights = None
self.bias = None
def _sigmoid(self, z):
return 1 / (1 + np.exp(-z))
def fit(self, X, y):
# Initialize weights and bias
n_samples, n_features = X.shape
self.weights = np.zeros(n_features)
self.bias = 0
# Gradient descent
for _ in range(self.num_iterations):
model = np.dot(X, self.weights) + self.bias
predictions = self._sigmoid(model)
# Compute gradients
dw = (1 / n_samples) * np.dot(X.T, (predictions - y))
db = (1 / n_samples) * np.sum(predictions - y)
# Update parameters
self.weights -= self.learning_rate * dw
self.bias -= self.learning_rate * db
def predict(self, X):
model = np.dot(X, self.weights) + self.bias
predictions = self._sigmoid(model)
return np.where(predictions >= 0.5, 1, 0)
import pytest
def test_logistic_regression_numpy_init():
logistic_regression = LogisticRegressionNumpy()
logistic_regression.fit(np.array([[1, 2], [2, 3]]), np.array([0, 1]))
assert logistic_regression.weights.shape == (2,)
assert isinstance(logistic_regression.bias, float)
test_logistic_regression_numpy_init()
# Example usage
# logistic_regression = LogisticRegressionNumpy()
# logistic_regression.fit(X_train, y_train)
# predictions = logistic_regression.predict(X_test)
- Explanation
- In the Numpy implementation, logistic regression is performed using gradient descent.
_sigmoid
: Sigmoid function, which maps any real-valued number into the range [0, 1], suitable for probability representation.fit
: Function for training the model using gradient descent. Updates weights (self.weights
) and bias (self.bias
) to minimize the loss.predict
: Function to predict binary outcomes (0 or 1) based on the learned weights and bias.
- The PyTorch implementation uses built-in linear layers and sigmoid activation, abstracting away the details of weights and bias updates.
- The test case for the Numpy implementation checks if the weights are initialized correctly and if the bias is a float. This test ensures that the fitting process begins with the correct parameter setup.
7. Logistic Regression loss function- Binary Cross Entropy
-
The loss function used in logistic regression, typically binary cross-entropy, measures the performance of a classification model whose output is a probability value between 0 and 1.
-
Numpy Implementation
def binary_cross_entropy_loss(y_true, y_pred):
"""
Compute the binary cross-entropy loss
y_true: array of true labels
y_pred: array of predicted probabilities
"""
epsilon = 1e-15 # Small constant to avoid log(0)
y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
def test_binary_cross_entropy_loss():
y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.1, 0.8, 0.3])
assert binary_cross_entropy_loss(y_true, y_pred) == pytest.approx(0.371, 0.01)
test_binary_cross_entropy_loss()
# Example usage
# loss = binary_cross_entropy_loss(y_true, y_pred)
- Pytorch Implementation
import torch
import torch.nn as nn
# PyTorch has a built-in BCELoss function
loss_function = nn.BCELoss()
# Example usage
# y_true = torch.tensor([...], dtype=torch.float32)
# y_pred = torch.tensor([...], dtype=torch.float32)
# loss = loss_function(y_pred, y_true)
def test_binary_cross_entropy_loss_pytorch():
y_true = torch.tensor([1, 0, 1, 1], dtype=torch.float32)
y_pred = torch.tensor([0.9, 0.1, 0.8, 0.3], dtype=torch.float32)
loss_function = nn.BCELoss()
loss = loss_function(y_pred, y_true)
assert loss.item() == pytest.approx(0.371, 0.01)
test_binary_cross_entropy_loss_pytorch()
8. Gradient Descent
-
An optimization algorithm used to minimize a function by iteratively moving towards the minimum value of the function.
-
Numpy Implementation
def gradient_descent(starting_point, learning_rate, num_iterations):
"""
Perform gradient descent on a simple quadratic function f(x) = x^2
starting_point: initial value of x
learning_rate: step size for each iteration
num_iterations: number of iterations for the descent
"""
x = starting_point
for _ in range(num_iterations):
grad = 2 * x # Derivative of x^2
x = x - learning_rate * grad
return x
# Example usage
# minimum = gradient_descent(10, 0.1, 100)
def test_gradient_descent():
minimum = gradient_descent(10, 0.1, 100)
assert minimum == pytest.approx(0, 0.01)
test_gradient_descent()
- Pytorch Implementation
# Simple quadratic function example: f(x) = x^2
x = torch.tensor([10.0], requires_grad=True)
optimizer = torch.optim.SGD([x], lr=0.1)
for _ in range(100):
optimizer.zero_grad()
loss = x ** 2
loss.backward()
optimizer.step()
# Example usage
# The optimized value of x is now stored in x
def test_gradient_descent_pytorch():
x = torch.tensor([10.0], requires_grad=True)
optimizer = torch.optim.SGD([x], lr=0.1)
for _ in range(100):
optimizer.zero_grad()
loss = x ** 2
loss.backward()
optimizer.step()
assert x.item() == pytest.approx(0, 0.01)
test_gradient_descent_pytorch()
9. BatchNorm
- Batch Normalization is a technique to improve the performance and stability of artificial neural networks.
- Batch Normalization (Batch Norm): A method used in deep learning to normalize the inputs of each layer, for each mini-batch, by adjusting and scaling the activations.
- The batch normalization process is defined by the equation: \(\text{BN}(x_i) = \gamma \left( \frac{x_i - \mu_{\text{B}}}{\sqrt{\sigma_{\text{B}}^2 + \epsilon}} \right) + \beta\)
-
where \(x_i\) is the input, \(\mu_{\text{B}}\) is the mini-batch mean, \(\sigma_{\text{B}}^2\) is the mini-batch variance, \(\gamma\) is the scale parameter, \(\beta\) is the shift parameter, and \(\epsilon\) is a small constant added for numerical stability.
- Numpy Implementation: Batch Normalization ```python import numpy as np
def batch_norm(X, gamma, beta, epsilon=1e-5): “”” Apply batch normalization. X: Input data for a mini-batch (numpy array) gamma, beta: Scale and shift parameters epsilon: Small constant for numerical stability “”” mu = np.mean(X, axis=0) var = np.var(X, axis=0) X_norm = (X - mu) / np.sqrt(var + epsilon) out = gamma * X_norm + beta return out
Example usage
gamma, beta are parameters to be learned during training
X is the input data for a mini-batch
bn_output = batch_norm(X, gamma, beta)
import pytest
def test_batch_norm_numpy(): np.random.seed(0) X = np.random.randn(100, 10) # 100 samples, 10 features gamma = np.ones(10) beta = np.zeros(10) bn_output = batch_norm(X, gamma, beta)
# Check if the mean is close to 0 and variance is close to 1
assert np.allclose(np.mean(bn_output, axis=0), np.zeros(10), atol=0.1)
assert np.allclose(np.var(bn_output, axis=0), np.ones(10), atol=0.1)
test_batch_norm_numpy()
- **PyTorch Implementation**: Batch Normalization
PyTorch has a built-in `BatchNorm1d` for 1D inputs (e.g., fully connected layers) and `BatchNorm2d` for 2D inputs (e.g., convolutional layers).
```python
import torch
import torch.nn as nn
# For fully connected layers
bn = nn.BatchNorm1d(num_features=features_dim)
# For convolutional layers
# bn = nn.BatchNorm2d(num_features=features_dim)
# Example usage
# Apply batch norm to the output of a layer
# output = bn(layer_output)
- Explanation
- Numpy Implementation:
- Calculates the mean (
mu
) and variance (var
) for the mini-batchX
. - Normalizes
X
using these statistics and theepsilon
value for numerical stability. - Scales and shifts the normalized values using
gamma
andbeta
.
- Calculates the mean (
- PyTorch Implementation: Utilizes PyTorch’s built-in batch normalization layers, which handle these computations internally.
- Testing Batch Normalization:
- The test case for the Numpy implementation checks if the batch normalized output has the desired properties: a mean of approximately 0 and a variance of approximately 1 for each feature across the mini-batch.
- Batch normalization helps in reducing the internal covariate shift which can lead to faster training and reduced dependence on initialization.
10. LayerNorm
- Layer Normalization is a technique used in neural networks to stabilize the learning process.
-
Layer Normalization:** Normalizes the inputs across the features instead of the batch dimension, widely used in recurrent and transformer models.
- Equation
- Layer normalization can be described by the following equation: \(\text{LN}(x_i) = \gamma \left( \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} \right) + \beta\)
-
where \(x_i\) is the input, \(\mu\) and \(\sigma^2\) are the mean and variance computed across the features, \(\gamma\) and \(\beta\) are learnable parameters, and \(\epsilon\) is a small constant for numerical stability.
- Numpy Implementation: Layer Normalization
- Numpy Implementation:**
- Computes the mean and variance across the features of the input
X
. - Normalizes
X
using these statistics andepsilon
. - Scales and shifts the normalized values using
gamma
andbeta
.
- Computes the mean and variance across the features of the input
import numpy as np
def layer_norm(X, gamma, beta, epsilon=1e-5):
"""
Apply layer normalization.
X: Input data (numpy array)
gamma, beta: Scale and shift parameters
epsilon: Small constant for numerical stability
"""
mu = np.mean(X, axis=1, keepdims=True)
var = np.var(X, axis=1, keepdims=True)
X_norm = (X - mu) / np.sqrt(var + epsilon)
out = gamma * X_norm + beta
return out
# Example usage
# gamma, beta are parameters to be learned during training
# X is the input data
# ln_output = layer_norm(X, gamma, beta)
import pytest
def test_layer_norm_numpy():
np.random.seed(0)
X = np.random.randn(10, 100) # 10 samples, 100 features
gamma = np.ones(100)
beta = np.zeros(100)
ln_output = layer_norm(X, gamma, beta)
# Check if the mean and variance are close to 0 and 1, respectively, for each sample
assert np.allclose(np.mean(ln_output, axis=1), np.zeros(10), atol=0.1)
assert np.allclose(np.var(ln_output, axis=1), np.ones(10), atol=0.1)
test_layer_norm_numpy()
- PyTorch Implementation: Layer Normalization
PyTorch provides a built-in layer for layer normalization:
torch.nn.LayerNorm
.
import torch
import torch.nn as nn
# Define layer normalization
ln = nn.LayerNorm(normalized_shape=features_dim)
# Example usage
# Apply layer norm to a layer's output
# output = ln(layer_output)
- PyTorch Implementation:** Uses PyTorch’s
nn.LayerNorm
for layer normalization. - Testing Layer Normalization:
- Checks if the layer normalized output for each sample has a mean of approximately 0 and a variance of approximately 1.
- Layer normalization is especially effective in recurrent neural networks and transformer models, where it helps in stabilizing the hidden state dynamics across timesteps or layers.
11. K fold cross validation
- K-Fold Cross-Validation is a resampling procedure used to evaluate machine learning models on a limited data sample.
- K-Fold Cross-Validation:** The process of dividing the dataset into ‘k’ subsets (folds), where the model is trained on ‘k-1’ folds and tested on the remaining one, repeated ‘k’ times with each fold used exactly once as the test set.
-
Implementing K-Fold Cross-Validation involves more about data manipulation than typical algorithmic functions. Here, we’ll implement a basic version of K-Fold Cross-Validation that splits data indices into ‘k’ folds.
- Numpy Implementation: K-Fold Cross-Validation
import numpy as np
def k_fold_split(dataset_size, k_folds):
"""
Splits dataset indices into k folds for cross-validation.
dataset_size: Total number of samples in the dataset
k_folds: Number of folds
"""
indices = np.arange(dataset_size)
np.random.shuffle(indices)
fold_sizes = np.full(k_folds, dataset_size // k_folds, dtype=int)
fold_sizes[:dataset_size % k_folds] += 1
current = 0
for fold_size in fold_sizes:
start, stop = current, current + fold_size
yield indices[start:stop]
current = stop
# Example usage
# for fold in k_fold_split(dataset_size=100, k_folds=5):
# # Use fold, which is a numpy array of indices
- PyTorch Implementation: K-Fold Cross-Validation
- In PyTorch, you can use the
torch.utils.data.Subset
class along with a dataset splitting approach similar to Numpy’s.
import torch
from torch.utils.data import Subset
def k_fold_split_torch(dataset_size, k_folds):
"""
Splits dataset indices into k folds for cross-validation.
dataset_size: Total number of samples in the dataset
k_folds: Number of folds
"""
indices = torch.randperm(dataset_size).tolist()
fold_sizes = np.full(k_folds, dataset_size // k_folds, dtype=int)
fold_sizes[:dataset_size % k_folds] += 1
current = 0
for fold_size in fold_sizes:
start, stop = current, current + fold_size
yield indices[start:stop]
current = stop
# Example usage
# for fold in k_fold_split_torch(dataset_size=100, k_folds=5):
# # Use fold, which is a list of indices
Pytest Test Case for K-Fold Cross-Validation
import pytest
def test_k_fold_split_numpy():
dataset_size = 10
k_folds = 5
folds = list(k_fold_split(dataset_size, k_folds))
assert len(folds) == k_folds
# Check if each fold is mutually exclusive and collectively exhaustive
unique_indices = np.unique(np.concatenate(folds))
assert len(unique_indices) == dataset_size
def test_k_fold_split_torch():
dataset_size = 10
k_folds = 5
folds = list(k_fold_split_torch(dataset_size, k_folds))
assert len(folds) == k_folds
# Check if each fold is mutually exclusive and collectively exhaustive
unique_indices = torch.unique(torch.tensor(sum(folds, [])))
assert len(unique_indices) == dataset_size
test_k_fold_split_numpy()
test_k_fold_split_torch()
- Explanation
- Numpy and PyTorch Implementations: Both implementations create indices for splitting the dataset into ‘k’ folds, ensuring each fold has roughly the same number of elements and every sample is used for validation exactly once.
- Testing K-Fold Cross-Validation: The test cases verify that:
- The number of created folds equals ‘k’.
- All indices in the dataset are unique and accounted for across all folds.
- This procedure is crucial in evaluating the performance of a model in a more robust and less biased way compared to a single train-test split, as it ensures that every data point is used for both training and testing.
12. Naive Bayes
- Naive Bayes is a simple yet effective classification algorithm based on Bayes’ Theorem with the assumption of independence among predictors.
-
A classification algorithm based on Bayes’ Theorem, assuming independence among features, used for building classifiers by applying conditional probability.
-
Equation The Naive Bayes classifier uses Bayes’ Theorem, which is given by: [ P(y|x_1, …, x_n) = \frac{P(y) \prod_{i=1}^{n}P(x_i|y)}{P(x_1, …, x_n)} $$ where ( y ) is the class variable, ( x_1, …, x_n ) are the feature variables, ( P(y|x_1, …, x_n) ) is the probability of ( y ) given the features, ( P(y) ) is the prior probability of ( y ), and ( P(x_i|y) ) is the likelihood of feature ( i ) given class ( y ).
-
I’ll focus on the Gaussian Naive Bayes implementation which assumes that the features follow a normal distribution.
- Numpy Implementation: Gaussian Naive Bayes
import numpy as np
class GaussianNaiveBayes:
def fit(self, X, y):
n_samples, n_features = X.shape
self._classes = np.unique(y)
n_classes = len(self._classes)
# Initialize mean, var, and priors
self._mean = np.zeros((n_classes, n_features), dtype=np.float64)
self._var = np.zeros((n_classes, n_features), dtype=np.float64)
self._priors = np.zeros(n_classes, dtype=np.float64)
for c in self._classes:
X_c = X[y==c]
self._mean[c, :] = X_c.mean(axis=0)
self._var[c, :] = X_c.var(axis=0)
self._priors[c] = X_c.shape[0] / float(n_samples)
def predict(self, X):
y_pred = [self._predict(x) for x in X]
return np.array(y_pred)
def _predict(self, x):
posteriors = []
for idx, c in enumerate(self._classes):
prior = np.log(self._priors[idx])
class_conditional = np.sum(np.log(self._pdf(idx, x)))
posterior = prior + class_conditional
posteriors.append(posterior)
return self._classes[np.argmax(posteriors)]
def _pdf(self, class_idx, x):
mean = self._mean[class_idx]
var = self._var[class_idx]
numerator = np.exp(- (x - mean) ** 2 / (2 * var))
denominator = np.sqrt(2 * np.pi * var)
return numerator / denominator
# Example usage
# model = GaussianNaiveBayes()
# model.fit(X_train, y_train)
# predictions = model.predict(X_test)
import pytest
def test_gaussian_naive_bayes():
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1])
model = GaussianNaiveBayes()
model.fit(X, y)
predictions = model.predict(X)
assert predictions.shape == y.shape
test_gaussian_naive_bayes()
PyTorch Implementation
Implementing Gaussian Naive Bayes in PyTorch is not typical, as PyTorch is more suited for neural network-based models. Naive Bayes calculations are straightforward and often more efficiently handled with libraries like Numpy or Scikit-learn.
- Explanation
- Numpy Implementation:
fit
method calculates the mean, variance, and prior probabilities for each class.predict
method computes the posterior probability for each class and chooses the class with the highest probability.- The probabilities are computed under the Gaussian (normal) distribution assumption for each feature.
- Testing Gaussian Naive Bayes:
- The test case verifies that the predictions have the same shape as the true labels, ensuring the model’s compatibility with the data dimensions.
- Naive Bayes, particularly the Gaussian variant, is effective for classification problems, especially when feature independence is a reasonable assumption. Despite its simplicity, it can perform remarkably well on various tasks.
13. Principal Component Analysis (PCA)
- Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
- Principal Component Analysis (PCA): A dimensionality reduction technique that transforms data into a new coordinate system, reducing the number of dimensions without significant loss of information.
-
Eigenvalues and eigenvectors are, respectively, the scalars that indicate how much a transformation stretches a vector, and the vectors that are only scaled, not rotated, by the transformation.
-
Equation PCA involves calculating the eigenvectors and eigenvalues of a dataset’s covariance matrix to identify the principal components. The principal components are the directions where there is the most variance, the directions where the data is most spread out.
- Numpy Implementation: PCA
import numpy as np
class PCA:
def __init__(self, n_components):
self.n_components = n_components
self.components = None
self.mean = None
def fit(self, X):
# Mean centering
self.mean = np.mean(X, axis=0)
X = X - self.mean
# Calculate covariance
cov = np.cov(X.T)
# Eigen decomposition
eigenvalues, eigenvectors = np.linalg.eig(cov)
# Sort eigenvectors
eigenvectors = eigenvectors.T
idxs = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[idxs]
eigenvectors = eigenvectors[idxs]
# Store first n eigenvectors
self.components = eigenvectors[0:self.n_components]
def transform(self, X):
# Project data
X = X - self.mean
return np.dot(X, self.components.T)
# Example usage
# pca = PCA(n_components=2)
# pca.fit(X_train)
# X_projected = pca.transform(X_train)
import pytest
def test_pca_numpy():
X = np.array([[1, 2], [3, 4], [5, 6]])
pca = PCA(n_components=1)
pca.fit(X)
X_projected = pca.transform(X)
assert X_projected.shape == (3, 1)
test_pca_numpy()
- PyTorch Implementation: PCA
- In PyTorch, PCA is not directly implemented as a class or function, but the process can be implemented using PyTorch’s operations, particularly for GPU-accelerated computing.
import torch
def pca_torch(X, n_components):
# Mean centering
mean = torch.mean(X, 0)
X = X - mean
# Calculate covariance
cov = torch.mm(X.T, X) / (X.shape[0] - 1)
# Eigen decomposition
eigenvalues, eigenvectors = torch.linalg.eig(cov)
eigenvectors = eigenvectors.T
# Sort eigenvectors
idxs = torch.argsort(eigenvalues, descending=True)
eigenvalues = eigenvalues[idxs]
eigenvectors = eigenvectors[idxs]
# Select the top n_components eigenvectors
components = eigenvectors[:n_components]
# Project the data onto these components
return torch.mm(X, components.T)
# Example usage
# X_projected = pca_torch(torch.tensor(X_train, dtype=torch.float32), n_components=2)
- Explanation
- Numpy Implementation:
fit
: Computes the covariance matrix of the data, performs eigen decomposition, and selects the topn_components
principal components.transform
: Projects the data onto the principal components.
- PyTorch Implementation:
- The process is similar but uses PyTorch operations, which can be executed on GPU for larger datasets.
- Testing PCA:
- The test case for the Numpy implementation verifies if the transformed data has the reduced dimensionality as expected.
- PCA is widely used in exploratory data analysis and for making predictive models. It’s most effective in scenarios where there’s high correlation among input features or when the dimensionality of the dataset is high.
14. Neural Networks (e.g., Multilayer Perceptron)
- Multi-Layer Perceptron (MLP), a type of neural network, is a connected series of nodes, where each node represents a mathematical operation, organized in layers, including an input layer, one or more hidden layers, and an output layer.
-
Multi-Layer Perceptron (MLP):** A class of feedforward artificial neural network consisting of multiple layers of nodes, each layer fully connected to the next, used for tasks like classification and regression.
-
Below are basic implementations of an MLP in both Numpy and PyTorch for binary classification tasks.
- Numpy Implementation: Simple MLP
- This is a rudimentary implementation focusing on the forward pass.
import numpy as np
def sigmoid(x):
return 1 / (1 + np.exp(-x))
class SimpleMLP:
def __init__(self, input_size, hidden_size, output_size):
# Initialize weights and biases
self.w1 = np.random.randn(input_size, hidden_size)
self.b1 = np.zeros(hidden_size)
self.w2 = np.random.randn(hidden_size, output_size)
self.b2 = np.zeros(output_size)
def forward(self, X):
# Forward pass through the network
z1 = np.dot(X, self.w1) + self.b1
a1 = sigmoid(z1)
z2 = np.dot(a1, self.w2) + self.b2
a2 = sigmoid(z2)
return a2
# Example usage
# mlp = SimpleMLP(input_size=3, hidden_size=5, output_size=1)
# output = mlp.forward(np.random.randn(1, 3))
import pytest
def test_simple_mlp_numpy():
mlp = SimpleMLP(input_size=3, hidden_size=5, output_size=1)
output = mlp.forward(np.random.randn(1, 3))
assert output.shape == (1, 1)
- PyTorch Implementation: Simple MLP
- PyTorch provides a more straightforward way to create MLPs using its
nn
module.
import torch
import torch.nn as nn
import torch.nn.functional as F
class SimpleMLPPyTorch(nn.Module):
def __init__(self, input_size, hidden_size, output_size):
super(SimpleMLPPyTorch, self).__init__()
self.fc1 = nn.Linear(input_size, hidden_size)
self.fc2 = nn.Linear(hidden_size, output_size)
def forward(self, x):
x = F.sigmoid(self.fc1(x))
x = F.sigmoid(self.fc2(x))
return x
# Example usage
# mlp = SimpleMLPPyTorch(input_size=3, hidden_size=5, output_size=1)
# output = mlp.forward(torch.randn(1, 3))
def test_simple_mlp_pytorch():
mlp = SimpleMLPPyTorch(input_size=3, hidden_size=5, output_size=1)
output = mlp.forward(torch.randn(1, 3))
assert output.shape == torch.Size([1, 1])
- Explanation
- Numpy Implementation:
__init__
: Initializes weights (w1
,w2
) and biases (b1
,b2
) randomly.forward
: Conducts the forward pass, computing the linear transformations followed by the sigmoid activation function.
- PyTorch Implementation:
- PyTorch abstracts much of the details, allowing layers to be defined easily using
nn.Linear
and activations usingtorch.nn.functional
.
- PyTorch abstracts much of the details, allowing layers to be defined easily using
- Testing MLP:
- The test cases ensure that the output of the MLP has the correct shape, indicating proper functioning of the forward pass.
- These MLP implementations demonstrate the basic structure and forward pass computation of a neural network, highlighting the ease of using high-level libraries like PyTorch for such tasks.
20. Convolutional Neural Networks (CNN)
- Convolutional Neural Networks (CNNs) are a class of deep neural networks, most commonly applied to analyzing visual imagery.
- Convolutional Neural Network (CNN): A deep learning algorithm that can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image, and differentiate one from the other.
-
Implementing a basic CNN involves setting up convolutional layers, activation functions, and pooling layers. Here’s a simple CNN implementation in both Numpy and PyTorch for image classification tasks.
- Numpy Implementation: Simple Convolution Operation
- Implementing a full CNN in Numpy is complex and inefficient, but we can demonstrate a basic convolution operation, which is the core of CNNs.
import numpy as np
def convolve2D(image, kernel, padding=0, strides=1):
# Add zero padding to the input image
image_padded = np.pad(image, [(padding, padding), (padding, padding)], mode='constant', constant_values=0)
kernel_height, kernel_width = kernel.shape
padded_height, padded_width = image_padded.shape
# Calculate the dimensions of the output image
output_height = (padded_height - kernel_height) // strides + 1
output_width = (padded_width - kernel_width) // strides + 1
# Perform convolution
output = np.zeros((output_height, output_width))
for y in range(0, output_height):
for x in range(0, output_width):
output[y][x] = np.sum(image_padded[y * strides:y * strides + kernel_height, x * strides:x * strides + kernel_width] * kernel)
return output
# Example usage
# image = np.array([...]) # Input image
# kernel = np.array([...]) # Convolutional kernel
# output = convolve2D(image, kernel)
- PyTorch Implementation: Simple CNN
- PyTorch provides a more straightforward way to create CNNs using its
nn
module.
import torch
import torch.nn as nn
import torch.nn.functional as F
class SimpleCNNPyTorch(nn.Module):
def __init__(self):
super(SimpleCNNPyTorch, self).__init__()
self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, stride=1, padding=1)
self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
def forward(self, x):
x = F.relu(self.conv1(x))
x = self.pool(x)
return x
# Example usage
# cnn = SimpleCNNPyTorch()
# output = cnn.forward(torch.randn(1, 1, 28, 28)) # Example with a single 28x28 grayscale image
import pytest
def test_simple_cnn_pytorch():
cnn = SimpleCNNPyTorch()
output = cnn.forward(torch.randn(1, 1, 28, 28)) # Single 28x28 grayscale image
assert output.shape == torch.Size([1, 16, 14, 14]) # Output shape after convolution and pooling
test_simple_cnn_pytorch()
- Explanation
- Numpy Implementation:
- Demonstrates a basic 2D convolution operation.
- Involves element-wise multiplication of the kernel with the image and summing up the results.
- PyTorch Implementation:
- Sets up a simple CNN with one convolutional layer followed by a max pooling layer.
- Utilizes PyTorch’s
nn.Conv2d
for convolution andnn.MaxPool2d
for pooling.
- Testing CNN:
- The PyTorch test case ensures that the output shape of the CNN is as expected after applying a convolution and pooling layer to an input image.
- While the Numpy implementation provides a basic understanding of the convolution operation, CNNs in practice, especially for complex tasks like image recognition, are much more efficiently implemented using deep learning frameworks like PyTorch, which offer optimized operations and ease of use.
21. Recurrent Neural Networks (RNN)
- Recurrent Neural Networks (RNNs) are a class of neural networks designed to recognize patterns in sequences of data, such as text, genomes, handwriting, or spoken words.
- Recurrent Neural Network (RNN): A type of neural network where connections between nodes form a directed graph along a temporal sequence, allowing it to exhibit temporal dynamic behavior and use its internal state (memory) to process sequences of inputs.
-
Implementing a full RNN from scratch is complex, but we can illustrate a basic RNN unit’s operation. Here’s a simple RNN implementation in both Numpy and PyTorch for demonstration purposes.
- Numpy Implementation: Simple RNN Unit ```python import numpy as np
def rnn_step_forward(x, prev_h, Wx, Wh, b): “”” A single time step forward of a vanilla RNN. x: Input data for this time step prev_h: Hidden state from the previous time step Wx: Weight matrix for input-to-hidden connections Wh: Weight matrix for hidden-to-hidden connections b: Bias term “”” h_next = np.tanh(np.dot(prev_h, Wh) + np.dot(x, Wx) + b) return h_next
Example usage
Initialize inputs, weights, and previous hidden state
x = np.array([…]) # Input vector
prev_h = np.array([…]) # Previous hidden state
Wx = np.array([…]) # Input-to-hidden weights
Wh = np.array([…]) # Hidden-to-hidden weights
b = np.array([…]) # Bias term
next_h = rnn_step_forward(x, prev_h, Wx, Wh, b)
- **PyTorch Implementation**: Simple RNN
- PyTorch provides an easy way to create RNNs with its `nn.RNN` module.
```python
import torch
import torch.nn as nn
class SimpleRNNPyTorch(nn.Module):
def __init__(self, input_size, hidden_size):
super(SimpleRNNPyTorch, self).__init__()
self.rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size)
def forward(self, x):
# Assuming x is of shape (seq_len, batch, input_size)
out, hidden = self.rnn(x)
return out, hidden
# Example usage
# rnn = SimpleRNNPyTorch(input_size=10, hidden_size=20)
# output, hidden = rnn.forward(torch.randn(5, 1, 10)) # Example with a sequence length of 5
import pytest
def test_simple_rnn_pytorch():
seq_len, batch_size, input_size, hidden_size = 5, 1, 10, 20
rnn = SimpleRNNPyTorch(input_size=input_size, hidden_size=hidden_size)
output, hidden = rnn.forward(torch.randn(seq_len, batch_size, input_size))
assert output.shape == torch.Size([seq_len, batch_size, hidden_size])
assert hidden.shape == torch.Size([1, batch_size, hidden_size])
test_simple_rnn_pytorch()
- Explanation
- Numpy Implementation:
- Implements a single step of a vanilla RNN.
- Combines the current input (
x
) with the previous hidden state (prev_h
) using weights (Wx
,Wh
) and bias (b
), then applies atanh
activation function.
- PyTorch Implementation:
- Uses PyTorch’s
nn.RNN
module to create a simple RNN. - The
forward
method processes an input sequence and outputs both the final hidden states and the output for each step.
- Uses PyTorch’s
- Testing RNN:
- The PyTorch test case checks if the output and hidden state’s dimensions are as expected after passing a sequence through the RNN.
- RNNs are particularly useful for processing sequential data and are foundational in applications like language modeling, translation, and speech recognition. However, they can suffer from issues like vanishing and exploding gradients, which are addressed in more advanced versions like LSTMs and GRUs.
22. Long Short-Term Memory Networks (LSTM)
- Long Short-Term Memory (LSTM) networks are a type of Recurrent Neural Network (RNN) capable of learning long-term dependencies, specifically designed to avoid the long-term dependency problem.
-
Long Short-Term Memory (LSTM):** An advanced RNN architecture that includes memory cells and gates to control the flow of information, effectively learning long-term dependencies in sequence data.
-
LSTMs are complex to implement from scratch due to their intricate architecture. However, I’ll provide an example of a simple LSTM layer using PyTorch, which has built-in support for LSTMs. Implementing an LSTM in Numpy is impractical due to its complexity and computational inefficiency.
- PyTorch Implementation: Simple LSTM ```python import torch import torch.nn as nn
class SimpleLSTM(nn.Module): def init(self, input_size, hidden_size): super(SimpleLSTM, self).init() self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size)
def forward(self, x):
# Assuming x is of shape (seq_len, batch, input_size)
output, (hidden, cell) = self.lstm(x)
return output, hidden, cell
Example usage
lstm = SimpleLSTM(input_size=10, hidden_size=20)
output, hidden, cell = lstm(torch.randn(5, 1, 10)) # Example with a sequence length of 5
def test_simple_lstm_pytorch(): seq_len, batch_size, input_size, hidden_size = 5, 1, 10, 20 lstm = SimpleLSTM(input_size=input_size, hidden_size=hidden_size) output, hidden, cell = lstm(torch.randn(seq_len, batch_size, input_size))
assert output.shape == torch.Size([seq_len, batch_size, hidden_size])
assert hidden.shape == torch.Size([1, batch_size, hidden_size])
assert cell.shape == torch.Size([1, batch_size, hidden_size])
test_simple_lstm_pytorch()
- Explanation
- **PyTorch Implementation:**
- Uses PyTorch's `nn.LSTM` module to create an LSTM layer.
- The LSTM layer processes an input sequence and outputs the final hidden states, output for each step, and cell states.
- **Testing LSTM:**
- The test case checks if the output, hidden state, and cell state's dimensions are as expected after passing a sequence through the LSTM.
- LSTMs are widely used in complex sequence modeling tasks like language translation, speech recognition, and time-series forecasting due to their ability to capture long-range dependencies and mitigate issues like vanishing gradients. The intricacies of LSTMs, including their gating mechanisms (forget gate, input gate, and output gate), make them particularly effective for these applications.
Implementing an LSTM from scratch in Numpy is a complex and intensive task, mainly because LSTMs involve intricate computations and state management that are not trivial to optimize without specialized libraries. However, for educational purposes, I can provide a simplified version of an LSTM cell's forward pass in Numpy. This implementation will focus on the key components of an LSTM - the forget gate, input gate, cell state, and output gate.
- **Numpy Implementation**: Simplified LSTM Cell Forward Pass
```python
import numpy as np
def sigmoid(x):
return 1 / (1 + np.exp(-x))
def tanh(x):
return np.tanh(x)
class SimpleLSTMCell:
def __init__(self, input_size, hidden_size):
# Initialize weights
self.Wf = np.random.randn(hidden_size, hidden_size + input_size)
self.Wi = np.random.randn(hidden_size, hidden_size + input_size)
self.Wc = np.random.randn(hidden_size, hidden_size + input_size)
self.Wo = np.random.randn(hidden_size, hidden_size + input_size)
# Initialize biases
self.bf = np.zeros(hidden_size)
self.bi = np.zeros(hidden_size)
self.bc = np.zeros(hidden_size)
self.bo = np.zeros(hidden_size)
def forward(self, x, h_prev, c_prev):
# Concatenate h_prev and x
combined = np.concatenate((h_prev, x), axis=1)
# Forget gate
ft = sigmoid(np.dot(self.Wf, combined.T) + self.bf[:, np.newaxis])
# Input gate
it = sigmoid(np.dot(self.Wi, combined.T) + self.bi[:, np.newaxis])
ct_tilde = tanh(np.dot(self.Wc, combined.T) + self.bc[:, np.newaxis])
# Cell state
ct = ft * c_prev + it * ct_tilde
# Output gate
ot = sigmoid(np.dot(self.Wo, combined.T) + self.bo[:, np.newaxis])
ht = ot * tanh(ct)
return ht.T, ct
# Example usage
# lstm_cell = SimpleLSTMCell(input_size=10, hidden_size=20)
# h_prev = np.zeros((1, 20))
# c_prev = np.zeros((1, 20))
# x = np.random.randn(1, 10)
# h_next, c_next = lstm_cell.forward(x, h_prev, c_prev)
- Explanation
SimpleLSTMCell
Class:- Initializes weights (
Wf
,Wi
,Wc
,Wo
) and biases (bf
,bi
,bc
,bo
) for the forget gate, input gate, cell state, and output gate. - The
forward
method computes the LSTM cell’s forward pass.
- Initializes weights (
- Forward Pass:
- Combines the previous hidden state
h_prev
and the current inputx
. - Calculates the forget gate, input gate, cell state update, and output gate.
- Outputs the new hidden state
h_next
and new cell statec_next
.
- Combines the previous hidden state
- This implementation provides a basic understanding of an LSTM’s internal mechanics. However, in practical applications, especially for large datasets or complex tasks, it is highly recommended to use optimized libraries like PyTorch or TensorFlow, which provide efficient, pre-built LSTM modules.
23. Precision and Recall
To calculate precision and recall, we first need to understand these metrics in the context of classification tasks:
-
Precision: Measures the accuracy of positive predictions. It is the ratio of true positives (correctly predicted positive observations) to the total predicted positives (both true and false positives). It is given by (\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}).
-
Recall: Also known as sensitivity or true positive rate, measures the ability of a model to find all the relevant cases within a dataset. It is given by (\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}).
Let’s write a Python code to calculate precision and recall. This code will assume you have arrays of true labels and predicted labels.
NumPy Implementation
import numpy as np
def calculate_precision_recall(y_true, y_pred):
"""
Calculate precision and recall
y_true: numpy array of true labels
y_pred: numpy array of predicted labels
"""
true_positives = np.sum((y_true == 1) & (y_pred == 1))
false_positives = np.sum((y_true == 0) & (y_pred == 1))
false_negatives = np.sum((y_true == 1) & (y_pred == 0))
precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
return precision, recall
# Example usage
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([0, 0, 1, 1, 0, 1])
precision, recall = calculate_precision_recall(y_true, y_pred)
print("Precision:", precision)
print("Recall:", recall)
# NumPy test cases
def test_precision_recall_numpy():
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([0, 0, 1, 1, 0, 1])
precision, recall = calculate_precision_recall(y_true, y_pred)
assert np.isclose(precision, 1.0)
assert np.isclose(recall, 0.75)
PyTorch Implementation
import torch
def calculate_precision_recall(y_true, y_pred):
"""
Calculate precision and recall using PyTorch
y_true: torch.Tensor of true labels
y_pred: torch.Tensor of predicted labels
"""
true_positives = torch.sum((y_true == 1) & (y_pred == 1))
false_positives = torch.sum((y_true == 0) & (y_pred == 1))
false_negatives = torch.sum((y_true == 1) & (y_pred == 0))
precision = true_positives.float() / (true_positives + false_positives).float()
recall = true_positives.float() / (true_positives + false_negatives).float()
return precision, recall
# Example usage
y_true = torch.tensor([1, 0, 1, 1, 0, 1])
y_pred = torch.tensor([0, 0, 1, 1, 0, 1])
precision, recall = calculate_precision_recall(y_true, y_pred)
print("Precision:", precision)
print("Recall:", recall)
# PyTorch test cases
def test_precision_recall_pytorch():
y_true = torch.tensor([1, 0, 1, 1, 0, 1])
y_pred = torch.tensor([0, 0, 1, 1, 0, 1])
precision, recall = calculate_precision_recall(y_true, y_pred)
assert torch.isclose(precision, torch.tensor(1.0))
assert torch.isclose(recall, torch.tensor(0.75))
In both implementations, y_true
and y_pred
are arrays or tensors of true and predicted labels, respectively. The functions calculate true positives, false positives, and false negatives, and then use these values to compute precision and recall.
24. Intersection over Union (IoU) for Target Detection
Description: IoU is a measure used to evaluate object detection models. It calculates the ratio of the area of overlap to the area of union between the predicted bounding box and the ground truth bounding box. The equation is (\text{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}}).
Let’s implement this in both NumPy and PyTorch and provide test cases with pytest.
NumPy Implementation
import numpy as np
import pytest
def calculate_iou(boxA, boxB):
"""
Calculate the Intersection over Union (IoU) of two bounding boxes.
Parameters:
boxA (np.array): Numpy array [x1, y1, x2, y2] representing the first box,
where (x1, y1) is the top left coordinate,
and (x2, y2) is the bottom right coordinate.
boxB (np.array): Numpy array [x1, y1, x2, y2] for the second box.
Returns:
float: IoU of boxA and boxB.
"""
# Determine the coordinates of the intersection rectangle
xA = max(boxA[0], boxB[0])
yA = max(boxA[1], boxB[1])
xB = min(boxA[2], boxB[2])
yB = min(boxA[3], boxB[3])
# Compute the area of intersection
interArea = max(0, xB - xA + 1) * max(0, yB - yA + 1)
# Compute the area of both bounding boxes
boxAArea = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)
boxBArea = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)
# Compute the IoU
iou = interArea / float(boxAArea + boxBArea - interArea)
return iou
# NumPy test case
def test_iou_numpy():
boxA = np.array([50, 50, 150, 150])
boxB = np.array([60, 60, 170, 160])
assert np.isclose(calculate_iou(boxA, boxB), 0.42105263)
PyTorch Implementation
import torch
import pytest
def calculate_iou_pytorch(boxA, boxB):
"""
Calculate the Intersection over Union (IoU) using PyTorch.
Parameters:
boxA (torch.Tensor): Tensor [x1, y1, x2, y2] representing the first box.
boxB (torch.Tensor): Tensor [x1, y1, x2, y2] for the second box.
Returns:
torch.Tensor: IoU of boxA and boxB.
"""
# Intersection rectangle coordinates
xA = torch.max(boxA[0], boxB[0])
yA = torch.max(boxA[1], boxB[1])
xB = torch.min(boxA[2], boxB[2])
yB = torch.min(boxA[3], boxB[3])
# Intersection area
interArea = torch.clamp(xB - xA + 1, min=0) * torch.clamp(yB - yA + 1, min=0)
# Areas of boxes
boxAArea = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)
boxBArea = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)
# IoU
iou = interArea / (boxAArea + boxBArea - interArea)
return iou
# PyTorch test case
def test_iou_pytorch():
boxA = torch.tensor([50, 50, 150, 150])
boxB = torch.tensor([60, 60, 170, 160])
assert torch.isclose(calculate_iou_pytorch(boxA, boxB), torch.tensor(0.42105263))
To run the tests, save these functions in your Python file and execute pytest. The np.isclose
and torch.isclose
functions are used to check if the calculated IoU is close to the expected value, considering the floating-point precision.
Given the complexity of the request, we’ll handle each algorithm (AUC/ROC Curve and Cross-Entropy Loss) separately. Let’s start with the AUC/ROC Curve:
25. AUC/ROC Curve
Description: The Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) Curve is a performance measurement for classification problems. The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at various threshold settings. The AUC represents the probability that a randomly chosen positive instance is ranked higher than a randomly chosen negative instance.
For AUC/ROC Curve, direct computation in NumPy or PyTorch is non-trivial and usually involves using existing libraries like scikit-learn
for Python. However, I can demonstrate how to calculate TPR and FPR, which are essential components of the ROC curve.
NumPy Implementation for TPR and FPR
import numpy as np
import pytest
def calculate_tpr_fpr(y_true, y_scores, threshold):
"""
Calculate True Positive Rate (TPR) and False Positive Rate (FPR).
Parameters:
y_true (np.array): Array of true binary labels.
y_scores (np.array): Array of scores, typically probability estimates.
threshold (float): Threshold for classifying a score as positive.
Returns:
float, float: TPR and FPR
"""
# Binarize predictions based on the threshold
y_pred = (y_scores >= threshold).astype(int)
# True positives, false positives, true negatives, false negatives
TP = np.sum((y_true == 1) & (y_pred == 1))
FP = np.sum((y_true == 0) & (y_pred == 1))
FN = np.sum((y_true == 1) & (y_pred == 0))
TN = np.sum((y_true == 0) & (y_pred == 0))
# Calculate TPR and FPR
TPR = TP / (TP + FN) if (TP + FN) != 0 else 0
FPR = FP / (FP + TN) if (FP + TN) != 0 else 0
return TPR, FPR
# NumPy test case for TPR and FPR
def test_tpr_fpr_numpy():
y_true = np.array([1, 0, 1, 1, 0, 1])
y_scores = np.array([0.9, 0.1, 0.8, 0.7, 0.2, 0.9])
threshold = 0.5
TPR, FPR = calculate_tpr_fpr(y_true, y_scores, threshold)
assert np.isclose(TPR, 0.75) and np.isclose(FPR, 0.0)
PyTorch Implementation for TPR and FPR
For PyTorch, the implementation would be similar, but we’d use PyTorch tensors instead of NumPy arrays. However, it’s important to note that PyTorch is typically used for building and training neural network models, and direct computation of ROC AUC might be more efficiently done using a specialized library like scikit-learn
.
26. Cross-Entropy Loss
Description: Cross-Entropy Loss, also known as Log Loss, measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label. The equation is (-\sum [y \log(\hat{y}) + (1 - y) \log(1 - \hat{y})]), where (y) is the true label, and (\hat{y}) is the predicted probability.
NumPy Implementation
import numpy as np
import pytest
def cross_entropy_loss(y_true, y_pred):
"""
Calculate the cross-entropy loss.
Parameters:
y_true (np.array): Array of true binary labels (0 or 1).
y_pred (np.array): Array of predicted probabilities.
Returns:
float: Cross-entropy loss
"""
# Small epsilon to avoid log(0)
epsilon = 1e-15
# Clipping y_pred between epsilon and 1-epsilon.
y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
# Calculate cross-entropy loss
loss = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
return loss
# NumPy test case for Cross-Entropy Loss
def test_cross_entropy_loss_numpy():
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([0.9, 0.1, 0.8, 0.7, 0.2, 0.9])
assert np.isclose(cross_entropy_loss(y_true, y_pred), 0.164252033486018)
PyTorch Implementation
import torch
import pytest
def cross_entropy_loss_pytorch(y_true, y_pred):
"""
Calculate the cross-entropy loss using PyTorch.
Parameters:
y_true (torch.Tensor): Tensor of true binary labels (0 or 1).
y_pred (torch.Tensor): Tensor of predicted probabilities.
Returns:
torch.Tensor: Cross-entropy loss
"""
# Small epsilon to avoid log(0)
epsilon = 1e-15
# Clipping y_pred between epsilon and 1-epsilon.
y_pred = torch.clamp(y_pred, epsilon, 1 - epsilon)
# Calculate cross-entropy loss
loss = -torch.mean(y_true * torch.log(y_pred) + (1 - y_true) * torch.log(1 - y_pred))
return loss
# PyTorch test case for Cross-Entropy Loss
def test_cross_entropy_loss_pytorch():
y_true = torch.tensor([1, 0, 1, 1, 0, 1], dtype=torch.float32)
y_pred = torch.tensor([0.9, 0.1, 0.8, 0.7, 0.2, 0.9], dtype=torch.float32)
assert torch.isclose(cross_entropy_loss_pytorch(y_true, y_pred), torch.tensor(0.164252), atol=1e-5)
These implementations calculate the Cross-Entropy Loss for a given set of true labels and predicted probabilities. The loss measures how well the predicted probabilities match the actual labels. The small epsilon
value ensures numerical stability by preventing taking the logarithm of zero. The test cases check whether the implemented functions return expected loss values for a sample set of true labels and predictions.
SparseVector Implementation in Python (Using NumPy)
import numpy as np
import pytest
class SparseVector:
def __init__(self, nums):
"""
Initialize the SparseVector with the non-zero elements.
nums (list[int]): List of integers representing the sparse vector.
"""
self.non_zero_elements = {i: val for i, val in enumerate(nums) if val != 0}
def dotProduct(self, vec):
"""
Compute the dot product between this SparseVector and another SparseVector.
vec (SparseVector): Another SparseVector instance.
Returns:
int: The dot product result.
"""
result = 0
# Iterate through the non-zero elements of this vector.
for i, val in self.non_zero_elements.items():
# Add to result if the corresponding element in vec is non-zero.
result += val * vec.non_zero_elements.get(i, 0)
return result
# NumPy test cases
def test_sparse_vector_numpy():
nums1 = [1, 0, 0, 2, 3]
nums2 = [0, 3, 0, 4, 0]
v1 = SparseVector(nums1)
v2 = SparseVector(nums2)
assert v1.dotProduct(v2) == 8
nums1 = [0, 1, 0, 0, 0]
nums2 = [0, 0, 0, 0, 2]
v1 = SparseVector(nums1)
v2 = SparseVector(nums2)
assert v1.dotProduct(v2) == 0
SparseVector Implementation in Python (Using PyTorch)
import torch
import pytest
class SparseVectorPyTorch:
def __init__(self, nums):
"""
Initialize the SparseVector with the non-zero elements.
nums (list[int]): List of integers representing the sparse vector.
"""
# Convert the nums list to a PyTorch tensor.
nums_tensor = torch.tensor(nums)
# Store indices and values of non-zero elements.
self.indices = torch.nonzero(nums_tensor, as_tuple=True)[0]
self.values = nums_tensor[self.indices]
def dotProduct(self, vec):
"""
Compute the dot product between this SparseVector and another SparseVector.
vec (SparseVectorPyTorch): Another SparseVector instance.
Returns:
int: The dot product result.
"""
# Initialize the result
result = 0
# Iterate through non-zero elements
for idx, val in zip(self.indices, self.values):
# Add to result if corresponding element in vec is non-zero.
result += val * vec.values[vec.indices == idx].sum()
return result
# PyTorch test cases
def test_sparse_vector_pytorch():
nums1 = [1, 0, 0, 2, 3]
nums2 = [0, 3, 0, 4, 0]
v1 = SparseVectorPyTorch(nums1)
v2 = SparseVectorPyTorch(nums2)
assert v1.dotProduct(v2) == 8
nums1 = [0, 1, 0, 0, 0]
nums2 = [0, 0, 0, 0, 2]
v1 = SparseVectorPyTorch(nums1)
v2 = SparseVectorPyTorch(nums2)
assert v1.dotProduct(v2) == 0
Explanation
- Initialization: In both implementations, the
__init__
method stores the non-zero elements and their indices. For NumPy, this is a dictionary, and for PyTorch, two tensors are used (one for indices and one for values). - Dot Product: The
dotProduct
method iterates over the non-zero elements of one vector and multiplies them with the corresponding elements (if non-zero) of the other vector. The sum of these products is the dot product of the two sparse vectors.
Testing with pytest
- The test cases provided verify the functionality by creating instances of
SparseVector
orSparseVectorPyTorch
with given sparse vectors and then checking if the dot product matches the expected output.
To run the tests, you would save these class definitions and test functions in a Python file and execute pytest in your terminal. The testing framework will automatically discover and run the tests, validating the correctness of the implementation.
Prompt hold
- For the below algorithms, give me a one to two line description with the equation being implemented if applicable, implementation in numpy and pytorch with test case with pytest.Take multiple rounds of prompts if needed. Add detailed comments explaining variables and what is happening on each line: