Introduction to PyTorch
This blog post is a basic introduction to PyTorch. It covers tensors and how to train your own model in PyTorch.
Prerequisites
- Basic Maths
- Basic Python
Aims
- Understand tensors
- Understand the steps to build and train a model in PyTorch
Introduction
“Artificial Intelligence, deep learning, machine learning — whatever you're doing if you don't understand it — learn it. Because otherwise you're going to be a dinosaur within 3 years.” ~ Mark Cuban.
Tensors
https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html
Tensors are specialized arrays or matrices that can have as many dimensions as you want. We use tensors to encode data in PyTorch, as well as the model parameters. When you perform operations (e.g. addition, multiplication) on a tensor that has requires_grad=True, as model parameters do by default, PyTorch records them in its built-in dynamic computation graph, which is needed for backpropagation during training.
Here’s an example of some tensors:
import torch
# Create a tensor from a list/array:
data1 = [0, 1, 2, 3, 4, 5] # 1D array
data2 = [[0, 1], [2, 3], [4, 5]] # 2D array
tensor1 = torch.tensor(data1)
tensor2 = torch.tensor(data2)
print(tensor1.shape) # torch.Size([6])
print(tensor2.shape) # torch.Size([3, 2])
# Create a tensor filled with zeros or ones:
zeros_tensor = torch.zeros(3, 5, 4, 2, 1, 4) # A 6D tensor
ones_tensor = torch.ones(3, 5)
# Create a random tensor from a specified shape:
random_tensor1 = torch.randn(10, 20) # A 10 by 20 tensor sampled from a standard normal distribution
print(random_tensor1.shape) # torch.Size([10, 20])
print(random_tensor1[0, :10]) # You can use python array index slicing
random_tensor2 = torch.randint(0, 100, (5, 10)) # 5 by 10 tensor of integers from 0 to 99 (the upper bound 100 is exclusive)
print(random_tensor2[0, 3:10])
# Attributes of a tensor:
print(tensor1.shape)
print(tensor1.dtype) # data type, e.g. float32, int64, float64, bfloat16
print(tensor1.device) # device the tensor is stored on, e.g. cpu, cuda
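To make the computation-graph idea above concrete, here is a minimal sketch of automatic differentiation on a tiny tensor. The variable names are only illustrative; the point is that operations on a tensor created with requires_grad=True are recorded, so calling backward() can compute gradients.

```python
import torch

# requires_grad=True asks autograd to record operations on this tensor.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# y = sum(x^2); both the power and the sum are recorded in the graph.
y = (x ** 2).sum()

# Walk the graph backwards to compute dy/dx = 2 * x.
y.backward()
print(x.grad)  # tensor([2., 4., 6.])

# Tensors created without requires_grad are not tracked.
z = torch.tensor([1.0, 2.0, 3.0])
print((z * 2).requires_grad)  # False
```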
Training a Simple Model in PyTorch
The steps for training a model in PyTorch include:
- Find a dataset that suits your problem, download it, and create a dataloader.
- Define your model. You can create your own model module using nn.Module.
- Define your loss function (from torch.nn), optimizer (from torch.optim), etc.
- Define your training loop. This can be a function that performs a single step, called inside a loop, or simply an inline training loop.
- Start training!
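Before going through each step in detail, the whole recipe can be sketched in a few lines on a toy problem. This is only an illustration under assumptions that are not part of the tutorial: the data is synthetic (points on the line y = 2x + 1 with a little noise), the model is a single nn.Linear layer, and the loss is mean squared error.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)  # make the sketch reproducible

# Step 1: dataset and dataloader (synthetic data: y = 2x + 1 plus small noise).
X = torch.linspace(-1, 1, 200).unsqueeze(1)
y = 2 * X + 1 + 0.01 * torch.randn_like(X)
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

# Step 2: model (one weight and one bias).
model = nn.Linear(1, 1)

# Step 3: loss function and optimizer.
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Steps 4 and 5: the training loop, run for 50 epochs.
for epoch in range(50):
    for xb, yb in loader:
        loss = loss_fn(model(xb), yb)  # forward pass and loss
        optimizer.zero_grad()          # clear old gradients
        loss.backward()                # compute new gradients
        optimizer.step()               # update parameters

# The learned weight and bias should end up close to 2 and 1.
print(model.weight.item(), model.bias.item())
```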
For this example, we are going to use the FashionMNIST dataset, which is available through TorchVision, making it very easy to download and use.
First import the necessary libraries:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
Finding and defining your dataset
PyTorch offers domain-specific libraries such as TorchText, TorchVision, and TorchAudio, all of which include datasets. For this tutorial, we will be using a TorchVision dataset.
The torchvision.datasets module contains Dataset objects for many real-world vision datasets such as CIFAR and COCO. In this tutorial, we use the FashionMNIST dataset. Every TorchVision Dataset accepts two arguments, transform and target_transform, to modify the samples and labels respectively.
# Download training data from the FashionMNIST dataset.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(), # Convert to PyTorch tensor.
)
# Download test data from the FashionMNIST dataset.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)
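To show what transform and target_transform do without downloading anything, here is a small sketch. ToyDataset and one_hot are hypothetical names made up for this illustration; they are not TorchVision classes, they only mimic the same convention of applying one callable to the sample and another to the label.

```python
import torch
from torch.utils.data import Dataset


class ToyDataset(Dataset):
    """Hypothetical dataset that mimics the transform/target_transform convention."""

    def __init__(self, samples, labels, transform=None, target_transform=None):
        self.samples = samples
        self.labels = labels
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        sample, label = self.samples[idx], self.labels[idx]
        if self.transform is not None:
            sample = self.transform(sample)       # modifies the sample
        if self.target_transform is not None:
            label = self.target_transform(label)  # modifies the label
        return sample, label


def one_hot(label, num_classes=3):
    # Turn an integer class index into a one-hot float vector.
    return torch.nn.functional.one_hot(torch.tensor(label), num_classes).float()


dataset = ToyDataset(
    samples=[[0, 128, 255], [255, 0, 64]],  # fake "pixel" values in 0..255
    labels=[2, 0],
    transform=lambda s: torch.tensor(s, dtype=torch.float32) / 255.0,  # scale to [0, 1]
    target_transform=one_hot,
)

sample, label = dataset[0]
print(sample)  # a float tensor with values scaled to the range [0, 1]
print(label)   # tensor([0., 0., 1.])
```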
We pass the Dataset as an argument to DataLoader. This wraps an iterable over our dataset, and supports automatic batching, sampling, shuffling and multiprocess data loading. Here we define a batch size of 64, i.e. each element in the dataloader iterable will return a batch of 64 features and labels.
batch_size = 64
# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)
for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}") # N, C, H, W stands for Batch Size, Channel Size, Height, and Width
    print(f"Shape of y: {y.shape} {y.dtype}")
    break
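One detail worth seeing for yourself is what happens when the dataset size is not a multiple of the batch size. The sketch below uses a synthetic stand-in (150 all-zero "images", an assumption made purely for illustration) instead of FashionMNIST, so it runs without any download.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in: 150 samples shaped like 28x28 grayscale images.
features = torch.zeros(150, 1, 28, 28)
labels = torch.zeros(150, dtype=torch.long)
loader = DataLoader(TensorDataset(features, labels), batch_size=64, shuffle=True)

print(len(loader))  # 3, i.e. ceil(150 / 64) batches

batch_sizes = [X.shape[0] for X, y in loader]
print(batch_sizes)  # [64, 64, 22]; the last batch holds the leftover samples
```

If the smaller final batch is a problem, DataLoader also accepts drop_last=True to discard it.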
Create the model
To define a neural network in PyTorch, we create a class that inherits from nn.Module. We define the layers of the network in the __init__ function and specify how data will pass through the network in the forward function. To accelerate operations in the neural network, we move it to the GPU (cuda) if one is available, and otherwise stay on the CPU.
# Use the GPU (cuda) if available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")
# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten() # Flattens to (Batch Size, Channel Size * Height * Width), from a 4D tensor to a 2D tensor
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512), # Fully-connected hidden layer
            nn.ReLU(), # Activation Function
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        ) # When `linear_relu_stack` is called, it will run all the modules inside in order.

    def forward(self, x):
        x = self.flatten(x) # flatten to 2D, same as x.view(x.size(0), -1)
        logits = self.linear_relu_stack(x) # the nn.Sequential instance
        return logits
# Create an instance
model = NeuralNetwork().to(device)
print(model)
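A quick way to check a model like this is to push a fake batch through it and look at the shapes. The sketch below rebuilds the same layer sizes as a single nn.Sequential (a shortcut for brevity, not the class defined above) and feeds it random data in place of real images.

```python
import torch
from torch import nn

# Same layer sizes as the NeuralNetwork class, written as one nn.Sequential.
net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)

X = torch.rand(64, 1, 28, 28)  # fake batch: 64 grayscale 28x28 images
logits = net(X)
print(logits.shape)  # torch.Size([64, 10]): one score per class for each image

# Softmax turns the raw scores (logits) into probabilities that sum to 1 per row.
probs = nn.Softmax(dim=1)(logits)
pred_classes = probs.argmax(dim=1)
print(pred_classes.shape)  # torch.Size([64])

# Total number of learnable parameters (weights and biases of the 3 linear layers).
num_params = sum(p.numel() for p in net.parameters())
print(num_params)  # 669706
```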
Learn More About Building Neural Networks in PyTorch Here
Define the Loss Function and the Optimizer
We are using the cross-entropy loss function and the Stochastic Gradient Descent (SGD) optimizer for training this model.
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
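To get a feel for what these two objects compute, here is a small sketch with hand-picked numbers (all values are made up for illustration). It checks that nn.CrossEntropyLoss takes raw logits and integer class labels and matches the mean negative log-softmax of the correct class, and it performs one plain SGD update by hand.

```python
import torch
from torch import nn

# Raw model outputs (logits) for 2 samples and 3 classes, plus integer labels.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])

loss = nn.CrossEntropyLoss()(logits, targets)

# Manual version: mean of -log(softmax) taken at the correct class.
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), targets].mean()
print(torch.allclose(loss, manual))  # True

# One SGD step on a single parameter: w <- w - lr * grad.
w = torch.tensor([1.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=1e-3)
(w ** 2).sum().backward()  # gradient of w^2 is 2 * w = 2
opt.step()                 # w becomes 1 - 0.001 * 2
print(w.item())            # approximately 0.998
```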
Define the training loop
First we define a function that trains the model for a single epoch, i.e. one pass over the training dataloader:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device) # Move tensors to cuda if available
        # Compute prediction error
        pred = model(X) # Forward pass
        loss = loss_fn(pred, y) # Compute loss
        # Backpropagation
        loss.backward() # Compute gradients
        optimizer.step() # Update parameters
        optimizer.zero_grad() # Zero the gradients
        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
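The optimizer.zero_grad() call matters because PyTorch adds new gradients to whatever is already stored in .grad. The minimal sketch below, using a single made-up parameter w, shows gradients piling up across two backward passes and then being reset.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

(3 * w).backward()
first = w.grad.item()
print(first)   # 3.0

# Without zeroing, the second backward pass adds to the stored gradient.
(3 * w).backward()
second = w.grad.item()
print(second)  # 6.0

# Resetting by hand; optimizer.zero_grad() clears the gradient of every
# parameter it manages (recent PyTorch versions set .grad to None instead).
w.grad.zero_()
(3 * w).backward()
third = w.grad.item()
print(third)   # 3.0
```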
Next, we define a function to evaluate (or validate) the model. Its goal is to check the model’s performance against the test dataset to ensure it is learning, and to monitor whether it is overfitting.
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
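The accuracy line in the function above packs several steps into one expression, so here is a sketch that unpacks it on four made-up predictions. It also shows that outputs computed inside torch.no_grad() are not tracked for gradients, which is why evaluation is wrapped in it. The tiny nn.Linear layer is only a placeholder model for this illustration.

```python
import torch

# Made-up logits for 4 samples and 3 classes, with their true labels.
pred = torch.tensor([[0.1, 2.0, 0.3],
                     [1.5, 0.2, 0.1],
                     [0.2, 0.1, 0.9],
                     [0.3, 0.8, 0.1]])
y = torch.tensor([1, 0, 1, 1])

predicted_classes = pred.argmax(1)  # index of the largest score in each row
print(predicted_classes)            # tensor([1, 0, 2, 1])

# Compare with the labels, convert True/False to 1.0/0.0, and count the hits.
correct = (predicted_classes == y).type(torch.float).sum().item()
accuracy = correct / len(y)
print(accuracy)  # 0.75

# Inside torch.no_grad(), no computation graph is built for the outputs.
layer = torch.nn.Linear(3, 2)  # placeholder model
with torch.no_grad():
    out = layer(torch.ones(1, 3))
print(out.requires_grad)  # False
```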
The training process is conducted over several iterations (epochs) that go over the entire training dataset. During each epoch, the model learns parameters to make better predictions. We print the model’s accuracy and loss at each epoch; we’d like to see the accuracy increase and the loss decrease with every epoch.
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")
Read More About Model Training Here
Tasks
- Plot some graphs to show the training loss, validation loss, and validation accuracy over the training steps.
- Experiment with different learning rates, batch sizes, and model architectures. How do they affect the results?