Introduction to PyTorch

December 13, 2023 · 6 min read

Some Random Tree

This blog post is a basic introduction to PyTorch: what tensors are and how to train your own model.

Prerequisites

Basic Maths
Basic Python

Aims

Understand tensors
Understand the steps to build and train a model in PyTorch

Introduction

“Artificial Intelligence, deep learning, machine learning — whatever you're doing if you don't understand it — learn it. Because otherwise you're going to be a dinosaur within 3 years.” ~ Mark Cuban.

Tensors

https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html

Tensors are specialized arrays or matrices that can have as many dimensions as you want. PyTorch uses tensors to encode the data as well as the model parameters. When you perform operations (e.g. addition, multiplication) on a tensor that tracks gradients (requires_grad=True, which is the default for model parameters), PyTorch records them in its built-in dynamic computation graph, which is what makes backpropagation possible during training.

Here’s an example of some tensors:

import torch

# Create a tensor from a list/array:

data1 = [0, 1, 2, 3, 4, 5] # 1D array
data2 = [[0, 1], [2, 3], [4, 5]] # 2D array

tensor1 = torch.tensor(data1)
tensor2 = torch.tensor(data2)

print(tensor1.shape) # torch.Size([6])
print(tensor2.shape) # torch.Size([3, 2])

# Create a tensor filled with zeros or ones:

zeros_tensor = torch.zeros(3, 5, 4, 2, 1, 4) # A 6D tensor
ones_tensor = torch.ones(3, 5)

# Create a random tensor from a specified shape:

random_tensor1 = torch.randn(10, 20) # A 10 by 20 tensor sampled from a standard normal distribution
print(random_tensor1.shape) # torch.Size([10, 20])
print(random_tensor1[0, :10]) # You can use python array index slicing

random_tensor2 = torch.randint(0, 100, (5, 10)) # 5 by 10 tensor with integer values from 0 to 99 (the upper bound 100 is exclusive)
print(random_tensor2[0, 3:10])

# Attributes of a tensor:

print(tensor1.shape)
print(tensor1.dtype) # data type (int64 here); other examples: float32, float64, bfloat16
print(tensor1.device) # device the tensor is stored on, e.g. cpu, cuda
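
The computation graph mentioned earlier can be seen in action with a very small example. This is only a sketch: the tensor x and the function y are made up for illustration, and requires_grad=True is what asks PyTorch to record the operations.

```python
import torch

# requires_grad=True asks PyTorch to record operations on this tensor.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# y = sum(x^2 + 2x); each operation is added to the graph as it runs.
y = (x ** 2 + 2 * x).sum()

# Walk the graph backwards to compute dy/dx = 2x + 2 for every element.
y.backward()

print(y.item())  # 26.0
print(x.grad)    # tensor([4., 6., 8.])
```

Tensors created without requires_grad=True, such as the ones above, are not tracked, so operations on them are not recorded.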

Training a Simple Model in PyTorch

Training a model in PyTorch involves the following steps:

Find a dataset that suits your problem, download it, and create a dataloader.
Define your model. You can create your own model module by subclassing nn.Module.
Define your loss function (from torch.nn), optimizer (from torch.optim), etc.
Define your training loop. This can be a function that performs a single step and is called from a loop, or simply an inline training loop.
Start training!
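
Before moving to a real dataset, the five steps above can be sketched end to end on synthetic data. Everything in this sketch is an assumption made for illustration (the random features, the toy labelling rule, the names toy_model and run_epoch, the learning rate); it is not the model used in the rest of this post.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)  # make the random data reproducible

# 1. Dataset and dataloader: 256 random points, label 1 when the features sum to > 0.
features = torch.randn(256, 4)
labels = (features.sum(dim=1) > 0).long()
loader = DataLoader(TensorDataset(features, labels), batch_size=32, shuffle=True)

# 2. Model: a tiny fully-connected network with two output classes.
toy_model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))

# 3. Loss function and optimizer.
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.1)

# 4. Training loop: one pass over the dataloader, returning the mean batch loss.
def run_epoch():
    total = 0.0
    for X, y in loader:
        optimizer.zero_grad()            # clear gradients from the previous step
        loss = loss_fn(toy_model(X), y)  # forward pass and loss
        loss.backward()                  # compute gradients
        optimizer.step()                 # update parameters
        total += loss.item()
    return total / len(loader)

# 5. Start training.
epoch_losses = [run_epoch() for _ in range(20)]
print(f"first epoch: {epoch_losses[0]:.3f}, last epoch: {epoch_losses[-1]:.3f}")
```

The rest of this post follows the same outline, with a real image dataset and a larger model.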

For this example, we are going to use the FashionMNIST dataset, which ships with TorchVision, making it very easy to download and use.

First import the necessary libraries:

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

Finding and defining your dataset

PyTorch offers domain-specific libraries such as TorchText, TorchVision, and TorchAudio, all of which include datasets. For this tutorial, we will be using a TorchVision dataset.

The torchvision.datasets module contains Dataset objects for many real-world vision datasets such as CIFAR and COCO. In this tutorial, we use the FashionMNIST dataset. Every TorchVision Dataset accepts two arguments, transform and target_transform, to modify the samples and labels respectively.

# Download training data from the FashionMNIST dataset.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(), # Convert to PyTorch tensor.
)

# Download test data from the FashionMNIST dataset.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)
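
The code above only uses transform. As a rough illustration of target_transform, the sketch below defines a hypothetical helper, to_one_hot, that turns an integer label into a one-hot vector. The helper name and the choice of one-hot encoding are assumptions made for this example.

```python
import torch

NUM_CLASSES = 10  # FashionMNIST has 10 clothing categories

def to_one_hot(label: int) -> torch.Tensor:
    """Turn an integer class label into a one-hot float vector."""
    one_hot = torch.zeros(NUM_CLASSES, dtype=torch.float)
    one_hot[label] = 1.0
    return one_hot

print(to_one_hot(3))  # tensor([0., 0., 0., 1., 0., 0., 0., 0., 0., 0.])

# It would be passed to the dataset like this (not run here, to avoid another download):
# datasets.FashionMNIST(root="data", train=True, download=True,
#                       transform=ToTensor(), target_transform=to_one_hot)
```

This tutorial does not need it: nn.CrossEntropyLoss accepts integer class labels directly, so the datasets above leave target_transform unset.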

We pass the Dataset as an argument to DataLoader. This wraps an iterable over our dataset, and supports automatic batching, sampling, shuffling and multiprocess data loading. Here we define a batch size of 64, i.e. each element in the dataloader iterable will return a batch of 64 features and labels.

batch_size = 64

# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}") # N, C, H, W stand for batch size, number of channels, height, and width
    print(f"Shape of y: {y.shape} {y.dtype}")
    break
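
To see batching and shuffling without downloading anything, here is a small sketch using stand-in data. The tensors fake_images and fake_labels and the batch size of 4 are assumptions chosen so that the numbers are easy to check by hand.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 10 blank grayscale "images" of size 28x28 with integer labels.
fake_images = torch.zeros(10, 1, 28, 28)
fake_labels = torch.arange(10)
toy_dataset = TensorDataset(fake_images, fake_labels)

# shuffle=True draws the samples in a new random order every epoch.
toy_loader = DataLoader(toy_dataset, batch_size=4, shuffle=True)

batch_shapes = [tuple(X.shape) for X, y in toy_loader]
print(len(toy_loader))  # 3
print(batch_shapes)     # [(4, 1, 28, 28), (4, 1, 28, 28), (2, 1, 28, 28)]
```

The last batch is smaller because 10 is not divisible by 4. Passing drop_last=True would discard it instead. Shuffling is commonly enabled for the training dataloader, although the dataloaders above keep the default shuffle=False.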

Create the model

To define a neural network in PyTorch, we create a class that inherits from nn.Module. We define the layers of the network in the __init__ function and specify how data will pass through the network in the forward function. To accelerate operations in the neural network, we move it to the GPU (cuda) if one is available, and fall back to the CPU otherwise.

# Get cpu, gpu (cuda) device for training.
device = (
    "cuda" if torch.cuda.is_available() else "cpu"
)
print(f"Using {device} device")

# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten() # Flattens to (Batch Size, Channel Size * Height * Width), from a 4D tensor to a 2D tensor
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512), # Fully-connected hidden layer
            nn.ReLU(), # Activation Function
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        ) # When `linear_relu_stack` is called, it will run all the modules inside in order.

    def forward(self, x):
        x = self.flatten(x) # flatten to 2D, same as x.view(x.size(0), -1)
        logits = self.linear_relu_stack(x) # the nn.Sequential instance
        return logits

# Create an instance
model = NeuralNetwork().to(device)
print(model)
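
A quick sanity check is to push a dummy batch through the network and count its parameters. To keep the sketch self-contained it rebuilds the same layers as an nn.Sequential named sketch_model, which is a stand-in for illustration and not a replacement for the class above.

```python
import torch
from torch import nn

# Same layers as NeuralNetwork, written compactly for this check.
sketch_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)

# A random batch shaped like FashionMNIST input: [N, C, H, W].
dummy_batch = torch.rand(64, 1, 28, 28)
logits = sketch_model(dummy_batch)
print(logits.shape)  # torch.Size([64, 10])

# Weights plus biases of the three linear layers.
num_params = sum(p.numel() for p in sketch_model.parameters())
print(num_params)  # 669706
```

Each of the 64 rows holds 10 raw scores (logits), one per clothing class.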


Define the Loss Function and the Optimizer

We use the cross-entropy loss function and the stochastic gradient descent (SGD) optimizer to train this model.

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
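
It helps to know what nn.CrossEntropyLoss expects: raw logits (no softmax applied) and integer class labels. The sketch below uses made-up logits and targets and checks the result against a manual computation of the mean negative log-softmax of the correct class.

```python
import torch
from torch import nn
import torch.nn.functional as F

# Two samples, three classes. Values are arbitrary and chosen for illustration.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])  # correct class index for each sample

loss = nn.CrossEntropyLoss()(logits, targets)

# Manual version: log-softmax, pick the correct class, negate, average.
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), targets].mean()

print(torch.allclose(loss, manual))  # True
```

This is why the model above returns logits directly and has no softmax layer at the end.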

Define the training loop

First we define a function that trains the model for one epoch, i.e. one pass over the dataloader:

def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device) # Move tensors to cuda if available

        # Compute prediction error
        pred = model(X) # Forward pass
        loss = loss_fn(pred, y) # Compute loss

        # Backpropagation
        loss.backward() # Compute gradients
        optimizer.step() # Update parameters
        optimizer.zero_grad() # Zero the gradients

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
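
The call to optimizer.zero_grad() matters because PyTorch adds new gradients to the ones already stored instead of overwriting them. A minimal sketch with a single made-up parameter w shows the effect; the gradient is cleared by hand here, which is the job optimizer.zero_grad() does for every parameter it manages.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

# d(3w)/dw = 3, so the first backward pass stores 3.
(3 * w).backward()
first = w.grad.item()
print(first)  # 3.0

# Without clearing, the second backward pass adds another 3.
(3 * w).backward()
accumulated = w.grad.item()
print(accumulated)  # 6.0

# Clear the stored gradient, then compute it again.
w.grad.zero_()
(3 * w).backward()
after_reset = w.grad.item()
print(after_reset)  # 3.0
```

Skipping the reset in the training loop would therefore mix gradients from earlier batches into every update.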

Then we define a function to evaluate (or validate) the model. Its goal is to check the model’s performance on the test dataset to confirm that it is learning, and to monitor whether it is overfitting.

def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
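
The accuracy line in the function above is compact, so here it is broken down on a tiny hand-made batch. The scores in pred and the labels in y are invented for illustration.

```python
import torch

# Model outputs for 4 samples and 3 classes (arbitrary scores).
pred = torch.tensor([[0.1, 2.0, 0.3],
                     [1.5, 0.2, 0.1],
                     [0.0, 0.1, 0.9],
                     [0.8, 0.7, 0.1]])
y = torch.tensor([1, 0, 1, 0])  # true labels

# argmax(1) picks the highest-scoring class in each row.
predicted_classes = pred.argmax(1)
print(predicted_classes)  # tensor([1, 0, 2, 0])

# Compare with the labels, convert True/False to 1.0/0.0, and sum.
num_correct = (predicted_classes == y).type(torch.float).sum().item()
accuracy = num_correct / len(y)
print(num_correct, accuracy)  # 3.0 0.75
```

In the function, correct is summed over all batches and divided by the dataset size, which gives the same kind of ratio over the whole test set.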

The training process is conducted over several iterations (epochs) that go over the entire training dataset. During each epoch, the model learns parameters to make better predictions. We print the model’s accuracy and loss at each epoch; we’d like to see the accuracy increase and the loss decrease with every epoch.

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Tasks

Plot some graphs to show the training loss, validation loss, and validation accuracy over the training steps.
Experiment with different learning rates, batch sizes, and model architectures. How do they affect the results?
