Introduction

In the previous post, we updated the example code to be compatible with a general ESP32-S3 dev kit module.

One issue with the ESP-DL digit recognition example is that it does not include the original PyTorch model or the output files generated by the esp-ppq process. Only the final .espdl file is provided.

Therefore, we need to build our own model based on their dataset.

Fortunately, Espressif provides the code for training, testing, and quantizing the model.

In this post, I will walk through the entire process: building the .espdl model using PyTorch and the esp-ppq library, and finally deploying it to the ESP32-S3 to verify it works.

How This Blog May Help You

  • Learn how to build a neural network using PyTorch.
  • Understand the complete workflow from dataset preparation to deployment on the ESP32-S3.

Prerequisites

  • Install the ESP-IDF according to the official guide.
  • Google Colab environment or similar for model training and quantization.

Development Environment

  • MCU: General ESP32-S3 dev kit (Waveshare ESP32-S3-DEV-Kit-N8R8)
  • IDE: VS Code with ESP-IDF extension
  • IDF version: v5.4.2
  • ESP-DL version: v3.1.5
  • touchpad-digit-recognition example code: Commit id: 3e35842 date: Jun 9, 2025
  • Python Version: v3.10.12 (Must use this version as esp-ppq does not support Python >3.10)
  • PyTorch Version: v2.8 Stable
  • esp-ppq package version: v1.0.1

Building the Model Using PyTorch: Step by Step

To simplify the process, I recommend using Google Colab for model building.

Since the default Python version in Colab is v3.12, we will use konda to set up a virtual environment with Python 3.10 for esp-ppq compatibility.

Most of the code is adapted from the example, with added visualization functions to inspect the dataset.

1. Install konda for the Virtual Python Environment

konda is a package that allows you to use conda in Colab. See konda GitHub for details.

!pip install konda

import konda
konda.install()

# Check if conda is installed correctly
!conda --version

2. Create a Python 3.10 Environment Using konda

# Accept the channels
!conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
!conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r

# Create virtual environment
!konda create -n esp_dl python=3.10 -y

# Activate the environment
!konda activate esp_dl

# Check Python version
!konda run "python --version"

# Install dependencies
!konda run "pip3 install torch torchvision esp-ppq"

# Check torch and esp-ppq versions
!konda run "pip list"

Note: The package name for import is esp_ppq, not esp-ppq.

Note: Installing torch and esp-ppq may take some time.
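Before moving on, it can help to confirm that the interpreter inside the environment is the expected one and that the package imports under its underscore name. The following is a minimal sketch; the helper name check_environment and the file name check_env.py are illustrative choices, not part of the Espressif example.

```python
import importlib.util
import sys


def check_environment(max_python=(3, 10), module_name="esp_ppq"):
    """Return (python_ok, module_found) for the interpreter running this script.

    python_ok    -> True if the interpreter's major.minor version is <= max_python
    module_found -> True if module_name can be located for import
    """
    python_ok = sys.version_info[:2] <= max_python
    module_found = importlib.util.find_spec(module_name) is not None
    return python_ok, module_found


if __name__ == "__main__":
    python_ok, module_found = check_environment()
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} OK: {python_ok}")
    print(f"esp_ppq importable: {module_found}")
```

To run it inside the conda environment rather than the Colab kernel, save it with %%writefile check_env.py and call !konda run "python check_env.py", the same pattern used for the quantization script later in this post.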

3. Download the Dataset

import requests
import zipfile
from pathlib import Path

data_path = Path("/data/")
image_path = data_path / "dataset"

# Download and unpack the dataset only if it is not already present
if image_path.is_dir():
    print(f"{image_path} directory exists.")
else:
    print(f"Did not find {image_path} directory, creating one...")
    image_path.mkdir(parents=True, exist_ok=True)

    # Download data
    with open(data_path / "dataset.zip", "wb") as f:
        request = requests.get("https://dl.espressif.com/AE/esp-iot-solution/touch_dataset.zip")
        print("Downloading data...")
        f.write(request.content)

    # Unzip data
    with zipfile.ZipFile(data_path / "dataset.zip", "r") as zip_ref:
        print("Unzipping data...")
        zip_ref.extractall(image_path)
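After unzipping, a quick sanity check is to count the images in each class folder, since ImageFolder in the next step derives labels from the sub-directory names. This is a small sketch using only the standard library; the helper name count_images_per_class and the list of file suffixes are assumptions made for illustration.

```python
from pathlib import Path


def count_images_per_class(root, suffixes=(".png", ".jpg", ".jpeg")):
    """Count image files in each immediate sub-directory of root.

    Returns a dict mapping sub-directory name (the class label) to the
    number of files whose extension is in suffixes (case-insensitive).
    """
    root = Path(root)
    counts = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in suffixes
        )
    return counts


# Example call in the notebook (path taken from the next step):
# print(count_images_per_class("/data/dataset/touch_dataset"))
```

If one class has far fewer images than the others, or a folder is missing, it is easier to spot here than after a full training run.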

4. Prepare the Dataset

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.RandomAffine(degrees=10, translate=(0.1, 0.1)),
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])

dataset = datasets.ImageFolder(root='/data/dataset/touch_dataset', transform=transform)

train_size = int(0.8 * len(dataset))
test_size = len(dataset) - train_size
train_dataset, test_dataset = random_split(dataset, [train_size, test_size])

train_loader = DataLoader(dataset=train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=32, shuffle=False)
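To make the effect of ToTensor followed by Normalize((0.5,), (0.5,)) concrete, here is the per-pixel arithmetic written out in plain Python. It is only an illustration of the math, not how torchvision implements it; normalize_pixel is a made-up helper name.

```python
def normalize_pixel(value_8bit, mean=0.5, std=0.5):
    """Mimic ToTensor + Normalize for a single 8-bit grayscale pixel."""
    scaled = value_8bit / 255.0      # ToTensor: [0, 255] -> [0.0, 1.0]
    return (scaled - mean) / std     # Normalize: (x - mean) / std


# Black maps to -1.0 and white maps to 1.0
print(normalize_pixel(0), normalize_pixel(255))
```

With mean and std both set to 0.5, the model sees inputs in the range [-1, 1]. The same normalization has to be applied in the quantization script and on the device, otherwise the calibration data and the real inputs will not match.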

5. Visualize the Dataset

Function to check random images from the dataset:

import random
import matplotlib.pyplot as plt

def display_random_images(dataset: torch.utils.data.dataset.Dataset,
                          classes: list[str] = None,
                          n: int = 10,
                          display_shape: bool = True,
                          seed: int = None):

    # 1. Cap n at 10 so the plot stays readable
    if n > 10:
        n = 10
        display_shape = False
        print("For display purposes, n shouldn't be larger than 10, setting to 10 and removing shape display.")

    # 2. Set random seed (a seed of 0 is also valid)
    if seed is not None:
        random.seed(seed)

    # 3. Get random sample indexes
    random_samples_idx = random.sample(range(len(dataset)), k=n)

    # 4. Setup plot
    plt.figure(figsize=(16, 8))

    # 5. Loop through samples and display random samples
    for i, targ_sample in enumerate(random_samples_idx):
        targ_image, targ_label = dataset[targ_sample][0], dataset[targ_sample][1]

        # 6. Adjust image tensor shape for plotting: [color_channels, height, width] -> [height, width, color_channels]
        targ_image_adjust = targ_image.permute(1, 2, 0)

        # Plot adjusted samples
        plt.subplot(1, n, i+1)
        plt.imshow(targ_image_adjust)
        plt.axis("off")
        if classes:
            title = f"class: {classes[targ_label]}"
            if display_shape:
                title = title + f"\nshape: {targ_image_adjust.shape}"
            plt.title(title)

# Display random images from the training split.
# Class names come from the ImageFolder dataset (its sub-directory names).
display_random_images(train_dataset,
                      n=5,
                      classes=dataset.classes,
                      seed=42)

[Image: check-dataset]

6. Define Loss Function and Optimizer

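Use the standard loss function and optimizer for image classification. Note that Net is the model class listed in the quantization script in step 8; define it in a notebook cell before running this one: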

device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = Net().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

7. Training Process

Set num_epochs to 50 for faster training.

def train_epoch(model, train_loader, criterion, optimizer, device):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    epoch_loss = running_loss / len(train_loader)
    epoch_acc = 100 * correct / total
    return epoch_loss, epoch_acc

def test_epoch(model, test_loader, criterion, device):
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            running_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    epoch_loss = running_loss / len(test_loader)
    epoch_acc = 100 * correct / total
    return epoch_loss, epoch_acc

num_epochs = 50
train_acc_array = []
test_acc_array = []
for epoch in range(num_epochs):
    train_loss, train_acc = train_epoch(model, train_loader, criterion, optimizer, device)
    test_loss, test_acc = test_epoch(model, test_loader, criterion, device)

    print(f'Epoch [{epoch + 1}/{num_epochs}], '
          f'Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.2f}%, '
          f'Test Loss: {test_loss:.4f}, Test Acc: {test_acc:.2f}%')
    train_acc_array.append(train_acc)
    test_acc_array.append(test_acc)

torch.save(model.state_dict(), '/content/my_final_model.pth')

[Image: training]

Now you have a trained PyTorch model.
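The training loop collects train_acc_array and test_acc_array but does not use them afterwards. One simple way to put them to use is to look up which epoch reached the highest test accuracy. This is a small sketch; best_epoch is a hypothetical helper and the numbers in the example are made up, not results from this training run.

```python
def best_epoch(acc_history):
    """Return (epoch_number, accuracy) for the highest value in acc_history.

    Epoch numbers are 1-based to match the training log. If several epochs
    tie, the earliest one is returned.
    """
    if not acc_history:
        raise ValueError("acc_history is empty")
    best_index = max(range(len(acc_history)), key=acc_history.__getitem__)
    return best_index + 1, acc_history[best_index]


# Illustrative values only
epoch, acc = best_epoch([90.0, 95.5, 95.5, 93.0])
print(f"Best test accuracy {acc:.2f}% at epoch {epoch}")
```

In the notebook you would call best_epoch(test_acc_array). If the best epoch is far from the last one, it may be worth lowering num_epochs or saving the model at that epoch instead of at the end.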

8. Quantize the Model Using esp-ppq

Since we are using konda, the quantization script should be saved to a file. Use %%writefile in Colab.

%%writefile espdl_model_quant.py
import torch
from PIL import Image
from esp_ppq.api import espdl_quantize_torch
from torch.utils.data import Dataset
from torch.utils.data import random_split
from torchvision import transforms, datasets

BATCH_SIZE = 32
INPUT_SHAPE = [1, 25, 30]
TARGET = "esp32s3"
NUM_OF_BITS = 8
ESPDL_MODEL_PATH = "./my_final_model.espdl"
DEVICE = "cpu"

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.model = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2),

            torch.nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2),

            torch.nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),

            torch.nn.Flatten(),
            torch.nn.Linear(in_features=7 * 6 * 64, out_features=256),
            torch.nn.ReLU(),
            torch.nn.Dropout(p=0.5),
            torch.nn.Linear(in_features=256, out_features=10),
            torch.nn.Softmax(dim=1)
        )

    def forward(self, x):
        output = self.model(x)
        return output

class FeatureOnlyDataset(Dataset):
    def __init__(self, original_dataset):
        self.features = []
        for item in original_dataset:
            self.features.append(item[0])

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        return self.features[idx]


def collate_fn2(batch):
    features = torch.stack(batch)
    return features.to(DEVICE)

transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])

dataset = datasets.ImageFolder(root='/data/dataset/touch_dataset', transform=transform)
train_size = int(0.8 * len(dataset))
test_size = len(dataset) - train_size
train_dataset, test_dataset = random_split(dataset, [train_size, test_size])

image = Image.open("/data/dataset/touch_dataset/9/20250225_140331.png").convert('L')
input_tensor = transform(image).unsqueeze(0)

feature_only_test_data = FeatureOnlyDataset(test_dataset)

testDataLoader = torch.utils.data.DataLoader(dataset=feature_only_test_data, batch_size=32, shuffle=False,
                                             collate_fn=collate_fn2)

model = Net().to(DEVICE)
model.load_state_dict(torch.load("/content/my_final_model.pth", map_location=DEVICE))
model.eval()

quant_ppq_graph = espdl_quantize_torch(
    model=model,
    espdl_export_file=ESPDL_MODEL_PATH,
    calib_dataloader=testDataLoader,
    calib_steps=8,
    input_shape=[1] + INPUT_SHAPE,
    inputs=[input_tensor],
    target=TARGET,
    num_of_bits=NUM_OF_BITS,
    device=DEVICE,
    error_report=True,
    skip_export=False,
    export_test_values=True,
    verbose=1,
    dispatching_override=None
)
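The in_features=7 * 6 * 64 value in the first Linear layer follows from the input shape [1, 25, 30] and the layers before Flatten. The sketch below reproduces that arithmetic with the standard output-size formula for convolution and pooling layers. The helper names are illustrative, and it assumes INPUT_SHAPE is ordered as [channels, height, width].

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one dimension for a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(height=25, width=30, last_channels=64):
    """Trace the spatial size through Net's layers up to Flatten."""
    h, w = height, width
    # Two blocks of Conv2d(3x3, padding 1) + MaxPool2d(2, stride 2):
    # the conv keeps the size, the pool halves it (rounding down).
    for _ in range(2):
        h, w = conv_out(h, 3, 1, 1), conv_out(w, 3, 1, 1)
        h, w = conv_out(h, 2, 2), conv_out(w, 2, 2)
    # Third Conv2d(3x3, padding 1) has no pooling after it.
    h, w = conv_out(h, 3, 1, 1), conv_out(w, 3, 1, 1)
    return h, w, last_channels * h * w


print(flattened_features())  # (6, 7, 2688), and 7 * 6 * 64 == 2688
```

If you train on images of a different size, this value has to be recomputed, otherwise the first Linear layer will raise a shape mismatch error.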

9. Run the Quantization Script in the Python 3.10 Environment

!konda run "python espdl_model_quant.py"

[Image: quantization-finished]
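Once the script finishes, you can list the files written next to the export path to confirm that the export succeeded. The sketch below simply globs for files sharing the model's base name; list_export_artifacts is a hypothetical helper, and exactly which files appear depends on the esp-ppq version and the export options used.

```python
from pathlib import Path


def list_export_artifacts(espdl_path):
    """Return the sorted names of files that share espdl_path's base name."""
    espdl_path = Path(espdl_path)
    pattern = espdl_path.stem + ".*"
    return sorted(p.name for p in espdl_path.parent.glob(pattern))


# Example call in the notebook, using the path from the quantization script:
# print(list_export_artifacts("./my_final_model.espdl"))
```

The .espdl file is the one that gets copied into the ESP32-S3 project in the following steps.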

10. After Quantization

After quantization you will also have an .onnx version of the model, which you can inspect using Netron.

[Image: onnx-viewer]

11. Copy the .espdl File to the ESP32-S3 Working Directory

[Image: copy-espdl]

12. Update CMakeLists.txt to Use the New Model

[Image: update-cmake-list]

13. Build and Flash the Firmware Using Your Own Model

14. Success!

[Image: seems-works]

Issue Summary

  • As of this writing, the esp-ppq package requires Python <=3.10, so it does not work with Colab's default Python 3.12.
  • When importing the package, use esp_ppq (with an underscore), not esp-ppq.
  • Most of the time was spent figuring out how to change the Python version in Google Colab. Using konda to create a virtual environment solved this issue.

In the next post in this series, I will review the model and highlight all the important points about the neural network architecture used in digit recognition.