Digit Recognition Optimization Series 00: Data Collection & Testing

Introduction

In the previous series, I completed the digit recognition project using LVGL.
However, the initial accuracy of digit recognition was not satisfactory.

Therefore, in this new series, I will focus on optimizing the model based on the current hardware setup.

The first step is to generate a test dataset directly on the target hardware to accurately evaluate performance.

The original dataset provided by Espressif was generated using a touchpad as input.

In contrast, I now draw on a 128 x 128 LVGL canvas and downscale the result to 30 x 25 pixels (750 values) using linear interpolation.
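The downscaling step can be illustrated on the PC side. The sketch below is not the firmware code; it is a minimal pure-Python bilinear resize, assuming a row-major grayscale buffer, and the function name resize_bilinear is hypothetical.

```python
def resize_bilinear(src, src_w, src_h, dst_w, dst_h):
    """Resize a row-major grayscale image using bilinear interpolation."""
    x_scale = src_w / dst_w
    y_scale = src_h / dst_h
    dst = []
    for dy in range(dst_h):
        # Map the centre of the destination pixel back into source coordinates.
        sy = min(max((dy + 0.5) * y_scale - 0.5, 0.0), src_h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, src_h - 1)
        fy = sy - y0
        for dx in range(dst_w):
            sx = min(max((dx + 0.5) * x_scale - 0.5, 0.0), src_w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, src_w - 1)
            fx = sx - x0
            # Blend the four neighbouring source pixels.
            top = src[y0 * src_w + x0] * (1 - fx) + src[y0 * src_w + x1] * fx
            bottom = src[y1 * src_w + x0] * (1 - fx) + src[y1 * src_w + x1] * fx
            dst.append(int(round(top * (1 - fy) + bottom * fy)))
    return dst


canvas = [0] * (128 * 128)          # blank 128 x 128 canvas
small = resize_bilinear(canvas, 128, 128, 30, 25)
print(len(small))                   # 750 values, one per model input pixel
```

At a downscale factor above 4, plain bilinear sampling reads only four source pixels per output pixel, so thin strokes can be weakened. Whether that matters here depends on the stroke width used on the canvas.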

How to Efficiently Create the Dataset

There are two approaches I considered:

  1. Save the pixel data to an SD card.
  2. Send the data via UART to a PC.

After evaluating both options, I chose option 2 (UART to PC), as it is more flexible and requires less code on the ESP32 side.

A Python script can then be used to handle and process the dataset.

Implementation

ESP32 Side

  1. Add a function to send the data via UART using a simple text protocol:
    • Start with the string: START,
    • End with the string: ,END followed by the line ending \r\n
    • Each pixel value is sent as a decimal integer in ASCII text, and values are separated by commas.
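To make the framing concrete, here is a small Python sketch of the same protocol. It assumes 750 values per frame (30 x 25); the helper names encode_frame and parse_frame are hypothetical and do not come from the project scripts.

```python
def encode_frame(pixels):
    """Build one frame the way the protocol describes it: START,v0,v1,...,END\\r\\n."""
    return "START," + "".join(f"{p}," for p in pixels) + "END\r\n"


def parse_frame(line, expected=750):
    """Return the pixel values as integers, or None if the line is not a valid frame."""
    parts = line.strip().split(",")
    if len(parts) != expected + 2 or parts[0] != "START" or parts[-1] != "END":
        return None
    try:
        return [int(p) for p in parts[1:-1]]
    except ValueError:
        return None


frame = encode_frame([0] * 750)
print(frame[:12])                 # START,0,0,0,
print(len(parse_frame(frame)))    # 750
```

Stripping the line before comparing the last field means the parser accepts both "\n" and "\r\n" line endings.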

The API below writes the formatted string as-is, without the timestamp and tag prefix that the ESP_LOGx macros add to every message, so the raw data stays free of extra log text:

void esp_log_write(esp_log_level_t level, const char *tag, const char *format, ...)

The C code for testing model accuracy is straightforward. It relies on two arrays:

  • digit_test_data is a 2D array: [number of samples] x [750 pixels].
  • digit_test_label is a 1D array representing the digit value for each row in digit_test_data.

Code to Send Pixel Data to the PC

if (xQueueReceive(xImageQueue, &image_data, portMAX_DELAY) == pdTRUE)
{
    // g_image.print();

    // Send data via UART
    esp_log_write(ESP_LOG_INFO, TAG, "START,");
    for (int y = 0; y < 25; y++)
    {
        for (int x = 0; x < 30; x++)
        {
            esp_log_write(ESP_LOG_INFO, TAG, "%d,", g_image.data[y * 30 + x]);
        }
    }
    esp_log_write(ESP_LOG_INFO, TAG, "END\n"); // Note: '\n' will be auto-converted to '\r\n' after sending the log

    g_image.clear();
}

Note: ESP32 automatically converts "\n" to "\r\n" in log output.

Code to Test the Samples

int success = 0; // Number of correct predictions
int failure = 0; // Number of wrong predictions

for (int i = 0; i < DIGIT_TEST_NUM; i++)
{
    // Copy one test sample into the image buffer used by the model
    memcpy(g_image.data, (uint8_t *)digit_test_data[i], sizeof(mnistData));
    int result = touch_digit_recognition->predict(g_image.data);

    if (result == digit_test_label[i])
    {
        ESP_LOGI(TAG, "Test sample %d: Ground Truth: %d, Prediction: %d -- Correct", i, digit_test_label[i], result);
        success++;
    }
    else
    {
        ESP_LOGI(TAG, "Test sample %d: Ground Truth: %d, Prediction: %d -- Incorrect", i, digit_test_label[i], result);
        failure++;
    }
}

ESP_LOGI(TAG, "Test completed. Total: %d, Success: %d, Failure: %d", DIGIT_TEST_NUM, success, failure);
// Print out the percentage of success
ESP_LOGI(TAG, "Success rate: %.2f%%", (success * 100.0) / DIGIT_TEST_NUM);
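
Because the test loop logs one line per sample, the same log can be post-processed on the PC to see which digits fail most often. The following is a minimal Python sketch, assuming the serial log has been captured into a string; the function name per_digit_accuracy is hypothetical.

```python
import re
from collections import defaultdict

# Matches the "Ground Truth: X, Prediction: Y" part of each test log line.
RESULT_RE = re.compile(r"Ground Truth: (\d+), Prediction: (\d+)")


def per_digit_accuracy(log_text):
    """Return {digit: fraction of samples predicted correctly} from the test log."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for match in RESULT_RE.finditer(log_text):
        truth, prediction = int(match.group(1)), int(match.group(2))
        totals[truth] += 1
        if truth == prediction:
            correct[truth] += 1
    return {digit: correct[digit] / totals[digit] for digit in sorted(totals)}


sample_log = (
    "I (100) APP: Test sample 0: Ground Truth: 3, Prediction: 3 -- Correct\n"
    "I (101) APP: Test sample 1: Ground Truth: 3, Prediction: 8 -- Incorrect\n"
    "I (102) APP: Test sample 2: Ground Truth: 7, Prediction: 7 -- Correct\n"
)
print(per_digit_accuracy(sample_log))
```

A per-digit breakdown like this can show whether the errors are spread evenly or concentrated on a few digits, which is useful input for retraining.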

Python Side

Two scripts were developed:

  • data_collect_ui.py: A GUI for collecting pixel data.
  • c_code_generate.py: Converts the pixel data to a C array.

data_collect_ui.py

  1. Appends all received bytes to a buffer.
  2. When a \n is received, parses the buffered frame.
  3. Saves the pixel data to a file in PNG format.
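
The PNG step can be sketched with the Python standard library alone. This is only an illustration: the actual script may well use an imaging library instead, and the sketch assumes each value is an integer in the range 0 to 255 stored as 8-bit grayscale. The function names are hypothetical.

```python
import struct
import zlib


def _chunk(tag, payload):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(tag + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + tag + payload + struct.pack(">I", crc)


def gray_png_bytes(values, width=30, height=25):
    """Encode a flat list of pixel values (strings or ints) as an 8-bit grayscale PNG."""
    if len(values) != width * height:
        raise ValueError(f"expected {width * height} values, got {len(values)}")
    pixels = [int(v) for v in values]
    if any(p < 0 or p > 255 for p in pixels):
        raise ValueError("pixel values must be in 0..255")
    # Each scanline starts with filter type 0 (no filtering).
    raw = b"".join(
        b"\x00" + bytes(pixels[y * width:(y + 1) * width]) for y in range(height)
    )
    # IHDR: width, height, bit depth 8, color type 0 (grayscale), no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )


png = gray_png_bytes(["0"] * 750)
print(len(png) > 0)
```

The returned bytes can be written to disk with open(path, "wb"), using whatever file naming scheme the dataset needs.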

c_code_generate.py

  1. Converts the PNG files to C arrays.
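
The generation step can be sketched as follows. The sketch starts from pixel lists that are already in memory rather than from PNG files, and it assumes uint8_t elements; the array and macro names (digit_test_data, digit_test_label, DIGIT_TEST_NUM) follow the C test code, while the function name to_c_source is hypothetical.

```python
def to_c_source(samples, labels, width=30, height=25):
    """Render test samples and labels as C array definitions."""
    n = width * height
    if not samples or len(samples) != len(labels):
        raise ValueError("need the same non-zero number of samples and labels")
    lines = [
        "#include <stdint.h>",
        "",
        f"#define DIGIT_TEST_NUM {len(samples)}",
        "",
        f"const uint8_t digit_test_data[DIGIT_TEST_NUM][{n}] = {{",
    ]
    for sample in samples:
        if len(sample) != n:
            raise ValueError(f"expected {n} pixels, got {len(sample)}")
        lines.append("    {" + ", ".join(str(int(v)) for v in sample) + "},")
    lines.append("};")
    lines.append("")
    label_text = ", ".join(str(int(label)) for label in labels)
    lines.append(f"const uint8_t digit_test_label[DIGIT_TEST_NUM] = {{{label_text}}};")
    return "\n".join(lines) + "\n"


source = to_c_source([[0] * 750, [1] * 750], [3, 7])
print(source.splitlines()[2])   # #define DIGIT_TEST_NUM 2
```

Generating the sample count as a macro keeps the C test loop and the data file in sync when the dataset grows.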

Full code is available at:
https://github.com/tommokmok/esp32s3_lvgl_digit_recongnition/tree/test/datatset_create

Lessons Learned

  1. When using esp_log_write to send data, note that ESP-IDF automatically converts "\n" to "\r\n" on output. This may be configurable and is worth investigating further.
  2. Key takeaways from the Python code:
    • self.ser.read(1024) returns a bytes object, so indexing it yields integers. A trailing "\n" therefore shows up as 0x0A, which is why the code compares data[-1] with 0x0a.
    • ESP32 sends out "\r\n" instead of just "\n", even if only "\n" is specified in the code, so the final field is "END\r\n" rather than "END\n".
    • raw_data = parts[1:-1] excludes the last element, so "END\r\n" is not included in the data.
    • The self.buffer must be cleared before the next parse; otherwise, data will accumulate and cause errors.

def run(self):
    while self.running:
        try:
            data = self.ser.read(1024)
            if data:
                char = data.decode(errors='ignore')
                self.buffer += char

                # Parse only when the chunk ends with a newline (0x0A)
                if data[-1] == 0x0a:
                    # print(f"Full buffer before processing: {self.buffer}") # Debug print to trace full buffer
                    parts = self.buffer.split(',')

                    if parts[0] == "START" and parts[-1] == "END\r\n":
                        # print(f"Processing the data")
                        raw_data = parts[1:-1] # Note: Does not include the last element
                        self.save_as_png_if_valid(raw_data)
                        self.callback(raw_data)
                        # print(f"Received raw data: {','.join(raw_data)}")
                    # Clear the buffer so the next frame starts from an empty string
                    self.buffer = ""
        except Exception:
            pass

Note: There is a bug in the above code. Sometimes two data arrays are received in a single read, in which case the split yields more than 750 values. Before saving to a PNG file, always check the array size.

def save_as_png_if_valid(self, str_list):
    """
    Convert string list to integer array and save as PNG if size is 750 (30x25).
    """
    if len(str_list) != 750:
        print(f"Data size is not 750, got {len(str_list)}")
        return
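
As an alternative to discarding merged frames by size, the receiver can split the buffer on line endings and handle each line separately, which copes with two frames arriving in one read and with a frame split across reads. This is a hedged sketch and not the code from the repository; the class name FrameExtractor and its feed method are hypothetical.

```python
class FrameExtractor:
    """Accumulate serial chunks and return every complete START...END frame."""

    def __init__(self, expected=750):
        self.expected = expected
        self.buffer = ""

    def feed(self, chunk):
        """Add a bytes chunk; return a list of frames (each a list of value strings)."""
        self.buffer += chunk.decode(errors="ignore")
        frames = []
        # Consume one complete line at a time; keep any partial line for later.
        while "\n" in self.buffer:
            line, self.buffer = self.buffer.split("\n", 1)
            parts = line.strip().split(",")
            if (
                len(parts) == self.expected + 2
                and parts[0] == "START"
                and parts[-1] == "END"
            ):
                frames.append(parts[1:-1])
        return frames


frame = ("START," + "7," * 750 + "END\r\n").encode()
extractor = FrameExtractor()
print(len(extractor.feed(frame + frame)))   # 2
```

Lines that are not valid frames, such as ordinary log output, are dropped, so only well-formed 750-value frames reach the PNG step.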

Test Results

  • Accuracy: ~88%

Next Steps

  • Retrain the model using the new dataset collected from the current hardware.