Digit Recognition Optimization Series 00: Data Collection & Testing

Introduction

In the previous series, I completed the digit recognition project using LVGL.
However, the initial accuracy of digit recognition was not satisfactory.

Therefore, in this new series, I will focus on optimizing the model based on the current hardware setup.

The first step is to generate a test dataset directly on the target hardware to accurately evaluate performance.

The original dataset provided by Espressif was generated using a touchpad as input.

In contrast, I now use the LVGL canvas (128 x 128 pixels) as input and apply linear interpolation to downscale the drawing to 30 x 25 pixels, so the original dataset may not match what the model sees on this hardware.
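The downscaling step can be illustrated with a minimal Python sketch of bilinear interpolation. This is not the firmware routine: the function name resize_bilinear, the row-major grayscale layout, and the pixel-centre alignment are all assumptions made for the example.

```python
def resize_bilinear(src, src_w, src_h, dst_w, dst_h):
    """Resize a row-major grayscale image (list of ints) with bilinear interpolation."""
    dst = []
    for y in range(dst_h):
        # Map the centre of the destination pixel back into source coordinates.
        fy = (y + 0.5) * src_h / dst_h - 0.5
        fy = min(max(fy, 0.0), src_h - 1)
        y0 = int(fy)
        y1 = min(y0 + 1, src_h - 1)
        wy = fy - y0
        for x in range(dst_w):
            fx = (x + 0.5) * src_w / dst_w - 0.5
            fx = min(max(fx, 0.0), src_w - 1)
            x0 = int(fx)
            x1 = min(x0 + 1, src_w - 1)
            wx = fx - x0
            # Blend the four neighbouring source pixels.
            top = src[y0 * src_w + x0] * (1 - wx) + src[y0 * src_w + x1] * wx
            bottom = src[y1 * src_w + x0] * (1 - wx) + src[y1 * src_w + x1] * wx
            dst.append(round(top * (1 - wy) + bottom * wy))
    return dst


# Hypothetical input: a 128 x 128 canvas with a white vertical bar in the middle.
canvas = [255 if 56 <= (i % 128) < 72 else 0 for i in range(128 * 128)]
small = resize_bilinear(canvas, 128, 128, 30, 25)
print(len(small))  # 750 values, one per pixel of the 30 x 25 image
```

Because each output pixel only blends four source pixels, very thin strokes can fade when shrinking by this much, which is one reason to test on data captured from the real canvas.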

How to Efficiently Create the Dataset

There are two approaches I considered:

1. Save the pixel data to an SD card.
2. Send the data via UART to a PC.

After evaluating both options, I chose option 2 (UART to PC), as it is more flexible and requires less code on the ESP32 side.

A Python script can then be used to handle and process the dataset.

Implementation

ESP32 Side

Add a function to send the data via UART using a simple protocol:
- Start with the string: START,
- Send each pixel as an ASCII decimal integer followed by a comma.
- End with the string: END followed by \r\n (the code sends \n, which is converted to \r\n on output).
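The framing rules above can be sketched in Python as a pair of helper functions. The names encode_frame and decode_frame are hypothetical and are not taken from the project scripts.

```python
def encode_frame(pixels):
    """Build one frame in the same layout the firmware is described as sending."""
    return "START," + "".join(f"{p}," for p in pixels) + "END\r\n"


def decode_frame(line):
    """Return the pixel values of one frame, or None if the line is not a valid frame."""
    parts = line.strip().split(",")
    if len(parts) < 2 or parts[0] != "START" or parts[-1] != "END":
        return None
    try:
        return [int(p) for p in parts[1:-1]]
    except ValueError:
        return None


frame = encode_frame([0, 255, 17])
print(repr(frame))          # 'START,0,255,17,END\r\n'
print(decode_frame(frame))  # [0, 255, 17]
```

Because every value, including the last one, is followed by a comma, splitting on "," leaves START as the first element and END as the last.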

Calling the API below directly sends the raw text without the extra decoration that the ESP_LOGx macros add to each message:

void esp_log_write(esp_log_level_t level, const char *tag, const char *format, ...);

The core function in C for testing model accuracy is straightforward.

digit_test_data is a 2D array: [number of samples] x [750 pixels].
digit_test_label is a 1D array representing the digit value for each row in digit_test_data.

Code to Send Pixel Data to the PC

if (xQueueReceive(xImageQueue, &image_data, portMAX_DELAY) == pdTRUE)
{
    // g_image.print();

    // Send data via UART
    esp_log_write(ESP_LOG_INFO, TAG, "START,");
    for (int y = 0; y < 25; y++)
    {
        for (int x = 0; x < 30; x++)
        {
            esp_log_write(ESP_LOG_INFO, TAG, "%d,", g_image.data[y * 30 + x]);
        }
    }
    esp_log_write(ESP_LOG_INFO, TAG, "END\n"); // Note: '\n' will be auto-converted to '\r\n' after sending the log

    g_image.clear();
}

Note: ESP32 automatically converts "\n" to "\r\n" in log output.

Code to Test the Samples

int success = 0;
int failure = 0;

for (int i = 0; i < DIGIT_TEST_NUM; i++)
{
    memcpy(g_image.data, (uint8_t *)digit_test_data[i], sizeof(mnistData));
    int result = touch_digit_recognition->predict(g_image.data);

    if (result == digit_test_label[i])
    {
        ESP_LOGI(TAG, "Test sample %d: Ground Truth: %d, Prediction: %d -- Correct", i, digit_test_label[i], result);
        success++;
    }
    else
    {
        ESP_LOGI(TAG, "Test sample %d: Ground Truth: %d, Prediction: %d -- Incorrect", i, digit_test_label[i], result);
        failure++;
    }
}

ESP_LOGI(TAG, "Test completed. Total: %d, Success: %d, Failure: %d", DIGIT_TEST_NUM, success, failure);
// Print out the percentage of success
ESP_LOGI(TAG, "Success rate: %.2f%%", (success * 100.0) / DIGIT_TEST_NUM);
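The overall success rate hides which digits fail most often. As a possible follow-up on the PC side, the log lines printed by the test loop can be parsed to get accuracy per digit. This is a sketch: the function name per_digit_accuracy and the sample log lines (timestamps, the tag "app") are invented for illustration.

```python
import re
from collections import defaultdict

# Matches the message format used by the ESP_LOGI calls in the test loop.
LINE_RE = re.compile(r"Ground Truth: (\d+), Prediction: (\d+)")


def per_digit_accuracy(log_lines):
    """Return {digit: fraction of correct predictions} from captured log lines."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for line in log_lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # Skip lines that are not test results.
        truth, prediction = int(match.group(1)), int(match.group(2))
        totals[truth] += 1
        if truth == prediction:
            correct[truth] += 1
    return {digit: correct[digit] / totals[digit] for digit in sorted(totals)}


# Made-up log excerpt, only to show the expected shape of the input.
sample_log = [
    "I (1200) app: Test sample 0: Ground Truth: 3, Prediction: 3 -- Correct",
    "I (1210) app: Test sample 1: Ground Truth: 3, Prediction: 8 -- Incorrect",
    "I (1220) app: Test sample 2: Ground Truth: 7, Prediction: 7 -- Correct",
]
print(per_digit_accuracy(sample_log))  # {3: 0.5, 7: 1.0}
```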

Python Side

Two scripts were developed:

- data_collect_ui.py: A GUI for collecting pixel data.
- c_code_generate.py: Converts the pixel data to a C array.

`data_collect_ui.py`

- Appends all received bytes to a buffer.
- When a \n is received, it splits the buffered text and checks that it is a complete frame.
- Saves the data to a file in PNG format.
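The PNG step can be sketched with only the Python standard library. The real script may well use an imaging library instead; the helper names gray_png_bytes and _chunk are hypothetical, and the sketch assumes each pixel value is already in the range 0 to 255.

```python
import struct
import zlib


def _chunk(kind, payload):
    """Build one PNG chunk: length, type, payload, CRC over type + payload."""
    body = kind + payload
    return struct.pack(">I", len(payload)) + body + struct.pack(">I", zlib.crc32(body) & 0xFFFFFFFF)


def gray_png_bytes(pixels, width=30, height=25):
    """Encode a row-major list of 0..255 values as an 8-bit grayscale PNG."""
    if len(pixels) != width * height:
        raise ValueError(f"expected {width * height} pixels, got {len(pixels)}")
    # IHDR: width, height, bit depth 8, colour type 0 (grayscale), no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline starts with filter type 0 (no filtering).
    raw = b"".join(
        b"\x00" + bytes(pixels[y * width:(y + 1) * width]) for y in range(height)
    )
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )


# The frame arrives as a list of strings, so convert to integers first.
str_list = ["0"] * 375 + ["255"] * 375
png = gray_png_bytes([int(s) for s in str_list])
print(len(png) > 0)  # True
```

The returned bytes can be written to disk in binary mode, for example with open(path, "wb").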

`c_code_generate.py`

Converts the PNG files to C arrays.
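The generation step can be sketched as follows. The sketch starts from pixel lists that are already in memory and leaves out reading the PNG files. The array names digit_test_data, digit_test_label and DIGIT_TEST_NUM come from the C test code above, while the function name to_c_arrays and the uint8_t element type are assumptions.

```python
def to_c_arrays(samples, labels, per_row=30):
    """Render pixel lists and their labels as C source text."""
    if not samples or len(samples) != len(labels):
        raise ValueError("need at least one sample and exactly one label per sample")
    width = len(samples[0])
    lines = [
        "#include <stdint.h>",
        "",
        f"#define DIGIT_TEST_NUM {len(samples)}",
        "",
        f"const uint8_t digit_test_data[DIGIT_TEST_NUM][{width}] = {{",
    ]
    for pixels in samples:
        lines.append("    {")
        # Emit one image row per text line to keep the file readable.
        for start in range(0, len(pixels), per_row):
            row = ", ".join(str(p) for p in pixels[start:start + per_row])
            lines.append(f"        {row},")
        lines.append("    },")
    lines.append("};")
    lines.append("")
    label_text = ", ".join(str(label) for label in labels)
    lines.append(f"const uint8_t digit_test_label[DIGIT_TEST_NUM] = {{{label_text}}};")
    return "\n".join(lines) + "\n"


# Tiny made-up sample (4 pixels) just to show the output shape.
print(to_c_arrays([[0, 255, 1, 2]], [7], per_row=2))
```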

Full code is available at:
https://github.com/tommokmok/esp32s3_lvgl_digit_recongnition/tree/test/datatset_create

Lessons Learned

When using esp_log_write to send data, note that ESP-IDF automatically converts "\n" to "\r\n" on output. This is likely configurable and worth investigating further.

Key takeaways from the Python code:
- self.ser.read(1024) returns a bytes object, and indexing it yields an integer, so the newline check is data[-1] == 0x0a rather than a comparison with the string "\n".
- The ESP32 sends "\r\n" instead of just "\n", even if only "\n" is specified in the code, so the last element after splitting is "END\r\n".
- raw_data = parts[1:-1] excludes the first and last elements, so neither "START" nor "END\r\n" is included in the data.
- self.buffer must be cleared before the next parse; otherwise, data will accumulate and cause errors.

def run(self):
    while self.running:
        try:
            data = self.ser.read(1024)  # Returns a bytes object
            if data:
                self.buffer += data.decode(errors='ignore')

                # Indexing bytes yields an int, so compare against 0x0a ('\n').
                if data[-1] == 0x0a:
                    parts = self.buffer.split(',')

                    if parts[0] == "START" and parts[-1] == "END\r\n":
                        raw_data = parts[1:-1]  # Note: Does not include "START" or "END\r\n"
                        self.save_as_png_if_valid(raw_data)
                        self.callback(raw_data)

                    # Clear the buffer after every complete line, valid or not,
                    # so a stray log line cannot block later frames.
                    self.buffer = ""
        except Exception:
            # Errors are ignored so the read loop keeps running.
            pass

Note: The code above has a known bug. Sometimes two data arrays are received at once, so raw_data contains both of them. Always check the array size before saving to a PNG file.

def save_as_png_if_valid(self, str_list):
    """
    Convert string list to integer array and save as PNG if size is 750 (30x25).
    """
    if len(str_list) != 750:
        print(f"Data size is not 750, got {len(str_list)}")
        return
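One possible way to handle two arrays arriving at once is to split the buffer on every newline instead of only checking the last byte of each read. The sketch below is an alternative approach, not the code from the repository. The class name FrameParser and its feed method are hypothetical.

```python
class FrameParser:
    """Collect serial bytes and return every complete, valid frame."""

    def __init__(self, frame_size=750):
        self.frame_size = frame_size
        self.buffer = b""

    def feed(self, chunk):
        """Add raw bytes and return a list of decoded frames (lists of ints)."""
        self.buffer += chunk
        frames = []
        # Process each complete line; any partial line stays in the buffer.
        while b"\n" in self.buffer:
            line, self.buffer = self.buffer.split(b"\n", 1)
            parts = line.decode(errors="ignore").strip().split(",")
            if len(parts) != self.frame_size + 2:
                continue  # Wrong size: a log line or a damaged frame.
            if parts[0] != "START" or parts[-1] != "END":
                continue
            try:
                frames.append([int(p) for p in parts[1:-1]])
            except ValueError:
                continue  # A value was not a valid integer.
        return frames


# Small frame size to keep the demonstration short.
parser = FrameParser(frame_size=3)
frame = b"START,1,2,3,END\r\n"
print(parser.feed(frame + frame))  # [[1, 2, 3], [1, 2, 3]]
print(parser.feed(frame[:5]))      # []
print(parser.feed(frame[5:]))      # [[1, 2, 3]]
```

Keeping the buffer as bytes until a full line is available also means a frame split across two reads is handled the same way as a frame that arrives in one piece.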