
Learning ESP-DL Series 01: Digit Recognition

Introduction

In the previous blog, we verified the hardware and development environment.

This time, we will go through another interesting example: touchpad-digit-recognition.

This is an ESP AI example that demonstrates the complete deep learning development workflow, from data collection and training to device deployment.

In this blog, I will clone this example and run the code on my own hardware, which does not have a touch sensor.

How This Blog May Help You

  • Learn ESP-DL in detail.
  • If you want to run the example code on your own hardware, this blog will illustrate some pitfalls encountered during code refactoring.
  • You should comment out the touch sensor driver and button driver code if you are not using the same hardware as the example.
  • Make sure you clearly understand the col and row definitions in the example code.

Prerequisites

Development Environment

  • MCU: General ESP32-S3 dev kit (Waveshare ESP32-S3-DEV-Kit-N8R8)
  • IDE: VS Code with ESP-IDF extension
  • IDF version: v5.4.2
  • ESP-DL version: v3.1.5
  • touchpad-digit-recognition example code: commit 3e35842 (Jun 9, 2025)

Code Analysis

  • The data flow is straightforward: it gets data from the touch sensor (7x6), expands it to 30x25 pixel data, then feeds it to the neural network and calls the predict function of the ESP-DL API.
  • The important functions are touch_digit_task, which handles data input to the network, and touch_digit_recognition_task, which calls the ESP-DL API.
  • The input data is a 1D array of uint8 values (0 or 1), representing the digit.
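The expansion step can be pictured with a small sketch. How the example actually expands the touch data is not covered here, so the code below is only one simple possibility (nearest-neighbor sampling), and the function name `upscale_nearest` and its parameters are hypothetical. It assumes the 7x6 grid means 7 rows by 6 columns and the 30x25 image means 30 rows by 25 columns, stored row by row in a flat array of 0/1 values.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the example code): upscale a small binary
// grid to a larger one with nearest-neighbor sampling and return the result
// as a flat, row-major array of 0/1 values.
std::vector<uint8_t> upscale_nearest(const std::vector<uint8_t> &src,
                                     std::size_t src_w, std::size_t src_h,
                                     std::size_t dst_w, std::size_t dst_h)
{
    std::vector<uint8_t> dst(dst_w * dst_h, 0);
    for (std::size_t y = 0; y < dst_h; ++y) {
        // Map the destination row back to its source row
        std::size_t sy = y * src_h / dst_h;
        for (std::size_t x = 0; x < dst_w; ++x) {
            // Map the destination column back to its source column
            std::size_t sx = x * src_w / dst_w;
            dst[y * dst_w + x] = src[sy * src_w + sx] ? 1 : 0;
        }
    }
    return dst;
}
```

Calling `upscale_nearest(src, 6, 7, 25, 30)` on a 42-element source yields a 750-element array, which matches the size of the 1D input described above under these assumptions.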

Running the Example Code: Step by Step

  1. I couldn’t find the example in the IDF extension, so I directly copied the code from GitHub. Copy the entire esp-iot-solution repository.

    source-code-location

  2. Open the folder touchpad_digit_recognition located at esp-iot-solution-master\examples\ai\esp_dl\touchpad_digit_recognition\.

  3. Set the device type to esp32s3, then build and flash the firmware.

    config-sdkconfig

  4. Unfortunately, after flashing the firmware, I encountered a PSRAM error.

    run-error

  5. This error did not occur in the previous example, so it is likely due to a configuration issue in sdkconfig.

  6. Update the sdkconfig: Make sure to use “Octal Mode PSRAM”.

    update-spiram-mode

  7. The PSRAM error was fixed, but another error occurred: normalization save.

    erreo2

  8. After reviewing the code in normalization_save.cpp, I found a bug: it opens the NVS but does not handle the case where the KEY data does not exist yet. As a result, the error always appears unless set_normalization_data has been called at least once. My fix was to call set_normalization_data when nvs_get_blob fails, and then read the blob again.

    normalization-error

    // Try to read the stored normalization data
    ret = nvs_get_blob(my_handle, KEY, data, &len);
    if (ret != ESP_OK)
    {
        // Read failed (for example, the key has not been written yet):
        // store default data, then retry the read
        ESP_LOGW(TAG, "Warn (%s) nvs_get_blob data fail. Set a new data", esp_err_to_name(ret));
        touch_digit_data_t _data = {};
        set_normalization_data(&_data);

        ret = nvs_get_blob(my_handle, KEY, data, &len);
        if (ret != ESP_OK)
        {
            ESP_LOGE(TAG, "Error (%s) reading normalization data!", esp_err_to_name(ret));
            nvs_close(my_handle);
            return ret;
        }
    }

  9. In the code, ROW and COL are not defined the way I expected. The row length is the number of vertical series, and the col length is the number of horizontal series. This is unclear and could be considered a bug in the example code.

    config-sdkconfig

Normally, ROW would refer to the number of horizontal series, and COL to the number of vertical series.
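A quick way to check which convention a buffer follows is to render it as text with an explicit width and height and see whether the digit looks right. The sketch below is a host-side helper written for illustration only; `render_ascii` is a hypothetical name, and it assumes the buffer is a flat, row-major array of 0/1 values.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of the example code): render a flat,
// row-major 0/1 buffer as text, one output line per image row.
std::string render_ascii(const std::vector<uint8_t> &pixels,
                         std::size_t width, std::size_t height)
{
    std::string out;
    if (pixels.size() != width * height) {
        return out; // dimensions do not match the buffer, nothing to render
    }
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            out += pixels[y * width + x] ? '#' : '.';
        }
        out += '\n';
    }
    return out;
}
```

If the width and height are swapped, the same buffer renders as a sheared or scrambled shape, which makes a row/col mix-up visible before the data ever reaches the network.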

  1. tool.html can generate a 1D array representing the 30x25 pixels of a digit, so you can first test the code with fixed data.

    The tool uses pure JS, HTML, and CSS. Reference: https://gitee.com/Shine_Zhang/esp32s3_dl_helloworld/tree/main/04_tool

    digi-tool

  2. Refactor the touch_digit_task. The entry point for data input to the neural network is xQueueSend(xImageQueue, &image_data, portMAX_DELAY).

    static void touch_digit_task(void *arg)
    {
        // Copy the fixed test data into the TouchImage buffer
        memcpy(g_image.data, mnistData, sizeof(mnistData));
        // Print the TouchImage row length and column length
        ESP_LOGI(TAG, "TouchImage row length: %d, column length: %d", (int)g_image.row_length, (int)g_image.col_length);
        while (1)
        {
            // Send data to DL inference
            image_data_t image_data;
            image_data.size = g_image.col_length * g_image.row_length;
            // Plain new never returns NULL; std::nothrow (from <new>) makes the check below meaningful
            image_data.data = new (std::nothrow) uint8_t[image_data.size];
            if (image_data.data != NULL)
            {
                memcpy(image_data.data, g_image.data, image_data.size);
                xQueueSend(xImageQueue, &image_data, portMAX_DELAY);
            }
            // vTaskDelay takes ticks, not milliseconds, so convert the 3-second delay
            vTaskDelay(pdMS_TO_TICKS(3000));
        }
    }

  3. Build and run—success!

    run-ok

Issue Summary

  • Pay attention to sdkconfig. Incorrect PSRAM settings will cause PSRAM issues.
  • Ensure the input data format is correct. Incorrect row and col values will lead to wrong predictions.
  • Comment out all hardware-specific code that is not relevant to your device. Otherwise, it may cause unexpected behavior.
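For the PSRAM setting, the relevant sdkconfig lines should look roughly like the fragment below once Octal mode is selected in menuconfig. This is a sketch that assumes the ESP-IDF v5.x option names for the ESP32-S3; check the result against your own generated sdkconfig, since other PSRAM options (speed, size detection) are left at whatever your project already uses.

```
# Enable external PSRAM and select Octal mode (assumed option names, ESP32-S3)
CONFIG_SPIRAM=y
CONFIG_SPIRAM_MODE_OCT=y
```

If your module uses Quad PSRAM instead, Octal mode is the wrong choice, so confirm the PSRAM type in the module datasheet first.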

*In the next post in this series, we will replicate this example code to become more familiar with the entire development