Offline Training YOLO Models for MaixCAM MaixPy, Custom Object Detection and Keypoint Detection

Update history
Date Version Author Update content
2026-02-04 v4.0 Tao Added YOLO26 support
2025-07-01 v3.0 neucrack Added MaixCAM2 support
2024-10-10 v2.0 neucrack Added YOLO11 support
2024-06-21 v1.0 neucrack Initial documentation

Introduction

By default, the official firmware provides detection for 80 object classes. If these do not meet your needs, you can train your own object detection model by setting up a training environment on your own computer or server.

YOLOv8 / YOLO11 not only support object detection, but also yolov8-pose / YOLO11-pose for keypoint detection. In addition to the official human pose keypoints, you can also create your own keypoint dataset to train detection for specific objects and their keypoints.

Since YOLOv8 and YOLO11 mainly differ in their internal network structure, while preprocessing and postprocessing remain the same, the training and conversion steps for YOLOv8 and YOLO11 are identical. The only difference is the output node names.

Prerequisites

This article explains how to perform custom training, but assumes you already have some basic knowledge. If not, please learn the following first:

  • This article does not explain how to install the training environment. Please search for and install a PyTorch training environment yourself.
  • This article does not explain basic machine learning concepts or Linux fundamentals.
  • The environment described in this article is based on a Linux desktop system. For Windows, please use a WSL2 environment, which supports X11. Please search for related information yourself.

Workflow and Goal of This Article

This is a workflow that requires zero algorithm background. As long as you know how to use Docker and how to annotate data, you can get your custom YOLO model running perfectly on MaixCAM/MaixCAM2 within 12 hours after receiving the device.

To use our model in MaixPy (MaixCAM), the following process is required:

  • Set up the training environment. This article skips that part; please search for how to set up a PyTorch training environment.
  • Clone the YOLO11/YOLOv8/YOLO26 source code locally.
  • Prepare the dataset and convert it into the format required by the YOLO11 / YOLOv8 / YOLO26 project.
  • Train the model and obtain an onnx model file, which is also the final output of this article.
  • Convert the onnx model into a MUD file supported by MaixPy. This process is described in detail in the model conversion articles:
  • Use MaixPy to load and run the model.

Install Docker

Refer to the official Docker installation documentation for installation.

For example:

# Install basic software dependencies required by docker
sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates curl gnupg-agent software-properties-common
# Add official repository
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
# Install docker
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io

Next, install the environment for exporting ONNX nodes.

Where to Find Datasets for Training

See Where to Find Datasets

Prepare the Model Conversion Environment

Please reserve 100 GB of disk space for Docker, otherwise the conversion process may fail.
Note: Different versions of ultralytics export different nodes from the pt model file. Changing the version arbitrarily may cause the postprocessing code to mismatch the model outputs, making the model unusable. Therefore, please strictly use the version specified in the Dockerfile for exporting.

Create the ONNX Export Environment

# Build based on Python 3.10 slim image
FROM python:3.10-slim

# Set system and Python environment variables
ENV DEBIAN_FRONTEND=noninteractive \
    PIP_NO_CACHE_DIR=1 \
    PYTHONUNBUFFERED=1 \
    MPLCONFIGDIR=/tmp/matplotlib

# Set container working directory to /app
WORKDIR /app

# Update apt source, install system dependencies and clean cache to reduce image size
RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    wget \
    curl \
    libglib2.0-0 \
    libsm6 \
    libxrender1 \
    libxext6 \
    libgl1 \
    && rm -rf /var/lib/apt/lists/*

# Upgrade pip and install CPU version of PyTorch related libraries
RUN pip install --upgrade pip && \
    pip install \
    torch torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/cpu

# Install specified version of ultralytics framework and dependencies
RUN pip install \
    ultralytics==8.3.240 \
    ultralytics-thop==2.0.18

# Clone YOLOv5 source code and install project requirements
RUN git clone https://github.com/ultralytics/yolov5.git &&  \
    pip install -r /app/yolov5/requirements.txt 

# Run environment inspection check
RUN yolo checks

# Switch working directory to /workspace
WORKDIR /workspace

# Default startup command for container is bash shell
CMD ["/bin/bash"]

Create a new Dockerfile in an empty folder, then run docker build -t onnx-export . to build the image.
Note: The Dockerfile file has no extension.

Reference Articles

Since this is a fairly general workflow, this article only provides an overview. For details, please refer to the official YOLO26 / YOLO11 / YOLOv8 code and documentation (recommended), and search for related training tutorials. The final goal is to export an ONNX file.

If you find any good articles worth recommending, feel free to update this document and submit a PR.

Export the ONNX Model

Run the Docker environment:

docker run -it --rm  -v ./:/workspace   -w /workspace   --network=host   onnx-export:latest

For yolov8n / yolo11n / yolo26, use the yolo command line to export:
yolo export model=./yolo11n.pt format=onnx
Please replace ./yolo11n.pt with your own trained pt file.

For yolov5s (the model used by MaixHub online training), use:
python /app/yolov5/export.py --weights ./yolov5s.pt --include onnx --img 640 480
Please replace ./yolov5s.pt with your own trained pt file.

Here, the input resolution is specified again. The model was trained with 640x640, but we redefine the resolution to improve runtime speed. We use 640x480 here because its aspect ratio is closer to the MaixCAM2 screen, making display more convenient. For MaixCAM, you can use 320x224. You can choose the resolution according to your needs.

Convert to a Model and MUD File Supported by MaixCAM

MaixPy/MaixCDK currently support YOLO26 / YOLOv8 / YOLO11 detection, as well as YOLOv8-pose / YOLO11-pose keypoint detection, YOLOv8-seg / YOLO11-seg segmentation, and YOLOv8-obb / YOLO11-obb rotated bounding box detection (2026.2.4).

Output Node Selection

Pay attention to selecting the correct output nodes (note that your model values may not be exactly the same; refer to the figures below and find the corresponding nodes):

For YOLO11 / YOLOv8, MaixPy supports two output node selection schemes, and you can choose according to the hardware platform.

After determining the output nodes, you need to trim the ONNX model. For details, see [ONNX Model Node Trimming Tutorial].

Model and Features Option 1 Option 2
Applicable Devices MaixCAM2 (recommended)
MaixCAM (slightly slower than Option 2)
MaixCAM (recommended)
Features More computation is handled by CPU postprocessing, making quantization less likely to fail; speed is slightly slower than Option 2 More computation is handled by the NPU and participates in quantization
Notes None Quantization fails on MaixCAM2 in actual tests
Detection YOLOv5s /model.24/m.0/Conv_output_0
/model.24/m.1/Conv_output_0
/model.24/m.2/Conv_output_0
Same as MaixCAM2
Detection YOLOv8 /model.22/Concat_1_output_0
/model.22/Concat_2_output_0
/model.22/Concat_3_output_0
/model.22/dfl/conv/Conv_output_0
/model.22/Sigmoid_output_0
Detection YOLO11 /model.23/Concat_output_0
/model.23/Concat_1_output_0
/model.23/Concat_2_output_0
/model.23/dfl/conv/Conv_output_0
/model.23/Sigmoid_output_0
Keypoints YOLOv8-pose /model.22/Concat_1_output_0
/model.22/Concat_2_output_0
/model.22/Concat_3_output_0
/model.22/Concat_output_0
/model.22/dfl/conv/Conv_output_0
/model.22/Sigmoid_output_0
/model.22/Concat_output_0
Keypoints YOLO11-pose /model.23/Concat_1_output_0
/model.23/Concat_2_output_0
/model.23/Concat_3_output_0
/model.23/Concat_output_0
/model.23/dfl/conv/Conv_output_0
/model.23/Sigmoid_output_0
/model.23/Concat_output_0
Segmentation YOLOv8-seg /model.22/Concat_1_output_0
/model.22/Concat_2_output_0
/model.22/Concat_3_output_0
/model.22/Concat_output_0
output1
/model.22/dfl/conv/Conv_output_0
/model.22/Sigmoid_output_0
/model.22/Concat_output_0
output1
Segmentation YOLO11-seg /model.23/Concat_1_output_0
/model.23/Concat_2_output_0
/model.23/Concat_3_output_0
/model.23/Concat_output_0
output1
/model.23/dfl/conv/Conv_output_0
/model.23/Sigmoid_output_0
/model.23/Concat_output_0
output1
Rotated Box YOLOv8-obb /model.22/Concat_1_output_0
/model.22/Concat_2_output_0
/model.22/Concat_3_output_0
/model.22/Concat_output_0
/model.22/dfl/conv/Conv_output_0
/model.22/Sigmoid_1_output_0
/model.22/Sigmoid_output_0
Rotated Box YOLO11-obb /model.23/Concat_1_output_0
/model.23/Concat_2_output_0
/model.23/Concat_3_output_0
/model.23/Concat_output_0
/model.23/dfl/conv/Conv_output_0
/model.23/Sigmoid_1_output_0
/model.23/Sigmoid_output_0
Detection YOLO26 /model.23/one2one_cv2.0/one2one_cv2.0.2/Conv_output_0 /model.23/one2one_cv2.1/one2one_cv2.1.2/Conv_output_0 /model.23/one2one_cv2.2/one2one_cv2.2.2/Conv_output_0 /model.23/one2one_cv3.0/one2one_cv3.0.2/Conv_output_0 /model.23/one2one_cv3.1/one2one_cv3.1.2/Conv_output_0 /model.23/one2one_cv3.2/one2one_cv3.2.2/Conv_output_0 Same as MaixCAM2
YOLOv8/YOLO11 detection output node diagram
Extra output nodes for YOLOv8/YOLO11 pose See the pose branch in the figure above
Extra output nodes for YOLOv8/YOLO11 seg
Extra output nodes for YOLOv8/YOLO11 OBB
Detection YOLO26 output node diagram Same as MaixCAM2

Tutorial for Converting to NPU-Specific Model Files

Follow MaixCAM Model Conversion and MaixCAM2 Model Conversion to convert the model.

Modify the MUD File

For object detection, the MUD file is as follows (model_type should be changed to yolo11 for YOLO11 and to yolo26 for YOLO26):

MaixCAM/MaixCAM-Pro:

[basic]
type = cvimodel
model = yolov8n.cvimodel

[extra]
model_type = yolov8
input_type = rgb
mean = 0, 0, 0
scale = 0.00392156862745098, 0.00392156862745098, 0.00392156862745098
labels = person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light, fire hydrant, stop sign, parking meter, bench, bird, cat, dog, horse, sheep, cow, elephant, bear, zebra, giraffe, backpack, umbrella, handbag, tie, suitcase, frisbee, skis, snowboard, sports ball, kite, baseball bat, baseball glove, skateboard, surfboard, tennis racket, bottle, wine glass, cup, fork, knife, spoon, bowl, banana, apple, sandwich, orange, broccoli, carrot, hot dog, pizza, donut, cake, chair, couch, potted plant, bed, dining table, toilet, tv, laptop, mouse, remote, keyboard, cell phone, microwave, oven, toaster, sink, refrigerator, book, clock, vase, scissors, teddy bear, hair drier, toothbrush

MaixCAM2:

[basic]
type = axmodel
model_npu = yolo11n_640x480_npu.axmodel
model_vnpu = yolo11n_640x480_vnpu.axmodel

[extra]
model_type = yolo11
type=detector
input_type = rgb
labels = person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light, fire hydrant, stop sign, parking meter, bench, bird, cat, dog, horse, sheep, cow, elephant, bear, zebra, giraffe, backpack, umbrella, handbag, tie, suitcase, frisbee, skis, snowboard, sports ball, kite, baseball bat, baseball glove, skateboard, surfboard, tennis racket, bottle, wine glass, cup, fork, knife, spoon, bowl, banana, apple, sandwich, orange, broccoli, carrot, hot dog, pizza, donut, cake, chair, couch, potted plant, bed, dining table, toilet, tv, laptop, mouse, remote, keyboard, cell phone, microwave, oven, toaster, sink, refrigerator, book, clock, vase, scissors, teddy bear, hair drier, toothbrush

input_cache = true
output_cache = true
input_cache_flush = false
output_cache_inval = true

mean = 0,0,0
scale = 0.00392156862745098, 0.00392156862745098, 0.00392156862745098

Replace labels with the classes you trained on.

For keypoint detection (yolov8-pose), change type=pose.
For segmentation (yolov8-seg), change type=seg.
For oriented bounding boxes (yolov8-obb), change type=obb.

Upload and Share on MaixHub

Go to the MaixHub Model Zoo to upload and share your model. It is recommended to provide several resolutions for others to choose from.