MaixCAM MaixPy: Deploying Online Speech Recognition
Update history
Date | Version | Author | Update content |
---|---|---|---|
2024-12-23 | 1.0.0 | lxowalle | Initial document |
1. Introduction
Deploying online speech recognition locally lets you process speech input in real time. A speech recognition model runs on a local server, and MaixCAM sends audio to it and receives the recognition results, so audio is processed and results are returned immediately without relying on external cloud services. This approach improves response speed and also protects user privacy, making it well suited to applications that require high data security and real-time performance, such as smart hardware, industrial control, and real-time subtitle generation.
This document uses the open-source framework sherpa-onnx for deployment. sherpa-onnx is a subproject of sherpa and supports tasks such as streaming and non-streaming speech recognition, text-to-speech, speaker classification, speaker recognition, speaker verification, and spoken language recognition. The rest of this document focuses on implementing streaming speech recognition with MaixCAM and sherpa-onnx.
Note: Streaming speech recognition has low latency because it recognizes speech while the user is still speaking; it is commonly used for real-time translation and voice assistants. Non-streaming recognition processes one complete sentence at a time and generally achieves higher accuracy.
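To make the difference concrete, here is a small sketch of how audio might be handed to a recognizer in each mode. It assumes 16 kHz mono audio and a 100 ms streaming chunk, both illustrative values; `streaming_chunks` and `non_streaming_input` are hypothetical helpers, not sherpa-onnx APIs.

```python
# Illustrative only: these helpers are hypothetical, not part of sherpa-onnx.
# Assumes 16 kHz mono audio and a 100 ms streaming chunk.
from typing import Iterator, List, Sequence

SAMPLE_RATE = 16000   # samples per second
CHUNK_SECONDS = 0.1   # hypothetical streaming chunk length


def streaming_chunks(samples: Sequence[float],
                     sample_rate: int = SAMPLE_RATE,
                     chunk_seconds: float = CHUNK_SECONDS) -> Iterator[Sequence[float]]:
    """Yield short, fixed-size chunks; a streaming recognizer decodes each one
    as it arrives, so partial results are available while the user speaks."""
    chunk_size = max(1, round(sample_rate * chunk_seconds))
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


def non_streaming_input(samples: Sequence[float]) -> List[Sequence[float]]:
    """A non-streaming recognizer receives the complete sentence in one piece,
    so it only produces a result after the speaker has finished."""
    return [samples]


one_second = [0.0] * SAMPLE_RATE
print(len(list(streaming_chunks(one_second))))  # 10 chunks of 1600 samples
print(len(non_streaming_input(one_second)))     # 1 piece
```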
2. Deploying the Speech Recognition Server
sherpa-onnx can be deployed from multiple programming languages, including C/C++, Python, and Java. For simplicity, we use Python here. If you run into any issues along the way, refer to the sherpa documentation. Let's get started!
2.0.1. Download the sherpa-onnx Repository
2.0.2. Install Dependencies
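After installing the dependencies, a quick check like the following reports any Python packages that still cannot be found. The names in `ASSUMED_DEPENDENCIES` (`numpy`, `websockets`) are an assumption about what the sherpa-onnx server example needs; edit the list to match the packages you actually installed. `missing_modules` is a hypothetical helper.

```python
# Post-install check. ASSUMED_DEPENDENCIES is a guess at what the sherpa-onnx
# server example needs; edit it to match the packages you installed.
import importlib.util
from typing import Iterable, List

ASSUMED_DEPENDENCIES = ["numpy", "websockets"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(ASSUMED_DEPENDENCIES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All assumed dependencies are installed.")
```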
2.0.3. Install the sherpa-onnx Package
If GPU support is required, install the CUDA-enabled package:
If the package is unavailable or fails to install, build and install it from source:
If a GPU is available but CUDA is not installed, refer to the CUDA installation guide.
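To help choose between the CPU package and the CUDA-enabled package, here is a rough heuristic sketch. It assumes that `nvidia-smi` on the `PATH` indicates an installed NVIDIA driver and that `nvcc` indicates the CUDA toolkit; neither check is definitive, and `cuda_status` and `suggested_package` are hypothetical helpers.

```python
# Heuristic only: tool presence on PATH is used as a proxy for driver/toolkit
# installation. The CUDA runtime libraries can exist without nvcc, for example.
import shutil
from typing import Dict


def cuda_status() -> Dict[str, bool]:
    """Report whether the NVIDIA driver tool and the CUDA compiler are on PATH."""
    return {
        "nvidia_driver": shutil.which("nvidia-smi") is not None,
        "cuda_toolkit": shutil.which("nvcc") is not None,
    }


def suggested_package(status: Dict[str, bool]) -> str:
    """Map the heuristic status to one of the installation options above."""
    if status["nvidia_driver"] and status["cuda_toolkit"]:
        return "CUDA-enabled package"
    if status["nvidia_driver"]:
        return "install CUDA first, then the CUDA-enabled package"
    return "CPU package"


print(suggested_package(cuda_status()))
```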
2.0.4. Verify the sherpa-onnx Installation
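One way to verify the installation from Python is to import the module and print where it was loaded from. This sketch assumes the import name is `sherpa_onnx`; `verify_import` is a hypothetical helper. Catching `OSError` in addition to `ImportError` is only a precaution in case a native library fails to load.

```python
# Verify that sherpa_onnx can actually be imported (not merely found on disk).
# Assumption: the package's import name is `sherpa_onnx`.
import importlib
from typing import Tuple


def verify_import(name: str = "sherpa_onnx") -> Tuple[bool, str]:
    """Return (True, module path) on success, or (False, error message)."""
    try:
        module = importlib.import_module(name)
    except (ImportError, OSError) as exc:  # OSError: precaution for native-library load failures
        return False, str(exc)
    return True, getattr(module, "__file__", None) or "<built-in>"


ok, detail = verify_import()
print("sherpa_onnx OK:" if ok else "sherpa_onnx import failed:", detail)
```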
2.0.5. Download the Model
Note:
For Chinese recognition, the sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20-mobile model is recommended.
For English recognition, the sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en model is recommended.
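Once a model archive has been downloaded, the sketch below unpacks it and lists the files the server will need. It assumes the archive is a `.tar.bz2` file containing `.onnx` model files and a `tokens.txt` file; `extract_model` and `model_files` are hypothetical helpers, and the archive and destination paths are up to you.

```python
# Unpack a downloaded model archive and list the files the server needs.
# Assumptions: .tar.bz2 archive, containing *.onnx files and tokens.txt.
import tarfile
from pathlib import Path
from typing import Dict, List


def extract_model(archive: Path, dest: Path) -> Path:
    """Extract `archive` into `dest`; return the model directory inside it."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:bz2") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths, links outside dest, etc.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
        top_level = {Path(m.name).parts[0] for m in tar.getmembers()}
    # If the archive holds a single top-level folder, return that folder.
    return dest / top_level.pop() if len(top_level) == 1 else dest


def model_files(model_dir: Path) -> Dict[str, List[str]]:
    """Group the files by role: ONNX networks and the token table."""
    return {
        "onnx": sorted(p.name for p in model_dir.glob("*.onnx")),
        "tokens": sorted(p.name for p in model_dir.glob("tokens.txt")),
    }
```

For example, `model_files(extract_model(Path("model.tar.bz2"), Path("models")))` returns the ONNX file names and the tokens file found in the extracted folder.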
2.0.6. Run the Server
sherpa-onnx provides a server example, so no additional code is needed. Follow these steps to start the server.
Run the zipformer Model
Run the paraformer Model
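The sketch below assembles a server command line from a model directory, assuming both models are started with the same Python server example and differ only in their model arguments. The script path (`python-api-examples/streaming_server.py`), the flag names (`--encoder`, `--decoder`, `--joiner`, `--paraformer-encoder`, `--paraformer-decoder`, `--tokens`, `--port`) and the port 6006 are our reading of the sherpa-onnx example, not confirmed by this document; check them with `python3 ./python-api-examples/streaming_server.py --help`. `server_command` and `find_one` are hypothetical helpers.

```python
# Build the server command for a zipformer (transducer) or paraformer model.
# ASSUMED (verify with --help): script path, flag names, and default port.
import sys
from pathlib import Path
from typing import List

SERVER_SCRIPT = Path("python-api-examples/streaming_server.py")  # assumed location


def find_one(model_dir: Path, pattern: str) -> str:
    """Return one file matching `pattern`, preferring non-int8 variants."""
    matches = sorted(model_dir.glob(pattern))
    full_precision = [m for m in matches if ".int8." not in m.name]
    candidates = full_precision or matches
    if not candidates:
        raise FileNotFoundError(f"no {pattern} in {model_dir}")
    return str(candidates[0])


def server_command(model_dir: Path, model_type: str, port: int = 6006) -> List[str]:
    """Return an argv list for subprocess; run it from the repository root."""
    cmd = [sys.executable, str(SERVER_SCRIPT)]
    if model_type == "zipformer":
        cmd += ["--encoder", find_one(model_dir, "encoder*.onnx"),
                "--decoder", find_one(model_dir, "decoder*.onnx"),
                "--joiner", find_one(model_dir, "joiner*.onnx")]
    elif model_type == "paraformer":
        cmd += ["--paraformer-encoder", find_one(model_dir, "encoder*.onnx"),
                "--paraformer-decoder", find_one(model_dir, "decoder*.onnx")]
    else:
        raise ValueError(f"unknown model type: {model_type}")
    cmd += ["--tokens", str(model_dir / "tokens.txt"), "--port", str(port)]
    return cmd
```

For example, `subprocess.run(server_command(Path("path/to/model-dir"), "zipformer"), check=True)` would start the server, where the model path is a placeholder.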
Example Log Output
At this point, the ASR model server is up and running.
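Before starting the MaixCAM client, you can confirm that the server accepts connections. This sketch only tests that a TCP connection to the server's port succeeds, not that speech recognition works; the host and port are placeholders (use the server machine's address and the port shown in its log), and `wait_for_server` is a hypothetical helper.

```python
# Poll the server's TCP port until it accepts a connection or time runs out.
# Host and port are placeholders supplied by the caller.
import socket
import time


def wait_for_server(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True once a TCP connection succeeds, False after `timeout` s."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:  # refused, unreachable, or timed out
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.2)
```

For example, `wait_for_server("192.168.0.10", 6006)`, where both the address and the port are placeholders.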
2.0.7. Communication Between MaixCAM and the Server
Note that in most cases the audio must be sampled at 16000 Hz with a single (mono) channel. For brevity, the example client code is available at the following links:
MaixCAM Streaming Recognition
MaixCAM Non-Streaming Recognition
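If you write or adapt a client, the sketch below shows one way to bring recorded audio into the expected form. It assumes the recorder yields signed 16-bit little-endian PCM (mono or interleaved multi-channel) and that the server expects mono little-endian `float32` samples in [-1, 1), which is how we understand the sherpa-onnx Python server example to read incoming data. Resampling is not covered, so record at 16000 Hz; `pcm16_to_float32_mono` is a hypothetical helper.

```python
# Convert 16-bit PCM (mono or interleaved) to mono little-endian float32 bytes.
# ASSUMED input/output formats; resampling is intentionally not handled.
import array
import sys
from typing import List


def pcm16_to_float32_mono(pcm: bytes, channels: int = 1) -> bytes:
    if channels < 1:
        raise ValueError("channels must be >= 1")
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(pcm)
    if sys.byteorder == "big":
        samples.byteswap()  # input is little-endian
    mono: List[float] = []
    # Average each frame's channels, then scale int16 range to [-1, 1).
    for i in range(0, len(samples) - channels + 1, channels):
        frame = samples[i:i + channels]
        mono.append(sum(frame) / channels / 32768.0)
    out = array.array("f", mono)  # 32-bit floats
    if sys.byteorder == "big":
        out.byteswap()  # emit little-endian float32
    return out.tobytes()
```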
After setting the server address and port in the client script, run it with MaixVision. If you are using the streaming recognition script, try speaking to MaixCAM.
Note: This document does not describe the communication protocol in detail because it is simple: it is essentially a raw data exchange over WebSocket. We recommend trying the setup first and then reading the code for further details.
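As a starting point for reading the client code, here is a sketch of how replies from the server could be assembled into a transcript. It assumes each reply is a JSON text message with a `text` field (the current hypothesis for a segment) and a `segment` field (an index that increases after each detected endpoint), which is how we understand the sherpa-onnx Python streaming server to respond; print a few raw replies from your server to confirm. `Transcript` is a hypothetical class.

```python
# Accumulate streaming results into a transcript, one entry per segment.
# ASSUMED reply format: {"text": "...", "segment": <int>}.
import json
from typing import Dict


class Transcript:
    def __init__(self) -> None:
        self.segments: Dict[int, str] = {}

    def update(self, message: str) -> str:
        """Record one server reply and return the full transcript so far.
        A later reply for the same segment replaces the earlier hypothesis."""
        reply = json.loads(message)
        self.segments[int(reply["segment"])] = reply["text"]
        return self.full_text()

    def full_text(self) -> str:
        parts = (self.segments[k] for k in sorted(self.segments))
        return " ".join(p for p in parts if p)  # skip empty hypotheses
```

Joining segments with a space suits English; for Chinese output, joining without a separator may read better.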
The deployment process is now complete.