Running the SenseVoice Model on MaixPy MaixCAM
2026-01-05
Update history
| Date | Version | Author | Update content |
|---|---|---|---|
| 2026-01-05 | 1.0.0 | lxowalle | Added SenseVoice documentation |
SenseVoice Model Overview
SenseVoice is a multilingual audio recognition model that supports Chinese, English, Cantonese, Japanese, and Korean. It provides features including speech recognition, automatic language detection, emotion recognition, automatic punctuation, and streaming recognition.
Downloading the Model
Supported models:
| Model | Platform | Memory Requirement | Description |
|---|---|---|---|
| sensevoice-maixcam2 | MaixCAM2 | 1 GB | |
Refer to the Large Model User Guide to download the model.
Running the Model with MaixPy
Note: MaixPy version 4.12.3 or later is required.
Non-Streaming Recognition
from maix import sensevoice
model_path = "/root/models/sensevoice-maixcam2"
client = sensevoice.Sensevoice(model=model_path+"/model.mud", stream=False)
client.start()
if client.is_ready(block=True) is False:
    print("Failed to start service or model.")
    exit()
audio_file = "/maixapp/share/audio/demo.wav"
text = client.refer(path=audio_file)
print(text)
# Optional: comment out client.stop() to save time on the next startup.
# The background service will then keep occupying CMM memory.
client.stop()
Output:
开始愉快的探索吧。
Explanation:
- When creating the sensevoice.Sensevoice object, setting stream=False enables non-streaming recognition. The interface waits until recognition is complete and then returns the whole result at once.
- When the refer function is called with the path parameter, it recognizes an audio file. Currently, only the wav format is supported. Audio format requirements: 16,000 Hz sample rate, mono channel, 16-bit width.
- When the refer function is called with the audio_data parameter, it recognizes bytes-type PCM data. Audio format requirements are the same: 16,000 Hz sample rate, mono channel, 16-bit width.
- The start function starts the SenseVoice background service, and the stop function stops it. Running SenseVoice as a background service allows multi-process operation and prevents the foreground application from being blocked during model execution.
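The audio format requirements above can be checked before a file is sent to the model. The following is a minimal sketch that uses only Python's standard wave module; the helper names check_wav_format and load_pcm_bytes are hypothetical and are not part of MaixPy. It also assumes that the "bytes-type PCM data" expected by the audio_data parameter means the raw frames without the WAV header.

```python
import wave

# Input format stated in this document for SenseVoice.
REQUIRED_RATE = 16000      # Hz
REQUIRED_CHANNELS = 1      # mono
REQUIRED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit


def check_wav_format(path):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != REQUIRED_RATE:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {REQUIRED_RATE} Hz"
            )
        if wav.getnchannels() != REQUIRED_CHANNELS:
            problems.append(
                f"{wav.getnchannels()} channels, expected {REQUIRED_CHANNELS}"
            )
        if wav.getsampwidth() != REQUIRED_SAMPLE_WIDTH:
            problems.append(
                f"{wav.getsampwidth() * 8}-bit samples, expected 16-bit"
            )
    return problems


def load_pcm_bytes(path):
    """Read the raw PCM frames of a WAV file (header excluded) as bytes."""
    with wave.open(path, "rb") as wav:
        return wav.readframes(wav.getnframes())
```

On the device, a file that passes the check could then be recognized either with client.refer(path=audio_file) or, under the assumption stated above, with client.refer(audio_data=load_pcm_bytes(audio_file)).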
Streaming Recognition
from maix import sensevoice
model_path = "/root/models/sensevoice-maixcam2"
client = sensevoice.Sensevoice(model=model_path+"/model.mud", stream=True)
client.start()
if client.is_ready(block=True) is False:
    print("Failed to start service or model.")
    exit()
audio_file = "/maixapp/share/audio/demo.wav"
print('start refer stream')
for text in client.refer_stream(path=audio_file):
    print(text)
# Optional: comment out client.stop() to save time on the next startup.
# The background service will then keep occupying CMM memory.
client.stop()
Output:
开始愉快
开始愉快的探索
开始愉快的探索吧
Explanation:
- When creating the sensevoice.Sensevoice object, setting stream=True enables streaming recognition. Partial recognition results are returned as soon as they become available, until the entire audio has been processed.
- All other behavior is the same as described for non-streaming recognition.
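In the sample output above, each partial result repeats the previous one and extends it. If an application only wants to display the newly recognized text, the difference between consecutive results can be computed as in the sketch below. This assumes that partial results are cumulative, as the sample output suggests; the source does not guarantee it. The names new_text_only and fake_stream are hypothetical, and fake_stream merely stands in for client.refer_stream(path=audio_file) so that the example runs without the device.

```python
def new_text_only(partials):
    """Yield only the text added by each successive partial result.

    Assumes each partial result starts with the previous one. If it does
    not (for example, an earlier guess was revised), the full result is
    yielded instead.
    """
    previous = ""
    for current in partials:
        if current.startswith(previous):
            delta = current[len(previous):]
        else:
            delta = current
        if delta:
            yield delta
        previous = current


def fake_stream():
    # Stand-in for client.refer_stream(path=audio_file); the values are
    # copied from the sample output in this document.
    yield "开始愉快"
    yield "开始愉快的探索"
    yield "开始愉快的探索吧"


for piece in new_text_only(fake_stream()):
    print(piece)
```

On the device, fake_stream() would be replaced by the real client.refer_stream(path=audio_file) call.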
Real-Time Speech Recognition via Microphone
In practical development, you may need to capture audio data from a microphone and pass it to the model for speech-to-text processing. Please refer to the example: asr_sensevoice.py
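One part of such a pipeline is grouping the small chunks delivered by a microphone into segments that can be passed to the model. The sketch below illustrates only this buffering step in plain Python. It is not the logic of asr_sensevoice.py, the class name PcmSegmenter is hypothetical, and the fixed segment length is an assumption chosen for simplicity.

```python
class PcmSegmenter:
    """Collect PCM chunks into fixed-length segments.

    The defaults match the format required in this document:
    16,000 Hz sample rate, mono channel, 16-bit (2-byte) samples.
    """

    def __init__(self, segment_seconds=2.0, sample_rate=16000, sample_width=2):
        # Number of bytes in one complete segment.
        self.segment_bytes = int(segment_seconds * sample_rate) * sample_width
        self._buffer = bytearray()

    def feed(self, chunk):
        """Add captured bytes and return a list of complete segments."""
        self._buffer.extend(chunk)
        segments = []
        while len(self._buffer) >= self.segment_bytes:
            segments.append(bytes(self._buffer[:self.segment_bytes]))
            del self._buffer[:self.segment_bytes]
        return segments

    def flush(self):
        """Return the leftover audio shorter than one segment (may be empty)."""
        rest = bytes(self._buffer)
        self._buffer.clear()
        return rest
```

Each segment returned by feed could then be passed to client.refer(audio_data=segment). Cutting at fixed lengths can split a word across two segments, so a real application may prefer to cut at pauses in speech instead.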