Running the MeloTTS Model on MaixPy MaixCAM
Update history
| Date | Version | Author | Update content |
|---|---|---|---|
| 2025-08-15 | 1.0.0 | lxowalle | Initial version |
Introduction
MeloTTS is a high-quality multilingual text-to-speech library jointly developed by MIT and MyShell.ai. Currently, the melotts-zh model is supported. It can synthesize both Chinese and English speech, although its English synthesis is not yet optimal.
The default output audio is PCM data with a sample rate of 44100 Hz, single channel, and 16-bit depth.
Sample rate: The number of times sound is sampled per second.
Channels: The number of audio channels captured per sample. Single channel means mono audio, and dual channel means stereo (left and right channels). To reduce AI inference complexity, single-channel audio is generally used.
Bit depth: The number of bits used to store each sample, which sets the range of values a sample can take. At 16-bit depth, each sample is usually a 16-bit signed integer. Higher bit depth captures finer audio details.
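The three parameters above determine how much raw PCM data a given length of audio occupies. The following is a minimal sketch in plain Python (no MaixPy calls) that relates duration to byte count for the default format of 44100 Hz, mono, 16-bit. The helper names pcm_byte_count and pcm_duration_s are made up for this illustration and are not part of any library.

```python
def pcm_byte_count(duration_s, sample_rate=44100, channels=1, bit_depth=16):
    """Return the number of bytes of raw PCM needed for duration_s seconds."""
    bytes_per_sample = bit_depth // 8
    return int(duration_s * sample_rate) * channels * bytes_per_sample


def pcm_duration_s(num_bytes, sample_rate=44100, channels=1, bit_depth=16):
    """Return the playback length in seconds of num_bytes of raw PCM."""
    bytes_per_frame = channels * (bit_depth // 8)
    return num_bytes / (sample_rate * bytes_per_frame)


print(pcm_byte_count(1.0))     # 88200 bytes for one second of audio
print(pcm_duration_s(176400))  # 2.0 seconds
```

One second of audio in the default format therefore takes 44100 × 1 × 2 = 88200 bytes, and doubling the channel count or bit depth doubles that figure.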
Downloading the Model
Supported models:
| Model | Platform | Memory Requirement | Description |
|---|---|---|---|
| melotts-maixcam2 | MaixCAM2 | 1 GB | base |
Refer to the Large Model User Guide to download the model.
Running the Model with MaixPy
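from maix import nn, audio

# Only MaixCAM2 supports this model.
sample_rate = 44100  # must match the sample rate of the model's output

# Create the audio player and set the output volume (range: 0-100).
p = audio.Player(sample_rate=sample_rate)
p.volume(80)

# Load the model, synthesize PCM audio from the text, and play it.
melotts = nn.MeloTTS(model="/root/models/melotts-maixcam2/melotts-zh.mud", speed=0.8, language='zh')
pcm = melotts.infer('你好', output_pcm=True)
p.play(pcm)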
Notes:
- Import the nn module first to create a MeloTTS model object:
from maix import nn
- Choose the model to load. Currently, the melotts-zh model is supported:
- speed sets the playback speed
- language sets the language type
melotts = nn.MeloTTS(model="/root/models/melotts-maixcam2/melotts-zh.mud", speed=0.8, language='zh')
- Start inference:
- The text to synthesize here is '你好' ("hello")
- Set output_pcm=True to return PCM data
pcm = melotts.infer('你好', output_pcm=True)
- Use the audio playback module to play the generated audio:
- Make sure the sample rate matches the model’s output
- Use p.volume(80) to control the output volume (range: 0–100)
- Play the PCM generated by MeloTTS with p.play(pcm)
p = audio.Player(sample_rate=sample_rate)
p.volume(80)
p.play(pcm)
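Because the output is plain 16-bit mono PCM at 44100 Hz, it can also be wrapped in a WAV header and saved for later playback. The sketch below uses only Python's standard wave module and feeds it stand-in silence instead of real model output. The function name save_pcm_as_wav is hypothetical, and it assumes the data is available as a Python bytes object; whether the value returned by melotts.infer needs a conversion step first is not covered here.

```python
import os
import tempfile
import wave


def save_pcm_as_wav(pcm_bytes, path, sample_rate=44100, channels=1, bit_depth=16):
    """Wrap raw PCM bytes in a WAV header and write them to path."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)        # 1 = mono
        wav.setsampwidth(bit_depth // 8)  # 2 bytes per 16-bit sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)


# Stand-in data: 0.5 s of silence in the same format as the model output.
silence = b"\x00\x00" * (44100 // 2)
out_path = os.path.join(tempfile.gettempdir(), "melotts_demo.wav")
save_pcm_as_wav(silence, out_path)
```

The header parameters must match the format of the PCM data. Writing a different sample rate would make the file play back too fast or too slow.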