pyvoicebox
A complete Python port of the VOICEBOX Speech and Audio Processing toolbox, originally written in MATLAB by Mike Brookes at Imperial College London.
More than 280 functions, fully type-annotated, validated against the original MATLAB source (run under GNU Octave) by more than 500 automated tests.
What is VOICEBOX?
VOICEBOX is a comprehensive MATLAB toolbox for speech and audio signal processing, maintained since the 1990s. It covers many areas that general-purpose Python audio libraries do not:
- LPC Analysis — 60+ functions for conversion between AR coefficients, cepstra, reflection coefficients, line spectra, and more
- Speech Enhancement — spectral subtraction, MMSE estimators, noise estimation
- Psychoacoustic Metrics — PESQ/MOS, SII, STOI, phon/sone loudness
- Pitch Detection — PEFAC, RAPT, DYPSA glottal closure detection
- Gaussian Mixtures — full GMM suite: fitting (EM), scoring, merging, divergence
- Rotations & Quaternions — Euler angles, rotation matrices, quaternions, geometry
- Audio Codecs — WAV, HTK, SPHERE/TIMIT, AIFF, AU, FLAC, A-law, mu-law
- Frequency Scales — Mel, Bark, ERB, Cent, MIDI conversions
- Signal Processing — enframing, overlap-add, STFT, filterbanks, Teager energy
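To illustrate the kind of conversion the LPC functions cover, here is a minimal NumPy sketch of the autocorrelation method. Levinson-Durbin produces the AR polynomial and the reflection coefficients, and a standard recursion then converts the AR coefficients into cepstral coefficients. The function names (`autocorr`, `levinson_durbin`, `ar_to_cepstrum`) and the sign conventions are assumptions chosen for illustration. They are not pyvoicebox's API, whose names and conventions may differ, so check the API reference before relying on either.

```python
import numpy as np

# Hypothetical helpers for illustration only; not pyvoicebox functions.

def autocorr(x, p):
    """Biased autocorrelation r[0..p] of a 1-D frame (autocorrelation method)."""
    x = np.asarray(x, dtype=float)
    full = np.correlate(x, x, mode="full")  # lags -(N-1)..(N-1)
    mid = len(x) - 1                        # index of lag 0
    return full[mid:mid + p + 1]

def levinson_durbin(r, p):
    """Solve the order-p normal equations with the Levinson-Durbin recursion.

    Convention assumed here: A(z) = 1 + a1 z^-1 + ... + ap z^-p, so the
    returned a = [1, a1, ..., ap]. k holds the reflection coefficients in
    this same sign convention, and err is the final prediction-error power.
    """
    a = np.zeros(p + 1)
    a[0] = 1.0
    k = np.zeros(p)
    err = float(r[0])
    for i in range(1, p + 1):
        # Correlation between the current predictor and the next lag
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        ki = -acc / err
        # Order update: a_j <- a_j + k_i * a_{i-j}, for j = 1..i
        a[1:i + 1] = a[1:i + 1] + ki * a[i - 1::-1]
        k[i - 1] = ki
        err *= 1.0 - ki * ki
    return a, k, err

def ar_to_cepstrum(a, n_ceps):
    """Cepstrum c[1..n_ceps] of the all-pole model 1/A(z), with the gain term omitted.

    Uses c_n = -a_n - sum_{m=1}^{n-1} (m/n) c_m a_{n-m}, where a_n = 0 for n > p.
    """
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)  # c[0] is unused (it would hold the log gain)
    for n in range(1, n_ceps + 1):
        acc = -a[n] if n <= p else 0.0
        for m in range(1, n):
            if n - m <= p:
                acc -= (m / n) * c[m] * a[n - m]
        c[n] = acc
    return c[1:]

# Example: order-12 LPC analysis of one windowed frame
rng = np.random.default_rng(0)
frame = np.hamming(400) * rng.standard_normal(400)
r = autocorr(frame, 12)
a, k, err = levinson_durbin(r, 12)
cc = ar_to_cepstrum(a, 16)
```

With the autocorrelation method every reflection coefficient satisfies |k_i| < 1, which guarantees that the synthesis filter 1/A(z) is stable. This stability guarantee is one reason the method is the usual default for speech frames.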
See how it compares to librosa and openSMILE.
Quick install
pip install pyvoicebox-sap # core (numpy, scipy, soundfile)
pip install "pyvoicebox-sap[plot]" # with matplotlib for plotting functions
See the Getting Started guide for examples and usage details.
Notebooks
Interactive Jupyter notebooks are available in the notebooks/ directory:
| Notebook | Description | Colab |
|---|---|---|
| Visualize Speech | Waveform, spectrogram, MFCCs, and pitch tracking | |
| Clean Up Noisy Speech | Add noise, run MMSE enhancement, measure SNR improvement | |
| Inside the Vocal Tract | LPC spectral envelopes, coefficient conversions, bandwidth expansion | |
| Who Said That? | Speaker identification with GMMs | |
| Emotion Recognition | TEO vs MFCC features on EmoDB with Random Forest | |
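The "Clean Up Noisy Speech" notebook mixes noise into clean speech at a chosen SNR and then measures how much enhancement improves it. The sketch below shows those two bookkeeping steps in plain NumPy and assumes a global (whole-signal) SNR rather than a segmental one. `add_noise_at_snr` and `snr_db` are hypothetical helpers, not pyvoicebox functions, and the enhancement step itself is left to the library.

```python
import numpy as np

# Hypothetical helpers for illustration only; not pyvoicebox functions.

def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + noise has the requested global SNR in dB."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as clean")
    noise = noise[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose the gain so that p_clean / (gain^2 * p_noise) == 10^(snr_db / 10)
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + gain * noise

def snr_db(reference, estimate):
    """Global SNR (dB) of `estimate`, treating estimate - reference as noise."""
    reference = np.asarray(reference, dtype=float)
    err = np.asarray(estimate, dtype=float) - reference
    return 10 * np.log10(np.sum(reference ** 2) / np.sum(err ** 2))

# Example: a 220 Hz tone at 16 kHz, corrupted with white noise at 5 dB SNR
fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 220 * t)
noise = np.random.default_rng(1).standard_normal(fs)
noisy = add_noise_at_snr(clean, noise, 5.0)
```

The SNR improvement is then `snr_db(clean, enhanced) - snr_db(clean, noisy)`, where `enhanced` is the output of whichever enhancement function you apply to `noisy`.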