pyvoicebox

A complete Python port of the VOICEBOX Speech and Audio Processing toolbox, originally written in MATLAB by Mike Brookes at Imperial College London.

It provides 280+ fully typed functions, validated against the original MATLAB source (run under GNU Octave) by 500+ automated tests.

What is VOICEBOX?

VOICEBOX is a comprehensive MATLAB toolkit for speech and audio signal processing maintained since the 1990s. It covers areas that most Python audio libraries don't touch:

LPC Analysis — 60+ functions for conversion between AR coefficients, cepstra, reflection coefficients, line spectra, and more
Speech Enhancement — spectral subtraction, MMSE estimators, noise estimation
Psychoacoustic Metrics — PESQ/MOS, SII, STOI, phon/sone loudness
Pitch Detection — PEFAC, RAPT, DYPSA glottal closure detection
Gaussian Mixtures — full GMM suite: fitting (EM), scoring, merging, divergence
Rotations & Quaternions — Euler angles, rotation matrices, quaternions, geometry
Audio Codecs — WAV, HTK, SPHERE/TIMIT, AIFF, AU, FLAC, A-law, mu-law
Frequency Scales — Mel, Bark, ERB, Cent, MIDI conversions
Signal Processing — enframing, overlap-add, STFT, filterbanks, Teager energy
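To make two of these areas concrete, here is a minimal NumPy-only sketch of a Mel frequency conversion and of enframing. It does not call pyvoicebox: the helper names `hz_to_mel`, `mel_to_hz`, and `split_into_frames` are hypothetical, and the Mel formula is the common logarithmic variant scaled so that 1000 Hz maps to 1000 mel, which may differ in detail from what the library implements.

```python
import numpy as np

# Scale factor chosen so that 1000 Hz maps to exactly 1000 mel
# (an assumption of this sketch, not taken from pyvoicebox).
_MEL_K = 1000.0 / np.log(1.0 + 1000.0 / 700.0)


def hz_to_mel(hz):
    """Convert frequencies in Hz to the Mel scale (hypothetical helper)."""
    hz = np.asarray(hz, dtype=float)
    return _MEL_K * np.log1p(hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel (hypothetical helper)."""
    mel = np.asarray(mel, dtype=float)
    return 700.0 * np.expm1(mel / _MEL_K)


def split_into_frames(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row.

    Trailing samples that do not fill a whole frame are dropped.
    """
    x = np.asarray(x)
    if len(x) < frame_len:
        return np.empty((0, frame_len), dtype=x.dtype)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i holds samples i*hop .. i*hop + frame_len - 1.
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx]


if __name__ == "__main__":
    print(hz_to_mel([100.0, 1000.0, 4000.0]))
    print(split_into_frames(np.arange(10), frame_len=4, hop=2))
```

The framing helper uses index arithmetic rather than a Python loop so that the whole frame matrix is built in one vectorized step; a windowing function could be applied by multiplying the result with a length-`frame_len` window.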

See how it compares to librosa and openSMILE.

Quick install

pip install pyvoicebox-sap                # core (numpy, scipy, soundfile)
pip install "pyvoicebox-sap[plot]"        # with matplotlib for plotting functions

See the Getting Started guide for examples and usage details.
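After installing, one way to confirm that the distribution is present is to query its metadata from the standard library. This is a small sketch, not part of pyvoicebox: `installed_version` is a hypothetical helper, and it looks up the distribution name `pyvoicebox-sap` from the install command above, so it makes no assumption about the import name of the package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="pyvoicebox-sap"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "pyvoicebox-sap is not installed")
```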

Notebooks

Interactive Jupyter notebooks are available in the notebooks/ directory:

Notebook	Description
Visualize Speech	Waveform, spectrogram, MFCCs, and pitch tracking
Clean Up Noisy Speech	Add noise, run MMSE enhancement, measure SNR improvement
Inside the Vocal Tract	LPC spectral envelopes, coefficient conversions, bandwidth expansion
Who Said That?	Speaker identification with GMMs
Emotion Recognition	TEO vs MFCC features on EmoDB with Random Forest