280+ speech and audio processing functions, fully typed, ported from MATLAB to Python.
A complete port of the VOICEBOX Speech and Audio Processing toolbox by Mike Brookes, Imperial College London.
60+ functions for linear predictive coding — every conversion between AR, reflection coefficients, line spectra, cepstra, and more.
Spectral subtraction, MMSE estimators, noise estimation, dereverberation, and voice activity detection.
PESQ/MOS, Speech Intelligibility Index, STOI, phon/sone loudness, active speech level, and segmental SNR.
PEFAC and RAPT pitch tracking plus DYPSA glottal closure detection: battle-tested algorithms from decades of speech research.
Complete GMM toolkit: EM fitting, scoring, merging, marginals, conditionals, Bhattacharyya and KL divergence.
Euler angles, rotation matrices, quaternions, polygon/polyhedron geometry, and spherical harmonics.
Every function preserves its original VOICEBOX name and behavior, rigorously validated against the MATLAB source via GNU Octave. Drop-in replacements using NumPy arrays.
from pyvoicebox import *
import numpy as np

# Frame audio into overlapping windows
signal = np.random.randn(16000)
frames = v_enframe(signal, 400, 160)

# Mel-scaled filterbank
m, _, _ = v_melbankm(26, 512, 16000)

# LPC analysis → cepstral coefficients
ar, e = v_lpcauto(frames, 12)
cc = v_lpcar2cc(ar, 12)
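To make the framing step concrete, here is a minimal NumPy-only sketch of what splitting a signal into overlapping windows means for the numbers used above (frame length 400, hop 160). The helper name `frame_signal` is hypothetical and this is not pyvoicebox's implementation. It applies no window function and simply drops any trailing samples that do not fill a whole frame.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row.

    Hypothetical helper for illustration only; not part of pyvoicebox.
    """
    # Build every length-`frame_len` window, then keep every `hop`-th one.
    return sliding_window_view(x, frame_len)[::hop].copy()


signal = np.random.randn(16000)
frames = frame_signal(signal, 400, 160)
print(frames.shape)  # (98, 400)
```

With 16000 samples, the last full frame starts at sample 15520, which gives 98 frames of 400 samples each.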
Jupyter notebooks with real speech data.
Waveform, spectrogram, MFCCs, and pitch tracking
Add noise, run MMSE enhancement, measure SNR improvement
LPC spectral envelopes, coefficient conversions, bandwidth expansion
Speaker identification with GMMs: feature extraction, training, classification
TEO vs MFCC features on the EmoDB dataset with Random Forest classification
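As a rough illustration of the first step in the enhancement notebook (adding noise and measuring SNR), the sketch below mixes noise into a signal at a chosen SNR using plain NumPy. The function names `add_noise_at_snr` and `measure_snr_db` are hypothetical, the "speech" is a synthetic sine wave standing in for real data, and no pyvoicebox calls are used.

```python
import numpy as np


def add_noise_at_snr(clean, noise, target_snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals target_snr_db, then mix.

    Hypothetical helper for illustration only; not part of pyvoicebox.
    """
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_clean / (p_noise * 10 ** (target_snr_db / 10)))
    return clean + gain * noise


def measure_snr_db(clean, processed):
    """Global SNR in dB, treating (processed - clean) as the noise."""
    residual = processed - clean
    return 10 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))


rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
clean = np.sin(2 * np.pi * 220 * t)        # synthetic stand-in for speech
noise = rng.standard_normal(clean.size)    # white Gaussian noise
noisy = add_noise_at_snr(clean, noise, 5.0)
print(round(measure_snr_db(clean, noisy), 2))  # 5.0
```

Running the same `measure_snr_db` on an enhanced signal and subtracting the input SNR gives the improvement in dB. Note that this is a global SNR, not the segmental SNR listed among the toolbox's metrics.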
280+ functions. Fully typed. 500+ tests against the original MATLAB source.