Speech Analysis¶
Frame-based analysis, spectrograms, pitch trackers, voice activity detection, level measurement, and psychoacoustic metrics.
Framing and time-frequency¶
v_enframe¶
Split a signal into (overlapping) frames: one per row.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| x | array_like | Input signal (1-D). | required |
| win | int or array_like | Window or window length in samples. Default is len(x). | None |
| hop | int or float | Frame increment in samples. If < 1, fraction of window length. Default is window length (non-overlapping). | None |
| m | str | Mode string: 'z' zero pad final frame; 'r' reflect last few samples for final frame; 'A' t output as centre of mass; 'E' t output as centre of energy; 'f' 1-sided DFT on each frame; 'F' 2-sided DFT on each frame; 'p' 1-sided power spectrum; 'P' 2-sided power spectrum; 'a' scale window to give unity gain with overlap-add; 's' scale so power is preserved; 'S' scale so total energy is preserved; 'd' make 's'/'S' give power/energy per Hz. | '' |
| fs | float | Sample frequency (only needed for the 'd' option). Default is 1. | 1 |

Returns:

| Name | Type | Description |
|---|---|---|
| f | ndarray | Enframed data, one frame per row. |
| t | ndarray (if requested via tuple unpacking) | Fractional time in samples at the centre of each frame. First sample is index 1 (MATLAB convention). |
| w | ndarray (if requested via tuple unpacking) | Window function used, including scaling. |
Source code in pyvoicebox/v_enframe.py
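The core framing operation can be sketched in plain NumPy. This is an illustrative re-implementation, not pyvoicebox's own code; windowing, the padding modes and the DFT options are omitted:

```python
import numpy as np

def enframe(x, nw, hop):
    """Split x into overlapping frames of length nw with increment hop,
    one frame per row (an incomplete final frame is dropped)."""
    nf = 1 + (len(x) - nw) // hop                      # number of complete frames
    idx = np.arange(nw) + hop * np.arange(nf)[:, None] # per-frame sample indices
    return x[idx]

x = np.arange(10.0)
f = enframe(x, nw=4, hop=2)
print(f.shape)   # (4, 4): frames start at samples 0, 2, 4 and 6
```

With hop equal to nw this degenerates to a simple non-overlapping reshape, which matches the documented default.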
v_overlapadd¶
Join overlapping frames together.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| f | ndarray, shape (nr, nw) | Frames to be added together, one frame per row. | required |
| win | array_like or dict | Window function to multiply each frame, or saved state dict. If omitted, a rectangular window is used. | None |
| inc | int | Time increment (in samples) between successive frames. Default is nw. | None |

Returns:

| Name | Type | Description |
|---|---|---|
| x | ndarray | Output signal of length nw + (nr-1)*inc. |
| zo | dict (only if explicitly requested) | Saved state for chunked processing. |
Source code in pyvoicebox/v_overlapadd.py
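The overlap-add reconstruction itself can be sketched as follows — an independent NumPy illustration assuming a rectangular window; the real function also handles window multiplication and saved state for chunked processing:

```python
import numpy as np

def overlapadd(f, inc):
    """Recombine frames (one per row) by summing them at offsets of inc samples."""
    nr, nw = f.shape
    x = np.zeros(nw + (nr - 1) * inc)   # output length as documented
    for i, frame in enumerate(f):
        x[i * inc:i * inc + nw] += frame
    return x

f = np.ones((3, 4))         # three unit-valued frames of length 4
x = overlapadd(f, inc=2)    # 50% overlap
print(x)                    # interior samples sum to 2, edges stay at 1
```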
v_fram2wav¶
Convert frame values to a continuous waveform.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| x | ndarray, shape (nf,) or (nf, p) | Input signal: one row per frame. | required |
| tt | ndarray, shape (nf, 2) or (nf, 3) | Frame specifications. Each row: [start_sample, end_sample, flag]; flag=1 for the start of a new spurt. If tt has only 2 columns, spurts are auto-detected from gaps. | required |
| mode | str | 'z' for zero-order hold, 'l' for linear interpolation (default). | 'l' |

Returns:

| Name | Type | Description |
|---|---|---|
| w | ndarray, shape (n, p) | Interpolated waveform of length n = tt[-1, 1]. |
| s | ndarray, shape (ns, 2) | Start and end sample numbers of each spurt. |
Source code in pyvoicebox/v_fram2wav.py
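For the linear-interpolation mode, the idea is to place each frame's value at the frame centre and interpolate between centres. A minimal NumPy sketch, using 0-based sample indexing and ignoring spurt boundaries (both handled by the real function):

```python
import numpy as np

def fram2wav_linear(x, tt):
    """Linearly interpolate one value per frame up to sample rate.
    tt rows give [start_sample, end_sample] of each frame (0-based here)."""
    centres = tt.mean(axis=1)        # frame centres in samples
    n = int(tt[-1, 1]) + 1           # output covers up to the last frame's end
    return np.interp(np.arange(n), centres, x)

x = np.array([0.0, 10.0, 20.0])                 # one value per frame
tt = np.array([[0, 3], [4, 7], [8, 11]])        # frame start/end samples
w = fram2wav_linear(x, tt)
print(len(w), w[0], w[-1])   # 12 0.0 20.0
```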
v_stftw¶
Convert a time-domain signal to the time-frequency domain using the short-time Fourier transform (STFT).

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| x | array_like | Input signal. | required |
| nw | int | Window length (rounded up to a multiple of ov). | required |
| m | str | Mode string including window code. | '' |
| ov | int | Overlap factor. Default 2. | 2 |
| nt | int | DFT length. Default nw. | None |

Returns:

| Name | Type | Description |
|---|---|---|
| y | ndarray | STFT output (frames x frequencies). |
| so | dict | Structure needed for the inverse transform. |
Source code in pyvoicebox/v_stftw.py
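The STFT is just framing, windowing and a one-sided DFT per frame. A self-contained NumPy sketch (a Hann window and fixed hop are assumed for illustration; the real function's mode string selects the window and scaling):

```python
import numpy as np

def stft(x, nw, hop):
    """One-sided STFT with a Hann window: rows are frames, columns frequencies."""
    win = np.hanning(nw)
    nf = 1 + (len(x) - nw) // hop
    frames = np.stack([x[i * hop:i * hop + nw] * win for i in range(nf)])
    return np.fft.rfft(frames, axis=1)

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)        # 1 kHz tone
y = stft(x, nw=256, hop=128)            # 50% overlap (overlap factor 2)
peak_bin = int(np.argmax(np.abs(y[10])))
print(peak_bin * fs / 256)              # 1000.0 Hz: the tone's frequency
```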
v_istftw¶
Convert time-frequency data back to the time domain using the inverse STFT.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| y | array_like | STFT data (frames x frequencies). | required |
| so | dict | Structure from v_stftw containing window and transform parameters. | required |
| io | dict | State from a previous call, for chunked processing. | None |

Returns:

| Name | Type | Description |
|---|---|---|
| z | ndarray | Reconstructed time-domain signal. |
| io | dict | State for subsequent calls. |
Source code in pyvoicebox/v_istftw.py
v_filtbankm¶
Determine the matrix for a filterbank on various frequency scales. Simplified implementation supporting the mel, bark, erb and linear scales.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| p | int | Number of filters in the filterbank. | required |
| n | int | Length of the DFT. | required |
| fs | float | Sample rate in Hz. | required |
| fl | float | Low frequency edge in Hz. Default 0. | 0 |
| fh | float | High frequency edge in Hz. Default fs/2. | None |
| w | str | Options: 'm'=mel, 'b'=bark, 'e'=erb, 'f'=linear (default). | '' |

Returns:

| Name | Type | Description |
|---|---|---|
| x | ndarray, shape (p, 1 + n // 2) | Filterbank matrix. |
| cf | ndarray, shape (p,) | Filter centre frequencies in Hz. |
Source code in pyvoicebox/v_filtbankm.py
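A triangular mel-spaced filterbank — the 'm' option — can be sketched independently in NumPy. This uses the standard mel formula mel(f) = 2595·log10(1 + f/700) and simple triangular filters; it is an illustration of the construction, not pyvoicebox's exact matrix:

```python
import numpy as np

def melbank(p, n, fs, fl=0.0, fh=None):
    """Triangular mel-spaced filterbank matrix of shape (p, 1 + n//2)."""
    if fh is None:
        fh = fs / 2
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)     # Hz -> mel
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)  # mel -> Hz
    edges = imel(np.linspace(mel(fl), mel(fh), p + 2))     # p+2 band edges
    bins = np.arange(1 + n // 2) * fs / n                  # DFT bin frequencies
    x = np.zeros((p, 1 + n // 2))
    for i in range(p):
        lo, cf, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bins - lo) / (cf - lo)
        falling = (hi - bins) / (hi - cf)
        x[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return x, edges[1:-1]

x, cf = melbank(p=10, n=256, fs=8000)
print(x.shape)   # (10, 129)
```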
v_gammabank¶
Create a bank of gammatone filters (stub). The full MATLAB implementation creates gammatone filters at ERB-spaced centre frequencies; only a stub is provided here.

Raises:

| Type | Description |
|---|---|
| NotImplementedError | Full implementation pending. |
Source code in pyvoicebox/v_gammabank.py
v_spgrambw¶
Compute a spectrogram with configurable bandwidth (no plotting).

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| s | array_like | Speech signal or power spectrum array. | required |
| fs | float or array_like | Sample frequency in Hz, or [fs, t1]. | required |
| mode | str | Mode options: 'p' output power per decade; 'P' output power per mel/bark/erb; 'd' output in dB; 'm' mel scale; 'b' bark scale; 'e' erb scale; 'l' log10 Hz scale. | '' |
| bw | float | Bandwidth resolution in Hz. | 200 |
| fmax | array_like | Frequency range [Fmin, Fstep, Fmax]. | None |
| db | float | dB range for plotting/clipping. | 40 |
| tinc | float | Output frame increment in seconds. | 0 |

Returns:

| Name | Type | Description |
|---|---|---|
| t | ndarray | Time axis values (seconds). |
| f | ndarray | Frequency axis values. |
| b | ndarray | Spectrogram values (power per Hz unless changed by mode). |
Source code in pyvoicebox/v_spgrambw.py
v_modspect¶
Calculate the modulation spectrum of a signal. This is a simplified implementation that computes the mel spectrogram and then the modulation spectrum of each mel channel.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| s | array_like | Speech signal. | required |
| fs | float | Sample rate in Hz. | 11025 |
| m | str | Mode string. | '' |
| nf | array_like | [num_mel_bins, fmin, fmax, num_dct]. | None |
| nq | array_like | [num_mod_bins, mod_fmin, mod_fmax, num_mod_dct]. | None |

Returns:

| Name | Type | Description |
|---|---|---|
| c | ndarray | Modulation spectrum (mod_freq, mel_freq, time). |
| qq | ndarray | Modulation frequency centres. |
| ff | ndarray | Mel frequency centres. |
| tt | ndarray | Time axis. |
Source code in pyvoicebox/v_modspect.py
v_correlogram¶
Calculate a correlogram.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| x | ndarray | Input signal (samples, channels) from a filterbank. | required |
| inc | int | Frame increment in samples. | 128 |
| nw | int or array_like | Window length in samples, or a window function. Default: inc. | None |
| nlag | int | Number of lags to calculate. Default: nw. | None |
| m | str | Mode: 'h' for Hamming window. | 'h' |
| fs | float | Sample frequency. | 1 |

Returns:

| Name | Type | Description |
|---|---|---|
| y | ndarray | Correlogram (nlag, channels, frames). |
| ty | ndarray | Time of the window centre for each frame. |
Source code in pyvoicebox/v_correlogram.py
v_ewgrpdel¶
Calculate the energy-weighted group delay waveform.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| x | array_like | Input signal. | required |
| w | int or array_like | Window or window length (default: Hamming window of length len(x)). | None |
| m | int | Centre sample of the window (1-based; default: (1+len(w))/2). | None |

Returns:

| Name | Type | Description |
|---|---|---|
| y | ndarray | Energy-weighted group delay waveform. |
| mm | int | Actual value of m used. |
Source code in pyvoicebox/v_ewgrpdel.py
Pitch and voicing¶
v_fxpefac¶
PEFAC pitch extraction algorithm.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| s | array_like | Speech signal. | required |
| fs | float | Sample frequency in Hz. | required |
| tinc | float | Frame increment in seconds. | 0.01 |

Returns:

| Name | Type | Description |
|---|---|---|
| fx | ndarray | Estimated pitch frequency per frame (0 = unvoiced). |
| tt | ndarray | Time of each frame centre (seconds). |
| pv | ndarray | Probability of voicing per frame. |
References
[1] Gonzalez & Brookes, PEFAC - A Pitch Estimation Algorithm Robust to High Levels of Noise, IEEE/ACM TASLP, 2014.
Source code in pyvoicebox/v_fxpefac.py
v_fxrapt¶
RAPT pitch extraction algorithm.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| s | array_like | Speech signal. | required |
| fs | float | Sample frequency in Hz. | required |
| tinc | float | Frame increment in seconds. | 0.01 |

Returns:

| Name | Type | Description |
|---|---|---|
| fx | ndarray | Estimated pitch frequency per frame (0 = unvoiced). |
| tt | ndarray | Time of each frame centre (seconds). |
| pv | ndarray | Probability of voicing per frame. |
References
[1] Talkin, D. A robust algorithm for pitch tracking (RAPT). In Speech Coding and Synthesis, ch.14, Elsevier, 1995.
Source code in pyvoicebox/v_fxrapt.py
v_dypsa¶
Derive glottal closure instants from speech using the DYPSA algorithm.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| s | array_like | Speech signal. | required |
| fs | float | Sampling frequency in Hz. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| gci | ndarray | Vector of glottal closure sample numbers (0-based). |
| goi | ndarray | Vector of glottal opening sample numbers (0-based). |
References
[1] Naylor et al., IEEE Trans Speech Audio Proc, 15:34-43, 2007.
Source code in pyvoicebox/v_dypsa.py
v_vadsohn¶
Voice activity detector based on Sohn et al.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| si | array_like | Input speech signal. | required |
| fs | float | Sample frequency in Hz. | required |
| m | str | Mode: 'a' activity decision, 'b' likelihood ratio. | 'a' |
| pp | dict | Algorithm parameters. | None |

Returns:

| Name | Type | Description |
|---|---|---|
| vs | ndarray | VAD output (one value per sample in mode 'a', or one per frame in modes 'n'/'t'). |
Source code in pyvoicebox/v_vadsohn.py
Level, noise and alignment¶
v_activlev¶
Measure the active speech level as in ITU-T P.56 (Method B).

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| sp | array_like | Speech signal. | required |
| fs | float | Sample frequency in Hz. | required |
| mode | str | Mode string: 'd' output in dB; 'n' normalize speech to 0 dB active level; '0' omit high-pass filter; 'h' omit low-pass filter; 'l' also output the long-term power level. | '' |

Returns:

| Name | Type | Description |
|---|---|---|
| lev | float or ndarray | Active speech level. |
| af | float | Activity factor. |
Source code in pyvoicebox/v_activlev.py
v_activlevg¶
Measure the active speech level robustly. This is a simplified wrapper around v_activlev; the full MATLAB implementation uses a Gaussian mixture model for robust estimation in noisy conditions.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| sp | array_like | Speech signal. | required |
| fs | float | Sample frequency in Hz. | required |
| mode | str | Mode string (same as v_activlev). | '' |

Returns:

| Name | Type | Description |
|---|---|---|
| lev | float | Active speech level. |
| af | float | Activity factor. |
Source code in pyvoicebox/v_activlevg.py
v_earnoise¶
Add noise to simulate the hearing threshold of a listener. Simplified version that adds white noise scaled to simulate internal ear noise.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| s | array_like, shape (n,) or (n, c) | Speech signal. | required |
| fs | float | Sample frequency in Hz. | required |
| m | str | Mode string: 'n' input is normalized, 'u' input is already scaled. | '' |
| spl | float | Target active speech level in dB SPL. Default 62.35. | 62.35 |

Returns:

| Name | Type | Description |
|---|---|---|
| y | ndarray | Speech with simulated ear noise. |
| x | ndarray | Filtered speech signal. |
| v | ndarray | Added noise. |
Source code in pyvoicebox/v_earnoise.py
v_ppmvu¶
Calculate PPM and VU meter readings (stub). The full MATLAB version implements the detailed PPM and VU metering standards, which require specific filter designs; a simplified stub is provided.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| sp | array_like | Input signal. | required |
| fs | float | Sample frequency in Hz. | required |
| mode | str | Mode string. | '' |

Returns:

| Name | Type | Description |
|---|---|---|
| lev | float | Level reading. |
Source code in pyvoicebox/v_ppmvu.py
v_snrseg¶
Measure segmental and global SNR.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| s | array_like | Test signal (noisy). | required |
| r | array_like | Reference signal (clean). | required |
| fs | float | Sample frequency in Hz. | required |
| m | str | Mode string: 'w' no VAD, use the whole file (default); 'z' do not do any alignment (default); 'q' use linear interpolation to remove delays of up to ±1 sample. | 'wz' |
| tf | float | Frame increment in seconds. Default: 0.01. | 0.01 |

Returns:

| Name | Type | Description |
|---|---|---|
| seg | float | Segmental SNR in dB. |
| glo | float | Global SNR in dB. |
| tc | ndarray | Time at the centre of each frame (seconds). |
| snf | ndarray | Segmental SNR in dB for each frame. |
| vf | ndarray | Boolean mask indicating valid frames. |
Source code in pyvoicebox/v_snrseg.py
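The segmental/global distinction can be sketched in NumPy. This is an independent illustration, not pyvoicebox's code: no VAD, no alignment, and the per-frame SNR is clamped to the customary [-20, 35] dB range before averaging:

```python
import numpy as np

def snrseg(s, r, fs, tf=0.01):
    """Segmental SNR (mean of clamped per-frame SNRs) and global SNR in dB."""
    nw = int(round(tf * fs))                       # frame length in samples
    nf = len(r) // nw
    e = (s[:nf * nw] - r[:nf * nw]).reshape(nf, nw)   # noise = test - reference
    ref = r[:nf * nw].reshape(nf, nw)
    snf = 10 * np.log10(np.sum(ref**2, axis=1)
                        / np.maximum(np.sum(e**2, axis=1), 1e-30))
    snf = np.clip(snf, -20.0, 35.0)                # limit per-frame dynamic range
    glo = 10 * np.log10(np.sum(ref**2) / np.sum(e**2))
    return float(np.mean(snf)), float(glo)

rng = np.random.default_rng(0)
r = rng.standard_normal(8000)
s = r + 0.1 * rng.standard_normal(8000)   # noise at -20 dB relative power
seg, glo = snrseg(s, r, fs=8000)
print(round(glo))   # ≈ 20 dB
```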
v_addnoise¶
Add white noise at a chosen SNR using energy-based level measurement. This is a simplified version that supports white noise only; for more advanced noise types, see the MATLAB original.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| s | array_like | Input speech signal (1-D). | required |
| fs | float | Sample frequency in Hz. | required |
| snr | float | Target SNR in dB. Default: Inf (no noise). | inf |
| m | str | Mode string: 'D' SNR given as a power ratio instead of dB; 'e' use energy to calculate signal level (default); 'k' preserve original signal power; 'n' make signal level 0 dB; 'N' make noise level 0 dB; 't' make total level 0 dB (default); 'x' output signal and noise as separate columns. | '' |

Returns:

| Name | Type | Description |
|---|---|---|
| z | ndarray | Noisy signal (or [signal, noise] with the 'x' option). |
| p | ndarray | Levels [s-in, n-in, s-out, n-out] as power ratios or dB. |
Source code in pyvoicebox/v_addnoise.py
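The essential operation — scaling white noise so the energy-based SNR hits the target — can be sketched as follows (an independent illustration; the mode-string scaling options are not reproduced):

```python
import numpy as np

def addnoise(s, snr_db, rng=None):
    """Add white Gaussian noise so 10*log10(Ps/Pn) equals snr_db exactly,
    measuring both levels by mean energy."""
    rng = rng or np.random.default_rng()
    n = rng.standard_normal(len(s))
    ps, pn = np.mean(s**2), np.mean(n**2)
    n *= np.sqrt(ps / (pn * 10.0 ** (snr_db / 10.0)))   # scale noise to target
    return s + n, n

rng = np.random.default_rng(1)
s = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)   # 440 Hz tone at 16 kHz
z, n = addnoise(s, snr_db=10, rng=rng)
print(round(10 * np.log10(np.mean(s**2) / np.mean(n**2))))   # 10
```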
v_sigalign¶
Align a clean reference with a noisy signal.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| s | array_like | Test signal. | required |
| r | array_like | Reference signal. | required |
| maxd | float or array_like | [+-max] or [min, max] delay allowed in samples. Values with abs(maxd) < 1 are interpreted as fractions of len(r). Default ensures at least 50% overlap. | None |
| m | str | Mode string: 'u' unity gain; 'g' find optimal gain (default); 's' maximize correlation coefficient (default); 'S' maximize energy of the common component. | 'gs' |
| fs | float | Sample frequency (only used for filtering). | None |

Returns:

| Name | Type | Description |
|---|---|---|
| d | int | Optimum delay to apply to r. |
| g | float | Optimal gain to apply to r. |
| rr | ndarray | g * r shifted by -d, zero-padded to match s if ss is not returned. |
| ss | ndarray | s truncated to match rr. |
Source code in pyvoicebox/v_sigalign.py
v_txalign¶
Find the best alignment of two sets of time markers.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| x | array_like | First set of non-decreasing time values. | required |
| y | array_like | Second set of non-decreasing time values. | required |
| maxt | float | Penalty threshold. | required |
| nsd | float | If specified, the threshold is nsd standard deviations from the mean. | None |

Returns:

| Name | Type | Description |
|---|---|---|
| kx | ndarray | Alignment from x to y (kx[i]=j means x[i] is matched to y[j]; 0 = unmatched). |
| ky | ndarray | Alignment from y to x. |
| nxy | int | Number of matched pairs. |
| mxy | float | Mean of y-x over matched pairs. |
| sxy | float | Standard deviation of y-x over matched pairs. |
Source code in pyvoicebox/v_txalign.py
Psychoacoustics¶
v_importsii¶
Calculate the SII importance function, per Hz or per Bark.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| f | array_like | Frequencies in Hz (or Bark with the 'b' flag). | required |
| m | str | Mode string: 'b' frequencies are given in Bark rather than Hz; 'c' calculate cumulative importance; 'd' calculate importance of n-1 bands; 'h' calculate importance per Hz or per Bark. | '' |

Returns:

| Name | Type | Description |
|---|---|---|
| q | ndarray | Importance values. |
References
[1] ANSI Standard S3.5-1997 (R2007).
[2] C. V. Pavlovic. JASA, 82:413-422, 1987.
Source code in pyvoicebox/v_importsii.py
v_phon2sone¶
Convert phon loudness values to sones.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| p | array_like | Matrix of phon values. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| s | ndarray | Matrix of sone values, same shape as p. |
Notes
The phon scale measures perceived loudness in dB; at 1 kHz it is identical to dB SPL relative to 20e-6 Pa sound pressure. The sone scale is proportional to apparent loudness and, by definition, equals 1 at 40 phon.
References
[1] J. Lochner and J. Burger. Form of the loudness function in the presence of masking noise. JASA, 33:1705, 1961.
[2] ISO/TC 43. Acoustics - Normal equal-loudness-level contours. ISO 226:2003, Aug. 2003.
Source code in pyvoicebox/v_phon2sone.py
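Above 40 phon the conversion follows Stevens' relation: loudness in sones doubles for every 10 phon increase, with 1 sone defined as 40 phon. A sketch of that region only (the full function must also handle levels below 40 phon, where this power law no longer holds):

```python
def phon2sone(p):
    """Stevens' relation for p >= 40 phon: sones double every 10 phon,
    anchored at 1 sone = 40 phon."""
    return 2.0 ** ((p - 40.0) / 10.0)

print(phon2sone(40), phon2sone(50), phon2sone(60))   # 1.0 2.0 4.0
```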
v_sone2phon¶
Convert sone loudness values to phons.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| s | array_like | Matrix of sone values. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| p | ndarray | Matrix of phon values, same shape as s. |
Notes
The phon scale measures perceived loudness in dB; at 1 kHz it is identical to dB SPL relative to 20e-6 Pa sound pressure. The sone scale is proportional to apparent loudness and, by definition, equals 1 at 40 phon.
References
[1] J. Lochner and J. Burger. Form of the loudness function in the presence of masking noise. JASA, 33:1705, 1961.
[2] ISO/TC 43. Acoustics - Normal equal-loudness-level contours. ISO 226:2003, Aug. 2003.
Source code in pyvoicebox/v_sone2phon.py
v_pesq2mos¶
Convert PESQ speech quality scores to MOS.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| p | array_like | Matrix of PESQ scores. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| m | ndarray | Matrix of MOS scores, same shape as p. |
References
[1] ITU-T Recommendation P.862.1, Nov. 2003.
Source code in pyvoicebox/v_pesq2mos.py
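The P.862.1 mapping is a logistic curve from raw PESQ score to MOS-LQO. A sketch using the coefficients published in the recommendation — verify against the library source before relying on them:

```python
import math

def pesq2mos(p):
    """ITU-T P.862.1 logistic mapping from raw PESQ score to MOS-LQO
    (coefficients as published in the recommendation)."""
    return 0.999 + 4.0 / (1.0 + math.exp(-1.4945 * p + 4.6607))

print(round(pesq2mos(4.5), 2))   # ≈ 4.55: near the top of the MOS scale
```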
v_mos2pesq¶
Convert MOS speech quality scores to PESQ.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| m | array_like | Matrix of MOS scores. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| p | ndarray | Matrix of PESQ scores, same shape as m. |
References
[1] ITU-T Recommendation P.862.1, Nov. 2003.
Source code in pyvoicebox/v_mos2pesq.py
v_stoi2prob¶
Convert STOI values to probability.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| s | array_like | Matrix containing STOI values. | required |
| m | str | Mapping: 'i' for IEEE sentences (default), 'd' for the Dantale corpus. | 'i' |

Returns:

| Name | Type | Description |
|---|---|---|
| p | ndarray | Corresponding probability values. |
References
[1] C. H. Taal et al. An algorithm for intelligibility prediction of time-frequency weighted noisy speech. IEEE Trans. Audio, Speech, Language Processing, 19(7):2125-2136, 2011.
Source code in pyvoicebox/v_stoi2prob.py
v_psycdigit¶
Run a psychoacoustic digit recognition test (stub). This is an interactive GUI-based function with no direct Python equivalent absent a GUI framework.

Raises:

| Type | Description |
|---|---|
| NotImplementedError | Requires an interactive GUI. |
Source code in pyvoicebox/v_psycdigit.py
v_psycest¶
Psychoacoustic parameter estimation (stub). This is a complex estimation function with many options; only a stub is provided.

Raises:

| Type | Description |
|---|---|
| NotImplementedError | Full implementation pending. |
Source code in pyvoicebox/v_psycest.py
v_psycestu¶
Psychoacoustic estimation utilities (stub).

Raises:

| Type | Description |
|---|---|
| NotImplementedError | Full implementation pending. |
Source code in pyvoicebox/v_psycestu.py
v_psychofunc¶
Calculate psychometric functions: trial success probability versus SNR.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| m_or_q | str or array_like | Mode string, or the q parameters if the mode is omitted. | None |
| q | array_like | Model parameters [p_threshold, threshold, slope, miss_prob, guess_prob, type]. | None |
| x | array_like | SNR values, or probability values for the inverse. | None |
| r | array_like | Test results (0 or 1). | None |

Returns:

| Name | Type | Description |
|---|---|---|
| p | ndarray | Probabilities. |
Source code in pyvoicebox/v_psychofunc.py
Miscellaneous¶
v_soundspeed¶
Calculate the speed of sound, density and characteristic acoustic impedance of air.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| t | float | Air temperature in Celsius. Default 20. | 20 |
| p | float | Air pressure in atm. Default 1. | 1 |
| m | float | Average molecular weight of air in kg/mol. Default 0.0289644. | 0.0289644 |
| g | float | Adiabatic constant for air. Default 1.4. | 1.4 |

Returns:

| Name | Type | Description |
|---|---|---|
| v | float | Speed of sound in m/s. |
| d | float | Density of air in kg/m^3. |
| z | float | Characteristic impedance in Pa.s/m. |
Source code in pyvoicebox/v_soundspeed.py
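The ideal-gas relations behind these outputs are v = sqrt(g·R·T/m), d = P·m/(R·T) and z = d·v. A sketch under that assumption (an independent illustration; the library may apply corrections this omits):

```python
import math

def soundspeed(t=20.0, p=1.0, m=0.0289644, g=1.4):
    """Ideal-gas estimates: speed of sound, air density and characteristic
    impedance at temperature t (Celsius) and pressure p (atm)."""
    R = 8.31446           # molar gas constant, J/(mol K)
    T = t + 273.15        # absolute temperature, K
    P = p * 101325.0      # pressure in Pa
    v = math.sqrt(g * R * T / m)   # speed of sound, m/s
    d = P * m / (R * T)            # density, kg/m^3
    return v, d, d * v             # z = d*v, Pa.s/m

v, d, z = soundspeed()
print(round(v, 1), round(d, 3), round(z))   # ≈ 343.2  1.204  413
```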
v_sigma¶
Estimate glottal opening and closing instants using the SIGMA algorithm (stub). The algorithm requires the stationary wavelet transform (SWT), which is available in MATLAB's Wavelet Toolbox but not straightforwardly in scipy, so a stub that raises NotImplementedError is provided.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| lx | array_like | Lx (laryngograph) signal. | required |
| fs | float | Sampling frequency in Hz. | required |
| fmax | float | Maximum laryngeal frequency. Default 400 Hz. | 400 |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | The SWT-based SIGMA algorithm requires specialized wavelet toolbox support. |