Speech Recognition & Features
MFCC extraction, mel filterbank construction, cepstrum/power-spectrum conversion, and Linear Discriminant Analysis.
v_melcepst
V_MELCEPST - Calculate the mel cepstrum of a signal.
v_melcepst(
s,
fs=11025,
w="M",
nc=12,
p=None,
n=None,
inc=None,
fl=0,
fh=0.5,
) -> tuple[ndarray, ndarray]
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `s` | `array_like` | Speech signal. | required |
| `fs` | `float` | Sample rate in Hz. | `11025` |
| `w` | `str` | Mode string combining any of: `'R'` rectangular window; `'N'` Hanning window; `'M'` Hamming window (default); `'p'` filters act in the power domain; `'a'` filters act in the absolute magnitude domain (default); `'0'` include the 0th cepstral coefficient; `'E'` include log energy; `'d'` include delta coefficients; `'D'` include delta-delta coefficients. | `'M'` |
| `nc` | `int` | Number of cepstral coefficients, excluding the 0th. | `12` |
| `p` | `int` | Number of filters in the filterbank. Default: floor(3*log(fs)). | `None` |
| `n` | `int` | FFT length. Default: power of 2 < 0.03*fs. | `None` |
| `inc` | `int` | Frame increment. Default: n//2. | `None` |
| `fl` | `float` | Low end of the lowest filter as a fraction of fs. | `0` |
| `fh` | `float` | High end of the highest filter as a fraction of fs. | `0.5` |
Returns:
| Name | Type | Description |
|---|---|---|
| `c` | `ndarray` | Mel cepstrum output (one frame per row). |
| `tc` | `ndarray` | Time of each frame centre in samples. |
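The defaults listed in the parameter table depend on `fs`, so it can help to see what they resolve to. The following is a minimal sketch, not the library's code: `melcepst_defaults` is a hypothetical helper, `log` is assumed to be the natural logarithm, and "power of 2 < 0.03*fs" is read as the largest such power.

```python
import math

def melcepst_defaults(fs=11025):
    """Resolve the documented defaults for p, n and inc (hypothetical helper)."""
    # Number of filters: floor(3*log(fs)), natural logarithm assumed.
    p = math.floor(3 * math.log(fs))
    # FFT length: largest power of 2 strictly below 0.03*fs (assumed reading).
    limit = 0.03 * fs
    n = 1
    while n * 2 < limit:
        n *= 2
    # Frame increment: half the FFT length.
    inc = n // 2
    return p, n, inc

print(melcepst_defaults(11025))  # (27, 256, 128)
```

Under these assumptions, the default sample rate of 11025 Hz gives 27 filters, a 256-point FFT, and a frame increment of 128 samples.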
Source code in pyvoicebox/v_melcepst.py
v_melbankm
V_MELBANKM - Determine matrix for a mel/erb/bark-spaced filterbank.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `p` | `int or float` | Number of filters or filter spacing in k-mel/bark/erb. Default: ceil(4.6*log10(fs)). | `None` |
| `n` | `int` | Length of FFT. | `256` |
| `fs` | `float` | Sample rate in Hz. | `11025` |
| `fl` | `float` | Low end of the lowest filter as a fraction of fs (or in Hz if 'h'/'H' is in w). | `0` |
| `fh` | `float` | High end of the highest filter as a fraction of fs. | `0.5` |
| `w` | `str` | Options string (see MATLAB docs). | `'tz'` |
Returns:
| Name | Type | Description |
|---|---|---|
| `x` | `scipy.sparse matrix` | Filterbank matrix of shape (p, 1+floor(n/2)) or (p, mx-mn+1). |
| `mc` | `ndarray` | Filterbank centre frequencies in mel/erb/bark. |
| `mn` | `int` | Lowest FFT bin with a non-zero coefficient. |
| `mx` | `int` | Highest FFT bin with a non-zero coefficient. |
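To make the default filter count and the full-size matrix shape concrete, here is a rough sketch of how mel-spaced centre frequencies can be laid out. It is an illustration only: `hz_to_mel`, `mel_to_hz` and `mel_centres` are hypothetical helpers, the mel formula is the commonly used `2595*log10(1 + f/700)` (the library may use a variant), and the centres are assumed to be equally spaced in mel between the two band edges.

```python
import math

def hz_to_mel(f):
    # Common mel-scale formula; assumed here, not taken from the library.
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_centres(fs=11025, n=256, fl=0.0, fh=0.5, p=None):
    """Return (p, centre frequencies in Hz, full filterbank shape)."""
    if p is None:
        # Documented default: ceil(4.6*log10(fs)).
        p = math.ceil(4.6 * math.log10(fs))
    lo, hi = hz_to_mel(fl * fs), hz_to_mel(fh * fs)
    # p centres sit strictly between the two band edges, equally spaced in mel.
    step = (hi - lo) / (p + 1)
    centres_hz = [mel_to_hz(lo + k * step) for k in range(1, p + 1)]
    # Full-size matrix shape from the Returns table: (p, 1 + floor(n/2)).
    shape = (p, 1 + n // 2)
    return p, centres_hz, shape

p, centres, shape = mel_centres()
print(p, shape)  # 19 (19, 129)
```

With the default arguments this gives 19 filters and a 19 x 129 matrix; the centre frequencies crowd together at low frequencies and spread out towards fs/2.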
Source code in pyvoicebox/v_melbankm.py
v_cep2pow
V_CEP2POW - Convert cepstral means and variances to the power domain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `u` | `array_like` | Vector giving the cepstral means, with u[0] the 0th cepstral coefficient. | required |
| `v` | `array_like` | Cepstral covariance matrix, or a vector containing its diagonal elements. | required |
| `mode` | `str` | `'c'` : pow=exp(irdct(cep)) [default]; `'i'` : pow=exp(cep) [no transformation]. | `'c'` |
Returns:
| Name | Type | Description |
|---|---|---|
| `m` | `ndarray` | Row vector giving means in the power domain. |
| `c` | `ndarray` | Covariance matrix in the power domain. |
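For mode `'i'`, where pow=exp(cep), the conversion can be illustrated with the standard log-normal moment relations. This is a sketch under stated assumptions rather than the library's implementation: the log-domain values are assumed Gaussian, only the diagonal (variance) case is handled, and `logdomain_to_power` is a hypothetical name.

```python
import math

def logdomain_to_power(u, v):
    """Map log-domain means u and variances v to power-domain moments.

    Assumes each log-domain value is Gaussian, so the power is log-normal:
        mean     m = exp(u + v/2)
        variance c = m^2 * (exp(v) - 1)
    """
    m = [math.exp(ui + 0.5 * vi) for ui, vi in zip(u, v)]
    c = [mi * mi * (math.exp(vi) - 1.0) for mi, vi in zip(m, v)]
    return m, c

m, c = logdomain_to_power([0.0, 1.0], [0.0, 0.5])
print(m[0], c[0])  # 1.0 0.0
```

A zero-variance input maps to a zero-variance output, and increasing the log-domain variance raises the power-domain mean as well as the variance.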
Source code in pyvoicebox/v_cep2pow.py
v_pow2cep
V_POW2CEP - Convert power domain means and variances to the cepstral domain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `m` | `array_like` | Vector giving means in the power domain. | required |
| `c` | `array_like` | Covariance matrix in the power domain (or diag(c) if diagonal). | required |
| `mode` | `str` | `'c'` : pow=exp(irdct(cep)) [default]; `'i'` : pow=exp(cep) [no transformation]. | `'c'` |
Returns:
| Name | Type | Description |
|---|---|---|
| `u` | `ndarray` | Row vector giving the cepstral means, with u[0] the 0th cepstral coefficient. |
| `v` | `ndarray` | Cepstral covariance matrix. |
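The reverse direction for mode `'i'` can be sketched by inverting the log-normal moment relations. As before, this is an illustration under assumptions (log-normal power values, diagonal covariance only), and both function names are hypothetical; the forward mapping is included only so that the round trip can be checked.

```python
import math

def power_to_logdomain(m, c):
    """Map power-domain means m and variances c to log-domain moments.

    Inverts the log-normal relations m = exp(u + v/2), c = m^2 (exp(v) - 1):
        v = log(1 + c / m^2)
        u = log(m) - v/2
    """
    v = [math.log1p(ci / (mi * mi)) for mi, ci in zip(m, c)]
    u = [math.log(mi) - 0.5 * vi for mi, vi in zip(m, v)]
    return u, v

def logdomain_to_power(u, v):
    """Forward mapping, used here only to verify the round trip."""
    m = [math.exp(ui + 0.5 * vi) for ui, vi in zip(u, v)]
    c = [mi * mi * (math.exp(vi) - 1.0) for mi, vi in zip(m, v)]
    return m, c

u, v = power_to_logdomain([2.0, 5.0], [1.0, 4.0])
m_back, c_back = logdomain_to_power(u, v)  # recovers the inputs up to rounding
```

Converting to the log domain and back should reproduce the original means and variances, which is a useful sanity check when pairing the two functions.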
Source code in pyvoicebox/v_pow2cep.py
v_ldatrace
V_LDATRACE - LDA transform to maximize trace discriminant.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `b` | `ndarray` | Between-class covariance matrix (m, m). | required |
| `w` | `ndarray` | Within-class covariance matrix (m, m). Default: identity. | `None` |
| `n` | `int` | Number of columns in output matrix A. Default: m. | `None` |
| `c` | `ndarray` | Pre-specified columns of A (m, r). Default: None. | `None` |
Returns:
| Name | Type | Description |
|---|---|---|
| `a` | `ndarray` | Transformation matrix (m, n): y = a.T @ x. |
| `f` | `ndarray` | Incremental gain in discriminant for successive columns. |
| `B` | `ndarray` | Between-class covariance of y. |
| `W` | `ndarray` | Within-class covariance of y. |
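The textbook way to maximize a trace discriminant is to solve the generalized eigenproblem for the pair (b, w) and keep the leading eigenvectors. The sketch below shows that approach with NumPy. It is an assumption-laden illustration, not the library's code: `lda_trace_sketch` is a hypothetical name, the pre-specified columns `c` are not supported, `w` must be positive definite, and the column scaling (chosen so that the transformed within-class covariance is the identity) may differ from what `v_ldatrace` returns.

```python
import numpy as np

def lda_trace_sketch(b, w=None, n=None):
    """Leading generalized eigenvectors of (b, w), largest eigenvalue first."""
    b = np.asarray(b, dtype=float)
    m = b.shape[0]
    w = np.eye(m) if w is None else np.asarray(w, dtype=float)
    n = m if n is None else n
    # Whiten with the Cholesky factor: w = L @ L.T.
    L = np.linalg.cholesky(w)
    L_inv = np.linalg.inv(L)
    s = L_inv @ b @ L_inv.T
    s = 0.5 * (s + s.T)  # symmetrise against rounding error
    evals, evecs = np.linalg.eigh(s)  # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n]
    f = evals[order]                  # gain contributed by each column
    a = L_inv.T @ evecs[:, order]     # map back to the original coordinates
    return a, f

b = np.array([[4.0, 1.0], [1.0, 2.0]])
w = np.array([[2.0, 0.5], [0.5, 1.0]])
a, f = lda_trace_sketch(b, w)
B_y = a.T @ b @ a  # diagonal, with f on the diagonal
W_y = a.T @ w @ a  # identity under this scaling
```

With this scaling, the transformed between-class covariance is diagonal and its entries are the per-column gains, so dropping trailing columns discards the least discriminative directions first.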