Let N = # bands needed to cover the frequency range 0 --> f(s), where f(s) is the sampling frequency, corresponding to 0 --> 2 pi in normalized radian frequency
We're interested in the first Q frequency bands, for some Q <= N/2 (i.e., extending no higher than f(s)/2)
A filter bank can be used to filter the signal into the Q bands. The output of each band can be passed through a nonlinearity -- e.g., a full-wave rectifier followed by a low-pass filter -- to obtain the energy in each band.
The resulting output of each band is a "signal" whose values are measures of the energy in that band as a function of time.
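The band-energy idea above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the filter bank from the lecture: the Hamming-windowed cosine filters, the moving-average low-pass, the 10 kHz sampling rate, and all names (band_energies, win_len, smooth_len) are choices made here for the example.

```python
import numpy as np

def band_energies(s, num_bands_total, num_bands_used, win_len=65, smooth_len=101):
    """Return an array of shape (num_bands_used, len(s)) of band-energy envelopes.

    Assumed design (not from the source): band i is a Hamming window
    modulated to omega_i = 2*pi*i/N, followed by full-wave rectification
    and a moving-average low-pass filter.
    """
    n = np.arange(win_len) - (win_len - 1) / 2      # time axis centered on zero
    w = np.hamming(win_len)
    w /= w.sum()                                    # unit gain at the band center
    smooth = np.ones(smooth_len) / smooth_len       # crude low-pass filter
    out = np.empty((num_bands_used, len(s)))
    for i in range(num_bands_used):
        omega_i = 2 * np.pi * i / num_bands_total   # center frequency of band i
        h_i = 2 * w * np.cos(omega_i * n)           # real bandpass impulse response
        band = np.convolve(s, h_i, mode="same")     # bandpass filtering
        rectified = np.abs(band)                    # full-wave rectifier
        out[i] = np.convolve(rectified, smooth, mode="same")
    return out

# Test signal: a single tone placed at the center of band 8 (assumed fs = 10 kHz).
fs = 10000
N, Q = 64, 32
tone_band = 8
t = np.arange(2000) / fs
s = np.cos(2 * np.pi * (tone_band * fs / N) * t)

env = band_energies(s, N, Q)
strongest = int(np.argmax(env[:, len(s) // 2]))     # band with the most energy mid-signal
```

Each row of env is the slowly varying "signal" for one band; for the test tone the largest envelope should appear in the band whose center frequency matches the tone.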
Figure: Channel vocoder.
After [J. L. Flanagan, M. Schroeder, B. Atal, R. Crochiere,
N. Jayant, and J. Tribolet, "Speech Coding,"
IEEE Trans. Communications, 1979.]
Data compression can be achieved by
Let h(i)(n) be the impulse response of bandpass filter i
Assume that for each band i, h(i) is such that it can be written as:
h(i)(n) = w(n) e^(j omega(i) n)
for some window w(n), where omega(i) is the center frequency for band i.
Then the output x(i)(n) of the i-th filter can be written as the convolution of s(n) with h(i)(n).
Substituting variables and rearranging gives an expression for each x(i)(n) in terms of the short-time Fourier transform of s(n) at frequency omega(i).
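The substitution step can be written out as follows. This is a sketch that assumes the usual complex bandpass form h_i(n) = w(n) e^{j \omega_i n} (a window modulated to the band center) and writes S_n(e^{j\omega}) for the short-time Fourier transform of s at time n; the subscript notation is chosen here for readability.

```latex
\begin{align*}
x_i(n) &= \sum_{m} s(m)\, h_i(n-m) \\
       &= \sum_{m} s(m)\, w(n-m)\, e^{j\omega_i (n-m)} \\
       &= e^{j\omega_i n} \sum_{m} s(m)\, w(n-m)\, e^{-j\omega_i m} \\
       &= e^{j\omega_i n}\, S_n\!\left(e^{j\omega_i}\right)
\end{align*}
```

Under this assumption the magnitude of the filter output equals the magnitude of the short-time transform, |x_i(n)| = |S_n(e^{j\omega_i})|, which is why the filter bank and the short-time Fourier transform give the same band energies.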
Assume we're interested in evenly spaced omega(i)'s
omega(i) = (2 pi i) / N = 2 pi [f(i) / f(s)]
where N = # of filters needed to span 0 --> f(s) or 0 --> 2 pi
and Q <= N/2
Let m be the time index for the summation in the Fourier transform.
Then for every value of m for which s(m)w(n-m) is non-zero,
m can be written as
m = N r + k, for some integer r and 0 <= k <= N-1
Let sn(m) denote s(m)w(n-m).
sn(m) = windowed signal at time index n
FFT implementation procedure
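A minimal sketch of one way to carry out this procedure, assuming the standard decomposition m = N r + k with 0 <= k <= N-1: window the signal to get sn(m), sum the windowed samples that share the same k, then take a single N-point FFT. The function names, the Hamming window, and the random test signal are assumptions made for the example; the window is taken to be non-zero only for arguments 0 .. L-1.

```python
import numpy as np

def stft_direct(s, w, n, omegas):
    """Evaluate S_n(e^{j omega}) = sum_m s(m) w(n-m) e^{-j omega m} term by term."""
    L = len(w)
    m = np.arange(n - L + 1, n + 1)        # indices where w(n-m) is non-zero
    sn = s[m] * w[n - m]                   # windowed signal s_n(m)
    return np.array([np.sum(sn * np.exp(-1j * om * m)) for om in omegas])

def stft_summing(s, w, n, N):
    """Same quantity at omega_k = 2*pi*k/N, via summing plus one N-point FFT."""
    L = len(w)
    m = np.arange(n - L + 1, n + 1)
    sn = s[m] * w[n - m]
    u = np.zeros(N)
    np.add.at(u, m % N, sn)                # u(k) = sum over r of s_n(N r + k)
    return np.fft.fft(u)                   # bin k is S_n at omega_k

rng = np.random.default_rng(0)
s = rng.standard_normal(400)               # stand-in for a speech signal
w = np.hamming(128)                        # window length L = 128
N, Q, n = 64, 32, 300

omegas = 2 * np.pi * np.arange(Q) / N      # evenly spaced center frequencies
direct = stft_direct(s, w, n, omegas)
summed = stft_summing(s, w, n, N)[:Q]      # keep only the first Q bands
max_diff = float(np.max(np.abs(direct - summed)))
```

The two results agree to rounding error because e^{-j 2 pi k' (N r + k) / N} does not depend on r, so samples that are N apart can be added before the transform.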
Filter bank analyses can also be done with non-uniform filters, e.g., using logarithmic spacing or "critical band" filters based on perceptual models.
Why implement filter banks this way?
Suppose you want
Alternative 1:
Why not compute S(n)(e^jw) with a window length L = 64?
If L = 64, the window is 6.4 ms long (i.e., f(s) = 10 kHz)
Alternative 2:
Let L = 128, 12.8 ms
How do you get the 32 filter outputs you want?
Computation: L log2(L) ~ 128*7 = 896
Using the summing method:
With the summing method, you can choose the window length L independently of Q, as long as L >= 2Q.
So, e.g., with L = 128, Q = 32, N = 64, the computation is ~ 64*6 + 128 = 512 (N log2(N) for the N-point FFT, plus L).
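The two operation counts can be checked with a few lines of arithmetic. This sketch uses the rough P log2(P) cost model for a P-point FFT that the notes use, and reads the extra 128 in the summing method as roughly one operation per windowed sample; that reading is an assumption.

```python
import math

def fft_cost(points):
    """Rough operation count for a P-point FFT: P * log2(P)."""
    return points * math.log2(points)

L, N = 128, 64

direct_cost = fft_cost(L)          # Alternative 2: 128-point FFT -> 128 * 7
summing_cost = fft_cost(N) + L     # summing method: 64 * 6 + 128

print(direct_cost, summing_cost)   # 896.0 512.0
```

With these numbers the summing method needs a little over half the work of the 128-point FFT while still using the longer 12.8 ms window.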
Applications of filter bank analyses:
Baseline transmission rates for speech signals
Typical channel vocoder
24 x 6 x 40 = 5760 bps for the output of the filters
for excitation: V/UV + pitch --> 7 bits x 40 frames/sec = 280 bps
Total 6040 bps vs. 42,000 bps for uncompressed telephone quality speech
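The bit-rate budget above is simple arithmetic and can be reproduced directly. The variable names are illustrative, and reading the factors 24 and 6 as "channels" and "bits per channel" is an assumption; the notes only give the product with 40 frames/sec.

```python
frames_per_sec = 40

num_channels = 24        # assumed meaning of the factor 24
bits_per_channel = 6     # assumed meaning of the factor 6
excitation_bits = 7      # V/UV decision + pitch, per frame

filter_bps = num_channels * bits_per_channel * frames_per_sec   # 5760
excitation_bps = excitation_bits * frames_per_sec               # 280
total_bps = filter_bps + excitation_bps                         # 6040

uncompressed_bps = 42000                                        # telephone-quality figure from the notes
compression_ratio = uncompressed_bps / total_bps                # roughly 7 to 1

print(filter_bps, excitation_bps, total_bps)
```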
Can compress further:
Channel vocoders can operate as low as 2400 bps
Intelligibility ~ 85% in informal tests