I. Introduction to the physiology of speech and hearing
Rabiner & Juang, Chapter 1, 2.1 - 2.2
Speech production physiology
Figure: The human vocal system.
After [J. L. Flanagan, Speech Analysis and Perception,
Springer-Verlag, Berlin, 2nd edition, 1965].
Figure: Schematic of the functional components of the vocal system.
After [J. L. Flanagan, Speech Analysis and Perception,
Springer-Verlag, Berlin, 2nd edition, 1965].
The process of producing speech sounds:
- lungs: fill with air
- contraction of rib cage forces air from the lungs into the trachea -
the volume of air determines the amplitude of the sound
- trachea (windpipe): conveys air to the vocal tract.
The vocal cords, at the top of the trachea, separate the
trachea from the base of the vocal tract
- vocal tract
- consists of:
- pharynx (throat)
- mouth
- nose
- the shape and size of the vocal tract are varied by positioning the
articulators: the tongue, teeth and lips
- the shape of the vocal tract determines the type of speech sound -
e.g., the /a/ in "hat" vs the /i/ in "hit"
Speech differs from breathing in that at some point in the path
you set the air in rapid motion or vibration
Two principal components of speech production
- Excitation - create a sound by setting the air in rapid motion
- Vocal tract - "shape" the sound
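The split into excitation and vocal tract can be sketched numerically as a source signal passed through a filter. The following is a minimal illustration, not a model of real speech: the sampling rate, the single two-pole resonance standing in for the vocal tract, and the resonance frequencies and bandwidth are all assumed values chosen for the example.

```python
import numpy as np

FS = 16000  # sampling rate in Hz (assumed for this sketch)

def impulse_train(f0_hz, duration_s, fs_hz):
    """Excitation: one unit pulse per cycle, a crude stand-in for puffs of air."""
    n = int(round(duration_s * fs_hz))
    period = int(round(fs_hz / f0_hz))  # samples per cycle
    x = np.zeros(n)
    x[::period] = 1.0
    return x

def resonator(x, freq_hz, bandwidth_hz, fs_hz):
    """Vocal-tract stand-in: a single two-pole resonance (hypothetical shape)."""
    r = np.exp(-np.pi * bandwidth_hz / fs_hz)  # pole radius set by the bandwidth
    theta = 2.0 * np.pi * freq_hz / fs_hz      # pole angle set by the center frequency
    a1, a2 = 2.0 * r * np.cos(theta), -r * r
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1:
            y[n] += a1 * y[n - 1]
        if n >= 2:
            y[n] += a2 * y[n - 2]
    return y

# Same excitation, two different "shapes": the source is unchanged,
# only the filter differs, so the two outputs sound different.
excitation = impulse_train(128.0, 0.1, FS)
sound_a = resonator(excitation, 700.0, 100.0, FS)  # assumed resonance
sound_b = resonator(excitation, 300.0, 100.0, FS)  # assumed resonance
```

Both outputs share the repetition rate of the excitation; only the way each pulse is "shaped" changes, which mirrors the division of labor described above.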
A. Excitation: three principal forms
- Phonation: vibration of vocal cords
The vocal cords consist of ligament and muscle,
and are adjustable under muscle control.
The cartilage surrounding the vocal cords provides support.
The opening between the vocal cords, through which air passes
from the trachea to the pharynx, is called the glottis.
There are two modes of operation of the vocal cords:
- Vibrating
- cords tense, pressed together - no air flows
- air pressure from the lungs forces them open
- air rushing through the opening reduces the local pressure --> cords close
- the cycle repeats
The result is a quasi-periodic release of air into the pharynx.
The fundamental frequency of the vocal cord opening/closing cycle
becomes the fundamental frequency (informally, the "pitch") of
the resulting sound.
The tenser the vocal cords:
- the higher the pitch
- the shorter the period
Typical frequency of vocal cord open/close cycle:
- male: 128 Hz
- female: 256 Hz
- Non-vibrating
- vocal cords open
- air flows from trachea to pharynx without interruption
- not an excitation since the air isn't set into rapid motion
- Frication: Turbulent air flow
- "noise source"
- the excitation is set up by forcing air past a constriction
at some point in the vocal tract
- e.g., /f/ in "for": top teeth & bottom lip
- e.g., /th/ in "thin": tongue & top teeth
- can combine frication with phonation
- e.g., /v/ as in "vote": top teeth & bottom lip as in /f/,
combined with phonation
- one special case: form the constriction by partially closing
the vocal cords, held rigid - the airflow is not periodic
- get /h/ as in "hot", or whispering
- Plosive: Closure at some point in the vocal tract,
followed by a release of air
- e.g., /p/ as in "pot": closure at lips
- can be combined with vocal cord vibration: phonation and plosive
- /b/ as in "boy": closure at lips as in /p/,
combined with phonation
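The three excitation forms can be sketched as simple source signals. This is a rough illustration under stated assumptions: the sampling rate, the use of white noise for turbulence, the decaying noise burst for a plosive release, and the mixing weight are hypothetical choices, not values from the text. The period calculation uses the typical frequencies quoted earlier.

```python
import numpy as np

FS = 16000  # sampling rate in Hz (assumed)

def pitch_period_ms(f0_hz):
    """Period of one open/close cycle in milliseconds: T = 1 / f0."""
    return 1000.0 / f0_hz

def phonation(f0_hz, n, fs_hz=FS):
    """Quasi-periodic release of air, idealized as one pulse per cycle."""
    x = np.zeros(n)
    x[::int(round(fs_hz / f0_hz))] = 1.0
    return x

def frication(n, rng):
    """Turbulent flow past a constriction, idealized as white noise."""
    return rng.standard_normal(n)

def plosive(n, closure_n, rng, decay=0.995):
    """Silence during the closure, then a decaying noise burst on release."""
    x = np.zeros(n)
    burst_len = n - closure_n
    x[closure_n:] = rng.standard_normal(burst_len) * decay ** np.arange(burst_len)
    return x

print(pitch_period_ms(128.0))  # 7.8125
print(pitch_period_ms(256.0))  # 3.90625

rng = np.random.default_rng(0)
n = 1600                                   # 0.1 s at 16 kHz
voiced = phonation(128.0, n)               # phonation alone
f_like = frication(n, rng)                 # frication alone, as in /f/
v_like = voiced + 0.3 * frication(n, rng)  # frication + phonation, as in /v/
p_like = plosive(n, 800, rng)              # plosive alone, as in /p/
b_like = p_like + voiced                   # plosive + phonation, as in /b/
```

Doubling the cycle frequency halves the period, which is the "higher pitch, shorter period" relation stated for tenser vocal cords.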