EE649 SPEECH PROCESSING BY COMPUTER Spring 2002
Project #4B: Small Vocabulary Continuous Speech Recognition System
Assigned: Thursday March 21 Due: Friday April 26
Individual project
You are to implement a recognizer for simple voice dialing applications using the Hidden Markov Model Toolkit (HTK) V3.1. This recognizer will be designed to recognize continuously spoken digit strings and a limited set of names. Though this recognizer is built for a small vocabulary continuous speech recognition task, the design is general-purpose and would be useful for a range of applications.
Toolkit Installation: HTK 3.1 is available at:
http://htk.eng.cam.ac.uk/. Version 3.1 is the current release and is available for free download, but you must first agree to the license and register for a username and password to access HTK. Note that HTK is only available as a source distribution. To build HTK 3.1 you must have a working ANSI C compiler and associated tools installed on your system. We suggest you download HTK and install it in your ECN account. After downloading and uncompressing, follow the instructions in the "README" file to compile and install HTK 3.1.
HTKBook: A detailed handbook for HTK users is available at:
Vocabulary: The goal of the system to be built here is to provide a voice-operated interface for phone
dialing. Thus, the recognizer must handle digit strings and also personal name lists.
Training Data: You will have access to the training data recorded using the HTK tool HSLab as described
in "Step 3 - Recording the Data" in the tutorial. The training data files contain 100 sentences spoken by one male speaker. The wav files and their transcriptions are provided to be used to develop a recognizer.
Testing Data: Your recognizer will be tested on the speech utterances spoken by the same speaker. The
test set sentences were generated by the HTK tool HSGen. You will not have access to this data.
Goal and Documentation:
You must work by yourself on this project.
Project #4B: Data and Program Specifications
Training Files:
Each data file is the wave file for one sentence spoken by one adult male speaker. All of the data were recorded using a high quality microphone in a quiet room. There is some silence (no fixed amount) at the beginning and ending of each file.
The data files are available via anonymous ftp in directory
/var/spool/ftp/pub/ee649/Data/p4B/train_wav
File names run from train0001.wav through train0100.wav.
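After downloading, it can be useful to confirm that all 100 training files are present. The following is a minimal sketch, assuming the files follow the naming pattern given above (four-digit, zero-padded index); the function names expected_training_files and missing_files are hypothetical helpers, not part of HTK.

```python
def expected_training_files(count=100, prefix="train", ext=".wav"):
    """Return the expected file names train0001.wav ... train0100.wav."""
    # Four-digit, zero-padded index, starting at 1.
    return [f"{prefix}{i:04d}{ext}" for i in range(1, count + 1)]


def missing_files(present, count=100):
    """Return the expected names that do not appear in `present`."""
    have = set(present)
    return [name for name in expected_training_files(count) if name not in have]
```

For example, passing the result of os.listdir on your local copy of the train_wav directory to missing_files should return an empty list if the download is complete.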
The transcriptions for the training speech files are available via anonymous ftp in the file
/var/spool/ftp/pub/ee649/Data/p4B/train.txt
Each line contains the name for one training file and its corresponding transcription.
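A minimal sketch of reading the transcription file into a lookup table is shown here. It assumes that the file name is the first whitespace-separated token on each line and that the remaining tokens are the transcription words; the exact layout of train.txt is not specified above, so check it against the actual file. The function name parse_transcriptions is a hypothetical helper.

```python
def parse_transcriptions(lines):
    """Map each training file name to its list of transcription words.

    Assumption: the first token on a line is the file name and the
    rest of the line is the transcription, separated by whitespace.
    """
    table = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        name, *words = line.split()
        table[name] = words
    return table
```

With the file open as f, parse_transcriptions(f) returns a dictionary keyed by file name, which can then be used to write the word-level label files needed for HTK training.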
The EE649 Projects web page contains a link to the ftp site.