Department of Computer Engineering and Computer Science
University of Missouri-Columbia
Columbia, MO 65211
Office Hours: Wed. 3:30-5:50 PM
This course covers the theory and techniques of spoken language
processing, including speech production and perception, speech
analysis, enhancement, coding, synthesis, recognition, and language
modeling. The course focuses on statistical models of speech and
language, including hidden Markov models, the EM algorithm, and a number
of recently developed statistical learning algorithms, which are powerful
tools for speech processing and beyond. State-of-the-art spoken
language systems and their applications will also be surveyed.
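To give a flavor of the statistical models mentioned above, the following is a minimal sketch of the forward algorithm, which computes the likelihood of an observation sequence under a discrete hidden Markov model. It is not taken from the course materials: the function name forward_likelihood and the toy two-state model are assumptions made up for illustration.

```python
def forward_likelihood(pi, A, B, obs):
    """Return P(obs | model) for a discrete-observation HMM.

    pi  : initial state probabilities, pi[i]
    A   : transition probabilities, A[i][j] = P(next state j | state i)
    B   : emission probabilities, B[i][k] = P(symbol k | state i)
    obs : sequence of observed symbol indices
    """
    n = len(pi)
    # Initialization: probability of starting in each state and emitting obs[0].
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: propagate through the transitions, then weight by the emission.
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination: sum over all final states.
    return sum(alpha)


# Hypothetical two-state, two-symbol model; the numbers are illustrative only.
pi = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.9, 0.1],
     [0.2, 0.8]]
likelihood = forward_likelihood(pi, A, B, [0, 1, 0])
```

For long sequences the forward probabilities underflow, so a practical version would scale them at each step or work in the log domain.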
- Speech production: physical mechanism and digital model of speech
- Speech perception: human auditory model including mel-frequency,
critical band, temporal-spectral masking, etc.
- Speech analysis: short-time spectral analysis, linear predictive coding,
and cepstral analysis.
- Speech enhancement: varieties of acoustic environment conditions and
their degradation effects, enhancement of speech waveform, estimation of
speech spectra and compensation of speech models.
- Speech coding: waveform coding and low-bit rate coding based on speech
production and auditory models.
- Speech synthesis: text analysis, phrasing and intonation, and letter to
- Speech recognition: representation of speech features, modeling of
speech units by templates and statistical models, pattern matching based
on dynamic programming and hidden Markov models.
- Language modeling: self-organizing statistical language models such as
n-grams and their usage in modeling syntax and semantics of natural
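As a small illustration of the n-gram models named in the language modeling topic, here is a sketch of a bigram model estimated by relative frequency. The function name train_bigram, the sentence boundary markers, and the three-sentence corpus are hypothetical, and no smoothing is applied, which a real system would need for unseen word pairs.

```python
from collections import Counter


def train_bigram(sentences):
    """Estimate P(word | previous word) by relative frequency (no smoothing)."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # "<s>" and "</s>" are assumed sentence-boundary markers.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return {pair: count / context_counts[pair[0]]
            for pair, count in bigram_counts.items()}


# Toy corpus, made up for illustration.
corpus = ["call home", "call the office", "call home now"]
model = train_bigram(corpus)
p_home_given_call = model[("call", "home")]  # 2 of the 3 "call" contexts
```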
Students are expected to complete homework assignments, take a final exam,
review literature, and complete a final project. Class time will be used
mainly for lectures by the instructor; students are required to
participate in class discussions, present literature reviews and final
Spoken language processing is an important research area in information
technology and is multidisciplinary in nature. The proposed course
covers a broad spectrum of knowledge in engineering and science and
addresses advanced topics in statistical estimation. The course
therefore merits graduate credit. The course prerequisites are
introductory-level background in statistics and signal processing,
and experience with programming and Unix systems.
- Instructor's handouts
- L.R. Rabiner and B.H. Juang, Fundamentals of Speech Recognition
Prentice Hall, 1993
Final Project Presentation
The final project presentation is scheduled for Friday the 17th at 3:30
PM. Each presentation shall take 12 minutes including questions. The
reports must be submitted during the presentation.
The order of the presentations is alphabetical:
- He, Xiaodong
- Huang, Huan
- Li, Wei
- Mandava, Swapna
- Ravindran, Karthik
- Vass, Jozsef
- Yao, Jia
- Zhang, Xiao
- Zhang, Xuping
Project 1: Basics of Speech Processing
Because of network problems with Dr. Zhao's machine, the first assignment is
postponed until the week of the 13th.
- Assign date: September 14, 1999.
- Project 1 is due on September 28, 1999.
Project 2: Simple Speech Recognizer
- Source code for basic functions is available.
- Filter file is available.
- Assign date: October 7, 1999.
- Project 2 is due on October 28, 1999.
Project 3: DP, EM, and HMM
- Data file from a Gaussian mixture density source.
- Assign date: November 4, 1999.
- Project 3 is due on November 18, 1999.
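For orientation on the EM part of this project, below is a minimal sketch of EM for a one-dimensional Gaussian mixture. It is not the project solution or the instructor's code: the function names, the synthetic two-component data, the initial parameters, and the variance floor are all assumptions chosen for illustration, and the project data file is not used.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_step(data, weights, means, variances):
    """One EM iteration for a 1-D Gaussian mixture.

    Returns the updated parameters and the log-likelihood of the data
    under the parameters that were passed in.
    """
    k = len(weights)
    # E-step: posterior probability of each component for each sample.
    posteriors, log_lik = [], 0.0
    for x in data:
        joint = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                 for j in range(k)]
        total = sum(joint)
        log_lik += math.log(total)
        posteriors.append([p / total for p in joint])
    # M-step: re-estimate parameters from posterior-weighted statistics.
    new_w, new_m, new_v = [], [], []
    for j in range(k):
        nj = sum(post[j] for post in posteriors)
        mean = sum(post[j] * x for post, x in zip(posteriors, data)) / nj
        var = sum(post[j] * (x - mean) ** 2
                  for post, x in zip(posteriors, data)) / nj
        new_w.append(nj / len(data))
        new_m.append(mean)
        new_v.append(max(var, 1e-6))  # assumed floor to avoid variance collapse
    return new_w, new_m, new_v, log_lik


# Synthetic data from two Gaussians (illustrative stand-in for real data).
random.seed(0)
data = ([random.gauss(-2.0, 1.0) for _ in range(200)]
        + [random.gauss(3.0, 0.5) for _ in range(200)])

weights, means, variances = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
history = []  # log-likelihood before each update
for _ in range(30):
    weights, means, variances, log_lik = em_step(data, weights, means, variances)
    history.append(log_lik)
```

A useful sanity check when debugging an EM implementation is that the log-likelihood recorded in history never decreases from one iteration to the next.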
- Final project description in PostScript format.
- Data file for the final project in gzipped tar
format (size: 3.6 MB)
These lecture notes are in PDF format. You will need Adobe Acrobat Reader to read them.
8/24/99: Note 1
8/26/99: Note 2
8/31/99: Note 3
9/2/99: Note 4
9/14/99: Note 5
9/16/99: Note 6
9/21/99: Note 7
9/23/99: Note 8
9/28/99: Note 9
9/30/99: Note 10
10/5/99: Note 11
10/7/99: Note 12
10/12/99: Note 13
10/14/99: Note 14
10/15/99: Note 15
10/21/99: Note 16
10/26/99: Note 17
10/28/99: Note 18
CECS Multimedia Communications and Visualization Laboratory
Last revised: Nov.