This page shows my auditory modeling work, my signal processing work, some of my software tools, and pointers to other work.
Note: My tomography book is now online; all of the chapters can be downloaded. See it here.
There is now a new version of the Auditory Toolbox. It contains Matlab functions to implement many different kinds of auditory models. The toolbox includes code for Lyon's passive longwave model, Patterson's gammatone filterbank, Meddis' hair cell model, Seneff's auditory model, correlograms, and several common representations from the speech-recognition world (including MFCC, LPC, and spectrograms). This code has been tested on Macintosh, Windows, and Unix machines using Matlab 5.2.
Note: This toolbox was originally published as Apple Computer Technical Report #45. The old technical report (PDF and Postscript) and old code (Unix TAR and Macintosh BinHex) are available for historical reference. |
Auditory Toolbox (Version 2.0) |
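For readers who want the flavor of one of the toolbox's representations without Matlab, here is a rough Python sketch of the standard MFCC pipeline (frame, window, power spectrum, mel filterbank, log, DCT). The frame sizes and filter counts below are illustrative defaults, not the toolbox's exact parameters.

```python
import numpy as np

def hz_to_mel(f):
    # O'Shaughnessy's formula, the one commonly used for MFCCs
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, fs=16000, frame_len=400, hop=160, n_filters=20, n_ceps=13):
    """Rough MFCC: frame -> window -> |FFT|^2 -> mel filterbank -> log -> DCT."""
    n_fft = 512
    # Slice the signal into overlapping, Hamming-windowed frames.
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2

    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    log_energy = np.log(power @ fbank.T + 1e-10)

    # DCT-II decorrelates the log filterbank energies; keep the first n_ceps.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_filters)
    return log_energy @ dct.T
```

Each row of the result is one frame's cepstral vector, the representation most speech recognizers of this era start from.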
My primary scientific goal is to understand how our brains perceive sound. My role in this research area is that of a modeler: I build models that explain the neurophysiological and psychoacoustic data. Hopefully these models will help other researchers understand the mechanisms involved and result in better experiments. My latest work in this area is titled "Connecting Correlograms to Neurophysiology and Psychoacoustics" and was presented at the XIth International Symposium on Hearing in Grantham, England, 1-6 August 1997. Two correlograms, one computed using autocorrelation and the other computed using AIM, are shown on the left. | Abstract (and soon demos) |
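At its core, a correlogram frame is a short-time autocorrelation computed separately in each cochlear channel. A minimal Python sketch follows; the cochlear filterbank itself is omitted, and the input is assumed to be something like half-wave rectified filterbank output (an assumption for illustration, not my research code):

```python
import numpy as np

def correlogram_frame(channels, max_lag):
    """One correlogram frame: per-channel autocorrelation over lags 0..max_lag-1.

    channels: array of shape (n_channels, n_samples), e.g. half-wave
    rectified cochlear filterbank output for one analysis window.
    """
    n_ch, n = channels.shape
    frame = np.zeros((n_ch, max_lag))
    for lag in range(max_lag):
        # Correlate each channel with a delayed copy of itself.
        frame[:, lag] = np.sum(channels[:, :n - lag] * channels[:, lag:], axis=1)
    # Normalize by the zero-lag energy so every channel peaks at 1.
    return frame / np.maximum(frame[:, :1], 1e-12)
```

A periodic sound with period T samples produces a ridge at lag T across channels; summing the frame over channels gives a summary autocorrelation whose largest peak estimates the pitch.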
The information in most auditory models flows exclusively bottom-up, yet there is increasing evidence that a great deluge of information flows down from the cortex. A paper I wrote for the 1995 Computational Auditory Scene Analysis workshop is called "A Critique of Pure Audition". This paper was greatly refined and published in the book Computational Auditory Scene Analysis (Erlbaum, 1998). The figure at the left shows the spectrogram of sine-wave speech. | Book chapter (153k pdf) |
I have written several papers describing how to convert auditory representations into sounds. I have built models of the cochlea and central auditory processing, which I hope will both explain auditory processing and allow us to build auditory sound separation tools. These papers describe the process of converting sounds into cochleagrams and correlograms, and then converting these representations back into sounds. Unlike the printed versions of this work, the web page includes audio file examples. It includes better spectrogram inversion techniques, a description of how to invert Lyon's passive cochlear model, and a description of correlogram inversion. This material was first presented as part of the Proceedings of the ATR Workshop on "A Biological Framework for Speech Perception and Production" published in September 1994. A more refined version of this paper was an invited talk at the 1994 NIPS conference. The image on the left shows the spectrogram of one channel of cochlear output, one step in the correlogram inversion process. | ATR (Kyoto) Workshop Web Reprint with Sound Examples |
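The basic idea behind magnitude-only spectrogram inversion is to alternate between the time and frequency domains, keeping the known magnitudes and refining the phase estimate, in the spirit of Griffin and Lim's iteration. A Python sketch of that core loop (the window, hop, and iteration count here are illustrative choices, not the settings from the papers above):

```python
import numpy as np

def stft(x, win, hop):
    n = len(win)
    frames = [x[i:i + n] * win for i in range(0, len(x) - n + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

def istft(spec, win, hop):
    n = len(win)
    frames = np.fft.irfft(spec, n, axis=1)
    x = np.zeros(hop * (len(frames) - 1) + n)
    norm = np.zeros_like(x)
    for i, f in enumerate(frames):      # overlap-add, then undo the window
        x[i * hop:i * hop + n] += f * win
        norm[i * hop:i * hop + n] += win ** 2
    return x / np.maximum(norm, 1e-8)

def invert_magnitude(mag, win, hop, n_iter=50):
    """Griffin-Lim style: keep the target magnitudes, iterate on the phase."""
    spec = mag * np.exp(2j * np.pi * np.random.default_rng(0).random(mag.shape))
    for _ in range(n_iter):
        x = istft(spec, win, hop)
        spec = mag * np.exp(1j * np.angle(stft(x, win, hop)))
    return istft(spec, win, hop)
```

Each pass projects the signal onto the set of valid STFTs and back onto the set of spectra with the target magnitude, so the magnitude error shrinks monotonically.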
Pattern Playback is the term Frank Cooper used to describe his successful efforts to paint spectrograms on plastic and then convert them into sound. I surveyed Pattern Playback techniques, from Frank Cooper's efforts to my own work with auditory model inversion, in a paper published at the 1995 IEEE International Conference on Systems, Man, and Cybernetics. My paper is titled "Pattern Playback from 1950 to 1995". The image at the left shows a portion of one of Cooper's spectrograms. | Web Version |
"Auditory Model Inversion for Sound Separation" is the first paper to describe correlogram inversion techniques. We also discuss improved methods for inverting spectrograms and a cochlear model designed by Richard F. Lyon. This paper was published at ICASSP '94. | Postscript (1.5M) |
"A Perceptual Pitch Detector" is a paper that describes a model of human pitch perception. It is similar to work done by Meddis and Hewitt and published in JASA, but this paper has more real-world examples. This paper was published at ICASSP '90. | Postscript (3M) |
"On the importance of time" is an invited chapter by Dick Lyon and myself in the book Visual Representations of Speech Signals (edited by Martin Cooke, Steve Beet and Malcolm Crawford, John Wiley & Sons). This tutorial describes the reason that we think time-domain processing is important when modeling the cochlea and higher-level processing. | Postscript |
"Lyon's Cochlear Model" is a Mathematica notebook that describes an implementation of simple (but efficient) cochlear model designed by Richard F. Lyon. It is also known as Apple Technical Report #13. | Mathematica Notebook (1.2M) |
A software package called MacEar implements the latest version of Lyon's Cochlear Model. MacEar is written in very portable C for Unix and Macintosh computers. This link points to the last published version (2.2). (Note that the included README file shows old program results; the names of the output files have changed and there are a couple of extra channels in the output. I'm sorry for the confusion.) | Unix Shell Archive with Sources |
Gammatone Math is a Mathematica notebook that describes a new, more efficient implementation of the gammatone filters that are often used to implement critical-band models. It is also known as Apple Technical Report #35. | Mathematica Notebook (327k) |
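For reference, the gammatone impulse response is a gamma-distribution envelope times a tone: t^(n-1) e^(-2πbt) cos(2πf_c t). The Python sketch below samples it directly; this is the naive form, not the notebook's efficient recursive implementation, and the ERB bandwidth formula is the usual Glasberg-Moore choice rather than anything specific to the notebook.

```python
import numpy as np

def gammatone_ir(fc, fs, n=4, duration=0.025):
    """Sampled gammatone impulse response: t^(n-1) e^{-2 pi b t} cos(2 pi fc t)."""
    t = np.arange(int(duration * fs)) / fs
    # Bandwidth from the equivalent rectangular bandwidth (ERB) at fc.
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)
    b = 1.019 * erb
    g = t ** (n - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))          # normalize the peak to 1

def gammatone_filter(x, fc, fs):
    """Filter a signal by direct convolution with the impulse response."""
    return np.convolve(x, gammatone_ir(fc, fs))[:len(x)]
```

Direct convolution like this is O(N) per output sample per channel; the point of the notebook is that a recursive (IIR) formulation gets the same response for a few multiplies per sample.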
Apple Hearing Demo Reel was published as Apple Technical Report #25. It includes more than one hour of correlogram videos, including a large fraction of the ASA Auditory Demonstration CD. I have a limited number of NTSC copies left. Send email to malcolm@interval.com to request a copy. | HTML Video Guide |
Chris Bregler, Michele Covell, and I developed a technique we call Video Rewrite to automatically synthesize video of talking heads. This technology is cool because we use a purely data-driven approach (concatenative triphone video synthesis) to create new video of a person speaking. Given new audio, we concatenate the best sequence of lip images and morph them into a background sequence. We can automatically create sequences like the Kennedy and Johnson scenes in the movie "Forrest Gump." | Original SIGGRAPH '97 Paper (with examples) |
We studied how adults convey affective messages to infants using prosody. We did not attempt to recognize the words, let alone to distill more nebulous concepts such as satire or irony. We analyzed speech with low-level acoustic features and discriminated approval, attentional bids, and prohibitions from adults speaking to their infants. We built automatic classifiers to create a system, Baby Ears, that performs the task that comes so naturally to infants. The image on the left shows one of the decision surfaces which classifies approval, attention and prohibition utterances on the basis of their pitch. | Web Page |
I was able to help Michele Covell do some neat work on time-compression of audio. Lots of people know how to compress a speech utterance by a constant amount. But if you want to do better, which parts of the speech signal can be compressed the most? This paper describes a good technique and shows how to test the resulting comprehension. | Conference Paper, Technical Report with Audio Samples |
Eric Scheirer and I worked on a system for discriminating between speech and music in an audio signal. This paper describes a large number of features, how they can be combined into a statistical framework, and the resulting performance on discriminating signals found on radio stations. The results are better than anybody else's results. (That comparison is not necessarily valid since there are no common testing databases. We did work hard to make our test set representative.) This paper was published at the 1997 ICASSP in Munich. The image on the left shows clouds of our data. | Web Page |
Work we've done to morph between two sounds is described in a paper at the 1996 ICASSP. This work is new because it extends previous audio morphing work to include inharmonic sounds. This paper uses results from Auditory Scene Analysis to represent, match, warp, and then interpolate between two sounds. The image on the left shows the smooth spectrogram, one of two independent representations used when morphing audio signals. | Web Page |
I wrote an article describing my experiences writing "intelligent" signal processing documents. My Mathematica notebook "Lyon's Cochlear Model" was the first large document written with Mathematica. While I don't use Mathematica as much as I used to, I still believe that intelligent documents are a good way to publish scientific results. These ideas were also published in the book "Knowledge Based Signal Processing" from Prentice Hall. | KBSP Book Chapter in Adobe PDF (3M) |
I have written Matlab m-functions that read and write QuickTime movies. The WriteQTMovie code is more general than previous solutions for creating movies in Matlab. It runs on any platform that Matlab runs on. It also lets you add sound to the movie. The ReadQTMovie code reads and parses JPEG-compressed movies. | Matlab Source Code |
Chris Bregler and I coded an implementation of an image processing technique known as snakes. There are two m-files that implement a type of dynamic contour following popular in computer vision. First proposed by Kass, Witkin and Terzopoulos in 1987, snakes are a variational technique to find the best contour that aligns with an image. The basic routine, snake.m, aligns a sequence of points along a contour to the maximum of an array or image. Given an image, a set of starting points, and limits on the search space, it returns a new set of points that better align with the image. The second m-file is a demonstration script: using your own array of image data, or a built-in default, it displays a demo window where you can click to indicate points and see the snake program in action. | Matlab Source Code |
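To give the flavor of the algorithm (this is not the m-file code): a greedy variant of the snake iteration moves each contour point to the position in a small window that best trades image value against smoothness. A Python sketch, where the window size and smoothness weight are illustrative:

```python
import numpy as np

def snake_step(image, pts, alpha=0.5, window=2):
    """One greedy snake iteration over a closed contour.

    Each point moves to the spot in a (2*window+1)^2 neighborhood that
    maximizes image value minus alpha times the squared distance to the
    midpoint of its two neighbors (a simple smoothness penalty).
    """
    new_pts = pts.copy()
    n = len(pts)
    for i in range(n):
        prev_pt, next_pt = new_pts[(i - 1) % n], pts[(i + 1) % n]
        mid = (prev_pt + next_pt) / 2.0
        best, best_score = pts[i], -np.inf
        r0, c0 = int(pts[i][0]), int(pts[i][1])
        for dr in range(-window, window + 1):
            for dc in range(-window, window + 1):
                r, c = r0 + dr, c0 + dc
                if 0 <= r < image.shape[0] and 0 <= c < image.shape[1]:
                    cand = np.array([r, c], float)
                    score = image[r, c] - alpha * np.sum((cand - mid) ** 2)
                    if score > best_score:
                        best, best_score = cand, score
        new_pts[i] = best
    return new_pts
```

Iterating this step pulls the contour onto bright image features while the penalty term keeps it smooth; the variational formulation in the m-files solves the same trade-off with a proper energy minimization rather than a greedy search.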
Dick de Ridder and his colleagues wrote a nice description of a Support Vector Classifier and provided some code to demonstrate how it works. I added a Graphical User Interface (GUI) so I could play with all the options and put lots of data through it. With the GUI, you select points with the mouse. After you tell it what kind of distance metric you want, you get several plots showing the results. The links at the right show a number of points separated by a fourth-order polynomial. | Image showing GUI, Image showing points and support |
Michele Covell and I wrote some Matlab code to compute multi-dimensional scaling (MDS). MDS allows you to reconstruct an estimate of the position of points, given just relative distance data. These routines handle both metric (where you know distances) and non-metric (where you know only the order of distances) data. | Technical report containing the code (no documentation). |
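The metric case has a classical closed-form solution: double-center the squared distances to recover an inner-product matrix, then read coordinates off its top eigenvectors. A Python sketch (not the Matlab routines themselves):

```python
import numpy as np

def classical_mds(d, k=2):
    """Classical (metric) MDS: recover k-dimensional coordinates from a
    matrix of pairwise distances, up to rotation and translation."""
    n = d.shape[0]
    # Double-center the squared distances to get an inner-product matrix B.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # Coordinates are the top-k eigenvectors scaled by sqrt(eigenvalue).
    w, v = np.linalg.eigh(b)
    idx = np.argsort(w)[::-1][:k]
    return v[:, idx] * np.sqrt(np.maximum(w[idx], 0))
```

The recovered coordinates match the originals only up to rotation, reflection, and translation, since pairwise distances are invariant to all three.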
The SoundAndImage toolbox is a collection of Matlab tools to make it easier to work with sounds and images. On the Macintosh, tools are provided to record and play back sounds through the sound system, and to copy images to and from the scrapbook. For both Macintosh and Unix systems, routines are provided to read and write many common sound formats (including AIFF). Only 68k MEX files are included; users on other machines will need to recompile the software. This toolbox is published as Apple Computer Technical Report #61. | Postscript Documentation (153k) |
Filter Design is a Mathematica notebook that describes (and implements) many IIR filter design techniques. It was published as Apple Technical Report #34. | Mathematica Notebook (556k) |
I created a HyperCard stack to make it easier for people with a Macintosh and CDROM drive to interact with the Acoustical Society of America's Auditory Demonstrations CD. This CD is a wonderful collection of auditory effects and principles. The ASA Demo HyperCard stack includes the text and figures from the book and lets you browse the Audio CD. | Macintosh Archive |
I wrote a program for the Macintosh 660/AV and 840/AV computers that uses the DSP (AT&T3210) to monitor audio levels. VUMeters runs on any Macintosh with the AT&T DSP chip. Source and binaries are included. | Macintosh Archive |
Bill Stafford and I wrote TCPPlay to allow us to play sounds from a Unix machine over the network to the Macintosh on our desks. This archive includes Macintosh and Unix source code and the Macintosh application. There are other network audio solutions, but this works well on the Macintosh. | Macintosh Archive |
In a past life, I worked on medical imaging. A book on tomographic imaging (cross-sectional x-ray imaging) was published by IEEE Press: Avinash C. Kak and Malcolm Slaney, Principles of Computerized Tomographic Imaging (New York: IEEE Press, c1988). The software used to generate many of the tomographic images in this book is available. The parallel-beam reconstruction on the left was generated with the command gen n=100 k=100 if=lib.d.s |
Tomographic Software (Unix TAR format) |
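The parallel-beam reconstruction mentioned above is, in essence, filtered backprojection: ramp-filter each projection, then smear it back across the image along its viewing angle. A Python sketch of the idea (the interpolation and filter windowing are simplified relative to the book's software):

```python
import numpy as np

def filtered_backprojection(sinogram, angles):
    """Parallel-beam filtered backprojection.

    sinogram: (n_angles, n_detectors) array of line-integral projections.
    angles:   projection angles in radians.
    """
    n_det = sinogram.shape[1]
    # Ramp filter |f| applied in the frequency domain, per projection.
    freqs = np.fft.fftfreq(n_det)
    filtered = np.real(np.fft.ifft(np.fft.fft(sinogram, axis=1)
                                   * np.abs(freqs), axis=1))
    # Backproject: each pixel accumulates the filtered sample it projects onto.
    c = (n_det - 1) / 2.0
    y, x = np.mgrid[:n_det, :n_det] - c
    image = np.zeros((n_det, n_det))
    for proj, theta in zip(filtered, angles):
        t = x * np.cos(theta) + y * np.sin(theta) + c   # detector coordinate
        ti = np.clip(t, 0, n_det - 2)
        i0 = ti.astype(int)                             # linear interpolation
        frac = ti - i0
        image += proj[i0] * (1 - frac) + proj[i0 + 1] * frac
    return image * np.pi / len(angles)
```

With enough evenly spaced angles, the backprojected ridges reinforce inside the object and cancel outside, recovering the original density.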
Code to implement the diffraction tomography algorithms in my PhD Thesis is also available. | Compressed Unix TAR File |
Carl Crawford, Mani Azimi and I wrote a simple Unix plotting package called qplot. Both two-dimensional and 3D surface plots are supported. | Compressed Unix TAR File |
Now obsolete code to implement a DITroff previewer under SunView is available. This program was called suntroff and is an ancestor of the X Window System Troff previewer. It was written while I was an employee of Schlumberger Palo Alto Research. All files are compressed Unix TAR files. | Source |
Malcolm Slaney
The best way to reach me is to send email.
Interval Research, Inc.
1801 Page Mill Road, Building C
Palo Alto, CA 94304
(650) 842-6143
(650) 565-7944 (FAX)
This page last updated on March 1, 2000.