SWAR: SIMD Within A Register
SIMD (Single Instruction stream, Multiple Data stream) Within A Register (SWAR) isn't a new idea. Given a machine with k-bit registers, data paths, and function units, it has long been known that ordinary register operations can function as SIMD parallel operations on n field values of k/n bits each. For example, a fast population count (number of 1 bits) operation is often implemented using a SWAR reduction algorithm. A larger example is in the 1992 PhD thesis of (my former student) M. Liou, Efficient Algorithms for Fractional Factorial Design Generation, which made extensive use of SWAR techniques to speed up a class of optimization algorithms... but even that was obscure stuff.
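To make the idea concrete, here is a minimal C sketch of two SWAR operations on an ordinary 32-bit word (k = 32). The function names swar_add8 and swar_popcount32 are made up for illustration and are not taken from any Purdue software; the field width of 8 bits (n = 4) in the add is likewise just an assumption chosen for the example. The population count is the well-known mask-and-add reduction, shown as one common way to do it rather than as the only one.

```c
#include <stdint.h>

/* Hypothetical helper: add four packed 8-bit fields lane by lane
 * (each lane wraps modulo 256), using one ordinary 32-bit add.
 * The top bit of each field is handled separately so that no carry
 * leaks from one field into its neighbor. */
static uint32_t swar_add8(uint32_t x, uint32_t y)
{
    const uint32_t high = 0x80808080u;            /* top bit of each field */
    uint32_t low_sum = (x & ~high) + (y & ~high); /* add the low 7 bits of each field */
    return low_sum ^ ((x ^ y) & high);            /* fold in the top bits, carry-free */
}

/* Hypothetical helper: population count by SWAR reduction.
 * Each step sums adjacent fields in parallel, doubling the field
 * width: 1-bit -> 2-bit -> 4-bit -> 8-bit counts, then a final fold. */
static unsigned swar_popcount32(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 16 two-bit counts  */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 8 four-bit counts  */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 4 eight-bit counts */
    x = x + (x >> 8);                                 /* fold bytes together */
    x = x + (x >> 16);
    return x & 0x3Fu;                                 /* total is 0..32     */
}
```

For instance, swar_add8(0x01FF7F80, 0x01010180) gives 0x02008000: the 0xFF + 0x01 and 0x80 + 0x80 lanes wrap to 0x00 without disturbing the lanes beside them. The masking is the price of doing this without hardware support, which is exactly what dedicated SWAR instructions remove.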
Now, multimedia applications are anything but obscure, and SWAR techniques can easily yield 2x to 8x speedup. Recognizing this, the 1997 versions of most microprocessors will incorporate hardware support for SWAR.
So, what are we doing about this here at Purdue ECE? Well, naturally, we are building fully public domain support software. It isn't ready yet, but early in 1997 we plan to release a preliminary version of a compiler that will translate a simple parallel C dialect into SWAR functions that can be called from ordinary C code. The model for the compiler to target is described here. There are also HTML slides from the talk Hank Dietz gave on this topic on Feb. 13, 1997 in Purdue ECE's Parallel Processing Seminar series. Soon, more research papers will be here... and hopefully even a reference to a funding sponsor or two. ;-)
This page was last modified March 04, 1997.