Parallel Processing Facilities

Parallel Processing Laboratory
School of Electrical Engineering
Purdue University
West Lafayette, IN 47907-1285 USA
Phone: 317 494 3431
Last update: April 1995

This document is available as full hypertext at WWW URL http://dynamo.ecn.purdue.edu/~hankd/PPL/.
A hypertext index of the sections appears at the end of this document.

Overview

The Parallel Processing Laboratory (PPL) consists of a growing collection of parallel computer systems available to faculty and graduate students for architecture, operating systems, languages, software tools, and applications research. The lab includes a 684 square-foot workstation room and an adjoining 531 square-foot machine room. The workstation room contains a number of terminals and workstations, including five SPARC 5 workstations, two SPARC 1 workstations, and two IBM RS6000 servers. Current parallel machine facilities include the 16-processing element PASM prototype, two PAPERS clusters, a 16,384-processing element MasPar MP-1 machine, a 16-processor Transputer-based system, a 64-node nCUBE 2 hypercube system, and a Xilinx development system. These are described briefly below. Additionally, lab users have access to an Intel Paragon XP/S machine and a collection of parallel machines at the Argonne National Laboratory (including a CM-2, DAP-510, and BBN Butterfly).

Faculty and students are involved with numerous research projects using the Parallel Processing Laboratory. A course on the programming of the PASM, MasPar, and Paragon machines is offered.

For more information about the laboratory, contact the Parallel Processing Laboratory coordinator, Prof. H. J. Siegel, at the laboratory address, by phone at 317 494 3444, or by e-mail to hj@ecn.purdue.edu.


[Photograph of PASM](86K .gif)

The PASM Parallel Processing System

The first parallel computer fully conceived, designed, and constructed in the School of Electrical Engineering at Purdue University was PASM (shown above). The PASM (Partitionable SIMD/MIMD) parallel processing system, fondly known as the Purdue All Star Machine, is a 16-processing element small-scale proof-of-concept prototype housed in the Parallel Processing Lab. PASM is a research tool for the study of the design and use of parallel architectures. The PASM organization could support 1,024 processing elements working together to perform tasks. PASM can be partitioned into independent or communicating submachines of various sizes that can change dynamically. Two modes of parallelism are supported in PASM: SIMD and MIMD. In SIMD (single instruction - multiple data) mode, all processors execute the same instructions simultaneously, but on different data sets. In MIMD (multiple instruction - multiple data) mode, each processor can independently execute a different program. Submachines in PASM can independently and dynamically switch between the SIMD and MIMD modes at instruction level granularity. This allows PASM to perform "mixed-mode" computations that use these modes in an interleaved fashion. The processing elements in PASM are interconnected with a flexible multistage cube network that allows connection patterns to be changed. Thus, PASM is dynamically reconfigurable along three dimensions: partitionability, modes of parallelism, and connections among processing elements. The PASM prototype hardware provides all three dimensions of reconfigurability and is supporting active experimentation.
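The difference between the two modes, and the idea of interleaving them, can be illustrated with a small sequential simulation. The Python sketch below is only a toy model, not ELP or PASM code. The function names `run_simd` and `run_mimd`, the four-PE size, and the sample data are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Op = Callable[[int], int]


def run_simd(instruction: Op, local_data: Sequence[int]) -> List[int]:
    """SIMD step: one instruction is broadcast; every PE applies it to its own data."""
    return [instruction(x) for x in local_data]


def run_mimd(programs: Sequence[Op], local_data: Sequence[int]) -> List[int]:
    """MIMD step: each PE runs its own program on its own data."""
    if len(programs) != len(local_data):
        raise ValueError("need exactly one program per PE")
    return [prog(x) for prog, x in zip(programs, local_data)]


# Four hypothetical PEs, each holding one local value.
data = [1, 2, 3, 4]

# SIMD phase: every PE doubles its value.
after_simd = run_simd(lambda x: 2 * x, data)

# MIMD phase: each PE does something different with its value.
programs = [lambda x: x + 1, lambda x: x - 1, lambda x: x * x, lambda x: -x]
after_mixed = run_mimd(programs, after_simd)

print(after_simd)   # [2, 4, 6, 8]
print(after_mixed)  # [3, 3, 36, -8]
```

Running a SIMD phase followed by a MIMD phase on the same data is a crude picture of a mixed-mode computation. The real machine switches modes in hardware at instruction-level granularity, which this model does not attempt to capture.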

The PASM prototype consists of 29 Motorola MVME-110 single-board microcomputers using either MC68000 or MC68010 microprocessors, a total of 50 Mbytes of semiconductor memory, and 80 Mbytes of disk space. The PASM prototype is accessible over the Purdue Engineering Computer Network. High-level language programming is performed by using a C-based parallel language for explicit specification of parallelism, called ELP (Explicit Language for Parallelism). It allows PASM to be programmed in SIMD mode, SPMD mode (the single program - multiple data subset of MIMD), or a hybrid SIMD/SPMD mode. Furthermore, programming can be done in standard C augmented with macros and libraries that permit programming constructs to be expressed and compiled for SIMD, full MIMD, and mixed-mode. A parallel assembly language is also available.

Current projects associated with PASM include: extending the ELP language and compiler to allow full MIMD mode and full mixed-mode program constructs, examining the trade-offs involved with mixed-mode parallelism, performing application experiments on the prototype and extrapolating to larger machines, exploring ways to use partitioning, mapping tasks onto reconfigurable parallel machines, investigating instrumentation for performance monitoring and debugging, developing the operating system, studying an automatic reconfiguration system for improving performance and fault tolerance, and researching hardware design improvements. The PASM prototype is a constantly evolving tool for understanding the programming and design of parallel processing systems.

The professor in charge of the PASM system is H. J. Siegel, phone: 317 494 3444, e-mail: hj@ecn.purdue.edu.


[Photograph of PAPERS](45K .gif)

The PAPERS Parallel Processing Clusters

Like PASM, PAPERS (Purdue's Adapter for Parallel Execution and Rapid Synchronization - shown above) implements a parallel computer fully conceived, designed, and constructed within the School of Electrical Engineering. It is also similar to PASM in that it duplicates or extends key features that first appeared there: PAPERS supports arbitrary execution-time partitioning into independent submachines, and each of those submachines uses special barrier synchronization hardware to efficiently execute code with a fine-grain parallel structure that is SIMD, MIMD, or even VLIW (Very Long Instruction Word). However, PAPERS is different from PASM in two fundamental ways.

The first major difference between PASM and PAPERS is that PASM is a custom design at the board level, whereas PAPERS is designed to take full advantage of unmodified UNIX-based workstations or PCs (personal computers) as its processing elements. PAPERS simply connects these machines using their standard Centronics-compatible printer ports. Although this interface provides lower performance than could be obtained using custom boards, this mechanism allows the same PAPERS hardware to be used with virtually any type of computer. It is also significant that these ports are directly accessible from a user process under most versions of UNIX (e.g., under Linux on a PC). The result is that the latency of basic synchronization and communication operations is several orders of magnitude better than that achieved using conventional workstation networks such as Ethernet, FDDI, or ATM; PAPERS actually delivers significantly lower total hardware + software latency than most parallel supercomputers. Typical PAPERS latency is 3 microseconds, and latency can be less than 200 nanoseconds, depending on the port hardware used in each machine and wire lengths between machines.

The second difference between PASM and PAPERS centers on the reasons that these machines have been built. The PASM prototype was designed as a proof-of-concept research and educational tool; absolute execution speed and compactness of implementation were not primary concerns. In contrast, PAPERS is designed to be cheaply and easily replicated, with hardware design and software tightly integrated and targeted for public domain release. The simplest PAPERS, TTL_PAPERS, uses just eight TTL chips to connect four machines, providing both barrier synchronization and a variety of aggregate communication operations. The software being developed for PAPERS is not aimed at mixed-mode experiments (as PASM's ELP), but at efficiently implementing familiar software environments: an optimizing compiler for MPL (MasPar's parallel C dialect) and an optimizing and parallelizing compiler for a Fortran dialect are being constructed. A parallel "meta OS" called PEN (PAPERS ENvironment) has also been developed to handle timesharing of parallel jobs across a cluster.
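The combination of barrier synchronization and aggregate communication can be sketched in software. The Python example below is a minimal illustration, not the PAPERS library or its interface: threads stand in for the four machines of a TTL_PAPERS cluster, `threading.Barrier` stands in for the barrier hardware, and the names `worker`, `contributions`, and `results` are hypothetical.

```python
import threading

NUM_MACHINES = 4  # TTL_PAPERS connects four machines

barrier = threading.Barrier(NUM_MACHINES)
contributions = [0] * NUM_MACHINES  # one slot per simulated machine
results = [0] * NUM_MACHINES


def worker(rank: int) -> None:
    # Each simulated machine publishes one local value.
    contributions[rank] = (rank + 1) * 10
    # No machine proceeds until all of them have reached the barrier.
    barrier.wait()
    # Past the barrier every contribution is visible, so an aggregate
    # operation (here a sum over all machines) is safe to compute.
    results[rank] = sum(contributions)


threads = [threading.Thread(target=worker, args=(r,)) for r in range(NUM_MACHINES)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # [100, 100, 100, 100]
```

Without the barrier, a fast thread could read a slot before its owner had written it. The sketch shows only this ordering guarantee and says nothing about the latency of the real hardware.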

Currently, the laboratory houses two dedicated PAPERS clusters; a third should be installed by Summer '95. These clusters contain either four or eight machines each, with performance ranging from 16 MFLOPS to over 300 MFLOPS. Because the PAPERS design actually grew out of optimizing and parallelizing compiler technology developed at Purdue (see http://dynamo.ecn.purdue.edu/~hankd/CARP/), PAPERS serves as the primary target for that research.

Development of PAPERS has been supported in part by ONR Grant No. N0001-91-J-4013, donations from Intel, IBM, TI, and AMD, and equipment loans from DEC and IBM.

The professor in charge of PAPERS is Henry G. Dietz, phone: 317 494 3357, e-mail: hankd@ecn.purdue.edu. More information is available via WWW URL http://garage.ecn.purdue.edu/~papers/.


[Photograph of MasPar MP-1](67K .gif)

The MasPar MP-1 System

The MasPar MP-1 is a massively parallel SIMD supercomputer commercially available with 1,024 to 16,384 processing elements (PE's). Our MP-1, shown above, is only the fourth 16,384 PE MasPar built. When it was delivered to our Parallel Processing Laboratory on June 27, 1991, it became the first such machine in a US academic institution. Floating point peak performance is 1.2 GFLOPS single precision, 580 MFLOPS double precision. The peak instruction rate for 32-bit integer operations is 26 GIPS, and is over 200 GIPS for operations on 8-bit integers.

Each PE has a 4-bit ALU, hardware support for floating point operations, 48 32-bit registers, and 16,384 bytes of memory. The system is microcoded so that each PE appears to be a RISC processor supporting operations on 1, 8, 16, 32, and 64-bit data. Communication among PE's is supported by both direct connections from each PE to its eight nearest neighbors in the 128 by 128 array of PE's and a global router multistage network that allows a given PE to establish a path to any other PE. The PE's have access to several Gbytes of parallel disk storage. Controlling the PE array is the ACU (Array Control Unit), which is a high-performance 32-bit RISC processor that also can perform scalar computations. The ACU is, in turn, controlled by the front-end machine, currently a DEC 5000 that runs Ultrix and sees the MP-1 essentially as a peripheral. Hence, the front-end serves primarily as a user interface - editing, compiling, debugging, etc., are supported by the front-end.
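The array geometry can be checked with a few lines of arithmetic. The Python sketch below computes the PE count, the total PE memory, and the eight nearest neighbors of a PE in the 128 by 128 array. The function name `eight_neighbors` is hypothetical, and the wrap-around at the array edges is an assumption, since the description above does not say how edge PEs are connected.

```python
GRID_SIDE = 128        # 128 by 128 array of PEs
BYTES_PER_PE = 16_384  # local memory per PE


def eight_neighbors(row, col, side=GRID_SIDE):
    """Return (row, col) coordinates of the eight nearest neighbors of a PE.

    Assumption: the array edges wrap around (a torus).
    """
    return [
        ((row + dr) % side, (col + dc) % side)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    ]


total_pes = GRID_SIDE * GRID_SIDE
total_memory_mib = total_pes * BYTES_PER_PE // (1024 * 1024)

print(total_pes)         # 16384
print(total_memory_mib)  # 256
print(eight_neighbors(0, 0))
```

Any PE outside this neighborhood would be reached through the global router rather than the direct connections.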

The user interface for the MP-1 provides a relatively standard Unix programming environment augmented by MPPE, an interactive debugger with visualization support using X windows graphics. Programs for the MP-1 can be written in extended dialects of Fortran or C, called MPF and MPL, respectively. The MP-1 is accessible over the Purdue Engineering Computer Network.

The purchase of the MasPar was funded in part by National Science Foundation (NSF) award number CDA-9015696; all published work involving the PPL MasPar should acknowledge both the PPL and the NSF grant.

The professor in charge of the MasPar MP-1 system is Henry G. Dietz, phone: 317 494 3357, e-mail: hankd@ecn.purdue.edu. Locally-written documentation overviewing use of the MasPar is available online as http://dynamo.ecn.purdue.edu/~hankd/PPL/MASPAR/paper.html.Z and http://dynamo.ecn.purdue.edu/~hankd/PPL/MASPAR/standard.ps.Z


The Transputer Development System

In cooperation with InMOS corporation, the lab obtained a 16-node mesh-connected parallel system based on the Transputer processor. The mesh interconnection is only one of several configurations possible for this flexible system. The system consists of a PC-based cross development environment supporting the OCCAM programming language. Research on this machine includes efforts aimed at building general-purpose non-shared memory computer systems and automated environments for the support of programming such machines.

Although this machine is now considered obsolete, it is still available for use. The professor in charge of the Transputer system is Henry G. Dietz, phone: 317 494 3357, e-mail: hankd@ecn.purdue.edu.


The NCR GAPP Array

This SIMD machine with 2,304 1-bit processing elements is obsolete and is no longer in active service.


The nCUBE 2 System

The School of Electrical Engineering (in cooperation with the Computer Sciences Department) has a 64-node nCUBE 2 hypercube multiprocessor. This machine is a distributed-memory MIMD computer using an n-dimensional "hypercube" topology to support communication among the processors. Each node of the hypercube is a custom-VLSI single-chip 64-bit microprocessor capable of 7 MIPS and 3.5 MFLOPS (i.e., there is a potential for 448 MIPS and 224 MFLOPS in the 64-node system), and has an instruction set that supports the IEEE arithmetic standard. Furthermore, each processor contains 14 bidirectional DMA channels running at 20 Mbits/sec to support parallel interprocessor communication and input/output, and each node has four Mbytes of local single-bit error-correcting DRAM.
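The hypercube topology is easy to describe with binary node addresses. The Python sketch below uses the standard textbook convention that two nodes are linked when their addresses differ in exactly one bit. The function names are hypothetical, and the numbering is not taken from nCUBE documentation.

```python
def dimension(num_nodes: int) -> int:
    """Dimension n of a hypercube with 2**n nodes."""
    if num_nodes < 1 or num_nodes & (num_nodes - 1):
        raise ValueError("node count must be a power of two")
    return num_nodes.bit_length() - 1


def neighbors(node: int, dim: int) -> list:
    """Nodes directly linked to `node`: flip each of the dim address bits."""
    return [node ^ (1 << bit) for bit in range(dim)]


def hops(src: int, dst: int) -> int:
    """Minimum number of links between two nodes (Hamming distance)."""
    return bin(src ^ dst).count("1")


dim = dimension(64)
print(dim)                # 6
print(neighbors(0, dim))  # [1, 2, 4, 8, 16, 32]
print(hops(0, 63))        # 6
```

For the 64-node machine this gives a 6-dimensional cube: each node has six direct neighbors, and no two nodes are more than six links apart.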

The entire nCUBE system consists of a host processor and an adjoining hypercube. The host is a Sun 4/470 SPARCserver that communicates with the hypercube through a high-speed peripheral interface board. The hypercube can be decomposed into multiple subcubes and shared by multiple users at one time. The host programming environment is the Unix operating system with interface libraries to handle subcube allocation, process control, and hypercube input-output management. The hypercube runs the Vertex operating system, which supports multiple processes, direct disk file accesses, interprocessor communication, and host-node communication, among other functions. The nCUBE 2 currently supports programming in assembly language, Fortran, and C. Cross compilers for nCUBE 2 Fortran and C are available on the Suns. A source-level symbolic debugger running on the host is also available. The machine is located in the Computer Sciences Department and is accessible through the Purdue Engineering Computer Network.

The School of Electrical Engineering Parallel Processing Laboratory coordinator for the nCUBE 2 system is Prof. Jeff L. Gray, phone: 317 494 3478, e-mail: grayj@ecn.purdue.edu.


The Xilinx Development System

A Xilinx programmable gate-array development system is available to Parallel Processing Laboratory users. It supports the development of application-specific hardware required by experimental research on parallel architectures.

The professor in charge of the Xilinx system is Jose A. B. Fortes, phone: 317 494 3646, e-mail: fortes@ecn.purdue.edu.


The Intel Paragon XP/S System

In addition to the resources available at the Parallel Processing Laboratory, an Intel Paragon XP/S, maintained by the Purdue University Computing Center (PUCC), is also available for research. Purdue's Paragon XP/S Model 10 is a parallel machine with 140 compute nodes, one boot and service node, four service nodes, 13 I/O nodes, three HiPPI/network nodes, and one Ethernet node. Compute nodes, service nodes, two of the three HiPPI nodes, and the boot node have 32 Mbytes of DRAM each. The 13 I/O nodes, one HiPPI node, and the Ethernet node have 16 Mbytes each.
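The node counts above can be totaled with a short calculation. The Python sketch below is plain arithmetic on the figures in the preceding paragraph. It assumes that the "boot and service node" is the "boot node" listed with 32 Mbytes, and the dictionary labels are only illustrative.

```python
# Node counts grouped by per-node DRAM size, as listed in the paragraph above.
nodes_32mb = {
    "compute": 140,
    "boot and service": 1,  # assumed to be the 32-Mbyte "boot node"
    "service": 4,
    "HiPPI (32 Mbyte)": 2,
}
nodes_16mb = {
    "I/O": 13,
    "HiPPI (16 Mbyte)": 1,
    "Ethernet": 1,
}

count_32 = sum(nodes_32mb.values())
count_16 = sum(nodes_16mb.values())

total_nodes = count_32 + count_16
total_dram_mbytes = 32 * count_32 + 16 * count_16

print(total_nodes)        # 162
print(total_dram_mbytes)  # 4944
```

Under these assumptions the machine has 162 nodes and roughly 4.9 Gbytes of DRAM in total, of which 4,480 Mbytes belong to the 140 compute nodes.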

Paragon accounts are available to faculty, graduate students with faculty sponsors, and research staff. An application form can be obtained at the PUCC Accounting Services office in room 231 of the Mathematical Sciences Building.

The School of Electrical Engineering Parallel Processing Laboratory coordinator for the Intel Paragon XP/S system is Prof. Jeff L. Gray, phone: 317 494 3478, e-mail: grayj@ecn.purdue.edu. More information is available via WWW URL http://www-rcd.cc.purdue.edu/Paragon/qs-toc.html.


Argonne Advanced Computing Research Facilities

Through the Argonne National Laboratory we have limited access to the considerable collection of parallel machines available in their Advanced Computing Research Facility. This description has not been updated for some time, but hardware available as of the last update included a CM-2, a DAP-510, and a BBN Butterfly.


Hypertext Index

Overview
The PASM Parallel Processing System
The PAPERS Parallel Processing Clusters
The MasPar MP-1 System
The Transputer Development System
The NCR GAPP Array
The nCUBE 2 System
The Xilinx Development System
The Intel Paragon XP/S System
Argonne Advanced Computing Research Facilities