-
General Info
SPECseis96 is one set of code within the SPEC/High-Performance
Group's SPEChpc96 suite, used to evaluate machine performance on industrially
significant computer workloads as well as for scientific study. The seismic
processing suite is a suite of codes typical of seismic processing applications
used in industry in the search for oil and gas. The code consists of four
applications, referred to as four ``phases,'' which perform the seismic
computations: ``Data generation,'' ``Stacking of data,'' ``Time migration,''
and ``Depth migration.'' The entire code contains 15,000 lines of Fortran
and C, and includes intensive communication as well as intensive disk I/O.
The phases are designed to execute in sequence. Within each phase, a
main-loop performs a series of functions on each seismic trace. Phases
2, 3, and 4 require data to be processed and passed across the processors.
However, Phase 1 requires communication only to decompose the data at the start
of the phase and to join the data at the phase's completion. Communication
is implemented using a message-passing layer of MPI or PVM. A shared-memory
version is also being developed. Each processor writes a disjoint segment
of the data file, allowing for non-blocking writing. The seismic functions
performed by each phase are as follows:
Seismic Processing Phases
| phase |
functions |
| Data Generation |
Generate source/receiver geometry, generate seismic data, apply
spatial filters to data, apply predictive deconvolution, apply moveout
corrections |
| Stacking of Data |
Apply residual moveout corrections, sum input traces into zero offset
section |
| Time Migration |
Fourier domain migration |
| Depth Migration |
Finite difference migration |
Each seismic phase consists of a series of seismic processes which perform
certain seismic processing computations or disk I/O.
Description of the Seismic Processes in the Phases of SEIS
| Process |
Description |
| Phase 1: Data Generation |
| VSBF |
Read velocity function and provide access routines |
| GEOM |
Specify source/receiver coordinates |
| DGEN |
Generate seismic data |
| FANF |
Apply two-dimensional spatial filters to data via Fourier transforms |
| DCON |
Apply predictive deconvolution |
| NMOC |
Apply normal move-out corrections |
| PFWR |
Parallel write to output files |
| VRFY |
Compute average amplitude profile as a checksum |
| RATE |
Measure processing rate |
| Phase 2: Stacking of Data |
| PFRD |
Parallel read of input files |
| DMOC |
Apply residual move-out corrections |
| STAK |
Sum input traces into zero offset section |
| PFWR |
Parallel write to output files |
| VRFY |
Compute average amplitude profile as a checksum |
| RATE |
Measure processing rate |
| Phase 3: Time Migration |
| PFRD |
Parallel read of input files |
| M3FK |
Three-dimensional Fourier domain migration |
| PFWR |
Parallel write to output files |
| VRFY |
Compute average amplitude profile as a checksum |
| RATE |
Measure processing rate |
| Phase 4: Depth Migration |
| VSBF |
Read velocity function and provide access routines |
| PFRD |
Parallel read of input files |
| MG3D |
An approximation of a three-dimensional, one-pass, finite difference
migration |
| PFWR |
Parallel write to output files |
| VRFY |
Compute average amplitude profile as a checksum |
| RATE |
Measure processing rate |
Several datasets are available, ranging from a very small ``test'' dataset
(17 MB of disk space) to sizes that are larger than current machines can
handle (4 TB). More details on the code can be found in the documentation
provided by the code authors, ARCO Seismic
Benchmark (ps file).
-
Compiling Seismic
The make environment of Seismic includes the following components:
-
Configuration files:
The include directory has architecture specific makefiles. Two files
will apply to a system, a configuration file for the host system and one
specific to the communication package on the target architecture. These
two files are located in the include directory and are named: makedef.$ARCH
and makedef.${TARGET_ARCH}.${PARALLEL_METHOD}.
-
Makefiles:
There are makefiles in the outer directory and in the sub-directories
of the source directory (src/seis, src/util, ....) These include the configuration
file corresponding to the system you have chosen (by the environment variables.)
-
Library and bin directories:
A library directory should be set up automatically for the system and
parallelism you have chosen as a sub-sub-directory of lib. A library
(libseis.a) containing all compiled source files will be created
in this directory.
Likewise, a directory which will contain the executable will automatically
be set up as a sub-sub-directory of the bin directory, corresponding
to the system and parallelism you have chosen. The executable will be named,
seis.
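As a rough illustration of the naming scheme above, the following Python sketch derives the two configuration file names and the library and executable locations from the environment variables. The `<TARGET_ARCH>/<PARALLEL_METHOD>` layout under lib and bin is an assumption read into the phrase "sub-sub-directory"; the function name `seis_build_paths` and the install path `/opt/SPECseis96` are hypothetical and not part of SPECseis96.

```python
import posixpath

def seis_build_paths(bench, arch, target_arch, parallel_method):
    """Return the files the make environment is described as using.

    The lib/bin sub-sub-directory layout is an assumption, not read
    from the actual makefiles.
    """
    include = posixpath.join(bench, "include")
    subdir = posixpath.join(target_arch, parallel_method)
    return {
        # makedef.$ARCH: configuration for the host system
        "host_config": posixpath.join(include, "makedef." + arch),
        # makedef.${TARGET_ARCH}.${PARALLEL_METHOD}: communication package
        "comm_config": posixpath.join(
            include, "makedef.%s.%s" % (target_arch, parallel_method)),
        "library": posixpath.join(bench, "lib", subdir, "libseis.a"),
        "executable": posixpath.join(bench, "bin", subdir, "seis"),
    }

# Hypothetical install location, used only for illustration.
paths = seis_build_paths("/opt/SPECseis96", "solaris", "SUNMP", "mpi")
for name in sorted(paths):
    print(name, paths[name])
```

Checking that both makedef files exist before running make is one way to catch a mistyped variable early.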
To compile Seismic follow these steps:
-
Set up the environment variables.
These environment variables must be set up to use the makefiles as
is. (The file, setenvironment, is an example of how to set up
these environment variables. You can use it from csh by executing:
source setenvironment.)
Environment Variables for Compilation
| BENCH |
the pathname of the directory where SPECseis96 is installed. |
| ARCH |
the host architecture, such as solaris; it is used to compile code in the
src/seis directory that creates a Fortran file, and to choose architecture
specific code using ifdef's when compiling the C codes. |
| TARGET_ARCH |
the target architecture for the libraries and executable, such as SUNMP
for SUN multi-processor Ultra Enterprise systems. |
| PARALLEL_METHOD |
the communications package to use for parallel execution, (such as
pvm
or mpi,) or serial to compile for sequential execution. |
-
Modify or create the configuration files.
-
Make the executable:
make
This will cd into the source directories in order and compile each one,
(i.e., into src/util, src/xlib, src/custom/${TARGET_ARCH}, and then src/seis.)
It will record the current compilation flags in the appropriate lib
directory in files named: config.seis and config.util (for the src/seis
and src/util files respectively.)
To compile for parallel execution, follow the above steps with PARALLEL_METHOD
set to pvm or mpi.
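Before running make, it can help to confirm that the four compilation variables are actually set. The following is a minimal Python sketch of such a check, not something shipped with SPECseis96. The function name `check_build_environment` is hypothetical, and treating serial, pvm, and mpi as the only valid methods is an assumption, since a port could add others.

```python
import os

REQUIRED = ("BENCH", "ARCH", "TARGET_ARCH", "PARALLEL_METHOD")
# Assumption: only the three methods named in this document are valid.
KNOWN_METHODS = ("serial", "pvm", "mpi")

def check_build_environment(env):
    """Return a list of problems found in the mapping (empty if none)."""
    problems = ["%s is not set" % name
                for name in REQUIRED if not env.get(name)]
    method = env.get("PARALLEL_METHOD")
    if method and method not in KNOWN_METHODS:
        problems.append("PARALLEL_METHOD=%s is not one of %s"
                        % (method, ", ".join(KNOWN_METHODS)))
    return problems

# Report anything missing from the current process environment.
for problem in check_build_environment(os.environ):
    print("warning:", problem)
```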
The source files which control the communication are:
-
include/makedef.${TARGET_ARCH}.${PARALLEL_METHOD},
(such as makedef.SUNMP.mpi for using MPI to communicate on
a shared-memory, SUN Ultra Enterprise.)
-
src/util/message.c, the source file containing all calls to the
communication libraries (or empty subroutines for serial execution.) When
compiling, the -D${PARALLEL_METHOD} chooses the appropriate code
for the communication package you will use.
-
sh/run.seis, the run script which also sets up the communication
daemons if need be.
You may need to create or modify the configuration file, makedef.${TARGET_ARCH}.mpi,
to contain the appropriate compiler and link flags, for example:
CC = $(MPI_ROOT)/mpicc
CFLAGS = ... -I$(MPI_ROOT)/include ...
OTHER = ... -lmpich
You should not need to modify the source file, message.c, unless
you are porting the application to use a communication package other than
PVM or MPI. All calls to communication library routines are located within
the message.c file. These include basic send, receive, and broadcast
routines.
You also should not need to modify the run script unless you are using
a different implementation of pvm or mpi. The script contains switch statements
to choose between serial, mpi, and pvm executions.
For PVM you will have to place a link to the PVM build of the executable in the
bin directory where your PVM is installed. (cd $PVM_ROOT/bin/$PVM_ARCH; ln -s
$BENCH/bin/$TARGET_ARCH/pvm/seis .)
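The choice the run script makes between serial, mpi, and pvm executions can be pictured with the short Python sketch below. It is an illustration only and is not taken from sh/run.seis: the function name `launch_command` is hypothetical, and the `-np` option is the common mpirun spelling for the process count, which your MPI implementation may or may not use.

```python
def launch_command(parallel_method, executable, ntask):
    """Build the argument list used to start Seismic for one phase."""
    if parallel_method == "mpi":
        # Assumption: this mpirun accepts -np <count>; check your MPI.
        return ["mpirun", "-np", str(ntask), executable]
    if parallel_method in ("serial", "pvm"):
        # The executable is started directly; for pvm any required
        # daemons are assumed to be running already.
        return [executable]
    raise ValueError("unknown PARALLEL_METHOD: %r" % (parallel_method,))

print(launch_command("mpi", "./seis", 4))
print(launch_command("serial", "./seis", 1))
```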
To execute Seismic:
-
The following environment variables must be set in addition to those set
for compilation:
| BENCH_DATA |
the directory to run the executable from. The data files will be created
here. |
-
Execute the run script located in the sh directory:
sh/run.seis <dataset> <phase>
dataset can be any of the SPEC dataset
sizes: small, medium,
large, or xlarge.
You can also run with the test dataset for a quick 5-minute execution
whose results should verify against the verification files. There is an ultra
dataset available if you want a real challenge.
phase can be 1, 2, 3, or 4, to run a single processing
phase. A phase of 0 or blank will run all four phases in order.
-
The run script will:
-
Boot any daemons needed for managing the communication in a parallel run,
(such as the PVM daemon needed for using PVM or lamboot needed for the
LAM implementation of MPI.)
For a PVM execution on a shared-memory system (i.e., if PVM_SHMEM is
set to "ON",) the PVM shared-memory daemon is also started.
-
Set up a run directory (chosen by the BENCH_DATA environment variable.)
By default this will be /tmp/seis-$USER/${TARGET_ARCH}/${PARALLEL_METHOD}/${NTASK}
-
Link the appropriate executable to the run directory.
-
A loop executes the following for each seismic processing phase being executed:
-
Set up the parameter and path name files in the run directory.
The path name file ($BENCH_DATA/pathname) contains the full
pathname of the run directory.
The parameter file ($BENCH_DATA/seis.prm) sets values to input
parameters of Seismic corresponding to the dataset being used.
-
Remove old data files.
Before creating new data files for a specific processing phase of Seismic,
the old data files must be removed.
-
Execute the executable.
Output is generated to the screen and captured in a file named:
${BENCH_DATA}/seis.<dataset>.<phase>.${TARGET_ARCH}.${PARALLEL_METHOD}.${NTASK}.out
(The mpirun command is used when PARALLEL_METHOD is ``mpi''
instead of directly executing the executable.)
-
At the end of the loop, the daemons started earlier are halted and the
shared-memory (if PVM_SHMEM is ``ON'') is cleaned up (using ipcfree.)
-
The output is searched (by an awk script,
sh/collect.awk) for elapsed times of each
phase and verification results. The times and verification results are
stored in a file with a suffix of validation in the run directory.
(The elapsed times of the four seismic processing phases must be summed
together to submit results to SPEC.)
This file is also spilled out to the screen. The following output shows
a correct execution of Seismic with the test dataset on 4 processors. All 4 seismic processing phases verify:
DATASET=Test NUMPROCS=4
Phase 1, Elapsed= 10.9, Validated
Phase 2, Elapsed= 2.5, Validated
Phase 3, Elapsed= 0.6, Validated
Phase 4, Elapsed= 51.8, Validated
Total , Elapsed= 65.8, 4 Phases Validated
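Since the elapsed times of the four phases must be summed for submission, a small script can recompute the total from the per-phase lines instead of trusting the Total line. The Python sketch below parses text in the format shown above. It is not a replacement for sh/collect.awk. The function name `summarize` is hypothetical, and treating any status other than "Validated" as a failure is an assumption, because the wording for a failed phase is not shown here.

```python
import re

# Matches lines such as "Phase 1, Elapsed= 10.9, Validated".
PHASE_LINE = re.compile(
    r"^Phase\s+(\d+)\s*,\s*Elapsed=\s*([0-9.]+)\s*,\s*(.*)$")

def summarize(text):
    """Return (total_elapsed, all_validated) for validation-file text."""
    total, all_ok, seen = 0.0, True, 0
    for line in text.splitlines():
        match = PHASE_LINE.match(line.strip())
        if match:
            seen += 1
            total += float(match.group(2))
            # Assumption: any other status means the phase did not verify.
            all_ok = all_ok and match.group(3).strip() == "Validated"
    return round(total, 1), all_ok and seen > 0

SAMPLE = """DATASET=Test NUMPROCS=4
Phase 1, Elapsed= 10.9, Validated
Phase 2, Elapsed= 2.5, Validated
Phase 3, Elapsed= 0.6, Validated
Phase 4, Elapsed= 51.8, Validated
Total , Elapsed= 65.8, 4 Phases Validated
"""
print(summarize(SAMPLE))
```

The Total line is deliberately skipped by the pattern, so the sum is computed only from the individual phases.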
-
To verify that the results you obtained are valid, check the verification
file in the run directory ($BENCH_DATA). This file will record
whether or not Seismic verified for each of the four phases that were executed
by the run script.
Verification of results is performed by comparing summations of wave
amplitudes (summed over the execution of the application) from the current
execution with a trusted single-processor execution. If any of the amplitude
differences exceeds a certain threshold, the benchmark does not validate and the
message ``Benchmark Verification Failed'' is printed. This can
be the result of rounding errors from decomposing the data when running
in parallel or from precision errors resulting from aggressive compilation
techniques.
Note that the third seismic phase is very sensitive to differences in
precision and therefore may not verify. See the PROBLEMS file
for more information.
-
Example output files for the small dataset are included in runs/sample.
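The amplitude comparison used for verification can be pictured with the Python sketch below. It is a guess at the general shape of such a check, not the benchmark's actual VRFY code: the function name `verify_amplitudes`, the relative-difference formula, and the tolerance `rel_tol` are all assumptions, since the real threshold is not stated in this document.

```python
def verify_amplitudes(current, trusted, rel_tol=1e-3):
    """Compare amplitude sums from this run against a trusted run.

    rel_tol is a hypothetical threshold chosen for illustration.
    """
    if len(current) != len(trusted):
        return False
    for got, ref in zip(current, trusted):
        scale = max(abs(ref), 1e-30)  # guard against division by zero
        if abs(got - ref) / scale > rel_tol:
            return False
    return True

# Made-up values standing in for a trusted single-processor profile.
trusted = [1.000, 2.500, 0.750]
current = [1.0002, 2.4999, 0.7501]
if verify_amplitudes(current, trusted):
    print("Validated")
else:
    print("Benchmark Verification Failed")
```

A relative rather than absolute tolerance is used here so that small rounding differences from parallel decomposition scale with the size of the amplitudes.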
Sub-Directories of SPECseis96:
-
bin the compiled executables are placed here within sub-sub-directories,
corresponding to the architectures and communication packages compiled
for.
-
include the makefiles used to compile for the various architectures
and communication packages.
-
lib the compiled codes are placed within a library (libseis.a)
within a sub-sub-directory, corresponding to the architecture and communication
package being compiled for.
-
runs/verify the data to use to verify whether or not an execution
produced valid data.
-
sh shell scripts to set up runtime parameters and execute Seismic.
-
src the source files arranged in two main directories (seis and
util), along with any custom files (custom), and files
used by the look3d seismic process to visualize the seismic traces
(in xlib.)
Parallel Method
-
message-passing: each entire phase (except the initial communication
required to open files and send a few, small common blocks, and to verify
computed data at the end of the processing phase) can be run via a message-passing
layer (MPI or PVM.)
-
shared-memory: a shared-memory version has been developed for the SGI machines
and is being developed for use with OpenMP (or a set of portable shared-memory
directives.) The version including OpenMP directives will be included in
the next release of SPECseis.
-
The parallelism explicitly coded into the application is limited by the
decomposition of certain data parameters.
-
Parallelism exploited by Seismic falls within several main categories:
(from Types of Parallelism, Scalable Parallel Seismic Processing)
-
Simple: "completely independent operations that can be applied to
different blocks of data." (Phase 1 exhibits this type of parallelism.)
-
Domain: sub-blocks of the data are assigned to different processors
and communication is required to transmit boundary information.
-
Transform/Transpose: some data must be computed in parallel, transposed
across processors, then computed in parallel again. (Phases 3 and 4 exhibit
this type of parallelism.) ``The speed of the application is determined
by the relative mix of computer, inter-node communication, and disk I/O
speeds of the hardware, rather than by serial bottlenecks in the algorithm.''
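To make the Transform/Transpose category concrete, the Python sketch below simulates it in a single process: rows are block-decomposed across workers, each worker computes on its own rows, the data is transposed (the step that would be inter-processor communication), and each worker computes again. This is a toy model, not code from Seismic. All function names are hypothetical, and a running sum stands in for the real per-trace transform.

```python
def split_rows(matrix, nproc):
    """Block-decompose rows across nproc workers."""
    base, extra = divmod(len(matrix), nproc)
    blocks, start = [], 0
    for rank in range(nproc):
        size = base + (1 if rank < extra else 0)
        blocks.append(matrix[start:start + size])
        start += size
    return blocks

def running_sum(row):
    """Stand-in for a one-dimensional transform along a row."""
    out, total = [], 0
    for value in row:
        total += value
        out.append(total)
    return out

def transpose_blocks(blocks, nproc):
    """Gather, transpose, re-split: models the all-to-all exchange."""
    full = [row for block in blocks for row in block]
    transposed = [list(col) for col in zip(*full)]
    return split_rows(transposed, nproc)

def transform_transpose(matrix, nproc):
    blocks = split_rows(matrix, nproc)
    blocks = [[running_sum(r) for r in b] for b in blocks]  # local work
    blocks = transpose_blocks(blocks, nproc)                # communication
    blocks = [[running_sum(r) for r in b] for b in blocks]  # local work
    return [row for block in blocks for row in block]       # transposed layout

print(transform_transpose([[1, 2, 3], [4, 5, 6]], 2))
```

The result is left in the transposed layout, which mirrors how a real code would avoid a second exchange unless a later step needs the original orientation.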
Data
-
Input Parameters:
Values of the Dataset Parameters for SEIS
| Parameter |
Meaning |
| NS |
samples per trace |
| NTPG |
traces per group |
| NGPL |
groups per line |
| NLINE |
number of lines |
| NX |
velocity x-dimension number of midpoints |
| NY |
velocity y-dimension |
| NZ |
velocity z-dimension |
| ZMAX |
number of depth steps |
-
Input/Output Data:
-
Initially, only input parameters are needed (no data files.)
-
Then, the trace data is stored in a file throughout the execution of
a phase. The trace data stored from a previous phase is used in the current
phase. (Phase 2 uses the data files stored by Phase 1. Phases 3 and 4 use
the data files stored by Phase 2.)
-
Disk access has been designed to allow parallel I/O to alleviate the
potential I/O bottleneck.
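One way to picture parallel I/O over disjoint file segments is the Python sketch below, which assigns each processor a contiguous range of traces and computes where a trace would start in the file. It is an illustration under stated assumptions, not the layout Seismic uses: the total trace count is taken to be NTPG x NGPL x NLINE, traces are assumed to be fixed-size with 4-byte samples and no headers, and both function names are hypothetical.

```python
def trace_segments(ntpg, ngpl, nline, nproc):
    """Split all traces into contiguous, disjoint per-processor ranges."""
    # Assumption: traces per group * groups per line * number of lines.
    total = ntpg * ngpl * nline
    base, extra = divmod(total, nproc)
    segments, start = [], 0
    for rank in range(nproc):
        count = base + (1 if rank < extra else 0)
        segments.append((start, start + count))  # half-open [start, stop)
        start += count
    return segments

def byte_offset(trace_index, ns, bytes_per_sample=4):
    """File offset of a trace, assuming fixed-size traces, no headers."""
    return trace_index * ns * bytes_per_sample

# Made-up parameter values, not one of the SPEC datasets.
for rank, (first, stop) in enumerate(trace_segments(4, 5, 2, 3)):
    print(rank, first, stop, byte_offset(first, ns=100))
```

Because the ranges never overlap, each processor can write its segment without coordinating with the others.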
-
Authors: (Seismic was developed in 1993)
-
Charles Mosher (ARCO Exploration and Production Technology)
-
Siamak Hassanzadeh (Sun Microsystems)
-
For more information on the science behind the application see the following
links:
-
SEG: The Society of Exploration Geophysicists