Index of /~eigenman/ECE563/Handouts/SPECseis96.1.2

      Name                   Last modified     Size  Description

[DIR] Parent Directory       04-May-09 11:57      -
[   ] DISCLAIMER             26-Aug-96 07:41     1k
[   ] Makefile               12-Jul-99 13:38     3k
[   ] PROBLEMS               13-Jul-99 05:32     1k
[DIR] doc/                   17-May-99 07:32      -
[DIR] include/               13-Jul-99 09:40      -
[DIR] runs/                  12-Jul-99 11:26      -
[   ] setenvironment         12-Jul-99 08:24     1k
[DIR] sh/                    12-Jul-99 15:21      -
[DIR] src/                   13-Aug-97 20:04      -

SPECseis96.1.2

Seismic Processing Application Suite


Seismic processing used in the search for oil and gas.

  1. General Info

  2. SPECseis96 is one of the codes in the SPEC/High-Performance Group's SPEChpc96 suite, which is used to evaluate machine performance on industrially significant computer workloads as well as for scientific study. The seismic processing suite is typical of the seismic processing applications used in industry in the search for oil and gas. The code consists of four applications, referred to as four ``phases,'' which perform the seismic computations: ``Data generation,'' ``Stacking of data,'' ``Time migration,'' and ``Depth migration.'' The entire code contains 15,000 lines of Fortran and C, and includes intensive communication as well as intensive disk I/O.

    The phases are designed to execute in sequence. Within each phase, a main loop performs a series of functions on each seismic trace. Phases 2, 3, and 4 require data to be processed and passed across the processors. Phase 1, however, requires communication only to decompose the data at the start of the phase and to join it at the phase's completion. Communication is implemented using a message-passing layer of MPI or PVM. A shared-memory version is also being developed. Each processor writes a disjoint segment of the data file, which allows for non-blocking writes. The seismic functions performed by each phase are as follows:

     
    Seismic Processing Phases
    Phase             Functions
    Data Generation   Generate source/receiver geometry, generate seismic data, apply spatial filters to data, apply predictive deconvolution, apply moveout corrections
    Stacking of Data  Apply residual moveout corrections, sum input traces into zero offset section
    Time Migration    Fourier domain migration
    Depth Migration   Finite difference migration

    Each phase consists of a series of seismic processes, each of which performs a seismic processing computation or disk I/O.
     
    Description of the Seismic Processes in the Phases of SEIS
    Process  Description 
    Phase 1: Data Generation
    VSBF  Read velocity function and provide access routines 
    GEOM  Specify source/receiver coordinates 
    DGEN  Generate seismic data 
    FANF  Apply two-dimensional spatial filters to data via Fourier transforms 
    DCON  Apply predictive deconvolution 
    NMOC  Apply normal move-out corrections 
    PFWR  Parallel write to output files 
    VRFY  Compute average amplitude profile as a checksum 
    RATE  Measure processing rate 
    Phase 2: Stacking of Data
    PFRD  Parallel read of input files 
    DMOC  Apply residual move-out corrections 
    STAK  Sum input traces into zero offset section 
    PFWR  Parallel write to output files 
    VRFY  Compute average amplitude profile as a checksum 
    RATE  Measure processing rate 
    Phase 3: Time Migration
    PFRD  Parallel read of input files 
    M3FK  Three-dimensional Fourier domain migration 
    PFWR  Parallel write to output files 
    VRFY  Compute average amplitude profile as a checksum 
    RATE  Measure processing rate 
    Phase 4: Depth Migration
    VSBF  Read velocity function and provide access routines 
    PFRD  Parallel read of input files 
    MG3D  An approximation of a three-dimensional, one-pass, finite difference migration 
    PFWR  Parallel write to output files 
    VRFY  Compute average amplitude profile as a checksum 
    RATE  Measure processing rate 
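The table above can be read as an ordered pipeline per phase. The following is a minimal Python sketch of that reading, not the benchmark's Fortran/C implementation. The names `PHASE_PROCESSES`, `run_phase`, and `registry` are hypothetical, and treating every process as a function from one trace to one trace is a simplification (STAK, for example, combines several traces).

```python
from typing import Callable, Dict, List

Trace = List[float]
Process = Callable[[Trace], Trace]

# Process order per phase, copied from the table above.
PHASE_PROCESSES: Dict[int, List[str]] = {
    1: ["VSBF", "GEOM", "DGEN", "FANF", "DCON", "NMOC", "PFWR", "VRFY", "RATE"],
    2: ["PFRD", "DMOC", "STAK", "PFWR", "VRFY", "RATE"],
    3: ["PFRD", "M3FK", "PFWR", "VRFY", "RATE"],
    4: ["VSBF", "PFRD", "MG3D", "PFWR", "VRFY", "RATE"],
}


def run_phase(phase: int, traces: List[Trace],
              registry: Dict[str, Process]) -> List[Trace]:
    """Apply the phase's processes, in table order, to every trace."""
    processed = []
    for trace in traces:                     # the per-phase main loop
        for name in PHASE_PROCESSES[phase]:  # series of processes per trace
            trace = registry[name](trace)
        processed.append(trace)
    return processed


if __name__ == "__main__":
    # Placeholder registry: every process passes the trace through unchanged.
    registry = {name: (lambda trace: trace)
                for names in PHASE_PROCESSES.values() for name in names}
    print(run_phase(3, [[0.0, 1.0], [2.0, 3.0]], registry))
```

The registry keeps the process order (data) separate from the process bodies (code), which mirrors how the table lists the same PFRD/PFWR/VRFY/RATE steps around a different computational core in each phase.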

  3. Compiling Seismic

  4. The make environment of Seismic includes the following components:

    To compile Seismic follow these steps:
    1. Set up the environment variables.
      These environment variables must be set in order to use the makefiles as is. (The file setenvironment is an example of how to set them up. You can use it from csh by executing: source setenvironment.)
      Environment Variables for Compilation
      BENCH            the pathname of the directory where SPECseis96 is installed.
      ARCH             the host architecture, such as solaris; it is used when compiling the code in the src/seis directory that creates a Fortran file, and to choose architecture-specific code using ifdefs when compiling the C code.
      TARGET_ARCH      the target architecture for the libraries and executable, such as SUNMP for SUN multi-processor Ultra Enterprise systems.
      PARALLEL_METHOD  the communications package to use for parallel execution (such as pvm or mpi), or serial to compile for sequential execution.
    2. Modify or create the configuration files.
    3. Make the executable:
      make
      This will cd into the source directories in order and compile each one (i.e., src/util, src/xlib, src/custom/${TARGET_ARCH}, and then src/seis).
      It will record the current compilation flags in the appropriate lib directory, in files named config.seis and config.util (for the src/seis and src/util files, respectively).
    To compile for parallel execution, follow the above steps with PARALLEL_METHOD set to pvm or mpi.
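Before running make, it can help to confirm that the four compilation variables are actually set. The following is a small Python sketch of such a check; it is not part of the SPECseis96 distribution. The function name `check_build_env` is hypothetical, and the accepted PARALLEL_METHOD values are only the three that this document mentions.

```python
import os
from typing import List, Mapping

REQUIRED = ("BENCH", "ARCH", "TARGET_ARCH", "PARALLEL_METHOD")
# Only the values named in this document; a given installation may accept others.
KNOWN_METHODS = ("pvm", "mpi", "serial")


def check_build_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    method = env.get("PARALLEL_METHOD")
    if method and method not in KNOWN_METHODS:
        problems.append(
            f"PARALLEL_METHOD={method!r} is not one of {KNOWN_METHODS}")
    return problems


if __name__ == "__main__":
    for problem in check_build_env(os.environ):
        print("warning:", problem)
```

Passing the environment in as a mapping, rather than reading os.environ inside the function, makes the check easy to try against sample values.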
    The source files which control the communication are:
    You may need to create or modify the configuration file, makedef.${TARGET_ARCH}.mpi, to contain the appropriate compiler and link flags, for example:
      CC = $(MPI_ROOT)/mpicc
      CFLAGS = ... -I$(MPI_ROOT)/include ...
      OTHER = ... -lmpich
    You should not need to modify the source file, message.c, unless you are porting the application to use a communication package other than PVM or MPI. All calls to communication library routines are located within the message.c file. These include basic send, receive, and broadcast routines.
    You also should not need to modify the run script unless you are using a different implementation of pvm or mpi. The script contains switch statements to choose between serial, mpi, and pvm executions.
    For PVM you will have to place a link to the PVM build of the executable in the bin directory where PVM is installed. (cd $PVM_ROOT/bin/$PVM_ARCH; ln -s $BENCH/bin/$TARGET_ARCH/pvm/seis .)

    To execute Seismic:

    1. The following environment variables must be set in addition to those set for compilation:

    2. BENCH_DATA: the directory to run the executable from. The data files will be created here.
    3. Execute the run script located in the sh directory:
      sh/run.seis <dataset> <phase>
      dataset can be any of the SPEC dataset sizes: small, medium, large, or xlarge. You can also run the test dataset for a quick 5-minute execution, which should verify against the verification files. An ultra dataset is available if you want a real challenge.
      phase can be 1, 2, 3, or 4 to run a single processing phase. A phase of 0, or no phase argument, will run all four phases in order.
    4. The run script will:
      1. Boot any daemons needed for managing the communication in a parallel run (such as the PVM daemon needed for using PVM, or lamboot needed for the LAM implementation of MPI).
        For a PVM execution on a shared-memory system (i.e., if PVM_SHMEM is set to "ON"), the PVM shared-memory daemon is also started.
      2. Set up a run directory (chosen by the BENCH_DATA environment variable).
        By default this will be /tmp/seis-$USER/${TARGET_ARCH}/${PARALLEL_METHOD}/${NTASK}
      3. Link the appropriate executable to the run directory.
      4. Execute the following in a loop, once for each seismic processing phase being run:
        1. Set up the parameter and path name files in the run directory.
          The path name file ($BENCH_DATA/pathname) contains the full pathname of the run directory.
          The parameter file ($BENCH_DATA/seis.prm) sets the input parameters of Seismic to the values corresponding to the dataset being used.
        2. Remove old data files.
          Before new data files are created for a specific processing phase of Seismic, the old data files must be removed.
        3. Execute the executable.
          Output is generated to the screen and captured in a file named: ${BENCH_DATA}/seis.<dataset>.<phase>.${TARGET_ARCH}.${PARALLEL_METHOD}.${NTASK}.out
          (The mpirun command is used when PARALLEL_METHOD is ``mpi'' instead of directly executing the executable.)
      5. At the end of the loop, halt the daemons started earlier and clean up the shared memory (if PVM_SHMEM is ``ON'') using ipcfree.
      6. Search the output (with an awk script, sh/collect.awk) for the elapsed time of each phase and the verification results. The times and verification results are stored in a file with a suffix of validation in the run directory. (The elapsed times of the four seismic processing phases must be summed together to submit results to SPEC.)
        This file is also printed to the screen. The following output shows a correct execution of Seismic with the test dataset on 4 processors. All 4 seismic processing phases verify:

        DATASET=Test NUMPROCS=4
           Phase 1, Elapsed=  10.9, Validated
           Phase 2, Elapsed=   2.5, Validated
           Phase 3, Elapsed=   0.6, Validated
           Phase 4, Elapsed=  51.8, Validated
           Total  , Elapsed=  65.8, 4 Phases Validated
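Since the per-phase elapsed times must be summed for submission, the summary above is easy to post-process. The following Python sketch parses lines in the format shown and adds up the phase times. It is independent of sh/collect.awk, the function names are hypothetical, and it assumes the summary always uses the layout of this sample (the format of a failed phase is not shown in this document).

```python
import re
from typing import Dict, Tuple

# Matches e.g. "Phase 1, Elapsed=  10.9, Validated" but not the "Total" line.
PHASE_LINE = re.compile(r"Phase\s+(\d+),\s*Elapsed=\s*([\d.]+),\s*(\w+)")

SAMPLE = """\
DATASET=Test NUMPROCS=4
   Phase 1, Elapsed=  10.9, Validated
   Phase 2, Elapsed=   2.5, Validated
   Phase 3, Elapsed=   0.6, Validated
   Phase 4, Elapsed=  51.8, Validated
   Total  , Elapsed=  65.8, 4 Phases Validated
"""


def parse_summary(text: str) -> Dict[int, Tuple[float, str]]:
    """Map phase number -> (elapsed time, status word) for each phase line."""
    results = {}
    for match in PHASE_LINE.finditer(text):
        phase, elapsed, status = match.groups()
        results[int(phase)] = (float(elapsed), status)
    return results


def total_elapsed(results: Dict[int, Tuple[float, str]]) -> float:
    """Sum the per-phase times, rounded to the one decimal the summary uses."""
    return round(sum(elapsed for elapsed, _ in results.values()), 1)


if __name__ == "__main__":
    phases = parse_summary(SAMPLE)
    print(total_elapsed(phases))
```

For the sample above, the four phase times add up to the 65.8 reported on the Total line.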

    5. To verify that the results you obtained are valid, check the verification file in the run directory ($BENCH_DATA). This file will record whether or not Seismic verified for each of the four phases that were executed by the run script.

    6. Verification of results is performed by comparing summations of wave amplitudes (summed over the execution of the application) from the current execution with those from a trusted single-processor execution. If any of the amplitudes differ by more than a certain threshold, the benchmark does not validate and a message is printed: ``Benchmark Verification Failed''. This can be the result of rounding errors from decomposing the data when running in parallel, or of precision errors resulting from aggressive compilation techniques.

      Note that the third seismic phase is very sensitive to differences in precision and therefore may not verify. See the PROBLEMS file for more information.
    7. Example output files for the small dataset are included in runs/sample.
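The comparison against a trusted run can be illustrated with a short Python sketch. This is not the benchmark's VRFY code: the document does not say how the difference is measured or what the threshold is, so the relative-difference test, the function name `verify_amplitudes`, and the threshold value used in the example are all assumptions.

```python
from typing import Sequence


def verify_amplitudes(current: Sequence[float],
                      trusted: Sequence[float],
                      threshold: float) -> bool:
    """True if every amplitude sum is within `threshold` (relative) of the
    trusted value. Relative difference is an assumption of this sketch."""
    if len(current) != len(trusted):
        return False
    for got, want in zip(current, trusted):
        scale = max(abs(want), 1e-30)  # guard against division by zero
        if abs(got - want) / scale > threshold:
            return False
    return True


if __name__ == "__main__":
    trusted = [1.0, 2.0, 3.0]
    parallel = [1.0002, 2.0, 2.9999]   # made-up values with small rounding drift
    if not verify_amplitudes(parallel, trusted, threshold=1e-3):
        print("Benchmark Verification Failed")
```

A tolerance-based comparison like this is what makes small rounding differences from parallel decomposition acceptable while still catching larger precision errors.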

    Sub-Directories of SPECseis96:

    Parallel Method

    1. message-passing: each entire phase (except the initial communication required to open files and send a few, small common blocks, and to verify computed data at the end of the processing phase) can be run via a message-passing layer (MPI or PVM.)
    2. shared-memory: a shared-memory version has been developed for the SGI machines and is being developed for use with OpenMP (or a set of portable shared-memory directives). The version including OpenMP directives will be included in the next release of SPECseis.
    The parallelism explicitly coded into the application is limited by the decomposition of certain data parameters.
    Parallelism exploited by Seismic falls within several main categories (from Types of Parallelism, Scalable Parallel Seismic Processing):
      1. Simple: "completely independent operations that can be applied to different blocks of data." (Phase 1 exhibits this type of parallelism.)
      2. Domain: sub-blocks of the data are assigned to different processors and communication is required to transmit boundary information.
      3. Transform/Transpose: some data must be computed in parallel, transposed across processors, then computed in parallel again. (Phases 3 and 4 exhibit this type of parallelism.)
    ``The speed of the application is determined by the relative mix of computer, inter-node communication, and disk I/O speeds of the hardware, rather than by serial bottlenecks in the algorithm.''
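The transform/transpose pattern can be illustrated in a few lines of Python. This sketch simulates the processors as entries in a list and replaces the inter-processor exchange with an in-memory regrouping, so it shows only the data movement, not real MPI or PVM communication. All function names are hypothetical.

```python
from itertools import accumulate
from typing import Callable, List

Row = List[float]
Matrix = List[Row]


def split_rows(matrix: Matrix, nworkers: int) -> List[Matrix]:
    """Block-distribute rows across workers (trailing blocks may be smaller)."""
    base, extra = divmod(len(matrix), nworkers)
    blocks, start = [], 0
    for worker in range(nworkers):
        size = base + (1 if worker < extra else 0)
        blocks.append(matrix[start:start + size])
        start += size
    return blocks


def transpose_across(blocks: List[Matrix], nworkers: int) -> List[Matrix]:
    """Stand-in for the all-to-all exchange: afterwards each worker owns
    a block of the original columns, stored as rows."""
    full = [row for block in blocks for row in block]
    transposed = [list(column) for column in zip(*full)]
    return split_rows(transposed, nworkers)


def transform_transpose(matrix: Matrix, row_op: Callable[[Row], Row],
                        nworkers: int) -> Matrix:
    blocks = split_rows(matrix, nworkers)
    blocks = [[row_op(r) for r in b] for b in blocks]  # compute along axis 1
    blocks = transpose_across(blocks, nworkers)        # communication step
    blocks = [[row_op(r) for r in b] for b in blocks]  # compute along axis 2
    return [row for b in blocks for row in b]          # result is transposed


if __name__ == "__main__":
    running_sum = lambda row: list(accumulate(row))
    print(transform_transpose([[1, 2], [3, 4]], running_sum, nworkers=2))
```

Each worker only ever operates on complete rows it owns, so the two compute steps need no communication; all data exchange is concentrated in the transpose.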
    Data
    1. Input Parameters:
      Values of the Dataset Parameters for SEIS
      Parameter  Meaning
      NS         samples per trace
      NTPG       traces per group
      NGPL       groups per line
      NLINE      number of lines
      NX         velocity x-dimension number of midpoints
      NY         velocity y-dimension
      NZ         velocity z-dimension
      ZMAX       number of depth steps

    2. Input/Output Data:
      1. Initially, only input parameters are needed (no data files.)
      2. Then, the trace data is stored in a file throughout the execution of a phase. The trace data stored from a previous phase is used in the current phase. (Phase 2 uses the data files stored by Phase 1. Phases 3 and 4 use the data files stored by Phase 2.)
      3. Disk access has been designed to allow parallel I/O, to alleviate the potential I/O bottleneck.
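One way to picture disjoint per-processor segments of a trace file is the following Python sketch. It is not the layout used by PFWR/PFRD, which this document does not describe. It assumes that the trace count is NTPG * NGPL * NLINE, that samples are 4 bytes, and that each processor owns one contiguous block of traces; the function names and the parameter values in the example are made up.

```python
from typing import List, Tuple

BYTES_PER_SAMPLE = 4  # assumption: the sample size is not stated in the text


def trace_count(ntpg: int, ngpl: int, nline: int) -> int:
    """Traces per group * groups per line * number of lines (assumed)."""
    return ntpg * ngpl * nline


def segments(ntraces: int, ns: int, nproc: int) -> List[Tuple[int, int]]:
    """Return (byte_offset, byte_length) per processor.

    Segments are contiguous and non-overlapping, so each processor can
    write its own part of the file without coordinating with the others.
    """
    trace_bytes = ns * BYTES_PER_SAMPLE
    base, extra = divmod(ntraces, nproc)
    result, first_trace = [], 0
    for rank in range(nproc):
        count = base + (1 if rank < extra else 0)
        result.append((first_trace * trace_bytes, count * trace_bytes))
        first_trace += count
    return result


if __name__ == "__main__":
    # Made-up parameter values, not one of the SPEC dataset sizes.
    total = trace_count(ntpg=2, ngpl=3, nline=1)
    for rank, (offset, length) in enumerate(segments(total, ns=10, nproc=4)):
        print(f"rank {rank}: offset={offset} length={length}")
```

Because every offset is computed from the rank alone, no processor needs to know how far the others have progressed before it seeks and writes.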
  5. Authors: (Seismic was developed in 1993)
    1. Charles Mosher (ARCO Exploration and Production Technology)
    2. Siamak Hassanzadeh (Sun Microsystems)
  6. For more information on the science behind the application see the following links:
    1. SEG: The Society of Exploration Geophysicists