Parallel Scaling to 23,000 Cores

Scaling on ORNL Cray XT3/4

Benchmark 6 on Cray XT3/4.

Scaling on ORNL Cray XT3/4

Benchmarks 5 and 6 on Cray XT3/4.

Scaling on 6 HPC Platforms

Benchmark 5 on six different HPC resources.


NEMO 1-D was benchmarked on a broad range of HPC platforms in June and July 2007. Five of the six benchmark platforms were ranked on the June 2007 TOP500 list. The benchmark machines, with their TOP500 ranks, are:
  • #2, ORNL, Jaguar, Cray XT3/4, with 23016 cores, 2GB RAM/core
  • #7, RPI, eServer Blue Gene Solution, IBM B/G, with 32768 cores, 256MB RAM/core
  • #8, NCSA, Abe, XeonQ (Quad core, dual socket), with 9600 cores, 1GB RAM/core
  • #30, IUPU, Big Red, IBM JS21, with 3072 cores, 2GB RAM/core
  • #46, PSC, Big Ben, Cray XT3, with 4136 cores, 1GB RAM/core
  • unranked, Purdue, XeonD (Dual core, dual socket), with 672 cores, 2GB/4GB RAM/core

The benchmark constitutes an end-to-end calculation of a realistic problem: carrier transport through a hole-based resonant tunneling diode (RTD). Benchmarks 5 and 6 use the sp3s* and sp3d5s* models (basis sizes 10 and 20) with 326 and 572 spatial blocks, respectively. A reasonably large number of bias points, 180, was selected. Hole transport in heterostructures demands a detailed analysis of the transverse momentum [30], so 200 transverse momentum points are used. The energy grid is adaptive and resolves a relatively large number of very sharp resonances.

Tri-level parallelism [40] has been implemented in NEMO 1-D. All bias points are completely independent and are handled in a master-worker approach. The momentum points are distributed homogeneously, and the energy grid integrations are performed adaptively.
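The three levels described above can be illustrated with a minimal Python sketch. It is not NEMO 1-D code: the Lorentzian `transmission`, the `resonance_energy` formula, the width `GAMMA`, and all function names are hypothetical stand-ins, adaptive Simpson quadrature is swapped in for whatever adaptive energy grid NEMO 1-D actually uses, and a thread pool stands in for the message-passing layer of a real HPC run. It only shows the structure: bias points handed out dynamically, momentum points split evenly, and each energy integral refined adaptively around a sharp resonance.

```python
from concurrent.futures import ThreadPoolExecutor
import math

GAMMA = 1e-3  # assumed resonance half-width; toy value, not from NEMO 1-D


def resonance_energy(bias, k):
    """Hypothetical resonance position as a function of bias and momentum."""
    return 0.3 + 0.2 * bias + 0.1 * k * k


def transmission(energy, bias, k):
    """Normalized Lorentzian standing in for one very sharp resonance."""
    d = energy - resonance_energy(bias, k)
    return (GAMMA / math.pi) / (d * d + GAMMA * GAMMA)


def _simpson(f, a, fa, b, fb):
    """Simpson estimate on [a, b]; also returns the midpoint and its value."""
    m = 0.5 * (a + b)
    fm = f(m)
    return m, fm, (b - a) / 6.0 * (fa + 4.0 * fm + fb)


def _refine(f, a, fa, m, fm, b, fb, whole, tol, depth):
    """Bisect until the two half-interval estimates agree with the whole."""
    lm, flm, left = _simpson(f, a, fa, m, fm)
    rm, frm, right = _simpson(f, m, fm, b, fb)
    delta = left + right - whole
    if depth <= 0 or abs(delta) <= 15.0 * tol:
        return left + right + delta / 15.0
    return (_refine(f, a, fa, lm, flm, m, fm, left, tol / 2.0, depth - 1)
            + _refine(f, m, fm, rm, frm, b, fb, right, tol / 2.0, depth - 1))


def adaptive_simpson(f, a, b, tol=1e-8, max_depth=40):
    """Level 3: adaptive energy integration (refines only near resonances)."""
    fa, fb = f(a), f(b)
    m, fm, whole = _simpson(f, a, fa, b, fb)
    return _refine(f, a, fa, m, fm, b, fb, whole, tol, max_depth)


def split_evenly(n_items, n_groups):
    """Level 2: contiguous index ranges whose sizes differ by at most one."""
    base, extra = divmod(n_items, n_groups)
    ranges, start = [], 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def current_at_bias(bias, k_points, n_groups):
    """Average the energy integrals over all momentum points for one bias."""
    partial_sums = []
    for indices in split_evenly(len(k_points), n_groups):
        # In a real run each group would be a separate set of processes.
        partial_sums.append(sum(
            adaptive_simpson(
                lambda e, k=k_points[i]: transmission(e, bias, k), 0.0, 1.0)
            for i in indices))
    return sum(partial_sums) / len(k_points)


def sweep(biases, k_points, n_groups=2, n_workers=4):
    """Level 1: master-worker; idle workers pick up the next bias point."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(
            lambda b: current_at_bias(b, k_points, n_groups), biases))
```

For example, `sweep([0.0, 0.5, 1.0], [i / 4 for i in range(5)])` returns one averaged value per bias point. At the benchmark sizes quoted above, 180 bias points times 200 momentum points gives 36,000 independent energy integrations. Dynamic assignment is a natural fit for the outermost level because the cost of an adaptive integration varies with how many sharp resonances it has to resolve.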
