Vector and parallel codes for large sparse eigenproblems.
Pini, Giorgio; Sartoretto, Flavio
1992
Abstract
Techniques for the vectorization and parallelization of a sequential code for evaluating, one at a time, the leftmost eigenpairs of large sparse finite-element matrices are developed and tested. Two vectorization strategies are implemented, relying upon the ITPACK and JAD storage modes. Because the algorithm lacks intrinsic parallel subtasks, the level scheduling technique is exploited to gain parallelism in computing the products of the preconditioning matrix with given vectors, and the source code compilation takes advantage of the autotasking facility of the Cray Fortran compiler. Results of numerical tests performed on large sparse finite-element matrices are shown, indicating good vectorization levels but poor parallelization on a Cray Y-MP8/432. For comparison, the performance of an efficient vector/parallel version of another algorithm, which allows the simultaneous evaluation of a number of eigenpairs, is also reported. In vector uniprocessor mode, the CPU times of the vectorized sequential codes are lower than those of the vectorized simultaneous code. The parallelized sequential code performs worse, confirming that, unlike in single-processor runs, the parallelized simultaneous code is to be preferred when parallel runs are performed and the minimum wall-clock time is required.
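The JAD (jagged diagonal) storage mode named in the abstract packs the k-th nonzero of every row into one long "jagged diagonal", after sorting rows by decreasing nonzero count, so that a sparse matrix-vector product becomes a few long vector operations. The following is a minimal illustrative sketch in Python/NumPy, not the authors' Cray Fortran code; the function names and the dense-input construction are our own simplifications for clarity.

```python
import numpy as np

def jad_from_dense(A):
    """Build JAD (jagged diagonal) storage from a small dense matrix.
    Rows are permuted by decreasing nonzero count, then the k-th nonzero
    of every sufficiently long row is packed into jagged diagonal k."""
    n = A.shape[0]
    nnz_per_row = [np.count_nonzero(A[i]) for i in range(n)]
    perm = sorted(range(n), key=lambda i: -nnz_per_row[i])  # row permutation
    cols_per_row = [np.nonzero(A[i])[0] for i in range(n)]
    njd = nnz_per_row[perm[0]]            # number of jagged diagonals
    values, col_idx, jd_ptr = [], [], [0]
    for k in range(njd):
        for i in perm:                     # rows with > k nonzeros form a prefix of perm
            if k < len(cols_per_row[i]):
                values.append(A[i, cols_per_row[i][k]])
                col_idx.append(cols_per_row[i][k])
        jd_ptr.append(len(values))
    return np.array(values), np.array(col_idx), jd_ptr, perm

def jad_matvec(values, col_idx, jd_ptr, perm, x, n):
    """y = A @ x using JAD storage: one long vector operation per diagonal."""
    y = np.zeros(n)
    for k in range(len(jd_ptr) - 1):
        lo, hi = jd_ptr[k], jd_ptr[k + 1]
        rows = perm[:hi - lo]              # rows long enough to reach diagonal k
        y[rows] += values[lo:hi] * x[col_idx[lo:hi]]  # vectorizable update
    return y
```

Each jagged diagonal yields a gather, an elementwise multiply, and a scatter-add of full vector length, which is what makes this mode attractive on vector machines such as the Cray Y-MP.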
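The level scheduling technique mentioned in the abstract extracts parallelism from the sparse triangular solves that arise when applying the preconditioner: unknowns are grouped into "levels" such that every unknown in a level depends only on unknowns from earlier levels, so all rows within a level can be processed concurrently. A minimal sketch, assuming a dense representation of the lower-triangular factor for readability (the paper's setting would use sparse storage and Cray Fortran autotasking):

```python
import numpy as np

def level_schedule(L):
    """Group the rows of a lower-triangular system L y = b into levels:
    level(i) = 1 + max level over the off-diagonal dependencies of row i.
    Rows sharing a level are mutually independent."""
    n = L.shape[0]
    level = [0] * n
    for i in range(n):
        deps = [j for j in range(i) if L[i, j] != 0]
        level[i] = 1 + max((level[j] for j in deps), default=0)
    groups = {}
    for i, lev in enumerate(level):
        groups.setdefault(lev, []).append(i)
    return [groups[lev] for lev in sorted(groups)]

def level_solve(L, b):
    """Forward substitution driven by the level schedule."""
    y = np.zeros_like(b, dtype=float)
    for rows in level_schedule(L):
        # Every row in this level depends only on y entries already computed,
        # so this inner loop could run in parallel (or as one vector op).
        for i in rows:
            y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y
```

The number of levels bounds the sequential depth of the solve; a factor with few, wide levels parallelizes well, whereas a long dependency chain (many single-row levels) explains the kind of poor parallel speedup the abstract reports for the sequential code.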