Description |
1 online resource (10 pages) : color illustrations. |
|
text txt rdacontent |
|
computer c rdamedia |
|
online resource cr rdacarrier |
Series |
NREL/CP ; 2C00-80530 |
|
Conference paper (National Renewable Energy Laboratory (U.S.)) ; 2C00-80530.
|
Note |
"March 2022." |
|
"Presented at the 2021 World Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE'21), Las Vegas, Nevada, July 26-29, 2021"--Cover. |
Bibliography |
Includes bibliographical references (page 9). |
Funding |
U.S. Department of Energy DE-AC36-08GO28308 |
Note |
Description based on online resource; title from PDF title page (NREL, viewed May 26, 2022). |
Summary |
A threaded multi-core implementation of the high-performance dense linear algebra matrix-matrix multiply (GEMM) kernel is described. This kernel is widely implemented by vendors in Basic Linear Algebra Subprograms (BLAS) libraries. The mathematics of arrays (MoA) paradigm due to Mullin (1988) yields contiguous memory accesses by employing outer-product forms. Our performance studies demonstrate that the MoA implementation of double-precision GEMM (DGEMM), combined with optimal cache-blocking strategies, achieves at least a 25% performance gain on the Intel Xeon Skylake processor over the vendor-supplied Intel MKL BLAS library. Results are presented for the NREL Eagle supercomputer. The multi-core DGEMM achieves over 100 GFLOP/s with eight OpenMP threads. |
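The summary names two ideas, outer-product forms that give contiguous memory accesses and cache blocking. The following is a minimal single-threaded sketch of both for row-major storage. It is a generic illustration, not the paper's MoA implementation: the function name `gemm_blocked`, its parameters, and the default block size are hypothetical, and the OpenMP threading described in the summary is omitted.

```python
def gemm_blocked(a, b, n_rows, n_inner, n_cols, block=4):
    """Return C = A * B as a flat row-major list.

    a is n_rows x n_inner and b is n_inner x n_cols, both flat row-major lists.
    The block size is an illustrative assumption, not a tuned value.
    """
    c = [0.0] * (n_rows * n_cols)
    # Tile the three loops so each tile of A, B and C can stay cache-resident.
    for i0 in range(0, n_rows, block):
        i1 = min(i0 + block, n_rows)
        for k0 in range(0, n_inner, block):
            k1 = min(k0 + block, n_inner)
            for j0 in range(0, n_cols, block):
                j1 = min(j0 + block, n_cols)
                for i in range(i0, i1):
                    row_c = i * n_cols
                    for k in range(k0, k1):
                        # Outer-product form: scale a contiguous piece of
                        # row k of B by A[i, k] and add it to row i of C.
                        a_ik = a[i * n_inner + k]
                        row_b = k * n_cols
                        for j in range(j0, j1):
                            c[row_c + j] += a_ik * b[row_b + j]
    return c


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6]        # 2 x 3 matrix
    b = [7, 8, 9, 10, 11, 12]     # 3 x 2 matrix
    print(gemm_blocked(a, b, 2, 3, 2, block=2))
```

With the innermost loop running over j, both `b` and `c` are read and written at consecutive indices, which is the contiguous-access property the summary attributes to outer-product forms. In a compiled, threaded version the outermost tile loop would be the natural candidate for distribution across threads, since each row tile of C is written by only one iteration.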
Subject |
Array processors.
|
|
Computer science -- Mathematics.
|
|
Algebras, Linear.
|
|
Processeurs de tableaux.
|
|
Informatique -- Mathématiques.
|
|
Algèbre linéaire.
|
Indexed Term |
cache-blocking |
|
contiguous memory |
|
mathematics of arrays |
|
shared-memory multi-threading |
Added Author |
National Renewable Energy Laboratory (U.S.), issuing body.
|
Standard No. |
1848079 OSTI ID |
Gpo Item No. |
0430-P-04 (online) |
Sudoc No. |
E 9.17:NREL/CP-2 C 00-80530 |
|