Author

Thomas, Stephen (Mathematician), author.

Title

Threaded multi-core GEMM with MoA and cache-blocking : preprint / Stephen Thomas and [three others].

Publication Info.

Golden, CO : National Renewable Energy Laboratory, 2022.

Description	1 online resource (10 pages) : color illustrations.
	text txt rdacontent
	computer c rdamedia
	online resource cr rdacarrier
Series	NREL/CP ; 2C00-80530
	Conference paper (National Renewable Energy Laboratory (U.S.)) ; 2C00-80530.
Note	"March 2022."
	"Presented at the 2021 World Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE'21), Las Vegas, Nevada, July 26-29, 2021"--Cover.
Bibliography	Includes bibliographical references (page 9).
Funding	U.S. Department of Energy DE-AC36-08GO28308
Note	Description based on online resource; title from PDF title page (NREL, viewed May 26, 2022).
Summary	A threaded multi-core implementation of the high-performance dense linear algebra matrix-matrix multiply (GEMM) kernel is described. This kernel is widely implemented by vendors in the Basic Linear Algebra Subprograms (BLAS) library. The mathematics of arrays (MoA) paradigm due to Mullin (1988) results in contiguous memory accesses by employing outer-product forms. Our performance studies demonstrate that the MoA implementation of double-precision DGEMM combined with optimal cache-blocking strategies results in at least a 25% performance gain on the Intel Xeon Skylake processor over the vendor-supplied Intel MKL basic linear algebra libraries. Results are presented for the NREL Eagle supercomputer. The multi-core DGEMM achieves over 100 GFLOP/s with eight OpenMP threads.
Subject	Array processors.
	Computer science -- Mathematics.
	Algebras, Linear.
	Processeurs de tableaux.
	Informatique -- Mathématiques.
	Algèbre linéaire.
Indexed Term	cache-blocking
	contiguous memory
	mathematics of arrays
	shared-memory multi-threading
Added Author	National Renewable Energy Laboratory (U.S.), issuing body.
Standard No.	1848079 OSTI ID
Gpo Item No.	0430-P-04 (online)
Sudoc No.	E 9.17:NREL/CP-2 C 00-80530