GEMM Operations

Detailed performance analysis of GEMM (general matrix multiplication) operations.

GEMM Performance (Double Precision)

Comparing double-precision general matrix multiplication (DGEMM) performance across libraries.

Problem Description

Benchmarking double-precision general matrix multiplication (DGEMM) is crucial as it is a fundamental building block for many HPC applications. This test measures sustained floating-point performance for large square matrices (N x N).
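As an illustration of how such a measurement is typically set up, the sketch below times a double-precision matrix product and converts the elapsed time into sustained GFLOPS using the standard 2N³ flop count for an N × N GEMM. It assumes NumPy is available (NumPy's `@` operator dispatches to the dgemm of whichever BLAS it was built against); the matrix size and repeat count are illustrative, not part of the benchmark specification.

```python
import time
import numpy as np

def dgemm_gflops(n: int, repeats: int = 3) -> float:
    """Time C = A @ B for double-precision n x n matrices and
    return the best sustained GFLOPS over `repeats` runs."""
    rng = np.random.default_rng(0)
    a = rng.random((n, n))          # float64 by default
    b = rng.random((n, n))
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        c = a @ b                   # dispatches to the underlying BLAS dgemm
        best = min(best, time.perf_counter() - t0)
    flops = 2.0 * n**3              # n^3 multiplies + n^3 additions
    return flops / best / 1e9

print(f"{dgemm_gflops(512):.1f} GFLOPS")
```

Taking the best of several repeats filters out one-off timing noise (cold caches, OS jitter), which is the usual convention when reporting sustained rather than average performance.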

Results

No results available for this benchmark yet.

Chart

Bar chart visualising data for: GEMM Performance (Double Precision)

Analysis

To be updated


Batched GEMM Performance

Performance of GEMM for many small matrices processed in batches.

Problem Description

Batched GEMM operations involve performing many independent matrix multiplications on small matrices. This is common in applications like deep learning (tensor contractions), scientific simulations with many small systems (e.g., astrophysics, materials science), and block-oriented algorithms. This benchmark evaluates the throughput (e.g., GFLOPS or matrices per second) for various batch sizes and matrix dimensions.
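A minimal sketch of the throughput metrics described above, assuming NumPy (whose `np.matmul` on 3-D arrays performs one independent GEMM per matrix in the stack, analogous to a batched GEMM call). The batch size and matrix dimension in the example are arbitrary choices for illustration.

```python
import time
import numpy as np

def batched_gemm_throughput(batch: int, m: int, repeats: int = 3):
    """Multiply `batch` independent m x m double-precision matrix pairs
    and return (GFLOPS, matrices per second), best of `repeats` runs."""
    rng = np.random.default_rng(0)
    a = rng.random((batch, m, m))
    b = rng.random((batch, m, m))
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        c = np.matmul(a, b)         # one GEMM per matrix in the stack
        best = min(best, time.perf_counter() - t0)
    gflops = 2.0 * batch * m**3 / best / 1e9
    return gflops, batch / best

gflops, rate = batched_gemm_throughput(1000, 16)
print(f"{gflops:.1f} GFLOPS, {rate:.0f} matrices/s")
```

For small matrices, per-call overhead dominates the arithmetic, which is why batched interfaces report matrices per second alongside GFLOPS: the two metrics diverge sharply as the matrix dimension shrinks.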

Results

No results available for this benchmark yet.

Chart

Bar chart visualising data for: Batched GEMM Performance

Analysis

To be updated


Distributed GEMM Scaling (PDGEMM)

Scaling of General Matrix Multiplication across multiple compute nodes.

Problem Description

For matrices too large to fit in the memory of a single compute node, or to accelerate computations, GEMM operations must be distributed across multiple nodes. This benchmark evaluates the strong and weak scaling performance of distributed GEMM implementations (e.g., ScaLAPACK's PDGEMM, SLATE, DLA-Future) using metrics like sustained GFLOPS and parallel efficiency.
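The parallel-efficiency metrics mentioned above can be made concrete with two small formulas. In strong scaling the global problem size is fixed, so the ideal runtime on p nodes is T₁/p; in weak scaling the problem grows with p so the per-node workload, and hence the ideal runtime, stays constant. The timings in the example are hypothetical, purely to show the arithmetic.

```python
def strong_scaling_efficiency(t1: float, tp: float, p: int) -> float:
    """Strong scaling: fixed global problem size.
    Efficiency = T1 / (p * Tp); 1.0 means ideal speedup."""
    return t1 / (p * tp)

def weak_scaling_efficiency(t1: float, tp: float) -> float:
    """Weak scaling: problem size grows with p, so the ideal
    runtime on p nodes equals the single-node runtime T1."""
    return t1 / tp

# Hypothetical timings: 100 s on 1 node, 30 s on 4 nodes.
print(f"strong-scaling efficiency: {strong_scaling_efficiency(100.0, 30.0, 4):.0%}")
```

Plotting these efficiencies against node count is the standard way to compare distributed GEMM implementations such as PDGEMM, SLATE, or DLA-Future, since raw GFLOPS alone hides how much of the added hardware is actually being used.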

Results

No results available for this benchmark yet.

Chart

Bar chart visualising data for: Distributed GEMM Scaling (PDGEMM)

Analysis

To be updated