
GPU Benchmarking API Reference

This page provides a detailed API reference for the GPU benchmarking functionality in CatP2P.

Structures

GpuBenchmarkContext

A context for GPU benchmarks that manages GPU resources and allows reusing them across multiple benchmarks.

| Field | Type | Description | Example Access |
| --- | --- | --- | --- |
| gpu_info | GpuInfo | Information about the GPU | context.gpu_info.name |
| adapter | Adapter | GPU adapter | (internal use) |
| device | Device | GPU device | (internal use) |
| queue | Queue | GPU command queue | (internal use) |

GpuBenchmarkResult

Contains detailed information about the results of a GPU benchmark.

| Field | Type | Description | Example Access |
| --- | --- | --- | --- |
| gpu_model | String | GPU model name | result.gpu_model |
| gpu_vendor | String | GPU vendor | result.gpu_vendor |
| vram_estimate | String | Estimated VRAM in GB | result.vram_estimate |
| compute_score | f64 | Compute performance score (MFLOPS) | result.compute_score |
| texture_score | f64 | Texture sampling performance score | result.texture_score |
| geometry_score | f64 | Geometry processing performance score | result.geometry_score |
| memory_score | f64 | Memory bandwidth performance score | result.memory_score |
| overall_score | f64 | Overall benchmark score (higher is better) | result.overall_score |
| average_fps | f64 | Average frames per second across all tests | result.average_fps |
| test_results | Vec<GpuTestResult> | Detailed results for each test | result.test_results[0].score |

GpuTestResult

Contains detailed information about the results of a specific GPU test.

| Field | Type | Description | Example Access |
| --- | --- | --- | --- |
| test_name | String | Name of the test | test_result.test_name |
| average_fps | f64 | Average frames per second | test_result.average_fps |
| min_fps | f64 | Minimum frames per second | test_result.min_fps |
| max_fps | f64 | Maximum frames per second | test_result.max_fps |
| score | f64 | Test score in MFLOPS (higher is better) | test_result.score |

GpuBenchmarkConfig

Configuration options for GPU benchmarks.

| Field | Type | Description | Default Value | Example Access |
| --- | --- | --- | --- | --- |
| test_duration_secs | u64 | Duration of each test in seconds | 5 | config.test_duration_secs |
| include_compute_test | bool | Whether to include compute test | true | config.include_compute_test |
| include_texture_test | bool | Whether to include texture test | true | config.include_texture_test |
| include_geometry_test | bool | Whether to include geometry test | true | config.include_geometry_test |
| include_memory_test | bool | Whether to include memory test | true | config.include_memory_test |
| complexity | u32 | Test complexity (1-10) affecting matrix size | 5 | config.complexity |
| window_width | u32 | Width of benchmark window | 800 | config.window_width |
| window_height | u32 | Height of benchmark window | 600 | config.window_height |
| show_window | bool | Whether to show the benchmark window | false | config.show_window |
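
To illustrate how the defaults in the table combine with selective overrides, here is a minimal sketch. The struct below is a local stand-in that mirrors the documented fields and default values; it is not the crate's own definition. Whether the real GpuBenchmarkConfig implements Default is an assumption, and quick_compute_config is a hypothetical helper introduced only for this example.

```rust
// Local stand-in mirroring the documented fields; the real type lives in
// CatP2P and may differ in detail.
#[derive(Debug, Clone, PartialEq)]
struct GpuBenchmarkConfig {
    test_duration_secs: u64,
    include_compute_test: bool,
    include_texture_test: bool,
    include_geometry_test: bool,
    include_memory_test: bool,
    complexity: u32,
    window_width: u32,
    window_height: u32,
    show_window: bool,
}

// Assumption: defaults are exposed through `Default`. The values are the
// ones listed in the table above.
impl Default for GpuBenchmarkConfig {
    fn default() -> Self {
        Self {
            test_duration_secs: 5,
            include_compute_test: true,
            include_texture_test: true,
            include_geometry_test: true,
            include_memory_test: true,
            complexity: 5,
            window_width: 800,
            window_height: 600,
            show_window: false,
        }
    }
}

/// Hypothetical helper: a short, compute-only configuration.
/// Fields that are not listed keep their documented defaults.
fn quick_compute_config() -> GpuBenchmarkConfig {
    GpuBenchmarkConfig {
        test_duration_secs: 2,
        include_texture_test: false,
        include_geometry_test: false,
        include_memory_test: false,
        complexity: 3,
        ..GpuBenchmarkConfig::default()
    }
}

fn main() {
    let config = quick_compute_config();
    println!("{:?}", config);
}
```

With the real crate, a configuration built this way would be passed by reference to run_gpu_benchmark_with_config, as shown in the Performance Testing table.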

GpuInfo

Information about a GPU.

| Field | Type | Description | Example Access |
| --- | --- | --- | --- |
| name | String | GPU model name | gpu_info.name |
| vendor | String | GPU vendor | gpu_info.vendor |
| driver | String | GPU driver version | gpu_info.driver |
| vram | String | Estimated VRAM | gpu_info.vram |
| backend | String | Graphics API backend | gpu_info.backend |
| is_integrated | bool | Whether the GPU is integrated | gpu_info.is_integrated |

Functions

Context Management

| Function | Return Type | Description | Example Usage | Possible Errors |
| --- | --- | --- | --- | --- |
| GpuBenchmarkContext::new() | Result<GpuBenchmarkContext, Error> | Creates a new GPU benchmark context | let context = GpuBenchmarkContext::new()?; | No compatible GPU found, device creation failed |
| context.run_matrix_mult(duration, matrix_size) | Result<GpuTestResult, Error> | Runs a matrix multiplication benchmark | let result = context.run_matrix_mult(Duration::from_secs(5), 1024)?; | Device lost, out of memory |
| context.run_activation_functions(duration, data_size) | Result<GpuTestResult, Error> | Runs a neural network activation functions benchmark | let result = context.run_activation_functions(Duration::from_secs(2), 1_000_000)?; | Device lost, out of memory |

Information Gathering

| Function | Return Type | Description | Example Usage | Possible Errors |
| --- | --- | --- | --- | --- |
| get_gpu_info() | Result<GpuInfo, Error> | Gets information about the GPU | let gpu_info = gpu::get_gpu_info()?; | No compatible GPU found |
| is_gpu_available() | bool | Checks if a compatible GPU is available | if gpu::is_gpu_available() { ... } | None |

Performance Testing

| Function | Return Type | Description | Example Usage | Performance Impact |
| --- | --- | --- | --- | --- |
| run_gpu_benchmark() | Result<f64, Error> | Runs a benchmark with default settings | let score = gpu::run_gpu_benchmark()?; | High - runs matrix multiplication test with default duration |
| run_gpu_benchmark_with_config(config) | Result<GpuBenchmarkResult, Error> | Runs a benchmark with custom configuration | let result = gpu::run_gpu_benchmark_with_config(&config)?; | Varies based on configuration |
| run_matrix_mult_benchmark(adapter, duration, size) | Result<GpuTestResult, Error> | Runs only the matrix multiplication test | let result = gpu::run_matrix_mult_benchmark(&adapter, duration, 1024)?; | Medium - runs only matrix multiplication test |
| run_activation_functions_benchmark(adapter, duration, data_size) | Result<GpuTestResult, Error> | Runs only the activation functions test | let result = gpu::run_activation_functions_benchmark(&adapter, duration, 1_000_000)?; | Medium - runs only activation functions test |

Understanding GPU Benchmark Results

The GPU benchmark in CatP2P includes two main tests:

  1. Matrix Multiplication: How efficiently the GPU can multiply large matrices
  2. Neural Network Activation Functions: How efficiently the GPU can compute common activation functions used in neural networks

Matrix Multiplication Benchmark

This benchmark measures the GPU's ability to perform matrix multiplication operations, which are fundamental to many GPU computing tasks:

  • Score in MFLOPS: Millions of floating-point operations per second
  • Higher scores indicate better GPU compute performance
  • Matrix size: Calculated as 512 + (complexity * 128), where complexity ranges from 1 to 10
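
The matrix size formula above can be sketched as a small function. The name matrix_size is hypothetical, and returning None for a complexity outside 1 to 10 is an assumption made for this example; the source does not say whether the crate rejects or clamps out-of-range values.

```rust
/// Matrix dimension N for a given complexity, following the documented
/// formula 512 + (complexity * 128).
/// Assumption: values outside 1..=10 are treated as invalid here.
fn matrix_size(complexity: u32) -> Option<u32> {
    if (1..=10).contains(&complexity) {
        Some(512 + complexity * 128)
    } else {
        None
    }
}

fn main() {
    for complexity in [1, 5, 10] {
        println!("complexity {} -> {:?}", complexity, matrix_size(complexity));
    }
}
```

Under this formula the default complexity of 5 gives a 1152 x 1152 matrix, and the documented range spans 640 to 1792.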

Activation Functions Benchmark

This benchmark measures the GPU's ability to compute common neural network activation functions:

  • Operations tested: ReLU, Sigmoid, Tanh, and Leaky ReLU
  • Score: Based on millions of operations per second
  • Higher scores indicate better performance for AI and deep learning applications

Interpreting the Score

When interpreting the GPU benchmark score, keep the following in mind:

  • Higher scores indicate better GPU compute performance
  • Scores are influenced by:
    • GPU architecture and generation
    • Number of compute units/cores
    • Memory bandwidth and capacity
    • Driver optimization
    • System configuration

Typical Performance Ranges

| GPU Type | Typical Matrix Multiplication Score (MFLOPS) | Expected Performance |
| --- | --- | --- |
| High-end Desktop GPU | 5,000,000 - 15,000,000 | Excellent for complex parallel computing |
| Mid-range Desktop GPU | 1,000,000 - 5,000,000 | Good for most parallel computing tasks |
| Entry-level Desktop GPU | 200,000 - 1,000,000 | Suitable for basic parallel computing |
| High-end Integrated GPU | 50,000 - 200,000 | Limited parallel computing capability |
| Basic Integrated GPU | 5,000 - 50,000 | Minimal parallel computing capability |

Note: Actual performance can vary significantly based on specific hardware, system conditions, and benchmark parameters.
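
As a rough illustration of how the ranges in the table might be applied to a measured score, the sketch below maps a matrix multiplication score to the matching row. The function classify_mflops is hypothetical and not part of CatP2P. Because adjacent ranges share their boundary values, the sketch assigns a boundary score to the higher tier, and it returns None for scores outside the table; both choices are assumptions.

```rust
/// Maps a matrix multiplication score (MFLOPS) to a row of the table above.
/// Assumptions: shared boundaries go to the higher tier, and scores outside
/// the documented 5,000 to 15,000,000 range are left unclassified.
fn classify_mflops(score: f64) -> Option<&'static str> {
    if !(5_000.0..=15_000_000.0).contains(&score) {
        return None;
    }
    let tier = if score >= 5_000_000.0 {
        "High-end Desktop GPU"
    } else if score >= 1_000_000.0 {
        "Mid-range Desktop GPU"
    } else if score >= 200_000.0 {
        "Entry-level Desktop GPU"
    } else if score >= 50_000.0 {
        "High-end Integrated GPU"
    } else {
        "Basic Integrated GPU"
    };
    Some(tier)
}

fn main() {
    for score in [7_500_000.0, 120_000.0, 100.0] {
        println!("{} MFLOPS -> {:?}", score, classify_mflops(score));
    }
}
```

A score only suggests a tier. As the note above says, the same GPU can land in different rows depending on system conditions and benchmark parameters.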

Factors Affecting Benchmark Results

| Factor | Impact | Notes |
| --- | --- | --- |
| Matrix Size | High | Larger matrices provide more accurate results but may hit memory limits |
| Data Size | Medium | Larger data sizes for activation functions provide more accurate results |
| System Activity | Medium | Other processes using the GPU can reduce benchmark scores |
| Driver Version | Medium | Updated drivers can provide performance improvements |
| Thermal Throttling | High | GPUs may slow down if they overheat during benchmarking |
| Power Limits | Medium | Power-limited systems (like laptops) may show lower performance |
| API Overhead | Low | Different graphics APIs have different overheads |

Implementation Details

Matrix Multiplication Benchmark

The matrix multiplication benchmark measures how quickly the GPU can multiply two large matrices:

  1. Two random matrices of size NxN are created (where N is determined by the complexity parameter)
  2. The matrices are uploaded to GPU memory
  3. A compute shader multiplies the matrices
  4. The process is repeated for the specified duration
  5. Performance is measured in MFLOPS (Millions of Floating Point Operations Per Second)

The matrix size is calculated as: 512 + (complexity * 128), where complexity ranges from 1 to 10.
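
To make the MFLOPS figure concrete, the following sketch derives a score from a matrix size, an iteration count, and the elapsed time. The source does not state how operations are counted, so the sketch assumes the conventional count of 2 * N^3 floating-point operations per N x N multiplication (one multiply and one add per inner-product term). The function name matmul_mflops is hypothetical.

```rust
use std::time::Duration;

/// Estimates MFLOPS for `iterations` multiplications of two n x n matrices.
/// Assumption: each multiplication costs 2 * n^3 floating-point operations.
/// Returns None when nothing ran or no time elapsed, since no meaningful
/// rate can be computed in that case.
fn matmul_mflops(n: u64, iterations: u64, elapsed: Duration) -> Option<f64> {
    let secs = elapsed.as_secs_f64();
    if iterations == 0 || secs <= 0.0 {
        return None;
    }
    let flops_per_multiply = 2.0 * (n as f64).powi(3);
    Some(flops_per_multiply * iterations as f64 / secs / 1.0e6)
}

fn main() {
    // Made-up sample numbers: 400 multiplications of 1152 x 1152 matrices
    // completed in 5 seconds.
    if let Some(score) = matmul_mflops(1152, 400, Duration::from_secs(5)) {
        println!("{:.0} MFLOPS", score);
    }
}
```

The None case corresponds to a run in which no iterations completed, which the module reports as an error rather than a score.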

Activation Functions Benchmark

The activation functions benchmark measures how quickly the GPU can compute common neural network activation functions:

  1. Random input data of the specified size is created
  2. The data is uploaded to GPU memory
  3. A compute shader applies four activation functions (ReLU, Sigmoid, Tanh, Leaky ReLU) to each element
  4. The process is repeated for the specified duration
  5. Performance is measured in millions of operations per second
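
For reference, the four activation functions named in step 3 can be written on the CPU as follows. This is a plain Rust sketch of the math, not the compute shader used by CatP2P. The Leaky ReLU slope of 0.01 is a common choice and an assumption here; the source does not specify it.

```rust
/// ReLU: passes positive values through and clamps negatives to zero.
fn relu(x: f32) -> f32 {
    x.max(0.0)
}

/// Sigmoid: squashes any input into the open interval (0, 1).
fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// Tanh: squashes any input into the open interval (-1, 1).
fn tanh_act(x: f32) -> f32 {
    x.tanh()
}

/// Leaky ReLU: like ReLU, but negative inputs are scaled by `slope`.
fn leaky_relu(x: f32, slope: f32) -> f32 {
    if x >= 0.0 { x } else { slope * x }
}

/// Applies all four functions to every element, as the benchmark does
/// per element. Output order: [relu, sigmoid, tanh, leaky_relu].
fn apply_all(input: &[f32]) -> Vec<[f32; 4]> {
    input
        .iter()
        .map(|&x| [relu(x), sigmoid(x), tanh_act(x), leaky_relu(x, 0.01)])
        .collect()
}

fn main() {
    for row in apply_all(&[-2.0, 0.0, 2.0]) {
        println!("{:?}", row);
    }
}
```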

Resource Management

The GPU benchmarking module uses a context-based approach to manage GPU resources:

  1. A GpuBenchmarkContext is created once
  2. This context holds the GPU device and command queue
  3. Multiple benchmarks can be run using the same context
  4. This approach avoids device creation/destruction overhead and potential driver issues

Graphics API Selection

The GPU benchmarking functionality uses wgpu, which automatically selects the best available graphics API:

  1. Vulkan on supported systems (Linux, Windows, Android)
  2. Metal on macOS and iOS
  3. DirectX 12 on Windows
  4. DirectX 11 on older Windows systems
  5. OpenGL as a fallback

Error Handling

The GPU benchmarking functions use Rust's Result type to handle errors gracefully. Common errors include:

  • Error::Benchmark("No suitable GPU adapter found"): System lacks required GPU capabilities
  • Error::Benchmark("Failed to create device: {error}"): GPU initialization failed
  • Error::Benchmark("No matrix multiplications were performed during the benchmark"): Benchmark failed to run any iterations
  • Error::Benchmark("No activation function operations were performed during the benchmark"): Benchmark failed to run any iterations
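
A caller might handle these errors along the following lines. The Error enum below is a local stand-in with only the Benchmark(String) variant listed above; the crate's actual Error type may have more variants. The function describe is hypothetical, and the device failure detail in main is a made-up sample.

```rust
// Local stand-in for the crate's error type; only the variant documented
// above is modeled.
#[derive(Debug)]
enum Error {
    Benchmark(String),
}

/// Turns a benchmark outcome into a short, human-readable status line.
fn describe(result: &Result<f64, Error>) -> String {
    match result {
        Ok(score) => format!("score: {:.0} MFLOPS", score),
        // Matching on message text is brittle and shown only because the
        // documented errors are distinguished by their messages.
        Err(Error::Benchmark(msg)) if msg.starts_with("No suitable GPU adapter") => {
            "no usable GPU; skipping GPU benchmark".to_string()
        }
        Err(Error::Benchmark(msg)) => format!("benchmark failed: {}", msg),
    }
}

fn main() {
    let outcomes: Vec<Result<f64, Error>> = vec![
        Ok(1_250_000.0),
        Err(Error::Benchmark("No suitable GPU adapter found".to_string())),
        // Made-up detail after the documented "Failed to create device:" prefix.
        Err(Error::Benchmark("Failed to create device: sample failure".to_string())),
    ];
    for outcome in &outcomes {
        println!("{}", describe(outcome));
    }
}
```

Treating a missing adapter as a skip rather than a hard failure is one possible design choice; it lets an application continue on machines without a compatible GPU.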