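Below are the CUDA-Z 0.6.163 results for the GTX 580 and the C2070 in my machine.
[b]Some conclusions:
- Single-precision floating point: the GTX 580 is about 1.6 times as fast as the C2070;
- Double-precision floating point: the GTX 580 is about 0.4 times as fast as the C2070;
- Integer: the GTX 580 is about 1.6 times as fast as the C2070.[/b]
For reference only; comments and suggestions are welcome.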
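The ratios quoted above can be reproduced from the "GPU Core Performance" figures in the two dumps. This is only a small sketch: the dictionary keys and the `ratios` helper are names made up for illustration, and the numbers are copied by hand from the CUDA-Z output in this post.

```python
# Throughput figures copied from the CUDA-Z output in this post
# (Gflop/s for float, Giop/s for integer). Key names are illustrative only.
gtx580 = {"single": 1618.74, "double": 204.048, "int32": 814.982, "int24": 811.543}
c2070 = {"single": 1020.71, "double": 512.452, "int32": 512.62, "int24": 511.965}


def ratios(a, b):
    """Return a[k] / b[k] for every metric present in both result sets."""
    return {k: a[k] / b[k] for k in a if k in b}


gtx_vs_tesla = ratios(gtx580, c2070)
for name, r in gtx_vs_tesla.items():
    print(f"{name}: GTX 580 / C2070 = {r:.2f}")
```

Rounded to one decimal place, this gives roughly 1.6 for single precision and integer, and 0.4 for double precision.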
[b]Core Information
Name: GeForce GTX 580[/b]
Compute Capability: 2.0
Clock Rate: 1600 MHz
PCI Location: 0:3:0
Multiprocessors: 16 (512 Cores)
Threads Per Multiproc.: 1536
Warp Size: 32
Regs Per Block: 32768
Threads Per Block: 1024
Threads Dimensions: 1024 x 1024 x 64
Grid Dimensions: 65535 x 65535 x 65535
Watchdog Enabled: Yes
Integrated GPU: No
Concurrent Kernels: Yes
Compute Mode: Default
Memory Information
Total Global: 1535.81 MiB
Bus Width: 384 bits
Clock Rate: 2004 MHz
Error Correction: No
L2 Cache Size: 48 KiB
Shared Per Block: 48 KiB
Pitch: 2048 MiB
Total Constant: 64 KiB
Texture Alignment: 512 B
Texture 1D Size: 65536
Texture 2D Size: 65536 x 65535
Texture 3D Size: 2048 x 2048 x 2048
GPU Overlap: Yes
Map Host Memory: Yes
Unified Addressing: No
Async Engine: Yes, Unidirectional
Performance Information
Memory Copy
Host Pinned to Device: 3101.36 MiB/s
Host Pageable to Device: 2394.15 MiB/s
Device to Host Pinned: 3270.42 MiB/s
Device to Host Pageable: 2569.77 MiB/s
Device to Device: 10.6116 GiB/s
GPU Core Performance
Single-precision Float: 1618.74 Gflop/s
Double-precision Float: 204.048 Gflop/s
32-bit Integer: 814.982 Giop/s
24-bit Integer: 811.543 Giop/s
[b]Core Information
Name: Tesla C2070[/b]
Compute Capability: 2.0
Clock Rate: 1147 MHz
PCI Location: 0:2:0
Multiprocessors: 14 (448 Cores)
Threads Per Multiproc.: 1536
Warp Size: 32
Regs Per Block: 32768
Threads Per Block: 1024
Threads Dimensions: 1024 x 1024 x 64
Grid Dimensions: 65535 x 65535 x 65535
Watchdog Enabled: No
Integrated GPU: No
Concurrent Kernels: Yes
Compute Mode: Default
Memory Information
Total Global: 4096 MiB
Bus Width: 384 bits
Clock Rate: 1494 MHz
Error Correction: Yes
L2 Cache Size: 48 KiB
Shared Per Block: 48 KiB
Pitch: 2048 MiB
Total Constant: 64 KiB
Texture Alignment: 512 B
Texture 1D Size: 65536
Texture 2D Size: 65536 x 65535
Texture 3D Size: 2048 x 2048 x 2048
GPU Overlap: Yes
Map Host Memory: Yes
Unified Addressing: No
Async Engine: Yes, Bidirectional
Performance Information
Memory Copy
Host Pinned to Device: 5981.67 MiB/s
Host Pageable to Device: 5329.91 MiB/s
Device to Host Pinned: 6333.27 MiB/s
Device to Host Pageable: 5640.94 MiB/s
Device to Device: 46.6775 GiB/s
GPU Core Performance
Single-precision Float: 1020.71 Gflop/s
Double-precision Float: 512.452 Gflop/s
32-bit Integer: 512.62 Giop/s
24-bit Integer: 511.965 Giop/s
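As a rough sanity check on the numbers above, the measured single-precision throughput can be compared with a simple peak estimate of cores x clock x 2. The factor of 2 is an assumption (one fused multiply-add per core per clock), and `peak_sp_gflops` and the `cards` table are hypothetical names used only for this sketch; the core counts, clock rates and measured values are taken from the dumps in this post.

```python
# Values reported by CUDA-Z in this post. "sp" and "dp" are the measured
# single- and double-precision results in Gflop/s.
cards = {
    "GeForce GTX 580": {"cores": 512, "clock_mhz": 1600, "sp": 1618.74, "dp": 204.048},
    "Tesla C2070": {"cores": 448, "clock_mhz": 1147, "sp": 1020.71, "dp": 512.452},
}

# Assumption: each core retires one fused multiply-add (2 flops) per clock.
FLOPS_PER_CORE_PER_CLOCK = 2


def peak_sp_gflops(cores, clock_mhz):
    """Estimated peak single-precision throughput in Gflop/s."""
    return cores * clock_mhz * FLOPS_PER_CORE_PER_CLOCK / 1000.0


for name, c in cards.items():
    peak = peak_sp_gflops(c["cores"], c["clock_mhz"])
    print(
        f"{name}: estimated peak {peak:.1f} Gflop/s, "
        f"measured/peak = {c['sp'] / peak:.3f}, "
        f"measured DP/SP = {c['dp'] / c['sp']:.3f}"
    )
```

Under that assumption both cards measure close to their estimated single-precision peak. The measured DP/SP ratio comes out near 1/8 on the GTX 580 and near 1/2 on the C2070, which would explain why the GTX 580 leads in single precision but trails in double precision.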