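A question about CUDA registers:
The screenshot below shows the output of the deviceQuery program from the CUDA Samples. One line in the middle (marked with a red box) reads:
Total number of registers available per block: 65536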
[attach]3373[/attach]
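The table below is taken from the CUDA_Occupancy_Calculator. The part marked in red indicates that each SM has 65536 registers.
Doesn't this contradict the result in the screenshot above? Which of the two should I go by?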
[table=437]
[tr][td=365]
Physical Limits for GPU Compute Capability:
[/td][td=72]
3.0
[/td][/tr]
[tr][td=365]
Threads per Warp
[/td][td=72]
32
[/td][/tr]
[tr][td=365]
Warps per Multiprocessor
[/td][td=72]
64
[/td][/tr]
[tr][td=365]
Threads per Multiprocessor
[/td][td=72]
2048
[/td][/tr]
[tr][td=365]
Thread Blocks per Multiprocessor
[/td][td=72]
16
[/td][/tr]
[tr][td=365]
Total # of 32-bit registers per Multiprocessor
[/td][td=72]
65536
[/td][/tr]
[tr][td=365]
Register allocation unit size
[/td][td=72]
256
[/td][/tr]
[tr][td=365]
Register allocation granularity
[/td][td=72]
warp
[/td][/tr]
[tr][td=365]
Registers per Thread
[/td][td=72]
63
[/td][/tr]
[tr][td=365]
Shared Memory per Multiprocessor (bytes)
[/td][td=72]
49152
[/td][/tr]
[tr][td=365]
Shared Memory Allocation unit size
[/td][td=72]
256
[/td][/tr]
[tr][td=365]
Warp allocation granularity
[/td][td=72]
4
[/td][/tr]
[tr][td=365]
Maximum Thread Block Size
[/td][td=72]
1024
[/td][/tr]
[/table]
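Also, how do I calculate the maximum number of registers available to each thread? Is it the number of registers per SM divided by the total number of threads resident on each SM?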
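
To make the arithmetic in that last question concrete, here is a rough sketch using only the Compute Capability 3.0 numbers from the table above. The function names are made up for illustration, and the model is deliberately simplified: it ignores the warp allocation granularity, the per-block limits, and shared memory, so the real Occupancy Calculator may report lower values in some cases. It only shows how the register file size, the 256-register allocation unit (allocated per warp), and the 63-register per-thread cap interact.

```python
# Values copied from the Compute Capability 3.0 column of the table above.
REGS_PER_SM = 65536          # 32-bit registers per multiprocessor
THREADS_PER_WARP = 32
MAX_WARPS_PER_SM = 64
REG_ALLOC_UNIT = 256         # registers are allocated per warp in units of 256
MAX_REGS_PER_THREAD = 63     # "Registers per Thread" row: per-thread cap


def regs_per_thread_budget(resident_threads):
    """Hypothetical helper: SM registers divided by the number of resident
    threads, capped at the per-thread limit from the table."""
    if resident_threads <= 0:
        raise ValueError("resident_threads must be positive")
    return min(REGS_PER_SM // resident_threads, MAX_REGS_PER_THREAD)


def warps_limited_by_registers(regs_per_thread):
    """Hypothetical helper: how many warps can be resident on one SM if
    registers were the only limiting resource (simplified model)."""
    if not 1 <= regs_per_thread <= MAX_REGS_PER_THREAD:
        raise ValueError("regs_per_thread must be between 1 and 63")
    raw = regs_per_thread * THREADS_PER_WARP
    # Round the per-warp register count up to the allocation unit.
    regs_per_warp = -(-raw // REG_ALLOC_UNIT) * REG_ALLOC_UNIT
    return min(REGS_PER_SM // regs_per_warp, MAX_WARPS_PER_SM)


if __name__ == "__main__":
    for threads in (2048, 1024, 512):
        print(threads, "resident threads ->",
              regs_per_thread_budget(threads), "registers per thread")
    for regs in (32, 63):
        print(regs, "registers per thread ->",
              warps_limited_by_registers(regs), "warps per SM")
```

Under these assumptions, the division only gives a budget for a chosen occupancy: with all 2048 threads resident, each thread can use at most 32 registers, while with 1024 or fewer resident threads the division exceeds 63 and the per-thread cap from the table becomes the limit instead.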