A question about bank conflicts.
I am using a GeForce 9500, which should have 16 banks.
Take the following example:
__shared__ float shared[32];
float data = shared[BaseIndex + s * tid];
With BaseIndex = 0 and s = 1, the accesses should be:
tid shared[BaseIndex + s * tid]
0 shared[0]
1 shared[1]
.
.
.
15 shared[15]
shared[0] through shared[15] all belong to different banks.
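
To make the mapping above concrete, here is a minimal host-side sketch in plain C. It assumes 16 banks of 32-bit words and that requests are handled per half-warp of 16 threads, as on compute capability 1.x devices. The names bank_of_word and max_conflict_degree are made up for this illustration and are not part of CUDA.

```c
#include <assert.h>

#define NUM_BANKS 16   /* assumption: 16 banks, one 32-bit word wide */
#define HALF_WARP 16   /* assumption: requests are served per half-warp */

/* Bank that holds the 32-bit word at the given word index. */
int bank_of_word(int word_index)
{
    assert(word_index >= 0);
    return word_index % NUM_BANKS;
}

/* Largest number of threads in one half-warp that land on the same bank
   when thread tid reads float element base_index + s * tid.
   1 means conflict-free, n means an n-way bank conflict.
   The broadcast case (all threads reading the same word) is not modelled. */
int max_conflict_degree(int base_index, int s)
{
    int hits[NUM_BANKS] = {0};
    int worst = 0;
    for (int tid = 0; tid < HALF_WARP; ++tid) {
        int bank = bank_of_word(base_index + s * tid);
        if (++hits[bank] > worst)
            worst = hits[bank];
    }
    return worst;
}
```

Under these assumptions, max_conflict_degree(0, 1) gives 1, which matches the table above: every thread of the half-warp gets its own bank.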
What happens if s = 3?
tid shared[BaseIndex + s * tid]
0 shared[0]
1 shared[3]
.
.
.
5 shared[15]
6 shared[18]
.
.
10 shared[30]
11 shared[33] // greater than 31; won't this cause an error?
.
.
.
15 shared[45]
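
On the out-of-range question: C and CUDA do not bounds-check array indexing, so reading shared[33] from a 32-element array is undefined behavior rather than a guaranteed error. The small sketch below only reports which thread first steps outside the array. first_out_of_range_tid is a hypothetical helper written for this post.

```c
#include <assert.h>

/* Returns the first tid in [0, num_threads) whose index
   base_index + s * tid falls outside an array of array_len elements,
   or -1 if every thread stays in range. */
int first_out_of_range_tid(int base_index, int s, int num_threads, int array_len)
{
    assert(num_threads >= 0 && array_len >= 0);
    for (int tid = 0; tid < num_threads; ++tid) {
        int index = base_index + s * tid;
        if (index < 0 || index >= array_len)
            return tid;
    }
    return -1;
}
```

For the case above, first_out_of_range_tid(0, 3, 16, 32) returns 11. With s = 3 and 16 threads the largest index is 45, so the array would need at least 46 elements for every access to be valid.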
If the float type above is changed to
struct type { float x, y, z; };
tid shared[BaseIndex + s * tid] bank
0 shared[0] 0,1,2
1 shared[3] 3,4,5
.
.
.
5 shared[15] 15,0,1
6 shared[18] 2,3,4
.
.
10 shared[30]
11 shared[33]
.
.
.
15 shared[45]
Aren't the same banks already being accessed more than once here?
So why does NVIDIA_CUDA_ProgrammingGuide.pdf, section G.3.3.4, say:
Three separate reads without bank conflicts if type is defined as
struct type { float x, y, z; };
since each member is accessed with an odd stride of three 32-bit words;
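
One possible reading of the quoted sentence, sketched in plain C: the struct read is split into three separate reads (x, then y, then z), and bank conflicts are counted within each read on its own, not across the three. The sketch assumes the array is indexed as shared[BaseIndex + tid] with struct elements, 16 banks, and a half-warp of 16 threads. member_bank and member_read_is_conflict_free are names invented for this illustration.

```c
#include <assert.h>

#define NUM_BANKS 16          /* assumption: 16 banks of 32-bit words */
#define HALF_WARP 16          /* assumption: requests served per half-warp */
#define WORDS_PER_STRUCT 3    /* struct type { float x, y, z; } */

/* Bank hit by thread tid when it reads one member (0 = x, 1 = y, 2 = z)
   of the struct element shared[base_index + tid]. */
int member_bank(int base_index, int tid, int member)
{
    assert(member >= 0 && member < WORDS_PER_STRUCT);
    assert(base_index + tid >= 0);
    return ((base_index + tid) * WORDS_PER_STRUCT + member) % NUM_BANKS;
}

/* Returns 1 if the 16 threads of a half-warp hit 16 distinct banks
   while reading the given member, otherwise 0. */
int member_read_is_conflict_free(int base_index, int member)
{
    int seen[NUM_BANKS] = {0};
    for (int tid = 0; tid < HALF_WARP; ++tid) {
        int bank = member_bank(base_index, tid, member);
        if (seen[bank])
            return 0;
        seen[bank] = 1;
    }
    return 1;
}
```

Within a single read, neighbouring threads are 3 words apart, and since 3 and 16 share no common factor the 16 threads land on 16 different banks. Thread 5 reading x (bank 15) and thread 0 reading y (bank 1) do touch banks that other threads use for other members, but those accesses belong to different reads and so do not collide under this model.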
I have read both the English and the Chinese versions and still don't understand.
Any help would be much appreciated.