A program for multiplying matrices of arbitrary size

system · December 20, 2011, 14:12

Why does the program below only work for matrices whose dimensions are multiples of 16? Thanks.
__global__ static void mat_MultCUDA_2(const float *a, int aRow, int aCol,
                                      const float *b, int bRow, int bCol, float *C)
{
    int tx = threadIdx.x;
    int ty = threadIdx.y;

    int bx = blockIdx.x;
    int by = blockIdx.y;

    int nResult = (bx * blockDim.x + tx) * aRow + by * blockDim.y + ty;

    __shared__ float as[BLOCK_SIZE][BLOCK_SIZE];
    __shared__ float bs[BLOCK_SIZE][BLOCK_SIZE];

    float result = 0.0f;
    int aBegin = by * BLOCK_SIZE;
    int aStep = BLOCK_SIZE * aRow;

    int bBegin = bx * BLOCK_SIZE * bRow;
    int bStep = BLOCK_SIZE;
    int bEnd = bBegin + bRow - 1;

    for (int i = aBegin, j = bBegin; j <= bEnd; i += aStep, j += bStep)
    {
        if (tx + bx * BLOCK_SIZE < aCol && ty + by * BLOCK_SIZE < aRow)
        {
            as[ty][tx] = a[i + aRow * tx + ty];
        }
        else
        {
            as[ty][tx] = 0;
        }
        if (tx + bx * BLOCK_SIZE < bCol && ty + by * BLOCK_SIZE < bRow)
        {
            bs[ty][tx] = b[j + bRow * tx + ty];
        }
        else
            bs[ty][tx] = 0;
        /*
        as[ty][tx] = a[i + aRow * tx + ty];
        bs[ty][tx] = b[j + bRow * tx + ty];*/

        __syncthreads();

        //if(nResult<aRow*bCol)
        for (int k = 0; k < BLOCK_SIZE; ++k)
            result += as[ty][k] * bs[k][tx];
        __syncthreads();
    }
    C[nResult] = result;
}

dim3 dimBlock(BLOCK_SIZE,BLOCK_SIZE);
dim3 dimGrid((M+BLOCK_SIZE-1)/BLOCK_SIZE,(L+BLOCK_SIZE-1)/BLOCK_SIZE);
mat_MultCUDA_2<<<dimGrid,dimBlock>>>(d_a,L,N,d_b,N,M,d_result);

system · December 20, 2011, 14:12

I hope someone can help. I have been stuck on this for a long time.

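system · December 21, 2011, 00:30

This program uses a tiled (blocked) approach, so it does not work when the dimensions are not multiples of the block size. The for loop's termination condition is j<=bEnd. If the size is not a multiple of 16, the last tile of the matrix is smaller than the thread block, so some threads have no data to work on and the result comes out wrong. You need to change bEnd, or add further conditions, and so on.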
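One possible way to add those conditions is sketched below. It is not the original poster's final code: the names `matMultGuarded` and `maxAbsError` are hypothetical, and it assumes column-major storage (which the indexing `a[i + aRow * tx + ty]` in the question implies), `bRow == aCol`, and a `BLOCK_SIZE` x `BLOCK_SIZE` thread block. The guards test the tile offset `k0` along the shared dimension, whereas the guards in the posted kernel test the block offsets `bx * BLOCK_SIZE` and `by * BLOCK_SIZE`; the final store is guarded as well.

```cuda
#include <algorithm>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK_SIZE 16

// C = A * B, all column-major: A is aRow x aCol, B is bRow x bCol (bRow == aCol).
__global__ void matMultGuarded(const float *a, int aRow, int aCol,
                               const float *b, int bRow, int bCol, float *c)
{
    int tx = threadIdx.x, ty = threadIdx.y;
    int row = blockIdx.y * BLOCK_SIZE + ty;   // row of C (and of A)
    int col = blockIdx.x * BLOCK_SIZE + tx;   // column of C (and of B)

    __shared__ float as[BLOCK_SIZE][BLOCK_SIZE];
    __shared__ float bs[BLOCK_SIZE][BLOCK_SIZE];

    float result = 0.0f;
    // k0 is the offset of the current tile along the shared dimension.
    for (int k0 = 0; k0 < aCol; k0 += BLOCK_SIZE) {
        int aColIdx = k0 + tx;   // column of A loaded by this thread
        int bRowIdx = k0 + ty;   // row of B loaded by this thread
        // Out-of-range elements are padded with zero, so they add nothing below.
        as[ty][tx] = (row < aRow && aColIdx < aCol) ? a[aColIdx * aRow + row] : 0.0f;
        bs[ty][tx] = (bRowIdx < bRow && col < bCol) ? b[col * bRow + bRowIdx] : 0.0f;
        __syncthreads();
        for (int k = 0; k < BLOCK_SIZE; ++k)
            result += as[ty][k] * bs[k][tx];
        __syncthreads();
    }
    if (row < aRow && col < bCol)   // padding threads must not store
        c[col * aRow + row] = result;
}

// Hypothetical helper: multiplies an L x N matrix by an N x M matrix on the GPU
// and returns the largest absolute difference from a CPU reference.
// CUDA error checking is omitted for brevity.
float maxAbsError(int L, int N, int M)
{
    std::vector<float> hA(L * N), hB(N * M), hC(L * M);
    // Small integer values keep every float product and sum exact.
    for (int i = 0; i < L * N; ++i) hA[i] = float(i % 7 - 3);
    for (int i = 0; i < N * M; ++i) hB[i] = float(i % 5 - 2);

    float *dA, *dB, *dC;
    cudaMalloc(&dA, hA.size() * sizeof(float));
    cudaMalloc(&dB, hB.size() * sizeof(float));
    cudaMalloc(&dC, hC.size() * sizeof(float));
    cudaMemcpy(dA, hA.data(), hA.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), hB.size() * sizeof(float), cudaMemcpyHostToDevice);

    dim3 dimBlock(BLOCK_SIZE, BLOCK_SIZE);
    dim3 dimGrid((M + BLOCK_SIZE - 1) / BLOCK_SIZE, (L + BLOCK_SIZE - 1) / BLOCK_SIZE);
    matMultGuarded<<<dimGrid, dimBlock>>>(dA, L, N, dB, N, M, dC);
    cudaMemcpy(hC.data(), dC, hC.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);

    float maxErr = 0.0f;
    for (int col = 0; col < M; ++col)
        for (int row = 0; row < L; ++row) {
            float ref = 0.0f;
            for (int k = 0; k < N; ++k)
                ref += hA[k * L + row] * hB[col * N + k];
            maxErr = std::max(maxErr, std::fabs(hC[col * L + row] - ref));
        }
    return maxErr;
}
```

Calling `maxAbsError(20, 33, 17)` from `main` exercises a case in which none of the three dimensions is a multiple of 16. Padding the partial tiles with zeros lets the inner loop always run the full `BLOCK_SIZE` iterations, which keeps the loop free of per-iteration bounds checks.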

system · December 21, 2011, 02:03

Haha, thanks. I was thinking about this problem in bed last night and planned to fix it today. I just got out of class. Thanks!

system · December 22, 2011, 05:53

Hehe, it's good practice :)

system · December 23, 2011, 03:25

At least one problem is that the write has no bounds check:

C[nResult] = result;
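To illustrate why that store needs a guard: the grid is rounded up to whole blocks, so some threads map to a (row, col) outside the result matrix. With the column-major index `col * aRow + row`, such a thread either overwrites an element of a neighboring column or writes past the end of the buffer. The sketch below uses hypothetical names (`guardedFill`, `countSkippedThreads`) and simply counts how many threads of a rounded-up grid fall outside a `rows` x `cols` matrix.

```cuda
#include <cuda_runtime.h>

#define BLOCK_SIZE 16

// Each thread stores only if its (row, col) lies inside the matrix;
// otherwise it is a padding thread and is counted instead.
__global__ void guardedFill(float *c, int rows, int cols, float value, int *skipped)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < rows && col < cols)
        c[col * rows + row] = value;   // column-major, in range: safe to store
    else
        atomicAdd(skipped, 1);         // out of range: must not touch c
}

// Hypothetical helper: launches a rounded-up grid over a rows x cols matrix and
// returns how many threads fell outside it. CUDA error checking is omitted.
int countSkippedThreads(int rows, int cols)
{
    float *dC;
    int *dSkipped;
    int hSkipped = 0;
    cudaMalloc(&dC, rows * cols * sizeof(float));
    cudaMalloc(&dSkipped, sizeof(int));
    cudaMemset(dSkipped, 0, sizeof(int));

    dim3 dimBlock(BLOCK_SIZE, BLOCK_SIZE);
    dim3 dimGrid((cols + BLOCK_SIZE - 1) / BLOCK_SIZE, (rows + BLOCK_SIZE - 1) / BLOCK_SIZE);
    guardedFill<<<dimGrid, dimBlock>>>(dC, rows, cols, 1.0f, dSkipped);

    cudaMemcpy(&hSkipped, dSkipped, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dC);
    cudaFree(dSkipped);
    return hSkipped;
}
```

For a 20 x 20 result the grid is 2 x 2 blocks, or 1024 threads, of which only 400 map to valid elements, so `countSkippedThreads(20, 20)` should report 624 padding threads. For a 32 x 32 result every thread is in range, which is consistent with the unguarded kernel working only for multiples of 16.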

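system · December 30, 2011, 03:33

Thank you! I fixed that one later as well and got it debugged today, hehe. I have only been working on it on and off, but at least this problem is solved now. Thanks, everyone.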