Help with performance optimization: a shared memory problem

system · November 24, 2009, 08:03

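The current situation: each block has 16x16 threads, and each thread reads its four neighbors (top, bottom, left, right), so the shared memory tile has to hold 18x18 values.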
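To make the 18x18 figure concrete, here is a small sketch of the tile geometry. The names BLOCK_EDGE, HALO, TILE, CELLS_USED and tile_index are illustrative assumptions, not taken from the original code. It also shows that a four-neighbor stencil never touches the four corner cells of the tile.

```cuda
// Sketch of the tile geometry. All names here are illustrative assumptions.
constexpr int BLOCK_EDGE = 16;                    // threads per block edge (from the post)
constexpr int HALO       = 1;                     // one neighbor in each direction
constexpr int TILE       = BLOCK_EDGE + 2 * HALO; // 16 + 2 = 18 cells per tile edge

// A four-neighbor stencil never reads the four corner cells of the tile,
// so only 18*18 - 4 = 320 of the 324 cells are actually needed.
constexpr int CELLS_USED = TILE * TILE - 4;

// Linear index inside the tile of the cell owned by thread (tx, ty).
// The thread's own cell is shifted by HALO so that index 0 is a halo cell.
__host__ __device__ constexpr int tile_index(int tx, int ty)
{
    return (ty + HALO) * TILE + (tx + HALO);
}

static_assert(TILE == 18, "16x16 threads plus a one-cell halo gives an 18x18 tile");
static_assert(CELLS_USED == 320, "the four corners are never read");
```

With float data, the full 18x18 tile takes 18 * 18 * 4 = 1296 bytes of shared memory per block.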

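I plan to have the threads on the four edges of the block load the border (halo) cells of the shared memory tile, but that takes four conditionals: if(tx == 0), if(ty == 0), if(tx == blocksize-1), if(ty == blocksize-1).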

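But when I run that code the results are wrong, and I don't know why. I also think there should be a better way to handle loading the data into shared memory. Could the experts on this board help?
Below is the unoptimized code: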

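__global__ void Laplace_d (float *A, float *B, int *N_t){

   int tx = threadIdx.x; int ty = threadIdx.y;

   // Global column (j) and row (i) handled by this thread
   int j = blockIdx.x * blockDim.x + tx;
   int i = blockIdx.y * blockDim.y + ty;
   int index, left, right, top, bottom;
   int N = *N_t;   // matrix dimension, read from device memory

   // Linear indices of this cell and its four neighbors
   index  = i*N + j;
   left   = i*N + j-1;
   right  = i*N + j+1;
   top    = (i-1)*N + j;
   bottom = (i+1)*N + j;

   // Update interior cells only; boundary rows and columns are left unchanged
   if(i>0 && i<N-1 && j>0 && j<N-1){
      // float literals keep the arithmetic in single precision
      B[index] = 0.25f*(A[left]+A[right]+A[top]+A[bottom])*0.9f + 0.1f*B[index];
   }
}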
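For comparison, here is a minimal sketch of how the edge-thread loading described above might look. It is not the version that gave wrong results (that code is not shown in this post), and it makes some assumptions: N is passed by value instead of through a pointer, the block is launched as 16x16, and laplace_shared, BLOCKSIZE and max_error_vs_cpu are names made up for this example. The two things that are easy to get wrong are the __syncthreads() between loading and using the tile, and guarding the halo loads at the edges of the matrix.

```cuda
#include <algorithm>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCKSIZE = 16;   // block edge; the kernel assumes a 16x16 launch

// Same update as Laplace_d, with A staged through an 18x18 shared tile.
__global__ void laplace_shared(const float *A, float *B, int N)
{
    __shared__ float tile[BLOCKSIZE + 2][BLOCKSIZE + 2];

    int tx = threadIdx.x, ty = threadIdx.y;
    int j = blockIdx.x * BLOCKSIZE + tx;   // global column
    int i = blockIdx.y * BLOCKSIZE + ty;   // global row
    int sx = tx + 1, sy = ty + 1;          // position in the tile, shifted by the halo

    if (i < N && j < N) {
        tile[sy][sx] = A[i * N + j];       // every in-range thread loads its own cell
        // Edge threads also load one halo cell, unless it lies outside the matrix.
        if (tx == 0 && j > 0)                  tile[sy][0]             = A[i * N + j - 1];
        if (tx == BLOCKSIZE - 1 && j < N - 1)  tile[sy][BLOCKSIZE + 1] = A[i * N + j + 1];
        if (ty == 0 && i > 0)                  tile[0][sx]             = A[(i - 1) * N + j];
        if (ty == BLOCKSIZE - 1 && i < N - 1)  tile[BLOCKSIZE + 1][sx] = A[(i + 1) * N + j];
    }
    // Barrier is outside any conditional so that all threads of the block reach it.
    __syncthreads();

    if (i > 0 && i < N - 1 && j > 0 && j < N - 1) {
        B[i * N + j] = 0.25f * (tile[sy][sx - 1] + tile[sy][sx + 1] +
                                tile[sy - 1][sx] + tile[sy + 1][sx]) * 0.9f
                     + 0.1f * B[i * N + j];
    }
}

// Hypothetical host helper: runs one step on the GPU and on the CPU and
// returns the largest absolute difference between the two results.
float max_error_vs_cpu(int N)
{
    std::vector<float> A(N * N), B(N * N);
    for (int k = 0; k < N * N; ++k) { A[k] = float(k % 7); B[k] = float(k % 3); }

    std::vector<float> ref = B;            // CPU reference of the same update
    for (int i = 1; i < N - 1; ++i)
        for (int j = 1; j < N - 1; ++j)
            ref[i * N + j] = 0.25f * (A[i * N + j - 1] + A[i * N + j + 1] +
                                      A[(i - 1) * N + j] + A[(i + 1) * N + j]) * 0.9f
                           + 0.1f * B[i * N + j];

    float *dA = nullptr, *dB = nullptr;
    size_t bytes = sizeof(float) * N * N;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(BLOCKSIZE, BLOCKSIZE);
    dim3 grid((N + BLOCKSIZE - 1) / BLOCKSIZE, (N + BLOCKSIZE - 1) / BLOCKSIZE);
    laplace_shared<<<grid, block>>>(dA, dB, N);
    cudaMemcpy(B.data(), dB, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);

    float err = 0.0f;
    for (int k = 0; k < N * N; ++k) err = std::max(err, std::fabs(B[k] - ref[k]));
    return err;
}
```

Calling max_error_vs_cpu with a size that is not a multiple of 16 (for example 50) exercises the matrix-edge guards as well. Another option that avoids the four conditionals entirely is to have all 256 threads loop over the tile cells by linear index, each loading more than one cell; whether that is faster here would need to be measured.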

system · December 31, 2010, 00:13

:idle:

system · January 26, 2011, 03:46

Not too sure.