A question about variables in a kernel function

system · October 19, 2011, 16:17

__global__ void ranksort(int *device_array)
{
    int tx = threadIdx.x;
    int bx = blockIdx.x;
    // s is the rank of the current number
    int i, j, s = 0;
    int temp = 0;
    __shared__ int sharaed_array[BLOCK_SIZE];
    sharaed_array[tx] = device_array[bx * blockDim.x + tx];
    temp = sharaed_array[tx];
    __syncthreads();
    // compare against the numbers in the current block
    for (j = 0; j < blockDim.x; j++)
        if (sharaed_array[j] < temp) // line 14
            s++;                     // line 15
    __syncthreads();
    // compare against the numbers in the blocks before the current block
    for (i = 0; i < blockIdx.x; i++)
    {
        sharaed_array[tx] = device_array[i * blockDim.x + tx];
        __syncthreads();
        for (j = 0; j < blockDim.x; j++)
        {
            if (sharaed_array[j] < temp)
                s++;
        }
        __syncthreads();
    }
    // compare against the numbers in the blocks after the current block
    for (i = blockIdx.x + 1; i < gridDim.x; i++)
    {
        sharaed_array[tx] = device_array[i * blockDim.x + tx];
        __syncthreads();
        for (j = 0; j < blockDim.x; j++)
        {
            if (sharaed_array[j] < temp)
                s++;
        }
        // all reads of this tile must finish before the next iteration overwrites it
        __syncthreads();
    }
    // in-place write: equal values get the same s, and other blocks
    // may still be reading device_array when this store happens
    device_array[s] = temp;
}

This is an array-sorting kernel I found online. Could someone explain what lines 14 and 15 mean, and how each thread does its work? Thanks in advance.

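system · October 20, 2011, 01:00

On line 14, a thread compares its own value with the value of every thread in the block, one by one. Each time it finds a value smaller than temp, it increments s. The resulting s is the sorted position of that thread's value within the block. That is all lines 14 and 15 do.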
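As a rough illustration of the per-block rank described above, here is a minimal sketch that keeps only that first step. The names `rank_in_block`, `rank_in_block_host`, and the tile size of 8 are assumptions made for this example, not part of the original kernel, and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define BLOCK_SIZE 8  // assumed tile size for this sketch

// Each thread counts how many values in its own block are smaller than its value.
__global__ void rank_in_block(const int *in, int *rank)
{
    __shared__ int tile[BLOCK_SIZE];
    int tx = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tx;

    tile[tx] = in[gid];
    int temp = tile[tx];   // private to this thread
    __syncthreads();       // the whole tile must be loaded before anyone reads it

    int s = 0;             // private to this thread
    for (int j = 0; j < blockDim.x; j++)
        if (tile[j] < temp)
            s++;
    rank[gid] = s;         // position of temp within its block
}

// Hypothetical host helper; the input size must be a multiple of BLOCK_SIZE.
std::vector<int> rank_in_block_host(const std::vector<int> &h_in)
{
    int n = (int)h_in.size();
    int *d_in, *d_rank;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_rank, n * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    rank_in_block<<<n / BLOCK_SIZE, BLOCK_SIZE>>>(d_in, d_rank);
    std::vector<int> h_rank(n);
    cudaMemcpy(h_rank.data(), d_rank, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_rank);
    return h_rank;
}
```

Calling `rank_in_block_host` with the eight values 5, 3, 8, 1, 7, 2, 6, 4 launches a single block; the thread holding 1 ends with s = 0 and the thread holding 8 ends with s = 7.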

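system · October 20, 2011, 01:20

By the way, the rest of the code follows the same idea: the data of the blocks before (and after) the current block is loaded into the current block, and the same comparison is run against it. When the kernel finishes, s is the index that the value will have once the whole array is sorted. In other words, to sort, all you need to know is the index each value ends up at after sorting.
Does that make sense?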
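To make the "rank equals final index" idea concrete, here is a simplified sketch that skips shared memory and lets each thread scan the whole array in global memory. It is a variation, not the kernel from the first post: the separate output array and the tie-break on the index (so equal values land in different slots) are assumptions added for this example, as are the names `rank_sort_simple` and `rank_sort_host`.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One thread per element: count the elements that must come before it,
// then store the element at that position in a separate output array.
__global__ void rank_sort_simple(const int *in, int *out, int n)
{
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid >= n) return;

    int temp = in[gid];
    int s = 0;
    for (int j = 0; j < n; j++) {
        // smaller values come first; equal values are ordered by their index
        if (in[j] < temp || (in[j] == temp && j < gid))
            s++;
    }
    out[s] = temp;  // s is the index of temp in the sorted array
}

// Hypothetical host helper that returns the sorted copy of h_in.
std::vector<int> rank_sort_host(const std::vector<int> &h_in)
{
    int n = (int)h_in.size();
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    int *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    rank_sort_simple<<<blocks, threads>>>(d_in, d_out, n);
    std::vector<int> h_out(n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Writing to a second array means no thread overwrites an input value that another thread has yet to read, which is why this sketch does not sort in place.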

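system · October 20, 2011, 05:04

Thank you. With your guidance I now understand lines 14 and 15. Does every thread have its own temp and s? Also, I don't quite follow this part:
for(i = 0;i < blockIdx.x; i++)
{
sharaed_array[tx] = device_array[i * blockDim.x + tx];
For example, when blockIdx.x = 2, i can be 0 and 1. Does device_array[i * blockDim.x + tx] then put the 64 values into the sharaed_array[tx] of thread0 and thread1? Sorry for the trouble.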

system · October 20, 2011, 06:18

Yes, of course: every thread has its own temp and s.

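system · October 20, 2011, 06:23

sharaed_array[tx] = device_array[i * blockDim.x + tx]; When i = 0, this reads all the data of block 0 into block 2. Every thread in block 2 then compares its value against the newly loaded data and updates s. After that, the data of block 1 is read in and compared the same way. Does that make sense? Try writing it out by hand.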
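The tile-by-tile loading described above can be sketched as a single loop over all blocks' data. This is a condensed variant rather than the original kernel: it folds the "current", "before", and "after" phases into one loop and writes the rank to a separate array. The names `rank_tiled`, `rank_tiled_host`, and the tile size of 4 are assumptions for this example.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define TILE 4  // assumed block size for this sketch; launch with blockDim.x == TILE

__global__ void rank_tiled(const int *in, int *rank)
{
    __shared__ int tile[TILE];  // one copy per block, visible to all of its threads
    int tx = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tx;
    int temp = in[gid];
    int s = 0;

    for (int i = 0; i < gridDim.x; i++) {
        tile[tx] = in[i * blockDim.x + tx];  // each thread loads ONE element of tile i
        __syncthreads();                     // tile i is now complete
        for (int j = 0; j < blockDim.x; j++)
            if (tile[j] < temp)
                s++;
        __syncthreads();                     // finish reading before the next overwrite
    }
    rank[gid] = s;  // rank of temp in the whole array
}

// Hypothetical host helper; the input size must be a multiple of TILE.
std::vector<int> rank_tiled_host(const std::vector<int> &h_in)
{
    int n = (int)h_in.size();
    int *d_in, *d_rank;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_rank, n * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    rank_tiled<<<n / TILE, TILE>>>(d_in, d_rank);
    std::vector<int> h_rank(n);
    cudaMemcpy(h_rank.data(), d_rank, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_rank);
    return h_rank;
}
```

With eight input values this launches two blocks of four threads. In each iteration, every thread of a block contributes one load, and together they fill the block's tile with the data that block i owns.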

system · October 20, 2011, 06:24

"Put into the sharaed_array[tx] of thread0 and thread1": that statement is seriously wrong. Go back and review the CUDA basics.
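The point behind this correction is that a `__shared__` array is not owned by individual threads: each block has one copy, and every thread of that block can read every element once the block has synchronized. A minimal sketch, with the hypothetical names `read_neighbor` and `read_neighbor_host` and an assumed block size of 4, in which each thread stores one value and then reads the value stored by its neighbor:

```cuda
#include <cuda_runtime.h>
#include <vector>

#define NTHREADS 4  // assumed block size for this sketch

__global__ void read_neighbor(const int *in, int *out)
{
    __shared__ int tile[NTHREADS];  // one array per block, not one per thread
    int tx = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tx;

    tile[tx] = in[gid];             // thread tx writes only element tx
    __syncthreads();                // wait until every element has been written

    // thread tx reads the element written by the next thread in the same block
    out[gid] = tile[(tx + 1) % blockDim.x];
}

// Hypothetical host helper; the input size must be a multiple of NTHREADS.
std::vector<int> read_neighbor_host(const std::vector<int> &h_in)
{
    int n = (int)h_in.size();
    int *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    read_neighbor<<<n / NTHREADS, NTHREADS>>>(d_in, d_out);
    std::vector<int> h_out(n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

For the input 0 through 7 (two blocks), the values rotate only within each group of four, which shows that the two blocks hold separate tiles while the threads inside one block share theirs.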

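system · October 20, 2011, 08:00

I spent the whole afternoon on it and finally understand. Thank you so much. I have only been learning CUDA for a month and there is a lot I don't know yet, so your help made all the difference. Sincere thanks!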