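I have a question. My graphics card is compute capability 1.3, which should support atomic operations on int, yet it cannot run the global-memory atomic operation example from Chapter 9 of CUDA by Example (《CUDA实战》); at run time it reports "unknown error". The same compiled application does run correctly on a compute capability 2.0 card. That is not what puzzles me, though: my card can run other programs that use atomic operations! Can anyone explain this?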
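Hello. An "unknown error" usually means that your kernel made an illegal address access (an out-of-bounds access can cause this, for example, although an out-of-bounds access may also produce no visible symptom at all).
Since the program runs on the 2.0 card, that suggests the address used by this atomic operation is valid on 2.0 but invalid on 1.3.
(Given that 2.x and later support generic addressing and similar features, and that the minimum allocation granularity may differ, this situation is possible.)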
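One way to narrow down where such an error comes from is to check for errors immediately after the kernel launch instead of at the next memory copy. The following is only a minimal sketch, not code from the book: the `CHECK` macro and the `touch` kernel are hypothetical names introduced for illustration, and it assumes a toolkit that provides `cudaDeviceSynchronize` (older toolkits used `cudaThreadSynchronize` for the same purpose).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the book): report the failing call site and exit.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s at %s:%d\n",                           \
                    cudaGetErrorString(err_), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Illustrative kernel: each thread increments one counter.
// The guard keeps every access inside the allocation.
__global__ void touch(unsigned int *data, int n) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n) {
        atomicAdd(&data[i], 1u);
    }
}

int main(void) {
    const int n = 256;
    unsigned int *dev_data;
    CHECK(cudaMalloc((void**)&dev_data, n * sizeof(unsigned int)));
    CHECK(cudaMemset(dev_data, 0, n * sizeof(unsigned int)));

    // 512 threads are launched for 256 elements; the guard above
    // stops the upper half from touching memory past the buffer.
    touch<<<2, 256>>>(dev_data, n);
    CHECK(cudaGetLastError());        // errors in the launch itself
    CHECK(cudaDeviceSynchronize());   // errors raised while the kernel ran

    CHECK(cudaFree(dev_data));
    printf("kernel completed without error\n");
    return 0;
}
```

With the two checks placed directly after the launch, a failure is reported at the kernel's own line rather than at a later, unrelated API call. The exact error string depends on the driver and the GPU.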
I have not bought CUDA by Example, so could you post the Chapter 9 example to this forum? Then the forum can take a look at it for you.
/*
* Copyright 1993-2010 NVIDIA Corporation. All rights reserved.
*
* NVIDIA Corporation and its licensors retain all intellectual property and
* proprietary rights in and to this software and related documentation.
* Any use, reproduction, disclosure, or distribution of this software
* and related documentation without an express license agreement from
* NVIDIA Corporation is strictly prohibited.
*
* Please refer to the applicable NVIDIA end user license agreement (EULA)
* associated with this source code for terms and conditions that govern
* your use of this NVIDIA software.
*
*/
#include "../common/book.h"
#define SIZE (100*1024*1024)
__global__ void histo_kernel( unsigned char *buffer,
                              long size,
                              unsigned int *histo ) {
    // calculate the starting index and the offset to the next
    // block that each thread will be processing
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    int stride = blockDim.x * gridDim.x;
    while (i < size) {
        atomicAdd( &histo[buffer[i]], 1 );
        i += stride;
    }
}
int main( void ) {
    unsigned char *buffer =
                     (unsigned char*)big_random_block( SIZE );
    // capture the start time
    // starting the timer here so that we include the cost of
    // all of the operations on the GPU.
    cudaEvent_t start, stop;
    HANDLE_ERROR( cudaEventCreate( &start ) );
    HANDLE_ERROR( cudaEventCreate( &stop ) );
    HANDLE_ERROR( cudaEventRecord( start, 0 ) );
    // allocate memory on the GPU for the file's data
    unsigned char *dev_buffer;
    unsigned int *dev_histo;
    HANDLE_ERROR( cudaMalloc( (void**)&dev_buffer, SIZE ) );
    HANDLE_ERROR( cudaMemcpy( dev_buffer, buffer, SIZE,
                              cudaMemcpyHostToDevice ) );
    HANDLE_ERROR( cudaMalloc( (void**)&dev_histo,
                              256 * sizeof( int ) ) );
    HANDLE_ERROR( cudaMemset( dev_histo, 0,
                              256 * sizeof( int ) ) );
    // kernel launch - 2x the number of mps gave best timing
    cudaDeviceProp prop;
    HANDLE_ERROR( cudaGetDeviceProperties( &prop, 0 ) );
    int blocks = prop.multiProcessorCount;
    histo_kernel<<<blocks*2,256>>>( dev_buffer, SIZE, dev_histo );
    unsigned int histo[256];
    HANDLE_ERROR( cudaMemcpy( histo, dev_histo,
                              256 * sizeof( int ),
                              cudaMemcpyDeviceToHost ) );
    // get stop time, and display the timing results
    HANDLE_ERROR( cudaEventRecord( stop, 0 ) );
    HANDLE_ERROR( cudaEventSynchronize( stop ) );
    float elapsedTime;
    HANDLE_ERROR( cudaEventElapsedTime( &elapsedTime,
                                        start, stop ) );
    printf( "Time to generate: %3.1f ms\n", elapsedTime );
    long histoCount = 0;
    for (int i=0; i<256; i++) {
        histoCount += histo[i];
    }
    printf( "Histogram Sum: %ld\n", histoCount );
    // verify that we have the same counts via CPU
    for (int i=0; i<SIZE; i++)
        histo[buffer[i]]--;
    for (int i=0; i<256; i++) {
        if (histo[i] != 0)
            printf( "Failure at %d! Off by %d\n", i, histo[i] );
    }
    HANDLE_ERROR( cudaEventDestroy( start ) );
    HANDLE_ERROR( cudaEventDestroy( stop ) );
    cudaFree( dev_histo );
    cudaFree( dev_buffer );
    free( buffer );
    return 0;
}
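Hello. I have read through your code and it appears to be correct.
So I suggest the following:
(1) Please post the line number where the error is reported, so that we can narrow it down further. (I do not have some of the code your program depends on, nor an old 1.3 card, so I cannot test it on a machine for you.)
(2) If you are using the precompiled binary that comes with the book, I suggest recompiling it yourself.
(3) If possible, upgrade the driver and toolkit to the latest versions, to rule out a compiler bug.
(4) I would ask other members, moderators, NVIDIA official technical support, or the chief moderator to help take a look.
I have no better suggestions beyond these.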
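When following suggestions (2) and (3), it can help to record exactly what the program sees at run time. The sketch below is an illustration rather than part of the book's code: it prints the name and compute capability of device 0 together with the driver and runtime versions, and checks the capability against the 1.1 minimum that 32-bit integer atomics on global memory require. Using device 0 is an assumption carried over from the posted example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main(void) {
    int driverVersion = 0;
    int runtimeVersion = 0;
    cudaDeviceProp prop;

    // Query versions and device properties, stopping at the first failure.
    cudaError_t err = cudaDriverGetVersion(&driverVersion);
    if (err == cudaSuccess) err = cudaRuntimeGetVersion(&runtimeVersion);
    if (err == cudaSuccess) err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("Device 0: %s, compute capability %d.%d\n",
           prop.name, prop.major, prop.minor);

    // Versions are encoded as 1000 * major + 10 * minor.
    printf("Driver version:  %d.%d\n",
           driverVersion / 1000, (driverVersion % 100) / 10);
    printf("Runtime version: %d.%d\n",
           runtimeVersion / 1000, (runtimeVersion % 100) / 10);

    // 32-bit integer atomics on global memory need compute capability >= 1.1.
    bool hasGlobalAtomics =
        prop.major > 1 || (prop.major == 1 && prop.minor >= 1);
    printf("Global-memory int atomics supported: %s\n",
           hasGlobalAtomics ? "yes" : "no");
    return 0;
}
```

When recompiling, the architecture passed to nvcc should match the card; on toolkits of that era this would be a flag such as `-arch=sm_13`, which is an assumption about the poster's setup and is no longer accepted by current toolkits.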
Thank you for the patient answer, moderator! I will give these a try. I was simply puzzled by this behavior, so I asked.