Here is the debug output:
Nsight Debug
CUDA Memory Checker detected 64 threads caused an access violation:
Launch Parameters
CUcontext = 00496c38
CUstream = 0357e5c0
CUmodule = 035e83f0
CUfunction = 0361a440
FunctionName = Z7encryptPjS_iS_PhS0_S0_S_S
gridDim = {32,1,1}
blockDim = {512,1,1}
sharedSize = 896
Parameters:
dev_input = 0x05d60000 16843009
key = 0x05aa0000 4294967295
length = 720896
dev_out = 0x05da0000 3452816845
S = 0x059a0000 99 'c'
Logtable = 0x059a0400 0 '\0'
Nsight Debug
Memory Checker detected 64 access violations.
error = access violation on load (global memory)
blockIdx = {0,0,0}
threadIdx = {224,0,0}
address = 0x00130e68
accessSize = 4
Nsight Debug
CUDA grid launch failed: CUcontext: 4811832 CUmodule: 56525808 Function: Z7encryptPjS_iS_PhS0_S0_S_S
Nsight Debug
CUDA Memory Checker detected 64 threads caused an access violation:
Launch Parameters
CUcontext = 02aa6c38
CUstream = 0352e5c0
CUmodule = 035983c8
CUfunction = 035ca420
FunctionName = Z7encryptPjS_iS_PhS0_S0_S_S
gridDim = {32,1,1}
blockDim = {512,1,1}
sharedSize = 896
Parameters:
dev_input = 0x05d60000 16843009
key = 0x05aa0000 4294967295
length = 720896
dev_out = 0x05da0000 0
S = 0x059a0000 99 'c'
Logtable = 0x059a0400 0 '\0'
Nsight Debug
Memory Checker detected 64 access violations.
error = access violation on load (global memory)
blockIdx = {2,0,0}
threadIdx = {128,0,0}
address = 0x00884868
accessSize = 4
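Both faulting addresses can be compared against the parameter pointers printed in the two logs, which are identical in both runs. The host-side C++ below only does that arithmetic on the logged numbers. It assumes the logged values are 32-bit device addresses, and the names kReportedPointers and belowAllReportedPointers are made up for illustration. The log lists only the first six parameters of encrypt, so any pointer arguments after Logtable are not covered.

```cpp
#include <cstdint>

// Pointer values copied from the "Parameters:" section of the Nsight log.
// Arguments after Logtable are not shown in the log, so they are not listed.
constexpr std::uint32_t kReportedPointers[] = {
    0x05d60000u,  // dev_input
    0x05aa0000u,  // key
    0x05da0000u,  // dev_out
    0x059a0000u,  // S
    0x059a0400u,  // Logtable
};

// True when addr is below every listed base pointer, so it cannot fall
// inside any of those buffers (every in-bounds address is >= its base).
constexpr bool belowAllReportedPointers(std::uint32_t addr) {
    for (std::uint32_t base : kReportedPointers) {
        if (addr >= base) return false;
    }
    return true;
}

// Faulting addresses reported for the two runs.
static_assert(belowAllReportedPointers(0x00130e68u), "run 1: blockIdx {0,0,0}, threadIdx {224,0,0}");
static_assert(belowAllReportedPointers(0x00884868u), "run 2: blockIdx {2,0,0}, threadIdx {128,0,0}");
```

Both checks hold: 0x00130e68 and 0x00884868 are below 0x059a0000, the lowest pointer listed, so neither faulting load lands inside dev_input, key, dev_out, S, or Logtable.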
The device is a GT630M with 1 GB of video memory.
THREAD_NUM = 512
BLOCK_NUM = 32
The fault occurs at this step:
key[index] = keyExpansion[keyIndex];
key and keyExpansion are defined as follows:
word32* key;           length: 44 * THREAD_NUM * BLOCK_NUM
word32* keyExpansion;  length: 4 * THREAD_NUM * BLOCK_NUM
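For reference, the stated lengths work out as follows with THREAD_NUM = 512 and BLOCK_NUM = 32. This is a host-side C++ sketch; it assumes word32 is a 32-bit unsigned integer and that the lengths count word32 elements rather than bytes. The constant names other than THREAD_NUM and BLOCK_NUM are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

typedef std::uint32_t word32;  // assumption: word32 is a 32-bit unsigned type

constexpr std::size_t THREAD_NUM = 512;
constexpr std::size_t BLOCK_NUM  = 32;
// Same as gridDim.x * blockDim.x = 32 * 512 in the log.
constexpr std::size_t TOTAL_THREADS = THREAD_NUM * BLOCK_NUM;

// Element counts as stated in the post.
constexpr std::size_t KEY_WORDS     = 44 * TOTAL_THREADS;
constexpr std::size_t KEY_EXP_WORDS = 4 * TOTAL_THREADS;

// Byte sizes each allocation would need to request.
constexpr std::size_t KEY_BYTES     = KEY_WORDS * sizeof(word32);
constexpr std::size_t KEY_EXP_BYTES = KEY_EXP_WORDS * sizeof(word32);

static_assert(KEY_WORDS == 720896, "matches length = 720896 in the Nsight log");
```

KEY_WORDS is 720896, the same value as the length argument in the log, and KEY_BYTES is 2883584 (0x2C0000). Adding that to the key pointer gives 0x05aa0000 + 0x2C0000 = 0x05d60000, the dev_input pointer, which fits a key buffer of that size placed directly before dev_input. keyExpansion needs 65536 words, i.e. 262144 bytes.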
The indices are:
index    = 44 * (THREAD_NUM * (BLOCK_NUM * blockIdx.y) + threadIdx.x);
keyIndex = 4 * (THREAD_NUM * (BLOCK_NUM * blockIdx.y) + threadIdx.x);
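The index arithmetic can be checked on the host. The sketch below copies the two expressions into constexpr functions, with blockIdx.y and threadIdx.x passed in as plain integers. globalTid is a hypothetical helper showing the common blockIdx.x * blockDim.x + threadIdx.x form for comparison; it is not taken from the original code.

```cpp
#include <cstddef>

constexpr std::size_t THREAD_NUM = 512;
constexpr std::size_t BLOCK_NUM  = 32;

// The two index expressions as written above, with CUDA's built-in
// variables replaced by ordinary parameters.
constexpr std::size_t postIndex(std::size_t blockIdx_y, std::size_t threadIdx_x) {
    return 44 * (THREAD_NUM * (BLOCK_NUM * blockIdx_y) + threadIdx_x);
}
constexpr std::size_t postKeyIndex(std::size_t blockIdx_y, std::size_t threadIdx_x) {
    return 4 * (THREAD_NUM * (BLOCK_NUM * blockIdx_y) + threadIdx_x);
}

// Hypothetical alternative: the usual 1D global thread id,
// blockIdx.x * blockDim.x + threadIdx.x with blockDim.x == THREAD_NUM.
constexpr std::size_t globalTid(std::size_t blockIdx_x, std::size_t threadIdx_x) {
    return blockIdx_x * THREAD_NUM + threadIdx_x;
}
```

With gridDim = {32,1,1}, blockIdx.y is always 0, so index reduces to 44 * threadIdx.x and keyIndex to 4 * threadIdx.x. Every block therefore computes the same indices, the largest being 44 * 511 = 22484 and 4 * 511 = 2044, both well inside the stated lengths. With globalTid instead, the last thread (blockIdx.x = 31, threadIdx.x = 511) gets 16383, and 44 * 16383 = 720852 and 4 * 16383 = 65532 are still below 720896 and 65536.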
I can't figure out what is causing the access violation. Could anyone help?
Thanks!