Why does my vector addition program produce wrong results?

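I wrote a very simple vector addition program. It builds and runs in the debugger under VS2005, but the results are always wrong. My system is Windows XP and my graphics card is a GF 9300 GE.
My source code is below. I can't tell what is wrong with it, so any help would be much appreciated: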
#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>
#include <math.h>
#define N 32

// Kernel: each thread adds one pair of elements.
__global__ void VecAdd(float* A, float* B, float* C)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N)
        C[i] = A[i] + B[i];
}

int
main( int argc, char** argv)
{
    int i;
    size_t size = N * sizeof(float);

    float *h_A, *h_B, *h_C;
    float *d_A, *d_B, *d_C;

    // Host buffers
    h_A = (float*)malloc(size);
    h_B = (float*)malloc(size);
    h_C = (float*)malloc(size);

    // Device buffers
    cudaMalloc((void**)&d_A, size);
    cudaMalloc((void**)&d_B, size);
    cudaMalloc((void**)&d_C, size);

    // Fill both input vectors with 1.0
    for (i = 0; i < N; i++)
    {
        h_A[i] = 1.0f;
        h_B[i] = 1.0f;
    }

    cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice);

    // One block of 2 x 16 = 32 threads
    dim3 Grid(1,1);
    dim3 Block(2,16);
    VecAdd<<<Grid, Block>>>(d_A, d_B, d_C);
    cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost);

    for (i = 0; i < N; i += 1)
    {
        printf("%d: %f + %f = %f\n", i, h_A[i], h_B[i], h_C[i]);
    }

    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);

    free(h_A);
    free(h_B);
    free(h_C);

    return 0;
}

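This program adds two vectors of 32 components each. Every component of both vectors is 1.0, so the result should be a 32-component vector whose components are all 2.0. The output I actually get is very strange, and I don't know why: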
0: 1.000000 + 1.000000 = 0.000000
1: 1.000000 + 1.000000 = 7.000000
2: 1.000000 + 1.000000 = 1.000000
3: 1.000000 + 1.000000 = 7.000000
4: 1.000000 + 1.000000 = 0.000000
5: 1.000000 + 1.000000 = 7.000000
6: 1.000000 + 1.000000 = 4.000000
7: 1.000000 + 1.000000 = 5.000000
8: 1.000000 + 1.000000 = 9.000000
9: 1.000000 + 1.000000 = 7.000000
10: 1.000000 + 1.000000 = 7.000000
11: 1.000000 + 1.000000 = 9.000000
12: 1.000000 + 1.000000 = 3.000000
13: 1.000000 + 1.000000 = 3.000000
14: 1.000000 + 1.000000 = 2.000000
15: 1.000000 + 1.000000 = 8.000000
16: 1.000000 + 1.000000 = 9.000000
17: 1.000000 + 1.000000 = 7.000000
18: 1.000000 + 1.000000 = 4.000000
19: 1.000000 + 1.000000 = 4.000000
20: 1.000000 + 1.000000 = 6.000000
21: 1.000000 + 1.000000 = 2.000000
22: 1.000000 + 1.000000 = 5.000000
23: 1.000000 + 1.000000 = 8.000000
24: 1.000000 + 1.000000 = 4.000000
25: 1.000000 + 1.000000 = 4.000000
26: 1.000000 + 1.000000 = 8.000000
27: 1.000000 + 1.000000 = 8.000000
28: 1.000000 + 1.000000 = 8.000000
29: 1.000000 + 1.000000 = 9.000000
30: 1.000000 + 1.000000 = 2.000000
31: 1.000000 + 1.000000 = 6.000000
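
Two things I have not ruled out yet, written up as a minimal sketch rather than a confirmed fix. First, with `dim3 Block(2,16)` the x dimension of the block is 2, so `threadIdx.x` is only ever 0 or 1 and the kernel index `i` never reaches elements 2 to 31; launching one block of N threads along x would avoid that. Second, even elements 0 and 1 are wrong in the output above, which suggests the kernel may not have run at all, and my code never checks a single return value. The `CUDA_CHECK` macro below is a hypothetical helper of my own, not part of the CUDA runtime; the calls it relies on (`cudaGetLastError`, `cudaGetErrorString`) are standard runtime API functions.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define N 32

/* Hypothetical helper (not from the original post, not a CUDA API):
   print the failing call and exit if a runtime call returns an error. */
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "%s failed at line %d: %s\n",           \
                    #call, __LINE__, cudaGetErrorString(err_));     \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

/* Same kernel as in the post: one thread per element, indexed along x only. */
__global__ void VecAdd(const float* A, const float* B, float* C)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N)
        C[i] = A[i] + B[i];
}

int main(void)
{
    size_t size = N * sizeof(float);
    float h_A[N], h_B[N], h_C[N];
    float *d_A, *d_B, *d_C;

    /* Pre-fill h_C with -1.0 so values that were never overwritten stand out. */
    for (int i = 0; i < N; i++) {
        h_A[i] = 1.0f;
        h_B[i] = 1.0f;
        h_C[i] = -1.0f;
    }

    CUDA_CHECK(cudaMalloc((void**)&d_A, size));
    CUDA_CHECK(cudaMalloc((void**)&d_B, size));
    CUDA_CHECK(cudaMalloc((void**)&d_C, size));

    CUDA_CHECK(cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice));

    /* Assumed fix: 1 block of N threads along x, so threadIdx.x runs 0..N-1. */
    VecAdd<<<1, N>>>(d_A, d_B, d_C);
    CUDA_CHECK(cudaGetLastError());   /* reports a failed launch */

    /* This blocking copy waits for the kernel and also returns any error
       the kernel hit while running. */
    CUDA_CHECK(cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost));

    for (int i = 0; i < N; i++)
        printf("%d: %f + %f = %f\n", i, h_A[i], h_B[i], h_C[i]);

    CUDA_CHECK(cudaFree(d_A));
    CUDA_CHECK(cudaFree(d_B));
    CUDA_CHECK(cudaFree(d_C));
    return 0;
}
```

If every call succeeds, each line should end in `= 2.000000`. If a call fails, the message names it, which should narrow down whether the problem lies in the driver or toolkit setup or in the launch itself.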

[ Last edited by jimmy19850511 on 2010-3-4 16:44 ]