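I have a function that contains the variables width and d_jx. I run 4 processes and print the address of width and the starting address held in d_jx. The address of width differs from process to process, but the address allocated on the device is the same in every process. myrank is the process rank, and there is only one accelerator card.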
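void fun()
{
…
int width;
double *d_jx;                        /* device pointer, filled in by cudaMalloc */
cudaMalloc((void **)&d_jx, d_size);  /* cudaMalloc expects the address of the pointer */
/* %x is kept to match the output below; %p with a (void *) cast is the portable form */
printf("rank=%d,address of width=%x\n", myrank, &width);
printf("rank=%d,address of d_jx=%x\n", myrank, d_jx);
/* MPI communication */
/* kernel launch */
/* cudaMemcpy back to the host */
/* MPI communication */
…
}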
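The rough structure of the function is: MPI communication first, then the kernel launch, then copying the data back to the host, then MPI communication again.
Output: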
rank=3,address of width=78a4e05c
rank=3,address of d_jx=209000
rank=2,address of width=aea090ec
rank=2,address of d_jx=209000
rank=0,address of width=86a777fc
rank=0,address of d_jx=209000
rank=1,address of width=bb5f1fac
rank=1,address of d_jx=209000
----------Second call to the function----------
rank=0,address of width=86a777fc
rank=0,address of d_jx=22d000
rank=2,address of width=aea090ec
rank=2,address of d_jx=22d000
rank=3,address of width=78a4e05c
rank=3,address of d_jx=22d000
rank=1,address of width=bb5f1fac
rank=1,address of d_jx=22d000
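Another question: when several processes run this function and each launches the kernel, they all operate on the same data. Will the processes conflict with one another? I ask because my program fails on the second call to this function. That's my second question of the day, thanks!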