cufftExectD2Z与cufftExecZ2Z 结果有差异帮我看看

system · 2013 年11 月 19 日 04:26

#define ROW 256
#define COL 256
int main(int argc, char* argv)
{

/***********************************************************************
***************** 256256 2D FFT **************************
/***********************************************************************/
double idata_h = (double )malloc(ROWCOLsizeof(double));
memset(idata_h,0,ROWCOL*sizeof(double));

double2 *idataC_h = (double2 *)malloc(ROW*COL*sizeof(double2));
memset(idataC_h,0,ROW*COL*sizeof(double2));

double2 *odata_h = (double2 *)malloc(ROW*COL*sizeof(double2));
memset(odata_h,0,ROW*COL*sizeof(double2));

double2 *odataC_h = (double2 *)malloc(ROW*COL*sizeof(double2));
memset(odataC_h,0,ROW*COL*sizeof(double2));

for (int i = 0; i < ROW*COL; i++)				
{
	idataC_h[i].x=idata_h[i] = rand();
	idataC_h[i].y=0;
}

double *idata_d;
CudaSafeCall(cudaMalloc((void **)&idata_d, ROW*COL*sizeof(double)));
CudaSafeCall(cudaMemset((void *)idata_d, 0 ,ROW*COL * sizeof(double)));

double2 *odata_d;
CudaSafeCall(cudaMalloc((void **)&odata_d, ROW*COL*sizeof(double2)));
CudaSafeCall(cudaMemset((void *)odata_d, 0 ,ROW*COL * sizeof(double2)));

double2 *idataC_d;
CudaSafeCall(cudaMalloc((void **)&idataC_d, ROW*COL*sizeof(double2)));
CudaSafeCall(cudaMemset((void *)idataC_d, 0 ,ROW*COL * sizeof(double2)));

double2 *odataC_d;
CudaSafeCall(cudaMalloc((void **)&odataC_d, ROW*COL*sizeof(double2)));
CudaSafeCall(cudaMemset((void *)odataC_d, 0 ,ROW*COL * sizeof(double2)));


//拷贝内存信号到显存
CudaSafeCall(cudaMemcpy(idata_d, idata_h, ROW*COL * sizeof(double), cudaMemcpyHostToDevice));
CudaSafeCall(cudaMemcpy(idataC_d, idataC_h, ROW*COL * sizeof(double2), cudaMemcpyHostToDevice));

printf("2D ROW = %d COL = %d points fft start\n",ROW,COL);
cufftHandle planD2Z;				//创建CUFFT D2Z句柄
checkCudaErrors(cufftPlan2d(&planD2Z, ROW,COL, CUFFT_D2Z));

cufftHandle planZ2Z;				//创建CUFFT Z2Z句柄
checkCudaErrors(cufftPlan2d(&planZ2Z, ROW,COL, CUFFT_Z2Z));

checkCudaErrors(cufftExecD2Z(planD2Z, idata_d, odata_d));	
checkCudaErrors(cufftExecZ2Z(planZ2Z, idataC_d, odataC_d,CUFFT_FORWARD));

//拷贝显存到内存信号
CudaSafeCall(cudaMemcpy(odata_h,odata_d,ROW*COL * sizeof(double2),cudaMemcpyDeviceToHost));
CudaSafeCall(cudaMemcpy(odataC_h,odataC_d,ROW*COL * sizeof(double2),cudaMemcpyDeviceToHost));
//验证结果的正确性
for(int i=0;i<ROW*COL;i++)
{
	if ((fabs(odata_h[i].x - odataC_h[i].x)>0.1)||(fabs(odata_h[i].y - odataC_h[i].y)>0.1))
	{
		printf("fft restult is wrong\n");	
	}
}

printf("2D ROW = %d COL = %d points fft finishes\n",ROW,COL);

cudaFree(idata_d);
cudaFree(odata_d);
cudaFree(idataC_d);
cudaFree(odataC_d);
free(idata_h);
free(odata_h);
free(idataC_h);
free(odataC_h);

cudaDeviceReset();
return 0;

}

以上代码是想完成idata_h是double型的矩阵，idataC_h是复数数据，其实部与idata_h相等，虚部为0，对idata_h和idataC_h做FFT，计算的结果最终存入odata_h,odataC_h中进行比较发现前128项相等，从129项就不等了？帮我看那看是不是程序哪里出了问题？

system · 2013 年11 月 19 日 04:49

LZ您好：

FFT中，实值序列的FFT结果是共轭对称的，所以R2C或者D2Z的FFT函数利用了这个对称性，仅计算频域大概前一半点的数据（对应正频率的部分），所以您观察到的现象是正常的。

详情请参阅cuFFT 手册中的说明。

祝您编码顺利~

system · 2013 年11 月 19 日 05:51

我的应用是对图像中的psf做FFT得到otf，本来是用cufftExecZ2Z，考虑到cufftExecD2Z耗时比cufftExecZ2Z少，想替换它，然而结果错了。那么请问一下如果是多实数做fft 用cufftExecD2Z怎么才能得到与cufftExecZ2Z一样的结果呢?

system · 2013 年11 月 19 日 05:54

LZ您好：

请您复习一下FFT理论。

祝您好运~

system · 2013 年11 月 19 日 08:46

cufftExecZ2Z变换之后的共轭对称的规律是什么呢？
如果要用cufftExecD2Z替换cufftExecZ2Z是不是还要把cufftExecD2Z计算的结果重新补齐（通过共轭对称）成和cufftExecZ2Z计算的结果呢？这样会不会得不偿失？

system · 2013 年11 月 19 日 08:48

补充一下：上面说的是共轭对称的坐标的规律，我发现一维的挺好找规律的，对于二维好像不太好找。

system · 2013 年11 月 19 日 08:58

LZ您好：

1：共轭对称的具体形式请您查阅相关傅里叶变换理论的教本，解释这一点已经超出本版讨论范围。

2：R2C/D2Z的变换本来就是和C2R/Z2D的变换配合使用的，后者在计算中会自动从已有的复值数据中生成所需的另外一半共轭对称的数据，并予以计算。

3：如果您后续的自己的处理代码中需要Z2Z的格式的数据，那么您可以直接使用Z2Z变换，或用D2Z的数据手工补齐另外一半，或修改您后续的算法，从已有数据中就地生成所需的数据，并继续计算。您可以自行选择一种。

大致如此，请您定夺。

祝您编码顺利~

system · 2013 年11 月 19 日 09:02

LZ您好：

1：根据FFTW的手册，在高维中，仅有一个维度的点数减半，cufft可能与之类似。具体规律请您查阅相关资料和手册。

2：您可以查阅手册看看有没有现成可用的函数，而无需自行推导。

祝您好运~

system · 2013 年11 月 19 日 09:03

好的谢谢ice版主

system · 2013 年11 月 19 日 09:05

不客气的，欢迎您常来

system · 2013 年11 月 21 日 01:55

R2C变换后的结果矩阵是
1 1 1 1 1
1 1 1
1
还是
1 1 1 1 1
1 1 1
1
这样的形式呢？我想手动补齐。

system · 2013 年11 月 21 日 02:40

我大概看了一下，下面是我二维矩阵fft的公式
Xij = X(row-i)%row,(col-j)%col
其中i j 和(row-i)%row (col-j)%col都是下标， row col 是二维的行和列

system · 2013 年11 月 21 日 04:19

LZ您好：

我觉得R2C变换后的结果并非您给出的两种结果中的任何一个，您这个看上去是沿矩阵的主对角线和副对角线划分的，fft处理这里可能不是类似于矩阵共轭转置相等的形式。

一个直接的方法是您构造两组fft，一组是C2C，输入端虚部为0的，另一组是对应的R2C，您对比一下结果。

此外，考虑到cufft的默认情况下data layout是和FFTW兼容的，这里转载一段FFTW手册的叙述，供您参考，但实际情况如何，请您以cufft实际结果为准。

“As before,nis the logical size of the array, and the consequences of this on the the format
of the complex arrays deserve careful attention. Suppose that the real data has dimensions
n0n1 n2* …(nd-1)
(in row-major order). Then, after an r2c transform, the output
is an n0 n1* n2* ((nd-1)/2 + 1) array offftw_complexvalues in row-major order,
corresponding to slightly over half of the output of the corresponding complex DFT. (The
division is rounded down.) The ordering of the data is otherwise exactly the same as in the
complex-DFT case.”

(注意这里n0，nd-1等里面，0，d-1都是下标，复制过来格式丢失了)

“For example, consider the transform of a two-dimensional real array of sizen0byn1. The
output of the r2c transform is a two-dimensional complex array of sizen0byn1/2+1, where
theydimension has been cut nearly in half because of redundancies in the output. Because
fftw_complexis twice the size of double, the output array is slightly bigger than the input
array. Thus, if we want to compute the transform in place, we mustpadthe input array so
that it is of size n0by2*(n1/2+1). If n1is even, then there are two padding elements at
the end of each row (which need not be initialized, as they are only used for output).”

祝您好运~