Summary of CUDA 2D (plane) memory usage

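Someone in the group recently asked about 2D GMEM (global memory) copies in CUDA. Below is a detailed walk-through of how to do one without writing a kernel: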

Test (copy a 50x50 sub-region, starting at index (25,25) of a 100x100 GMEM region, into the destination GMEM):

src GMEM pointer : dpSrc

src GMEM layout : 100x100

dst GMEM pointer : dpDst

dst GMEM layout : 50x50

The src GMEM is initialized in row-major order with the values 0 to 9999.

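CUDA_MEMCPY2D planeMem;
memset(&planeMem, 0, sizeof(planeMem));
planeMem.srcMemoryType = CU_MEMORYTYPE_DEVICE;
planeMem.srcDevice     = dpSrc;
planeMem.srcXInBytes   = 25 * sizeof(float);   // column offset of the sub-region, in bytes
planeMem.srcY          = 25;                   // row offset of the sub-region, in rows
planeMem.srcPitch      = 100 * sizeof(float);  // bytes per source row
planeMem.dstMemoryType = CU_MEMORYTYPE_DEVICE;
planeMem.dstDevice     = dpDst;
planeMem.dstXInBytes   = 0;
planeMem.dstY          = 0;
planeMem.dstPitch      = 50 * sizeof(float);   // bytes per destination row
planeMem.WidthInBytes  = planeMem.dstPitch;    // bytes copied per row (50 floats)
planeMem.Height        = 50;                   // number of rows copied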

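cuMemcpy2DUnaligned(&planeMem);
// If the data is already aligned, prefer cuMemcpy2D; otherwise this function must be used (it accepts arbitrary pitches but may be slower).
// Also, when the memory was allocated with cuMemAllocPitch, set planeMem.srcPitch and planeMem.dstPitch to the pitch value returned by cuMemAllocPitch, not to the layout width * sizeof(TYPE). The two can differ, for example when the row width is not a power of two.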
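
For anyone unsure what the copy actually does, here is a minimal host-only sketch of the same addressing arithmetic, with no GPU involved. It is not the driver's implementation: `Copy2DParams`, `copy2d_reference` and `extract_window` are hypothetical names made up for illustration, and the fields only mirror the `CUDA_MEMCPY2D` members used above. The assumption is that each copied row starts at base + (Y + row) * pitch + XInBytes, which is how the pitch and offset fields are meant to be read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical host-side mirror of the CUDA_MEMCPY2D fields used in the post.
struct Copy2DParams {
    const unsigned char* src;   // plays the role of srcDevice
    std::size_t srcXInBytes, srcY, srcPitch;
    unsigned char* dst;         // plays the role of dstDevice
    std::size_t dstXInBytes, dstY, dstPitch;
    std::size_t widthInBytes, height;
};

// Row-by-row copy: every row starts at base + (y + row) * pitch + xInBytes.
void copy2d_reference(const Copy2DParams& p) {
    for (std::size_t row = 0; row < p.height; ++row) {
        const unsigned char* s = p.src + (p.srcY + row) * p.srcPitch + p.srcXInBytes;
        unsigned char* d = p.dst + (p.dstY + row) * p.dstPitch + p.dstXInBytes;
        std::memcpy(d, s, p.widthInBytes);
    }
}

// Reproduces the test from the post: a 100x100 source holding 0..9999 and a
// 50x50 window starting at (25,25). srcPitchBytes must be at least
// 100*sizeof(float); a larger value models rows padded by a pitched allocation.
std::vector<float> extract_window(std::size_t srcPitchBytes) {
    const std::size_t W = 100, H = 100, N = 50;
    assert(srcPitchBytes >= W * sizeof(float));

    // Fill the source in row-major order, honoring the pitch between rows.
    std::vector<unsigned char> src(srcPitchBytes * H, 0);
    for (std::size_t y = 0; y < H; ++y) {
        for (std::size_t x = 0; x < W; ++x) {
            float v = static_cast<float>(y * W + x);
            std::memcpy(src.data() + y * srcPitchBytes + x * sizeof(float), &v, sizeof(v));
        }
    }

    std::vector<float> dst(N * N, -1.0f);

    Copy2DParams p{};
    p.src          = src.data();
    p.srcXInBytes  = 25 * sizeof(float);
    p.srcY         = 25;
    p.srcPitch     = srcPitchBytes;
    p.dst          = reinterpret_cast<unsigned char*>(dst.data());
    p.dstXInBytes  = 0;
    p.dstY         = 0;
    p.dstPitch     = N * sizeof(float);
    p.widthInBytes = p.dstPitch;
    p.height       = N;

    copy2d_reference(p);
    return dst;
}
```

Calling `extract_window(100 * sizeof(float))` models the tightly packed layout from the post, so the first destination element is source element (25,25), i.e. 25*100+25 = 2525. Calling `extract_window(512)` models a padded source and gives the same window, but only because `srcPitch` is set to the real row stride and not to 100*sizeof(float). That is the same reason the pitch returned by cuMemAllocPitch has to go into srcPitch/dstPitch.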

Note: the code above has been tested.

:o

What's up with the poster above?

Nothing, I just couldn't work out what this is actually doing :o

:terrible:

:surprise: