Here is the pseudocode of my current kernel:
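__global__ void kernel(… projection p, float step_size,
                       float sample_spacing, float3 pos) {
    // Detector pixel handled by this thread
    // (UMAD: presumably a __umul24-based multiply-add macro)
    int y = UMAD(blockIdx.x, blockDim.x, threadIdx.x);
    int z = UMAD(blockIdx.y, blockDim.y, threadIdx.y);
    int i = z * p.det.projydim + y;   // linear index into d_proj
    …
    int num_steps = …;
    float sum = 0.0f;
    // March along the ray, accumulating one texture sample per step
    for (int j = 0; j < num_steps; j++) {
        sum += tex3D(project_tex, pos.x, pos.y, pos.z);
        pos += dx;   // float3 += float3 (e.g. operators from helper_math.h)
    }
    d_proj[i] = sum * step_size * sample_spacing;
}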
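To pin down what each thread computes, and to have something to check the GPU output against while trying optimizations, here is a minimal CPU reference for a single ray. It is written as plain host C++ (so it also compiles inside a .cu file) and assumes the texture is bound with cudaFilterModeLinear, cudaAddressModeClamp and unnormalized coordinates, with the volume stored x-fastest. `Vec3`, `Volume`, `sample_trilinear` and `project_ray` are hypothetical names for this sketch, not part of the original code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for CUDA's float3.
struct Vec3 {
    float x, y, z;
};

// Hypothetical volume layout: x varies fastest,
// voxel (ix, iy, iz) is data[(iz * ny + iy) * nx + ix].
struct Volume {
    int nx, ny, nz;
    std::vector<float> data;

    // Fetch one voxel with clamp addressing (like cudaAddressModeClamp).
    float at(int ix, int iy, int iz) const {
        assert(data.size() == static_cast<std::size_t>(nx) * ny * nz);
        ix = std::min(std::max(ix, 0), nx - 1);
        iy = std::min(std::max(iy, 0), ny - 1);
        iz = std::min(std::max(iz, 0), nz - 1);
        return data[(static_cast<std::size_t>(iz) * ny + iy) * nx + ix];
    }
};

// Approximates tex3D with linear filtering and unnormalized coordinates:
// texel centers sit at i + 0.5, so shift by -0.5 before splitting into an
// integer index and a fractional weight. Uses full float weights, whereas
// the hardware uses low-precision fixed-point weights.
float sample_trilinear(const Volume& v, Vec3 p) {
    const float fx = p.x - 0.5f, fy = p.y - 0.5f, fz = p.z - 0.5f;
    const int x0 = static_cast<int>(std::floor(fx));
    const int y0 = static_cast<int>(std::floor(fy));
    const int z0 = static_cast<int>(std::floor(fz));
    const float ax = fx - x0, ay = fy - y0, az = fz - z0;

    float value = 0.0f;
    for (int dz = 0; dz < 2; ++dz)
        for (int dy = 0; dy < 2; ++dy)
            for (int dx = 0; dx < 2; ++dx) {
                const float w = (dx ? ax : 1.0f - ax) *
                                (dy ? ay : 1.0f - ay) *
                                (dz ? az : 1.0f - az);
                value += w * v.at(x0 + dx, y0 + dy, z0 + dz);
            }
    return value;
}

// CPU version of one thread's loop: take num_steps samples starting at pos,
// advancing by dx each time, then scale as the kernel's last line does.
float project_ray(const Volume& v, Vec3 pos, Vec3 dx, int num_steps,
                  float step_size, float sample_spacing) {
    float sum = 0.0f;
    for (int j = 0; j < num_steps; ++j) {
        sum += sample_trilinear(v, pos);
        pos.x += dx.x;
        pos.y += dx.y;
        pos.z += dx.z;
    }
    return sum * step_size * sample_spacing;
}
```

Comparing `project_ray` with `d_proj[i]` for a few detector pixels gives a quick sanity check after each optimization attempt. Small differences are expected, since hardware linear filtering stores its interpolation weights in 9-bit fixed point (8 fractional bits).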
Could any of you experts take a look and suggest the best way to optimize this?
Many thanks to everyone reading!
[ Last edited by shhhelen123 on 2010-5-9 11:08 ]