Number of Threads

I ran my kernel with 64 and with 256 threads per block. The version with 64 threads per block ran faster than the one with 256. Could someone give me a
detailed explanation of why this happens?
Thanks a lot.

For the specifics, it probably comes down to your code.

But as far as I recall, more threads per block is not necessarily better. It depends on how you partition the work and on the program itself.
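
One way to check this on your own machine is to time the same kernel at several block sizes and ask the runtime how many blocks of each size can be resident on one multiprocessor. Below is a minimal sketch, not your code: the `saxpy` kernel, the `timeLaunch` helper, the array size, and the list of block sizes are all placeholders I made up for illustration, and error checking is left out for brevity. Only the CUDA runtime calls (`cudaEventRecord`, `cudaEventElapsedTime`, `cudaOccupancyMaxActiveBlocksPerMultiprocessor`) are real API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel (NOT the original poster's kernel): y = a*x + y.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Hypothetical helper: times one launch at the given block size using CUDA
// events and returns the elapsed time in milliseconds.
static float timeLaunch(int n, int blockSize, const float* x, float* y) {
    int gridSize = (n + blockSize - 1) / blockSize;  // round up to cover n

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time startup cost is not included in the timing.
    saxpy<<<gridSize, blockSize>>>(n, 2.0f, x, y);

    cudaEventRecord(start);
    saxpy<<<gridSize, blockSize>>>(n, 2.0f, x, y);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the timed launch has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const int n = 1 << 24;  // assumed problem size, chosen arbitrarily
    float* x = nullptr;
    float* y = nullptr;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));
    cudaMemset(y, 0, n * sizeof(float));

    const int blockSizes[] = {64, 128, 256};  // assumed candidates to compare
    for (int blockSize : blockSizes) {
        // How many blocks of this size can be resident on one SM, given this
        // kernel's register usage and 0 bytes of dynamic shared memory.
        int blocksPerSM = 0;
        cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, saxpy,
                                                      blockSize, 0);

        float ms = timeLaunch(n, blockSize, x, y);
        printf("block=%3d  blocks/SM=%2d  threads/SM=%4d  time=%.3f ms\n",
               blockSize, blocksPerSM, blocksPerSM * blockSize, ms);
    }

    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

If you swap in your own kernel and the 256-thread configuration reports fewer resident threads per SM than the 64-thread one, then register or shared memory usage per block is a likely reason for the slowdown. If the occupancy numbers are the same, the cause is more likely something else in the code, for example the cost of `__syncthreads()` across a larger block or an uneven split of the work.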

I'm also hoping one of the experts here will weigh in.