Compile error: NYI: deposit_bits for non-word size

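I wrote the code below. Its main purpose is to process 64-bit data on the device, but the compiler reports the error shown in the build log. How should I change the code?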

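typedef long long int int64_t;
typedef unsigned long long int uint64_t;
struct tag_NodeHeader
{
    int nValValue;          // growth-point variable value to modify, [-2147483648, 2147483647]
    short int nValIndex;    // index of the current growth-point variable; intended range [0, 65535], but a signed short only holds [-32768, 32767]
    short int nGrowthCount; // growth-round counter; intended range [0, 65535], same caveat as above
};

union union_NodeHeader
{
    tag_NodeHeader tag_Fission; // split (per-field) view
    int64_t nFusion;            // fused (single 64-bit) view
};

struct Thin_Node
{
    int64_t nNodeHeader; // growth point
    float FValue;        // value of the node's F function
};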
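For reference, the 64-bit header that the union produces can also be built with explicit shifts and masks, which avoids writing 16-bit members of a 64-bit object. This is a minimal sketch, not a confirmed fix: it assumes a little-endian target, so that nValValue occupies bits 0..31, nValIndex bits 32..47 and nGrowthCount bits 48..63. The names HD, u64, packHeader, headerValue, headerIndex and headerCount are hypothetical and do not appear in the original code.

```cpp
#include <cassert>

// Expands to the CUDA qualifiers under nvcc and to nothing in a plain C++ build.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

typedef unsigned long long u64; // hypothetical alias, not from the original post

// Assumed layout (matches the union only on a little-endian target):
// bits 0..31 nValValue, bits 32..47 nValIndex, bits 48..63 nGrowthCount.
HD inline u64 packHeader(int value, unsigned int index, unsigned int count)
{
    assert(index <= 0xFFFFu && count <= 0xFFFFu); // both must fit in 16 bits
    return  (u64)(unsigned int)value   // low 32 bits, two's complement preserved
         | ((u64)index << 32)
         | ((u64)count << 48);
}

// Accessors that read one field back out of the packed header.
HD inline int headerValue(u64 h)
{
    return (int)(unsigned int)(h & 0xFFFFFFFFull);
}

HD inline unsigned int headerIndex(u64 h)
{
    return (unsigned int)((h >> 32) & 0xFFFFull);
}

HD inline unsigned int headerCount(u64 h)
{
    return (unsigned int)((h >> 48) & 0xFFFFull);
}
```

With helpers like these, a kernel could assign `packHeader(value, index, count)` to nNodeHeader directly and would not need a union_NodeHeader temporary.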

1>------ Rebuild All started: Project: Demo11, Configuration: Release Win32 ------
1>Deleting intermediate files and output files for project 'Demo11', configuration 'Release|Win32'
1>Compiling…
1>Test.cu
1>tmpxft_000008b0_00000000-3_Test.cudafe1.gpu
1>tmpxft_000008b0_00000000-8_Test.cudafe2.gpu
1>### Assertion failure at line 1140 of …/…/be/cg/NVISA/exp_loadstore.cxx:
1>### Compiler Error in file C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp/tmpxft_000008b0_00000000-9_Test.cpp3.i during Code_Expansion phase:
1>### NYI: deposit_bits for non-word size
1>nvopencc ERROR: C:\CUDA\toolkit\bin/…/open64/lib//be.exe returned non-zero status 1
1>Compiling…
1>main.cpp
1>Linking…
1>LINK : fatal error LNK1181: cannot open input file '.\Release\Test.obj'
1>Build log was saved at "file://e:\PGSA_GPU\Demo11\Release\BuildLog.htm"
1>Demo11 - 1 error(s), 0 warning(s)
========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========

Any help would be much appreciated. Thanks!

The code that uses the structures above is:

///////////////////////////////////////////////////////////////////////////////////////////
/// @brief
/// @param
///
/// @return
/// @see
/// @note
/// @date 2010-10-10 17:43
/// @version
///////////////////////////////////////////////////////////////////////////////////////////
__global__ void GPUInitNodes(Thin_Node *pNodes, const int NUM)
{
    unsigned int tidInGrid = threadIdx.x + blockIdx.x * blockDim.x;

    union_NodeHeader temp;
    Thin_Node Temp_Node;
    // Grid-stride loop: each thread handles every (blockDim.x * gridDim.x)-th node.
    for (int i = tidInGrid; i < NUM; i += blockDim.x * gridDim.x)
    {
        // Fill the three fields through the split view of the union...
        temp.tag_Fission.nValIndex = tidInGrid % 125;
        temp.tag_Fission.nValValue = (int)tidInGrid % 1022;
        temp.tag_Fission.nGrowthCount = tidInGrid % 2020 + 1;
        // ...then read them back as one 64-bit value.
        Temp_Node.nNodeHeader = temp.nFusion;
        Temp_Node.FValue = (float)(tidInGrid / 123.0f);
        pNodes[i] = Temp_Node;
    }
}

__global__ void GPUProcessNode(Thin_Node *pNodes, const int NUM)
{
    int tidInGrid = threadIdx.x + blockIdx.x * blockDim.x;

    Thin_Node Temp_Node;
    union_NodeHeader temp;
    // Grid-stride loop: each thread handles every (blockDim.x * gridDim.x)-th node.
    for (int i = tidInGrid; i < NUM; i += blockDim.x * gridDim.x)
    {
        Temp_Node = pNodes[i];
        // Load the 64-bit header, then double each field through the split view.
        temp.nFusion = Temp_Node.nNodeHeader;
        temp.tag_Fission.nValIndex *= 2;
        temp.tag_Fission.nValValue *= 2;
        temp.tag_Fission.nGrowthCount *= 2;
        Temp_Node.nNodeHeader = temp.nFusion;
        Temp_Node.FValue = sqrtf(Temp_Node.FValue); // single-precision square root
        //__syncthreads();
        pNodes[i] = Temp_Node;
    }

}
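The error text mentions "deposit_bits", which suggests (this is an inference from the message, not something confirmed here) that the compiler back end cannot yet write a 16-bit field into a 64-bit value, as the `*= 2` updates on the short members require. A possible workaround is to do the bit deposit by hand on the 64-bit word. The sketch below mirrors the three field updates in GPUProcessNode under the assumption of a little-endian layout (nValValue in bits 0..31, nValIndex in bits 32..47, nGrowthCount in bits 48..63). The names HD, u64, extractBits, depositBits and doubleHeaderFields are hypothetical.

```cpp
#include <cassert>

// Expands to the CUDA qualifiers under nvcc and to nothing in a plain C++ build.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

typedef unsigned long long u64; // hypothetical alias, not from the original post

// Read `width` bits of `word`, starting at bit position `shift`.
HD inline u64 extractBits(u64 word, int shift, int width)
{
    assert(shift >= 0 && width > 0 && width < 64 && shift + width <= 64);
    return (word >> shift) & ((1ull << width) - 1ull);
}

// Overwrite `width` bits of `word` at `shift` with the low bits of `field`.
HD inline u64 depositBits(u64 word, int shift, int width, u64 field)
{
    assert(shift >= 0 && width > 0 && width < 64 && shift + width <= 64);
    const u64 mask = ((1ull << width) - 1ull) << shift;
    return (word & ~mask) | ((field << shift) & mask);
}

// Doubles all three fields of a packed header. Each result wraps within its
// own field width, as the truncating `*= 2` on the union members does.
HD inline u64 doubleHeaderFields(u64 header)
{
    header = depositBits(header,  0, 32, extractBits(header,  0, 32) * 2ull); // nValValue
    header = depositBits(header, 32, 16, extractBits(header, 32, 16) * 2ull); // nValIndex
    header = depositBits(header, 48, 16, extractBits(header, 48, 16) * 2ull); // nGrowthCount
    return header;
}
```

Inside the kernel loop, the union temporary could then be replaced by something like `Temp_Node.nNodeHeader = (int64_t)doubleHeaderFields((u64)Temp_Node.nNodeHeader);`. Whether this avoids the assertion on this particular toolkit version would still need to be checked by compiling it.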

I agree with the previous poster.