Please use the template below when asking questions (check the corresponding options after creating the topic):
Jetson module: Jetson Orin NX
Jetson software: JetPack 6.2
Problem description
To accelerate an inference engine built with TensorRT-LLM on the DLA, I tried the following approach.
Step 1: build the engine with TensorRT-LLM and save it to disk. The code is as follows:
# Build the engine with the TensorRT-LLM LLM API, then save it
from tensorrt_llm import LLM
llm = LLM(...)  # model arguments omitted here
llm.save("./trt_engine.checkpoint")
Step 2: run inference on the engine file saved by TensorRT-LLM using trtexec:
trtexec --loadEngine=rank0.engine --useDLACore=0 --allowGPUFallback --verbose
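The deserialization error below says the GPTAttention plugin creator is missing from the plugin registry when trtexec loads the engine. For reference, trtexec in TensorRT 10 accepts plugin-library flags such as --staticPlugins; a minimal sketch of loading the TensorRT-LLM plugin library first (the library path is an assumption, adjust it to your install):

```shell
# Assumed path to the TensorRT-LLM plugin library; adjust to your install
PLUGIN_LIB=/usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so

# Load the plugin library so its creators are registered
# before the engine is deserialized
trtexec --loadEngine=rank0.engine \
        --staticPlugins=$PLUGIN_LIB \
        --useDLACore=0 --allowGPUFallback --verbose
```

Even if the plugins register, whether the DLA can actually execute these layers is a separate question.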
This fails with the following error log:
[03/11/2025-11:56:30] [I] [TRT] Loaded engine size: 3467 MiB
[03/11/2025-11:56:31] [V] [TRT] Local registry did not find GPTAttention creator. Will try parent registry if enabled.
[03/11/2025-11:56:31] [E] [TRT] IPluginRegistry::getPluginCreator: Error Code 4: API Usage Error (Cannot find plugin: GPTAttentiontensorrt_llm, version: 1, namespace:tensorrt_llm.)
[03/11/2025-11:56:32] [E] Error[1]: IRuntime::deserializeCudaEngine: Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
[03/11/2025-11:56:32] [E] Engine deserialization failed
[03/11/2025-11:56:32] [E] Got invalid engine!
[03/11/2025-11:56:32] [E] Inference set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v100400] [b26] # trtexec --loadEngine=rank0.engine --useDLACore=0 --allowGPUFallback --verbose
What is the cause of the errors above? Is this approach feasible at all? If not, how should I use the DLA to accelerate an engine built with TensorRT-LLM?