[Jetson AGX Orin] Can we manually install CUDA 13 / TensorRT 10.15 / TensorRT-LLM for Qwen3-VL?

862423843 · 2026 年5 月 25 日 06:30

请使用下面的模版提问（创建话题后勾选相应的选项）：
Jetson 模组
[*] Jetson AGX Orin
Jetson Orin NX
Jetson Orin Nano
Jetson AGX Xavier
Jetson Xavier NX
Jetson TX 系列
Jetson Nano

Jetson 软件
JetPack 5.1.3
JetPack 5.1.4
JetPack 6.0
JetPack 6.1
[*] JetPack 6.2
DeepStream SDK
NVIDIA Isaac

SDK Manager 管理工具版本
2.3.0
2.2.0
2.1.0
[*] 其他

问题描述
把这里替换为您的问题描述/复现步骤/用例/图片的详细信息
Hello NVIDIA team,

We are working on deploying a local multimodal RAG pipeline on NVIDIA Jetson AGX Orin 64GB.

Current system environment:

Device: NVIDIA Jetson AGX Orin 64GB
Architecture: aarch64 / ARM64
L4T: R36.4.7
JetPack: 6.2.1
CUDA Toolkit: 12.6
TensorRT:
10.3.0.30
cuDNN: 9.3
Python: 3.10
PyTorch: 2.10.0
Transformers: 4.57.6
Triton Server: 2.49.0, TensorRT backend works
TensorRT-LLM: not installed

What already works:

Qwen3-Embedding-0.6B
- Exported to ONNX
- Built TensorRT engine
- Deployed on Triton TensorRT backend
- Verified FP32 output, no NaN
Qwen3-Reranker-0.6B
- Exported to ONNX
- Built TensorRT engine
- Deployed on Triton TensorRT backend
- Verified output shape [batch, 1]
Qwen3-VL-2B-Instruct
- Runs locally with PyTorch CUDA on Jetson
- Used as image captioner/compliance checker
- Works, but not accelerated by TensorRT-LLM yet

Target models we want to deploy with TensorRT-LLM:

Qwen/Qwen3-VL-2B-Instruct
- Architecture: Qwen3VLForConditionalGeneration
- Multimodal model: Vision Encoder + LLM Decoder
- model_type: qwen3_vl
Qwen/Qwen3.5-2B
- Architecture: Qwen3_5ForConditionalGeneration
- Multimodal model: Vision Encoder + LLM Decoder
- model_type: qwen3_5
- Text part includes linear_attention / Gated DeltaNet style layers

According to the latest TensorRT-LLM source and documentation, Qwen3VLForConditionalGeneration appears to be supported in the newer TensorRT-LLM PyTorch backend.

However, the current Jetson-compatible TensorRT-LLM branch, such as v0.12.0-jetson, seems to match TensorRT 10.3 but does not support Qwen3-VL or Qwen3.5.

The newer TensorRT-LLM main branch seems to require a newer software stack, approximately:

CUDA 13.x
TensorRT 10.15.x
cuda-python >= 13
torch >= 2.10.0, <= 2.11.0a0
transformers == 5.5.4
nvidia-modelopt[torch] ~= 0.37.0
triton == 3.6.0
flashinfer-python == 0.6.11.post1

Questions:

On Jetson AGX Orin with L4T R36.4.7 / JetPack 6.2.1, is it officially supported or technically possible to manually install CUDA 13.x and TensorRT 10.15.x?
If yes, is there a recommended installation procedure for Jetson AGX Orin?
Is there a Jetson-compatible TensorRT-LLM version or wheel that supports Qwen3VLForConditionalGeneration?
Is Qwen/Qwen3-VL-2B-Instruct deployment with TensorRT-LLM officially supported on Jetson AGX Orin?
Is Qwen/Qwen3.5-2B deployment with TensorRT-LLM supported on Jetson AGX Orin?
If direct TensorRT-LLM deployment is not currently supported, what is the recommended deployment path for these multimodal models on Jetson Orin?
- PyTorch backend?
- vLLM/SGLang?
- wait for future JetPack/TensorRT-LLM release?
- Docker container?
- build TensorRT-LLM from source?
If manual mixing of CUDA/TensorRT versions is not recommended, please confirm the reason, especially regarding driver/L4T compatibility and TensorRT ABI compatibility.

We would appreciate any official guidance or reference workflow.
错误码
把这里替换为错误码（无需其他信息）

错误日志
把这里替换，粘贴错误日志文本（尽量粘贴错误文本，不要只上传截图）
如果有多个日志，请使用多个代码格式化文本