The NVIDIA Jetson AGX Orin 64GB faces significant challenges when attempting to run the DeepSeek-V2.5 model due to the model's substantial memory footprint. DeepSeek-V2.5, with its 236 billion parameters, requires approximately 472GB of memory for its weights alone at FP16 precision (2 bytes per parameter). The Jetson AGX Orin 64GB, equipped with 64GB of unified LPDDR5 memory shared between the CPU and GPU, falls drastically short of this requirement, leaving a deficit of roughly 408GB. This shortfall prevents the model from being loaded for inference at all.
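The arithmetic behind these figures is straightforward; the sketch below reproduces it under the simplifying assumption that only the weights count (activations and KV cache would add further overhead):

```python
# Rough weight-memory estimate for dense FP16 inference.
# Assumptions for illustration: 236B parameters at 2 bytes each;
# activations and KV cache are ignored.

def model_footprint_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9

fp16_gb = model_footprint_gb(236e9, 2)   # ~472 GB of weights
deficit_gb = fp16_gb - 64                # vs. the Orin's 64 GB

print(f"FP16 weights: {fp16_gb:.0f} GB; deficit vs 64 GB: {deficit_gb:.0f} GB")
```

In practice the real footprint is larger still, since runtime buffers and the KV cache grow with context length.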
Beyond capacity, memory bandwidth also plays a critical role. The Jetson AGX Orin 64GB provides 204.8 GB/s (about 0.21 TB/s) of memory bandwidth. While respectable for its class, this would become a bottleneck even if sufficient memory were available: autoregressive decoding must stream the model's weights from memory for every generated token, so a model the size of DeepSeek-V2.5 would decode very slowly. The combination of insufficient memory capacity and limited bandwidth makes running DeepSeek-V2.5 on the Jetson AGX Orin 64GB impractical without significant optimization or model modification.
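A back-of-envelope ceiling on decode speed illustrates the bandwidth problem. The sketch assumes (hypothetically) that the full FP16 weights were resident and must be read once per generated token; a mixture-of-experts model reads fewer bytes per token, but this still shows the order of magnitude:

```python
# Bandwidth-bound decode ceiling: tokens/s <= bandwidth / bytes read per token.
# Illustrative assumptions, not benchmarks: full 472 GB of FP16 weights
# streamed once per token on the Orin's 204.8 GB/s memory bus.

bandwidth_gb_per_s = 204.8   # Jetson AGX Orin 64GB memory bandwidth
weights_gb = 472.0           # FP16 weight footprint (hypothetically resident)

tokens_per_s_ceiling = bandwidth_gb_per_s / weights_gb
print(f"Decode ceiling: {tokens_per_s_ceiling:.2f} tokens/s")
```

Even this optimistic upper bound lands well under one token per second, before accounting for compute and cache traffic.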
Given these memory limitations, running DeepSeek-V2.5 directly on the Jetson AGX Orin 64GB is not feasible. Quantization to 4-bit or lower precision can shrink the weight footprint substantially, at the cost of some accuracy. Alternatively, distributed inference can shard the model across multiple devices, though this adds considerable complexity. Offloading some layers to the CPU side might allow a partial load, but will drastically reduce performance.
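It is worth checking whether quantization alone can close the gap. The sketch below estimates weight footprints at several precisions, using an assumed ~5% overhead for per-block scales and zero-points (real quantized formats vary):

```python
# Hedged estimate: weight footprint of a 236B-parameter model at
# several precisions. OVERHEAD approximates scale/zero-point metadata
# added by typical block-quantized formats (an assumption).

NUM_PARAMS = 236e9
OVERHEAD = 1.05
ORIN_MEMORY_GB = 64

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4), ("INT2", 2)]:
    gb = NUM_PARAMS * bits / 8 * OVERHEAD / 1e9
    verdict = "fits" if gb < ORIN_MEMORY_GB else "does not fit"
    print(f"{name:5s}: ~{gb:.0f} GB -> {verdict} in {ORIN_MEMORY_GB} GB")
```

Notably, even 4-bit quantization (~124 GB) exceeds the Orin's 64GB, so quantization would have to be combined with offloading or sharding rather than used alone.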
Another approach is to use a smaller, more manageable model that is better suited for the Jetson AGX Orin's capabilities. Fine-tuning a smaller model on a specific task could provide acceptable performance without exceeding the hardware limitations. Explore cloud-based inference solutions as well, where the model runs on more powerful remote servers and the Jetson AGX Orin acts as a client.
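For the client/server approach, the Jetson only needs to send prompts and receive completions. A minimal sketch, assuming a hypothetical remote server exposing an OpenAI-compatible chat-completions API (the URL and model name below are placeholders, not real services):

```python
# Hedged sketch: Jetson AGX Orin as a thin client to a remote
# inference server. Endpoint URL and model name are placeholder
# assumptions for illustration.
import json
import urllib.request

def build_request(prompt: str, server_url: str, model: str) -> urllib.request.Request:
    """Build a chat-completion request for an OpenAI-compatible endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        url=f"{server_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("Hello", "http://inference.example.internal:8000", "deepseek-v2.5")
# urllib.request.urlopen(req) would send the request; it is omitted
# here because the endpoint above is a placeholder.
```

This keeps the edge device's workload to request construction and response handling, which the Orin handles easily.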