Can I run Mistral Large 2 on NVIDIA Jetson AGX Orin 64GB?

Fail/OOM: This GPU doesn't have enough VRAM.

| GPU VRAM | Required | Headroom |
| --- | --- | --- |
| 64.0GB | 246.0GB | -182.0GB |

VRAM Usage: 64.0GB of 64.0GB used (100%)

Technical Analysis

The primary limiting factor for running Mistral Large 2 on the NVIDIA Jetson AGX Orin 64GB is the VRAM disparity. Mistral Large 2, with its 123 billion parameters, requires approximately 246GB of memory at FP16 precision (123B parameters × 2 bytes per parameter). The Jetson AGX Orin provides only 64GB, leaving a 182GB shortfall: the model in its full FP16 form simply cannot be loaded onto the device. The Ampere-architecture GPU, while capable, is constrained by memory capacity for a model this size. Memory bandwidth, at roughly 0.2 TB/s, would also bottleneck token generation even if sufficient memory were available, since each generated token requires reading the full set of weights.
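The arithmetic behind the numbers above can be sketched in a few lines. The 123B parameter count is Mistral Large 2's published size; everything else follows from bytes-per-parameter:

```python
# Back-of-the-envelope memory estimate for the model weights alone
# (runtime buffers and KV cache would come on top of this).

def weight_vram_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GB (10^9 bytes)."""
    return params * bytes_per_param / 1e9

fp16 = weight_vram_gb(123e9, 2.0)  # FP16: 2 bytes per parameter
print(f"FP16 weights: {fp16:.0f} GB")                  # 246 GB
print(f"Headroom on 64GB device: {64 - fp16:.0f} GB")  # -182 GB
```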

On a discrete-GPU system, layers could be offloaded to system RAM, but the Orin's 64GB is already a single unified pool shared by the CPU and GPU; the only remaining tier is NVMe storage, and streaming weights from disk degrades performance by orders of magnitude. The 2048 CUDA cores and 64 Tensor cores would sit largely idle while the system swaps data. The full 128,000-token context length compounds the memory pressure: the KV cache maintained by the attention mechanism grows linearly with context and adds tens of gigabytes on its own. Running Mistral Large 2 in its original form on the Jetson AGX Orin 64GB is therefore not feasible.
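To make the context-length pressure concrete, here is a rough FP16 KV-cache estimate. The architecture numbers (88 layers, 8 KV heads via grouped-query attention, head dimension 128) are assumed from Mistral Large 2's published configuration and should be checked against the actual model card:

```python
# FP16 KV-cache size grows linearly with context length.
# Layer/head counts below are assumed, not taken from this report.

def kv_cache_gb(tokens: int, layers: int = 88,
                kv_heads: int = 8, head_dim: int = 128,
                bytes_per_val: int = 2) -> float:
    # factor of 2 = one key and one value per layer, per KV head
    return 2 * layers * kv_heads * head_dim * bytes_per_val * tokens / 1e9

print(f"{kv_cache_gb(128_000):.1f} GB at 128K context")  # ~46.1 GB
print(f"{kv_cache_gb(2_048):.1f} GB at 2K context")      # ~0.7 GB
```

This is why the settings below cap the context at 2048: the cache shrinks from tens of gigabytes to under one.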

Recommendation

To run Mistral Large 2 on the Jetson AGX Orin 64GB, aggressive quantization is essential. Explore using 4-bit or even 3-bit quantization techniques to significantly reduce the VRAM footprint. Frameworks like `llama.cpp` are well-suited for this purpose, offering various quantization methods and CPU offloading capabilities. However, expect a noticeable reduction in model accuracy compared to the FP16 version.
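A quick size check shows why 3-bit variants are suggested rather than 4-bit alone. The bits-per-weight averages below (~4.8 for Q4_K_M, ~3.5 for Q3_K_S) are typical figures for llama.cpp's mixed-precision quant formats, assumed here as ballpark values:

```python
# Quantized weight footprint vs. the 64GB unified-memory budget.
# Bits-per-weight averages are approximate assumptions.

def quantized_gb(params: float, bits_per_weight: float) -> float:
    return params * bits_per_weight / 8 / 1e9

for name, bpw in [("FP16", 16.0), ("Q4_K_M", 4.8), ("Q3_K_S", 3.5)]:
    size = quantized_gb(123e9, bpw)
    verdict = "fits" if size < 64 else "does not fit"
    print(f"{name}: ~{size:.0f} GB -> {verdict} in 64GB")
```

Even Q4_K_M (~74GB) exceeds the 64GB pool; Q3_K_S (~54GB) fits, but leaves only ~10GB for the KV cache, runtime buffers, and the OS.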

Alternatively, consider using a smaller, fine-tuned model that is more manageable for the Jetson AGX Orin's resources. Another approach would be to use a cloud-based inference service where the model is hosted remotely, and the Jetson AGX Orin acts as a client for sending requests and receiving responses. This offloads the computational burden but requires a stable internet connection.

Recommended Settings

| Setting | Value |
| --- | --- |
| Batch Size | 1 |
| Context Length | 2048 or lower |
| Inference Framework | `llama.cpp` |
| Suggested Quantization | Q4_K_M or lower (e.g., Q3_K_S) |
| Other Settings | Enable CPU offloading; reduce the number of layers processed on the GPU; use a smaller, quantized embedding model |
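As a starting point, the settings above map onto a `llama.cpp` command line roughly like the following. The GGUF filename is a placeholder and the `-ngl` layer count is an assumption that must be tuned empirically on the Orin; `-m`, `-c`, `-b`, `-ngl`, and `-p` are standard `llama-cli` flags:

```shell
# Hypothetical invocation; model path and -ngl value are placeholders.
# -c caps the context length, -b the batch size, and -ngl controls
# how many transformer layers are placed on the GPU.
./llama-cli -m mistral-large-2-q3_k_s.gguf -c 2048 -b 1 -ngl 60 -p "Hello"
```

Lowering `-ngl` shifts layers to the CPU side if the GPU allocation fails; on the Orin's unified memory this mainly trades compute placement rather than freeing a separate memory pool.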

Frequently Asked Questions

Is Mistral Large 2 compatible with NVIDIA Jetson AGX Orin 64GB?
No, Mistral Large 2 is not directly compatible with the NVIDIA Jetson AGX Orin 64GB due to insufficient VRAM.
What VRAM is needed for Mistral Large 2?
Mistral Large 2 requires approximately 246GB of VRAM in FP16 precision.
How fast will Mistral Large 2 run on NVIDIA Jetson AGX Orin 64GB?
Without significant quantization and optimization, Mistral Large 2 will not run on the NVIDIA Jetson AGX Orin 64GB. Even with aggressive quantization, expect very slow inference speeds, likely less than 1 token/second.