The primary limiting factor for running Mistral Large 2 on the NVIDIA Jetson AGX Orin 64GB is memory capacity. Mistral Large 2, with its 123 billion parameters, requires approximately 246GB for the weights alone at FP16 precision (2 bytes per parameter). The Jetson AGX Orin 64GB provides 64GB of unified LPDDR5 memory shared between the CPU and GPU, which serves as its effective VRAM. That leaves a shortfall of roughly 182GB, so the model in its full FP16 form cannot be loaded onto the device at all. The Ampere-architecture GPU, while capable, is ultimately constrained by this memory capacity for a model of this size. Memory bandwidth, at 0.21 TB/s, would also become a bottleneck even if sufficient memory were available, capping the achievable tokens/second generation rate.
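As a rough back-of-the-envelope check (weights only, ignoring KV cache, activations, and runtime overhead), the footprint at different precisions is simply parameters times bytes per parameter:

```python
# Rough weight-memory estimate for Mistral Large 2 (123B parameters).
# Weights only -- KV cache, activations, and runtime overhead are extra.
PARAMS = 123e9

def weight_gb(bits_per_param: float) -> float:
    """Approximate weight footprint in GB for a given precision."""
    return PARAMS * bits_per_param / 8 / 1e9

for label, bits in [("FP16", 16), ("INT8", 8), ("4-bit", 4), ("3-bit", 3)]:
    print(f"{label:>5}: ~{weight_gb(bits):.0f} GB")

# FP16: ~246 GB, INT8: ~123 GB, 4-bit: ~62 GB, 3-bit: ~46 GB
# Only the 3-bit (and marginally the 4-bit) variants approach the Orin's
# 64 GB, and that is before any KV cache or runtime overhead.
```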
Because the Orin's 64GB is a single unified pool shared by the CPU and GPU, offloading layers to "system RAM" does not add capacity the way it does on a discrete-GPU workstation; anything beyond 64GB would have to be streamed from NVMe storage, and performance would be severely degraded by those transfer speeds. The 2048 CUDA cores and 64 Tensor cores would sit largely idle while the system swaps weights in and out. The large 128,000-token context length compounds the memory pressure, since the KV cache required by the attention mechanism grows linearly with context length. Running Mistral Large 2 in its original FP16 form on the Jetson AGX Orin 64GB is therefore not feasible.
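To see how the long context adds up, the KV-cache size can be estimated with the standard formula 2 × layers × kv_heads × head_dim × context_length × bytes_per_element. The layer and head counts below are illustrative assumptions, not the confirmed Mistral Large 2 configuration, so treat the result as order-of-magnitude only:

```python
# Rough KV-cache size at full context. The architecture numbers here are
# illustrative assumptions, not the confirmed Mistral Large 2 config.
N_LAYERS = 88        # assumed transformer layer count
N_KV_HEADS = 8       # assumed KV heads (grouped-query attention)
HEAD_DIM = 128       # assumed per-head dimension
CONTEXT = 128_000    # tokens
BYTES = 2            # FP16 cache entries

# Factor of 2 for keys and values, per token, per layer.
kv_cache_bytes = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * CONTEXT * BYTES
print(f"KV cache at 128k context: ~{kv_cache_bytes / 1e9:.0f} GB")
# ~46 GB on its own -- a large slice of the Orin's 64 GB unified memory
# before a single weight is loaded.
```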
To run Mistral Large 2 on the Jetson AGX Orin 64GB at all, aggressive quantization is essential. At 4-bit precision the weights alone are roughly 62GB, which barely fits in the 64GB of unified memory before the KV cache, runtime, and operating system take their share, so 3-bit or mixed-precision quantization is more realistic. Frameworks like `llama.cpp` are well suited for this, offering a range of GGUF quantization formats and fine-grained control over how many layers run on the GPU. However, expect a noticeable reduction in output quality compared to the FP16 model, particularly at 3-bit and below.
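As a sketch of what this looks like in practice, the snippet below uses the `llama-cpp-python` bindings to load a pre-quantized GGUF file. The file name is a placeholder, and whether a given quantization level actually fits must be verified on the device:

```python
# Sketch: loading a heavily quantized GGUF build of the model with the
# llama-cpp-python bindings (built with CUDA support for the Orin's GPU).
# The model path is a placeholder for a hypothetical 3-bit GGUF file.
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-large-2-q3_k_m.gguf",  # hypothetical quantized file
    n_gpu_layers=-1,  # run all layers on the GPU (unified memory)
    n_ctx=8192,       # keep context well below 128k to limit KV-cache size
)

out = llm(
    "Summarize the trade-offs of 3-bit quantization in two sentences.",
    max_tokens=128,
)
print(out["choices"][0]["text"])
```

Reducing `n_ctx` from the model's full 128,000-token window is one of the most effective levers here, since the KV cache scales linearly with context length.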
Alternatively, consider a smaller model from the same family, fine-tuned for the target task, that fits comfortably within the Jetson AGX Orin's resources. Another approach is a cloud-based inference service: the model is hosted remotely and the Jetson AGX Orin acts as a client that sends requests and receives responses. This removes the local memory and compute burden entirely, but requires a stable internet connection and adds per-request latency and cost.
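A minimal client sketch for that pattern, assuming the remote service exposes an OpenAI-style chat-completions endpoint (the URL, model identifier, and API-key variable are placeholders, not a specific provider's values):

```python
# Minimal sketch of the Orin acting as a thin client to a remote inference
# service. Assumes an OpenAI-compatible chat-completions endpoint; the URL,
# model name, and API key are placeholders.
import os
import requests

API_URL = "https://inference.example.com/v1/chat/completions"  # placeholder
API_KEY = os.environ.get("INFERENCE_API_KEY", "")

def remote_chat(prompt: str, timeout_s: float = 30.0) -> str:
    """Send one prompt to the hosted model and return the reply text."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "mistral-large-2",  # placeholder model identifier
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 256,
        },
        timeout=timeout_s,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(remote_chat("Hello from the Jetson AGX Orin."))
```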