The NVIDIA Jetson AGX Orin 64GB, with its 64GB of unified LPDDR5 memory shared between the CPU and GPU, is well-suited for running the LLaVA 1.6 13B model. LLaVA 1.6 13B requires approximately 26GB of memory at FP16 precision (13B parameters at 2 bytes each, plus the vision encoder). On paper that leaves about 38GB of headroom; because the OS and CPU-side processes draw from the same unified pool the usable margin is somewhat smaller, but the model and its associated data structures still fit comfortably. The Orin's 2048 CUDA cores and 64 Tensor Cores handle the matrix multiplications that dominate the model's workload, while its 204.8 GB/s memory bandwidth governs data movement between memory and the compute units. The headroom also accommodates larger batch sizes and longer context lengths without immediately hitting memory limits, though memory bandwidth is likely to become the bottleneck as batch size grows, since autoregressive decoding is bandwidth-bound.
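As a sanity check on this arithmetic, here is a minimal Python sketch of the memory budget. The layer count, hidden size, context length, and OS reserve are assumptions (a LLaMA-2-13B language backbone with FP16 weights and an FP16 KV cache), not figures from the text; adjust them to match your checkpoint and workload. With no OS reserve and a 4K context the estimate lands near the batch size of 14 cited below; with a more conservative reserve it comes out closer to 10.

```python
# Back-of-the-envelope memory math for LLaVA 1.6 13B on a 64GB Jetson AGX Orin.
# Assumed architecture: LLaMA-2-13B backbone (40 layers, hidden size 5120),
# FP16 weights and FP16 KV cache. Tweak these if your checkpoint differs.

GiB = 1024**3

params = 13e9                       # language-model parameters
bytes_per_param = 2                 # FP16
weights = params * bytes_per_param  # ~26 GB of weights

n_layers = 40
hidden = 5120
ctx_len = 4096                      # assumed context length per sequence

# KV cache per token: one key and one value vector per layer, FP16.
kv_per_token = 2 * n_layers * hidden * bytes_per_param  # ~0.8 MiB
kv_per_seq = kv_per_token * ctx_len                     # ~3.1 GiB

total_mem = 64 * GiB
reserved = 8 * GiB                  # assumed OS/CPU share of unified memory
headroom = total_mem - reserved - weights

print(f"weights:    {weights / GiB:.1f} GiB")
print(f"KV per seq: {kv_per_seq / GiB:.2f} GiB")
print(f"max batch:  {int(headroom // kv_per_seq)} sequences at ctx={ctx_len}")
```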
Given the substantial memory headroom, experiment with larger batch sizes (up to the estimated 14) to maximize throughput. Start with FP16 precision as a baseline, then consider quantization (e.g., Q4_K_M) to shrink the memory footprint; on bandwidth-bound hardware like the Orin this often improves decode speed as well, at some cost in accuracy. Monitor memory usage and token generation speed throughout. A framework such as `llama.cpp`, built with CUDA support enabled (the `GGML_CUDA` CMake option) so it targets the Orin's GPU, is a good fit here. If you hit performance bottlenecks, profile the image preprocessing and vision-encoder stages of the LLaVA pipeline as well as text decoding.
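One way to watch memory during these experiments is to tail `tegrastats`, NVIDIA's stock Jetson telemetry tool, while inference runs in another process. The sketch below is a minimal example under that assumption; the `RAM used/total` line format and the `--interval` flag match common JetPack/L4T releases but can vary (and some fields require `sudo`), so adjust the regex if it fails to match on your system. Token throughput itself is easiest to read from `llama.cpp`'s own end-of-run timing report.

```python
# Minimal unified-memory monitor for the Jetson AGX Orin: samples tegrastats
# output and prints the RAM used/total figure once per interval.

import re
import subprocess

# Matches e.g. "RAM 31250/62780MB" in a tegrastats line; format may vary
# across JetPack releases.
RAM_RE = re.compile(r"RAM (\d+)/(\d+)MB")

def watch_memory(interval_ms: int = 1000) -> None:
    """Print unified-memory usage once per interval until interrupted."""
    proc = subprocess.Popen(
        ["tegrastats", "--interval", str(interval_ms)],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        for line in proc.stdout:
            m = RAM_RE.search(line)
            if m:
                used, total = map(int, m.groups())
                print(f"RAM: {used} / {total} MB ({100 * used / total:.1f}%)")
    except KeyboardInterrupt:
        pass
    finally:
        proc.terminate()

if __name__ == "__main__":
    watch_memory()
```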