The NVIDIA Jetson AGX Orin 64GB, with its Ampere-architecture GPU, 2048 CUDA cores, and 64GB of LPDDR5 memory, is well-suited to running the BGE-M3 embedding model. Note that the Orin has no dedicated VRAM: its memory is unified, shared between the CPU and GPU. BGE-M3 is a relatively small model at roughly 0.5B parameters, requiring only about 1.0GB in FP16 precision, which leaves on the order of 63GB of headroom (minus whatever the OS and other processes consume). This ample memory allows large batch sizes and leaves room to load multiple model instances or run other AI workloads concurrently. The Orin's roughly 0.21 TB/s (204.8 GB/s) memory bandwidth, while not class-leading, is sufficient for BGE-M3's memory access patterns, so bandwidth is unlikely to become a bottleneck.
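The footprint figure above is simple back-of-envelope arithmetic: parameter count times bytes per parameter. A minimal sketch (weights only; activations, tokenizer buffers, and framework overhead add more, so treat it as a lower bound):

```python
def fp16_weight_footprint_gb(params_billions: float) -> float:
    """Approximate FP16 weight memory: 2 bytes per parameter."""
    bytes_per_param = 2
    return params_billions * 1e9 * bytes_per_param / 1e9

weights_gb = fp16_weight_footprint_gb(0.5)  # ~1.0 GB for BGE-M3's ~0.5B params
headroom_gb = 64 - weights_gb               # ~63 GB of the Orin's unified memory
print(f"weights: {weights_gb:.1f} GB, headroom: {headroom_gb:.1f} GB")
```

The same function also shows why quantization helps on smaller Jetson modules: swapping `bytes_per_param` to 1 (INT8) halves the estimate.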
Given the Jetson AGX Orin's abundant memory and compute, prioritize throughput by experimenting with larger batch sizes: start at 32 and increase incrementally until throughput plateaus or you hit memory limits. Consider TensorRT for optimized inference, which can significantly improve performance on NVIDIA hardware. And because BGE-M3 is small, explore running multiple instances in parallel to improve overall system utilization.
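The batch-size sweep described above can be sketched as a small harness. Here `encode(batch)` is a placeholder for your actual embedding call (for example, FlagEmbedding's `BGEM3FlagModel.encode`, batched however your serving stack batches); the harness itself is framework-agnostic, and treating `RuntimeError` as out-of-memory is an assumption that matches PyTorch's CUDA OOM behavior:

```python
import time

def find_best_batch_size(encode, sentences, candidates=(32, 64, 128, 256)):
    """Sweep candidate batch sizes, return (best_size, sentences_per_sec).

    Stops growing the batch when encode() raises RuntimeError,
    which is how a CUDA out-of-memory error typically surfaces.
    """
    best_bs, best_tput = None, 0.0
    for bs in candidates:
        try:
            start = time.perf_counter()
            for i in range(0, len(sentences), bs):
                encode(sentences[i:i + bs])
            tput = len(sentences) / (time.perf_counter() - start)
        except RuntimeError:  # out of memory: larger sizes won't fit either
            break
        if tput > best_tput:
            best_bs, best_tput = bs, tput
    return best_bs, best_tput
```

In practice you would run this once at startup against a representative sample of your corpus, then pin the winning batch size for the serving loop.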