Can I run BGE-M3 on NVIDIA Jetson AGX Orin 64GB?

Perfect
Yes, you can run this model!
GPU VRAM
64.0GB
Required
1.0GB
Headroom
+63.0GB

VRAM Usage

2% used (1.0GB of 64.0GB)

Performance Estimate

Tokens/sec ~90.0
Batch size 32

Technical Analysis

The NVIDIA Jetson AGX Orin 64GB, with its Ampere-architecture GPU, 2048 CUDA cores, and 64GB of LPDDR5 memory, is well suited to running the BGE-M3 embedding model. Note that the Orin's memory is unified: the CPU and GPU share the same 64GB, so the "VRAM" figures on this page describe that shared pool rather than dedicated graphics memory. BGE-M3 is a relatively small model at roughly 0.5B parameters and needs about 1.0GB in FP16 precision, leaving about 63.0GB of headroom. That headroom allows large batch sizes and makes it possible to load multiple model instances or run other AI workloads concurrently. The Orin's memory bandwidth of roughly 0.2 TB/s (204.8 GB/s) is modest compared with discrete GPUs, but it should be sufficient for a model of this size and is unlikely to be the main bottleneck.
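The memory figures above can be reproduced with simple arithmetic. The sketch below assumes the 0.5B parameter count quoted on this page and 2 bytes per parameter for FP16. It counts weights only, so activations, tokenizer buffers, and runtime overhead are not included. The function name is illustrative.

```python
def fp16_weight_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Weight-only memory footprint in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


params = 0.5e9    # parameter count as quoted on this page
total_gb = 64.0   # memory on the Jetson AGX Orin 64GB (shared with the CPU)

required = fp16_weight_gb(params)
headroom = total_gb - required

print(f"required ~ {required:.1f} GB, headroom ~ {headroom:.1f} GB, "
      f"used ~ {required / total_gb:.0%}")
# prints: required ~ 1.0 GB, headroom ~ 63.0 GB, used ~ 2%
```

In practice the usable headroom will be smaller, because the operating system and other processes draw from the same memory pool.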

Recommendation

With this much memory headroom on the Jetson AGX Orin, prioritize throughput by tuning the batch size. Start at a batch size of 32 and increase it step by step until throughput stops improving or you run out of memory. Consider using TensorRT for optimized inference, which can improve the model's performance on NVIDIA hardware. Because BGE-M3 is small, you can also run multiple instances in parallel to improve overall system utilization.
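One way to run the batch-size experiment described above is a doubling sweep that stops at diminishing returns or at an out-of-memory error. This is a minimal sketch, not a tuned benchmark: `sweep_batch_sizes`, `fake_encode`, the 5% gain threshold, and the use of `MemoryError` as the out-of-memory signal are all assumptions for illustration. A real framework raises its own exception type, which you would pass in through `oom_errors`.

```python
import time
from typing import Callable, Dict, Sequence, Tuple


def sweep_batch_sizes(
    encode: Callable[[Sequence[str]], object],
    texts: Sequence[str],
    start: int = 32,
    max_batch: int = 1024,
    min_gain: float = 0.05,
    oom_errors: tuple = (MemoryError,),
) -> Tuple[int, Dict[int, float]]:
    """Double the batch size until throughput gains fall below min_gain or memory runs out."""
    results: Dict[int, float] = {}   # batch size -> texts per second
    best, best_tput = start, 0.0
    batch = start
    while batch <= max_batch:
        t0 = time.perf_counter()
        try:
            for i in range(0, len(texts), batch):
                encode(texts[i:i + batch])
        except oom_errors:
            break  # memory limit reached; keep the best size seen so far
        elapsed = time.perf_counter() - t0
        tput = len(texts) / max(elapsed, 1e-9)
        results[batch] = tput
        if best_tput and tput < best_tput * (1 + min_gain):
            break  # diminishing returns
        best, best_tput = batch, tput
        batch *= 2
    return best, results


# Stand-in encoder so the sketch runs anywhere; replace with a real model call.
def fake_encode(batch: Sequence[str]) -> list:
    if len(batch) > 128:
        raise MemoryError("simulated out-of-memory")
    return [[0.0] * 4 for _ in batch]


texts = ["example sentence"] * 512
best, results = sweep_batch_sizes(fake_encode, texts)
print("best batch size:", best)
print("measured sizes:", sorted(results))
```

For a real measurement, use texts of representative length and discard a warm-up run, since the first batches often include one-time initialization cost.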

Recommended Settings

Batch size
32 (start, then increase)
Context length
8192
Other settings
Enable CUDA graph capture for reduced latency; experiment with different thread configurations for optimal throughput; monitor GPU utilization to identify potential bottlenecks
Inference framework
TensorRT or ONNX Runtime
Suggested quantization
FP16 (default)

Frequently Asked Questions

Is BGE-M3 compatible with NVIDIA Jetson AGX Orin 64GB?
Yes, BGE-M3 is fully compatible and expected to perform well on the NVIDIA Jetson AGX Orin 64GB.
What VRAM is needed for BGE-M3?
BGE-M3 requires approximately 1.0GB of VRAM when running in FP16 precision.
How fast will BGE-M3 run on NVIDIA Jetson AGX Orin 64GB?
You can expect approximately 90 tokens per second on the NVIDIA Jetson AGX Orin 64GB.
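
To turn the throughput estimate into a rough planning number, the sketch below computes how long a corpus would take to embed at the ~90 tokens per second quoted above. The corpus size and average document length are hypothetical values chosen only for illustration, and the function name is an assumption.

```python
def embed_time_seconds(num_docs: int, avg_tokens_per_doc: int,
                       tokens_per_sec: float = 90.0) -> float:
    """Estimated wall-clock time to embed a corpus at a fixed token throughput."""
    return num_docs * avg_tokens_per_doc / tokens_per_sec


# Hypothetical corpus: 1,000 documents averaging 512 tokens each.
secs = embed_time_seconds(1_000, 512)
print(f"{secs:.0f} s (~{secs / 3600:.2f} hours)")
# prints: 5689 s (~1.58 hours)
```

Treat the result as an order-of-magnitude guide; measured throughput depends on batch size, sequence length, and the inference framework.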