Can I run BGE-Large-EN on NVIDIA Jetson AGX Orin 64GB?

Perfect
Yes, you can run this model!
GPU VRAM
64.0GB
Required
0.7GB
Headroom
+63.3GB

VRAM Usage

About 1% used (0.7GB of 64.0GB)
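The figures above are consistent with a simple weights-only estimate. The sketch below reproduces that arithmetic under one loud assumption: a parameter count of roughly 335 million for BGE-Large-EN, which this page does not state. It ignores activations, tokenizer buffers, and framework overhead, so real usage will be somewhat higher.

```python
# ASSUMPTION: approximate parameter count; not stated on this page.
ASSUMED_PARAMS = 335_000_000

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: int, precision: str) -> float:
    """Weights-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def headroom_gb(total_gb: float, required_gb: float) -> float:
    """Memory left over after the model is loaded."""
    return total_gb - required_gb


def percent_used(total_gb: float, required_gb: float) -> float:
    """Share of total memory taken by the model, in percent."""
    return 100.0 * required_gb / total_gb


if __name__ == "__main__":
    required = round(weight_memory_gb(ASSUMED_PARAMS, "fp16"), 1)
    print("required GB:", required)
    print("headroom GB:", round(headroom_gb(64.0, required), 1))
    print("percent used:", round(percent_used(64.0, required)))
```

The same function gives a rough idea of what INT8 quantization would save on weights alone, since each parameter then takes one byte instead of two.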

Performance Estimate

Tokens/sec ~90.0
Batch size 32
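To put the throughput estimate in practical terms, the sketch below converts tokens per second into wall-clock time for a workload. The workload (1,000 passages of 512 tokens each) is a hypothetical example, and the 90 tokens/sec figure is only this page's estimate, so the result is a rough planning number rather than a measurement.

```python
TOKENS_PER_SEC = 90.0  # estimate from this page, not a measured value


def seconds_to_embed(
    num_texts: int,
    tokens_per_text: int,
    tokens_per_sec: float = TOKENS_PER_SEC,
) -> float:
    """Estimated time to embed a workload at a fixed token throughput."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return num_texts * tokens_per_text / tokens_per_sec


if __name__ == "__main__":
    # Hypothetical workload: 1,000 passages at the full 512-token context.
    secs = seconds_to_embed(1_000, 512)
    print(f"{secs:.0f} s, or about {secs / 60:.0f} minutes")
```

At the estimated rate, that hypothetical workload would take roughly 95 minutes, which is why the throughput figure is worth verifying on the actual device before planning a large indexing job.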

Technical Analysis

The Orin's 60W maximum power budget is low compared with desktop GPUs, but thermal management still matters under sustained GPU load. Treat the estimate of roughly 90 tokens/sec as a starting point: actual throughput depends on the inference framework, the quantization level, and the batch size. With 63.3GB of headroom, there is room to try larger batch sizes for higher throughput, at the cost of higher per-request latency. Keep in mind that the Orin's memory is shared between the CPU and GPU, so not all of that headroom is available to the model in practice. Quantizing to INT8 or lower precision may improve performance further, but the effect on embedding quality should be evaluated before deploying it.
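One way to explore the batch-size trade-off described above is a small timing sweep. The following is a minimal sketch, not a benchmark of BGE-Large-EN itself: `sweep_batch_sizes` and `fake_encode` are hypothetical names, and `fake_encode` is a stand-in that should be replaced with the real embedding call from whichever framework is in use.

```python
import time
from typing import Callable, Dict, List, Sequence


def sweep_batch_sizes(
    encode: Callable[[Sequence[str]], object],
    texts: List[str],
    batch_sizes: Sequence[int] = (32, 64, 128, 256),
) -> Dict[int, Dict[str, float]]:
    """Time `encode` at each batch size; report throughput and batch latency."""
    if not texts:
        raise ValueError("texts must not be empty")
    results: Dict[int, Dict[str, float]] = {}
    for size in batch_sizes:
        latencies = []
        for start in range(0, len(texts), size):
            batch = texts[start:start + size]
            t0 = time.perf_counter()
            encode(batch)
            latencies.append(time.perf_counter() - t0)
        total = sum(latencies)
        results[size] = {
            # Higher is better for throughput-oriented (offline) workloads.
            "texts_per_sec": len(texts) / total if total > 0 else float("inf"),
            # Lower is better for latency-sensitive (online) workloads.
            "mean_batch_latency_s": total / len(latencies),
        }
    return results


def fake_encode(batch: Sequence[str]) -> List[List[float]]:
    # Stand-in for a real embedding call; sleeps briefly and returns dummy vectors.
    time.sleep(0.001)
    return [[0.0] * 4 for _ in batch]


if __name__ == "__main__":
    report = sweep_batch_sizes(fake_encode, ["example passage"] * 256)
    for size, stats in report.items():
        print(size, stats)
```

Reporting both numbers side by side makes the trade-off visible: a larger batch may raise texts per second while also raising the time any single request waits.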

Recommendation

If you encounter performance bottlenecks, profile the application to identify the specific areas consuming the most resources. Optimize data loading and preprocessing pipelines to minimize CPU overhead. Ensure that the Jetson AGX Orin is adequately cooled to prevent thermal throttling, which can significantly impact performance. For production deployments, implement robust monitoring and alerting to proactively identify and address any performance issues.
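As a starting point for the profiling step, the sketch below accumulates wall-clock time per pipeline stage so that CPU-side preprocessing can be compared against the model call. `StageTimer` is a hypothetical helper written for this example, and the `time.sleep` calls are placeholders for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = defaultdict(float)

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        t0 = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raises.
            self.totals[name] += time.perf_counter() - t0

    def shares(self) -> Dict[str, float]:
        """Fraction of total measured time spent in each stage."""
        total = sum(self.totals.values())
        return {k: (v / total if total else 0.0) for k, v in self.totals.items()}


if __name__ == "__main__":
    timer = StageTimer()
    for _ in range(5):
        with timer.stage("preprocess"):
            time.sleep(0.002)  # placeholder for loading and tokenizing text
        with timer.stage("inference"):
            time.sleep(0.004)  # placeholder for the embedding model call
    for name, share in timer.shares().items():
        print(f"{name}: {share:.0%}")
```

If the preprocessing share turns out to be large, the data pipeline is the place to optimize first. With GPU frameworks that launch work asynchronously, the inference stage may need an explicit device synchronization before the timer stops for the numbers to be meaningful.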

Recommended Settings

Batch size
32 to start; experiment with larger sizes
Context length
512
Other settings
Optimize data loading pipelines; ensure adequate cooling; profile the application for bottlenecks
Inference framework
ONNX Runtime, TensorRT
Suggested quantization
INT8
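The batch size and context length above can be applied on the input side before any framework is involved. This is a minimal sketch that assumes the texts have already been tokenized into lists of integer ids; `truncate` and `make_batches` are hypothetical helpers, and plain slicing is a simplification, since real tokenizers usually handle truncation themselves and reserve room for special tokens.

```python
from typing import Iterator, List

BATCH_SIZE = 32       # starting point from the settings above
CONTEXT_LENGTH = 512  # maximum tokens per input from the settings above


def truncate(token_ids: List[int], max_len: int = CONTEXT_LENGTH) -> List[int]:
    """Keep at most `max_len` token ids from the start of the sequence."""
    return token_ids[:max_len]


def make_batches(
    sequences: List[List[int]],
    batch_size: int = BATCH_SIZE,
    max_len: int = CONTEXT_LENGTH,
) -> Iterator[List[List[int]]]:
    """Yield consecutive batches of truncated sequences."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(sequences), batch_size):
        chunk = sequences[start:start + batch_size]
        yield [truncate(seq, max_len) for seq in chunk]


if __name__ == "__main__":
    # 70 dummy sequences of 600 token ids each.
    data = [list(range(600)) for _ in range(70)]
    for batch in make_batches(data):
        print(len(batch), "sequences, longest:", max(len(s) for s in batch))
```

Keeping both values as named constants makes it easy to raise the batch size later while leaving the context limit fixed.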

Frequently Asked Questions

Is BGE-Large-EN compatible with NVIDIA Jetson AGX Orin 64GB?
Yes, BGE-Large-EN is fully compatible and performs well on the NVIDIA Jetson AGX Orin 64GB.
What VRAM is needed for BGE-Large-EN?
BGE-Large-EN requires approximately 0.7GB of VRAM when using FP16 precision.
How fast will BGE-Large-EN run on NVIDIA Jetson AGX Orin 64GB?
You can expect around 90 tokens per second. Performance can be improved with quantization and optimization techniques.