Can I run BGE-Small-EN on NVIDIA Jetson AGX Orin 32GB?

Verdict: Perfect fit. Yes, you can run this model!

GPU VRAM: 32.0GB
Required: 0.1GB
Headroom: +31.9GB

VRAM usage: 0.1GB of 32.0GB (under 1%)

Performance Estimate

Tokens/sec: ~90.0
Batch size: 32

Technical Analysis

The NVIDIA Jetson AGX Orin 32GB pairs an Ampere-architecture GPU (1792 CUDA cores, 56 Tensor Cores) with 32GB of unified LPDDR5 memory shared between the CPU and GPU, which is ample for the BGE-Small-EN embedding model. With roughly 33 million (0.03B) parameters, BGE-Small-EN needs only about 0.1GB of memory in FP16 precision and fits comfortably within the Orin's capacity. The module's 204.8 GB/s memory bandwidth easily covers the model's data-transfer demands, and the Tensor Cores accelerate the matrix multiplications that dominate transformer inference, keeping latency low.
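As a concrete starting point, here is a minimal sketch of running the model with the sentence-transformers package, assuming the public BAAI/bge-small-en checkpoint on Hugging Face (the package choice and test sentences are illustrative assumptions, not details from the analysis above):

```python
from sentence_transformers import SentenceTransformer

# Load BGE-Small-EN onto the Orin's GPU in FP16.
model = SentenceTransformer("BAAI/bge-small-en", device="cuda")
model.half()  # FP16: ~33M params x 2 bytes ≈ 66MB of weights

sentences = [
    "Edge inference on the Jetson AGX Orin",
    "BGE-Small-EN embedding smoke test",
]
embeddings = model.encode(sentences, normalize_embeddings=True)
print(embeddings.shape)  # (2, 384): BGE-Small-EN emits 384-dim vectors
```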

Given the substantial headroom (about 31.9GB), the Orin can accommodate much larger batch sizes or run multiple BGE-Small-EN instances concurrently; keep in mind that the memory is unified, so the GPU shares this pool with the CPU and operating system. The Ampere architecture's efficiency gains over earlier Jetson generations help as well. With an estimated throughput of ~90 tokens/sec, the Orin is a responsive, practical platform for embedding tasks, and its configurable 15-40W power envelope suits edge deployments where power consumption is a concern.
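To see how far that headroom stretches in practice, a small batch-size sweep like the following can measure throughput directly (the corpus and batch values are illustrative assumptions):

```python
import time

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en", device="cuda").half()

# Illustrative corpus; with ~31.9GB of headroom, batch size can grow
# well beyond the default before memory becomes a concern.
docs = [f"sample document {i}" for i in range(4096)]
for bs in (32, 128, 512):
    start = time.perf_counter()
    model.encode(docs, batch_size=bs)
    rate = len(docs) / (time.perf_counter() - start)
    print(f"batch_size={bs}: {rate:.0f} docs/sec")
```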

Recommendation

For optimal performance, begin with a batch size of 32 and a context length of 512 tokens; both are well within the Jetson AGX Orin's capabilities. Export the model to ONNX and run it through ONNX Runtime or TensorRT to further optimize for the Orin's architecture. Consider quantizing to INT8 or even INT4 to shrink the memory footprint and potentially increase inference speed, though this may come at a slight cost to accuracy.
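One way to follow the ONNX suggestion is Hugging Face Optimum's ONNX Runtime integration; this is a sketch assuming optimum and onnxruntime-gpu are installed, with the TensorrtExecutionProvider string substitutable where TensorRT is set up under JetPack:

```python
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-small-en")
# export=True converts the PyTorch checkpoint to ONNX on the fly;
# swap in "TensorrtExecutionProvider" where TensorRT is available.
model = ORTModelForFeatureExtraction.from_pretrained(
    "BAAI/bge-small-en", export=True, provider="CUDAExecutionProvider"
)

inputs = tokenizer("hello edge inference", return_tensors="pt")
outputs = model(**inputs)
embedding = outputs.last_hidden_state[:, 0]  # CLS-token pooling
```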

Monitor VRAM usage and inference latency during initial testing to fine-tune the batch size and other parameters for your specific application. If you encounter performance bottlenecks, profile the application to identify areas for further optimization, such as kernel fusion or memory access patterns. If higher throughput is required, explore parallelizing inference across multiple Orin devices if your use case allows.
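For the monitoring step, PyTorch's CUDA memory hooks give a quick, if approximate, picture. A minimal sketch, with an illustrative probe batch at the recommended starting size:

```python
import time

import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en", device="cuda").half()
batch = ["probe sentence for tuning"] * 32  # recommended starting batch

torch.cuda.reset_peak_memory_stats()
start = time.perf_counter()
model.encode(batch)
torch.cuda.synchronize()

latency_ms = (time.perf_counter() - start) * 1000
peak_gb = torch.cuda.max_memory_allocated() / 1e9
print(f"latency: {latency_ms:.1f} ms, peak allocation: {peak_gb:.2f} GB")
```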

Recommended Settings

Batch size: 32
Context length: 512
Inference framework: ONNX Runtime or TensorRT
Suggested quantization: INT8 or INT4 (see the sketch below)
Other settings: optimize with TensorRT; profile for bottlenecks; consider quantization-aware training
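For the suggested INT8 quantization, ONNX Runtime ships a dynamic quantization utility; a sketch, with illustrative file paths pointing at the ONNX export from the earlier step:

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Paths are illustrative; point model_input at the ONNX file produced
# by the export step above.
quantize_dynamic(
    model_input="bge-small-en/model.onnx",
    model_output="bge-small-en/model.int8.onnx",
    weight_type=QuantType.QInt8,  # INT8 weights for a smaller footprint
)
```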

Frequently Asked Questions

Is BGE-Small-EN compatible with NVIDIA Jetson AGX Orin 32GB?
Yes, BGE-Small-EN is fully compatible with the NVIDIA Jetson AGX Orin 32GB.
What VRAM is needed for BGE-Small-EN?
BGE-Small-EN requires approximately 0.1GB of VRAM in FP16 precision: its roughly 33 million parameters occupy about 66MB at 2 bytes each, with the remainder going to activations and runtime overhead.
How fast will BGE-Small-EN run on NVIDIA Jetson AGX Orin 32GB?
You can expect an estimated throughput of around 90 tokens per second on the NVIDIA Jetson AGX Orin 32GB.