Can I run BGE-Small-EN on NVIDIA Jetson AGX Orin 64GB?

Perfect
Yes, you can run this model!

GPU VRAM: 64.0GB
Required: 0.1GB
Headroom: +63.9GB

VRAM Usage
0.1GB of 64.0GB used (under 1%)

Performance Estimate

Tokens/sec: ~90.0
Batch size: 32

Technical Analysis

VRAM is not a concern here; throughput is the main lever for performance. The AGX Orin's 60W power budget makes efficiency a priority, so the choice of inference framework and quantization level matters. The estimated 90 tokens/sec is a starting point that optimization should improve, and the estimated batch size of 32 is reasonable for keeping the GPU's compute units busy. Keep in mind that BGE-Small-EN is an embedding model, so the 'tokens/sec' metric doesn't describe language generation speed; it reflects how quickly input tokens are converted into embeddings.
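To see what that metric means in practice, here is a minimal sketch of how you might measure embedding throughput yourself. It assumes the sentence-transformers package is installed and the BAAI/bge-small-en checkpoint is available; actual numbers will vary with the Orin's power mode and software stack.

```python
import time
from sentence_transformers import SentenceTransformer

# Load BGE-Small-EN onto the Orin's GPU (assumes CUDA-enabled PyTorch).
model = SentenceTransformer("BAAI/bge-small-en", device="cuda")
sentences = ["A short benchmark sentence for the Jetson AGX Orin."] * 512

# Count input tokens with the model's own tokenizer so the result is
# comparable to the tokens/sec estimate above.
token_count = sum(len(ids) for ids in model.tokenizer(sentences)["input_ids"])

start = time.perf_counter()
model.encode(sentences, batch_size=32, convert_to_numpy=True)
elapsed = time.perf_counter() - start

print(f"{token_count / elapsed:.1f} tokens/sec at batch size 32")
```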

Recommendation

Start by using a high-performance inference framework like ONNX Runtime or TensorRT to leverage the Jetson AGX Orin's hardware acceleration capabilities. Since the model is so small, experiment with different batch sizes to find the optimal balance between latency and throughput. A larger batch size will generally increase throughput but also increase latency. Given the ample VRAM headroom, you can likely increase the batch size significantly beyond the initial estimate of 32. Consider quantizing the model to INT8 or even INT4 to further improve performance and reduce memory bandwidth requirements, even though the VRAM usage is already minimal. Finally, profile your application to identify any bottlenecks and optimize accordingly.
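As a concrete starting point for the ONNX Runtime path, the sketch below runs a pre-exported BGE-Small-EN graph through the CUDA execution provider. The file name "bge-small-en.onnx" is a placeholder; you would export the model first (for example with Hugging Face Optimum) before trying this.

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-small-en")
session = ort.InferenceSession(
    "bge-small-en.onnx",  # placeholder path for a model exported ahead of time
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

batch = ["what is the capital of france"] * 32
enc = tokenizer(batch, padding=True, truncation=True, max_length=512,
                return_tensors="np")

# Feed only the inputs the exported graph actually declares.
input_names = {i.name for i in session.get_inputs()}
outputs = session.run(None, {k: v for k, v in enc.items() if k in input_names})

# BGE models use the [CLS] token embedding, L2-normalised, as the sentence vector.
embeddings = outputs[0][:, 0]
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
print(embeddings.shape)  # (32, 384)
```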

Recommended Settings

Batch size: 64-128 (experiment to find the optimum)
Context length: 512
Inference framework: ONNX Runtime or TensorRT
Quantization: INT8 or INT4 (see the sketch after this list)
Other settings: enable CUDA graph capture, use asynchronous data loading, optimize data preprocessing
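A minimal sketch of the INT8 step, assuming the ONNX export from the earlier example; the file names are placeholders. Note that ONNX Runtime's dynamic quantization mainly benefits CPU execution, so for GPU execution on the Orin the usual route is TensorRT with INT8 calibration instead.

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Post-training dynamic INT8 quantization of the exported graph.
quantize_dynamic(
    model_input="bge-small-en.onnx",        # placeholder: the FP32/FP16 export
    model_output="bge-small-en.int8.onnx",  # placeholder: the INT8 output
    weight_type=QuantType.QInt8,
)
```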

Frequently Asked Questions

Is BGE-Small-EN compatible with NVIDIA Jetson AGX Orin 64GB?
Yes, BGE-Small-EN is perfectly compatible with the NVIDIA Jetson AGX Orin 64GB due to the ample VRAM and the Orin's compute capabilities.
What VRAM is needed for BGE-Small-EN?
BGE-Small-EN requires approximately 0.1GB of VRAM when using FP16 precision; the quick estimate below shows where that figure comes from.
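A back-of-the-envelope check, assuming a parameter count of roughly 33 million (approximate, from the model card) and 2 bytes per weight at FP16:

```python
params = 33_000_000          # approximate parameter count for BGE-Small-EN
bytes_per_weight = 2         # FP16
weights_gb = params * bytes_per_weight / 1024**3
print(f"~{weights_gb:.2f} GB for the weights alone")  # ~0.06 GB
# Activations, tokenizer buffers and framework overhead account for the
# rest of the ~0.1GB estimate above.
```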
How fast will BGE-Small-EN run on NVIDIA Jetson AGX Orin 64GB?
You can expect an estimated throughput of around 90 tokens/sec at a batch size of 32, and this can likely be improved with a faster inference framework and quantization; the sketch below shows one way to measure the throughput/latency trade-off yourself.
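A minimal, framework-agnostic batch-size sweep, assuming a hypothetical `embed` callable that wraps whichever runtime you pick (sentence-transformers, ONNX Runtime, or TensorRT):

```python
import time

def sweep(embed, sentences, batch_sizes=(16, 32, 64, 128)):
    """Report sentences/sec for each batch size using any embed(list_of_str) callable."""
    for bs in batch_sizes:
        start = time.perf_counter()
        for i in range(0, len(sentences), bs):
            embed(sentences[i:i + bs])
        elapsed = time.perf_counter() - start
        print(f"batch={bs:4d}  {len(sentences) / elapsed:8.1f} sentences/sec")
```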