Since VRAM isn't a concern here, throughput is the metric to optimize. The Jetson AGX Orin's configurable TDP tops out at 60W, so power efficiency matters: choose an efficient inference framework and an appropriate quantization level. The estimated 90 tokens/sec is a reasonable starting point that optimization should improve on, and the estimated batch size of 32 is sensible for keeping the GPU's compute units saturated. Keep in mind that BGE-Small-EN is an embedding model, so 'tokens/sec' here measures how fast input text can be embedded; it does not translate to language-generation speed.
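To make that metric concrete, here is a minimal throughput check using sentence-transformers. It assumes the `BAAI/bge-small-en-v1.5` checkpoint from the Hugging Face Hub and a synthetic workload, and the average token count used to convert sentences/sec into tokens/sec is a rough assumption:

```python
# Minimal embedding-throughput check for BGE-Small-EN. The workload is
# synthetic and the checkpoint name assumes the v1.5 release on the Hub.
import time
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5", device="cuda")
texts = ["a short benchmark sentence"] * 512  # synthetic workload

model.encode(texts[:32])  # warm-up pass so CUDA init isn't timed

start = time.perf_counter()
embeddings = model.encode(texts, batch_size=32)
elapsed = time.perf_counter() - start

# Approximate tokens/sec as sentences/sec times an assumed average
# token count per sentence (a rough figure for this synthetic input).
avg_tokens = 8
print(f"{len(texts) / elapsed:.1f} sentences/sec, "
      f"~{len(texts) * avg_tokens / elapsed:.0f} tokens/sec")
```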
Start with a high-performance inference framework such as ONNX Runtime or TensorRT to take advantage of the Orin's hardware acceleration; a minimal ONNX Runtime setup is sketched below. Because the model is so small, experiment with batch size to find the right balance between latency and throughput: larger batches generally raise throughput at the cost of per-request latency, and with the ample VRAM headroom you can likely push well beyond the initial estimate of 32 (see the sweep sketch below). Quantizing the model to INT8, or even INT4 where the tooling supports it, can further improve performance and cut memory-bandwidth requirements, even though VRAM usage is already minimal. Finally, profile the application to find bottlenecks and optimize accordingly.
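Here is one way the ONNX Runtime path might look, preferring the TensorRT execution provider and falling back to CUDA and then CPU. The model file path is a placeholder; the export itself (e.g. produced with Hugging Face Optimum) and a Jetson-compatible onnxruntime-gpu build are assumed to exist already:

```python
# Sketch: serving a pre-exported BGE-Small-EN ONNX model via ONNX Runtime.
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-small-en-v1.5")
session = ort.InferenceSession(
    "bge-small-en-v1.5.onnx",  # hypothetical export path
    providers=[
        "TensorrtExecutionProvider",  # TensorRT acceleration where available
        "CUDAExecutionProvider",      # plain CUDA fallback
        "CPUExecutionProvider",
    ],
)

texts = ["an example sentence to embed"] * 64
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="np")

# Feed only the inputs the exported graph actually declares.
input_names = {i.name for i in session.get_inputs()}
feeds = {k: v for k, v in batch.items() if k in input_names}

last_hidden = session.run(None, feeds)[0]  # (batch, seq_len, hidden)
embeddings = last_hidden[:, 0]             # CLS-token pooling, as BGE uses
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)
```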
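To find the latency/throughput knee empirically, a simple sweep like the following can help. It reuses the sentence-transformers setup from earlier, and the batch sizes tried are arbitrary choices:

```python
# Sketch: sweeping batch sizes to chart the latency/throughput trade-off.
import time
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5", device="cuda")
texts = ["a short benchmark sentence"] * 1024

for batch_size in (8, 16, 32, 64, 128, 256):
    model.encode(texts[:batch_size])  # warm-up at this batch size
    start = time.perf_counter()
    model.encode(texts, batch_size=batch_size)
    elapsed = time.perf_counter() - start
    print(f"batch={batch_size:4d}  "
          f"{len(texts) / elapsed:7.1f} sentences/sec  "
          f"{elapsed / (len(texts) / batch_size) * 1e3:6.1f} ms/batch")
```

Expect throughput to plateau once the GPU is saturated; past that point, larger batches only add latency.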
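As one concrete quantization path, ONNX Runtime ships post-training dynamic quantization; the sketch below assumes the ONNX export from earlier, and the file names are placeholders. Note that dynamic quantization chiefly benefits CPU execution, so for GPU INT8 on the Orin the usual route is building a TensorRT engine with INT8 calibration instead:

```python
# Sketch: post-training dynamic INT8 quantization of the exported model
# using ONNX Runtime's quantization tools. File names are assumptions.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="bge-small-en-v1.5.onnx",
    model_output="bge-small-en-v1.5-int8.onnx",
    weight_type=QuantType.QInt8,  # quantize weights to signed 8-bit
)
```

Whichever path you take, validate embedding quality after quantization (e.g. by comparing cosine similarity against the FP16 model's outputs on a held-out sample).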
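For profiling, ONNX Runtime includes an operator-level profiler that writes a Chrome-trace JSON file; a minimal sketch, again assuming the same hypothetical model file. For system-level analysis on Jetson, NVIDIA's Nsight Systems is another option:

```python
# Sketch: enabling ONNX Runtime's built-in profiler for a session.
import onnxruntime as ort

opts = ort.SessionOptions()
opts.enable_profiling = True  # record per-operator timings
session = ort.InferenceSession(
    "bge-small-en-v1.5.onnx",  # hypothetical export path
    sess_options=opts,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# ... run inference as usual ...

trace_path = session.end_profiling()  # path to the Chrome-trace JSON
print(f"profile written to {trace_path}")
```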