Can I run Phi-3 Medium 14B (q3_k_m) on NVIDIA RTX 3090 Ti?

Perfect
Yes, you can run this model!
GPU VRAM: 24.0GB
Required: 5.6GB
Headroom: +18.4GB

VRAM Usage

5.6GB used of 24.0GB (23%)

Performance Estimate

Tokens/sec: ~60.0
Batch size: 6
Context: 128,000 tokens (128K)

Technical Analysis

The NVIDIA RTX 3090 Ti, with 24GB of GDDR6X VRAM and the Ampere architecture, is well suited to running Phi-3 Medium 14B once the model is quantized. The base FP16 (half-precision floating point) weights alone need about 28GB of VRAM, which exceeds the 3090 Ti's capacity. With q3_k_m quantization, however, the weight footprint drops to approximately 5.6GB, leaving roughly 18.4GB of headroom for the KV cache, activations, larger batch sizes, or longer context lengths. The card's 1.01 TB/s memory bandwidth also matters: single-stream decoding is largely memory-bandwidth-bound, so fast VRAM is what keeps token generation quick.
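
As a rough sanity check on these figures, here is a minimal sketch of the weight-memory arithmetic. The ~3.2 bits per weight used for q3_k_m below is back-calculated from the 5.6GB figure above; actual GGUF file sizes vary slightly with the layer mix.

```python
def estimate_weight_vram_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate VRAM needed for the model weights alone, in GB.

    Ignores the KV cache and activations, which grow with context length
    and batch size.
    """
    return params_billion * 1e9 * (bits_per_weight / 8) / 1e9

print(estimate_weight_vram_gb(14, 16))   # FP16 -> 28.0 GB
print(estimate_weight_vram_gb(14, 3.2))  # q3_k_m (assumed ~3.2 bpw) -> 5.6 GB
```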

Recommendation

For optimal performance with Phi-3 Medium 14B on your RTX 3090 Ti, stick with q3_k_m quantization, as it keeps VRAM usage low. Experiment with batch sizes up to 6 to maximize throughput, keeping a close eye on VRAM usage so you do not exceed available memory. Use a framework such as `llama.cpp` or `vLLM` to get well-optimized inference kernels and memory management. If you need even larger batch sizes or longer context lengths, monitor VRAM and consider a more aggressive quantization (e.g., Q2_K), bearing in mind that lower-bit quantization costs some output quality.
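
To keep an eye on VRAM while experimenting, here is a minimal monitoring sketch using the NVIDIA Management Library bindings (it assumes the `nvidia-ml-py`/`pynvml` package is installed and that the 3090 Ti is GPU index 0):

```python
from pynvml import (
    nvmlInit,
    nvmlShutdown,
    nvmlDeviceGetHandleByIndex,
    nvmlDeviceGetMemoryInfo,
)

nvmlInit()
try:
    handle = nvmlDeviceGetHandleByIndex(0)  # assumption: the 3090 Ti is device 0
    mem = nvmlDeviceGetMemoryInfo(handle)
    used_gb, total_gb = mem.used / 1e9, mem.total / 1e9
    print(f"VRAM: {used_gb:.1f}GB used of {total_gb:.1f}GB ({used_gb / total_gb:.0%})")
finally:
    nvmlShutdown()
```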

Recommended Settings

Batch size: 6
Context length: 128,000 tokens
Other settings: enable CUDA acceleration, use memory mapping for weights, optimize attention mechanisms
Inference framework: llama.cpp or vLLM
Suggested quantization: q3_k_m (or Q2_K if you need to free up more VRAM)
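
As one way to apply these settings, here is a minimal sketch using the `llama-cpp-python` bindings for `llama.cpp`. The GGUF filename and prompt are placeholders, `flash_attn` depends on your build, and note that llama.cpp's `n_batch` is the prompt-processing chunk size rather than the concurrent-request batch size recommended above.

```python
from llama_cpp import Llama

# Assumed local path to a q3_k_m GGUF of Phi-3 Medium; adjust to your file.
MODEL_PATH = "./Phi-3-medium-128k-instruct-Q3_K_M.gguf"

llm = Llama(
    model_path=MODEL_PATH,
    n_gpu_layers=-1,   # offload every layer to the RTX 3090 Ti (CUDA build)
    n_ctx=128_000,     # full 128K context; reduce this if the KV cache pushes
                       # total VRAM past 24GB
    n_batch=512,       # prompt-processing chunk size (not the concurrent-request
                       # batch size of 6 above)
    use_mmap=True,     # memory-map the weights from disk
    flash_attn=True,   # fused attention kernels, if your build supports them
)

out = llm("Explain quantization in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```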

Frequently Asked Questions

Is Phi-3 Medium 14B (14.00B) compatible with NVIDIA RTX 3090 Ti?
Yes, Phi-3 Medium 14B is fully compatible with the NVIDIA RTX 3090 Ti, especially when using quantization to reduce VRAM usage.
What VRAM is needed for Phi-3 Medium 14B (14.00B)?
With q3_k_m quantization, Phi-3 Medium 14B requires approximately 5.6GB of VRAM.
How fast will Phi-3 Medium 14B (14.00B) run on NVIDIA RTX 3090 Ti?
You can expect around 60 tokens per second with optimized settings and q3_k_m quantization.
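
For intuition about the ~60 tokens/sec estimate: single-stream decoding is typically memory-bandwidth-bound, so a rough ceiling is the card's bandwidth divided by the bytes read per generated token (roughly the quantized weight size). Real throughput lands well below that ceiling because of kernel overheads, KV-cache traffic, and dequantization cost. A minimal sketch of the arithmetic:

```python
# Rough bandwidth roofline for single-stream decoding (assumption: every
# weight is read once per generated token; KV-cache reads and overheads ignored).
bandwidth_gb_s = 1008   # RTX 3090 Ti memory bandwidth (~1.01 TB/s)
weights_gb = 5.6        # q3_k_m weight footprint from above

ceiling = bandwidth_gb_s / weights_gb
print(f"Theoretical ceiling: ~{ceiling:.0f} tokens/sec")  # ~180; the ~60 tok/s
# estimate above sits at roughly a third of this upper bound.
```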