Can I run Mixtral 8x22B (Q4_K_M, GGUF 4-bit) on an NVIDIA RTX 3090?

Result: Fail/OOM. This GPU doesn't have enough VRAM.

GPU VRAM: 24.0 GB
Required: 70.5 GB
Headroom: -46.5 GB

VRAM Usage: 100% of 24.0 GB used (requirement exceeds capacity)

Technical Analysis

The NVIDIA RTX 3090, while a powerful GPU with 24 GB of GDDR6X VRAM, falls well short of what is needed to run Mixtral 8x22B (141B parameters), even with aggressive quantization. Quantized to Q4_K_M (4-bit), the model still requires approximately 70.5 GB of VRAM. The RTX 3090's 0.94 TB/s of memory bandwidth is ample for feeding the GPU, but bandwidth is not the limiting factor: the weights simply do not fit in the available VRAM. With a deficit of 46.5 GB, the model cannot be fully loaded onto the GPU, so inference fails outright. The card's 10496 CUDA cores and 328 Tensor cores sit idle for this workload, bottlenecked by the VRAM limitation.
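
For a rough sanity check, the 70.5 GB figure follows from a back-of-the-envelope estimate of about 0.5 bytes (4 bits) per parameter for Q4_K_M. The short Python sketch below reproduces that arithmetic; the bytes-per-parameter factor and the zero-overhead assumption are simplifications, so real usage (KV cache, activations, runtime buffers) is somewhat higher.

```python
# Back-of-the-envelope VRAM estimate. Assumption: ~0.5 bytes/parameter for
# Q4_K_M and no KV-cache or runtime overhead; actual usage is higher.
PARAMS_BILLION = 141.0    # Mixtral 8x22B total parameter count
BYTES_PER_PARAM = 0.5     # ~4-bit quantization
GPU_VRAM_GB = 24.0        # NVIDIA RTX 3090

required_gb = PARAMS_BILLION * BYTES_PER_PARAM   # 141 * 0.5 = 70.5 GB
headroom_gb = GPU_VRAM_GB - required_gb          # 24.0 - 70.5 = -46.5 GB

print(f"Required: {required_gb:.1f} GB")
print(f"Headroom: {headroom_gb:.1f} GB")
print("Fits on GPU" if headroom_gb >= 0 else "Does not fit (OOM)")
```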

Recommendation

Due to the significant VRAM shortfall, running Mixtral 8x22B (141B) on a single RTX 3090 is not feasible. A multi-GPU setup is one option, but note that even two RTX 3090s pooled over NVLink (48 GB) fall short of the roughly 70.5 GB requirement; you would need at least three 24 GB cards split via tensor or pipeline parallelism (see the sizing sketch below), or a cloud instance with a high-memory GPU such as an 80 GB A100 or H100. Alternatively, smaller models in the 7B to 13B range fit comfortably on an RTX 3090. If a smaller model is not an option, offloading most layers to system RAM with llama.cpp can get the model running, but performance will be severely degraded; if performance matters, renting a GPU with sufficient memory is the more practical route.
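
As a quick sizing aid, the sketch below works out, from the figures above, how many 24 GB cards a multi-GPU split would need for the weights alone, and roughly what fraction of the quantized weights a single RTX 3090 could keep on-GPU when the rest is offloaded to system RAM. It ignores KV cache, activations, and framework overhead, so treat the results as optimistic lower bounds.

```python
import math

REQUIRED_GB = 70.5       # Mixtral 8x22B at Q4_K_M (from the analysis above)
VRAM_PER_GPU_GB = 24.0   # per RTX 3090

# Minimum number of 24 GB cards to hold the quantized weights alone.
min_gpus = math.ceil(REQUIRED_GB / VRAM_PER_GPU_GB)
print(f"Minimum 24 GB GPUs for the weights: {min_gpus}")              # -> 3

# Share of the weights one RTX 3090 could keep resident when offloading.
on_gpu_share = VRAM_PER_GPU_GB / REQUIRED_GB
print(f"Weights resident on a single RTX 3090: {on_gpu_share:.0%}")   # -> ~34%
```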

Recommended Settings

Batch Size: 1 (if CPU offloading)
Context Length: Reduce the context length to the minimum acceptable value
Other Settings: Enable CPU offloading in llama.cpp; enable mlock to prevent swapping
Inference Framework: llama.cpp (for CPU offloading; see the sketch after this list)
Quantization Suggested: No further quantization will solve the VRAM issue
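
A minimal sketch of these settings using the llama-cpp-python bindings is shown below. The model path is a placeholder, and n_gpu_layers=18 is only an assumption: with roughly a third of the weights fitting in 24 GB (see the sizing sketch above), you would lower or raise this value until the model loads without an out-of-memory error.

```python
from llama_cpp import Llama

# Placeholder path to a local Q4_K_M GGUF file; adjust to your setup.
MODEL_PATH = "models/mixtral-8x22b.Q4_K_M.gguf"

llm = Llama(
    model_path=MODEL_PATH,
    n_gpu_layers=18,   # assumption: offload only as many layers as fit in 24 GB
    n_ctx=2048,        # keep context as small as your use case allows
    n_batch=1,         # maps the batch-size-1 recommendation onto the prompt batch
    use_mlock=True,    # pin weights in RAM to prevent swapping
)

# Expect very low throughput: most layers execute on the CPU from system RAM.
out = llm("Summarize the trade-offs of CPU offloading in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```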

Frequently Asked Questions

Is Mixtral 8x22B (141B) compatible with NVIDIA RTX 3090?
No, Mixtral 8x22B (141B) is not compatible with a single NVIDIA RTX 3090 due to insufficient VRAM.
What VRAM is needed for Mixtral 8x22B (141B)?
Mixtral 8x22B (141B) requires approximately 70.5 GB of VRAM when quantized to Q4_K_M (4-bit).
How fast will Mixtral 8x22B (141B) run on NVIDIA RTX 3090?
Mixtral 8x22B will likely not run at all on a single RTX 3090 without significant CPU offloading. Even with offloading, expect very slow performance, potentially less than 1 token/second.