Can I run Mixtral 8x22B (q3_k_m) on NVIDIA RTX 4090?

Result: Fail/OOM (this GPU doesn't have enough VRAM)

GPU VRAM: 24.0GB
Required: 56.4GB
Headroom: -32.4GB

VRAM Usage: 24.0GB of 24.0GB (100% used)

Technical Analysis

The primary limiting factor in running large language models (LLMs) like Mixtral 8x22B on consumer GPUs is VRAM capacity. Mixtral 8x22B, even when quantized to q3_k_m, requires approximately 56.4GB of VRAM to load the model and perform inference. The NVIDIA RTX 4090, while a powerful GPU, has only 24GB of VRAM. This 32.4GB deficit means the entire model cannot reside in the GPU's memory at once, leading to a compatibility failure. Memory bandwidth, while significant at 1.01 TB/s on the RTX 4090, matters little when the model cannot fit entirely within the available VRAM: the GPU would have to constantly swap weights between system RAM and VRAM, which is extremely slow.
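
As a rough illustration of where the 56.4GB figure comes from, the sketch below assumes q3_k_m averages about 3.2 bits per weight (an assumed effective rate, not a published spec) and ignores KV-cache and activation overhead:

    # Back-of-envelope VRAM estimate for quantized weights (all figures are assumptions).
    params = 141e9                # Mixtral 8x22B total parameter count
    bits_per_weight = 3.2         # assumed effective average for q3_k_m
    weight_gb = params * bits_per_weight / 8 / 1e9
    print(f"Estimated weights alone: {weight_gb:.1f} GB")   # ~56.4 GB
    print(f"RTX 4090 headroom: {24.0 - weight_gb:.1f} GB")  # ~-32.4 GB

The same arithmetic reproduces the -32.4GB headroom reported above.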

Recommendation

Due to the VRAM limitations of the RTX 4090, running Mixtral 8x22B (141B parameters) directly on the GPU, even in its q3_k_m quantized form, is not feasible. Consider CPU offloading or splitting the model across multiple GPUs if possible. Alternatively, use a smaller model that fits within the RTX 4090's 24GB of VRAM, or a cloud-based inference service with sufficient GPU resources. If CPU offloading is the only option, expect significantly reduced performance compared with running entirely on the GPU.
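
To get a feel for how much of the model could live on the GPU when offloading, a rough per-layer budget helps. The sketch below assumes Mixtral 8x22B has 56 transformer layers and that roughly 3GB of VRAM is reserved for the KV cache, CUDA context, and activations; both figures are assumptions used only for illustration:

    # Rough estimate of how many layers fit on a 24GB GPU when the rest are offloaded to CPU.
    total_vram_gb = 24.0
    reserved_gb = 3.0            # assumed KV cache + CUDA context + activation buffers
    model_gb = 56.4              # q3_k_m weight footprint from the analysis above
    n_layers = 56                # assumed layer count for Mixtral 8x22B
    gb_per_layer = model_gb / n_layers
    layers_on_gpu = int((total_vram_gb - reserved_gb) / gb_per_layer)
    print(f"~{gb_per_layer:.2f} GB per layer; roughly {layers_on_gpu} layers fit on the GPU")

Under these assumptions only about a third of the layers fit on the GPU, which is why offloaded inference is so much slower than a full GPU deployment.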

Recommended Settings

Batch Size: 1 (start with the lowest possible batch size)
Context Length: Reduce the context length to the minimum required for your use case.
Other Settings: Enable CPU offloading in your chosen inference framework; experiment with how many layers are offloaded to CPU to find a balance between VRAM usage and performance; consider using a smaller model.
Inference Framework: llama.cpp with CPU offloading (see the sketch after this list), or vLLM if multiple GPUs are available.
Suggested Quantization: q3_k_m (as currently used; further quantization would reduce quality and still would not fit the model in 24GB)
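
A minimal sketch of these settings using the llama-cpp-python bindings is shown below. The model filename and the choice of 20 GPU layers are placeholders; the number of layers that actually fits depends on context length and runtime overhead:

    # Minimal llama-cpp-python sketch with partial GPU offload (filename and layer count are placeholders).
    from llama_cpp import Llama

    llm = Llama(
        model_path="mixtral-8x22b-instruct.Q3_K_M.gguf",  # hypothetical local GGUF file
        n_gpu_layers=20,  # offload only as many layers as fit in 24GB; the rest run on CPU
        n_ctx=2048,       # keep the context as small as the workload allows
        n_batch=1,        # start with the smallest batch size
    )

    out = llm("Summarize the benefits of mixture-of-experts models.", max_tokens=128)
    print(out["choices"][0]["text"])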

Frequently Asked Questions

Is Mixtral 8x22B (141B) compatible with the NVIDIA RTX 4090?
No, Mixtral 8x22B (141B) is not directly compatible with the NVIDIA RTX 4090 due to insufficient VRAM.
What VRAM is needed for Mixtral 8x22B (141B)?
Mixtral 8x22B (141B) requires approximately 56.4GB of VRAM when quantized to q3_k_m.
How fast will Mixtral 8x22B (141B) run on the NVIDIA RTX 4090?
Due to the VRAM limitations, Mixtral 8x22B (141B) will either not run at all or run extremely slowly on the NVIDIA RTX 4090, even with CPU offloading.
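
For a very rough sense of the speed ceiling with CPU offloading, the sketch below assumes Mixtral 8x22B activates about 39B parameters per token (2 of 8 experts) and that system memory bandwidth is around 60GB/s; both numbers are assumptions, and real throughput is usually lower:

    # Crude memory-bandwidth upper bound on CPU-offloaded decoding speed (all figures are assumptions).
    active_params = 39e9          # assumed active parameters per token for Mixtral 8x22B
    bits_per_weight = 3.2         # assumed effective q3_k_m average
    ram_bandwidth_gb_s = 60.0     # assumed dual-channel DDR5 system memory bandwidth
    bytes_per_token = active_params * bits_per_weight / 8
    tokens_per_second = ram_bandwidth_gb_s * 1e9 / bytes_per_token
    print(f"Memory-bandwidth ceiling: ~{tokens_per_second:.1f} tokens/s")  # roughly 3-4 tokens/s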