Can I run Mistral 7B (INT8, 8-bit integer) on an NVIDIA H100 PCIe?

Perfect
Yes, you can run this model!
GPU VRAM: 80.0 GB
Required: 7.0 GB
Headroom: +73.0 GB

VRAM Usage: ~9% of 80.0 GB

Performance Estimate

Tokens/sec: ~117
Batch size: 32
Context: 32,768 tokens (32K)

Technical Analysis

The NVIDIA H100 PCIe, with 80 GB of HBM2e VRAM and 2.0 TB/s of memory bandwidth, is exceptionally well suited to running Mistral 7B. Even at full FP16 precision, the model's weights need only about 14 GB of VRAM; quantized to INT8, the footprint shrinks to roughly 7 GB. That leaves about 73 GB of headroom for large batch sizes, long context lengths, and potentially concurrent deployment of multiple model instances or other AI workloads. The H100 PCIe's 14,592 CUDA cores and 456 Tensor Cores further accelerate the model's compute, supporting high inference throughput.
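The 7 GB and 14 GB figures above follow directly from the parameter count: weight memory is roughly parameters × bytes per parameter. A minimal sketch (weights only; CUDA context, activations, and KV cache are extra):

```python
# Rough VRAM estimate for model weights at different precisions.
# Weights only -- runtime overhead (CUDA context, activations, KV cache)
# comes on top of this.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}

def weight_vram_gb(n_params_billion: float, precision: str) -> float:
    """Approximate weight memory in GB (using 1 GB = 1e9 bytes)."""
    return n_params_billion * BYTES_PER_PARAM[precision]

for p in ("fp16", "int8"):
    print(f"Mistral 7B @ {p}: ~{weight_vram_gb(7.0, p):.1f} GB")
# fp16 -> ~14.0 GB, int8 -> ~7.0 GB, matching the figures above
```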

Recommendation

Given the H100's ample resources, prioritize maximizing throughput and minimizing latency. Start with a batch size of 32 and increase it until VRAM utilization approaches its limit. Explore techniques such as speculative decoding and continuous batching to further improve performance. INT8 quantization offers a good balance of speed and accuracy, but consider FP16 or BF16 if the application demands the highest possible accuracy, keeping the doubled weight memory in mind. The H100 also benefits from optimized kernels: serving through NVIDIA TensorRT-LLM or Triton Inference Server, which build on tuned libraries such as cuBLAS, can yield significant speedups.
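When pushing batch size toward the VRAM limit, the KV cache is what actually fills the headroom. A back-of-envelope check using Mistral 7B's published architecture (32 layers, 8 KV heads via grouped-query attention, head dimension 128) shows why "increase until near the limit" matters: the worst case of 32 sequences all at full 32K context would need far more than the 73 GB of headroom, which is exactly why continuous batching and paged attention (as in vLLM) are recommended — real request lengths rarely hit the worst case.

```python
# KV-cache budget check for the recommended settings (batch 32, 32K context).
# Mistral 7B architecture constants: 32 layers, 8 KV heads (GQA), head_dim 128.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128
BYTES_FP16 = 2  # cache usually kept in FP16 even with INT8 weights

def kv_cache_gb(batch: int, context: int) -> float:
    """KV-cache size in GB if every sequence used the full context."""
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_FP16  # K and V
    return batch * context * per_token / 1e9

print(kv_cache_gb(32, 32768))   # ~137 GB worst case -- more than the headroom
print(73e9 / (2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_FP16))  # ~557k cached tokens fit in 73 GB
```

So the 73 GB of headroom holds roughly 557,000 cached tokens in total; a paged-attention scheduler spreads that budget across however many live requests it can fit.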

Recommended Settings

Batch size: 32
Context length: 32,768
Inference framework: vLLM
Suggested quantization: INT8
Other settings: enable CUDA graphs; experiment with speculative decoding; profile performance with NVIDIA Nsight Systems
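As a hedged starting point, the settings above map onto a vLLM launch roughly like this. Treat it as a sketch: the model ID is an assumed example, and INT8 support in vLLM depends on the version and on the checkpoint format, so check your version's documentation before relying on any quantization flag.

```shell
# Sketch: serve Mistral 7B on one H100 with the recommended limits.
# --max-model-len caps the context window; --max-num-seqs caps how many
# sequences are batched concurrently (continuous batching fills the rest).
# INT8 options vary by vLLM version (e.g. a pre-quantized AWQ/GPTQ
# checkpoint, or FP8 on Hopper) -- consult your version's docs.
vllm serve mistralai/Mistral-7B-Instruct-v0.2 \
  --max-model-len 32768 \
  --max-num-seqs 32
```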

Frequently Asked Questions

Is Mistral 7B (7B) compatible with NVIDIA H100 PCIe?
Yes, Mistral 7B is fully compatible and performs exceptionally well on the NVIDIA H100 PCIe.
What VRAM is needed for Mistral 7B (7B)?
In INT8 quantized format, Mistral 7B requires approximately 7GB of VRAM. In FP16, it requires around 14GB.
How fast will Mistral 7B (7B) run on NVIDIA H100 PCIe?
With optimized settings, expect approximately 117 tokens/sec. Actual performance may vary based on batch size, context length, and other factors.
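The ~117 tokens/sec figure can be sanity-checked against a memory-bandwidth roofline: single-sequence decode is memory-bound, since each generated token must stream the full set of weights from HBM once. A minimal sketch of that ceiling:

```python
# Back-of-envelope decode ceiling for one sequence: tokens/s is bounded by
# how many times per second the weights can be read from HBM.
BANDWIDTH_GBPS = 2000.0  # H100 PCIe: ~2.0 TB/s
WEIGHTS_GB = 7.0         # Mistral 7B at INT8

ceiling = BANDWIDTH_GBPS / WEIGHTS_GB  # theoretical tokens/s upper bound
print(round(ceiling))  # -> 286
```

The quoted ~117 tokens/sec sits well under this ~286 tokens/sec roofline, which is plausible once attention over the KV cache, dequantization, and kernel-launch overheads are accounted for.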