Llama Large Language Models

Llama 3 8B (8.00B)

Parameters: 8.00B
VRAM (FP16): 16.0 GB
VRAM (INT4): 4.0 GB
Context: 8192 tokens

Quantization Options

Quantization            VRAM Required   Min GPU
FP16 (Half Precision)   16.0 GB         RTX 4080
INT8 (8-bit Integer)     8.0 GB         RTX 3070 / 4060
Q4_K_M (GGUF 4-bit)      4.0 GB         RTX 3070 / 4060
Q3_K_M (GGUF 3-bit)      3.2 GB         RTX 3070 / 4060
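The VRAM figures in the table follow the usual rule of thumb: parameter count times bytes per parameter. A minimal sketch (the function name `vram_gb` is illustrative, and the estimate deliberately ignores KV cache, activations, and runtime overhead, which add to the real footprint):

```python
def vram_gb(params_billion: float, bits_per_param: float) -> float:
    # 1B parameters at 1 byte each ~= 1 GB, so:
    # GB ~= params (in billions) * bits per parameter / 8
    return params_billion * bits_per_param / 8

# Llama 3 8B at the quantization levels listed above:
print(vram_gb(8.0, 16))   # FP16    -> 16.0 GB
print(vram_gb(8.0, 8))    # INT8    ->  8.0 GB
print(vram_gb(8.0, 4))    # ~Q4_K_M ->  4.0 GB
print(vram_gb(8.0, 3.2))  # ~Q3_K_M ->  3.2 GB
```

GGUF K-quants mix block sizes, so their effective bits per parameter (here ~4 and ~3.2) are averages rather than exact widths.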

Model Details

Family: Llama
Category: Large Language Models
Parameters: 8.00B
Context Length: 8192 tokens