Llama · Large Language Models

Llama 3 70B (70.00B)

Parameters: 70.00B
VRAM (FP16): 140.0 GB
VRAM (INT4): 35.0 GB
Context: 8192

Quantization Options

Quantization            VRAM Required   Min GPU
FP16 (Half Precision)   140.0 GB        A100 / H100
INT8 (8-bit Integer)    70.0 GB         A100 / H100
Q4_K_M (GGUF 4-bit)     35.0 GB         A6000 / 2x 4090
Q3_K_M (GGUF 3-bit)     28.0 GB         A6000 / 2x 4090
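These figures follow the usual rule of thumb for weight memory alone: parameter count × bits per weight / 8, with the GGUF quants treated at approximate effective bit widths (an assumption; real K-quants mix precisions and add some overhead, and none of these numbers include KV cache or activations). A minimal sketch:

```python
# Rule-of-thumb weight-memory estimate (a sketch; assumes nominal bit widths
# and excludes KV cache, activations, and framework overhead).
def estimate_weight_vram_gb(params_billion: float, bits_per_weight: float) -> float:
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9  # decimal gigabytes

if __name__ == "__main__":
    for name, bits in [("FP16", 16), ("INT8", 8),
                       ("Q4_K_M (~4-bit)", 4.0), ("Q3_K_M (~3.2-bit)", 3.2)]:
        print(f"{name:>18}: {estimate_weight_vram_gb(70.0, bits):.1f} GB")
```

For 70.00B parameters this reproduces the table: 140.0, 70.0, 35.0, and 28.0 GB; the 28.0 GB row corresponds to roughly 3.2 effective bits per weight.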

Model Details

Family: Llama
Category: Large Language Models
Parameters: 70.00B
Context Length: 8192