Understand what hardware you need, realistic costs, and what AI models you can run on different setups. Updated with current GPU prices, RTX 5090 guidance, and newer open-weight model recommendations.
Don't have hardware? You can rent GPUs by the hour from cloud providers instead.
NVIDIA GPUs are the best-supported option for local AI: CUDA is the de facto standard that most AI tools target.
Mid-range (12-16GB):

| GPU | Price | VRAM | Performance | Notes |
|---|---|---|---|---|
| RTX 4080 Super 16GB | $950-1,050 | 16GB | Excellent | Great performance, 16GB VRAM limit |
| RTX 4070 Ti Super 16GB | $750-800 | 16GB | Excellent | Great balance, 16GB VRAM |
| RTX 4070 Ti 12GB | $700-750 | 12GB | Excellent | Great for 8B models, limited for larger |
High-end (24GB+):

| GPU | Price | VRAM | Performance | Notes |
|---|---|---|---|---|
| RTX 4090 24GB | $1,700-2,000 | 24GB | Best | Handles 70B-class models only with heavy quantization or CPU offload |
| RTX 5090 32GB | $2,000-2,500+ | 32GB | Best | Best single-GPU local AI option today |
| RTX 4090 D 24GB | $4,000-5,000 | 24GB | Best | China-market variant with slightly reduced compute; rarely worth it over a standard 4090 |
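A useful rule of thumb when comparing cards: a quantized model's weights take roughly parameters × bits-per-weight / 8 gigabytes, and you need headroom beyond that for the KV cache and activations. A minimal sketch (the 20% overhead factor is a rough assumption, not a measured value):

```python
def fits_in_vram(params_b: float, quant_bits: float, vram_gb: float,
                 overhead: float = 1.2) -> bool:
    """Rough check: quantized weights (params * bits / 8) plus ~20%
    overhead for KV cache and activations must fit in VRAM."""
    weights_gb = params_b * quant_bits / 8  # billions of params -> GB
    return weights_gb * overhead <= vram_gb

# An 8B model at ~4.5 bits/weight fits an 8GB card;
# a 70B model at the same quantization overflows even 24GB.
print(fits_in_vram(8, 4.5, 8))    # True
print(fits_in_vram(70, 4.5, 24))  # False
```

This is why the 4090 note above hedges on 70B-class models: at Q4 they land around 40GB of weights and only run on 24GB cards with offload.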
AMD GPUs offer great VRAM for the price. ROCm support is improving, but some tools remain CUDA-only. AMD's newer Ryzen AI Max APUs and Instinct accelerators add large-unified-memory options at the high end.
Budget:

| GPU/Accelerator | Price | VRAM | Performance | Notes |
|---|---|---|---|---|
| RX 7600 16GB | $280 | 16GB | Good | Best value for VRAM, great for 7-8B models |
| RX 6750 GRE 12GB | $240 | 12GB | Good | Great for LLMs and image gen |
| RX 6700 XT 12GB | $320 | 12GB | Very Good | Strong performer, good value |
Mid-range:

| GPU/Accelerator | Price | VRAM | Performance | Notes |
|---|---|---|---|---|
| RX 7900 GRE 16GB | $550 | 16GB | Excellent | Great for 14-32B models |
| RX 7700 XT 12GB | $450 | 12GB | Very Good | Good performance, 12GB limits model size |
| RX 7800 XT 16GB | $500 | 16GB | Excellent | Great all-rounder for AI |
High-end:

| GPU/Accelerator | Price | VRAM | Performance | Notes |
|---|---|---|---|---|
| RX 7900 XTX 24GB | $950 | 24GB | Excellent | Best AMD for consumer, 24GB VRAM |
| RX 7900 XT 20GB | $800 | 20GB | Excellent | Great for large models |
Large-memory options:

| GPU/Accelerator | Price | Memory | Performance | Notes |
|---|---|---|---|---|
| Ryzen AI Max+ 395 (system) | $2,000-3,000 | Up to 128GB unified LPDDR5X | Excellent | APU with unified memory; strong for large local models |
| MI300X | $12,000-15,000 | 192GB HBM3 | Excellent | Datacenter accelerator |
CPU-only: use your current computer for basic AI tasks.

Text generation:

| Model | Quantization | Size | Speed | Notes |
|---|---|---|---|---|
| Llama 3.2 3B | Q4_K_M | 2.3GB | Slow (CPU) | Good for simple chat |
| Phi-4 Mini 3.8B | Q4_K_M | 2.6GB | Decent (CPU) | Surprisingly capable for size |
| Qwen2.5 0.5B/3B | Q8_0 | 0.6-2.0GB | Fast (CPU) | Good for lightweight drafting and autocomplete |
| Gemma 2 2B | Q4_K_M | 1.5GB | Decent (CPU) | Good general purpose model |
Image generation:

| Model | Quantization | Size | Speed | Notes |
|---|---|---|---|---|
| SDXL Turbo | FP16 | 6.9GB | Very Slow | Low resolution only |
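The CPU speeds above follow from memory bandwidth: generating each token streams the entire weight file through the CPU, so decode speed is roughly bandwidth divided by model size. A rough sketch (the 50% efficiency factor is an assumption, not a benchmark):

```python
def cpu_tokens_per_sec(model_gb: float, mem_bandwidth_gbps: float,
                       efficiency: float = 0.5) -> float:
    """Decode on CPU is roughly memory-bandwidth-bound: every generated
    token reads the full weight set once, so speed is about
    (usable bandwidth) / (model size)."""
    return mem_bandwidth_gbps * efficiency / model_gb

# Dual-channel DDR5 (~80 GB/s) running the 2.3GB Q4 Llama 3.2 3B above:
print(round(cpu_tokens_per_sec(2.3, 80.0), 1))  # ~17 t/s
```

The same formula explains why small Q8 models stay "fast" on CPU while anything past ~4GB starts to crawl.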
Entry-level GPU (8-12GB VRAM): an affordable card with solid AI performance.

Text generation:

| Model | Quantization | Size | Speed | Notes |
|---|---|---|---|---|
| Llama 3.1 8B | Q4_K_M | 5.0GB | Good (20-30 t/s) | Great balance of speed and quality |
| DeepSeek R1 Distill Qwen 7B | Q4_K_M | 4.3GB | Good (25-35 t/s) | Excellent reasoning |
| Qwen2.5 7B | Q4_K_M | 4.6GB | Good (25-35 t/s) | Strong coder and writer |
| Mixtral 8x7B (MoE) | Q4_K_M | 25GB | Slow (CPU offload) | Too large even for 16GB cards without offload |
Image generation:

| Model | Quantization | Size | Speed | Notes |
|---|---|---|---|---|
| SDXL 1.0 | FP16 | 6.9GB | Decent (15-20s) | Good quality 1024x1024 |
| FLUX.1 Schnell | FP8/INT8 | 12GB | Decent (8-12s) | Needs offload on 8GB cards; best on 12GB+ |
| Stable Diffusion 3.5 Medium | FP16 | 5GB | Fast (5-8s) | Excellent quality for its size |
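Models that overflow this tier's VRAM (like Mixtral 8x7B above) can still run with partial offload: llama.cpp-style runners accept a `--n-gpu-layers` count and keep the remaining layers on CPU. A back-of-the-envelope way to pick that count, assuming weights are spread evenly across layers (the 1.5GB reserve for KV cache and driver context is an assumption):

```python
def gpu_layers_that_fit(model_gb: float, n_layers: int, vram_gb: float,
                        reserve_gb: float = 1.5) -> int:
    """Estimate how many transformer layers fit in VRAM, assuming the
    weight file is split evenly across layers and reserving some VRAM
    for the KV cache and runtime context."""
    per_layer_gb = model_gb / n_layers
    usable = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable / per_layer_gb))

# The 25GB Mixtral Q4 file with 32 layers on an 8GB card:
print(gpu_layers_that_fit(25.0, 32, 8.0))  # 8 layers on GPU, rest on CPU
```

The more layers land on the GPU, the less weight traffic crosses system RAM each token, so speed scales with this number.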
Mid-range GPU (16GB VRAM): a balanced setup for serious AI work.

Text generation:

| Model | Quantization | Size | Speed | Notes |
|---|---|---|---|---|
| Llama 3.1 8B | Q6_K or Q8_0 | 7.2GB | Excellent (40-50 t/s) | Fast and high quality |
| Mistral Small 3 24B | Q4_K_M | 14GB | Good (20-30 t/s) | Fits in 16GB VRAM |
| DeepSeek R1 Distill Qwen 32B | Q4_K_M | 20GB | Decent (partial offload) | Great for complex tasks; needs some CPU offload on 16GB |
| Qwen2.5 14B | Q4_K_M | 9GB | Excellent (35-45 t/s) | Very capable open model |
Image generation:

| Model | Quantization | Size | Speed | Notes |
|---|---|---|---|---|
| SDXL 1.0 | FP16 | 6.9GB | Fast (8-12s) | High quality |
| FLUX.1 Dev | FP8 | 12GB | Slow (20-30s) | Top-tier quality; tight but workable on 16GB |
| Stable Diffusion 3.5 | FP16 | 10GB | Fast (10-15s) | Latest model |
High-end GPU (24GB VRAM): a professional-grade AI workstation.

Text generation:

| Model | Quantization | Size | Speed | Notes |
|---|---|---|---|---|
| Mistral Small 3 24B | Q4_K_M | 14GB | Excellent (50-60 t/s) | Runs fully in VRAM with room for context |
| DeepSeek R1 Distill Llama 70B | Q4_K_M | 40GB | Slow (CPU offload) | Top-tier reasoning; doesn't fit 24GB without offload |
| Mixtral 8x22B | Q4_K_M | 94GB | Slow (heavy CPU offload) | Most weights stay in system RAM |
| Qwen2.5 32B | Q4_K_M | 19GB | Excellent (40-50 t/s) | Great all-rounder |
Image generation:

| Model | Quantization | Size | Speed | Notes |
|---|---|---|---|---|
| FLUX.1 Dev | FP16 | 24GB | Excellent (5-8s) | Best quality |
| Stable Diffusion 3.5 Large | FP16 | 10GB | Excellent (6-10s) | Fast with great quality |
Multi-GPU (48GB+ total VRAM): multiple GPUs or accelerators for advanced AI research and production.

Text generation:

| Model | Quantization | Size | Speed | Notes |
|---|---|---|---|---|
| DeepSeek R1 Distill Llama 70B | Q6_K or Q8_0 | 55-75GB | Excellent (50+ t/s batched) | Serves multiple concurrent users |
| Mistral Small 3 24B (full) | BF16 | 47GB | Excellent (60+ t/s) | Full-precision weights |
| Qwen2.5 72B | Q6_K | 56GB | Excellent (40-50 t/s) | Great for production |
| Mixtral 8x22B | Q6_K | 115GB | Excellent (30-40 t/s) | Strong MoE option |
Image generation:

| Model | Quantization | Size | Speed | Notes |
|---|---|---|---|---|
| FLUX.1 Dev | FP16 | 24GB | Excellent (3-5s) | Best open-weight quality (FLUX.1 Pro is API-only) |
| Stable Diffusion 3.5 Large | FP16 | 10GB | Excellent (5-8s) | Fast with great quality |
| Custom Training | FP16 | Variable | Variable | Train your own models |
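At this tier, serving frameworks such as vLLM shard weights across GPUs with tensor parallelism, so the number that matters is per-GPU memory: roughly model size divided by GPU count, plus per-GPU buffers. A hedged sketch (the 2GB per-GPU overhead for activations and communication buffers is an assumption):

```python
def vram_per_gpu(model_gb: float, n_gpus: int, overhead_gb: float = 2.0) -> float:
    """Tensor parallelism splits the weights roughly evenly across GPUs;
    each GPU also keeps its own activation and communication buffers."""
    return model_gb / n_gpus + overhead_gb

# ~70GB of Q8 70B weights across two 48GB cards:
print(vram_per_gpu(70.0, 2))  # 37.0GB per GPU -> fits with headroom
```

The same arithmetic shows why 115GB Mixtral 8x22B at Q6 wants at least three 48GB cards rather than two.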
Apple's M-series chips have excellent AI performance due to Unified Memory. No separate GPU required. Perfect for privacy and portable AI workstations.
Apple Silicon is unique because the CPU and GPU share one memory pool. You're not limited by a discrete card's VRAM: the GPU can use most of your system RAM for AI models (macOS reserves a portion by default). The trade-off is that unified memory cannot be upgraded later, so buy as much as you can afford up front.
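As a sketch, you can estimate the usable slice of unified memory for model weights; the ~70% default fraction below is an assumption (the real cap varies by RAM size, and recent macOS versions let you raise it with the `iogpu.wired_limit_mb` sysctl):

```python
def usable_gpu_memory_gb(unified_ram_gb: float, fraction: float = 0.7) -> float:
    """macOS limits how much unified memory the GPU may wire by default;
    ~70% is a rough assumption, not a documented constant."""
    return unified_ram_gb * fraction

# A 64GB Mac can realistically dedicate roughly 45GB to model weights,
# enough for a 70B model at Q4 (~40GB):
print(round(usable_gpu_memory_gb(64), 1))
```

This is why 64GB+ configurations show up in the 70B-class recommendations below even though no discrete GPU is involved.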
Typical price tiers:

- $700-1,000 (used)
- $1,200-1,800
- $2,000-2,500
- $3,500-7,000+
The minimum hardware needed to run popular AI models locally. These are floors, not targets: more RAM and VRAM always helps.
| Model | Min RAM | Min VRAM | Recommended GPU | Notes |
|---|---|---|---|---|
| Llama 3.2 1B | 4GB | None (CPU) | Any | Runs on CPU, GPU not needed |
| Llama 3.2 3B | 6GB | None (CPU) | Any | CPU is fine, GPU helps speed |
| Llama 3.1 8B | 8GB | 8GB | RTX 3060 / RX 7600 | 8GB VRAM minimum for Q4 quantization |
| Mistral Small 3 24B | 16GB | 16GB | RTX 4060 Ti 16GB / RTX 4090 / 48GB+ Apple | 16GB VRAM for Q4, or 24GB for full context |
| DeepSeek R1 Distill Qwen 32B | 16GB | 16GB | RTX 4070 Ti Super 16GB / RX 7900 GRE | Q4 is ~20GB: 16GB cards need partial offload |
| DeepSeek R1 Distill Llama 70B | 32GB | 24GB | RTX 4090 / RTX 5090 / 48GB+ Apple | 24GB+ VRAM with offload, or 64GB system RAM |
| Qwen2.5 32B | 16GB | 16GB | RTX 4070 Ti Super 16GB / RX 7900 XTX | Q4 is ~19GB: 16GB cards need partial offload |
| Qwen2.5 72B | 32GB | 24GB | RTX 4090 / RTX 5090 / 64GB+ Apple | 24GB+ VRAM or 96GB RAM with offload |
| FLUX.1 Dev | 24GB | 24GB | RTX 4090 / 36GB+ Apple | 24GB VRAM for FP16; FP8 fits in 16GB |
| SDXL 1.0 | 8GB | 8GB | RTX 3060 / RX 6750 | 8GB VRAM minimum |
| Stable Diffusion 3.5 | 12GB | 12GB | RTX 4070 / RX 7800 XT | 12GB VRAM recommended |
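The sizes throughout these tables follow from a single formula: parameters × effective bits-per-weight / 8. A small calculator (the bits-per-weight figures are approximate community averages for llama.cpp quantization formats, not exact values):

```python
# Approximate effective bits per weight for common quantizations
# (rough averages; these are assumptions, not exact figures):
QUANT_BITS = {"Q4_K_M": 4.8, "Q6_K": 6.6, "Q8_0": 8.5, "FP16": 16.0}

def model_size_gb(params_b: float, quant: str) -> float:
    """File size of a quantized model: parameters (billions) times
    bits-per-weight, converted to gigabytes."""
    return params_b * QUANT_BITS[quant] / 8

# An 8B model at Q4_K_M -> ~4.8GB, in line with the tables above:
print(round(model_size_gb(8, "Q4_K_M"), 1))
```

Run it against any row above to sanity-check whether a model plus context will fit your card before you buy.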
Take our quiz to get personalized recommendations based on your budget, use case, and hardware situation.