## Why picking the right model matters
No single LLM is best at everything; each trades off quality, cost, speed, and privacy differently. This guide helps you choose quickly without benchmarking every model yourself.
## Selection framework
### 1. What’s the task?
| Task | Recommended |
|---|---|
| Coding, debug, refactor | Claude Sonnet / GPT-4o |
| Long document Q&A | Claude (large context window) |
| High-volume batch work | Haiku 4.5 or Gemini Flash |
| Privacy-sensitive / offline | Ollama + Llama 3 / Qwen |
| Multimodal (image + text) | GPT-4o, Gemini 1.5 Pro, Claude |
### 2. Budget
- Free / near-free: Ollama local, Gemini free tier
- $0.01–0.10/1K tokens: Haiku 4.5, GPT-4o mini, Gemini Flash
- $0.10–1.00/1K tokens: Claude Sonnet, GPT-4o, Gemini Pro
- $1+/1K tokens: Claude Opus, GPT-4.5 (high-stakes tasks only)
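The price tiers above are easiest to reason about as monthly spend at a given volume. A minimal sketch, using illustrative placeholder prices (not current list prices) and hypothetical model keys:

```python
# Rough cost comparison across pricing tiers.
# Prices are illustrative placeholders in $/1K tokens, not real list prices.
PRICE_PER_1K = {
    "haiku-4.5": 0.01,
    "gpt-4o-mini": 0.01,
    "claude-sonnet": 0.10,
    "claude-opus": 1.00,
}

def monthly_cost(model: str, tokens_per_day: int, days: int = 30) -> float:
    """Estimate monthly spend for a given daily token volume."""
    return PRICE_PER_1K[model] * tokens_per_day / 1000 * days

# Example: 500K tokens/day on a budget tier vs. a premium tier.
cheap = monthly_cost("haiku-4.5", 500_000)      # 0.01 * 500 * 30 = $150/mo
premium = monthly_cost("claude-opus", 500_000)  # 1.00 * 500 * 30 = $15,000/mo
```

At high volume the tier difference dominates everything else, which is why the batch-work row in the task table points at the cheapest models.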
### 3. Infrastructure
- Cloud API: easiest, no setup
- Self-hosted: Ollama + GPU or fast CPU (Llama 3 8B needs ~8GB RAM)
- Hybrid: local for drafts, cloud for final review
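The hybrid option can be sketched as a local draft step plus an escalation check. This assumes an Ollama server on `localhost:11434` (its default) and uses a deliberately crude, hypothetical heuristic for when to send a draft to a cloud model for review:

```python
# Hybrid sketch: draft locally via Ollama, escalate to a cloud model only
# when a simple heuristic says the draft needs review. The heuristic and
# thresholds here are placeholders, not a recommended policy.
import json
import urllib.request

def local_draft(prompt: str, model: str = "llama3:8b") -> str:
    """Generate a draft with a local Ollama model (/api/generate endpoint)."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

def needs_cloud_review(draft: str) -> bool:
    """Crude escalation check: very long or hedging drafts go to the cloud."""
    return len(draft) > 4000 or "I'm not sure" in draft
```

In practice you would replace `needs_cloud_review` with whatever quality signal your pipeline already produces.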
## Decision tree
- Have a good GPU + privacy requirements? → Ollama (Llama 3 / Qwen2.5)
- Need very long context (>100K tokens)? → Claude 3.x or Gemini 1.5 Pro
- High-volume routine tasks? → Haiku 4.5 (cheapest, solid performance)
- Complex reasoning or creative work? → Claude Sonnet 4.6
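The decision tree above can be written as a small function. The model names mirror the text; the thresholds are this guide's rules of thumb, not hard limits:

```python
# The guide's decision tree as code. Order matters: privacy first,
# then context length, then volume, then the general-purpose default.
def pick_model(private: bool, has_gpu: bool, context_tokens: int,
               high_volume: bool) -> str:
    if private and has_gpu:
        return "ollama/llama3"       # self-hosted, data stays local
    if context_tokens > 100_000:
        return "claude-3.x"          # or gemini-1.5-pro
    if high_volume:
        return "haiku-4.5"           # cheapest with solid performance
    return "claude-sonnet-4.6"       # default for complex/creative work
```

For example, `pick_model(private=False, has_gpu=False, context_tokens=1_000, high_volume=True)` returns `"haiku-4.5"`.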
## Real-world observation
In my article-generation pipelines, Haiku 4.5 scores ~96% of what Sonnet achieves, while Sonnet costs 3x as much. With good prompts and task decomposition, smaller models punch well above their weight.
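"Task decomposition" here means breaking one big prompt into small, well-specified steps a cheaper model handles reliably. A minimal sketch, where `call_llm` is a stand-in for whatever API client you actually use:

```python
# Decomposed article pipeline: outline first, then one small prompt per
# section, then stitch. `call_llm` is a placeholder stub, not a real client.
def call_llm(prompt: str) -> str:
    # Swap in your actual model call here.
    return f"[model output for: {prompt[:40]}]"

def write_article(topic: str, n_sections: int = 4) -> str:
    outline = call_llm(f"List {n_sections} section headings for an article on {topic}.")
    sections = [
        call_llm(f"Write one tight paragraph for the section '{heading}'.")
        for heading in outline.splitlines()[:n_sections]
    ]
    return "\n\n".join(sections)
```

Each step is short and concrete, which is exactly the regime where a small model's output quality is closest to a large model's.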