My Cheatsheets

Working with AI Chats

Updated Guide Sept 2025

Core Principles


Tokens & Context


Model Behavior

What’s Stable

What’s Changed


Confidence vs Accuracy


Choosing the Right Model (Sept 2025 Landscape)

Model                    | Max Tokens (Context Window)    | Output Limit           | Memory | Tools   | Notes / Caveats
GPT‑3.5                  | ~16k                           | ~4k–8k typical         | No     | Limited | Fast, shallow; context limited vs newer models
GPT‑4‑turbo              | ~128k                          | ~8k–16k typical        | Yes    | Yes     | Stable for deep context; widely used baseline
GPT‑4.1 family           | Up to ~1M (enterprise/API)     | ~32k output            | Yes    | Yes     | Highest OpenAI context, but availability varies by plan
GPT‑5                    | ~256k (standard)               | Unclear; likely ~32k   | Yes    | Yes     | Reported new limit; higher tiers may extend
GPT‑4o / 4o‑mini         | ~128k                          | Lower output in UI     | Yes    | Yes     | Fast, multimodal; many free plans still capped lower
Claude 4 (Opus/Sonnet)   | ~200k–1M depending on tier     | ~64k output            | Yes    | Varies  | Some docs cite 200k; Anthropic announced 1M for Sonnet 4; plan dependent
Gemini 2.5 Flash         | 1,048,576 input                | ~65k output            | Yes    | Yes     | Google‑documented; API only; Flash = speed optimized
Gemini 2.5 Pro           | ~1M (reported)                 | Not clearly specified  | Yes    | Yes     | Output caps unclear; varies by tier and preview vs GA
Qwen 2.5 (72B Instruct)  | ~32k                           | Not published          | Varies | Varies  | Numbers from model cards; enterprise variants may differ
Perplexity Sonar         | ~128k                          | Not published          | Varies | Yes     | Context limit reported; output cap undocumented
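A quick way to use the limits above in practice is a rough "does my prompt fit?" check. The sketch below is illustrative only: the model names and limits are approximations copied from the table, and the chars/4 heuristic is a common rule of thumb for English text, not a real tokenizer (use the provider's tokenizer, e.g. tiktoken for OpenAI models, when accuracy matters).

```python
# Rough check of whether a prompt fits a model's context window.
# Limits are approximate values from the table above; token counts
# use the ~4 characters/token heuristic for English prose.

CONTEXT_LIMITS = {
    "gpt-3.5": 16_000,
    "gpt-4-turbo": 128_000,
    "gpt-5": 256_000,
    "claude-4-sonnet": 200_000,
    "gemini-2.5-flash": 1_048_576,
}

def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits(model: str, prompt: str, reserve_output: int = 4_000) -> bool:
    """True if the prompt plus a reserved output budget fits the window."""
    limit = CONTEXT_LIMITS[model]
    return estimate_tokens(prompt) + reserve_output <= limit

print(fits("gpt-3.5", "hello " * 1_000))    # short prompt, fits
print(fits("gpt-3.5", "word " * 100_000))   # ~125k tokens, does not fit
```

Reserving an output budget matters because input and output typically share the same window; a prompt that "fits" with no headroom leaves the model nothing to answer with.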

Important caveats:


File & Data Handling


Math & Logic (Still Weak Spots)


Practical Tips


Memory & Personalization


Considerations


Appendix: Tokenization Nuances


Bottom Line