Token and context window
A token is the basic unit of text an AI model processes — roughly three-quarters of a word. A context window is how much text an AI can consider at once — how much it can "hold in its head" during a conversation. Early models had tiny windows (a few pages). Current models can handle hundreds of pages in a single conversation. This matters because the more data you can put in front of the AI — your data dictionary, your business rules, your operational history — the smarter its answers become. When someone says a model has a "1 million token context window," they mean it can reason over roughly 750,000 words at once. That is enough to hold your entire operational dataset description, every business rule, and still have room for conversation.
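The token-to-word arithmetic above can be sketched as two conversion helpers. This is only the rough rule of thumb from the text (1 token ≈ 3/4 of a word); real tokenizers vary by model and by language.

```python
# Rule of thumb from the text: one token is about 3/4 of a word,
# so words ≈ tokens * 0.75 and tokens ≈ words * 4/3.

def tokens_to_words(tokens: int) -> int:
    """Estimate how many words fit in a given token budget."""
    return round(tokens * 0.75)

def words_to_tokens(words: int) -> int:
    """Estimate how many tokens a given word count consumes."""
    return round(words * 4 / 3)

print(tokens_to_words(1_000_000))  # 750000 — a 1M-token window holds ~750K words
print(words_to_tokens(750_000))    # 1000000
```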
Go deeper
You want an AI agent to answer questions about your operations, so you need it to understand your business. Your company has a 40-page employee handbook, a 200-row metric dictionary, 15 standard operating procedures, and a year of weekly leadership meeting notes. That is roughly 300,000 tokens. If your AI's context window only holds 8,000 tokens, it can barely read one SOP at a time — it is working with blinders on. With a million-token window, it holds all of it simultaneously and can cross-reference your hiring policy with last month's meeting discussion about overtime costs.
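The document inventory above can be turned into a quick budget check. The per-document word counts below are illustrative estimates chosen to match the text's ~300,000-token total, not measurements of any real corpus.

```python
# Hypothetical document inventory from the example (word counts are assumptions).
documents = {
    "employee_handbook": 16_000,   # 40 pages at ~400 words/page
    "metric_dictionary": 4_000,    # 200 rows at ~20 words/row
    "sops": 30_000,                # 15 SOPs at ~2,000 words each
    "meeting_notes": 175_000,      # a year of weekly leadership notes
}

def estimated_tokens(word_counts: dict) -> int:
    # ~4/3 tokens per word, per the rule of thumb above
    return round(sum(word_counts.values()) * 4 / 3)

total = estimated_tokens(documents)
print(total)                # 300000 — matches the text's rough estimate
print(total <= 8_000)       # False: far too big for an 8K window
print(total <= 1_000_000)   # True: fits comfortably in a 1M-token window
```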
The trap is assuming a bigger context window automatically means better answers. Cramming everything into the window is like handing a new employee every document in the company on their first day and saying "read all of this before you answer my question." Retrieval strategies — pulling in only the relevant context for each question — often outperform brute-force stuffing, even with large windows. Window size is a ceiling, not a strategy.
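A minimal sketch of what "retrieval" means in practice: score each document chunk against the question and send the model only the best matches instead of everything. The chunks and the word-overlap scoring here are illustrative toys; production systems typically rank by embedding similarity, not shared words.

```python
# Toy retrieval: keep only the chunks most relevant to the question,
# rather than stuffing the whole document set into the context window.

def score(question: str, chunk: str) -> int:
    """Count words the question and chunk share (a crude relevance signal)."""
    return len(set(question.lower().split()) & set(chunk.lower().split()))

def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k highest-scoring chunks for this question."""
    ranked = sorted(chunks, key=lambda c: score(question, c), reverse=True)
    return ranked[:top_k]

chunks = [
    "Overtime costs rose 12 percent last quarter per the leadership notes.",
    "The hiring policy requires two interviews for every candidate.",
    "The parking garage closes at 10pm on weekdays.",
]
context = retrieve("why did overtime costs rise last quarter", chunks, top_k=1)
print(context[0])  # only the overtime chunk reaches the model
```

The point is the shape, not the scoring: the model sees one relevant paragraph instead of three documents, which is often both cheaper and more accurate.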
Questions to consider: How much context does our AI actually need for the questions we ask most often? Is our vendor using the context window efficiently, or are they burning tokens on irrelevant boilerplate every query? What is the cost difference between a 100K-token query and a 10K-token query — and does the answer quality justify it?
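The last question is answerable with back-of-the-envelope math. The price below ($3 per million input tokens) is a placeholder assumption for illustration, not any vendor's actual rate.

```python
# Illustrative cost comparison; the per-token price is an assumed placeholder.
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # USD, assumption for the example

def query_cost(tokens: int, queries_per_day: int = 1) -> float:
    """Daily input-token cost for queries of a given size."""
    return round(tokens * queries_per_day * PRICE_PER_MILLION_INPUT_TOKENS / 1_000_000, 4)

print(query_cost(100_000))       # one 100K-token query: $0.30
print(query_cost(10_000))        # one 10K-token query: $0.03, 10x cheaper
print(query_cost(100_000, 500))  # 500 heavy queries a day: $150.00/day
```

The tenfold per-query difference looks trivial until it is multiplied by query volume — which is exactly why the quality-versus-cost question is worth asking per question type.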