You are designing a document Q&A system. Your knowledge base contains 500 documents, each approximately 2,000 words long. You plan to retrieve the 3 most relevant documents and pass them to GPT-4o (128k-token context window) along with a system prompt (500 tokens) and the user question (100 tokens).
Answer the following:
- Approximately how many tokens is each 2,000-word document?
- How many tokens will the full prompt consume (system prompt + 3 retrieved docs + user question)?
- What percentage of the GPT-4o context window does this use?
- A product manager asks if you can increase retrieval to top 10 documents instead. What is your response and why?
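
The token arithmetic behind these questions can be sketched in a few lines of Python. This is a rough estimate, not a tokenizer: it assumes the common rule of thumb of about 0.75 English words per token, and the names `WORDS_PER_TOKEN`, `estimate_tokens`, and `prompt_budget` are hypothetical helpers introduced only for illustration. Real counts depend on the model's tokenizer and on the text itself.

```python
import math

# Assumption: roughly 0.75 English words per token (a rule of thumb, not an exact figure).
WORDS_PER_TOKEN = 0.75

# Values taken from the exercise statement.
CONTEXT_WINDOW = 128_000
SYSTEM_PROMPT_TOKENS = 500
QUESTION_TOKENS = 100
WORDS_PER_DOC = 2_000


def estimate_tokens(words: int) -> int:
    """Estimate the token count of a text from its word count, rounding up."""
    return math.ceil(words / WORDS_PER_TOKEN)


def prompt_budget(top_k: int) -> tuple[int, float]:
    """Return (estimated prompt tokens, percent of context window) for top_k retrieved docs."""
    doc_tokens = top_k * estimate_tokens(WORDS_PER_DOC)
    total = SYSTEM_PROMPT_TOKENS + QUESTION_TOKENS + doc_tokens
    return total, 100 * total / CONTEXT_WINDOW


if __name__ == "__main__":
    for k in (3, 10):
        total, pct = prompt_budget(k)
        print(f"top {k}: ~{total} tokens, ~{pct:.1f}% of the context window")
```

The sketch covers only the arithmetic. The top-10 question also calls for a judgment about trade-offs that the numbers alone do not settle.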