Dataset: LLM Token Usage in Everyday Office Tasks
This dataset provides realistic token usage estimates for 64 AI tasks common in enterprise office environments, grouped into six categories: Communication, Coding, Analysis, Planning, Document Processing, and Multimodal (Vision, Audio, Mixed).
About This Dataset
The dataset compares token consumption between standard models (like GPT-4o) and reasoning models (like OpenAI o1). Reasoning models spend additional hidden tokens to “think” through a problem step by step before generating a response, which typically yields more accurate and reliable results on complex tasks at the price of higher token usage.
This dataset was compiled from real-world use cases in our enterprise AI implementation projects, then validated and extended with data from the authoritative sources listed below. The token estimates reflect actual production workloads and can also be used to model hypothetical scenarios.
Key Insights:
- Simple tasks (e.g., “Hello World”) use ~300-500 additional tokens for reasoning
- Complex tasks (math, debugging, logic) benefit most: reasoning models use 10-20x more tokens but deliver significantly better accuracy
- Multimodal tasks (images, audio) have high base costs before any reasoning
- Audio is extremely token-dense: ~1,000-1,200 tokens per minute
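To make the arithmetic behind these insights concrete, the following minimal sketch computes the additional tokens and the output-token multiplier for three rows taken from the table in this dataset. The helper name `reasoning_overhead` is ours, chosen for illustration, and is not part of the dataset.

```python
# Output-token figures copied from three rows of the dataset table:
# (task, output tokens with a standard model, output tokens with a reasoning model)
ROWS = [
    ("Hello World Script", 20, 350),
    ("Debugging Code", 200, 4500),
    ("Math Word Problem", 50, 2500),
]


def reasoning_overhead(normal: int, reasoning: int) -> tuple[int, float]:
    """Return (additional tokens, multiplier) of a reasoning model over a standard one."""
    return reasoning - normal, reasoning / normal


for task, normal, reasoning in ROWS:
    extra, factor = reasoning_overhead(normal, reasoning)
    print(f"{task}: +{extra} tokens, {factor:.1f}x output tokens")
```

The multiplier here is computed on output tokens only; computing it on input plus output tokens gives lower ratios, because input tokens are the same for both model types.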
Token Usage Dataset
The complete dataset is available for download in CSV file format.
| Category | Task | Description | Input Tokens | Output Tokens (Normal) | Output Tokens (Reasoning) | Images | Audio (min) |
|---|---|---|---|---|---|---|---|
| Communication | Drafting a Short Email | Write a sick leave email to boss | 50 | 100 | 450 | 0 | 0 |
| Communication | Polite Rejection | Decline a wedding invitation politely | 60 | 120 | 500 | 0 | 0 |
| Communication | Rewrite for Tone | Make paragraph sound more professional | 150 | 150 | 800 | 0 | 0 |
| Communication | Cover Letter Generation | Write cover letter for Sales role | 200 | 400 | 1800 | 0 | 0 |
| Communication | Replying to a Text | Give 3 witty replies to text message | 40 | 60 | 500 | 0 | 0 |
| Communication | Grammar Check | Fix grammar in 200-word memo | 200 | 150 | 900 | 0 | 0 |
| Coding | Hello World Script | Write a Python Hello World script | 30 | 20 | 350 | 0 | 0 |
| Coding | Excel Formula Help | Formula to VLOOKUP column A in Sheet 2 | 50 | 50 | 1200 | 0 | 0 |
| Coding | Regex Generation | Regex to validate email address | 80 | 70 | 1500 | 0 | 0 |
| Coding | SQL Query Generation | Select top 5 users by spend from tables | 100 | 100 | 1800 | 0 | 0 |
| Coding | Debugging Code | Find error in 50-line Python function | 600 | 200 | 4500 | 0 | 0 |
| Coding | Code Refactoring | Rewrite code to be more efficient | 700 | 300 | 5000 | 0 | 0 |
| Coding | Explain Error Log | What does this stack trace mean? | 350 | 150 | 2500 | 0 | 0 |
| Analysis | Summarize Article | Summarize 1000-word article | 1400 | 200 | 3500 | 0 | 0 |
| Analysis | Extract Data | List all dates and names from text | 1000 | 200 | 2800 | 0 | 0 |
| Analysis | Math Word Problem | If train leaves Chicago at 60mph… | 100 | 50 | 2500 | 0 | 0 |
| Analysis | Logic Riddle Solving | Solve two doors two guards riddle | 120 | 80 | 2000 | 0 | 0 |
| Analysis | Financial Analysis | Analyze CSV rows for trends | 600 | 200 | 3500 | 0 | 0 |
| Analysis | Sentiment Analysis | Is customer review positive? | 80 | 70 | 600 | 0 | 0 |
| Planning | Meal Plan | Create healthy 3-day meal plan | 150 | 350 | 1500 | 0 | 0 |
| Planning | Trip Itinerary | Plan 3-day weekend in Tokyo | 200 | 600 | 2500 | 0 | 0 |
| Planning | Brainstorm Titles | 10 catchy titles for AI blog | 100 | 100 | 800 | 0 | 0 |
| Planning | Write a Haiku | Write haiku about the ocean | 30 | 30 | 400 | 0 | 0 |
| Planning | Gift Ideas | Gift ideas for dad who likes golf | 100 | 200 | 1000 | 0 | 0 |
| Planning | Roleplay Scenario | Pretend you are a career coach | 150 | 450 | 1500 | 0 | 0 |
| Document Processing | Extract Invoice Data | Extract vendor total date from 2-page invoice | 900 | 300 | 2800 | 0 | 0 |
| Document Processing | Summarize Contract | Summarize key terms from 10-page legal contract | 4000 | 500 | 9000 | 0 | 0 |
| Document Processing | Resume Screening | Extract relevant skills from 2-page resume | 1100 | 400 | 3200 | 0 | 0 |
| Document Processing | Translate Document | Translate 5-page Spanish document to English | 2300 | 700 | 6500 | 0 | 0 |
| Document Processing | Format Markdown | Convert 500-word Word doc to structured Markdown | 1200 | 600 | 4000 | 0 | 0 |
| Document Processing | Parse JSON Schema | Validate and fix malformed JSON document | 500 | 300 | 2200 | 0 | 0 |
| Document Processing | CSV to SQL | Convert 100-row CSV to INSERT statements | 1200 | 800 | 4500 | 0 | 0 |
| Document Processing | Extract Table Data | Extract and restructure table from PDF (500 rows) | 2800 | 700 | 7000 | 0 | 0 |
| Document Processing | Compare Versions | Identify changes between 2 versions of 5-page doc | 1700 | 500 | 5500 | 0 | 0 |
| Document Processing | Review Code PR | Review 200-line code pull request for bugs | 1300 | 500 | 4500 | 0 | 0 |
| Document Processing | Generate API Docs | Create documentation from 50-function source file | 1800 | 700 | 5500 | 0 | 0 |
| Multimodal (Vision) | Describe Image | Describe content of single photograph | 800 | 150 | 2200 | 1 | 0 |
| Multimodal (Vision) | OCR Document | Extract text from image of handwritten note | 850 | 200 | 2400 | 1 | 0 |
| Multimodal (Vision) | Analyze Chart | Interpret data trends from bar chart image | 950 | 350 | 3000 | 1 | 0 |
| Multimodal (Vision) | Screenshot Analysis | Debug UI from application screenshot | 900 | 350 | 3800 | 1 | 0 |
| Multimodal (Vision) | Identify Objects | List all objects in image of warehouse | 800 | 300 | 2800 | 1 | 0 |
| Multimodal (Vision) | Compare Images | Find differences between 2 product photos | 1600 | 600 | 4500 | 2 | 0 |
| Multimodal (Vision) | Read Whiteboard | Transcribe equation written on whiteboard photo | 800 | 250 | 2600 | 1 | 0 |
| Multimodal (Audio) | Transcribe Audio | Transcribe 5-minute audio interview | 5000 | 800 | 11000 | 0 | 5 |
| Multimodal (Audio) | Extract Meeting Notes | Generate summary and action items from 30-min meeting | 30000 | 1000 | 58000 | 0 | 30 |
| Multimodal (Audio) | Identify Speaker | Identify speaker and emotion in 2-min audio clip | 2000 | 300 | 4800 | 0 | 2 |
| Multimodal (Audio) | Translate Audio | Transcribe and translate 10-min German audio to English | 10000 | 1000 | 21000 | 0 | 10 |
| Multimodal (Mixed) | Document + Image | Match text document to related photos | 1500 | 1000 | 5500 | 2 | 0 |
| Multimodal (Mixed) | Video Description | Describe content from 2-min video (frames + audio) | 2300 | 2200 | 9500 | 3 | 2 |
| Multimodal (Mixed) | Multi-Image Comparison | Compare changes across 5 product design mockups | 4200 | 600 | 9500 | 5 | 0 |
| Document Processing | Summarize 50-page Technical Report | Summarize key findings from 50-page technical PDF without images | 20000 | 1200 | 26000 | 0 | 0 |
| Document Processing | Extract KPIs from 50-page Annual Report | Extract revenue profit and growth KPIs from 50-page annual report | 22000 | 1500 | 28000 | 0 | 0 |
| Document Processing | Summarize 100-page Regulatory Filing | Create executive summary of 100-page regulatory filing (10-K/10-Q) | 40000 | 2000 | 52000 | 0 | 0 |
| Document Processing | Compare Two 50-page Contracts | Identify differences and risks between two 50-page legal contracts | 38000 | 2500 | 60000 | 0 | 0 |
| Document Processing | Audit 5k-line Codebase File | Review a 5000-line single code file for bugs and architecture issues | 35000 | 3000 | 70000 | 0 | 0 |
| Multimodal (Vision) | Process 20-page Scanned PDF | OCR and structure 20-page scanned PDF (image-only) | 16000 | 2000 | 30000 | 20 | 0 |
| Multimodal (Mixed) | Analyze 50-page Report with Charts | Summarize 50-page PDF containing text plus 10 chart images | 23000 | 2000 | 32000 | 10 | 0 |
| Multimodal (Audio) | Transcribe 60-min Podcast | Full transcription of a 60-minute podcast episode | 60000 | 3000 | 75000 | 0 | 60 |
| Multimodal (Audio) | Summarize 90-min University Lecture | Generate structured notes and sections from a 90-minute lecture recording | 90000 | 4000 | 90000 | 0 | 90 |
| Multimodal (Audio) | Analyze 2-hour Support Call Log | Extract issues sentiments and escalation points from 2-hour support call | 120000 | 5000 | 110000 | 0 | 120 |
| Multimodal (Mixed) | Describe 10-min Product Demo Video | Summarize features and UX from 10-minute demo video (screen + narration) | 18000 | 3000 | 22000 | 10 | 10 |
| Multimodal (Mixed) | Summarize 45-min Webinar with Slides | Generate structured summary from 45-min webinar audio plus 30 slide images | 75000 | 4000 | 80000 | 30 | 45 |
| Multimodal (Mixed) | Review 60-min Security Camera Footage | Identify key events in 60-min silent security recording | 48000 | 2500 | 52000 | 40 | 0 |
Data Sources & Methodology
This dataset was compiled from the following authoritative sources:
General Tokenizer
- Tiktokenizer (OpenAI): Rule of thumb for English text tokenization: 1 word ≈ 1.3 tokens (1,000 tokens ≈ 750 words).
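The rule of thumb above can be turned into a quick estimator. This is a rough sketch only: it counts whitespace-separated words, and the function name `estimate_tokens` is an illustrative assumption. Exact counts require running the model's actual tokenizer.

```python
TOKENS_PER_WORD = 1.3  # rule of thumb quoted above; varies by language and tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate based on a whitespace word count."""
    word_count = len(text.split())
    return round(word_count * TOKENS_PER_WORD)


# A 1,000-word article comes out at roughly 1,300 tokens before any prompt text is added.
article = "word " * 1000
print(estimate_tokens(article))
```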
Reasoning Models
- OpenAI o1 System Card: Reasoning tokens are hidden output tokens used by the model to “think” before answering. They can range from hundreds to tens of thousands depending on task complexity.
- PromptLayer Analysis (o1 vs GPT-4o): Reasoning models often use 3-10x more tokens for complex tasks like coding or math due to internal chain-of-thought generation.
- Reddit Community Analysis (Hidden Tokens): User benchmarks showing that simple tasks might use ~300 hidden tokens, while complex coding tasks can exceed 5,000 hidden tokens.
- arXiv: Comparative Study on Reasoning Patterns: Comparative benchmarks showing reasoning models consuming 10-20x more tokens on complex logical tasks.
- Clarifai Reasoning Model Comparison: Benchmarks for hard math/logic problems showing reasoning token usage often exceeding 30,000 for difficult queries.
- Databricks: Long Context RAG & o1: Highlights that reasoning models can fail or hit output limits when reasoning over very large contexts (e.g., 100+ pages).
Vision Tasks
- OpenAI Vision Documentation: Images are processed in 512x512 tiles. High-detail mode costs ~85 tokens base + 170 tokens per tile. A standard 1080p image is often ~765-1105 tokens.
- Cursor IDE Blog (GPT-4o Image Costs): Practical breakdown of image costs: Low detail is fixed at 85 tokens. High detail scales with resolution.
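The tile formula above (85 base tokens plus 170 per 512x512 tile) can be sketched as follows. The resizing steps are an assumption based on our reading of the OpenAI vision documentation: fit the image within 2048x2048, then shrink it so the shortest side is at most 768 pixels. Not upscaling small images is also an assumption, and the function name `image_tokens` is illustrative.

```python
import math

BASE_TOKENS = 85       # fixed cost per image; also the full cost in low-detail mode
TOKENS_PER_TILE = 170  # cost per 512x512 tile in high-detail mode
TILE_SIZE = 512


def image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate image input tokens from pixel dimensions."""
    if detail == "low":
        return BASE_TOKENS

    # Assumed step 1: scale down to fit within a 2048x2048 square.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale

    # Assumed step 2: scale down so the shortest side is at most 768 px.
    scale = min(1.0, 768 / min(w, h))
    w, h = int(w * scale), int(h * scale)

    # Count the 512x512 tiles needed to cover the resized image.
    tiles = math.ceil(w / TILE_SIZE) * math.ceil(h / TILE_SIZE)
    return BASE_TOKENS + TOKENS_PER_TILE * tiles


print(image_tokens(1920, 1080))         # 1080p image, high detail
print(image_tokens(1024, 1024))         # square image, high detail
print(image_tokens(4000, 3000, "low"))  # low detail is a flat cost
```

Under these assumptions a 1080p image resizes to 1365x768, which needs 6 tiles and lands at the upper end of the ~765-1105 range quoted above, while a 1024x1024 image needs 4 tiles and lands at the lower end.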
Audio Tasks
- OpenAI Pricing (Audio): Audio inputs are billed separately from text. GPT-4o Audio input is ~$0.06/min (Realtime).
- Microsoft Azure AI Blog (Audio Tokens): Audio tokenization is dense: 1 minute of audio ≈ 1,000-1,200 audio tokens for billing purposes.
- OpenAI GPT-4o Audio Guide: Technical details on how audio is tokenized and processed, confirming the distinction between input audio tokens and output text tokens.
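As a small worked example of the per-minute figure above, the sketch below returns a low and high estimate of audio input tokens for a recording of a given length. The function name `audio_token_range` is an illustrative assumption. The audio rows in the table use the lower bound of 1,000 tokens per minute.

```python
AUDIO_TOKENS_PER_MINUTE = (1000, 1200)  # low and high estimates quoted above


def audio_token_range(minutes: float) -> tuple[int, int]:
    """Return (low, high) estimates of audio input tokens for a recording."""
    low, high = AUDIO_TOKENS_PER_MINUTE
    return round(minutes * low), round(minutes * high)


# A 30-minute meeting recording
print(audio_token_range(30))
```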
Document Processing
- Arxiv: Chain of Draft: Discusses token efficiency in reasoning models for drafting and document tasks, highlighting the overhead of “thinking” steps.
General Tasks
- Awesome LLM Tasks (GitHub): Curated list of practical LLM tasks used to derive the common daily task list categories.
Use Cases
- Cost Estimation: Calculate expected API costs for your AI applications
- Model Selection: Choose between standard and reasoning models based on task complexity
- Budgeting: Plan AI infrastructure costs for production workloads
- Research: Benchmark and compare token efficiency across different task types
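For the cost estimation use case, a minimal sketch might look like the following. The prices are placeholders chosen purely for illustration, not real vendor prices, and the names `Price` and `task_cost` are hypothetical. Reasoning tokens are treated as output tokens, in line with their description above as hidden output tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Price in USD per million tokens."""
    input_per_million: float
    output_per_million: float


def task_cost(input_tokens: int, output_tokens: int, price: Price) -> float:
    """Return the estimated cost in USD of a single task run."""
    total = (input_tokens * price.input_per_million
             + output_tokens * price.output_per_million)
    return total / 1_000_000


# HYPOTHETICAL prices; replace them with your provider's current price list.
standard = Price(input_per_million=2.0, output_per_million=8.0)
reasoning = Price(input_per_million=10.0, output_per_million=40.0)

# "Debugging Code" row: 600 input tokens, 200 output tokens (normal), 4500 (reasoning)
runs_per_day = 100
print(f"standard:  ${task_cost(600, 200, standard) * runs_per_day:.2f} per day")
print(f"reasoning: ${task_cost(600, 4500, reasoning) * runs_per_day:.2f} per day")
```

Multiplying the per-task cost by an expected number of runs per day, as in the last lines, gives a simple budgeting figure for one task type.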
Related Resources
Want to see how these tasks translate to real-world workloads? Check out our detailed analysis:
AI Costs by Office Role - We use this dataset to calculate typical daily token consumption for different business roles (Executive Assistant, Recruiter, Financial Analyst, Corporate Counsel, Software Engineer) and reveal what drives AI costs in your organization.
Citation
If you use this dataset in your research or applications, please cite:
onprem.ai Research (2025). Real-World LLM Token Usage Dataset.
Retrieved from https://onprem.ai/en/knowhow/llm-token-usage-dataset/
Last Updated: December 2025
Version: 1.0
License: Creative Commons BY 4.0