Dataset: LLM Token Usage in Everyday Office Tasks

This dataset provides realistic token usage estimates for 64 AI tasks common in enterprise office environments, grouped into six categories: Communication, Coding, Analysis, Planning, Document Processing, and Multimodal (Vision, Audio, Mixed).

About This Dataset

The dataset compares token consumption between standard models (like GPT-4o) and reasoning models (like OpenAI o1). Reasoning models spend additional hidden tokens to “think” through a problem step by step before generating a response, which tends to improve accuracy on complex tasks at the cost of higher token consumption.

This dataset has been compiled from real-world use cases in our enterprise AI implementation projects, then validated and extended with data from authoritative sources listed below. The token estimates reflect actual production workloads and can be used for modeling theoretical scenarios.

Key Insights:

  • Simple tasks (e.g., “Hello World”) use ~300-500 additional tokens for reasoning
  • Complex tasks (math, debugging, logic) benefit most: reasoning models use 10-20x more tokens but deliver significantly better accuracy
  • Multimodal tasks (images, audio) have high base costs before any reasoning
  • Audio is extremely token-dense: ~1,000-1,200 tokens per minute

Token Usage Dataset

The complete dataset is available for download in CSV file format.

Category | Task | Description | Input Tokens | Output Tokens (Normal) | Output Tokens (Reasoning) | Images | Audio (min) | Insight / Complexity Factor
Communication | Drafting a Short Email | Write a sick leave email to boss | 50 | 100 | 450 | 0 | 0 | Low complexity. Reasoning model overthinks simple politeness.
Communication | Polite Rejection | Decline a wedding invitation politely | 60 | 120 | 500 | 0 | 0 | Social nuance requires minimal reasoning overhead (~300 hidden tokens).
Communication | Rewrite for Tone | Make paragraph sound more professional | 150 | 150 | 800 | 0 | 0 | Reasoning model checks multiple variations internally.
Communication | Cover Letter Generation | Write cover letter for Sales role | 200 | 400 | 1800 | 0 | 0 | Reasoning model plans structure and maps skills to job requirements.
Communication | Replying to a Text | Give 3 witty replies to text message | 40 | 60 | 500 | 0 | 0 | Creativity task; reasoning model brainstorms options before selecting.
Communication | Grammar Check | Fix grammar in 200-word memo | 200 | 150 | 900 | 0 | 0 | Standard model is sufficient; reasoning adds unnecessary verification.
Coding | Hello World Script | Write a Python Hello World script | 30 | 20 | 350 | 0 | 0 | Even simple tasks use ~300 reasoning tokens for quality verification.
Coding | Excel Formula Help | Formula to VLOOKUP column A in Sheet 2 | 50 | 50 | 1200 | 0 | 0 | Reasoning model verifies syntax and edge cases (e.g., exact match).
Coding | Regex Generation | Regex to validate email address | 80 | 70 | 1500 | 0 | 0 | High reasoning gain; regex is error-prone, model self-corrects heavily.
Coding | SQL Query Generation | Select top 5 users by spend from tables | 100 | 100 | 1800 | 0 | 0 | Reasoning model plans joins and filtering logic carefully.
Coding | Debugging Code | Find error in 50-line Python function | 600 | 200 | 4500 | 0 | 0 | Major reasoning win. Model ‘traces’ execution mentally to find bugs.
Coding | Code Refactoring | Rewrite code to be more efficient | 700 | 300 | 5000 | 0 | 0 | Complex planning required to preserve logic while changing structure.
Coding | Explain Error Log | What does this stack trace mean? | 350 | 150 | 2500 | 0 | 0 | Reasoning model analyzes causal chain of the error.
Analysis | Summarize Article | Summarize 1000-word article | 1400 | 200 | 3500 | 0 | 0 | Reading cost is dominant. Reasoning adds synthesis overhead.
Analysis | Extract Data | List all dates and names from text | 1000 | 200 | 2800 | 0 | 0 | Reasoning model double-checks missed items (higher recall).
Analysis | Math Word Problem | If train leaves Chicago at 60mph… | 100 | 50 | 2500 | 0 | 0 | Massive multiplier (15x+). Standard models guess; reasoning models calculate.
Analysis | Logic Riddle Solving | Solve two doors two guards riddle | 120 | 80 | 2000 | 0 | 0 | Pure logic task. Reasoning model simulates scenarios.
Analysis | Financial Analysis | Analyze CSV rows for trends | 600 | 200 | 3500 | 0 | 0 | Reasoning model performs multi-step numerical comparison.
Analysis | Sentiment Analysis | Is customer review positive? | 80 | 70 | 600 | 0 | 0 | Low complexity. Standard model is usually sufficient.
Planning | Meal Plan | Create healthy 3-day meal plan | 150 | 350 | 1500 | 0 | 0 | Reasoning model balances constraints (nutrition, variety) better.
Planning | Trip Itinerary | Plan 3-day weekend in Tokyo | 200 | 600 | 2500 | 0 | 0 | Reasoning model checks logistics and travel times between spots.
Planning | Brainstorm Titles | 10 catchy titles for AI blog | 100 | 100 | 800 | 0 | 0 | Creative task. Reasoning overhead is mostly ‘filtering’ bad ideas.
Planning | Write a Haiku | Write haiku about the ocean | 30 | 30 | 400 | 0 | 0 | Syllable counting requires ‘thinking’ steps for accuracy.
Planning | Gift Ideas | Gift ideas for dad who likes golf | 100 | 200 | 1000 | 0 | 0 | Reasoning model models the ‘persona’ of the recipient.
Planning | Roleplay Scenario | Pretend you are a career coach | 150 | 450 | 1500 | 0 | 0 | Maintains character consistency via hidden state.
Document Processing | Extract Invoice Data | Extract vendor total date from 2-page invoice | 900 | 300 | 2800 | 0 | 0 | High layout complexity. Reasoning model traces field locations.
Document Processing | Summarize Contract | Summarize key terms from 10-page legal contract | 4000 | 500 | 9000 | 0 | 0 | Heavy reading load. Reasoning critical for interpreting legal clauses.
Document Processing | Resume Screening | Extract relevant skills from 2-page resume | 1100 | 400 | 3200 | 0 | 0 | Reasoning model infers implicit skills from experience descriptions.
Document Processing | Translate Document | Translate 5-page Spanish document to English | 2300 | 700 | 6500 | 0 | 0 | Reasoning model preserves nuance and idioms better than direct translation.
Document Processing | Format Markdown | Convert 500-word Word doc to structured Markdown | 1200 | 600 | 4000 | 0 | 0 | Formatting is tedious; reasoning model checks consistency.
Document Processing | Parse JSON Schema | Validate and fix malformed JSON document | 500 | 300 | 2200 | 0 | 0 | Reasoning model traces brace matching and structure errors.
Document Processing | CSV to SQL | Convert 100-row CSV to INSERT statements | 1200 | 800 | 4500 | 0 | 0 | Repetitive task. Reasoning model ensures data type correctness.
Document Processing | Extract Table Data | Extract and restructure table from PDF (500 rows) | 2800 | 700 | 7000 | 0 | 0 | High token usage due to dense data. Reasoning helps with row alignment.
Document Processing | Compare Versions | Identify changes between 2 versions of 5-page doc | 1700 | 500 | 5500 | 0 | 0 | Reasoning model performs semantic diffing, not just text diffing.
Document Processing | Review Code PR | Review 200-line code pull request for bugs | 1300 | 500 | 4500 | 0 | 0 | Reasoning model simulates runtime to find subtle logic bugs.
Document Processing | Generate API Docs | Create documentation from 50-function source file | 1800 | 700 | 5500 | 0 | 0 | Reasoning model infers function purpose from code logic.
Multimodal (Vision) | Describe Image | Describe content of single photograph | 800 | 150 | 2200 | 1 | 0 | Tokens = Image patches (765+) + Output. Reasoning adds visual analysis.
Multimodal (Vision) | OCR Document | Extract text from image of handwritten note | 850 | 200 | 2400 | 1 | 0 | Handwriting recognition requires ‘guessing’ and verifying context.
Multimodal (Vision) | Analyze Chart | Interpret data trends from bar chart image | 950 | 350 | 3000 | 1 | 0 | Reasoning model maps visual bars to approximate numerical values.
Multimodal (Vision) | Screenshot Analysis | Debug UI from application screenshot | 900 | 350 | 3800 | 1 | 0 | High reasoning: model must correlate UI elements with code logic.
Multimodal (Vision) | Identify Objects | List all objects in image of warehouse | 800 | 300 | 2800 | 1 | 0 | Scanning task. Reasoning model performs systematic grid search mentally.
Multimodal (Vision) | Compare Images | Find differences between 2 product photos | 1600 | 600 | 4500 | 2 | 0 | Double image cost (~1700+ tokens). Reasoning compares feature by feature.
Multimodal (Vision) | Read Whiteboard | Transcribe equation written on whiteboard photo | 800 | 250 | 2600 | 1 | 0 | Math + Vision. Reasoning model validates the math syntax extracted.
Multimodal (Audio) | Transcribe Audio | Transcribe 5-minute audio interview | 5000 | 800 | 11000 | 0 | 5 | Includes ~5k billed audio tokens. High base cost.
Multimodal (Audio) | Extract Meeting Notes | Generate summary and action items from 30-min meeting | 30000 | 1000 | 58000 | 0 | 30 | ~30k audio tokens! Extremely expensive task due to audio density.
Multimodal (Audio) | Identify Speaker | Identify speaker and emotion in 2-min audio clip | 2000 | 300 | 4800 | 0 | 2 | Audio analysis adds cost. Reasoning infers emotion from tone.
Multimodal (Audio) | Translate Audio | Transcribe and translate 10-min German audio to English | 10000 | 1000 | 21000 | 0 | 10 | Double load: Audio tokens (10k) + Translation reasoning.
Multimodal (Mixed) | Document + Image | Match text document to related photos | 1500 | 1000 | 5500 | 2 | 0 | Cross-modal reasoning (Text vs Image) is token-heavy.
Multimodal (Mixed) | Video Description | Describe content from 2-min video (frames + audio) | 2300 | 2200 | 9500 | 3 | 2 | Video = Audio tokens + Sampled Image Frames. Very high data density.
Multimodal (Mixed) | Multi-Image Comparison | Compare changes across 5 product design mockups | 4200 | 600 | 9500 | 5 | 0 | 5 images @ ~900 tokens each. High base cost before any reasoning.
Document Processing | Summarize 50-page Technical Report | Summarize key findings from 50-page technical PDF without images | 20000 | 1200 | 26000 | 0 | 0 | Very heavy reading. Reasoning model builds global mental map of the report.
Document Processing | Extract KPIs from 50-page Annual Report | Extract revenue profit and growth KPIs from 50-page annual report | 22000 | 1500 | 28000 | 0 | 0 | Reasoning model scans tables and narrative to consolidate financial metrics.
Document Processing | Summarize 100-page Regulatory Filing | Create executive summary of 100-page regulatory filing (10-K/10-Q) | 40000 | 2000 | 52000 | 0 | 0 | Extreme reading load. Reasoning ensures compliance-critical points are retained.
Document Processing | Compare Two 50-page Contracts | Identify differences and risks between two 50-page legal contracts | 38000 | 2500 | 60000 | 0 | 0 | Semantic diff across 100 pages. Reasoning model aligns clauses and flags conflicts.
Document Processing | Audit 5k-line Codebase File | Review a 5000-line single code file for bugs and architecture issues | 35000 | 3000 | 70000 | 0 | 0 | Reasoning model must track long-range dependencies and patterns across thousands of lines.
Multimodal (Vision) | Process 20-page Scanned PDF | OCR and structure 20-page scanned PDF (image-only) | 16000 | 2000 | 30000 | 20 | 0 | 20 page images (~800 tokens each). Reasoning aligns detected text into pages and sections.
Multimodal (Mixed) | Analyze 50-page Report with Charts | Summarize 50-page PDF containing text plus 10 chart images | 23000 | 2000 | 32000 | 10 | 0 | Combination of long text and visual charts; reasoning fuses numeric insight with narrative.
Multimodal (Audio) | Transcribe 60-min Podcast | Full transcription of a 60-minute podcast episode | 60000 | 3000 | 75000 | 0 | 60 | ~60k audio tokens. Reasoning may cluster topics or speakers if requested.
Multimodal (Audio) | Summarize 90-min University Lecture | Generate structured notes and sections from a 90-minute lecture recording | 90000 | 4000 | 90000 | 0 | 90 | Massive audio load; reasoning organizes into topics subtopics and key definitions.
Multimodal (Audio) | Analyze 2-hour Support Call Log | Extract issues sentiments and escalation points from 2-hour support call | 120000 | 5000 | 110000 | 0 | 120 | 120k audio tokens plus reasoning to classify intents and sentiment over long horizon.
Multimodal (Mixed) | Describe 10-min Product Demo Video | Summarize features and UX from 10-minute demo video (screen + narration) | 18000 | 3000 | 22000 | 10 | 10 | Combines ~10k audio tokens with ~8-10 key frames; reasoning links UI steps with spoken explanations.
Multimodal (Mixed) | Summarize 45-min Webinar with Slides | Generate structured summary from 45-min webinar audio plus 30 slide images | 75000 | 4000 | 80000 | 30 | 45 | Slide deck (30 images) plus long audio track; reasoning aligns slide content with spoken narrative.
Multimodal (Mixed) | Review 60-min Security Camera Footage | Identify key events in 60-min silent security recording | 48000 | 2500 | 52000 | 40 | 0 | Dozens of sampled frames; reasoning tracks motion and anomalies across time.
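To make the comparison between the two output columns concrete, the following Python sketch parses a few rows and computes how many times more output tokens the reasoning model produces. The three sample rows are copied from the table above; the header names and layout of the downloadable CSV file are an assumption, and load_rows and reasoning_multiplier are hypothetical helper names, not part of any published tooling.

```python
import csv
import io

# Three rows copied from the table above. The real CSV file's exact
# header names and column order are an assumption of this sketch.
SAMPLE_CSV = """Category,Task,Input Tokens,Output Tokens (Normal),Output Tokens (Reasoning)
Communication,Drafting a Short Email,50,100,450
Coding,Debugging Code,600,200,4500
Document Processing,Summarize Contract,4000,500,9000
"""


def load_rows(csv_text):
    """Parse CSV text into dicts with integer token counts."""
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "category": rec["Category"],
            "task": rec["Task"],
            "input": int(rec["Input Tokens"]),
            "normal": int(rec["Output Tokens (Normal)"]),
            "reasoning": int(rec["Output Tokens (Reasoning)"]),
        })
    return rows


def reasoning_multiplier(row):
    """Ratio of reasoning-model output tokens to standard-model output tokens."""
    return row["reasoning"] / row["normal"]


rows = load_rows(SAMPLE_CSV)
for row in rows:
    extra = row["reasoning"] - row["normal"]
    print(f'{row["task"]}: +{extra} tokens, {reasoning_multiplier(row):.1f}x')
```

The multiplier here is computed on output tokens only. For tasks with large inputs (long documents, audio), the input column can dominate the total, so a ratio on total tokens would look much smaller.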

Data Sources & Methodology

This dataset was compiled from the following authoritative sources:

Source Category | Source Name | URL | Key Insight / Data Point
General Tokenizer | Tiktokenizer (OpenAI) | https://platform.openai.com/tokenizer | Standard text tokenization rule: 1 word ≈ 1.3 tokens (1000 tokens ≈ 750 words).
Reasoning Models | OpenAI o1 System Card | https://openai.com/index/learning-to-reason-with-llms/ | Reasoning tokens are hidden output tokens used by the model to “think” before answering. Can range from hundreds to tens of thousands depending on complexity.
Reasoning Models | PromptLayer Analysis (o1 vs GPT-4o) | https://blog.promptlayer.com/an-analysis-of-openai-models-o1-vs-gpt-4o/ | Reasoning models often use 3-10x more tokens for complex tasks like coding or math due to internal chain-of-thought generation.
Reasoning Models | Reddit Community Analysis (Hidden Tokens) | https://www.reddit.com/r/OpenAI/comments/1hrhdbp/o1_models_hidden_reasoning_tokens/ | User benchmarks showing simple tasks might use ~300 hidden tokens, while complex coding tasks can exceed 5,000+ hidden tokens.
Reasoning Models | Arxiv: Comparative Study on Reasoning Patterns | https://arxiv.org/html/2410.13639v1 | Comparative benchmarks showing reasoning models consuming 10x-20x more tokens on complex logical tasks.
Reasoning Models | Clarifai Reasoning Model Comparison | https://www.clarifai.com/blog/best-reasoning-model-apis/ | Benchmarks for hard math/logic problems showing reasoning token usage often exceeding 30,000+ for difficult queries.
Reasoning Models | Databricks: Long Context RAG & o1 | https://www.databricks.com/blog/long-context-rag-capabilities-openai-o1-and-google-gemini | Highlights that reasoning models can fail or hit output limits when reasoning over very large contexts (e.g., 100+ pages).
Vision Tasks | OpenAI Vision Documentation | https://platform.openai.com/docs/guides/images-vision | Images are processed in 512x512 tiles. High-detail mode costs ~85 tokens base + 170 tokens per tile. A standard 1080p image is often ~765-1105 tokens.
Vision Tasks | Cursor IDE Blog (GPT-4o Image Costs) | https://www.cursor-ide.com/blog/gpt4o-image-api-pricing-guide-2025 | Practical breakdown of image costs: Low detail is fixed at 85 tokens. High detail scales with resolution.
Audio Tasks | OpenAI Pricing (Audio) | https://openai.com/api/pricing/ | Audio inputs are billed separately from text. GPT-4o Audio input is ~$0.06/min (Realtime).
Audio Tasks | Microsoft Azure AI Blog (Audio Tokens) | https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/real-time-speech-intelligence-for-global-scale-gpt-4o-transcribe-/4403091 | Audio tokenization is dense. Approximately 1 minute of audio ≈ 1,000 - 1,200 audio tokens for billing purposes.
Audio Tasks | OpenAI GPT-4o Audio Guide | https://platform.openai.com/docs/guides/audio | Technical details on how audio is tokenized and processed, confirming the distinction between input audio tokens and output text tokens.
Document Processing | Arxiv: Chain of Draft | https://arxiv.org/html/2502.18600v2 | Discusses token efficiency in reasoning models for drafting and document tasks, highlighting the overhead of “thinking” steps.
General Tasks | Awesome LLM Tasks (GitHub) | https://github.com/ozbekburak/awesome-llm-tasks | Curated list of practical LLM tasks used to derive the common daily task list categories.
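The rules of thumb quoted in the sources above can be combined into a rough input-token estimator. The Python sketch below is an illustration under stated assumptions, not the method used to build the dataset: the function names are hypothetical, the 1.3 tokens-per-word and 1,000 tokens-per-minute defaults are the approximations listed above, and the two image resizing steps (fit within 2048x2048, then shrink the shortest side to 768) reflect our understanding of how OpenAI documents high-detail image processing for GPT-4o-class models.

```python
import math


def text_tokens(words, tokens_per_word=1.3):
    """Approximate text tokens from a word count (1 word ~ 1.3 tokens)."""
    return round(words * tokens_per_word)


def audio_tokens(minutes, tokens_per_minute=1000):
    """Approximate audio input tokens (~1,000-1,200 per minute)."""
    return round(minutes * tokens_per_minute)


def image_tokens(width, height, detail="high"):
    """Approximate image input tokens from pixel dimensions."""
    if detail == "low":
        return 85  # low detail is a fixed cost
    # Assumed step 1: shrink to fit within a 2048x2048 square.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # Assumed step 2: shrink so the shortest side is at most 768 px.
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    # 85 base tokens plus 170 tokens per 512x512 tile.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles


print(text_tokens(1000))         # 1000-word article
print(audio_tokens(30))          # 30-minute meeting
print(image_tokens(1920, 1080))  # 1080p screenshot
```

Under these assumptions a 1080p image resolves to six tiles and a square 1024x1024 image to four, which lines up with the ~765-1105 token range quoted for the vision documentation.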

Use Cases

  • Cost Estimation: Calculate expected API costs for your AI applications
  • Model Selection: Choose between standard and reasoning models based on task complexity
  • Budgeting: Plan AI infrastructure costs for production workloads
  • Research: Benchmark and compare token efficiency across different task types
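As a minimal sketch of the cost estimation and model selection use cases, the Python snippet below converts one row of the dataset into a per-request cost for each model type. The per-million-token prices are placeholders chosen for illustration only, not actual vendor prices, and request_cost is a hypothetical helper. The sketch also assumes that hidden reasoning tokens are billed as output tokens, in line with the description of reasoning tokens as hidden output tokens.

```python
def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in USD for one request, given prices per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# "Debugging Code" row from the dataset: 600 input tokens,
# 200 output tokens (normal), 4500 output tokens (reasoning).
INPUT_TOKENS = 600
OUTPUT_NORMAL = 200
OUTPUT_REASONING = 4500

# Placeholder prices in USD per 1M tokens (assumptions, not real price lists).
STANDARD_PRICES = {"input_price": 2.50, "output_price": 10.00}
REASONING_PRICES = {"input_price": 15.00, "output_price": 60.00}

standard = request_cost(INPUT_TOKENS, OUTPUT_NORMAL, **STANDARD_PRICES)
reasoning = request_cost(INPUT_TOKENS, OUTPUT_REASONING, **REASONING_PRICES)

print(f"standard:  ${standard:.4f} per request")
print(f"reasoning: ${reasoning:.4f} per request")
print(f"ratio:     {reasoning / standard:.0f}x")
```

Multiplying the per-request figure by the expected number of requests per day gives a first-order budget. Swapping in current prices from your provider is required before using the numbers for planning.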

Want to see how these tasks translate to real-world workloads? Check out our detailed analysis:

AI Costs by Office Role - We use this dataset to calculate typical daily token consumption for different business roles (Executive Assistant, Recruiter, Financial Analyst, Corporate Counsel, Software Engineer) and reveal what drives AI costs in your organization.


Citation

If you use this dataset in your research or applications, please cite:

onprem.ai Research (2025). Real-World LLM Token Usage Dataset.
Retrieved from https://onprem.ai/knowhow/llm-token-usage-dataset

Last Updated: December 2025
Version: 1.0
License: Creative Commons BY 4.0