Context Engineering
Wiki
37 articles from arXiv, OpenAI, Anthropic, Google AI, and built-in terms. Auto-fetched and searchable.
How to work with large language models
[Large language models][Large language models Blog Post] are functions that map text to text. Given an input string of text, a large language model predicts the text that should come next.
Techniques to improve reliability
When GPT-3 fails on a task, what should you do?
Related resources from around the web
People are writing great tools and papers for improving outputs from GPT. Here are some cool ones we've seen:
How_to_count_tokens_with_tiktoken
{ "cells": { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": " How to count tokens with tiktoken\n", "\n", " tiktoken ...
How_to_stream_completions
{ "cells": { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": " How to stream completions\n", "\n", "By default, when you request a completion...
Prompt Caching
Claude API Documentation
Prompt Engineering Overview
Claude API Documentation
Chain of Thought Prompting
Comprehensive guide to prompt engineering techniques for Claude's latest models, covering clarity, examples, XML structuring, thinking, and agentic systems.
Context Windows
Claude API Documentation
Long Context Window Tips
Comprehensive guide to prompt engineering techniques for Claude's latest models, covering clarity, examples, XML structuring, thinking, and agentic systems.
Token Counting
Claude API Documentation
Use XML Tags in Prompts
Comprehensive guide to prompt engineering techniques for Claude's latest models, covering clarity, examples, XML structuring, thinking, and agentic systems.
Extended Thinking
Claude API Documentation
Context Caching
Saiba como usar o armazenamento em cache de contexto na API Gemini
Long Context
Learn about how to get started building with long context (1 million context window) on Gemini.
Tokens
/ Styles inlined from /site assets/css/style.css / body theme="googledevai theme" { devsite background 0: var devsite background 1 ; devsite button border: 1px solid 747775; devsite...
Prompting Strategies
/ Styles inlined from /site assets/css/style.css / body theme="googledevai theme" { devsite background 0: var devsite background 1 ; devsite button border: 1px solid 747775; devsite...
System Instructions
Gemini API ile sohbet ve metin oluşturma uygulamaları geliştirmeye başlayın
Code Execution
Learn how to use the Gemini API code execution feature.
Progressive Disclosure
Instead of loading an entire codebase—which would immediately overwhelm the attention budget—modern agents use JIT context. The assistant dynamically loads only the necessary data at runtime.
Lightweight Identifiers
The assistant maintains references (file paths, stored queries) and dynamically loads only the necessary data at runtime using tools like grep, head, or tail.
Compaction
When a session nears its token limit, the assistant summarizes critical details—such as architectural decisions and unresolved bugs—while discarding redundant tool outputs.
Tool Result Clearing
A light touch form of compaction where the raw results of previous tool calls (like long terminal outputs) are cleared to save space.
Structured Note-taking
The agent may maintain an external NOTES.md or a to-do list to track dependencies and progress across thousands of steps, which it can read back into its context after a reset.
Distractors
Files or code snippets that are topically related to the query but do not contain the answer can cause the model to lose focus or hallucinate.
Context Rot
As more tokens are added, the model's ability to accurately retrieve needles of information from the haystack of the codebase decreases.
XML Tagging
Use tags like <background_information>, <tool_guidance>, <constraints> to clearly separate different types of instructions in system prompts.
High-Signal Tokens
The objective is to provide the smallest possible set of high-signal tokens that maximize the likelihood of the correct code generation.
Structural Patterns
Research suggests that models often perform better on shuffled or unstructured context than on logically structured haystacks, impacting how they process long files.
Agent Skills
Reusable packages of domain expertise defined in SKILL.md files that provide specialized AI agent capabilities. Introduced as GA in VS Code 1.109, skills can be invoked as slash commands or loaded...
Agent Hooks
Deterministic shell commands that execute at key lifecycle points during agent sessions. Unlike instructions, hooks run code with guaranteed outcomes for security policies, quality checks, or audit...
Agent Orchestration
A multi-agent pattern where specialized subagents collaborate on complex tasks, each operating in its own dedicated context window. Provides context efficiency, specialization with different models,...
Message Steering
An agent interaction pattern where follow-up messages redirect a running agent request. The agent yields after the active tool execution and processes the new message. Alternatives include request...
Terminal Sandboxing
A security mechanism restricting file system and network access for agent-executed terminal commands. Sandboxed commands have read/write access only to the workspace directory, and network access can...
Thinking Tokens
Tokens generated during a model's internal reasoning process before producing a visible response. Thinking tokens consume context budget but improve quality on complex tasks. Anthropic models support...
MCP Server (Model Context Protocol)
A local stdio process that exposes tools to Claude Code and other MCP-capable agents. Tokalator's MCP server (tokalator-mcp) provides four tools: count_tokens, estimate_budget, preview_turn, and...
CLI Token Counter
A standalone command-line tool for counting tokens and checking context budgets outside of VS Code. Tokalator ships a CLI binary (tokalator count, budget, preview, models) for SSH sessions, CI...