Conversation

emre-openai
Contributor

Summary

New cookbook: Context Engineering - Short-Term Memory Management with Sessions from OpenAI Agents SDK
This guide focuses on two proven context management techniques—trimming and compression—to keep agents fast, reliable, and cost-efficient.

Real-World Scenario

We’ll ground the techniques in a practical example of a common long-running task:

  • Multi-turn Customer Service Conversations
    In extended conversations about tech products—spanning both hardware and software—customers often surface multiple issues over time. The agent must stay consistent and goal-focused while retaining only the essentials rather than hauling along every past detail.

Techniques Covered

To address these challenges, we introduce two concrete approaches using OpenAI Agents SDK:

  1. Trimming Messages – dropping older turns while keeping the last N turns.
  2. Summarizing Messages – compressing prior exchanges into structured, shorter representations.

Motivation

This notebook is our first resource on agent memory. Many of our customers are asking for resources that address agent memory challenges, so this will be a highly valuable asset for customers who are building AI agents and working on context engineering.


For new content

When contributing new content, read through our contribution guidelines, and mark the following action items as completed:

  • I have added a new entry in registry.yaml (and, optionally, in authors.yaml) so that my content renders on the cookbook website.
  • I have conducted a self-review of my content based on the contribution guidelines:
    • Relevance: This content is related to building with OpenAI technologies and is useful to others.
    • Uniqueness: I have searched for related examples in the OpenAI Cookbook, and verified that my content offers new insights or unique information compared to existing documentation.
    • Spelling and Grammar: I have checked for spelling or grammatical mistakes.
    • Clarity: I have done a final read-through and verified that my submission is well-organized and easy to understand.
    • Correctness: The information I include is correct and all of my code executes successfully.
    • Completeness: I have explained everything fully, including all necessary references and citations.

We will rate each of these areas on a scale from 1 to 4, and will only accept contributions that score 3 or higher on all areas. Refer to our contribution guidelines for more details.

@msingh-openai
Contributor

A few recommendations:

  1. In the opening "If too much is carried forward, the model risks distraction, inefficiency, or outright failure. If too little is preserved, the agent loses coherence...", it would be good to mention the context window size.
  2. In Step 0, also mention setting up OPENAI_API_KEY.
  3. Step 1: "Below we install the openai-agents library (the OpenAI Agents SDK ..." is missing a closing parenthesis.
  4. It's not clear from the cookbook whether context trimming and context summarization are two different techniques or two steps in the same technique. In other words, can you use context trimming on its own, or do you always have to use the two together?
  • If they are two separate techniques then you should highlight what situation they are applicable (pros/cons)
  • If they are two steps in the same technique then update the cookbook to indicate this.

@emre-openai
Contributor Author

  1. I added a context definition and the current value for GPT-5.

Here, context refers to the total window of tokens (input + output) that the model can attend to at once. For GPT-5, this capacity is up to 272k input tokens and 128k output tokens, but even such a large window can be overwhelmed by uncurated histories, redundant tool results, or noisy retrievals. This makes context management not just an optimization, but a necessity.

  2. Added API key instructions.

Before running the workflow, set your environment variables:

# Your OpenAI API key
import os
os.environ["OPENAI_API_KEY"] = "sk-proj-..."

Alternatively, you can set your OpenAI API key for use by the agents via the set_default_openai_key function from the agents library.

from agents import set_default_openai_key
set_default_openai_key("YOUR_API_KEY")
  3. Fixed the typo.

  4. They are two separate techniques. I added pros and cons for each, along with a "best when" section, to make each technique easier to understand and to help readers decide when to use it.

Real-World Scenario

We’ll ground the techniques in a practical example of a common long-running task:

  • Multi-turn Customer Service Conversations
    In extended conversations about tech products—spanning both hardware and software—customers often surface multiple issues over time. The agent must stay consistent and goal-focused while retaining only the essentials rather than hauling along every past detail.

Techniques Covered

To address these challenges, we introduce two separate, concrete approaches using the OpenAI Agents SDK:

  • Context Trimming – dropping older turns while keeping the last N turns.

    • Pros

      • Deterministic & simple: No summarizer variability; easy to reason about state and to reproduce runs.
      • Zero added latency: No extra model calls to compress history.
      • Fidelity for recent work: Latest tool results, parameters, and edge cases stay verbatim—great for debugging.
      • Lower risk of “summary drift”: You never reinterpret or compress facts.

      Cons

      • Forgets long-range context abruptly: Important earlier constraints, IDs, or decisions can vanish once they scroll past N.
      • User experience “amnesia”: Agent can appear to “forget” promises or prior preferences midway through long sessions.
      • Wasted signal: Older turns may contain reusable knowledge (requirements, constraints) that gets dropped.
      • Token spikes still possible: If a recent turn includes huge tool payloads, your last-N can still blow up the context.
    • Best when

      • Tasks in the conversation are independent of each other, with non-overlapping context that does not require carrying earlier details forward.
      • You need predictability, easy evals, and low latency (ops automations, CRM/API actions).
      • The conversation’s useful context is local (recent steps matter far more than distant history).
  • Context Summarization – compressing prior messages (assistant, user, tool, etc.) into structured, shorter summaries injected into the conversation history.

    • Pros

      • Retains long-range memory compactly: Past requirements, decisions, and rationales persist beyond N.
      • Smoother UX: Agent “remembers” commitments and constraints across long sessions.
      • Cost-controlled scale: One concise summary can replace hundreds of turns.
      • Searchable anchor: A single synthetic assistant message becomes a stable “state of the world so far.”

      Cons

      • Summarization loss & bias: Details can be dropped or misweighted; subtle constraints may vanish.
      • Latency & cost spikes: Each refresh adds model work (and potentially tool-trim logic).
      • Compounding errors: If a bad fact enters the summary, it can poison future behavior (“context poisoning”).
      • Observability complexity: You must log summary prompts/outputs for auditability and evals.
    • Best when

      • You have use cases where tasks need context collected across the flow, such as planning/coaching, RAG-heavy analysis, or policy Q&A.
      • You need continuity over long horizons and carry the important details further to solve related tasks.
      • Sessions exceed N turns but must preserve decisions, IDs, and constraints reliably.
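To make the trimming approach concrete, here is a minimal sketch (this is illustrative only, not the cookbook's actual implementation; the function name and the definition of a "turn" as starting at each user message are assumptions):

```python
# Illustrative last-N-turn trimming (assumption: a "turn" begins at each
# user message and includes all following messages until the next user turn).
def trim_to_last_n_turns(messages, n):
    """Return only the messages belonging to the most recent n turns."""
    # Indices where a user message opens a new turn.
    turn_starts = [i for i, m in enumerate(messages) if m["role"] == "user"]
    if len(turn_starts) <= n:
        return list(messages)  # fewer than n turns: nothing to trim
    cutoff = turn_starts[-n]   # start of the oldest turn we keep
    return messages[cutoff:]
```

Because no model call is involved, trimming is deterministic and adds no latency, which is why it is easy to reproduce and evaluate.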
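The summarization approach can be sketched in the same spirit (again an illustrative outline, not the cookbook's code; the `summarize` callable stands in for a real model call, and the synthetic-assistant-message shape is an assumption):

```python
# Illustrative context summarization: replace older messages with one
# synthetic assistant "summary" message, keeping the most recent
# `keep_last` messages verbatim.
def compress_history(messages, summarize, keep_last=4):
    """Compress all but the last `keep_last` messages into a summary message.

    `summarize` is any callable mapping a list of messages to a short
    string; in practice this would be a model call.
    """
    if len(messages) <= keep_last:
        return list(messages)  # nothing old enough to compress
    older, recent = messages[:-keep_last], messages[-keep_last:]
    summary_msg = {
        "role": "assistant",
        "content": f"Summary of earlier conversation: {summarize(older)}",
    }
    return [summary_msg] + recent
```

The single summary message acts as the "searchable anchor" mentioned above: one compact, stable representation of the state of the world so far.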

| Dimension | Trimming (last-N turns) | Summarizing (older → generated summary) |
| --- | --- | --- |
| Latency / Cost | Lowest (no extra calls) | Higher at summary refresh points |
| Long-range recall | Weak (hard cut-off) | Strong (compact carry-forward) |
| Risk type | Context loss | Context distortion/poisoning |
| Observability | Simple logs | Must log summary prompts/outputs |
| Eval stability | High | Needs robust summary evals |
| Best for | Tool-heavy ops, short workflows | Analyst/concierge, long threads |

@emre-openai emre-openai merged commit 7433ba1 into main Sep 10, 2025
1 check passed
@emre-openai emre-openai deleted the emre/agents-context-management branch September 10, 2025 21:07
@biggerveggies

Sorry, this is very minor - just noted in session_memory.ipynb. --- the word 'independent' appears to be misspelled as 'indepentent' under the 'Techniques Covered' section.
