
Chapter 5: Implementing Memory and Context in AI Agents

Learn AI Agents Step by Step

Learn to add memory systems to AI agents, enabling context retention, personalization, and learning from past interactions using vector databases and conversation history.

Memory transforms agents from stateless responders to intelligent assistants that learn and adapt. This chapter explores implementing various memory systems, from simple conversation history to sophisticated vector database integration for long-term knowledge retention.

Understanding Agent Memory Types

Agent memory exists in several forms, each serving different purposes. Buffer memory stores recent conversation history in full, providing immediate context for ongoing interactions. It is the simplest memory type, keeping the raw transcript of the conversation accessible to the agent.

Summary memory compresses conversation history into summaries, reducing token usage while maintaining context. Instead of storing entire conversations, the agent keeps condensed versions of past interactions, allowing longer context windows without overwhelming token limits.

Vector memory uses embeddings to store and retrieve relevant information semantically. Rather than keeping everything or summaries, vector memory lets agents search their knowledge base for contextually relevant information, even from much earlier conversations or external documents.

Entity memory tracks specific entities (people, places, concepts) mentioned in conversations, maintaining structured information about each. This is particularly useful for assistants that need to remember facts about multiple people or topics.
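
LangChain ships a ready-made implementation of this idea; here is a minimal sketch using ConversationEntityMemory (the exchange is illustrative, and exact return keys may vary between library versions):

from langchain.memory import ConversationEntityMemory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4", temperature=0)

# The LLM extracts and updates structured facts about entities it sees
entity_memory = ConversationEntityMemory(llm=llm)

entity_memory.save_context(
    {"input": "Alex is a backend engineer based in Seattle"},
    {"output": "Got it, I'll remember that about Alex."}
)

# Facts about entities mentioned in the new input are retrieved alongside the history
print(entity_memory.load_memory_variables({"input": "What does Alex do?"}))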

Implementing Conversation Buffer Memory

Start with the simplest memory type—buffer memory that remembers recent exchanges:

Python">from langchain.memory import ConversationBufferMemoryfrom langchain.agents import AgentExecutor, create_react_agentfrom langchain_openai import ChatOpenAIfrom langchain.prompts import PromptTemplate# Initialize memorymemory = ConversationBufferMemory( memory_key="chat_history", return_messages=True, output_key="output")# Create agent with memoryllm = ChatOpenAI(model="gpt-4", temperature=0)# Modify prompt to include chat historyprompt = PromptTemplate.from_template("""You are a helpful assistant. Use the chat history to provide contextual responses.Chat History:{chat_history}Available tools:{tools}Question: {input}{agent_scratchpad}""")agent = create_react_agent(llm=llm, tools=tools, prompt=prompt)agent_executor = AgentExecutor( agent=agent, tools=tools, memory=memory, verbose=True)

Now the agent remembers previous interactions within a session:

# First question
response1 = agent_executor.invoke({"input": "My name is Alex and I live in Seattle"})

# Later question - agent remembers
response2 = agent_executor.invoke({"input": "What's my name and where do I live?"})

The agent recalls information from earlier in the conversation, providing personalized responses.

Implementing Conversation Summary Memory

For longer conversations, summary memory keeps the context from exceeding token limits:

from langchain.memory import ConversationSummaryMemory

memory = ConversationSummaryMemory(
    llm=llm,
    memory_key="chat_history",
    return_messages=True
)

This memory type periodically summarizes the conversation, storing compressed versions of exchanges. The agent maintains context without accumulating thousands of tokens.
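
To see the effect, record a couple of exchanges and inspect what the memory returns; with return_messages=True the result is a short running summary rather than the raw transcript (the exchanges below are illustrative):

# Record a few exchanges; the memory condenses them with the LLM
memory.save_context(
    {"input": "I'm comparing laptops for video editing"},
    {"output": "Look for a strong GPU and at least 32 GB of RAM."}
)
memory.save_context(
    {"input": "My budget is around $2000"},
    {"output": "Several creator-focused laptops fit that budget."}
)

# A condensed summary is returned instead of the full transcript
print(memory.load_memory_variables({})["chat_history"])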

Implementing Conversation Buffer Window Memory

Window memory keeps only the most recent N interactions:

from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(
    k=5,  # Keep last 5 exchanges
    memory_key="chat_history",
    return_messages=True
)

This balances context retention with token efficiency—ideal when only recent context matters.
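
A quick illustration of the sliding window (the messages are placeholders): once more than five exchanges have occurred, the oldest ones silently drop out of the returned history.

# With k=5, only the five most recent exchanges are retained
for i in range(8):
    memory.save_context({"input": f"message {i}"}, {"output": f"reply {i}"})

# Exchanges 0-2 have been dropped from the window
print(memory.load_memory_variables({})["chat_history"])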

Vector Database Integration for Long-term Memory

Vector databases enable semantic search over large information stores. Agents can remember vast amounts of information and retrieve relevant pieces based on current context.

First, install a vector database. ChromaDB is excellent for local development:

pip install chromadb tiktoken

Create a vector store for agent memory:

from langchain.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain.memory import VectorStoreRetrieverMemory

# Initialize embeddings
embeddings = OpenAIEmbeddings()

# Create or load vector store
vectorstore = Chroma(
    collection_name="agent_memory",
    embedding_function=embeddings,
    persist_directory="./chroma_db"
)

# Create retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# Create memory using vector store
memory = VectorStoreRetrieverMemory(
    retriever=retriever,
    memory_key="relevant_history",
    input_key="input"
)

This memory retrieves the 3 most relevant past interactions for each new query, allowing agents to remember information from months ago if contextually relevant.
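
For instance, a fact saved earlier can resurface when a semantically related question arrives (the exchange below is illustrative):

# Store a past exchange in the vector store
memory.save_context(
    {"input": "I'm allergic to peanuts"},
    {"output": "Noted, I'll keep your peanut allergy in mind."}
)

# A semantically related query retrieves it, even much later
print(memory.load_memory_variables({"input": "Suggest a snack for me"})["relevant_history"])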

Combining Memory Types

Sophisticated agents often combine multiple memory systems:

from langchain.memory import CombinedMemory

# Short-term: recent conversation
buffer_memory = ConversationBufferWindowMemory(
    k=3,
    memory_key="recent_history",
    input_key="input"
)

# Long-term: semantic search
vector_memory = VectorStoreRetrieverMemory(
    retriever=retriever,
    memory_key="relevant_past",
    input_key="input"
)

# Combine both
combined_memory = CombinedMemory(memories=[buffer_memory, vector_memory])

Now agents have both recent context and access to relevant long-term information.
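
Each sub-memory fills its own prompt variable, so the agent prompt can reference both. A quick check of what the combined memory returns (the query text is illustrative):

variables = combined_memory.load_memory_variables({"input": "What did we discuss about my project?"})
print(variables["recent_history"])  # last few exchanges, verbatim
print(variables["relevant_past"])   # semantically similar older exchanges

# Reference both variables in the agent prompt, e.g.
# "Recent conversation:\n{recent_history}\n\nRelevant past context:\n{relevant_past}"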

Implementing Persistent Memory Across Sessions

Make memory persist across application restarts:

import json
from langchain.memory import ConversationBufferMemory
from langchain.schema import messages_from_dict, messages_to_dict

def save_memory(memory, filepath="memory.json"):
    """Save memory to file"""
    history = messages_to_dict(memory.chat_memory.messages)
    with open(filepath, 'w') as f:
        json.dump(history, f)

def load_memory(filepath="memory.json"):
    """Load memory from file"""
    try:
        with open(filepath, 'r') as f:
            history = json.load(f)
        memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
        memory.chat_memory.messages = messages_from_dict(history)
        return memory
    except FileNotFoundError:
        return ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# Usage
memory = load_memory()
# ... use agent ...
save_memory(memory)

For production systems, use databases (PostgreSQL, MongoDB) instead of JSON files for scalability and reliability.
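
One way to do that, as a sketch: LangChain's SQLChatMessageHistory writes each message to a SQL database as it arrives, so the buffer survives restarts. The connection string and session id below are placeholders, not values from this project, and the exact constructor arguments may differ across library versions.

from langchain.memory import ConversationBufferMemory
from langchain_community.chat_message_histories import SQLChatMessageHistory

# Placeholder connection string; point this at your own Postgres/MySQL/SQLite instance
history = SQLChatMessageHistory(
    session_id="user_123",
    connection_string="postgresql://user:password@localhost:5432/agent_memory"
)

memory = ConversationBufferMemory(
    memory_key="chat_history",
    chat_memory=history,
    return_messages=True
)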

User-Specific Memory

Store separate memory for each user:

def get_user_memory(user_id):
    """Retrieve or create memory for specific user"""
    vectorstore = Chroma(
        collection_name=f"user_{user_id}_memory",
        embedding_function=embeddings,
        persist_directory=f"./user_data/{user_id}"
    )
    retriever = vectorstore.as_retriever()
    return VectorStoreRetrieverMemory(retriever=retriever)

# Usage
user1_memory = get_user_memory("user_123")
user2_memory = get_user_memory("user_456")

Each user gets personalized memory that persists across sessions.

Retrieval-Augmented Generation (RAG)

RAG extends agent memory with external knowledge bases. Load documents into vector databases for agent access:

from langchain.document_loaders import TextLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load documents
loader = DirectoryLoader("./documents", glob="**/*.txt")
documents = loader.load()

# Split into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
texts = text_splitter.split_documents(documents)

# Create vector store
vectorstore = Chroma.from_documents(
    documents=texts,
    embedding=embeddings,
    persist_directory="./knowledge_base"
)

# Create retrieval tool
from langchain.tools import Tool

def search_knowledge_base(query):
    docs = vectorstore.similarity_search(query, k=3)
    return "\n\n".join([doc.page_content for doc in docs])

knowledge_tool = Tool(
    name="Knowledge Base Search",
    func=search_knowledge_base,
    description="Search the knowledge base for relevant information. Input should be a search query."
)

Add this tool to your agent—now it can access your entire document collection.
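
As a sketch, assuming the `llm`, `prompt`, `memory`, and `tools` from the buffer-memory example are still in scope, wiring the tool in looks like this (the sample question is illustrative):

# Register the knowledge base tool alongside the agent's existing tools
tools = tools + [knowledge_tool]

agent = create_react_agent(llm=llm, tools=tools, prompt=prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, memory=memory, verbose=True)

response = agent_executor.invoke({"input": "Summarize what our documents say about onboarding"})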

Memory Management Best Practices

Implement memory cleanup to prevent unbounded growth. Periodically prune old, irrelevant memories:

from datetime import datetime, timedelta

def cleanup_old_memories(vectorstore, days=90):
    """Remove memories older than specified days"""
    cutoff_date = datetime.now() - timedelta(days=days)
    # Implementation depends on your vector store
    # Most support filtering by metadata timestamps

Implement importance weighting—not all memories are equally valuable. Store importance scores and prioritize retrieval of significant information:

from datetime import datetime

def add_weighted_memory(vectorstore, text, importance=1.0):
    """Add memory with importance weight"""
    vectorstore.add_texts(
        texts=[text],
        # Store the timestamp as a number so it can be range-filtered later;
        # most vector stores only accept primitive metadata types
        metadatas=[{"importance": importance, "timestamp": datetime.now().timestamp()}]
    )

Monitor memory size and performance. Large vector stores impact retrieval speed. Benchmark query times and optimize as needed.
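
A minimal benchmarking sketch (the sample queries are placeholders; `retriever.invoke` is the retrieval call in recent LangChain versions):

import time

def benchmark_retrieval(retriever, queries, runs=5):
    """Measure average retrieval latency across sample queries."""
    timings = []
    for query in queries:
        start = time.perf_counter()
        for _ in range(runs):
            retriever.invoke(query)
        timings.append((time.perf_counter() - start) / runs)
    return sum(timings) / len(timings)

avg_seconds = benchmark_retrieval(retriever, ["What did I say about Japan?", "favorite color"])
print(f"Average retrieval latency: {avg_seconds * 1000:.1f} ms")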

Privacy and Security Considerations

Memory systems store potentially sensitive user information. Implement proper security: encrypt memory storage at rest, use secure access controls, implement data retention policies, provide users with data deletion options, and comply with privacy regulations (GDPR, CCPA).

Never store sensitive information like passwords or payment details in agent memory. Use secure vaults for such data and only store references in memory.
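
One lightweight safeguard is to redact obvious secrets before anything is written to memory. The patterns below are illustrative only and far from exhaustive; real deployments should use a dedicated PII and secret detection library.

import re

# Illustrative patterns; adapt to your own compliance requirements
SENSITIVE_PATTERNS = [
    re.compile(r"\b(?:\d[ -]?){13,16}\b"),      # credit-card-like digit runs
    re.compile(r"(?i)password\s*[:=]\s*\S+"),   # "password: hunter2"
]

def redact(text):
    """Replace sensitive substrings before text is written to agent memory."""
    for pattern in SENSITIVE_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

print(redact("My card is 4111 1111 1111 1111 and my password: hunter2"))
# -> "My card is [REDACTED] and my [REDACTED]"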

Testing Memory Systems

Verify memory functionality with systematic tests:

def test_memory_retention():
    """Test agent remembers information"""
    agent_executor.invoke({"input": "Remember that my favorite color is blue"})
    response = agent_executor.invoke({"input": "What's my favorite color?"})
    assert "blue" in response["output"].lower()

def test_memory_retrieval():
    """Test semantic retrieval from vector memory"""
    # Add specific information
    memory.save_context(
        {"input": "I'm planning a trip to Japan in March"},
        {"output": "That sounds exciting! March is cherry blossom season."}
    )
    # Later, related query should retrieve this
    response = agent_executor.invoke({"input": "What travel plans did I mention?"})
    assert "Japan" in response["output"]

Run tests regularly to catch memory-related bugs.

Optimizing Memory Performance

Reduce latency by caching frequent queries, using faster embedding models, implementing async retrieval, and pre-computing common retrievals.
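
For example, repeated embedding calls can be cached locally. A sketch using LangChain's CacheBackedEmbeddings (this caches document embeddings; the cache directory and namespace are arbitrary choices):

from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import LocalFileStore
from langchain_openai import OpenAIEmbeddings

# Wrap the embedding model so repeated texts hit the local cache instead of the API
store = LocalFileStore("./embedding_cache")
cached_embeddings = CacheBackedEmbeddings.from_bytes_store(
    OpenAIEmbeddings(),
    store,
    namespace="openai-embeddings"
)

# Pass cached_embeddings wherever an embedding function is expected, e.g. to Chroma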

Balance memory depth with cost—deeper memory (more retrieved documents) improves context but increases tokens and API costs. Find optimal trade-offs through experimentation.

Advanced Memory Patterns

Implement hierarchical memory with multiple time scales—immediate (current conversation), recent (last day), long-term (everything older). Different retrieval strategies apply to each tier.

Create episodic memory storing complete interaction episodes as coherent units. This mirrors human memory better than isolated exchanges.

Implement forgetting mechanisms in which less-accessed memories gradually fade, mimicking natural memory decay and keeping vector stores manageable. A simple version weights each memory by recency, as sketched below.
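
This is a minimal sketch, assuming each stored memory carries a last-accessed timestamp in its metadata; memories whose decayed score falls below a threshold can be pruned or skipped at retrieval time.

import math
from datetime import datetime, timezone

def decayed_score(similarity, last_accessed, half_life_days=30):
    """Combine semantic similarity with an exponential recency decay."""
    age_days = (datetime.now(timezone.utc) - last_accessed).days
    decay = math.exp(-math.log(2) * age_days / half_life_days)
    return similarity * decay

# Example: a 60-day-old memory with similarity 0.9 scores 0.9 * 0.25 = 0.225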

Memory transforms agents from reactive tools to genuine assistants that know you, remember your preferences, and improve over time. The next chapter explores giving agents access to external tools and APIs, dramatically expanding what they can accomplish autonomously.

 
