# AI Memory Systems and Persistence

> **Module 4**: Build memory systems that transform one-shot AI interactions into continuous learning relationships.
## Overview
An AI without memory is like a brilliant colleague with amnesia—every conversation starts from zero. In this module, you'll learn to build memory systems that transform one-shot interactions into continuous learning relationships. We'll implement real persistence that makes your AI agents smarter with every interaction.
## What You'll Learn
- The architecture of AI memory systems
- Implementing short-term and long-term memory
- Vector databases for semantic search
- The Long-Term Context Management Protocol (LCMP)
- Memory hygiene and optimization
## Prerequisites
- Completed Modules 1-3
- Basic understanding of databases
- Familiarity with Claude or similar AI
- 45 minutes of focused implementation time
## Understanding AI Memory Architecture

### The Memory Hierarchy

Just like computer systems, AI agents benefit from hierarchical memory:

```
┌─────────────────────────────────────┐
│ Active Context (Hot Memory)         │ ← Current conversation
├─────────────────────────────────────┤
│ Short-term Memory (Warm)            │ ← Recent sessions
├─────────────────────────────────────┤
│ Long-term Memory (Cold)             │ ← Historical knowledge
├─────────────────────────────────────┤
│ Persistent Storage (Frozen)         │ ← Archived data
└─────────────────────────────────────┘
```
### Memory Types and Their Roles

| Memory Type | Duration | Use Case | Storage Method |
|---|---|---|---|
| Working Memory | Single turn | Current task context | RAM/Context window |
| Session Memory | Single session | Conversation continuity | Temporary cache |
| Short-term Memory | Hours to days | Recent interactions | Database/files |
| Long-term Memory | Permanent | Learned patterns | Vector database |
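One way to make the table operational is a small router that maps an item's expected lifetime to a tier. This is a sketch; the tier names and hour thresholds below are illustrative choices, not part of any standard.

```python
from enum import Enum


class MemoryTier(Enum):
    WORKING = "context_window"
    SESSION = "temp_cache"
    SHORT_TERM = "database"
    LONG_TERM = "vector_db"


def pick_tier(lifetime_hours: float) -> MemoryTier:
    """Map an expected lifetime to a storage tier (thresholds are illustrative)."""
    if lifetime_hours < 1:
        return MemoryTier.WORKING
    if lifetime_hours < 12:
        return MemoryTier.SESSION
    if lifetime_hours < 24 * 7:
        return MemoryTier.SHORT_TERM
    return MemoryTier.LONG_TERM
```

A router like this keeps the "where does this memory go?" decision in one place instead of scattered across your agent code.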
## Building Short-term Memory

### File-Based Memory System

Let's start with a simple but effective approach:

```python
# memory_manager.py
import json
import os
from datetime import datetime, timedelta
from typing import Dict, List, Any


class ShortTermMemory:
    def __init__(self, memory_dir: str = "./memory"):
        self.memory_dir = memory_dir
        self.session_file = f"{memory_dir}/current_session.json"
        self.history_dir = f"{memory_dir}/history"

        # Create directories
        os.makedirs(self.history_dir, exist_ok=True)

    def save_interaction(self, interaction: Dict[str, Any]):
        """Save an interaction to the current session"""
        session = self.load_current_session()

        interaction['timestamp'] = datetime.now().isoformat()
        session['interactions'].append(interaction)
        session['last_updated'] = datetime.now().isoformat()

        with open(self.session_file, 'w') as f:
            json.dump(session, f, indent=2)

    def load_current_session(self) -> Dict[str, Any]:
        """Load or create the current session"""
        if os.path.exists(self.session_file):
            with open(self.session_file, 'r') as f:
                return json.load(f)

        return {
            'session_id': datetime.now().strftime('%Y%m%d_%H%M%S'),
            'created': datetime.now().isoformat(),
            'interactions': []
        }

    def get_recent_context(self, hours: int = 24) -> List[Dict]:
        """Get interactions from the last N hours"""
        cutoff = datetime.now() - timedelta(hours=hours)
        recent = []

        # Check the current session
        session = self.load_current_session()
        for interaction in session['interactions']:
            if datetime.fromisoformat(interaction['timestamp']) > cutoff:
                recent.append(interaction)

        # Check recent history files (sorted newest-first)
        for filename in sorted(os.listdir(self.history_dir), reverse=True):
            if filename.endswith('.json'):
                with open(f"{self.history_dir}/{filename}", 'r') as f:
                    old_session = json.load(f)
                for interaction in old_session['interactions']:
                    if datetime.fromisoformat(interaction['timestamp']) > cutoff:
                        recent.append(interaction)
                    else:
                        # Files are sorted newest-first, so everything
                        # past this point is older than the cutoff
                        return recent

        return recent
```
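As written, nothing ever writes into `history/`. A session-archiving step (a sketch; the `<session_id>.json` naming scheme is our own convention, matching the layout `ShortTermMemory` expects) closes out the current session file so `get_recent_context` can find it later:

```python
import json
import os


def archive_session(memory_dir: str = "./memory"):
    """Move the current session file into history/ and start fresh.

    Assumes the directory layout used by ShortTermMemory; returns the
    archive path, or None if there is nothing to archive.
    """
    session_file = f"{memory_dir}/current_session.json"
    history_dir = f"{memory_dir}/history"
    if not os.path.exists(session_file):
        return None  # nothing to archive

    with open(session_file) as f:
        session = json.load(f)

    os.makedirs(history_dir, exist_ok=True)
    archive_path = f"{history_dir}/{session['session_id']}.json"
    with open(archive_path, 'w') as f:
        json.dump(session, f, indent=2)

    os.remove(session_file)  # the next save_interaction starts a new session
    return archive_path
```

Call this at the end of each working session (or on a timer) so short-term memory rotates instead of growing one giant file.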
### Integrating with Your AI Workflow

```python
# Example integration
memory = ShortTermMemory()

# Save an interaction
memory.save_interaction({
    'user_input': "Help me implement user authentication",
    'ai_response': "I'll help you implement JWT-based auth...",
    'context': {
        'project': 'task-api',
        'files_modified': ['auth.service.ts', 'user.model.ts']
    }
})

# Retrieve context for a new session
recent_context = memory.get_recent_context(hours=48)
# summarize_interactions is a summarization helper you supply
context_summary = summarize_interactions(recent_context)
```
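`summarize_interactions` is left to you. A naive, model-free placeholder (until you route summarization through your LLM) can simply compress each interaction into one line:

```python
from typing import Dict, List


def summarize_interactions(interactions: List[Dict]) -> str:
    """Collapse interactions into a short bullet summary (naive placeholder)."""
    lines = []
    for item in interactions:
        user = item.get('user_input', '')[:80]       # first 80 chars of the request
        project = item.get('context', {}).get('project', 'unknown')
        lines.append(f"- [{project}] {user}")
    return "\n".join(lines)
```

Even this crude version is enough to prepend to a new session's prompt; swap in an LLM call when you need real summaries.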
## Implementing Long-term Memory with Vector Databases

### Why Vector Databases?

Traditional databases search by exact matches. Vector databases search by meaning:

```
User asks: "How did we handle authentication?"

Traditional DB: searches for the "authentication" keyword
Vector DB:      finds related concepts:
  - "user login system"
  - "JWT implementation"
  - "security middleware"
  - "password hashing"
```
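The mechanics are easiest to see with toy vectors. Real embedding models produce hundreds of dimensions; here we hand-craft three-dimensional "embeddings" just to show why cosine similarity surfaces related concepts:

```python
import math


def cosine(a, b):
    """Cosine similarity: 1.0 for identical direction, ~0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Pretend dimensions: [auth-ness, ui-ness, db-ness]
query = [0.9, 0.1, 0.2]        # "How did we handle authentication?"
login_doc = [0.8, 0.2, 0.1]    # "user login system"
css_doc = [0.05, 0.95, 0.0]    # "button hover styles"

# The login document scores far closer to the query than the CSS one,
# even though neither contains the word "authentication".
assert cosine(query, login_doc) > cosine(query, css_doc)
```

A vector database does exactly this comparison, just at scale and with an index so it doesn't scan every stored vector.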
### Setting Up a Vector Database

Let's use Pinecone (you can also use Weaviate, Chroma, or others). The snippet below targets the pinecone-client v2 API (`pinecone.init`); newer client versions expose a `Pinecone` class instead:

```python
# vector_memory.py
import hashlib
from typing import List, Dict, Any

import pinecone
from sentence_transformers import SentenceTransformer


class LongTermVectorMemory:
    def __init__(self, api_key: str, environment: str):
        # Initialize Pinecone (v2 client API)
        pinecone.init(api_key=api_key, environment=environment)

        # Create or connect to the index
        self.index_name = "ai-agent-memory"
        if self.index_name not in pinecone.list_indexes():
            pinecone.create_index(
                self.index_name,
                dimension=384,  # all-MiniLM-L6-v2 embedding size
                metric='cosine'
            )
        self.index = pinecone.Index(self.index_name)

        # Initialize the embedding model
        self.encoder = SentenceTransformer('all-MiniLM-L6-v2')

    def store_memory(self, content: str, metadata: Dict[str, Any]):
        """Store a memory with metadata"""
        # Generate the embedding
        embedding = self.encoder.encode(content).tolist()

        # Deterministic ID: storing identical content twice just overwrites
        memory_id = hashlib.md5(content.encode()).hexdigest()

        # Upsert to Pinecone (metadata values must not be None)
        self.index.upsert([
            (memory_id, embedding, {
                'content': content[:1000],  # stay under the metadata size limit
                'timestamp': metadata.get('timestamp', ''),
                'type': metadata.get('type', 'general'),
                'project': metadata.get('project', ''),
                'tags': metadata.get('tags', [])
            })
        ])

    def search_memories(self, query: str, top_k: int = 5) -> List[Dict]:
        """Search for relevant memories"""
        # Generate the query embedding
        query_embedding = self.encoder.encode(query).tolist()

        # Search
        results = self.index.query(
            vector=query_embedding,
            top_k=top_k,
            include_metadata=True
        )

        # Format the results
        memories = []
        for match in results['matches']:
            memories.append({
                'content': match['metadata']['content'],
                'relevance': match['score'],
                'metadata': match['metadata']
            })
        return memories

    def search_by_metadata(self, filters: Dict[str, Any], top_k: int = 10):
        """Search memories by metadata filters only"""
        # Zero query vector so ranking is driven purely by the filter
        # (note: some index configurations reject all-zero vectors
        # under the cosine metric; use a random unit vector if so)
        dummy_embedding = [0.0] * 384

        results = self.index.query(
            vector=dummy_embedding,
            top_k=top_k,
            include_metadata=True,
            include_values=True,  # needed by downstream deduplication
            filter=filters
        )
        return results['matches']
```
### Practical Memory Storage Patterns

```python
from datetime import datetime

# Pattern 1: Decision memory
memory.store_memory(
    content="Decided to use JWT tokens instead of sessions for authentication due to stateless architecture requirements",
    metadata={
        'type': 'decision',
        'project': 'task-api',
        'tags': ['architecture', 'authentication', 'security'],
        'timestamp': datetime.now().isoformat(),
        'rationale': 'Microservices need stateless auth'
    }
)

# Pattern 2: Problem-solution memory
memory.store_memory(
    content="Fixed memory leak in WebSocket handler by properly cleaning up event listeners on disconnect",
    metadata={
        'type': 'solution',
        'problem': 'Memory leak in production',
        'tags': ['bug-fix', 'websocket', 'performance'],
        'files': ['websocket.handler.ts']
    }
)

# Pattern 3: Learning memory
memory.store_memory(
    content="Team prefers functional components with hooks over class components for better code reuse",
    metadata={
        'type': 'pattern',
        'category': 'coding-standards',
        'tags': ['react', 'frontend', 'team-preferences']
    }
)

# Note: store_memory as written persists only a fixed set of metadata
# keys; extend its upsert call if you want to keep extras such as
# 'rationale', 'problem', 'files' or 'category'.
```
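The three patterns share a shape, so a thin wrapper keeps metadata consistent across call sites. The wrapper name `remember` is our own, and the `InMemoryStore` below is a stand-in for `LongTermVectorMemory` so the sketch runs offline:

```python
from datetime import datetime
from typing import Any, Dict, List


class InMemoryStore:
    """Offline stand-in exposing the same store_memory() interface."""
    def __init__(self):
        self.records: List[Dict[str, Any]] = []

    def store_memory(self, content: str, metadata: Dict[str, Any]):
        self.records.append({'content': content, 'metadata': metadata})


def remember(store, content: str, mem_type: str, tags: List[str], **extra):
    """Stamp every memory with the same core fields before storing."""
    metadata = {
        'type': mem_type,
        'tags': tags,
        'timestamp': datetime.now().isoformat(),
        **extra,  # pattern-specific fields like project or rationale
    }
    store.store_memory(content, metadata)
    return metadata


store = InMemoryStore()
remember(store, "Use JWT tokens", 'decision', ['auth'], project='task-api')
```

Centralizing the stamping means every memory is guaranteed a `type`, `tags`, and `timestamp`, which the hygiene routines later in this module rely on.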
## The Long-Term Context Management Protocol (LCMP)

### Understanding LCMP

LCMP is a systematic approach to maintaining project continuity across AI sessions:

```
┌────────────────┐     ┌────────────────┐     ┌────────────────┐
│  Active Work   │────▶│  Update Logs   │────▶│  Next Session  │
└────────────────┘     └────────────────┘     └────────────────┘
                               │
                               ▼
                       ┌────────────────┐
                       │ Context Files  │
                       └────────────────┘
```
### LCMP Implementation

```python
# lcmp_manager.py
import os
from datetime import datetime
from typing import Any, Dict, List


class LCMPManager:
    def __init__(self, project_root: str):
        self.project_root = project_root
        self.context_dir = f"{project_root}/.context"
        self.init_structure()

    def init_structure(self):
        """Initialize the LCMP directory structure"""
        os.makedirs(self.context_dir, exist_ok=True)

        # Core LCMP files
        self.files = {
            'state': f"{self.context_dir}/state.md",
            'schema': f"{self.context_dir}/schema.md",
            'decisions': f"{self.context_dir}/decisions.md",
            'insights': f"{self.context_dir}/insights.md"
        }

        # Create files if they don't exist
        for file_type, path in self.files.items():
            if not os.path.exists(path):
                self.init_file(file_type, path)

    def init_file(self, file_type: str, path: str):
        """Initialize a context file with a template"""
        templates = {
            'state': """# Current Project State

## Active Tasks
- [ ]

## Blockers
- None

## Last Completed
-

## Next Steps
1.
""",
            'schema': """# Data Structures & Formats

## File Organization
project/
├── src/
└── tests/

## Key Data Models

## API Endpoints
""",
            'decisions': """# Technical Decisions Log

## Decision Template
**Date**:
**Decision**:
**Rationale**:
**Alternatives Considered**:
""",
            'insights': """# Project Insights

## Patterns Discovered

## Performance Observations

## Team Preferences
"""
        }
        with open(path, 'w') as f:
            f.write(templates.get(file_type, ''))

    def update_state(self, updates: Dict[str, Any]):
        """Update the current state file"""
        with open(self.files['state'], 'r') as f:
            content = f.read()

        # Smart update logic (the three text-editing helpers are left to you)
        if 'completed' in updates:
            content = self.move_task_to_completed(content, updates['completed'])
        if 'new_tasks' in updates:
            content = self.add_new_tasks(content, updates['new_tasks'])
        if 'blockers' in updates:
            content = self.update_blockers(content, updates['blockers'])

        with open(self.files['state'], 'w') as f:
            f.write(content)

    def add_decision(self, decision: str, rationale: str, alternatives: List[str]):
        """Log a technical decision"""
        with open(self.files['decisions'], 'a') as f:
            f.write(f"""
## {datetime.now().strftime('%Y-%m-%d')}: {decision}
**Decision**: {decision}
**Rationale**: {rationale}
**Alternatives Considered**: {', '.join(alternatives)}
---
""")

    def add_insight(self, insight: str, category: str = "general"):
        """Record a project insight"""
        with open(self.files['insights'], 'a') as f:
            f.write(f"""
### [{category}] {datetime.now().strftime('%Y-%m-%d %H:%M')}
{insight}
""")

    def get_full_context(self) -> Dict[str, str]:
        """Load all context files"""
        context = {}
        for file_type, path in self.files.items():
            if os.path.exists(path):
                with open(path, 'r') as f:
                    context[file_type] = f.read()
        return context
```
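`update_state` assumes three text-editing helpers. One plausible implementation of `move_task_to_completed` (a sketch operating on the checkbox syntax from the `state` template above) looks like this:

```python
def move_task_to_completed(content: str, task: str) -> str:
    """Move a '- [ ] task' line from Active Tasks to '## Last Completed'."""
    out = []
    for line in content.splitlines():
        if line.strip() == f"- [ ] {task}":
            continue  # drop the open checkbox from Active Tasks
        out.append(line)
        if line.strip() == "## Last Completed":
            out.append(f"- [x] {task}")  # record it as done
    return "\n".join(out)
```

`add_new_tasks` and `update_blockers` follow the same pattern: split into lines, locate the section heading, and edit the lines under it.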
### Using LCMP in Practice

```python
# Integration with an AI assistant
lcmp = LCMPManager("/path/to/project")

# Start a new session; include only the tail of the decision log
context = lcmp.get_full_context()
recent_decisions = context['decisions'][-1000:]  # last 1000 chars

prompt = f"""
Continue working on the project. Here's the current context:

State:
{context['state']}

Recent Decisions:
{recent_decisions}

Let's pick up where we left off.
"""

# During work
lcmp.update_state({
    'completed': 'Implement user authentication',
    'new_tasks': ['Add password reset flow', 'Implement 2FA']
})

# Record important decisions
lcmp.add_decision(
    decision="Use Redis for session storage",
    rationale="Need fast access and automatic expiration",
    alternatives=["In-memory storage", "PostgreSQL"]
)
```
## Memory Hygiene and Optimization

### The Memory Decay Problem

Without management, memory systems become polluted:

```
Day 1:   Clean, relevant memories
Day 30:  Some outdated information
Day 90:  Contradictions and confusion
Day 365: More noise than signal
```
### Implementing Memory Hygiene

```python
# memory_hygiene.py
from datetime import datetime, timedelta

import numpy as np


def cosine_similarity(a, b) -> float:
    """Cosine similarity between two embedding vectors"""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


class MemoryHygiene:
    def __init__(self, vector_memory: LongTermVectorMemory):
        self.memory = vector_memory

    def apply_temporal_decay(self, days_old: int) -> float:
        """Calculate relevance decay over time"""
        if days_old < 7:
            return 1.0  # full relevance
        elif days_old < 30:
            return 0.8
        elif days_old < 90:
            return 0.5
        elif days_old < 365:
            return 0.3
        else:
            return 0.1  # minimal relevance

    def deduplicate_memories(self, threshold: float = 0.95):
        """Remove near-duplicate memories"""
        # Fetch memories; requires search_by_metadata to return
        # vectors (include_values=True on the query)
        all_memories = self.memory.search_by_metadata({}, top_k=1000)

        # Compare embeddings pairwise
        duplicates = []
        for i, mem1 in enumerate(all_memories):
            for mem2 in all_memories[i + 1:]:
                similarity = cosine_similarity(
                    mem1['values'],
                    mem2['values']
                )
                if similarity > threshold:
                    # Keep the more recent one
                    older = mem1 if mem1['metadata']['timestamp'] < mem2['metadata']['timestamp'] else mem2
                    duplicates.append(older['id'])

        # Remove the duplicates
        if duplicates:
            self.memory.index.delete(ids=duplicates)
        return len(duplicates)

    def consolidate_insights(self, time_window: int = 30):
        """Consolidate related insights from a time window"""
        # Search for insights within the window
        cutoff = datetime.now() - timedelta(days=time_window)
        insights = self.memory.search_by_metadata({
            'type': 'insight',
            'timestamp': {'$gte': cutoff.isoformat()}
        })

        # Group by topic (clustering helper left to you)
        grouped = self.group_by_embedding_similarity(insights)

        # Create consolidated insights
        for group in grouped:
            if len(group) > 3:  # threshold for consolidation
                # summarize_insights would typically call your LLM
                consolidated = self.summarize_insights(group)
                self.memory.store_memory(
                    content=consolidated,
                    metadata={
                        'type': 'consolidated_insight',
                        'source_count': len(group),
                        'timestamp': datetime.now().isoformat()
                    }
                )
```
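Temporal decay pays off when combined with search scores at retrieval time, so stale-but-similar memories rank below fresh ones. A sketch (multiplying the two is our own weighting choice):

```python
from datetime import datetime
from typing import Optional


def decayed_relevance(similarity: float, timestamp_iso: str,
                      now: Optional[datetime] = None) -> float:
    """Multiply a vector-search score by a step-wise age decay."""
    now = now or datetime.now()
    days_old = (now - datetime.fromisoformat(timestamp_iso)).days
    if days_old < 7:
        decay = 1.0
    elif days_old < 30:
        decay = 0.8
    elif days_old < 90:
        decay = 0.5
    elif days_old < 365:
        decay = 0.3
    else:
        decay = 0.1
    return similarity * decay
```

Re-sort your `search_memories` results by this value before building the prompt, and recent context naturally wins ties.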
### Memory Access Patterns

Optimize how you retrieve memories:

```python
class SmartMemoryRetrieval:
    def __init__(self, short_term: ShortTermMemory,
                 long_term: LongTermVectorMemory):
        self.short_term = short_term
        self.long_term = long_term

    def get_relevant_context(self, query: str, task_type: str) -> Dict:
        """Retrieve context based on query and task type"""
        context = {
            'recent': [],
            'historical': [],
            'decisions': [],
            'patterns': []
        }

        # Recent interactions (high priority)
        context['recent'] = self.short_term.get_recent_context(hours=24)

        # Task-specific retrieval strategies
        if task_type == 'debugging':
            # Prioritize error solutions and patterns
            context['historical'] = self.long_term.search_memories(
                query + " error solution bug fix",
                top_k=10
            )
        elif task_type == 'feature_development':
            # Prioritize architecture decisions and patterns
            context['decisions'] = self.long_term.search_by_metadata({
                'type': 'decision',
                'tags': {'$in': ['architecture', 'design']}
            })
        elif task_type == 'refactoring':
            # Prioritize code patterns and standards
            context['patterns'] = self.long_term.search_by_metadata({
                'type': 'pattern',
                'category': 'coding-standards'
            })

        # prioritize_context trims the buckets to a token budget
        return self.prioritize_context(context, max_tokens=50000)
```
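The `prioritize_context` helper is assumed above. A simple version packs the buckets in priority order under a rough token budget; the 4-characters-per-token estimate is a common heuristic, not an exact count:

```python
import json
from typing import Dict, List


def prioritize_context(context: Dict[str, List], max_tokens: int = 50000) -> Dict:
    """Keep items bucket-by-bucket until a rough token budget is used up."""
    order = ['recent', 'decisions', 'patterns', 'historical']
    budget = max_tokens * 4  # ~4 characters per token (rough heuristic)
    pruned: Dict[str, List] = {key: [] for key in order}

    for key in order:
        for item in context.get(key, []):
            cost = len(json.dumps(item, default=str))
            if cost > budget:
                return pruned  # budget exhausted; drop the rest
            pruned[key].append(item)
            budget -= cost
    return pruned
```

Putting `recent` first encodes the priority stated above; for finer control you could interleave buckets or score items with the temporal decay from the hygiene section.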
## Building a Complete Memory-Enabled Agent

### Architecture Overview

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   User Input    │────▶│  Memory Manager  │────▶│    AI Agent     │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                │                         │
                        ┌───────┴───────┐                 │
                        ▼               ▼                 │
                 ┌─────────────┐ ┌─────────────┐          │
                 │ Short-term  │ │  Long-term  │          │
                 │   Memory    │ │   Memory    │          │
                 └─────────────┘ └─────────────┘          │
                                                          │
                        ┌─────────────────────────────────┘
                        ▼
                 ┌─────────────────┐
                 │  User Response  │
                 └─────────────────┘
```
### Complete Implementation

```python
# memory_agent.py
class MemoryEnabledAgent:
    def __init__(self, config: Dict[str, Any]):
        # Initialize memory components
        self.short_term = ShortTermMemory(config['memory_dir'])
        self.long_term = LongTermVectorMemory(
            config['pinecone_api_key'],
            config['pinecone_env']
        )
        self.lcmp = LCMPManager(config['project_root'])
        self.hygiene = MemoryHygiene(self.long_term)

        # AI client (Claude, OpenAI, etc.)
        self.ai_client = self.init_ai_client(config)

    def process_request(self, user_input: str, task_type: str = 'general'):
        """Process a user request with full memory context"""
        # 1. Retrieve relevant memories
        memories = self.gather_context(user_input, task_type)

        # 2. Build a context-enhanced prompt
        enhanced_prompt = self.build_enhanced_prompt(
            user_input,
            memories,
            task_type
        )

        # 3. Get the AI response
        response = self.ai_client.complete(enhanced_prompt)

        # 4. Store the interaction in memory
        self.store_interaction(user_input, response, task_type)

        # 5. Update LCMP if needed
        self.update_project_context(user_input, response)

        # 6. Periodic hygiene
        if self.should_run_hygiene():
            self.run_memory_hygiene()

        return response

    def gather_context(self, query: str, task_type: str) -> Dict:
        """Gather all relevant context"""
        context = {}

        # LCMP context (always relevant)
        context['project'] = self.lcmp.get_full_context()

        # Recent interactions
        context['recent'] = self.short_term.get_recent_context(hours=48)

        # Semantic search for relevant memories
        context['relevant'] = self.long_term.search_memories(query, top_k=10)

        # Task-specific memories
        if task_type == 'debugging':
            context['similar_issues'] = self.long_term.search_memories(
                f"{query} error bug solution",
                top_k=5
            )

        return context

    # init_ai_client, build_enhanced_prompt, store_interaction,
    # update_project_context, should_run_hygiene and run_memory_hygiene
    # are left as implementation exercises.
```
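Of the helpers the class assumes, `should_run_hygiene` is the easiest to sketch. A counter-based scheduler (the interval of 50 is arbitrary) keeps deduplication and consolidation from running on every request:

```python
class HygieneScheduler:
    """Signal a maintenance run every N processed requests."""

    def __init__(self, every_n: int = 50):
        self.every_n = every_n
        self.count = 0

    def should_run(self) -> bool:
        """Call once per request; returns True every every_n-th call."""
        self.count += 1
        return self.count % self.every_n == 0
```

Inside `MemoryEnabledAgent` you would hold one instance and delegate `should_run_hygiene` to `scheduler.should_run()`; a time-based variant (e.g. once a day) works equally well.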
## Practical Exercises

### Exercise 1: Build a Learning Assistant

Create an assistant that remembers everything it learns:

```python
# learning_assistant.py
class LearningAssistant(MemoryEnabledAgent):
    def learn_from_correction(self, correction: str):
        """Learn from user corrections"""
        # Store as a high-priority memory
        self.long_term.store_memory(
            content=f"Correction: {correction}",
            metadata={
                'type': 'learning',
                'priority': 'high',
                'timestamp': datetime.now().isoformat()
            }
        )

        # Update behavior immediately (helper left to you)
        self.update_behavior_rules(correction)

    def demonstrate_learning(self, similar_task: str):
        """Show that the learning was retained"""
        # Search for related corrections
        past_learnings = self.long_term.search_memories(
            f"correction learning {similar_task}",
            top_k=5
        )

        # Apply past learnings to the new task (helper left to you)
        return self.apply_learnings(similar_task, past_learnings)
```
### Exercise 2: Build a Customer Support Agent

Create an agent that remembers all customer interactions:

```python
class CustomerSupportAgent(MemoryEnabledAgent):
    def handle_ticket(self, customer_id: str, issue: str):
        """Handle a support ticket with full history"""
        # Get the customer's history
        history = self.get_customer_history(customer_id)

        # Search for similar resolved issues
        similar_resolutions = self.search_resolutions(issue)

        # Generate a personalized response
        response = self.generate_response(
            issue,
            history,
            similar_resolutions
        )

        # Store the resolution for future reference
        self.store_resolution(customer_id, issue, response)

        return response

    # get_customer_history, search_resolutions, generate_response and
    # store_resolution are yours to implement in this exercise.
```
## Checkpoint Task

### Your Mission

Build a memory-enabled customer support agent that:

- Remembers all interactions with each customer
- Learns from resolved issues to improve future responses
- Maintains conversation continuity across sessions
- Gets smarter over time through pattern recognition

### Requirements

Your implementation should:

- Use both short-term and long-term memory
- Implement the LCMP protocol
- Include memory hygiene
- Search past resolutions semantically
- Track customer satisfaction improvements

### Success Metrics

- ✅ Zero repeated questions to customers
- ✅ 90%+ relevant past issue retrieval
- ✅ Measurable improvement in resolution time
- ✅ Clean memory management (no pollution)

### Deliverable

A working system that demonstrates:

- Initial customer interaction
- Memory storage and retrieval
- Improved response on a similar issue
- Memory consolidation after multiple interactions
## Common Memory Pitfalls

### Pitfall 1: Memory Explosion

**Problem**: Storing everything leads to noise.
**Solution**: Implement significance scoring.

```python
def calculate_significance(interaction: Dict) -> float:
    """Score interaction significance on a 0-1 scale"""
    score = 0.0
    content = interaction['content'].lower()

    # Contains a decision
    if 'decided' in content:
        score += 0.3
    # Contains a solution
    if 'fixed' in content:
        score += 0.3
    # Contains a learning
    if 'learned' in content:
        score += 0.2
    # Length indicates complexity
    if len(interaction['content']) > 500:
        score += 0.2

    return min(score, 1.0)
```
### Pitfall 2: Context Conflation

**Problem**: Mixing unrelated memories.
**Solution**: Strict domain separation.
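Domain separation can be enforced mechanically by refusing unscoped reads and writes. A sketch (the `ScopedMemory` wrapper and its required `project` argument are our own convention; with Pinecone you could reach for namespaces instead):

```python
class ScopedMemory:
    """Require every memory operation to name its domain."""

    def __init__(self, backend):
        self.backend = backend  # e.g. a LongTermVectorMemory

    def store(self, content: str, project: str, **metadata):
        if not project:
            raise ValueError("memories must be scoped to a project")
        metadata['project'] = project
        self.backend.store_memory(content, metadata)

    def search(self, query: str, project: str, top_k: int = 5):
        if not project:
            raise ValueError("searches must be scoped to a project")
        # Over-fetch, then keep only hits from the requested domain
        hits = self.backend.search_memories(query, top_k=top_k * 2)
        return [h for h in hits
                if h['metadata'].get('project') == project][:top_k]
```

Making the scope a required argument, rather than an optional filter, is what prevents "I forgot to filter" bugs from conflating projects.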
### Pitfall 3: Stale Memory Dominance

**Problem**: Old patterns overriding new learnings.
**Solution**: Temporal decay and explicit updates.
## Next Steps

You've built AI agents with persistent memory. Module 5 will show you how to scale these systems to production.

### Preview of Module 5

- Production architecture patterns
- Performance optimization
- Cost management strategies
- Error recovery and resilience

Ready for production? Module 5 teaches you how to scale!