Context Assembly¶
Context assembly is the process of converting working memory into a formatted string that can be used with your LLM. This guide covers the three assembly strategies, optimization techniques, and best practices for creating high-quality context.
What is Context Assembly?¶
Context assembly transforms working memory into LLM-ready context: nodes are sorted by the chosen strategy, packed until the token limit is reached, and joined into a single string.
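An illustrative sketch of the transformation (the node fields and output separator shown here are assumptions, not HTM's documented internals):
# Working memory holds discrete nodes, e.g.:
#   { content: "User prefers Ruby", access_count: 5, last_accessed_at: ... }
#   { content: "Use PostgreSQL for database", access_count: 2, last_accessed_at: ... }
#
# assemble_context sorts them by strategy, packs them into the token
# budget, and returns one string ready to embed in a prompt:
#   "User prefers Ruby\n\nUse PostgreSQL for database"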
Basic Usage¶
The assemble_context method on working memory creates a context string:
# Basic context assembly
context = htm.working_memory.assemble_context(
strategy: :balanced, # Assembly strategy
max_tokens: nil # Optional token limit
)
# Use with your LLM
prompt = <<~PROMPT
Context from memory:
#{context}
User question: How do we handle authentication?
Assistant:
PROMPT
# Send to LLM...
response = llm.complete(prompt)
Assembly Strategies¶
HTM provides three strategies for assembling context, each optimized for different use cases.
Recent Strategy¶
The :recent strategy prioritizes the most recently accessed memories.
How it works:
- Sort memories by access time (most recent first)
- Add memories in order until token limit reached
- Return assembled context
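A minimal sketch of this loop (the node fields are illustrative assumptions; HTM implements this internally):
# Sketch of the :recent packing loop — node fields are assumed, not HTM's internals
def assemble_recent(nodes, max_tokens:)
  budget = max_tokens
  selected = []

  # Most recently accessed first
  nodes.sort_by { |n| -n[:last_accessed_at].to_f }.each do |node|
    cost = HTM.configuration.count_tokens(node[:content])
    break if cost > budget # stop once the next memory would overflow the budget

    selected << node[:content]
    budget -= cost
  end

  selected.join("\n\n")
end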
Best for:
- Continuing recent conversations
- Session-based interactions
- Short-term context tracking
- Real-time applications
Example:
# Chat application
class ChatBot
def initialize
@htm = HTM.new(robot_name: "Chat", working_memory_size: 128_000)
@turn = 0
end
def chat(user_message)
@turn += 1
# Add user message
@htm.remember(
"User: #{user_message}",
metadata: { turn: @turn, role: "user" }
)
# Get recent context
context = @htm.working_memory.assemble_context(
strategy: :recent,
max_tokens: 10_000
)
# Generate response
response = llm_generate(context, user_message)
# Store assistant response
@htm.remember(
"Assistant: #{response}",
metadata: { turn: @turn, role: "assistant" }
)
response
end
private
def llm_generate(context, message)
# Your LLM integration here
"Generated response based on context"
end
end
Frequent Strategy¶
The :frequent strategy prioritizes frequently accessed memories.
How it works:
- Sort memories by access count (highest first)
- Add memories in order until token limit reached
- Return assembled context
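The packing loop is identical to the :recent sketch above; only the sort key changes:
# Sorted by access count instead of access time (illustrative)
nodes.sort_by { |n| -n[:access_count] }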
Best for:
- Critical information retention
- System constraints and rules
- User preferences
- Core knowledge base
- Decision-making support
Example:
# Knowledge base with priorities tracked via access patterns
class KnowledgeBot
def initialize
@htm = HTM.new(robot_name: "Knowledge")
# Add critical system constraints
@htm.remember(
"CRITICAL: Never expose API keys in responses",
metadata: { priority: "critical", category: "constraint" }
)
# Add important user preferences
@htm.remember(
"User prefers concise explanations",
metadata: { priority: "high", category: "preference" }
)
# Add general knowledge
@htm.remember(
"Python uses indentation for code blocks",
metadata: { priority: "medium", category: "fact" }
)
end
def answer_question(question)
# Get most frequently accessed context first
context = @htm.working_memory.assemble_context(
strategy: :frequent,
max_tokens: 5_000
)
# Frequently accessed constraints and preferences are included first
generate_answer(context, question)
end
private
def generate_answer(context, question)
# LLM integration
"Answer based on frequently accessed context"
end
end
Balanced Strategy (Recommended)¶
The :balanced strategy combines access frequency and recency using a weighted formula.
How it works:
- Calculate score: log(1 + access_count) × recency_factor
- Sort by score (highest first)
- Add memories until token limit reached
- Return assembled context
Scoring examples:
# Assuming recency_factor = 1 / (1 + age_in_hours):

# Recent + frequently accessed: high score
# Access count: 10, Age: 1 hour
# Score: log(11) × (1 / (1 + 1)) ≈ 1.20 ✓ Included

# Recent + rarely accessed: medium score
# Access count: 1, Age: 1 hour
# Score: log(2) × (1 / (1 + 1)) ≈ 0.35 ≈ Maybe

# Old + frequently accessed: low score
# Access count: 10, Age: 24 hours
# Score: log(11) × (1 / (1 + 24)) ≈ 0.10 ≈ Maybe

# Old + rarely accessed: very low score
# Access count: 1, Age: 24 hours
# Score: log(2) × (1 / (1 + 24)) ≈ 0.03 ✗ Excluded
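A sketch of this score in Ruby, assuming the recency factor above (HTM's internal constants may differ):
# Balanced-score sketch — assumes recency_factor = 1 / (1 + age_in_hours),
# matching the worked examples above; not HTM's verbatim source.
def balanced_score(access_count, last_accessed_at)
  age_in_hours = (Time.now - last_accessed_at) / 3600.0
  recency_factor = 1.0 / (1.0 + age_in_hours)
  Math.log(1 + access_count) * recency_factor
end

balanced_score(10, Time.now - 3600)     # ≈ 1.20 (recent + frequent)
balanced_score(1, Time.now - 24 * 3600) # ≈ 0.03 (old + rare)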
Best for:
- General-purpose applications (recommended default)
- Mixed temporal needs
- Production systems
- Balanced context requirements
Example:
# General-purpose assistant
class Assistant
def initialize
@htm = HTM.new(
robot_name: "Assistant",
working_memory_size: 128_000
)
end
def process(user_input)
# Add user input
@htm.remember(
user_input,
metadata: { role: "user", timestamp: Time.now.to_i }
)
# Get balanced context (frequent + recent)
context = @htm.working_memory.assemble_context(
strategy: :balanced,
max_tokens: 50_000
)
# Use context with LLM
generate_response(context, user_input)
end
private
def generate_response(context, input)
prompt = <<~PROMPT
You are a helpful assistant with access to memory.
Context from memory:
#{context}
User: #{input}
Assistant:
PROMPT
# Send to LLM
llm_complete(prompt)
end
def llm_complete(prompt)
# Your LLM integration
"Generated response"
end
end
Token Limits¶
Control context size with token limits:
# Use default (working memory size)
context = htm.working_memory.assemble_context(strategy: :balanced)
# Custom limit
context = htm.working_memory.assemble_context(
strategy: :balanced,
max_tokens: 50_000
)
# Small context for simple queries
context = htm.working_memory.assemble_context(
strategy: :recent,
max_tokens: 5_000
)
# Large context for complex tasks
context = htm.working_memory.assemble_context(
strategy: :frequent,
max_tokens: 200_000
)
Choosing token limits:
| Limit | Use Case |
|---|---|
| 2K-5K | Simple Q&A, quick lookups |
| 10K-20K | Standard conversations |
| 50K-100K | Complex analysis, code generation |
| 100K+ | Document processing, extensive context |
LLM Context Windows
Don't exceed your LLM's context window:

- GPT-3.5: 4K-16K tokens
- GPT-4: 8K-128K tokens
- Claude: 100K-200K tokens
- Llama 2: 4K tokens
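One way to respect both the table and the window note is to derive max_tokens from the target model. A sketch — the window sizes mirror the note above and will drift as models change:
# Example window sizes only — not a maintained registry
MODEL_WINDOWS = {
  "gpt-3.5" => 16_000,
  "gpt-4"   => 128_000,
  "claude"  => 200_000,
  "llama-2" => 4_000
}.freeze

def context_for_model(htm, model, strategy: :balanced, reserve_for_prompt: 2_000)
  window = MODEL_WINDOWS.fetch(model)

  htm.working_memory.assemble_context(
    strategy: strategy,
    max_tokens: window - reserve_for_prompt
  )
end

context = context_for_model(htm, "claude")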
Strategy Comparison¶
Performance¶
require 'benchmark'
# Add 1000 test memories
1000.times do |i|
htm.remember("Memory #{i}")
end
# Benchmark strategies
Benchmark.bm(15) do |x|
x.report("Recent:") do
100.times { htm.working_memory.assemble_context(strategy: :recent) }
end
x.report("Frequent:") do
100.times { htm.working_memory.assemble_context(strategy: :frequent) }
end
x.report("Balanced:") do
100.times { htm.working_memory.assemble_context(strategy: :balanced) }
end
end
# Typical results:
# user system total real
# Recent: 0.050000 0.000000 0.050000 ( 0.051234)
# Frequent: 0.045000 0.000000 0.045000 ( 0.047891)
# Balanced: 0.080000 0.000000 0.080000 ( 0.082456)
Notes:
- :recent is fastest (simple sort)
- :frequent is fast (simple sort)
- :balanced is slower (more complex score calculation)
- All are typically < 100ms for normal working memory sizes
Quality Comparison¶
# Test scenario: Mix of frequently accessed old data and recent rarely accessed data
# Setup
htm = HTM.new(robot_name: "Test")
# Add a critical constraint (its access count grows as it is recalled over time)
htm.remember("Critical system constraint", metadata: { priority: "critical" })
sleep 1 # Simulate age
# Add recent but rarely accessed data
20.times do |i|
htm.remember("Recent note #{i}", metadata: { priority: "low" })
end
# Compare strategies
puts "=== Recent Strategy ==="
context = htm.working_memory.assemble_context(strategy: :recent, max_tokens: 1000)
puts context.include?("Critical system constraint") ? "✓ Has critical" : "✗ Missing critical"
puts "\n=== Frequent Strategy ==="
context = htm.working_memory.assemble_context(strategy: :frequent, max_tokens: 1000)
puts context.include?("Critical system constraint") ? "✓ Has critical" : "✗ Missing critical"
puts "\n=== Balanced Strategy ==="
context = htm.working_memory.assemble_context(strategy: :balanced, max_tokens: 1000)
puts context.include?("Critical system constraint") ? "✓ Has critical" : "✗ Missing critical"
# Results depend on actual access patterns:
# Recent: May miss older frequently accessed data
# Frequent: Prioritizes frequently accessed items
# Balanced: Combines frequency and recency
Advanced Techniques¶
1. Multi-Strategy Context¶
Use multiple strategies for comprehensive context:
def multi_strategy_context(max_tokens_per_strategy: 10_000)
# Get different perspectives
recent = htm.working_memory.assemble_context(
strategy: :recent,
max_tokens: max_tokens_per_strategy
)
frequent = htm.working_memory.assemble_context(
strategy: :frequent,
max_tokens: max_tokens_per_strategy
)
# Combine the two (see the dedup sketch below)
combined = <<~CONTEXT
=== Recent Context ===
#{recent}
=== Frequently Accessed Context ===
#{frequent}
CONTEXT
combined
end
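Because both strategies can select the same memories, a dedup pass before combining avoids paying tokens twice. A sketch, assuming memories are separated by blank lines in the assembled string:
# Blank-line separation is an assumption about assemble_context's output
def dedupe_context(context)
  context.split("\n\n").uniq.join("\n\n")
end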
2. Dynamic Strategy Selection¶
Choose strategy based on query type:
def smart_context(query)
strategy = if query.match?(/recent|latest|current/)
:recent
elsif query.match?(/important|critical|must|frequent/)
:frequent
else
:balanced
end
htm.working_memory.assemble_context(strategy: strategy, max_tokens: 20_000)
end
# Usage
context = smart_context("What are the recent changes?") # Uses :recent
context = smart_context("What are critical constraints?") # Uses :frequent
context = smart_context("How do we handle auth?") # Uses :balanced
3. Filtered Context¶
Include only memories matching specific metadata:
def filtered_context(category:)
# Recall memories with specific metadata
memories = htm.recall(
category,
timeframe: "last 24 hours",
metadata: { category: category },
strategy: :hybrid,
limit: 50,
raw: true
)
# Manually assemble context from results
memories.map { |m| m['content'] }.join("\n\n")
end
# Usage
facts_only = filtered_context(category: "fact")
decisions_only = filtered_context(category: "decision")
4. Sectioned Context¶
Organize context into sections:
def sectioned_context
# Get different types of context using metadata filtering
facts = htm.recall(
"facts",
timeframe: "all time",
metadata: { category: "fact" },
limit: 5,
raw: true
)
decisions = htm.recall(
"decisions",
timeframe: "all time",
metadata: { category: "decision" },
limit: 5,
raw: true
)
recent = htm.recall(
"recent",
timeframe: "last hour",
limit: 5,
raw: true
)
# Format as sections
<<~CONTEXT
=== Core Facts ===
#{facts.map { |f| "- #{f['content']}" }.join("\n")}
=== Key Decisions ===
#{decisions.map { |d| "- #{d['content']}" }.join("\n")}
=== Recent Activity ===
#{recent.map { |r| "- #{r['content']}" }.join("\n")}
CONTEXT
end
5. Token-Aware Context¶
Ensure context fits LLM limits:
class TokenAwareContext
def initialize(htm)
@htm = htm
end
def create(strategy:, llm_context_window:, reserve_for_prompt: 1000)
# Calculate available tokens
available = llm_context_window - reserve_for_prompt
# Get context
context = @htm.working_memory.assemble_context(
strategy: strategy,
max_tokens: available
)
# Verify token count
actual_tokens = HTM.configuration.count_tokens(context)
if actual_tokens > available
warn "Context exceeded limit! Truncating..."
# Retry with smaller limit
context = @htm.working_memory.assemble_context(
strategy: strategy,
max_tokens: (available * 0.9).to_i # 90% to be safe
)
end
context
end
end
# Usage
context_builder = TokenAwareContext.new(htm)
context = context_builder.create(
strategy: :balanced,
llm_context_window: 100_000, # Claude 100K
reserve_for_prompt: 2_000
)
Using Context with LLMs¶
Pattern 1: System Prompt + Context¶
def generate_with_context(user_query)
context = htm.working_memory.assemble_context(strategy: :balanced, max_tokens: 50_000)
system_prompt = <<~SYSTEM
You are a helpful AI assistant with access to memory.
Use the provided context to answer questions accurately.
SYSTEM
user_prompt = <<~USER
Context from memory:
#{context}
---
User question: #{user_query}
Please answer based on the context above.
USER
# Send to LLM with system + user prompts
llm.chat(system: system_prompt, user: user_prompt)
end
Pattern 2: Conversation History¶
require 'securerandom'

class ConversationManager
def initialize
@htm = HTM.new(robot_name: "Chat")
@conversation_id = SecureRandom.uuid
end
def add_turn(user_msg, assistant_msg)
timestamp = Time.now.to_i
@htm.remember(
user_msg,
tags: ["conversation:#{@conversation_id}"],
metadata: { role: "user", timestamp: timestamp }
)
@htm.remember(
assistant_msg,
tags: ["conversation:#{@conversation_id}"],
metadata: { role: "assistant", timestamp: timestamp }
)
end
def get_context_for_llm
# Get recent conversation
@htm.working_memory.assemble_context(
strategy: :recent,
max_tokens: 10_000
)
end
end
Pattern 3: RAG with Context¶
def rag_query(question)
# 1. Retrieve relevant memories (adds to working memory)
relevant = htm.recall(
question,
timeframe: "last month",
strategy: :hybrid,
limit: 10
)
# 2. Create context from working memory (includes retrieved + existing)
context = htm.working_memory.assemble_context(
strategy: :balanced,
max_tokens: 30_000
)
# 3. Generate answer
prompt = <<~PROMPT
Context:
#{context}
Question: #{question}
Answer based on the context above:
PROMPT
llm.complete(prompt)
end
Optimization Tips¶
1. Cache Context¶
class ContextCache
def initialize(htm, ttl: 60)
@htm = htm
@ttl = ttl
@cache = {}
end
def get_context(strategy:, max_tokens: nil)
cache_key = "#{strategy}_#{max_tokens}"
# Check cache
if (cached = @cache[cache_key])
if Time.now - cached[:time] < @ttl
return cached[:context]
end
end
# Generate new context
context = @htm.working_memory.assemble_context(
strategy: strategy,
max_tokens: max_tokens
)
# Cache it
@cache[cache_key] = {
context: context,
time: Time.now
}
context
end
def invalidate
@cache.clear
end
end
# Usage
cache = ContextCache.new(htm, ttl: 30) # 30 second TTL
context = cache.get_context(strategy: :balanced) # Cached for 30s
2. Progressive Context Loading¶
def progressive_context(start_tokens: 5_000, max_tokens: 50_000)
# Start small
context = htm.working_memory.assemble_context(strategy: :balanced, max_tokens: start_tokens)
# Check if more context needed (based on your logic)
if needs_more_context?(context)
# Expand gradually
context = htm.working_memory.assemble_context(strategy: :balanced, max_tokens: start_tokens * 2)
end
if still_needs_more?(context)
# Expand to max
context = htm.working_memory.assemble_context(strategy: :balanced, max_tokens: max_tokens)
end
context
end
def needs_more_context?(context)
# Your logic here
context.length < 1000 # Example: too short
end
def still_needs_more?(context)
# Your logic here
false # Example
end
3. Selective Inclusion¶
def selective_context(query)
# Determine what's relevant based on query
include_facts = query.match?(/fact|truth|information/)
include_decisions = query.match?(/decision|choice|why/)
include_code = query.match?(/code|implement|example/)
# Build custom context using metadata filtering
parts = []
if include_facts
facts = htm.recall(
query,
timeframe: "all time",
metadata: { category: "fact" },
limit: 5,
raw: true
)
parts << "Facts:\n" + facts.map { |f| "- #{f['content']}" }.join("\n")
end
if include_decisions
decisions = htm.recall(
query,
timeframe: "all time",
metadata: { category: "decision" },
limit: 5,
raw: true
)
parts << "Decisions:\n" + decisions.map { |d| "- #{d['content']}" }.join("\n")
end
if include_code
code = htm.recall(
query,
timeframe: "all time",
metadata: { category: "code" },
limit: 3,
raw: true
)
parts << "Code Examples:\n" + code.map { |c| c['content'] }.join("\n\n")
end
parts.join("\n\n")
end
Best Practices¶
1. Choose the Right Strategy¶
# Use :recent for conversations
context = htm.working_memory.assemble_context(strategy: :recent)
# Use :frequent for critical operations
context = htm.working_memory.assemble_context(strategy: :frequent)
# Use :balanced as default (recommended)
context = htm.working_memory.assemble_context(strategy: :balanced)
2. Set Appropriate Token Limits¶
# Don't exceed LLM context window
context = htm.working_memory.assemble_context(
strategy: :balanced,
max_tokens: 100_000 # Leave room for prompt
)
# Smaller contexts are faster
context = htm.working_memory.assemble_context(
strategy: :recent,
max_tokens: 5_000 # Quick queries
)
3. Monitor Context Quality¶
def monitor_context
context = htm.working_memory.assemble_context(strategy: :balanced)
puts "Context length: #{context.length} characters"
# Count tokens
tokens = HTM.configuration.count_tokens(context)
puts "Estimated tokens: #{tokens}"
# Check if too small or too large
warn "Context very small!" if tokens < 500
warn "Context very large!" if tokens > 100_000
end
4. Include Metadata¶
def context_with_metadata
context = htm.working_memory.assemble_context(strategy: :balanced, max_tokens: 20_000)
# Add metadata header
wm = htm.working_memory
<<~CONTEXT
[Context assembled at #{Time.now}]
[Strategy: balanced]
[Working memory: #{wm.node_count} nodes]
[Robot: #{htm.robot_name}]
#{context}
CONTEXT
end
Complete Example¶
require 'htm'
# Initialize HTM
htm = HTM.new(
robot_name: "Context Demo",
working_memory_size: 128_000
)
# Add various memories with metadata for categorization
htm.remember("User prefers Ruby", metadata: { category: "fact", priority: "high" })
htm.remember("Use PostgreSQL for database", metadata: { category: "decision", priority: "high" })
htm.remember("Currently debugging auth module", metadata: { category: "context", priority: "medium" })
htm.remember("def authenticate(token)...", metadata: { category: "code", priority: "medium" })
htm.remember("Check logs later", metadata: { category: "note", priority: "low" })
puts "=== Recent Strategy ==="
recent = htm.working_memory.assemble_context(strategy: :recent, max_tokens: 5_000)
puts recent
puts "\n(Newest first)"
puts "\n=== Frequent Strategy ==="
frequent = htm.working_memory.assemble_context(strategy: :frequent, max_tokens: 5_000)
puts frequent
puts "\n(Most frequently accessed first)"
puts "\n=== Balanced Strategy ==="
balanced = htm.working_memory.assemble_context(strategy: :balanced, max_tokens: 5_000)
puts balanced
puts "\n(Balanced frequency + recency)"
# Use with LLM
def ask_llm(context, question)
prompt = <<~PROMPT
Context:
#{context}
Question: #{question}
Answer:
PROMPT
# Send to your LLM here
puts "\n=== LLM Prompt ==="
puts prompt
end
ask_llm(balanced, "What database are we using?")
Next Steps¶
- Recalling Memories - Populate working memory effectively
- Working Memory - Understand memory management
- Search Strategies - Optimize retrieval for context