Working Memory Management

Working memory is HTM's token-limited active context system designed for immediate LLM use. This guide explains how it works, how to manage it effectively, and best practices for optimal performance.

What is Working Memory?

Working memory is an in-memory cache that:

  • Stores active memories for fast access
  • Respects token limits (default: 128,000 tokens)
  • Evicts old/unimportant memories when full
  • Syncs with long-term memory for durability

Think of it as RAM for your robot's consciousness - fast, limited, and volatile.

Architecture

(Diagram: working memory architecture)

Initialization

Configure working memory size when creating HTM:

# Default: 128K tokens (roughly 512KB of text)
htm = HTM.new(
  robot_name: "Assistant",
  working_memory_size: 128_000
)

# Large working memory for extensive context
htm = HTM.new(
  robot_name: "Long Context Bot",
  working_memory_size: 1_000_000  # 1M tokens
)

# Small working memory for focused tasks
htm = HTM.new(
  robot_name: "Focused Bot",
  working_memory_size: 32_000  # 32K tokens
)

Choosing Memory Size

  • 32K-64K: Focused tasks, single conversations
  • 128K-256K: General purpose, multiple topics (recommended)
  • 512K-1M: Extensive context, long sessions
  • >1M: Specialized use cases only (significant memory overhead)

How Working Memory Works

Adding Memories

When you remember content, it goes to both working and long-term memory:

htm.remember("User prefers Ruby for scripting")

# Internally:
# 1. Calculate token count
# 2. Store in long-term memory (PostgreSQL)
# 3. Add to working memory (in-memory)
# 4. Check capacity, evict if needed

Recalling Memories

When you recall, memories are added to working memory:

memories = htm.recall(
  "database design",
  timeframe: "last week"
)

# Internally:
# 1. Search long-term memory (RAG)
# 2. For each result:
#    a. Check if space available
#    b. Evict if needed
#    c. Add to working memory
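
To see this effect directly, compare working memory token counts before and after a recall. This uses only APIs shown elsewhere in this guide:

before = htm.working_memory.token_count
htm.recall("database design", timeframe: "last week")
after = htm.working_memory.token_count

puts "Recall added #{after - before} tokens to working memory"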

Automatic Eviction

When working memory is full, HTM evicts memories using a score that combines importance and recency:

# Algorithm:
# 1. Calculate eviction score = importance × recency
# 2. Sort by score (lowest first)
# 3. Evict until enough space
# 4. Mark as evicted in long-term memory

Note

Evicted memories are not deleted - they remain in long-term memory and can be recalled later.
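
For intuition, here is a minimal sketch of what such an eviction pass might look like. The helper name and node fields (:importance, :created_at, :token_count) are assumptions for illustration, not HTM's actual internals:

# Illustrative eviction pass (assumed fields; not HTM's real implementation)
def evict_lowest_scoring(nodes, needed_tokens)
  freed = 0
  evicted = []

  # Sort ascending by score: low importance + old = evicted first
  by_score = nodes.sort_by do |node|
    age_hours = [(Time.now - node[:created_at]) / 3600.0, 0.01].max
    node[:importance] * (1.0 / age_hours)
  end

  # Evict until enough space is freed
  by_score.each do |node|
    break if freed >= needed_tokens
    evicted << node
    freed += node[:token_count]
  end

  evicted
end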

Monitoring Utilization

Basic Stats

wm = htm.working_memory

puts "Nodes: #{wm.node_count}"
puts "Tokens: #{wm.token_count} / #{wm.max_tokens}"
puts "Utilization: #{wm.utilization_percentage}%"

Detailed Monitoring

class MemoryMonitor
  def initialize(htm)
    @htm = htm
  end

  def report
    wm = @htm.working_memory

    puts "=== Working Memory Report ==="
    puts "Capacity: #{wm.max_tokens} tokens"
    puts "Used: #{wm.token_count} tokens (#{wm.utilization_percentage}%)"
    puts "Free: #{wm.max_tokens - wm.token_count} tokens"
    puts "Nodes: #{wm.node_count}"
    puts
    puts "Average tokens per node: #{wm.token_count / wm.node_count}" if wm.node_count > 0
    puts
    puts "=== Long-term Memory ==="
    puts "Total nodes: #{HTM::Models::Node.count}"
  end

  def health_check
    util = @htm.working_memory.utilization_percentage

    # Overlapping boundaries are intentional: case takes the first match,
    # so fractional percentages (e.g. 50.5) never fall through to :full
    case util
    when 0..50
      { status: :healthy, message: "Plenty of space" }
    when 50..80
      { status: :warning, message: "Approaching capacity" }
    when 80..95
      { status: :critical, message: "Nearly full, evictions likely" }
    else
      { status: :full, message: "At capacity, frequent evictions" }
    end
  end
end

monitor = MemoryMonitor.new(htm)
monitor.report
health = monitor.health_check
puts "Health: #{health[:status]} - #{health[:message]}"

Eviction Behavior

Understanding Eviction

HTM evicts memories based on two factors:

  1. Importance: Higher importance = less likely to evict
  2. Recency: Newer memories = less likely to evict

# Eviction score calculation
score = importance * (1.0 / age_in_hours)

# Example scores:
# High importance (9.0), recent (1 hour): 9.0 * 1.0 = 9.0 (keep)
# High importance (9.0), old (24 hours): 9.0 * 0.042 = 0.38 (maybe evict)
# Low importance (2.0), recent (1 hour): 2.0 * 1.0 = 2.0 (evict soon)
# Low importance (2.0), old (24 hours): 2.0 * 0.042 = 0.08 (evict first)
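
These numbers can be sanity-checked in plain Ruby (illustrative only; HTM computes this internally):

# Plain-Ruby check of the example scores above (not an HTM API)
def eviction_score(importance, age_in_hours)
  importance * (1.0 / age_in_hours)
end

puts eviction_score(9.0, 1)   # => 9.0
puts eviction_score(9.0, 24)  # => 0.375 (rounds to 0.38)
puts eviction_score(2.0, 1)   # => 2.0
puts eviction_score(2.0, 24)  # => 0.0833... (rounds to 0.08)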

Eviction Example

# Fill working memory
htm = HTM.new(
  robot_name: "Test",
  working_memory_size: 10_000  # Small for demo
)

# Add a critical fact (tracked via metadata; it remains in long-term memory)
critical_id = htm.remember(
  "Critical system password",
  metadata: { priority: "critical" }
)

# Add many items
100.times do |i|
  htm.remember("Temporary note #{i}")
end

# Check what survived in working memory
wm = htm.working_memory
puts "Surviving nodes: #{wm.node_count}"

# Critical fact is still in long-term memory
critical = htm.long_term_memory.retrieve(critical_id)
puts "Critical fact present in LTM: #{!critical.nil?}"

Manual Eviction

You can trigger eviction manually:

# Access the eviction mechanism (internal API)
needed_tokens = 50_000

evicted = htm.working_memory.evict_to_make_space(needed_tokens)

puts "Evicted #{evicted.length} memories:"
evicted.each do |mem|
  puts "- #{mem[:key]}: #{mem[:value][0..50]}..."
end

Warning

Manual eviction is rarely needed. HTM handles this automatically during normal operations.

Best Practices

1. Use Metadata for Priority Tracking

# Critical data: Mark with priority metadata
htm.remember(
  "Production API key: secret123",
  metadata: { priority: "critical", category: "credentials" }
)

# Important context: Mark appropriately
htm.remember(
  "User wants to optimize database",
  metadata: { priority: "high", category: "goal" }
)

# Temporary context
htm.remember(
  "Discussing query optimization",
  metadata: { priority: "medium", category: "context" }
)

# Disposable notes
htm.remember(
  "Temporary calculation result",
  metadata: { priority: "low", category: "scratch" }
)

2. Monitor Utilization Regularly

class WorkingMemoryManager
  def initialize(htm, threshold: 80.0)
    @htm = htm
    @threshold = threshold
  end

  def check_and_warn
    util = @htm.working_memory.utilization_percentage

    if util > @threshold
      warn "Working memory at #{util}%!"
      warn "Consider increasing working_memory_size or reducing context"
    end
  end

  def auto_adjust_importance
    util = @htm.working_memory.utilization_percentage

    # If critically full, boost importance of current context
    if util > 90
      # Implementation would require tracking current context keys
      # and updating their importance in the database
      warn "Critical capacity reached"
    end
  end
end

3. Use Context Strategically

Don't load unnecessary data into working memory:

# Bad: Load everything
all_memories = htm.recall(
  "anything",
  timeframe: "all time",
  limit: 1000
)
# This fills working memory with potentially irrelevant data

# Good: Load what you need
relevant = htm.recall(
  "current project",
  timeframe: "last week",
  limit: 20
)
# This keeps working memory focused

4. Clean Up When Done

Remove temporary memories when no longer needed:

def with_temporary_context(htm, value)
  # Add temporary context
  node_id = htm.remember(value, metadata: { temporary: true })

  yield
ensure
  # Clean up even if the block raises - soft delete by default (recoverable)
  htm.forget(node_id) if node_id
end

with_temporary_context(htm, "Temp calculation data") do
  # Use the temporary context
  context = htm.working_memory.assemble_context(strategy: :recent)
  # ... do work with context
end
# Temp data is now soft-deleted

5. Batch Operations Carefully

Be mindful when adding many memories at once:

# Add batch data with monitoring (batch_data: your collection of items to import)
batch_data.each_with_index do |data, i|
  htm.remember(data, metadata: { batch: "import_001", index: i })

  # Check capacity every 100 items
  if i % 100 == 0
    util = htm.working_memory.utilization_percentage
    puts "Utilization: #{util}%"
  end
end

Working Memory Strategies

Strategy 1: Sliding Window

Keep only recent memories by tracking node IDs:

class SlidingWindow
  def initialize(htm, window_size: 50)
    @htm = htm
    @window_size = window_size
    @node_ids = []
  end

  def add(value, **opts)
    node_id = @htm.remember(value, **opts)
    @node_ids << node_id

    # Forget oldest if window exceeded
    if @node_ids.length > @window_size
      oldest_id = @node_ids.shift
      @htm.forget(oldest_id) rescue nil
    end

    node_id
  end
end

Strategy 2: Priority-Based Management

Use metadata to track priority:

class PriorityManager
  def initialize(htm)
    @htm = htm
  end

  def add(value, priority: "medium", **opts)
    metadata = (opts[:metadata] || {}).merge(priority: priority)
    node_id = @htm.remember(value, **opts.merge(metadata: metadata))

    # If memory is tight, low-priority items are left to HTM's normal
    # eviction, which scores nodes by importance and recency

    node_id
  end
end

Strategy 3: Topic-Based Management

Group memories by topic using tags:

class TopicManager
  def initialize(htm)
    @htm = htm
    @topics = Hash.new { |h, k| h[k] = [] }
  end

  def add(value, topic:, **opts)
    tags = (opts[:tags] || []) + ["topic:#{topic}"]
    node_id = @htm.remember(value, **opts.merge(tags: tags))
    @topics[topic] << node_id
    node_id
  end

  def clear_topic(topic)
    node_ids = @topics[topic] || []
    node_ids.each do |node_id|
      @htm.forget(node_id) rescue nil
    end
    @topics.delete(topic)
  end

  def focus_on_topic(topic)
    # Clear all other topics to make space
    @topics.keys.each do |t|
      clear_topic(t) unless t == topic
    end
  end
end

Token Counting

HTM uses Tiktoken to count tokens:

# Token counts vary by content
short = "Hello world"  # ~2 tokens
medium = "A" * 100     # ~25 tokens
long = "word " * 1000  # ~1000 tokens

# Check token count of a string
tokens = HTM.configuration.count_tokens(long)
puts "Token count: #{tokens}"

Token vs Characters

  • 1 token ≈ 4 characters (English)
  • 128K tokens ≈ 512KB text
  • Code often uses more tokens per character than prose
  • Special characters and non-English text use more tokens
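
A rough character-based estimate can be compared against the configured counter; the 4-characters-per-token rule is only an approximation:

# Rough estimate from the ~4 characters per token rule of thumb
def estimate_tokens(text)
  (text.length / 4.0).ceil
end

text = "Working memory is HTM's token-limited active context system."
puts "Estimated: #{estimate_tokens(text)}"
puts "Counted:   #{HTM.configuration.count_tokens(text)}"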

Performance Considerations

Memory Overhead

Working memory has minimal overhead:

# Memory usage per node (approximate):
# - Key: ~50 bytes
# - Value: N bytes (your content)
# - Metadata: ~100 bytes
# - Total: ~150 bytes + content

# For 1000 nodes with 500-char content:
# 1000 × (150 + 500) = ~650KB

# Token count is stored but content dominates
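
These figures translate into a quick estimator (constants taken from the approximation above; actual overhead varies by Ruby implementation):

# Back-of-the-envelope RAM estimate from the per-node figures above
def estimated_working_memory_bytes(node_count, avg_content_bytes)
  node_count * (150 + avg_content_bytes)  # ~150 bytes overhead per node
end

puts estimated_working_memory_bytes(1000, 500)  # => 650000 (~650KB)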

Access Speed

Working memory is very fast:

require 'benchmark'

htm = HTM.new(robot_name: "Perf Test")

# Add 1000 memories
1000.times do |i|
  htm.remember("Value #{i}")
end

# Benchmark working memory access
Benchmark.bm do |x|
  x.report("assemble_context:") do
    1000.times { htm.working_memory.assemble_context(strategy: :balanced) }
  end
end

# Typical results:
# assemble_context: ~1ms per call

Optimization Tips

# 1. Avoid frequent context assembly
# (assumes @htm and @llm have been initialized elsewhere)

# Bad: Assemble context on every message
def process_message(message)
  context = @htm.working_memory.assemble_context(strategy: :balanced)  # Slow if called frequently
  @llm.chat(context + message)
end

# Good: Cache context, refresh it every 10 messages
@context_cache = nil
@context_age = 0

def process_message(message)
  if @context_cache.nil? || @context_age > 10
    @context_cache = @htm.working_memory.assemble_context(strategy: :balanced)
    @context_age = 0
  end
  @context_age += 1

  @llm.chat(@context_cache + message)
end

# 2. Use appropriate token limits
# Don't request more than your LLM can handle
context = htm.working_memory.assemble_context(
  strategy: :balanced,
  max_tokens: 100_000  # Match LLM's context window
)

# 3. Monitor and adjust
util = htm.working_memory.utilization_percentage
if util > 90
  # Reduce working memory size or increase eviction
end

Debugging Working Memory

Inspecting Contents

class WorkingMemoryInspector
  def initialize(htm)
    @htm = htm
  end

  def show_contents
    wm = @htm.working_memory

    puts "=== Working Memory Contents ==="
    puts "Total nodes: #{wm.node_count}"
    puts "Total tokens: #{wm.token_count}"
    puts

    # Access internal structure (advanced)
    # Note: This requires access to WorkingMemory internals
    # For production, use public APIs only
  end

  def find_large_nodes(threshold: 1000)
    # Find nodes using many tokens
    # This would require iterating working memory
    # (not directly exposed in current API)
  end

  def show_eviction_candidates
    # Show which nodes would be evicted next
    # Based on importance and recency
  end
end

Common Issues

Issue: Working memory always full

# Check working memory utilization
wm_util = htm.working_memory.utilization_percentage

if wm_util > 95
  puts "Working memory consistently full"
  puts "Solutions:"
  puts "1. Increase working_memory_size when creating HTM"
  puts "2. Reduce recall limit"
  puts "3. Clean up temporary data more frequently with forget()"
end

Issue: Important data getting evicted

# HTM evicts based on importance and recency
# Recall important data periodically to refresh it in working memory
# Or query it from long-term memory when needed:
critical = htm.long_term_memory.retrieve(critical_node_id)

Issue: Memory utilization too low

# Working memory underutilized
wm_util = htm.working_memory.utilization_percentage

if wm_util < 20
  puts "Working memory underutilized"
  puts "Consider:"
  puts "1. Reducing working_memory_size to save RAM"
  puts "2. Recalling more context with larger limit"
  puts "3. Using larger token limits in assemble_context"
end

Complete Example

require 'htm'

# Initialize with moderate working memory
htm = HTM.new(
  robot_name: "Memory Manager",
  working_memory_size: 128_000
)

# Monitor class
class Monitor
  def initialize(htm)
    @htm = htm
  end

  def report
    wm = @htm.working_memory
    puts "Utilization: #{wm.utilization_percentage}%"
    puts "Nodes: #{wm.node_count}"
    puts "Tokens: #{wm.token_count} / #{wm.max_tokens}"
  end
end

monitor = Monitor.new(htm)

# Add memories with different priorities via metadata
puts "Adding critical data..."
critical_id = htm.remember("Critical system data", metadata: { priority: "critical" })
monitor.report

puts "\nAdding important data..."
10.times do |i|
  htm.remember("Important item #{i}", metadata: { priority: "high" })
end
monitor.report

puts "\nAdding regular data..."
50.times do |i|
  htm.remember("Regular item #{i}", metadata: { priority: "medium" })
end
monitor.report

puts "\nAdding temporary data..."
100.times do |i|
  htm.remember("Temporary item #{i}", metadata: { priority: "low" })
end
monitor.report

# Check that critical data is still in long-term memory
puts "\n=== Survival Check ==="
critical = htm.long_term_memory.retrieve(critical_id)
puts "Critical in LTM: #{!critical.nil?}"

# Create context from working memory
puts "\nCreating context..."
context = htm.working_memory.assemble_context(strategy: :balanced, max_tokens: 50_000)
puts "Context length: #{context.length} characters"

# Final stats
puts "\n=== Final Stats ==="
monitor.report