Architecture Overview

HTM (Hierarchical Temporal Memory) implements a two-tier memory system for LLM-based applications ("robots"). The architecture lets robots maintain long-term context across sessions while staying within a fixed token budget.

System Overview

HTM provides intelligent memory management through five core components that work together to deliver persistent, searchable, and context-aware memory for AI agents.

The HTM coordination layer sits between the robot and four subsystems:

  • Working Memory: token-limited, in-memory, LRU-style eviction (managed by HTM)
  • Long-Term Memory: PostgreSQL, unlimited capacity, durable storage (persisted by HTM)
  • Embedding Service: Ollama/OpenAI, vector embeddings, semantic search (generates embeddings)
  • Database: PostgreSQL 16+ with TimescaleDB, pgvector + pg_trgm (stores everything)

Data Flow:

  • Add: new memory → Working Memory → Long-Term Memory (persistent)
  • Recall: Long-Term Memory (RAG search) → Working Memory (evict if needed)

Core Components

HTM (Main Interface)

The HTM class is the primary interface for memory operations. It coordinates between working memory, long-term memory, and embedding services to provide a unified API.

Key Responsibilities:

  • Initialize and coordinate all memory subsystems
  • Manage robot identification and registration
  • Generate embeddings for new memories
  • Orchestrate recall operations with RAG-based retrieval
  • Assemble context for LLM consumption
  • Track memory statistics and robot activity

Related ADRs: ADR-002, ADR-008
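
The calls below sketch this API end to end. Only `add_node`, `recall`, and `forget(key, confirm: :confirmed)` are named on this page; the gem require, constructor arguments, and the `importance:` keyword are illustrative assumptions.

```ruby
require "htm" # assumed entry point for the library

# Register this process as a robot (constructor arguments are illustrative).
htm = HTM.new(robot_name: "docs-bot", token_limit: 128_000)

# Persist a memory: written to long-term storage, cached in working memory.
node_id = htm.add_node("user_editor",
                       "The user prefers Neovim with LSP enabled",
                       importance: 0.8) # importance: is an assumed keyword

# RAG-based recall: semantic search over long-term memory; results are
# promoted into working memory, evicting low-value entries if needed.
memories = htm.recall(timeframe: :last_week, topic: "editor preferences")

# Explicit, confirmed deletion is the only way data is ever removed.
htm.forget("user_editor", confirm: :confirmed)
```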

Working Memory

Token-limited, in-memory storage for active conversation context. Working memory acts as a fast cache for recently accessed or highly important memories that the LLM needs immediate access to.

Characteristics:

  • Capacity: Token-limited (default: 128,000 tokens)
  • Storage: Ruby Hash (in-memory)
  • Eviction: Hybrid importance + recency (LRU-based)
  • Lifetime: Process lifetime
  • Access Time: O(1) hash lookups

Related ADRs: ADR-002, ADR-007
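
As an illustration of this contract (not the library's actual implementation), a token-limited store with hybrid importance + recency eviction can be sketched as:

```ruby
# Illustrative sketch of the working-memory behaviour described above.
class WorkingMemorySketch
  Entry = Struct.new(:value, :tokens, :importance, :last_access)

  def initialize(token_limit: 128_000)
    @token_limit = token_limit
    @entries = {}      # Ruby Hash => O(1) lookups
    @used_tokens = 0
  end

  def add(key, value, tokens:, importance: 0.5)
    evict_until(@token_limit - tokens) # make room before inserting
    @entries[key] = Entry.new(value, tokens, importance, Time.now)
    @used_tokens += tokens
  end

  def get(key)
    entry = @entries[key] or return nil
    entry.last_access = Time.now # refresh recency on access
    entry.value
  end

  private

  # Hybrid importance + recency: drop the lowest-scoring entries first.
  def evict_until(budget)
    while @used_tokens > budget && !@entries.empty?
      key, entry = @entries.min_by { |_, e| [e.importance, e.last_access] }
      @entries.delete(key)
      @used_tokens -= entry.tokens
      # Eviction only frees tokens -- the memory still lives in long-term storage.
    end
  end
end
```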

Long-Term Memory

Durable PostgreSQL storage for permanent knowledge retention. All memories are stored here permanently unless explicitly deleted.

Characteristics:

  • Capacity: Effectively unlimited
  • Storage: PostgreSQL with TimescaleDB extension
  • Retention: Permanent (explicit deletion only)
  • Access Pattern: RAG-based retrieval (semantic + temporal)
  • Lifetime: Forever

Related ADRs: ADR-001, ADR-005
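
The persistence path can be sketched with the pg gem; the `nodes` table and its columns here are assumed from the sequence diagrams later on this page, not taken from the real schema:

```ruby
require "pg"

conn = PG.connect(dbname: "htm") # connection details are illustrative

# Insert a memory with its embedding; pgvector accepts a "[...]" literal
# cast to the vector type. Table and column names are assumptions.
def persist_node(conn, key:, value:, embedding:, robot_id:)
  result = conn.exec_params(<<~SQL, [key, value, "[#{embedding.join(',')}]", robot_id])
    INSERT INTO nodes (key, value, embedding, robot_id, created_at)
    VALUES ($1, $2, $3::vector, $4, now())
    RETURNING id
  SQL
  result.first["id"]
end
```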

Embedding Service

Generates vector embeddings for semantic search and counts tokens for working-memory budget management.

Supported Providers:

  • Ollama (default): Local embedding models (gpt-oss, nomic-embed-text, mxbai-embed-large)
  • OpenAI: text-embedding-3-small, text-embedding-3-large
  • Cohere: embed-english-v3.0, embed-multilingual-v3.0
  • Local: Transformers.js for browser/edge deployment

Related ADRs: ADR-003
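
For the default provider, generating an embedding is one HTTP call to Ollama's /api/embeddings endpoint. A minimal sketch (the model name is whatever you have pulled locally):

```ruby
require "net/http"
require "json"

# Embed a string via a local Ollama server.
def embed(text, model: "nomic-embed-text", host: "http://localhost:11434")
  uri = URI("#{host}/api/embeddings")
  res = Net::HTTP.post(uri, { model: model, prompt: text }.to_json,
                       "Content-Type" => "application/json")
  JSON.parse(res.body).fetch("embedding") # => Array of Floats
end

vector = embed("The user prefers Neovim with LSP enabled")
```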

Database

PostgreSQL 16+ with extensions for time-series optimization, vector similarity search, and full-text search.

Key Extensions:

  • TimescaleDB: Hypertable partitioning, compression policies, time-range optimization
  • pgvector: Vector similarity search with HNSW indexing
  • pg_trgm: Trigram-based fuzzy text matching

Related ADRs: ADR-001
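
A bootstrap sketch for these extensions and the indexes they enable (index and table names are illustrative):

```ruby
require "pg"

conn = PG.connect(dbname: "htm")
conn.exec(<<~SQL) # assumes a nodes(key, value, embedding, created_at, ...) table
  CREATE EXTENSION IF NOT EXISTS timescaledb;
  CREATE EXTENSION IF NOT EXISTS vector;   -- pgvector
  CREATE EXTENSION IF NOT EXISTS pg_trgm;

  -- Time-range optimization: partition the nodes table by creation time
  SELECT create_hypertable('nodes', 'created_at', if_not_exists => TRUE);

  -- Approximate nearest-neighbour search over embeddings (HNSW)
  CREATE INDEX IF NOT EXISTS nodes_embedding_idx
    ON nodes USING hnsw (embedding vector_cosine_ops);

  -- Trigram index for fuzzy text matching
  CREATE INDEX IF NOT EXISTS nodes_value_trgm_idx
    ON nodes USING gin (value gin_trgm_ops);
SQL
```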

Component Interaction Flow

Adding a Memory

```mermaid
sequenceDiagram
    participant User
    participant HTM
    participant EmbeddingService
    participant LongTermMemory
    participant WorkingMemory
    participant Database

    User->>HTM: add_node(key, value, ...)
    HTM->>EmbeddingService: embed(value)
    EmbeddingService-->>HTM: embedding vector
    HTM->>EmbeddingService: count_tokens(value)
    EmbeddingService-->>HTM: token_count
    HTM->>LongTermMemory: add(key, value, embedding, ...)
    LongTermMemory->>Database: INSERT INTO nodes
    Database-->>LongTermMemory: node_id
    LongTermMemory-->>HTM: node_id
    HTM->>WorkingMemory: add(key, value, token_count, ...)
    Note over WorkingMemory: Evict if needed
    WorkingMemory-->>HTM: success
    HTM-->>User: node_id
```
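
In straight-line Ruby, the coordinator's side of this exchange looks roughly like the following (collaborator method names match the diagram; everything else is illustrative):

```ruby
# Inside the HTM class -- durable write first, then the working-memory cache.
def add_node(key, value, **opts)
  embedding   = @embedding_service.embed(value)        # vector for semantic search
  token_count = @embedding_service.count_tokens(value) # working-memory accounting

  node_id = @long_term_memory.add(key, value, embedding, **opts)
  @working_memory.add(key, value, token_count, **opts)  # may trigger eviction
  node_id
end
```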

Recalling Memories

```mermaid
sequenceDiagram
    participant User
    participant HTM
    participant LongTermMemory
    participant EmbeddingService
    participant Database
    participant WorkingMemory

    User->>HTM: recall(timeframe, topic, ...)
    HTM->>EmbeddingService: embed(topic)
    EmbeddingService-->>HTM: query_embedding
    HTM->>LongTermMemory: search(timeframe, embedding, ...)
    LongTermMemory->>Database: SELECT with vector similarity
    Database-->>LongTermMemory: matching nodes
    LongTermMemory-->>HTM: recalled_memories
    loop For each recalled memory
        HTM->>WorkingMemory: add(memory)
        Note over WorkingMemory: Evict old memories if needed
    end
    HTM-->>User: recalled_memories
```

Key Architectural Principles

1. Never Forget (Unless Told)

HTM implements a "never forget" philosophy. Evicting a memory from working memory does not delete it: every memory is persisted to long-term storage when it is created, and eviction simply drops it from the in-process cache. Only an explicit forget(key, confirm: :confirmed) operation deletes data.

Design Principle

Memory eviction is about managing working memory tokens, not data deletion. All evicted memories remain searchable and recallable from long-term storage.

Related ADRs: ADR-009
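
The distinction matters in practice. A hypothetical session, using the API names from this page:

```ruby
htm.add_node("build_flags", "Use -j8 for parallel builds")
# ...the working-memory token budget fills up; "build_flags" gets evicted...
htm.recall(topic: "parallel builds")           # still found: eviction never deletes
htm.forget("build_flags", confirm: :confirmed) # the only true deletion
```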

2. Two-Tier Memory Hierarchy

Working memory provides fast O(1) access to recent/important context, while long-term memory provides unlimited durable storage with RAG-based retrieval.

Performance Benefit

This architecture balances the competing needs of fast access (working memory) and unlimited retention (long-term memory).

Related ADRs: ADR-002

3. Hive Mind Architecture

All robots share a global long-term memory database, enabling cross-robot learning and context continuity. Each robot maintains its own working memory for process isolation.

Multi-Robot Collaboration

Knowledge gained by one robot benefits all robots. Users never need to repeat information across sessions or robots.

Related ADRs: ADR-004
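
Sketched with two hypothetical robot processes against the same database:

```ruby
coder    = HTM.new(robot_name: "coder")    # constructor arguments illustrative
reviewer = HTM.new(robot_name: "reviewer") # separate process, separate working memory

coder.add_node("style_guide", "Project uses 2-space indent and frozen string literals")

# A different robot recalls knowledge it never stored itself: the shared
# PostgreSQL long-term store is the hive mind.
reviewer.recall(topic: "indentation style")
```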

4. RAG-Based Retrieval

HTM uses Retrieval-Augmented Generation patterns with hybrid search strategies combining semantic similarity (vector search) and temporal relevance (time-range filtering).

Search Strategies

  • Vector: Pure semantic similarity
  • Full-text: Keyword-based search
  • Hybrid: Combines both with RRF scoring

Related ADRs: ADR-005
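
Reciprocal Rank Fusion (RRF) scores each result by summing 1/(k + rank) across the ranked lists it appears in; k = 60 is the conventional constant (HTM's exact parameters are not stated here):

```ruby
# Merge a vector-search ranking and a full-text ranking with RRF.
def rrf_merge(vector_hits, fulltext_hits, k: 60)
  scores = Hash.new(0.0)
  [vector_hits, fulltext_hits].each do |ranked_ids|
    ranked_ids.each_with_index do |id, rank|
      scores[id] += 1.0 / (k + rank + 1) # ranks are 1-based in the formula
    end
  end
  scores.sort_by { |_, s| -s }.map(&:first) # best-fused results first
end

rrf_merge([:a, :b, :c], [:b, :d]) # => [:b, :a, :d, :c]
```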

5. Importance-Weighted Eviction

Working memory eviction removes low-importance, older memories first, so critical context is preserved even when it is old.

Token Budget Management

Eviction is inevitable with finite token limits. The hybrid importance + recency strategy ensures the most valuable memories stay in working memory.

Related ADRs: ADR-007
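
One plausible scoring function for this strategy; the weights and decay are illustrative, with the real formula belonging to ADR-007:

```ruby
# Score each working-memory entry; the lowest scores are evicted first.
def eviction_score(importance, last_access, now: Time.now,
                   importance_weight: 0.7, recency_weight: 0.3,
                   half_life: 3600.0)
  age     = now - last_access       # seconds since last access
  recency = 2.0**(-age / half_life) # halves every hour, decays toward 0
  importance_weight * importance + recency_weight * recency
end

# An important-but-old memory can outscore (and outlive) a recent-but-trivial one:
eviction_score(0.9, Time.now - 86_400) # ~0.63: high importance, a day old
eviction_score(0.1, Time.now)          # ~0.37: trivial, just touched
```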

Memory Lifecycle

```mermaid
stateDiagram-v2
    [*] --> Created: add_node()
    Created --> InWorkingMemory: Add to WM
    Created --> InLongTermMemory: Persist to LTM

    InWorkingMemory --> Evicted: Token limit reached
    Evicted --> InLongTermMemory: Mark as evicted

    InLongTermMemory --> Recalled: recall()
    Recalled --> InWorkingMemory: Add back to WM

    InWorkingMemory --> [*]: Process ends
    InLongTermMemory --> Forgotten: forget(confirm: :confirmed)
    Forgotten --> [*]: Permanently deleted

    note right of InWorkingMemory
        Fast O(1) access
        Token-limited
        Process-local
    end note

    note right of InLongTermMemory
        Durable PostgreSQL
        Unlimited capacity
        RAG retrieval
    end note
```

Technology Stack

| Layer | Technology | Purpose |
|---|---|---|
| Language | Ruby 3.2+ | Core implementation |
| Database | PostgreSQL 16+ | Relational storage |
| Time-Series | TimescaleDB | Hypertable partitioning, compression |
| Vector Search | pgvector | Semantic similarity (HNSW) |
| Full-Text | pg_trgm | Fuzzy text matching |
| Embeddings | Ollama/OpenAI | Vector generation |
| Connection Pool | connection_pool gem | Database connection management |
| Testing | Minitest | Test framework |

Performance Characteristics

Working Memory

  • Add: O(1) amortized (eviction is O(n log n) when needed)
  • Retrieve: O(1) hash lookup
  • Context Assembly: O(n log n) for sorting, O(k) for selecting
  • Typical Size: 50-200 nodes (~128K tokens)

Long-Term Memory

  • Add: O(log n) with PostgreSQL indexes
  • Vector Search: O(log n) with HNSW (approximate)
  • Full-Text Search: O(log n) with GIN indexes
  • Hybrid Search: O(log n) + merge
  • Typical Size: Thousands to millions of nodes

Overall System

  • Memory Addition: < 100ms (including embedding generation)
  • Recall Operation: < 200ms (typical hybrid search)
  • Context Assembly: < 10ms (working memory sort)
  • Eviction: < 10ms (rare, only when working memory full)

Scalability Considerations

Vertical Scaling

  • Working Memory: Limited by process RAM (~1-2GB for 128K tokens)
  • Database: PostgreSQL scales to TBs with proper indexing
  • Embeddings: Local models (Ollama) bounded by GPU/CPU

Horizontal Scaling

  • Multiple Robots: Each robot process has independent working memory
  • Database: Single shared PostgreSQL instance (can add replicas)
  • Read Replicas: For query scaling (future consideration)
  • Sharding: By robot_id or timeframe (future consideration)

Scaling Strategy

Start with single PostgreSQL instance. Add read replicas when query load increases. Consider partitioning by robot_id for multi-tenant scenarios.

Architecture Reviews

All architecture decisions are documented in ADRs and reviewed by domain experts:

  • Systems Architect: Overall system design and scalability
  • Database Architect: PostgreSQL schema and query optimization
  • AI Engineer: Embedding strategies and RAG implementation
  • Performance Specialist: Latency and throughput analysis
  • Ruby Expert: Idiomatic Ruby patterns and best practices
  • Security Specialist: Data privacy and access control

See Architecture Decision Records for complete review notes.