Propositions: Atomic Fact Extraction¶

Proposition extraction breaks complex text into atomic, self-contained factual statements. This improves RAG retrieval accuracy by storing granular facts that can be matched more precisely.

Overview¶

When proposition extraction is enabled, HTM:

Stores the original content as a node
Extracts atomic propositions from the content
Creates independent nodes for each proposition
Each proposition gets its own embedding and tags

What is a Proposition?¶

A proposition is an atomic factual statement that:

Expresses a single fact or claim
Is understandable without context
Uses full names, not pronouns
Includes relevant dates, times, and qualifiers
Contains one subject-predicate relationship

Example¶

Original text:

"In 1969, Neil Armstrong became the first person to walk on the Moon during Apollo 11."

Extracted propositions: - "Neil Armstrong was an astronaut." - "Neil Armstrong walked on the Moon in 1969." - "Neil Armstrong was the first person to walk on the Moon." - "Neil Armstrong walked on the Moon during the Apollo 11 mission." - "The Apollo 11 mission occurred in 1969."

Configuration¶

Enable Proposition Extraction¶

# Via configuration block
HTM.configure do |config|
  config.extract_propositions = true
  config.proposition_provider = :ollama  # or :openai, :anthropic, etc.
  config.proposition_model = 'gemma3:latest'
end

# Or via environment variable
# HTM_EXTRACT_PROPOSITIONS=true

Provider Options¶

Proposition extraction uses LLM chat completion. Configure your preferred provider:

Provider	Model Examples
`:ollama` (default)	`gemma3:latest`, `llama3`, `mistral`
`:openai`	`gpt-4o-mini`, `gpt-4o`
`:anthropic`	`claude-3-haiku-20240307`
`:gemini`	`gemini-1.5-flash`

HTM.configure do |config|
  config.extract_propositions = true

  # Use OpenAI for higher quality extraction
  config.proposition_provider = :openai
  config.proposition_model = 'gpt-4o-mini'
end

How It Works¶

Workflow¶

sequenceDiagram
    participant User
    participant HTM
    participant PropositionService
    participant LLM
    participant Database

    User->>HTM: remember("Complex text...")
    HTM->>Database: Save original node
    HTM->>PropositionService: Extract propositions
    PropositionService->>LLM: Parse into atomic facts
    LLM-->>PropositionService: ["Prop 1", "Prop 2", ...]
    loop For each proposition
        PropositionService->>Database: Create proposition node
    end
    HTM-->>User: node_id

Proposition Nodes¶

Proposition nodes are stored with special metadata:

{
  "is_proposition" => true,
  "source_node_id" => 123  # ID of the original node
}

This metadata allows you to: - Identify proposition nodes - Trace propositions back to source - Filter propositions in queries

Usage¶

Basic Usage¶

htm = HTM.new(robot_name: "Proposition Demo")

# With extraction enabled, this creates multiple nodes
node_id = htm.remember(
  "PostgreSQL 16 was released in September 2023 with improved query performance and new JSON features."
)

# The original node plus propositions are created:
# - "PostgreSQL 16 was released in September 2023."
# - "PostgreSQL 16 includes improved query performance."
# - "PostgreSQL 16 includes new JSON features."

Direct Extraction¶

You can extract propositions without storing them:

propositions = HTM.extract_propositions(
  "Ruby 3.3 introduced YJIT improvements and the Prism parser."
)
# => [
#   "Ruby 3.3 introduced YJIT improvements.",
#   "Ruby 3.3 introduced the Prism parser.",
#   "YJIT is a just-in-time compiler for Ruby.",
#   "Prism is a parser for Ruby."
# ]

# Manually store if needed
propositions.each { |p| htm.remember(p) }

Querying Propositions¶

# Find all proposition nodes
propositions = HTM::Models::Node.where("metadata->>'is_proposition' = ?", 'true')

# Find propositions from a specific source
source_node_id = 123
related = HTM::Models::Node.where(
  "metadata->>'source_node_id' = ?",
  source_node_id.to_s
)

# Include propositions in recall (default behavior)
results = htm.recall("PostgreSQL features", strategy: :hybrid)

Recursion Prevention¶

Proposition nodes do not trigger further proposition extraction. This prevents infinite recursion:

# Original node → triggers proposition extraction
htm.remember("Complex statement about many things.")

# Proposition nodes → do NOT trigger extraction
# (metadata.is_proposition = true prevents this)

Performance Considerations¶

Processing Time¶

Proposition extraction adds latency:

Provider	Typical Latency
Ollama (local)	1-3 seconds
OpenAI	0.5-1 second
Anthropic	0.5-1 second

Async Processing¶

With async job backend, extraction happens in background:

HTM.configure do |config|
  config.extract_propositions = true
  config.job.backend = :thread  # or :sidekiq
end

# Returns immediately, propositions created async
node_id = htm.remember("Complex content...")

Storage Impact¶

Proposition extraction increases storage:

Original node: 1 record
Propositions: 3-10 additional records (typical)
Each proposition gets its own embedding

Best Practices¶

When to Use Propositions¶

Good use cases: - Dense factual content (Wikipedia, documentation) - Complex statements with multiple facts - Content that will be queried for specific facts

Less suitable: - Simple, atomic statements - Conversational content - Content where context is critical

Quality Tuning¶

Use a capable model for better extraction:

# Higher quality (slower, costs more)
config.proposition_provider = :openai
config.proposition_model = 'gpt-4o'

# Balanced (faster, local)
config.proposition_provider = :ollama
config.proposition_model = 'gemma3:latest'

Selective Extraction¶

Enable/disable per operation if needed:

# Temporarily disable for specific content
original_setting = HTM.configuration.extract_propositions
HTM.configuration.extract_propositions = false
htm.remember("Simple fact that doesn't need decomposition.")
HTM.configuration.extract_propositions = original_setting

Rake Tasks¶

# Rebuild all propositions (clears and regenerates)
rake htm:db:rebuild:propositions

Adding Memories - Core memory operations
Search Strategies - Querying memories
Tags - Hierarchical tagging (propositions get tags too)
API Reference: PropositionService