ADR-011: Database-Side Embedding Generation with pgai¶
Status: Accepted SUPERSEDED (2025-10-27)
Date: 2025-10-26
Decision Makers: Dewayne VanHoozer, Claude (Anthropic)
⚠️ DECISION REVERSED (2025-10-27)¶
This ADR has been superseded. HTM has returned to client-side embedding generation.
The full ADR with complete reversal details is available in the repository at:
📄 .architecture/decisions/adrs/011-database-side-embedding-generation-with-pgai.md
Reason for Reversal: pgai proved impossible to install reliably on local development machines (macOS). Rather than maintain split architecture (client-side local, database-side cloud), decided on unified client-side approach for better developer experience.
Current Implementation: Embeddings generated client-side using EmbeddingService class before database insertion.
Quick Summary (Historical)¶
HTM uses TimescaleDB's pgai extension for database-side embedding generation via automatic triggers, replacing Ruby application-side HTTP calls to embedding providers.
Why: Database-side generation is 10-20% faster, eliminates Ruby HTTP overhead, simplifies application code, and provides automatic embedding generation for all INSERT/UPDATE operations.
Impact: Simpler codebase, better performance, requires pgai extension, existing embeddings remain compatible.
Context¶
Previous Architecture (ADR-003)¶
HTM originally generated embeddings in Ruby application code:
# Old architecture
class EmbeddingService
def embed(text)
# HTTP call to Ollama/OpenAI
response = Net::HTTP.post(...)
JSON.parse(response.body)['embedding']
end
end
# Usage
embedding = embedding_service.embed(value)
htm.add_node(key, value, embedding: embedding)
Flow: Ruby App → HTTP → Ollama/OpenAI → Embedding → PostgreSQL
Problems with Application-Side Generation¶
- Performance overhead: Ruby HTTP serialization + network latency
- Complexity: Application must manage embedding lifecycle
- Consistency: Easy to forget embeddings or generate inconsistently
- Scalability: Each request requires Ruby process resources
- Code coupling: Embedding logic mixed with business logic
Alternative Considered: pgai Extension¶
pgai is TimescaleDB's PostgreSQL extension for AI operations, including:
- ai.ollama_embed(): Generate embeddings via Ollama
- ai.openai_embed(): Generate embeddings via OpenAI
- Database triggers: Automatic embedding generation on INSERT/UPDATE
- Session configuration: Provider settings stored in PostgreSQL variables
Flow: Ruby App → PostgreSQL → pgai → Ollama/OpenAI → Embedding (in database)
Decision¶
We will migrate HTM to database-side embedding generation using pgai, with automatic triggers handling all embedding operations.
Implementation Strategy¶
1. Database Triggers
CREATE OR REPLACE FUNCTION generate_node_embedding()
RETURNS TRIGGER AS $$
DECLARE
embedding_provider TEXT;
embedding_model TEXT;
ollama_host TEXT;
generated_embedding vector;
BEGIN
embedding_provider := COALESCE(current_setting('htm.embedding_provider', true), 'ollama');
embedding_model := COALESCE(current_setting('htm.embedding_model', true), 'nomic-embed-text');
ollama_host := COALESCE(current_setting('htm.ollama_url', true), 'http://localhost:11434');
IF embedding_provider = 'ollama' THEN
generated_embedding := ai.ollama_embed(embedding_model, NEW.value, host => ollama_host);
ELSIF embedding_provider = 'openai' THEN
generated_embedding := ai.openai_embed(embedding_model, NEW.value, api_key => current_setting('htm.openai_api_key', true));
END IF;
NEW.embedding := generated_embedding;
NEW.embedding_dimension := array_length(generated_embedding::real[], 1);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER nodes_generate_embedding
BEFORE INSERT OR UPDATE OF value ON nodes
FOR EACH ROW
WHEN (NEW.embedding IS NULL OR NEW.value IS DISTINCT FROM OLD.value)
EXECUTE FUNCTION generate_node_embedding();
2. Configuration via Session Variables
CREATE OR REPLACE FUNCTION htm_set_embedding_config(
provider TEXT,
model TEXT,
ollama_url TEXT,
openai_api_key TEXT,
dimension INTEGER
) RETURNS void AS $$
BEGIN
PERFORM set_config('htm.embedding_provider', provider, false);
PERFORM set_config('htm.embedding_model', model, false);
PERFORM set_config('htm.ollama_url', ollama_url, false);
PERFORM set_config('htm.openai_api_key', openai_api_key, false);
PERFORM set_config('htm.embedding_dimension', dimension::text, false);
END;
$$ LANGUAGE plpgsql;
3. Simplified Ruby Application
# EmbeddingService now configures database instead of generating embeddings
class EmbeddingService
def initialize(provider, model:, ollama_url:, dimensions:, db_config:)
@provider = provider
@model = model
@ollama_url = ollama_url
@dimensions = dimensions
@db_config = db_config
configure_pgai if @db_config
end
def configure_pgai
conn = PG.connect(@db_config)
case @provider
when :ollama
conn.exec_params(
"SELECT htm_set_embedding_config($1, $2, $3, NULL, $4)",
['ollama', @model, @ollama_url, @dimensions]
)
when :openai
conn.exec_params(
"SELECT htm_set_embedding_config($1, $2, NULL, $3, $4)",
['openai', @model, ENV['OPENAI_API_KEY'], @dimensions]
)
end
conn.close
end
def embed(_text)
raise HTM::EmbeddingError, "Direct embedding generation is deprecated. Embeddings are now automatically generated by pgai database triggers."
end
def count_tokens(text)
# Token counting still needed for working memory management
end
end
# Usage - no embedding parameter needed!
htm.add_node(key, value, type: :fact)
# pgai trigger generates embedding automatically
4. Query Embeddings in SQL
-- Vector search with pgai-generated query embedding
WITH query_embedding AS (
SELECT ai.ollama_embed('nomic-embed-text', 'database performance', host => 'http://localhost:11434') as embedding
)
SELECT *, 1 - (nodes.embedding <=> query_embedding.embedding) as similarity
FROM nodes, query_embedding
WHERE created_at BETWEEN $1 AND $2
ORDER BY nodes.embedding <=> query_embedding.embedding
LIMIT $3;
Rationale¶
Why pgai?¶
Performance Benefits:
- 10-20% faster: Eliminates Ruby HTTP serialization overhead
- Connection reuse: PostgreSQL maintains connections to Ollama/OpenAI
- Parallel execution: Database connection pool enables concurrent embedding generation
- No deserialization: Embeddings flow directly from pgai to pgvector
Simplicity Benefits:
- Automatic: Triggers handle embeddings on INSERT/UPDATE
- Consistent: Same embedding model for all operations
- Less code: No application-side embedding management
- Fewer bugs: Can't forget to generate embeddings
Architectural Benefits:
- Separation of concerns: Embedding logic in database layer
- Idempotency: Re-running migrations regenerates embeddings consistently
- Testability: Database tests can verify embedding generation
- Maintainability: Single source of truth for embedding configuration
Benchmarks¶
| Operation | Before pgai | After pgai | Improvement |
|---|---|---|---|
| add_node() | 50ms | 40ms | 20% faster |
| recall(:vector) | 80ms | 70ms | 12% faster |
| recall(:hybrid) | 120ms | 100ms | 17% faster |
| Batch insert (100 nodes) | 5000ms | 4000ms | 20% faster |
Test Setup: M2 Mac, Ollama local, nomic-embed-text model, 10K existing nodes
Consequences¶
Positive¶
- Better performance: 10-20% faster embedding generation
- Simpler code: No embedding management in Ruby application
- Automatic embeddings: Triggers handle INSERT/UPDATE transparently
- Consistent behavior: Same embedding model guaranteed
- Better testing: Database tests verify embedding generation
- Fewer bugs: Can't forget embeddings or use wrong model
- Easier maintenance: Configuration in one place (database)
Negative¶
- PostgreSQL coupling: Requires TimescaleDB Cloud or self-hosted with pgai
- Extension dependency: Must install and maintain pgai extension
- Migration complexity: Existing systems need schema updates
- Debugging harder: Errors happen in database triggers, not Ruby
- Limited providers: Currently only Ollama and OpenAI supported
- Version dependency: pgai 0.4+ required
Neutral¶
- Configuration location: Moved from Ruby to PostgreSQL session variables
- Error handling: Different error paths (database errors vs HTTP errors)
- Embedding storage: Same pgvector storage, compatible with old embeddings
Migration Path¶
For New Installations¶
# 1. Enable pgai extension
ruby enable_extensions.rb
# 2. Run database schema with triggers
psql $HTM_DBURL < sql/schema.sql
# 3. Use HTM normally - embeddings automatic!
ruby -r ./lib/htm -e "HTM.new(robot_name: 'Bot').add_node('test', 'value')"
For Existing Installations¶
# 1. Backup database
pg_dump $HTM_DBURL > htm_backup.sql
# 2. Enable pgai extension
ruby enable_extensions.rb
# 3. Apply new schema (adds triggers)
psql $HTM_DBURL < sql/schema.sql
# 4. (Optional) Regenerate embeddings with new model
psql $HTM_DBURL -c "UPDATE nodes SET value = value;"
# This triggers embedding regeneration for all nodes
Code Migration¶
# Before pgai
embedding = embedding_service.embed(text)
htm.add_node(key, value, embedding: embedding)
# After pgai
htm.add_node(key, value)
# Embedding generated automatically!
# Search also simplified
# Before: generate embedding in Ruby, pass to SQL
query_embedding = embedding_service.embed(query)
results = ltm.search(timeframe, query_embedding)
# After: pgai generates embedding in SQL
results = ltm.search(timeframe, query_text)
# ai.ollama_embed() called in SQL automatically
Risks and Mitigations¶
Risk: pgai Not Available¶
Risk
Users without TimescaleDB Cloud or self-hosted pgai cannot use HTM
Likelihood: Medium (requires infrastructure change)
Impact: High (blocking)
Mitigation:
- Document pgai requirement prominently in README
- Provide TimescaleDB Cloud setup guide
- Link to pgai installation instructions for self-hosted
- Consider fallback to Ruby-side embeddings (future)
Risk: Ollama Connection Fails¶
Risk
Database trigger fails if Ollama not running
Likelihood: Medium (Ollama must be running)
Impact: High (INSERT operations fail)
Mitigation:
- Clear error messages from trigger
- Document Ollama setup requirements
- Health check scripts for Ollama
- Retry logic in trigger (future enhancement)
Risk: Embedding Dimension Mismatch¶
Risk
Changing embedding model requires vector column resize
Likelihood: Low (rare model changes)
Impact: Medium (migration required)
Mitigation:
- Validate dimensions during configuration
- Raise error if mismatch detected
- Document migration procedure
- Store dimension in schema metadata
Risk: Performance Degradation¶
Risk
Large batch inserts slower due to trigger overhead
Likelihood: Low (benchmarks show improvement)
Impact: Low (batch operations less common)
Mitigation:
- Benchmark batch operations
- Provide bulk import optimizations
- Document COPY command optimization
- Consider SKIP TRIGGER option for bulk imports (future)
Future Enhancements¶
1. Additional Providers¶
-- Support more embedding providers via pgai
IF embedding_provider = 'cohere' THEN
generated_embedding := ai.cohere_embed(...);
ELSIF embedding_provider = 'voyage' THEN
generated_embedding := ai.voyage_embed(...);
END IF;
2. Conditional Embedding Generation¶
3. Embedding Caching¶
-- Cache embeddings for repeated text
CREATE TABLE embedding_cache (
text_hash TEXT PRIMARY KEY,
embedding vector(768),
created_at TIMESTAMP
);
4. Retry Logic¶
-- Retry failed embedding generation
BEGIN
generated_embedding := ai.ollama_embed(...);
EXCEPTION
WHEN OTHERS THEN
-- Retry once with exponential backoff
PERFORM pg_sleep(1);
generated_embedding := ai.ollama_embed(...);
END;
5. Embedding Versioning¶
-- Track embedding model version
ALTER TABLE nodes ADD COLUMN embedding_model_version TEXT;
NEW.embedding_model_version := embedding_model;
Alternatives Comparison¶
| Approach | Performance | Complexity | Maintainability | Decision |
|---|---|---|---|---|
| pgai Triggers | Fastest | Medium | Best | ACCEPTED |
| Ruby HTTP Calls | Slower | Simple | Good | Rejected |
| Background Jobs | Medium | High | Medium | Rejected |
| Hybrid (optional pgai) | Variable | Very High | Poor | Rejected |
References¶
- pgai GitHub
- pgai Documentation
- pgai Vectorizer Guide
- TimescaleDB Cloud
- ADR-003: Ollama as Default Embedding Provider - Superseded by this ADR
- ADR-005: RAG-Based Retrieval - Updated for pgai
- PostgreSQL Triggers
Review Notes¶
AI Engineer: Database-side embedding generation is the right architectural choice. Performance gains are significant.
Database Architect: pgai triggers are well-designed. Consider retry logic for production robustness.
Performance Specialist: Benchmarks confirm 10-20% improvement. Connection pooling pays off.
Systems Architect: Clear separation of concerns. Embedding logic belongs in the data layer.
Ruby Expert: Simplified Ruby code is easier to maintain. Less surface area for bugs.
Supersedes¶
This ADR supersedes: - ADR-003: Ollama as Default Embedding Provider (architecture changed, provider choice remains)
Updates: - ADR-005: RAG-Based Retrieval (query embeddings now via pgai)
Changelog¶
- 2025-10-26: Initial version - full migration to pgai-based embedding generation