Performance Tuning¶
Optimization Strategies and Monitoring¶
Ragdoll is designed for high-performance RAG operations with PostgreSQL and pgvector at its foundation. This guide covers comprehensive optimization strategies for production deployments.
Key Performance Areas¶
- Database optimization: PostgreSQL + pgvector tuning
- Vector search performance: Index strategies and similarity search optimization
- Memory management: Embedding caching and object lifecycle
- Background job performance: ActiveJob and queue optimization
- Application-level optimization: Service layer and API response tuning
Database Performance¶
PostgreSQL + pgvector Optimization¶
Vector Index Configuration¶
```sql
-- IVFFlat index for embeddings (default in Ragdoll)
CREATE INDEX CONCURRENTLY ragdoll_embeddings_vector_idx
  ON ragdoll_embeddings
  USING ivfflat (embedding_vector vector_cosine_ops)
  WITH (lists = 100);

-- HNSW index for better query performance (requires pgvector 0.5.0+)
CREATE INDEX CONCURRENTLY ragdoll_embeddings_hnsw_idx
  ON ragdoll_embeddings
  USING hnsw (embedding_vector vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
```
Connection Pool Configuration¶
```ruby
# config/database.yml or Ragdoll configuration
Ragdoll::Core.configure do |config|
  config.database_config = {
    adapter: 'postgresql',
    database: 'ragdoll_production',
    username: 'ragdoll',
    password: ENV['DATABASE_PASSWORD'],
    host: 'localhost',
    port: 5432,
    pool: 20,              # Connection pool size
    timeout: 5000,         # Connection timeout in ms
    checkout_timeout: 5,   # Pool checkout timeout in seconds
    reaping_frequency: 10  # Dead connection reaping interval in seconds
  }
end
```
Query Optimization¶
```ruby
# Optimize embedding searches with proper scoping
results = Ragdoll::Embedding
  .includes(:embeddable)  # Eager load to avoid N+1 queries
  .nearest_neighbors(:embedding_vector, query_vector, distance: "cosine")
  .limit(50)

# neighbor_distance is a SELECT alias, so it cannot appear in a WHERE
# clause; apply the threshold to the loaded records instead
results = results.select { |e| e.neighbor_distance < 0.3 }
```
Memory Allocation Tuning¶
```ini
# PostgreSQL configuration (postgresql.conf)
shared_buffers = 256MB        # 25% of RAM for small systems
effective_cache_size = 1GB    # Planner's estimate of memory available for caching
work_mem = 4MB                # Per-operation memory
maintenance_work_mem = 64MB   # Maintenance operations (index builds, VACUUM)
max_connections = 100         # Concurrent connections

# pgvector specific (can also be set per session with SET)
hnsw.ef_search = 40           # HNSW search candidate list size
```
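Because `hnsw.ef_search` is a session-level setting, it can also be raised for an individual session when recall matters more than latency. A sketch (the table and column names match the index examples above; the vector literal is a placeholder):

```sql
-- Raise the HNSW candidate list size for this session only
SET hnsw.ef_search = 100;

-- Cosine-distance nearest-neighbor query via pgvector's <=> operator;
-- ordering on the distance expression lets the HNSW index be used
SELECT id
FROM ragdoll_embeddings
ORDER BY embedding_vector <=> '[0.1, 0.2, 0.3]'::vector
LIMIT 10;
```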
Vector Search Optimization¶
Index Strategy Selection¶
```mermaid
flowchart TD
    A[Vector Search Requirements] --> B{Dataset Size}
    B -->|< 100K vectors| C[IVFFlat Index]
    B -->|> 100K vectors| D[HNSW Index]
    C --> E[Good recall/performance balance]
    D --> F[Better performance, higher memory]
    E --> G["lists = sqrt(rows)"]
    F --> H["m=16, ef_construction=64"]
    G --> I[Moderate build time]
    H --> J[Longer build time]
```
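The `lists = sqrt(rows)` rule of thumb above can be turned into a small helper when sizing an IVFFlat index. A minimal sketch (`recommended_ivfflat_lists` is a hypothetical helper, not part of Ragdoll's API):

```ruby
# Hypothetical helper: derive an IVFFlat `lists` value from a table's
# row count using the sqrt(rows) rule of thumb, with a small floor so
# tiny tables still get a usable number of lists.
def recommended_ivfflat_lists(row_count)
  return 1 if row_count <= 0
  [Math.sqrt(row_count).round, 10].max
end

recommended_ivfflat_lists(10_000)    # => 100
recommended_ivfflat_lists(1_000_000) # => 1000
```

Rebuilding the index with an updated `lists` value is worthwhile after large ingests, since the parameter is fixed at index creation time.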
Similarity Threshold Tuning¶
```ruby
# Performance vs accuracy trade-offs
class SearchEngine
  PERFORMANCE_THRESHOLDS = {
    high_precision: 0.85,  # Fewer, more relevant results
    balanced: 0.70,        # Default Ragdoll setting
    high_recall: 0.50      # More results, lower precision
  }.freeze

  def search_optimized(query, mode: :balanced)
    threshold = PERFORMANCE_THRESHOLDS[mode]
    search_similar_content(query, threshold: threshold)
  end
end
```
Batch Processing Optimization¶
```ruby
# Efficient batch embedding generation
class EmbeddingService
  def generate_embeddings_batch_optimized(texts, batch_size: 50)
    texts.each_slice(batch_size).flat_map do |batch|
      generate_embeddings_batch(batch)
    end
  end
end
```
Memory Management¶
Embedding Cache Strategies¶
```ruby
require 'digest'

# In-memory LRU embedding cache
class EmbeddingCache
  def initialize(max_size: 1000)
    @cache = {}
    @max_size = max_size
    @access_times = {}
  end

  def get_or_generate(text)
    key = Digest::SHA256.hexdigest(text)

    if @cache.key?(key)
      @access_times[key] = Time.current
      return @cache[key]
    end

    embedding = generate_embedding(text)
    store_with_eviction(key, embedding)
    embedding
  end

  private

  def store_with_eviction(key, embedding)
    if @cache.size >= @max_size
      # Evict the least recently used entry
      lru_key = @access_times.min_by { |_, time| time }&.first
      @cache.delete(lru_key)
      @access_times.delete(lru_key)
    end

    @cache[key] = embedding
    @access_times[key] = Time.current
  end
end
```
Object Lifecycle Management¶
```ruby
# Efficient document processing
class DocumentProcessor
  def self.process_large_document(file_path)
    File.open(file_path, 'r') do |file|
      file.each_line
          .each_slice(1000)  # Process in 1,000-line chunks
          .each_with_index do |chunk, index|
        process_chunk(chunk.join)
        GC.start if (index + 1) % 10 == 0  # Periodic GC every 10 chunks
      end
    end
  end
end
```
Memory Leak Prevention¶
```ruby
# Monitor memory usage in background jobs
class GenerateEmbeddings < ActiveJob::Base
  def perform(document_id)
    memory_before = get_memory_usage
    process_document(document_id)
    memory_after = get_memory_usage

    if (memory_after - memory_before) > 100.megabytes
      delta_mb = (memory_after - memory_before) / 1.megabyte
      Rails.logger.warn "High memory usage detected: #{delta_mb} MB"
    end
  end

  private

  # Resident set size in bytes (ps reports RSS in kilobytes)
  def get_memory_usage
    `ps -o rss= -p #{Process.pid}`.to_i.kilobytes
  end
end
```
Background Job Performance¶
Queue Configuration¶
```ruby
# Sidekiq configuration for production
Ragdoll::Core.configure do |config|
  # Use Sidekiq for background processing
  ActiveJob::Base.queue_adapter = :sidekiq
end
```

```yaml
# config/sidekiq.yml
:concurrency: 10
:queues:
  - [critical, 4]
  - [embeddings, 3]
  - [default, 2]
  - [low, 1]
```
Worker Scaling Strategies¶
```ruby
# Dynamic worker scaling based on queue size
class WorkerScaler
  def self.scale_workers
    queue_size = Sidekiq::Queue.new('embeddings').size

    target_workers =
      case queue_size
      when 0..10   then 2
      when 11..50  then 5
      when 51..100 then 10
      else              15
      end

    # Delegate to your process manager (e.g. systemd, Kubernetes HPA)
    adjust_worker_count(target_workers)
  end
end
```
Batch Processing Techniques¶
```ruby
# Efficient batch job processing
class BatchEmbeddingJob < ActiveJob::Base
  def perform(document_ids)
    document_ids.each_slice(10) do |batch|
      process_batch(batch)
      sleep(0.1)  # Yield control to other jobs between batches
    end
  end

  private

  def process_batch(document_ids)
    documents = Ragdoll::Document.where(id: document_ids).includes(:contents)

    # Batch generate embeddings
    contents = documents.flat_map(&:contents)
    texts = contents.map(&:content)
    embeddings = EmbeddingService.new.generate_embeddings_batch(texts)

    # Bulk insert embeddings; insert_all takes raw column values, so the
    # polymorphic association must be spelled out as type/id columns
    embedding_data = contents.zip(embeddings).map do |content, embedding|
      {
        embeddable_type: content.class.name,
        embeddable_id: content.id,
        embedding_vector: embedding,
        content: content.content,
        chunk_index: 0,
        created_at: Time.current,
        updated_at: Time.current
      }
    end

    Ragdoll::Embedding.insert_all(embedding_data)
  end
end
```
Application Performance¶
Service Layer Optimization¶
```ruby
# Optimized search engine with query-level caching
class SearchEngine
  def initialize(embedding_service)
    @embedding_service = embedding_service
    # LRUCache is a stand-in for any LRU cache implementation,
    # such as the EmbeddingCache pattern shown earlier
    @query_cache = LRUCache.new(100)
  end

  def search_similar_content(query, **options)
    cache_key = "#{query}_#{options.hash}"
    @query_cache.fetch(cache_key) do
      perform_search(query, **options)
    end
  end

  private

  def perform_search(query, **options)
    # Cached embedding generation
    query_embedding = @embedding_service.generate_embedding_cached(query)

    # Optimized database query
    Ragdoll::Embedding.search_similar(
      query_embedding,
      limit: options[:limit] || 20,
      threshold: options[:threshold] || 0.7
    )
  end
end
```
API Response Optimization¶
```ruby
# Streaming responses for large result sets
class Client
  def search_stream(query:, **options)
    Enumerator.new do |yielder|
      results = search_similar_content(query: query, **options)
      results.find_each(batch_size: 100) do |result|
        yielder << format_result(result)
      end
    end
  end
end
```
Monitoring and Profiling¶
Key Metrics to Track¶
```ruby
# Custom metrics collection
class PerformanceMonitor
  METRICS = {
    embedding_generation_time: 'histogram',
    search_query_time: 'histogram',
    database_connection_pool: 'gauge',
    memory_usage: 'gauge',
    background_job_queue_size: 'gauge'
  }.freeze

  def self.track_embedding_generation(text_length)
    start_time = Time.current
    yield
    duration = Time.current - start_time

    # Log metrics (integrate with your monitoring system)
    Rails.logger.info({
      metric: 'embedding_generation',
      duration_ms: (duration * 1000).round(2),
      text_length: text_length,
      timestamp: Time.current.iso8601
    }.to_json)
  end
end
```
Performance Benchmarking¶
```ruby
# Benchmark suite for performance regression testing
class PerformanceBenchmark
  def self.run_search_benchmark
    queries = [
      "machine learning algorithms",
      "natural language processing",
      "computer vision techniques"
    ]

    client = Ragdoll::Core.client

    queries.each do |query|
      iterations = 100
      total_time = 0

      iterations.times do
        start_time = Time.current
        client.search(query: query)
        total_time += Time.current - start_time
      end

      avg_time = (total_time / iterations * 1000).round(2)
      puts "Query: #{query} - Average time: #{avg_time}ms"
    end
  end
end
```
Scaling Strategies¶
Horizontal Scaling Architecture¶
```mermaid
flowchart TB
    A[Load Balancer] --> B[App Server 1]
    A --> C[App Server 2]
    A --> D[App Server N]
    B --> E[PostgreSQL Primary]
    C --> E
    D --> E
    E --> F[PostgreSQL Read Replica 1]
    E --> G[PostgreSQL Read Replica 2]
    B --> H[Redis / Sidekiq]
    C --> H
    D --> H
    H --> I[Sidekiq Worker Pool]
```
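Routing search traffic to the read replicas in this topology is typically handled with Rails' multiple-database support. A minimal sketch, assuming database configuration names `:primary` and `:primary_replica` (illustrative, not Ragdoll defaults):

```ruby
# app/models/application_record.rb -- declare writer and replica roles
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  connects_to database: { writing: :primary, reading: :primary_replica }
end

# Wrap read-heavy search paths in the reading role so similarity
# queries hit a replica instead of the primary
ActiveRecord::Base.connected_to(role: :reading) do
  Ragdoll::Embedding.search_similar(query_embedding, limit: 20)
end
```

Replica reads tolerate slight replication lag, which is usually acceptable for similarity search but not for read-after-write flows such as verifying a just-ingested document.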
Database Sharding Considerations¶
```ruby
# Document-based sharding strategy
class ShardedDocument < Document
  SHARD_COUNT = 4

  def self.shard_key(document_id)
    Digest::SHA1.hexdigest(document_id.to_s)[0..7].to_i(16) % SHARD_COUNT
  end

  def self.find_sharded(document_id)
    shard = :"shard_#{shard_key(document_id)}"
    # Rails 6.1+ horizontal sharding API
    connected_to(shard: shard) { find(document_id) }
  end
end
```
Load Testing Strategy¶
```ruby
# Load testing with realistic usage patterns
class LoadTest
  def self.simulate_concurrent_searches(concurrent_users: 50)
    threads = []

    concurrent_users.times do |i|
      threads << Thread.new do
        client = Ragdoll::Core.client

        10.times do
          query = generate_random_query  # Supply your own query generator
          start_time = Time.current

          begin
            client.search(query: query)
            duration = Time.current - start_time
            puts "Thread #{i}: #{query} - #{(duration * 1000).round(2)}ms"
          rescue => e
            puts "Thread #{i}: Error - #{e.message}"
          end

          sleep(rand(0.5..2.0))  # Realistic user pacing
        end
      end
    end

    threads.each(&:join)
  end
end
```
Production Optimization Checklist¶
- Configure PostgreSQL connection pooling (20+ connections)
- Set up pgvector indexes (IVFFlat or HNSW)
- Implement embedding caching strategy
- Configure background job queues with priorities
- Set up performance monitoring and alerting
- Implement database read replicas for search queries
- Configure memory limits and garbage collection
- Set up load testing and performance benchmarks
- Implement query result caching
- Configure log aggregation and analysis
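The query result caching item in the checklist can be satisfied by a thin TTL cache in front of the search call. A minimal sketch in plain Ruby (`SearchCache` is a hypothetical name, not a Ragdoll class; not thread-safe as written):

```ruby
# Illustrative TTL cache for search results, keyed by query text
class SearchCache
  def initialize(ttl: 300)
    @ttl = ttl      # Seconds before a cached entry goes stale
    @store = {}     # query key => [expires_at, value]
  end

  # Return the cached value for key, or run the block and cache its result
  def fetch(key)
    expires_at, value = @store[key]
    return value if expires_at && Time.now < expires_at

    value = yield
    @store[key] = [Time.now + @ttl, value]
    value
  end
end
```

In a Rails deployment the same pattern is available through `Rails.cache.fetch(key, expires_in: 5.minutes)`, which adds thread safety and shared storage across app servers.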
This document is part of the Ragdoll documentation suite. For immediate help, see the Quick Start Guide or API Reference.