Monitoring & Analytics
Ragdoll provides comprehensive monitoring and analytics capabilities built directly into the PostgreSQL database schema. The system tracks usage patterns, performance metrics, and system health through ActiveRecord models and native PostgreSQL features.
Usage Tracking and System Health
The monitoring system is built around PostgreSQL's native capabilities and pgvector optimization, providing real-time insights into system performance, usage patterns, and content analytics. All monitoring data is stored in the same database as your content, ensuring consistency and reducing infrastructure complexity.
Usage Analytics
Ragdoll automatically tracks usage patterns through the embedding model's built-in analytics fields. This data drives both performance optimization and business intelligence.
Search Analytics
Every search operation is tracked through the usage_count and returned_at fields in the embeddings table:
# Get search frequency data
freq_data = Ragdoll::Embedding
.frequently_used
.group(:embeddable_id)
.sum(:usage_count)
# Popular content identification
popular_embeddings = Ragdoll::Embedding
.where('usage_count > ?', 10)
.joins(:embeddable)
.includes(embeddable: :document)
.order(usage_count: :desc)
# Recent search patterns (group_by_day is provided by the groupdate gem, not by ActiveRecord itself)
recent_activity = Ragdoll::Embedding
.where('returned_at > ?', 7.days.ago)
.group_by_day(:returned_at)
.count
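# Query performance metrics (note: avg_results_per_query is the mean usage_count of popular_embeddings, not a per-query result count)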
search_performance = {
avg_results_per_query: popular_embeddings.average(:usage_count),
total_searches: Ragdoll::Embedding.sum(:usage_count),
unique_content_accessed: Ragdoll::Embedding.where('usage_count > 0').count
}
Embedding Analytics
Track embedding model performance and usage patterns:
# Embedding usage by model type
usage_by_model = Ragdoll::Embedding
.joins(:embeddable)
.group('ragdoll_contents.embedding_model')
.count
# Usage-based proxy for embedding quality (usage_count buckets, not similarity scores)
similarity_stats = {
high_quality: Ragdoll::Embedding.where('usage_count > 5').count,
medium_quality: Ragdoll::Embedding.where('usage_count BETWEEN 1 AND 5').count,
unused: Ragdoll::Embedding.where(usage_count: 0).count
}
# Cache effectiveness (usage-based ranking)
cache_metrics = {
frequently_accessed: Ragdoll::Embedding
.where('returned_at > ?', 24.hours.ago)
.average(:usage_count),
recency_distribution: Ragdoll::Embedding
.where('returned_at IS NOT NULL')
.group_by_day(:returned_at, last: 30)
.count
}
Document Analytics
Monitor document processing and access patterns:
# Document processing success rates
processing_stats = Ragdoll::Document.group(:status).count
# => {"processed"=>45, "pending"=>3, "error"=>2}
# Content type distribution
content_distribution = Ragdoll::Document.group(:document_type).count
# => {"text"=>25, "pdf"=>15, "image"=>8, "audio"=>2}
# Comprehensive document statistics
doc_stats = Ragdoll::Document.stats
# Returns detailed hash with processing metrics, content counts, etc.
# Storage utilization by content type
storage_metrics = {
text_content_count: Ragdoll::TextContent.count,
image_content_count: Ragdoll::ImageContent.count,
audio_content_count: Ragdoll::AudioContent.count,
total_embeddings: Ragdoll::Embedding.count,
# AVG(COUNT(*)) is a nested aggregate, which PostgreSQL rejects, so divide the totals instead
avg_embeddings_per_document: (Ragdoll::Embedding.count.to_f / [Ragdoll::Document.count, 1].max).round(2)
}
System Health Monitoring
Monitor system health through PostgreSQL native features and ActiveRecord connection management.
Database Health
Utilize PostgreSQL's built-in statistics and monitoring capabilities:
# Connection pool status
pool_status = ActiveRecord::Base.connection_pool.stat
# => {size: 20, connections: 5, busy: 3, dead: 0, idle: 2, waiting: 0, checkout_timeout: 5.0}
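# Query performance metrics using the pg_stat_statements extension (must be enabled on the server)
# Column names below are for PostgreSQL 13+; on 12 and earlier use total_time and mean_time
ActiveRecord::Base.connection.execute("
SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
WHERE query LIKE '%ragdoll%'
ORDER BY mean_exec_time DESC;
")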
# Index usage statistics (pg_stat_user_indexes names its columns relname and indexrelname)
index_stats = ActiveRecord::Base.connection.execute("
SELECT schemaname, relname AS tablename, indexrelname AS indexname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
AND relname LIKE 'ragdoll_%';
")
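# Storage utilization (tablename already carries the ragdoll_ prefix, so it must not be prepended again)
table_sizes = ActiveRecord::Base.connection.execute("
SELECT
tablename,
pg_size_pretty(pg_total_relation_size((quote_ident(schemaname) || '.' || quote_ident(tablename))::regclass)) as size
FROM pg_tables
WHERE tablename LIKE 'ragdoll_%';
")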
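The pg_stat_statements timing columns were renamed in PostgreSQL 13 (total_time became total_exec_time and mean_time became mean_exec_time), so a query written for one side of that boundary fails on the other. The sketch below picks the column names from the server version. The helper names `stat_statement_columns` and `slow_query_sql` are hypothetical and not part of Ragdoll, and the version integer is assumed to be the value reported by `SHOW server_version_num`.

```ruby
# Hypothetical helpers (not part of Ragdoll): build a pg_stat_statements query
# that works on both sides of the PostgreSQL 13 column rename.

# server_version_num is the integer reported by `SHOW server_version_num`,
# for example 120011 for 12.11 or 150004 for 15.4.
def stat_statement_columns(server_version_num)
  if server_version_num >= 130_000
    { total: "total_exec_time", mean: "mean_exec_time" }
  else
    { total: "total_time", mean: "mean_time" }
  end
end

def slow_query_sql(server_version_num, limit: 20)
  cols = stat_statement_columns(server_version_num)
  <<~SQL
    SELECT query, calls, #{cols[:total]} AS total_ms, #{cols[:mean]} AS mean_ms, rows
    FROM pg_stat_statements
    WHERE query LIKE '%ragdoll%'
    ORDER BY #{cols[:mean]} DESC
    LIMIT #{Integer(limit)};
  SQL
end

puts slow_query_sql(150_004)
```

The generated string can be passed to `ActiveRecord::Base.connection.execute` in the same way as the queries above.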
Background Job Health
Monitor ActiveJob performance through document status tracking:
# Job success/failure tracking through document status
job_health = {
successful_processing: Ragdoll::Document.where(status: 'processed').count,
failed_processing: Ragdoll::Document.where(status: 'error').count,
pending_jobs: Ragdoll::Document.where(status: 'pending').count,
currently_processing: Ragdoll::Document.where(status: 'processing').count
}
# Processing time analysis
recent_docs = Ragdoll::Document
.where('created_at > ?', 24.hours.ago)
.where(status: 'processed')
processing_times = recent_docs.map do |doc|
(doc.updated_at - doc.created_at).to_i
end
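# Guard the empty cases: integer division by zero raises ZeroDivisionError, and 0.0 / 0 yields NaN
finished_jobs = job_health[:successful_processing] + job_health[:failed_processing]
performance_metrics = {
avg_processing_time: processing_times.empty? ? nil : processing_times.sum / processing_times.length,
median_processing_time: processing_times.sort[processing_times.length / 2],
max_processing_time: processing_times.max,
success_rate: finished_jobs.zero? ? nil : (job_health[:successful_processing].to_f / finished_jobs * 100).round(2)
}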
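The processing-time statistics above can be factored into a small pure function, which makes the empty-window case and the even-length median explicit. This is a minimal sketch; `summarize_durations` is a hypothetical helper, not a Ragdoll method, and it expects an array of durations in seconds such as `processing_times`.

```ruby
# Hypothetical helper (not part of Ragdoll): summarise processing durations
# given in seconds, returning nil statistics when there is no data.
def summarize_durations(seconds)
  return { count: 0, average: nil, median: nil, max: nil } if seconds.empty?

  sorted = seconds.sort
  mid = sorted.length / 2
  # For an even number of samples, average the two middle values
  median = sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0

  {
    count: sorted.length,
    average: (sorted.sum.to_f / sorted.length).round(2),
    median: median,
    max: sorted.last
  }
end

p summarize_durations([12, 30, 45, 600])
p summarize_durations([])
```

A single slow document (600 seconds here) pulls the average far above the median, which is why reporting both is useful.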
Memory and CPU Monitoring
Integrate with system monitoring tools and PostgreSQL process information:
# PostgreSQL process monitoring
process_info = ActiveRecord::Base.connection.execute("
SELECT
pid,
usename,
application_name,
state,
query_start,
query
FROM pg_stat_activity
WHERE application_name LIKE '%ragdoll%';
")
# Memory usage through Ruby process monitoring
memory_usage = {
process_memory: `ps -o rss= -p #{Process.pid}`.to_i * 1024, # bytes
gc_stats: GC.stat,
object_space: ObjectSpace.count_objects
}
# Connection monitoring (ConnectionPool#stat reports :busy and :idle; it has no :checked_out key)
pool_stat = ActiveRecord::Base.connection_pool.stat
connection_health = {
active_connections: pool_stat[:connections],
checked_out: pool_stat[:busy],
available: pool_stat[:size] - pool_stat[:busy]
}
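Pool utilisation is easier to alert on as a percentage. The sketch below derives it from a hash shaped like the one returned by `ActiveRecord::Base.connection_pool.stat`; `pool_utilization` is a hypothetical helper name, and the sample hash is made up for illustration.

```ruby
# Hypothetical helper (not part of Ragdoll): percentage of the pool that is
# currently in use, from a hash shaped like ConnectionPool#stat.
def pool_utilization(stat)
  size = stat.fetch(:size)
  return 0.0 if size.zero?

  (stat.fetch(:busy).to_f / size * 100).round(2)
end

sample_stat = { size: 20, connections: 5, busy: 3, idle: 2, waiting: 0 }
p pool_utilization(sample_stat)
```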
Metrics Collection
Ragdoll provides built-in metrics collection through ActiveRecord queries and PostgreSQL native features.
Built-in Metrics Endpoints
Create monitoring endpoints using the comprehensive statistics methods:
# Complete system metrics collection
def collect_system_metrics
{
timestamp: Time.current.iso8601,
documents: Ragdoll::Document.stats,
embeddings: {
total: Ragdoll::Embedding.count,
by_type: {
text: Ragdoll::Embedding.text_embeddings.count,
image: Ragdoll::Embedding.image_embeddings.count,
audio: Ragdoll::Embedding.audio_embeddings.count
},
usage_stats: {
total_searches: Ragdoll::Embedding.sum(:usage_count),
active_last_24h: Ragdoll::Embedding
.where('returned_at > ?', 24.hours.ago).count,
never_used: Ragdoll::Embedding.where(usage_count: 0).count
}
},
performance: {
avg_similarity_threshold: Ragdoll.config.search[:similarity_threshold],
max_results_configured: Ragdoll.config.search[:max_results],
analytics_enabled: Ragdoll.config.search[:enable_analytics]
}
}
end
# Usage pattern metrics
def collect_usage_metrics
{
popular_content: Ragdoll::Embedding
.frequently_used
.limit(10)
.includes(embeddable: :document)
.map { |e| {
document_title: e.embeddable.document.title,
usage_count: e.usage_count,
last_accessed: e.returned_at
}},
search_patterns: {
daily_searches: Ragdoll::Embedding
.where('returned_at > ?', 30.days.ago)
.group_by_day(:returned_at)
.sum(:usage_count),
content_type_preferences: Ragdoll::Embedding
.joins(:embeddable)
.group('ragdoll_contents.type')
.sum(:usage_count)
}
}
end
Custom Metrics Definition
Extend the monitoring system with custom business metrics:
class CustomMetrics
def self.document_processing_velocity
# Documents processed per hour over last 24 hours
Ragdoll::Document
.where('updated_at > ?', 24.hours.ago)
.where(status: 'processed')
.group_by_hour(:updated_at)
.count
end
def self.embedding_quality_distribution
# Distribution of embedding usage as quality indicator
{
high_quality: Ragdoll::Embedding.where('usage_count >= 10').count,
medium_quality: Ragdoll::Embedding.where('usage_count BETWEEN 3 AND 9').count,
low_quality: Ragdoll::Embedding.where('usage_count BETWEEN 1 AND 2').count,
unused: Ragdoll::Embedding.where(usage_count: 0).count
}
end
def self.multi_modal_adoption
# Track multi-modal document usage
{
text_only: Ragdoll::Document.joins(:text_contents)
.where.not(id: Ragdoll::Document.joins(:image_contents).select(:id))
.where.not(id: Ragdoll::Document.joins(:audio_contents).select(:id))
.count,
multi_modal: Ragdoll::Document
.where(id: Ragdoll::Document.joins(:text_contents, :image_contents).select(:id))
.or(Ragdoll::Document.where(id: Ragdoll::Document.joins(:text_contents, :audio_contents).select(:id)))
.count
}
end
end
Data Export Capabilities
Export metrics data for external analysis:
# Export to JSON for external monitoring systems
def export_metrics_json(start_date: 30.days.ago, end_date: Time.current)
{
export_info: {
generated_at: Time.current.iso8601,
period: { start: start_date.iso8601, end: end_date.iso8601 },
ragdoll_version: Ragdoll::Core::VERSION
},
documents: {
created: Ragdoll::Document
.where(created_at: start_date..end_date)
.group(:status, :document_type)
.count,
processing_times: Ragdoll::Document
.where(created_at: start_date..end_date, status: 'processed')
.pluck(:created_at, :updated_at)
.map { |created, updated| (updated - created).to_i }
},
search_analytics: {
total_searches: Ragdoll::Embedding
.where(returned_at: start_date..end_date)
.sum(:usage_count),
unique_content_accessed: Ragdoll::Embedding
.where(returned_at: start_date..end_date)
.distinct
.count(:embeddable_id)
}
}.to_json
end
# Export to CSV for spreadsheet analysis
def export_usage_csv
require 'csv'
CSV.generate(headers: true) do |csv|
csv << ['Document Title', 'Document Type', 'Embedding Count', 'Total Usage', 'Last Accessed']
Ragdoll::Document.includes(:contents, :text_embeddings).find_each do |doc|
csv << [
doc.title,
doc.document_type,
doc.total_embedding_count,
doc.text_embeddings.sum(:usage_count),
doc.text_embeddings.maximum(:returned_at)
]
end
end
end
Historical Data Retention
Manage historical metrics data with scheduled cleanup and archiving (the examples below do not use PostgreSQL partitioning):
# Data retention policies
class MetricsRetention
def self.cleanup_old_usage_data(retention_days: 365)
# Clear old returned_at timestamps but keep usage_count
old_threshold = retention_days.days.ago
Ragdoll::Embedding
.where('returned_at < ?', old_threshold)
.update_all(returned_at: nil)
end
def self.archive_document_metrics(archive_after_days: 180)
# Archive processed documents older than threshold
archive_threshold = archive_after_days.days.ago
old_docs = Ragdoll::Document
.where('created_at < ? AND status = ?', archive_threshold, 'processed')
# Could export to JSON before cleanup
archive_data = old_docs.map(&:to_hash)
# Store archive data or remove based on retention policy
# This is application-specific implementation
end
def self.metrics_summary_for_period(days: 30)
period_start = days.days.ago
{
period: "#{days} days",
documents_processed: Ragdoll::Document
.where('created_at > ? AND status = ?', period_start, 'processed')
.count,
total_searches: Ragdoll::Embedding
.where('returned_at > ?', period_start)
.sum(:usage_count),
new_embeddings: Ragdoll::Embedding
.where('created_at > ?', period_start)
.count
}
end
end
Performance Dashboards
Create comprehensive dashboards using the built-in metrics collection and PostgreSQL analytics capabilities.
Real-time Performance Views
Build live monitoring dashboards with ActiveRecord queries:
class PerformanceDashboard
def self.realtime_stats
{
current_time: Time.current.iso8601,
system_status: {
total_documents: Ragdoll::Document.count,
processing_queue: Ragdoll::Document.where(status: 'pending').count,
currently_processing: Ragdoll::Document.where(status: 'processing').count,
failed_jobs: Ragdoll::Document.where(status: 'error').count
},
recent_activity: {
documents_added_today: Ragdoll::Document
.where('created_at > ?', Date.current)
.count,
searches_last_hour: Ragdoll::Embedding
.where('returned_at > ?', 1.hour.ago)
.sum(:usage_count),
embeddings_created_today: Ragdoll::Embedding
.where('created_at > ?', Date.current)
.count
},
performance_indicators: {
avg_search_quality: Ragdoll::Embedding
.where('returned_at > ?', 24.hours.ago)
.average(:usage_count) || 0,
content_utilization: (Ragdoll::Embedding.where('usage_count > 0').count.to_f /
[Ragdoll::Embedding.count, 1].max * 100).round(2), # max(…, 1) avoids NaN when no embeddings exist
processing_success_rate: calculate_success_rate
}
}
end
def self.calculate_success_rate
total = Ragdoll::Document.count
return 0 if total == 0
successful = Ragdoll::Document.where(status: 'processed').count
(successful.to_f / total * 100).round(2)
end
end
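# Usage in a web dashboard (documents_per_hour, error_percentage_last_24h and
# avg_processing_time_minutes are application-defined helpers that are not shown here)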
def dashboard_data
{
realtime: PerformanceDashboard.realtime_stats,
queue_health: {
processing_velocity: documents_per_hour,
error_rate: error_percentage_last_24h,
average_processing_time: avg_processing_time_minutes
}
}
end
Historical Trend Analysis
Analyze trends over time by bucketing records by day and week (the group_by_day and group_by_week helpers come from the groupdate gem):
class TrendAnalysis
def self.document_processing_trends(days: 30)
end_date = Time.current # a Date here would end the range at midnight and drop today's records
start_date = end_date - days.days
{
daily_processing: Ragdoll::Document
.where(created_at: start_date..end_date)
.where(status: 'processed')
.group_by_day(:updated_at, range: start_date..end_date)
.count,
daily_failures: Ragdoll::Document
.where(created_at: start_date..end_date)
.where(status: 'error')
.group_by_day(:updated_at, range: start_date..end_date)
.count,
content_type_trends: Ragdoll::Document
.where(created_at: start_date..end_date)
.group_by_week(:created_at, range: start_date..end_date)
.group(:document_type)
.count
}
end
def self.search_usage_trends(days: 30)
end_date = Time.current # a Date here would end the range at midnight and drop today's records
start_date = end_date - days.days
{
daily_searches: Ragdoll::Embedding
.where(returned_at: start_date..end_date)
.group_by_day(:returned_at, range: start_date..end_date)
.sum(:usage_count),
popular_content_over_time: Ragdoll::Embedding
.joins(embeddable: :document)
.where(returned_at: start_date..end_date)
.group('ragdoll_documents.document_type')
.group_by_week(:returned_at, range: start_date..end_date)
.sum(:usage_count),
embedding_efficiency: Ragdoll::Embedding
.where('created_at > ?', start_date)
.group_by_week(:created_at, range: start_date..end_date)
.average(:usage_count)
}
end
end
Custom Dashboard Creation
Framework for building custom monitoring dashboards:
class CustomDashboard
attr_reader :widgets, :refresh_interval
def initialize(name:, refresh_interval: 30)
@name = name
@refresh_interval = refresh_interval
@widgets = []
end
def add_widget(type:, title:, query_method:, **options)
@widgets << {
type: type, # :counter, :chart, :table, :gauge
title: title,
query_method: query_method,
options: options
}
end
def render_data
@widgets.map do |widget|
{
type: widget[:type],
title: widget[:title],
data: send(widget[:query_method]),
options: widget[:options],
last_updated: Time.current.iso8601
}
end
end
# Example widget methods
def document_count_by_type
Ragdoll::Document.group(:document_type).count
end
def embedding_usage_distribution
{
'Never Used' => Ragdoll::Embedding.where(usage_count: 0).count,
'Low Usage (1-5)' => Ragdoll::Embedding.where(usage_count: 1..5).count,
'Medium Usage (6-20)' => Ragdoll::Embedding.where(usage_count: 6..20).count,
'High Usage (21+)' => Ragdoll::Embedding.where('usage_count > 20').count
}
end
def recent_processing_times
Ragdoll::Document
.where('created_at > ?', 24.hours.ago)
.where(status: 'processed')
.limit(50)
.pluck(:id, :created_at, :updated_at)
.map { |id, created, updated|
{
document_id: id, # the real primary key, not the creation timestamp
processing_time: ((updated - created) / 60).round(2) # minutes
}
}
end
end
# Usage example
dashboard = CustomDashboard.new(name: "Ragdoll System Health", refresh_interval: 60)
dashboard.add_widget(type: :counter, title: "Total Documents",
query_method: :document_count_by_type)
dashboard.add_widget(type: :chart, title: "Embedding Usage",
query_method: :embedding_usage_distribution)
dashboard.add_widget(type: :table, title: "Recent Processing Times",
query_method: :recent_processing_times)
Alert Configuration
Set up monitoring alerts based on system thresholds:
class AlertSystem
ALERT_THRESHOLDS = {
error_rate_percentage: 5.0,
queue_length: 100,
processing_time_minutes: 30,
disk_usage_percentage: 80.0,
connection_pool_usage: 80.0
}
def self.check_system_health
alerts = []
# Check error rate
error_rate = calculate_error_rate
if error_rate > ALERT_THRESHOLDS[:error_rate_percentage]
alerts << create_alert(
severity: :high,
type: :error_rate,
message: "Error rate #{error_rate}% exceeds threshold #{ALERT_THRESHOLDS[:error_rate_percentage]}%",
current_value: error_rate
)
end
# Check queue length
queue_length = Ragdoll::Document.where(status: 'pending').count
if queue_length > ALERT_THRESHOLDS[:queue_length]
alerts << create_alert(
severity: :medium,
type: :queue_length,
message: "Processing queue length #{queue_length} exceeds threshold #{ALERT_THRESHOLDS[:queue_length]}",
current_value: queue_length
)
end
# Check connection pool usage
pool_usage = connection_pool_usage_percentage
if pool_usage > ALERT_THRESHOLDS[:connection_pool_usage]
alerts << create_alert(
severity: :high,
type: :connection_pool,
message: "Connection pool usage #{pool_usage}% exceeds threshold #{ALERT_THRESHOLDS[:connection_pool_usage]}%",
current_value: pool_usage
)
end
alerts
end
private
def self.calculate_error_rate
total_docs = Ragdoll::Document.where('created_at > ?', 24.hours.ago).count
return 0 if total_docs == 0
error_docs = Ragdoll::Document.where('created_at > ?', 24.hours.ago)
.where(status: 'error').count
(error_docs.to_f / total_docs * 100).round(2)
end
def self.connection_pool_usage_percentage
pool = ActiveRecord::Base.connection_pool
(pool.stat[:busy].to_f / pool.stat[:size] * 100).round(2) # ConnectionPool#stat exposes :busy, not :checked_out
end
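def self.create_alert(severity:, type:, message:, current_value:)
# Alert types (:error_rate, :connection_pool) are shorter than the threshold keys
# (:error_rate_percentage, :connection_pool_usage), so look the threshold up by key prefix
threshold = ALERT_THRESHOLDS.find { |key, _| key.to_s.start_with?(type.to_s) }&.last
{
id: SecureRandom.uuid,
timestamp: Time.current.iso8601,
severity: severity,
type: type,
message: message,
current_value: current_value,
threshold: threshold,
system: 'ragdoll'
}
end
# A bare `private` does not hide `def self.` methods, so mark them explicitly
private_class_method :calculate_error_rate, :connection_pool_usage_percentage, :create_alert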
end
Alerting System
Implement comprehensive alerting based on system thresholds and anomaly detection using PostgreSQL and ActiveRecord.
Threshold-based Alerts
Define and monitor system thresholds:
class ThresholdAlerts
THRESHOLDS = {
# Processing performance
error_rate: { warning: 2.0, critical: 5.0 }, # percentage
queue_length: { warning: 50, critical: 100 },
avg_processing_time: { warning: 300, critical: 600 }, # seconds
# System resources
connection_pool_usage: { warning: 70.0, critical: 85.0 }, # percentage
disk_usage: { warning: 75.0, critical: 90.0 }, # percentage
# Content metrics
unused_embeddings: { warning: 50.0, critical: 70.0 }, # percentage
search_volume_drop: { warning: 30.0, critical: 50.0 } # percentage decrease
}
def self.check_all_thresholds
alerts = []
THRESHOLDS.each do |metric, thresholds|
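getter = "get_#{metric}"
# Skip metrics whose getter is not defined below (connection_pool_usage, disk_usage, search_volume_drop)
next unless respond_to?(getter, true)
current_value = send(getter)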
if current_value >= thresholds[:critical]
alerts << create_threshold_alert(metric, :critical, current_value, thresholds[:critical])
elsif current_value >= thresholds[:warning]
alerts << create_threshold_alert(metric, :warning, current_value, thresholds[:warning])
end
end
alerts
end
private
def self.get_error_rate
total = Ragdoll::Document.where('created_at > ?', 24.hours.ago).count
return 0 if total == 0
errors = Ragdoll::Document.where('created_at > ?', 24.hours.ago)
.where(status: 'error').count
(errors.to_f / total * 100).round(2)
end
def self.get_queue_length
Ragdoll::Document.where(status: 'pending').count
end
def self.get_avg_processing_time
recent_docs = Ragdoll::Document
.where('created_at > ?', 24.hours.ago)
.where(status: 'processed')
.pluck(:created_at, :updated_at)
return 0 if recent_docs.empty?
times = recent_docs.map { |created, updated| (updated - created).to_i }
times.sum / times.length
end
def self.get_unused_embeddings
total = Ragdoll::Embedding.count
return 0 if total == 0
unused = Ragdoll::Embedding.where(usage_count: 0).count
(unused.to_f / total * 100).round(2)
end
def self.create_threshold_alert(metric, severity, current_value, threshold)
{
id: SecureRandom.uuid,
type: :threshold,
metric: metric,
severity: severity,
current_value: current_value,
threshold: threshold,
message: "#{metric.to_s.humanize} #{current_value} exceeds #{severity} threshold #{threshold}",
timestamp: Time.current.iso8601
}
end
end
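The warning/critical comparison in `check_all_thresholds` does not depend on the database, so it can be isolated and tested on plain values. The following is a minimal sketch under that assumption; `sketch_thresholds`, `classify` and `evaluate_thresholds` are hypothetical names, and the two thresholds are copied from the table above.

```ruby
# Hypothetical sketch (not part of Ragdoll): classify already-collected metric
# values against warning/critical thresholds without touching the database.
def sketch_thresholds
  {
    error_rate:   { warning: 2.0, critical: 5.0 }, # percentage
    queue_length: { warning: 50,  critical: 100 }
  }
end

def classify(value, limits)
  return :critical if value >= limits[:critical]
  return :warning  if value >= limits[:warning]

  :ok
end

def evaluate_thresholds(metrics, thresholds = sketch_thresholds)
  thresholds.filter_map do |metric, limits|
    next unless metrics.key?(metric) # skip metrics that were not collected

    severity = classify(metrics[metric], limits)
    next if severity == :ok

    { metric: metric, severity: severity, current_value: metrics[metric], threshold: limits[severity] }
  end
end

p evaluate_thresholds({ error_rate: 3.1, queue_length: 120 })
```

Checking the critical limit first means a value above both limits is reported once, at the higher severity.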
Anomaly Detection
Detect unusual patterns in system behavior:
class AnomalyDetection
def self.detect_search_anomalies(lookback_days: 7)
anomalies = []
# Get baseline search volume
baseline_searches = daily_search_volume(lookback_days)
return anomalies if baseline_searches.empty?
baseline_avg = baseline_searches.values.sum.to_f / baseline_searches.length # to_f avoids truncating integer division
baseline_std = calculate_standard_deviation(baseline_searches.values)
# Check today's volume
today_volume = daily_search_volume(1).values.first || 0
# Detect significant deviations (2 standard deviations)
if (today_volume - baseline_avg).abs > (2 * baseline_std)
severity = today_volume < baseline_avg ? :warning : :info
anomalies << {
type: :search_volume_anomaly,
severity: severity,
current_value: today_volume,
baseline_average: baseline_avg.round(2),
deviation: baseline_avg.zero? ? 0.0 : ((today_volume - baseline_avg) / baseline_avg.to_f * 100).round(2), # guard a zero baseline
message: "Search volume #{today_volume} deviates significantly from baseline #{baseline_avg.round(2)}"
}
end
anomalies
end
def self.detect_processing_anomalies
anomalies = []
# Check for unusual processing patterns
recent_times = Ragdoll::Document
.where('created_at > ?', 24.hours.ago)
.where(status: 'processed')
.pluck(:created_at, :updated_at)
.map { |created, updated| (updated - created).to_i }
return anomalies if recent_times.length < 10
avg_time = recent_times.sum.to_f / recent_times.length # to_f avoids truncating integer division
std_dev = calculate_standard_deviation(recent_times)
# Find outliers (processing times > 3 standard deviations)
outliers = recent_times.select { |time| (time - avg_time).abs > (3 * std_dev) }
if outliers.any?
anomalies << {
type: :processing_time_outliers,
severity: :warning,
outlier_count: outliers.length,
max_outlier_time: outliers.max,
baseline_average: avg_time.round(2),
message: "#{outliers.length} documents had unusual processing times (max: #{outliers.max}s)"
}
end
anomalies
end
private
def self.daily_search_volume(days)
Ragdoll::Embedding
.where('returned_at > ?', days.days.ago)
.group_by_day(:returned_at, last: days)
.sum(:usage_count)
end
def self.calculate_standard_deviation(values)
return 0 if values.empty?
mean = values.sum.to_f / values.length # integer division here would truncate both the mean and the variance
variance = values.sum { |v| (v - mean) ** 2 } / values.length
Math.sqrt(variance)
end
end
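The two-standard-deviation test used for search volume can be written as a pure function over plain numbers, which makes the floating-point arithmetic and the zero-baseline case easy to verify. This is a sketch under those assumptions; `mean`, `population_std_dev` and `volume_anomaly` are hypothetical names and the sample volumes are invented.

```ruby
# Hypothetical sketch (not part of Ragdoll): flag today's search volume when it
# is more than `sigmas` standard deviations away from the baseline mean.
def mean(values)
  values.sum.to_f / values.length
end

def population_std_dev(values)
  m = mean(values)
  Math.sqrt(values.sum { |v| (v - m)**2 } / values.length)
end

def volume_anomaly(baseline, today, sigmas: 2.0)
  return nil if baseline.empty?

  avg = mean(baseline)
  std = population_std_dev(baseline)
  return nil unless (today - avg).abs > sigmas * std

  {
    severity: today < avg ? :warning : :info,
    baseline_average: avg.round(2),
    # Percentage change is undefined for a zero baseline, so report nil there
    deviation_percent: avg.zero? ? nil : ((today - avg) / avg * 100).round(2)
  }
end

p volume_anomaly([100, 110, 90, 105, 95], 40)
p volume_anomaly([100, 110, 90, 105, 95], 102)
```

With only a handful of baseline days the standard deviation is a rough estimate, so a longer lookback window generally gives fewer false alarms.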
Notification Channels
Integrate with various notification systems:
class NotificationManager
def self.send_alert(alert, channels: [:log, :webhook])
channels.each do |channel|
case channel
when :log
log_alert(alert)
when :webhook
send_webhook_alert(alert)
when :email
send_email_alert(alert) # not defined in this example; implement it before enabling :email
when :slack
send_slack_alert(alert)
end
end
end
private
def self.log_alert(alert)
logger = defined?(Rails) ? Rails.logger : Logger.new(STDOUT)
case alert[:severity]
when :critical
logger.error "[RAGDOLL CRITICAL] #{alert[:message]}"
when :warning
logger.warn "[RAGDOLL WARNING] #{alert[:message]}"
else
logger.info "[RAGDOLL INFO] #{alert[:message]}"
end
end
def self.send_webhook_alert(alert)
webhook_url = ENV['RAGDOLL_WEBHOOK_URL']
return unless webhook_url
payload = {
service: 'ragdoll',
alert: alert,
timestamp: Time.current.iso8601,
environment: ENV['RAILS_ENV'] || 'development'
}
# Use Faraday or Net::HTTP to send webhook
require 'net/http'
require 'json'
uri = URI(webhook_url)
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = uri.scheme == 'https'
request = Net::HTTP::Post.new(uri)
request['Content-Type'] = 'application/json'
request.body = payload.to_json
response = http.request(request)
# Report failures only; any 2xx status counts as delivered
warn "Webhook delivery failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
end
def self.send_slack_alert(alert)
slack_webhook = ENV['SLACK_WEBHOOK_URL']
return unless slack_webhook
color = case alert[:severity]
when :critical then '#FF0000'
when :warning then '#FFA500'
else '#36A64F'
end
payload = {
attachments: [{
color: color,
title: "Ragdoll #{alert[:severity].to_s.upcase} Alert",
text: alert[:message],
fields: [
{ title: "Metric", value: alert[:metric], short: true },
{ title: "Current Value", value: alert[:current_value], short: true },
{ title: "Timestamp", value: alert[:timestamp], short: false }
]
}]
}
# Send to Slack webhook
# Implementation similar to webhook above
end
end
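The Slack handler above builds a payload but leaves the delivery step out. One way to fill it in is to keep payload construction as a pure function and post it with `Net::HTTP.post` from the standard library. This is a sketch, not Ragdoll's implementation: `slack_color`, `slack_payload` and `post_json` are hypothetical names, and the network call is left commented out.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical sketch (not part of Ragdoll): build the Slack attachment payload
# as a pure function so it can be tested, then post it as JSON.
def slack_color(severity)
  { critical: "#FF0000", warning: "#FFA500" }.fetch(severity, "#36A64F")
end

def slack_payload(alert)
  {
    attachments: [{
      color: slack_color(alert[:severity]),
      title: "Ragdoll #{alert[:severity].to_s.upcase} Alert",
      text: alert[:message],
      fields: [
        { title: "Metric", value: alert[:metric].to_s, short: true },
        { title: "Current Value", value: alert[:current_value].to_s, short: true }
      ]
    }]
  }
end

# Returns true when the endpoint answers with a 2xx status.
def post_json(url, payload)
  response = Net::HTTP.post(URI(url), payload.to_json, "Content-Type" => "application/json")
  response.is_a?(Net::HTTPSuccess)
end

alert = { severity: :critical, metric: :queue_length, current_value: 120,
          message: "Queue length 120 exceeds 100" }
puts JSON.pretty_generate(slack_payload(alert))
# post_json(ENV.fetch("SLACK_WEBHOOK_URL"), slack_payload(alert)) # network call, left commented out
```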
Alert Escalation
Implement alert escalation policies:
class AlertEscalation
ESCALATION_RULES = {
critical: {
immediate: [:log, :webhook, :slack],
after_5_minutes: [:email],
after_15_minutes: [:sms] # if configured; NotificationManager above has no :sms handler
},
warning: {
immediate: [:log],
after_10_minutes: [:webhook],
after_30_minutes: [:email]
}
}
def self.process_alert(alert)
# Store alert for tracking
alert_record = store_alert(alert)
# Send immediate notifications
immediate_channels = ESCALATION_RULES.dig(alert[:severity], :immediate) || [:log]
NotificationManager.send_alert(alert, channels: immediate_channels)
# Schedule escalation if needed (schedule_escalation is application-specific and not shown)
schedule_escalation(alert_record) if alert[:severity] == :critical
end
def self.check_escalations
# This would typically be called by a background job (get_unresolved_alerts is application-specific and not shown)
unresolved_alerts = get_unresolved_alerts
unresolved_alerts.each do |alert_record|
escalate_if_needed(alert_record)
end
end
private
def self.store_alert(alert)
# Store in database or memory store for tracking
{
id: alert[:id],
created_at: Time.current,
alert_data: alert,
escalation_level: 0,
resolved_at: nil
}
end
def self.escalate_if_needed(alert_record)
minutes_since_creation = (Time.current - alert_record[:created_at]) / 60
severity = alert_record[:alert_data][:severity]
escalation_rules = ESCALATION_RULES[severity] || {}
escalation_rules.each do |time_key, channels|
match = time_key.to_s.match(/\Aafter_(\d+)_minutes\z/)
# Skip keys such as :immediate; calling to_i on a failed match would silently yield 0
next unless match
threshold_minutes = match[1].to_i
if minutes_since_creation >= threshold_minutes &&
alert_record[:escalation_level] < threshold_minutes
NotificationManager.send_alert(alert_record[:alert_data], channels: channels)
alert_record[:escalation_level] = threshold_minutes
end
end
end
end
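The escalation decision comes down to two steps: parse the delay out of a rule key such as `:after_10_minutes`, then compare it with the alert's age and the last level already notified. The sketch below isolates that logic; `escalation_delay_minutes` and `due_channels` are hypothetical names, and the rules hash mirrors the warning policy above.

```ruby
# Hypothetical sketch (not part of Ragdoll): decide which escalation steps are
# due, given rule keys such as :after_5_minutes and the alert's age.
def escalation_delay_minutes(rule_key)
  match = rule_key.to_s.match(/\Aafter_(\d+)_minutes\z/)
  match && match[1].to_i
end

# Returns the channels for every step that is due but not yet sent.
# last_level is the delay (in minutes) of the last step already notified.
def due_channels(rules, age_minutes:, last_level: 0)
  rules.each_with_object([]) do |(key, channels), due|
    delay = escalation_delay_minutes(key)
    next if delay.nil? # for example :immediate

    due.concat(channels) if age_minutes >= delay && last_level < delay
  end
end

rules = { immediate: [:log], after_10_minutes: [:webhook], after_30_minutes: [:email] }
p due_channels(rules, age_minutes: 12)
p due_channels(rules, age_minutes: 45, last_level: 10)
```

Tracking the last notified level keeps a periodic job from re-sending the same escalation on every run.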
Integration with External Tools
Because every metric above is a plain ActiveRecord query, Ragdoll data can be exposed to common monitoring and observability platforms. The examples below cover Prometheus, Grafana, New Relic, and custom adapters.
Prometheus Integration
Expose metrics in Prometheus format for scraping:
# Prometheus metrics exporter
class PrometheusExporter
def self.metrics
output = []
# Document metrics
doc_stats = Ragdoll::Document.group(:status).count
doc_stats.each do |status, count|
output << "ragdoll_documents_total{status=\"#{status}\"} #{count}"
end
# Embedding metrics
output << "ragdoll_embeddings_total #{Ragdoll::Embedding.count}"
output << "ragdoll_embeddings_used_total #{Ragdoll::Embedding.where('usage_count > 0').count}"
output << "ragdoll_searches_total #{Ragdoll::Embedding.sum(:usage_count)}"
# Processing metrics
recent_processing_times = Ragdoll::Document
.where('created_at > ?', 24.hours.ago)
.where(status: 'processed')
.pluck(:created_at, :updated_at)
.map { |created, updated| (updated - created).to_i }
if recent_processing_times.any?
avg_time = recent_processing_times.sum / recent_processing_times.length
output << "ragdoll_avg_processing_time_seconds #{avg_time}"
end
# Connection pool metrics
pool = ActiveRecord::Base.connection_pool
output << "ragdoll_connection_pool_size #{pool.stat[:size]}"
# ConnectionPool#stat exposes :busy and :idle rather than :checked_out and :checked_in
output << "ragdoll_connection_pool_checked_out #{pool.stat[:busy]}"
output << "ragdoll_connection_pool_checked_in #{pool.stat[:idle]}"
# Content type distribution
content_types = Ragdoll::Document.group(:document_type).count
content_types.each do |type, count|
output << "ragdoll_documents_by_type{type=\"#{type}\"} #{count}"
end
output.join("\n") + "\n"
end
# Rack middleware for serving metrics
class Middleware
def initialize(app)
@app = app
end
def call(env)
if env['PATH_INFO'] == '/metrics' && env['REQUEST_METHOD'] == 'GET'
metrics_response
else
@app.call(env)
end
end
private
def metrics_response
metrics = PrometheusExporter.metrics
[
200,
{
'Content-Type' => 'text/plain; version=0.0.4; charset=utf-8',
'Content-Length' => metrics.bytesize.to_s
},
[metrics]
]
end
end
end
# Rails integration
# In config/application.rb:
# config.middleware.use PrometheusExporter::Middleware
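The Prometheus text exposition format also allows `# HELP` and `# TYPE` lines per metric and requires backslashes, double quotes and newlines in label values to be escaped. The sketch below renders one gauge that way. `escape_label_value` and `render_gauge` are hypothetical helpers, and the metric name `ragdoll_documents` is illustrative: values derived from row counts can go down, so they are declared as gauges here rather than counters.

```ruby
# Hypothetical sketch (not part of Ragdoll): render one gauge in the Prometheus
# text exposition format, including HELP/TYPE lines and label escaping.
def escape_label_value(value)
  # Backslash, double quote and newline must be escaped inside label values
  value.to_s.gsub(/[\\"\n]/) { |char| char == "\n" ? "\\n" : "\\#{char}" }
end

# samples is an array of [labels_hash, numeric_value] pairs.
def render_gauge(name, help, samples)
  lines = ["# HELP #{name} #{help}", "# TYPE #{name} gauge"]
  samples.each do |labels, value|
    label_text = labels.map { |key, val| %(#{key}="#{escape_label_value(val)}") }.join(",")
    lines << (label_text.empty? ? "#{name} #{value}" : "#{name}{#{label_text}} #{value}")
  end
  lines.join("\n") + "\n"
end

puts render_gauge(
  "ragdoll_documents",
  "Documents by processing status",
  [[{ status: "processed" }, 45], [{ status: "error" }, 2]]
)
```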
Grafana Dashboard Templates
JSON dashboard configuration for Grafana:
# Grafana dashboard generator
class GrafanaDashboard
def self.generate_dashboard_json
{
"dashboard" => {
"id" => nil,
"title" => "Ragdoll Core Monitoring",
"tags" => ["ragdoll", "rag", "search"],
"timezone" => "browser",
"panels" => [
{
"id" => 1,
"title" => "Document Processing Status",
"type" => "stat",
"targets" => [
{
"expr" => "ragdoll_documents_total",
"legendFormat" => "{{ status }}"
}
],
"gridPos" => { "h" => 8, "w" => 12, "x" => 0, "y" => 0 }
},
{
"id" => 2,
"title" => "Search Volume Over Time",
"type" => "graph",
"targets" => [
{
"expr" => "rate(ragdoll_searches_total[5m])",
"legendFormat" => "Searches per second"
}
],
"gridPos" => { "h" => 8, "w" => 12, "x" => 12, "y" => 0 }
},
{
"id" => 3,
"title" => "Average Processing Time",
"type" => "stat", # "singlestat" is a legacy panel type; "stat" matches panel 1
"targets" => [
{
"expr" => "ragdoll_avg_processing_time_seconds",
"legendFormat" => "Seconds"
}
],
"gridPos" => { "h" => 4, "w" => 6, "x" => 0, "y" => 8 }
},
{
"id" => 4,
"title" => "Connection Pool Usage",
"type" => "gauge",
"targets" => [
{
"expr" => "(ragdoll_connection_pool_checked_out / ragdoll_connection_pool_size) * 100",
"legendFormat" => "Pool Usage %"
}
],
"gridPos" => { "h" => 4, "w" => 6, "x" => 6, "y" => 8 }
}
],
"time" => {
"from" => "now-1h",
"to" => "now"
},
"refresh" => "30s"
},
"folderId" => 0,
"overwrite" => true
}.to_json
end
end
New Relic Compatibility
Integrate with New Relic APM and custom metrics:
# New Relic custom metrics
class NewRelicIntegration
def self.record_custom_metrics
return unless defined?(NewRelic)
# Document processing metrics
doc_stats = Ragdoll::Document.group(:status).count
doc_stats.each do |status, count|
NewRelic::Agent.record_metric("Custom/Ragdoll/Documents/#{status}", count)
end
# Search metrics
total_searches = Ragdoll::Embedding.sum(:usage_count)
NewRelic::Agent.record_metric("Custom/Ragdoll/Searches/Total", total_searches)
# Processing performance
recent_times = calculate_recent_processing_times
if recent_times.any?
avg_time = recent_times.sum / recent_times.length
NewRelic::Agent.record_metric("Custom/Ragdoll/Processing/AverageTime", avg_time)
end
# Embedding efficiency
used_embeddings = Ragdoll::Embedding.where('usage_count > 0').count
total_embeddings = Ragdoll::Embedding.count
efficiency = total_embeddings > 0 ? (used_embeddings.to_f / total_embeddings * 100) : 0
NewRelic::Agent.record_metric("Custom/Ragdoll/Embeddings/EfficiencyPercent", efficiency)
end
# New Relic custom events
def self.track_search_event(query:, results_count:, processing_time:)
return unless defined?(NewRelic)
NewRelic::Agent.record_custom_event('RagdollSearch', {
query_length: query.length,
results_count: results_count,
processing_time_ms: processing_time,
timestamp: Time.current.to_i
})
end
def self.track_document_processing_event(document:, processing_time:, success:)
return unless defined?(NewRelic)
NewRelic::Agent.record_custom_event('RagdollDocumentProcessing', {
document_type: document.document_type,
document_size: document.content&.length || 0,
processing_time_seconds: processing_time,
success: success,
embedding_count: document.total_embedding_count,
timestamp: Time.current.to_i
})
end
private
def self.calculate_recent_processing_times
Ragdoll::Document
.where('created_at > ?', 24.hours.ago)
.where(status: 'processed')
.pluck(:created_at, :updated_at)
.map { |created, updated| (updated - created).to_i }
end
end
# Background job to send metrics
class MetricsReportingJob < ActiveJob::Base
queue_as :default
def perform
NewRelicIntegration.record_custom_metrics
end
end
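# Schedule metrics reporting
# Note: set(wait:) enqueues a single delayed run. For recurring reports,
# trigger the job from a scheduler (system cron or similar).
# In Rails initializer or similar:
# MetricsReportingJob.set(wait: 5.minutes).perform_later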
Custom Monitoring Solutions¶
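The adapter below is a starting framework for custom monitoring integrations. The helpers it calls but does not define (initialize_client, send_metric, send_influxdb_metrics, calculate_error_rate, calculate_embedding_efficiency) are left for you to implement: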
class CustomMonitoringAdapter
attr_reader :config, :client
def initialize(config = {})
@config = config
@client = initialize_client
end
def send_metrics(metrics)
case config[:type]
when :datadog
send_datadog_metrics(metrics)
when :statsd
send_statsd_metrics(metrics)
when :influxdb
send_influxdb_metrics(metrics)
when :custom_api
send_custom_api_metrics(metrics)
else
Rails.logger.info "Custom metrics: #{metrics.to_json}" if defined?(Rails)
end
end
def collect_all_metrics
{
timestamp: Time.current.to_i,
documents: document_metrics,
embeddings: embedding_metrics,
performance: performance_metrics,
system: system_metrics
}
end
private
def document_metrics
{
total: Ragdoll::Document.count,
by_status: Ragdoll::Document.group(:status).count,
by_type: Ragdoll::Document.group(:document_type).count,
processing_queue_length: Ragdoll::Document.where(status: 'pending').count
}
end
def embedding_metrics
{
total: Ragdoll::Embedding.count,
total_searches: Ragdoll::Embedding.sum(:usage_count),
used_embeddings: Ragdoll::Embedding.where('usage_count > 0').count,
recent_searches: Ragdoll::Embedding
.where('returned_at > ?', 1.hour.ago)
.sum(:usage_count)
}
end
def performance_metrics
recent_times = Ragdoll::Document
.where('created_at > ?', 24.hours.ago)
.where(status: 'processed')
.pluck(:created_at, :updated_at)
.map { |created, updated| (updated - created).to_i }
{
avg_processing_time: recent_times.any? ? recent_times.sum.to_f / recent_times.length : 0,
processed_documents_24h: recent_times.length,
error_rate: calculate_error_rate,
embedding_efficiency: calculate_embedding_efficiency
}
end
def system_metrics
pool = ActiveRecord::Base.connection_pool
{
connection_pool_size: pool.stat[:size],
connection_pool_used: pool.stat[:checked_out],
connection_pool_available: pool.stat[:size] - pool.stat[:checked_out]
}
end
def send_datadog_metrics(metrics)
# Datadog StatsD format
metrics.each do |key, value|
if value.is_a?(Hash)
value.each do |subkey, subvalue|
send_metric("ragdoll.#{key}.#{subkey}", subvalue, type: :gauge)
end
else
send_metric("ragdoll.#{key}", value, type: :gauge)
end
end
end
def send_statsd_metrics(metrics)
# StatsD protocol implementation
# Similar to Datadog but with different format
end
def send_custom_api_metrics(metrics)
# Custom HTTP API endpoint
require 'net/http'
require 'json'
uri = URI(config[:endpoint])
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = uri.scheme == 'https'
request = Net::HTTP::Post.new(uri)
request['Content-Type'] = 'application/json'
request['Authorization'] = "Bearer #{config[:api_key]}" if config[:api_key]
request.body = metrics.to_json
response = http.request(request)
Rails.logger.info "Metrics sent: #{response.code}" if defined?(Rails)
end
end
# Usage
monitoring = CustomMonitoringAdapter.new(
type: :datadog,
api_key: ENV['DATADOG_API_KEY'],
endpoint: 'https://api.datadoghq.com/api/v1/series'
)
# Collect and send metrics
metrics = monitoring.collect_all_metrics
monitoring.send_metrics(metrics)
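The send_statsd_metrics stub above leaves the wire format open. As a minimal sketch, assuming a plain StatsD daemon listening on UDP, the nested hash from collect_all_metrics could be flattened into dotted names and rendered as gauge lines of the form name:value|g. The StatsdFormatter module and its method names are hypothetical and not part of Ragdoll.

```ruby
require 'socket'

# Hypothetical helper (not part of Ragdoll): flattens nested metric hashes
# and renders them as plain StatsD gauge lines of the form "name:value|g".
module StatsdFormatter
  module_function

  # Replaces characters that would break the StatsD line format.
  def sanitize(key)
    cleaned = key.to_s.gsub(/[^a-zA-Z0-9_-]/, '_')
    cleaned.empty? ? 'unknown' : cleaned
  end

  # {a: {b: 1}} => {"ragdoll.a.b" => 1}; non-numeric leaves are skipped.
  def flatten_metrics(metrics, prefix = 'ragdoll')
    metrics.each_with_object({}) do |(key, value), flat|
      name = "#{prefix}.#{sanitize(key)}"
      if value.is_a?(Hash)
        flat.merge!(flatten_metrics(value, name))
      elsif value.is_a?(Numeric)
        flat[name] = value
      end
    end
  end

  # Renders every numeric leaf as one gauge line.
  def gauge_lines(metrics, prefix = 'ragdoll')
    flatten_metrics(metrics, prefix).map { |name, value| "#{name}:#{value}|g" }
  end

  # Sends each line as its own UDP datagram; 8125 is the conventional StatsD port.
  def send_lines(lines, host: '127.0.0.1', port: 8125)
    socket = UDPSocket.new
    lines.each { |line| socket.send(line, 0, host, port) }
  ensure
    socket&.close
  end
end
```

Inside the adapter, send_statsd_metrics could then call StatsdFormatter.send_lines(StatsdFormatter.gauge_lines(metrics)). Because the walk is recursive, grouped counts such as by_status are emitted as one gauge per group rather than being passed along as a hash.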
Troubleshooting Guides¶
This section provides troubleshooting workflows for common Ragdoll issues, with a focus on PostgreSQL and pgvector performance.
Common Performance Issues¶
Slow Search Performance¶
Symptoms:
- Search queries taking > 2 seconds
- High CPU usage during searches
- Connection pool exhaustion
Diagnostic Commands:
# Check pgvector index usage
ActiveRecord::Base.connection.execute("
EXPLAIN ANALYZE
SELECT * FROM ragdoll_embeddings
ORDER BY embedding_vector <=> '[0.1,0.2,...]'
LIMIT 10;
")
# Check index statistics
ActiveRecord::Base.connection.execute("
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE relname = 'ragdoll_embeddings';
")
# Monitor connection pool
pool_stats = ActiveRecord::Base.connection_pool.stat
puts "Pool usage: #{pool_stats[:checked_out]}/#{pool_stats[:size]}"
Resolution Steps:
1. Optimize pgvector index: Ensure IVFFlat index is properly configured
2. Increase connection pool: Adjust pool setting in database.yml
3. Enable connection pooling: Use pgbouncer for high-load scenarios
4. Query optimization: Review embedding search filters and limits
# Optimize search queries
class SearchOptimizer
def self.optimize_embedding_search
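# PostgreSQL has no index hints; these are session-level planner settings that
# discourage sequential scans and raise per-operation sort memory.
# They stay in effect for the rest of the session on this connection.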
Ragdoll::Embedding.connection.execute("
SET enable_seqscan = OFF;
SET work_mem = '256MB';
")
end
def self.batch_similar_searches(query_embeddings, batch_size: 100)
# Process multiple searches in batches to reduce overhead
query_embeddings.each_slice(batch_size) do |batch|
# Process batch of searches
yield batch
end
end
end
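Step 1 of the resolution list asks for a properly configured IVFFlat index without saying what that means in practice. The sketch below builds the SQL, using the starting points suggested in the pgvector documentation: lists of about rows / 1000 up to one million rows and sqrt(rows) above that, with probes of about sqrt(lists). The IvfflatTuning module and the index name are hypothetical. The operator class vector_cosine_ops is chosen on the assumption that searches use the `<=>` cosine-distance operator, as in the diagnostic query above.

```ruby
# Hypothetical helper (not part of Ragdoll): derives IVFFlat parameters
# from the row count, following pgvector's documented starting points.
module IvfflatTuning
  module_function

  # rows / 1000 for up to 1M rows, sqrt(rows) beyond that; never below 1.
  def recommended_lists(row_count)
    lists = row_count <= 1_000_000 ? row_count / 1000 : Math.sqrt(row_count).round
    [lists, 1].max
  end

  # A reasonable starting value for probes is sqrt(lists).
  def recommended_probes(lists)
    [Math.sqrt(lists).round, 1].max
  end

  # Table and column names come from the queries above; the index name is an assumption.
  def index_sql(row_count, table: 'ragdoll_embeddings', column: 'embedding_vector',
                index_name: 'idx_ragdoll_embeddings_ivfflat')
    "CREATE INDEX IF NOT EXISTS #{index_name} ON #{table} " \
      "USING ivfflat (#{column} vector_cosine_ops) " \
      "WITH (lists = #{recommended_lists(row_count)});"
  end

  # Session setting that controls how many lists each query scans.
  def probes_sql(row_count)
    "SET ivfflat.probes = #{recommended_probes(recommended_lists(row_count))};"
  end
end
```

With a live connection, the statements could be run via ActiveRecord::Base.connection.execute(IvfflatTuning.index_sql(Ragdoll::Embedding.count)). Higher probes values trade query speed for recall, so treat these numbers as a starting point to measure against, not as tuned values.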
High Memory Usage¶
Symptoms:
- Ruby process memory growth
- PostgreSQL memory pressure
- Frequent garbage collection
Diagnostic Procedures:
# Memory usage analysis
def analyze_memory_usage
{
ruby_process_mb: `ps -o rss= -p #{Process.pid}`.to_i / 1024,
gc_stats: GC.stat,
object_counts: ObjectSpace.count_objects,
connection_pool: ActiveRecord::Base.connection_pool.stat
}
end
# PostgreSQL memory analysis
ActiveRecord::Base.connection.execute("
-- pg_settings.setting reports shared_buffers in 8kB blocks, not MB;
-- current_setting returns the value with its unit
SELECT
current_setting('shared_buffers') AS shared_buffers,
pg_size_pretty(pg_database_size(current_database())) AS db_size;
")
Resolution Strategies:
1. Optimize embeddings loading: Use select to load only needed columns
2. Implement connection pooling: Use pgbouncer or similar
3. Tune PostgreSQL memory: Adjust shared_buffers and work_mem
4. Regular cleanup: Implement data retention policies
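Strategy 1 above can be sketched as a small wrapper that selects only the requested columns and iterates in batches, so the large embedding_vector column is never loaded into Ruby objects. The SlimLoader module is hypothetical. It assumes an ActiveRecord-style relation that responds to select and find_each, and it always includes :id because find_each batches by primary key.

```ruby
# Hypothetical helper (not part of Ragdoll): iterates a relation in batches
# while loading only the requested columns.
module SlimLoader
  module_function

  # relation: any object responding to `select` and `find_each`
  #           (for example, an ActiveRecord::Relation).
  # columns:  column names to load; :id is always added for batching.
  def each_record(relation, columns:, batch_size: 500, &block)
    selected = ([:id] + columns.map(&:to_sym)).uniq
    relation.select(*selected).find_each(batch_size: batch_size, &block)
  end
end
```

For example, SlimLoader.each_record(Ragdoll::Embedding.where('usage_count > 0'), columns: %i[usage_count returned_at]) { |e| ... } would walk the used embeddings without pulling their vectors into memory.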
Document Processing Failures¶
Symptoms:
- Documents stuck in 'processing' status
- High error rates
- Background job failures
Diagnostic Commands:
# Check processing status distribution
processing_status = Ragdoll::Document.group(:status).count
puts "Status distribution: #{processing_status}"
# Find stuck documents
stuck_docs = Ragdoll::Document
.where(status: 'processing')
.where('updated_at < ?', 1.hour.ago)
puts "Stuck documents: #{stuck_docs.count}"
# Check recent errors
error_docs = Ragdoll::Document
.where(status: 'error')
.where('updated_at > ?', 24.hours.ago)
.includes(:contents)
Resolution Steps:
1. Reset stuck documents: Change status back to 'pending'
2. Check job queue: Ensure ActiveJob backend is running
3. Review error logs: Identify common failure patterns
4. Validate file access: Ensure file permissions and availability
# Recovery procedures
class DocumentRecovery
def self.reset_stuck_documents
stuck_docs = Ragdoll::Document
.where(status: 'processing')
.where('updated_at < ?', 1.hour.ago)
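# update_all returns the number of rows changed; counting the relation
# afterwards would return 0 because the rows no longer match 'processing'
reset_count = stuck_docs.update_all(status: 'pending')
puts "Reset #{reset_count} stuck documents"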
end
def self.retry_failed_documents
failed_docs = Ragdoll::Document
.where(status: 'error')
.where('updated_at > ?', 24.hours.ago)
failed_docs.each do |doc|
begin
doc.update!(status: 'pending')
# Trigger reprocessing
Ragdoll::ExtractTextJob.perform_later(doc.id)
rescue => e
puts "Failed to retry document #{doc.id}: #{e.message}"
end
end
end
end
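Step 3 of the resolution list, identifying common failure patterns, is easier when variable parts of error messages are masked before counting. The sketch below assumes you have already collected error messages as plain strings, for example from job logs. Where Ragdoll stores them is not covered here, and the ErrorPatterns module is hypothetical.

```ruby
# Hypothetical helper (not part of Ragdoll): groups error messages by
# pattern after masking file paths and numbers.
module ErrorPatterns
  module_function

  # "File not found: /tmp/a.pdf" => "File not found: <path>"
  def normalize(message)
    message.to_s
           .gsub(%r{(?:/[\w.\-]+)+}, '<path>')
           .gsub(/\d+/, '<n>')
           .strip
  end

  # Returns [pattern, count] pairs, most frequent first (ties sorted by pattern).
  def top_patterns(messages, limit: 5)
    messages.map { |m| normalize(m) }
            .tally
            .sort_by { |pattern, count| [-count, pattern] }
            .first(limit)
  end
end
```

A pattern that dominates the list, such as repeated missing-file errors, points at step 4 (file access) rather than at the job backend.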
Diagnostic Procedures¶
System Health Check¶
class SystemHealthCheck
def self.run_full_diagnostic
results = {
timestamp: Time.current.iso8601,
database: check_database_health,
models: check_model_integrity,
performance: check_performance_metrics,
storage: check_storage_health,
jobs: check_job_health
}
generate_health_report(results)
end
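# Internal helpers. A bare `private` has no effect on `def self.` methods;
# wrap them with private_class_method if they should be hidden.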
def self.check_database_health
{
connection_status: ActiveRecord::Base.connected?,
pool_status: ActiveRecord::Base.connection_pool.stat,
table_sizes: get_table_sizes,
index_usage: get_index_usage_stats,
slow_queries: get_slow_queries
}
end
def self.check_model_integrity
{
total_documents: Ragdoll::Document.count,
orphaned_embeddings: find_orphaned_embeddings,
missing_content: find_documents_without_content,
invalid_embeddings: find_invalid_embeddings
}
end
def self.check_performance_metrics
recent_searches = Ragdoll::Embedding
.where('returned_at > ?', 1.hour.ago)
{
searches_last_hour: recent_searches.sum(:usage_count),
avg_search_time: calculate_avg_search_time,
cache_hit_rate: calculate_cache_hit_rate,
processing_backlog: Ragdoll::Document.where(status: 'pending').count
}
end
def self.get_table_sizes
ActiveRecord::Base.connection.execute("
SELECT
tablename,
pg_size_pretty(pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass)) AS size,
pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass) AS bytes
FROM pg_tables
WHERE tablename LIKE 'ragdoll_%'
ORDER BY bytes DESC;
").to_a
end
def self.find_orphaned_embeddings
# Find embeddings without valid embeddable references
Ragdoll::Embedding.left_joins(:embeddable)
.where(ragdoll_contents: { id: nil })
.count
end
end
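run_full_diagnostic passes its results to generate_health_report, which is not shown above. As a minimal sketch, such a report could reduce the results hash to an overall status plus a list of warnings. The HealthReport module is hypothetical, and its thresholds (80% pool usage, a backlog of 100 documents) are illustrative assumptions, not Ragdoll defaults.

```ruby
# Hypothetical helper (not part of Ragdoll): summarizes a diagnostic
# results hash as a status and a list of warnings.
module HealthReport
  module_function

  # Illustrative thresholds; tune them for your deployment.
  POOL_USAGE_WARN = 0.8
  BACKLOG_WARN = 100

  def generate(results)
    warnings = []
    db = results[:database] || {}
    warnings << 'database connection is down' if db[:connection_status] == false

    pool = db[:pool_status] || {}
    size = pool[:size].to_i
    if size > 0
      usage = pool[:checked_out].to_f / size
      warnings << "connection pool at #{(usage * 100).round}% capacity" if usage >= POOL_USAGE_WARN
    end

    orphaned = results.dig(:models, :orphaned_embeddings).to_i
    warnings << "#{orphaned} orphaned embeddings" if orphaned > 0

    backlog = results.dig(:performance, :processing_backlog).to_i
    warnings << "processing backlog of #{backlog} documents" if backlog > BACKLOG_WARN

    {
      timestamp: results[:timestamp],
      status: warnings.empty? ? :healthy : :degraded,
      warnings: warnings
    }
  end
end
```

Missing sections are treated as "nothing to report", so the sketch also works on partial results while individual checks are still being implemented.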
Performance Profiling¶
class PerformanceProfiler
def self.profile_search_operation(query, iterations: 100)
require 'benchmark'
results = []
embedding_service = Ragdoll::EmbeddingService.new
search_engine = Ragdoll::SearchEngine.new(embedding_service)
# Warm up
3.times { search_engine.search_documents(query, limit: 10) }
# Profile multiple iterations
iterations.times do |i|
begin
search_results = nil
# Benchmark.realtime uses a monotonic clock, so system clock adjustments cannot skew timings
elapsed = Benchmark.realtime do
search_results = search_engine.search_documents(query, limit: 10)
end
results << {
iteration: i + 1,
duration_ms: (elapsed * 1000).round(2),
results_count: search_results.length,
success: true
}
rescue => e
results << {
iteration: i + 1,
error: e.message,
success: false
}
end
end
analyze_profile_results(results)
end
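# A bare `private` does not apply to `def self.` methods, so hide the helper explicitly
private_class_method def self.analyze_profile_results(results)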
successful_results = results.select { |r| r[:success] }
return { error: "No successful iterations" } if successful_results.empty?
durations = successful_results.map { |r| r[:duration_ms] }
{
total_iterations: results.length,
successful_iterations: successful_results.length,
success_rate: (successful_results.length.to_f / results.length * 100).round(2),
performance: {
min_ms: durations.min,
max_ms: durations.max,
avg_ms: (durations.sum / durations.length).round(2),
median_ms: durations.sort[durations.length / 2],
std_dev_ms: calculate_std_dev(durations).round(2)
},
percentiles: {
p50: percentile(durations, 50),
p90: percentile(durations, 90),
p95: percentile(durations, 95),
p99: percentile(durations, 99)
}
}
end
end
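analyze_profile_results calls percentile and calculate_std_dev, which are not shown above. One possible implementation, assuming nearest-rank percentiles and a population standard deviation, is sketched below. The ProfileStats module is hypothetical, and other percentile definitions (for example, linear interpolation) would give slightly different values.

```ruby
# Hypothetical helper (not part of Ragdoll): statistics used when
# summarizing profiling durations.
module ProfileStats
  module_function

  # Nearest-rank percentile: the smallest sample such that at least
  # pct percent of all samples are less than or equal to it.
  def percentile(values, pct)
    raise ArgumentError, 'values must not be empty' if values.empty?
    raise ArgumentError, 'pct must be in (0, 100]' unless pct > 0 && pct <= 100

    sorted = values.sort
    rank = (pct * sorted.length / 100.0).ceil
    sorted[rank - 1]
  end

  # Population standard deviation (divides by n, not n - 1).
  def calculate_std_dev(values)
    raise ArgumentError, 'values must not be empty' if values.empty?

    mean = values.sum.to_f / values.length
    variance = values.sum { |v| (v - mean)**2 } / values.length
    Math.sqrt(variance)
  end

  # Median that averages the two middle samples for even-sized inputs.
  def median(values)
    raise ArgumentError, 'values must not be empty' if values.empty?

    sorted = values.sort
    mid = sorted.length / 2
    sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
  end
end
```

The median helper is included because indexing the sorted array at length / 2 returns the upper of the two middle values when the sample count is even. Averaging the two is the more common convention.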
Prevention Techniques¶
Proactive Monitoring Setup¶
class PreventiveMaintenance
def self.setup_monitoring_jobs
# ActiveJob's `set` accepts wait, wait_until, queue, and priority; it has no cron option.
# Enqueue each job here and let a recurring scheduler (system cron, or a scheduler
# add-on for your queue backend) repeat it on the schedule noted.
HealthCheckJob.perform_later # Every 15 minutes: */15 * * * *
CleanupJob.perform_later # Daily at 2 AM: 0 2 * * *
WeeklyReportJob.perform_later # Monday at 8 AM: 0 8 * * 1
end
def self.optimize_database_settings
# PostgreSQL optimization for pgvector
settings = {
'shared_buffers' => '256MB',
'work_mem' => '64MB',
'maintenance_work_mem' => '256MB',
'effective_cache_size' => '1GB',
'random_page_cost' => '1.1' # Optimized for SSD
}
settings.each do |setting, value|
ActiveRecord::Base.connection.execute(
"ALTER SYSTEM SET #{setting} = '#{value}';"
)
end
# Reload configuration (shared_buffers only takes effect after a full server restart)
ActiveRecord::Base.connection.execute("SELECT pg_reload_conf();")
end
def self.setup_automated_backups
# Database backup strategy
backup_script = <<~SCRIPT
#!/bin/bash
# Automated Ragdoll database backup
DB_NAME="ragdoll_production"
BACKUP_DIR="/var/backups/ragdoll"
DATE=$(date +%Y%m%d_%H%M%S)
# Create backup directory
mkdir -p "$BACKUP_DIR"
# Make the pipeline fail if pg_dump fails, not only if gzip fails
set -o pipefail
# Full database backup, verified through the pipeline's exit status
if pg_dump "$DB_NAME" | gzip > "$BACKUP_DIR/ragdoll_$DATE.sql.gz"; then
echo "Backup completed successfully: ragdoll_$DATE.sql.gz"
# Cleanup old backups (keep 30 days), only after a successful backup
find "$BACKUP_DIR" -name "ragdoll_*.sql.gz" -mtime +30 -delete
else
echo "Backup failed!" | mail -s "Ragdoll Backup Failure" admin@example.com
fi
SCRIPT
puts "Add this script to crontab for daily backups:"
puts "0 3 * * * /path/to/ragdoll_backup.sh"
end
end
Configuration Best Practices¶
class ConfigurationValidator
def self.validate_production_config
issues = []
config = Ragdoll.config
# Database configuration validation
if config.database_config[:pool] < 20
issues << "Connection pool size (#{config.database_config[:pool]}) may be too small for production"
end
# Search configuration validation
if config.search[:max_results] > 100
issues << "max_results (#{config.search[:max_results]}) may impact performance"
end
if config.search[:similarity_threshold] < 0.5
issues << "similarity_threshold (#{config.search[:similarity_threshold]}) may return too many irrelevant results"
end
# Analytics configuration
unless config.search[:enable_analytics]
issues << "Analytics disabled - monitoring capabilities will be limited"
end
# Memory settings validation
if config.chunking[:text][:max_tokens] > 2000
issues << "text chunk size (#{config.chunking[:text][:max_tokens]}) may cause memory issues"
end
display_validation_results(issues)
end
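# A bare `private` does not apply to `def self.` methods, so hide the helper explicitly
private_class_method def self.display_validation_results(issues)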
if issues.empty?
puts "✅ Configuration validation passed"
else
puts "⚠️ Configuration issues found:"
issues.each_with_index do |issue, index|
puts "#{index + 1}. #{issue}"
end
end
end
end
Summary¶
Ragdoll's monitoring and analytics system provides comprehensive insights into system performance, usage patterns, and health metrics through PostgreSQL-native features and ActiveRecord integration. The built-in analytics track embedding usage for intelligent caching, while the flexible alerting system ensures proactive issue detection.
Key monitoring capabilities include:
- Usage Analytics: Search patterns, content popularity, embedding efficiency
- Performance Metrics: Processing times, error rates, system resource usage
- Health Monitoring: Database status, connection pools, job queue health
- External Integrations: Prometheus, Grafana, New Relic, and custom solutions
- Proactive Alerting: Threshold-based alerts with escalation policies
- Troubleshooting Tools: Diagnostic procedures and automated recovery
All monitoring data leverages PostgreSQL's built-in statistics and pgvector optimization for minimal performance impact while providing maximum visibility into system behavior.
This document is part of the Ragdoll documentation suite. For immediate help, see the Quick Start Guide or API Reference.