🎪 Ragdoll Documentation

🔄 Unified Text-Based RAG

All media types—images, audio, documents—are converted to comprehensive text representations, enabling powerful cross-modal search through a single unified index.

🏗️ Production Ready

Enterprise-grade features including AI-powered text conversion, PostgreSQL + pgvector, background processing, and comprehensive error handling.

⚡ Performance Optimized

Single embedding model for all content types, unified search index, and intelligent caching for scalable deployments.

📊 Cross-Modal Search

Find images through descriptions, audio through transcripts, and documents through content—all with unified semantic search.

🔧 Smart Conversion

AI-powered image descriptions, audio transcription, and intelligent text extraction with quality assessment.

🛡️ Secure

Production security best practices, API key management, file validation, and comprehensive audit logging.

🆕 What's New

Latest Release (v0.1.10)

The latest gem release includes:

Updated API documentation and RDoc coverage
Improved command-line interface with enhanced commands
Bug fixes and performance improvements

Search Tracking System (v0.1.9)

Ragdoll now includes comprehensive search tracking and analytics capabilities:

Automatic Search Recording: All searches are automatically tracked with query embeddings, execution times, and result metrics
Search Similarity Analysis: Find similar searches using vector similarity on query embeddings
Click-Through Tracking: Monitor user engagement with search results
Performance Analytics: Track slow queries, execution times, and search patterns
Session & User Tracking: Associate searches with sessions and users for behavior analysis
Automatic Cleanup: Orphaned and old unused searches are automatically cleaned up
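
The similarity analysis described above can be pictured with a small, self-contained sketch. It is not Ragdoll's implementation: the method names, the hash layout of a recorded search, the 0.8 threshold, and the three-dimensional toy embeddings are all assumptions made for illustration. In a PostgreSQL + pgvector deployment the comparison would normally run inside the database rather than in Ruby.

```ruby
# Hypothetical illustration; none of these names come from Ragdoll's API.

# Cosine similarity between two embedding vectors of equal length.
def cosine_similarity(a, b)
  raise ArgumentError, "vectors must have the same length" unless a.length == b.length

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Return the recorded searches whose query embedding is closest to the
# given embedding, most similar first.
def similar_searches(query_embedding, past_searches, limit: 3, threshold: 0.8)
  past_searches
    .map { |s| s.merge(similarity: cosine_similarity(query_embedding, s[:embedding])) }
    .select { |s| s[:similarity] >= threshold }
    .sort_by { |s| -s[:similarity] }
    .first(limit)
end

# Toy data: real embeddings have hundreds or thousands of dimensions.
past_searches = [
  { query: "ruby vector search", embedding: [0.9, 0.1, 0.0] },
  { query: "postgres indexing",  embedding: [0.1, 0.9, 0.1] },
  { query: "pgvector in ruby",   embedding: [0.8, 0.3, 0.0] }
]

matches = similar_searches([1.0, 0.2, 0.0], past_searches)
matches.each { |m| puts format("%-20s %.3f", m[:query], m[:similarity]) }
```

With this toy data, the two Ruby-related queries pass the threshold and the PostgreSQL indexing query is filtered out.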

Learn more about Search Tracking →

📚 Documentation Overview

Getting Started

Quick Start Guide - Get up and running with Ragdoll in minutes
Installation & Setup - Complete installation and environment setup
Configuration Guide - Comprehensive configuration system documentation

Core Architecture

Architecture Overview - System design and component relationships
Unified Text RAG - Cross-modal search through text conversion
Database Schema - Polymorphic multi-modal database design
Background Processing - ActiveJob integration and async operations

Features & Capabilities

Document Processing - File parsing, metadata extraction, and content analysis
Search & Analytics - Advanced semantic search with usage analytics
Embedding System - Vector generation and similarity search
File Upload System - Shrine-based production file handling

API Documentation

Client API Reference - High-level client interface methods
Models Reference - ActiveRecord models and relationships
Services Reference - Business logic and processing services
Jobs Reference - Background job system

Deployment & Operations

Production Deployment - Production setup with PostgreSQL + pgvector
Performance Tuning - Optimization strategies and monitoring
Monitoring & Analytics - Usage tracking and system health
Troubleshooting - Common issues and solutions

Advanced Topics

LLM Integration - Multiple provider support and configuration
Metadata Schemas - Structured content analysis and validation
Extending the System - Adding new content types and processors
Security Considerations - Production security best practices

Development

Development Setup - Setting up development environment
Testing Guide - Running tests and coverage analysis
Contributing - Guidelines for contributing to the project

🚀 What Makes Ragdoll Special

Ragdoll is more than a "simple RAG library": it is a production-ready document intelligence platform with enterprise-grade features.

Unlike most RAG systems that retrofit multi-modal support, Ragdoll was designed from the ground up to handle text, image, and audio content as first-class citizens through a sophisticated polymorphic architecture.

🏗️ Sophisticated Architecture

Dual Metadata Design: Separates LLM-generated content analysis from system file properties
Polymorphic Database Schema: Unified search across all content types
Background Processing: Complete ActiveJob integration for scalable operations
Production File Handling: Shrine-based upload system with validation
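
As a rough picture of how text conversion and the dual metadata split fit together, the sketch below routes each file to a converter by extension and stores the resulting text separately from system file properties. Everything in it is hypothetical: the converter lambdas are stand-ins for a vision model, a speech-to-text service, and a document parser, and the entry fields are invented for this example rather than taken from Ragdoll's schema.

```ruby
# Hypothetical sketch; converter names and entry fields are assumptions.

# Stand-in converters. A real system would call a vision model, a
# transcription service, or a document parser here.
converters = {
  image:    ->(path) { "Description of image #{File.basename(path)}" },
  audio:    ->(path) { "Transcript of audio #{File.basename(path)}" },
  document: ->(path) { "Extracted text of #{File.basename(path)}" }
}

# Map file extensions to a media type.
media_types = {
  ".png" => :image,    ".jpg" => :image,
  ".mp3" => :audio,    ".wav" => :audio,
  ".pdf" => :document, ".txt" => :document
}

# Convert one file into a text entry for a single shared index.
def to_index_entry(path, converters, media_types)
  ext = File.extname(path).downcase
  media_type = media_types.fetch(ext) { raise ArgumentError, "unsupported file type: #{ext}" }

  {
    path: path,
    media_type: media_type,
    text: converters.fetch(media_type).call(path), # the part that is embedded and searched
    file_metadata: { extension: ext }              # system file properties, kept separate
  }
end

index = %w[photo.png talk.mp3 report.pdf].map { |p| to_index_entry(p, converters, media_types) }
index.each { |e| puts "#{e[:media_type]}: #{e[:text]}" }
```

Because every entry ends up as text, one embedding model and one index can serve all three media types.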

📊 Advanced Analytics

Usage Tracking: Sophisticated ranking algorithms based on frequency and recency
Performance Monitoring: Built-in analytics for search patterns and system health
Smart Ranking: Combines similarity scores with usage analytics for better results
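
One way to combine similarity with frequency and recency is a weighted sum, sketched below. The weights, the 30-day half-life, the log-damped frequency term, and the field names are assumptions chosen for illustration; they are not Ragdoll's actual ranking formula.

```ruby
# Hypothetical sketch; weights, field names, and the decay formula are assumptions.

# Frequency component: log-damped so heavy use saturates, scaled to 0..1.
def frequency_score(usage_count, max_count)
  return 0.0 if max_count <= 0

  Math.log(1 + usage_count) / Math.log(1 + max_count)
end

# Recency component: exponential decay with a configurable half-life in days.
def recency_score(last_used_at, now, half_life_days: 30.0)
  return 0.0 if last_used_at.nil?

  age_days = (now - last_used_at) / 86_400.0
  0.5**(age_days / half_life_days)
end

# Weighted sum of vector similarity and the two usage signals.
def combined_score(result, now:, max_count:,
                   weights: { similarity: 0.7, frequency: 0.2, recency: 0.1 })
  weights[:similarity] * result[:similarity] +
    weights[:frequency] * frequency_score(result[:usage_count], max_count) +
    weights[:recency] * recency_score(result[:last_used_at], now)
end

now = Time.utc(2025, 1, 31)
results = [
  { id: "a", similarity: 0.82, usage_count: 0,  last_used_at: nil },
  { id: "b", similarity: 0.80, usage_count: 40, last_used_at: Time.utc(2025, 1, 30) }
]

max_count = results.map { |r| r[:usage_count] }.max
ranked = results.sort_by { |r| -combined_score(r, now: now, max_count: max_count) }
puts ranked.map { |r| r[:id] }.inspect
```

In this example the frequently and recently used result "b" outranks "a" despite a slightly lower similarity score. Keeping the similarity weight dominant means usage signals reorder near-ties without burying clearly better matches.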

🔧 Enterprise Features

7 LLM Providers: OpenAI, Anthropic, Google, Azure, Ollama, HuggingFace, OpenRouter
Production Database Support: PostgreSQL + pgvector
Comprehensive Error Handling: Custom exception hierarchy with detailed logging
Health Monitoring: System diagnostics and status reporting
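
A custom exception hierarchy with logging, as mentioned in the list above, generally looks something like the following sketch. The `DocPipeline` namespace, the class names, and the `process` and `safe_process` helpers are hypothetical and chosen only for this example; consult the API reference for Ragdoll's actual error classes.

```ruby
require "logger"
require "stringio"

# Hypothetical namespace and class names, not Ragdoll's.
module DocPipeline
  # Root of the hierarchy, so callers can rescue every library error at once.
  class Error < StandardError; end

  class ConfigurationError < Error; end

  # Processing errors carry the path of the file that failed.
  class ProcessingError < Error
    attr_reader :path

    def initialize(message, path: nil)
      super(message)
      @path = path
    end
  end

  class UnsupportedFormatError < ProcessingError; end
end

# Pretend processing step that rejects unknown file extensions.
def process(path)
  ext = File.extname(path).downcase
  unless %w[.txt .md .pdf].include?(ext)
    raise DocPipeline::UnsupportedFormatError.new("unsupported format #{ext.inspect}", path: path)
  end

  :ok
end

# Rescue at the granularity the caller cares about and log the details.
def safe_process(path, logger)
  process(path)
rescue DocPipeline::ProcessingError => e
  logger.error("#{e.class}: #{e.message} (path=#{e.path})")
  :failed
end

log_output = StringIO.new
logger = Logger.new(log_output)
status = safe_process("clip.xyz", logger)
puts status
```

Rescuing `ProcessingError` catches `UnsupportedFormatError` as well, while configuration problems still propagate to the caller.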

⚡ Performance Optimized

pgvector Integration: Hardware-accelerated vector operations
Intelligent Indexing: Optimized database indexes for fast search
Background Processing: Non-blocking document processing
Connection Pooling: Scalable database connections

📖 Documentation Philosophy

This documentation is implementation-driven: every feature documented here is fully implemented and tested. We believe in accurate documentation that matches the actual capabilities of the system.

What You'll Find Here:

✅ Accurate Examples: All code examples are tested and working
✅ Production-Ready Guidance: Real-world deployment and optimization advice
✅ Complete Feature Coverage: Documentation for all implemented features
✅ Advanced Use Cases: Enterprise scenarios and complex integrations

What You Won't Find:

❌ Vapor Features: We don't document features that don't exist
❌ Oversimplified Examples: Our examples reflect real-world complexity
❌ Marketing Fluff: Technical accuracy over marketing copy

🤝 Getting Help

Documentation Issues

If you find any discrepancy between the documentation and the actual implementation, please file an issue. We maintain strict accuracy standards.

Feature Requests

Ragdoll has many capabilities that are easy to overlook. Before requesting a feature, check whether it already exists by reviewing the complete documentation.

Support Channels

GitHub Issues: Bug reports and feature requests
Documentation: Comprehensive guides and references
Code Examples: Working examples for all major features

New to Ragdoll? Start with:

Quick Start Guide - Basic usage in 5 minutes
Architecture Overview - Understand the system design
Unified Text RAG - See what makes us different

Ready for Production? Focus on:

Production Deployment - PostgreSQL setup
Configuration Guide - Enterprise configuration
Performance Tuning - Optimization strategies

Integrating with Existing Systems? Review:

API Reference - Client interface methods
LLM Integration - Provider configuration
Security Considerations - Production security

This documentation is intended to reflect the actual implementation of Ragdoll v0.1.12 and should be updated with each release to maintain accuracy.