Unified Text-Based RAG
All media types, including images, audio, and documents, are converted to comprehensive text representations, enabling powerful cross-modal search through a single unified index.
Production Ready
Enterprise-grade features including AI-powered text conversion, PostgreSQL + pgvector, background processing, and comprehensive error handling.
Performance Optimized
Single embedding model for all content types, unified search index, and intelligent caching for scalable deployments.
Cross-Modal Search
Find images through descriptions, audio through transcripts, and documents through their content, all with unified semantic search.
Smart Conversion
AI-powered image descriptions, audio transcription, and intelligent text extraction with quality assessment.
Secure
Production security best practices, API key management, file validation, and comprehensive audit logging.
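The Unified Text-Based RAG and Cross-Modal Search features above rest on one idea: convert every item to text, then search a single index. What follows is a minimal plain-Ruby sketch of that idea, not Ragdoll's implementation. The converter table, the item fields (`:description`, `:transcript`, `:body`), and the `build_index`/`search` helpers are hypothetical. Token overlap (Jaccard similarity) stands in for the embedding similarity that a real deployment would compute.

```ruby
require "set"

# Hypothetical converters: stand-ins for AI image description, audio
# transcription, and document text extraction. Not Ragdoll's API.
CONVERTERS = {
  image:    ->(item) { item[:description] },
  audio:    ->(item) { item[:transcript] },
  document: ->(item) { item[:body] }
}.freeze

# Lowercased alphanumeric tokens as a set.
def tokens(text)
  text.downcase.scan(/[a-z0-9]+/).to_set
end

# Every item, whatever its media type, becomes one text entry in one index.
def build_index(items)
  items.map do |item|
    text = CONVERTERS.fetch(item[:type]).call(item)
    { id: item[:id], type: item[:type], text: text, tokens: tokens(text) }
  end
end

# Score entries by Jaccard overlap with the query (a stand-in for
# embedding similarity) and return [id, type, score] tuples.
def search(index, query, limit: 3)
  q = tokens(query)
  index
    .map { |entry| [entry, (q & entry[:tokens]).size.fdiv((q | entry[:tokens]).size)] }
    .select { |_, score| score.positive? }
    .sort_by { |_, score| -score }
    .first(limit)
    .map { |entry, score| [entry[:id], entry[:type], score.round(3)] }
end

items = [
  { id: 1, type: :image,    description: "A red bicycle leaning against a brick wall" },
  { id: 2, type: :audio,    transcript: "In this podcast we discuss bicycle maintenance" },
  { id: 3, type: :document, body: "Quarterly revenue report for the finance team" }
]

index = build_index(items)
search(index, "red bicycle").each { |id, type, score| puts "#{id} #{type} #{score}" }
```

Every entry is plain text, so a query such as "red bicycle" can match an image description and an audio transcript through the same scoring path. Replacing the token sets with embedding vectors keeps that structure intact.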
What's New
Release v0.1.10
The v0.1.10 gem release includes:
- Updated API documentation and RDoc coverage
- Improved command-line interface with enhanced commands
- Bug fixes and performance improvements
Search Tracking System (v0.1.9)
Ragdoll now includes comprehensive search tracking and analytics capabilities:
- Automatic Search Recording: All searches are automatically tracked with query embeddings, execution times, and result metrics
- Search Similarity Analysis: Find similar searches using vector similarity on query embeddings
- Click-Through Tracking: Monitor user engagement with search results
- Performance Analytics: Track slow queries, execution times, and search patterns
- Session & User Tracking: Associate searches with sessions and users for behavior analysis
- Automatic Cleanup: Orphaned and old unused searches are automatically cleaned up
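Search Similarity Analysis compares query embeddings. The plain-Ruby sketch below illustrates that comparison with cosine similarity. The record shape (`:query`, `:embedding`), the threshold and limit defaults, and the helper names are hypothetical. In a PostgreSQL + pgvector deployment, this comparison would more typically run inside the database than in application code.

```ruby
# Hypothetical sketch: find tracked searches whose query embeddings are
# close to a new query embedding. Not Ragdoll's actual API.

# Cosine similarity between two equal-length numeric vectors.
def cosine_similarity(a, b)
  raise ArgumentError, "dimension mismatch" unless a.size == b.size

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Past searches at or above the threshold, most similar first.
def similar_searches(query_embedding, past_searches, threshold: 0.8, limit: 5)
  past_searches
    .map { |s| [s, cosine_similarity(query_embedding, s[:embedding])] }
    .select { |_, score| score >= threshold }
    .sort_by { |_, score| -score }
    .first(limit)
end

searches = [
  { query: "pgvector setup",      embedding: [0.9, 0.1, 0.0] },
  { query: "install postgres",    embedding: [0.8, 0.2, 0.1] },
  { query: "audio transcription", embedding: [0.0, 0.1, 0.9] }
]

similar_searches([1.0, 0.1, 0.0], searches).each do |search, score|
  puts format("%-20s %.3f", search[:query], score)
end
```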
Learn more about Search Tracking →
Documentation Overview
Getting Started
- Quick Start Guide - Get up and running with Ragdoll in minutes
- Installation & Setup - Complete installation and environment setup
- Configuration Guide - Comprehensive configuration system documentation
Core Architecture
- Architecture Overview - System design and component relationships
- Unified Text RAG - Cross-modal search through text conversion
- Database Schema - Polymorphic multi-modal database design
- Background Processing - ActiveJob integration and async operations
Features & Capabilities
- Document Processing - File parsing, metadata extraction, and content analysis
- Search & Analytics - Advanced semantic search with usage analytics
- Embedding System - Vector generation and similarity search
- File Upload System - Shrine-based production file handling
API Documentation
- Client API Reference - High-level client interface methods
- Models Reference - ActiveRecord models and relationships
- Services Reference - Business logic and processing services
- Jobs Reference - Background job system
Deployment & Operations
- Production Deployment - Production setup with PostgreSQL + pgvector
- Performance Tuning - Optimization strategies and monitoring
- Monitoring & Analytics - Usage tracking and system health
- Troubleshooting - Common issues and solutions
Advanced Topics
- LLM Integration - Multiple provider support and configuration
- Metadata Schemas - Structured content analysis and validation
- Extending the System - Adding new content types and processors
- Security Considerations - Production security best practices
Development
- Development Setup - Setting up development environment
- Testing Guide - Running tests and coverage analysis
- Contributing - Guidelines for contributing to the project
What Makes Ragdoll Special
Ragdoll is not just a "simple RAG library"; it is a production-ready document intelligence platform with enterprise-grade features:
Multi-Modal First
Unlike most RAG systems, which retrofit multi-modal support, Ragdoll was designed from the ground up to handle text, image, and audio content as first-class citizens through a sophisticated polymorphic architecture.
Sophisticated Architecture
- Dual Metadata Design: Separates LLM-generated content analysis from system file properties
- Polymorphic Database Schema: Unified search across all content types
- Background Processing: Complete ActiveJob integration for scalable operations
- Production File Handling: Shrine-based upload system with validation
Advanced Analytics
- Usage Tracking: Sophisticated ranking algorithms based on frequency and recency
- Performance Monitoring: Built-in analytics for search patterns and system health
- Smart Ranking: Combines similarity scores with usage analytics for better results
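Smart Ranking blends similarity with usage analytics. The sketch below shows one common way to do that: a weighted sum of similarity, log-damped access frequency, and exponentially decaying recency. The weights, the 7-day half-life, and the field names are illustrative assumptions. This is not Ragdoll's ranking formula.

```ruby
# Hypothetical sketch: blend vector similarity with usage signals.
# Field names, weights, and the 7-day half-life are illustrative
# assumptions, not Ragdoll's actual ranking algorithm.

# Recency in (0, 1]: 1.0 for content used today, halving every half_life days.
def recency_score(days_since_access, half_life: 7.0)
  0.5**(days_since_access / half_life)
end

# Frequency in [0, 1]: log-damped access count, normalized against the
# most-accessed result so it stays on the same scale as similarity.
def frequency_score(access_count, max_count)
  return 0.0 if max_count.zero?

  Math.log(1 + access_count) / Math.log(1 + max_count)
end

def usage_aware_score(result, max_count, w_sim: 0.7, w_freq: 0.15, w_rec: 0.15)
  w_sim * result[:similarity] +
    w_freq * frequency_score(result[:access_count], max_count) +
    w_rec * recency_score(result[:days_since_access])
end

# Sort results by descending blended score.
def rank(results)
  max_count = results.map { |r| r[:access_count] }.max || 0
  results.sort_by { |r| -usage_aware_score(r, max_count) }
end

results = [
  { id: "a", similarity: 0.82, access_count: 40,  days_since_access: 1 },
  { id: "b", similarity: 0.85, access_count: 2,   days_since_access: 90 },
  { id: "c", similarity: 0.60, access_count: 100, days_since_access: 0 }
]

rank(results).each { |r| puts r[:id] }
```

In this sample, result `b` has the highest raw similarity but ranks last, because it is rarely accessed and was last used 90 days ago. Log damping and normalization keep frequency on the same 0 to 1 scale as similarity, so the weights stay interpretable. They would still need tuning against real search logs.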
Enterprise Features
- 7 LLM Providers: OpenAI, Anthropic, Google, Azure, Ollama, HuggingFace, OpenRouter
- Production Database Support: PostgreSQL + pgvector
- Comprehensive Error Handling: Custom exception hierarchy with detailed logging
- Health Monitoring: System diagnostics and status reporting
Performance Optimized
- pgvector Integration: Hardware-accelerated vector operations
- Intelligent Indexing: Optimized database indexes for fast search
- Background Processing: Non-blocking document processing
- Connection Pooling: Scalable database connections
Documentation Philosophy
This documentation is implementation-driven: every feature documented here is fully implemented and tested. We believe in accurate documentation that matches the system's actual capabilities.
What You'll Find Here:
- Accurate Examples: All code examples are tested and working
- Production-Ready Guidance: Real-world deployment and optimization advice
- Complete Feature Coverage: Documentation for all implemented features
- Advanced Use Cases: Enterprise scenarios and complex integrations
What You Won't Find:
- Vapor Features: We don't document features that don't exist
- Oversimplified Examples: Our examples reflect real-world complexity
- Marketing Fluff: Technical accuracy over marketing copy
Getting Help
Documentation Issues
If you find any discrepancies between the documentation and actual implementation, please file an issue. We maintain strict accuracy standards.
Feature Requests
Ragdoll may already support the capability you need. Before requesting a feature, check whether it exists by reviewing the complete documentation.
Support Channels
- GitHub Issues: Bug reports and feature requests
- Documentation: Comprehensive guides and references
- Code Examples: Working examples for all major features
Quick Navigation
New to Ragdoll? Start with:
- Quick Start Guide - Basic usage in 5 minutes
- Architecture Overview - Understand the system design
- Unified Text RAG - See what makes us different
Ready for Production? Focus on:
- Production Deployment - PostgreSQL setup
- Configuration Guide - Enterprise configuration
- Performance Tuning - Optimization strategies
Integrating with Existing Systems? Review:
- API Reference - Client interface methods
- LLM Integration - Provider configuration
- Security Considerations - Production security
This documentation is intended to reflect the actual implementation of Ragdoll v0.1.12 and should be updated with each release to maintain accuracy.