Client API Reference¶
The Ragdoll Client provides a high-level interface for all document intelligence operations, covering document management, search, RAG enhancement, and system monitoring through a single, consistent API.
Detailed API Documentation
For complete class and method documentation, see the Ruby API Documentation (RDoc) which provides detailed technical reference for all Ragdoll classes and methods.
Overview¶
Schema Note: Following a recent schema optimization, the Ragdoll database uses a normalized schema in which embedding_model information is stored in content-specific tables rather than duplicated on individual embeddings. This improves data organization while preserving all API functionality.
The Client class serves as the primary orchestration layer, providing:
- Document Lifecycle Management: Add, update, delete, and monitor documents
- Multi-Modal Content Support: Handle text, image, and audio content seamlessly
- Advanced Search Operations: Semantic, full-text, and hybrid search capabilities
- RAG Enhancement: Context retrieval and prompt enhancement for LLM applications
- System Analytics: Health monitoring, usage statistics, and performance metrics
- Background Processing: Asynchronous job management and status tracking
Client Initialization¶
Basic Initialization¶
# Set up configuration first
Ragdoll::Core.configure do |config|
  config.llm_providers[:openai][:api_key] = ENV['OPENAI_API_KEY']
  config.models[:embedding][:text] = 'openai/text-embedding-3-small'
  config.models[:text_generation][:default] = 'openai/gpt-4o-mini'
  config.database = {
    adapter: 'postgresql',
    database: 'ragdoll_development',
    username: 'ragdoll',
    password: ENV['RAGDOLL_DATABASE_PASSWORD'],
    host: 'localhost',
    port: 5432,
    auto_migrate: true
  }
end

# Initialize client (uses global configuration)
client = Ragdoll::Core::Client.new
Document Management¶
Adding Documents¶
Single Document Addition¶
# Add document from file path
result = client.add_document(path: 'research_paper.pdf')
# Returns:
# {
#   success: true,
#   document_id: "123",
#   title: "research_paper",
#   document_type: "pdf",
#   content_length: 15000,
#   embeddings_queued: true,
#   message: "Document 'research_paper' added successfully with ID 123"
# }
Text Content Addition¶
# Add raw text content
doc_id = client.add_text(
  content: "This is the text content to be processed...",
  title: "Text Document",
  source: 'user_input',
  language: 'en'
)

# Add text with custom chunking
doc_id = client.add_text(
  content: long_text_content,
  title: "Long Text Document",
  chunk_size: 800,
  chunk_overlap: 150
)
Directory Processing¶
# Process entire directory
results = client.add_directory(
  path: '/path/to/documents',
  recursive: true
)

# Returns array of results:
# [
#   { file: "/path/to/doc1.pdf", document_id: "123", status: "success" },
#   { file: "/path/to/doc2.txt", document_id: "124", status: "success" },
#   { file: "/path/to/bad.doc", error: "Unsupported format", status: "error" }
# ]
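Because add_directory returns per-file results rather than raising on the first failure, callers typically partition the array by status and report failures separately. A sketch using the result shape shown above:

```ruby
# Sample results in the shape returned by add_directory
results = [
  { file: "/path/to/doc1.pdf", document_id: "123", status: "success" },
  { file: "/path/to/doc2.txt", document_id: "124", status: "success" },
  { file: "/path/to/bad.doc", error: "Unsupported format", status: "error" }
]

# Split successful imports from failures
succeeded, failed = results.partition { |r| r[:status] == "success" }

failed.each { |r| warn "#{r[:file]}: #{r[:error]}" }
puts "Imported #{succeeded.size} of #{results.size} files"
```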
Document Retrieval¶
Single Document Retrieval¶
# Get document by ID
document = client.get_document(id: "123")
# Returns document hash with basic information
# Get document status
status = client.document_status(id: "123")
# Returns:
# {
#   id: "123",
#   title: "research_paper",
#   status: "processed",
#   embeddings_count: 24,
#   embeddings_ready: true,
#   content_preview: "This research paper discusses...",
#   message: "Document processed successfully with 24 embeddings"
# }
Document Listing¶
# List documents with basic options
documents = client.list_documents
# List with options
documents = client.list_documents(
  status: 'processed',
  document_type: 'pdf',
  limit: 20
)
Document Updates¶
# Update document metadata
result = client.update_document(
  id: "123",
  title: "Updated Title",
  metadata: {
    category: 'technical',
    priority: 'high',
    last_reviewed: Date.current
  }
)

# Reprocess document with new settings
result = client.reprocess_document(
  id: "123",
  chunk_size: 1200,
  regenerate_embeddings: true,
  update_summary: true,
  extract_keywords: true
)
Document Deletion¶
# Delete single document
result = client.delete_document(id: "123")
# Returns:
# {
#   success: true,
#   message: "Document 123 and associated content deleted successfully",
#   deleted_embeddings: 24,
#   deleted_content_items: 4
# }

# Delete multiple documents
result = client.delete_documents(ids: ["123", "124", "125"])

# Delete with criteria
result = client.delete_documents(
  status: 'error',
  created_before: 1.month.ago,
  confirm: true # Safety check for bulk deletion
)
Search Operations¶
Basic Search¶
# Simple semantic search
results = client.search(query: "machine learning algorithms")
# Search with options
results = client.search(
  query: "neural network architectures",
  limit: 25,
  threshold: 0.8
)

# Returns:
# {
#   query: "neural network architectures",
#   results: [
#     {
#       embedding_id: "456",
#       document_id: "123",
#       document_title: "Deep Learning Fundamentals",
#       document_location: "/path/to/document.pdf",
#       content: "Neural networks are computational models...",
#       similarity: 0.92,
#       chunk_index: 3,
#       usage_count: 5
#     }
#   ],
#   total_results: 15
# }

# Hybrid search (semantic + full-text)
results = client.hybrid_search(
  query: "neural networks",
  semantic_weight: 0.7,
  text_weight: 0.3
)
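Conceptually, hybrid search blends two rankings: each candidate's final score is a weighted sum of its semantic similarity and its full-text relevance, with the two weights summing to 1. A sketch of the blending step (this linear-combination formula is a common convention for hybrid scoring, not necessarily Ragdoll's exact internals):

```ruby
# Blend a semantic similarity score and a full-text relevance score,
# both assumed to be normalized to the 0.0..1.0 range.
def hybrid_score(semantic_score, text_score, semantic_weight: 0.7, text_weight: 0.3)
  (semantic_weight * semantic_score) + (text_weight * text_score)
end

hybrid_score(0.92, 0.40) # ≈ 0.764
```

Raising semantic_weight favors conceptual matches; raising text_weight favors exact keyword matches.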
Search Analytics¶
# Simple search analytics available
analytics = client.search_analytics(days: 30)
# Returns basic usage statistics
RAG Enhancement¶
Context Retrieval¶
# Get relevant context for a query
context = client.get_context(
  query: "How do neural networks learn?",
  limit: 5
)

# Returns:
# {
#   context_chunks: [
#     {
#       content: "Neural networks learn through backpropagation...",
#       source: "/path/to/textbook.pdf",
#       similarity: 0.91,
#       chunk_index: 3
#     }
#   ],
#   combined_context: "Neural networks learn through backpropagation...",
#   total_chunks: 5
# }
Prompt Enhancement¶
# Enhance prompt with relevant context
enhanced = client.enhance_prompt(
  prompt: "Explain how neural networks learn",
  context_limit: 5
)

# Returns:
# {
#   enhanced_prompt: "You are an AI assistant. Use the following context...",
#   original_prompt: "Explain how neural networks learn",
#   context_sources: ["/path/to/textbook.pdf", "/path/to/paper.pdf"],
#   context_count: 3
# }
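Under the hood, prompt enhancement amounts to interpolating the retrieved chunks into a template ahead of the user's question. A minimal sketch of such a template, built from chunks in the shape returned by get_context (the template wording and the build_enhanced_prompt helper are illustrative, not Ragdoll's actual template):

```ruby
# Hypothetical helper: interpolate retrieved chunks into a prompt template,
# attributing each chunk to its source document for transparency.
def build_enhanced_prompt(prompt, chunks)
  context = chunks.map { |c| "[#{c[:source]}]\n#{c[:content]}" }.join("\n\n")
  <<~PROMPT
    You are an AI assistant. Use the following context to answer the question.

    Context:
    #{context}

    Question: #{prompt}
  PROMPT
end

chunks = [
  { content: "Neural networks learn through backpropagation...",
    source: "/path/to/textbook.pdf" }
]
puts build_enhanced_prompt("Explain how neural networks learn", chunks)
```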
Document Status and Monitoring¶
Processing Status¶
# Check document processing status
status = client.document_status(id: "123")
# Returns:
# {
#   document_id: "123",
#   status: "processing", # pending, processing, processed, error
#   progress: 65, # Percentage complete
#   message: "Generating embeddings (15/24 chunks complete)",
#   jobs_queued: 2,
#   jobs_completed: 6,
#   estimated_completion: "2024-01-15T10:30:00Z",
#   processing_steps: {
#     text_extraction: "completed",
#     embedding_generation: "in_progress",
#     keyword_extraction: "queued",
#     summary_generation: "queued"
#   },
#   error_details: nil
# }
# Batch status check
statuses = client.batch_document_status(ids: ["123", "124", "125"])
Background Job Management¶
# Monitor background jobs
job_status = client.job_status
# Returns:
# {
#   active_jobs: 12,
#   queued_jobs: 5,
#   failed_jobs: 1,
#   queue_status: {
#     embeddings: { size: 3, latency: 2.5 },
#     processing: { size: 2, latency: 1.1 },
#     analysis: { size: 0, latency: 0.0 }
#   },
#   worker_status: {
#     busy_workers: 4,
#     idle_workers: 2,
#     total_workers: 6
#   }
# }

# Retry failed jobs
result = client.retry_failed_jobs(
  job_types: ['GenerateEmbeddingsJob'],
  older_than: 1.hour.ago
)
System Operations¶
Health Monitoring¶
# Simple health check
healthy = client.healthy?
# Returns true/false
# Basic system statistics
stats = client.stats
# Returns document statistics hash
Error Handling¶
Standard Error Response Format¶
# All methods return structured error information on failure
begin
  result = client.add_document(path: 'nonexistent.pdf')
rescue Ragdoll::Core::DocumentNotFoundError => e
  puts e.message    # "File not found: nonexistent.pdf"
  puts e.error_code # "DOCUMENT_NOT_FOUND"
  puts e.details    # { path: 'nonexistent.pdf', checked_locations: [...] }
end

# Check for errors in response
result = client.search(query: "test")
if result[:success]
  # Process results
  puts result[:results]
else
  # Handle error
  puts result[:error]
  puts result[:error_code]
  puts result[:details]
end
Common Error Types¶
# Document processing errors
Ragdoll::Core::DocumentNotFoundError
Ragdoll::Core::UnsupportedDocumentTypeError
Ragdoll::Core::DocumentProcessingError
# Search errors
Ragdoll::Core::InvalidQueryError
Ragdoll::Core::SearchTimeoutError
Ragdoll::Core::EmbeddingGenerationError
# Configuration errors
Ragdoll::Core::ConfigurationError
Ragdoll::Core::InvalidProviderError
Ragdoll::Core::MissingCredentialsError
# System errors
Ragdoll::Core::DatabaseConnectionError
Ragdoll::Core::BackgroundJobError
Ragdoll::Core::SystemHealthError
Best Practices¶
1. Error Handling¶
- Always check for errors in responses
- Implement retry logic for transient failures
- Log errors with sufficient context for debugging
- Use appropriate error handling strategies for different error types
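For transient failures such as timeouts or dropped connections, a small retry helper with exponential backoff is usually enough. A sketch (the with_retries helper is illustrative, and which error classes count as retryable is an assumption to adjust for your deployment):

```ruby
# Retry a block up to max_attempts times, doubling the delay each time.
def with_retries(max_attempts: 3, base_delay: 0.5, retry_on: [StandardError])
  attempts = 0
  begin
    attempts += 1
    yield
  rescue *retry_on
    raise if attempts >= max_attempts
    sleep(base_delay * (2**(attempts - 1))) # 0.5s, 1s, 2s, ...
    retry
  end
end

# Usage, retrying only on the timeout error listed under Common Error Types:
# result = with_retries(retry_on: [Ragdoll::Core::SearchTimeoutError]) do
#   client.search(query: "neural networks")
# end
```

Non-retryable errors (bad input, missing credentials) re-raise immediately because they are not in the retry_on list.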
2. Performance Optimization¶
- Use batch operations for multiple documents
- Implement appropriate caching for frequently accessed data
- Monitor search performance and adjust thresholds accordingly
- Use background processing for large document collections
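For frequently repeated queries, caching search responses for a short time-to-live avoids redundant embedding calls. A minimal in-process sketch (the TtlCache class is illustrative; a production deployment would more likely use Redis or Rails.cache):

```ruby
# Minimal in-memory cache: entries expire after `ttl` seconds.
class TtlCache
  def initialize(ttl: 300)
    @ttl = ttl
    @store = {}
  end

  # Return the cached value for `key`, or compute and store it via the block.
  def fetch(key)
    entry = @store[key]
    return entry[:value] if entry && Time.now - entry[:at] < @ttl

    value = yield
    @store[key] = { value: value, at: Time.now }
    value
  end
end

cache = TtlCache.new(ttl: 300)
# results = cache.fetch("machine learning algorithms") do
#   client.search(query: "machine learning algorithms")
# end
```

Note this cache is per-process and unbounded; cap its size or use a shared store if queries are diverse or traffic is high.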
3. Search Strategy¶
- Start with default settings and tune based on results quality
- Use faceted search to help users navigate large result sets
- Implement search suggestions to improve user experience
- Monitor search analytics to understand usage patterns
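One practical tuning pattern is a threshold fallback: query with a strict similarity threshold first, then relax it when too few results come back. A sketch against the search result shape shown earlier (the search_with_fallback helper and its threshold ladder are illustrative defaults, not part of the API):

```ruby
# Try progressively looser thresholds until enough results are found.
def search_with_fallback(client, query, thresholds: [0.8, 0.65, 0.5], min_results: 3)
  best = []
  thresholds.each do |t|
    results = client.search(query: query, threshold: t)[:results] || []
    best = results if results.size > best.size
    return results if results.size >= min_results
  end
  best # fewer than min_results everywhere: return the largest set seen
end
```

This keeps high-precision results when they exist while still returning something useful for sparse queries.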
4. RAG Implementation¶
- Choose appropriate context limits based on your LLM's capabilities
- Use prompt templates for consistent formatting
- Include source attribution for transparency
- Monitor RAG response quality and adjust context retrieval accordingly
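Choosing a context limit ultimately means keeping the combined context inside the LLM's context window. A rough sketch that trims ranked chunks to a token budget (the fit_context helper is illustrative, and the 4-characters-per-token rule is a common approximation, not an exact tokenizer):

```ruby
# Keep chunks, in rank order, until the estimated token budget is exhausted.
def fit_context(chunks, max_tokens: 2000, chars_per_token: 4)
  budget = max_tokens * chars_per_token
  kept = []
  used = 0
  chunks.each do |c|
    break if used + c[:content].length > budget
    kept << c
    used += c[:content].length
  end
  kept
end

chunks = [
  { content: "x" * 5000, similarity: 0.91 },
  { content: "y" * 4000, similarity: 0.85 },
  { content: "z" * 3000, similarity: 0.80 }
]
fit_context(chunks) # keeps only what fits in ~8000 characters
```

Stopping at the first chunk that overflows (rather than skipping it) preserves the similarity ranking of the context that is sent.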
The Client API provides a comprehensive interface for building sophisticated document intelligence applications with Ragdoll, offering both simplicity for basic use cases and advanced features for complex requirements.