Client API Reference

The Ragdoll Client provides a high-level interface for all document intelligence operations. This comprehensive API handles document management, search operations, RAG enhancement, and system monitoring through a clean, intuitive interface.

Overview

Schema Note: Following a recent schema optimization, the Ragdoll database uses a normalized schema in which embedding_model information is stored in content-specific tables rather than duplicated on individual embedding rows. This improves data organization while preserving all API functionality.

The Client class serves as the primary orchestration layer, providing:

  • Document Lifecycle Management: Add, update, delete, and monitor documents
  • Multi-Modal Content Support: Handle text, image, and audio content seamlessly
  • Advanced Search Operations: Semantic, full-text, and hybrid search capabilities
  • RAG Enhancement: Context retrieval and prompt enhancement for LLM applications
  • System Analytics: Health monitoring, usage statistics, and performance metrics
  • Background Processing: Asynchronous job management and status tracking

Client Initialization

Basic Initialization

# Setup configuration first
Ragdoll::Core.configure do |config|
  config.llm_providers[:openai][:api_key] = ENV['OPENAI_API_KEY']
  config.models[:embedding][:text] = 'openai/text-embedding-3-small'
  config.models[:text_generation][:default] = 'openai/gpt-4o-mini'
  config.database = {
    adapter: 'postgresql',
    database: 'ragdoll_development',
    username: 'ragdoll',
    password: ENV['RAGDOLL_DATABASE_PASSWORD'],
    host: 'localhost',
    port: 5432,
    auto_migrate: true
  }
end

# Initialize client (uses global configuration)
client = Ragdoll::Core::Client.new
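
Once the client is initialized, a quick sanity check with healthy? (documented under System Operations below) confirms it can reach its database:

# Verify connectivity before queueing any work
raise "Ragdoll client is not healthy -- check database configuration" unless client.healthy?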

Document Management

Adding Documents

Single Document Addition

# Add document from file path
result = client.add_document(path: 'research_paper.pdf')
# Returns:
# {
#   success: true,
#   document_id: "123",
#   title: "research_paper",
#   document_type: "pdf",
#   content_length: 15000,
#   embeddings_queued: true,
#   message: "Document 'research_paper' added successfully with ID 123"
# }

Text Content Addition

# Add raw text content
doc_id = client.add_text(
  content: "This is the text content to be processed...",
  title: "Text Document",
  source: 'user_input',
  language: 'en'
)

# Add text with custom chunking
doc_id = client.add_text(
  content: long_text_content,
  title: "Long Text Document",
  chunk_size: 800,
  chunk_overlap: 150
)

Directory Processing

# Process entire directory
results = client.add_directory(
  path: '/path/to/documents',
  recursive: true
)

# Returns array of results:
# [
#   { file: "/path/to/doc1.pdf", document_id: "123", status: "success" },
#   { file: "/path/to/doc2.txt", document_id: "124", status: "success" },
#   { file: "/path/to/bad.doc", error: "Unsupported format", status: "error" }
# ]
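
Because directory processing continues past individual failures, it is worth scanning the returned array for errors. A minimal sketch over the documented result shape:

# Partition results by status and report any failures
failed = results.select { |r| r[:status] == 'error' }
failed.each { |r| puts "Failed to ingest #{r[:file]}: #{r[:error]}" }
puts "Ingested #{results.size - failed.size}/#{results.size} files"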

Document Retrieval

Single Document Retrieval

# Get document by ID
document = client.get_document(id: "123")
# Returns document hash with basic information

# Get document status
status = client.document_status(id: "123")
# Returns:
# {
#   id: "123",
#   title: "research_paper",
#   status: "processed",
#   embeddings_count: 24,
#   embeddings_ready: true,
#   content_preview: "This research paper discusses...",
#   message: "Document processed successfully with 24 embeddings"
# }

Document Listing

# List documents with basic options
documents = client.list_documents

# List with options
documents = client.list_documents(
  status: 'processed',
  document_type: 'pdf',
  limit: 20
)
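
Assuming each entry exposes the same keys shown for get_document and document_status (an assumption; the exact shape depends on your installed version), a short inventory loop looks like:

# Print a one-line summary per document (key names assumed)
documents.each do |doc|
  puts "#{doc[:id]}  #{doc[:title]}  (#{doc[:status]})"
end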

Document Updates

# Update document metadata
result = client.update_document(
  id: "123",
  title: "Updated Title",
  metadata: {
    category: 'technical',
    priority: 'high',
    last_reviewed: Date.current
  }
)

# Reprocess document with new settings
result = client.reprocess_document(
  id: "123",
  chunk_size: 1200,
  regenerate_embeddings: true,
  update_summary: true,
  extract_keywords: true
)

Document Deletion

# Delete single document
result = client.delete_document(id: "123")
# Returns:
# {
#   success: true,
#   message: "Document 123 and associated content deleted successfully",
#   deleted_embeddings: 24,
#   deleted_content_items: 4
# }

# Delete multiple documents
result = client.delete_documents(ids: ["123", "124", "125"])

# Delete with criteria
result = client.delete_documents(
  status: 'error',
  created_before: 1.month.ago,
  confirm: true  # Safety check for bulk deletion
)

Search Operations

# Simple semantic search
results = client.search(query: "machine learning algorithms")

# Search with options
results = client.search(
  query: "neural network architectures",
  limit: 25,
  threshold: 0.8
)

# Returns:
# {
#   query: "neural network architectures",
#   results: [
#     {
#       embedding_id: "456",
#       document_id: "123",
#       document_title: "Deep Learning Fundamentals",
#       document_location: "/path/to/document.pdf",
#       content: "Neural networks are computational models...",
#       similarity: 0.92,
#       chunk_index: 3,
#       usage_count: 5
#     }
#   ],
#   total_results: 15
# }
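
The results array carries enough metadata for source attribution. A small formatting sketch built on the documented keys:

# Print each hit with its similarity score and source location
results[:results].each do |hit|
  puts format('%.2f  %s (chunk %d)', hit[:similarity], hit[:document_title], hit[:chunk_index])
  puts "      #{hit[:document_location]}"
end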

# Hybrid search (semantic + full-text)
results = client.hybrid_search(
  query: "neural networks",
  semantic_weight: 0.7,
  text_weight: 0.3
)

Search Analytics

# Retrieve aggregate search analytics for a time window
analytics = client.search_analytics(days: 30)
# Returns a hash of basic usage statistics

RAG Enhancement

Context Retrieval

# Get relevant context for a query
context = client.get_context(
  query: "How do neural networks learn?",
  limit: 5
)

# Returns:
# {
#   context_chunks: [
#     {
#       content: "Neural networks learn through backpropagation...",
#       source: "/path/to/textbook.pdf",
#       similarity: 0.91,
#       chunk_index: 3
#     }
#   ],
#   combined_context: "Neural networks learn through backpropagation...",
#   total_chunks: 5
# }
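
The combined_context string can be dropped directly into a prompt you assemble yourself. A minimal sketch using the documented keys:

# Build a grounded prompt from the retrieved context
prompt = <<~PROMPT
  Answer using only the context below, and cite your sources.

  Context:
  #{context[:combined_context]}

  Question: How do neural networks learn?
PROMPT

# Collect unique sources for attribution
sources = context[:context_chunks].map { |c| c[:source] }.uniq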

Prompt Enhancement

# Enhance prompt with relevant context
enhanced = client.enhance_prompt(
  prompt: "Explain how neural networks learn",
  context_limit: 5
)

# Returns:
# {
#   enhanced_prompt: "You are an AI assistant. Use the following context...",
#   original_prompt: "Explain how neural networks learn",
#   context_sources: ["/path/to/textbook.pdf", "/path/to/paper.pdf"],
#   context_count: 3
# }
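
The enhanced prompt is plain text, so it can be passed to any LLM client. The llm object below is a placeholder, not part of Ragdoll:

# Hypothetical LLM call -- substitute your own client for `llm`
answer = llm.complete(enhanced[:enhanced_prompt])

puts answer
puts "Sources: #{enhanced[:context_sources].join(', ')}"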

Document Status and Monitoring

Processing Status

# Check document processing status
status = client.document_status(id: "123")
# Returns:
# {
#   id: "123",
#   status: "processing",  # pending, processing, processed, error
#   progress: 65,          # Percentage complete
#   message: "Generating embeddings (15/24 chunks complete)",
#   jobs_queued: 2,
#   jobs_completed: 6,
#   estimated_completion: "2024-01-15T10:30:00Z",
#   processing_steps: {
#     text_extraction: "completed",
#     embedding_generation: "in_progress",
#     keyword_extraction: "queued",
#     summary_generation: "queued"
#   },
#   error_details: nil
# }

# Batch status check
statuses = client.batch_document_status(ids: ["123", "124", "125"])
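
Because processing is asynchronous, a common pattern is to poll document_status until the document leaves the pipeline. A simple sketch (the poll interval is arbitrary):

# Poll until processing finishes or fails
loop do
  status = client.document_status(id: "123")
  break if %w[processed error].include?(status[:status])

  puts "#{status[:progress]}% -- #{status[:message]}"
  sleep 5
end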

Background Job Management

# Monitor background jobs
job_status = client.job_status
# Returns:
# {
#   active_jobs: 12,
#   queued_jobs: 5,
#   failed_jobs: 1,
#   queue_status: {
#     embeddings: { size: 3, latency: 2.5 },
#     processing: { size: 2, latency: 1.1 },
#     analysis: { size: 0, latency: 0.0 }
#   },
#   worker_status: {
#     busy_workers: 4,
#     idle_workers: 2,
#     total_workers: 6
#   }
# }

# Retry failed jobs
result = client.retry_failed_jobs(
  job_types: ['GenerateEmbeddingsJob'],
  older_than: 1.hour.ago
)
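
Combining job_status with retry_failed_jobs gives a small maintenance loop. A sketch over the two documented calls (assuming older_than may be passed without job_types):

# Requeue stale failures whenever any are reported
job_status = client.job_status
if job_status[:failed_jobs] > 0
  result = client.retry_failed_jobs(older_than: 1.hour.ago)
  puts "Requeued failed jobs: #{result.inspect}"
end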

System Operations

Health Monitoring

# Simple health check
healthy = client.healthy?
# Returns true/false

# Basic system statistics
stats = client.stats
# Returns document statistics hash
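
Both calls are cheap enough to wire into an external monitor. A minimal sketch:

# Minimal health-endpoint logic built on the two calls above
def ragdoll_health(client)
  return { status: 'down' } unless client.healthy?

  { status: 'up', stats: client.stats }
end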

Error Handling

Standard Error Response Format

# Failures raise typed errors that carry structured information
begin
  result = client.add_document(path: 'nonexistent.pdf')
rescue Ragdoll::Core::DocumentNotFoundError => e
  puts e.message  # "File not found: nonexistent.pdf"
  puts e.error_code  # "DOCUMENT_NOT_FOUND"
  puts e.details  # { path: 'nonexistent.pdf', checked_locations: [...] }
end

# Some methods also return structured error information in the response hash
result = client.search(query: "test")
if result[:success]
  # Process results
  puts result[:results]
else
  # Handle error
  puts result[:error]
  puts result[:error_code]
  puts result[:details]
end

Common Error Types

# Document processing errors
Ragdoll::Core::DocumentNotFoundError
Ragdoll::Core::UnsupportedDocumentTypeError
Ragdoll::Core::DocumentProcessingError

# Search errors
Ragdoll::Core::InvalidQueryError
Ragdoll::Core::SearchTimeoutError
Ragdoll::Core::EmbeddingGenerationError

# Configuration errors
Ragdoll::Core::ConfigurationError
Ragdoll::Core::InvalidProviderError
Ragdoll::Core::MissingCredentialsError

# System errors
Ragdoll::Core::DatabaseConnectionError
Ragdoll::Core::BackgroundJobError
Ragdoll::Core::SystemHealthError

Best Practices

1. Error Handling

  • Always check for errors in responses
  • Implement retry logic for transient failures, as in the sketch below
  • Log errors with sufficient context for debugging
  • Use appropriate error handling strategies for different error types
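
A retry sketch for transient failures, using error classes from Common Error Types above (backoff parameters are arbitrary):

# Retry transient search failures with exponential backoff
def search_with_retry(client, query, attempts: 3)
  delay = 1
  begin
    client.search(query: query)
  rescue Ragdoll::Core::SearchTimeoutError, Ragdoll::Core::DatabaseConnectionError => e
    attempts -= 1
    raise if attempts.zero?

    warn "Transient failure (#{e.class}), retrying in #{delay}s"
    sleep delay
    delay *= 2
    retry
  end
end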

2. Performance Optimization

  • Use batch operations for multiple documents
  • Implement appropriate caching for frequently accessed data
  • Monitor search performance and adjust thresholds accordingly
  • Use background processing for large document collections

3. Search Strategy

  • Start with default settings and tune based on results quality
  • Use faceted search to help users navigate large result sets
  • Implement search suggestions to improve user experience
  • Monitor search analytics to understand usage patterns

4. RAG Implementation

  • Choose appropriate context limits based on your LLM's capabilities
  • Use prompt templates for consistent formatting
  • Include source attribution for transparency
  • Monitor RAG response quality and adjust context retrieval accordingly

The Client API provides a comprehensive interface for building sophisticated document intelligence applications with Ragdoll, offering both simplicity for basic use cases and advanced features for complex requirements.