Client API Reference

The Ragdoll Client provides a high-level interface for all document intelligence operations: document management, search, RAG enhancement, and system monitoring behind a single, consistent API.

Detailed API Documentation

For complete class and method documentation, see the Ruby API Documentation (RDoc) which provides detailed technical reference for all Ragdoll classes and methods.

Overview

Schema Note: Following recent schema optimization, the Ragdoll database now uses a normalized schema where embedding_model information is stored in content-specific tables rather than duplicated in individual embeddings. This provides better data organization while maintaining all API functionality.

The Client class serves as the primary orchestration layer, providing:

  • Document Lifecycle Management: Add, update, delete, and monitor documents
  • Multi-Modal Content Support: Handle text, image, and audio content seamlessly
  • Advanced Search Operations: Semantic, full-text, and hybrid search capabilities
  • RAG Enhancement: Context retrieval and prompt enhancement for LLM applications
  • System Analytics: Health monitoring, usage statistics, and performance metrics
  • Background Processing: Asynchronous job management and status tracking

Client Initialization

Basic Initialization

# Setup configuration first
Ragdoll::Core.configure do |config|
  config.llm_providers[:openai][:api_key] = ENV['OPENAI_API_KEY']
  config.models[:embedding][:text] = 'openai/text-embedding-3-small'
  config.models[:text_generation][:default] = 'openai/gpt-4o-mini'
  config.database = {
    adapter: 'postgresql',
    database: 'ragdoll_development',
    username: 'ragdoll',
    password: ENV['RAGDOLL_DATABASE_PASSWORD'],
    host: 'localhost',
    port: 5432,
    auto_migrate: true
  }
end

# Initialize client (uses global configuration)
client = Ragdoll::Core::Client.new
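
In a long-running application you will typically configure Ragdoll once at boot and then share a single client instance. A minimal sketch (the RagdollClient wrapper module below is illustrative, not part of the library):

# Boot code (e.g. an initializer)
module RagdollClient
  # Memoize one client; it reads the global configuration set above
  def self.instance
    @instance ||= Ragdoll::Core::Client.new
  end
end

client = RagdollClient.instance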

Document Management

Adding Documents

Single Document Addition

# Add document from file path
result = client.add_document(path: 'research_paper.pdf')
# Returns:
# {
#   success: true,
#   document_id: "123",
#   title: "research_paper",
#   document_type: "pdf",
#   content_length: 15000,
#   embeddings_queued: true,
#   message: "Document 'research_paper' added successfully with ID 123"
# }

Text Content Addition

# Add raw text content
doc_id = client.add_text(
  content: "This is the text content to be processed...",
  title: "Text Document",
  source: 'user_input',
  language: 'en'
)

# Add text with custom chunking
doc_id = client.add_text(
  content: long_text_content,
  title: "Long Text Document",
  chunk_size: 800,
  chunk_overlap: 150
)

Directory Processing

# Process entire directory
results = client.add_directory(
  path: '/path/to/documents',
  recursive: true
)

# Returns array of results:
# [
#   { file: "/path/to/doc1.pdf", document_id: "123", status: "success" },
#   { file: "/path/to/doc2.txt", document_id: "124", status: "success" },
#   { file: "/path/to/bad.doc", error: "Unsupported format", status: "error" }
# ]
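
Because add_directory returns one entry per file, you can separate successes from failures and follow up on the failed files. A minimal sketch using only the fields shown above:

results = client.add_directory(path: '/path/to/documents', recursive: true)

# Split results by status and report failures for later retry
succeeded, failed = results.partition { |r| r[:status] == 'success' }

puts "Added #{succeeded.size} documents"
failed.each do |r|
  warn "Failed to add #{r[:file]}: #{r[:error]}"
end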

Document Retrieval

Single Document Retrieval

# Get document by ID
document = client.get_document(id: "123")
# Returns document hash with basic information

# Get document status
status = client.document_status(id: "123")
# Returns:
# {
#   id: "123",
#   title: "research_paper",
#   status: "processed",
#   embeddings_count: 24,
#   embeddings_ready: true,
#   content_preview: "This research paper discusses...",
#   message: "Document processed successfully with 24 embeddings"
# }

Document Listing

# List documents with basic options
documents = client.list_documents

# List with options
documents = client.list_documents(
  status: 'processed',
  document_type: 'pdf',
  limit: 20
)

Document Updates

# Update document metadata
result = client.update_document(
  id: "123",
  title: "Updated Title",
  metadata: {
    category: 'technical',
    priority: 'high',
    last_reviewed: Date.current
  }
)

# Reprocess document with new settings
result = client.reprocess_document(
  id: "123",
  chunk_size: 1200,
  regenerate_embeddings: true,
  update_summary: true,
  extract_keywords: true
)

Document Deletion

# Delete single document
result = client.delete_document(id: "123")
# Returns:
# {
#   success: true,
#   message: "Document 123 and associated content deleted successfully",
#   deleted_embeddings: 24,
#   deleted_content_items: 4
# }

# Delete multiple documents
result = client.delete_documents(ids: ["123", "124", "125"])

# Delete with criteria
result = client.delete_documents(
  status: 'error',
  created_before: 1.month.ago,
  confirm: true  # Safety check for bulk deletion
)

Search Operations

# Simple semantic search
results = client.search(query: "machine learning algorithms")

# Search with options
results = client.search(
  query: "neural network architectures",
  limit: 25,
  threshold: 0.8
)

# Returns:
# {
#   query: "neural network architectures",
#   results: [
#     {
#       embedding_id: "456",
#       document_id: "123",
#       document_title: "Deep Learning Fundamentals",
#       document_location: "/path/to/document.pdf",
#       content: "Neural networks are computational models...",
#       similarity: 0.92,
#       chunk_index: 3,
#       usage_count: 5
#     }
#   ],
#   total_results: 15
# }

# Hybrid search (semantic + full-text)
results = client.hybrid_search(
  query: "neural networks",
  semantic_weight: 0.7,
  text_weight: 0.3
)

Search Analytics

# Retrieve basic search analytics
analytics = client.search_analytics(days: 30)
# Returns basic usage statistics for the requested period

RAG Enhancement

Context Retrieval

# Get relevant context for a query
context = client.get_context(
  query: "How do neural networks learn?",
  limit: 5
)

# Returns:
# {
#   context_chunks: [
#     {
#       content: "Neural networks learn through backpropagation...",
#       source: "/path/to/textbook.pdf",
#       similarity: 0.91,
#       chunk_index: 3
#     }
#   ],
#   combined_context: "Neural networks learn through backpropagation...",
#   total_chunks: 5
# }

Prompt Enhancement

# Enhance prompt with relevant context
enhanced = client.enhance_prompt(
  prompt: "Explain how neural networks learn",
  context_limit: 5
)

# Returns:
# {
#   enhanced_prompt: "You are an AI assistant. Use the following context...",
#   original_prompt: "Explain how neural networks learn",
#   context_sources: ["/path/to/textbook.pdf", "/path/to/paper.pdf"],
#   context_count: 3
# }

Document Status and Monitoring

Processing Status

# Check document processing status
status = client.document_status(id: "123")
# Returns:
# {
#   document_id: "123",
#   status: "processing",  # pending, processing, processed, error
#   progress: 65,          # Percentage complete
#   message: "Generating embeddings (15/24 chunks complete)",
#   jobs_queued: 2,
#   jobs_completed: 6,
#   estimated_completion: "2024-01-15T10:30:00Z",
#   processing_steps: {
#     text_extraction: "completed",
#     embedding_generation: "in_progress",
#     keyword_extraction: "queued",
#     summary_generation: "queued"
#   },
#   error_details: nil
# }

# Batch status check
statuses = client.batch_document_status(ids: ["123", "124", "125"])
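
Because embeddings are generated by background jobs, a document is not searchable immediately after it is added. A minimal polling sketch, assuming the status values shown above (the retry count and sleep interval are arbitrary):

doc_id = client.add_document(path: 'research_paper.pdf')[:document_id]

# Poll until processing finishes, giving up after roughly a minute
12.times do
  status = client.document_status(id: doc_id)
  break if status[:status] == 'processed'
  sleep 5
end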

Background Job Management

# Monitor background jobs
job_status = client.job_status
# Returns:
# {
#   active_jobs: 12,
#   queued_jobs: 5,
#   failed_jobs: 1,
#   queue_status: {
#     embeddings: { size: 3, latency: 2.5 },
#     processing: { size: 2, latency: 1.1 },
#     analysis: { size: 0, latency: 0.0 }
#   },
#   worker_status: {
#     busy_workers: 4,
#     idle_workers: 2,
#     total_workers: 6
#   }
# }

# Retry failed jobs
result = client.retry_failed_jobs(
  job_types: ['GenerateEmbeddingsJob'],
  older_than: 1.hour.ago
)

System Operations

Health Monitoring

# Simple health check
healthy = client.healthy?
# Returns true/false

# Basic system statistics
stats = client.stats
# Returns document statistics hash
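
For lightweight monitoring, healthy? and stats can be combined into a periodic check or an HTTP health endpoint. A minimal sketch (the alerting call is a placeholder for your own logging or paging):

if client.healthy?
  stats = client.stats
  puts "Ragdoll OK: #{stats.inspect}"
else
  # Replace with your application's alerting mechanism
  warn 'Ragdoll health check failed'
end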

Error Handling

Standard Error Response Format

# Failures are reported either as structured exceptions or as error fields in the response hash
begin
  result = client.add_document(path: 'nonexistent.pdf')
rescue Ragdoll::Core::DocumentNotFoundError => e
  puts e.message  # "File not found: nonexistent.pdf"
  puts e.error_code  # "DOCUMENT_NOT_FOUND"
  puts e.details  # { path: 'nonexistent.pdf', checked_locations: [...] }
end

# Check for errors in response
result = client.search(query: "test")
if result[:success]
  # Process results
  puts result[:results]
else
  # Handle error
  puts result[:error]
  puts result[:error_code]
  puts result[:details]
end

Common Error Types

# Document processing errors
Ragdoll::Core::DocumentNotFoundError
Ragdoll::Core::UnsupportedDocumentTypeError
Ragdoll::Core::DocumentProcessingError

# Search errors
Ragdoll::Core::InvalidQueryError
Ragdoll::Core::SearchTimeoutError
Ragdoll::Core::EmbeddingGenerationError

# Configuration errors
Ragdoll::Core::ConfigurationError
Ragdoll::Core::InvalidProviderError
Ragdoll::Core::MissingCredentialsError

# System errors
Ragdoll::Core::DatabaseConnectionError
Ragdoll::Core::BackgroundJobError
Ragdoll::Core::SystemHealthError

Best Practices

1. Error Handling

  • Always check for errors in responses
  • Implement retry logic for transient failures (see the sketch after this list)
  • Log errors with sufficient context for debugging
  • Use appropriate error handling strategies for different error types
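
A minimal retry sketch for transient failures (which error classes count as transient, and how many attempts to make, are application decisions):

def with_retries(attempts: 3)
  tries = 0
  begin
    yield
  rescue Ragdoll::Core::SearchTimeoutError, Ragdoll::Core::DatabaseConnectionError
    tries += 1
    raise if tries >= attempts
    sleep(2**tries)  # Exponential backoff before retrying
    retry
  end
end

results = with_retries { client.search(query: "machine learning algorithms") }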

2. Performance Optimization

  • Use batch operations for multiple documents
  • Implement appropriate caching for frequently accessed data (see the sketch after this list)
  • Monitor search performance and adjust thresholds accordingly
  • Use background processing for large document collections
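
A minimal caching sketch for frequently repeated queries (the in-memory hash is illustrative; a real application would use its own cache store with expiry):

SEARCH_CACHE = {}

# Cache results by query string to avoid repeated embedding calls
def cached_search(client, query)
  SEARCH_CACHE[query] ||= client.search(query: query)
end

results = cached_search(client, "neural network architectures")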

3. Search Strategy

  • Start with default settings and tune based on result quality (see the threshold sketch after this list)
  • Use faceted search to help users navigate large result sets
  • Implement search suggestions to improve user experience
  • Monitor search analytics to understand usage patterns
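
A minimal tuning sketch that compares similarity thresholds for the same query (the threshold values are arbitrary starting points):

[0.6, 0.7, 0.8].each do |threshold|
  results = client.search(query: "neural networks", threshold: threshold, limit: 10)
  puts "threshold=#{threshold}: #{results[:total_results]} results"
end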

4. RAG Implementation

  • Choose appropriate context limits based on your LLM's capabilities
  • Use prompt templates for consistent formatting
  • Include source attribution for transparency (see the sketch after this list)
  • Monitor RAG response quality and adjust context retrieval accordingly
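
A minimal end-to-end sketch: enhance the prompt, send it to your LLM of choice, and attach source attribution (generate_answer stands in for whatever LLM client you use; it is not part of Ragdoll):

enhanced = client.enhance_prompt(
  prompt: "Explain how neural networks learn",
  context_limit: 5
)

answer = generate_answer(enhanced[:enhanced_prompt])  # your LLM call goes here

puts answer
puts "Sources: #{enhanced[:context_sources].join(', ')}"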

The Client API is the single entry point for building document intelligence applications with Ragdoll, offering sensible defaults for basic use cases and additional options for more advanced workflows.