Client API Reference

The Ragdoll Client provides a high-level interface for all document intelligence operations. This comprehensive API handles document management, search operations, RAG enhancement, and system monitoring through a clean, intuitive interface.

Overview

Schema Note: Following a recent schema optimization, the Ragdoll database uses a normalized schema in which embedding_model information is stored in content-specific tables rather than duplicated on individual embedding rows. This improves data organization while preserving all API functionality.

The Client class serves as the primary orchestration layer, providing:

  • Document Lifecycle Management: Add, update, delete, and monitor documents
  • Multi-Modal Content Support: Handle text, image, and audio content seamlessly
  • Advanced Search Operations: Semantic, full-text, and hybrid search capabilities
  • RAG Enhancement: Context retrieval and prompt enhancement for LLM applications
  • System Analytics: Health monitoring, usage statistics, and performance metrics
  • Background Processing: Asynchronous job management and status tracking

Client Initialization

Basic Initialization

# Setup configuration first
Ragdoll::Core.configure do |config|
  config.llm_providers[:openai][:api_key] = ENV['OPENAI_API_KEY']
  config.models[:embedding][:text] = 'openai/text-embedding-3-small'
  config.models[:text_generation][:default] = 'openai/gpt-4o-mini'
  config.database = {
    adapter: 'postgresql',
    database: 'ragdoll_development',
    username: 'ragdoll',
    password: ENV['RAGDOLL_DATABASE_PASSWORD'],
    host: 'localhost',
    port: 5432,
    auto_migrate: true
  }
end

# Initialize client (uses global configuration)
client = Ragdoll::Core::Client.new
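
Once the client is initialized, a quick sanity check with healthy? (documented under System Operations below) confirms it can reach its database:

# Verify connectivity before queueing any work
raise "Ragdoll client is not healthy -- check database configuration" unless client.healthy?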

Document Management

Adding Documents

Single Document Addition

# Add document from file path
result = client.add_document(path: 'research_paper.pdf')
# Returns:
# {
#   success: true,
#   document_id: "123",
#   title: "research_paper",
#   document_type: "pdf",
#   content_length: 15000,
#   embeddings_queued: true,
#   message: "Document 'research_paper' added successfully with ID 123"
# }

Text Content Addition

# Add raw text content
doc_id = client.add_text(
  content: "This is the text content to be processed...",
  title: "Text Document",
  source: 'user_input',
  language: 'en'
)

# Add text with custom chunking
doc_id = client.add_text(
  content: long_text_content,
  title: "Long Text Document",
  chunk_size: 800,
  chunk_overlap: 150
)

Directory Processing

# Process entire directory
results = client.add_directory(
  path: '/path/to/documents',
  recursive: true
)

# Returns array of results:
# [
#   { file: "/path/to/doc1.pdf", document_id: "123", status: "success" },
#   { file: "/path/to/doc2.txt", document_id: "124", status: "success" },
#   { file: "/path/to/bad.doc", error: "Unsupported format", status: "error" }
# ]
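
Because directory processing continues past individual failures, it is worth scanning the returned array for errors. A minimal sketch over the documented result shape:

# Partition results by status and report any failures
failed = results.select { |r| r[:status] == 'error' }
failed.each { |r| puts "Failed to ingest #{r[:file]}: #{r[:error]}" }
puts "Ingested #{results.size - failed.size}/#{results.size} files"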

Document Retrieval

Single Document Retrieval

# Get document by ID
document = client.get_document(id: "123")
# Returns document hash with basic information

# Get document status
status = client.document_status(id: "123")
# Returns:
# {
#   id: "123",
#   title: "research_paper",
#   status: "processed",
#   embeddings_count: 24,
#   embeddings_ready: true,
#   content_preview: "This research paper discusses...",
#   message: "Document processed successfully with 24 embeddings"
# }

Document Listing

# List documents with basic options
documents = client.list_documents

# List with options
documents = client.list_documents(
  status: 'processed',
  document_type: 'pdf',
  limit: 20
)
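
Assuming each entry exposes the same keys shown for get_document and document_status (an assumption; the exact shape depends on your installed version), a short inventory loop looks like:

# Print a one-line summary per document (key names assumed)
documents.each do |doc|
  puts "#{doc[:id]}  #{doc[:title]}  (#{doc[:status]})"
end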

Document Updates

# Update document metadata
result = client.update_document(
  id: "123",
  title: "Updated Title",
  metadata: {
    category: 'technical',
    priority: 'high',
    last_reviewed: Date.current
  }
)

# Reprocess document with new settings
result = client.reprocess_document(
  id: "123",
  chunk_size: 1200,
  regenerate_embeddings: true,
  update_summary: true,
  extract_keywords: true
)

Document Deletion

# Delete single document
result = client.delete_document(id: "123")
# Returns:
# {
#   success: true,
#   message: "Document 123 and associated content deleted successfully",
#   deleted_embeddings: 24,
#   deleted_content_items: 4
# }

# Delete multiple documents
result = client.delete_documents(ids: ["123", "124", "125"])

# Delete with criteria
result = client.delete_documents(
  status: 'error',
  created_before: 1.month.ago,
  confirm: true  # Safety check for bulk deletion
)

Search Operations

# Simple semantic search
results = client.search(query: "machine learning algorithms")

# Search with options
results = client.search(
  query: "neural network architectures",
  limit: 25,
  threshold: 0.8
)

# Returns:
# {
#   query: "neural network architectures",
#   results: [
#     {
#       embedding_id: "456",
#       document_id: "123",
#       document_title: "Deep Learning Fundamentals",
#       document_location: "/path/to/document.pdf",
#       content: "Neural networks are computational models...",
#       similarity: 0.92,
#       chunk_index: 3,
#       usage_count: 5
#     }
#   ],
#   total_results: 15
# }
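
The results array carries enough metadata for source attribution. A small formatting sketch built on the documented keys:

# Print each hit with its similarity score and source location
results[:results].each do |hit|
  puts format('%.2f  %s (chunk %d)', hit[:similarity], hit[:document_title], hit[:chunk_index])
  puts "      #{hit[:document_location]}"
end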

# Hybrid search (semantic + full-text)
results = client.hybrid_search(
  query: "neural networks",
  semantic_weight: 0.7,
  text_weight: 0.3
)

Search Analytics

# Retrieve aggregate search analytics for a time window
analytics = client.search_analytics(days: 30)
# Returns a hash of basic usage statistics

RAG Enhancement

Context Retrieval

# Get relevant context for a query
context = client.get_context(
  query: "How do neural networks learn?",
  limit: 5
)

# Returns:
# {
#   context_chunks: [
#     {
#       content: "Neural networks learn through backpropagation...",
#       source: "/path/to/textbook.pdf",
#       similarity: 0.91,
#       chunk_index: 3
#     }
#   ],
#   combined_context: "Neural networks learn through backpropagation...",
#   total_chunks: 5
# }
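
The combined_context string can be dropped directly into a prompt you assemble yourself. A minimal sketch using the documented keys:

# Build a grounded prompt from the retrieved context
prompt = <<~PROMPT
  Answer using only the context below, and cite your sources.

  Context:
  #{context[:combined_context]}

  Question: How do neural networks learn?
PROMPT

# Collect unique sources for attribution
sources = context[:context_chunks].map { |c| c[:source] }.uniq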

Prompt Enhancement

# Enhance prompt with relevant context
enhanced = client.enhance_prompt(
  prompt: "Explain how neural networks learn",
  context_limit: 5
)

# Returns:
# {
#   enhanced_prompt: "You are an AI assistant. Use the following context...",
#   original_prompt: "Explain how neural networks learn",
#   context_sources: ["/path/to/textbook.pdf", "/path/to/paper.pdf"],
#   context_count: 3
# }
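
The enhanced prompt is plain text, so it can be passed to any LLM client. The llm object below is a placeholder, not part of Ragdoll:

# Hypothetical LLM call -- substitute your own client for `llm`
answer = llm.complete(enhanced[:enhanced_prompt])

puts answer
puts "Sources: #{enhanced[:context_sources].join(', ')}"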

Document Status and Monitoring

Processing Status

# Check document processing status
status = client.document_status(id: "123")
# Returns:
# {
#   id: "123",
#   status: "processing",  # pending, processing, processed, error
#   progress: 65,          # Percentage complete
#   message: "Generating embeddings (15/24 chunks complete)",
#   jobs_queued: 2,
#   jobs_completed: 6,
#   estimated_completion: "2024-01-15T10:30:00Z",
#   processing_steps: {
#     text_extraction: "completed",
#     embedding_generation: "in_progress",
#     keyword_extraction: "queued",
#     summary_generation: "queued"
#   },
#   error_details: nil
# }

# Batch status check
statuses = client.batch_document_status(ids: ["123", "124", "125"])
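
Because processing is asynchronous, a common pattern is to poll document_status until the document leaves the pipeline. A simple sketch (the poll interval is arbitrary):

# Poll until processing finishes or fails
loop do
  status = client.document_status(id: "123")
  break if %w[processed error].include?(status[:status])

  puts "#{status[:progress]}% -- #{status[:message]}"
  sleep 5
end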

Background Job Management

# Monitor background jobs
job_status = client.job_status
# Returns:
# {
#   active_jobs: 12,
#   queued_jobs: 5,
#   failed_jobs: 1,
#   queue_status: {
#     embeddings: { size: 3, latency: 2.5 },
#     processing: { size: 2, latency: 1.1 },
#     analysis: { size: 0, latency: 0.0 }
#   },
#   worker_status: {
#     busy_workers: 4,
#     idle_workers: 2,
#     total_workers: 6
#   }
# }

# Retry failed jobs
result = client.retry_failed_jobs(
  job_types: ['GenerateEmbeddingsJob'],
  older_than: 1.hour.ago
)
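
Combining job_status with retry_failed_jobs gives a small maintenance loop. A sketch over the two documented calls (assuming older_than may be passed without job_types):

# Requeue stale failures whenever any are reported
job_status = client.job_status
if job_status[:failed_jobs] > 0
  result = client.retry_failed_jobs(older_than: 1.hour.ago)
  puts "Requeued failed jobs: #{result.inspect}"
end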

System Operations

Health Monitoring

# Simple health check
healthy = client.healthy?
# Returns true/false

# Basic system statistics
stats = client.stats
# Returns document statistics hash
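
Both calls are cheap enough to wire into an external monitor. A minimal sketch:

# Minimal health-endpoint logic built on the two calls above
def ragdoll_health(client)
  return { status: 'down' } unless client.healthy?

  { status: 'up', stats: client.stats }
end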

Error Handling

Standard Error Response Format

# Failures raise typed errors that carry structured information
begin
  result = client.add_document(path: 'nonexistent.pdf')
rescue Ragdoll::Core::DocumentNotFoundError => e
  puts e.message  # "File not found: nonexistent.pdf"
  puts e.error_code  # "DOCUMENT_NOT_FOUND"
  puts e.details  # { path: 'nonexistent.pdf', checked_locations: [...] }
end

# Some methods also return structured error information in the response hash
result = client.search(query: "test")
if result[:success]
  # Process results
  puts result[:results]
else
  # Handle error
  puts result[:error]
  puts result[:error_code]
  puts result[:details]
end

Common Error Types

# Document processing errors
Ragdoll::Core::DocumentNotFoundError
Ragdoll::Core::UnsupportedDocumentTypeError
Ragdoll::Core::DocumentProcessingError

# Search errors
Ragdoll::Core::InvalidQueryError
Ragdoll::Core::SearchTimeoutError
Ragdoll::Core::EmbeddingGenerationError

# Configuration errors
Ragdoll::Core::ConfigurationError
Ragdoll::Core::InvalidProviderError
Ragdoll::Core::MissingCredentialsError

# System errors
Ragdoll::Core::DatabaseConnectionError
Ragdoll::Core::BackgroundJobError
Ragdoll::Core::SystemHealthError

Best Practices

1. Error Handling

  • Always check for errors in responses
  • Implement retry logic for transient failures, as in the sketch below
  • Log errors with sufficient context for debugging
  • Use appropriate error handling strategies for different error types
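
A retry sketch for transient failures, using error classes from Common Error Types above (backoff parameters are arbitrary):

# Retry transient search failures with exponential backoff
def search_with_retry(client, query, attempts: 3)
  delay = 1
  begin
    client.search(query: query)
  rescue Ragdoll::Core::SearchTimeoutError, Ragdoll::Core::DatabaseConnectionError => e
    attempts -= 1
    raise if attempts.zero?

    warn "Transient failure (#{e.class}), retrying in #{delay}s"
    sleep delay
    delay *= 2
    retry
  end
end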

2. Performance Optimization

  • Use batch operations for multiple documents
  • Implement appropriate caching for frequently accessed data
  • Monitor search performance and adjust thresholds accordingly
  • Use background processing for large document collections

3. Search Strategy

  • Start with default settings and tune based on results quality
  • Use faceted search to help users navigate large result sets
  • Implement search suggestions to improve user experience
  • Monitor search analytics to understand usage patterns

4. RAG Implementation

  • Choose appropriate context limits based on your LLM's capabilities
  • Use prompt templates for consistent formatting
  • Include source attribution for transparency
  • Monitor RAG response quality and adjust context retrieval accordingly

The Client API provides a comprehensive interface for building sophisticated document intelligence applications with Ragdoll, offering both simplicity for basic use cases and advanced features for complex requirements.