Contributing¶
Guidelines for Contributing to the Project¶
Welcome to Ragdoll! We appreciate your interest in contributing to this PostgreSQL-focused RAG system. This guide provides comprehensive information for all types of contributions.
Quick Start for Contributors¶
- Fork the repository on GitHub
- Clone your fork locally
- Set up the development environment
- Create a feature branch
- Make your changes with tests
- Submit a pull request
Getting Started¶
Development Setup¶
Fork and Clone Process¶
# 1. Fork the repository on GitHub (click Fork button)
# 2. Clone your fork
git clone https://github.com/YOUR_USERNAME/ragdoll.git
cd ragdoll
# 3. Add upstream remote
git remote add upstream https://github.com/original-org/ragdoll.git
# 4. Verify remotes
git remote -v
# origin https://github.com/YOUR_USERNAME/ragdoll.git (fetch)
# origin https://github.com/YOUR_USERNAME/ragdoll.git (push)
# upstream https://github.com/original-org/ragdoll.git (fetch)
# upstream https://github.com/original-org/ragdoll.git (push)
Development Environment Setup¶
Prerequisites: - Ruby 3.0+ (recommended: 3.2+) - PostgreSQL 12+ with pgvector extension - Git 2.0+ - Basic development tools (gcc, make, etc.)
# Install dependencies
bundle install
# Set up environment variables
cp .env.example .env
# Edit .env with your API keys and database credentials
# Set up PostgreSQL database
./bin/setup_postgresql.rb
# Run tests to verify setup
bundle exec rake test
# Check code style
bundle exec rubocop
Initial Configuration¶
# config/development.rb (create if needed)
Ragdoll::Core.configure do |config|
config.database_config = {
adapter: 'postgresql',
database: 'ragdoll_development',
username: 'ragdoll',
password: ENV['DATABASE_PASSWORD'],
host: 'localhost',
port: 5432,
auto_migrate: true,
logger: Logger.new(STDOUT, level: Logger::INFO)
}
# Set up API keys for testing
config.ruby_llm_config[:openai][:api_key] = ENV['OPENAI_API_KEY']
end
Understanding the Codebase¶
Architecture Overview¶
flowchart TD
A[Client Layer] --> B[Service Layer]
B --> C[Model Layer]
C --> D[Database Layer]
B --> E[Jobs Layer]
E --> F[External APIs]
D --> G[PostgreSQL + pgvector]
F --> H[LLM Providers]
subgraph "Core Services"
I[DocumentProcessor]
J[EmbeddingService]
K[SearchEngine]
L[TextGenerationService]
end
B --> I
B --> J
B --> K
B --> L
Code Organization¶
lib/ragdoll/core/
├── client.rb # Main public API
├── configuration.rb # System configuration
├── database.rb # Database setup and migrations
├── document_processor.rb # Multi-format parsing
├── embedding_service.rb # Vector generation
├── search_engine.rb # Semantic search
├── text_chunker.rb # Content segmentation
├── text_generation_service.rb # LLM integration
├── jobs/ # Background processing
│ ├── generate_embeddings.rb
│ ├── extract_keywords.rb
│ └── generate_summary.rb
├── models/ # ActiveRecord models
│ ├── document.rb
│ ├── embedding.rb
│ └── content.rb # STI base class
└── services/ # Specialized services
├── metadata_generator.rb
└── image_description_service.rb
Design Patterns¶
- Service Layer Pattern: Business logic in dedicated service classes
- Repository Pattern: Database access through ActiveRecord models
- Factory Pattern: Document creation through DocumentProcessor
- Strategy Pattern: Multiple LLM providers via configuration
- Observer Pattern: Background jobs triggered by model callbacks
Coding Conventions¶
- Ruby Style: Follow Ruby community conventions
- RuboCop: Automated style enforcement
- Naming: Descriptive names over comments
- Methods: Single responsibility, max 20 lines
- Classes: Max 200 lines, extract services for complex logic
Contribution Types¶
Code Contributions¶
Bug Fixes¶
# Create bug fix branch
git checkout -b fix/document-parsing-error
# Write failing test first
# test/core/document_processor_test.rb
def test_handles_corrupted_pdf
assert_raises(DocumentProcessor::ParseError) do
DocumentProcessor.parse('test/fixtures/corrupted.pdf')
end
end
# Implement fix
# lib/ragdoll/core/document_processor.rb
def parse_pdf
# Add error handling
rescue PDF::Reader::MalformedPDFError => e
raise ParseError, "Corrupted PDF: #{e.message}"
end
# Verify fix works
bundle exec rake test
Feature Implementations¶
# Create feature branch
git checkout -b feature/add-excel-support
# Plan implementation
# 1. Add Excel gem dependency
# 2. Implement parse_excel method
# 3. Add Excel to supported formats
# 4. Write comprehensive tests
# 5. Update documentation
# Implementation example
# Gemfile
gem 'roo', '~> 2.9'
# lib/ragdoll/core/document_processor.rb
when '.xlsx', '.xls'
parse_excel
private
def parse_excel
workbook = Roo::Spreadsheet.open(@file_path)
content = extract_excel_content(workbook)
metadata = extract_excel_metadata(workbook)
{
content: content,
metadata: metadata,
document_type: 'excel'
}
end
Performance Improvements¶
# Example: Optimize batch embedding generation
class EmbeddingService
def generate_embeddings_batch_optimized(texts, batch_size: 50)
# Process in smaller batches to reduce memory usage
texts.each_slice(batch_size).flat_map do |batch|
generate_embeddings_batch(batch)
end
end
end
# Add benchmark test
class EmbeddingServicePerformanceTest < Minitest::Test
def test_batch_processing_performance
texts = Array.new(1000) { "Sample text #{rand(1000)}" }
time = Benchmark.measure do
service.generate_embeddings_batch_optimized(texts)
end
assert time.real < 30, "Batch processing too slow: #{time.real}s"
end
end
Documentation Contributions¶
Documentation Updates¶
# Always include code examples
## New Feature Documentation
### Usage
```ruby
# Basic usage
client = Ragdoll::Core.client
result = client.new_feature(param: 'value')
# Advanced usage with options
result = client.new_feature(
param: 'value',
advanced_option: true,
callback: ->(data) { puts data }
)
Configuration¶
#### API Documentation
```ruby
# Use YARD-style documentation
class NewService
# Process documents with advanced filtering
#
# @param documents [Array<Document>] Documents to process
# @param options [Hash] Processing options
# @option options [String] :filter_type ('all') Filter criteria
# @option options [Integer] :batch_size (100) Batch processing size
# @return [Array<Hash>] Processed document results
# @raise [ProcessingError] When processing fails
#
# @example Basic usage
# service = NewService.new
# results = service.process(documents)
#
# @example With options
# results = service.process(documents,
# filter_type: 'academic',
# batch_size: 50
# )
def process(documents, options = {})
# Implementation
end
end
Testing Contributions¶
Test Coverage Improvements¶
# Always test edge cases
class DocumentProcessorTest < Minitest::Test
def test_handles_empty_file
File.write('test/fixtures/empty.txt', '')
result = DocumentProcessor.parse('test/fixtures/empty.txt')
assert_equal '', result[:content]
end
def test_handles_binary_file
assert_raises(DocumentProcessor::UnsupportedFormatError) do
DocumentProcessor.parse('test/fixtures/binary.exe')
end
end
def test_handles_very_large_file
# Create 100MB test file
large_content = 'x' * (100 * 1024 * 1024)
File.write('test/fixtures/large.txt', large_content)
result = DocumentProcessor.parse('test/fixtures/large.txt')
assert result[:content].length > 0
ensure
File.delete('test/fixtures/large.txt') if File.exist?('test/fixtures/large.txt')
end
end
Development Process¶
Branch Management¶
Branch Naming Conventions¶
- Features:
feature/short-description
- Bug fixes:
fix/issue-description
- Documentation:
docs/section-being-updated
- Refactoring:
refactor/component-name
- Performance:
perf/optimization-area
Commit Message Format¶
Type: Brief description (50 characters max)
Detailed explanation of what changed and why. Include:
- What problem this solves
- How it was implemented
- Any breaking changes
- References to issues (#123)
Types: feat, fix, docs, style, refactor, test, chore
Examples:
feat: Add Excel document processing support
- Implement parse_excel method using roo gem
- Support .xlsx and .xls file formats
- Extract cell values, formulas, and metadata
- Add comprehensive test coverage
- Update documentation with usage examples
Resolves #145
fix: Handle corrupted PDF files gracefully
- Add proper error handling for PDF::Reader exceptions
- Return informative error messages
- Add test cases for various corruption scenarios
- Prevent application crashes during parsing
Fixes #178
Code Quality¶
Pre-commit Checklist¶
# Run before every commit
# 1. Code style check
bundle exec rubocop
# 2. Run all tests
bundle exec rake test
# 3. Check test coverage
open coverage/index.html
# Ensure coverage >= 85%
# 4. Run specific tests for changed code
bundle exec rake test test/core/your_changed_test.rb
# 5. Manual testing of changes
./bin/console
# Test your changes interactively
Code Review Preparation¶
# Before creating PR
# 1. Rebase on latest main
git fetch upstream
git rebase upstream/main
# 2. Squash commits if needed
git rebase -i HEAD~3
# 3. Update documentation
# Update relevant .md files
# Add code examples
# Update CHANGELOG.md
# 4. Self-review your changes
git diff upstream/main..HEAD
Submission Process¶
Pull Request Process¶
PR Creation Guidelines¶
- Title: Clear, concise description of changes
- Description: Use the PR template
- Labels: Add appropriate labels (bug, feature, documentation)
- Reviewers: Request reviews from maintainers
- Linked Issues: Reference related issues
PR Template¶
## Summary
Brief description of changes and motivation.
## Changes Made
- [ ] Added new feature X
- [ ] Fixed bug in component Y
- [ ] Updated documentation
- [ ] Added test coverage
## Testing
- [ ] All existing tests pass
- [ ] Added new tests for changes
- [ ] Manual testing completed
- [ ] Performance impact assessed
## Documentation
- [ ] Updated relevant documentation
- [ ] Added code examples
- [ ] Updated CHANGELOG.md
## Breaking Changes
- [ ] No breaking changes
- [ ] Breaking changes documented below
## Checklist
- [ ] Code follows style guidelines
- [ ] Self-review completed
- [ ] Code is commented where needed
- [ ] Tests cover edge cases
Review Process¶
- Automated Checks: CI must pass
- Code Review: At least one maintainer approval
- Testing: Manual testing by reviewer
- Documentation: Verify docs are complete
- Merge: Squash and merge after approval
Code Review Guidelines¶
For Contributors¶
- Respond promptly to review feedback
- Ask questions if feedback is unclear
- Test suggested changes before implementing
- Update PR description if scope changes
- Be patient - thorough reviews take time
For Reviewers¶
- Be constructive and specific in feedback
- Explain the why behind suggestions
- Acknowledge good code when you see it
- Test the changes locally when possible
- Consider backward compatibility impact
Bug Reports¶
Issue Templates¶
Bug Report Template¶
**Bug Description**
A clear description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior:
1. Configure Ragdoll with...
2. Process document...
3. Call search method...
4. See error
**Expected Behavior**
What you expected to happen.
**Actual Behavior**
What actually happened, including error messages.
**Environment**
- Ruby version: [e.g. 3.2.0]
- Ragdoll version: [e.g. 0.1.0]
- PostgreSQL version: [e.g. 14.2]
- Operating System: [e.g. macOS 12.0]
**Additional Context**
- Configuration details
- Log output
- Sample files (if applicable)
- Stack trace
**Possible Solution**
(Optional) Suggest a fix or workaround
Information Requirements¶
Always include: - Reproduction steps: Minimal code to reproduce - Environment details: Ruby, PostgreSQL, OS versions - Configuration: Relevant Ragdoll configuration - Error messages: Complete stack traces - Expected vs actual: What should happen vs what happens
For document processing issues: - File format and size - Sample file (if not sensitive) - Processing configuration
For search issues: - Query text - Search configuration - Database state (document count, embedding count)
Feature Requests¶
Request Format¶
Feature Request Template¶
**Feature Summary**
One-line summary of the requested feature.
**Problem Statement**
What problem does this solve? What use case does it enable?
**Proposed Solution**
Detailed description of the proposed implementation.
**Alternative Solutions**
Other approaches you've considered.
**Use Cases**
Specific scenarios where this would be useful:
1. Scenario 1: ...
2. Scenario 2: ...
**Implementation Considerations**
- Database schema changes needed
- API changes required
- Backward compatibility impact
- Performance implications
**Priority**
- [ ] Critical - Blocking current work
- [ ] High - Important for upcoming release
- [ ] Medium - Would be nice to have
- [ ] Low - Future consideration
**Would you be willing to implement this?**
- [ ] Yes, I can submit a PR
- [ ] Yes, with guidance
- [ ] No, but I can test
- [ ] No
Evaluation Process¶
- Initial Review: Maintainers assess fit with project goals
- Community Discussion: Gather feedback from users
- Technical Design: Plan implementation approach
- Priority Assignment: Based on impact and complexity
- Implementation: Either by maintainers or community
Community Guidelines¶
Code of Conduct¶
Behavioral Expectations¶
- Be respectful in all interactions
- Be inclusive and welcoming to newcomers
- Be constructive in criticism and feedback
- Be patient with questions and learning
- Be professional in all communications
Communication Guidelines¶
- Use clear, descriptive titles for issues and PRs
- Provide context when asking questions
- Search existing issues before creating new ones
- Stay on topic in discussions
- Use appropriate channels for different types of communication
Communication Channels¶
- GitHub Issues: Bug reports, feature requests
- GitHub Discussions: General questions, ideas
- Pull Requests: Code review and discussion
- Documentation: Inline comments and suggestions
Release Process¶
Version Management¶
Ragdoll follows Semantic Versioning (SemVer):
- MAJOR.MINOR.PATCH (e.g., 1.2.3)
- MAJOR: Breaking changes
- MINOR: New features (backward compatible)
- PATCH: Bug fixes (backward compatible)
Release Cycle¶
- Patch releases: As needed for critical bugs
- Minor releases: Monthly with new features
- Major releases: Quarterly with breaking changes
Changelog Maintenance¶
# Changelog
## [Unreleased]
### Added
- New feature descriptions
### Changed
- Modified behavior descriptions
### Fixed
- Bug fix descriptions
### Removed
- Deprecated feature removals
## [1.2.0] - 2024-01-15
### Added
- Excel document processing support
- Batch embedding optimization
### Fixed
- PDF parsing error handling
- Memory leak in background jobs
Getting Help¶
Before Contributing¶
- Read the documentation - Check existing docs first
- Search issues - Your question might already be answered
- Try the troubleshooting guide - Common issues and solutions
- Check the development guide - Setup and workflow information
Need Assistance?¶
- Questions: Use GitHub Discussions
- Bugs: Create detailed issue reports
- Features: Submit feature request with use cases
- Code: Start with small contributions and ask for guidance
Recognition¶
Contributors are recognized in: - README.md - Contributor section - CHANGELOG.md - Release notes - GitHub - Automatic contribution tracking - Documentation - Author attribution where appropriate
Thank you for contributing to Ragdoll! 🎉
This document is part of the Ragdoll documentation suite. For immediate help, see the Quick Start Guide or API Reference.