LLMExtractor

AI-powered fact extraction using large language models.

Class: FactDb::Extractors::LLMExtractor

extractor = FactDb::Extractors::LLMExtractor.new(config)

Requirements

  • ruby_llm gem installed (see the Gemfile snippet below)
  • LLM provider configured (API key, model)
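
If ruby_llm is not already in your bundle, add it to your Gemfile (a minimal entry; pin a version constraint as your project requires):

# Gemfile
gem 'ruby_llm'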

Configuration

FactDb.configure do |config|
  config.llm.provider = :openai
  config.llm.model = "gpt-4o-mini"
  config.llm.api_key = ENV['OPENAI_API_KEY']
end

Methods

extract

def extract(source)

Extracts facts from the source's content using the configured LLM.

Parameters:

  • source (Models::Source) - Source whose content will be processed

Returns: Array<Models::Fact>

Example:

extractor = LLMExtractor.new(config)
facts = extractor.extract(source)

facts.each do |fact|
  puts fact.text
  puts "  Valid: #{fact.valid_at}"
  puts "  Confidence: #{fact.confidence}"
end

Extraction Process

  1. Prompt Construction - Build prompt with content text
  2. LLM Call - Send to configured LLM provider
  3. Response Parsing - Parse JSON response
  4. Fact Creation - Create fact records
  5. Entity Resolution - Resolve mentioned entities
  6. Source Linking - Link facts to source content
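
In code, the pipeline might look roughly like this (an illustrative sketch only; build_prompt, @llm.chat, and resolve_entities are assumed helper names, not the gem's actual internals):

require 'json'

# Illustrative pipeline sketch. build_prompt, @llm.chat, and
# resolve_entities are hypothetical helpers, not FactDb's real internals.
def extract(source)
  prompt   = build_prompt(source.content)                 # 1. prompt construction
  response = @llm.chat(prompt)                            # 2. LLM call
  payload  = JSON.parse(response, symbolize_names: true)  # 3. response parsing

  payload[:facts].map do |attrs|
    fact = Models::Fact.create!(                          # 4. fact creation
      text:       attrs[:text],
      valid_at:   attrs[:valid_at],
      confidence: attrs[:confidence],
      source:     source                                  # 6. source linking
    )
    resolve_entities(fact, attrs[:entities])              # 5. entity resolution
    fact
  end
end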

Prompt Structure

The extractor uses a structured prompt:

Extract temporal facts from this content. For each fact:
1. Identify the assertion (what is being stated)
2. Identify entities mentioned (people, organizations, places)
3. Determine when the fact became valid
4. Assess confidence level

Content:
{source.content}

Return JSON:
{
  "facts": [
    {
      "text": "...",
      "valid_at": "YYYY-MM-DD",
      "entities": [
        {"name": "...", "type": "person|organization|place", "role": "subject|object|..."}
      ],
      "confidence": 0.0-1.0
    }
  ]
}
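
Because the reply is model-generated JSON, a defensive parsing step is worth sketching (illustrative only; the gem's actual parsing may differ):

require 'json'

# Parse the model's JSON reply, dropping malformed entries.
def parse_facts(raw_response)
  payload = JSON.parse(raw_response, symbolize_names: true)
  Array(payload[:facts]).select do |fact|
    fact[:text] && fact[:confidence].is_a?(Numeric) &&
      fact[:confidence].between?(0.0, 1.0)
  end
rescue JSON::ParserError
  []  # models occasionally emit invalid JSON; treat as no facts found
end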

Supported Providers

Provider      Models                            Config
OpenAI        gpt-4o, gpt-4o-mini               llm.provider = :openai
Anthropic     claude-sonnet-4, claude-3-haiku   llm.provider = :anthropic
Google        gemini-2.0-flash                  llm.provider = :gemini
Ollama        llama3.2, mistral                 llm.provider = :ollama
AWS Bedrock   claude-sonnet-4                   llm.provider = :bedrock
OpenRouter    Various                           llm.provider = :openrouter
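
For example, to target a local Ollama model using the same configure interface shown above (no API key is needed for a local server):

FactDb.configure do |config|
  config.llm.provider = :ollama
  config.llm.model = "llama3.2"
end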

Error Handling

begin
  facts = extractor.extract(source)
rescue FactDb::ConfigurationError => e
  # LLM not configured
  puts "Config error: #{e.message}"
rescue FactDb::ExtractionError => e
  # Extraction failed
  puts "Extraction error: #{e.message}"
end

Advantages

  • Handles unstructured text
  • Understands context and nuance
  • Identifies implicit facts
  • Resolves entities automatically

Disadvantages

  • API costs
  • Latency
  • Occasional hallucinated or malformed output
  • Requires validation

Best Practices

1. Validate Results

facts = extractor.extract(source)
facts.each do |fact|
  # Flag low-confidence extractions for human review
  if fact.confidence < 0.7
    fact.update!(metadata: { needs_review: true })
  end
end

2. Cache Responses

cache_key = "llm:#{source.content_hash}"
facts = Rails.cache.fetch(cache_key) do
  extractor.extract(source)
end

3. Handle Rate Limits

require 'retryable'

# Retry up to 3 times with exponential backoff between attempts
Retryable.retryable(tries: 3, sleep: lambda { |n| 2**n }) do
  extractor.extract(source)
end
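
If you would rather not add a dependency, a plain-Ruby retry loop works as well (a sketch; it assumes rate-limit failures surface as FactDb::ExtractionError):

attempts = 0
begin
  facts = extractor.extract(source)
rescue FactDb::ExtractionError
  attempts += 1
  if attempts < 3
    sleep(2**attempts)  # exponential backoff: 2s, then 4s
    retry
  end
  raise
end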