LLMExtractor¶
AI-powered fact extraction using large language models.
Class: FactDb::Extractors::LLMExtractor¶
Requirements¶
- ruby_llm gem installed
- LLM provider configured (API key, model)
Configuration¶
FactDb.configure do |config|
config.llm.provider = :openai
config.llm.model = "gpt-4o-mini"
config.llm.api_key = ENV['OPENAI_API_KEY']
end
Methods¶
extract¶
Extract facts from content using LLM.
Parameters:
source (Models::Source) - Source to process
Returns: Array<Models::Fact>
Example:
extractor = LLMExtractor.new(config)
facts = extractor.extract(source)
facts.each do |fact|
puts fact.text
puts " Valid: #{fact.valid_at}"
puts " Confidence: #{fact.confidence}"
end
Extraction Process¶
- Prompt Construction - Build prompt with content text
- LLM Call - Send to configured LLM provider
- Response Parsing - Parse JSON response
- Fact Creation - Create fact records
- Entity Resolution - Resolve mentioned entities
- Source Linking - Link facts to source content
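The steps above can be sketched in plain Ruby. The class and helper names below are illustrative stand-ins, not the gem's actual internals; the LLM client is anything that responds to `call` and returns a JSON string:

```ruby
require "json"

# Hypothetical sketch of the extraction pipeline; not FactDb internals.
class SketchExtractor
  def initialize(llm_client)
    @llm = llm_client # responds to #call(prompt) => JSON string
  end

  def extract(content)
    prompt = build_prompt(content)         # 1. Prompt construction
    raw = @llm.call(prompt)                # 2. LLM call
    parsed = JSON.parse(raw)               # 3. Response parsing
    parsed.fetch("facts", []).map do |f|   # 4. Fact creation
      {
        text: f["text"],
        valid_at: f["valid_at"],
        confidence: f["confidence"],
        entities: f["entities"] || []      # 5. Entity resolution (simplified)
      }
    end                                    # 6. Source linking omitted here
  end

  private

  def build_prompt(content)
    "Extract temporal facts from this content.\n\nContent:\n#{content}"
  end
end
```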
Prompt Structure¶
The extractor uses a structured prompt:
Extract temporal facts from this content. For each fact:
1. Identify the assertion (what is being stated)
2. Identify entities mentioned (people, organizations, places)
3. Determine when the fact became valid
4. Assess confidence level
Content:
{source.content}
Return JSON:
{
"facts": [
{
"text": "...",
"valid_at": "YYYY-MM-DD",
"entities": [
{"name": "...", "type": "person|organization|place", "role": "subject|object|..."}
],
"confidence": 0.0-1.0
}
]
}
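A response in this shape can be parsed and sanity-checked with plain Ruby. The helper below is a hedged sketch (not part of the gem) that clamps out-of-range confidence values and leaves `valid_at` nil when the model returns a malformed date:

```ruby
require "json"
require "date"

# Hypothetical helper for the response format above; not part of FactDb.
def parse_fact_response(raw)
  JSON.parse(raw).fetch("facts", []).map do |f|
    {
      text: f["text"],
      # keep valid_at nil when the date is not YYYY-MM-DD
      valid_at: (Date.strptime(f["valid_at"].to_s, "%Y-%m-%d") rescue nil),
      # clamp confidence into the documented [0.0, 1.0] range
      confidence: f["confidence"].to_f.clamp(0.0, 1.0),
      entities: f["entities"] || []
    }
  end
end
```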
Supported Providers¶
| Provider | Models | Config |
|---|---|---|
| OpenAI | gpt-4o, gpt-4o-mini | llm.provider = :openai |
| Anthropic | claude-sonnet-4, claude-3-haiku | llm.provider = :anthropic |
| Gemini | gemini-2.0-flash | llm.provider = :gemini |
| Ollama | llama3.2, mistral | llm.provider = :ollama |
| AWS Bedrock | claude-sonnet-4 | llm.provider = :bedrock |
| OpenRouter | Various | llm.provider = :openrouter |
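Switching providers is a configuration change only; extraction code stays the same. For example, to run against a local Ollama server (model name is illustrative, matching the table above):

```ruby
FactDb.configure do |config|
  config.llm.provider = :ollama
  config.llm.model = "llama3.2" # any model pulled locally with `ollama pull`
  # no API key needed for a local Ollama server
end
```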
Error Handling¶
begin
facts = extractor.extract(source)
rescue FactDb::ConfigurationError => e
# LLM not configured
puts "Config error: #{e.message}"
rescue FactDb::ExtractionError => e
# Extraction failed
puts "Extraction error: #{e.message}"
end
Advantages¶
- Handles unstructured text
- Understands context and nuance
- Identifies implicit facts
- Resolves entities automatically
Disadvantages¶
- API costs
- Latency
- Occasional hallucinations or malformed output
- Requires validation
Best Practices¶
1. Validate Results¶
facts = extractor.extract(source)
facts.each do |fact|
if fact.confidence < 0.7
fact.update!(metadata: { needs_review: true })
end
end
2. Cache Responses¶
cache_key = "llm:#{source.content_hash}"
facts = Rails.cache.fetch(cache_key) do
extractor.extract(source)
end