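Three-Layer Model

FactDb organizes information into three layers:

- content: immutable source material
- entities: the people, organizations, and other things that content mentions
- facts: time-bounded assertions that link the two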

Overview

graph TB
    subgraph Layer1["Layer 1: Content"]
        C1[Immutable Documents]
        C2[Source Evidence]
        C3[Captured Timestamps]
    end

    subgraph Layer2["Layer 2: Entities"]
        E1[Canonical Names]
        E2[Aliases]
        E3[Types]
    end

    subgraph Layer3["Layer 3: Facts"]
        F1[Temporal Assertions]
        F2[Validity Periods]
        F3[Status Tracking]
    end

    Layer1 --> Layer3
    Layer2 --> Layer3

    style C1 fill:#1E40AF,stroke:#1E3A8A,color:#FFFFFF
    style C2 fill:#1E40AF,stroke:#1E3A8A,color:#FFFFFF
    style C3 fill:#1E40AF,stroke:#1E3A8A,color:#FFFFFF
    style E1 fill:#047857,stroke:#065F46,color:#FFFFFF
    style E2 fill:#047857,stroke:#065F46,color:#FFFFFF
    style E3 fill:#047857,stroke:#065F46,color:#FFFFFF
    style F1 fill:#B91C1C,stroke:#991B1B,color:#FFFFFF
    style F2 fill:#B91C1C,stroke:#991B1B,color:#FFFFFF
    style F3 fill:#B91C1C,stroke:#991B1B,color:#FFFFFF

Layer 1: Content

The content layer stores raw source material that serves as evidence for facts.

Characteristics

Property	Description
Immutable	Content never changes after ingestion
Deduplicated	A SHA-256 content hash prevents storing the same content twice
Timestamped	`captured_at` records when content was obtained
Typed	Categories like email, document, article
Searchable	Full-text and semantic vector search
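The immutability and deduplication properties can be illustrated with a small in-memory store. This is a minimal sketch, assuming the hash is computed over the raw content text. The `ContentStore` class and its `ingest` method are hypothetical and are not FactDb's implementation.

```ruby
require "digest"

# Hypothetical in-memory store illustrating SHA-256 deduplication.
# Not FactDb's implementation; it only demonstrates the idea.
class ContentStore
  Record = Struct.new(:content_hash, :text, :type, :captured_at, keyword_init: true)

  def initialize
    @records = {} # content_hash => Record
  end

  # Returns the existing record when identical text was already ingested.
  # Otherwise it stores and returns a new, frozen (immutable) record.
  def ingest(text, type:, captured_at: Time.now)
    content_hash = Digest::SHA256.hexdigest(text)
    @records[content_hash] ||= Record.new(
      content_hash: content_hash,
      text: text.dup.freeze,
      type: type,
      captured_at: captured_at
    ).freeze
  end

  def size
    @records.size
  end
end

store = ContentStore.new
a = store.ingest("Paula Chen accepted the offer.", type: :email)
b = store.ingest("Paula Chen accepted the offer.", type: :email)
puts a.equal?(b)  # => true
puts store.size   # => 1
```

Keying the store by digest makes a repeated ingest a cheap hash lookup. Freezing the record mirrors the rule that content never changes after ingestion.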

Content Types

# Common content types
:email        # Email messages
:document     # General documents
:article      # News articles
:transcript   # Meeting transcripts
:report       # Reports and analysis
:announcement # Official announcements
:social       # Social media posts

Example

source = facts.ingest(
  "Paula Chen accepted the offer for Principal Engineer...",
  type: :email,
  title: "RE: Offer Letter - Paula Chen",
  source_uri: "mailto:hr@company.com/12345",
  captured_at: Time.current,
  metadata: {
    from: "hr@company.com",
    to: "hiring@company.com",
    subject: "RE: Offer Letter - Paula Chen"
  }
)

Layer 2: Entities

Entities represent real-world things mentioned in content.

Entity Types

Type	Description	Examples
`person`	Individual people	Paula Chen, John Smith
`organization`	Companies, teams, groups	Microsoft, Platform Team
`place`	Locations	San Francisco, Building A
`product`	Products and services	Windows 11, Azure
`event`	Named events	Q4 Earnings, Annual Review

Resolution Status

stateDiagram-v2
    [*] --> unresolved: Created
    unresolved --> resolved: Confirmed identity
    resolved --> merged: Duplicate found
    merged --> [*]: Points to canonical

    classDef blue fill:#1E40AF,stroke:#1E3A8A,color:#FFFFFF
    classDef green fill:#047857,stroke:#065F46,color:#FFFFFF
    classDef red fill:#B91C1C,stroke:#991B1B,color:#FFFFFF

    class unresolved blue
    class resolved green
    class merged red

unresolved - Entity created but not confirmed
resolved - Entity identity confirmed
merged - Entity merged into another (canonical) entity
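The lifecycle in the diagram can be modeled as a small transition table. This is a sketch under stated assumptions: only the transitions drawn in the diagram are allowed. The `EntityRecord` struct, its `canonical_id` field, and the `Resolution` helpers are hypothetical names. Following `canonical_id` shows one way a merged entity could "point to canonical".

```ruby
# Hypothetical model of the resolution lifecycle in the diagram above.
# EntityRecord, canonical_id, and the Resolution helpers are illustrative names.
EntityRecord = Struct.new(:id, :name, :status, :canonical_id)

module Resolution
  # Only the transitions drawn in the state diagram are allowed.
  TRANSITIONS = {
    unresolved: [:resolved],
    resolved: [:merged],
    merged: []
  }.freeze

  module_function

  # Moves an entity to a new status, rejecting transitions not in the diagram.
  def transition(entity, to)
    unless TRANSITIONS.fetch(entity.status).include?(to)
      raise ArgumentError, "cannot move #{entity.status} -> #{to}"
    end
    entity.status = to
    entity
  end

  # Merges a duplicate into a canonical entity and records the pointer.
  def merge(duplicate, into:)
    transition(duplicate, :merged)
    duplicate.canonical_id = into.id
    duplicate
  end

  # Follows canonical_id pointers until reaching an entity that is not merged.
  def canonical(entity, by_id)
    entity = by_id.fetch(entity.canonical_id) while entity.status == :merged
    entity
  end
end

paula  = EntityRecord.new(1, "Paula Chen", :unresolved, nil)
p_chen = EntityRecord.new(2, "P. Chen", :unresolved, nil)
by_id  = { 1 => paula, 2 => p_chen }

Resolution.transition(paula, :resolved)
Resolution.transition(p_chen, :resolved)
Resolution.merge(p_chen, into: paula)
puts Resolution.canonical(p_chen, by_id).name  # => Paula Chen
```

Keeping the merged record and storing a pointer, rather than deleting it, means anything that still references the duplicate can be redirected to the canonical entity.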

Aliases

An entity can have multiple aliases. Different forms of a name in content, such as "Paula" or "P. Chen", can then be matched to the same entity:

paula = facts.entity_service.create(
  "Paula Chen",
  type: :person,
  aliases: [
    "Paula",
    "P. Chen",
    "Chen, Paula"
  ]
)
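Alias matching can be pictured as a lookup index that maps every normalized name and alias to its entity. This minimal sketch assumes case-insensitive matching with repeated spaces collapsed. The `AliasIndex` class is hypothetical, and FactDb's actual matching rules may differ.

```ruby
# Hypothetical alias index; the normalization rules are an assumption.
class AliasIndex
  def initialize
    @by_key = {}
  end

  # Registers the canonical name and every alias under a normalized key.
  def add(entity_name, aliases: [])
    ([entity_name] + aliases).each do |label|
      @by_key[normalize(label)] = entity_name
    end
    self
  end

  # Returns the canonical name for any known label, or nil.
  def resolve(label)
    @by_key[normalize(label)]
  end

  private

  # Strips surrounding whitespace, lowercases, and collapses repeated spaces,
  # so "P.  Chen" and "p. chen" produce the same key.
  def normalize(label)
    label.strip.downcase.squeeze(" ")
  end
end

index = AliasIndex.new.add("Paula Chen", aliases: ["Paula", "P. Chen", "Chen, Paula"])
puts index.resolve("chen, paula")  # => Paula Chen
puts index.resolve("P.  Chen")     # => Paula Chen
p index.resolve("Paula Smith")     # => nil
```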

Layer 3: Facts

Facts are temporal assertions about entities, extracted from content.

Fact Structure

fact = Models::Fact.new(
  text: "Paula Chen is Principal Engineer at Microsoft",
  valid_at: Date.parse("2024-01-10"),
  invalid_at: nil,  # Still valid
  status: "canonical",
  confidence: 0.95,
  extraction_method: "llm"
)

Temporal Bounds

Every fact has:

valid_at - When the fact became true (required)
invalid_at - When the fact stopped being true (nil if current)

# Currently valid fact
fact1 = { valid_at: "2024-01-10", invalid_at: nil }

# Historical fact
fact2 = { valid_at: "2023-01-01", invalid_at: "2024-01-09" }

# Point-in-time query
facts.facts_at(Date.parse("2023-06-15"))  # Returns fact2
facts.facts_at(Date.parse("2024-02-01"))  # Returns fact1
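A point-in-time query reduces to a filter over the two bounds. The sketch below assumes both bounds are inclusive, since that matches the example dates (`fact2` ends on 2024-01-09 and `fact1` starts on 2024-01-10). The source does not state its boundary convention. `TemporalFact` and this `facts_at` helper are hypothetical stand-ins, not `Models::Fact` or FactDb's query method.

```ruby
require "date"

# Hypothetical stand-in for a fact's temporal bounds (not Models::Fact).
TemporalFact = Struct.new(:text, :valid_at, :invalid_at, keyword_init: true) do
  def initialize(**kwargs)
    super
    raise ArgumentError, "valid_at is required" if valid_at.nil?
    if invalid_at && invalid_at < valid_at
      raise ArgumentError, "invalid_at precedes valid_at"
    end
  end

  # Assumption: both bounds are inclusive, matching the example dates above.
  def valid_on?(date)
    valid_at <= date && (invalid_at.nil? || date <= invalid_at)
  end
end

# Hypothetical point-in-time filter: keeps the facts valid on the given date.
def facts_at(facts, date)
  facts.select { |fact| fact.valid_on?(date) }
end

fact1 = TemporalFact.new(text: "fact1", valid_at: Date.new(2024, 1, 10))
fact2 = TemporalFact.new(text: "fact2", valid_at: Date.new(2023, 1, 1),
                         invalid_at: Date.new(2024, 1, 9))
history = [fact1, fact2]

p facts_at(history, Date.new(2023, 6, 15)).map(&:text)  # => ["fact2"]
p facts_at(history, Date.new(2024, 2, 1)).map(&:text)   # => ["fact1"]
```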

Fact Status

Status	Description
`canonical`	Current authoritative version
`superseded`	Replaced by newer information
`corroborated`	Confirmed by multiple sources
`synthesized`	Derived from multiple facts
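The `superseded` status pairs naturally with the temporal bounds. When newer information replaces a fact, the old fact can be closed out rather than deleted, so history stays queryable. The sketch below is an assumption about how that might work: it ends the old fact's validity on the day before the replacement becomes valid, as in the `fact1`/`fact2` example. `StatusFact` and `supersede` are hypothetical names.

```ruby
require "date"

# Hypothetical fact record; status values follow the table above.
StatusFact = Struct.new(:text, :status, :valid_at, :invalid_at, keyword_init: true)

# Hypothetical helper: marks `old` as superseded by `replacement` and, as an
# assumption, ends its validity the day before the replacement takes effect.
def supersede(old, replacement)
  old.status = "superseded"
  old.invalid_at = replacement.valid_at - 1
  old
end

old_fact = StatusFact.new(text: "fact2", status: "canonical", valid_at: Date.new(2023, 1, 1))
new_fact = StatusFact.new(text: "fact1", status: "canonical", valid_at: Date.new(2024, 1, 10))
supersede(old_fact, new_fact)

p old_fact.status          # => "superseded"
p old_fact.invalid_at.to_s # => "2024-01-09"
```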

Relationships

Each fact links to the sources it was extracted from (through `fact_sources`) and to the entities it mentions (through `entity_mentions`):

graph LR
    S[Source] -->|fact_sources| F[Fact]
    F -->|entity_mentions| E1[Entity 1]
    F -->|entity_mentions| E2[Entity 2]

    style S fill:#1E40AF,stroke:#1E3A8A,color:#FFFFFF
    style F fill:#B91C1C,stroke:#991B1B,color:#FFFFFF
    style E1 fill:#047857,stroke:#065F46,color:#FFFFFF
    style E2 fill:#047857,stroke:#065F46,color:#FFFFFF

Layer Interactions

Source to Facts

Facts are extracted from sources and keep links back to them. Each link includes the supporting excerpt:

# Extract facts from source
extracted = facts.extract_facts(source.id, extractor: :llm)

# Each fact links back to source
extracted.first.fact_sources.each do |fs|
  puts fs.source.title
  puts fs.excerpt
end

Entities to Facts

Facts mention entities with specific roles:

fact.entity_mentions.each do |mention|
  puts "#{mention.entity.name}: #{mention.mention_role}"
end
# Output:
# Paula Chen: subject
# Microsoft: organization
# Principal Engineer: role

Cross-Layer Queries

Query across all layers:

# Find all sources about an entity
sources = facts.source_service.mentioning_entity(paula.id)

# Find all entities mentioned in source
entities = facts.entity_service.in_source(source.id)

# Find all facts from a specific source
source_facts = facts.fact_service.from_source(source.id)
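Conceptually, these cross-layer queries are traversals of the two join relations from the Relationships diagram, `fact_sources` and `entity_mentions`. The in-memory sketch below assumes each join row stores a pair of ids plus an excerpt or role. It also assumes that "entities in a source" means entities mentioned by facts extracted from that source. The `Graph` class and its method names are hypothetical and only mirror the calls above. A database-backed implementation would express the same traversals as joins.

```ruby
# Hypothetical join rows mirroring the fact_sources and entity_mentions links.
FactSource    = Struct.new(:fact_id, :source_id, :excerpt)
EntityMention = Struct.new(:fact_id, :entity_id, :mention_role)

class Graph
  def initialize(fact_sources:, entity_mentions:)
    @fact_sources = fact_sources
    @entity_mentions = entity_mentions
  end

  # Fact ids linked to a source (source -> fact).
  def facts_from_source(source_id)
    @fact_sources.select { |fs| fs.source_id == source_id }.map(&:fact_id).uniq
  end

  # Entity ids mentioned by facts from the source (source -> fact -> entity).
  def entities_in_source(source_id)
    fact_ids = facts_from_source(source_id)
    @entity_mentions.select { |m| fact_ids.include?(m.fact_id) }.map(&:entity_id).uniq
  end

  # Source ids behind facts that mention the entity (entity -> fact -> source).
  def sources_mentioning_entity(entity_id)
    fact_ids = @entity_mentions.select { |m| m.entity_id == entity_id }.map(&:fact_id)
    @fact_sources.select { |fs| fact_ids.include?(fs.fact_id) }.map(&:source_id).uniq
  end
end

graph = Graph.new(
  fact_sources: [FactSource.new(10, 1, "accepted the offer"), FactSource.new(11, 2, nil)],
  entity_mentions: [EntityMention.new(10, :paula, :subject),
                    EntityMention.new(10, :microsoft, :organization),
                    EntityMention.new(11, :microsoft, :organization)]
)
p graph.entities_in_source(1)                 # => [:paula, :microsoft]
p graph.sources_mentioning_entity(:microsoft) # => [1, 2]
```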