Class: FactDb::Models::Source

Inherits:
ActiveRecord::Base
  • Object
Defined in:
lib/fact_db/models/source.rb

Overview

Represents a source document from which facts are extracted

Sources are immutable content records (emails, transcripts, documents, Slack messages, and so on) that serve as the provenance for extracted facts. Content is deduplicated by its SHA-256 hash.
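The deduplication rule can be sketched in plain Ruby with the standard digest library. This is a minimal illustration, not FactDb's code: the in-memory ContentStore class and its content_hash key are hypothetical stand-ins for however the model actually stores the hash, which this page does not show.

```ruby
require "digest"

# Hypothetical in-memory store, not FactDb code: records are keyed by the
# SHA-256 hex digest of their content, so identical content is stored once.
class ContentStore
  def initialize
    @by_hash = {}
  end

  # Returns the existing record for identical content, or stores a new one.
  def find_or_add(content, kind:)
    digest = Digest::SHA256.hexdigest(content)
    @by_hash[digest] ||= { content: content, kind: kind, content_hash: digest }
  end

  def size
    @by_hash.size
  end
end

store = ContentStore.new
original  = store.find_or_add("Meeting notes...", kind: "meeting_notes")
duplicate = store.find_or_add("Meeting notes...", kind: "meeting_notes")
other     = store.find_or_add("Different notes", kind: "meeting_notes")

p store.size                   # => 2
p original.equal?(duplicate)   # => true
```

Because the key depends only on the content, a second capture of the same text returns the first record instead of creating a new one.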

Examples:

Create a source

source = Source.create!(content: "Meeting notes...", kind: "meeting_notes", captured_at: Time.now)

Search sources

Source.search_text("quarterly report").by_kind("document")

Constant Summary

KINDS =

Returns valid source content kinds.

Returns:

  • (Array<String>)

    valid source content kinds

%w[email transcript document slack meeting_notes contract report].freeze
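Since by_kind passes its argument straight to where(kind: k), a caller may want to check a kind against KINDS first. A small sketch: check_kind! is a hypothetical helper, not part of FactDb, and the constant is copied here so the snippet runs without the gem.

```ruby
# Copied from FactDb::Models::Source::KINDS so the snippet runs standalone.
KINDS = %w[email transcript document slack meeting_notes contract report].freeze

# Hypothetical helper (not part of FactDb): fail fast on an unknown kind,
# since by_kind itself applies no validation before where(kind: k).
def check_kind!(kind)
  raise ArgumentError, "unknown source kind: #{kind.inspect}" unless KINDS.include?(kind)

  kind
end

check_kind!("transcript") # returns "transcript"

begin
  check_kind!("pdf")
rescue ArgumentError => e
  puts e.message # prints: unknown source kind: "pdf"
end
```

In an application this check would more likely live in a model validation such as validates :kind, inclusion: { in: KINDS }; whether Source already declares one is not shown on this page.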

Class Method Details

.nearest_neighbors(embedding, limit: 10) ⇒ ActiveRecord::Relation

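Finds sources by vector similarity using pgvector. Results are ordered by the <=> operator (cosine distance), closest first; a nil embedding returns an empty relation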

Parameters:

  • embedding (Array<Float>)

    the embedding vector to search with

  • limit (Integer) (defaults to: 10)

    maximum number of results

Returns:

  • (ActiveRecord::Relation)

    sources ordered by similarity



# File 'lib/fact_db/models/source.rb', line 74

def self.nearest_neighbors(embedding, limit: 10)
  return none unless embedding

  order(Arel.sql("embedding <=> '#{embedding}'")).limit(limit)
end
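The ranking can be reproduced in memory to see what ordering by cosine distance (1 minus cosine similarity) means. This plain-Ruby sketch is for illustration only; cosine_distance, nearest, and the sample rows are hypothetical, and it assumes non-zero vectors of equal length.

```ruby
# Cosine distance, the quantity pgvector's <=> operator returns: 1 - cos(a, b).
# Assumes both vectors are non-zero and the same length.
def cosine_distance(a, b)
  raise ArgumentError, "dimension mismatch" unless a.size == b.size

  dot    = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  1.0 - dot / (norm_a * norm_b)
end

# In-memory analogue of nearest_neighbors: sort ascending by distance, take limit.
def nearest(query, rows, limit: 10)
  rows.sort_by { |row| cosine_distance(query, row[:embedding]) }.first(limit)
end

rows = [
  { id: 1, embedding: [1.0, 0.0] },
  { id: 2, embedding: [0.0, 1.0] },
  { id: 3, embedding: [0.7, 0.7] }
]

p nearest([1.0, 0.1], rows, limit: 2).map { |r| r[:id] } # => [1, 3]
```

A distance of 0 means the vectors point the same way and 1 means they are orthogonal, which is why the closest sources come first in an ascending order.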

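Instance Method Details

The entries below documented as #by_kind, #captured_after, #captured_before, #captured_between, and #search_text are ActiveRecord scopes, so they are called on the class or on a relation (for example Source.by_kind("email")), not on a single record.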

#by_kind(k) ⇒ ActiveRecord::Relation

Returns sources of a specific kind

Parameters:

  • k (String)

    the source kind, e.g. one of KINDS

Returns:

  • (ActiveRecord::Relation)


# File 'lib/fact_db/models/source.rb', line 40

scope :by_kind, ->(k) { where(kind: k) }

#captured_after(date) ⇒ ActiveRecord::Relation

Returns sources captured on or after a date (captured_at >= date)

Parameters:

  • date (Date, Time)

    the cutoff date

Returns:

  • (ActiveRecord::Relation)


# File 'lib/fact_db/models/source.rb', line 53

scope :captured_after, ->(date) { where("captured_at >= ?", date) }

#captured_before(date) ⇒ ActiveRecord::Relation

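Returns sources captured on or before a date (captured_at <= date). A bare Date compares as midnight at the start of that day, so sources captured later that day are excluded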

Parameters:

  • date (Date, Time)

    the cutoff date

Returns:

  • (ActiveRecord::Relation)


# File 'lib/fact_db/models/source.rb', line 59

scope :captured_before, ->(date) { where("captured_at <= ?", date) }

#captured_between(from, to) ⇒ ActiveRecord::Relation

Returns sources captured within an inclusive range (from..to, which ActiveRecord translates to SQL BETWEEN)

Parameters:

  • from (Date, Time)

    start of range

  • to (Date, Time)

    end of range

Returns:

  • (ActiveRecord::Relation)


# File 'lib/fact_db/models/source.rb', line 47

scope :captured_between, ->(from, to) { where(captured_at: from..to) }
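The boundary behavior of the three date scopes can be mimicked with plain Ruby Time values. This sketch filters an in-memory array instead of querying the database; it assumes captured_at is a timestamp column and that a bare date compares as midnight, which is how PostgreSQL casts a date literal compared against a timestamp.

```ruby
# In-memory stand-ins for the captured_at column (UTC timestamps).
captured = [
  Time.utc(2024, 1, 31, 0, 0, 0),   # exactly midnight
  Time.utc(2024, 1, 31, 15, 30, 0), # later the same day
  Time.utc(2024, 2, 1, 9, 0, 0)     # next morning
]
cutoff = Time.utc(2024, 1, 31) # what a bare 2024-01-31 date amounts to: midnight

# captured_after(cutoff): captured_at >= cutoff, so the boundary value is kept
after = captured.select { |t| t >= cutoff }

# captured_before(cutoff): captured_at <= cutoff, so 15:30 that day is excluded
before = captured.select { |t| t <= cutoff }

# captured_between(cutoff, to): inclusive Range, like SQL BETWEEN
to = Time.utc(2024, 2, 1)
between = captured.select { |t| (cutoff..to).cover?(t) }

p after.size   # => 3
p before.size  # => 1
p between.size # => 2
```

To include all of a final day with captured_before or captured_between, pass the end of that day (for example ActiveSupport's Date#end_of_day) rather than the bare date.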

#immutable? ⇒ Boolean

Returns whether the source is immutable, i.e. whether its content is protected from modification

Sources are always immutable to preserve provenance integrity.

Returns:

  • (Boolean)

    always returns true



# File 'lib/fact_db/models/source.rb', line 85

def immutable?
  true
end

#preview(length: 200) ⇒ String

Returns a preview of the content, truncated if needed

Parameters:

  • length (Integer) (defaults to: 200)

    number of characters kept before "..." is appended (default: 200)

Returns:

  • (String)

    content preview, with "..." appended if truncated (at most length + 3 characters)



# File 'lib/fact_db/models/source.rb', line 100

def preview(length: 200)
  return content if content.length <= length

  "#{content[0, length]}..."
end
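Because preview appends "..." after length characters, a truncated preview is length + 3 characters long. A standalone copy of the method's logic shows the boundary; the only change is that it takes the content as an argument so it runs without a database record.

```ruby
# Same logic as Source#preview, but taking the content as an argument so it
# can run without a database record.
def preview(content, length: 200)
  return content if content.length <= length

  "#{content[0, length]}..."
end

p preview("Quarterly report", length: 20) # => "Quarterly report" (16 chars, unchanged)
p preview("abcdef", length: 3)            # => "abc..."
p preview("a" * 250).length               # => 203 (200 kept + "...")
```

If the preview must fit within exactly length characters, one option is ActiveSupport's String#truncate, which counts the omission string toward the limit.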

#search_text(query) ⇒ ActiveRecord::Relation

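Full-text search on source content using PostgreSQL's to_tsvector and plainto_tsquery with the english configuration. Query words are stemmed, stop words are dropped, and all remaining words must match. Because the tsvector is computed at query time, PostgreSQL can use an index only if one exists on the same to_tsvector('english', content) expression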

Parameters:

  • query (String)

    the search query

Returns:

  • (ActiveRecord::Relation)


# File 'lib/fact_db/models/source.rb', line 65

scope :search_text, lambda { |query|
  where("to_tsvector('english', content) @@ plainto_tsquery('english', ?)", query)
}
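The matching rule of plainto_tsquery (every query word must occur, in any order) can be approximated in plain Ruby. This is a rough, hypothetical stand-in, not PostgreSQL's parser: rough_text_match? does no stemming and drops no stop words.

```ruby
# Rough stand-in for
#   to_tsvector('english', content) @@ plainto_tsquery('english', query)
# Unlike PostgreSQL it does no stemming and drops no stop words; it only shows
# that every query word must occur in the content, in any order (AND semantics).
def rough_text_match?(content, query)
  words = content.downcase.scan(/\w+/)
  terms = query.downcase.scan(/\w+/)
  terms.all? { |term| words.include?(term) }
end

p rough_text_match?("Q3 quarterly report attached", "quarterly report")    # => true
p rough_text_match?("report for the quarterly review", "quarterly report") # => true
p rough_text_match?("quarterly numbers", "quarterly report")               # => false
```

In PostgreSQL itself, english stemming also lets "reports" in the content match a query for "report"; the sketch would miss that.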

#word_count ⇒ Integer

Returns the word count of the content

Returns:

  • (Integer)

    number of whitespace-separated tokens in content



# File 'lib/fact_db/models/source.rb', line 92

def word_count
  content.split.size
end
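word_count relies on String#split with no argument, which splits on runs of whitespace and ignores leading and trailing whitespace. The short sketch below shows what that counts as a word; the sample strings are illustrative only.

```ruby
# Source#word_count is content.split.size; String#split with no argument splits
# on runs of whitespace and ignores leading and trailing whitespace.
notes = "  Meeting notes:\n\tQ3 revenue up 10%  "
p notes.split      # => ["Meeting", "notes:", "Q3", "revenue", "up", "10%"]
p notes.split.size # => 6

# Punctuation counts as a word when it stands alone; hyphenated words count once.
p "Q3 revenue - year-over-year".split.size # => 4
p "".split.size                            # => 0
```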