Class: FactDb::Services::SourceService

Inherits:
Object
Defined in:
lib/fact_db/services/source_service.rb

Overview

Service class for managing source documents in the database

Provides methods for creating, searching, and retrieving source documents, the original content from which facts are extracted.

Examples:

Basic usage

service = SourceService.new
source = service.create("Meeting notes...", kind: :document)

Constructor Details

#initialize(config = FactDb.config) ⇒ SourceService

Initializes a new SourceService instance

Parameters:

  • config (FactDb::Config) (defaults to: FactDb.config)

    configuration object (defaults to FactDb.config)



# File 'lib/fact_db/services/source_service.rb', line 21

def initialize(config = FactDb.config)
  @config = config
end

Instance Attribute Details

#config ⇒ FactDb::Config (readonly)

Returns the configuration object.

Returns:

  • (FactDb::Config)

    the configuration object



# File 'lib/fact_db/services/source_service.rb', line 16

def config
  @config
end

Instance Method Details

#between(from, to) ⇒ ActiveRecord::Relation

Returns sources captured between two dates, ordered by capture time (oldest first)

Parameters:

  • from (Date, Time)

    start of range

  • to (Date, Time)

    end of range

Returns:

  • (ActiveRecord::Relation)

    sources in the date range



# File 'lib/fact_db/services/source_service.rb', line 126

def between(from, to)
  Models::Source.captured_between(from, to).order(captured_at: :asc)
end

#by_kind(kind, limit: nil) ⇒ ActiveRecord::Relation

Returns sources of a specific kind, most recently captured first

Parameters:

  • kind (Symbol, String)

    the source kind

  • limit (Integer, nil) (defaults to: nil)

    maximum number of results

Returns:

  • (ActiveRecord::Relation)

    sources of that kind



# File 'lib/fact_db/services/source_service.rb', line 115

def by_kind(kind, limit: nil)
  scope = Models::Source.by_kind(kind).order(captured_at: :desc)
  scope = scope.limit(limit) if limit
  scope
end

#create(content, kind:, captured_at: Time.current, metadata: {}, title: nil, source_uri: nil) ⇒ FactDb::Models::Source

Creates a new source document in the database

Automatically deduplicates by SHA256 content hash: if a source with identical content already exists, that source is returned unchanged and no new record is created.

Examples:

Create a source with metadata

service.create("Email content...",
  kind: :email,
  captured_at: Time.parse("2024-01-15"),
  metadata: { from: "john@example.com" })

Parameters:

  • content (String)

    the source content text

  • kind (Symbol, String)

    source kind (:document, :email, :transcript, etc.)

  • captured_at (Time) (defaults to: Time.current)

    when the source was captured (defaults to now)

  • metadata (Hash) (defaults to: {})

    additional metadata

  • title (String, nil) (defaults to: nil)

    optional title

  • source_uri (String, nil) (defaults to: nil)

    optional URI of the original source

Returns:

  • (FactDb::Models::Source)

    the newly created source, or the existing source with identical content



# File 'lib/fact_db/services/source_service.rb', line 42

def create(content, kind:, captured_at: Time.current, metadata: {}, title: nil, source_uri: nil)
  content_hash = Digest::SHA256.hexdigest(content)

  # Check for duplicate content
  existing = Models::Source.find_by(content_hash: content_hash)
  return existing if existing

  embedding = generate_embedding(content)

  Models::Source.create!(
    content: content,
    content_hash: content_hash,
    kind: kind.to_s,
    title: title,
    source_uri: source_uri,
    metadata: metadata,
    captured_at: captured_at,
    embedding: embedding
  )
end
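
The deduplication step can be tried out without a database. The following is a minimal sketch, assuming an in-memory Hash in place of the sources table; InMemorySourceStore and its Source struct are hypothetical names used only for illustration and are not part of FactDb.

```ruby
require "digest"

# Hypothetical in-memory stand-in for the sources table, keyed by content hash.
class InMemorySourceStore
  Source = Struct.new(:content, :content_hash, :kind, keyword_init: true)

  def initialize
    @by_hash = {}
  end

  # Returns the stored record when the SHA256 of the content is already known;
  # otherwise stores and returns a new record.
  def create(content, kind:)
    content_hash = Digest::SHA256.hexdigest(content)
    @by_hash[content_hash] ||= Source.new(
      content: content,
      content_hash: content_hash,
      kind: kind.to_s
    )
  end

  def size
    @by_hash.size
  end
end

store  = InMemorySourceStore.new
first  = store.create("Meeting notes...", kind: :document)
second = store.create("Meeting notes...", kind: :email)

puts first.equal?(second) # true: the same record is returned
puts second.kind          # document: attributes of the first insert win
puts store.size           # 1
```

As in #create above, a duplicate is detected from the content alone, so the kind (and, in the real method, the title, metadata, and captured_at) passed with the second call is ignored.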

#find(id) ⇒ FactDb::Models::Source

Finds a source by ID

Parameters:

  • id (Integer)

    the source ID

Returns:

  • (FactDb::Models::Source)

    the source with the given ID

Raises:

  • (ActiveRecord::RecordNotFound)

    if source not found



# File 'lib/fact_db/services/source_service.rb', line 68

def find(id)
  Models::Source.find(id)
end

#find_by_hash(hash) ⇒ FactDb::Models::Source?

Finds a source by content hash

Parameters:

  • hash (String)

    the SHA256 content hash

Returns:

  • (FactDb::Models::Source, nil)

    the matching source, or nil if no source has this hash



# File 'lib/fact_db/services/source_service.rb', line 76

def find_by_hash(hash)
  Models::Source.find_by(content_hash: hash)
end
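
The two lookups differ in how they report a miss: #find raises, while #find_by_hash returns nil. The sketch below illustrates that contract with an in-memory store. TinyStore and NotFound are hypothetical names; NotFound merely stands in for ActiveRecord::RecordNotFound.

```ruby
require "digest"

# Hypothetical error class standing in for ActiveRecord::RecordNotFound.
class NotFound < StandardError; end

# Hypothetical in-memory store used only to show the two lookup contracts.
class TinyStore
  def initialize
    @rows = {} # id => { id:, content:, content_hash: }
  end

  def add(content)
    id = @rows.size + 1
    @rows[id] = { id: id, content: content,
                  content_hash: Digest::SHA256.hexdigest(content) }
  end

  # Strict lookup: raises when the ID is unknown.
  def find(id)
    @rows.fetch(id) { raise NotFound, "no source with id #{id}" }
  end

  # Lenient lookup: returns nil when no row has this hash.
  def find_by_hash(hash)
    @rows.values.find { |row| row[:content_hash] == hash }
  end
end

store = TinyStore.new
row   = store.add("Meeting notes...")

found   = store.find(row[:id])
missing = store.find_by_hash(Digest::SHA256.hexdigest("unknown content"))

error_message =
  begin
    store.find(99)
    nil
  rescue NotFound => e
    e.message
  end

puts found[:content]      # Meeting notes...
puts missing.inspect      # nil
puts error_message        # no source with id 99
```

Callers that expect a miss as a normal outcome (for example, checking for a duplicate before import) would typically use the hash lookup and test for nil rather than rescue an exception.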

#recent(limit: 10) ⇒ ActiveRecord::Relation

Returns recently captured sources

Parameters:

  • limit (Integer) (defaults to: 10)

    maximum number of results

Returns:

  • (ActiveRecord::Relation)

    recent sources ordered by capture date



# File 'lib/fact_db/services/source_service.rb', line 134

def recent(limit: 10)
  Models::Source.order(captured_at: :desc).limit(limit)
end

#search(query, kind: nil, from: nil, to: nil, limit: 20) ⇒ ActiveRecord::Relation

Searches sources using full-text search with optional filters

Parameters:

  • query (String)

    the search query

  • kind (Symbol, String, nil) (defaults to: nil)

    optional kind filter

  • from (Date, Time, nil) (defaults to: nil)

    captured after this date

  • to (Date, Time, nil) (defaults to: nil)

    captured before this date

  • limit (Integer) (defaults to: 20)

    maximum number of results

Returns:

  • (ActiveRecord::Relation)

    matching sources



# File 'lib/fact_db/services/source_service.rb', line 88

def search(query, kind: nil, from: nil, to: nil, limit: 20)
  scope = Models::Source.search_text(query)
  scope = scope.by_kind(kind) if kind
  scope = scope.captured_after(from) if from
  scope = scope.captured_before(to) if to
  scope.order(captured_at: :desc).limit(limit)
end
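
The method body applies each filter only when the corresponding argument is given, then orders and limits the result. A rough in-memory sketch of the same pattern follows. It assumes a case-insensitive substring match in place of full-text search and inclusive date bounds; neither is confirmed by the source, and Doc and filter_docs are hypothetical names.

```ruby
# Hypothetical in-memory record; the real method works on Models::Source scopes.
Doc = Struct.new(:content, :kind, :captured_at, keyword_init: true)

# Applies each optional filter only when its argument is present, then sorts
# newest first and truncates to `limit`.
def filter_docs(docs, query, kind: nil, from: nil, to: nil, limit: 20)
  result = docs.select { |d| d.content.downcase.include?(query.downcase) }
  result = result.select { |d| d.kind == kind.to_s } if kind
  result = result.select { |d| d.captured_at >= from } if from # assumed inclusive
  result = result.select { |d| d.captured_at <= to } if to     # assumed inclusive
  result.sort_by(&:captured_at).reverse.first(limit)
end

docs = [
  Doc.new(content: "Budget review notes", kind: "document", captured_at: Time.utc(2024, 1, 10)),
  Doc.new(content: "Budget email thread", kind: "email",    captured_at: Time.utc(2024, 2, 1)),
  Doc.new(content: "Hiring plan",         kind: "document", captured_at: Time.utc(2024, 3, 1))
]

all_hits = filter_docs(docs, "budget")
hits     = filter_docs(docs, "budget", from: Time.utc(2024, 1, 15))

puts all_hits.map(&:kind).inspect # ["email", "document"]
puts hits.map(&:kind).inspect     # ["email"]
```

Because every filter is optional and reassigns the same variable, adding another criterion is a one-line change, which is the same property the chained scopes give #search.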

#semantic_search(query, limit: 20) ⇒ ActiveRecord::Relation

Searches sources using semantic similarity (vector search)

Requires an embedding generator to be configured. If no embedding can be generated for the query, an empty relation is returned.

Parameters:

  • query (String)

    the search query

  • limit (Integer) (defaults to: 20)

    maximum number of results

Returns:

  • (ActiveRecord::Relation)

    semantically similar sources



# File 'lib/fact_db/services/source_service.rb', line 103

def semantic_search(query, limit: 20)
  embedding = generate_embedding(query)
  return Models::Source.none unless embedding

  Models::Source.nearest_neighbors(embedding, limit: limit)
end
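
To make the idea of a nearest-neighbor lookup concrete, here is a small in-memory sketch. It assumes cosine similarity as the ranking measure, which the source does not state, and uses hand-written two-dimensional vectors instead of real embeddings. cosine_similarity and nearest are hypothetical helpers, not FactDb methods.

```ruby
# Cosine similarity between two equal-length numeric vectors.
def cosine_similarity(a, b)
  dot  = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norm.zero? ? 0.0 : dot / norm
end

# Returns the `limit` entries most similar to the query vector, best match
# first, or an empty list when no query embedding is available.
def nearest(entries, query_embedding, limit: 20)
  return [] unless query_embedding

  entries.max_by(limit) { |entry| cosine_similarity(entry[:embedding], query_embedding) }
end

entries = [
  { title: "A", embedding: [1.0, 0.0] },
  { title: "B", embedding: [0.0, 1.0] },
  { title: "C", embedding: [0.7, 0.7] }
]

top_two = nearest(entries, [1.0, 0.1], limit: 2)
nothing = nearest(entries, nil)

puts top_two.map { |e| e[:title] }.inspect # ["A", "C"]
puts nothing.inspect                       # []
```

The early return mirrors the guard in #semantic_search: without a query embedding there is nothing to compare against, so the result is empty rather than an error.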

#stats ⇒ Hash

Returns aggregate statistics about sources

Returns:

  • (Hash)

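    statistics with the keys total and total_count (both the number of sources), by_kind (counts per kind), earliest and latest (capture date range), and total_words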



# File 'lib/fact_db/services/source_service.rb', line 141

def stats
  {
    total: Models::Source.count,
    total_count: Models::Source.count,
    by_kind: Models::Source.group(:kind).count,
    earliest: Models::Source.minimum(:captured_at),
    latest: Models::Source.maximum(:captured_at),
    total_words: Models::Source.sum("array_length(regexp_split_to_array(content, '\\s+'), 1)")
  }
end
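
The shape of the returned hash can be illustrated with plain Ruby collections. This is only a sketch: the real method runs SQL aggregates, and the word count here uses String#split, whose handling of leading or trailing whitespace and empty strings may differ from the database expression. SourceRow and summarize are hypothetical names.

```ruby
# Hypothetical in-memory record standing in for a Models::Source row.
SourceRow = Struct.new(:kind, :captured_at, :content, keyword_init: true)

# Builds a hash with the same keys as #stats from an in-memory list.
def summarize(rows)
  {
    total: rows.size,
    total_count: rows.size,
    by_kind: rows.map(&:kind).tally,
    earliest: rows.map(&:captured_at).min,
    latest: rows.map(&:captured_at).max,
    total_words: rows.sum { |r| r.content.split(/\s+/).size }
  }
end

rows = [
  SourceRow.new(kind: "document", captured_at: Time.utc(2024, 1, 10), content: "Meeting notes from Monday"),
  SourceRow.new(kind: "email",    captured_at: Time.utc(2024, 2, 1),  content: "Re: budget")
]

summary = summarize(rows)

puts summary[:total]           # 2
puts summary[:by_kind].inspect # {"document"=>1, "email"=>1}
puts summary[:total_words]     # 6
```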