Class: FactDb::Services::EntityService

Inherits:
Object
Defined in:
lib/fact_db/services/entity_service.rb

Overview

Service class for managing entities in the database

Provides methods for creating, searching, and managing entities including name resolution, alias management, and duplicate detection.

Examples:

Basic usage

service = FactDb::Services::EntityService.new
entity = service.create("John Smith", kind: :person)

Constructor Details

#initialize(config = FactDb.config) ⇒ EntityService

Initializes a new EntityService instance

Parameters:

  • config (FactDb::Config) (defaults to: FactDb.config)

    configuration object



# File 'lib/fact_db/services/entity_service.rb', line 24

def initialize(config = FactDb.config)
  @config = config
  @resolver = Resolution::EntityResolver.new(config)
end

Instance Attribute Details

#config ⇒ FactDb::Config (readonly)

Returns the configuration object.

Returns:

  • (FactDb::Config)

    the configuration object



# File 'lib/fact_db/services/entity_service.rb', line 16

def config
  @config
end

#resolver ⇒ FactDb::Resolution::EntityResolver (readonly)

Returns the entity resolver instance.

Returns:

  • (FactDb::Resolution::EntityResolver)

    the entity resolver instance



# File 'lib/fact_db/services/entity_service.rb', line 19

def resolver
  @resolver
end

Instance Method Details

#add_alias(entity_id, alias_name, kind: nil, confidence: 1.0) ⇒ FactDb::Models::EntityAlias

Adds an alias to an entity

Parameters:

  • entity_id (Integer)

    the entity ID

  • alias_name (String)

    the alias text

  • kind (String, nil) (defaults to: nil)

    alias kind

  • confidence (Float) (defaults to: 1.0)

    confidence score

Returns:

  • (FactDb::Models::EntityAlias)

    the alias record



# File 'lib/fact_db/services/entity_service.rb', line 141

def add_alias(entity_id, alias_name, kind: nil, confidence: 1.0)
  entity = Models::Entity.find(entity_id)
  entity.add_alias(alias_name, kind: kind, confidence: confidence)
end

#auto_merge_duplicates! ⇒ void

This method returns an undefined value.

Automatically merges high-confidence duplicates



# File 'lib/fact_db/services/entity_service.rb', line 281

def auto_merge_duplicates!
  @resolver.auto_merge_duplicates!
end

#by_kind(kind) ⇒ ActiveRecord::Relation

Returns entities of a specific kind

Parameters:

  • kind (Symbol, String)

    the entity kind

Returns:

  • (ActiveRecord::Relation)

    entities of that kind



# File 'lib/fact_db/services/entity_service.rb', line 242

def by_kind(kind)
  Models::Entity.by_kind(kind).not_merged.order(:name)
end

#create(name, kind:, aliases: [], attributes: {}, description: nil) ⇒ FactDb::Models::Entity

Creates a new entity in the database

Parameters:

  • name (String)

    the canonical name

  • kind (Symbol, String)

    entity kind (:person, :organization, etc.)

  • aliases (Array<String>) (defaults to: [])

    alternative names

  • attributes (Hash) (defaults to: {})

    additional metadata attributes

  • description (String, nil) (defaults to: nil)

    entity description

Returns:

  • (FactDb::Models::Entity)

    the created entity



# File 'lib/fact_db/services/entity_service.rb', line 37

def create(name, kind:, aliases: [], attributes: {}, description: nil)
  embedding = generate_embedding(name)

  entity = Models::Entity.create!(
    name: name,
    kind: kind.to_s,
    description: description,
    metadata: attributes,
    resolution_status: "resolved",
    embedding: embedding
  )

  aliases.each do |alias_text|
    entity.add_alias(alias_text)
  end

  entity
end

#facts_about(entity_id, at: nil, status: :canonical) ⇒ ActiveRecord::Relation

Returns facts about an entity

Parameters:

  • entity_id (Integer)

    the entity ID

  • at (Date, Time, nil) (defaults to: nil)

    optional point in time

  • status (Symbol) (defaults to: :canonical)

    fact status filter

Returns:

  • (ActiveRecord::Relation)

    facts mentioning the entity



# File 'lib/fact_db/services/entity_service.rb', line 252

def facts_about(entity_id, at: nil, status: :canonical)
  Temporal::Query.new.execute(
    entity_id: entity_id,
    at: at,
    status: status
  )
end

#find(id) ⇒ FactDb::Models::Entity

Finds an entity by ID

Parameters:

  • id (Integer)

    the entity ID

Returns:

  • (FactDb::Models::Entity)

    the entity with the given ID

Raises:

  • (ActiveRecord::RecordNotFound)

    if entity not found



# File 'lib/fact_db/services/entity_service.rb', line 61

def find(id)
  Models::Entity.find(id)
end

#find_by_name(name, kind: nil) ⇒ FactDb::Models::Entity?

Finds an entity by exact name match

Parameters:

  • name (String)

    the entity name (case-insensitive)

  • kind (Symbol, String, nil) (defaults to: nil)

    optional kind filter

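Returns:

  • (FactDb::Models::Entity, nil)

    the first matching entity that has not been merged, or nil if none is found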



# File 'lib/fact_db/services/entity_service.rb', line 70

def find_by_name(name, kind: nil)
  scope = Models::Entity.where(["LOWER(name) = ?", name.downcase])
  scope = scope.where(kind: kind) if kind
  scope.not_merged.first
end
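
To illustrate the lookup semantics (case-insensitive exact match, optional kind filter, merged records skipped), here is a minimal in-memory sketch. The NamedEntity struct, the NAMED_ENTITIES data, and sketch_find_by_name are hypothetical stand-ins for illustration, not part of FactDb.

```ruby
# Hypothetical stand-in for a row of Models::Entity.
NamedEntity = Struct.new(:id, :name, :kind, :resolution_status, keyword_init: true)

NAMED_ENTITIES = [
  NamedEntity.new(id: 1, name: "john smith", kind: "person", resolution_status: "merged"),
  NamedEntity.new(id: 2, name: "John Smith", kind: "person", resolution_status: "resolved"),
  NamedEntity.new(id: 3, name: "Acme", kind: "organization", resolution_status: "resolved")
].freeze

# Mirrors the documented query: LOWER(name) = ?, optional kind, not merged.
def sketch_find_by_name(entities, name, kind: nil)
  entities.find do |e|
    e.name.downcase == name.downcase &&
      (kind.nil? || e.kind == kind.to_s) &&
      e.resolution_status != "merged"
  end
end

# The merged record (id 1) is skipped even though its name also matches.
match = sketch_find_by_name(NAMED_ENTITIES, "JOHN SMITH", kind: :person)
```

A name that matches only under a different kind yields nil, which is why the documented return type is nilable.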

#find_duplicates(threshold: nil) ⇒ Array<Hash>

Finds potential duplicate entities

Parameters:

  • threshold (Float, nil) (defaults to: nil)

    minimum similarity score

Returns:

  • (Array<Hash>)

    array of potential duplicates



# File 'lib/fact_db/services/entity_service.rb', line 274

def find_duplicates(threshold: nil)
  @resolver.find_duplicates(threshold: threshold)
end

#fuzzy_search(query, kind: nil, threshold: 0.3, limit: 20) ⇒ Array<FactDb::Models::Entity>

Searches entities using PostgreSQL trigram similarity (handles typos)

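Requires the pg_trgm extension. If the trigram query raises ActiveRecord::StatementInvalid (for example because pg_trgm is not installed), the method logs a warning and falls back to the LIKE-based #search. The kind filter is applied in Ruby after the SQL LIMIT, so a filtered call may return fewer than limit results even when more matches exist.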

Parameters:

  • query (String)

    search term (minimum 3 characters)

  • kind (Symbol, String, nil) (defaults to: nil)

    optional kind filter

  • threshold (Float) (defaults to: 0.3)

    minimum similarity score (0.0-1.0)

  • limit (Integer) (defaults to: 20)

    maximum number of results

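Returns:

  • (Array<FactDb::Models::Entity>)

    matching entities, ordered by descending similarity when pg_trgm is available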



# File 'lib/fact_db/services/entity_service.rb', line 192

def fuzzy_search(query, kind: nil, threshold: 0.3, limit: 20)
  return [] if query.to_s.strip.length < 3

  sql = <<~SQL
    SELECT DISTINCT e.id,
           GREATEST(
             similarity(LOWER(e.name), LOWER(?)),
             COALESCE(MAX(similarity(LOWER(a.name), LOWER(?))), 0)
           ) as sim_score
    FROM fact_db_entities e
    LEFT JOIN fact_db_entity_aliases a ON a.entity_id = e.id
    WHERE e.resolution_status != 'merged'
      AND (
        similarity(LOWER(e.name), LOWER(?)) > ?
        OR similarity(LOWER(a.name), LOWER(?)) > ?
      )
    GROUP BY e.id
    ORDER BY sim_score DESC
    LIMIT ?
  SQL

  sanitized = ActiveRecord::Base.sanitize_sql(
    [sql, query, query, query, threshold, query, threshold, limit]
  )

  results = ActiveRecord::Base.connection.execute(sanitized)
  entity_ids = results.map { |r| r["id"] }

  return [] if entity_ids.empty?

  # Preserve ordering by fetching in order
  entities_by_id = Models::Entity.where(id: entity_ids).index_by(&:id)
  ordered_entities = entity_ids.map { |id| entities_by_id[id] }.compact

  # Apply kind filter if specified
  if kind
    ordered_entities = ordered_entities.select { |e| e.kind == kind.to_s }
  end

  ordered_entities
rescue ActiveRecord::StatementInvalid => e
  # pg_trgm extension not available, fall back to LIKE search
  config.logger&.warn("Fuzzy search unavailable (pg_trgm not installed): #{e.message}")
  search(query, kind: kind, limit: limit).to_a
end
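
For intuition about the threshold parameter, the following pure-Ruby sketch approximates what the pg_trgm similarity function computes: the number of shared trigrams divided by the number of distinct trigrams across both strings. The helper names trigrams and trigram_similarity are hypothetical, and the word splitting is a simplification of PostgreSQL's behaviour, so treat the scores as approximate.

```ruby
require "set"

# Approximate pg_trgm trigram extraction: lowercase, split into alphanumeric
# words, pad each word with two leading spaces and one trailing space.
def trigrams(text)
  text.downcase.scan(/[[:alnum:]]+/).each_with_object(Set.new) do |word, set|
    padded = "  #{word} "
    (0..padded.length - 3).each { |i| set << padded[i, 3] }
  end
end

# Shared trigrams divided by all distinct trigrams, between 0.0 and 1.0.
def trigram_similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  union = ta | tb
  return 0.0 if union.empty?

  (ta & tb).size.to_f / union.size
end

# A one-letter typo shares 8 of 13 distinct trigrams with the correct name,
# which clears the default threshold of 0.3.
score = trigram_similarity("Jon Smith", "John Smith")
```

Raising the threshold toward 1.0 narrows results to near-exact spellings, while lowering it admits looser matches.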

#merge(keep_id, merge_id) ⇒ FactDb::Models::Entity

Merges two entities, keeping one as canonical

Parameters:

  • keep_id (Integer)

    ID of the entity to keep

  • merge_id (Integer)

    ID of the entity to merge

Returns:

  • (FactDb::Models::Entity)



# File 'lib/fact_db/services/entity_service.rb', line 130

def merge(keep_id, merge_id)
  @resolver.merge(keep_id, merge_id)
end

#relationship_types ⇒ Array<Symbol>

Returns all relationship types used in the database

Returns:

  • (Array<Symbol>)

    relationship types (mention roles)



# File 'lib/fact_db/services/entity_service.rb', line 302

def relationship_types
  Models::EntityMention.distinct.pluck(:mention_role).compact.map(&:to_sym)
end

#relationship_types_for(entity_id) ⇒ Array<Symbol>

Returns relationship types for a specific entity

Parameters:

  • entity_id (Integer)

    the entity ID

Returns:

  • (Array<Symbol>)

    relationship types (mention roles) for this entity



# File 'lib/fact_db/services/entity_service.rb', line 310

def relationship_types_for(entity_id)
  Models::EntityMention
    .where(entity_id: entity_id)
    .distinct
    .pluck(:mention_role)
    .compact
    .map(&:to_sym)
end
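
The query chain above (filter by entity, distinct roles, drop NULLs, symbolize) can be mimicked in memory as follows. MentionRow, MENTION_ROWS, the role names, and sketch_roles_for are hypothetical and exist only for this illustration.

```ruby
# Hypothetical stand-in for rows of Models::EntityMention.
MentionRow = Struct.new(:entity_id, :mention_role)

MENTION_ROWS = [
  MentionRow.new(1, "subject"),
  MentionRow.new(1, "object"),
  MentionRow.new(1, "subject"),
  MentionRow.new(1, nil),
  MentionRow.new(2, "location")
].freeze

# In-memory equivalent of where(...).distinct.pluck(:mention_role).compact.map(&:to_sym).
def sketch_roles_for(rows, entity_id)
  rows.select { |r| r.entity_id == entity_id }
      .map(&:mention_role)
      .uniq
      .compact
      .map(&:to_sym)
end

roles = sketch_roles_for(MENTION_ROWS, 1)
```

The compact step matters because a NULL mention_role would otherwise raise NoMethodError on to_sym. Note that a database DISTINCT makes no ordering promise, whereas this sketch keeps first-seen order.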

#resolve(name, kind: nil) ⇒ FactDb::Resolution::ResolvedEntity?

Resolves a name to an existing entity

Uses exact alias matching, canonical name matching, and fuzzy matching.

Parameters:

  • name (String)

    the name to resolve

  • kind (Symbol, nil) (defaults to: nil)

    optional kind filter

Returns:

  • (FactDb::Resolution::ResolvedEntity, nil)

    the resolved entity, or nil if no match is found



# File 'lib/fact_db/services/entity_service.rb', line 83

def resolve(name, kind: nil)
  @resolver.resolve(name, kind: kind)
end

#resolve_or_create(name, kind:, aliases: [], attributes: {}, description: nil) ⇒ FactDb::Models::Entity

Resolves a name to an entity, creating one if not found

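If the name itself does not resolve, each provided alias is tried in turn. On a match, the given name and the remaining aliases are added to the matched entity as aliases, and that entity is returned.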

Parameters:

  • name (String)

    the name to resolve or create

  • kind (Symbol, String)

    entity kind (required for creation)

  • aliases (Array<String>) (defaults to: [])

    additional aliases

  • attributes (Hash) (defaults to: {})

    additional attributes for new entity

  • description (String, nil) (defaults to: nil)

    entity description

Returns:

  • (FactDb::Models::Entity)

    the resolved or newly created entity



# File 'lib/fact_db/services/entity_service.rb', line 97

def resolve_or_create(name, kind:, aliases: [], attributes: {}, description: nil)
  # First, try to resolve the canonical name
  resolved = @resolver.resolve(name, kind: kind)
  if resolved
    # Add any new aliases to the resolved entity
    add_new_aliases(resolved.entity, aliases)
    return resolved.entity
  end

  # Check if any of the provided aliases match an existing entity
  # This handles cases like: name="Lord", aliases=["Jesus"] where "Jesus" already exists
  aliases.each do |alias_text|
    next if alias_text.to_s.strip.empty?

    resolved_by_alias = @resolver.resolve(alias_text.to_s.strip, kind: kind)
    if resolved_by_alias
      entity = resolved_by_alias.entity
      # Add the new canonical name as an alias to the existing entity
      entity.add_alias(name) unless entity.name.downcase == name.downcase
      # Add all the other aliases too
      add_new_aliases(entity, aliases)
      return entity
    end
  end

  create(name, kind: kind, aliases: aliases, attributes: attributes, description: description)
end
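
The three-step flow (resolve by name, then by each alias, otherwise create) can be sketched with an in-memory registry. SketchRegistry and its Record struct are hypothetical, and its resolve does only exact case-insensitive matching, whereas the real resolver also applies fuzzy matching.

```ruby
# Hypothetical in-memory registry; FactDb persists entities and delegates
# matching to Resolution::EntityResolver instead.
class SketchRegistry
  Record = Struct.new(:name, :kind, :aliases)

  def initialize
    @records = []
  end

  # Exact, case-insensitive match on canonical name or alias.
  def resolve(name, kind:)
    key = name.downcase
    @records.find do |r|
      r.kind == kind.to_s && ([r.name] + r.aliases).any? { |n| n.downcase == key }
    end
  end

  def resolve_or_create(name, kind:, aliases: [])
    candidates = aliases.map { |a| a.to_s.strip }.reject(&:empty?)

    # 1. Try the canonical name, 2. then each alias in order.
    record = resolve(name, kind: kind)
    candidates.each { |a| record ||= resolve(a, kind: kind) }

    # 3. Nothing matched: create a new record.
    unless record
      record = Record.new(name, kind.to_s, [])
      @records << record
    end

    # Attach any names the record does not already know.
    ([name] + candidates).each do |n|
      known = ([record.name] + record.aliases).map(&:downcase)
      record.aliases << n unless known.include?(n.downcase)
    end
    record
  end
end

registry = SketchRegistry.new
jesus = registry.resolve_or_create("Jesus", kind: :person)
# "Lord" is unknown, but its alias "Jesus" resolves, so no new record is made.
same = registry.resolve_or_create("Lord", kind: :person, aliases: ["Jesus"])
```

After the second call, the existing record carries "Lord" as an alias, which matches the case described in the source comment.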

#search(query, kind: nil, limit: 20) ⇒ ActiveRecord::Relation

Searches entities by name or alias using LIKE pattern matching

Parameters:

  • query (String)

    the search query

  • kind (Symbol, String, nil) (defaults to: nil)

    optional kind filter

  • limit (Integer) (defaults to: 20)

    maximum number of results

Returns:

  • (ActiveRecord::Relation)

    matching entities



# File 'lib/fact_db/services/entity_service.rb', line 152

def search(query, kind: nil, limit: 20)
  scope = Models::Entity.not_merged

  # Search canonical names and aliases
  scope = scope.left_joins(:aliases).where(
    "LOWER(fact_db_entities.name) LIKE ? OR LOWER(fact_db_entity_aliases.name) LIKE ?",
    "%#{query.downcase}%",
    "%#{query.downcase}%"
  ).distinct

  scope = scope.where(kind: kind) if kind
  scope.limit(limit)
end
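
Because the source interpolates the query straight into the pattern, any % or _ in the query acts as a LIKE wildcard. If literal matching is wanted, a caller could escape the query first. The helpers escape_like and like_pattern below are a hypothetical pure-Ruby sketch; in a Rails application, ActiveRecord's sanitize_sql_like serves the same purpose.

```ruby
# Escape LIKE metacharacters so they match literally (backslash is the
# default LIKE escape character in PostgreSQL).
def escape_like(text)
  text.gsub(/[\\%_]/) { |ch| "\\#{ch}" }
end

# Build a case-insensitive "contains" pattern from untrusted input.
def like_pattern(query)
  "%#{escape_like(query.downcase)}%"
end

pattern = like_pattern("50%_Off")
```

Only the outer % characters remain active wildcards in the resulting pattern; those inside the query are prefixed with a backslash.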

#semantic_search(query, kind: nil, limit: 20) ⇒ ActiveRecord::Relation

Searches entities using semantic similarity (vector search)

Requires an embedding generator to be configured. Returns an empty relation when no embedding can be generated for the query.

Parameters:

  • query (String)

    the search query

  • kind (Symbol, String, nil) (defaults to: nil)

    optional kind filter

  • limit (Integer) (defaults to: 20)

    maximum number of results

Returns:

  • (ActiveRecord::Relation)

    semantically similar entities



# File 'lib/fact_db/services/entity_service.rb', line 174

def semantic_search(query, kind: nil, limit: 20)
  embedding = generate_embedding(query)
  return Models::Entity.none unless embedding

  scope = Models::Entity.not_merged.nearest_neighbors(embedding, limit: limit)
  scope = scope.where(kind: kind) if kind
  scope
end
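
Conceptually, a nearest-neighbour search ranks stored embeddings by their closeness to the query embedding. The sketch below uses cosine similarity over toy vectors. The metric, the helper names, and the TOY_EMBEDDINGS data are assumptions for illustration, since the source does not state which distance the nearest_neighbors scope uses.

```ruby
# Cosine similarity between two equal-length numeric vectors.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |y| y * y })
  norm.zero? ? 0.0 : dot / norm
end

# Return the names of the `limit` entries closest to `query_vec`.
def nearest(embeddings, query_vec, limit: 20)
  embeddings
    .max_by(limit) { |_name, vec| cosine_similarity(vec, query_vec) }
    .map(&:first)
end

# Toy 3-dimensional embeddings; real ones come from the configured generator.
TOY_EMBEDDINGS = {
  "John Smith" => [0.9, 0.1, 0.0],
  "Jon Smyth" => [0.8, 0.2, 0.1],
  "Acme Corp" => [0.0, 0.1, 0.9]
}.freeze

top_two = nearest(TOY_EMBEDDINGS, [1.0, 0.0, 0.0], limit: 2)
```

In the service itself the ranking happens in the database, so the kind filter composes with it as part of the same relation.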

#stats ⇒ Hash

Returns aggregate statistics about entities

Returns:

  • (Hash)

    statistics including counts by kind and status



# File 'lib/fact_db/services/entity_service.rb', line 288

def stats
  {
    total: Models::Entity.not_merged.count,
    total_count: Models::Entity.not_merged.count,
    by_kind: Models::Entity.not_merged.group(:kind).count,
    by_status: Models::Entity.group(:resolution_status).count,
    merged_count: Models::Entity.where(resolution_status: "merged").count,
    with_facts: Models::Entity.joins(:entity_mentions).distinct.count
  }
end
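
The shape of the returned hash can be previewed with an in-memory tally. StatRow, sketch_stats, and the "unresolved" status value are hypothetical; the sketch assumes the not_merged scope excludes rows whose resolution_status is "merged", and it omits with_facts, which needs the mentions table.

```ruby
# Hypothetical stand-in for a row of Models::Entity.
StatRow = Struct.new(:kind, :resolution_status)

def sketch_stats(rows)
  active = rows.reject { |r| r.resolution_status == "merged" }
  {
    total: active.size,                              # merged rows excluded
    by_kind: active.map(&:kind).tally,               # merged rows excluded
    by_status: rows.map(&:resolution_status).tally,  # all rows, merged included
    merged_count: rows.count { |r| r.resolution_status == "merged" }
  }
end

summary = sketch_stats([
  StatRow.new("person", "resolved"),
  StatRow.new("person", "merged"),
  StatRow.new("organization", "resolved"),
  StatRow.new("person", "unresolved")
])
```

As in the source, by_status is the only breakdown that still counts merged entities, so its values can sum to more than total.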

#timeline_for(entity_id, from: nil, to: nil) ⇒ FactDb::Temporal::Timeline

Builds a timeline of facts for an entity

Parameters:

  • entity_id (Integer)

    the entity ID

  • from (Date, Time, nil) (defaults to: nil)

    start of timeline range

  • to (Date, Time, nil) (defaults to: nil)

    end of timeline range

Returns:

  • (FactDb::Temporal::Timeline)

    the timeline of facts for the entity



# File 'lib/fact_db/services/entity_service.rb', line 266

def timeline_for(entity_id, from: nil, to: nil)
  Temporal::Timeline.new.build(entity_id: entity_id, from: from, to: to)
end

#timespan_for(entity_id) ⇒ Hash

Returns the timespan of facts for an entity

Parameters:

  • entity_id (Integer)

    the entity ID

Returns:

  • (Hash)

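    hash with :from (the earliest valid_at, or nil when the entity has no facts) and :to (the latest valid_at, or today's date when the entity has no facts)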



# File 'lib/fact_db/services/entity_service.rb', line 323

def timespan_for(entity_id)
  facts = Models::Fact
    .joins(:entity_mentions)
    .where(entity_mentions: { entity_id: entity_id })

  {
    from: facts.minimum(:valid_at),
    to: facts.maximum(:valid_at) || Date.today
  }
end
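
The asymmetric fallback is easiest to see in isolation. In this hypothetical sketch, sketch_timespan takes a plain array of valid_at dates instead of querying Models::Fact, and accepts today as a parameter so the result is reproducible.

```ruby
require "date"

# Earliest and latest dates; only :to falls back when the list is empty.
def sketch_timespan(valid_ats, today: Date.today)
  { from: valid_ats.min, to: valid_ats.max || today }
end

dates = [Date.new(2023, 6, 15), Date.new(2020, 1, 1)]
span = sketch_timespan(dates)
empty_span = sketch_timespan([], today: Date.new(2024, 1, 1))
```

For an entity without facts, :from is nil while :to is today's date, so callers that compute a duration should check :from before subtracting.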