Class: FactDb::Resolution::EntityResolver

Inherits:
Object
Defined in:
lib/fact_db/resolution/entity_resolver.rb

Overview

Resolves entity names to canonical entities in the database

Provides entity resolution through exact alias matching, canonical name matching, and fuzzy matching using Levenshtein distance. Also handles entity merging, splitting, and duplicate detection.

Examples:

Basic usage

resolver = EntityResolver.new
resolved = resolver.resolve("John Smith", kind: :person)
if resolved
  puts "Found: #{resolved.entity.name} (confidence: #{resolved.confidence})"
end
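
The fuzzy stage mentioned in the overview relies on Levenshtein distance. The source does not show how a distance becomes a similarity score, so the following is only a minimal sketch under one common assumption: the distance is divided by the length of the longer string and subtracted from 1. The names levenshtein and name_similarity, and the case-folding step, are hypothetical and not part of FactDb.

```ruby
# Hypothetical helper: classic dynamic-programming Levenshtein distance.
def levenshtein(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  # prev[j] is the distance between the processed prefix of a and b[0, j].
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    curr = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Assumed normalization: 1.0 for identical names, approaching 0.0 as they diverge.
def name_similarity(a, b)
  a = a.downcase.strip
  b = b.downcase.strip
  longest = [a.length, b.length].max
  return 1.0 if longest.zero?

  1.0 - levenshtein(a, b).to_f / longest
end

puts name_similarity("John Smith", "Jon Smith")
```

Under this normalization a single dropped letter in a ten-character name still scores high, which is the kind of pair a fuzzy threshold is meant to catch.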

Constructor Details

#initialize(config = FactDb.config) ⇒ EntityResolver

Initializes a new EntityResolver instance

Parameters:

  • config (FactDb::Config) (defaults to: FactDb.config)

    configuration object that supplies fuzzy_match_threshold and auto_merge_threshold



# File 'lib/fact_db/resolution/entity_resolver.rb', line 25

def initialize(config = FactDb.config)
  @config = config
  @threshold = config.fuzzy_match_threshold
  @auto_merge_threshold = config.auto_merge_threshold
end

Instance Attribute Details

#config ⇒ FactDb::Config (readonly)

Returns the configuration object.

Returns:

  • (FactDb::Config)

    the configuration object



# File 'lib/fact_db/resolution/entity_resolver.rb', line 20

def config
  @config
end

Instance Method Details

#auto_merge_duplicates! ⇒ void

This method returns an undefined value.

Automatically merges high-confidence duplicates

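Uses auto_merge_threshold from the config as the similarity cutoff. Within each duplicate pair, the entity with more mentions is kept; on a tie, entity1 is kept. Pairs in which either entity is already merged are skipped.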



# File 'lib/fact_db/resolution/entity_resolver.rb', line 190

def auto_merge_duplicates!
  duplicates = find_duplicates(threshold: @auto_merge_threshold)

  duplicates.each do |dup|
    next if dup[:entity1].merged? || dup[:entity2].merged?

    # Keep the entity with more mentions
    keep, merge_entity = if dup[:entity1].entity_mentions.count >= dup[:entity2].entity_mentions.count
                           [dup[:entity1], dup[:entity2]]
                         else
                           [dup[:entity2], dup[:entity1]]
                         end

    merge(keep.id, merge_entity.id)
  end
end
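
The selection rule above can be tried without a database. The following is a sketch only: MergeCandidate, plan_merges, and the Set used to remember ids merged earlier in the same pass are all hypothetical stand-ins, not FactDb code. The listing above calls merged? on each entity; here that check is replaced by the in-memory set.

```ruby
require "set"

# Hypothetical stand-in for an entity and its mention count.
MergeCandidate = Struct.new(:id, :mention_count)

# Returns [keep_id, merge_id] pairs in the order they would be merged.
def plan_merges(pairs)
  merged_ids = Set.new
  pairs.each_with_object([]) do |(first, second), plan|
    # Skip pairs where either side was already merged earlier in this pass.
    next if merged_ids.include?(first.id) || merged_ids.include?(second.id)

    # Keep the entity with more mentions; a tie keeps the first of the pair.
    keep, drop = first.mention_count >= second.mention_count ? [first, second] : [second, first]
    plan << [keep.id, drop.id]
    merged_ids << drop.id
  end
end

a = MergeCandidate.new(1, 5)
b = MergeCandidate.new(2, 3)
c = MergeCandidate.new(3, 3)
p plan_merges([[a, b], [b, c], [a, c]])
```

Producing a plan first, rather than merging inside the loop, makes it easy to review what an automatic pass would do before any records change.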

#find_duplicates(threshold: nil) ⇒ Array<Hash>

Finds potential duplicate entities based on name similarity

Examples:

Find duplicates with custom threshold

duplicates = resolver.find_duplicates(threshold: 0.85)
duplicates.each { |d| puts "#{d[:entity1].name} ~ #{d[:entity2].name} (#{d[:similarity]})" }

Parameters:

  • threshold (Float, nil) (defaults to: nil)

    minimum similarity score; when nil, the config's fuzzy_match_threshold is used

Returns:

  • (Array<Hash>)

    array of hashes with :entity1, :entity2, :similarity keys



# File 'lib/fact_db/resolution/entity_resolver.rb', line 163

def find_duplicates(threshold: nil)
  threshold ||= @threshold
  duplicates = []

  entities = Models::Entity.resolved.to_a

  entities.each_with_index do |entity, i|
    entities[(i + 1)..].each do |other|
      similarity = calculate_similarity(entity.name, other.name)
      if similarity >= threshold
        duplicates << {
          entity1: entity,
          entity2: other,
          similarity: similarity
        }
      end
    end
  end

  duplicates.sort_by { |d| -d[:similarity] }
end
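
The listing above compares every entity with every later entity, which is n(n-1)/2 similarity calls for n entities. A compact way to express the same pairing is Array#combination. This is a sketch under assumptions: NamedEntity and duplicate_pairs are hypothetical, and the similarity metric is passed in as a block because calculate_similarity is not shown in this section. The toy metric in the usage lines is deliberately simplistic and is not the resolver's metric.

```ruby
# Hypothetical stand-in for an entity record.
NamedEntity = Struct.new(:id, :name)

# Yields each pair of names to the block, which must return a score in 0.0..1.0.
def duplicate_pairs(entities, threshold:)
  pairs = entities.combination(2).filter_map do |first, second|
    score = yield(first.name, second.name)
    { entity1: first, entity2: second, similarity: score } if score >= threshold
  end
  # Highest similarity first, as in the listing above.
  pairs.sort_by { |pair| -pair[:similarity] }
end

entities = [
  NamedEntity.new(1, "Acme Corp"),
  NamedEntity.new(2, "ACME Corp"),
  NamedEntity.new(3, "Globex")
]
# Toy metric for illustration only: case-insensitive equality.
toy_metric = ->(x, y) { x.casecmp?(y) ? 1.0 : 0.0 }
duplicate_pairs(entities, threshold: 0.9, &toy_metric).each do |pair|
  puts "#{pair[:entity1].name} ~ #{pair[:entity2].name}"
end
```

Because the work grows quadratically with the number of entities, a full scan like this is better suited to periodic maintenance than to a per-request path.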

#merge(keep_id, merge_id) ⇒ FactDb::Models::Entity

Merges two entities, keeping one as canonical

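Transfers all aliases and mentions from the merged entity to the kept entity, adds the merged entity's canonical name as an alias of the kept entity, and marks the merged entity with resolution_status "merged" and canonical_id set to the kept entity's ID. All changes run in a single transaction.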

Examples:

Merge duplicate entities

resolver.merge(primary_entity.id, duplicate_entity.id)

Parameters:

  • keep_id (Integer)

    ID of the entity to keep

  • merge_id (Integer)

    ID of the entity to merge (will be marked as merged)

Returns:

  • (FactDb::Models::Entity)

    the kept entity, reloaded after the merge

Raises:

  • (ResolutionError)

    if an entity is merged into itself, or if the entity to merge has already been merged



# File 'lib/fact_db/resolution/entity_resolver.rb', line 88

def merge(keep_id, merge_id)
  keep = Models::Entity.find(keep_id)
  merge_entity = Models::Entity.find(merge_id)

  raise ResolutionError, "Cannot merge entity into itself" if keep_id == merge_id
  raise ResolutionError, "Cannot merge already merged entity" if merge_entity.merged?

  Models::Entity.transaction do
    # Move all aliases to kept entity
    merge_entity.aliases.each do |alias_record|
      keep.aliases.find_or_create_by!(name: alias_record.name) do |a|
        a.kind = alias_record.kind
        a.confidence = alias_record.confidence
      end
    end

    # Add the merged entity's canonical name as an alias
    keep.aliases.find_or_create_by!(name: merge_entity.name) do |a|
      a.kind = "name"
      a.confidence = 1.0
    end

    # Update all entity mentions to point to kept entity
    Models::EntityMention.where(entity_id: merge_id).update_all(entity_id: keep_id)

    # Mark merged entity
    merge_entity.update!(
      resolution_status: "merged",
      canonical_id: keep_id
    )
  end

  keep.reload
end
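
The bookkeeping performed by the merge can be modeled in memory to make the end state concrete. Everything here is a hypothetical sketch: MergeRecord, SketchMergeError, and merge_records are illustrative names, aliases are reduced to plain strings, mentions to hashes, and the "resolved" status value is an assumption. No ActiveRecord or transaction is involved.

```ruby
class SketchMergeError < StandardError; end

# Hypothetical in-memory stand-in for an entity row.
MergeRecord = Struct.new(:id, :name, :aliases, :status, :canonical_id, keyword_init: true)

def merge_records(store, mentions, keep_id, merge_id)
  raise SketchMergeError, "Cannot merge entity into itself" if keep_id == merge_id

  keep = store.fetch(keep_id)
  merged = store.fetch(merge_id)
  raise SketchMergeError, "Cannot merge already merged entity" if merged.status == "merged"

  # Union of both alias lists plus the merged entity's canonical name.
  keep.aliases |= merged.aliases + [merged.name]
  # Repoint every mention of the merged entity at the kept entity.
  mentions.each { |m| m[:entity_id] = keep_id if m[:entity_id] == merge_id }
  # Leave the merged record in place, pointing at its canonical entity.
  merged.status = "merged"
  merged.canonical_id = keep_id
  keep
end

store = {
  1 => MergeRecord.new(id: 1, name: "Acme Corp", aliases: ["Acme"], status: "resolved"),
  2 => MergeRecord.new(id: 2, name: "ACME Corporation", aliases: ["Acme", "ACME Inc"], status: "resolved")
}
mentions = [{ entity_id: 2 }, { entity_id: 1 }]
p merge_records(store, mentions, 1, 2).aliases
```

The merged record is kept rather than deleted, so anything still holding its ID can follow canonical_id to the surviving entity.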

#resolve(name, kind: nil) ⇒ ResolvedEntity?

Resolves a name to an existing entity

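Tries resolution in order: exact alias match, canonical name match, fuzzy match. Exact alias and canonical name matches are returned with confidence 1.0; a fuzzy match is returned only if its confidence is at least the configured fuzzy_match_threshold. A nil or empty name returns nil.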

Examples:

Resolve with kind filter

resolver.resolve("Acme", kind: :organization)

Parameters:

  • name (String)

    the name to resolve

  • kind (Symbol, nil) (defaults to: nil)

    optional entity kind filter (:person, :organization, etc.)

Returns:

  • (ResolvedEntity, nil)

    resolved entity with confidence score, or nil if not found



# File 'lib/fact_db/resolution/entity_resolver.rb', line 41

def resolve(name, kind: nil)
  return nil if name.nil? || name.empty?

  # 1. Exact alias match
  exact = find_by_exact_alias(name, kind: kind)
  return ResolvedEntity.new(exact, confidence: 1.0, match_type: :exact_alias) if exact

  # 2. Canonical name match
  canonical = find_by_name(name, kind: kind)
  return ResolvedEntity.new(canonical, confidence: 1.0, match_type: :name) if canonical

  # 3. Fuzzy matching
  fuzzy = find_by_fuzzy_match(name, kind: kind)
  return fuzzy if fuzzy && fuzzy.confidence >= @threshold

  # 4. No match found
  nil
end
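
The three-stage cascade can be illustrated with plain hashes in place of database lookups. This is a sketch, not the library's implementation: ResolvedMatch, resolve_name, the two index hashes, the fuzzy lambda, and the :fuzzy match type are all hypothetical, and the kind filter is omitted for brevity.

```ruby
# Hypothetical stand-in for ResolvedEntity.
ResolvedMatch = Struct.new(:entity, :confidence, :match_type)

def resolve_name(name, alias_index:, name_index:, fuzzy:, threshold:)
  return nil if name.nil? || name.empty?

  # 1. Exact alias match always wins, with full confidence.
  if (entity = alias_index[name])
    return ResolvedMatch.new(entity, 1.0, :exact_alias)
  end

  # 2. Canonical name match, also with full confidence.
  if (entity = name_index[name])
    return ResolvedMatch.new(entity, 1.0, :name)
  end

  # 3. Fuzzy candidate is accepted only at or above the threshold.
  candidate = fuzzy.call(name)
  candidate if candidate && candidate.confidence >= threshold
end

alias_index = { "Acme" => :acme }
name_index = { "Acme Corporation" => :acme }
# Toy fuzzy matcher for illustration; it returns a fixed confidence.
fuzzy = ->(n) { n.start_with?("Acm") ? ResolvedMatch.new(:acme, 0.8, :fuzzy) : nil }

match = resolve_name("Acmee", alias_index: alias_index, name_index: name_index,
                              fuzzy: fuzzy, threshold: 0.75)
puts match.match_type
```

Ordering the stages from cheapest and most certain to most expensive and least certain means the fuzzy matcher runs only when both exact lookups miss.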

#resolve_or_create(name, kind:, aliases: [], attributes: {}) ⇒ FactDb::Models::Entity

Resolves a name to an entity, creating one if not found

Examples:

Create with aliases

resolver.resolve_or_create("John Smith", kind: :person, aliases: ["J. Smith", "Johnny"])

Parameters:

  • name (String)

    the name to resolve or create

  • kind (Symbol)

    the entity kind (required for creation)

  • aliases (Array<String>) (defaults to: [])

    additional aliases to add

  • attributes (Hash) (defaults to: {})

    additional attributes for new entity

Returns:

  • (FactDb::Models::Entity)

    the resolved entity, or the newly created one if no match was found



# File 'lib/fact_db/resolution/entity_resolver.rb', line 70

def resolve_or_create(name, kind:, aliases: [], attributes: {})
  resolved = resolve(name, kind: kind)
  return resolved.entity if resolved

  create_entity(name, kind: kind, aliases: aliases, attributes: attributes)
end

#split(entity_id, split_configs) ⇒ Array<FactDb::Models::Entity>

Splits an entity into multiple new entities

Creates new entities based on the split configuration and marks the original as split.

Examples:

Split an ambiguous entity

resolver.split(entity.id, [
  { name: "John Smith (Sales)", kind: :person },
  { name: "John Smith (Engineering)", kind: :person }
])

Parameters:

  • entity_id (Integer)

    ID of the entity to split

  • split_configs (Array<Hash>)

    array of hashes with :name, plus optional :kind (defaults to the original entity's kind), :aliases (defaults to []), and :attributes (defaults to {})

Returns:

  • (Array<FactDb::Models::Entity>)

    the newly created entities



# File 'lib/fact_db/resolution/entity_resolver.rb', line 136

def split(entity_id, split_configs)
  original = Models::Entity.find(entity_id)

  Models::Entity.transaction do
    new_entities = split_configs.map do |config|
      create_entity(
        config[:name],
        kind: config[:kind] || original.kind,
        aliases: config[:aliases] || [],
        attributes: config[:attributes] || {}
      )
    end

    original.update!(resolution_status: "split")

    new_entities
  end
end
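
The defaults applied to each split configuration can be isolated into a small helper to show what create_entity ends up receiving. This is a hypothetical sketch: normalize_split_config is not part of FactDb, and using fetch for :name (so that a missing name raises KeyError) is an assumption that is stricter than the listing above, which reads config[:name] directly.

```ruby
# Hypothetical helper mirroring the fallbacks used in the split listing.
def normalize_split_config(config, original_kind)
  {
    name: config.fetch(:name),             # assumed to be required
    kind: config[:kind] || original_kind,  # inherit the original entity's kind
    aliases: config[:aliases] || [],
    attributes: config[:attributes] || {}
  }
end

configs = [
  { name: "John Smith (Sales)" },
  { name: "John Smith (Engineering)", aliases: ["J. Smith"] }
]
configs.each { |config| p normalize_split_config(config, :person) }
```

Since :kind falls back to the original entity's kind, it only needs to be supplied when a split produces an entity of a different kind.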