Class: FactDb::Resolution::EntityResolver
- Inherits: Object
  - Object
    - FactDb::Resolution::EntityResolver
- Defined in: lib/fact_db/resolution/entity_resolver.rb
Overview
Resolves entity names to canonical entities in the database.
Resolution tries exact alias matching first, then canonical name matching, then fuzzy matching based on Levenshtein distance. The class also handles entity merging, splitting, and duplicate detection.
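The overview mentions Levenshtein distance, but the private similarity helper is not listed on this page. As a rough sketch only, the following assumes that similarity is the edit distance normalised by the longer name's length and compared case-insensitively; `levenshtein` and `similarity` are hypothetical names, not part of FactDb.

```ruby
# Hypothetical helpers: the resolver's real calculate_similarity is private and
# not shown, so the normalisation and case folding below are assumptions.

# Classic two-row dynamic-programming Levenshtein distance.
def levenshtein(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      # deletion, insertion, substitution
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Map edit distance to a 0.0..1.0 score (1.0 = identical), ignoring case.
def similarity(a, b)
  a = a.downcase
  b = b.downcase
  max_len = [a.length, b.length].max
  return 1.0 if max_len.zero?

  1.0 - levenshtein(a, b).to_f / max_len
end

puts levenshtein("kitten", "sitting")                 # => 3
puts similarity("Acme Corp", "ACME Corp.").round(2)   # => 0.9
```

With this normalisation, a score is directly comparable to thresholds such as fuzzy_match_threshold, since both live on the same 0.0 to 1.0 scale.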
Instance Attribute Summary
- #config ⇒ FactDb::Config (readonly)
  The configuration object.
Instance Method Summary
- #auto_merge_duplicates! ⇒ void
  Automatically merges high-confidence duplicates.
- #find_duplicates(threshold: nil) ⇒ Array<Hash>
  Finds potential duplicate entities based on name similarity.
- #initialize(config = FactDb.config) ⇒ EntityResolver (constructor)
  Initializes a new EntityResolver instance.
- #merge(keep_id, merge_id) ⇒ FactDb::Models::Entity
  Merges two entities, keeping one as canonical.
- #resolve(name, kind: nil) ⇒ ResolvedEntity?
  Resolves a name to an existing entity.
- #resolve_or_create(name, kind:, aliases: [], attributes: {}) ⇒ FactDb::Models::Entity
  Resolves a name to an entity, creating one if not found.
- #split(entity_id, split_configs) ⇒ Array<FactDb::Models::Entity>
  Splits an entity into multiple new entities.
Constructor Details
#initialize(config = FactDb.config) ⇒ EntityResolver
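Initializes a new EntityResolver instance.
Reads fuzzy_match_threshold and auto_merge_threshold from the given configuration (FactDb.config by default).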
```ruby
# File 'lib/fact_db/resolution/entity_resolver.rb', line 25

def initialize(config = FactDb.config)
  @config = config
  @threshold = config.fuzzy_match_threshold
  @auto_merge_threshold = config.auto_merge_threshold
end
```
Instance Attribute Details
#config ⇒ FactDb::Config (readonly)
Returns the configuration object.
```ruby
# File 'lib/fact_db/resolution/entity_resolver.rb', line 20

def config
  @config
end
```
Instance Method Details
#auto_merge_duplicates! ⇒ void
This method returns an undefined value.
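Automatically merges high-confidence duplicates.
Finds duplicate pairs using auto_merge_threshold from the config, skips any pair in which either entity has already been merged, and keeps the entity with more mentions (the first entity of the pair on a tie).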
```ruby
# File 'lib/fact_db/resolution/entity_resolver.rb', line 190

def auto_merge_duplicates!
  duplicates = find_duplicates(threshold: @auto_merge_threshold)

  duplicates.each do |dup|
    # Reload first: an earlier iteration may already have merged one of these
    # records, and the objects loaded by find_duplicates would then be stale.
    next if dup[:entity1].reload.merged? || dup[:entity2].reload.merged?

    # Keep the entity with more mentions
    keep, merge_entity = if dup[:entity1].entity_mentions.count >= dup[:entity2].entity_mentions.count
                           [dup[:entity1], dup[:entity2]]
                         else
                           [dup[:entity2], dup[:entity1]]
                         end

    merge(keep.id, merge_entity.id)
  end
end
```
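The keep/skip rule can be illustrated without a database. This is a hypothetical in-memory sketch: `Entity` is a stand-in Struct holding a mention count, and `auto_merge` receives pairs already sorted by similarity, as find_duplicates returns them.

```ruby
# Hypothetical stand-in for FactDb::Models::Entity; `merged_into` replaces the
# resolution_status / canonical_id columns used by the real model.
Entity = Struct.new(:id, :name, :mentions, :merged_into) do
  def merged?
    !merged_into.nil?
  end
end

# Returns the [keep_id, merge_id] pairs in the order they were applied.
def auto_merge(pairs)
  applied = []
  pairs.each do |e1, e2|
    # Merges mutate these shared objects, so earlier merges are visible here.
    next if e1.merged? || e2.merged?

    # More mentions wins; on a tie the first entity of the pair is kept.
    keep, gone = e1.mentions >= e2.mentions ? [e1, e2] : [e2, e1]
    keep.mentions += gone.mentions # mentions move to the kept entity
    gone.mentions = 0
    gone.merged_into = keep.id
    applied << [keep.id, gone.id]
  end
  applied
end

a = Entity.new(1, "Acme Corp", 5, nil)
b = Entity.new(2, "Acme Corp.", 2, nil)
c = Entity.new(3, "ACME Corp", 2, nil)
p auto_merge([[a, b], [b, c], [a, c]]) # => [[1, 2], [1, 3]]
```

The pair (b, c) is skipped because b was absorbed by the first merge, and c is then merged into a through the third pair.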
#find_duplicates(threshold: nil) ⇒ Array<Hash>
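Finds potential duplicate entities based on name similarity.
Compares every pair of resolved entities and returns the pairs whose similarity is at least threshold (default: the config's fuzzy_match_threshold), as hashes with :entity1, :entity2, and :similarity keys, sorted by descending similarity.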
```ruby
# File 'lib/fact_db/resolution/entity_resolver.rb', line 163

def find_duplicates(threshold: nil)
  threshold ||= @threshold
  duplicates = []

  entities = Models::Entity.resolved.to_a

  entities.each_with_index do |entity, i|
    # Compare only against later entities so each pair is checked once
    entities[(i + 1)..].each do |other|
      similarity = calculate_similarity(entity.name, other.name)
      if similarity >= threshold
        duplicates << {
          entity1: entity,
          entity2: other,
          similarity: similarity
        }
      end
    end
  end

  duplicates.sort_by { |d| -d[:similarity] }
end
```
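The pairwise scan can be reproduced on plain strings. In this sketch the similarity metric is passed in as a block because calculate_similarity is private; the prefix-ratio metric in the usage lines is a toy chosen only to keep the example short, not the resolver's metric.

```ruby
# Sketch of the find_duplicates scan over plain strings. The similarity metric
# is supplied as a block (hypothetical stand-in for calculate_similarity).
def find_duplicates(names, threshold:)
  duplicates = []
  names.each_with_index do |name, i|
    # Compare only with later entries so each unordered pair is checked once.
    names[(i + 1)..].each do |other|
      score = yield(name, other)
      duplicates << { entity1: name, entity2: other, similarity: score } if score >= threshold
    end
  end
  # Highest-similarity pairs first.
  duplicates.sort_by { |d| -d[:similarity] }
end

# Toy metric for the demo only: shared case-insensitive prefix length divided
# by the longer name's length.
prefix_score = lambda do |a, b|
  a, b = a.downcase, b.downcase
  shared = a.chars.zip(b.chars).take_while { |x, y| x == y }.size
  shared.to_f / [a.size, b.size].max
end

names = ["Acme Corp", "ACME Corp.", "Globex"]
pairs = find_duplicates(names, threshold: 0.85, &prefix_score)
pairs.each { |d| puts "#{d[:entity1]} ~ #{d[:entity2]}: #{d[:similarity]}" }
# prints: Acme Corp ~ ACME Corp.: 0.9
```

`names.combination(2)` would enumerate the same pairs in the same order; either way the scan is quadratic in the number of resolved entities.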
#merge(keep_id, merge_id) ⇒ FactDb::Models::Entity
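Merges two entities, keeping one as canonical.
Copies all aliases of the merged entity to the kept entity, adds the merged entity's name as an alias, repoints all of its mentions to the kept entity, and marks it as merged with canonical_id set to keep_id. Raises ResolutionError if both IDs are the same or if the entity to merge has already been merged.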
```ruby
# File 'lib/fact_db/resolution/entity_resolver.rb', line 88

def merge(keep_id, merge_id)
  keep = Models::Entity.find(keep_id)
  merge_entity = Models::Entity.find(merge_id)

  raise ResolutionError, "Cannot merge entity into itself" if keep_id == merge_id
  raise ResolutionError, "Cannot merge already merged entity" if merge_entity.merged?

  Models::Entity.transaction do
    # Move all aliases to kept entity
    merge_entity.aliases.each do |alias_record|
      keep.aliases.find_or_create_by!(name: alias_record.name) do |a|
        a.kind = alias_record.kind
        a.confidence = alias_record.confidence
      end
    end

    # Add the merged entity's canonical name as an alias
    keep.aliases.find_or_create_by!(name: merge_entity.name) do |a|
      a.kind = "name"
      a.confidence = 1.0
    end

    # Update all entity mentions to point to kept entity
    Models::EntityMention.where(entity_id: merge_id).update_all(entity_id: keep_id)

    # Mark merged entity
    merge_entity.update!(
      resolution_status: "merged",
      canonical_id: keep_id
    )
  end

  keep.reload
end
```
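To make the merge bookkeeping concrete, here is a database-free sketch using hashes in place of the Entity, alias, and mention models. Aliases are reduced to plain names (the real alias records also carry kind and confidence), and `merge!` and `MergeError` are hypothetical names.

```ruby
# Hypothetical error class; the resolver itself raises ResolutionError.
class MergeError < StandardError; end

# In-memory sketch of merge: `entities` maps id => hash, `mentions` is an array
# of hashes with an :entity_id key.
def merge!(entities, mentions, keep_id, merge_id)
  raise MergeError, "Cannot merge entity into itself" if keep_id == merge_id

  keep = entities.fetch(keep_id)  # KeyError plays the role of RecordNotFound
  gone = entities.fetch(merge_id)
  raise MergeError, "Cannot merge already merged entity" if gone[:status] == "merged"

  # Copy aliases plus the merged entity's own name, skipping duplicates
  # (the real code uses find_or_create_by! for this).
  (gone[:aliases] + [gone[:name]]).each do |name|
    keep[:aliases] << name unless keep[:aliases].include?(name)
  end

  # Re-point every mention (update_all in the real code) and record the merge.
  mentions.each { |m| m[:entity_id] = keep_id if m[:entity_id] == merge_id }
  gone[:status] = "merged"
  gone[:canonical_id] = keep_id
  keep
end

entities = {
  1 => { name: "Acme Corp", aliases: ["Acme"] },
  2 => { name: "ACME Corporation", aliases: ["Acme", "ACME Inc"] }
}
mentions = [{ entity_id: 1 }, { entity_id: 2 }, { entity_id: 2 }]

merge!(entities, mentions, 1, 2)
p entities[1][:aliases]             # => ["Acme", "ACME Inc", "ACME Corporation"]
p mentions.map { |m| m[:entity_id] } # => [1, 1, 1]
```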
#resolve(name, kind: nil) ⇒ ResolvedEntity?
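Resolves a name to an existing entity.
Tries, in order: exact alias match, canonical name match, fuzzy match. Alias and name matches are returned with confidence 1.0; a fuzzy match is returned only if its confidence is at least the config's fuzzy_match_threshold. Returns nil if name is nil or empty, or if nothing matches.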
```ruby
# File 'lib/fact_db/resolution/entity_resolver.rb', line 41

def resolve(name, kind: nil)
  return nil if name.nil? || name.empty?

  # 1. Exact alias match
  exact = find_by_exact_alias(name, kind: kind)
  return ResolvedEntity.new(exact, confidence: 1.0, match_type: :exact_alias) if exact

  # 2. Canonical name match
  canonical = find_by_name(name, kind: kind)
  return ResolvedEntity.new(canonical, confidence: 1.0, match_type: :name) if canonical

  # 3. Fuzzy matching
  fuzzy = find_by_fuzzy_match(name, kind: kind)
  return fuzzy if fuzzy && fuzzy.confidence >= @threshold

  # 4. No match found
  nil
end
```
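The three-stage cascade can be sketched over an in-memory list. Assumptions, not taken from FactDb: alias and name matches use exact string equality, the fuzzy stage picks the single best-scoring candidate from a caller-supplied scorer, and the `:fuzzy` match type is a placeholder for whatever find_by_fuzzy_match actually reports.

```ruby
# Hypothetical stand-in for the ResolvedEntity return type.
ResolvedEntity = Struct.new(:entity, :confidence, :match_type)

def resolve(name, entities, threshold:, kind: nil)
  return nil if name.nil? || name.empty?

  candidates = kind ? entities.select { |e| e[:kind] == kind } : entities

  # 1. Exact alias match
  exact = candidates.find { |e| e[:aliases].include?(name) }
  return ResolvedEntity.new(exact, 1.0, :exact_alias) if exact

  # 2. Canonical name match
  canonical = candidates.find { |e| e[:name] == name }
  return ResolvedEntity.new(canonical, 1.0, :name) if canonical

  # 3. Fuzzy match: best-scoring candidate, accepted only at or above threshold
  best = candidates.max_by { |e| yield(name, e[:name]) }
  return nil unless best

  score = yield(name, best[:name])
  score >= threshold ? ResolvedEntity.new(best, score, :fuzzy) : nil
end

people = [
  { name: "Jane Smith", kind: "person", aliases: ["J. Smith"] },
  { name: "Acme Corp", kind: "organization", aliases: [] }
]
# Toy scorer: shared case-insensitive prefix over the longer length.
prefix = lambda do |a, b|
  shared = a.downcase.chars.zip(b.downcase.chars).take_while { |x, y| x == y }.size
  shared.to_f / [a.size, b.size].max
end

p resolve("J. Smith", people, threshold: 0.8, &prefix).match_type   # => :exact_alias
p resolve("Acme Corp", people, threshold: 0.8, &prefix).match_type  # => :name
p resolve("acme corp.", people, threshold: 0.8, &prefix).match_type # => :fuzzy
p resolve("Globex", people, threshold: 0.8, &prefix)                # => nil
```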
#resolve_or_create(name, kind:, aliases: [], attributes: {}) ⇒ FactDb::Models::Entity
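Resolves a name to an entity, creating one if not found.
Calls resolve(name, kind:) and returns the matched entity; if there is no match, creates a new entity with the given kind, aliases, and attributes.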
```ruby
# File 'lib/fact_db/resolution/entity_resolver.rb', line 70

def resolve_or_create(name, kind:, aliases: [], attributes: {})
  resolved = resolve(name, kind: kind)
  return resolved.entity if resolved

  create_entity(name, kind: kind, aliases: aliases, attributes: attributes)
end
```
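A minimal in-memory sketch of the resolve-then-create flow, showing that a second call with a known name or alias returns the same entity instead of creating a duplicate. `EntityStore` and its exact-match lookup are hypothetical; the real method also goes through the fuzzy stage of resolve.

```ruby
# Hypothetical in-memory store standing in for resolve / create_entity.
class EntityStore
  attr_reader :entities

  def initialize
    @entities = []
  end

  # Exact name or alias match within the same kind (no fuzzy stage here).
  def resolve(name, kind:)
    @entities.find { |e| e[:kind] == kind && (e[:name] == name || e[:aliases].include?(name)) }
  end

  def resolve_or_create(name, kind:, aliases: [], attributes: {})
    resolve(name, kind: kind) ||
      { name: name, kind: kind, aliases: aliases, attributes: attributes }.tap { |e| @entities << e }
  end
end

store = EntityStore.new
a = store.resolve_or_create("Acme Corp", kind: "organization", aliases: ["Acme"])
b = store.resolve_or_create("Acme", kind: "organization")
p a.equal?(b)         # => true (second call resolved through the alias)
p store.entities.size # => 1
```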
#split(entity_id, split_configs) ⇒ Array<FactDb::Models::Entity>
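Splits an entity into multiple new entities.
Creates one new entity per entry in split_configs (each a hash with :name and optional :kind, :aliases, and :attributes; :kind defaults to the original entity's kind), then sets the original's resolution_status to "split". Both steps run in a single transaction.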
```ruby
# File 'lib/fact_db/resolution/entity_resolver.rb', line 136

def split(entity_id, split_configs)
  original = Models::Entity.find(entity_id)

  Models::Entity.transaction do
    # Create one entity per config; the block variable is named split_config
    # so it does not shadow the #config reader.
    new_entities = split_configs.map do |split_config|
      create_entity(
        split_config[:name],
        kind: split_config[:kind] || original.kind,
        aliases: split_config[:aliases] || [],
        attributes: split_config[:attributes] || {}
      )
    end

    original.update!(resolution_status: "split")

    new_entities
  end
end
```
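A hash-based sketch of split, assuming integer IDs allocated after the highest existing one. It shows the kind fallback and the status flag; the all-or-nothing transactional behaviour of the real method is not modelled, and `split` here is a hypothetical standalone function.

```ruby
# In-memory sketch: `entities` maps id => hash. Each split config creates a new
# entity that inherits the original's kind unless one is given.
def split(entities, entity_id, split_configs)
  original = entities.fetch(entity_id)
  next_id = entities.keys.max + 1

  new_entities = split_configs.map.with_index do |cfg, offset|
    entity = {
      id: next_id + offset,
      name: cfg.fetch(:name),
      kind: cfg[:kind] || original[:kind],
      aliases: cfg[:aliases] || [],
      attributes: cfg[:attributes] || {}
    }
    entities[entity[:id]] = entity
    entity
  end

  original[:status] = "split"
  new_entities
end

entities = { 1 => { id: 1, name: "Jordan Lee", kind: "person", aliases: [] } }
parts = split(entities, 1, [
  { name: "Jordan Lee (engineer)" },
  { name: "Jordan Lee (designer)", aliases: ["J. Lee"] }
])
p parts.map { |e| [e[:id], e[:kind]] } # => [[2, "person"], [3, "person"]]
p entities[1][:status]                 # => "split"
```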