Class: FactDb::Validation::AliasFilter
- Inherits:
- Object (hierarchy: Object > FactDb::Validation::AliasFilter)
- Defined in:
- lib/fact_db/validation/alias_filter.rb
Overview
Filters out invalid aliases such as pronouns, common terms, and generic references. Used by extractors, services, and models to ensure alias quality.
Constant Summary
- PRONOUNS =
English pronouns (subject, object, possessive, reflexive)
%w[ i me my mine myself you your yours yourself yourselves he him his himself she her hers herself it its itself we us our ours ourselves they them their theirs themselves who whom whose this that these those what which one ones all any both each either neither none some another other others ].freeze
- GENERIC_TERMS =
Common generic terms that shouldn’t be aliases
%w[ a an the man woman person people men women boy girl child children husband wife brother sister father mother son daughter king queen prince princess lord lady sir madam mr mrs ms miss dr someone something somewhere anyone anything anywhere everyone everything everywhere nobody nothing nowhere here there today yesterday tomorrow now then ].freeze
- GENERIC_ROLES =
Common role/title references that are too generic
%w[ the\ man the\ woman the\ person the\ people a\ man a\ woman a\ person this\ man this\ woman this\ person that\ man that\ woman that\ person the\ king the\ queen the\ lord the\ lady the\ brother the\ sister the\ father the\ mother the\ husband the\ wife the\ boy the\ girl the\ child believers disciples apostles men greek\ men ].freeze
- AMBIGUOUS_FIRST_NAMES =
Common first names that are too ambiguous to use as standalone aliases. These should only be valid when part of a fuller name.
%w[ simon peter john james paul mark matthew luke andrew philip thomas james joseph mary martha elizabeth sarah anna david michael robert william richard henry george charles edward mary ann jane elizabeth margaret catherine alice ].freeze
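The lists above rely on Ruby's `%w[]` literal, where a backslash-escaped space (as in `the\ man`) keeps a multi-word phrase as a single element. The sketch below illustrates that behavior and a case-insensitive lookup. The constants `SAMPLE_PRONOUNS` and `SAMPLE_ROLES` and the helper `listed_in?` are hypothetical, abbreviated stand-ins and not part of the class.

```ruby
# Abbreviated, hypothetical copies of two of the lists; the real constants are longer.
SAMPLE_PRONOUNS = %w[he him his she her].freeze
SAMPLE_ROLES = %w[the\ man the\ king greek\ men believers].freeze

# Hypothetical helper: strips and downcases a candidate (the same
# normalization valid? applies) before looking it up in a list.
def listed_in?(list, text)
  list.include?(text.to_s.strip.downcase)
end

SAMPLE_ROLES.first                      # "the man", one two-word element
SAMPLE_ROLES.length                     # 4 elements, although the literal holds 7 words
listed_in?(SAMPLE_PRONOUNS, "  He ")    # true: whitespace and case are ignored
listed_in?(SAMPLE_ROLES, "The King")    # true: the whole phrase is listed
listed_in?(SAMPLE_ROLES, "King David")  # false: only exact phrases match
```

Because the lookup is an exact match on the normalized string, a fuller name that merely contains a listed word is not caught by this kind of check.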
Class Method Summary
-
.filter(aliases, name: nil) ⇒ Array<String>
Filter an array of aliases, returning only valid ones.
-
.rejection_reason(text, name: nil) ⇒ String?
Get a human-readable reason why an alias was rejected.
-
.valid?(text, name: nil) ⇒ Boolean
Check if a potential alias is valid.
Class Method Details
.filter(aliases, name: nil) ⇒ Array<String>
Filter an array of aliases, returning only valid ones
# File 'lib/fact_db/validation/alias_filter.rb', line 89
def filter(aliases, name: nil)
  return [] unless aliases.is_a?(Array)

  aliases
    .map { |a| a.to_s.strip }
    .reject { |a| a.empty? }
    .select { |a| valid?(a, name: name) }
    .uniq { |a| a.downcase }
end
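To make the pipeline in `filter` concrete, here is a minimal, self-contained sketch of the same four steps (stringify and strip, drop empties, keep valid entries, de-duplicate case-insensitively). `sketch_valid?`, `sketch_filter`, and `STAND_IN_PRONOUNS` are hypothetical names, and the predicate is a much simpler stand-in for the real `valid?`.

```ruby
# Hypothetical stand-in list; the real PRONOUNS constant is longer.
STAND_IN_PRONOUNS = %w[he him his she her they them].freeze

# Hypothetical, simplified predicate: the real valid? applies more rules.
def sketch_valid?(text)
  normalized = text.to_s.strip.downcase
  normalized.length >= 2 && !STAND_IN_PRONOUNS.include?(normalized)
end

# Same step order as the filter listing, using the stand-in predicate.
def sketch_filter(aliases)
  return [] unless aliases.is_a?(Array)

  aliases
    .map { |a| a.to_s.strip }        # nil becomes "", padding is removed
    .reject(&:empty?)                # blank entries are dropped
    .select { |a| sketch_valid?(a) } # invalid entries are dropped
    .uniq(&:downcase)                # first spelling of each alias wins
end

sketch_filter(["  Simon Peter ", "he", "", nil, "simon peter", "Cephas"])
# => ["Simon Peter", "Cephas"]
sketch_filter("Simon Peter")
# => [] (non-array input yields an empty array)
```

Note that `uniq` with a block keeps the first element for each key, so the casing of the earliest occurrence is the one returned.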
.rejection_reason(text, name: nil) ⇒ String?
Get a human-readable reason why an alias was rejected
# File 'lib/fact_db/validation/alias_filter.rb', line 103
def rejection_reason(text, name: nil)
  return "empty or nil" if text.nil? || text.to_s.strip.empty?

  normalized = text.to_s.strip.downcase

  return "too short (less than 2 characters)" if too_short?(normalized)
  return "is a pronoun" if pronoun?(normalized)
  return "is a generic term" if generic_term?(normalized)
  return "is a generic role reference" if generic_role?(normalized)
  return "contains only articles and generic words" if only_articles_and_generic?(normalized)
  return "is an ambiguous standalone first name" if ambiguous_standalone_name?(normalized, name)

  nil
end
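The checks in `rejection_reason` run in a fixed order and the first match determines the message. The sketch below shows that first-match-wins idea with an ordered rule table. The private helpers such as `too_short?` and `pronoun?` are not shown in this page, so the lambdas here are assumptions, and `SKETCH_RULES`, `SKETCH_PRONOUNS`, `SKETCH_GENERIC_TERMS`, and `sketch_rejection_reason` are hypothetical names.

```ruby
# Hypothetical, abbreviated word lists; the real constants are much longer.
SKETCH_PRONOUNS = %w[he she it they].freeze
SKETCH_GENERIC_TERMS = %w[man woman king queen].freeze

# Ordered [reason, predicate] pairs: the first predicate that matches wins.
# The predicate bodies are assumptions about what the private helpers do.
SKETCH_RULES = [
  ["too short (less than 2 characters)", ->(t) { t.length < 2 }],
  ["is a pronoun", ->(t) { SKETCH_PRONOUNS.include?(t) }],
  ["is a generic term", ->(t) { SKETCH_GENERIC_TERMS.include?(t) }]
].freeze

def sketch_rejection_reason(text)
  return "empty or nil" if text.nil? || text.to_s.strip.empty?

  normalized = text.to_s.strip.downcase
  rule = SKETCH_RULES.find { |_reason, check| check.call(normalized) }
  rule&.first # nil when no rule matched, i.e. nothing to report
end

sketch_rejection_reason(nil)       # => "empty or nil"
sketch_rejection_reason("x")       # => "too short (less than 2 characters)"
sketch_rejection_reason(" He ")    # => "is a pronoun"
sketch_rejection_reason("King")    # => "is a generic term"
sketch_rejection_reason("Cephas")  # => nil
```

A table of rules is only one way to express the ordering; the listing itself uses a sequence of guard clauses, which has the same effect.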
.valid?(text, name: nil) ⇒ Boolean
Check if a potential alias is valid
# File 'lib/fact_db/validation/alias_filter.rb', line 68
def valid?(text, name: nil)
  return false if text.nil?

  normalized = text.to_s.strip.downcase
  return false if normalized.empty?

  return false if too_short?(normalized)
  return false if pronoun?(normalized)
  return false if generic_term?(normalized)
  return false if generic_role?(normalized)
  return false if matches_canonical?(normalized, name)
  return false if only_articles_and_generic?(normalized)
  return false if ambiguous_standalone_name?(normalized, name)

  true
end
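In the listings, `valid?` includes a `matches_canonical?` check that `rejection_reason` does not, so a caller pairing the two methods may see a rejected alias with a `nil` reason. The sketch below shows one way a caller might handle that. `StandInFilter` and `audit` are hypothetical, and the sketch assumes that `matches_canonical?` means "the alias equals the entity name", which this page does not confirm.

```ruby
# Hypothetical stand-in exposing the same two class methods; not the gem's code.
module StandInFilter
  PRONOUNS = %w[he she it they].freeze

  def self.valid?(text, name: nil)
    normalized = text.to_s.strip.downcase
    return false if normalized.length < 2
    return false if PRONOUNS.include?(normalized)
    # Assumed meaning of matches_canonical?: alias equals the entity name.
    return false if name && normalized == name.to_s.strip.downcase

    true
  end

  def self.rejection_reason(text, name: nil)
    normalized = text.to_s.strip.downcase
    return "too short (less than 2 characters)" if normalized.length < 2
    return "is a pronoun" if PRONOUNS.include?(normalized)

    nil # no canonical-name check, mirroring the rejection_reason listing
  end
end

# Hypothetical caller: labels each candidate and falls back to a generic
# message when a rejected alias comes back without a reason.
def audit(candidates, name:)
  candidates.map do |c|
    if StandInFilter.valid?(c, name: name)
      [c, :accepted, nil]
    else
      reason = StandInFilter.rejection_reason(c, name: name)
      [c, :rejected, reason || "no reason reported"]
    end
  end
end

audit(["Cephas", "he", "Simon Peter"], name: "Simon Peter")
# => [["Cephas", :accepted, nil],
#     ["he", :rejected, "is a pronoun"],
#     ["Simon Peter", :rejected, "no reason reported"]]
```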