EmbeddingService Class

Client-side embedding generation service for HTM.

Overview

HTM::EmbeddingService generates vector embeddings for text content before database insertion. It uses RubyLLM to support multiple providers:

  • Ollama - Local embedding server (default for development)
  • OpenAI - OpenAI's embedding models
  • Anthropic - For tag extraction (via chat models)
  • Gemini - Google's embedding models
  • Azure - Azure OpenAI deployments
  • Bedrock - AWS Bedrock models
  • DeepSeek - DeepSeek embeddings

The service also provides token counting for working memory management.

Architecture:

  • Ruby application generates embeddings via RubyLLM
  • RubyLLM handles provider-specific API calls
  • Embeddings are passed to PostgreSQL during INSERT
  • Simple, reliable, cross-platform operation
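To make the last step concrete, here is a minimal sketch of the generate-then-INSERT pattern using the pg gem. The table, column, and database names are illustrative, not HTM's actual schema; pgvector accepts vectors in the text format '[0.1,0.2,...]'.

require 'htm'
require 'pg'

service = HTM::EmbeddingService.new(:ollama)
text = "PostgreSQL with TimescaleDB"
embedding = service.embed(text)

conn = PG.connect(dbname: 'htm_development')  # hypothetical database name
conn.exec_params(
  'INSERT INTO memory_nodes (content, embedding) VALUES ($1, $2)',
  [text, "[#{embedding.join(',')}]"]  # pgvector text format
)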

Class Definition

class HTM::EmbeddingService
  attr_reader :provider, :model, :dimensions
end

Initialization

new(provider, **options)

Create a new embedding service instance.

HTM::EmbeddingService.new(
  :ollama,                    # provider (positional; default :ollama)
  model: 'nomic-embed-text',  # default model for the Ollama provider
  dimensions: nil             # auto-detected from the model when nil
)

Parameters

Parameter  | Type         | Default            | Description
-----------|--------------|--------------------|------------
provider   | Symbol       | :ollama            | Embedding provider (:ollama, :openai, :gemini, :azure, :bedrock, :deepseek)
model      | String       | Provider-dependent | Model name for the provider
dimensions | Integer, nil | Auto-detected      | Expected embedding dimensions

Provider-specific configuration is handled via environment variables (see RubyLLM documentation).

Returns

HTM::EmbeddingService - Configured embedding service instance

Raises

  • HTM::EmbeddingError - If provider is invalid or configuration fails

Examples

Default configuration (uses Ollama):

service = HTM::EmbeddingService.new
# Uses Ollama with nomic-embed-text (768 dimensions)

Ollama with custom model:

service = HTM::EmbeddingService.new(
  :ollama,
  model: 'mxbai-embed-large',
  dimensions: 1024
)

OpenAI configuration:

# Requires OPENAI_API_KEY environment variable
service = HTM::EmbeddingService.new(
  :openai,
  model: 'text-embedding-3-small',
  dimensions: 1536
)

Gemini configuration:

# Requires GEMINI_API_KEY environment variable
service = HTM::EmbeddingService.new(
  :gemini,
  model: 'text-embedding-004',
  dimensions: 768
)

HTM global configuration (recommended):

HTM.configure do |config|
  config.embedding.provider = :openai  # or :ollama, :gemini, etc.
  config.embedding.model = 'text-embedding-3-small'
end

htm = HTM.new(robot_name: "Assistant")
# EmbeddingService configured automatically from global config

Instance Methods

embed(text)

Generate embedding vector for text.

embed(text) → Array<Float>

Parameters

Parameter | Type   | Description
----------|--------|------------
text      | String | Text to embed

Returns

Array<Float> - Embedding vector (dimensions depend on model)

Raises

  • HTM::EmbeddingError - If embedding generation fails
  • ArgumentError - If text is nil or empty

Examples

service = HTM::EmbeddingService.new(:ollama)

# Generate embedding
embedding = service.embed("PostgreSQL with TimescaleDB")
# => [0.023, -0.441, 0.182, ..., 0.091]  # 768 dimensions

puts embedding.length  # => 768 (for nomic-embed-text)
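Embeddings are compared by similarity during recall. HTM performs this comparison in SQL via vector search (see Query Embedding below), but a minimal cosine-similarity sketch in plain Ruby illustrates the idea:

def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

e1 = service.embed("PostgreSQL performance tuning")
e2 = service.embed("database optimization")
puts cosine_similarity(e1, e2)  # closer to 1.0 = more semantically similar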

Error handling:

begin
  embedding = service.embed("some text")
rescue HTM::EmbeddingError => e
  puts "Embedding failed: #{e.message}"
  # For Ollama: Check if running with `curl http://localhost:11434/api/tags`
  # For cloud providers: Check API key is set correctly
end

Implementation Details

All providers are handled through RubyLLM, which provides a consistent interface across providers.

  • Ollama: Local HTTP calls, requires Ollama server running
  • OpenAI: Cloud API calls, requires OPENAI_API_KEY
  • Gemini: Cloud API calls, requires GEMINI_API_KEY
  • Azure: Cloud API calls, requires Azure credentials
  • Bedrock: AWS API calls, requires AWS credentials
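A simplified sketch of the delegation to RubyLLM (the actual implementation adds input validation and wraps provider errors in HTM::EmbeddingError):

require 'ruby_llm'

# RubyLLM exposes a single embed entry point across providers
result = RubyLLM.embed('PostgreSQL with TimescaleDB', model: 'nomic-embed-text')
vector = result.vectors  # => Array<Float>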


count_tokens(text)

Count tokens in text for working memory management.

count_tokens(text) → Integer

Parameters

Parameter | Type   | Description
----------|--------|------------
text      | String | Text to count tokens for

Returns

Integer - Approximate token count

Examples

service = HTM::EmbeddingService.new

tokens = service.count_tokens("Hello, world!")
# => 4

tokens = service.count_tokens("The quick brown fox jumps over the lazy dog")
# => 10
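The counts are approximate. A common heuristic, shown here purely for illustration (not necessarily HTM's exact implementation), is roughly four characters per token for English text:

# Approximation: ~4 characters per token for typical English text
def approximate_tokens(text)
  (text.length / 4.0).ceil
end

approximate_tokens("Hello, world!")  # => 4 (13 characters / 4, rounded up)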

Used internally by HTM:

htm.add_message(
  "This is a long conversation message...",
  speaker: "user"
)
# HTM calls embedding_service.count_tokens() internally
# to manage working memory token budget

Embedding Providers

HTM uses RubyLLM which supports multiple providers. Choose based on your requirements for privacy, cost, and quality.

Ollama (Default for Development)

Status: ✅ Fully implemented

Local embedding server with various models.

Installation:

# macOS/Linux
curl https://ollama.ai/install.sh | sh

# Pull embedding model
ollama pull nomic-embed-text

Models:

Model             | Dimensions | Speed     | Use Case
------------------|------------|-----------|---------
nomic-embed-text  | 768        | Fast      | General-purpose (default)
mxbai-embed-large | 1024       | Medium    | Higher quality embeddings
all-minilm        | 384        | Very fast | Lower quality, fast search

Configuration:

HTM.configure do |config|
  config.embedding.provider = :ollama
  config.embedding.model = 'nomic-embed-text'
end
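To verify the Ollama server is reachable before generating embeddings, you can hit the /api/tags endpoint (the same check suggested in the embed error-handling example above):

require 'net/http'

begin
  res = Net::HTTP.get_response(URI('http://localhost:11434/api/tags'))
  puts res.is_a?(Net::HTTPSuccess) ? 'Ollama is running' : "Unexpected response: #{res.code}"
rescue Errno::ECONNREFUSED
  puts 'Ollama is not running -- start it with `ollama serve`'
end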

Advantages:

  ✅ Free (no API costs)
  ✅ Private (data never leaves your machine)
  ✅ Fast (local generation)
  ✅ Works offline

Disadvantages:

  ❌ Requires local installation
  ❌ Uses local compute resources


OpenAI

Status: ✅ Fully implemented

Uses OpenAI's embedding API.

Configuration:

export OPENAI_API_KEY="sk-..."
HTM.configure do |config|
  config.embedding.provider = :openai
  config.embedding.model = 'text-embedding-3-small'
end

Models:

Model                  | Dimensions | Speed | Cost
-----------------------|------------|-------|-----
text-embedding-3-small | 1536       | Fast  | $0.0001/1K tokens
text-embedding-3-large | 3072       | Fast  | $0.00013/1K tokens

Advantages:

  ✅ High quality embeddings
  ✅ No local installation required
  ✅ Managed service

Disadvantages:

  ❌ API costs
  ❌ Requires internet connection
  ❌ Data sent to cloud


Other Providers

Gemini:

export GEMINI_API_KEY="..."
HTM.configure do |config|
  config.embedding.provider = :gemini
  config.embedding.model = 'text-embedding-004'
end

Azure OpenAI:

export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"

AWS Bedrock:

export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-east-1"

DeepSeek:

export DEEPSEEK_API_KEY="..."

See the RubyLLM documentation for complete provider configuration.


Error Handling

Common Errors

Ollama not running:

# Error: Failed to connect to Ollama
# Solution: Start Ollama
system("ollama serve")

API key missing (cloud providers):

# Error: API key not set
# Solution: Set environment variable
ENV['OPENAI_API_KEY'] = 'sk-...'     # For OpenAI
ENV['ANTHROPIC_API_KEY'] = 'sk-...'  # For Anthropic
ENV['GEMINI_API_KEY'] = '...'        # For Gemini

Invalid model (Ollama):

# Error: Model not found
# Solution: Pull the model first
system("ollama pull nomic-embed-text")

Exception Types

All embedding failures raise HTM::EmbeddingError, and invalid input (nil or empty text) raises ArgumentError, as documented above. See the HTM exception hierarchy documentation for the full list of error classes.
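The two can be rescued separately; a small sketch:

def safe_embed(service, text)
  service.embed(text)
rescue ArgumentError
  nil  # nil or empty text -- nothing to embed
rescue HTM::EmbeddingError => e
  warn "Provider failure: #{e.message}"
  nil
end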


Performance

Latency Benchmarks

Based on typical production workloads:

Provider | Model                  | Latency (P50) | Latency (P95) | Cost per 1K embeds
---------|------------------------|---------------|---------------|-------------------
Ollama   | nomic-embed-text       | 20ms          | 40ms          | Free
Ollama   | mxbai-embed-large      | 30ms          | 60ms          | Free
OpenAI   | text-embedding-3-small | 40ms          | 80ms          | $0.10
Gemini   | text-embedding-004     | 50ms          | 90ms          | Varies

Factors affecting latency:

  • Network latency (local providers vs cloud)
  • Text length (longer text = more tokens = slower)
  • Model size (larger models = slower)
  • System load (CPU/GPU utilization)
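To measure latency in your own environment (your numbers will differ from the table above), a simple benchmark sketch:

require 'benchmark'

service = HTM::EmbeddingService.new(:ollama)
text = 'PostgreSQL with TimescaleDB handles time-series data efficiently'

service.embed(text)  # warm up once before timing
elapsed = Benchmark.realtime { 10.times { service.embed(text) } }
puts "Average latency: #{(elapsed / 10 * 1000).round(1)}ms"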

Optimization Tips

Use appropriate model size:

# Fast but lower quality (Ollama)
HTM.configure { |c| c.embedding.model = 'all-minilm' }

# Balanced - Ollama (recommended for development)
HTM.configure { |c| c.embedding.model = 'nomic-embed-text' }

# High quality - OpenAI (recommended for production)
HTM.configure do |c|
  c.embedding.provider = :openai
  c.embedding.model = 'text-embedding-3-small'
end

Batch operations:

# HTM automatically generates embeddings for each message
# No special batching API needed
messages.each do |msg|
  htm.add_message(msg, speaker: "user")
  # Embedding generated for each message
end

Integration with HTM

Automatic Initialization

HTM initializes EmbeddingService automatically:

HTM.configure do |config|
  config.embedding.provider = :ollama  # or :openai, :gemini, etc.
  config.embedding.model = 'nomic-embed-text'
end

htm = HTM.new(robot_name: "Assistant")
# EmbeddingService is ready to use internally

Embedding Generation Flow

sequenceDiagram
    participant App as Application
    participant HTM as HTM
    participant ES as EmbeddingService
    participant LLM as LLM Provider (via RubyLLM)
    participant DB as PostgreSQL

    App->>HTM: add_message(content)
    HTM->>ES: embed(content)
    ES->>LLM: Generate embedding
    LLM->>ES: embedding vector
    ES->>HTM: Array<Float>
    HTM->>DB: INSERT with embedding
    DB->>HTM: node_id
    HTM->>App: node_id

Query Embedding

Search queries also generate embeddings:

# User searches for "database performance"
results = htm.recall(
  timeframe: "last week",
  topic: "database performance",
  strategy: :vector
)

# Internally:
# 1. embedding_service.embed("database performance")
# 2. SQL vector search using embedding
# 3. Return similar nodes
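A sketch of what that vector search might look like in SQL, assuming a pgvector embedding column (table, column, and database names are illustrative; <=> is pgvector's cosine-distance operator, where smaller means more similar):

require 'pg'

query_vec = service.embed('database performance')

conn = PG.connect(dbname: 'htm_development')  # hypothetical database name
results = conn.exec_params(<<~SQL, ["[#{query_vec.join(',')}]"])
  SELECT id, content
  FROM memory_nodes
  ORDER BY embedding <=> $1
  LIMIT 10
SQL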

Examples

Basic Usage

require 'htm'

# Create service
service = HTM::EmbeddingService.new(:ollama)

# Generate embedding
text = "PostgreSQL with TimescaleDB handles time-series data efficiently"
embedding = service.embed(text)

puts "Embedding dimensions: #{embedding.length}"
puts "First 5 values: #{embedding[0..4]}"

# Count tokens
tokens = service.count_tokens(text)
puts "Token count: #{tokens}"

Multiple Providers

# Configure for development (Ollama)
HTM.configure do |config|
  config.embedding.provider = :ollama
  config.embedding.model = 'nomic-embed-text'
end

# Configure for production (OpenAI)
HTM.configure do |config|
  config.embedding.provider = :openai
  config.embedding.model = 'text-embedding-3-small'
end

# Same interface regardless of provider
embedding = HTM::EmbeddingService.new.embed("test")

Custom Model Dimensions

# Specify dimensions explicitly
service = HTM::EmbeddingService.new(
  :ollama,
  model: 'custom-model',
  dimensions: 512
)

embedding = service.embed("text")
# Embedding will be padded/truncated to 512 dimensions
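A minimal sketch of the pad-or-truncate behavior described above (illustrative only, not HTM's exact code):

# Truncate long vectors; zero-pad short ones to the requested size
def fit_dimensions(vector, dims)
  return vector.first(dims) if vector.length >= dims
  vector + Array.new(dims - vector.length, 0.0)
end

fit_dimensions([0.1, 0.2, 0.3], 5)  # => [0.1, 0.2, 0.3, 0.0, 0.0]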
