Observability & Safety

Facilities that help you monitor, control, improve, and scale robot behaviour:

  • Token & Cost Tracking — measure LLM usage per run and cumulatively
  • Tool Loop Circuit Breaker — guard against runaway tool call loops
  • Doom Loop Detection — catch cyclic or repetitive tool-call patterns before they spiral
  • Automatic Context Compaction — prevent context overflow with configurable auto-compression
  • Learning Accumulation — build up cross-run observations that guide future runs
  • Context Window Compression — prune irrelevant history to stay within token budgets
  • Convergence Detection — detect when independent agents reach the same conclusion
  • Structured Delegation — synchronous inter-robot calls with duration and token metadata

Token & Cost Tracking

Per-Run Counts

Every robot.run() returns a RobotResult that carries the token usage for that call:

robot = RobotLab.build(
  name: "analyst",
  system_prompt: "You are a concise technical analyst.",
  model: "claude-haiku-4-5-20251001"
)

result = robot.run("What is the difference between a stack and a queue?")

puts result.input_tokens   # tokens sent to the model this run
puts result.output_tokens  # tokens generated this run
puts result.input_tokens + result.output_tokens  # total for this call

Token counts are 0 for providers that do not report usage data.

Cumulative Totals

The robot accumulates totals across all run() calls:

3.times { |i| robot.run("Question #{i + 1}") }

puts robot.total_input_tokens   # sum across all three runs
puts robot.total_output_tokens

Cost Estimation

Use per-provider pricing constants to estimate cost:

HAIKU_INPUT_CPM  = 0.80   # $ per 1M input tokens
HAIKU_OUTPUT_CPM = 4.00   # $ per 1M output tokens

def run_cost(input, output)
  (input * HAIKU_INPUT_CPM + output * HAIKU_OUTPUT_CPM) / 1_000_000.0
end

result = robot.run("Explain memoization.")
puts "$#{"%.5f" % run_cost(result.input_tokens, result.output_tokens)}"

Batch Accounting with reset_token_totals

reset_token_totals clears the accounting counters without touching the chat history. Use it to isolate the cost of a specific task batch:

# Batch 1
prompts_batch_1.each { |p| robot.run(p) }
puts "Batch 1 cost: $#{"%.4f" % run_cost(robot.total_input_tokens, robot.total_output_tokens)}"

robot.reset_token_totals   # start fresh accounting

# Batch 2 — totals start at zero, but chat history is still intact
prompts_batch_2.each { |p| robot.run(p) }
puts "Batch 2 cost: $#{"%.4f" % run_cost(robot.total_input_tokens, robot.total_output_tokens)}"

Important: Because the chat history keeps growing after a reset, the next run's input_tokens will be larger than the first batch's runs. This is expected — it is the real cost of sending the full accumulated context to the API. The counter reset tracks accounting, not context size.
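The distinction can be seen in a toy model (plain Ruby, no API calls, using string length as a stand-in for token count): resetting the counter does not shrink the context, so the first run after a reset is already more expensive than any first-batch run.

```ruby
# Toy model: the accounting counter resets, but the history does not,
# so each simulated run's "input" keeps growing after a reset.
history = []
total_input = 0

run = ->(prompt) do
  history << prompt
  this_run = history.sum(&:length)  # full accumulated context is sent each run
  total_input += this_run
  this_run
end

3.times { |i| run.("question #{i}") }
batch1 = total_input

total_input = 0                       # "reset_token_totals" — history untouched
first_after_reset = run.("question 3")

first_after_reset > batch1 / 3        # bigger than the average first-batch run
```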

For a truly fresh context and fresh counters, build a new robot:

fresh = RobotLab.build(
  name: "analyst",
  system_prompt: "You are a concise technical analyst."
)
result = fresh.run("Explain memoization.")
puts result.input_tokens  # smallest possible — no prior history

Tool Loop Circuit Breaker

The Problem

When a tool always instructs the LLM to call it again (e.g., a step-processor returning "more steps remain"), the robot loops indefinitely. Without a guard this consumes tokens, API quota, and time without bound.

max_tool_rounds

Set max_tool_rounds: on the robot to cap how many tool calls can happen in a single run(). When the limit is exceeded, RobotLab::ToolLoopError is raised.

robot = RobotLab.build(
  name: "runner",
  system_prompt: "Execute every step sequentially.",
  local_tools: [StepTool],
  max_tool_rounds: 10
)

begin
  robot.run("Run all steps.")
rescue RobotLab::ToolLoopError => e
  puts "Circuit breaker fired: #{e.message}"
  # => "Circuit breaker fired: Tool call limit of 10 exceeded"
end

max_tool_rounds can also be supplied via RunConfig:

config = RobotLab::RunConfig.new(max_tool_rounds: 10)
robot = RobotLab.build(name: "runner", system_prompt: "...", config: config)

Recovering After ToolLoopError

After a ToolLoopError the chat contains a dangling tool_use block with no matching tool_result. Anthropic and most other providers will reject any subsequent request with that broken history:

Error: tool_use ids were found without tool_result blocks immediately after

Call clear_messages to flush the corrupted history before reusing the robot. The system prompt and all configuration (tools, max_tool_rounds, etc.) are preserved:

begin
  robot.run("Run all steps.")
rescue RobotLab::ToolLoopError => e
  puts "Breaker fired: #{e.message}"
end

robot.clear_messages
# Robot is healthy — config unchanged
puts robot.config.max_tool_rounds  # still 10

result = robot.run("Start fresh with a simple question.")

Normal Tool Use Is Unaffected

max_tool_rounds is a safety net, not a tax. A robot that calls a tool once and terminates works identically with or without the guard:

unguarded = RobotLab.build(
  name: "calculator",
  system_prompt: "Use the provided tool to answer questions.",
  local_tools: [DoubleTool]
)
result = unguarded.run("Double the number 21 using the tool.")
puts result.reply  # "The result is 42."

Doom Loop Detection

The Problem

max_tool_rounds stops a robot that loops forever, but it fires on quantity alone. A subtler failure is when a robot cycles through the same tool call sequence repeatedly — calling tool A, then B, then C, then A again — without hitting the round limit. This is a doom loop: the robot is working but not making progress.

doom_loop_threshold

robot = RobotLab.build(
  name: "runner",
  system_prompt: "Execute all steps.",
  local_tools: [StepTool],
  doom_loop_threshold: 3
)

Set doom_loop_threshold: to the number of repetitions after which the detector fires. It catches two patterns:

  • Consecutive repetition: [A, A, A] (same tool called N times in a row)
  • Cyclic repetition: [A, B, C, A, B, C, A, B, C] (same sequence repeated N times)

When a doom loop is detected, a warning message is embedded directly in the tool result, prompting the LLM to try a fundamentally different approach. This avoids corrupting the Anthropic message format (no injected user messages between tool_use/tool_result pairs).
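Both patterns reduce to a check over the sequence of recent tool names. The sketch below illustrates the idea in plain Ruby; it is not RobotLab's internal implementation, and the cycle lengths it scans (1 to 3) are an assumption for the example:

```ruby
# Fires when the last N tool names are identical (consecutive repetition),
# or when a short cycle of names repeats N times back to back (cyclic).
def doom_loop?(calls, threshold)
  last = calls.last(threshold)
  return true if last.size == threshold && last.uniq.size == 1

  (1..3).each do |cycle_len|
    window = calls.last(cycle_len * threshold)
    next unless window.size == cycle_len * threshold
    cycle = window.first(cycle_len)
    return true if window.each_slice(cycle_len).all? { |slice| slice == cycle }
  end
  false
end

doom_loop?(%w[a a a], 3)              # consecutive repetition
doom_loop?(%w[a b c a b c a b c], 3)  # cyclic repetition
doom_loop?(%w[a b c d], 3)            # normal progress — no loop
```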

doom_loop_threshold can also be supplied via RunConfig:

config = RobotLab::RunConfig.new(doom_loop_threshold: 3)
robot  = RobotLab.build(name: "runner", system_prompt: "...", config: config)

Complementary to max_tool_rounds

Use both together for comprehensive loop protection:

robot = RobotLab.build(
  name: "executor",
  system_prompt: "Execute every step.",
  local_tools: [StepTool],
  max_tool_rounds:    20,   # hard ceiling on total tool calls
  doom_loop_threshold: 3    # catches repetitive patterns early
)

Automatic Context Compaction

The Problem

Long-running robots accumulate conversation history. Eventually, the cumulative token count approaches the model's context window limit, causing API errors or degraded performance. Manually calling compress_history at the right moment requires application-level bookkeeping.

auto_compact

Set auto_compact: to have the robot compress its history automatically before each run():

# Compact when estimated token usage exceeds 80% of the model's context window
robot = RobotLab.build(
  name: "analyst",
  system_prompt: "You are a research analyst.",
  auto_compact: :context_window
)

Tuning the Threshold

compact_threshold: sets the fraction of the model's context window that triggers compaction. Defaults to 0.80 (80%):

robot = RobotLab.build(
  name: "analyst",
  system_prompt: "You are a research analyst.",
  auto_compact:      :context_window,
  compact_threshold: 0.70   # compact earlier, at 70%
)

Application-Owned Compaction

Pass a Proc to take full control — the proc decides both when and how to compact:

robot = RobotLab.build(
  name: "analyst",
  system_prompt: "You are a research analyst.",
  auto_compact: ->(r) {
    r.compress_history(recent_turns: 5) if r.chat.messages.size > 40
  }
)

The proc receives the robot instance and is called before every run() when messages are non-empty.

Options

Value            Behaviour
nil / :none      No automatic compaction (default)
:context_window  Compact when estimated token usage exceeds compact_threshold fraction of the model's context window
Proc             Called with the robot; application decides when and how to compact

Via RunConfig:

config = RobotLab::RunConfig.new(auto_compact: :context_window, compact_threshold: 0.75)
robot  = RobotLab.build(name: "analyst", system_prompt: "...", config: config)

Requires the classifier gem (~> 2.3) when using :context_window. Without it, a RobotLab::DependencyError is caught and logged rather than raised, so the robot continues running uncompressed.


Learning Accumulation

The Problem

A robot's memory persists key-value data, but there is no built-in way to tell the LLM "here is what I've learned from previous interactions." Learning accumulation fills that gap.

robot.learn

robot.learn(text)

Records text as an observation. On every subsequent run(), active learnings are automatically prepended to the user message:

LEARNINGS FROM PREVIOUS RUNS:
- This codebase prefers map/collect over manual array accumulation
- Explicit nil comparisons appear frequently here

<original user message>

This gives the LLM access to prior context without requiring a persistent conversation history.
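The prepend step can be sketched as a small helper (an illustration of the format shown above, not RobotLab's internal code):

```ruby
# Prepend active learnings as a header block before the user message.
def with_learnings(message, learnings)
  return message if learnings.empty?
  header = "LEARNINGS FROM PREVIOUS RUNS:\n" +
           learnings.map { |l| "- #{l}" }.join("\n")
  "#{header}\n\n#{message}"
end

puts with_learnings(
  "Review this snippet.",
  ["This codebase prefers map/collect over manual array accumulation"]
)
```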

Bidirectional Deduplication

Learnings deduplicate bidirectionally:

  • If the new text is already contained in an existing learning, it is dropped.
  • If an existing learning is contained in the new text (the new one is broader), the narrower one is replaced.

robot.learn("avoid using puts")
robot.learn("avoid using puts and p in production code")

robot.learnings.size  # => 1 — broader learning replaced the narrower one
robot.learnings.first # => "avoid using puts and p in production code"

Accumulated Learnings

robot.learnings  # => Array<String>

Returns the current list of active learnings in insertion order.

Full Example

reviewer = RobotLab.build(
  name: "reviewer",
  system_prompt: <<~PROMPT
    You are a concise Ruby code reviewer.
    Identify the main issue in one sentence and show the fix.
  PROMPT
)

snippets = [snippet_a, snippet_b, snippet_c]
insights = [
  "This codebase prefers map/collect over manual accumulation",
  "Explicit nil comparisons appear frequently",
  "Cart logic tends to have missing edge cases around nil discounts"
]

snippets.each_with_index do |code, i|
  result = reviewer.run("Review this snippet:\n\n#{code}")
  puts result.reply

  reviewer.learn(insights[i])
  puts "Added learning ##{reviewer.learnings.size}"
end

After all three runs, reviewer.learnings contains up to three insights (fewer if any are subsets of others).

Durable Learning (learn: Constructor Shorthand)

The robot_lab-durable gem adds automatic end-of-session learning promotion. Enable it with learn: true in the constructor:

reviewer = RobotLab.build(
  name:         "reviewer",
  system_prompt: "You are a Ruby code reviewer.",
  learn:        true,
  learn_domain: "ruby_review"   # optional namespace for the durable store
)

At the end of each session, the robot reflects on its observations and promotes durable insights to a YAML-backed store that persists across process restarts. On the next run, those stored insights are automatically reloaded as learnings.

Memory Persistence

Learnings are stored in memory[:learnings]. They survive a robot rebuild when the same Memory object is passed to the new robot:

shared_memory = original_robot.memory

rebuilt = RobotLab.build(
  name: "reviewer",
  system_prompt: "You review code."
)
rebuilt.instance_variable_set(:@memory, shared_memory)
persisted = shared_memory.get(:learnings)
rebuilt.instance_variable_set(:@learnings, Array(persisted))

puts rebuilt.learnings.size  # same as original_robot.learnings.size

Context Window Compression

The Problem

Long conversations accumulate turns that are no longer relevant to the current topic. Sending all of them to the LLM on every run() wastes tokens and money, and risks exceeding the model's context window.

robot.compress_history

robot.compress_history(
  recent_turns:    3,      # last N user+assistant pairs — always protected
  keep_threshold:  0.6,    # score >= this → keep verbatim
  drop_threshold:  0.2,    # score < this  → drop
  summarizer:      nil     # optional lambda(text) -> String for medium tier
)

Internally, each old turn is scored against the mean of the recent turns using stemmed term-frequency cosine similarity (via the classifier gem). Turns that score high are kept; turns that score low are dropped; turns in the middle band are either summarized or dropped depending on whether a summarizer is provided.
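The core of that scoring is ordinary term-frequency cosine similarity. A plain-Ruby sketch (without the stemming that the classifier gem adds on top of the same idea):

```ruby
# Term-frequency cosine similarity between two texts (no stemming).
def tf_cosine(a, b)
  ta = a.downcase.scan(/\w+/).tally
  tb = b.downcase.scan(/\w+/).tally
  dot   = (ta.keys & tb.keys).sum { |w| ta[w] * tb[w] }
  norm  = ->(t) { Math.sqrt(t.values.sum { |v| v * v }) }
  denom = norm.(ta) * norm.(tb)
  denom.zero? ? 0.0 : dot / denom
end

tf_cosine("a stack is last in first out",
          "the stack is last in first out")          # high — near-identical
tf_cosine("stacks are LIFO", "the weather is sunny") # no shared terms — 0.0
```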

Always preserved regardless of score:

  • System messages
  • Tool call/result message pairs
  • All messages within the recent_turns window

Thresholds

score >= keep_threshold   →  keep verbatim
score <  drop_threshold   →  drop
otherwise                 →  summarize (if summarizer given) or drop

A good starting point: keep_threshold: 0.6, drop_threshold: 0.2. Widen the drop band (raise drop_threshold) to compress more aggressively; raise keep_threshold to summarize more.
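The three-way decision for a single old turn can be written out as a small function (a restatement of the rule above, using the suggested defaults):

```ruby
# Keep, summarize, or drop one old turn given its similarity score
# against the recent turns.
def tier(score, keep_threshold: 0.6, drop_threshold: 0.2, summarizer: nil)
  return :keep if score >= keep_threshold
  return :drop if score < drop_threshold
  summarizer ? :summarize : :drop
end

tier(0.7)                           # :keep
tier(0.1)                           # :drop
tier(0.4)                           # :drop — middle band, no summarizer
tier(0.4, summarizer: ->(t) { t })  # :summarize
```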

Without a Summarizer (Drop Mode)

robot.compress_history(recent_turns: 3, keep_threshold: 0.6, drop_threshold: 0.2)

Medium-relevance turns are dropped along with low-relevance ones. This is the simplest form — no extra LLM calls, no added latency.

With an LLM Summarizer

summarizer_bot = RobotLab.build(
  name:          "summarizer",
  system_prompt: "Summarize the following text in one sentence."
)

robot.compress_history(
  recent_turns:    3,
  keep_threshold:  0.6,
  drop_threshold:  0.2,
  summarizer:      ->(text) { summarizer_bot.run("Summarize: #{text}").reply }
)

The summarizer replaces each medium-relevance turn with a one-sentence digest, preserving some context while reducing token count. The summary inherits the original message's role so the user/assistant alternation required by LLM APIs is maintained.

Optional Dependency

compress_history requires the classifier gem. Add it to your Gemfile:

gem "classifier", "~> 2.3"

Without it, calling compress_history raises RobotLab::DependencyError with an install hint.


Convergence Detection

The Problem

Multi-robot verification patterns (two independent reviewers, a debate network, a fact-checker) typically ask a reconciler robot to resolve any differences. But when both verifiers already agree, paying for that reconciler call is pure waste.

RobotLab::Convergence

score = RobotLab::Convergence.similarity(text_a, text_b)  # Float 0.0..1.0
agreed = RobotLab::Convergence.detected?(text_a, text_b)  # Boolean (threshold: 0.85)
agreed = RobotLab::Convergence.detected?(text_a, text_b, threshold: 0.6)

Similarity is computed via L2-normalized stemmed term-frequency cosine similarity. Term frequencies (not TF-IDF) are used because fitting TF-IDF on a 2-document corpus suppresses shared terms to near-zero IDF, giving counter-intuitively low scores for texts that agree on the same topic.

Texts shorter than 30 characters always return 0.0.

Typical Scores

Relationship                         Typical Score
Identical                            1.000
Same conclusion, different phrasing  0.60 – 0.75
Same topic, different emphasis       0.45 – 0.60
Unrelated                            < 0.15

Router Fast-Path Pattern

Skip the reconciler when verifiers agree:

router = ->(args) do
  a = args.context[:verifier_a]&.reply.to_s
  b = args.context[:verifier_b]&.reply.to_s

  if RobotLab::Convergence.detected?(a, b)
    nil                  # both agree — network halts, no reconciler call
  else
    ["reconciler"]       # diverged — send to reconciler
  end
end

network = RobotLab.create_network(
  name:   "fact_check",
  robots: [verifier_a, verifier_b, reconciler],
  router: router
)

Tune threshold: to control how strictly "agreement" is defined. A lower threshold (e.g., 0.6) accepts more variation between verifiers; a higher threshold (e.g., 0.9) only fast-paths near-identical responses.

Optional Dependency

RobotLab::Convergence requires the classifier gem (same as compress_history):

gem "classifier", "~> 2.3"


Structured Delegation

The Problem

RobotLab has two existing patterns for one robot to involve another:

  • Pipelines — predefined sequences where robots share memory and run in order
  • Bus messaging — fire-and-forget pub/sub with no return value

Neither gives you a synchronous call that returns a result with provenance and cost metadata. delegate fills that gap.

Synchronous delegation

Blocks until the delegatee finishes and returns a RobotResult annotated with provenance and timing:

result = manager.delegate(to: specialist, task: "Analyze this data: ...")

puts result.reply          # specialist's answer
puts result.robot_name     # => "specialist"   (who did the work)
puts result.delegated_by   # => "manager"      (who asked)
puts result.duration       # => 1.43           (wall-clock seconds)
puts result.input_tokens   # => 812
puts result.output_tokens  # => 94

All keyword arguments are forwarded to the delegatee's run():

result = manager.delegate(to: worker, task: "hello", company_name: "Acme")

Asynchronous delegation — parallel fan-out

Pass async: true to get a DelegationFuture back immediately. The delegatee runs in a background thread. Call future.value to block for the result, or future.resolved? to poll without blocking.

# Fire both delegations simultaneously
f1 = manager.delegate(to: summarizer, task: "Summarize: #{doc}", async: true)
f2 = manager.delegate(to: analyst,    task: "Key metric: #{doc}", async: true)

# Both are running in parallel here
puts f1.resolved?   # false (probably)

# Collect when ready (optional timeout in seconds)
summary  = f1.value(timeout: 30)
analysis = f2.value(timeout: 30)

If the delegatee raises an error, future.value re-raises it. If timeout: expires before the result arrives, DelegationFuture::DelegationTimeout is raised.
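The shape of a future with a timeout can be sketched in plain Ruby with a thread and a queue. This illustrates the semantics described above (blocking value, non-blocking resolved?, error re-raise, timeout error); it is not RobotLab's actual DelegationFuture implementation:

```ruby
# Plain-Ruby future: run work in a background thread, collect the
# result (or error) through a queue, and enforce an optional timeout.
class SketchFuture
  DelegationTimeout = Class.new(StandardError)

  def initialize(&work)
    @queue = Queue.new
    Thread.new do
      @queue << [:ok, work.call]
    rescue => e
      @queue << [:err, e]
    end
  end

  def resolved?
    !!@result || !@queue.empty?
  end

  def value(timeout: nil)
    unless @result
      if timeout
        deadline = Time.now + timeout
        sleep 0.01 while @queue.empty? && Time.now < deadline
        raise DelegationTimeout, "no result after #{timeout}s" if @queue.empty?
      end
      @result = @queue.pop   # blocks when no timeout was given
    end
    status, payload = @result
    status == :err ? raise(payload) : payload
  end
end

f = SketchFuture.new { 6 * 7 }
f.value(timeout: 1)  # => 42
```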

When to Use Each Pattern

Pattern                Return value               Concurrent             Use when
pipeline               shared memory              yes (parallel groups)  fixed workflow graph
bus messaging          none (fire-and-forget)     yes                    notify without waiting for a reply
delegate               RobotResult with metadata  no                     need the result back, one at a time
delegate(async: true)  DelegationFuture           yes                    parallel fan-out, collect results later

Full Example

manager    = RobotLab.build(name: "manager",    system_prompt: "You are a project manager.")
summarizer = RobotLab.build(name: "summarizer", system_prompt: "Summarize in 1-2 sentences.")
analyst    = RobotLab.build(name: "analyst",    system_prompt: "Identify the key metric.")

# Parallel fan-out
f1 = manager.delegate(to: summarizer, task: "Summarize: #{document}", async: true)
f2 = manager.delegate(to: analyst,    task: "Key metric: #{document}", async: true)

summary  = f1.value(timeout: 60)
analysis = f2.value(timeout: 60)

puts "#{summary.robot_name} (#{summary.duration.round(2)}s): #{summary.reply}"
puts "#{analysis.robot_name} (#{analysis.duration.round(2)}s): #{analysis.reply}"

See Also