File Loading Example
This example demonstrates loading markdown files into HTM's long-term memory with automatic chunking, YAML frontmatter extraction, and source tracking.
Source: examples/file_loader_usage.rb
Overview
The file loading example shows:
- Loading single markdown files
- Loading directories with glob patterns
- YAML frontmatter extraction
- Querying nodes from loaded files
- Re-sync behavior for changed files
- Unloading files from memory
Running the Example
export HTM_DATABASE__URL="postgresql://user@localhost:5432/htm_development"
ruby examples/file_loader_usage.rb
Code Walkthrough
Loading a Single File
htm = HTM.new(robot_name: "FileLoaderDemo")
# Load a markdown file
result = htm.load_file("docs/guide.md")
# => {
#      file_source_id: 1,
#      chunks_created: 5,
#      chunks_updated: 0,
#      skipped: false
#    }
YAML Frontmatter
When a file begins with a YAML frontmatter block, HTM extracts its metadata automatically:
---
title: PostgreSQL Guide
author: HTM Team
tags:
- database
- postgresql
---
# PostgreSQL Guide
Content starts here...
Access the extracted frontmatter through the file's HTM::Models::FileSource record:
source = HTM::Models::FileSource.find(result[:file_source_id])
source.title # => "PostgreSQL Guide"
source.author # => "HTM Team"
source.frontmatter_tags # => ["database", "postgresql"]
source.frontmatter # => { "title" => "...", ... }
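To make the extraction step concrete, here is a minimal sketch of how a frontmatter block can be separated from the markdown body using only Ruby's standard YAML library. The helper name `split_frontmatter` is hypothetical and this is not HTM's own implementation; it only illustrates the general technique.

```ruby
require "yaml"

# Hypothetical helper (not part of HTM): splits markdown text into
# [frontmatter_hash, body]. Returns an empty hash when no block is present.
def split_frontmatter(text)
  match = text.match(/\A---[ \t]*\n(.*?)\n---[ \t]*\n?(.*)\z/m)
  return [{}, text] unless match

  data = YAML.safe_load(match[1]) || {}
  [data, match[2]]
end

doc = <<~MARKDOWN
  ---
  title: PostgreSQL Guide
  author: HTM Team
  tags:
    - database
    - postgresql
  ---
  # PostgreSQL Guide
  Content starts here...
MARKDOWN

frontmatter, body = split_frontmatter(doc)
puts frontmatter["title"]        # PostgreSQL Guide
puts frontmatter["tags"].inspect # ["database", "postgresql"]
puts body.lines.first            # # PostgreSQL Guide
```

Only the body would then be passed on for chunking, while the parsed hash corresponds to the kind of data exposed by `source.frontmatter`.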
Loading a Directory
# Load all markdown files
results = htm.load_directory("docs/", pattern: "**/*.md")
# => [
#      { file_path: "docs/guide.md", chunks_created: 3, ... },
#      { file_path: "docs/api.md", chunks_created: 5, ... }
#    ]
# Load with specific pattern
results = htm.load_directory("docs/guides/", pattern: "*.md")
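The two patterns above differ in how deep they search. The sketch below shows that difference with Ruby's `Dir.glob` directly. Whether `load_directory` resolves its `pattern` argument this way internally is an assumption, and the temporary directory and file names are invented for illustration.

```ruby
require "tmpdir"
require "fileutils"

# Build a throwaway directory tree to glob against.
root = Dir.mktmpdir("htm_glob_demo")
begin
  FileUtils.mkdir_p(File.join(root, "guides"))
  File.write(File.join(root, "index.md"), "# Index\n")
  File.write(File.join(root, "guides", "setup.md"), "# Setup\n")
  File.write(File.join(root, "notes.txt"), "not markdown\n")

  # "**/*.md" descends into subdirectories; "*.md" matches the top level only.
  recursive = Dir.glob("**/*.md", base: root).sort
  top_level = Dir.glob("*.md", base: root).sort

  puts recursive.inspect # ["guides/setup.md", "index.md"]
  puts top_level.inspect # ["index.md"]
ensure
  FileUtils.remove_entry(root)
end
```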
Querying Loaded Files
# Get all nodes from a specific file
nodes = htm.nodes_from_file("docs/guide.md")
nodes.each do |node|
  puts "#{node.id}: #{node.content[0..50]}..."
end
Re-Sync Behavior
HTM tracks each file's modification time, so reloading an unchanged file is skipped and only edited files are re-synced:
# First load - creates chunks
htm.load_file("docs/guide.md")
# => { skipped: false, chunks_created: 5 }
# Second load - skipped (unchanged)
htm.load_file("docs/guide.md")
# => { skipped: true }
# After editing file - re-syncs
htm.load_file("docs/guide.md")
# => { skipped: false, chunks_updated: 2, chunks_created: 1 }
# Force reload
htm.load_file("docs/guide.md", force: true)
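The skip-or-reload decision can be sketched as a comparison between a file's current modification time and the one recorded at the last load. The `SyncTracker` class below is hypothetical and keeps its state in memory; HTM's actual bookkeeping is not shown in this example and may differ.

```ruby
require "tempfile"

# Hypothetical tracker (not part of HTM): remembers the mtime seen at the
# last load and reports whether a file needs to be processed again.
class SyncTracker
  def initialize
    @seen = {}
  end

  # Returns :loaded when the file is new, changed, or force is set;
  # returns :skipped when the recorded mtime still matches.
  def sync(path, force: false)
    mtime = File.mtime(path)
    return :skipped if !force && @seen[path] == mtime

    @seen[path] = mtime
    :loaded
  end
end

file = Tempfile.new(["guide", ".md"])
file.write("# Guide\n")
file.flush

tracker = SyncTracker.new
first  = tracker.sync(file.path) # new file
second = tracker.sync(file.path) # unchanged

# Simulate an edit by moving the modification time forward one minute.
File.utime(File.atime(file.path), File.mtime(file.path) + 60, file.path)
third  = tracker.sync(file.path)              # changed
forced = tracker.sync(file.path, force: true) # unchanged, but forced

puts [first, second, third, forced].inspect
file.close!
```

Comparing timestamps is cheap because it avoids reading file contents, at the cost of missing edits that preserve the modification time; `force: true` covers that case.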
Unloading Files
# Soft delete all chunks from a file
count = htm.unload_file("docs/guide.md")
puts "Removed #{count} chunks"
Chunking Configuration
HTM.configure do |config|
  config.chunk_size = 1024    # Characters per chunk (default)
  config.chunk_overlap = 64   # Overlap between chunks (default)
end
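To illustrate how the two settings interact, here is a minimal sketch of fixed-size character chunking with overlap. The `chunk_text` helper is hypothetical, and HTM's real chunker may split differently (for example, on paragraph or sentence boundaries); only the default values are taken from the configuration above.

```ruby
# Hypothetical helper (not part of HTM): splits text into chunks of at most
# chunk_size characters, where each chunk repeats the last chunk_overlap
# characters of the previous one.
def chunk_text(text, chunk_size: 1024, chunk_overlap: 64)
  unless chunk_overlap < chunk_size
    raise ArgumentError, "chunk_overlap must be smaller than chunk_size"
  end

  step = chunk_size - chunk_overlap
  chunks = []
  start = 0
  while start < text.length
    chunks << text[start, chunk_size]
    break if start + chunk_size >= text.length

    start += step
  end
  chunks
end

# 2000 characters of repeating lowercase letters
text = (0...2000).map { |i| (97 + i % 26).chr }.join
chunks = chunk_text(text)

puts chunks.map(&:length).inspect          # [1024, 1024, 80]
puts chunks[0][-64..] == chunks[1][0, 64]  # true
```

Each chunk after the first starts `chunk_size - chunk_overlap` characters after the previous one, so a larger overlap preserves more context across chunk boundaries but produces more chunks for the same file.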
The chunk size and overlap can also be set through environment variables.
Expected Output
HTM File Loader Example
============================================================
1. Configuring HTM with Ollama provider...
Configured with Ollama provider
2. Initializing HTM...
Robot: FileLoaderDemo (ID: 1)
3. Creating sample markdown files...
Created: /tmp/htm_demo/postgresql_guide.md
Created: /tmp/htm_demo/ruby_intro.md
4. Loading single file with frontmatter...
File: postgresql_guide.md
Source ID: 1
Chunks created: 3
Frontmatter title: PostgreSQL Guide
Frontmatter author: HTM Team
Frontmatter tags: database, postgresql
5. Loading directory...
Files processed: 2
- postgresql_guide.md: skipped
- ruby_intro.md: 2 chunks
...
============================================================
Example completed successfully!
Rake Tasks
# Load a single file
rake 'htm:files:load[docs/guide.md]'
# Load directory
rake 'htm:files:load_dir[docs/]'
rake 'htm:files:load_dir[docs/,**/*.md]'
# List loaded files
rake htm:files:list
# Show file details
rake 'htm:files:info[docs/guide.md]'
# Unload a file
rake 'htm:files:unload[docs/guide.md]'
# Sync all files
rake htm:files:sync
# Show statistics
rake htm:files:stats
# Force reload
FORCE=true rake 'htm:files:load[docs/guide.md]'