Choosing a Concurrency Model

SimpleFlow supports two different approaches for parallel execution: Ruby threads and the async gem (fiber-based). This guide helps you choose the right one for your use case.

Overview

You can control which concurrency model a pipeline uses in two ways:

1. Automatic Detection (Default)

When you create a pipeline without specifying concurrency:

pipeline = SimpleFlow::Pipeline.new do
  # steps...
end

SimpleFlow automatically uses the best available model:

  • Without async gem: Uses Ruby's built-in threads
  • With async gem: Uses fiber-based concurrency

2. Explicit Concurrency Selection

You can explicitly choose the concurrency model per pipeline:

# Force threads (even if async gem is available)
pipeline = SimpleFlow::Pipeline.new(concurrency: :threads) do
  # steps...
end

# Force async (raises error if async gem not available)
pipeline = SimpleFlow::Pipeline.new(concurrency: :async) do
  # steps...
end

# Auto-detect (default behavior)
pipeline = SimpleFlow::Pipeline.new(concurrency: :auto) do
  # steps...
end

Both models let independent steps overlap their I/O waits; the difference is in how they achieve that and in their resource characteristics.

Ruby Threads (Without async gem)

How It Works

  • Creates actual OS threads (like having multiple workers)
  • Each thread runs independently
  • Ruby's GIL (Global Interpreter Lock) means only one thread runs Ruby code at a time
  • BUT: when a thread waits on I/O (network, disk, database), it releases the GIL, so other threads can run (see the sketch below)
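
To see why I/O waits overlap even under the GIL, here is a minimal plain-Ruby sketch (no SimpleFlow involved); sleep stands in for a blocking call:

require 'benchmark'

elapsed = Benchmark.realtime do
  threads = 3.times.map do |i|
    Thread.new do
      sleep 1              # stands in for a blocking HTTP or database call
      "response #{i}"
    end
  end
  threads.each(&:value)    # wait for every thread and collect its result
end

puts format('finished in %.2fs', elapsed)  # ~1.0s rather than ~3.0s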

Best For

  • Simple use cases: You just want things to run in parallel
  • Blocking I/O operations:
      • Making HTTP requests to APIs
      • Reading/writing files
      • Database queries
      • Any "waiting" operations
  • Mixed libraries: Works with any Ruby gem (doesn't need async support)
  • Small-to-medium concurrency: 10-100 parallel operations

Resource Usage

  • Each thread uses ~1-2 MB of memory
  • OS manages thread scheduling
  • Limited by system resources (maybe 100-1,000 threads max)

Example Scenario

# Fetching data from 10 different APIs in parallel
pipeline = SimpleFlow::Pipeline.new do
  step :validate, validator, depends_on: []

  # These 10 API calls run in parallel with threads
  step :api_1, ->(r) { r.with_context(:api_1, fetch_api_1) }, depends_on: [:validate]
  step :api_2, ->(r) { r.with_context(:api_2, fetch_api_2) }, depends_on: [:validate]
  # ... 8 more API calls

  step :merge, merger, depends_on: [:api_1, :api_2, ...]
end

# Each API call takes 500ms, threads let them all wait simultaneously
# Total time: ~500ms instead of 5 seconds
result = pipeline.call_parallel(initial_data)

Async Gem (Fiber-based)

How It Works

  • Uses Ruby "fibers" (lightweight green threads)
  • Cooperative scheduling (fibers yield control when waiting)
  • An event loop manages thousands of concurrent operations (see the sketch below)
  • Requires async-aware libraries (async-http, async-postgres, etc.)
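
Roughly how the fiber model behaves, shown with the async gem directly rather than through SimpleFlow. A minimal sketch, assuming async 2.x on Ruby 3, where sleep inside a reactor yields to the scheduler instead of blocking:

require 'async'

Async do
  tasks = 1_000.times.map do |i|
    Async do
      sleep 0.5            # yields to the event loop; all 1,000 waits overlap
      i
    end
  end
  results = tasks.map(&:wait)   # total time is roughly 0.5s, not 500s
  puts results.size
end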

Best For

  • High concurrency: Thousands of simultaneous operations
  • I/O-heavy applications: Web scrapers, API gateways, chat servers
  • Long-running services: Background workers processing many jobs
  • Async-compatible stack: When using async-aware gems

Resource Usage

  • Each fiber uses ~4-8 KB of memory (250x lighter than threads!)
  • Can handle 10,000+ concurrent operations
  • More efficient CPU and memory usage

Example Scenario

# Web scraper fetching 10,000 product pages
require 'async'
require 'async/http/internet'

pipeline = SimpleFlow::Pipeline.new do
  step :load_urls, url_loader, depends_on: []

  # With async gem, can handle thousands of concurrent requests
  step :fetch_pages, ->(result) {
    urls = result.value[:urls]
    internet = Async::HTTP::Internet.new
    # Fetch every URL in its own fiber; the scheduler overlaps the network waits
    pages = urls.map { |url| Async { internet.get(url).read } }.map(&:wait)
    result.with_context(:pages, pages).continue(result.value)
  }, depends_on: [:load_urls]

  step :parse_data, parser, depends_on: [:fetch_pages]
end

# With threads: Would crash or be very slow (10,000 threads = 10+ GB RAM)
# With async: Handles it smoothly (10,000 fibers = ~80 MB RAM)
result = pipeline.call_parallel(initial_data)

Decision Guide

Use Threads (no async gem) when:

✅ You have 10-100 parallel operations
✅ Using standard Ruby gems (not async-compatible)
✅ Making database queries or HTTP requests with traditional libraries
✅ You want simple, straightforward code
✅ Building internal tools or scripts

Example:

# E-commerce checkout: Check inventory, calculate shipping, process payment
# 3-5 parallel operations, standard libraries

# Option 1: Auto-detect (uses threads since no async gem needed)
pipeline = SimpleFlow::Pipeline.new do
  step :validate_order, validator, depends_on: []
  step :check_inventory, inventory_checker, depends_on: [:validate_order]
  step :calculate_shipping, shipping_calculator, depends_on: [:validate_order]
  step :process_payment, payment_processor, depends_on: [:check_inventory, :calculate_shipping]
end

# Option 2: Explicitly use threads (works even if async gem is installed)
pipeline = SimpleFlow::Pipeline.new(concurrency: :threads) do
  step :validate_order, validator, depends_on: []
  step :check_inventory, inventory_checker, depends_on: [:validate_order]
  step :calculate_shipping, shipping_calculator, depends_on: [:validate_order]
  step :process_payment, payment_processor, depends_on: [:check_inventory, :calculate_shipping]
end

result = pipeline.call_parallel(order)  # ✅ Threads work great

Use Async (add async gem) when:

✅ You need 1,000+ concurrent operations
✅ Building high-performance web services
✅ Processing large-scale I/O operations (web scraping, bulk APIs)
✅ Using async-compatible libraries (async-http, async-postgres)
✅ Optimizing resource usage (hosting costs, memory limits)

Example:

# Monitoring service checking 5,000 endpoints every minute
# Need low memory footprint and high concurrency

# Gemfile:
gem 'async', '~> 2.0'
gem 'async-http', '~> 0.60'

# Explicitly require async concurrency for this high-volume pipeline
pipeline = SimpleFlow::Pipeline.new(concurrency: :async) do
  step :load_endpoints, endpoint_loader, depends_on: []

  # Async gem allows 5,000 concurrent health checks efficiently
  step :check_all, health_checker, depends_on: [:load_endpoints]

  step :aggregate_results, aggregator, depends_on: [:check_all]
end

result = pipeline.call_parallel(config)  # ✅ Async is essential
# Raises error if async gem not installed
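
One possible shape for the health_checker step, shown only as a sketch: it assumes the endpoint URLs live under result.value[:endpoints] and that async-http is available.

require 'async'
require 'async/http/internet'

health_checker = ->(result) {
  endpoints = result.value[:endpoints]
  internet = Async::HTTP::Internet.new
  # One fiber per endpoint; each check is mostly waiting, so the waits overlap
  statuses = endpoints.map { |url|
    Async do
      [url, internet.get(url).status]
    rescue StandardError
      [url, :unreachable]
    end
  }.map(&:wait)
  result.with_context(:statuses, statuses).continue(result.value)
}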


Quick Comparison Table

Factor                 | Ruby Threads    | Async Gem
Setup                  | None (built-in) | gem 'async'
Concurrency Limit      | ~100-1,000      | ~10,000+
Memory per operation   | 1-2 MB          | 4-8 KB
Library compatibility  | Any Ruby gem    | Needs async-aware gems
Learning curve         | Simple          | Moderate
Speed (I/O)            | Fast            | Faster
Speed (CPU)            | GIL-limited     | GIL-limited (same)
Best use case          | Standard apps   | High-concurrency services

Real-World Analogy

Threads = Hiring separate workers

  • Each worker has their own desk, phone, and computer (more resources)
  • You can have 50-100 workers before the office gets crowded
  • Workers use regular tools everyone knows
  • Easy to manage

Async = One worker with a really efficient task list

  • The worker rapidly switches between tasks while waiting
  • Can juggle 10,000 tasks because they're mostly waiting anyway
  • Needs special tools designed for rapid task-switching
  • More efficient, but requires planning


Switching Between Models

The beauty of SimpleFlow is that you can switch between concurrency models without changing your pipeline code:

Starting with Threads

# Gemfile - no async gem
gem 'simple_flow'

# Your pipeline code
pipeline = SimpleFlow::Pipeline.new do
  step :fetch_user, user_fetcher, depends_on: []
  step :fetch_orders, order_fetcher, depends_on: [:fetch_user]
  step :fetch_products, product_fetcher, depends_on: [:fetch_user]
end

result = pipeline.call_parallel(data)  # Uses threads

Upgrading to Async

# Gemfile - add async gem
gem 'simple_flow'
gem 'async', '~> 2.0'

# Same pipeline code - no changes needed!
pipeline = SimpleFlow::Pipeline.new do
  step :fetch_user, user_fetcher, depends_on: []
  step :fetch_orders, order_fetcher, depends_on: [:fetch_user]
  step :fetch_products, product_fetcher, depends_on: [:fetch_user]
end

result = pipeline.call_parallel(data)  # Now uses async automatically

Mixing Concurrency Models in One Application

You can use different concurrency models for different pipelines in the same application:

# Gemfile - include async for high-volume pipelines
gem 'simple_flow'
gem 'async', '~> 2.0'

# Low-volume pipeline: Use threads for simplicity
user_pipeline = SimpleFlow::Pipeline.new(concurrency: :threads) do
  step :validate, validator, depends_on: []
  step :fetch_profile, profile_fetcher, depends_on: [:validate]
  step :fetch_preferences, prefs_fetcher, depends_on: [:validate]
end

# High-volume pipeline: Use async for efficiency
monitoring_pipeline = SimpleFlow::Pipeline.new(concurrency: :async) do
  step :load_endpoints, endpoint_loader, depends_on: []
  step :check_all, health_checker, depends_on: [:load_endpoints]
  step :alert, alerter, depends_on: [:check_all]
end

# Each pipeline uses its configured concurrency model
user_result = user_pipeline.call_parallel(user_data)        # Uses threads
monitoring_result = monitoring_pipeline.call_parallel(config) # Uses async

This allows you to optimize each pipeline based on its specific requirements!


Performance Characteristics

I/O-Bound Operations

Both threads and async excel at I/O-bound operations (network, disk, database):

# API calls, database queries, file operations
# Both models provide significant speedup over sequential execution

# Sequential: 10 API calls × 200ms = 2000ms
# Threads:    10 API calls in parallel = ~200ms
# Async:      10 API calls in parallel = ~200ms

# Winner: Tie (both are fast for moderate I/O)

High Concurrency (1000+ operations)

Async shines when dealing with thousands of concurrent operations:

# 5,000 concurrent HTTP requests

# Threads:  5,000 threads × 1.5 MB = 7.5 GB RAM ❌
# Async:    5,000 fibers × 6 KB = 30 MB RAM ✅

# Winner: Async (dramatically lower resource usage)

CPU-Bound Operations

Neither model helps with pure CPU work due to Ruby's GIL:

# Heavy computation (image processing, data crunching)
# GIL ensures only one thread/fiber does CPU work at a time

# Sequential: 1000ms
# Threads:    1000ms (GIL limitation)
# Async:      1000ms (GIL limitation)

# Winner: None (use process-based parallelism for CPU work)
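
If CPU-bound steps are the bottleneck, the usual escape hatch is to run them in separate processes, since each process has its own interpreter and its own GIL. A minimal plain-Ruby sketch (not a SimpleFlow feature; fork is unavailable on Windows and JRuby):

require 'benchmark'

def crunch
  (1..5_000_000).reduce(:+)   # stand-in for heavy CPU work
end

elapsed = Benchmark.realtime do
  pids = 4.times.map { fork { crunch } }   # each child process has its own GIL
  pids.each { |pid| Process.wait(pid) }
end

puts format('4 crunches took %.2fs', elapsed)  # scales with CPU cores, unlike threads or fibers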

Common Questions

Q: Can I use both in the same application?

A: Yes! SimpleFlow automatically detects if async is available and uses it. Different pipelines in the same app can use different models.

Q: Do I need to change my code to switch models?

A: No! Just add or remove the async gem from your Gemfile. Your pipeline code stays the same.

Q: What if I'm not sure which to use?

A: Start without async (use threads). It's simpler and works great for most use cases. Add async later if you need it.

Q: Can I check which model is being used?

A: Yes! Use the async_available? method:

pipeline = SimpleFlow::Pipeline.new
puts "Using async: #{pipeline.async_available?}"

Q: Are there any compatibility issues with async?

A: Async requires async-aware libraries for best results:

  • Use async-http instead of net/http or httparty
  • Use async-postgres instead of pg
  • Check if your favorite gems have async versions
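
For example, a single request with async-http looks roughly like this (a sketch; the URL is a placeholder):

require 'async'
require 'async/http/internet'

Async do
  internet = Async::HTTP::Internet.new
  response = internet.get('https://example.com/health')
  puts response.status
ensure
  internet&.close
end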

With threads, any Ruby gem works out of the box.


Recommendations

For Most Users

Start with threads (no async gem):

  • Simpler setup
  • Works with any library
  • Sufficient for most applications
  • Easy to understand and debug

Upgrade to Async When

You experience any of these:

⚠️ High memory usage from threads
⚠️ Need more than 100 concurrent operations
⚠️ Building high-throughput services
⚠️ Already using async-compatible libraries
⚠️ Hosting costs driven by memory usage

Migration Path

  1. Start: Build with threads (no dependencies)
  2. Measure: Profile your application under realistic load
  3. Decide: If you hit thread limits, add async gem
  4. Switch: Just add gem to Gemfile, no code changes
  5. Optimize: Gradually adopt async-aware libraries for better performance

Summary

Your Scenario                     | Recommendation
Building internal tools, scripts  | Threads (no async)
Standard web app with DB queries  | Threads (no async)
Processing 10-100 parallel tasks  | Threads (no async)
High-volume API gateway           | Async (add gem)
Web scraper (1000+ requests)      | Async (add gem)
Real-time chat/notifications      | Async (add gem)
Background job processor          | Async (add gem)

Remember: You can always start simple (threads) and upgrade to async later without changing your pipeline code!