
Caching Configuration

Intelligent model response caching to improve performance and reduce costs

CoAI.Dev's intelligent caching system improves performance and reduces costs by storing and reusing AI model responses for identical requests. By serving repeated queries from the cache, the platform makes far fewer API calls, responds faster, and saves substantially on usage costs.

Overview

The caching system provides:

  • 🎯 Smart Request Matching: Hash-based identification of identical requests
  • 💰 Cost Reduction: Cached responses don't count toward usage quotas or billing
  • ⚡ Performance Boost: Instant responses for cached queries
  • 🔧 Flexible Configuration: Granular control over caching policies per model
  • 📊 Cache Analytics: Detailed insights into cache performance and hit rates
  • 🔄 Automatic Management: Intelligent cache expiration and cleanup

Performance and Cost Benefits

Effective caching can reduce API costs by 30-70% and improve response times by up to 95% for repeated queries, making it essential for high-traffic deployments.

How Caching Works

Request Processing Pipeline

When a Cached Response Exists

  1. Request Received: User submits a query to the AI model
  2. Hash Calculation: System calculates unique hash of request parameters
  3. Cache Lookup: Check if matching hash exists in cache store
  4. Cache Hit: Matching cached response found
  5. Response Delivery: Return cached result instantly
  6. No Billing: Request doesn't count toward usage limits or costs

Cache Hit Process:

User Request → Hash Generation → Cache Lookup → Cache Hit → Instant Response

No API Call, No Billing, Sub-second Response Time

Benefits of Cache Hits:

  • Zero API call costs
  • Sub-second response times
  • No usage quota consumption
  • Reduced load on AI provider APIs
  • Improved user experience

Configuration Setup

Basic Cache Configuration

Access Cache Settings

Navigate to Admin Panel → System Settings → Performance → Model Caching

Enable Global Caching

Toggle the master caching switch and configure global settings:

{
  "global_cache_settings": {
    "enabled": true,
    "cache_backend": "redis",
    "default_ttl": "24_hours",
    "max_memory_usage": "2GB",
    "compression_enabled": true
  }
}

Configure Per-Model Settings

Set specific caching policies for each AI model:

| Model | Cache Enabled | TTL | Max Results | Notes |
| --- | --- | --- | --- | --- |
| GPT-4 | Yes | 12 hours | 10,000 | High-cost model, aggressive caching |
| GPT-3.5 | Yes | 6 hours | 50,000 | Balanced caching policy |
| Claude | Yes | 8 hours | 20,000 | Moderate caching |
| DALL-E | No | - | - | Creative content, no caching |

Test Cache Performance

Verify caching is working correctly:

  1. Submit the same query multiple times
  2. Check response times (should decrease dramatically)
  3. Verify cache hit rates in analytics
  4. Monitor cost reduction in billing reports

Advanced Configuration

Redis Cache Backend:

{
  "redis_config": {
    "host": "localhost",
    "port": 6379,
    "password": "your-redis-password",
    "database": 1,
    "cluster_mode": false,
    "ssl_enabled": true,
    "connection_pool_size": 20
  }
}

Database Cache Backend:

{
  "database_config": {
    "table_name": "model_cache",
    "connection_pool": "cache_pool",
    "index_optimization": true,
    "automatic_cleanup": true,
    "partitioning": "monthly"
  }
}

Model-Specific Configuration

Cache Policies by Model Type

General Text Models (GPT-4, Claude, etc.)

Recommended Settings:

{
  "text_models": {
    "cache_enabled": true,
    "ttl": "12_hours",
    "max_entries_per_model": 15000,
    "cache_threshold": "exact_match",
    "exclude_parameters": ["user_id", "session_id"]
  }
}
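The `exclude_parameters` setting above matters because per-user fields would otherwise make every request hash unique and defeat caching. A sketch of how exclusion could be applied before hashing (the helper name and parameter set are illustrative, not the platform's API):

```python
import hashlib
import json

EXCLUDE_PARAMETERS = {"user_id", "session_id"}  # per the policy above

def cache_key(params: dict, exclude: set = EXCLUDE_PARAMETERS) -> str:
    """Hash only the parameters that affect the model's output."""
    relevant = {k: v for k, v in params.items() if k not in exclude}
    canonical = json.dumps(relevant, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

# Two requests that differ only in excluded fields share one cache entry.
a = cache_key({"prompt": "Hi", "user_id": "u1", "session_id": "s1"})
b = cache_key({"prompt": "Hi", "user_id": "u2", "session_id": "s9"})
```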

Use Cases:

  • FAQ responses and common questions
  • Educational content and explanations
  • Technical documentation queries
  • General knowledge questions

Benefits:

  • High cache hit rates for repeated questions
  • Significant cost savings for customer support
  • Faster response times for common queries
  • Reduced load on expensive models

Cache Analytics and Monitoring

Performance Metrics

Key Performance Indicators:

{
  "cache_metrics": {
    "hit_rate": "65.5%",
    "miss_rate": "34.5%",
    "avg_hit_response_time": "45ms",
    "avg_miss_response_time": "1250ms",
    "cost_savings": "$127.50",
    "api_calls_saved": 2847,
    "cache_size": "1.2GB",
    "cache_utilization": "78%"
  }
}
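The headline KPIs above are simple derivations from raw hit/miss counters. A sketch of the arithmetic, assuming a flat per-call cost for illustration:

```python
def cache_metrics(hits: int, misses: int, avg_cost_per_call: float) -> dict:
    """Derive headline cache KPIs from raw hit/miss counters."""
    total = hits + misses
    return {
        "hit_rate": f"{100 * hits / total:.1f}%",
        "miss_rate": f"{100 * misses / total:.1f}%",
        "api_calls_saved": hits,  # each hit avoids one upstream call
        "cost_savings": f"${hits * avg_cost_per_call:.2f}",
    }

metrics = cache_metrics(hits=2847, misses=1500, avg_cost_per_call=0.045)
```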

Analytics Dashboard Features:

  • Real-time cache hit/miss ratios
  • Cost savings calculations
  • Performance improvement metrics
  • Cache storage utilization
  • Model-specific cache performance

Cache Health Monitoring

Health Indicators:

  • Cache hit rate trends
  • Response time improvements
  • Storage efficiency metrics
  • Error rates and cache failures
  • Memory usage and optimization

Automated Alerts:

{
  "cache_alerts": {
    "low_hit_rate": {
      "threshold": "30%",
      "action": "review_cache_configuration"
    },
    "high_memory_usage": {
      "threshold": "90%",
      "action": "increase_cleanup_frequency"
    },
    "cache_errors": {
      "threshold": "5%",
      "action": "investigate_cache_backend"
    }
  }
}

Advanced Features

Cache Warm-up Strategies

Proactive Cache Population:

{
  "cache_warmup": {
    "enabled": true,
    "warmup_schedule": "daily_at_2am",
    "common_queries": [
      "most_frequent_queries_last_30_days",
      "seasonal_trending_topics",
      "business_specific_patterns"
    ],
    "parallel_warmup": true,
    "max_warmup_time": "30_minutes"
  }
}
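Warm-up is conceptually simple: select the queries most likely to recur, then replay them through the normal fetch path so their responses are cached before peak hours. A sketch under those assumptions (the selection and replay helpers are illustrative):

```python
from collections import Counter

def select_warmup_queries(request_log: list, top_n: int = 100) -> list:
    """Pick the most frequent recent queries as warm-up candidates."""
    return [q for q, _ in Counter(request_log).most_common(top_n)]

def warm_cache(queries: list, fetch, store) -> int:
    """Replay each query through the normal fetch path and cache the result."""
    for q in queries:
        store(q, fetch(q))
    return len(queries)

log = ["faq: pricing", "faq: pricing", "faq: refunds",
       "faq: pricing", "faq: refunds", "hello"]
candidates = select_warmup_queries(log, top_n=2)
```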

Benefits of Cache Warm-up:

  • Higher hit rates during peak hours
  • Improved user experience from day one
  • Reduced cold start penalties
  • Better resource utilization

Cache Invalidation

Smart Invalidation Strategies:

{
  "invalidation_rules": {
    "content_updates": "invalidate_related_cache",
    "model_updates": "selective_invalidation",
    "policy_changes": "full_cache_clear",
    "user_request": "manual_invalidation"
  }
}

Invalidation Triggers:

  • Content or knowledge base updates
  • Model version changes
  • Policy or configuration updates
  • Manual administrator actions
  • Time-based expiration

Multi-Tier Caching

Hierarchical Cache Strategy:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Memory Cache  │ ─→ │   Redis Cache   │ ─→ │ Database Cache  │
│   (L1 - Fast)   │    │  (L2 - Medium)  │    │  (L3 - Slow)    │
│   100ms TTL     │    │   1 hour TTL    │    │  24 hour TTL    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
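The hierarchy above can be sketched as an ordered lookup with promotion: check the fastest tier first, and on a hit in a slower tier, copy the value forward so hot keys stay in L1. Plain dicts stand in here for the memory/Redis/database backends:

```python
import time

class Tier:
    """One cache level with its own TTL (a dict stands in for the backend)."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.data = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self.data.get(key)
        if entry and entry[0] > time.time():
            return entry[1]
        return None

    def put(self, key, value):
        self.data[key] = (time.time() + self.ttl, value)

# L1 -> L2 -> L3: fastest and shortest-lived first, as in the diagram.
tiers = [Tier(0.1), Tier(3600), Tier(24 * 3600)]

def lookup(key):
    """Check each tier in order; on a hit, promote the value to faster tiers."""
    for i, tier in enumerate(tiers):
        value = tier.get(key)
        if value is not None:
            for faster in tiers[:i]:  # promotion keeps hot keys in L1
                faster.put(key, value)
            return value
    return None

def store(key, value):
    for tier in tiers:
        tier.put(key, value)
```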

Multi-Tier Benefits:

  • Optimal performance for different access patterns
  • Cost-effective storage utilization
  • Reduced load on backend systems
  • Improved scalability

Best Practices

Cache Strategy Design

Effective Caching Strategies:

  • Model Appropriateness: Cache deterministic models, not creative ones
  • Parameter Inclusion: Include all relevant parameters in hash calculation
  • TTL Optimization: Balance freshness with performance gains
  • Size Management: Implement appropriate cleanup and eviction policies
  • Monitoring: Continuously monitor and optimize cache performance

Cache Policy Template:

{
  "cache_policy_template": {
    "deterministic_models": {
      "cache_enabled": true,
      "ttl": "12_hours",
      "aggressive_caching": true
    },
    "creative_models": {
      "cache_enabled": false,
      "session_cache_only": true
    },
    "expensive_models": {
      "cache_enabled": true,
      "ttl": "24_hours",
      "max_cache_size": "large"
    }
  }
}

Performance Optimization

Cache Performance Tips:

  • Use Redis for high-performance scenarios
  • Implement cache preloading for common queries
  • Monitor hit rates and adjust TTL accordingly
  • Use compression for large responses
  • Implement tiered caching for optimal cost/performance

Security Considerations

Cache Security:

  • Encrypt sensitive cached content
  • Implement access controls for cache management
  • Regular security audits of cached data
  • User data isolation in multi-tenant setups
  • Secure cache key generation

Troubleshooting

Common Issues

Low Cache Hit Rates

Problem: Cache hit rates below 30%

Solutions:

  1. Review hash calculation parameters
  2. Check if requests are truly identical
  3. Examine TTL settings (might be too short)
  4. Verify cache storage capacity
  5. Analyze request patterns for optimization opportunities

Diagnostic Steps:

  • Compare request parameters for near-misses
  • Analyze cache expiration patterns
  • Review user behavior and query diversity
  • Check for parameter normalization needs

High Memory Usage

Problem: Cache consuming excessive memory

Solutions:

  1. Implement more aggressive cleanup policies
  2. Reduce TTL for less critical models
  3. Enable response compression
  4. Review cache size limits
  5. Consider cache partitioning strategies

Memory Optimization:

  • Monitor cache size trends
  • Implement LRU eviction policies
  • Use database cache for less frequent data
  • Regular cache health checks
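An LRU eviction policy, as recommended above, can be sketched with an `OrderedDict`: every access moves the entry to the end, and when the cache is full the entry at the front (the least recently used) is evicted. The size bound here is a simple entry count; a real deployment would bound memory.

```python
from collections import OrderedDict

class LRUCache:
    """Bounded cache that evicts the least recently used entry when full."""
    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.max_entries:
            self.data.popitem(last=False)  # evict the oldest entry

cache = LRUCache(max_entries=2)
cache.put("a", "1")
cache.put("b", "2")
cache.get("a")       # "a" is now the most recently used
cache.put("c", "3")  # evicts "b", the least recently used
```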

Performance Issues

Slow Cache Operations:

  • Optimize cache backend configuration
  • Consider cache sharding for high volume
  • Review network latency to cache backend
  • Implement connection pooling
  • Monitor cache backend performance

Cache Inconsistency:

  • Verify cache invalidation rules
  • Check for race conditions in cache updates
  • Review cache key generation logic
  • Implement proper locking mechanisms
  • Regular cache integrity checks

Resource Preheating

Performance Optimization

Resource preheating allows you to pre-cache popular resources on CDN edge nodes, improving access speeds and reducing load on the source server. This is particularly valuable for keeping the user experience smooth during peak business periods.

Key Features

  • Multi-Domain Support: Configure preheating for multiple custom domains
  • Static Resource Optimization: Ideal for CSS, JS, and other static assets
  • Post-Update Refresh: Refresh resources after each update to ensure stability
  • CDN Performance Enhancement: Improves overall CDN service performance

Use Cases

High-Traffic Events

Prepare resources before major events or promotions to handle traffic spikes.

Version Releases

Preheat resources after new version releases to ensure optimal performance.

Hot Content Caching

Pre-cache regularly updated popular content for faster access.

Configuration

To enable resource preheating:

cache:
  preheating:
    enabled: true
    domains:
      - "cdn.example.com"
      - "assets.example.com"
    resources:
      - "*.css"
      - "*.js"
      - "*.png"
      - "*.jpg"
    priority: "high"
    schedule: "0 2 * * *"  # Daily at 2 AM
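Before a preheating task can be submitted, the glob patterns in `resources` above have to be expanded into a concrete URL list per domain. A sketch of that expansion using Python's `fnmatch`; the asset paths are hypothetical, and only the `domains`/`resources` fields come from the config:

```python
from fnmatch import fnmatch

CONFIG = {
    "domains": ["cdn.example.com", "assets.example.com"],
    "resources": ["*.css", "*.js", "*.png", "*.jpg"],
}

def preheat_urls(asset_paths: list, config: dict) -> list:
    """Expand the resource glob patterns over known assets into full URLs."""
    matched = [p for p in asset_paths
               if any(fnmatch(p, pat) for pat in config["resources"])]
    return [f"https://{domain}/{path}"
            for domain in config["domains"] for path in matched]

assets = ["app.css", "main.js", "logo.png", "report.pdf"]
urls = preheat_urls(assets, CONFIG)
# report.pdf matches no pattern, so it is skipped.
```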

Important Considerations

Bandwidth Warning

The preheating process will pull large amounts of data from your source server. Monitor your source server's bandwidth load closely during preheating operations.

Best Practices:

  • Perform preheating during off-peak hours
  • Plan preheating resources carefully to avoid unnecessary resource waste
  • Monitor CDN and source server performance during preheating
  • Set up alerts for high bandwidth usage

Operation Steps

  1. Access CDN Console: Login to your CDN management interface
  2. Select Preheating Function: Navigate to the "Resource Preheating" section
  3. Configure URLs: Input the list of URLs that need preheating
  4. Set Priority: Choose the preheating priority level
  5. Submit Task: Submit the preheating task for execution

Monitoring and Analytics

Track preheating effectiveness through:

  • Cache Hit Rates: Monitor improvement in cache performance
  • Response Times: Measure reduction in resource load times
  • Bandwidth Usage: Track source server load reduction
  • User Experience Metrics: Monitor page load speed improvements

Intelligent caching significantly improves platform performance and reduces operational costs. Continue with Advanced Features for enterprise-grade capabilities, or explore System Settings for additional configuration options.