
Troubleshooting Guide

Common issues and solutions for CoAI.Dev deployment and operation


Quick Diagnostics

Health Check Commands

# Check container status
docker ps
docker logs chatnio
 
# Check resource usage
docker stats chatnio
 
# Inspect container
docker inspect chatnio
 
# Execute commands in container
docker exec -it chatnio /bin/sh
 
# Health check
docker exec chatnio curl -f http://localhost:8000/health || echo "Health check failed"

Common Issues

1. Application Won't Start

Symptom

Service fails to start or exits immediately with errors.

Common Causes & Solutions:

Missing Environment Variables

# Check required variables (secrets print "set" or nothing, never their value)
echo "SECRET: ${SECRET:+set}"
echo "MYSQL_HOST: $MYSQL_HOST"
echo "MYSQL_PASSWORD: ${MYSQL_PASSWORD:+set}"
 
# Solution: set missing variables (in Docker, pass them with -e or --env-file)
export SECRET="your-secret-key"
export MYSQL_PASSWORD="your-password"
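
To check several variables at once without echoing secret values, a small bash helper like the one below can be sourced into your shell. It is a sketch: `check_required_vars` is a hypothetical name, not part of CoAI.Dev, and the list of variables you pass it should come from your own deployment configuration. Names are validated before the `eval` that reads them.

```shell
# Report which of the named variables are unset or empty, printing only
# their length, never their value. Returns 1 if any are missing.
# (Hypothetical helper, not part of CoAI.Dev.)
check_required_vars() {
    local name value missing=0
    for name in "$@"; do
        # Reject anything that is not a valid shell variable name before eval
        case $name in
            ''|[0-9]*|*[!A-Za-z0-9_]*)
                echo "invalid variable name: '$name'" >&2
                missing=1
                continue
                ;;
        esac
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "MISSING: $name"
            missing=1
        else
            echo "ok: $name (${#value} chars)"
        fi
    done
    return "$missing"
}
```

For example: `check_required_vars SECRET MYSQL_HOST MYSQL_USER MYSQL_PASSWORD || echo "set the variables listed above"`.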

Database Connection Failed

# Test database connection
mysql -h $MYSQL_HOST -u $MYSQL_USER -p$MYSQL_PASSWORD -e "SELECT 1"
 
# Common issues:
# 1. Wrong credentials
# 2. Database not running
# 3. Network connectivity
# 4. Firewall blocking port 3306
 
# Solution: Check MySQL status
sudo systemctl status mysql
sudo systemctl start mysql
 
# Check if port is open
telnet $MYSQL_HOST 3306
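
`telnet` is often missing from minimal servers and container images. Bash can open a TCP connection itself through its `/dev/tcp/HOST/PORT` pseudo-device, which is enough for a reachability test. A minimal sketch, assuming bash and the coreutils `timeout` command are installed; `port_open` is a hypothetical helper name.

```shell
# Return 0 if a TCP connection to HOST:PORT succeeds within SECS seconds (default 3).
# The child bash opens /dev/tcp/HOST/PORT; the redirection fails if nothing is listening.
# (Hypothetical helper; requires bash and coreutils `timeout`.)
port_open() {
    local host=$1 port=$2 secs=${3:-3}
    timeout "$secs" bash -c ': > "/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

For example: `port_open "$MYSQL_HOST" 3306 && echo "3306 reachable" || echo "3306 unreachable"`. A successful connection only shows that something is listening; it does not test credentials.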

Port Already in Use

# Check what's using the port
sudo netstat -tulpn | grep :8000
sudo lsof -i :8000
 
# Solution: Stop conflicting service or change port
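sudo kill PID_OF_PROCESS       # SIGTERM first, so the process can shut down cleanly
# sudo kill -9 PID_OF_PROCESS  # only if it does not exit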
# OR
export SERVER_PORT=8001

File Permission Issues

# Check permissions
ls -la /opt/chatnio/
ls -la /storage/
ls -la /logs/
 
# Solution: give the service user ownership, make directories traversable
sudo chown -R chatnio:chatnio /opt/chatnio/ /storage/ /logs/
sudo chmod -R u+rwX,go+rX /storage/ /logs/

2. Database Connection Issues

Symptom

"Failed to connect to database" or connection timeout errors.

Diagnostic Steps:

# Test basic connectivity
ping $MYSQL_HOST
telnet $MYSQL_HOST 3306
 
# Test credentials
mysql -h $MYSQL_HOST -u $MYSQL_USER -p$MYSQL_PASSWORD -e "SHOW DATABASES;"
 
# Check MySQL error logs
sudo tail -f /var/log/mysql/error.log
 
# Check connection limits
mysql -h $MYSQL_HOST -u $MYSQL_USER -p$MYSQL_PASSWORD \
      -e "SHOW VARIABLES LIKE 'max_connections';"
mysql -h $MYSQL_HOST -u $MYSQL_USER -p$MYSQL_PASSWORD \
      -e "SHOW STATUS LIKE 'Threads_connected';"
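
Comparing the two values above shows how close the server is to refusing new connections. The helper below is a sketch that does the arithmetic and warns at a threshold; `connection_usage` and the 80% default are illustrative choices, not CoAI.Dev settings.

```shell
# Print connection usage and return 1 when it reaches THRESHOLD percent (default 80).
# Usage: connection_usage THREADS_CONNECTED MAX_CONNECTIONS [THRESHOLD]
# (Hypothetical helper.)
connection_usage() {
    local connected=$1 max=$2 threshold=${3:-80} pct
    if [ "$max" -le 0 ]; then
        echo "max_connections must be greater than 0" >&2
        return 2
    fi
    pct=$(( connected * 100 / max ))   # integer percentage, rounded down
    echo "connections: ${connected}/${max} (${pct}%)"
    if [ "$pct" -ge "$threshold" ]; then
        echo "WARNING: connection usage at or above ${threshold}%" >&2
        return 1
    fi
}

# Example with live values (-N hides column names, -s prints tab-separated rows):
# connected=$(mysql -h "$MYSQL_HOST" -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" -N -s \
#     -e "SHOW STATUS LIKE 'Threads_connected'" | awk '{print $2}')
# max=$(mysql -h "$MYSQL_HOST" -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" -N -s \
#     -e "SELECT @@max_connections")
# connection_usage "$connected" "$max"
```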

Common Solutions:

Increase Connection Limits

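-- Temporary fix (lost on restart; needs SYSTEM_VARIABLES_ADMIN or SUPER)
SET GLOBAL max_connections = 500;
 
-- MySQL 8.0+: persist the change across restarts
SET PERSIST max_connections = 500;
 
-- Or add to /etc/mysql/mysql.conf.d/mysqld.cnf and restart MySQL:
--   [mysqld]
--   max_connections = 500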

Fix Authentication Issues

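-- Check which hosts the user may connect from, and what it is granted
SELECT user, host FROM mysql.user WHERE user = 'chatnio';
SHOW GRANTS FOR 'chatnio'@'%';
 
-- Grant privileges on the application database
GRANT ALL PRIVILEGES ON chatnio.* TO 'chatnio'@'%';
FLUSH PRIVILEGES;
 
-- MySQL 8.0+: only if the client cannot use the default caching_sha2_password plugin.
-- mysql_native_password is deprecated and is disabled by default from MySQL 8.4.
ALTER USER 'chatnio'@'%' IDENTIFIED WITH mysql_native_password BY 'password';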

Network Configuration

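# Check MySQL bind address (the config path varies by distribution)
grep bind-address /etc/mysql/mysql.conf.d/mysqld.cnf
 
# To accept remote connections, set this in that file, and firewall
# port 3306 so only the application hosts can reach it:
# bind-address = 0.0.0.0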
 
# Restart MySQL after changes
sudo systemctl restart mysql

3. AI Provider Issues

Symptom

API requests fail, timeouts, or "no available channels" errors.

Diagnostic Steps:

# Check channel health
curl -H "Authorization: Bearer YOUR_API_KEY" \
     http://localhost:8000/v1/admin/channels
 
# Test individual provider
curl -X POST "https://api.openai.com/v1/chat/completions" \
     -H "Authorization: Bearer $OPENAI_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{
       "model": "gpt-3.5-turbo",
       "messages": [{"role": "user", "content": "Hello"}],
       "max_tokens": 10
     }'

Common Solutions:

Invalid API Keys

# Check API key format
echo "OpenAI Key: ${OPENAI_API_KEY:0:10}..."
echo "Anthropic Key: ${ANTHROPIC_API_KEY:0:10}..."
 
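# Test key validity
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
     https://api.openai.com/v1/models
curl -H "x-api-key: $ANTHROPIC_API_KEY" \
     -H "anthropic-version: 2023-06-01" \
     https://api.anthropic.com/v1/models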
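
Many key failures are copy/paste problems: the wrong variable, a truncated value, or a trailing carriage return picked up from a `.env` file edited on Windows. The sketch below checks for these without printing the key. It assumes the prefixes the providers currently issue (`sk-` for OpenAI, `sk-ant-` for Anthropic); a matching prefix does not prove the key is valid, only the live request above does. `check_key_prefix` is a hypothetical helper.

```shell
# Check a key for stray whitespace and the expected prefix, printing only
# the prefix and length. Usage: check_key_prefix LABEL KEY PREFIX
# (Hypothetical helper; the prefixes are assumptions about current provider formats.)
check_key_prefix() {
    local label=$1 key=$2 prefix=$3
    case $key in
        *[[:space:]]*)
            echo "$label: contains whitespace (stray newline or \\r from a .env file?)" >&2
            return 1
            ;;
        "$prefix"*)
            echo "$label: prefix ok (${prefix}..., ${#key} chars)"
            ;;
        *)
            echo "$label: does not start with '$prefix'" >&2
            return 1
            ;;
    esac
}
```

For example: `check_key_prefix OpenAI "$OPENAI_API_KEY" sk-` and `check_key_prefix Anthropic "$ANTHROPIC_API_KEY" sk-ant-`.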

Rate Limiting

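# Check rate limit headers in the response (prints only the headers of a GET;
# curl -I would send a HEAD request instead)
curl -s -D - -o /dev/null -H "Authorization: Bearer $OPENAI_API_KEY" \
     https://api.openai.com/v1/models
 
# Look for:
# X-RateLimit-Limit-Requests
# X-RateLimit-Remaining-Requests
# X-RateLimit-Reset-Requests
# Retry-After (if rate limited)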
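
When requests receive 429 responses, retrying immediately tends to prolong the rate limiting. A common remedy is exponential backoff: wait 1, 2, 4, 8... seconds between attempts, up to a cap. The sketch below uses a fixed schedule; `backoff_delay` and `request_with_backoff` are hypothetical helpers, and if the provider returns a `Retry-After` header, honoring that value is preferable.

```shell
# Exponential backoff: BASE * 2^ATTEMPT seconds, capped at CAP (defaults 1 and 60).
backoff_delay() {
    local attempt=$1 base=${2:-1} cap=${3:-60} delay
    delay=$base
    # Stop doubling once the cap is reached, which also avoids integer overflow
    while [ "$attempt" -gt 0 ] && [ "$delay" -lt "$cap" ]; do
        delay=$(( delay * 2 ))
        attempt=$(( attempt - 1 ))
    done
    if [ "$delay" -gt "$cap" ]; then
        delay=$cap
    fi
    echo "$delay"
}

# Run curl, retrying while the response is HTTP 429. Prints the last status
# code and returns 1 if every attempt was rate limited. Extra args go to curl.
# (Hypothetical helper.)
request_with_backoff() {
    local max_attempts=$1 attempt=0 status=000
    shift
    while [ "$attempt" -lt "$max_attempts" ]; do
        # curl prints 000 if the connection itself fails
        status=$(curl -s -o /dev/null -w '%{http_code}' "$@") || true
        if [ "$status" != "429" ]; then
            echo "$status"
            return 0
        fi
        attempt=$(( attempt + 1 ))
        # Wait before the next try, but not after the final attempt
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$(backoff_delay "$(( attempt - 1 ))")"
        fi
    done
    echo "$status"
    return 1
}
```

For example: `request_with_backoff 5 -H "Authorization: Bearer $OPENAI_API_KEY" https://api.openai.com/v1/models` waits 1, 2, 4 and 8 seconds between rate-limited attempts.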

Channel Configuration

# Check channel configuration
curl -H "Authorization: Bearer YOUR_API_KEY" \
     http://localhost:8000/v1/admin/channels/CHANNEL_ID
 
# Verify:
# - API key is correct
# - Base URL is correct
# - Model names match provider's models
# - Channel is enabled and healthy

4. Performance Issues

Symptom

Slow response times, high memory usage, or timeouts.

Diagnostic Steps:

# Check system resources
top -p "$(pgrep -d, chatnio)"   # -d, joins multiple PIDs with commas for top -p
htop
free -h
df -h
 
# Check application metrics
curl http://localhost:9090/metrics
 
# Check database performance
mysql -h $MYSQL_HOST -u $MYSQL_USER -p$MYSQL_PASSWORD \
      -e "SHOW PROCESSLIST;"
mysql -h $MYSQL_HOST -u $MYSQL_USER -p$MYSQL_PASSWORD \
      -e "SHOW STATUS LIKE 'Slow_queries';"
 
# Check Redis performance
redis-cli -h $REDIS_HOST -p $REDIS_PORT info stats
redis-cli -h $REDIS_HOST -p $REDIS_PORT info memory

Performance Optimization:

Database Optimization

-- Check slow queries
SHOW VARIABLES LIKE 'slow_query_log';
SHOW VARIABLES LIKE 'long_query_time';
 
-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 2;
 
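-- Check query cache (MySQL 5.7 and earlier; the query cache was removed in MySQL 8.0)
SHOW VARIABLES LIKE 'query_cache%';
 
-- Optimize common tables (rebuilds each table; run during a maintenance window)
OPTIMIZE TABLE users, conversations, messages;
 
-- Add indexes for common queries (check existing ones first with SHOW INDEX FROM <table>;)
CREATE INDEX idx_conversations_user_id ON conversations(user_id);
CREATE INDEX idx_messages_conversation_id ON messages(conversation_id);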

Memory Optimization

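# Check memory usage ('[c]hatnio' stops grep from matching its own process)
ps aux | grep '[c]hatnio'
grep VmRSS /proc/$(pgrep -o chatnio)/status
 
# Adjust memory limits (set these in the service's own environment, e.g. docker run -e)
export GOMEMLIMIT=2GiB  # Soft memory limit for the Go runtime (Go 1.19+)
# ulimit -m is not enforced by modern Linux kernels; for a hard cap use
# docker run --memory=2g or MemoryMax=2G in the systemd unit instead
 
# Configure garbage collection
export GOGC=100           # GC target percentage (100 is the default; lower uses less memory, more CPU)
export GODEBUG=gctrace=1  # Print one trace line per GC cycle to stderr (debugging only)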
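
GOMEMLIMIT takes a byte count with an optional `B`, `KiB`, `MiB`, `GiB` or `TiB` suffix. To see how a running process compares with that limit, the sketch below converts the value to bytes and reads the resident set size from `/proc/PID/status` (Linux only). Treat the comparison as approximate: GOMEMLIMIT is a soft limit on memory managed by the Go runtime, while RSS also counts memory outside it. `to_bytes` and `rss_vs_limit` are hypothetical helpers.

```shell
# Convert a GOMEMLIMIT-style size (no suffix or B, KiB, MiB, GiB, TiB) to bytes.
to_bytes() {
    local value=$1 num unit
    num=${value%%[!0-9]*}      # leading digits
    unit=${value#"$num"}       # whatever follows them
    if [ -z "$num" ]; then
        echo "unsupported size: $value" >&2
        return 1
    fi
    case $unit in
        ""|B) echo "$num" ;;
        KiB)  echo $(( num * 1024 )) ;;
        MiB)  echo $(( num * 1024 * 1024 )) ;;
        GiB)  echo $(( num * 1024 * 1024 * 1024 )) ;;
        TiB)  echo $(( num * 1024 * 1024 * 1024 * 1024 )) ;;
        *)    echo "unsupported size: $value" >&2; return 1 ;;
    esac
}

# Compare a process's RSS (VmRSS, reported in kB) with a size limit.
# Returns 0 if RSS is within the limit, 1 if above, 2 on error.
rss_vs_limit() {
    local pid=$1 limit rss_kb
    limit=$(to_bytes "$2") || return 2
    rss_kb=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status") || return 2
    [ -n "$rss_kb" ] || return 2
    echo "pid $pid: RSS $(( rss_kb / 1024 )) MiB, limit $(( limit / 1024 / 1024 )) MiB"
    [ $(( rss_kb * 1024 )) -le "$limit" ]
}
```

For example: `rss_vs_limit "$(pgrep -o chatnio)" "$GOMEMLIMIT"`.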

Connection Pool Tuning

# Database connections
export DB_MAX_OPEN_CONNS=25
export DB_MAX_IDLE_CONNS=5
export DB_CONN_MAX_LIFETIME=300
 
# HTTP client connections
export HTTP_MAX_IDLE_CONNS=100
export HTTP_MAX_CONNS_PER_HOST=10
export HTTP_IDLE_CONN_TIMEOUT=90

5. File Upload Issues

Symptom

File uploads fail, timeout, or return file processing errors.

Diagnostic Steps:

# Check file upload endpoint
curl -X POST http://localhost:8000/v1/files \
     -H "Authorization: Bearer YOUR_API_KEY" \
     -F "file=@test.pdf" \
     -F "purpose=chat"
 
# Check file permissions
ls -la /storage/
ls -la /storage/uploads/
 
# Check disk space
df -h /storage/
 
# Check file processing logs
grep "file processing" /logs/app.log

Common Solutions:

File Size Limits

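# Check the limit the application actually sees
docker exec chatnio printenv MAX_FILE_SIZE
 
# Increase limit
export MAX_FILE_SIZE=104857600  # 100 MiB (100 * 1024 * 1024 bytes)
 
# A "413 Request Entity Too Large" response usually comes from the reverse proxy.
# For nginx, raise client_max_body_size (default 1m), then reload nginx:
# client_max_body_size 100M;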
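
The proxy limit should be at least as large as `MAX_FILE_SIZE`, or the proxy will reject uploads the application would accept. nginx's `m` suffix means 1024 * 1024 bytes, so the helper below converts a byte count to an nginx size, rounding up. `nginx_body_size` is hypothetical. Multipart form encoding adds some overhead to each upload, so leaving a little margin above the application limit is sensible.

```shell
# Convert a byte count (e.g. MAX_FILE_SIZE) to an nginx size in mebibytes,
# rounding up so the proxy limit is never below the application limit.
# (Hypothetical helper.)
nginx_body_size() {
    local bytes=$1 mib=1048576
    echo "$(( (bytes + mib - 1) / mib ))M"
}
```

For example: `echo "client_max_body_size $(nginx_body_size "$MAX_FILE_SIZE");"` prints `client_max_body_size 100M;` for the value above.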

Storage Permissions

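# Fix storage permissions
sudo chown -R chatnio:chatnio /storage/
sudo chmod -R u+rwX,go+rX /storage/
 
# For Docker: give /storage to the user the container process runs as
docker exec -u root chatnio chown -R "$(docker exec chatnio id -u):$(docker exec chatnio id -g)" /storage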

Processing Timeouts

# Increase processing timeout
export FILE_PROCESSING_TIMEOUT=600  # 10 minutes
 
# Check OCR dependencies
docker exec chatnio which tesseract
docker exec chatnio tesseract --version

6. Authentication Issues

Symptom

Login failures, JWT errors, or "unauthorized" responses.

Diagnostic Steps:

# Check JWT secret
echo "JWT Secret length: ${#SECRET}"
 
# Test token generation
curl -X POST http://localhost:8000/v1/auth/login \
     -H "Content-Type: application/json" \
     -d '{"email": "test@example.com", "password": "password"}'
 
# Verify token
jwt_token="YOUR_JWT_TOKEN"
curl -H "Authorization: Bearer $jwt_token" \
     http://localhost:8000/v1/users/me

Common Solutions:

JWT Secret Issues

# Generate a strong secret (changing it invalidates all previously issued tokens)
export SECRET=$(openssl rand -base64 32)
 
# Verify the secret is set (for use in a startup script)
if [ -z "$SECRET" ]; then
    echo "JWT secret not set!"
    exit 1
fi

Token Expiration

# Set access token lifetime
export JWT_EXPIRE=168h  # 7 days
 
# Set refresh token lifetime
export JWT_REFRESH_EXPIRE=720h  # 30 days
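
To confirm whether a specific token has expired, read its `exp` claim (seconds since the Unix epoch) from the payload, which is the second dot-separated segment of the JWT and is base64url-encoded JSON. The sketch below decodes it without verifying the signature, so use it only for debugging, never for an authorization decision. The helper names are hypothetical.

```shell
# Decode the payload (second segment) of a JWT. The signature is NOT verified.
jwt_payload() {
    local payload
    payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
    # base64url drops the '=' padding; restore it before decoding
    case $(( ${#payload} % 4 )) in
        2) payload="${payload}==" ;;
        3) payload="${payload}=" ;;
    esac
    printf '%s' "$payload" | base64 -d
}

# Print the numeric "exp" claim, or fail if the token has none.
jwt_exp() {
    local exp
    exp=$(jwt_payload "$1" | grep -oE '"exp" *: *[0-9]+' | grep -oE '[0-9]+$')
    if [ -z "$exp" ]; then
        echo "no exp claim" >&2
        return 1
    fi
    echo "$exp"
}

# Seconds until the token expires (negative once it has expired).
jwt_seconds_left() {
    local exp
    exp=$(jwt_exp "$1") || return 1
    echo $(( exp - $(date +%s) ))
}
```

For example: `jwt_seconds_left "$jwt_token"`. A negative result means the client must log in again or use its refresh token.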

Session Configuration

# Configure session settings
export SESSION_SECURE=true     # HTTPS only
export SESSION_HTTP_ONLY=true  # No JS access
export SESSION_SAME_SITE=strict

7. Docker-Specific Issues

Symptom

Container crashes, networking issues, or volume mount problems.

Docker Diagnostics:

# Check container logs
docker logs chatnio --tail=100 -f
 
# Check container resource usage
docker stats chatnio
 
# Inspect container configuration
docker inspect chatnio
 
# Check container filesystem
docker exec chatnio df -h
docker exec chatnio ls -la /storage /logs

Common Docker Issues:

Volume Mount Problems

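# Check volume mounts
docker inspect --format '{{json .Mounts}}' chatnio
 
# Fix volume permissions on host (1000:1000 assumes the container runs as
# UID/GID 1000; confirm with: docker exec chatnio id)
sudo chown -R 1000:1000 ./storage
sudo chown -R 1000:1000 ./logs
 
# Use bind mounts instead of named volumes: add these flags to docker run
# -v "$(pwd)/storage:/storage"
# -v "$(pwd)/logs:/logs"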

Network Connectivity

# Check Docker network
docker network ls
docker network inspect bridge
 
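# Test connectivity between containers (service names resolve only on a
# user-defined network, such as the one Docker Compose creates, not the default bridge)
docker exec chatnio ping -c 3 mysql
docker exec chatnio ping -c 3 redis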
 
# Check port mapping
docker port chatnio

Resource Limits

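# Check memory limit (in bytes; 0 means unlimited)
docker inspect --format '{{.HostConfig.Memory}}' chatnio
 
# Set resource limits when creating the container
docker run --memory=2g --cpus=2.0 chatnio
# Or change them on a running container
docker update --memory=2g --memory-swap=2g --cpus=2.0 chatnio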
 
# Monitor resource usage
docker stats --no-stream chatnio

8. SSL/TLS Issues

Symptom

HTTPS connection failures, certificate errors, or SSL handshake failures.

SSL Diagnostics:

# Test SSL connection
openssl s_client -connect yourdomain.com:443
 
# Check certificate details
echo | openssl s_client -connect yourdomain.com:443 2>/dev/null | openssl x509 -noout -text
 
# Test with curl
curl -vI https://yourdomain.com/health

SSL Solutions:

Certificate Issues

# Check certificate expiration
echo | openssl s_client -connect yourdomain.com:443 2>/dev/null | openssl x509 -noout -dates
 
# Renew Let's Encrypt certificate
sudo certbot renew --nginx
 
# Test certificate chain
curl -I https://yourdomain.com
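
To catch expiring certificates before clients do, `openssl x509 -checkend SECONDS` exits with status 1 when the certificate expires within that many seconds. The sketch below wraps it in days; `cert_valid_for_days` and `check_remote_cert` are hypothetical helpers, and the 30-day default is an arbitrary choice.

```shell
# Exit 0 if the PEM certificate read from stdin is still valid DAYS days from now.
cert_valid_for_days() {
    local days=$1
    openssl x509 -noout -checkend $(( days * 86400 ))
}

# Fetch the certificate a server presents on port 443 and check it.
check_remote_cert() {
    local host=$1 days=${2:-30}
    echo | openssl s_client -connect "${host}:443" -servername "$host" 2>/dev/null \
        | cert_valid_for_days "$days"
}
```

For example, from a daily cron job: `check_remote_cert yourdomain.com 30 || echo "certificate expires within 30 days"`.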

Nginx Configuration

# /etc/nginx/sites-available/chatnio
server {
    listen 443 ssl http2;  # nginx 1.25.1+ deprecates this form: use "listen 443 ssl;" plus "http2 on;"
    server_name yourdomain.com;
    
    ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;
    
    location / {
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Monitoring and Alerting

Log Analysis

# Search for specific errors
grep -i "error" /logs/app.log | tail -20
grep -i "failed" /logs/app.log | tail -20
grep -i "timeout" /logs/app.log | tail -20
 
# Monitor logs in real-time
tail -f /logs/app.log | grep -i error
 
# Analyze access patterns (requests per client IP; assumes the IP is the first field, as in the combined log format)
awk '{print $1}' /logs/access.log | sort | uniq -c | sort -nr
 
# Check for rate limiting
grep "rate limit" /logs/app.log | tail -20
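
The separate `grep | tail` searches above can be condensed into one count per pattern, which makes a quick before/after comparison easier when testing a fix. A sketch; `log_summary` is a hypothetical helper and the patterns are only examples.

```shell
# Count lines matching each pattern (case-insensitive) in a log file.
# Usage: log_summary FILE PATTERN...
# (Hypothetical helper.)
log_summary() {
    local file=$1 pattern
    shift
    for pattern in "$@"; do
        # grep -c prints 0 (and exits 1) when nothing matches
        printf '%s: %s\n' "$pattern" "$(grep -ci -- "$pattern" "$file")"
    done
}
```

For example: `log_summary /logs/app.log error failed timeout "rate limit"`.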

Performance Monitoring

# CPU and memory usage over time (sar and iostat come from the sysstat package)
sar -u 1 10  # CPU usage
sar -r 1 10  # Memory usage
 
# Disk I/O
iostat -x 1 10
 
# Network statistics
ss -tuln | grep :8000
netstat -i
 
# Application-specific metrics
curl -s http://localhost:9090/metrics | grep -E "(http_requests|response_time|error_rate)"

Automated Monitoring Script

#!/bin/bash
# monitor.sh - Automated health monitoring
 
LOGFILE="/logs/monitor.log"
ALERT_EMAIL="admin@example.com"
 
log() {
    echo "$(date): $1" | tee -a $LOGFILE
}
 
check_service() {
    if ! curl -f http://localhost:8000/health >/dev/null 2>&1; then
        log "ERROR: Health check failed"
        return 1
    fi
    return 0
}
 
check_database() {
    if ! mysql -h $MYSQL_HOST -u $MYSQL_USER -p$MYSQL_PASSWORD -e "SELECT 1" >/dev/null 2>&1; then
        log "ERROR: Database connection failed"
        return 1
    fi
    return 0
}
 
check_redis() {
    if ! redis-cli -h $REDIS_HOST -p $REDIS_PORT ping >/dev/null 2>&1; then
        log "WARNING: Redis connection failed"
        return 1
    fi
    return 0
}
 
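check_disk_space() {
    # df -P keeps each filesystem on one line, so field 5 is always Use%
    USAGE=$(df -P /storage | awk 'NR==2 {print $5}' | tr -d '%')
    if [ -z "$USAGE" ]; then
        log "ERROR: Could not read disk usage for /storage"
        return 1
    fi
    if [ "$USAGE" -gt 90 ]; then
        log "WARNING: Disk usage is ${USAGE}%"
        return 1
    fi
    return 0
}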
 
send_alert() {
    echo "CoAI.Dev monitoring alert: $1" | mail -s "CoAI.Dev Alert" $ALERT_EMAIL
}
 
# Run checks
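# Plain arithmetic: ((ERRORS++)) returns a failure status when ERRORS is 0,
# which would abort the script if it ever runs under set -e
ERRORS=0
 
if ! check_service; then
    send_alert "Service health check failed"
    ERRORS=$((ERRORS + 1))
fi
 
if ! check_database; then
    send_alert "Database connection failed"
    ERRORS=$((ERRORS + 1))
fi
 
if ! check_redis; then
    send_alert "Redis connection failed"
    ERRORS=$((ERRORS + 1))
fi
 
if ! check_disk_space; then
    send_alert "High disk usage detected"
    ERRORS=$((ERRORS + 1))
fi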
 
if [ $ERRORS -eq 0 ]; then
    log "All checks passed"
fi
 
exit $ERRORS

Recovery Procedures

Service Recovery

# Emergency restart procedure
systemctl stop chatnio
sleep 5
systemctl start chatnio
systemctl status chatnio
 
# Or for Docker
docker stop chatnio
docker start chatnio
docker logs chatnio --tail=50
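
After a restart the service usually needs a few seconds before `/health` responds, so a single check can raise a false alarm. A generic retry helper, sketched below, reruns a command until it succeeds or the attempts run out; `retry` is a hypothetical function name, and the attempt count and delay in the example are arbitrary.

```shell
# Run a command up to ATTEMPTS times, sleeping DELAY seconds between failures.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
# Usage: retry ATTEMPTS DELAY COMMAND [ARGS...]
# (Hypothetical helper.)
retry() {
    local attempts=$1 delay=$2 n=1
    shift 2
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$(( n + 1 ))
        sleep "$delay"
    done
}
```

For example: `retry 10 3 curl -fsS http://localhost:8000/health && echo "service is healthy"`.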

Database Recovery

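# Back up the current state (--single-transaction takes a consistent InnoDB
# snapshot without locking tables)
mysqldump --single-transaction -h $MYSQL_HOST -u $MYSQL_USER -p$MYSQL_PASSWORD chatnio > emergency_backup.sql
 
# Restore from backup (replaces the tables contained in the dump)
mysql -h $MYSQL_HOST -u $MYSQL_USER -p$MYSQL_PASSWORD chatnio < backup.sql
 
# Check tables for corruption
mysqlcheck --check -h $MYSQL_HOST -u $MYSQL_USER -p$MYSQL_PASSWORD chatnio
 
# Repair a corrupted table (REPAIR TABLE works for MyISAM, ARCHIVE and CSV tables
# only; corrupted InnoDB tables are usually recovered by restoring a backup)
mysql -h $MYSQL_HOST -u $MYSQL_USER -p$MYSQL_PASSWORD -e "REPAIR TABLE chatnio.conversations;"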
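
Before restoring, it is worth confirming that the dump file is complete. By default mysqldump ends its output with a `-- Dump completed` comment, so an interrupted dump (full disk, dropped connection) lacks that line. The check below is a sketch that assumes comments were not disabled with `--skip-comments`; `dump_complete` is a hypothetical helper.

```shell
# Return 0 if FILE ends the way a finished mysqldump does.
# (Hypothetical helper; relies on mysqldump's default trailing comment.)
dump_complete() {
    tail -n 5 "$1" | grep -q '^-- Dump completed'
}
```

For example: `dump_complete backup.sql && mysql -h $MYSQL_HOST -u $MYSQL_USER -p$MYSQL_PASSWORD chatnio < backup.sql`.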

Complete System Recovery

#!/bin/bash
# disaster_recovery.sh
set -euo pipefail  # stop at the first failed step rather than start a half-restored system
 
# Stop services
systemctl stop chatnio nginx
 
# Restore database
mysql -h $MYSQL_HOST -u $MYSQL_USER -p$MYSQL_PASSWORD chatnio < /backups/latest.sql
 
# Restore file storage
rsync -av /backups/storage/ /storage/
 
# Restore configuration
cp /backups/config/* /opt/chatnio/config/
 
# Start services
systemctl start chatnio nginx
 
# Verify recovery (retry while the service starts up)
curl -f --retry 10 --retry-delay 3 --retry-connrefused http://localhost:8000/health

For issues not covered in this guide, check the GitHub Issues or visit our GitHub repository for community support.