Database Infrastructure
This document outlines the database infrastructure for the AI Agent Orchestration Platform.
Overview
The platform uses PostgreSQL as its primary database for storing application data, with additional specialized databases for specific use cases. This document covers database setup, configuration, management, backup strategies, and scaling approaches.
Database Architecture
Primary Database: PostgreSQL
PostgreSQL is the main relational database used for:
- User accounts and authentication
- Workflow definitions and metadata
- Agent registry and configuration
- Execution history and results
- Platform configuration

Note: This is a placeholder for a database architecture diagram. The actual diagram should be created and added to the project.
Additional Databases
The platform also uses these specialized databases:
- Vector Database (e.g., Milvus, Pinecone): For storing and querying embeddings
- Time Series Database (e.g., TimescaleDB): For monitoring metrics
- Graph Database (e.g., Neo4j): For relationship mapping (optional)
- Document Database (e.g., MongoDB): For unstructured data (optional)
PostgreSQL Configuration
Basic Configuration
The baseline postgresql.conf below targets a modest dedicated host (the memory settings assume roughly 4 GB of RAM); tune these values to your hardware:
```ini
# postgresql.conf

# CONNECTIONS
max_connections = 100
superuser_reserved_connections = 3

# MEMORY
shared_buffers = 1GB
work_mem = 32MB
maintenance_work_mem = 256MB
effective_cache_size = 3GB

# WRITE AHEAD LOG
wal_level = replica
max_wal_size = 1GB
min_wal_size = 80MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB

# QUERY OPTIMIZATION
random_page_cost = 1.1
effective_io_concurrency = 200
default_statistics_target = 100

# AUTOVACUUM
autovacuum = on
autovacuum_max_workers = 3
autovacuum_naptime = 1min
autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50

# LOGGING
log_destination = 'csvlog'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_age = 1d
log_rotation_size = 100MB
log_min_duration_statement = 1000
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_temp_files = 0
log_autovacuum_min_duration = 0
```
High Availability Configuration
For production environments, PostgreSQL is configured for high availability:
- Primary-Replica Setup: Synchronous replication to standby servers
- Automatic Failover: Using tools like Patroni or Stolon
- Connection Pooling: Using PgBouncer for efficient connection management
- Load Balancing: Using HAProxy or similar for read-write splitting (an application-level routing sketch follows this list)
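HAProxy and PgBouncer sit outside the application, but read-write splitting can also be handled at the application layer. Below is a minimal sketch using SQLAlchemy; the hostnames (`pg-primary`, `pg-replica`) and credentials are illustrative placeholders, and the `pool_size` arguments show client-side connection pooling alongside PgBouncer:

```python
# Application-level read/write splitting sketch (hostnames and
# credentials are illustrative placeholders).
from sqlalchemy import create_engine, text

# Each engine keeps its own connection pool (client-side pooling)
primary = create_engine(
    "postgresql+psycopg2://app:secret@pg-primary:5432/meta_agent",
    pool_size=10, max_overflow=5,
)
replica = create_engine(
    "postgresql+psycopg2://app:secret@pg-replica:5432/meta_agent",
    pool_size=10, max_overflow=5,
)

def engine_for(readonly: bool):
    """Route read-only work to the replica, everything else to the primary."""
    return replica if readonly else primary

# Usage: a read-only query goes to the replica
with engine_for(readonly=True).connect() as conn:
    agent_count = conn.execute(text("SELECT count(*) FROM agents")).scalar()
```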
Example Patroni configuration:
```yaml
scope: meta-agent-cluster
namespace: /service/meta-agent/
name: postgresql0

restapi:
  listen: 0.0.0.0:8008
  connect_address: postgresql0:8008

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      parameters:
        max_connections: 100
        shared_buffers: 1GB
        work_mem: 32MB
        maintenance_work_mem: 256MB
        effective_cache_size: 3GB
        wal_level: replica
        max_wal_size: 1GB
        min_wal_size: 80MB
        checkpoint_completion_target: 0.9
        wal_buffers: 16MB
        random_page_cost: 1.1
        effective_io_concurrency: 200
        default_statistics_target: 100
        autovacuum: on
        autovacuum_max_workers: 3
        autovacuum_naptime: 1min
        autovacuum_vacuum_threshold: 50
        autovacuum_analyze_threshold: 50
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 10.0.0.0/8 md5
    - host all all 0.0.0.0/0 md5

postgresql:
  listen: 0.0.0.0:5432
  connect_address: postgresql0:5432
  data_dir: /data/postgresql0
  bin_dir: /usr/lib/postgresql/15/bin
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: replpass
    superuser:
      username: postgres
      password: secretpassword
  parameters:
    unix_socket_directories: '/var/run/postgresql'

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
```
Database Schema Management
Migration Strategy
The platform uses Alembic for database migrations:
- Version Control: Track schema changes
- Forward Migration: Apply schema changes
- Backward Migration: Revert schema changes
- CI/CD Integration: Automated migration in deployment pipeline (a programmatic invocation sketch follows the example below)
Example Alembic migration script:
"""Add agent execution metrics table
Revision ID: a1b2c3d4e5f6
Revises: 9z8y7x6w5v4
Create Date: 2025-04-18 14:30:45.123456
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import JSONB, UUID
# revision identifiers, used by Alembic.
revision = 'a1b2c3d4e5f6'
down_revision = '9z8y7x6w5v4'
branch_labels = None
depends_on = None
def upgrade():
op.create_table(
'agent_execution_metrics',
sa.Column('id', UUID, primary_key=True),
sa.Column('agent_id', UUID, sa.ForeignKey('agents.id'), nullable=False),
sa.Column('workflow_execution_id', UUID, sa.ForeignKey('workflow_executions.id'), nullable=False),
sa.Column('start_time', sa.DateTime(timezone=True), nullable=False),
sa.Column('end_time', sa.DateTime(timezone=True), nullable=True),
sa.Column('status', sa.String(50), nullable=False),
sa.Column('cpu_usage', sa.Float, nullable=True),
sa.Column('memory_usage', sa.Float, nullable=True),
sa.Column('input_size', sa.Integer, nullable=True),
sa.Column('output_size', sa.Integer, nullable=True),
sa.Column('error', sa.Text, nullable=True),
sa.Column('metadata', JSONB, nullable=True),
sa.Column('created_at', sa.DateTime(timezone=True), server_default=sa.func.now(), nullable=False),
sa.Column('updated_at', sa.DateTime(timezone=True), server_default=sa.func.now(), onupdate=sa.func.now(), nullable=False)
)
op.create_index('idx_agent_execution_metrics_agent_id', 'agent_execution_metrics', ['agent_id'])
op.create_index('idx_agent_execution_metrics_workflow_execution_id', 'agent_execution_metrics', ['workflow_execution_id'])
op.create_index('idx_agent_execution_metrics_start_time', 'agent_execution_metrics', ['start_time'])
op.create_index('idx_agent_execution_metrics_status', 'agent_execution_metrics', ['status'])
def downgrade():
op.drop_table('agent_execution_metrics')
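For the CI/CD integration item above, migrations are typically applied with `alembic upgrade head`, but they can also be invoked programmatically. A minimal sketch, assuming the project keeps a standard `alembic.ini` and exposes the connection string via a `DATABASE_URL` environment variable (both assumptions, not confirmed by this document):

```python
# run_migrations.py - hypothetical CI/CD helper that applies all
# pending Alembic migrations against the target database.
import os

from alembic import command
from alembic.config import Config

def run_migrations(database_url: str) -> None:
    cfg = Config("alembic.ini")                       # project's Alembic config
    cfg.set_main_option("sqlalchemy.url", database_url)
    command.upgrade(cfg, "head")                      # apply up to latest revision

if __name__ == "__main__":
    run_migrations(os.environ["DATABASE_URL"])
```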
Schema Documentation
The database schema is documented using tools like SchemaSpy or pgModeler:
- Entity-Relationship Diagrams: Visual representation of schema
- Table Definitions: Detailed table and column descriptions
- Relationship Documentation: Foreign key relationships
- Constraint Documentation: Primary keys, unique constraints, etc.
Backup and Recovery
Backup Strategy
The platform implements a comprehensive backup strategy:
- Full Backups: Daily full database backups
- WAL Archiving: Continuous transaction log archiving
- Point-in-Time Recovery: Ability to restore to any point in time
- Offsite Storage: Backups stored in multiple locations
- Backup Verification: Regular testing of backup restoration (a verification sketch follows the backup script below)
Example backup script:
```bash
#!/bin/bash
# backup_db.sh - Create a full backup of the PostgreSQL database
# Expects DB_HOST, DB_USER, and DB_NAME to be set in the environment.
set -euo pipefail

BACKUP_DIR="/backups/postgresql"
BACKUP_FILE="meta_agent_$(date +%Y%m%d_%H%M%S).backup"
RETENTION_DAYS=7

# Create backup directory if it doesn't exist
mkdir -p "$BACKUP_DIR"

# Create full backup in custom format
pg_dump -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -F c -f "$BACKUP_DIR/$BACKUP_FILE"

# Compress backup
gzip "$BACKUP_DIR/$BACKUP_FILE"

# Copy to offsite storage
aws s3 cp "$BACKUP_DIR/$BACKUP_FILE.gz" s3://meta-agent-backups/postgresql/

# Remove old backups
find "$BACKUP_DIR" -name "meta_agent_*.backup.gz" -type f -mtime +"$RETENTION_DAYS" -delete

echo "Backup completed: $BACKUP_FILE.gz"
```
Recovery Procedures
Recovery procedures are documented for the following scenarios:
- Full Database Restore: Restore from full backup
- Point-in-Time Recovery: Restore to specific point in time
- Single Table Restore: Restore specific table
- Disaster Recovery: Restore in new environment
Example recovery script:
```bash
#!/bin/bash
# restore_db.sh - Restore PostgreSQL database from backup
# Expects DB_HOST, DB_USER, and DB_NAME to be set in the environment.
set -euo pipefail

BACKUP_FILE=${1:-}
if [ -z "$BACKUP_FILE" ]; then
    echo "Usage: ./restore_db.sh [backup_file]"
    echo "Example: ./restore_db.sh meta_agent_20250418_143045.backup.gz"
    exit 1
fi

BACKUP_DIR="/backups/postgresql"

# Check if backup file exists locally; otherwise try offsite storage
if [ ! -f "$BACKUP_DIR/$BACKUP_FILE" ]; then
    aws s3 cp "s3://meta-agent-backups/postgresql/$BACKUP_FILE" "$BACKUP_DIR/" || true
    if [ ! -f "$BACKUP_DIR/$BACKUP_FILE" ]; then
        echo "Error: Backup file not found"
        exit 1
    fi
fi

# Uncompress if needed
if [[ $BACKUP_FILE == *.gz ]]; then
    gunzip -c "$BACKUP_DIR/$BACKUP_FILE" > "$BACKUP_DIR/${BACKUP_FILE%.gz}"
    BACKUP_FILE=${BACKUP_FILE%.gz}
fi

# Stop application services
docker-compose stop backend frontend

# Drop and recreate database (connect to the maintenance database, since
# the target database cannot be dropped while connected to it)
psql -h "$DB_HOST" -U "$DB_USER" -d postgres -c "DROP DATABASE IF EXISTS $DB_NAME;"
psql -h "$DB_HOST" -U "$DB_USER" -d postgres -c "CREATE DATABASE $DB_NAME;"

# Restore from backup
pg_restore -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -v "$BACKUP_DIR/$BACKUP_FILE"

# Start application services
docker-compose start backend frontend

echo "Database restored from $BACKUP_FILE"
```
Database Scaling
Vertical Scaling
Strategies for vertical scaling:
- Resource Allocation: Increase CPU, memory, and storage
- Storage Optimization: Use faster storage (SSD, NVMe)
- Configuration Tuning: Optimize PostgreSQL parameters for larger hardware (rule-of-thumb sizing sketch below)
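A common rule of thumb when moving to larger hardware is shared_buffers around 25% of RAM and effective_cache_size around 75%. The sketch below encodes these community heuristics; treat the output as a starting point for benchmarking, not official guidance:

```python
# Rule-of-thumb PostgreSQL memory sizing for a dedicated database host.
# These ratios are common community heuristics, not official guidance.
def suggest_memory_settings(total_ram_gb: int) -> dict:
    return {
        "shared_buffers": f"{total_ram_gb // 4}GB",            # ~25% of RAM
        "effective_cache_size": f"{total_ram_gb * 3 // 4}GB",  # ~75% of RAM
        "maintenance_work_mem": f"{min(total_ram_gb * 1024 // 16, 2048)}MB",
    }

print(suggest_memory_settings(16))
# {'shared_buffers': '4GB', 'effective_cache_size': '12GB', 'maintenance_work_mem': '1024MB'}
```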
Horizontal Scaling
Strategies for horizontal scaling:
- Read Replicas: Distribute read queries across replicas
- Sharding: Partition data across multiple database instances (routing sketch after this list)
- Connection Pooling: Efficiently manage database connections
- Query Optimization: Improve query performance
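For the sharding item above, one simple approach is stable hash-based routing, where a given key always maps to the same shard. A minimal sketch with illustrative shard DSNs:

```python
# Hash-based shard routing sketch; shard DSNs are illustrative.
import hashlib

SHARDS = [
    "postgresql://app@pg-shard0:5432/meta_agent",
    "postgresql://app@pg-shard1:5432/meta_agent",
]

def shard_for(key: str) -> str:
    """Map a key (e.g. a workflow ID) to a shard, stably across calls."""
    digest = hashlib.sha256(key.encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]

print(shard_for("workflow-42"))  # always the same shard for this key
```

Note that with plain modulo hashing, adding a shard remaps most keys; consistent hashing reduces that churn.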
Multi-Region Deployment
For global deployments, the multi-region database strategy includes:
- Regional Replicas: Read replicas in each region
- Data Synchronization: Efficient data replication between regions
- Conflict Resolution: Handle write conflicts in multi-master setup (see the sketch after this list)
- Latency Optimization: Minimize cross-region queries
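A common baseline for conflict resolution is last-write-wins, keyed on a timestamp such as the `updated_at` column used in the schema above. A minimal sketch (note that last-write-wins is sensitive to clock skew; logical or vector clocks are safer where ordering matters):

```python
# Last-write-wins conflict resolution sketch for multi-master rows.
# Assumes each row version carries an updated_at timestamp.
def resolve(local: dict, remote: dict) -> dict:
    """Keep whichever version was written most recently."""
    return remote if remote["updated_at"] > local["updated_at"] else local
```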
Database Security
Access Control
Database access control measures:
- Role-Based Access: Granular permissions for database users
- Connection Security: TLS encryption for all connections
- Network Security: Firewall rules and VPC isolation
- Audit Logging: Track all database access and changes
Example PostgreSQL role setup:
```sql
-- Create application roles
CREATE ROLE meta_agent_app WITH LOGIN PASSWORD 'secure_password';
CREATE ROLE meta_agent_readonly WITH LOGIN PASSWORD 'readonly_password';
CREATE ROLE meta_agent_admin WITH LOGIN PASSWORD 'admin_password' SUPERUSER;

-- Grant permissions to application role
GRANT CONNECT ON DATABASE meta_agent TO meta_agent_app;
GRANT USAGE ON SCHEMA public TO meta_agent_app;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO meta_agent_app;
GRANT USAGE ON ALL SEQUENCES IN SCHEMA public TO meta_agent_app;

-- Grant read-only permissions
GRANT CONNECT ON DATABASE meta_agent TO meta_agent_readonly;
GRANT USAGE ON SCHEMA public TO meta_agent_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO meta_agent_readonly;

-- Set default privileges for future tables
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO meta_agent_app;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO meta_agent_readonly;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT USAGE ON SEQUENCES TO meta_agent_app;
```
Data Encryption
Database encryption measures:
- Encryption at Rest: File system or tablespace encryption
- Encryption in Transit: TLS for all connections
- Column-Level Encryption: Encrypt sensitive columns (application-layer sketch below)
- Key Management: Secure storage and rotation of encryption keys
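As one way to implement the in-transit and column-level items above at the application layer, the sketch below uses psycopg2's `sslmode` option for TLS and the `cryptography` package's Fernet for column values (pgcrypto inside the database is an alternative). The connection details are illustrative, and a real deployment would load the key from a KMS or vault rather than generating it in place:

```python
# Application-layer encryption sketch (illustrative connection details).
import psycopg2
from cryptography.fernet import Fernet

# Encryption in transit: require TLS on the client connection
conn = psycopg2.connect(
    host="pg-primary", dbname="meta_agent", user="app", sslmode="require"
)

# Column-level encryption: Fernet symmetric encryption for sensitive values
key = Fernet.generate_key()   # production: load from KMS/vault and rotate
f = Fernet(key)

ciphertext = f.encrypt(b"api-token-value")   # store this in the column
plaintext = f.decrypt(ciphertext)            # decrypt after reading back
```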
Edge Database Strategy
For edge deployments, the platform uses a specialized database strategy:
- Local SQLite Database: Lightweight database for edge devices
- Synchronization: Bi-directional sync with central PostgreSQL (push-step sketch below)
- Conflict Resolution: Handle sync conflicts
- Offline Operation: Function without connectivity
See Edge Infrastructure for more details.
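A minimal sketch of the synchronization push step, assuming a hypothetical `agent_events` table on both sides with an `updated_at` column and, on the edge device, a `synced` flag (none of these names are confirmed by this document). The `ON CONFLICT` clause applies the same last-write-wins rule on the central side:

```python
# Edge-to-central push sketch: SQLite -> PostgreSQL (illustrative schema).
import sqlite3
import psycopg2

def push_changes(sqlite_path: str, pg_dsn: str) -> None:
    lite = sqlite3.connect(sqlite_path)
    pg = psycopg2.connect(pg_dsn)
    try:
        # Collect rows not yet pushed to the central database
        rows = lite.execute(
            "SELECT id, payload, updated_at FROM agent_events WHERE synced = 0"
        ).fetchall()
        with pg, pg.cursor() as cur:
            for id_, payload, updated_at in rows:
                # Upsert; on conflict keep whichever side is newer (LWW)
                cur.execute(
                    """
                    INSERT INTO agent_events (id, payload, updated_at)
                    VALUES (%s, %s, %s)
                    ON CONFLICT (id) DO UPDATE
                    SET payload = EXCLUDED.payload,
                        updated_at = EXCLUDED.updated_at
                    WHERE agent_events.updated_at < EXCLUDED.updated_at
                    """,
                    (id_, payload, updated_at),
                )
        # Mark pushed rows only after the central transaction committed
        lite.executemany(
            "UPDATE agent_events SET synced = 1 WHERE id = ?",
            [(r[0],) for r in rows],
        )
        lite.commit()
    finally:
        lite.close()
        pg.close()
```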
Federated Database Strategy
For federated deployments, the platform uses a specialized database strategy:
- Data Sovereignty: Keep data within organizational boundaries
- Secure Queries: Privacy-preserving distributed queries
- Schema Compatibility: Ensure schema compatibility across organizations
- Audit Trail: Track cross-organization data access
See Federated Infrastructure for more details.
Database Monitoring
The platform implements comprehensive database monitoring:
- Performance Metrics: Query performance, connection count, etc.
- Resource Usage: CPU, memory, disk I/O, etc.
- Slow Query Tracking: Identify and optimize slow queries
- Replication Lag: Monitor replica synchronization (lag-query sketch below)
- Connection Pooling: Monitor connection pool health
See Monitoring Infrastructure for more details.
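As an example of the replication-lag item above, the standard `pg_stat_replication` view exposes each replica's replay position. The sketch below computes lag in bytes; the connection details are illustrative, and the columns shown exist in PostgreSQL 10 and later:

```python
# Replication lag check via pg_stat_replication (PostgreSQL 10+).
import psycopg2

def replication_lag_bytes(dsn: str) -> list:
    """Return (replica_name, lag_in_bytes) for each connected standby."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT application_name,
                   pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
            FROM pg_stat_replication
            """
        )
        return cur.fetchall()

for name, lag in replication_lag_bytes("postgresql://postgres@pg-primary/meta_agent"):
    print(f"{name}: {lag} bytes behind")
```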
Database Management Scripts
Scripts for database management are located in `/infra/scripts/`:
- `setup_db.sh` - Initialize database and roles
- `backup_db.sh` - Create database backup
- `restore_db.sh` - Restore from backup
- `migrate_db.sh` - Run database migrations
- `analyze_db.sh` - Analyze database performance
Best Practices
- Use database migrations for schema changes
- Implement comprehensive backup strategy
- Monitor database performance
- Optimize queries and indexes
- Implement connection pooling
- Use appropriate scaling strategy
- Secure database access
- Regularly maintain database (VACUUM, ANALYZE)
- Document database schema and procedures
References
- Deployment Infrastructure
- Containerization
- Monitoring Infrastructure
- Security Infrastructure
- Edge Infrastructure
- Federated Infrastructure
- Data Model Design
Last updated: 2025-04-18