Add comprehensive deployment guide with systemd service setup and LXC configuration

2026-02-23 11:14:40 -05:00
parent 92bc754316
commit debfb466ad
1 changed files with 379 additions and 0 deletions
--- a/DEPLOYMENT.md
+++ b/DEPLOYMENT.md
@@ -0,0 +1,379 @@
 # Deployment Guide
 This guide covers deploying watsonx-openai-proxy in production environments.
 ## System Requirements
 ### Fedora 43 (or similar RPM-based distributions)
 #### Essential Packages
 ```bash
 sudo dnf install -y \
    python3.12 \
    python3-pip \
    git
 ```
 #### Optional Build Tools (for compiling Python packages)
 ```bash
 sudo dnf install -y \
    python3-devel \
    gcc \
    gcc-c++ \
    make \
    libffi-devel \
    openssl-devel \
    zlib-devel
 ```
 **Note**: Most Python packages have pre-built wheels for x86_64 Linux, so build tools are rarely needed.
 ## Installation
 ### 1. Clone Repository
 ```bash
 cd /home/app
 git clone <repository-url> watsonx-openai-proxy
 cd watsonx-openai-proxy
 ```
 ### 2. Install Dependencies
 ```bash
 # Using system Python
 python3 -m pip install --user -r requirements.txt
 # Or using virtual environment (recommended)
 python3 -m venv venv
 source venv/bin/activate
 pip install -r requirements.txt
 ```
 ### 3. Configure Environment
 Create a `.env` file by copying `.env.example`:
 ```bash
 cp .env.example .env
 ```
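 **IMPORTANT**: Remove all inline comments from the `.env` file. systemd's `EnvironmentFile=` only ignores lines that begin with `#` or `;`, so a trailing `# ...` can end up as part of the value, which Pydantic then fails to parse.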
 **Correct format:**
 ```bash
 IBM_CLOUD_API_KEY=your_api_key_here
 WATSONX_PROJECT_ID=your_project_id_here
 WATSONX_CLUSTER=us-south
 HOST=0.0.0.0
 PORT=8000
 LOG_LEVEL=info
 TOKEN_REFRESH_INTERVAL=3000
 ```
 **Incorrect format (will cause errors):**
 ```bash
 LOG_LEVEL=info  # Options: debug, info, warning, error
 TOKEN_REFRESH_INTERVAL=3000  # Refresh token every N seconds
 ```
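Before starting the service, it can help to scan `.env` for the inline-comment pattern shown above. The following is a minimal sketch: `check_env_inline_comments` is a hypothetical helper name, not part of the project, and it assumes that real values never contain whitespace followed by `#`.

```bash
# Hypothetical helper (not part of watsonx-openai-proxy): list KEY=value
# lines in an env file that carry a trailing "# comment".
# Assumption: real values never contain whitespace followed by '#'.
check_env_inline_comments() {
    local env_file="${1:-.env}"
    # Full-line comments start with '#', so they never match the KEY= prefix.
    # grep exits 0 only when at least one offending line is printed.
    grep -nE '^[A-Za-z_][A-Za-z0-9_]*=.*[[:space:]]#' "$env_file"
}
```

Running `check_env_inline_comments .env` prints each offending line with its line number and returns a non-zero status when the file is clean.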
 ## Systemd Service Setup
 ### 1. Create Service Unit
 Create `/etc/systemd/system/watsonx-proxy.service`:
 ```ini
 [Unit]
 Description=watsonx OpenAI Proxy
 After=network.target
 [Service]
 Type=simple
 User=app
 Group=app
 WorkingDirectory=/home/app/watsonx-openai-proxy
 Environment="PATH=/usr/local/bin:/usr/bin:/bin"
 EnvironmentFile=/home/app/watsonx-openai-proxy/.env
 ExecStart=/usr/bin/python3 -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 2
 Restart=always
 RestartSec=10
 [Install]
 WantedBy=multi-user.target
 ```
 **Note**: If you installed the dependencies into the virtual environment, point `ExecStart` at `/home/app/watsonx-openai-proxy/venv/bin/python` instead of `/usr/bin/python3`; otherwise the service will not see the installed packages.
 ### 2. Enable and Start Service
 ```bash
 # Reload systemd
 sudo systemctl daemon-reload
 # Enable service (start on boot)
 sudo systemctl enable watsonx-proxy.service
 # Start service
 sudo systemctl start watsonx-proxy.service
 # Check status
 sudo systemctl status watsonx-proxy.service
 ```
 ### 3. View Logs
 ```bash
 # Follow logs
 sudo journalctl -u watsonx-proxy.service -f
 # View last 50 lines
 sudo journalctl -u watsonx-proxy.service -n 50
 ```
 ### 4. Service Management
 ```bash
 # Stop service
 sudo systemctl stop watsonx-proxy.service
 # Restart service
 sudo systemctl restart watsonx-proxy.service
 # Disable auto-start
 sudo systemctl disable watsonx-proxy.service
 ```
 ## LXC Container Deployment
 ### Recommended Resources (5 req/s)
 #### Minimum Configuration
 - **CPU**: 1 core (1000 CPU shares)
 - **RAM**: 2 GB
 - **Storage**: 10 GB
 - **Swap**: 1 GB
 #### Recommended Configuration
 - **CPU**: 2 cores (2000 CPU shares)
 - **RAM**: 4 GB
 - **Storage**: 20 GB
 - **Swap**: 2 GB
 #### Optimal Configuration
 - **CPU**: 4 cores (4000 CPU shares)
 - **RAM**: 8 GB
 - **Storage**: 50 GB
 - **Swap**: 4 GB
 ### Proxmox LXC Configuration
 Edit `/etc/pve/lxc/<VMID>.conf`:
 ```ini
 # CPU allocation
 cores: 2
 cpulimit: 2
 cpuunits: 2000
 # Memory allocation
 memory: 4096
 swap: 2048
 # Storage
 rootfs: local-lvm:vm-<VMID>-disk-0,size=20G
 # Network
 net0: name=eth0,bridge=vmbr0,firewall=1,ip=dhcp,type=veth
 ```
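The same CPU and memory settings can usually be applied from the Proxmox host with `pct set` instead of editing the file by hand. The sketch below only prints the command so it can be reviewed first. `pct_set_cmd` is a hypothetical helper name, and the option names are an assumption to verify against `man pct` for your Proxmox version.

```bash
# Hypothetical helper: print (not run) a `pct set` command that applies the
# CPU and memory settings shown above. cpuunits follows this guide's
# convention of 1000 units per core.
pct_set_cmd() {
    local vmid="$1" cores="$2" memory_mb="$3" swap_mb="$4"
    echo "pct set $vmid --cores $cores --cpulimit $cores" \
         "--cpuunits $((cores * 1000)) --memory $memory_mb --swap $swap_mb"
}
```

For example, `pct_set_cmd 101 2 4096 2048` prints a command matching the recommended configuration for a container with the (made-up) ID 101.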
 ### Resource Monitoring
 ```bash
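 # CPU usage
 lxc-cgroup -n <container> cpu.stat
 # Memory usage (cgroup v2; on cgroup v1 hosts use memory.usage_in_bytes)
 lxc-cgroup -n <container> memory.current
 # Network stats (ifconfig is not installed by default on current Fedora)
 lxc-attach -n <container> -- ip -s link show eth0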
 ```
 ## Python Version Management
 ### Using update-alternatives
 ```bash
 # Set up alternatives
 sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.12 1
 sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.14 2
 # Switch version
 sudo update-alternatives --config python3
 # Fix pip if needed
 python3 -m ensurepip --default-pip --upgrade
 ```
 ### Verify Installation
 ```bash
 python3 --version
 python3 -m pip --version
 ```
 ## Troubleshooting
 ### pip Module Not Found
 After switching Python versions:
 ```bash
 python3 -m ensurepip --default-pip --upgrade
 python3 -m pip --version
 ```
 ### Service Fails to Start
 Check logs for errors:
 ```bash
 sudo journalctl -u watsonx-proxy.service -n 100 --no-pager
 ```
 Common issues:
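 1. **Inline comments in `.env`**: Remove every trailing `# comment` from environment variable values
 2. **Missing dependencies**: Run `pip install -r requirements.txt` as the `app` user (or inside the virtual environment)
 3. **Permission errors**: Ensure the `app` user owns `/home/app/watsonx-openai-proxy`
 4. **Port already in use**: Stop the conflicting service, or change `--port` in the unit's `ExecStart` line; the unit passes `--port 8000` explicitly, so editing `PORT` in `.env` alone does not change the listening port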
 ### Token Refresh Errors
 Check IBM Cloud credentials:
 ```bash
 # Test token generation
 curl -X POST "https://iam.cloud.ibm.com/identity/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey=YOUR_API_KEY"
 ```
 ### High Memory Usage
 Reduce number of workers:
 ```bash
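 # In /etc/systemd/system/watsonx-proxy.service, set --workers 1 on the ExecStart line:
 #   ExecStart=/usr/bin/python3 -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 1
 # Reload the unit files, then restart the service
 sudo systemctl daemon-reload
 sudo systemctl restart watsonx-proxy.service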
 ```
 ## Performance Tuning
 ### Worker Configuration
 - **1 worker**: ~50 MB RAM, handles ~5 req/s
 - **2 workers**: ~100 MB RAM, handles ~10 req/s
 - **4 workers**: ~200 MB RAM, handles ~20 req/s
 ### Scaling Strategy
 1. **Vertical scaling**: Increase workers up to number of CPU cores
 2. **Horizontal scaling**: Deploy multiple instances behind load balancer
 3. **Auto-scaling**: Monitor CPU/memory and scale based on thresholds
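To make the vertical-scaling rule concrete, a worker count can be estimated from a target request rate. This is a rough sketch that takes the ~5 req/s per worker figure above at face value and caps the result at the core count. `suggest_workers` is a hypothetical helper, not part of the project.

```bash
# Hypothetical helper: estimate a --workers value from a target request rate.
# Assumes ~5 req/s per worker (the figure quoted above) and caps the result
# at the number of CPU cores.
suggest_workers() {
    local target_rps="$1" cores="$2" per_worker_rps=5
    # Integer ceiling of target_rps / per_worker_rps.
    local workers=$(( (target_rps + per_worker_rps - 1) / per_worker_rps ))
    if [ "$workers" -lt 1 ]; then workers=1; fi
    if [ "$workers" -gt "$cores" ]; then workers="$cores"; fi
    echo "$workers"
}
```

For instance, `suggest_workers 12 "$(nproc)"` gives a starting value for `--workers` on the current machine; measure real throughput before relying on it.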
 ## Security Considerations
 ### API Key Authentication
 Enable proxy authentication:
 ```bash
 # In .env file
 API_KEY=your_secure_random_key_here
 ```
 ### CORS Configuration
 Restrict origins:
 ```bash
 # In .env file
 ALLOWED_ORIGINS=https://app1.example.com,https://app2.example.com
 ```
 ### Firewall Rules
 ```bash
 # Allow port 8000 from 10.0.0.0/8 (leave the port closed in the zone otherwise)
 sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.0.0.0/8" port protocol="tcp" port="8000" accept'
 sudo firewall-cmd --reload
 ```
 ## Monitoring
 ### Health Check
 ```bash
 curl http://localhost:8000/health
 ```
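Right after a start or restart the endpoint may not answer yet, so a single `curl` can give a false alarm. A small retry loop is one way to handle this. `retry_until` below is a hypothetical helper that reruns any probe command until it succeeds or the attempts run out.

```bash
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry_until <attempts> <delay_seconds> <command...>
retry_until() {
    local attempts="$1" delay="$2"
    shift 2
    local i=1
    while [ "$i" -le "$attempts" ]; do
        # Succeed as soon as the probe command exits 0.
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        # Only sleep when another attempt is still pending.
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

With the health endpoint above, a call such as `retry_until 10 3 curl -fsS -o /dev/null http://localhost:8000/health` waits up to roughly 30 seconds for the service to come up.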
 ### Metrics
 Monitor these metrics:
 - CPU usage (should stay <70%)
 - Memory usage (should stay <80%)
 - Response times
 - Error rates
 - Token refresh success rate
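As one way to check the memory threshold above from a shell, the sketch below derives a used-memory percentage from `MemTotal` and `MemAvailable`. `mem_used_percent` is a hypothetical helper. It reads `/proc/meminfo` by default and accepts another file in the same format as an argument.

```bash
# Hypothetical helper: print used memory as an integer percentage.
# Reads MemTotal and MemAvailable (in kB) from a meminfo-style file.
mem_used_percent() {
    local meminfo="${1:-/proc/meminfo}"
    awk '/^MemTotal:/ {total=$2} /^MemAvailable:/ {avail=$2}
         END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }' "$meminfo"
}
```

A cron job or monitoring agent could then compare the result against the 80% guideline, for example `[ "$(mem_used_percent)" -lt 80 ] || echo "memory above 80%"`.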
 ### Log Levels
 Set exactly one of the following in `.env`, keeping comments on their own lines:
 ```bash
 # For troubleshooting
 LOG_LEVEL=debug
 # For production
 LOG_LEVEL=info
 # For minimal logging
 LOG_LEVEL=warning
 ```
 ## Backup and Recovery
 ### Backup Configuration
 ```bash
 # Backup .env file
 sudo cp /home/app/watsonx-openai-proxy/.env /backup/.env.$(date +%Y%m%d)
 # Backup service file
 sudo cp /etc/systemd/system/watsonx-proxy.service /backup/
 ```
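The commands above assume `/backup` already exists. A slightly more defensive variant, sketched here with a hypothetical `backup_with_date` helper, creates the target directory first and applies the same date suffix to any file.

```bash
# Hypothetical helper: copy a file into a backup directory with a date suffix,
# mirroring the .env.YYYYMMDD naming used above.
backup_with_date() {
    local src="$1" dest_dir="$2"
    local target
    target="$dest_dir/$(basename "$src").$(date +%Y%m%d)"
    mkdir -p "$dest_dir" || return 1
    # -p preserves mode and timestamps (and ownership where permitted).
    cp -p "$src" "$target" || return 1
    # Print the backup path so callers can log or verify it.
    echo "$target"
}
```

For example, `sudo bash -c` with this function defined, or a root shell, could run `backup_with_date /home/app/watsonx-openai-proxy/.env /backup`.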
 ### Disaster Recovery
 ```bash
 # Restore configuration
 sudo cp /backup/.env.YYYYMMDD /home/app/watsonx-openai-proxy/.env
 sudo systemctl restart watsonx-proxy.service
 ```
 ## Updates
 ### Update Application
 ```bash
 cd /home/app/watsonx-openai-proxy
 git pull
 # If you installed into the virtual environment, run `source venv/bin/activate` first
 python3 -m pip install -r requirements.txt --upgrade
 sudo systemctl restart watsonx-proxy.service
 ```
 ### Zero-Downtime Updates
 Use multiple instances behind a load balancer and update one at a time.
 ---
 For additional help, see:
 - [README.md](README.md) - General usage and features
 - [MODELS.md](MODELS.md) - Available models
 - [AGENTS.md](AGENTS.md) - Development guidelines