# Deployment Guide
This guide covers deploying watsonx-openai-proxy in production environments.
## System Requirements
### Fedora 43 (or similar RPM-based distributions)
#### Essential Packages
```bash
sudo dnf install -y \
    python3.12 \
    python3-pip \
    git
```
#### Optional Build Tools (for compiling Python packages)
```bash
sudo dnf install -y \
    python3-devel \
    gcc \
    gcc-c++ \
    make \
    libffi-devel \
    openssl-devel \
    zlib-devel
```
**Note**: Most Python packages have pre-built wheels for x86_64 Linux, so build tools are rarely needed.
## Installation
### 1. Clone Repository
```bash
cd /home/app
git clone <repository-url> watsonx-openai-proxy
cd watsonx-openai-proxy
```
### 2. Install Dependencies
```bash
# Using system Python
python3 -m pip install --user -r requirements.txt
# Or using virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
### 3. Configure Environment
Create `.env` file (copy from `.env.example`):
```bash
cp .env.example .env
```
**IMPORTANT**: Remove all inline comments from the `.env` file. Pydantic cannot parse values with inline comments.
**Correct format:**
```bash
IBM_CLOUD_API_KEY=your_api_key_here
WATSONX_PROJECT_ID=your_project_id_here
WATSONX_CLUSTER=us-south
HOST=0.0.0.0
PORT=8000
LOG_LEVEL=info
TOKEN_REFRESH_INTERVAL=3000
```
**Incorrect format (will cause errors):**
```bash
LOG_LEVEL=info # Options: debug, info, warning, error
TOKEN_REFRESH_INTERVAL=3000 # Refresh token every N seconds
```
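A quick pre-flight check can catch inline comments before the service fails to start. This is a sketch that assumes the `.env` file is in the current directory; a `#` on a line that begins with `KEY=` is flagged, while full-line comments are left alone:

```shell
# Flag lines where a '#' appears after the '=' (an inline comment in the value);
# full-line comments (starting with '#') do not match the pattern
grep -nE '^[A-Za-z_]+=[^#]*#' .env && echo "inline comments found" || echo ".env looks clean"
```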
## Systemd Service Setup
### 1. Create Service Unit
Create `/etc/systemd/system/watsonx-proxy.service`:
```ini
[Unit]
Description=watsonx OpenAI Proxy
After=network.target

[Service]
Type=simple
User=app
Group=app
WorkingDirectory=/home/app/watsonx-openai-proxy
Environment="PATH=/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/home/app/watsonx-openai-proxy/.env
ExecStart=/usr/bin/python3 -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 2
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```
If you installed dependencies into a virtual environment, point `ExecStart` at the venv interpreter (e.g. `/home/app/watsonx-openai-proxy/venv/bin/python`) instead of the system one.
### 2. Enable and Start Service
```bash
# Reload systemd
sudo systemctl daemon-reload
# Enable service (start on boot)
sudo systemctl enable watsonx-proxy.service
# Start service
sudo systemctl start watsonx-proxy.service
# Check status
sudo systemctl status watsonx-proxy.service
```
### 3. View Logs
```bash
# Follow logs
sudo journalctl -u watsonx-proxy.service -f
# View last 50 lines
sudo journalctl -u watsonx-proxy.service -n 50
```
### 4. Service Management
```bash
# Stop service
sudo systemctl stop watsonx-proxy.service
# Restart service
sudo systemctl restart watsonx-proxy.service
# Disable auto-start
sudo systemctl disable watsonx-proxy.service
```
## LXC Container Deployment
### Recommended Resources (5 req/s)
#### Minimum Configuration
- **CPU**: 1 core (1000 CPU shares)
- **RAM**: 2 GB
- **Storage**: 10 GB
- **Swap**: 1 GB
#### Recommended Configuration
- **CPU**: 2 cores (2000 CPU shares)
- **RAM**: 4 GB
- **Storage**: 20 GB
- **Swap**: 2 GB
#### Optimal Configuration
- **CPU**: 4 cores (4000 CPU shares)
- **RAM**: 8 GB
- **Storage**: 50 GB
- **Swap**: 4 GB
### Proxmox LXC Configuration
Edit `/etc/pve/lxc/<VMID>.conf`:
```ini
# CPU allocation
cores: 2
cpulimit: 2
cpuunits: 2000
# Memory allocation
memory: 4096
swap: 2048
# Storage
rootfs: local-lvm:vm-<VMID>-disk-0,size=20G
# Network
net0: name=eth0,bridge=vmbr0,firewall=1,ip=dhcp,type=veth
```
### Resource Monitoring
```bash
# CPU usage
lxc-cgroup -n <container> cpu.stat
# Memory usage (cgroup v1; on cgroup v2 hosts read memory.current instead)
lxc-cgroup -n <container> memory.usage_in_bytes
# Network stats
lxc-attach -n <container> -- ip addr show eth0
```
## Python Version Management
### Using update-alternatives
```bash
# Set up alternatives
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.12 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.14 2
# Switch version
sudo update-alternatives --config python3
# Fix pip if needed
python3 -m ensurepip --default-pip --upgrade
```
### Verify Installation
```bash
python3 --version
python3 -m pip --version
```
## Troubleshooting
### pip Module Not Found
After switching Python versions:
```bash
python3 -m ensurepip --default-pip --upgrade
python3 -m pip --version
```
### Service Fails to Start
Check logs for errors:
```bash
sudo journalctl -u watsonx-proxy.service -n 100 --no-pager
```
Common issues:
1. **Inline comments in .env**: Remove all `# comments` from environment variable values
2. **Missing dependencies**: Run `pip install -r requirements.txt`
3. **Permission errors**: Ensure `app` user owns `/home/app/watsonx-openai-proxy`
4. **Port already in use**: Change `PORT` in `.env` or stop conflicting service
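For the port conflict in particular, a quick check can identify what is already bound. This sketch assumes the default port 8000 from `.env`:

```shell
# List any listener already bound to port 8000 (-p shows owning process when run as root)
ss -tlnp 2>/dev/null | grep ':8000 ' || echo "port 8000 is free"
```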
### Token Refresh Errors
Check IBM Cloud credentials:
```bash
# Test token generation
curl -X POST "https://iam.cloud.ibm.com/identity/token" \
    -H "Content-Type: application/x-www-form-urlencoded" \
    -d "grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey=YOUR_API_KEY"
```
### High Memory Usage
Reduce the number of workers:
```bash
# Edit the service file: lower --workers
ExecStart=/usr/bin/python3 -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 1
# Reload unit files, then restart
sudo systemctl daemon-reload
sudo systemctl restart watsonx-proxy.service
```
## Performance Tuning
### Worker Configuration
- **1 worker**: ~50 MB RAM, handles ~5 req/s
- **2 workers**: ~100 MB RAM, handles ~10 req/s
- **4 workers**: ~200 MB RAM, handles ~20 req/s
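Total resident memory scales roughly linearly with worker count; a quick estimate, assuming the ~50 MB/worker baseline above holds for your workload:

```shell
# Rough resident-memory estimate: ~50 MB per uvicorn worker (per the figures above)
workers=2
echo "estimated RAM for $workers workers: $(( workers * 50 )) MB"
```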
### Scaling Strategy
1. **Vertical scaling**: Increase workers up to the number of CPU cores
2. **Horizontal scaling**: Deploy multiple instances behind a load balancer
3. **Auto-scaling**: Monitor CPU/memory and scale based on thresholds
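The vertical-scaling rule of thumb can be computed directly; this sketch caps the worker count at 4, matching the worker figures above:

```shell
# One uvicorn worker per CPU core, capped at 4 for this workload
cores=$(nproc)
if [ "$cores" -lt 4 ]; then workers=$cores; else workers=4; fi
echo "suggested --workers $workers"
```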
## Security Considerations
### API Key Authentication
Enable proxy authentication:
```bash
# In .env file
API_KEY=your_secure_random_key_here
```
### CORS Configuration
Restrict origins:
```bash
# In .env file
ALLOWED_ORIGINS=https://app1.example.com,https://app2.example.com
```
### Firewall Rules
```bash
# Allow only specific IPs
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.0.0.0/8" port protocol="tcp" port="8000" accept'
sudo firewall-cmd --reload
```
## Monitoring
### Health Check
```bash
curl http://localhost:8000/health
```
### Metrics
Monitor these metrics:
- CPU usage (should stay <70%)
- Memory usage (should stay <80%)
- Response times
- Error rates
- Token refresh success rate
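Response times can be sampled with curl's timing variables; this sketch assumes the proxy is on `localhost:8000`:

```shell
# Sample one /health round-trip; fall back to a message if the proxy is down
if latency=$(curl -fsS -o /dev/null -w '%{time_total}' http://localhost:8000/health 2>/dev/null); then
  echo "health latency: ${latency}s"
else
  echo "proxy unreachable on :8000"
fi
```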
### Log Levels
Adjust in `.env` (on its own line, not as an inline comment):
```bash
# For troubleshooting
LOG_LEVEL=debug
# For production
LOG_LEVEL=info
# For minimal logging
LOG_LEVEL=warning
```
## Backup and Recovery
### Backup Configuration
```bash
# Backup .env file
sudo cp /home/app/watsonx-openai-proxy/.env /backup/.env.$(date +%Y%m%d)
# Backup service file
sudo cp /etc/systemd/system/watsonx-proxy.service /backup/
```
### Disaster Recovery
```bash
# Restore configuration
sudo cp /backup/.env.YYYYMMDD /home/app/watsonx-openai-proxy/.env
sudo systemctl restart watsonx-proxy.service
```
## Updates
### Update Application
```bash
cd /home/app/watsonx-openai-proxy
git pull
# Activate the venv first if you use one: source venv/bin/activate
pip install -r requirements.txt --upgrade
sudo systemctl restart watsonx-proxy.service
```
### Zero-Downtime Updates
Use multiple instances behind a load balancer and update one at a time.
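A minimal sketch of that rolling loop; the instance hostnames are hypothetical placeholders, and the real update/health commands are shown as comments:

```shell
# Rolling update: take one instance at a time, verify health before moving on
INSTANCES="proxy-a.internal proxy-b.internal"   # hypothetical hostnames
for host in $INSTANCES; do
  echo "updating $host"
  # ssh "$host" 'cd /home/app/watsonx-openai-proxy && git pull && pip install -r requirements.txt --upgrade'
  # ssh "$host" 'sudo systemctl restart watsonx-proxy.service'
  # curl -fsS "http://$host:8000/health" >/dev/null || { echo "$host unhealthy, aborting"; exit 1; }
done
```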
---
For additional help, see:
- [README.md](README.md) - General usage and features
- [MODELS.md](MODELS.md) - Available models
- [AGENTS.md](AGENTS.md) - Development guidelines