From debfb466ad1821a6e5ebbe6938dd01e72f1196ea Mon Sep 17 00:00:00 2001
From: Michael
Date: Mon, 23 Feb 2026 11:14:40 -0500
Subject: [PATCH] Add comprehensive deployment guide with systemd service setup and LXC configuration
---
 DEPLOYMENT.md | 379 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 379 insertions(+)
 create mode 100644 DEPLOYMENT.md

diff --git a/DEPLOYMENT.md b/DEPLOYMENT.md
new file mode 100644
index 0000000..8f9c168
--- /dev/null
+++ b/DEPLOYMENT.md
@@ -0,0 +1,379 @@

# Deployment Guide

This guide covers deploying watsonx-openai-proxy in production environments.

## System Requirements

### Fedora 43 (or similar RPM-based distributions)

#### Essential Packages

```bash
sudo dnf install -y \
    python3.12 \
    python3-pip \
    git
```

#### Optional Build Tools (for compiling Python packages)

```bash
sudo dnf install -y \
    python3-devel \
    gcc \
    gcc-c++ \
    make \
    libffi-devel \
    openssl-devel \
    zlib-devel
```

**Note**: Most Python packages have pre-built wheels for x86_64 Linux, so build tools are rarely needed.

## Installation

### 1. Clone Repository

```bash
cd /home/app
git clone watsonx-openai-proxy
cd watsonx-openai-proxy
```

### 2. Install Dependencies

```bash
# Using system Python
python3 -m pip install --user -r requirements.txt

# Or using virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

### 3. Configure Environment

Create `.env` file (copy from `.env.example`):

```bash
cp .env.example .env
```

**IMPORTANT**: Remove all inline comments from the `.env` file. Pydantic cannot parse values with inline comments.
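If an existing `.env` already carries such comments, one way to strip them is a quick `sed` pass. This is a sketch only: it assumes no value legitimately contains a `#` character, and the `/tmp/demo.env` file is purely illustrative.

```shell
# Create a sample .env with the problematic inline comments.
cat > /tmp/demo.env <<'EOF'
LOG_LEVEL=info # Options: debug, info, warning, error
TOKEN_REFRESH_INTERVAL=3000 # Refresh token every N seconds
PORT=8000
EOF

# Delete everything from the first '#' to end of line, then drop
# any lines left empty. Assumes no value itself contains '#'.
sed -i -e 's/[[:space:]]*#.*$//' -e '/^[[:space:]]*$/d' /tmp/demo.env

cat /tmp/demo.env
```

Review the result before restarting the service; any value that did contain a `#` (unusual for this configuration) would be truncated.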
**Correct format:**

```bash
IBM_CLOUD_API_KEY=your_api_key_here
WATSONX_PROJECT_ID=your_project_id_here
WATSONX_CLUSTER=us-south
HOST=0.0.0.0
PORT=8000
LOG_LEVEL=info
TOKEN_REFRESH_INTERVAL=3000
```

**Incorrect format (will cause errors):**

```bash
LOG_LEVEL=info # Options: debug, info, warning, error
TOKEN_REFRESH_INTERVAL=3000 # Refresh token every N seconds
```

## Systemd Service Setup

### 1. Create Service Unit

Create `/etc/systemd/system/watsonx-proxy.service`:

```ini
[Unit]
Description=watsonx OpenAI Proxy
After=network.target

[Service]
Type=simple
User=app
Group=app
WorkingDirectory=/home/app/watsonx-openai-proxy
Environment="PATH=/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/home/app/watsonx-openai-proxy/.env
ExecStart=/usr/bin/python3 -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 2
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

### 2. Enable and Start Service

```bash
# Reload systemd
sudo systemctl daemon-reload

# Enable service (start on boot)
sudo systemctl enable watsonx-proxy.service

# Start service
sudo systemctl start watsonx-proxy.service

# Check status
sudo systemctl status watsonx-proxy.service
```

### 3. View Logs

```bash
# Follow logs
sudo journalctl -u watsonx-proxy.service -f

# View last 50 lines
sudo journalctl -u watsonx-proxy.service -n 50
```
### 4. Service Management

```bash
# Stop service
sudo systemctl stop watsonx-proxy.service

# Restart service
sudo systemctl restart watsonx-proxy.service

# Disable auto-start
sudo systemctl disable watsonx-proxy.service
```

## LXC Container Deployment

### Recommended Resources (5 req/s)

#### Minimum Configuration
- **CPU**: 1 core (1000 CPU shares)
- **RAM**: 2 GB
- **Storage**: 10 GB
- **Swap**: 1 GB

#### Recommended Configuration
- **CPU**: 2 cores (2000 CPU shares)
- **RAM**: 4 GB
- **Storage**: 20 GB
- **Swap**: 2 GB

#### Optimal Configuration
- **CPU**: 4 cores (4000 CPU shares)
- **RAM**: 8 GB
- **Storage**: 50 GB
- **Swap**: 4 GB

### Proxmox LXC Configuration

Edit `/etc/pve/lxc/.conf`:

```ini
# CPU allocation
cores: 2
cpulimit: 2
cpuunits: 2000

# Memory allocation
memory: 4096
swap: 2048

# Storage
rootfs: local-lvm:vm--disk-0,size=20G

# Network
net0: name=eth0,bridge=vmbr0,firewall=1,ip=dhcp,type=veth
```

### Resource Monitoring

```bash
# CPU usage
lxc-cgroup -n cpu.stat

# Memory usage
lxc-cgroup -n memory.usage_in_bytes

# Network stats
lxc-attach -n -- ifconfig eth0
```

## Python Version Management

### Using update-alternatives

```bash
# Set up alternatives
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.12 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.14 2

# Switch version
sudo update-alternatives --config python3

# Fix pip if needed
python3 -m ensurepip --default-pip --upgrade
```

### Verify Installation

```bash
python3 --version
python3 -m pip --version
```

## Troubleshooting

### pip Module Not Found

After switching Python versions:

```bash
python3 -m ensurepip --default-pip --upgrade
python3 -m pip --version
```

### Service Fails to Start

Check logs for errors:

```bash
sudo journalctl -u watsonx-proxy.service -n 100 --no-pager
```

Common issues:
1. **Inline comments in .env**: Remove all `# comments` from environment variable values
2. **Missing dependencies**: Run `pip install -r requirements.txt`
3. **Permission errors**: Ensure the `app` user owns `/home/app/watsonx-openai-proxy`
4. **Port already in use**: Change `PORT` in `.env` or stop the conflicting service

### Token Refresh Errors

Check IBM Cloud credentials:

```bash
# Test token generation
curl -X POST "https://iam.cloud.ibm.com/identity/token" \
    -H "Content-Type: application/x-www-form-urlencoded" \
    -d "grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey=YOUR_API_KEY"
```

### High Memory Usage

Reduce the number of workers:

```bash
# Edit service file
ExecStart=/usr/bin/python3 -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 1

# Restart service
sudo systemctl restart watsonx-proxy.service
```

## Performance Tuning

### Worker Configuration

- **1 worker**: ~50 MB RAM, handles ~5 req/s
- **2 workers**: ~100 MB RAM, handles ~10 req/s
- **4 workers**: ~200 MB RAM, handles ~20 req/s

### Scaling Strategy

1. **Vertical scaling**: Increase workers up to the number of CPU cores
2. **Horizontal scaling**: Deploy multiple instances behind a load balancer
3. **Auto-scaling**: Monitor CPU/memory and scale based on thresholds

## Security Considerations

### API Key Authentication

Enable proxy authentication:

```bash
# In .env file
API_KEY=your_secure_random_key_here
```

### CORS Configuration

Restrict origins:

```bash
# In .env file
ALLOWED_ORIGINS=https://app1.example.com,https://app2.example.com
```

### Firewall Rules

```bash
# Allow only specific IPs
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.0.0.0/8" port protocol="tcp" port="8000" accept'
sudo firewall-cmd --reload
```

## Monitoring

### Health Check

```bash
curl http://localhost:8000/health
```

### Metrics

Monitor these metrics:
- CPU usage (should stay <70%)
- Memory usage (should stay <80%)
- Response times
- Error rates
- Token refresh success rate

### Log Levels

Adjust in `.env` — the comments below are explanatory only; per the note above, omit them in the actual file:

```bash
LOG_LEVEL=debug    # For troubleshooting
LOG_LEVEL=info     # For production
LOG_LEVEL=warning  # For minimal logging
```

## Backup and Recovery

### Backup Configuration

```bash
# Backup .env file
sudo cp /home/app/watsonx-openai-proxy/.env /backup/.env.$(date +%Y%m%d)

# Backup service file
sudo cp /etc/systemd/system/watsonx-proxy.service /backup/
```

### Disaster Recovery

```bash
# Restore configuration
sudo cp /backup/.env.YYYYMMDD /home/app/watsonx-openai-proxy/.env
sudo systemctl restart watsonx-proxy.service
```

## Updates

### Update Application

```bash
cd /home/app/watsonx-openai-proxy
git pull
pip install -r requirements.txt --upgrade
sudo systemctl restart watsonx-proxy.service
```

### Zero-Downtime Updates

Use multiple instances behind a load balancer and update one at a time.

---

For additional help, see:
- [README.md](README.md) - General usage and features
- [MODELS.md](MODELS.md) - Available models
- [AGENTS.md](AGENTS.md) - Development guidelines
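As a sketch of the multi-instance setup that the Zero-Downtime Updates section assumes: a systemd template unit can run one proxy per port on a single host. The unit name `watsonx-proxy@.service` and the per-port instances are illustrative assumptions, not part of the project.

```ini
# /etc/systemd/system/watsonx-proxy@.service
# Instantiate one copy per port, e.g.:
#   sudo systemctl enable --now watsonx-proxy@8000 watsonx-proxy@8001
# then restart instances one at a time while the load balancer
# health-checks GET /health on each port.
[Unit]
Description=watsonx OpenAI Proxy (port %i)
After=network.target

[Service]
Type=simple
User=app
Group=app
WorkingDirectory=/home/app/watsonx-openai-proxy
EnvironmentFile=/home/app/watsonx-openai-proxy/.env
ExecStart=/usr/bin/python3 -m uvicorn app.main:app --host 0.0.0.0 --port %i --workers 1
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

Here `%i` is the systemd instance name (the text after `@`), so the `--port %i` flag controls each instance's bind port and both instances can share one `.env`.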