Deployment Guide
This guide covers deploying watsonx-openai-proxy in production environments.
System Requirements
Fedora 43 (or similar RPM-based distributions)
Essential Packages
sudo dnf install -y \
python3.12 \
python3-pip \
git
Optional Build Tools (for compiling Python packages)
sudo dnf install -y \
python3-devel \
gcc \
gcc-c++ \
make \
libffi-devel \
openssl-devel \
zlib-devel
Note: Most Python packages have pre-built wheels for x86_64 Linux, so build tools are rarely needed.
Installation
1. Clone Repository
cd /home/app
git clone <repository-url> watsonx-openai-proxy
cd watsonx-openai-proxy
2. Install Dependencies
# Using system Python
python3 -m pip install --user -r requirements.txt
# Or using virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
3. Configure Environment
Create .env file (copy from .env.example):
cp .env.example .env
CRITICAL: The .env file must NOT contain inline comments. Pydantic cannot parse environment variable values that carry inline comments, so the service fails to start with validation errors.
Correct format (no inline comments):
IBM_CLOUD_API_KEY=your_api_key_here
WATSONX_PROJECT_ID=your_project_id_here
WATSONX_CLUSTER=us-south
HOST=0.0.0.0
PORT=8000
LOG_LEVEL=info
TOKEN_REFRESH_INTERVAL=3000
# Model mappings (optional)
MODEL_MAP_GPT4=openai/gpt-oss-120b
MODEL_MAP_GPT4_TURBO=meta-llama/llama-3-3-70b-instruct
MODEL_MAP_GPT4_TURBO_PREVIEW=mistral-large-2512
Incorrect format (will cause service startup failure):
# ❌ DO NOT USE INLINE COMMENTS - Service will fail to start
LOG_LEVEL=info # Options: debug, info, warning, error
TOKEN_REFRESH_INTERVAL=3000 # Refresh token every N seconds
Error you'll see if inline comments are present:
pydantic_core._pydantic_core.ValidationError: 1 validation error for Settings
token_refresh_interval
Input should be a valid integer, unable to parse string as an integer
To fix existing .env files with inline comments:
# Strip inline comments (note: this also blanks full-line comments)
sed -i 's/\s*#.*$//' /home/app/watsonx-openai-proxy/.env
# Or manually edit and remove everything after the value on each line
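Before restarting the service, you can scan a .env file for offending lines. The helper below is a sketch (the check_env_comments name is not part of the project); it flags any value that is followed by a #:

```shell
# check_env_comments FILE — list lines whose value carries an inline comment,
# which breaks pydantic parsing (helper name is illustrative)
check_env_comments() {
  grep -nE '^[A-Za-z_][A-Za-z0-9_]*=[^#]+#' "$1"
}
# Usage: check_env_comments /home/app/watsonx-openai-proxy/.env
# Exit status 0 means offending lines were found and printed with line numbers.
```

Full-line comments (lines starting with #) are not flagged, since only inline comments after a value break parsing.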
Systemd Service Setup
1. Create Service Unit
Create /etc/systemd/system/watsonx-proxy.service:
[Unit]
Description=watsonx OpenAI Proxy
After=network.target
[Service]
Type=simple
User=app
Group=app
WorkingDirectory=/home/app/watsonx-openai-proxy
Environment="PATH=/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/home/app/watsonx-openai-proxy/.env
ExecStart=/usr/bin/python3 -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 2
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
2. Enable and Start Service
# Reload systemd
sudo systemctl daemon-reload
# Enable service (start on boot)
sudo systemctl enable watsonx-proxy.service
# Start service
sudo systemctl start watsonx-proxy.service
# Check status
sudo systemctl status watsonx-proxy.service
3. View Logs
# Follow logs
sudo journalctl -u watsonx-proxy.service -f
# View last 50 lines
sudo journalctl -u watsonx-proxy.service -n 50
4. Service Management
# Stop service
sudo systemctl stop watsonx-proxy.service
# Restart service
sudo systemctl restart watsonx-proxy.service
# Disable auto-start
sudo systemctl disable watsonx-proxy.service
LXC Container Deployment
Recommended Resources (5 req/s)
Minimum Configuration
- CPU: 1 core (1000 CPU shares)
- RAM: 2 GB
- Storage: 10 GB
- Swap: 1 GB
Recommended Configuration
- CPU: 2 cores (2000 CPU shares)
- RAM: 4 GB
- Storage: 20 GB
- Swap: 2 GB
Optimal Configuration
- CPU: 4 cores (4000 CPU shares)
- RAM: 8 GB
- Storage: 50 GB
- Swap: 4 GB
Proxmox LXC Configuration
Edit /etc/pve/lxc/<VMID>.conf:
# CPU allocation
cores: 2
cpulimit: 2
cpuunits: 2000
# Memory allocation
memory: 4096
swap: 2048
# Storage
rootfs: local-lvm:vm-<VMID>-disk-0,size=20G
# Network
net0: name=eth0,bridge=vmbr0,firewall=1,ip=dhcp,type=veth
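If you prefer not to edit the file by hand, the same limits can be applied from the Proxmox host with the pct CLI; a sketch mirroring the recommended configuration above (substitute your container's numeric <VMID>):

```shell
# Apply the recommended CPU/memory/swap limits (pct ships with Proxmox VE)
pct set <VMID> --cores 2 --memory 4096 --swap 2048
```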
Resource Monitoring
# CPU usage
lxc-cgroup -n <container> cpu.stat
# Memory usage (cgroup v1; on cgroup v2 hosts read memory.current instead)
lxc-cgroup -n <container> memory.usage_in_bytes
# Network stats (ip replaces the deprecated ifconfig)
lxc-attach -n <container> -- ip -s addr show eth0
Python Version Management
Using update-alternatives
# Set up alternatives
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.12 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.14 2
# Switch version
sudo update-alternatives --config python3
# Fix pip if needed
python3 -m ensurepip --default-pip --upgrade
Verify Installation
python3 --version
python3 -m pip --version
Troubleshooting
pip Module Not Found
After switching Python versions:
python3 -m ensurepip --default-pip --upgrade
python3 -m pip --version
Service Fails to Start
Check logs for errors:
sudo journalctl -u watsonx-proxy.service -n 100 --no-pager
Common issues:
- Inline comments in .env: Remove all # comments from environment variable values
- Missing dependencies: Run pip install -r requirements.txt
- Permission errors: Ensure the app user owns /home/app/watsonx-openai-proxy
- Port already in use: Change PORT in .env or stop the conflicting service
Token Refresh Errors
Check IBM Cloud credentials:
# Test token generation
curl -X POST "https://iam.cloud.ibm.com/identity/token" \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey=YOUR_API_KEY"
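A successful reply is a JSON document with the bearer token in its access_token field. A small helper can pull it out of a saved response for further testing; the function name, and the idea of saving the reply to a file, are illustrative rather than part of the proxy:

```shell
# token_from_response FILE — extract access_token from a saved IAM JSON reply
# (helper name is illustrative, not part of the project)
token_from_response() {
  python3 -c 'import json, sys; print(json.load(open(sys.argv[1]))["access_token"])' "$1"
}
# Usage: curl ... -o /tmp/token.json && token_from_response /tmp/token.json
```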
High Memory Usage
Reduce number of workers:
# Edit service file
ExecStart=/usr/bin/python3 -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 1
# Restart service
sudo systemctl restart watsonx-proxy.service
Performance Tuning
Worker Configuration
- 1 worker: ~50 MB RAM, handles ~5 req/s
- 2 workers: ~100 MB RAM, handles ~10 req/s
- 4 workers: ~200 MB RAM, handles ~20 req/s
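A rule of thumb consistent with the table above is one worker per CPU core, capped at four; a quick shell sketch:

```shell
# Pick a worker count: one uvicorn worker per core, capped at 4
cores=$(nproc)
workers=$(( cores < 4 ? cores : 4 ))
echo "suggested uvicorn --workers $workers"
```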
Scaling Strategy
- Vertical scaling: Increase workers up to number of CPU cores
- Horizontal scaling: Deploy multiple instances behind load balancer
- Auto-scaling: Monitor CPU/memory and scale based on thresholds
Security Considerations
API Key Authentication
Enable proxy authentication:
# In .env file
API_KEY=your_secure_random_key_here
CORS Configuration
Restrict origins:
# In .env file
ALLOWED_ORIGINS=https://app1.example.com,https://app2.example.com
Firewall Rules
# Allow only specific IPs
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.0.0.0/8" port protocol="tcp" port="8000" accept'
sudo firewall-cmd --reload
Monitoring
Health Check
curl http://localhost:8000/health
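For unattended checks (for example right after a restart), the health endpoint can be polled until it responds. Only the /health path comes from this guide; the helper name and retry policy below are made up:

```shell
# wait_healthy URL [RETRIES] — poll an endpoint until it answers successfully
# (helper name and retry policy are illustrative, not part of the project)
wait_healthy() {
  retries=${2:-30}
  i=0
  while [ "$i" -lt "$retries" ]; do
    if curl -fsS --max-time 2 "$1" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
# Usage: wait_healthy http://localhost:8000/health && echo "proxy is up"
```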
Metrics
Monitor these metrics:
- CPU usage (should stay <70%)
- Memory usage (should stay <80%)
- Response times
- Error rates
- Token refresh success rate
Log Levels
Adjust LOG_LEVEL in .env (no inline comments, per the warning above):
LOG_LEVEL=info
Use debug while troubleshooting, info in production, and warning for minimal logging.
Backup and Recovery
Backup Configuration
# Backup .env file
sudo cp /home/app/watsonx-openai-proxy/.env /backup/.env.$(date +%Y%m%d)
# Backup service file
sudo cp /etc/systemd/system/watsonx-proxy.service /backup/
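The cp command above can be wrapped in a small function so the datestamp suffix stays consistent between backup and restore; the wrapper name is an illustration:

```shell
# backup_env SRC DESTDIR — copy a .env file with a datestamp suffix
# (wrapper name is illustrative; paths mirror the commands above)
backup_env() {
  cp "$1" "$2/.env.$(date +%Y%m%d)"
}
# Usage: sudo backup_env /home/app/watsonx-openai-proxy/.env /backup
```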
Disaster Recovery
# Restore configuration
sudo cp /backup/.env.YYYYMMDD /home/app/watsonx-openai-proxy/.env
sudo systemctl restart watsonx-proxy.service
Updates
Update Application
cd /home/app/watsonx-openai-proxy
git pull
pip install -r requirements.txt --upgrade
sudo systemctl restart watsonx-proxy.service
Zero-Downtime Updates
Use multiple instances behind a load balancer and update one at a time.
For additional help, see: