Deployment Guide
This guide covers deploying watsonx-openai-proxy in production environments.
System Requirements
Fedora 43 (or similar RPM-based distributions)
Essential Packages
sudo dnf install -y \
python3.12 \
python3-pip \
git
Optional Build Tools (for compiling Python packages)
sudo dnf install -y \
python3-devel \
gcc \
gcc-c++ \
make \
libffi-devel \
openssl-devel \
zlib-devel
Note: Most Python packages have pre-built wheels for x86_64 Linux, so build tools are rarely needed.
Installation
1. Clone Repository
cd /home/app
git clone <repository-url> watsonx-openai-proxy
cd watsonx-openai-proxy
2. Install Dependencies
# Using system Python
python3 -m pip install --user -r requirements.txt
# Or using virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
3. Configure Environment
Create .env file (copy from .env.example):
cp .env.example .env
CRITICAL: The .env file must NOT contain inline comments. Pydantic cannot parse environment variable values that carry inline comments, so the service fails to start with validation errors.
Correct format (no inline comments):
IBM_CLOUD_API_KEY=your_api_key_here
WATSONX_PROJECT_ID=your_project_id_here
WATSONX_CLUSTER=us-south
HOST=0.0.0.0
PORT=8000
LOG_LEVEL=info
TOKEN_REFRESH_INTERVAL=3000
# Model mappings (optional)
MODEL_MAP_GPT4=openai/gpt-oss-120b
MODEL_MAP_GPT4_TURBO=meta-llama/llama-3-3-70b-instruct
MODEL_MAP_GPT4_TURBO_PREVIEW=mistral-large-2512
Incorrect format (will cause service startup failure):
# ❌ DO NOT USE INLINE COMMENTS - Service will fail to start
LOG_LEVEL=info # Options: debug, info, warning, error
TOKEN_REFRESH_INTERVAL=3000 # Refresh token every N seconds
Error you'll see if inline comments are present:
pydantic_core._pydantic_core.ValidationError: 1 validation error for Settings
token_refresh_interval
Input should be a valid integer, unable to parse string as an integer
To fix existing .env files with inline comments:
# Strip inline comments (note: this also blanks full-line comments)
sed -i 's/\s*#.*$//' /home/app/watsonx-openai-proxy/.env
# Or manually edit and remove everything after the value on each line
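Before restarting the service, you can scan a .env file for offending lines. The helper below is a sketch (the check_env_comments name is not part of the project); it flags any value that is followed by a #:

```shell
# check_env_comments FILE — list lines whose value carries an inline comment,
# which breaks pydantic parsing (helper name is illustrative)
check_env_comments() {
  grep -nE '^[A-Za-z_][A-Za-z0-9_]*=[^#]+#' "$1"
}
# Usage: check_env_comments /home/app/watsonx-openai-proxy/.env
# Exit status 0 means offending lines were found and printed with line numbers.
```

Full-line comments (lines starting with #) are not flagged, since only inline comments after a value break parsing.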
Systemd Service Setup
1. Create Service Unit
Create /etc/systemd/system/watsonx-proxy.service:
[Unit]
Description=watsonx OpenAI Proxy
After=network.target
[Service]
Type=simple
User=app
Group=app
WorkingDirectory=/home/app/watsonx-openai-proxy
Environment="PATH=/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/home/app/watsonx-openai-proxy/.env
ExecStart=/usr/bin/python3 -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 2
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
2. Enable and Start Service
# Reload systemd
sudo systemctl daemon-reload
# Enable service (start on boot)
sudo systemctl enable watsonx-proxy.service
# Start service
sudo systemctl start watsonx-proxy.service
# Check status
sudo systemctl status watsonx-proxy.service
3. View Logs
# Follow logs
sudo journalctl -u watsonx-proxy.service -f
# View last 50 lines
sudo journalctl -u watsonx-proxy.service -n 50
4. Service Management
# Stop service
sudo systemctl stop watsonx-proxy.service
# Restart service
sudo systemctl restart watsonx-proxy.service
# Disable auto-start
sudo systemctl disable watsonx-proxy.service
LXC Container Deployment
Recommended Resources (5 req/s)
Minimum Configuration
- CPU: 1 core (1000 CPU shares)
- RAM: 2 GB
- Storage: 10 GB
- Swap: 1 GB
Recommended Configuration
- CPU: 2 cores (2000 CPU shares)
- RAM: 4 GB
- Storage: 20 GB
- Swap: 2 GB
Optimal Configuration
- CPU: 4 cores (4000 CPU shares)
- RAM: 8 GB
- Storage: 50 GB
- Swap: 4 GB
Proxmox LXC Configuration
Edit /etc/pve/lxc/<VMID>.conf:
# CPU allocation
cores: 2
cpulimit: 2
cpuunits: 2000
# Memory allocation
memory: 4096
swap: 2048
# Storage
rootfs: local-lvm:vm-<VMID>-disk-0,size=20G
# Network
net0: name=eth0,bridge=vmbr0,firewall=1,ip=dhcp,type=veth
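If you prefer not to edit the file by hand, the same limits can be applied from the Proxmox host with the pct CLI; a sketch mirroring the recommended configuration above (substitute your container's numeric <VMID>):

```shell
# Apply the recommended CPU/memory/swap limits (pct ships with Proxmox VE)
pct set <VMID> --cores 2 --memory 4096 --swap 2048
```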
Resource Monitoring
# CPU usage
lxc-cgroup -n <container> cpu.stat
# Memory usage (cgroup v1; on cgroup v2 hosts read memory.current instead)
lxc-cgroup -n <container> memory.usage_in_bytes
# Network stats (ip replaces the deprecated ifconfig)
lxc-attach -n <container> -- ip -s addr show eth0
Python Version Management
Using update-alternatives
# Set up alternatives
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.12 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.14 2
# Switch version
sudo update-alternatives --config python3
# Fix pip if needed
python3 -m ensurepip --default-pip --upgrade
Verify Installation
python3 --version
python3 -m pip --version
Troubleshooting
pip Module Not Found
After switching Python versions:
python3 -m ensurepip --default-pip --upgrade
python3 -m pip --version
Service Fails to Start
Check logs for errors:
sudo journalctl -u watsonx-proxy.service -n 100 --no-pager
Common issues:
- Inline comments in .env: Remove all # comments from environment variable values
- Missing dependencies: Run pip install -r requirements.txt
- Permission errors: Ensure the app user owns /home/app/watsonx-openai-proxy
- Port already in use: Change PORT in .env or stop the conflicting service
Token Refresh Errors
Check IBM Cloud credentials:
# Test token generation
curl -X POST "https://iam.cloud.ibm.com/identity/token" \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey=YOUR_API_KEY"
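A successful reply is a JSON document with the bearer token in its access_token field. A small helper can pull it out of a saved response for further testing; the function name, and the idea of saving the reply to a file, are illustrative rather than part of the proxy:

```shell
# token_from_response FILE — extract access_token from a saved IAM JSON reply
# (helper name is illustrative, not part of the project)
token_from_response() {
  python3 -c 'import json, sys; print(json.load(open(sys.argv[1]))["access_token"])' "$1"
}
# Usage: curl ... -o /tmp/token.json && token_from_response /tmp/token.json
```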
High Memory Usage
Reduce number of workers:
# Edit service file
ExecStart=/usr/bin/python3 -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 1
# Restart service
sudo systemctl restart watsonx-proxy.service
Performance Tuning
Worker Configuration
- 1 worker: ~50 MB RAM, handles ~5 req/s
- 2 workers: ~100 MB RAM, handles ~10 req/s
- 4 workers: ~200 MB RAM, handles ~20 req/s
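A rule of thumb consistent with the table above is one worker per CPU core, capped at four; a quick shell sketch:

```shell
# Pick a worker count: one uvicorn worker per core, capped at 4
cores=$(nproc)
workers=$(( cores < 4 ? cores : 4 ))
echo "suggested uvicorn --workers $workers"
```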
Scaling Strategy
- Vertical scaling: Increase workers up to number of CPU cores
- Horizontal scaling: Deploy multiple instances behind load balancer
- Auto-scaling: Monitor CPU/memory and scale based on thresholds
Security Considerations
API Key Authentication
Enable proxy authentication:
# In .env file
API_KEY=your_secure_random_key_here
CORS Configuration
Restrict origins:
# In .env file
ALLOWED_ORIGINS=https://app1.example.com,https://app2.example.com
Firewall Rules
# Allow only specific IPs
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.0.0.0/8" port protocol="tcp" port="8000" accept'
sudo firewall-cmd --reload
Monitoring
Health Check
curl http://localhost:8000/health
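For unattended checks (for example right after a restart), the health endpoint can be polled until it responds. Only the /health path comes from this guide; the helper name and retry policy below are made up:

```shell
# wait_healthy URL [RETRIES] — poll an endpoint until it answers successfully
# (helper name and retry policy are illustrative, not part of the project)
wait_healthy() {
  retries=${2:-30}
  i=0
  while [ "$i" -lt "$retries" ]; do
    if curl -fsS --max-time 2 "$1" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
# Usage: wait_healthy http://localhost:8000/health && echo "proxy is up"
```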
Metrics
Monitor these metrics:
- CPU usage (should stay <70%)
- Memory usage (should stay <80%)
- Response times
- Error rates
- Token refresh success rate
Log Levels
Adjust LOG_LEVEL in .env (no inline comments, per the warning above):
LOG_LEVEL=info
Use debug while troubleshooting, info in production, and warning for minimal logging.
Backup and Recovery
Backup Configuration
# Backup .env file
sudo cp /home/app/watsonx-openai-proxy/.env /backup/.env.$(date +%Y%m%d)
# Backup service file
sudo cp /etc/systemd/system/watsonx-proxy.service /backup/
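The cp command above can be wrapped in a small function so the datestamp suffix stays consistent between backup and restore; the wrapper name is an illustration:

```shell
# backup_env SRC DESTDIR — copy a .env file with a datestamp suffix
# (wrapper name is illustrative; paths mirror the commands above)
backup_env() {
  cp "$1" "$2/.env.$(date +%Y%m%d)"
}
# Usage: sudo backup_env /home/app/watsonx-openai-proxy/.env /backup
```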
Disaster Recovery
# Restore configuration
sudo cp /backup/.env.YYYYMMDD /home/app/watsonx-openai-proxy/.env
sudo systemctl restart watsonx-proxy.service
Updates
Update Application
cd /home/app/watsonx-openai-proxy
git pull
pip install -r requirements.txt --upgrade
sudo systemctl restart watsonx-proxy.service
Zero-Downtime Updates
Use multiple instances behind a load balancer and update one at a time.
For additional help, see: