watsonx-openai-proxy/DEPLOYMENT.md

Deployment Guide

This guide covers deploying watsonx-openai-proxy in production environments.

System Requirements

Fedora 43 (or similar RPM-based distributions)

Essential Packages

sudo dnf install -y \
    python3.12 \
    python3-pip \
    git

Optional Build Tools (for compiling Python packages)

sudo dnf install -y \
    python3-devel \
    gcc \
    gcc-c++ \
    make \
    libffi-devel \
    openssl-devel \
    zlib-devel

Note: Most Python packages have pre-built wheels for x86_64 Linux, so build tools are rarely needed.

Installation

1. Clone Repository

cd /home/app
git clone <repository-url> watsonx-openai-proxy
cd watsonx-openai-proxy

2. Install Dependencies

# Using system Python
python3 -m pip install --user -r requirements.txt

# Or using virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
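Either way, pip can report whether the resulting dependency tree is consistent before you continue:

```shell
# Report missing or conflicting dependencies, if any
python3 -m pip check && echo "dependencies OK" || echo "dependency problems found"
```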

3. Configure Environment

Create .env file (copy from .env.example):

cp .env.example .env

IMPORTANT: Remove all inline comments from the .env file. Pydantic cannot parse values that contain inline comments.

Correct format:

IBM_CLOUD_API_KEY=your_api_key_here
WATSONX_PROJECT_ID=your_project_id_here
WATSONX_CLUSTER=us-south
HOST=0.0.0.0
PORT=8000
LOG_LEVEL=info
TOKEN_REFRESH_INTERVAL=3000

Incorrect format (will cause errors):

LOG_LEVEL=info  # Options: debug, info, warning, error
TOKEN_REFRESH_INTERVAL=3000  # Refresh token every N seconds

Systemd Service Setup

1. Create Service Unit

Create /etc/systemd/system/watsonx-proxy.service:

[Unit]
Description=watsonx OpenAI Proxy
After=network.target

[Service]
Type=simple
User=app
Group=app
WorkingDirectory=/home/app/watsonx-openai-proxy
Environment="PATH=/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/home/app/watsonx-openai-proxy/.env
ExecStart=/usr/bin/python3 -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 2
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

If you installed dependencies into the virtual environment from step 2, point ExecStart at it instead: /home/app/watsonx-openai-proxy/venv/bin/python -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 2
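Optionally, the unit can be hardened with a few standard systemd sandboxing directives. This is a sketch, so verify each directive in your environment; ProtectHome is deliberately left out because the service runs from /home/app.

```ini
# Add to the [Service] section of watsonx-proxy.service
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=full
```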

2. Enable and Start Service

# Reload systemd
sudo systemctl daemon-reload

# Enable service (start on boot)
sudo systemctl enable watsonx-proxy.service

# Start service
sudo systemctl start watsonx-proxy.service

# Check status
sudo systemctl status watsonx-proxy.service

3. View Logs

# Follow logs
sudo journalctl -u watsonx-proxy.service -f

# View last 50 lines
sudo journalctl -u watsonx-proxy.service -n 50

4. Service Management

# Stop service
sudo systemctl stop watsonx-proxy.service

# Restart service
sudo systemctl restart watsonx-proxy.service

# Disable auto-start
sudo systemctl disable watsonx-proxy.service

LXC Container Deployment

Minimum Configuration

  • CPU: 1 core (1000 CPU shares)
  • RAM: 2 GB
  • Storage: 10 GB
  • Swap: 1 GB

Recommended Configuration

  • CPU: 2 cores (2000 CPU shares)
  • RAM: 4 GB
  • Storage: 20 GB
  • Swap: 2 GB

Optimal Configuration

  • CPU: 4 cores (4000 CPU shares)
  • RAM: 8 GB
  • Storage: 50 GB
  • Swap: 4 GB

Proxmox LXC Configuration

Edit /etc/pve/lxc/<VMID>.conf:

# CPU allocation
cores: 2
cpulimit: 2
cpuunits: 2000

# Memory allocation
memory: 4096
swap: 2048

# Storage
rootfs: local-lvm:vm-<VMID>-disk-0,size=20G

# Network
net0: name=eth0,bridge=vmbr0,firewall=1,ip=dhcp,type=veth

Resource Monitoring

# CPU usage
lxc-cgroup -n <container> cpu.stat

# Memory usage (cgroup v1; on cgroup v2 hosts the file is memory.current)
lxc-cgroup -n <container> memory.usage_in_bytes

# Network stats (ip is available by default; ifconfig requires net-tools)
lxc-attach -n <container> -- ip -s link show eth0

Python Version Management

Using update-alternatives

# Set up alternatives
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.12 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.14 2

# Switch version
sudo update-alternatives --config python3

# Fix pip if needed
python3 -m ensurepip --default-pip --upgrade

Verify Installation

python3 --version
python3 -m pip --version

Troubleshooting

pip Module Not Found

After switching Python versions:

python3 -m ensurepip --default-pip --upgrade
python3 -m pip --version

Service Fails to Start

Check logs for errors:

sudo journalctl -u watsonx-proxy.service -n 100 --no-pager

Common issues:

  1. Inline comments in .env: Remove all # comments from environment variable values
  2. Missing dependencies: Run pip install -r requirements.txt
  3. Permission errors: Ensure app user owns /home/app/watsonx-openai-proxy
  4. Port already in use: Change PORT in .env or stop conflicting service

Token Refresh Errors

Check IBM Cloud credentials:

# Test token generation
curl -X POST "https://iam.cloud.ibm.com/identity/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey=YOUR_API_KEY"
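If the call succeeds, the JSON response contains an access_token field. As a quick end-to-end check you can extract it with python3; this sketch prints only a short prefix so the full token never lands in your shell history or logs:

```shell
# Fetch an IAM token and print only a short prefix of it
TOKEN=$(curl -s -X POST "https://iam.cloud.ibm.com/identity/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey=${IBM_CLOUD_API_KEY:-}" \
  | python3 -c 'import sys, json; print(json.load(sys.stdin).get("access_token", ""))' 2>/dev/null) || TOKEN=""
if [ -n "$TOKEN" ]; then echo "token starts with: ${TOKEN:0:8}..."; else echo "token request failed"; fi
```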

High Memory Usage

Reduce number of workers:

# Edit service file
ExecStart=/usr/bin/python3 -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 1

# Restart service
sudo systemctl restart watsonx-proxy.service

Performance Tuning

Worker Configuration

  • 1 worker: ~50 MB RAM, handles ~5 req/s
  • 2 workers: ~100 MB RAM, handles ~10 req/s
  • 4 workers: ~200 MB RAM, handles ~20 req/s

Scaling Strategy

  1. Vertical scaling: Increase workers up to the number of CPU cores
  2. Horizontal scaling: Deploy multiple instances behind load balancer
  3. Auto-scaling: Monitor CPU/memory and scale based on thresholds
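The vertical-scaling rule of thumb above (one worker per core) can be computed directly on the host:

```shell
# One uvicorn worker per CPU core is a reasonable starting point
WORKERS=$(nproc)
echo "suggested: --workers $WORKERS"
```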

Security Considerations

API Key Authentication

Enable proxy authentication:

# In .env file
API_KEY=your_secure_random_key_here
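A suitably random key can be generated with openssl; assuming the proxy simply compares API_KEY against the incoming request header, any long random string works:

```shell
# Generate a 64-character hex key for API_KEY
openssl rand -hex 32
```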

CORS Configuration

Restrict origins:

# In .env file
ALLOWED_ORIGINS=https://app1.example.com,https://app2.example.com

Firewall Rules

# Allow only specific IPs
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.0.0.0/8" port protocol="tcp" port="8000" accept'
sudo firewall-cmd --reload

Monitoring

Health Check

curl http://localhost:8000/health
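The same endpoint can back a small watchdog, e.g. from a cron job. This is a sketch: check_health is a helper name introduced here, and the default port should match PORT in .env.

```shell
# Return success only if the proxy answers /health within 5 seconds
check_health() {
  curl -fsS --max-time 5 "http://localhost:${1:-8000}/health" >/dev/null 2>&1
}

check_health || echo "proxy unhealthy"
```

On failure you could follow up with sudo systemctl restart watsonx-proxy.service instead of the echo.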

Metrics

Monitor these metrics:

  • CPU usage (should stay <70%)
  • Memory usage (should stay <80%)
  • Response times
  • Error rates
  • Token refresh success rate

Log Levels

Adjust in .env:

LOG_LEVEL=debug  # For troubleshooting
LOG_LEVEL=info   # For production
LOG_LEVEL=warning  # For minimal logging

Backup and Recovery

Backup Configuration

# Backup .env file
sudo cp /home/app/watsonx-openai-proxy/.env /backup/.env.$(date +%Y%m%d)

# Backup service file
sudo cp /etc/systemd/system/watsonx-proxy.service /backup/

Disaster Recovery

# Restore configuration
sudo cp /backup/.env.YYYYMMDD /home/app/watsonx-openai-proxy/.env
sudo systemctl restart watsonx-proxy.service

Updates

Update Application

cd /home/app/watsonx-openai-proxy
git pull
pip install -r requirements.txt --upgrade
sudo systemctl restart watsonx-proxy.service

Zero-Downtime Updates

Use multiple instances behind a load balancer and update one at a time.


For additional help, see: