# Deployment Guide

This guide covers deploying watsonx-openai-proxy in production environments.

## System Requirements

### Fedora 43 (or similar RPM-based distributions)

#### Essential Packages

```bash
sudo dnf install -y \
  python3.12 \
  python3-pip \
  git
```

#### Optional Build Tools (for compiling Python packages)

```bash
sudo dnf install -y \
  python3-devel \
  gcc \
  gcc-c++ \
  make \
  libffi-devel \
  openssl-devel \
  zlib-devel
```

**Note**: Most Python packages ship pre-built wheels for x86_64 Linux, so the build tools are rarely needed.

## Installation

### 1. Clone Repository

```bash
cd /home/app
git clone <repository-url> watsonx-openai-proxy
cd watsonx-openai-proxy
```

### 2. Install Dependencies

```bash
# Using system Python
python3 -m pip install --user -r requirements.txt

# Or using a virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

### 3. Configure Environment

Create a `.env` file (copy from `.env.example`):

```bash
cp .env.example .env
```

**IMPORTANT**: Remove all inline comments from the `.env` file. Pydantic cannot parse values that carry inline comments.

**Correct format:**

```bash
IBM_CLOUD_API_KEY=your_api_key_here
WATSONX_PROJECT_ID=your_project_id_here
WATSONX_CLUSTER=us-south
HOST=0.0.0.0
PORT=8000
LOG_LEVEL=info
TOKEN_REFRESH_INTERVAL=3000
```

**Incorrect format (will cause errors):**

```bash
LOG_LEVEL=info  # Options: debug, info, warning, error
TOKEN_REFRESH_INTERVAL=3000  # Refresh token every N seconds
```
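The inline-comment rule can be verified before the service is started. A minimal check, assuming simple uppercase `KEY=VALUE` lines as in the examples above (the `check_env` helper name is ours, not part of the project):

```bash
# Flag .env lines that carry an inline "# comment", which the settings
# loader rejects. Prints the offending lines with their line numbers.
check_env() {
  local file="${1:-.env}"
  if grep -nE '^[A-Za-z_][A-Za-z0-9_]*=.*[[:space:]]#' "$file"; then
    echo "remove the inline comments on the lines above" >&2
    return 1
  fi
  echo "$file: no inline comments found"
}
```

Run `check_env .env` after editing the file; a non-zero exit status means at least one value still has a trailing comment.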
## Systemd Service Setup

### 1. Create Service Unit

Create `/etc/systemd/system/watsonx-proxy.service`:

```ini
[Unit]
Description=watsonx OpenAI Proxy
After=network.target

[Service]
Type=simple
User=app
Group=app
WorkingDirectory=/home/app/watsonx-openai-proxy
Environment="PATH=/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/home/app/watsonx-openai-proxy/.env
ExecStart=/usr/bin/python3 -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 2
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

### 2. Enable and Start Service

```bash
# Reload systemd
sudo systemctl daemon-reload

# Enable service (start on boot)
sudo systemctl enable watsonx-proxy.service

# Start service
sudo systemctl start watsonx-proxy.service

# Check status
sudo systemctl status watsonx-proxy.service
```

### 3. View Logs

```bash
# Follow logs
sudo journalctl -u watsonx-proxy.service -f

# View last 50 lines
sudo journalctl -u watsonx-proxy.service -n 50
```

### 4. Service Management

```bash
# Stop service
sudo systemctl stop watsonx-proxy.service

# Restart service
sudo systemctl restart watsonx-proxy.service

# Disable auto-start
sudo systemctl disable watsonx-proxy.service
```

## LXC Container Deployment

### Recommended Resources (5 req/s)

#### Minimum Configuration

- **CPU**: 1 core (1000 CPU shares)
- **RAM**: 2 GB
- **Storage**: 10 GB
- **Swap**: 1 GB

#### Recommended Configuration

- **CPU**: 2 cores (2000 CPU shares)
- **RAM**: 4 GB
- **Storage**: 20 GB
- **Swap**: 2 GB

#### Optimal Configuration

- **CPU**: 4 cores (4000 CPU shares)
- **RAM**: 8 GB
- **Storage**: 50 GB
- **Swap**: 4 GB

### Proxmox LXC Configuration

Edit `/etc/pve/lxc/<CTID>.conf` (replace `<CTID>` with your container ID):

```ini
# CPU allocation
cores: 2
cpulimit: 2
cpuunits: 2000

# Memory allocation
memory: 4096
swap: 2048

# Storage
rootfs: local-lvm:vm-<CTID>-disk-0,size=20G

# Network
net0: name=eth0,bridge=vmbr0,firewall=1,ip=dhcp,type=veth
```

### Resource Monitoring

```bash
# CPU usage
lxc-cgroup -n <container-name> cpu.stat

# Memory usage
lxc-cgroup -n <container-name> memory.usage_in_bytes

# Network stats
lxc-attach \
  -n <container-name> -- ifconfig eth0
```

## Python Version Management

### Using update-alternatives

```bash
# Set up alternatives
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.12 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.14 2

# Switch version
sudo update-alternatives --config python3

# Fix pip if needed
python3 -m ensurepip --default-pip --upgrade
```

### Verify Installation

```bash
python3 --version
python3 -m pip --version
```

## Troubleshooting

### pip Module Not Found

After switching Python versions:

```bash
python3 -m ensurepip --default-pip --upgrade
python3 -m pip --version
```

### Service Fails to Start

Check the logs for errors:

```bash
sudo journalctl -u watsonx-proxy.service -n 100 --no-pager
```

Common issues:

1. **Inline comments in .env**: Remove all `# comments` from environment variable values
2. **Missing dependencies**: Run `pip install -r requirements.txt`
3. **Permission errors**: Ensure the `app` user owns `/home/app/watsonx-openai-proxy`
4. **Port already in use**: Change `PORT` in `.env` or stop the conflicting service

### Token Refresh Errors

Check your IBM Cloud credentials:

```bash
# Test token generation
curl -X POST "https://iam.cloud.ibm.com/identity/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey=YOUR_API_KEY"
```

### High Memory Usage

Reduce the number of workers:

```bash
# Edit the service file
ExecStart=/usr/bin/python3 -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 1

# Restart service
sudo systemctl restart watsonx-proxy.service
```

## Performance Tuning

### Worker Configuration

- **1 worker**: ~50 MB RAM, handles ~5 req/s
- **2 workers**: ~100 MB RAM, handles ~10 req/s
- **4 workers**: ~200 MB RAM, handles ~20 req/s

### Scaling Strategy

1. **Vertical scaling**: Increase workers up to the number of CPU cores
2. **Horizontal scaling**: Deploy multiple instances behind a load balancer
3.
   **Auto-scaling**: Monitor CPU/memory and scale based on thresholds

## Security Considerations

### API Key Authentication

Enable proxy authentication:

```bash
# In .env file
API_KEY=your_secure_random_key_here
```

### CORS Configuration

Restrict origins:

```bash
# In .env file
ALLOWED_ORIGINS=https://app1.example.com,https://app2.example.com
```

### Firewall Rules

```bash
# Allow only specific IPs
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.0.0.0/8" port port="8000" protocol="tcp" accept'
sudo firewall-cmd --reload
```

## Monitoring

### Health Check

```bash
curl http://localhost:8000/health
```

### Metrics

Monitor these metrics:

- CPU usage (should stay below 70%)
- Memory usage (should stay below 80%)
- Response times
- Error rates
- Token refresh success rate

### Log Levels

Adjust in `.env`, using exactly one of:

```bash
LOG_LEVEL=debug    # For troubleshooting
LOG_LEVEL=info     # For production
LOG_LEVEL=warning  # For minimal logging
```

(Remember to drop the trailing comment before saving; `.env` values must not contain inline comments.)

## Backup and Recovery

### Backup Configuration

```bash
# Backup .env file
sudo cp /home/app/watsonx-openai-proxy/.env /backup/.env.$(date +%Y%m%d)

# Backup service file
sudo cp /etc/systemd/system/watsonx-proxy.service /backup/
```

### Disaster Recovery

```bash
# Restore configuration
sudo cp /backup/.env.YYYYMMDD /home/app/watsonx-openai-proxy/.env
sudo systemctl restart watsonx-proxy.service
```

## Updates

### Update Application

```bash
cd /home/app/watsonx-openai-proxy
git pull
pip install -r requirements.txt --upgrade
sudo systemctl restart watsonx-proxy.service
```

### Zero-Downtime Updates

Use multiple instances behind a load balancer and update them one at a time.

---

For additional help, see:

- [README.md](README.md) - General usage and features
- [MODELS.md](MODELS.md) - Available models
- [AGENTS.md](AGENTS.md) - Development guidelines
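The backup commands in the Backup and Recovery section can be wrapped in a small helper that also prunes old copies. A sketch under our own assumptions: the `backup_env` name, the finer-grained timestamp, and the retention count are illustrative, not part of the project.

```bash
# Copy a .env file to a dated backup and keep only the newest $keep copies.
# Usage: backup_env /path/to/.env /backup/dir [keep]
backup_env() {
  local src="$1" backup_dir="$2" keep="${3:-7}"
  mkdir -p "$backup_dir"
  cp "$src" "$backup_dir/.env.$(date +%Y%m%d%H%M%S)"
  # Prune: list backups newest-first, drop everything past the first $keep.
  ls -1t "$backup_dir"/.env.* 2>/dev/null | tail -n +"$((keep + 1))" | xargs -r rm --
}
```

For example, `backup_env /home/app/watsonx-openai-proxy/.env /backup 7` keeps a rolling week of configuration snapshots when run daily (e.g. from a cron job or systemd timer).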