
watsonx-openai-proxy

OpenAI-compatible API proxy for IBM watsonx.ai. This proxy allows you to use watsonx.ai models with any tool or application that supports the OpenAI API format.

Features

  • Full OpenAI API Compatibility: Drop-in replacement for OpenAI API
  • Chat Completions: /v1/chat/completions with streaming support
  • Text Completions: /v1/completions (legacy endpoint)
  • Embeddings: /v1/embeddings for text embeddings
  • Model Listing: /v1/models endpoint
  • Streaming Support: Server-Sent Events (SSE) for real-time responses
  • Model Mapping: Map OpenAI model names to watsonx models
  • Automatic Token Management: Handles IBM Cloud authentication automatically
  • CORS Support: Configurable cross-origin resource sharing
  • Optional API Key Authentication: Secure your proxy with an API key

Quick Start

Prerequisites

  • Python 3.9 or higher
  • IBM Cloud account with watsonx.ai access
  • IBM Cloud API key
  • watsonx.ai Project ID

Installation

  1. Clone or download this directory:
cd watsonx-openai-proxy
  2. Install dependencies:
pip install -r requirements.txt
  3. Configure environment variables:
cp .env.example .env
# Edit .env with your credentials
  4. Run the server:
python -m app.main

Or with uvicorn:

uvicorn app.main:app --host 0.0.0.0 --port 8000

The server will start at http://localhost:8000

Configuration

Environment Variables

Create a .env file with the following variables:

# Required: IBM Cloud Configuration
IBM_CLOUD_API_KEY=your_ibm_cloud_api_key_here
WATSONX_PROJECT_ID=your_watsonx_project_id_here
WATSONX_CLUSTER=us-south  # Options: us-south, eu-de, eu-gb, jp-tok, au-syd, ca-tor

# Optional: Server Configuration
HOST=0.0.0.0
PORT=8000
LOG_LEVEL=info

# Optional: API Key for Proxy Authentication
API_KEY=your_optional_api_key_for_proxy_authentication

# Optional: CORS Configuration
ALLOWED_ORIGINS=*  # Comma-separated or * for all

# Optional: Model Mapping
MODEL_MAP_GPT4=ibm/granite-4-h-small
MODEL_MAP_GPT35=ibm/granite-3-8b-instruct
MODEL_MAP_GPT4_TURBO=meta-llama/llama-3-3-70b-instruct
MODEL_MAP_TEXT_EMBEDDING_ADA_002=ibm/slate-125m-english-rtrvr
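ALLOWED_ORIGINS arrives as a single string, so the proxy has to split it into the origin list a CORS middleware expects. A minimal sketch of how that parsing might look (the function name is illustrative, not the project's actual code):

```python
def parse_origins(raw):
    """Parse ALLOWED_ORIGINS: "*" allows all origins, otherwise a
    comma-separated list of specific origins."""
    raw = (raw or "*").strip()
    if raw == "*":
        return ["*"]
    return [origin.strip() for origin in raw.split(",") if origin.strip()]
```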

Model Mapping

You can map OpenAI model names to watsonx models using environment variables:

MODEL_MAP_<OPENAI_MODEL_NAME>=<WATSONX_MODEL_ID>

For example:

  • MODEL_MAP_GPT4=ibm/granite-4-h-small maps gpt-4 to ibm/granite-4-h-small
  • MODEL_MAP_GPT35_TURBO=ibm/granite-3-8b-instruct maps gpt-3.5-turbo to ibm/granite-3-8b-instruct
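The suffix-to-model-name translation (e.g. GPT35_TURBO becoming gpt-3.5-turbo) is not fully mechanical, so a sketch of the loader might use an explicit alias table; the helper name, alias table, and fallback rule below are illustrative assumptions, not the project's actual code:

```python
import os

# Known suffixes with irregular spellings (dots, hyphens) mapped explicitly.
ALIASES = {
    "GPT4": "gpt-4",
    "GPT4_TURBO": "gpt-4-turbo",
    "GPT35": "gpt-3.5",
    "GPT35_TURBO": "gpt-3.5-turbo",
    "TEXT_EMBEDDING_ADA_002": "text-embedding-ada-002",
}

def load_model_map(env=os.environ):
    """Build {openai_model_name: watsonx_model_id} from MODEL_MAP_* vars."""
    mapping = {}
    for key, value in env.items():
        if not key.startswith("MODEL_MAP_"):
            continue
        suffix = key[len("MODEL_MAP_"):]
        # Fall back to a lowercase, hyphenated guess for unknown suffixes.
        name = ALIASES.get(suffix, suffix.lower().replace("_", "-"))
        mapping[name] = value
    return mapping
```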

Usage

With OpenAI Python SDK

from openai import OpenAI

# Point to your proxy
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="your-proxy-api-key"  # Optional, if you set API_KEY in .env
)

# Use as normal
response = client.chat.completions.create(
    model="ibm/granite-3-8b-instruct",  # Or use mapped name like "gpt-4"
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ]
)

print(response.choices[0].message.content)

With Streaming

stream = client.chat.completions.create(
    model="ibm/granite-3-8b-instruct",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

With cURL

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-proxy-api-key" \
  -d '{
    "model": "ibm/granite-3-8b-instruct",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Embeddings

response = client.embeddings.create(
    model="ibm/slate-125m-english-rtrvr",
    input="Your text to embed"
)

print(response.data[0].embedding)
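A common next step once embeddings come back is comparing them with cosine similarity; a minimal dependency-free helper (not part of the proxy itself):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```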

Available Endpoints

  • GET / - API information
  • GET /health - Health check
  • GET /docs - Interactive API documentation (Swagger UI)
  • POST /v1/chat/completions - Chat completions
  • POST /v1/completions - Text completions (legacy)
  • POST /v1/embeddings - Generate embeddings
  • GET /v1/models - List available models
  • GET /v1/models/{model_id} - Get model information

Supported Models

The proxy supports all watsonx.ai models available in your project, including:

Chat Models

  • IBM Granite models (3.x, 4.x series)
  • Meta Llama models (3.x, 4.x series)
  • Mistral models
  • Other models available on watsonx.ai

Embedding Models

  • ibm/slate-125m-english-rtrvr
  • ibm/slate-30m-english-rtrvr

See /v1/models endpoint for the complete list.

Authentication

Proxy Authentication (Optional)

If you set API_KEY in your .env file, clients must provide it:

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="your-proxy-api-key"
)
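The check the proxy performs is presumably a comparison of the client's Authorization: Bearer header against the configured API_KEY. A framework-free sketch of that logic (function name is hypothetical; a constant-time comparison avoids timing leaks):

```python
import hmac

def check_proxy_key(authorization_header, expected_key):
    """Return True if the request may proceed. When API_KEY is unset,
    authentication is disabled and everything is allowed."""
    if not expected_key:
        return True
    if not authorization_header or not authorization_header.startswith("Bearer "):
        return False
    supplied = authorization_header[len("Bearer "):]
    return hmac.compare_digest(supplied, expected_key)
```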

IBM Cloud Authentication

The proxy handles IBM Cloud authentication automatically using your IBM_CLOUD_API_KEY. Bearer tokens are:

  • Automatically obtained on startup
  • Refreshed every 50 minutes (tokens expire after 60 minutes)
  • Refreshed on 401 errors
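The refresh policy above can be sketched as a small cache. The real proxy presumably posts to https://iam.cloud.ibm.com/identity/token with grant_type=urn:ibm:params:oauth:grant-type:apikey; here the HTTP call is injected as fetch_token so the caching logic stands alone (class and method names are illustrative):

```python
import time

class TokenManager:
    """Cache an IAM bearer token, renewing it every 50 minutes
    (IBM Cloud tokens expire after 60)."""

    def __init__(self, fetch_token, ttl_seconds=50 * 60):
        self._fetch = fetch_token   # () -> str, performs the IAM request
        self._ttl = ttl_seconds
        self._token = None
        self._expires_at = 0.0

    def get_token(self, now=time.time):
        if self._token is None or now() >= self._expires_at:
            self._token = self._fetch()
            self._expires_at = now() + self._ttl
        return self._token

    def invalidate(self):
        """Force a refresh on the next call, e.g. after a 401 upstream."""
        self._token = None
```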

Deployment

Create a Dockerfile:

FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app ./app
# Don't bake .env into the image; pass credentials at runtime (e.g. --env-file)

EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Build and run:

docker build -t watsonx-openai-proxy .
docker run -p 8000:8000 --env-file .env watsonx-openai-proxy

Production Deployment

For production, consider:

  1. Use a production ASGI server: The included uvicorn is suitable, but configure workers:

    uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 4
    
  2. Set up HTTPS: Use a reverse proxy like nginx or Caddy

  3. Configure CORS: Set ALLOWED_ORIGINS to specific domains

  4. Enable API key authentication: Set API_KEY in environment

  5. Monitor logs: Set LOG_LEVEL=info or warning in production

  6. Use environment secrets: Don't commit .env file, use secret management
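For points 2 and the streaming caveat, a hypothetical nginx site config might look like this (domain and certificate paths are placeholders); note proxy_buffering off, without which nginx holds back SSE chunks:

```nginx
server {
    listen 443 ssl;
    server_name proxy.example.com;              # placeholder domain

    ssl_certificate     /etc/ssl/certs/proxy.pem;
    ssl_certificate_key /etc/ssl/private/proxy.key;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_buffering off;        # required for Server-Sent Events
        proxy_read_timeout 300s;    # allow long-running generations
    }
}
```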

Troubleshooting

401 Unauthorized

  • Check that IBM_CLOUD_API_KEY is valid
  • Verify your IBM Cloud account has watsonx.ai access
  • Check server logs for token refresh errors

Model Not Found

  • Verify the model ID exists in watsonx.ai
  • Check that your project has access to the model
  • Use /v1/models endpoint to see available models

Connection Errors

  • Verify WATSONX_CLUSTER matches your project's region
  • Check firewall/network settings
  • Ensure watsonx.ai services are accessible

Streaming Issues

  • Some models may not support streaming
  • Check client library supports SSE (Server-Sent Events)
  • Verify network doesn't buffer streaming responses
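For reference when debugging, an OpenAI-compatible stream is a sequence of SSE data: lines, each carrying one JSON chunk, terminated by data: [DONE] (chunk fields abbreviated here):

```text
data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hel"}}]}

data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"lo"}}]}

data: [DONE]
```

If these events arrive all at once instead of incrementally, an intermediate proxy is buffering the response.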

Development

Running Tests

# Install dev dependencies
pip install pytest pytest-asyncio httpx

# Run tests
pytest tests/

Code Structure

watsonx-openai-proxy/
├── app/
│   ├── main.py              # FastAPI application
│   ├── config.py            # Configuration management
│   ├── routers/             # API endpoint routers
│   │   ├── chat.py          # Chat completions
│   │   ├── completions.py   # Text completions
│   │   ├── embeddings.py    # Embeddings
│   │   └── models.py        # Model listing
│   ├── services/            # Business logic
│   │   └── watsonx_service.py  # watsonx.ai API client
│   ├── models/              # Pydantic models
│   │   └── openai_models.py    # OpenAI-compatible schemas
│   └── utils/               # Utilities
│       └── transformers.py     # Request/response transformers
├── tests/                   # Test files
├── requirements.txt         # Python dependencies
├── .env.example            # Environment template
└── README.md               # This file
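To illustrate the kind of work utils/transformers.py does, here is a hedged sketch of reshaping an OpenAI-style chat request into a watsonx.ai text/chat payload; the watsonx field names and defaults below are assumptions, not the project's actual code:

```python
def openai_to_watsonx(body, project_id, model_map=None):
    """Convert an OpenAI chat-completions request body into a
    watsonx.ai-style payload (field names assumed for illustration)."""
    model = body["model"]
    if model_map:
        model = model_map.get(model, model)   # apply MODEL_MAP_* aliases
    return {
        "model_id": model,
        "project_id": project_id,
        "messages": body["messages"],
        "max_tokens": body.get("max_tokens", 1024),
        "temperature": body.get("temperature", 1.0),
    }
```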

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License

Apache 2.0 License - See LICENSE file for details.

Disclaimer

This is not an official IBM product. It's a community-maintained proxy for integrating watsonx.ai with OpenAI-compatible tools. watsonx.ai is a trademark of IBM.

Support

For issues and questions, please open an issue on this repository.
