
watsonx-openai-proxy

OpenAI-compatible API proxy for IBM watsonx.ai. This proxy allows you to use watsonx.ai models with any tool or application that supports the OpenAI API format.

Features

  • Full OpenAI API Compatibility: Drop-in replacement for OpenAI API
  • Chat Completions: /v1/chat/completions with streaming support
  • Text Completions: /v1/completions (legacy endpoint)
  • Embeddings: /v1/embeddings for text embeddings
  • Model Listing: /v1/models endpoint
  • Streaming Support: Server-Sent Events (SSE) for real-time responses
  • Model Mapping: Map OpenAI model names to watsonx models
  • Automatic Token Management: Handles IBM Cloud authentication automatically
  • CORS Support: Configurable cross-origin resource sharing
  • Optional API Key Authentication: Secure your proxy with an API key

Quick Start

Prerequisites

  • Python 3.9 or higher
  • IBM Cloud account with watsonx.ai access
  • IBM Cloud API key
  • watsonx.ai Project ID

Installation

  1. Clone or download this directory:
cd watsonx-openai-proxy
  2. Install dependencies:
pip install -r requirements.txt
  3. Configure environment variables:
cp .env.example .env
# Edit .env with your credentials
  4. Run the server:
python -m app.main

Or with uvicorn:

uvicorn app.main:app --host 0.0.0.0 --port 8000

The server will start at http://localhost:8000

Configuration

Environment Variables

Create a .env file with the following variables:

# Required: IBM Cloud Configuration
IBM_CLOUD_API_KEY=your_ibm_cloud_api_key_here
WATSONX_PROJECT_ID=your_watsonx_project_id_here
WATSONX_CLUSTER=us-south  # Options: us-south, eu-de, eu-gb, jp-tok, au-syd, ca-tor

# Optional: Server Configuration
HOST=0.0.0.0
PORT=8000
LOG_LEVEL=info

# Optional: API Key for Proxy Authentication
API_KEY=your_optional_api_key_for_proxy_authentication

# Optional: CORS Configuration
ALLOWED_ORIGINS=*  # Comma-separated or * for all

# Optional: Model Mapping
MODEL_MAP_GPT4=ibm/granite-4-h-small
MODEL_MAP_GPT35=ibm/granite-3-8b-instruct
MODEL_MAP_GPT4_TURBO=meta-llama/llama-3-3-70b-instruct
MODEL_MAP_TEXT_EMBEDDING_ADA_002=ibm/slate-125m-english-rtrvr
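ALLOWED_ORIGINS arrives as a single string, so the proxy has to split it into the origin list a CORS middleware expects. A minimal sketch of how that parsing might look (the function name is illustrative, not the project's actual code):

```python
def parse_origins(raw):
    """Parse ALLOWED_ORIGINS: "*" allows all origins, otherwise a
    comma-separated list of specific origins."""
    raw = (raw or "*").strip()
    if raw == "*":
        return ["*"]
    return [origin.strip() for origin in raw.split(",") if origin.strip()]
```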

Model Mapping

You can map OpenAI model names to watsonx models using environment variables:

MODEL_MAP_<OPENAI_MODEL_NAME>=<WATSONX_MODEL_ID>

For example:

  • MODEL_MAP_GPT4=ibm/granite-4-h-small maps gpt-4 to ibm/granite-4-h-small
  • MODEL_MAP_GPT35_TURBO=ibm/granite-3-8b-instruct maps gpt-3.5-turbo to ibm/granite-3-8b-instruct
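The suffix-to-model-name translation (e.g. GPT35_TURBO becoming gpt-3.5-turbo) is not fully mechanical, so a sketch of the loader might use an explicit alias table; the helper name, alias table, and fallback rule below are illustrative assumptions, not the project's actual code:

```python
import os

# Known suffixes with irregular spellings (dots, hyphens) mapped explicitly.
ALIASES = {
    "GPT4": "gpt-4",
    "GPT4_TURBO": "gpt-4-turbo",
    "GPT35": "gpt-3.5",
    "GPT35_TURBO": "gpt-3.5-turbo",
    "TEXT_EMBEDDING_ADA_002": "text-embedding-ada-002",
}

def load_model_map(env=os.environ):
    """Build {openai_model_name: watsonx_model_id} from MODEL_MAP_* vars."""
    mapping = {}
    for key, value in env.items():
        if not key.startswith("MODEL_MAP_"):
            continue
        suffix = key[len("MODEL_MAP_"):]
        # Fall back to a lowercase, hyphenated guess for unknown suffixes.
        name = ALIASES.get(suffix, suffix.lower().replace("_", "-"))
        mapping[name] = value
    return mapping
```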

Usage

With OpenAI Python SDK

from openai import OpenAI

# Point to your proxy
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="your-proxy-api-key"  # Optional, if you set API_KEY in .env
)

# Use as normal
response = client.chat.completions.create(
    model="ibm/granite-3-8b-instruct",  # Or use mapped name like "gpt-4"
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ]
)

print(response.choices[0].message.content)

With Streaming

stream = client.chat.completions.create(
    model="ibm/granite-3-8b-instruct",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

With cURL

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-proxy-api-key" \
  -d '{
    "model": "ibm/granite-3-8b-instruct",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Embeddings

response = client.embeddings.create(
    model="ibm/slate-125m-english-rtrvr",
    input="Your text to embed"
)

print(response.data[0].embedding)
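A common next step once embeddings come back is comparing them with cosine similarity; a minimal dependency-free helper (not part of the proxy itself):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```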

Available Endpoints

  • GET / - API information
  • GET /health - Health check
  • GET /docs - Interactive API documentation (Swagger UI)
  • POST /v1/chat/completions - Chat completions
  • POST /v1/completions - Text completions (legacy)
  • POST /v1/embeddings - Generate embeddings
  • GET /v1/models - List available models
  • GET /v1/models/{model_id} - Get model information

Supported Models

The proxy supports all watsonx.ai models available in your project, including:

Chat Models

  • IBM Granite models (3.x, 4.x series)
  • Meta Llama models (3.x, 4.x series)
  • Mistral models
  • Other models available on watsonx.ai

Embedding Models

  • ibm/slate-125m-english-rtrvr
  • ibm/slate-30m-english-rtrvr

See /v1/models endpoint for the complete list.

Authentication

Proxy Authentication (Optional)

If you set API_KEY in your .env file, clients must provide it:

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="your-proxy-api-key"
)
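The check the proxy performs is presumably a comparison of the client's Authorization: Bearer header against the configured API_KEY. A framework-free sketch of that logic (function name is hypothetical; a constant-time comparison avoids timing leaks):

```python
import hmac

def check_proxy_key(authorization_header, expected_key):
    """Return True if the request may proceed. When API_KEY is unset,
    authentication is disabled and everything is allowed."""
    if not expected_key:
        return True
    if not authorization_header or not authorization_header.startswith("Bearer "):
        return False
    supplied = authorization_header[len("Bearer "):]
    return hmac.compare_digest(supplied, expected_key)
```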

IBM Cloud Authentication

The proxy handles IBM Cloud authentication automatically using your IBM_CLOUD_API_KEY. Bearer tokens are:

  • Automatically obtained on startup
  • Refreshed every 50 minutes (tokens expire after 60 minutes)
  • Refreshed on 401 errors
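The refresh policy above can be sketched as a small cache. The real proxy presumably posts to https://iam.cloud.ibm.com/identity/token with grant_type=urn:ibm:params:oauth:grant-type:apikey; here the HTTP call is injected as fetch_token so the caching logic stands alone (class and method names are illustrative):

```python
import time

class TokenManager:
    """Cache an IAM bearer token, renewing it every 50 minutes
    (IBM Cloud tokens expire after 60)."""

    def __init__(self, fetch_token, ttl_seconds=50 * 60):
        self._fetch = fetch_token   # () -> str, performs the IAM request
        self._ttl = ttl_seconds
        self._token = None
        self._expires_at = 0.0

    def get_token(self, now=time.time):
        if self._token is None or now() >= self._expires_at:
            self._token = self._fetch()
            self._expires_at = now() + self._ttl
        return self._token

    def invalidate(self):
        """Force a refresh on the next call, e.g. after a 401 upstream."""
        self._token = None
```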

Deployment

Create a Dockerfile:

FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app ./app
# Don't bake .env into the image; pass credentials at runtime (e.g. --env-file)

EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Build and run:

docker build -t watsonx-openai-proxy .
docker run -p 8000:8000 --env-file .env watsonx-openai-proxy

Production Deployment

For production, consider:

  1. Use a production ASGI server: The included uvicorn is suitable, but configure workers:

    uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 4
    
  2. Set up HTTPS: Use a reverse proxy like nginx or Caddy

  3. Configure CORS: Set ALLOWED_ORIGINS to specific domains

  4. Enable API key authentication: Set API_KEY in environment

  5. Monitor logs: Set LOG_LEVEL=info or warning in production

  6. Use environment secrets: Don't commit .env file, use secret management
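For points 2 and the streaming caveat, a hypothetical nginx site config might look like this (domain and certificate paths are placeholders); note proxy_buffering off, without which nginx holds back SSE chunks:

```nginx
server {
    listen 443 ssl;
    server_name proxy.example.com;              # placeholder domain

    ssl_certificate     /etc/ssl/certs/proxy.pem;
    ssl_certificate_key /etc/ssl/private/proxy.key;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_buffering off;        # required for Server-Sent Events
        proxy_read_timeout 300s;    # allow long-running generations
    }
}
```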

Troubleshooting

401 Unauthorized

  • Check that IBM_CLOUD_API_KEY is valid
  • Verify your IBM Cloud account has watsonx.ai access
  • Check server logs for token refresh errors

Model Not Found

  • Verify the model ID exists in watsonx.ai
  • Check that your project has access to the model
  • Use /v1/models endpoint to see available models

Connection Errors

  • Verify WATSONX_CLUSTER matches your project's region
  • Check firewall/network settings
  • Ensure watsonx.ai services are accessible

Streaming Issues

  • Some models may not support streaming
  • Check client library supports SSE (Server-Sent Events)
  • Verify network doesn't buffer streaming responses
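For reference when debugging, an OpenAI-compatible stream is a sequence of SSE data: lines, each carrying one JSON chunk, terminated by data: [DONE] (chunk fields abbreviated here):

```text
data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hel"}}]}

data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"lo"}}]}

data: [DONE]
```

If these events arrive all at once instead of incrementally, an intermediate proxy is buffering the response.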

Development

Running Tests

# Install dev dependencies
pip install pytest pytest-asyncio httpx

# Run tests
pytest tests/

Code Structure

watsonx-openai-proxy/
├── app/
│   ├── main.py              # FastAPI application
│   ├── config.py            # Configuration management
│   ├── routers/             # API endpoint routers
│   │   ├── chat.py          # Chat completions
│   │   ├── completions.py   # Text completions
│   │   ├── embeddings.py    # Embeddings
│   │   └── models.py        # Model listing
│   ├── services/            # Business logic
│   │   └── watsonx_service.py  # watsonx.ai API client
│   ├── models/              # Pydantic models
│   │   └── openai_models.py    # OpenAI-compatible schemas
│   └── utils/               # Utilities
│       └── transformers.py     # Request/response transformers
├── tests/                   # Test files
├── requirements.txt         # Python dependencies
├── .env.example            # Environment template
└── README.md               # This file
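To illustrate the kind of work utils/transformers.py does, here is a hedged sketch of reshaping an OpenAI-style chat request into a watsonx.ai text/chat payload; the watsonx field names and defaults below are assumptions, not the project's actual code:

```python
def openai_to_watsonx(body, project_id, model_map=None):
    """Convert an OpenAI chat-completions request body into a
    watsonx.ai-style payload (field names assumed for illustration)."""
    model = body["model"]
    if model_map:
        model = model_map.get(model, model)   # apply MODEL_MAP_* aliases
    return {
        "model_id": model,
        "project_id": project_id,
        "messages": body["messages"],
        "max_tokens": body.get("max_tokens", 1024),
        "temperature": body.get("temperature", 1.0),
    }
```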

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License

Apache 2.0 License - See LICENSE file for details.

Disclaimer

This is not an official IBM product. It's a community-maintained proxy for integrating watsonx.ai with OpenAI-compatible tools. watsonx.ai is a trademark of IBM.

Support

For issues and questions, please open an issue on this repository.
