# watsonx-openai-proxy

OpenAI-compatible API proxy for IBM watsonx.ai. This proxy allows you to use watsonx.ai models with any tool or application that supports the OpenAI API format.

## Features

- ✅ **Full OpenAI API Compatibility**: Drop-in replacement for the OpenAI API
- ✅ **Chat Completions**: `/v1/chat/completions` with streaming support
- ✅ **Text Completions**: `/v1/completions` (legacy endpoint)
- ✅ **Embeddings**: `/v1/embeddings` for text embeddings
- ✅ **Model Listing**: `/v1/models` endpoint
- ✅ **Streaming Support**: Server-Sent Events (SSE) for real-time responses
- ✅ **Model Mapping**: Map OpenAI model names to watsonx models
- ✅ **Automatic Token Management**: Handles IBM Cloud authentication automatically
- ✅ **CORS Support**: Configurable cross-origin resource sharing
- ✅ **Optional API Key Authentication**: Secure your proxy with an API key

## Quick Start

### Prerequisites

- Python 3.9 or higher
- IBM Cloud account with watsonx.ai access
- IBM Cloud API key
- watsonx.ai Project ID

### Installation

1. Clone or download this directory:

   ```bash
   cd watsonx-openai-proxy
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Configure environment variables:

   ```bash
   cp .env.example .env
   # Edit .env with your credentials
   ```

4. Run the server:

   ```bash
   python -m app.main
   ```

   Or with uvicorn:

   ```bash
   uvicorn app.main:app --host 0.0.0.0 --port 8000
   ```

The server starts at `http://localhost:8000`.

## Configuration

### Environment Variables

Create a `.env` file with the following variables:

```bash
# Required: IBM Cloud Configuration
IBM_CLOUD_API_KEY=your_ibm_cloud_api_key_here
WATSONX_PROJECT_ID=your_watsonx_project_id_here
WATSONX_CLUSTER=us-south  # Options: us-south, eu-de, eu-gb, jp-tok, au-syd, ca-tor

# Optional: Server Configuration
HOST=0.0.0.0
PORT=8000
LOG_LEVEL=info

# Optional: API Key for Proxy Authentication
API_KEY=your_optional_api_key_for_proxy_authentication

# Optional: CORS Configuration
ALLOWED_ORIGINS=*  # Comma-separated list, or * for all

# Optional: Model Mapping
MODEL_MAP_GPT4=ibm/granite-4-h-small
MODEL_MAP_GPT35=ibm/granite-3-8b-instruct
MODEL_MAP_GPT4_TURBO=meta-llama/llama-3-3-70b-instruct
MODEL_MAP_TEXT_EMBEDDING_ADA_002=ibm/slate-125m-english-rtrvr
```

### Model Mapping

You can map OpenAI model names to watsonx models using environment variables:

```bash
MODEL_MAP_<OPENAI_MODEL_NAME>=<WATSONX_MODEL_ID>
```

For example:

- `MODEL_MAP_GPT4=ibm/granite-4-h-small` maps `gpt-4` to `ibm/granite-4-h-small`
- `MODEL_MAP_GPT35_TURBO=ibm/granite-3-8b-instruct` maps `gpt-3.5-turbo` to `ibm/granite-3-8b-instruct`

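The examples above imply that the proxy derives the OpenAI-side model name from the environment-variable suffix. The exact rule lives in the proxy's configuration code; the sketch below is one plausible illustration, and the `SUFFIX_ALIASES` table and `load_model_map` helper are hypothetical names, not the proxy's actual API:

```python
import os

# Hypothetical alias table: env-var suffix -> OpenAI model name.
# These entries just cover the suffixes shown in the examples above.
SUFFIX_ALIASES = {
    "GPT4": "gpt-4",
    "GPT4_TURBO": "gpt-4-turbo",
    "GPT35": "gpt-3.5",
    "GPT35_TURBO": "gpt-3.5-turbo",
    "TEXT_EMBEDDING_ADA_002": "text-embedding-ada-002",
}

def load_model_map(environ):
    """Collect MODEL_MAP_* variables into {openai_name: watsonx_model_id}."""
    mapping = {}
    prefix = "MODEL_MAP_"
    for key, value in environ.items():
        if not key.startswith(prefix):
            continue
        suffix = key[len(prefix):]
        # Fall back to a simple lowercase/hyphen normalization for
        # suffixes not covered by the alias table.
        openai_name = SUFFIX_ALIASES.get(suffix, suffix.lower().replace("_", "-"))
        mapping[openai_name] = value
    return mapping

# At startup the proxy would read the real environment:
MODEL_MAP = load_model_map(os.environ)
```

With this scheme, a request for `gpt-4` would be rewritten to whatever `MODEL_MAP_GPT4` names before being forwarded to watsonx.ai.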
## Usage

### With OpenAI Python SDK

```python
from openai import OpenAI

# Point to your proxy
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="your-proxy-api-key",  # Optional, if you set API_KEY in .env
)

# Use as normal
response = client.chat.completions.create(
    model="ibm/granite-3-8b-instruct",  # Or use a mapped name like "gpt-4"
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
)

print(response.choices[0].message.content)
```

### With Streaming

```python
stream = client.chat.completions.create(
    model="ibm/granite-3-8b-instruct",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

### With cURL

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-proxy-api-key" \
  -d '{
    "model": "ibm/granite-3-8b-instruct",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```

### Embeddings

```python
response = client.embeddings.create(
    model="ibm/slate-125m-english-rtrvr",
    input="Your text to embed",
)

print(response.data[0].embedding)
```

## Available Endpoints

- `GET /` - API information
- `GET /health` - Health check
- `GET /docs` - Interactive API documentation (Swagger UI)
- `POST /v1/chat/completions` - Chat completions
- `POST /v1/completions` - Text completions (legacy)
- `POST /v1/embeddings` - Generate embeddings
- `GET /v1/models` - List available models
- `GET /v1/models/{model_id}` - Get model information

## Supported Models

The proxy supports all watsonx.ai models available in your project, including:

### Chat Models

- IBM Granite models (3.x and 4.x series)
- Meta Llama models (3.x and 4.x series)
- Mistral models
- Other models available on watsonx.ai

### Embedding Models

- `ibm/slate-125m-english-rtrvr`
- `ibm/slate-30m-english-rtrvr`

See the `/v1/models` endpoint for the complete list.

## Authentication

### Proxy Authentication (Optional)

If you set `API_KEY` in your `.env` file, clients must provide it:

```python
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="your-proxy-api-key",
)
```

### IBM Cloud Authentication

The proxy handles IBM Cloud authentication automatically using your `IBM_CLOUD_API_KEY`. Bearer tokens are:

- Obtained automatically on startup
- Refreshed every 50 minutes (tokens expire after 60 minutes)
- Refreshed on 401 errors

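The refresh behaviour described above can be sketched as a small token manager. This is an illustrative sketch, not the proxy's actual implementation: in the real service, `fetch` would POST the API key to the IBM Cloud IAM endpoint (`https://iam.cloud.ibm.com/identity/token` with `grant_type=urn:ibm:params:oauth:grant-type:apikey`), which returns an `access_token` and `expires_in`:

```python
import time

IAM_URL = "https://iam.cloud.ibm.com/identity/token"
REFRESH_MARGIN = 600.0  # refresh 10 min early: ~50 min into a 60 min lifetime

class TokenManager:
    """Caches an IBM Cloud IAM bearer token and refreshes it before expiry.

    `fetch` is injected to keep the logic testable; it must return a
    (access_token, expires_in_seconds) tuple. In the proxy it would POST
    the API key to IAM_URL as described above.
    """

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get_token(self):
        # Refresh when no token is cached, or within REFRESH_MARGIN of expiry.
        if self._token is None or self._clock() >= self._expires_at - REFRESH_MARGIN:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = self._clock() + expires_in
        return self._token

    def invalidate(self):
        # Called on a 401 from watsonx.ai to force a fresh token.
        self._token = None
```

Keeping a safety margin means requests never go out with a token that is about to expire mid-flight.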
## Deployment

### Docker (Recommended)

Create a `Dockerfile`:

```dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app ./app

EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Note that the `.env` file is deliberately not copied into the image; credentials are supplied at run time instead. Build and run:

```bash
docker build -t watsonx-openai-proxy .
docker run -p 8000:8000 --env-file .env watsonx-openai-proxy
```

### Production Deployment

For production, consider:

1. **Use a production ASGI server**: The included uvicorn is suitable, but configure workers:

   ```bash
   uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 4
   ```

2. **Set up HTTPS**: Use a reverse proxy like nginx or Caddy
3. **Configure CORS**: Set `ALLOWED_ORIGINS` to specific domains
4. **Enable API key authentication**: Set `API_KEY` in the environment
5. **Monitor logs**: Set `LOG_LEVEL=info` or `warning` in production
6. **Use environment secrets**: Don't commit the `.env` file; use secret management

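For the nginx option in step 2, a minimal server block might look like the following (a sketch, assuming the proxy listens on `127.0.0.1:8000`; the domain is hypothetical and TLS certificate paths are omitted). `proxy_buffering off` matters for SSE streaming: with buffering on, streamed chunks may arrive all at once.

```nginx
server {
    listen 443 ssl;
    server_name proxy.example.com;  # hypothetical domain; use your own
    # ssl_certificate / ssl_certificate_key directives go here

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_buffering off;      # do not buffer SSE streaming responses
        proxy_read_timeout 300s;  # allow long-running generations
    }
}
```
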
## Troubleshooting

### 401 Unauthorized

- Check that `IBM_CLOUD_API_KEY` is valid
- Verify that your IBM Cloud account has watsonx.ai access
- Check server logs for token refresh errors

### Model Not Found

- Verify that the model ID exists in watsonx.ai
- Check that your project has access to the model
- Use the `/v1/models` endpoint to see available models

### Connection Errors

- Verify that `WATSONX_CLUSTER` matches your project's region
- Check firewall/network settings
- Ensure the watsonx.ai services are accessible

### Streaming Issues

- Some models may not support streaming
- Check that your client library supports SSE (Server-Sent Events)
- Verify that the network doesn't buffer streaming responses

## Development

### Running Tests

```bash
# Install dev dependencies
pip install pytest pytest-asyncio httpx

# Run tests
pytest tests/
```

### Code Structure

```
watsonx-openai-proxy/
├── app/
│   ├── main.py                 # FastAPI application
│   ├── config.py               # Configuration management
│   ├── routers/                # API endpoint routers
│   │   ├── chat.py             # Chat completions
│   │   ├── completions.py      # Text completions
│   │   ├── embeddings.py       # Embeddings
│   │   └── models.py           # Model listing
│   ├── services/               # Business logic
│   │   └── watsonx_service.py  # watsonx.ai API client
│   ├── models/                 # Pydantic models
│   │   └── openai_models.py    # OpenAI-compatible schemas
│   └── utils/                  # Utilities
│       └── transformers.py     # Request/response transformers
├── tests/                      # Test files
├── requirements.txt            # Python dependencies
├── .env.example                # Environment template
└── README.md                   # This file
```

## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## License

Apache 2.0 License - see the LICENSE file for details.

## Related Projects

- [watsonx-unofficial-aisdk-provider](../wxai-provider/) - Vercel AI SDK provider for watsonx.ai
- [OpenCode watsonx plugin](../.opencode/plugins/) - Token management plugin for OpenCode

## Disclaimer

This is **not an official IBM product**. It is a community-maintained proxy for integrating watsonx.ai with OpenAI-compatible tools. watsonx.ai is a trademark of IBM.

## Support

For issues and questions:

- Check the [Troubleshooting](#troubleshooting) section
- Review the server logs (set `LOG_LEVEL=debug` for detailed output)
- Open an issue in the repository
- Consult the [IBM watsonx.ai documentation](https://www.ibm.com/docs/en/watsonx-as-a-service)