# watsonx-openai-proxy

OpenAI-compatible API proxy for IBM watsonx.ai. This proxy lets you use watsonx.ai models with any tool or application that supports the OpenAI API format.

## Features

- ✅ **Full OpenAI API Compatibility**: Drop-in replacement for the OpenAI API
- ✅ **Chat Completions**: `/v1/chat/completions` with streaming support
- ✅ **Text Completions**: `/v1/completions` (legacy endpoint)
- ✅ **Embeddings**: `/v1/embeddings` for text embeddings
- ✅ **Model Listing**: `/v1/models` endpoint
- ✅ **Streaming Support**: Server-Sent Events (SSE) for real-time responses
- ✅ **Model Mapping**: Map OpenAI model names to watsonx models
- ✅ **Automatic Token Management**: Handles IBM Cloud authentication automatically
- ✅ **CORS Support**: Configurable cross-origin resource sharing
- ✅ **Optional API Key Authentication**: Secure your proxy with an API key

## Quick Start

### Prerequisites

- Python 3.9 or higher
- IBM Cloud account with watsonx.ai access
- IBM Cloud API key
- watsonx.ai Project ID

### Installation

1. Clone or download this directory:

   ```bash
   cd watsonx-openai-proxy
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Configure environment variables:

   ```bash
   cp .env.example .env
   # Edit .env with your credentials
   ```

4.
Run the server:

   ```bash
   python -m app.main
   ```

   Or with uvicorn:

   ```bash
   uvicorn app.main:app --host 0.0.0.0 --port 8000
   ```

The server starts at `http://localhost:8000`.

## Configuration

### Environment Variables

Create a `.env` file with the following variables:

```bash
# Required: IBM Cloud configuration
IBM_CLOUD_API_KEY=your_ibm_cloud_api_key_here
WATSONX_PROJECT_ID=your_watsonx_project_id_here
WATSONX_CLUSTER=us-south  # Options: us-south, eu-de, eu-gb, jp-tok, au-syd, ca-tor

# Optional: server configuration
HOST=0.0.0.0
PORT=8000
LOG_LEVEL=info

# Optional: API key for proxy authentication
API_KEY=your_optional_api_key_for_proxy_authentication

# Optional: CORS configuration
ALLOWED_ORIGINS=*  # Comma-separated list, or * for all origins

# Optional: model mapping
MODEL_MAP_GPT4=ibm/granite-4-h-small
MODEL_MAP_GPT35=ibm/granite-3-8b-instruct
MODEL_MAP_GPT4_TURBO=meta-llama/llama-3-3-70b-instruct
MODEL_MAP_TEXT_EMBEDDING_ADA_002=ibm/slate-125m-english-rtrvr
```

### Model Mapping

You can map OpenAI model names to watsonx models using environment variables:

```bash
MODEL_MAP_<OPENAI_MODEL_NAME>=<watsonx_model_id>
```

For example:

- `MODEL_MAP_GPT4=ibm/granite-4-h-small` maps `gpt-4` to `ibm/granite-4-h-small`
- `MODEL_MAP_GPT35_TURBO=ibm/granite-3-8b-instruct` maps `gpt-3.5-turbo` to `ibm/granite-3-8b-instruct`

## Usage

### With the OpenAI Python SDK

```python
from openai import OpenAI

# Point the client at your proxy
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="your-proxy-api-key"  # Optional; only needed if you set API_KEY in .env
)

# Use as normal
response = client.chat.completions.create(
    model="ibm/granite-3-8b-instruct",  # Or use a mapped name like "gpt-4"
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ]
)

print(response.choices[0].message.content)
```

### With Streaming

```python
stream = client.chat.completions.create(
    model="ibm/granite-3-8b-instruct",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
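        # delta.content is None on the first chunk (which carries only the
        # role) and on the final chunk (which carries finish_reason), so
        # guard before printing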
        print(chunk.choices[0].delta.content, end="")
```

### With cURL

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-proxy-api-key" \
  -d '{
    "model": "ibm/granite-3-8b-instruct",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```

### Embeddings

```python
response = client.embeddings.create(
    model="ibm/slate-125m-english-rtrvr",
    input="Your text to embed"
)

print(response.data[0].embedding)
```

## Available Endpoints

- `GET /` - API information
- `GET /health` - Health check
- `GET /docs` - Interactive API documentation (Swagger UI)
- `POST /v1/chat/completions` - Chat completions
- `POST /v1/completions` - Text completions (legacy)
- `POST /v1/embeddings` - Generate embeddings
- `GET /v1/models` - List available models
- `GET /v1/models/{model_id}` - Get model information

## Supported Models

The proxy supports all watsonx.ai models available in your project, including:

### Chat Models

- IBM Granite models (3.x and 4.x series)
- Meta Llama models (3.x and 4.x series)
- Mistral models
- Other models available on watsonx.ai

### Embedding Models

- `ibm/slate-125m-english-rtrvr`
- `ibm/slate-30m-english-rtrvr`

See the `/v1/models` endpoint for the complete list.

## Authentication

### Proxy Authentication (Optional)

If you set `API_KEY` in your `.env` file, clients must provide it:

```python
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="your-proxy-api-key"
)
```

### IBM Cloud Authentication

The proxy handles IBM Cloud authentication automatically using your `IBM_CLOUD_API_KEY`. Bearer tokens are:

- Obtained automatically on startup
- Refreshed every 50 minutes (tokens expire after 60 minutes)
- Refreshed on 401 errors

## Deployment

### Docker (Recommended)

Create a `Dockerfile`:

```dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app ./app
COPY .env .
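# NOTE: copying .env into the image is convenient for local use only; for
# production builds, omit this line and supply secrets at runtime instead
# (e.g. docker run --env-file .env), per the guidance below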
EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Build and run:

```bash
docker build -t watsonx-openai-proxy .
docker run -p 8000:8000 --env-file .env watsonx-openai-proxy
```

### Production Deployment

For production, consider the following:

1. **Use a production ASGI server**: The included uvicorn is suitable, but configure multiple workers:

   ```bash
   uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 4
   ```

2. **Set up HTTPS**: Use a reverse proxy such as nginx or Caddy
3. **Configure CORS**: Set `ALLOWED_ORIGINS` to specific domains
4. **Enable API key authentication**: Set `API_KEY` in the environment
5. **Monitor logs**: Set `LOG_LEVEL=info` or `warning` in production
6. **Use environment secrets**: Don't commit the `.env` file; use a secret manager

## Troubleshooting

### 401 Unauthorized

- Check that `IBM_CLOUD_API_KEY` is valid
- Verify your IBM Cloud account has watsonx.ai access
- Check server logs for token refresh errors

### Model Not Found

- Verify the model ID exists in watsonx.ai
- Check that your project has access to the model
- Use the `/v1/models` endpoint to see available models

### Connection Errors

- Verify `WATSONX_CLUSTER` matches your project's region
- Check firewall/network settings
- Ensure watsonx.ai services are accessible

### Streaming Issues

- Some models may not support streaming
- Check that your client library supports SSE (Server-Sent Events)
- Verify the network doesn't buffer streaming responses

## Development

### Running Tests

```bash
# Install dev dependencies
pip install pytest pytest-asyncio httpx

# Run tests
pytest tests/
```

### Code Structure

```
watsonx-openai-proxy/
├── app/
│   ├── main.py                    # FastAPI application
│   ├── config.py                  # Configuration management
│   ├── routers/                   # API endpoint routers
│   │   ├── chat.py                # Chat completions
│   │   ├── completions.py         # Text completions
│   │   ├── embeddings.py          # Embeddings
│   │   └── models.py              # Model listing
│   ├── services/                  # Business logic
│   │   └── watsonx_service.py     # watsonx.ai API client
│   ├── models/
│   │   └── openai_models.py       # Pydantic models (OpenAI-compatible schemas)
│   └── utils/                     # Utilities
│       └── transformers.py        # Request/response transformers
├── tests/                         # Test files
├── requirements.txt               # Python dependencies
├── .env.example                   # Environment template
└── README.md                      # This file
```

## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## License

Apache 2.0 License - see the LICENSE file for details.

## Related Projects

- [watsonx-unofficial-aisdk-provider](../wxai-provider/) - Vercel AI SDK provider for watsonx.ai
- [OpenCode watsonx plugin](../.opencode/plugins/) - Token management plugin for OpenCode

## Disclaimer

This is **not an official IBM product**. It is a community-maintained proxy for integrating watsonx.ai with OpenAI-compatible tools. watsonx.ai is a trademark of IBM.

## Support

For issues and questions:

- Check the [Troubleshooting](#troubleshooting) section
- Review server logs (`LOG_LEVEL=debug` for detailed logs)
- Open an issue in the repository
- Consult the [IBM watsonx.ai documentation](https://www.ibm.com/docs/en/watsonx-as-a-service)
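## Appendix: Token Refresh Sketch

The token lifecycle described under "IBM Cloud Authentication" (fetch on startup, refresh every 50 minutes) can be sketched as follows. This is a minimal, hypothetical illustration, not the proxy's actual implementation; the only external detail it assumes is IBM Cloud's standard IAM API-key token exchange (`POST https://iam.cloud.ibm.com/identity/token` with the `urn:ibm:params:oauth:grant-type:apikey` grant type).

```python
# Illustrative sketch (not the proxy's actual source) of automatic IAM
# bearer-token management: exchange IBM_CLOUD_API_KEY for a token and
# re-fetch it once it is 50 minutes old (tokens expire after 60).
import json
import time
import urllib.parse
import urllib.request

IAM_URL = "https://iam.cloud.ibm.com/identity/token"
REFRESH_AFTER_SECONDS = 50 * 60  # refresh at 50 min; expiry is 60 min


class TokenManager:
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.token = None
        self.fetched_at = 0.0

    def needs_refresh(self, now: float) -> bool:
        # True when no token is cached yet, or the cached one is
        # older than the 50-minute refresh window
        return self.token is None or (now - self.fetched_at) >= REFRESH_AFTER_SECONDS

    def get_token(self) -> str:
        now = time.time()
        if self.needs_refresh(now):
            # Standard IBM Cloud IAM API-key-to-token exchange
            body = urllib.parse.urlencode({
                "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
                "apikey": self.api_key,
            }).encode()
            req = urllib.request.Request(
                IAM_URL,
                data=body,
                headers={"Content-Type": "application/x-www-form-urlencoded"},
            )
            with urllib.request.urlopen(req, timeout=30) as resp:
                self.token = json.loads(resp.read())["access_token"]
            self.fetched_at = now
        return self.token
```

A caller would invoke `get_token()` before each upstream request: it returns the cached token and transparently re-fetches once the 50-minute window elapses. The retry-on-401 behavior the proxy also performs is omitted here for brevity.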