# watsonx-openai-proxy
OpenAI-compatible API proxy for IBM watsonx.ai. This proxy allows you to use watsonx.ai models with any tool or application that supports the OpenAI API format.
## Features
- ✅ Full OpenAI API Compatibility: Drop-in replacement for the OpenAI API
- ✅ Chat Completions: `/v1/chat/completions` with streaming support
- ✅ Text Completions: `/v1/completions` (legacy endpoint)
- ✅ Embeddings: `/v1/embeddings` for text embeddings
- ✅ Model Listing: `/v1/models` endpoint
- ✅ Streaming Support: Server-Sent Events (SSE) for real-time responses
- ✅ Model Mapping: Map OpenAI model names to watsonx models
- ✅ Automatic Token Management: Handles IBM Cloud authentication automatically
- ✅ CORS Support: Configurable cross-origin resource sharing
- ✅ Optional API Key Authentication: Secure your proxy with an API key
## Quick Start

### Prerequisites

- Python 3.9 or higher
- IBM Cloud account with watsonx.ai access
- IBM Cloud API key
- watsonx.ai Project ID
### Installation

1. Clone or download this directory:

   ```bash
   cd watsonx-openai-proxy
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Configure environment variables:

   ```bash
   cp .env.example .env
   # Edit .env with your credentials
   ```

4. Run the server:

   ```bash
   python -m app.main
   ```

   Or with uvicorn:

   ```bash
   uvicorn app.main:app --host 0.0.0.0 --port 8000
   ```

The server will start at `http://localhost:8000`.
## Configuration

### Environment Variables

Create a `.env` file with the following variables:
```bash
# Required: IBM Cloud Configuration
IBM_CLOUD_API_KEY=your_ibm_cloud_api_key_here
WATSONX_PROJECT_ID=your_watsonx_project_id_here
# Options: us-south, eu-de, eu-gb, jp-tok, au-syd, ca-tor
WATSONX_CLUSTER=us-south

# Optional: Server Configuration
HOST=0.0.0.0
PORT=8000
LOG_LEVEL=info

# Optional: API Key for Proxy Authentication
API_KEY=your_optional_api_key_for_proxy_authentication

# Optional: CORS Configuration (comma-separated, or * for all)
ALLOWED_ORIGINS=*

# Optional: Model Mapping
MODEL_MAP_GPT4=ibm/granite-4-h-small
MODEL_MAP_GPT35=ibm/granite-3-8b-instruct
MODEL_MAP_GPT4_TURBO=meta-llama/llama-3-3-70b-instruct
MODEL_MAP_TEXT_EMBEDDING_ADA_002=ibm/slate-125m-english-rtrvr
```

Note: keep comments on their own lines. Some `.env` parsers treat an inline comment as part of the value, which can cause startup failures.
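To see why inline comments are risky, here is a minimal stdlib-only sketch of a naive `KEY=VALUE` parser (the proxy itself may use a different loader such as python-dotenv; `parse_env` is an illustrative helper, not part of the proxy):

```python
def parse_env(text: str) -> dict:
    """Minimal KEY=VALUE parser: skips blank lines and full-line comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

sample = """
# Required: IBM Cloud Configuration
IBM_CLOUD_API_KEY=your_ibm_cloud_api_key_here
WATSONX_CLUSTER=us-south
"""
print(parse_env(sample)["WATSONX_CLUSTER"])  # us-south

# An inline comment ends up inside the value:
print(parse_env("PORT=8000 # dev only")["PORT"])  # 8000 # dev only
```

With a parser like this, `PORT=8000 # dev only` yields the value `8000 # dev only`, which is not a valid port number.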
### Model Mapping

You can map OpenAI model names to watsonx models using environment variables:

```
MODEL_MAP_<OPENAI_MODEL_NAME>=<WATSONX_MODEL_ID>
```

For example:

- `MODEL_MAP_GPT4=ibm/granite-4-h-small` maps `gpt-4` to `ibm/granite-4-h-small`
- `MODEL_MAP_GPT35_TURBO=ibm/granite-3-8b-instruct` maps `gpt-3.5-turbo` to `ibm/granite-3-8b-instruct`
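The proxy's actual lookup lives in `app/config.py`; as an illustration of how such a resolution could work, the sketch below assumes names are compared after uppercasing and dropping non-alphanumeric characters, so that `gpt-3.5-turbo` matches `MODEL_MAP_GPT35_TURBO` (all function names here are hypothetical):

```python
import re

def _canon(name: str) -> str:
    """Uppercase and drop non-alphanumerics so 'gpt-3.5-turbo' matches 'GPT35_TURBO'."""
    return re.sub(r"[^A-Za-z0-9]", "", name).upper()

def load_model_map(environ: dict) -> dict:
    """Collect MODEL_MAP_* variables into {canonical OpenAI name: watsonx model id}."""
    prefix = "MODEL_MAP_"
    return {
        _canon(key[len(prefix):]): value
        for key, value in environ.items()
        if key.startswith(prefix)
    }

def resolve_model(requested: str, model_map: dict) -> str:
    """Return the mapped watsonx model id, or pass the requested name through."""
    return model_map.get(_canon(requested), requested)

env = {
    "MODEL_MAP_GPT4": "ibm/granite-4-h-small",
    "MODEL_MAP_GPT35_TURBO": "ibm/granite-3-8b-instruct",
}
mapping = load_model_map(env)
print(resolve_model("gpt-4", mapping))          # ibm/granite-4-h-small
print(resolve_model("gpt-3.5-turbo", mapping))  # ibm/granite-3-8b-instruct
```

Unmapped names (for example native watsonx model IDs) pass through unchanged.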
## Usage

### With the OpenAI Python SDK

```python
from openai import OpenAI

# Point to your proxy
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="your-proxy-api-key"  # Optional, if you set API_KEY in .env
)

# Use as normal
response = client.chat.completions.create(
    model="ibm/granite-3-8b-instruct",  # Or use a mapped name like "gpt-4"
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ]
)

print(response.choices[0].message.content)
```
### With Streaming

```python
stream = client.chat.completions.create(
    model="ibm/granite-3-8b-instruct",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
### With cURL

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-proxy-api-key" \
  -d '{
    "model": "ibm/granite-3-8b-instruct",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```
### Embeddings

```python
response = client.embeddings.create(
    model="ibm/slate-125m-english-rtrvr",
    input="Your text to embed"
)

print(response.data[0].embedding)
```
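Embedding vectors are typically compared with cosine similarity. A self-contained sketch (plain Python, no dependency on the proxy or SDK):

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# e.g. compare two vectors returned by client.embeddings.create(...)
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```

Values close to 1.0 indicate semantically similar texts; in practice you would call this on `response.data[i].embedding` pairs.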
## Available Endpoints

- `GET /` - API information
- `GET /health` - Health check
- `GET /docs` - Interactive API documentation (Swagger UI)
- `POST /v1/chat/completions` - Chat completions
- `POST /v1/completions` - Text completions (legacy)
- `POST /v1/embeddings` - Generate embeddings
- `GET /v1/models` - List available models
- `GET /v1/models/{model_id}` - Get model information
## Supported Models

The proxy supports all watsonx.ai models available in your project, including:

### Chat Models

- IBM Granite models (3.x, 4.x series)
- Meta Llama models (3.x, 4.x series)
- Mistral models
- Other models available on watsonx.ai

### Embedding Models

- `ibm/slate-125m-english-rtrvr`
- `ibm/slate-30m-english-rtrvr`

See the `/v1/models` endpoint for the complete list.
## Authentication

### Proxy Authentication (Optional)

If you set `API_KEY` in your `.env` file, clients must provide it:

```python
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="your-proxy-api-key"
)
```
### IBM Cloud Authentication

The proxy handles IBM Cloud authentication automatically using your `IBM_CLOUD_API_KEY`. Bearer tokens are:

- Automatically obtained on startup
- Refreshed every 50 minutes (tokens expire after 60 minutes)
- Refreshed on 401 errors
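The refresh policy above can be sketched as a small token cache. This is an illustration, not the proxy's actual implementation; the fetcher and clock are injected so the logic can be exercised without IBM Cloud:

```python
import time

REFRESH_AFTER = 50 * 60  # refresh every 50 minutes; IAM tokens expire after 60

class TokenCache:
    """Caches a bearer token and refreshes it before it expires."""

    def __init__(self, fetch_token, clock=time.monotonic):
        self._fetch = fetch_token   # callable returning a fresh bearer token
        self._clock = clock
        self._token = None
        self._obtained_at = 0.0

    def get(self) -> str:
        if self._token is None or self._clock() - self._obtained_at >= REFRESH_AFTER:
            self._token = self._fetch()
            self._obtained_at = self._clock()
        return self._token

    def invalidate(self):
        """Call on a 401 response to force a refresh on the next request."""
        self._token = None

# Demo with a fake fetcher and a fake clock
now = [0.0]
tokens = iter(["tok-1", "tok-2"])
cache = TokenCache(lambda: next(tokens), clock=lambda: now[0])
print(cache.get())   # tok-1 (fetched on first use)
now[0] = 49 * 60
print(cache.get())   # tok-1 (still fresh)
now[0] = 51 * 60
print(cache.get())   # tok-2 (refreshed after the 50-minute window)
```

Injecting the clock also makes the 50-minute boundary easy to unit-test without waiting.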
## Deployment

### Docker (Recommended)

Create a `Dockerfile`:
```dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app ./app

# Credentials are supplied at runtime via --env-file;
# do not bake .env into the image.

EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Build and run:

```bash
docker build -t watsonx-openai-proxy .
docker run -p 8000:8000 --env-file .env watsonx-openai-proxy
```
### Production Deployment

For production, consider:

1. Use a production ASGI server: the included uvicorn is suitable, but configure workers:

   ```bash
   uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 4
   ```

2. Set up HTTPS: use a reverse proxy like nginx or Caddy
3. Configure CORS: set `ALLOWED_ORIGINS` to specific domains
4. Enable API key authentication: set `API_KEY` in the environment
5. Monitor logs: set `LOG_LEVEL=info` or `warning` in production
6. Use environment secrets: don't commit the `.env` file; use secret management
## Troubleshooting

### 401 Unauthorized

- Check that `IBM_CLOUD_API_KEY` is valid
- Verify your IBM Cloud account has watsonx.ai access
- Check server logs for token refresh errors

### Model Not Found

- Verify the model ID exists in watsonx.ai
- Check that your project has access to the model
- Use the `/v1/models` endpoint to see available models

### Connection Errors

- Verify `WATSONX_CLUSTER` matches your project's region
- Check firewall/network settings
- Ensure watsonx.ai services are accessible
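watsonx.ai regional endpoints generally follow the pattern `https://<region>.ml.cloud.ibm.com`, so a quick sanity check on `WATSONX_CLUSTER` can catch region typos early. A sketch (the helper name is illustrative, not the proxy's API):

```python
# Regions listed in the .env example above
VALID_CLUSTERS = {"us-south", "eu-de", "eu-gb", "jp-tok", "au-syd", "ca-tor"}

def watsonx_base_url(cluster: str) -> str:
    """Build the regional watsonx.ai endpoint, rejecting unknown regions early."""
    if cluster not in VALID_CLUSTERS:
        raise ValueError(
            f"Unknown WATSONX_CLUSTER {cluster!r}; expected one of {sorted(VALID_CLUSTERS)}"
        )
    return f"https://{cluster}.ml.cloud.ibm.com"

print(watsonx_base_url("us-south"))  # https://us-south.ml.cloud.ibm.com
```

Failing fast on an unknown region gives a clearer error than a generic connection timeout.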
### Streaming Issues

- Some models may not support streaming
- Check that your client library supports SSE (Server-Sent Events)
- Verify the network doesn't buffer streaming responses
## Development

### Running Tests

```bash
# Install dev dependencies
pip install pytest pytest-asyncio httpx

# Run tests
pytest tests/
```
Code Structure
watsonx-openai-proxy/
├── app/
│ ├── main.py # FastAPI application
│ ├── config.py # Configuration management
│ ├── routers/ # API endpoint routers
│ │ ├── chat.py # Chat completions
│ │ ├── completions.py # Text completions
│ │ ├── embeddings.py # Embeddings
│ │ └── models.py # Model listing
│ ├── services/ # Business logic
│ │ └── watsonx_service.py # watsonx.ai API client
│ ├── models/ # Pydantic models
│ │ └── openai_models.py # OpenAI-compatible schemas
│ └── utils/ # Utilities
│ └── transformers.py # Request/response transformers
├── tests/ # Test files
├── requirements.txt # Python dependencies
├── .env.example # Environment template
└── README.md # This file
## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
## License

Apache 2.0 License - see the LICENSE file for details.
## Related Projects

- watsonx-unofficial-aisdk-provider - Vercel AI SDK provider for watsonx.ai
- OpenCode watsonx plugin - Token management plugin for OpenCode

## Disclaimer

This is not an official IBM product. It is a community-maintained proxy for integrating watsonx.ai with OpenAI-compatible tools. watsonx.ai is a trademark of IBM.
## Support

For issues and questions:

- Check the Troubleshooting section
- Review server logs (set `LOG_LEVEL=debug` for detailed logs)
- Open an issue in the repository
- Consult the IBM watsonx.ai documentation