# watsonx-openai-proxy

OpenAI-compatible API proxy for IBM watsonx.ai. This proxy allows you to use watsonx.ai models with any tool or application that supports the OpenAI API format.

## Features

- ✅ **Full OpenAI API Compatibility**: Drop-in replacement for the OpenAI API
- ✅ **Chat Completions**: `/v1/chat/completions` with streaming support
- ✅ **Text Completions**: `/v1/completions` (legacy endpoint)
- ✅ **Embeddings**: `/v1/embeddings` for text embeddings
- ✅ **Model Listing**: `/v1/models` endpoint
- ✅ **Streaming Support**: Server-Sent Events (SSE) for real-time responses
- ✅ **Model Mapping**: Map OpenAI model names to watsonx models
- ✅ **Automatic Token Management**: Handles IBM Cloud authentication automatically
- ✅ **CORS Support**: Configurable cross-origin resource sharing
- ✅ **Optional API Key Authentication**: Secure your proxy with an API key

## Quick Start

### Prerequisites

- Python 3.9 or higher
- IBM Cloud account with watsonx.ai access
- IBM Cloud API key
- watsonx.ai Project ID

### Installation

1. Clone or download this directory:

   ```bash
   cd watsonx-openai-proxy
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Configure environment variables:

   ```bash
   cp .env.example .env
   # Edit .env with your credentials
   ```

4. Run the server:

   ```bash
   python -m app.main
   ```

   Or with uvicorn:

   ```bash
   uvicorn app.main:app --host 0.0.0.0 --port 8000
   ```

The server starts at `http://localhost:8000`.

## Configuration

### Environment Variables

Create a `.env` file with the following variables:

```bash
# Required: IBM Cloud Configuration
IBM_CLOUD_API_KEY=your_ibm_cloud_api_key_here
WATSONX_PROJECT_ID=your_watsonx_project_id_here
WATSONX_CLUSTER=us-south  # Options: us-south, eu-de, eu-gb, jp-tok, au-syd, ca-tor

# Optional: Server Configuration
HOST=0.0.0.0
PORT=8000
LOG_LEVEL=info

# Optional: API Key for Proxy Authentication
API_KEY=your_optional_api_key_for_proxy_authentication

# Optional: CORS Configuration
ALLOWED_ORIGINS=*  # Comma-separated list, or * for all

# Optional: Model Mapping
MODEL_MAP_GPT4=ibm/granite-4-h-small
MODEL_MAP_GPT35=ibm/granite-3-8b-instruct
MODEL_MAP_GPT4_TURBO=meta-llama/llama-3-3-70b-instruct
MODEL_MAP_TEXT_EMBEDDING_ADA_002=ibm/slate-125m-english-rtrvr
```

### Model Mapping

You can map OpenAI model names to watsonx models using environment variables:

```bash
MODEL_MAP_<OPENAI_MODEL_NAME>=<WATSONX_MODEL_ID>
```

For example:

- `MODEL_MAP_GPT4=ibm/granite-4-h-small` maps `gpt-4` to `ibm/granite-4-h-small`
- `MODEL_MAP_GPT35_TURBO=ibm/granite-3-8b-instruct` maps `gpt-3.5-turbo` to `ibm/granite-3-8b-instruct`

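The examples above imply that the proxy derives the OpenAI-side model name from the environment-variable suffix. The exact rule lives in the proxy's configuration code; the sketch below is one plausible illustration, and the `SUFFIX_ALIASES` table and `load_model_map` helper are hypothetical names, not the proxy's actual API:

```python
import os

# Hypothetical alias table: env-var suffix -> OpenAI model name.
# These entries just cover the suffixes shown in the examples above.
SUFFIX_ALIASES = {
    "GPT4": "gpt-4",
    "GPT4_TURBO": "gpt-4-turbo",
    "GPT35": "gpt-3.5",
    "GPT35_TURBO": "gpt-3.5-turbo",
    "TEXT_EMBEDDING_ADA_002": "text-embedding-ada-002",
}

def load_model_map(environ):
    """Collect MODEL_MAP_* variables into {openai_name: watsonx_model_id}."""
    mapping = {}
    prefix = "MODEL_MAP_"
    for key, value in environ.items():
        if not key.startswith(prefix):
            continue
        suffix = key[len(prefix):]
        # Fall back to a simple lowercase/hyphen normalization for
        # suffixes not covered by the alias table.
        openai_name = SUFFIX_ALIASES.get(suffix, suffix.lower().replace("_", "-"))
        mapping[openai_name] = value
    return mapping

# At startup the proxy would read the real environment:
MODEL_MAP = load_model_map(os.environ)
```

With this scheme, a request for `gpt-4` would be rewritten to whatever `MODEL_MAP_GPT4` names before being forwarded to watsonx.ai.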
## Usage

### With OpenAI Python SDK

```python
from openai import OpenAI

# Point to your proxy
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="your-proxy-api-key",  # Optional, if you set API_KEY in .env
)

# Use as normal
response = client.chat.completions.create(
    model="ibm/granite-3-8b-instruct",  # Or use a mapped name like "gpt-4"
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
)

print(response.choices[0].message.content)
```

### With Streaming

```python
stream = client.chat.completions.create(
    model="ibm/granite-3-8b-instruct",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

### With cURL

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-proxy-api-key" \
  -d '{
    "model": "ibm/granite-3-8b-instruct",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```

### Embeddings

```python
response = client.embeddings.create(
    model="ibm/slate-125m-english-rtrvr",
    input="Your text to embed",
)

print(response.data[0].embedding)
```

## Available Endpoints

- `GET /` - API information
- `GET /health` - Health check
- `GET /docs` - Interactive API documentation (Swagger UI)
- `POST /v1/chat/completions` - Chat completions
- `POST /v1/completions` - Text completions (legacy)
- `POST /v1/embeddings` - Generate embeddings
- `GET /v1/models` - List available models
- `GET /v1/models/{model_id}` - Get model information

## Supported Models

The proxy supports all watsonx.ai models available in your project, including:

### Chat Models

- IBM Granite models (3.x and 4.x series)
- Meta Llama models (3.x and 4.x series)
- Mistral models
- Other models available on watsonx.ai

### Embedding Models

- `ibm/slate-125m-english-rtrvr`
- `ibm/slate-30m-english-rtrvr`

See the `/v1/models` endpoint for the complete list.

## Authentication

### Proxy Authentication (Optional)

If you set `API_KEY` in your `.env` file, clients must provide it:

```python
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="your-proxy-api-key",
)
```

### IBM Cloud Authentication

The proxy handles IBM Cloud authentication automatically using your `IBM_CLOUD_API_KEY`. Bearer tokens are:

- Obtained automatically on startup
- Refreshed every 50 minutes (tokens expire after 60 minutes)
- Refreshed on 401 errors

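The refresh behaviour described above can be sketched as a small token manager. This is an illustrative sketch, not the proxy's actual implementation: in the real service, `fetch` would POST the API key to the IBM Cloud IAM endpoint (`https://iam.cloud.ibm.com/identity/token` with `grant_type=urn:ibm:params:oauth:grant-type:apikey`), which returns an `access_token` and `expires_in`:

```python
import time

IAM_URL = "https://iam.cloud.ibm.com/identity/token"
REFRESH_MARGIN = 600.0  # refresh 10 min early: ~50 min into a 60 min lifetime

class TokenManager:
    """Caches an IBM Cloud IAM bearer token and refreshes it before expiry.

    `fetch` is injected to keep the logic testable; it must return a
    (access_token, expires_in_seconds) tuple. In the proxy it would POST
    the API key to IAM_URL as described above.
    """

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get_token(self):
        # Refresh when no token is cached, or within REFRESH_MARGIN of expiry.
        if self._token is None or self._clock() >= self._expires_at - REFRESH_MARGIN:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = self._clock() + expires_in
        return self._token

    def invalidate(self):
        # Called on a 401 from watsonx.ai to force a fresh token.
        self._token = None
```

Keeping a safety margin means requests never go out with a token that is about to expire mid-flight.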
## Deployment

### Docker (Recommended)

Create a `Dockerfile`:

```dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app ./app

EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Note that the `.env` file is deliberately not copied into the image; credentials are supplied at run time instead. Build and run:

```bash
docker build -t watsonx-openai-proxy .
docker run -p 8000:8000 --env-file .env watsonx-openai-proxy
```

### Production Deployment

For production, consider:

1. **Use a production ASGI server**: The included uvicorn is suitable, but configure workers:

   ```bash
   uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 4
   ```

2. **Set up HTTPS**: Use a reverse proxy like nginx or Caddy
3. **Configure CORS**: Set `ALLOWED_ORIGINS` to specific domains
4. **Enable API key authentication**: Set `API_KEY` in the environment
5. **Monitor logs**: Set `LOG_LEVEL=info` or `warning` in production
6. **Use environment secrets**: Don't commit the `.env` file; use secret management

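For the nginx option in step 2, a minimal server block might look like the following (a sketch, assuming the proxy listens on `127.0.0.1:8000`; the domain is hypothetical and TLS certificate paths are omitted). `proxy_buffering off` matters for SSE streaming: with buffering on, streamed chunks may arrive all at once.

```nginx
server {
    listen 443 ssl;
    server_name proxy.example.com;  # hypothetical domain; use your own
    # ssl_certificate / ssl_certificate_key directives go here

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_buffering off;      # do not buffer SSE streaming responses
        proxy_read_timeout 300s;  # allow long-running generations
    }
}
```
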
## Troubleshooting

### 401 Unauthorized

- Check that `IBM_CLOUD_API_KEY` is valid
- Verify that your IBM Cloud account has watsonx.ai access
- Check server logs for token refresh errors

### Model Not Found

- Verify that the model ID exists in watsonx.ai
- Check that your project has access to the model
- Use the `/v1/models` endpoint to see available models

### Connection Errors

- Verify that `WATSONX_CLUSTER` matches your project's region
- Check firewall/network settings
- Ensure the watsonx.ai services are accessible

### Streaming Issues

- Some models may not support streaming
- Check that your client library supports SSE (Server-Sent Events)
- Verify that the network doesn't buffer streaming responses

## Development

### Running Tests

```bash
# Install dev dependencies
pip install pytest pytest-asyncio httpx

# Run tests
pytest tests/
```

### Code Structure

```
watsonx-openai-proxy/
├── app/
│   ├── main.py                 # FastAPI application
│   ├── config.py               # Configuration management
│   ├── routers/                # API endpoint routers
│   │   ├── chat.py             # Chat completions
│   │   ├── completions.py      # Text completions
│   │   ├── embeddings.py       # Embeddings
│   │   └── models.py           # Model listing
│   ├── services/               # Business logic
│   │   └── watsonx_service.py  # watsonx.ai API client
│   ├── models/                 # Pydantic models
│   │   └── openai_models.py    # OpenAI-compatible schemas
│   └── utils/                  # Utilities
│       └── transformers.py     # Request/response transformers
├── tests/                      # Test files
├── requirements.txt            # Python dependencies
├── .env.example                # Environment template
└── README.md                   # This file
```

## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## License

Apache 2.0 License - see the LICENSE file for details.

## Related Projects

- [watsonx-unofficial-aisdk-provider](../wxai-provider/) - Vercel AI SDK provider for watsonx.ai
- [OpenCode watsonx plugin](../.opencode/plugins/) - Token management plugin for OpenCode

## Disclaimer

This is **not an official IBM product**. It is a community-maintained proxy for integrating watsonx.ai with OpenAI-compatible tools. watsonx.ai is a trademark of IBM.

## Support

For issues and questions:

- Check the [Troubleshooting](#troubleshooting) section
- Review the server logs (set `LOG_LEVEL=debug` for detailed output)
- Open an issue in the repository
- Consult the [IBM watsonx.ai documentation](https://www.ibm.com/docs/en/watsonx-as-a-service)