# watsonx-openai-proxy

OpenAI-compatible API proxy for IBM watsonx.ai. This proxy allows you to use watsonx.ai models with any tool or application that supports the OpenAI API format.

## Features

- ✅ **Full OpenAI API Compatibility**: Drop-in replacement for the OpenAI API
- ✅ **Chat Completions**: `/v1/chat/completions` with streaming support
- ✅ **Text Completions**: `/v1/completions` (legacy endpoint)
- ✅ **Embeddings**: `/v1/embeddings` for text embeddings
- ✅ **Model Listing**: `/v1/models` endpoint
- ✅ **Streaming Support**: Server-Sent Events (SSE) for real-time responses
- ✅ **Model Mapping**: Map OpenAI model names to watsonx models
- ✅ **Automatic Token Management**: Handles IBM Cloud authentication automatically
- ✅ **CORS Support**: Configurable cross-origin resource sharing
- ✅ **Optional API Key Authentication**: Secure your proxy with an API key

## Quick Start

### Prerequisites

- Python 3.9 or higher
- IBM Cloud account with watsonx.ai access
- IBM Cloud API key
- watsonx.ai Project ID

### Installation

1. Clone or download this directory:

   ```bash
   cd watsonx-openai-proxy
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Configure environment variables:

   ```bash
   cp .env.example .env
   # Edit .env with your credentials
   ```

4. Run the server:

   ```bash
   python -m app.main
   ```

   Or with uvicorn:

   ```bash
   uvicorn app.main:app --host 0.0.0.0 --port 8000
   ```

The server will start at `http://localhost:8000`.

## Configuration

### Environment Variables

Create a `.env` file with the following variables:

```bash
# Required: IBM Cloud Configuration
IBM_CLOUD_API_KEY=your_ibm_cloud_api_key_here
WATSONX_PROJECT_ID=your_watsonx_project_id_here
WATSONX_CLUSTER=us-south  # Options: us-south, eu-de, eu-gb, jp-tok, au-syd, ca-tor

# Optional: Server Configuration
HOST=0.0.0.0
PORT=8000
LOG_LEVEL=info

# Optional: API Key for Proxy Authentication
API_KEY=your_optional_api_key_for_proxy_authentication

# Optional: CORS Configuration
ALLOWED_ORIGINS=*  # Comma-separated list, or * for all origins

# Optional: Model Mapping
MODEL_MAP_GPT4=ibm/granite-4-h-small
MODEL_MAP_GPT35=ibm/granite-3-8b-instruct
MODEL_MAP_GPT4_TURBO=meta-llama/llama-3-3-70b-instruct
MODEL_MAP_TEXT_EMBEDDING_ADA_002=ibm/slate-125m-english-rtrvr
```

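The optional server variables can be resolved with simple defaults. A minimal sketch of that lookup (illustrative only; the proxy's actual `app/config.py` may differ):

```python
def load_settings(env: dict) -> dict:
    """Resolve the optional server variables with their documented defaults.

    `env` is passed in explicitly (e.g. dict(os.environ)) so the function
    stays pure and easy to test.
    """
    return {
        "host": env.get("HOST", "0.0.0.0"),
        "port": int(env.get("PORT", "8000")),
        "log_level": env.get("LOG_LEVEL", "info"),
        "api_key": env.get("API_KEY"),  # None means proxy auth is disabled
    }
```

With an empty environment, `load_settings({})` yields the defaults shown in the template above.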
### Model Mapping

You can map OpenAI model names to watsonx models using environment variables:

```bash
MODEL_MAP_<OPENAI_MODEL_NAME>=<WATSONX_MODEL_ID>
```

For example:

- `MODEL_MAP_GPT4=ibm/granite-4-h-small` maps `gpt-4` to `ibm/granite-4-h-small`
- `MODEL_MAP_GPT35_TURBO=ibm/granite-3-8b-instruct` maps `gpt-3.5-turbo` to `ibm/granite-3-8b-instruct`

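Conceptually, the mapping is a lookup with pass-through for unmapped names; a sketch (the proxy's actual resolution logic may differ):

```python
def resolve_model(requested: str, aliases: dict) -> str:
    """Return the watsonx model ID for an OpenAI-style name.

    Names with no mapping pass through unchanged, so native watsonx IDs
    like "ibm/granite-3-8b-instruct" keep working (an assumption about
    the proxy's behaviour, consistent with the usage examples below).
    """
    return aliases.get(requested, requested)

# Built from the MODEL_MAP_* examples above:
ALIASES = {
    "gpt-4": "ibm/granite-4-h-small",
    "gpt-3.5-turbo": "ibm/granite-3-8b-instruct",
}

resolve_model("gpt-4", ALIASES)  # -> "ibm/granite-4-h-small"
```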
## Usage

### With OpenAI Python SDK

```python
from openai import OpenAI

# Point the client at your proxy
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="your-proxy-api-key",  # Optional; only needed if you set API_KEY in .env
)

# Use as normal
response = client.chat.completions.create(
    model="ibm/granite-3-8b-instruct",  # Or use a mapped name like "gpt-4"
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
)

print(response.choices[0].message.content)
```

### With Streaming

```python
stream = client.chat.completions.create(
    model="ibm/granite-3-8b-instruct",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

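On the wire, each streamed line is an SSE `data:` record carrying a JSON chunk, terminated by a `data: [DONE]` sentinel (the OpenAI streaming convention). A minimal parser for one line might look like this:

```python
import json

def parse_sse_line(line: str):
    """Extract the delta text from one SSE line.

    Returns None for non-data lines (comments, keep-alives) and for the
    terminating "data: [DONE]" sentinel. Field names follow the OpenAI
    streaming chunk schema used in the example above.
    """
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")
```

Client libraries such as the OpenAI SDK do this for you; the sketch is only to show what the stream contains.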
### With cURL

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-proxy-api-key" \
  -d '{
    "model": "ibm/granite-3-8b-instruct",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```

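The request body in the cURL example is plain JSON, so it is easy to build programmatically when you are not using an SDK:

```python
import json

def chat_payload(model: str, user_message: str, stream: bool = False) -> str:
    """Serialize a chat-completions request body like the cURL example above."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": stream,
    })
```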
### Embeddings

```python
response = client.embeddings.create(
    model="ibm/slate-125m-english-rtrvr",
    input="Your text to embed",
)

print(response.data[0].embedding)
```

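A common downstream use of these vectors is similarity comparison, for example cosine similarity between two embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Identical vectors score 1.0; orthogonal vectors score 0.0.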
## Available Endpoints

- `GET /` - API information
- `GET /health` - Health check
- `GET /docs` - Interactive API documentation (Swagger UI)
- `POST /v1/chat/completions` - Chat completions
- `POST /v1/completions` - Text completions (legacy)
- `POST /v1/embeddings` - Generate embeddings
- `GET /v1/models` - List available models
- `GET /v1/models/{model_id}` - Get model information

## Supported Models

The proxy supports all watsonx.ai models available in your project, including:

### Chat Models

- IBM Granite models (3.x, 4.x series)
- Meta Llama models (3.x, 4.x series)
- Mistral models
- Other models available on watsonx.ai

### Embedding Models

- `ibm/slate-125m-english-rtrvr`
- `ibm/slate-30m-english-rtrvr`

See the `/v1/models` endpoint for the complete list.

## Authentication

### Proxy Authentication (Optional)

If you set `API_KEY` in your `.env` file, clients must provide it:

```python
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="your-proxy-api-key",
)
```

### IBM Cloud Authentication

The proxy handles IBM Cloud authentication automatically using your `IBM_CLOUD_API_KEY`. Bearer tokens are:

- Automatically obtained on startup
- Refreshed every 50 minutes (tokens expire after 60 minutes)
- Refreshed on 401 errors

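That refresh behaviour amounts to two small pieces: a schedule (refresh 10 minutes before the 60-minute expiry) and a retry-once-on-401 wrapper. A sketch with illustrative names, not the proxy's actual internals:

```python
TOKEN_TTL_S = 60 * 60       # IAM tokens expire after ~60 minutes
REFRESH_MARGIN_S = 10 * 60  # refresh 10 minutes early, i.e. every 50 minutes

def next_refresh_at(obtained_at: float) -> float:
    """When a token obtained at `obtained_at` (seconds) should be refreshed."""
    return obtained_at + TOKEN_TTL_S - REFRESH_MARGIN_S

def call_with_refresh(do_request, refresh_token):
    """Issue a request; on a 401, refresh the bearer token and retry once.

    `do_request` returns a (status, body) pair; `refresh_token` re-fetches
    the IAM bearer token. Both are stand-ins for the proxy's real calls.
    """
    status, body = do_request()
    if status == 401:
        refresh_token()
        status, body = do_request()
    return status, body
```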
## Deployment

Create a `Dockerfile`:

```dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app ./app

EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Build and run, passing credentials at runtime rather than baking `.env` into the image:

```bash
docker build -t watsonx-openai-proxy .
docker run -p 8000:8000 --env-file .env watsonx-openai-proxy
```

### Production Deployment

For production, consider:

1. **Use a production ASGI server**: The included uvicorn is suitable, but configure workers:

   ```bash
   uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 4
   ```

2. **Set up HTTPS**: Use a reverse proxy such as nginx or Caddy
3. **Configure CORS**: Set `ALLOWED_ORIGINS` to specific domains
4. **Enable API key authentication**: Set `API_KEY` in the environment
5. **Monitor logs**: Set `LOG_LEVEL=info` or `LOG_LEVEL=warning` in production
6. **Use environment secrets**: Don't commit the `.env` file; use secret management

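For the CORS point, the `ALLOWED_ORIGINS` convention (comma-separated list, or `*` for all) maps naturally onto the origin list FastAPI's `CORSMiddleware` accepts. A sketch, assuming that comma-separated format:

```python
def parse_allowed_origins(raw: str) -> list:
    """Turn an ALLOWED_ORIGINS value into an origin list.

    "*" allows all origins; otherwise split on commas and trim whitespace,
    dropping empty entries.
    """
    if raw.strip() == "*":
        return ["*"]
    return [origin.strip() for origin in raw.split(",") if origin.strip()]
```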
## Troubleshooting

### 401 Unauthorized

- Check that `IBM_CLOUD_API_KEY` is valid
- Verify your IBM Cloud account has watsonx.ai access
- Check server logs for token refresh errors

### Model Not Found

- Verify the model ID exists in watsonx.ai
- Check that your project has access to the model
- Use the `/v1/models` endpoint to see available models

### Connection Errors

- Verify `WATSONX_CLUSTER` matches your project's region
- Check firewall/network settings
- Ensure watsonx.ai services are reachable

### Streaming Issues

- Some models may not support streaming
- Check that your client library supports Server-Sent Events (SSE)
- Verify your network doesn't buffer streaming responses

## Development

### Running Tests

```bash
# Install dev dependencies
pip install pytest pytest-asyncio httpx

# Run tests
pytest tests/
```

### Code Structure

```
watsonx-openai-proxy/
├── app/
│   ├── main.py                # FastAPI application
│   ├── config.py              # Configuration management
│   ├── routers/               # API endpoint routers
│   │   ├── chat.py            # Chat completions
│   │   ├── completions.py     # Text completions
│   │   ├── embeddings.py      # Embeddings
│   │   └── models.py          # Model listing
│   ├── services/              # Business logic
│   │   └── watsonx_service.py # watsonx.ai API client
│   ├── models/                # Pydantic models
│   │   └── openai_models.py   # OpenAI-compatible schemas
│   └── utils/                 # Utilities
│       └── transformers.py    # Request/response transformers
├── tests/                     # Test files
├── requirements.txt           # Python dependencies
├── .env.example               # Environment template
└── README.md                  # This file
```

## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## License

Apache 2.0 License - See the LICENSE file for details.

## Related Projects

- [watsonx-unofficial-aisdk-provider](../wxai-provider/) - Vercel AI SDK provider for watsonx.ai
- [OpenCode watsonx plugin](../.opencode/plugins/) - Token management plugin for OpenCode

## Disclaimer

This is **not an official IBM product**. It is a community-maintained proxy for integrating watsonx.ai with OpenAI-compatible tools. watsonx.ai is a trademark of IBM.

## Support

For issues and questions:

- Check the [Troubleshooting](#troubleshooting) section
- Review server logs (set `LOG_LEVEL=debug` for detailed logs)
- Open an issue in the repository
- Consult the [IBM watsonx.ai documentation](https://www.ibm.com/docs/en/watsonx-as-a-service)