Add AGENTS.md documentation for AI agent guidance

2026-02-23 09:59:52 -05:00
commit 2e2b817435
21 changed files with 2513 additions and 0 deletions

# watsonx-openai-proxy
OpenAI-compatible API proxy for IBM watsonx.ai. This proxy allows you to use watsonx.ai models with any tool or application that supports the OpenAI API format.
## Features
- **Full OpenAI API Compatibility**: Drop-in replacement for OpenAI API
- **Chat Completions**: `/v1/chat/completions` with streaming support
- **Text Completions**: `/v1/completions` (legacy endpoint)
- **Embeddings**: `/v1/embeddings` for text embeddings
- **Model Listing**: `/v1/models` endpoint
- **Streaming Support**: Server-Sent Events (SSE) for real-time responses
- **Model Mapping**: Map OpenAI model names to watsonx models
- **Automatic Token Management**: Handles IBM Cloud authentication automatically
- **CORS Support**: Configurable cross-origin resource sharing
- **Optional API Key Authentication**: Secure your proxy with an API key
## Quick Start
### Prerequisites
- Python 3.9 or higher
- IBM Cloud account with watsonx.ai access
- IBM Cloud API key
- watsonx.ai Project ID
### Installation
1. Clone or download this directory:
```bash
cd watsonx-openai-proxy
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Configure environment variables:
```bash
cp .env.example .env
# Edit .env with your credentials
```
4. Run the server:
```bash
python -m app.main
```
Or with uvicorn:
```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000
```
The server will start at `http://localhost:8000`.
## Configuration
### Environment Variables
Create a `.env` file with the following variables:
```bash
# Required: IBM Cloud Configuration
IBM_CLOUD_API_KEY=your_ibm_cloud_api_key_here
WATSONX_PROJECT_ID=your_watsonx_project_id_here
WATSONX_CLUSTER=us-south # Options: us-south, eu-de, eu-gb, jp-tok, au-syd, ca-tor
# Optional: Server Configuration
HOST=0.0.0.0
PORT=8000
LOG_LEVEL=info
# Optional: API Key for Proxy Authentication
API_KEY=your_optional_api_key_for_proxy_authentication
# Optional: CORS Configuration
ALLOWED_ORIGINS=* # Comma-separated or * for all
# Optional: Model Mapping
MODEL_MAP_GPT4=ibm/granite-4-h-small
MODEL_MAP_GPT35=ibm/granite-3-8b-instruct
MODEL_MAP_GPT4_TURBO=meta-llama/llama-3-3-70b-instruct
MODEL_MAP_TEXT_EMBEDDING_ADA_002=ibm/slate-125m-english-rtrvr
```
### Model Mapping
You can map OpenAI model names to watsonx models using environment variables:
```bash
MODEL_MAP_<OPENAI_MODEL_NAME>=<WATSONX_MODEL_ID>
```
For example:
- `MODEL_MAP_GPT4=ibm/granite-4-h-small` maps `gpt-4` to `ibm/granite-4-h-small`
- `MODEL_MAP_GPT35_TURBO=ibm/granite-3-8b-instruct` maps `gpt-3.5-turbo` to `ibm/granite-3-8b-instruct`
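The examples above suggest the env-var suffix and the OpenAI model name match once punctuation is ignored (`gpt-3.5-turbo` ↔ `GPT35_TURBO`). As an illustration only — `resolve_model` and `_canon` are hypothetical names, not the proxy's actual internals — a resolver could normalize both sides by stripping non-alphanumeric characters:

```python
import os
import re
from typing import Mapping, Optional

PREFIX = "MODEL_MAP_"

def _canon(name: str) -> str:
    # "gpt-3.5-turbo" -> "GPT35TURBO"; "GPT35_TURBO" -> "GPT35TURBO"
    return re.sub(r"[^A-Za-z0-9]", "", name).upper()

def resolve_model(requested: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the mapped watsonx model ID, or the name unchanged if unmapped."""
    env = dict(os.environ) if env is None else env
    wanted = _canon(requested)
    for key, value in env.items():
        if key.startswith(PREFIX) and _canon(key[len(PREFIX):]) == wanted:
            return value
    return requested
```

With this scheme, requesting a native watsonx model ID passes through untouched, so mapped and unmapped names can coexist.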
## Usage
### With OpenAI Python SDK
```python
from openai import OpenAI
# Point to your proxy
client = OpenAI(
base_url="http://localhost:8000/v1",
api_key="your-proxy-api-key" # Optional, if you set API_KEY in .env
)
# Use as normal
response = client.chat.completions.create(
model="ibm/granite-3-8b-instruct", # Or use mapped name like "gpt-4"
messages=[
{"role": "user", "content": "Hello, how are you?"}
]
)
print(response.choices[0].message.content)
```
### With Streaming
```python
stream = client.chat.completions.create(
model="ibm/granite-3-8b-instruct",
messages=[{"role": "user", "content": "Tell me a story"}],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="")
```
### With cURL
```bash
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-proxy-api-key" \
-d '{
"model": "ibm/granite-3-8b-instruct",
"messages": [
{"role": "user", "content": "Hello!"}
]
}'
```
### Embeddings
```python
response = client.embeddings.create(
model="ibm/slate-125m-english-rtrvr",
input="Your text to embed"
)
print(response.data[0].embedding)
```
## Available Endpoints
- `GET /` - API information
- `GET /health` - Health check
- `GET /docs` - Interactive API documentation (Swagger UI)
- `POST /v1/chat/completions` - Chat completions
- `POST /v1/completions` - Text completions (legacy)
- `POST /v1/embeddings` - Generate embeddings
- `GET /v1/models` - List available models
- `GET /v1/models/{model_id}` - Get model information
## Supported Models
The proxy supports all watsonx.ai models available in your project, including:
### Chat Models
- IBM Granite models (3.x, 4.x series)
- Meta Llama models (3.x, 4.x series)
- Mistral models
- Other models available on watsonx.ai
### Embedding Models
- `ibm/slate-125m-english-rtrvr`
- `ibm/slate-30m-english-rtrvr`
See the `/v1/models` endpoint for the complete list.
## Authentication
### Proxy Authentication (Optional)
If you set `API_KEY` in your `.env` file, clients must provide it:
```python
client = OpenAI(
base_url="http://localhost:8000/v1",
api_key="your-proxy-api-key"
)
```
### IBM Cloud Authentication
The proxy handles IBM Cloud authentication automatically using your `IBM_CLOUD_API_KEY`. Bearer tokens are:
- Automatically obtained on startup
- Refreshed every 50 minutes (tokens expire after 60 minutes)
- Refreshed on 401 errors
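The refresh schedule above amounts to a token cache that renews ~10 minutes before expiry. A minimal sketch of that bookkeeping (the class and method names are illustrative, not the proxy's internal API; the actual fetch is a POST to `https://iam.cloud.ibm.com/identity/token` with `grant_type=urn:ibm:params:oauth:grant-type:apikey`, which is left to the caller):

```python
import time

class IAMTokenCache:
    """Tracks an IBM Cloud IAM bearer token and reports when to renew it.

    Renewing REFRESH_MARGIN seconds early matches the policy of refreshing
    50 minutes into a 60-minute token lifetime.
    """

    REFRESH_MARGIN = 10 * 60  # seconds before expiry at which to renew

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable for testing
        self._token = None
        self._expires_at = 0.0

    def store(self, token: str, expires_in: int) -> None:
        """Record a freshly fetched token and its lifetime in seconds."""
        self._token = token
        self._expires_at = self._clock() + expires_in

    def needs_refresh(self) -> bool:
        """True when no token is cached or the margin has been reached."""
        return (self._token is None
                or self._clock() >= self._expires_at - self.REFRESH_MARGIN)

    @property
    def token(self):
        return self._token
```

A 401 from watsonx.ai would simply trigger `store` again with a new token, which covers the third bullet above.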
## Deployment
### Docker (Recommended)
Create a `Dockerfile`:
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app ./app
# Do not COPY .env into the image; pass credentials at runtime (see --env-file below)
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```
Build and run:
```bash
docker build -t watsonx-openai-proxy .
docker run -p 8000:8000 --env-file .env watsonx-openai-proxy
```
### Production Deployment
For production, consider:
1. **Use a production ASGI server**: uvicorn is suitable for production when run with multiple workers:
```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 4
```
2. **Set up HTTPS**: Use a reverse proxy like nginx or Caddy
3. **Configure CORS**: Set `ALLOWED_ORIGINS` to specific domains
4. **Enable API key authentication**: Set `API_KEY` in environment
5. **Monitor logs**: Set `LOG_LEVEL=info` or `warning` in production
6. **Use environment secrets**: Don't commit `.env` file, use secret management
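For step 2, a minimal nginx sketch that terminates TLS in front of the proxy (the server name and certificate paths are placeholders). The `proxy_buffering off` directive matters for streaming: without it, nginx buffers SSE chunks and clients see the response arrive all at once:

```nginx
server {
    listen 443 ssl;
    server_name proxy.example.com;

    ssl_certificate     /etc/ssl/certs/proxy.pem;
    ssl_certificate_key /etc/ssl/private/proxy.key;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Required for SSE streaming: pass chunks through immediately
        proxy_buffering off;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_read_timeout 300s;
    }
}
```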
## Troubleshooting
### 401 Unauthorized
- Check that `IBM_CLOUD_API_KEY` is valid
- Verify your IBM Cloud account has watsonx.ai access
- Check server logs for token refresh errors
### Model Not Found
- Verify the model ID exists in watsonx.ai
- Check that your project has access to the model
- Use `/v1/models` endpoint to see available models
### Connection Errors
- Verify `WATSONX_CLUSTER` matches your project's region
- Check firewall/network settings
- Ensure watsonx.ai services are accessible
### Streaming Issues
- Some models may not support streaming
- Check client library supports SSE (Server-Sent Events)
- Verify network doesn't buffer streaming responses
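When a client library misbehaves, it can help to inspect the raw wire format directly: each streamed event is a `data:` line carrying one JSON chunk, and the stream ends with a `data: [DONE]` sentinel. A minimal parser sketch (illustrative, not part of the proxy):

```python
import json

def parse_sse_chunks(raw: str):
    """Yield the JSON payload of each `data:` line in an SSE stream,
    stopping at the OpenAI-style `[DONE]` sentinel."""
    for line in raw.splitlines():
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)
```

If the chunks parse cleanly here but your client never sees them incrementally, suspect a buffering proxy between the client and the server.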
## Development
### Running Tests
```bash
# Install dev dependencies
pip install pytest pytest-asyncio httpx
# Run tests
pytest tests/
```
### Code Structure
```
watsonx-openai-proxy/
├── app/
│ ├── main.py # FastAPI application
│ ├── config.py # Configuration management
│ ├── routers/ # API endpoint routers
│ │ ├── chat.py # Chat completions
│ │ ├── completions.py # Text completions
│ │ ├── embeddings.py # Embeddings
│ │ └── models.py # Model listing
│ ├── services/ # Business logic
│ │ └── watsonx_service.py # watsonx.ai API client
│ ├── models/ # Pydantic models
│ │ └── openai_models.py # OpenAI-compatible schemas
│ └── utils/ # Utilities
│ └── transformers.py # Request/response transformers
├── tests/ # Test files
├── requirements.txt # Python dependencies
├── .env.example # Environment template
└── README.md # This file
```
## Contributing
Contributions are welcome! Please:
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
## License
Apache 2.0 License - See LICENSE file for details.
## Related Projects
- [watsonx-unofficial-aisdk-provider](../wxai-provider/) - Vercel AI SDK provider for watsonx.ai
- [OpenCode watsonx plugin](../.opencode/plugins/) - Token management plugin for OpenCode
## Disclaimer
This is **not an official IBM product**. It's a community-maintained proxy for integrating watsonx.ai with OpenAI-compatible tools. watsonx.ai is a trademark of IBM.
## Support
For issues and questions:
- Check the [Troubleshooting](#troubleshooting) section
- Review server logs (`LOG_LEVEL=debug` for detailed logs)
- Open an issue in the repository
- Consult [IBM watsonx.ai documentation](https://www.ibm.com/docs/en/watsonx-as-a-service)