# watsonx-openai-proxy

OpenAI-compatible API proxy for IBM watsonx.ai. This proxy allows you to use watsonx.ai models with any tool or application that supports the OpenAI API format.

## Features

- ✅ **Full OpenAI API Compatibility**: Drop-in replacement for the OpenAI API
- ✅ **Chat Completions**: `/v1/chat/completions` with streaming support
- ✅ **Text Completions**: `/v1/completions` (legacy endpoint)
- ✅ **Embeddings**: `/v1/embeddings` for text embeddings
- ✅ **Model Listing**: `/v1/models` endpoint
- ✅ **Streaming Support**: Server-Sent Events (SSE) for real-time responses
- ✅ **Model Mapping**: Map OpenAI model names to watsonx models
- ✅ **Automatic Token Management**: Handles IBM Cloud authentication automatically
- ✅ **CORS Support**: Configurable cross-origin resource sharing
- ✅ **Optional API Key Authentication**: Secure your proxy with an API key

## Quick Start

### Prerequisites

- Python 3.9 or higher
- IBM Cloud account with watsonx.ai access
- IBM Cloud API key
- watsonx.ai Project ID

### Installation

1. Clone or download this directory:

   ```bash
   cd watsonx-openai-proxy
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Configure environment variables:

   ```bash
   cp .env.example .env
   # Edit .env with your credentials
   ```

4. Run the server:

   ```bash
   python -m app.main
   ```

   Or with uvicorn:

   ```bash
   uvicorn app.main:app --host 0.0.0.0 --port 8000
   ```

The server will start at `http://localhost:8000`.

## Configuration

### Environment Variables

Create a `.env` file with the following variables:

```bash
# Required: IBM Cloud Configuration
IBM_CLOUD_API_KEY=your_ibm_cloud_api_key_here
WATSONX_PROJECT_ID=your_watsonx_project_id_here
WATSONX_CLUSTER=us-south  # Options: us-south, eu-de, eu-gb, jp-tok, au-syd, ca-tor

# Optional: Server Configuration
HOST=0.0.0.0
PORT=8000
LOG_LEVEL=info

# Optional: API Key for Proxy Authentication
API_KEY=your_optional_api_key_for_proxy_authentication

# Optional: CORS Configuration
ALLOWED_ORIGINS=*  # Comma-separated list, or * for all origins

# Optional: Model Mapping
MODEL_MAP_GPT4=ibm/granite-4-h-small
MODEL_MAP_GPT35=ibm/granite-3-8b-instruct
MODEL_MAP_GPT4_TURBO=meta-llama/llama-3-3-70b-instruct
MODEL_MAP_TEXT_EMBEDDING_ADA_002=ibm/slate-125m-english-rtrvr
```

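The optional server variables can be resolved with simple defaults. A minimal sketch of that lookup (illustrative only; the proxy's actual `app/config.py` may differ):

```python
def load_settings(env: dict) -> dict:
    """Resolve the optional server variables with their documented defaults.

    `env` is passed in explicitly (e.g. dict(os.environ)) so the function
    stays pure and easy to test.
    """
    return {
        "host": env.get("HOST", "0.0.0.0"),
        "port": int(env.get("PORT", "8000")),
        "log_level": env.get("LOG_LEVEL", "info"),
        "api_key": env.get("API_KEY"),  # None means proxy auth is disabled
    }
```

With an empty environment, `load_settings({})` yields the defaults shown in the template above.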
### Model Mapping

You can map OpenAI model names to watsonx models using environment variables:

```bash
MODEL_MAP_<OPENAI_MODEL_NAME>=<WATSONX_MODEL_ID>
```

For example:

- `MODEL_MAP_GPT4=ibm/granite-4-h-small` maps `gpt-4` to `ibm/granite-4-h-small`
- `MODEL_MAP_GPT35_TURBO=ibm/granite-3-8b-instruct` maps `gpt-3.5-turbo` to `ibm/granite-3-8b-instruct`

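Conceptually, the mapping is a lookup with pass-through for unmapped names; a sketch (the proxy's actual resolution logic may differ):

```python
def resolve_model(requested: str, aliases: dict) -> str:
    """Return the watsonx model ID for an OpenAI-style name.

    Names with no mapping pass through unchanged, so native watsonx IDs
    like "ibm/granite-3-8b-instruct" keep working (an assumption about
    the proxy's behaviour, consistent with the usage examples below).
    """
    return aliases.get(requested, requested)

# Built from the MODEL_MAP_* examples above:
ALIASES = {
    "gpt-4": "ibm/granite-4-h-small",
    "gpt-3.5-turbo": "ibm/granite-3-8b-instruct",
}

resolve_model("gpt-4", ALIASES)  # -> "ibm/granite-4-h-small"
```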
## Usage

### With OpenAI Python SDK

```python
from openai import OpenAI

# Point the client at your proxy
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="your-proxy-api-key",  # Optional; only needed if you set API_KEY in .env
)

# Use as normal
response = client.chat.completions.create(
    model="ibm/granite-3-8b-instruct",  # Or use a mapped name like "gpt-4"
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
)

print(response.choices[0].message.content)
```

### With Streaming

```python
stream = client.chat.completions.create(
    model="ibm/granite-3-8b-instruct",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

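On the wire, each streamed line is an SSE `data:` record carrying a JSON chunk, terminated by a `data: [DONE]` sentinel (the OpenAI streaming convention). A minimal parser for one line might look like this:

```python
import json

def parse_sse_line(line: str):
    """Extract the delta text from one SSE line.

    Returns None for non-data lines (comments, keep-alives) and for the
    terminating "data: [DONE]" sentinel. Field names follow the OpenAI
    streaming chunk schema used in the example above.
    """
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")
```

Client libraries such as the OpenAI SDK do this for you; the sketch is only to show what the stream contains.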
### With cURL

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-proxy-api-key" \
  -d '{
    "model": "ibm/granite-3-8b-instruct",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```

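The request body in the cURL example is plain JSON, so it is easy to build programmatically when you are not using an SDK:

```python
import json

def chat_payload(model: str, user_message: str, stream: bool = False) -> str:
    """Serialize a chat-completions request body like the cURL example above."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": stream,
    })
```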
### Embeddings

```python
response = client.embeddings.create(
    model="ibm/slate-125m-english-rtrvr",
    input="Your text to embed",
)

print(response.data[0].embedding)
```

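A common downstream use of these vectors is similarity comparison, for example cosine similarity between two embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Identical vectors score 1.0; orthogonal vectors score 0.0.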
## Available Endpoints

- `GET /` - API information
- `GET /health` - Health check
- `GET /docs` - Interactive API documentation (Swagger UI)
- `POST /v1/chat/completions` - Chat completions
- `POST /v1/completions` - Text completions (legacy)
- `POST /v1/embeddings` - Generate embeddings
- `GET /v1/models` - List available models
- `GET /v1/models/{model_id}` - Get model information

## Supported Models

The proxy supports all watsonx.ai models available in your project, including:

### Chat Models

- IBM Granite models (3.x, 4.x series)
- Meta Llama models (3.x, 4.x series)
- Mistral models
- Other models available on watsonx.ai

### Embedding Models

- `ibm/slate-125m-english-rtrvr`
- `ibm/slate-30m-english-rtrvr`

See the `/v1/models` endpoint for the complete list.

## Authentication

### Proxy Authentication (Optional)

If you set `API_KEY` in your `.env` file, clients must provide it:

```python
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="your-proxy-api-key",
)
```

### IBM Cloud Authentication

The proxy handles IBM Cloud authentication automatically using your `IBM_CLOUD_API_KEY`. Bearer tokens are:

- Automatically obtained on startup
- Refreshed every 50 minutes (tokens expire after 60 minutes)
- Refreshed on 401 errors

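That refresh behaviour amounts to two small pieces: a schedule (refresh 10 minutes before the 60-minute expiry) and a retry-once-on-401 wrapper. A sketch with illustrative names, not the proxy's actual internals:

```python
TOKEN_TTL_S = 60 * 60       # IAM tokens expire after ~60 minutes
REFRESH_MARGIN_S = 10 * 60  # refresh 10 minutes early, i.e. every 50 minutes

def next_refresh_at(obtained_at: float) -> float:
    """When a token obtained at `obtained_at` (seconds) should be refreshed."""
    return obtained_at + TOKEN_TTL_S - REFRESH_MARGIN_S

def call_with_refresh(do_request, refresh_token):
    """Issue a request; on a 401, refresh the bearer token and retry once.

    `do_request` returns a (status, body) pair; `refresh_token` re-fetches
    the IAM bearer token. Both are stand-ins for the proxy's real calls.
    """
    status, body = do_request()
    if status == 401:
        refresh_token()
        status, body = do_request()
    return status, body
```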
## Deployment

Create a `Dockerfile`:

```dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app ./app

EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Build and run, passing credentials at runtime rather than baking `.env` into the image:

```bash
docker build -t watsonx-openai-proxy .
docker run -p 8000:8000 --env-file .env watsonx-openai-proxy
```

### Production Deployment

For production, consider:

1. **Use a production ASGI server**: The included uvicorn is suitable, but configure workers:

   ```bash
   uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 4
   ```

2. **Set up HTTPS**: Use a reverse proxy such as nginx or Caddy
3. **Configure CORS**: Set `ALLOWED_ORIGINS` to specific domains
4. **Enable API key authentication**: Set `API_KEY` in the environment
5. **Monitor logs**: Set `LOG_LEVEL=info` or `LOG_LEVEL=warning` in production
6. **Use environment secrets**: Don't commit the `.env` file; use secret management

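For the CORS point, the `ALLOWED_ORIGINS` convention (comma-separated list, or `*` for all) maps naturally onto the origin list FastAPI's `CORSMiddleware` accepts. A sketch, assuming that comma-separated format:

```python
def parse_allowed_origins(raw: str) -> list:
    """Turn an ALLOWED_ORIGINS value into an origin list.

    "*" allows all origins; otherwise split on commas and trim whitespace,
    dropping empty entries.
    """
    if raw.strip() == "*":
        return ["*"]
    return [origin.strip() for origin in raw.split(",") if origin.strip()]
```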
## Troubleshooting

### 401 Unauthorized

- Check that `IBM_CLOUD_API_KEY` is valid
- Verify your IBM Cloud account has watsonx.ai access
- Check server logs for token refresh errors

### Model Not Found

- Verify the model ID exists in watsonx.ai
- Check that your project has access to the model
- Use the `/v1/models` endpoint to see available models

### Connection Errors

- Verify `WATSONX_CLUSTER` matches your project's region
- Check firewall/network settings
- Ensure watsonx.ai services are reachable

### Streaming Issues

- Some models may not support streaming
- Check that your client library supports Server-Sent Events (SSE)
- Verify your network doesn't buffer streaming responses

## Development

### Running Tests

```bash
# Install dev dependencies
pip install pytest pytest-asyncio httpx

# Run tests
pytest tests/
```

### Code Structure

```
watsonx-openai-proxy/
├── app/
│   ├── main.py                # FastAPI application
│   ├── config.py              # Configuration management
│   ├── routers/               # API endpoint routers
│   │   ├── chat.py            # Chat completions
│   │   ├── completions.py     # Text completions
│   │   ├── embeddings.py      # Embeddings
│   │   └── models.py          # Model listing
│   ├── services/              # Business logic
│   │   └── watsonx_service.py # watsonx.ai API client
│   ├── models/                # Pydantic models
│   │   └── openai_models.py   # OpenAI-compatible schemas
│   └── utils/                 # Utilities
│       └── transformers.py    # Request/response transformers
├── tests/                     # Test files
├── requirements.txt           # Python dependencies
├── .env.example               # Environment template
└── README.md                  # This file
```

## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## License

Apache 2.0 License - See the LICENSE file for details.

## Related Projects

- [watsonx-unofficial-aisdk-provider](../wxai-provider/) - Vercel AI SDK provider for watsonx.ai
- [OpenCode watsonx plugin](../.opencode/plugins/) - Token management plugin for OpenCode

## Disclaimer

This is **not an official IBM product**. It is a community-maintained proxy for integrating watsonx.ai with OpenAI-compatible tools. watsonx.ai is a trademark of IBM.

## Support

For issues and questions:

- Check the [Troubleshooting](#troubleshooting) section
- Review server logs (set `LOG_LEVEL=debug` for detailed logs)
- Open an issue in the repository
- Consult the [IBM watsonx.ai documentation](https://www.ibm.com/docs/en/watsonx-as-a-service)