Add AGENTS.md documentation for AI agent guidance
# AGENTS.md

This file provides guidance to agents working with code in this repository.

## Project Overview

**watsonx-openai-proxy** is an OpenAI-compatible API proxy for IBM watsonx.ai. It enables any tool or application that supports the OpenAI API format to work seamlessly with watsonx.ai models.

### Core Purpose

- Provide a drop-in replacement for OpenAI API endpoints
- Translate OpenAI API requests into watsonx.ai API calls
- Handle IBM Cloud authentication and token management automatically
- Support streaming responses via Server-Sent Events (SSE)
### Technology Stack

- **Framework**: FastAPI (async web framework)
- **Language**: Python 3.9+
- **HTTP Client**: httpx (async HTTP client)
- **Validation**: Pydantic v2 (data validation and settings)
- **Server**: uvicorn (ASGI server)
### Architecture

The codebase follows a clean, modular architecture:

```
app/
├── main.py       # FastAPI app initialization, middleware, lifespan management
├── config.py     # Settings management, model mapping, environment variables
├── routers/      # API endpoint handlers (chat, completions, embeddings, models)
├── services/     # Business logic (watsonx_service for API interactions)
├── models/       # Pydantic models for OpenAI-compatible schemas
└── utils/        # Helper functions (request/response transformers)
```
**Key Design Patterns**:

- **Service Layer**: `watsonx_service.py` encapsulates all watsonx.ai API interactions
- **Transformer Pattern**: `transformers.py` handles bidirectional conversion between OpenAI and watsonx formats
- **Singleton Services**: Global service instances (`watsonx_service`, `settings`) for shared state
- **Async/Await**: All I/O operations are asynchronous for better performance
- **Middleware**: Custom authentication middleware for optional API key validation
## Building and Running

### Prerequisites

```bash
# Python 3.9 or higher required
python --version

# IBM Cloud credentials needed:
# - IBM_CLOUD_API_KEY
# - WATSONX_PROJECT_ID
```
### Installation

```bash
# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your IBM Cloud credentials
```
### Running the Server

```bash
# Development (with auto-reload)
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

# Production (with workers)
uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 4

# Using the Python module
python -m app.main
```
### Docker Deployment

```bash
# Build the image
docker build -t watsonx-openai-proxy .

# Run the container
docker run -p 8000:8000 --env-file .env watsonx-openai-proxy

# Using docker-compose
docker-compose up
```
### Testing

```bash
# Install test dependencies
pip install pytest pytest-asyncio httpx

# Run tests
pytest tests/

# Run with coverage
pytest tests/ --cov=app
```
## Development Conventions

### Code Style

- **Async First**: Use `async`/`await` for all I/O operations (HTTP requests, file operations)
- **Type Hints**: All functions should have type annotations for parameters and return values
- **Docstrings**: Use Google-style docstrings for functions and classes
- **Logging**: Use the `logging` module with appropriate log levels (info, warning, error)
### Error Handling

- Catch exceptions at the router level and return OpenAI-compatible error responses
- Use `HTTPException` with proper status codes and error details
- Log errors with full context using `logger.error(..., exc_info=True)`
- Return structured error responses matching OpenAI's error format
### Configuration Management

- All configuration via environment variables (`.env` file)
- Use `pydantic-settings` for type-safe configuration
- Model mapping via `MODEL_MAP_*` environment variables
- Settings accessed through the global `settings` instance
### Token Management

- Bearer tokens are refreshed automatically every 50 minutes (they expire at 60 minutes)
- Tokens are also refreshed on 401 errors from watsonx.ai
- Refreshes are serialized with `asyncio.Lock` (coroutine-safe), so concurrent requests trigger only one refresh
- The initial token is obtained during application startup
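The refresh logic can be sketched as below. `TokenManager` is an illustrative stand-in for the logic inside `watsonx_service.py`, and the IAM call is stubbed so the sketch is self-contained.

```python
import asyncio
import time
from typing import Optional

TOKEN_TTL_SECONDS = 50 * 60  # refresh 10 minutes before the 60-minute expiry


class TokenManager:
    """Illustrative coroutine-safe IAM token cache."""

    def __init__(self) -> None:
        self._token: Optional[str] = None
        self._fetched_at: float = 0.0
        self._lock = asyncio.Lock()

    async def _fetch_token(self) -> str:
        # Real code POSTs the IBM Cloud API key to the IAM token endpoint;
        # stubbed here so the sketch runs standalone.
        return "bearer-token-%d" % int(time.time())

    async def get_token(self, force: bool = False) -> str:
        async with self._lock:  # only one coroutine refreshes at a time
            expired = time.monotonic() - self._fetched_at > TOKEN_TTL_SECONDS
            if force or self._token is None or expired:
                self._token = await self._fetch_token()
                self._fetched_at = time.monotonic()
            return self._token
```

On a 401 from watsonx.ai, a caller would invoke `get_token(force=True)` and retry the request once.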
### API Compatibility

- Maintain strict OpenAI API compatibility in request/response formats
- Use Pydantic models from `openai_models.py` for validation
- Transform requests/responses using the functions in `transformers.py`
- Support both streaming and non-streaming responses
### Adding New Endpoints

1. Create a router in `app/routers/` (e.g., `new_endpoint.py`)
2. Define Pydantic models in `app/models/openai_models.py`
3. Add transformation logic in `app/utils/transformers.py`
4. Add a watsonx.ai API method in `app/services/watsonx_service.py`
5. Register the router in `app/main.py` using `app.include_router()`
### Streaming Responses

- Use `StreamingResponse` with `media_type="text/event-stream"`
- Format chunks as Server-Sent Events using `format_sse_event()`
- Always send a `[DONE]` message at the end of the stream
- Handle errors gracefully and send error events in SSE format
### Model Mapping

- Map OpenAI model names to watsonx models via environment variables
- Format: `MODEL_MAP_<OPENAI_MODEL>=<WATSONX_MODEL_ID>`
- Example: `MODEL_MAP_GPT4=ibm/granite-4-h-small`
- Mapping is applied in `settings.map_model()` before API calls
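The lookup can be sketched as a standalone function. The real implementation lives on `Settings`; the dash/dot normalization shown here is an assumption about how model names map to env-var keys.

```python
import os


def map_model(openai_model: str) -> str:
    """Resolve an OpenAI model name via MODEL_MAP_* environment variables.

    Illustrative stand-in for settings.map_model() in app/config.py.
    """
    # Assumption: "gpt-4.1" -> "MODEL_MAP_GPT_4_1" style key normalization.
    key = "MODEL_MAP_" + openai_model.upper().replace("-", "_").replace(".", "_")
    # Fall back to the name unchanged if no mapping is configured.
    return os.environ.get(key, openai_model)
```

With `MODEL_MAP_GPT4=ibm/granite-4-h-small` set, `map_model("gpt4")` resolves to the granite model ID.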
### Security Considerations

- Optional API key authentication via the `API_KEY` environment variable
- Middleware validates the Bearer token in the Authorization header
- The IBM Cloud API key is stored in environment variables, never in code
- CORS configured via `ALLOWED_ORIGINS` (default: `*`)
### Logging Best Practices

- Use structured logging with context (model names, request IDs)
- Log level controlled by the `LOG_LEVEL` environment variable
- Log token refresh events at INFO level
- Log API errors at ERROR level with the full traceback
- Include request/response details for debugging
### Dependencies

- Keep `requirements.txt` minimal and pinned to specific versions
- FastAPI and Pydantic are core dependencies; avoid breaking changes when upgrading them
- Use httpx for async HTTP; prefer it over requests or aiohttp
- Use `uvicorn[standard]` for a production-ready server
## Important Implementation Notes

### watsonx.ai API Specifics

- Base URL format: `https://{cluster}.ml.cloud.ibm.com/ml/v1`
- API version parameter: `version=2024-02-13` (required on all requests)
- Chat endpoint: `/text/chat` (non-streaming) or `/text/chat_stream` (streaming)
- Text generation: `/text/generation`
- Embeddings: `/text/embeddings`
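Putting the base URL and version parameter together, a request URL can be composed like this (helper name illustrative; format taken from the bullets above):

```python
def watsonx_url(cluster: str, path: str, version: str = "2024-02-13") -> str:
    """Compose a watsonx.ai endpoint URL with the required version parameter."""
    return f"https://{cluster}.ml.cloud.ibm.com/ml/v1{path}?version={version}"
```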
### Request/Response Transformation

- OpenAI messages → watsonx messages: direct mapping of role/content
- watsonx responses → OpenAI format: extract choices, usage, and metadata
- Streaming chunks: parse the SSE format and transform delta objects
- Generate unique IDs: `chatcmpl-{uuid}` for chat, `cmpl-{uuid}` for completions
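ID generation can be sketched as follows (helper name illustrative; the prefixes come from the bullet above):

```python
import uuid


def completion_id(kind: str = "chat") -> str:
    """Generate OpenAI-style IDs: chatcmpl-{uuid} for chat, cmpl-{uuid} otherwise."""
    prefix = "chatcmpl" if kind == "chat" else "cmpl"
    return f"{prefix}-{uuid.uuid4().hex}"
```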
### Common Pitfalls

- Don't forget to refresh tokens before they expire (50-minute interval)
- Always close the httpx client on shutdown (`await watsonx_service.close()`)
- Handle both string and list formats for the `stop` parameter
- Validate that model IDs exist in watsonx.ai before making requests
- Set appropriate timeouts for long-running generation requests (300s default)
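Normalizing the `stop` parameter can be sketched as (helper name illustrative):

```python
from typing import List, Optional, Union


def normalize_stop(stop: Optional[Union[str, List[str]]]) -> List[str]:
    """OpenAI accepts `stop` as a string or a list; hand watsonx a list either way."""
    if stop is None:
        return []
    if isinstance(stop, str):
        return [stop]
    return list(stop)
```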
### Performance Optimization

- Reuse the httpx client instance (don't create one per request)
- Use connection pooling (httpx's default behavior)
- Consider worker processes for production (`--workers 4`)
- Monitor token refresh to avoid rate limiting
## Environment Variables Reference

### Required

- `IBM_CLOUD_API_KEY`: IBM Cloud API key for authentication
- `WATSONX_PROJECT_ID`: watsonx.ai project ID

### Optional

- `WATSONX_CLUSTER`: Region (default: `us-south`)
- `HOST`: Server host (default: `0.0.0.0`)
- `PORT`: Server port (default: `8000`)
- `LOG_LEVEL`: Logging level (default: `info`)
- `API_KEY`: Optional proxy authentication key
- `ALLOWED_ORIGINS`: CORS origins (default: `*`)
- `MODEL_MAP_*`: Model name mappings
## API Endpoints

- `GET /` - API information and available endpoints
- `GET /health` - Health check (bypasses authentication)
- `GET /docs` - Interactive Swagger UI documentation
- `POST /v1/chat/completions` - Chat completions (streaming supported)
- `POST /v1/completions` - Text completions (legacy)
- `POST /v1/embeddings` - Generate embeddings
- `GET /v1/models` - List available models
- `GET /v1/models/{model_id}` - Get specific model info