Add AGENTS.md documentation for AI agent guidance
# AGENTS.md

This file provides guidance to agents working with code in this repository.

## Project Overview

**watsonx-openai-proxy** is an OpenAI-compatible API proxy for IBM watsonx.ai. It enables any tool or application that supports the OpenAI API format to work seamlessly with watsonx.ai models.

### Core Purpose

- Provide a drop-in replacement for OpenAI API endpoints
- Translate OpenAI API requests into watsonx.ai API calls
- Handle IBM Cloud authentication and token management automatically
- Support streaming responses via Server-Sent Events (SSE)
### Technology Stack

- **Framework**: FastAPI (async web framework)
- **Language**: Python 3.9+
- **HTTP Client**: httpx (async HTTP client)
- **Validation**: Pydantic v2 (data validation and settings)
- **Server**: uvicorn (ASGI server)
### Architecture

The codebase follows a clean, modular architecture:

```
app/
├── main.py       # FastAPI app initialization, middleware, lifespan management
├── config.py     # Settings management, model mapping, environment variables
├── routers/      # API endpoint handlers (chat, completions, embeddings, models)
├── services/     # Business logic (watsonx_service for API interactions)
├── models/       # Pydantic models for OpenAI-compatible schemas
└── utils/        # Helper functions (request/response transformers)
```
**Key Design Patterns**:

- **Service Layer**: `watsonx_service.py` encapsulates all watsonx.ai API interactions
- **Transformer Pattern**: `transformers.py` handles bidirectional conversion between OpenAI and watsonx formats
- **Singleton Services**: Global service instances (`watsonx_service`, `settings`) for shared state
- **Async/Await**: All I/O operations are asynchronous for better performance
- **Middleware**: Custom authentication middleware for optional API key validation
## Building and Running

### Prerequisites

```bash
# Python 3.9 or higher required
python --version

# IBM Cloud credentials needed:
# - IBM_CLOUD_API_KEY
# - WATSONX_PROJECT_ID
```
### Installation

```bash
# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your IBM Cloud credentials
```
### Running the Server

```bash
# Development (with auto-reload)
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

# Production (with workers)
uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 4

# Using the Python module
python -m app.main
```
### Docker Deployment

```bash
# Build the image
docker build -t watsonx-openai-proxy .

# Run the container
docker run -p 8000:8000 --env-file .env watsonx-openai-proxy

# Using docker-compose
docker-compose up
```
### Testing

```bash
# Install test dependencies
pip install pytest pytest-asyncio httpx

# Run tests
pytest tests/

# Run with coverage
pytest tests/ --cov=app
```
## Development Conventions

### Code Style

- **Async First**: Use `async`/`await` for all I/O operations (HTTP requests, file operations)
- **Type Hints**: All functions should have type annotations for parameters and return values
- **Docstrings**: Use Google-style docstrings for functions and classes
- **Logging**: Use the `logging` module with appropriate log levels (info, warning, error)
### Error Handling

- Catch exceptions at the router level and return OpenAI-compatible error responses
- Use `HTTPException` with proper status codes and error details
- Log errors with full context using `logger.error(..., exc_info=True)`
- Return structured error responses matching OpenAI's error format
### Configuration Management

- All configuration via environment variables (`.env` file)
- Use `pydantic-settings` for type-safe configuration
- Model mapping via `MODEL_MAP_*` environment variables
- Settings accessed through the global `settings` instance
### Token Management

- Bearer tokens are refreshed automatically every 50 minutes (they expire at 60 minutes)
- Tokens are also refreshed on 401 errors from watsonx.ai
- Refreshes are serialized with `asyncio.Lock` (coroutine-safe), so concurrent requests trigger only one refresh
- The initial token is obtained during application startup
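The refresh logic can be sketched as below. `TokenManager` is an illustrative stand-in for the logic inside `watsonx_service.py`, and the IAM call is stubbed so the sketch is self-contained.

```python
import asyncio
import time
from typing import Optional

TOKEN_TTL_SECONDS = 50 * 60  # refresh 10 minutes before the 60-minute expiry


class TokenManager:
    """Illustrative coroutine-safe IAM token cache."""

    def __init__(self) -> None:
        self._token: Optional[str] = None
        self._fetched_at: float = 0.0
        self._lock = asyncio.Lock()

    async def _fetch_token(self) -> str:
        # Real code POSTs the IBM Cloud API key to the IAM token endpoint;
        # stubbed here so the sketch runs standalone.
        return "bearer-token-%d" % int(time.time())

    async def get_token(self, force: bool = False) -> str:
        async with self._lock:  # only one coroutine refreshes at a time
            expired = time.monotonic() - self._fetched_at > TOKEN_TTL_SECONDS
            if force or self._token is None or expired:
                self._token = await self._fetch_token()
                self._fetched_at = time.monotonic()
            return self._token
```

On a 401 from watsonx.ai, a caller would invoke `get_token(force=True)` and retry the request once.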
### API Compatibility

- Maintain strict OpenAI API compatibility in request/response formats
- Use Pydantic models from `openai_models.py` for validation
- Transform requests/responses using the functions in `transformers.py`
- Support both streaming and non-streaming responses
### Adding New Endpoints

1. Create a router in `app/routers/` (e.g., `new_endpoint.py`)
2. Define Pydantic models in `app/models/openai_models.py`
3. Add transformation logic in `app/utils/transformers.py`
4. Add a watsonx.ai API method in `app/services/watsonx_service.py`
5. Register the router in `app/main.py` using `app.include_router()`
### Streaming Responses

- Use `StreamingResponse` with `media_type="text/event-stream"`
- Format chunks as Server-Sent Events using `format_sse_event()`
- Always send a `[DONE]` message at the end of the stream
- Handle errors gracefully and send error events in SSE format
### Model Mapping

- Map OpenAI model names to watsonx models via environment variables
- Format: `MODEL_MAP_<OPENAI_MODEL>=<WATSONX_MODEL_ID>`
- Example: `MODEL_MAP_GPT4=ibm/granite-4-h-small`
- Mapping is applied in `settings.map_model()` before API calls
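The lookup can be sketched as a standalone function. The real implementation lives on `Settings`; the dash/dot normalization shown here is an assumption about how model names map to env-var keys.

```python
import os


def map_model(openai_model: str) -> str:
    """Resolve an OpenAI model name via MODEL_MAP_* environment variables.

    Illustrative stand-in for settings.map_model() in app/config.py.
    """
    # Assumption: "gpt-4.1" -> "MODEL_MAP_GPT_4_1" style key normalization.
    key = "MODEL_MAP_" + openai_model.upper().replace("-", "_").replace(".", "_")
    # Fall back to the name unchanged if no mapping is configured.
    return os.environ.get(key, openai_model)
```

With `MODEL_MAP_GPT4=ibm/granite-4-h-small` set, `map_model("gpt4")` resolves to the granite model ID.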
### Security Considerations

- Optional API key authentication via the `API_KEY` environment variable
- Middleware validates the Bearer token in the Authorization header
- The IBM Cloud API key is stored in environment variables, never in code
- CORS configured via `ALLOWED_ORIGINS` (default: `*`)
### Logging Best Practices

- Use structured logging with context (model names, request IDs)
- Log level controlled by the `LOG_LEVEL` environment variable
- Log token refresh events at INFO level
- Log API errors at ERROR level with the full traceback
- Include request/response details for debugging
### Dependencies

- Keep `requirements.txt` minimal and pinned to specific versions
- FastAPI and Pydantic are core dependencies; avoid breaking changes when upgrading them
- Use httpx for async HTTP; prefer it over requests or aiohttp
- Use `uvicorn[standard]` for a production-ready server
## Important Implementation Notes

### watsonx.ai API Specifics

- Base URL format: `https://{cluster}.ml.cloud.ibm.com/ml/v1`
- API version parameter: `version=2024-02-13` (required on all requests)
- Chat endpoint: `/text/chat` (non-streaming) or `/text/chat_stream` (streaming)
- Text generation: `/text/generation`
- Embeddings: `/text/embeddings`
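Putting the base URL and version parameter together, a request URL can be composed like this (helper name illustrative; format taken from the bullets above):

```python
def watsonx_url(cluster: str, path: str, version: str = "2024-02-13") -> str:
    """Compose a watsonx.ai endpoint URL with the required version parameter."""
    return f"https://{cluster}.ml.cloud.ibm.com/ml/v1{path}?version={version}"
```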
### Request/Response Transformation

- OpenAI messages → watsonx messages: direct mapping of role/content
- watsonx responses → OpenAI format: extract choices, usage, and metadata
- Streaming chunks: parse the SSE format and transform delta objects
- Generate unique IDs: `chatcmpl-{uuid}` for chat, `cmpl-{uuid}` for completions
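ID generation can be sketched as follows (helper name illustrative; the prefixes come from the bullet above):

```python
import uuid


def completion_id(kind: str = "chat") -> str:
    """Generate OpenAI-style IDs: chatcmpl-{uuid} for chat, cmpl-{uuid} otherwise."""
    prefix = "chatcmpl" if kind == "chat" else "cmpl"
    return f"{prefix}-{uuid.uuid4().hex}"
```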
### Common Pitfalls

- Don't forget to refresh tokens before they expire (50-minute interval)
- Always close the httpx client on shutdown (`await watsonx_service.close()`)
- Handle both string and list formats for the `stop` parameter
- Validate that model IDs exist in watsonx.ai before making requests
- Set appropriate timeouts for long-running generation requests (300s default)
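Normalizing the `stop` parameter can be sketched as (helper name illustrative):

```python
from typing import List, Optional, Union


def normalize_stop(stop: Optional[Union[str, List[str]]]) -> List[str]:
    """OpenAI accepts `stop` as a string or a list; hand watsonx a list either way."""
    if stop is None:
        return []
    if isinstance(stop, str):
        return [stop]
    return list(stop)
```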
### Performance Optimization

- Reuse the httpx client instance (don't create one per request)
- Use connection pooling (httpx's default behavior)
- Consider worker processes for production (`--workers 4`)
- Monitor token refresh to avoid rate limiting
## Environment Variables Reference

### Required

- `IBM_CLOUD_API_KEY`: IBM Cloud API key for authentication
- `WATSONX_PROJECT_ID`: watsonx.ai project ID

### Optional

- `WATSONX_CLUSTER`: Region (default: `us-south`)
- `HOST`: Server host (default: `0.0.0.0`)
- `PORT`: Server port (default: `8000`)
- `LOG_LEVEL`: Logging level (default: `info`)
- `API_KEY`: Optional proxy authentication key
- `ALLOWED_ORIGINS`: CORS origins (default: `*`)
- `MODEL_MAP_*`: Model name mappings
## API Endpoints

- `GET /` - API information and available endpoints
- `GET /health` - Health check (bypasses authentication)
- `GET /docs` - Interactive Swagger UI documentation
- `POST /v1/chat/completions` - Chat completions (streaming supported)
- `POST /v1/completions` - Text completions (legacy)
- `POST /v1/embeddings` - Generate embeddings
- `GET /v1/models` - List available models
- `GET /v1/models/{model_id}` - Get specific model info