Cache-Cool is a simple LLM (Large Language Model) caching proxy that saves your LLM calls. It acts as a caching layer in front of LLM APIs such as OpenAI or Claude, improving performance and reducing costs by avoiding redundant requests to the providers. Caching is implemented with MongoDB and JSON files, with optional Redis support.
- GitHub Repository: https://github.com/msnp1381/Cache-Cool
- Project Name: Cache-Cool
- Project Description: A simple LLM caching proxy for saving your LLM calls.
- 💾 Cache Responses: Caches responses from LLM API calls to reduce redundant requests.
- ⚙️ Dynamic Configuration: Allows dynamic configuration of the LLM service and caching mechanisms via the `/configure` endpoint.
- Supports Multiple LLMs: Configurable to support different LLM services (e.g., OpenAI, Claude, Groq).
- Uses MongoDB and JSON for Caching: Leverages both MongoDB and JSON files for caching API responses.
- ♻️ LRU Eviction: Implements LRU eviction for the JSON and MongoDB caches.
- ⚡ Redis Caching: Relies on Redis's built-in LRU mechanism.
- `POST /{schema_name}/chat/completions`: Forwards chat completion requests to the configured LLM service, or returns a cached response when one exists. `schema_name` is one of the schemas defined in `config.yaml`.
- `GET /configure`: Retrieves the current configuration details.
- `PUT /configure`: Updates configuration settings dynamically.
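For example, the configuration can be updated at runtime with a few lines of Python (a minimal sketch; the exact fields accepted by `PUT /configure` are an assumption here and should mirror the keys shown in the `config.yaml` example below):

```python
import requests

BASE_URL = "http://localhost:8000"  # wherever Cache-Cool is running

# Switch the active LLM service at runtime.
# NOTE: the accepted body fields are an assumption -- mirror the config.yaml keys.
resp = requests.put(f"{BASE_URL}/configure", json={"current_llm_service": "groq"})
resp.raise_for_status()
print(resp.json())
```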
Before you start, make sure you have:
- 🐳 Docker: Installed on your system (download it from the Docker website).
- MongoDB: A running MongoDB instance for caching (local or remote).
- Redis (optional): A running Redis instance for caching.
- Clone the repository:
  First, download the project files:
  ```bash
  git clone https://github.com/msnp1381/cache-cool.git
  cd cache-cool
  ```
- Build the Docker Image:
  Now, create a Docker image for the project:
  ```bash
  docker build -t cache-cool .
  ```
- Run the Docker Container:
  Make sure MongoDB is running and accessible. Update `config.yaml` with your MongoDB connection details, then run:
  ```bash
  docker run -p 8000:8000 --env-file .env cache-cool
  ```
  Replace `.env` with your environment file containing the necessary environment variables (such as the MongoDB URI).
- Access the Application:
  Open your browser and go to http://localhost:8000 to start using Cache-Cool!
- Clone the repository:
  First, download the project files:
  ```bash
  git clone https://github.com/msnp1381/cache-cool.git
  cd cache-cool
  ```
- Install Python Dependencies:
  If you prefer using `requirements.txt`, install the dependencies as follows:
  ```bash
  python3 -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  pip install -r requirements.txt
  ```
- Run the Application with Uvicorn:
  Start the FastAPI application using Uvicorn:
  ```bash
  uvicorn app.main:app --reload
  ```
  This will start the server at http://localhost:8000.
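Once the server is running (via Docker or Uvicorn), a quick sanity check against the `GET /configure` endpoint described above confirms the proxy is reachable (a minimal check, assuming the default port):

```python
import requests

# Should print the current configuration as JSON if Cache-Cool is up.
resp = requests.get("http://localhost:8000/configure")
resp.raise_for_status()
print(resp.json())
```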
Cache-Cool uses a `config.yaml` file for initial configuration. You can also update the configuration dynamically using the `/configure` endpoint.
```yaml
llm_schemas:
  openai:
    endpoint: "https://api.openai.com/v1/chat/completions"
    headers:
      - "Content-Type: application/json"
      - "Authorization: Bearer {api_key}"
    temperature_threshold: 0.8
  claude:
    endpoint: "https://api.claude.ai/v1/chat/completions"
    headers:
      - "Content-Type: application/json"
      - "Authorization: Bearer {api_key}"
    temperature_threshold: 0.85
  avalai:
    endpoint: "https://api.avalapis.ir/v1/chat/completions"
    headers:
      - "Content-Type: application/json"
      - "Authorization: {api_key}"
    temperature_threshold: 0.85
  groq:
    endpoint: "https://api.groq.com/openai/v1/chat/completions"
    headers:
      - "Content-Type: application/json"
      - "Authorization: {api_key}"
    temperature_threshold: 0.8
mongodb:
  uri: "mongodb://localhost:27017"
  db_name: "llm_cache_db"
  collection_name: "cache"
json_cache_file: "cache.json"
redis:
  enabled: false
  host: "localhost"
  port: 6379
  db: 0
current_llm_service: "openai"
use_json_cache: true
use_mongo_cache: true
cache_max_size: 3
```
Cache-Cool supports Least Recently Used (LRU) eviction to keep the cache size manageable and efficient.
- For JSON file and MongoDB caching, LRU is implemented by tracking the last access time of cache entries (see the sketch after this list).
- You must enable `use_json_cache` or `use_mongo_cache` and set `cache_max_size` in your `config.yaml` to activate LRU eviction. For example:
  ```yaml
  use_json_cache: true
  use_mongo_cache: true
  cache_max_size: 3
  ```
  When the number of cached items exceeds `cache_max_size`, the least recently accessed item is automatically evicted.
- For Redis, Cache-Cool relies on Redis's built-in LRU eviction policies. To enable LRU in Redis, add the following to your `redis.conf` (or set the equivalent options from the command line):
  ```
  maxmemory 100mb              # Set maximum Redis memory usage (adjust as needed)
  maxmemory-policy allkeys-lru # Use the LRU eviction policy when maxmemory is exceeded
  ```
  Then restart Redis to apply the changes:
  ```bash
  sudo systemctl restart redis
  ```
  Make sure Redis caching is enabled in your `config.yaml`:
  ```yaml
  redis:
    enabled: true
    host: "localhost"
    port: 6379
    db: 0
  ```
  This way, Redis handles eviction automatically without Cache-Cool implementing it explicitly.
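The eviction strategy used for the JSON and MongoDB caches can be illustrated with a few lines of Python (a sketch of the idea only, not Cache-Cool's actual implementation):

```python
import time

# Illustrative LRU cache: evict the least recently accessed entry once the
# cache grows past MAX_SIZE, mirroring the last-access tracking described above.
cache = {}  # key -> {"response": ..., "last_access": timestamp}
MAX_SIZE = 3  # corresponds to cache_max_size in config.yaml

def get(key):
    entry = cache.get(key)
    if entry is None:
        return None
    entry["last_access"] = time.time()  # touching an entry makes it "recent"
    return entry["response"]

def put(key, response):
    if key not in cache and len(cache) >= MAX_SIZE:
        # Evict the entry with the oldest last_access timestamp (the LRU item).
        lru_key = min(cache, key=lambda k: cache[k]["last_access"])
        del cache[lru_key]
    cache[key] = {"response": response, "last_access": time.time()}
```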
Here's how to use the API once the service is running:
See `usage.ipynb` for a complete, runnable example.
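As a quick illustration, a chat completion can be sent through the proxy like this (a minimal sketch: the request body follows the standard OpenAI chat-completions format, the response shape is assumed to match the provider's, and depending on how `{api_key}` is resolved in `config.yaml` you may also need to send your own `Authorization` header):

```python
import requests

# "openai" is the schema_name from config.yaml; swap in "claude", "groq", etc. as configured.
PROXY_URL = "http://localhost:8000/openai/chat/completions"

payload = {
    "model": "gpt-4o-mini",  # any model supported by the configured provider
    "messages": [{"role": "user", "content": "What is a caching proxy?"}],
    "temperature": 0.2,
}

# The first call is forwarded to the LLM provider; an identical second call
# should be answered from the cache instead of hitting the provider again.
for _ in range(2):
    resp = requests.post(PROXY_URL, json=payload)
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])
```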
We welcome contributions! Here's how you can help:
- Fork the repository.
- Create a new branch (`git checkout -b feature-branch`).
- Make your changes.
- Commit your changes (`git commit -am 'Add some feature'`).
- Push to the branch (`git push origin feature-branch`).
- Create a new Pull Request.
This project includes contributions from:
This project is licensed under the MIT License - see the LICENSE file for details.
If you have any questions or issues, feel free to contact us at mohamadnematpoor@gmail.com.
Happy caching!