# aurekai/semantic-cache-bench

Semantic caching benchmarks and performance suite for Aurekai. Validates cache consistency, hit rates, and query latency across different model architectures and corpus sizes.
## Overview
Semantic caching is a core optimization in Aurekai that deduplicates semantically similar queries without exact matching. This repository hosts:
- Benchmark Datasets: Query corpora with semantic similarity annotations
- Evaluation Scripts: Performance measurement and validation tools
- Results: Baseline metrics across different models and cache configurations
- Methodology: Detailed documentation of benchmark setup and evaluation protocols
## Quick Start

```bash
# Download benchmark suite
git clone https://huggingface.co/aurekai/semantic-cache-bench
cd semantic-cache-bench

# Run quick benchmark
akai semantic-cache:bench \
  --dataset queries-10k.jsonl \
  --model qwen3-8b \
  --cache-size 1GB \
  --output results.json

# Compare results
akai semantic-cache:compare \
  --baseline baseline-results.json \
  --current results.json
```
## Benchmark Datasets

### queries-1k (Minimal Validation)

- Size: 1,024 queries
- Purpose: Quick validation of cache functionality
- Format: JSONL with semantic similarity pairs
- Runtime: ~5 minutes on GPU

Schema:

```json
{
  "id": "q_001_234",
  "query": "What are the benefits of renewable energy?",
  "semantic_variations": [
    "Advantages of wind and solar power",
    "Why should we invest in renewables?"
  ],
  "dissimilar_queries": [
    "How do fossil fuels work?"
  ],
  "expected_cache_hit": true,
  "similarity_threshold": 0.87
}
```
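A record in this schema can be sanity-checked before running a benchmark. The following is an illustrative sketch, not part of the suite; `validate_record` and `REQUIRED_FIELDS` are hypothetical helpers:

```python
import json

# A sample record in the schema shown above
record = json.loads("""
{
  "id": "q_001_234",
  "query": "What are the benefits of renewable energy?",
  "semantic_variations": [
    "Advantages of wind and solar power",
    "Why should we invest in renewables?"
  ],
  "dissimilar_queries": ["How do fossil fuels work?"],
  "expected_cache_hit": true,
  "similarity_threshold": 0.87
}
""")

# Expected field name -> expected type, per the schema above
REQUIRED_FIELDS = {
    "id": str,
    "query": str,
    "semantic_variations": list,
    "dissimilar_queries": list,
    "expected_cache_hit": bool,
    "similarity_threshold": float,
}

def validate_record(rec: dict) -> bool:
    """Check that a benchmark record has every required field with the expected type."""
    return all(isinstance(rec.get(key), typ) for key, typ in REQUIRED_FIELDS.items())

print(validate_record(record))  # → True
```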
### queries-10k (Standard Benchmark)

- Size: 10,240 queries
- Purpose: Standard performance baseline
- Corpus: Diverse knowledge domains and query patterns
- Expected cache hit rate: 68-72%
- Runtime: ~45 minutes on GPU

### queries-100k (Comprehensive)

- Size: 102,400 queries
- Purpose: Large-scale cache behavior validation
- Corpus: Realistic production query distribution
- Expected cache hit rate: 72-76%
- Runtime: ~8 hours on high-end GPU
- Disk space: 15 GB decompressed
## Metrics & Evaluation

### Cache Performance
| Metric | Qwen3-8B | LLaMA3-8B | Meaning |
|---|---|---|---|
| Hit Rate | 71.2% | 69.8% | % of queries found in cache |
| False Positives | 0.3% | 0.4% | Incorrect cache matches |
| False Negatives | 2.1% | 2.4% | Missed cache opportunities |
| Recall @ 0.90 | 94.2% | 92.8% | True positives at high threshold |
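These metrics can all be derived from raw per-query cache decisions. A hedged sketch with made-up counts (not the published numbers); `cache_metrics` is a hypothetical helper:

```python
def cache_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive the table's metrics from raw per-query cache decisions.

    tp: hits that returned a semantically correct cached entry
    fp: hits that returned the wrong entry (incorrect cache matches)
    fn: misses although a valid cached entry existed (missed opportunities)
    tn: misses with no valid cached entry
    """
    total = tp + fp + fn + tn
    return {
        "hit_rate": (tp + fp) / total,   # fraction of queries served from cache
        "false_positive_rate": fp / total,
        "false_negative_rate": fn / total,
        "recall": tp / (tp + fn),        # valid cached entries actually retrieved
    }

# Made-up counts for a 10,240-query run (illustrative only)
m = cache_metrics(tp=7100, fp=30, fn=215, tn=2895)
```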
### Latency Improvement

```text
Cache miss:  125 ms (full inference)
Cache hit:     2 ms (embedding lookup + cache retrieval)
Speedup:     62.5x per hit

Average at 71% hit rate: 0.29 * 125 + 0.71 * 2 ≈ 38 ms
Effective speedup: ~3.3x vs. no cache
```
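The averaging above is a simple expectation over hit and miss latencies. A minimal sketch, using the 2 ms / 125 ms figures from the block above:

```python
def expected_latency(hit_rate: float, hit_ms: float = 2.0, miss_ms: float = 125.0) -> float:
    """Expected per-query latency at a given cache hit rate."""
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms

avg_ms = expected_latency(0.71)   # 0.71 * 2 + 0.29 * 125 ≈ 37.7 ms
speedup = 125.0 / avg_ms          # ≈ 3.3x vs. no cache
```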
### Memory Efficiency
- Embedding cache size: ~800 MB for 100K queries
- Memory per cached embedding: ~8 KB
- Compression ratio: 1.4x with optional zstd compression
- Peak memory during benchmark: 4 GB (with batch size 32)
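The ~8 KB/entry and ~800 MB figures are consistent with, for example, one 2048-dimensional float32 embedding per query. A back-of-the-envelope sketch under that assumption (the real embedding size depends on the model):

```python
def cache_memory_bytes(num_queries: int, dim: int = 2048, dtype_bytes: int = 4) -> int:
    """Embedding-cache footprint: one dim-dimensional vector per cached query.

    dim=2048 with float32 (4 bytes) is an assumption chosen to match the
    ~8 KB/entry figure above; the actual size depends on the embedding model.
    """
    return num_queries * dim * dtype_bytes

per_entry_bytes = cache_memory_bytes(1)           # 8192 bytes ≈ 8 KB
total_mb = cache_memory_bytes(100_000) / 2**20    # ≈ 781 MB, close to the ~800 MB above
```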
## Running Benchmarks

### Standard Evaluation

```bash
# Benchmark a specific model
akai semantic-cache:bench \
  --model qwen3-8b \
  --dataset queries-10k.jsonl \
  --batch-size 32 \
  --cache-size 2GB \
  --output results.json

# With logging
akai semantic-cache:bench \
  --model qwen3-8b \
  --dataset queries-10k.jsonl \
  --cache-size 2GB \
  --verbose \
  --log-interval 100 \
  --output results.json
```
### Comparison Between Models

```bash
# Run on multiple models
for model in qwen3-8b llama3-8b; do
  akai semantic-cache:bench \
    --model $model \
    --dataset queries-10k.jsonl \
    --output results-$model.json
done

# Compare results
akai semantic-cache:compare \
  --results results-qwen3-8b.json results-llama3-8b.json \
  --report comparison-report.md
```
### Validation with Custom Threshold

```bash
# Test different similarity thresholds
akai semantic-cache:threshold-sweep \
  --model qwen3-8b \
  --dataset queries-10k.jsonl \
  --thresholds "0.80,0.85,0.90,0.95" \
  --output threshold-sweep.json
```
## Benchmark Results

### Latest Results (Aurekai v0.8.0-alpha.1)

Hardware: NVIDIA H100 80GB, AMD EPYC 9654
Date: 2026-05-02
| Model | Dataset | Hit Rate | Recall@0.90 | Latency (hit) | Latency (miss) |
|---|---|---|---|---|---|
| Qwen3-8B | 1K | 72.3% | 94.1% | 1.8ms | 124ms |
| Qwen3-8B | 10K | 71.2% | 93.8% | 1.9ms | 126ms |
| LLaMA3-8B | 1K | 70.1% | 92.4% | 2.1ms | 127ms |
| LLaMA3-8B | 10K | 69.8% | 92.1% | 2.2ms | 129ms |
See the `results/` directory for detailed breakdowns by domain and query type.
## Implementation Notes

### Cache Configuration

```json
{
  "semantic_cache": {
    "enabled": true,
    "similarity_threshold": 0.88,
    "max_cache_size": "2GB",
    "eviction_policy": "lru",
    "embedding_model": "qwen3-8b",
    "batch_size": 32,
    "use_mmap": true
  }
}
```
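The `"eviction_policy": "lru"` setting can be sketched with an ordered map: reads refresh an entry's recency, and inserting past capacity evicts the least recently used entry. This only illustrates the policy, not Aurekai's actual cache internals:

```python
from collections import OrderedDict

class LRUEmbeddingCache:
    """Minimal illustration of the 'lru' eviction policy from the config above."""

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self._store = OrderedDict()

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)         # mark as most recently used
        return self._store[key]

    def put(self, key, value):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict least recently used
```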
### Threshold Selection
- Aggressive (0.80): High hit rate (75%+), more false positives
- Balanced (0.88): Recommended default, 71% hit rate, minimal false positives
- Conservative (0.95): Very few false positives, lower hit rate
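The trade-off follows from how a hit decision is typically made: compare query embeddings by cosine similarity and reuse the cached result only when the score clears the threshold. A minimal sketch, not the production implementation:

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_cache_hit(query_emb: list, cached_emb: list, threshold: float = 0.88) -> bool:
    """Reuse a cached result only when similarity clears the threshold."""
    return cosine_similarity(query_emb, cached_emb) >= threshold
```

Raising `threshold` toward 0.95 trades hit rate for fewer false positives; lowering it toward 0.80 does the opposite, as described above.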
## Methodology

### Query Similarity Annotation
Each benchmark dataset includes human-validated semantic similarity annotations:
- Query pairs sampled from corpus
- Annotators rate similarity (0-1)
- Disagreements resolved with third annotator
- Inter-rater reliability: Krippendorff's α = 0.89
### Cache Consistency Validation

All cache results are validated against ground truth. For each cached result:

1. Verify the stored embedding matches the original query
2. Re-rank all cached results for the current query
3. Confirm the top match was indeed in cache
4. Validate that latency was significantly improved
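The four steps above can be sketched as a single check. Here `embed` and `rerank` stand in for the real embedding model and cache re-ranker, and the 10x latency margin is an assumption for illustration:

```python
def validate_cached_result(query, cached, embed, rerank, hit_ms, miss_ms):
    """Hedged sketch of the four validation steps above.

    `cached` is a dict with "id", "query", and "embedding"; `embed` and
    `rerank` are placeholders for the real embedding and ranking functions.
    """
    checks = {
        # 1. stored embedding still matches the original query
        "embedding_matches": embed(cached["query"]) == cached["embedding"],
        # 2 & 3. re-rank the cache for this query; top match must be this entry
        "top_match_in_cache": rerank(query)[0] == cached["id"],
        # 4. a hit must be substantially faster than a miss (10x margin assumed)
        "latency_improved": hit_ms < miss_ms / 10,
    }
    return all(checks.values()), checks
```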
## Contributing Results

To contribute benchmark results:

- Run the benchmark suite on your hardware
- Include system specs (GPU, CPU, memory, disk)
- Report all metrics from the evaluation output
- Submit results via PR with hardware metadata

Result file format:

```json
{
  "metadata": {
    "hardware": "NVIDIA H100, 512GB RAM",
    "date": "2026-05-02",
    "aurekai_version": "0.8.0-alpha.1"
  },
  "results": [
    {
      "model": "qwen3-8b",
      "dataset": "queries-10k",
      "hit_rate": 0.712,
      "recall_at_0_90": 0.938
    }
  ]
}
```
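Before opening a PR, a contributed file can be checked against the format above. This is a hedged sketch; `lint_result_file` is a hypothetical helper, not part of the suite:

```python
import json

def lint_result_file(text: str) -> list:
    """Pre-PR sanity check on a contributed result file; returns problems found.

    Field names follow the example result format above.
    """
    problems = []
    data = json.loads(text)
    meta = data.get("metadata", {})
    for key in ("hardware", "date", "aurekai_version"):
        if key not in meta:
            problems.append(f"missing metadata.{key}")
    for i, result in enumerate(data.get("results", [])):
        for key in ("model", "dataset", "hit_rate"):
            if key not in result:
                problems.append(f"missing results[{i}].{key}")
        rate = result.get("hit_rate")
        if isinstance(rate, (int, float)) and not 0.0 <= rate <= 1.0:
            problems.append(f"results[{i}].hit_rate out of range")
    return problems
```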
## Related Repositories
- Main Aurekai Repo: https://github.com/aurekai/aurekai
- Model Memory: https://huggingface.co/aurekai/model-memory
- SAE Dictionaries: https://huggingface.co/aurekai/sae-dictionaries
- FPQx Alignments: https://huggingface.co/aurekai/fpqx-alignments
## Tools & Scripts

- `akai semantic-cache:bench`: Run the full benchmark suite
- `akai semantic-cache:compare`: Compare benchmark results
- `akai semantic-cache:threshold-sweep`: Test different similarity thresholds
- `benchmark_to_csv.py`: Export results to CSV format
- `visualize_results.py`: Generate performance plots
## Citation

If you reference these benchmarks in research:

```bibtex
@dataset{aurekai_semantic_cache_bench_2026,
  title={Aurekai Semantic Cache Benchmarks},
  author={Aurekai Community},
  year={2026},
  url={https://huggingface.co/aurekai/semantic-cache-bench}
}
```
## License
Licensed under the Aurekai Open Source License. See main repository for details.