A high-performance file indexing system for Windows that can handle millions of files with real-time monitoring capabilities.
- Lightning Fast: Integrates with Everything.exe for instant file discovery
- Massive Scale: Handles 10M+ files efficiently with sharded storage
- Real-time Monitoring: Continuously monitors file system changes
- Archive Support: Indexes contents of ZIP, RAR, 7Z, and TAR files without extraction
- Production Ready: Health monitoring, automatic recovery, and Windows service support
- Smart Search: Full-text search with regex support and filtering
- Snapshot System: Safe concurrent access while indexing continues
- Resource Management: CPU and memory limiting with automatic throttling
- Windows 10/11 (64-bit)
- Python 3.10+
- Everything.exe (optional, but recommended for substantially faster initial indexing)
- 8GB+ RAM recommended for large file systems
- 10GB+ free disk space for index storage
- Clone the repository:
git clone https://github.com/Built-Simple/file-indexer-pro.git
cd file-indexer-pro
- Install dependencies:
pip install -r requirements.txt
- Run initial setup:
python setup_indexer.py
- (Optional) Install Everything.exe for faster indexing:
  - Download from https://www.voidtools.com/
  - Install and run Everything.exe
  - The indexer will automatically detect and use it
# Start the indexer
python run_production.py
# Create searchable snapshot
python snapshot_manager.py
# Search your files
python search_snapshot.py "*.pdf"
# Run as Windows service
python -m fileindexer.service install
python -m fileindexer.service start
# Monitor live stats
python live_stats_reader.py --watch
# Run snapshot daemon (updates every 30 minutes)
python reliable_snapshot_manager.py --daemon
# Search with filters
python search_snapshot.py "report" --type pdf --min-size 1MB
file-indexer-pro/
├── fileindexer/ # Core production system
│ ├── core/ # Main service architecture
│ ├── scanner/ # File scanning engine
│ ├── storage/ # Sharded storage system
│ ├── integrations/ # Everything.exe integration
│ └── monitoring/ # Health and metrics
├── snapshot_manager.py # Basic snapshot creation
├── reliable_snapshot_manager.py # Production snapshot system
├── search_snapshot.py # Search interface
├── live_stats_reader.py # Real-time statistics
├── setup_indexer.py # Initial setup script
└── docs/ # Documentation
Create config/settings.json:
{
"index_dir": "C:\\ProgramData\\FileIndexer\\index",
"max_memory_gb": 4.0,
"worker_processes": 4,
"checkpoint_interval": 300,
"enable_everything": true,
"enable_archive_scanning": true,
"drives": {
"C:\\": {
"priority": "low",
"scan_archives": false,
"excluded_paths": ["Windows", "Program Files"]
},
"D:\\": {
"priority": "high",
"scan_archives": true
}
}
}
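As a rough illustration of how a file like this might be consumed, the sketch below reads it, fills in defaults, and runs a few sanity checks. The names `load_settings`, `DEFAULTS`, and `VALID_PRIORITIES` are hypothetical and not part of the indexer's documented API. The default values mirror the example above, and the `"normal"` priority is an assumption (the example only shows `"low"` and `"high"`).

```python
import json
from pathlib import Path

# Hypothetical defaults mirroring the example config; the real indexer
# may fall back to different values.
DEFAULTS = {
    "max_memory_gb": 4.0,
    "worker_processes": 4,
    "checkpoint_interval": 300,
    "enable_everything": True,
    "enable_archive_scanning": True,
    "drives": {},
}

# Assumed set of priorities; only "low" and "high" appear in the example.
VALID_PRIORITIES = {"low", "normal", "high"}


def load_settings(path):
    """Read settings.json, apply defaults, and run basic sanity checks."""
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    settings = {**DEFAULTS, **raw}

    if "index_dir" not in settings:
        raise ValueError("index_dir is required")
    if settings["max_memory_gb"] <= 0:
        raise ValueError("max_memory_gb must be positive")
    if settings["worker_processes"] < 1:
        raise ValueError("worker_processes must be at least 1")

    for drive, opts in settings["drives"].items():
        priority = opts.get("priority", "normal")
        if priority not in VALID_PRIORITIES:
            raise ValueError(f"{drive}: unknown priority {priority!r}")
    return settings
```

Windows paths inside JSON need doubled backslashes, as in the example (`"C:\\ProgramData\\FileIndexer\\index"`); a single backslash is either an invalid escape or silently changes the path.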
- Initial scan: ~50,000 files/second with Everything.exe
- Filesystem scan: ~5,000 files/second without Everything.exe
- Archive processing: ~100 archives/minute
- Search response: <100ms for 10M files
- Memory usage: ~2-4GB for 10M files
- Disk usage: ~1GB per million files indexed
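The figures above can be turned into a back-of-the-envelope capacity estimate. The sketch below treats them as rough planning numbers only, since real throughput depends on hardware and file layout. The function name `estimate` and the constants are illustrative, not part of the project.

```python
# Figures quoted in the list above, treated as rough planning numbers.
FILES_PER_SEC_EVERYTHING = 50_000
FILES_PER_SEC_FILESYSTEM = 5_000
DISK_GB_PER_MILLION_FILES = 1.0


def estimate(file_count, use_everything=True):
    """Return (initial scan time in seconds, index size in GB)."""
    rate = FILES_PER_SEC_EVERYTHING if use_everything else FILES_PER_SEC_FILESYSTEM
    scan_seconds = file_count / rate
    disk_gb = file_count / 1_000_000 * DISK_GB_PER_MILLION_FILES
    return scan_seconds, disk_gb


if __name__ == "__main__":
    for use_everything in (True, False):
        seconds, disk_gb = estimate(10_000_000, use_everything)
        print(
            f"Everything.exe={use_everything}: "
            f"~{seconds / 60:.1f} min scan, ~{disk_gb:.0f} GB index"
        )
```

For 10 million files this works out to about 200 seconds with Everything.exe, about 2,000 seconds without it, and roughly 10 GB of index storage, which lines up with the 10GB+ free disk space listed in the requirements.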
# Basic search
python search_snapshot.py "budget.xlsx"
# Regex search
python search_snapshot.py "IMG_202[0-9].*\.jpg$"
# Search with filters
python search_snapshot.py "video" --type mp4 --min-size 100MB
# Search in archives
python search_snapshot.py "report.pdf" --archives
# Interactive search mode
python search_snapshot.py --interactive
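To see what the regex example above would select, the pattern can be tried against a few file names with Python's `re` module. This is only a sketch: it assumes the search tool uses Python-style, case-sensitive, unanchored matching against the file name, which the documentation does not confirm. The file names are made up for illustration.

```python
import re

# The pattern from the regex search example above.
pattern = re.compile(r"IMG_202[0-9].*\.jpg$")

# Hypothetical file names, for illustration only.
names = [
    "IMG_2021_beach.jpg",    # matches: 202x year, ends in .jpg
    "IMG_2019_old.jpg",      # no match: 2019 is not 202x
    "IMG_2023.jpg.bak",      # no match: $ requires .jpg at the very end
    "img_2022.jpg",          # no match under case-sensitive matching
    "holiday_IMG_2024.jpg",  # matches: search() is not anchored at the start
]

matches = [name for name in names if pattern.search(name)]
print(matches)
```

If the tool matches case-insensitively (plausible on Windows, where file names are case-insensitive), `img_2022.jpg` would match as well. Prefixing the pattern with `^` restricts it to names that start with `IMG_`.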
The indexer provides health endpoints when running:
- Health check:
http://localhost:8888/health
- Metrics:
http://localhost:8888/metrics
- Ready probe:
http://localhost:8888/ready
- Live probe:
http://localhost:8888/live
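A small script can poll these endpoints from a scheduled task or an external monitor. The sketch below uses only the standard library and assumes that an HTTP 200 response means the check passed. The response body format is not documented here, so it is ignored. The helper names `probe` and `check_all` are hypothetical.

```python
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8888"  # port taken from the endpoint list above
ENDPOINTS = ("health", "ready", "live")


def probe(url, timeout=2.0):
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, DNS failure, ...


def check_all(base_url=BASE_URL):
    """Probe each endpoint and map its name to True when it returns 200."""
    return {name: probe(f"{base_url}/{name}") == 200 for name in ENDPOINTS}


if __name__ == "__main__":
    for name, ok in check_all().items():
        print(f"{name}: {'ok' if ok else 'FAILED'}")
```

`HTTPError` is caught before `URLError` because it is a subclass of it; reversing the order would hide the status code of a failing endpoint.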
- Automatic recovery from crashes
- Transaction logging for data integrity
- Checkpoint system for resumable operations
- Circuit breakers for external services
- Resource limiting to prevent system overload
- Shard verification and repair
- Process recycling to prevent memory leaks
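To illustrate the idea behind checkpointing for resumable operations, here is a minimal sketch of a crash-safe checkpoint file. It is not the indexer's actual implementation. The function names and the shape of the saved state (`last_path`, `files_seen`) are assumptions made for the example. The technique shown is a write to a temporary file followed by an atomic rename.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to path atomically so a crash never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so the rename stays
    # on one volume.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # atomic replacement of the old checkpoint
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path):
    """Return the saved state, or None when no checkpoint exists yet."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return None
```

A scanner built this way would call `save_checkpoint("scan.ckpt", {"last_path": current, "files_seen": count})` at each checkpoint interval and, on startup, resume from `load_checkpoint("scan.ckpt")` when it returns a state instead of `None`.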
Contributions are welcome! Please read our Contributing Guide first.
# Install dev dependencies
pip install -r requirements-dev.txt
# Run tests
python -m pytest tests/
# Check code style
black --check .
flake8 .
This project is licensed under the MIT License. See the LICENSE file for details.
- Everything.exe by voidtools for ultra-fast file indexing
- The Python community for excellent libraries
- Contributors and testers who helped improve this project
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Wiki: Project Wiki
Status: 🟢 Active Development
Version: 0.1.0-alpha
Python: 3.10+
Platform: Windows