High-performance file indexing system for Windows with Everything.exe integration, sharded storage, and real-time monitoring

File Indexer Pro

A high-performance file indexing system for Windows that can handle millions of files with real-time monitoring capabilities.

🚀 Features

  • Lightning Fast: Integrates with Everything.exe for instant file discovery
  • Massive Scale: Handles 10M+ files efficiently with sharded storage
  • Real-time Monitoring: Continuously monitors file system changes
  • Archive Support: Indexes contents of ZIP, RAR, 7Z, and TAR files without extraction
  • Production Ready: Health monitoring, automatic recovery, and Windows service support
  • Smart Search: Full-text search with regex support and filtering
  • Snapshot System: Safe concurrent access while indexing continues
  • Resource Management: CPU and memory limiting with automatic throttling
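The README does not say how files are assigned to shards. One common approach, sketched here under stated assumptions, is to hash a normalized path into a fixed number of buckets so each shard can be written and searched independently. `NUM_SHARDS`, `shard_for_path`, and `group_by_shard` are hypothetical names, and the shard count of 16 is an arbitrary illustration, not the project's setting.

```python
import hashlib
from collections import defaultdict

# Hypothetical shard count; the project's real value is not documented.
NUM_SHARDS = 16


def shard_for_path(path: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a file path to a stable shard index.

    Windows paths are case-insensitive, so separators and case are
    normalized before hashing. A stable digest keeps the assignment
    identical across process restarts.
    """
    normalized = path.replace("/", "\\").lower()
    digest = hashlib.blake2b(normalized.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_shards


def group_by_shard(paths, num_shards: int = NUM_SHARDS) -> dict:
    """Bucket paths by shard index so each shard can be processed on its own."""
    shards = defaultdict(list)
    for p in paths:
        shards[shard_for_path(p, num_shards)].append(p)
    return dict(shards)
```

Using a stable digest rather than Python's built-in `hash()` matters here: string hashing is salted per process, so `hash()` would scatter the same path into different shards after a restart.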

📋 Requirements

  • Windows 10/11 (64-bit)
  • Python 3.10+
  • Everything.exe (optional but recommended; roughly 10x faster initial scans, per the Performance figures)
  • 8GB+ RAM recommended for large file systems
  • 10GB+ free disk space for index storage

🛠️ Installation

  1. Clone the repository:
git clone https://github.com/Built-Simple/file-indexer-pro.git
cd file-indexer-pro
  2. Install dependencies:
pip install -r requirements.txt
  3. Run the initial setup:
python setup_indexer.py
  4. (Optional) Install Everything.exe (by voidtools) for faster indexing.

🚦 Quick Start

Basic Usage

# Start the indexer
python run_production.py

# Create searchable snapshot
python snapshot_manager.py

# Search your files
python search_snapshot.py "*.pdf"

Advanced Usage

# Run as Windows service
python -m fileindexer.service install
python -m fileindexer.service start

# Monitor live stats
python live_stats_reader.py --watch

# Run snapshot daemon (updates every 30 minutes)
python reliable_snapshot_manager.py --daemon

# Search with filters
python search_snapshot.py "report" --type pdf --min-size 1MB

📁 Project Structure

file-indexer-pro/
├── fileindexer/              # Core production system
│   ├── core/                 # Main service architecture
│   ├── scanner/              # File scanning engine
│   ├── storage/              # Sharded storage system
│   ├── integrations/         # Everything.exe integration
│   └── monitoring/           # Health and metrics
├── snapshot_manager.py       # Basic snapshot creation
├── reliable_snapshot_manager.py  # Production snapshot system
├── search_snapshot.py        # Search interface
├── live_stats_reader.py      # Real-time statistics
├── setup_indexer.py          # Initial setup script
└── docs/                     # Documentation

🔧 Configuration

Create config/settings.json:

{
  "index_dir": "C:\\ProgramData\\FileIndexer\\index",
  "max_memory_gb": 4.0,
  "worker_processes": 4,
  "checkpoint_interval": 300,
  "enable_everything": true,
  "enable_archive_scanning": true,
  "drives": {
    "C:\\": {
      "priority": "low",
      "scan_archives": false,
      "excluded_paths": ["Windows", "Program Files"]
    },
    "D:\\": {
      "priority": "high",
      "scan_archives": true
    }
  }
}
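The README shows the settings file but not how it is loaded. Below is a minimal standard-library sketch, assuming the keys shown above map one-to-one onto typed fields. `Settings`, `DriveConfig`, `parse_settings`, and `load_settings` are hypothetical; the defaults are illustrative; the `"normal"` priority level is an assumption (the example only uses `"low"` and `"high"`); and the unit of `checkpoint_interval` is not documented.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class DriveConfig:
    priority: str = "normal"  # assumed default; the example shows "low"/"high"
    scan_archives: bool = False
    excluded_paths: list = field(default_factory=list)


@dataclass
class Settings:
    index_dir: str
    max_memory_gb: float = 4.0
    worker_processes: int = 4
    checkpoint_interval: int = 300  # unit not documented in the README
    enable_everything: bool = True
    enable_archive_scanning: bool = True
    drives: dict = field(default_factory=dict)


def parse_settings(raw: dict) -> Settings:
    """Build Settings from decoded JSON; unknown keys raise TypeError."""
    drives = {root: DriveConfig(**cfg) for root, cfg in raw.get("drives", {}).items()}
    settings = Settings(**{k: v for k, v in raw.items() if k != "drives"}, drives=drives)
    if settings.max_memory_gb <= 0:
        raise ValueError("max_memory_gb must be positive")
    if settings.worker_processes < 1:
        raise ValueError("worker_processes must be at least 1")
    for root, drive in drives.items():
        if drive.priority not in {"low", "normal", "high"}:
            raise ValueError(f"unknown priority {drive.priority!r} for drive {root}")
    return settings


def load_settings(path: str = "config/settings.json") -> Settings:
    """Read and validate the settings file from disk."""
    return parse_settings(json.loads(Path(path).read_text(encoding="utf-8")))
```

Rejecting unknown keys means a typo such as `worker_process` fails loudly at startup instead of being silently ignored.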

📊 Performance

  • Initial scan: ~50,000 files/second with Everything.exe
  • Filesystem scan: ~5,000 files/second without Everything
  • Archive processing: ~100 archives/minute
  • Search response: <100ms for 10M files
  • Memory usage: ~2-4GB for 10M files
  • Disk usage: ~1GB per million files indexed
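The per-second rates translate into wall-clock estimates with simple division. This sketch uses only the figures listed above; actual throughput will vary with hardware, drive type, and file mix.

```python
# Approximate rates taken from the Performance list.
EVERYTHING_FILES_PER_SEC = 50_000
FILESYSTEM_FILES_PER_SEC = 5_000
DISK_GB_PER_MILLION_FILES = 1.0


def estimate(num_files: int) -> dict:
    """Back-of-envelope scan time (seconds) and index size (GB)."""
    return {
        "scan_s_with_everything": num_files / EVERYTHING_FILES_PER_SEC,
        "scan_s_without_everything": num_files / FILESYSTEM_FILES_PER_SEC,
        "index_gb": num_files / 1_000_000 * DISK_GB_PER_MILLION_FILES,
    }


est = estimate(10_000_000)
print(f"{est['scan_s_with_everything']:.0f} s with Everything.exe")   # → 200 s with Everything.exe
print(f"{est['scan_s_without_everything'] / 60:.0f} min without it")  # → 33 min without it
print(f"{est['index_gb']:.0f} GB of index data")                      # → 10 GB of index data
```

For 10M files that is roughly 200 seconds with Everything.exe versus about 33 minutes without it (a 10x difference), and about 10 GB of index data, in line with the 10GB+ disk requirement.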

🔍 Search Examples

# Basic search
python search_snapshot.py "budget.xlsx"

# Regex search
python search_snapshot.py "IMG_202[0-9].*\.jpg$"

# Search with filters
python search_snapshot.py "video" --type mp4 --min-size 100MB

# Search in archives
python search_snapshot.py "report.pdf" --archives

# Interactive search mode
python search_snapshot.py --interactive
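How `search_snapshot.py` interprets patterns and filters is not documented. The sketch below is one possible approach, assuming each indexed entry carries a Windows path and a size in bytes: the pattern is tried as a case-insensitive regex against the file name and, if it is not valid regex (as with `*.pdf` in the Quick Start), treated as a glob. `FileRecord`, `parse_size`, `compile_pattern`, and `search` are hypothetical, and reading `1MB` as 1024² bytes is an assumption.

```python
import fnmatch
import re
from dataclasses import dataclass
from pathlib import PureWindowsPath


@dataclass
class FileRecord:
    path: str
    size: int  # bytes


# Binary units are an assumption; the README does not define "MB".
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_size(text: str) -> int:
    """Parse sizes such as '1MB' or '100MB' into bytes."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMG]?B)\s*", text.upper())
    if not m:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(float(m.group(1)) * _UNITS[m.group(2)])


def compile_pattern(pattern: str) -> re.Pattern:
    """Use the pattern as a regex; fall back to glob syntax if it is invalid regex."""
    try:
        return re.compile(pattern, re.IGNORECASE)
    except re.error:
        return re.compile(fnmatch.translate(pattern), re.IGNORECASE)


def search(records, pattern, file_type=None, min_size=None):
    """Yield records whose file name matches and that pass the optional filters."""
    rx = compile_pattern(pattern)
    min_bytes = parse_size(min_size) if min_size else 0
    for rec in records:
        name = PureWindowsPath(rec.path).name
        if not rx.search(name):
            continue
        if file_type and PureWindowsPath(name).suffix.lower() != "." + file_type.lower():
            continue
        if rec.size < min_bytes:
            continue
        yield rec
```

With this heuristic, `search(records, "video", file_type="mp4", min_size="100MB")` yields only `.mp4` files whose names contain "video" and that are at least 100 MiB. A glob that also happens to be valid regex, such as `IMG_202?.jpg`, is treated as regex, so a real CLI would probably want an explicit mode flag.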

🏥 Health Monitoring

The indexer provides health endpoints when running:

  • Health check: http://localhost:8888/health
  • Metrics: http://localhost:8888/metrics
  • Ready probe: http://localhost:8888/ready
  • Live probe: http://localhost:8888/live
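A monitoring script or service manager can poll these endpoints. The following minimal standard-library sketch treats any HTTP 200 as healthy. The response body format is not documented in the README, so it is not parsed; `probe` is a hypothetical helper, and `/metrics` is left out because its format is likewise undocumented.

```python
import urllib.request

BASE_URL = "http://localhost:8888"
ENDPOINTS = ("/health", "/ready", "/live")


def probe(path: str, base_url: str = BASE_URL, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError both subclass OSError, so this covers
        # refused connections, timeouts, and HTTP error statuses.
        return False


if __name__ == "__main__":
    for path in ENDPOINTS:
        print(f"{path}: {'OK' if probe(path) else 'FAILING'}")
```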

🛡️ Reliability Features

  • Automatic recovery from crashes
  • Transaction logging for data integrity
  • Checkpoint system for resumable operations
  • Circuit breakers for external services
  • Resource limiting to prevent system overload
  • Shard verification and repair
  • Process recycling to prevent memory leaks
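The README lists circuit breakers without describing their behavior. The sketch below shows the general pattern, not the project's implementation: after `failure_threshold` consecutive failures the breaker opens and rejects calls, and once `reset_timeout` seconds have passed it lets a trial call through (half-open). The class and parameter names are hypothetical, and the sketch is not thread-safe.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("service marked unavailable; try again later")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # Trip on reaching the threshold; a failed half-open trial re-opens at once.
            if self._failures >= self.failure_threshold or self._opened_at is not None:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller could wrap an Everything.exe query in `breaker.call(...)` and, on `CircuitOpenError`, fall back to a plain filesystem scan; whether the project falls back this way is an assumption. Injecting `clock` makes the timing testable without sleeping.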

🤝 Contributing

Contributions are welcome! Please read our Contributing Guide first.

Development Setup

# Install dev dependencies
pip install -r requirements-dev.txt

# Run tests
python -m pytest tests/

# Check code style
black --check .
flake8 .

📝 License

This project is licensed under the MIT License - see LICENSE file for details.

🙏 Acknowledgments

  • Everything.exe by voidtools for ultra-fast file indexing
  • The Python community for excellent libraries
  • Contributors and testers who helped improve this project

📞 Support


Status: 🟢 Active Development
Version: 0.1.0-alpha
Python: 3.10+
Platform: Windows
