.
├── docker-compose.yml
├── Dockerfile
├── .env
├── pyproject.toml
├── uv.lock
├── .pre-commit-config.yaml
├── README.md
├── data/
│   └── qdrant/
├── src/
│   ├── fdds/
│   │   ├── __init__.py
│   │   ├── config.py
│   │   ├── handlers.py
│   │   ├── inference.py
│   │   ├── evaluation.py
│   │   ├── evaluation_pipeline.py
│   │   ├── manage_pdfs.py
│   │   └── reranker.py
│   ├── ui-build/
│   └── chat.py
- `docker-compose.yml`: Defines and manages the services: Qdrant (vector database), the API backend, and Jaeger (monitoring and tracing).
- `Dockerfile`: Builds the backend API service that serves the RAG-based chatbot.
- `pyproject.toml` and `uv.lock`: Configuration files for uv and the project dependencies.
- `.pre-commit-config.yaml`: Configuration for pre-commit hooks to ensure code quality.
- `data/qdrant`: Persistent volume for Qdrant's vector data storage.
- `src/fdds/inference.py`: Methods to process a query and generate responses from contextual data using RAG (see the sketch after this list).
- `src/fdds/manage_pdfs.py`: Script to ingest PDF files from a list of URLs into Qdrant, and to delete them again.
- `src/fdds/evaluation.py`: Evaluates the RAG system using the pipeline defined in `src/fdds/evaluation_pipeline.py` (requires `NEPTUNE_API_KEY`).
- `src/fdds/config.py`: Holds configuration settings for the project.
- `src/ui-build`: Precompiled frontend UI assets for the chatbot interface.
- `src/chat.py`: Contains the core `MyChat` class responsible for managing conversation flow.
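To make the moving parts concrete, the sketch below shows a typical RAG round-trip against this kind of stack: embed the question, retrieve context from Qdrant, then generate an answer with the LLM. It is illustrative only; the collection name, payload field, and model names are assumptions, and the actual classes in `inference.py` and `chat.py` may differ.

```python
# Illustrative RAG round-trip; not the project's actual API.
import os

from openai import OpenAI
from qdrant_client import QdrantClient

llm = OpenAI(api_key=os.environ["OPEN_API_KEY"])    # key from the .env file
qdrant = QdrantClient(url="http://localhost:6333")  # assumed local Qdrant from docker-compose


def answer(question: str, collection: str = "fdds_documents") -> str:
    # 1. Embed the user question (embedding model name is an assumption).
    vector = llm.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding

    # 2. Retrieve the most relevant chunks from the vector database.
    hits = qdrant.search(collection_name=collection, query_vector=vector, limit=5)
    context = "\n\n".join((hit.payload or {}).get("text", "") for hit in hits)

    # 3. Ask the LLM to answer using only the retrieved context.
    completion = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content
```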
- Python 3.11+
- Docker and Docker Compose
- Pre-commit
- uv (https://docs.astral.sh/uv/)
git clone git@github.com:deepsense-ai/fdds-rag.git
cd fdds-rag
uv python install 3.11
uv sync
pre-commit install
Ensure you have created a `.env` file in the project root directory containing all the required variables listed below:
OPEN_API_KEY=<your_api_key>
NEPTUNE_API_KEY=<your_api_key>
QDRANT_API_KEY=<your_api_key>
API_KEY=<your_api_key>
Replace the placeholders with your actual credentials and settings, and adjust `src/fdds/config.py` so that it holds all necessary configuration, such as paths and connection details.
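As a rough illustration, `config.py` could pick these values up from the environment as sketched below; the real module may load them differently (for example via pydantic settings), and the use of `python-dotenv` here is an assumption.

```python
# Illustrative only: one way config.py could read the .env values.
import os

from dotenv import load_dotenv  # provided by the python-dotenv package (assumed available)

load_dotenv()  # reads the .env file in the project root

OPEN_API_KEY = os.environ["OPEN_API_KEY"]        # LLM provider key
API_KEY = os.environ["API_KEY"]                  # key protecting the API service
QDRANT_API_KEY = os.getenv("QDRANT_API_KEY")     # may be optional for a local, unsecured Qdrant
NEPTUNE_API_KEY = os.getenv("NEPTUNE_API_KEY")   # only needed for evaluation runs

# Connection details; adjust to match docker-compose.yml (assumed default shown).
QDRANT_URL = os.getenv("QDRANT_URL", "http://localhost:6333")
```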
Before proceeding, ensure that Docker is running on your machine. Then, start the services using:
docker-compose up -d
This command will initialize and run three services in detached mode:
- Qdrant: Vector database
- API: Handles model inference and provides the assistant interface
- Jaeger: Monitoring and tracing system
All service ports are configured in `config.py` and `docker-compose.yml`.
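To confirm the stack came up, a quick check like the one below can ping Qdrant and the Jaeger UI. It assumes the default ports (6333 for Qdrant, 16686 for the Jaeger UI) and the `qdrant-client` and `requests` packages; if your compose file maps other ports, adjust accordingly.

```python
# Sanity check that the docker-compose services are reachable.
# Ports are the Qdrant and Jaeger UI defaults; adjust if your setup differs.
import requests
from qdrant_client import QdrantClient


def check_services() -> None:
    qdrant = QdrantClient(url="http://localhost:6333")
    names = [c.name for c in qdrant.get_collections().collections]
    print("Qdrant is up, collections:", names)

    jaeger = requests.get("http://localhost:16686", timeout=5)
    print("Jaeger UI responded with status:", jaeger.status_code)


if __name__ == "__main__":
    check_services()
```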
If you don't already have a list of PDF URLs to ingest, you can generate one by running the web scraping script:
cd scripts/fdds_scrapper
uv run scrapy crawl get_pdf_links
This will crawl and extract all PDF file links from the sections specified in the `start_urls` list, which is defined in `scripts/fdds_scrapper/fdds_scrapper/spiders/pdf_spider.py`.
The collected links will be saved to `scripts/fdds_scrapper/pdfs.txt`.
Note: To customize which sections are scraped, modify the `start_urls` list in `pdf_spider.py`.
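For reference, a link-collecting spider of this kind typically looks like the sketch below; the real `pdf_spider.py` may filter, paginate, and export links differently, and the `start_urls` entry here is only a placeholder.

```python
# Rough shape of a spider like pdf_spider.py; illustrative, not the actual implementation.
import scrapy


class PdfLinkSpider(scrapy.Spider):
    name = "get_pdf_links"
    start_urls = ["https://example.org/documents/"]  # placeholder; replace with the sections to crawl

    def parse(self, response):
        for href in response.css("a::attr(href)").getall():
            if href.lower().endswith(".pdf"):
                # Yield absolute URLs; a pipeline or feed export can write them
                # out to a file such as pdfs.txt.
                yield {"url": response.urljoin(href)}
```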
To load PDF documents into the Qdrant database, prepare a .txt file containing one PDF URL per line (no delimiters or special characters). If you followed the previous step, this file is already generated. To ingest the documents, run:
uv run src/fdds/manage_pdfs.py --ingest <path_to_txt_file>
To delete the corresponding documents from Qdrant, use:
uv run src/fdds/manage_pdfs.py --delete <path_to_txt_file>
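If you assemble the URL list by hand, a small helper like the following (purely illustrative, not part of the repository) can normalize it into the expected one-URL-per-line format:

```python
# Illustrative helper: normalize a URL list into the one-PDF-URL-per-line
# format that the ingestion script expects. File paths are assumptions.
from pathlib import Path


def clean_url_list(src: str, dst: str = "pdfs_clean.txt") -> int:
    urls = set()
    for line in Path(src).read_text().splitlines():
        url = line.strip()
        if url.startswith(("http://", "https://")) and url.lower().endswith(".pdf"):
            urls.add(url)
    Path(dst).write_text("\n".join(sorted(urls)) + "\n")
    return len(urls)


if __name__ == "__main__":
    count = clean_url_list("scripts/fdds_scrapper/pdfs.txt")
    print(f"Wrote {count} unique PDF URLs")
```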