The content of this repository comprises the work which is being been developed in the scope of RESCUE (RESilient Cloud for EUropE) project. The objective is to develop reusable, modular components to strengthen reliability and recover capabilities for (critical) digital services. Pilot Cyber Resilient Digital Twins for Data Centers and Edges that use open cloud infrastructure and are capable of hosting mission-critical applications at large scale.
GenAI-powered System for Privacy incident Analysis and Recovery
GASPAR is a comprehensive system designed to:
- Extract privacy-related fields from assessment documents
- Model data distributions and sample input data based on extracted fields
- Detect anomalous values in sampled data batches
- Create code filters to exclude anomalous data
- Safely deploy filters to quarantine problematic data
- Privacy rule extraction from IPA documents
- Adaptive data sampling based on distribution modeling
- Statistical and distribution-based anomaly detection
- Automatic filter generation for anomalous data
- Safe deployment with monitoring capabilities
- Support for multiple LLM providers (OpenAI, Anthropic, Mistral)
- Azure integration for cloud deployment
Local installation can be done using uv
:
$ uv python install 3.10
$ uv python pin 3.10
$ uv venv -p python3.10
$ source .venv/bin/activate
$ uv pip install -e .
For development mode:
$ pip install -e ".[dev]"
For Databricks deployment
pip install -e ".[databricks]"
from gaspar.config import load_config
from gaspar.pipeline.executor import PipelineExecutor
import asyncio
async def main():
# Load configuration from file
config = load_config("config.yaml")
# Initialize pipeline
executor = PipelineExecutor(config)
# Process IPA document
result = await executor.execute("documents/privacy_assessment.txt")
if result.success:
print("Privacy rules extracted successfully")
print(f"Generated {len(result.outputs['filters'])} filters")
if __name__ == "__main__":
asyncio.run(main())
After installation, the gaspar
command-line tool is available:
$ gaspar document.txt
# Process IPA document and start monitoring
$ gaspar -c config.yaml document.txt
# Use custom configuration
$ gaspar -v document.txt
# Enable verbose logging
GASPAR can be configured via YAML files. Example configuration:
# LLM Model Configuration
model:
provider: "openai"
model_name: "gpt-4"
api_key: ${OPENAI_API_KEY}
# Storage Configuration
storage:
type: "local"
local_path: "./data"
# Pipeline Configuration
pipeline:
batch_size: 100
max_retries: 3
temp_directory: "./temp"
For the configuration to be used in the repo, it was adapted to use a locally deployed version of GPT-4 To use Openai API or any other LLM API provider, update openai_model.py or create a new one
Running the tests can be done using tox
:
$ tox -p
Building the packages:
$ tox -e packages
$ ls dist/
Apache-2.0