Data Provenance Tracker

Data Provenance Tracker is a Go-based system for tracking data lineage and provenance in data processing pipelines. It records the complete history of data transformations, including filtering, joining, aggregation, and other operations; visualizing that history is planned. The project is intended to support transparency, reproducibility, and compliance in data workflows.

Features

  • Track data transformations in a Directed Acyclic Graph (DAG)
  • Record metadata for each transformation step
  • Visualize data lineage (planned)
  • Query provenance information (lineage, impact)
  • Support for common data operations: filter, join, aggregate, custom
  • Integration with data science and ETL workflows

Use Cases

  • Data science pipelines
  • ETL process tracking
  • Regulatory compliance and audit trails
  • Data quality management
  • Debugging and tracing data transformations

Getting Started

Prerequisites

  • Go 1.21 or higher
  • PostgreSQL (planned, for persistent storage)

Installation

git clone https://github.com/0xReLogic/Data-Provenance-Tracker.git
cd Data-Provenance-Tracker
go mod download

Quick Start (CLI)

Build the CLI tool:

go build -o provenance ./cmd/provenance

Register a data source:

provenance -command=register-source -source-name="raw_data.csv" -metadata=metadata.json

Track a transformation:

provenance -command=track-transformation -operation=filter -parent-ids=<sourceID> -metadata=filter_metadata.json

Export the provenance graph:

provenance -command=export-graph -output-file=lineage.json
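
The commands above pass JSON files through the -metadata flag, but this README does not document their schema. The following sketch assumes they hold the same free-form key/value maps that the Go API example passes in, and generates both files from Go. The encodeMetadata helper is a hypothetical name, and the keys are taken from the Go API example rather than from a documented format.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os"
)

// encodeMetadata renders a metadata map as indented JSON.
// Hypothetical helper; the CLI's real schema may differ.
func encodeMetadata(meta map[string]interface{}) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false) // keep "age > 18" readable instead of "age \u003e 18"
	enc.SetIndent("", "  ")
	if err := enc.Encode(meta); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// Assumption: keys mirror the maps used in the Go API example.
	files := map[string]map[string]interface{}{
		"metadata.json":        {"format": "CSV", "rows": 10000},
		"filter_metadata.json": {"condition": "age > 18"},
	}
	for name, meta := range files {
		data, err := encodeMetadata(meta)
		if err != nil {
			fmt.Fprintln(os.Stderr, "encode:", err)
			os.Exit(1)
		}
		if err := os.WriteFile(name, data, 0o644); err != nil {
			fmt.Fprintln(os.Stderr, "write:", err)
			os.Exit(1)
		}
		fmt.Println("wrote", name)
	}
}
```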

Example Usage (Go API)

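package main

import (
    "fmt"

    "github.com/0xReLogic/Data-Provenance-Tracker/tracker"
)

func main() {
    t := tracker.New()

    // Register the raw input file as the root of the lineage graph.
    sourceID := t.RegisterSource("raw_data.csv", map[string]interface{}{
        "format": "CSV",
        "rows":   10000,
    })

    // Record a filter step derived from the source.
    filteredID := t.TrackTransformation(
        "filter",
        []string{sourceID},
        map[string]interface{}{"condition": "age > 18"},
    )

    // Record an aggregation derived from the filtered data.
    aggregatedID := t.TrackTransformation(
        "aggregate",
        []string{filteredID},
        map[string]interface{}{"function": "AVG", "column": "salary"},
    )

    // Go rejects unused variables, so report the final node ID.
    fmt.Println("aggregated node:", aggregatedID)

    // Write the recorded graph to disk.
    t.ExportGraph("lineage.json")
}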

See more examples in examples/basic_usage.go and examples/data_science_pipeline.go.

Contributing

Contributions are welcome! Please open issues or pull requests for bug fixes, new features, or improvements. For major changes, please open an issue to discuss them first.

Contact

For questions, suggestions, or collaboration, please contact Allen Elzayn via GitHub Issues or email (see profile).

License

This project is licensed under the MIT License. See the LICENSE file for details.
