Data Provenance Tracker is a Go-based system for tracking data lineage and provenance in data processing pipelines. It records the complete history of data transformations, including filtering, joining, aggregation, and custom operations, and exports that history as a graph; visualization is planned. The project supports transparency, reproducibility, and compliance in data workflows.
Features:
- Track data transformations in a Directed Acyclic Graph (DAG)
- Record metadata for each transformation step
- Visualize data lineage (planned)
- Query provenance information (lineage, impact)
- Support for common data operations: filter, join, aggregate, custom
- Integration with data science and ETL workflows
Use cases:
- Data science pipelines
- ETL process tracking
- Regulatory compliance and audit trails
- Data quality management
- Debugging and tracing data transformations
Requirements:
- Go 1.21 or higher
- PostgreSQL (planned, for persistent storage)
Clone the repository and download dependencies:
git clone https://github.com/0xReLogic/Data-Provenance-Tracker.git
cd Data-Provenance-Tracker
go mod download
Build the CLI tool:
go build -o provenance ./cmd/provenance
Register a data source:
provenance -command=register-source -source-name="raw_data.csv" -metadata=metadata.json
Track a transformation:
provenance -command=track-transformation -operation=filter -parent-ids=<sourceID> -metadata=filter_metadata.json
Export the provenance graph:
provenance -command=export-graph -output-file=lineage.json
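The -metadata flag in the commands above points at a JSON file, but the expected schema is not described here. As a minimal sketch, assuming the file is a flat JSON object with the same kind of key/value pairs the library API accepts, a file such as metadata.json could be generated like this. The helper name buildMetadata and the field names are illustrative assumptions, not part of the project.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// buildMetadata encodes arbitrary key/value metadata as indented JSON.
// Hypothetical helper: the project does not document a metadata schema,
// so this assumes a flat JSON object is accepted.
func buildMetadata(fields map[string]interface{}) ([]byte, error) {
	return json.MarshalIndent(fields, "", "  ")
}

func main() {
	// Field names are assumptions borrowed from the library example.
	data, err := buildMetadata(map[string]interface{}{
		"format": "CSV",
		"rows":   10000,
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, "encode metadata:", err)
		os.Exit(1)
	}

	// Write the file that would be passed as -metadata=metadata.json.
	if err := os.WriteFile("metadata.json", append(data, '\n'), 0o644); err != nil {
		fmt.Fprintln(os.Stderr, "write metadata:", err)
		os.Exit(1)
	}
}
```

encoding/json sorts map keys when marshaling, so the generated file is stable across runs and diffs cleanly under version control.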
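Use the tracker as a Go library:
package main

import (
	"fmt"

	"github.com/0xReLogic/Data-Provenance-Tracker/tracker"
)

func main() {
	t := tracker.New()

	// Register the raw input; it becomes the starting node of the lineage graph.
	sourceID := t.RegisterSource("raw_data.csv", map[string]interface{}{
		"format": "CSV",
		"rows":   10000,
	})

	// Each transformation names its parent nodes and carries its own metadata.
	filteredID := t.TrackTransformation(
		"filter",
		[]string{sourceID},
		map[string]interface{}{"condition": "age > 18"},
	)

	aggregatedID := t.TrackTransformation(
		"aggregate",
		[]string{filteredID},
		map[string]interface{}{"function": "AVG", "column": "salary"},
	)
	// Go rejects unused variables, so report the final node ID.
	fmt.Println("aggregated node:", aggregatedID)

	// Write the full provenance graph to disk.
	t.ExportGraph("lineage.json")
}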
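To make the lineage and impact queries listed among the features more concrete, here is a minimal, self-contained sketch of how such queries could work over a DAG. It does not use or mirror the tracker package: the Graph and Node types and the Add, Lineage, and Impact methods are hypothetical names chosen for illustration. Lineage is modeled as "all ancestors of a node" and impact as "all descendants of a node", which is one common reading of those terms.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is one entry in the provenance graph: a source or a transformation.
type Node struct {
	ID        string
	Operation string   // e.g. "source", "filter", "join", "aggregate"
	Parents   []string // IDs of the nodes this one was derived from
}

// Graph is a hypothetical in-memory DAG, not the tracker package's type.
type Graph struct {
	nodes    map[string]*Node
	children map[string][]string // reverse edges, used for impact queries
}

func NewGraph() *Graph {
	return &Graph{nodes: map[string]*Node{}, children: map[string][]string{}}
}

// Add inserts a node. Requiring parents to exist already means edges only
// point at earlier nodes, so the graph cannot contain a cycle.
func (g *Graph) Add(id, op string, parents ...string) error {
	if _, ok := g.nodes[id]; ok {
		return fmt.Errorf("node %q already exists", id)
	}
	for _, p := range parents {
		if _, ok := g.nodes[p]; !ok {
			return fmt.Errorf("unknown parent %q", p)
		}
	}
	g.nodes[id] = &Node{ID: id, Operation: op, Parents: parents}
	for _, p := range parents {
		g.children[p] = append(g.children[p], id)
	}
	return nil
}

// walk does a breadth-first traversal from start along the edges returned
// by next, and returns every reached node except start, sorted by ID.
func walk(start string, next func(string) []string) []string {
	seen := map[string]bool{start: true}
	queue := []string{start}
	var out []string
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, n := range next(cur) {
			if !seen[n] {
				seen[n] = true
				out = append(out, n)
				queue = append(queue, n)
			}
		}
	}
	sort.Strings(out)
	return out
}

// Lineage returns all ancestors of id (what the node was derived from).
func (g *Graph) Lineage(id string) []string {
	return walk(id, func(n string) []string {
		if node, ok := g.nodes[n]; ok {
			return node.Parents
		}
		return nil
	})
}

// Impact returns all descendants of id (what a change to it would affect).
func (g *Graph) Impact(id string) []string {
	return walk(id, func(n string) []string { return g.children[n] })
}

func main() {
	g := NewGraph()
	steps := []struct{ id, op, parent string }{
		{"src", "source", ""}, {"flt", "filter", "src"}, {"agg", "aggregate", "flt"},
	}
	for _, s := range steps {
		var err error
		if s.parent == "" {
			err = g.Add(s.id, s.op)
		} else {
			err = g.Add(s.id, s.op, s.parent)
		}
		if err != nil {
			panic(err)
		}
	}
	fmt.Println("lineage of agg:", g.Lineage("agg")) // lineage of agg: [flt src]
	fmt.Println("impact of src:", g.Impact("src"))   // impact of src: [agg flt]
}
```

Keeping a reverse-edge map alongside the parent lists makes impact queries as cheap as lineage queries, at the cost of a little extra memory per edge.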
See more examples in examples/basic_usage.go and examples/data_science_pipeline.go.
Contributions are welcome! Please open issues or pull requests for bug fixes, new features, or improvements. For major changes, please open an issue to discuss them first.
For questions, suggestions, or collaboration, please contact Allen Elzayn via GitHub Issues or email (see profile).
This project is licensed under the MIT License. See the LICENSE file for details.