Automatic Design of Semantic Similarity Ensembles Using Grammatical Evolution

🌟 Overview

This repository accompanies the paper "Automatic Design of Semantic Similarity Ensembles Using Grammatical Evolution", which introduces a Grammatical Evolution (GE)-based framework for the automatic design and optimization of semantic similarity ensembles. The method uses evolutionary computation to build optimized, interpretable ensembles that match or outperform state-of-the-art genetic-programming-based ensembles on several metrics across the benchmark datasets.

✨ Key Contributions

First Application of Grammatical Evolution in Semantic Similarity: Introducing a novel approach to ensemble design.
Dynamic Similarity Aggregation: Automatically selects and combines multiple similarity measures for optimal performance.
Interpretability and Accuracy: Ensuring high correlation with human judgments while maintaining transparency.
Benchmark Validation: Rigorous evaluation against established datasets (MC30, GeReSiD50, WS353).

📊 Features

Automated Ensemble Learning: Uses Backus-Naur Form (BNF) grammar to guide the evolutionary process.
Optimized for Semantic Similarity: Evaluates ensembles based on Pearson and Spearman correlation coefficients.
Seamless Integration with PonyGE2: Built on the PonyGE2 framework for grammatical evolution.
Extensive Benchmarking: Compared with state-of-the-art methods across multiple datasets.

🛠️ Installation

To set up the environment and run experiments:

Clone the PonyGE2 repository:

```
git clone https://github.com/PonyGE/PonyGE2.git
```

Clone this repository:

```
git clone https://github.com/jorge-martinez-gil/sesige.git
```

Overwrite PonyGE2 files with those provided in this repository:
```
cp -r ./sesige/* ./PonyGE2/
```
Install dependencies:
```
pip install -r requirements.txt
```

📈 Datasets

We evaluate our method using the following benchmark datasets:

MC30: 30 word pairs with human-annotated similarity scores.
GeReSiD50: 50 phrase pairs from geospatial research, assessing domain-specific generalization.
WS353: 353 word pairs widely used for evaluating semantic similarity in NLP.

⚙️ Usage

To run the Grammatical Evolution process:

Navigate to the src directory of PonyGE2:
```
cd ./PonyGE2/src
```

Execute the training script with the provided parameter file:

```
python ponyge.py --parameters ./parameters/semantic_similarity.txt
```

Results are stored in the output directory specified in the parameter file.

🧪 Experimental Results

We evaluated the GE-based ensembles against state-of-the-art methods, using Pearson (PCC) and Spearman (SRCC) correlation coefficients:

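| Dataset   | Metric | GE    | State-of-the-Art |
|-----------|--------|-------|------------------|
| MC30      | PCC    | 0.794 | 0.845 (LGP)      |
| MC30      | SRCC   | 0.859 | 0.822 (LGP)      |
| GeReSiD50 | PCC    | 0.743 | 0.756 (LGP)      |
| GeReSiD50 | SRCC   | 0.779 | 0.752 (LGP)      |
| WS353     | PCC    | 0.827 | 0.817 (LGP)      |
| WS353     | SRCC   | 0.817 | 0.817 (LGP)      |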

🧬 Technical Details

Fitness Function

The fitness function rewards ensembles whose scores correlate highly with human-annotated similarity judgments, measured by PCC and SRCC.
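
As an illustration of a correlation-based fitness, the following is a minimal sketch in plain Python. It is not the repository's fitness code: the function names (`pearson`, `ranks`, `spearman`, `fitness`) and the `metric` argument are assumptions made for this example, and the sample scores are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant input: correlation is undefined, treat as worst case
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def fitness(predicted, human, metric="pcc"):
    """Hypothetical fitness: higher correlation with human scores is better."""
    return pearson(predicted, human) if metric == "pcc" else spearman(predicted, human)


# Invented scores for four word pairs, for illustration only.
human_scores = [0.1, 0.4, 0.7, 0.9]
ensemble_scores = [0.2, 0.3, 0.8, 0.85]
print(fitness(ensemble_scores, human_scores, "pcc"))
print(fitness(ensemble_scores, human_scores, "srcc"))
```

PCC rewards a linear relationship with the human scores, while SRCC only rewards getting the ordering of the pairs right, which is why the two metrics can rank the same ensemble differently.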

Genetic Operators

Crossover: Variable one-point crossover (probability = 0.8).
Mutation: Integer flip per codon.
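
The two operators can be sketched on integer genomes as follows. This is a simplified stand-alone illustration, not PonyGE2's implementation: `CODON_SIZE`, the default mutation rate of one expected flip per genome, and the function signatures are assumptions made for this example.

```python
import random

CODON_SIZE = 100000  # assumed upper bound for codon values


def variable_onepoint(parent1, parent2, prob=0.8, rng=random):
    """Variable one-point crossover: an independent cut point is chosen in
    each parent and the tails are swapped, so child lengths may differ."""
    if rng.random() >= prob:
        return parent1[:], parent2[:]  # no crossover: children are copies
    cut1 = rng.randint(0, len(parent1))
    cut2 = rng.randint(0, len(parent2))
    child1 = parent1[:cut1] + parent2[cut2:]
    child2 = parent2[:cut2] + parent1[cut1:]
    return child1, child2


def int_flip_per_codon(genome, mut_prob=None, rng=random):
    """Integer flip per codon: each codon is independently replaced by a
    fresh random integer with probability mut_prob."""
    if not genome:
        return []
    if mut_prob is None:
        mut_prob = 1.0 / len(genome)  # assumption: one flip expected per genome
    return [
        rng.randint(0, CODON_SIZE) if rng.random() < mut_prob else codon
        for codon in genome
    ]


rng = random.Random(42)
a = [rng.randint(0, CODON_SIZE) for _ in range(8)]
b = [rng.randint(0, CODON_SIZE) for _ in range(8)]
c1, c2 = variable_onepoint(a, b, prob=0.8, rng=rng)
print(len(c1), len(c2))
print(int_flip_per_codon(c1, rng=rng))
```

Because the cut points differ between parents, the children can be longer or shorter than either parent, while the total number of codons across both children is preserved.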

Grammatical Evolution

Utilizes BNF grammar to define ensemble configurations.
Evolves candidate ensembles iteratively through genetic programming.
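
To make the genome-to-ensemble mapping concrete, here is a minimal sketch of the standard GE mapping step. The toy grammar, the measure names (`sim_a`, `sim_b`, `sim_c`), and the `map_genome` function are hypothetical and do not reproduce the grammar shipped in this repository or PonyGE2's mapper.

```python
# Toy grammar in dictionary form, equivalent to the BNF:
#   <expr>     ::= <agg>(<measures>)
#   <agg>      ::= mean | max | min
#   <measures> ::= <m> | <m>, <measures>
#   <m>        ::= sim_a | sim_b | sim_c
GRAMMAR = {
    "<expr>": [["<agg>", "(", "<measures>", ")"]],
    "<agg>": [["mean"], ["max"], ["min"]],
    "<measures>": [["<m>"], ["<m>", ", ", "<measures>"]],
    "<m>": [["sim_a"], ["sim_b"], ["sim_c"]],
}


def map_genome(genome, grammar=GRAMMAR, start="<expr>", max_wraps=2):
    """Expand the leftmost non-terminal repeatedly, picking the production
    with index (codon mod number_of_choices). The genome is reused (wrapped)
    up to max_wraps times; returns None if the mapping does not terminate."""
    symbols = [start]
    output = []
    used = 0
    limit = len(genome) * (max_wraps + 1)
    while symbols:
        symbol = symbols.pop(0)
        if symbol not in grammar:
            output.append(symbol)  # terminal: emit as-is
            continue
        choices = grammar[symbol]
        if len(choices) == 1:
            choice = choices[0]  # no decision needed, no codon consumed
        else:
            if used >= limit:
                return None  # ran out of codons: invalid individual
            codon = genome[used % len(genome)]
            choice = choices[codon % len(choices)]
            used += 1
        symbols = list(choice) + symbols
    return "".join(output)


print(map_genome([1, 1, 0, 0, 2]))  # prints max(sim_a, sim_c)
```

In this sketch the codons select, in order, the aggregation operator, whether to add another measure, and which measure to use, so a change to a single codon can alter the structure of the resulting ensemble.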

📚 Citation

If you use this work in your research, please cite:

@article{martinez2023semanticGE,
  author       = {Jorge Martinez-Gil},
  title        = {Automatic Design of Semantic Similarity Ensembles Using Grammatical Evolution},
  journal      = {CoRR},
  volume       = {abs/2307.00925},
  year         = {2023},
  url          = {https://doi.org/10.48550/arXiv.2307.00925},
  doi          = {10.48550/arXiv.2307.00925},
  eprinttype   = {arXiv},
  eprint       = {2307.00925}
}

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.
