naist-nlp/mbr-bias-diversity
Diversity Explains Inference Scaling Laws: Through a Case Study of Minimum Bayes Risk Decoding

If you use this repository, please cite:

@misc{kamigaito2025diversityexplainsinferencescaling,
      title={Diversity Explains Inference Scaling Laws: Through a Case Study of Minimum Bayes Risk Decoding}, 
      author={Hidetaka Kamigaito and Hiroyuki Deguchi and Yusuke Sakai and Katsuhiko Hayashi and Taro Watanabe},
      year={2025},
      eprint={2410.15021},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.15021}, 
}

Reported Results

Evaluation Results

The results for each setting are located in ./output/(full|1000)/results.

Generated Samples

The generated samples are located in ./output/(full|1000)/samples.

Generated Texts by MBR Decoding

The texts generated by MBR decoding are located in ./output/(full|1000)/decoded. Due to GitHub's storage limits, these directories are hosted on our Google Drive.
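For a quick look at the decoded outputs, the compressed files can be previewed without extracting them. A minimal sketch, assuming one JSON object per line (as the .jsonl.gz extension suggests); the record fields depend on the decoder output:

```shell
# Print the first few records of a gzip-compressed JSONL file.
# Assumption: one JSON object per line; field names depend on the decoder output.
peek_jsonl_gz() {
  # $1: path to a .jsonl.gz file; $2: number of records to show (default 3)
  gunzip -c "$1" | head -n "${2:-3}"
}
```

For example, `peek_jsonl_gz ./output/full/decoded/wmt19.en-de/beam_064.jsonl.gz` shows the first three records.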

Reproduce Results

Setup

Create a Python 3.11 virtual environment at the repository root:

python3.11 -m venv venv

Sample Generation

First, prepare the datasets for each task in model-based-mbr/dataset.

Example: WMT'19 En-De

Use sacrebleu to prepare the benchmark dataset:

cd ./model-based-mbr
mkdir -p ./dataset/wmt19-text
sacrebleu -t wmt19 -l en-de --echo src > ./dataset/wmt19-text/wmt19.en-de.en
sacrebleu -t wmt19 -l en-de --echo ref > ./dataset/wmt19-text/wmt19.en-de.de
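As a sanity check (a sketch, not part of the original pipeline), the source and reference files produced above should be line-aligned:

```shell
# Verify that two parallel corpus files have the same number of lines.
check_parallel() {
  # $1: source file, $2: reference file; succeeds iff line counts match
  [ "$(wc -l < "$1")" -eq "$(wc -l < "$2")" ]
}
# Example:
# check_parallel ./dataset/wmt19-text/wmt19.en-de.en ./dataset/wmt19-text/wmt19.en-de.de \
#   && echo "line counts match"
```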

For the CNN/DailyMail and XSum datasets, see the instructions in https://github.com/huggingface/transformers/blob/main/examples/legacy/seq2seq/README.md.

Then, install the required modules:

cd ./model-based-mbr
../venv/bin/pip3.11 install -r requirements.txt

After that, you can generate samples for each setting with the following commands, choosing one option from each parenthesized group:

cd ./model-based-mbr
# Full samples
bash ./jobs/all/(ancestral|beam|epsilon|nuclues|topk).sh (mscoco-ft|nocaps|samsum|wmt19.en-de|wmt19.en-ru|xsum)
# 1000 samples
bash ./jobs/1000/(ancestral|beam|epsilon|nuclues|topk).sh (mscoco-ft|nocaps|samsum|wmt19.en-de|wmt19.en-ru|xsum)

The generated results will be stored in model-based-mbr/samples/.
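The parenthesized pattern above expands to 60 combinations (2 budgets x 5 samplers x 6 tasks). A dry-run sketch that enumerates every invocation, with script and task names taken verbatim from the pattern:

```shell
# Enumerate all sample-generation jobs implied by the pattern above.
list_sampling_jobs() {
  for budget in all 1000; do
    for sampler in ancestral beam epsilon nuclues topk; do
      for task in mscoco-ft nocaps samsum wmt19.en-de wmt19.en-ru xsum; do
        echo "bash ./jobs/${budget}/${sampler}.sh ${task}"
      done
    done
  done
}
```

list_sampling_jobs prints one command per line; piping its output to sh would run every job sequentially.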

MBR decoding

First, install the following modules:

cd ./mbrs
../venv/bin/pip3.11 install -e .
cd ../transformers
../venv/bin/pip3.11 install -e .
cd ../bert_score
../venv/bin/pip3.11 install -e .

Then, you can run MBR decoding with the following commands:

cd mbrs
bash ./jobs/(1000|full)/(mscoco-ft|nocaps|samsum|wmt19.en-de|wmt19.en-ru|xsum)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064).sh
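Likewise, the MBR decoding pattern expands to 300 job scripts (2 budgets x 6 tasks x 5 samplers x 5 sample sizes). A dry-run sketch that lists every invocation:

```shell
# Enumerate all MBR decoding jobs implied by the pattern above.
list_mbr_jobs() {
  for budget in 1000 full; do
    for task in mscoco-ft nocaps samsum wmt19.en-de wmt19.en-ru xsum; do
      for sampler in ancestral beam epsilon nuclues topk; do
        for size in 004 008 016 032 064; do
          echo "bash ./jobs/${budget}/${task}/${sampler}_${size}.sh"
        done
      done
    done
  done
}
```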

MAMBR decoding

First, you need to fine-tune the utility models:

BERTScore

./scripts/finetune.sh [0-7]

COMET

cd models/comet
./scripts/download.sh
./scripts/finetune.sh [0-7]
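Assuming the bracketed [0-7] argument selects one of eight fine-tuning runs by integer index (an assumption; check the script), all runs can be launched in a loop. A dry-run sketch that only prints the commands:

```shell
# Dry run: print one finetune invocation per index 0-7.
# Assumption: the script takes a single integer index as its argument.
print_finetune_cmds() {
  for i in 0 1 2 3 4 5 6 7; do
    echo "./scripts/finetune.sh ${i}"
  done
}
```

Piping the output of print_finetune_cmds to sh would run the eight fine-tuning jobs sequentially.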

Then, you can run MAMBR decoding with the following commands:

cd mbrs
bash ./jobs/(1000|full)/(mscoco-ft|nocaps|samsum|wmt19.en-de|wmt19.en-ru|xsum)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064)_(001|002|004|008).sh

Evaluation

You can reproduce our results by running the following scripts:

# Bias and diversity results in the main text
./scripts/evaluate_bias_diversity.sh
# MAMBR results
./scripts/evaluate_mambr.sh
# Bias and diversity results in the appendices
./scripts/evaluate_bias_diversity_sup.sh

To compute individual results directly, see the following subsections.

Translation

# MBR decoding
./venv/bin/python3.11 ./scripts/evaluate/mt.py --input=./output/(1000|full)/decoded/(wmt19.en-de|wmt19.en-ru)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064).jsonl.gz --output=./output/(1000|full)/results/(wmt19.en-de|wmt19.en-ru)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064).jsonl

# MAMBR decoding
./venv/bin/python3.11 ./scripts/evaluate/mt.py --input=./output/(1000|full)/decoded/(wmt19.en-de|wmt19.en-ru)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064)_(001|002|004|008).jsonl.gz --output=./output/(1000|full)/results/(wmt19.en-de|wmt19.en-ru)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064)_(001|002|004|008).jsonl

Caption Generation (MSCOCO and NoCaps)

# MBR decoding
./venv/bin/python3.11 ./scripts/evaluate/captioning.py --input=./output/(1000|full)/decoded/(mscoco-ft|nocaps)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064).jsonl.gz --output=./output/(1000|full)/results/(mscoco-ft|nocaps)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064).jsonl

# MAMBR decoding
./venv/bin/python3.11 ./scripts/evaluate/captioning.py --input=./output/(1000|full)/decoded/(mscoco-ft|nocaps)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064)_(001|002|004|008).jsonl.gz --output=./output/(1000|full)/results/(mscoco-ft|nocaps)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064)_(001|002|004|008).jsonl

Summarization (Samsum and XSum)

# MBR decoding
./venv/bin/python3.11 ./scripts/evaluate/summarization.py --input=./output/(1000|full)/decoded/(samsum|xsum)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064).jsonl.gz --output=./output/(1000|full)/results/(samsum|xsum)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064).jsonl

# MAMBR decoding
./venv/bin/python3.11 ./scripts/evaluate/summarization.py --input=./output/(1000|full)/decoded/(samsum|xsum)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064)_(001|002|004|008).jsonl.gz --output=./output/(1000|full)/results/(samsum|xsum)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064)_(001|002|004|008).jsonl
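The task-to-script mapping used across the three subsections above can be summarized in a small helper (a sketch; script paths as in the commands above):

```shell
# Map a task name to its evaluation script, per the subsections above.
eval_script_for() {
  case "$1" in
    wmt19.en-de|wmt19.en-ru) echo ./scripts/evaluate/mt.py ;;
    mscoco-ft|nocaps)        echo ./scripts/evaluate/captioning.py ;;
    samsum|xsum)             echo ./scripts/evaluate/summarization.py ;;
    *) return 1 ;;
  esac
}
# Example: ./venv/bin/python3.11 "$(eval_script_for xsum)" --input=... --output=...
```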

Note: module versions

We modified the following modules:
