If you use this code, please cite our paper:
@misc{kamigaito2025diversityexplainsinferencescaling,
title={Diversity Explains Inference Scaling Laws: Through a Case Study of Minimum Bayes Risk Decoding},
author={Hidetaka Kamigaito and Hiroyuki Deguchi and Yusuke Sakai and Katsuhiko Hayashi and Taro Watanabe},
year={2025},
eprint={2410.15021},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2410.15021},
}
The results for each setting are located in ./output/(full|1000)/results.
The generated samples are located in ./output/(full|1000)/samples.
The texts generated by MBR decoding are located in ./output/(full|1000)/decoded.
Due to GitHub's storage limits, these directories are hosted on our Google Drive.
First, create a Python 3.11 virtual environment:
python3.11 -m venv venv
Next, you need to prepare the datasets for each task in model-based-mbr/dataset.
Example: WMT'19 En-De. Use sacrebleu to prepare the benchmark dataset:
cd ./model-based-mbr
mkdir -p ./dataset/wmt19-text
sacrebleu -t wmt19 -l en-de --echo src > ./dataset/wmt19-text/wmt19.en-de.en
sacrebleu -t wmt19 -l en-de --echo ref > ./dataset/wmt19-text/wmt19.en-de.de
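The WMT'19 En-Ru data used in the other settings can presumably be prepared analogously; the output file names below follow the pattern above and are an assumption here:
sacrebleu -t wmt19 -l en-ru --echo src > ./dataset/wmt19-text/wmt19.en-ru.en
sacrebleu -t wmt19 -l en-ru --echo ref > ./dataset/wmt19-text/wmt19.en-ru.ru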
For the CNN/DailyMail and XSum datasets, refer to the instructions in https://github.com/huggingface/transformers/blob/main/examples/legacy/seq2seq/README.md.
Then, set up the required modules:
cd ./model-based-mbr
../venv/bin/pip3.11 install -r requirements.txt
After that, you can generate samples for each setting with the following commands:
cd ./model-based-mbr
# Full samples
bash ./jobs/all/(ancestral|beam|epsilon|nuclues|topk).sh (mscoco-ft|nocaps|samsum|wmt19.en-de|wmt19.en-ru|xsum)
# 1000 samples
bash ./jobs/1000/(ancestral|beam|epsilon|nuclues|topk).sh (mscoco-ft|nocaps|samsum|wmt19.en-de|wmt19.en-ru|xsum)
The generated results will be stored in model-based-mbr/samples/.
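For example, to generate the full set of epsilon-sampled outputs for WMT'19 En-De:
bash ./jobs/all/epsilon.sh wmt19.en-de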
First, you need to set up the following modules (run each block from the repository root):
cd ./mbrs
../venv/bin/pip3.11 install -e .
cd ./transformers
../venv/bin/pip3.11 install -e .
cd ./bert_score
../venv/bin/pip3.11 install -e .
Then, you can run MBR decoding with the following commands:
cd mbrs
bash ./jobs/(1000|full)/(mscoco-ft|nocaps|samsum|wmt19.en-de|wmt19.en-ru|xsum)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064).sh
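For example, MBR decoding over 64 epsilon-sampled candidates on the full WMT'19 En-De set:
bash ./jobs/full/wmt19.en-de/epsilon_064.sh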
First, you need to fine-tune the models:
BERTScore
./scripts/finetune.sh [0-7]
COMET
cd models/comet
./scripts/download.sh
./scripts/finetune.sh [0-7]
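A concrete invocation, assuming the bracketed [0-7] argument is an index selecting one of eight runs (its exact meaning is defined by the scripts):
./scripts/finetune.sh 0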
Then, you can run MAMBR decoding with the following commands:
cd mbrs
bash ./jobs/(1000|full)/(mscoco-ft|nocaps|samsum|wmt19.en-de|wmt19.en-ru|xsum)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064)_(001|002|004|008).sh
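For example, one MAMBR run on the full WMT'19 En-De set:
bash ./jobs/full/wmt19.en-de/epsilon_064_004.sh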
You can reproduce our results by running the following scripts:
# Bias and diversity results in main part
./scripts/evaluate_bias_diversity.sh
# MAMBR results
./scripts/evaluate_mambr.sh
# Bias and diversity results in appendices
./scripts/evaluate_bias_diversity_sup.sh
If you need to obtain individual results on the spot, see the following subsections.
# MBR decoding
./venv/bin/python3.11 ./scripts/evaluate/mt.py --input=./output/(1000|full)/decoded/(wmt19.en-de|wmt19.en-ru)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064).jsonl.gz --output=./output/(1000|full)/results/(wmt19.en-de|wmt19.en-ru)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064).jsonl
# MAMBR decoding
./venv/bin/python3.11 ./scripts/evaluate/mt.py --input=./output/(1000|full)/decoded/(wmt19.en-de|wmt19.en-ru)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064)_(001|002|004|008).jsonl.gz --output=./output/(1000|full)/results/(wmt19.en-de|wmt19.en-ru)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064)_(001|002|004|008).jsonl
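For example, to evaluate a single MBR configuration (64 epsilon samples, full WMT'19 En-De):
./venv/bin/python3.11 ./scripts/evaluate/mt.py --input=./output/full/decoded/wmt19.en-de/epsilon_064.jsonl.gz --output=./output/full/results/wmt19.en-de/epsilon_064.jsonl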
# MBR decoding
./venv/bin/python3.11 ./scripts/evaluate/captioning.py --input=./output/(1000|full)/decoded/(mscoco-ft|nocaps)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064).jsonl.gz --output=./output/(1000|full)/results/(mscoco-ft|nocaps)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064).jsonl
# MAMBR decoding
./venv/bin/python3.11 ./scripts/evaluate/captioning.py --input=./output/(1000|full)/decoded/(mscoco-ft|nocaps)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064)_(001|002|004|008).jsonl.gz --output=./output/(1000|full)/results/(mscoco-ft|nocaps)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064)_(001|002|004|008).jsonl
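For example, for captioning on mscoco-ft with 32 ancestral samples:
./venv/bin/python3.11 ./scripts/evaluate/captioning.py --input=./output/full/decoded/mscoco-ft/ancestral_032.jsonl.gz --output=./output/full/results/mscoco-ft/ancestral_032.jsonl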
# MBR decoding
./venv/bin/python3.11 ./scripts/evaluate/summarization.py --input=./output/(1000|full)/decoded/(samsum|xsum)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064).jsonl.gz --output=./output/(1000|full)/results/(samsum|xsum)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064).jsonl
# MAMBR decoding
./venv/bin/python3.11 ./scripts/evaluate/summarization.py --input=./output/(1000|full)/decoded/(samsum|xsum)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064)_(001|002|004|008).jsonl.gz --output=./output/(1000|full)/results/(samsum|xsum)/(ancestral|beam|epsilon|nuclues|topk)_(004|008|016|032|064)_(001|002|004|008).jsonl
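For example, for summarization on XSum with 16 top-k samples in the 1000-sample setting:
./venv/bin/python3.11 ./scripts/evaluate/summarization.py --input=./output/1000/decoded/xsum/topk_016.jsonl.gz --output=./output/1000/results/xsum/topk_016.jsonl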
Our modifications are based on the following commits of each module:
- mbrs: bdadd3a37ea8d97cb802c80daa57c8bba0f95a7b
- model-based-mbr: 445c6123c1afe8ba10a91c3d91e104f2b0ceb8ea
- bert_score: 19e7f551fe4fa43fdd07b8129ae947015b902b2d
- transformers: c409cd81777fb27aadc043ed3d8339dbc020fb3b