A Framework for Identifying Biases in Retrievers
⚠️ The best accuracy of dense retrievers on the foil (default) subset is below 🔴10%🔴.
Retrievers consistently score document_1 higher than document_2 across all subsets.
⇒ Retrieval biases often outweigh the impact of answer presence.
| Model | Accuracy | Paired t-Test Statistic | p-value |
|---|---|---|---|
| 🥇 ReasonIR-8B 🆕 | 8.0% | -36.92 | < 0.01 |
| 🥈 ColBERT (v2) 🆕 | 7.6% | -20.96 | < 0.01 |
| 🥉 COCO-DR Base MSMARCO | 2.4% | -32.92 | < 0.01 |
| Dragon+ | 1.2% | -40.94 | < 0.01 |
| Dragon RoBERTa | 0.8% | -36.53 | < 0.01 |
| Contriever MSMARCO | 0.8% | -42.25 | < 0.01 |
| RetroMAE MSMARCO FT | 0.4% | -41.49 | < 0.01 |
| Contriever | 0.4% | -34.58 | < 0.01 |
Evaluate any model using this code: https://colab.research.google.com/github/mohsenfayyaz/ColDeR/blob/main/Benchmark_Eval.ipynb
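As a minimal sketch of how such an evaluation can be run, the snippet below scores each query against its paired documents with a Contriever-style bi-encoder (mean pooling, dot-product scores). The `query` / `document_1` / `document_2` field names, the `foil` config name, and the t-test sign convention are assumptions inferred from this card; the notebook above is the reference implementation and may differ in details.

```python
# Sketch only: field, config, and split names are assumptions; see the notebook above.
import torch
from datasets import load_dataset
from scipy.stats import ttest_rel
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/contriever")
model = AutoModel.from_pretrained("facebook/contriever").eval()

def embed(texts, batch_size=32):
    """Mean-pool token embeddings over non-padding positions (Contriever-style)."""
    chunks = []
    for i in range(0, len(texts), batch_size):
        enc = tokenizer(texts[i:i + batch_size], padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state
        mask = enc["attention_mask"].unsqueeze(-1)
        chunks.append((hidden * mask).sum(1) / mask.sum(1))
    return torch.cat(chunks)

ds = load_dataset("mohsenfayyaz/ColDeR", "foil")   # config name taken from the subset list below
data = ds[next(iter(ds))]                          # take the only split, whatever it is named

q  = embed(data["query"])
d1 = embed(data["document_1"])   # biased foil document without the answer
d2 = embed(data["document_2"])   # evidence document padded with unrelated sentences

s1, s2 = (q * d1).sum(-1), (q * d2).sum(-1)        # dot-product relevance scores

accuracy = (s2 > s1).float().mean().item()           # how often the evidence document wins
t_stat, p_value = ttest_rel(s2.numpy(), s1.numpy())  # negative t => foil preferred on average
print(f"accuracy={accuracy:.1%}  t={t_stat:.2f}  p={p_value:.3g}")
```

Here accuracy is the fraction of pairs where the evidence document (document_2 in the foil subset) outscores the biased foil, and the paired t-test compares the two score columns; negative statistics mean the foil is scored higher on average. Each subset pairs a document_1 and a document_2 per query, as described below.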
- foil (default):
- document_1: Foil Document with Multiple Biases but No Evidence: This document combines multiple biases, such as repetition and position bias. It opens with a sentence containing two mentions of the head entity, followed by a sentence that mentions the head but not the tail (answer), so it contains no evidence.
- document_2: Evidence Document with Unrelated Content: This document includes four unrelated sentences from another document, followed by the evidence sentence with both the head and tail entities. The document ends with the same four unrelated sentences.
- answer_importance:
- document_1: Document with Evidence: Contains a leading evidence sentence with both the head entity and the tail entity (answer).
- document_2: Document without Evidence: Contains a leading sentence with only the head entity but no tail.
- brevity_bias:
- document_1: Single Evidence, consisting of only the evidence sentence.
- document_2: Evidence+Document, consisting of the evidence sentence followed by the rest of the document.
- literal_bias:
- document_1: Both query and document use the shortest name variant (short-short).
- document_2: The query uses the short name but the document contains the long name variant (short-long).
- position_bias:
- document_1: Beginning-Evidence Document: The evidence sentence is positioned at the start of the document.
- document_2: End-Evidence Document: The same evidence sentence is positioned at the end of the document.
- repetition_bias:
- document_1: More Heads, comprising an evidence sentence and two additional sentences containing head mentions but no tails.
- document_2: Fewer Heads, comprising an evidence sentence and two additional sentences from the document without head or tail mentions.
- poison:
- document_1: Poisoned Biased Evidence: We add the evidence sentence to foil document 1 and replace the tail entity in it with a contextually plausible but entirely incorrect entity using GPT-4o.
- document_2: Correct Evidence Document with Unrelated Content: This document includes four unrelated sentences from another document, followed by the evidence sentence with both the head and tail entities. The document ends with the same four unrelated sentences.
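For a quick look at what each pair actually contains, the sketch below iterates over the subsets listed above. The configuration names are taken from that list and the field names from the descriptions, so both are assumptions; check the dataset files on the Hub if they differ.

```python
# Inspect one example from every subset; names below are assumptions based on the list above.
from datasets import load_dataset

SUBSETS = ["foil", "answer_importance", "brevity_bias", "literal_bias",
           "position_bias", "repetition_bias", "poison"]

for name in SUBSETS:
    ds = load_dataset("mohsenfayyaz/ColDeR", name)
    split = ds[next(iter(ds))]                 # take the only split, whatever it is called
    ex = split[0]
    print(f"=== {name}: {len(split)} query/document pairs ===")
    print("query:      ", ex["query"][:100])
    print("document_1: ", ex["document_1"][:100])
    print("document_2: ", ex["document_2"][:100])
```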
- Paper: https://arxiv.org/abs/2503.05037
- Dataset: https://huggingface.co/datasets/mohsenfayyaz/ColDeR
- Repository: https://github.com/mohsenfayyaz/ColDeR
BibTeX: If you find this work useful, please consider citing our paper:
@inproceedings{fayyaz-etal-2025-collapse,
title = "Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence",
author = "Fayyaz, Mohsen and
Modarressi, Ali and
Schuetze, Hinrich and
Peng, Nanyun",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.acl-long.447/",
pages = "9136--9152",
ISBN = "979-8-89176-251-0",
abstract = "Dense retrieval models are commonly used in Information Retrieval (IR) applications, such as Retrieval-Augmented Generation (RAG). Since they often serve as the first step in these systems, their robustness is critical to avoid downstream failures. In this work, we repurpose a relation extraction dataset (e.g., Re-DocRED) to design controlled experiments that quantify the impact of heuristic biases, such as a preference for shorter documents, on retrievers like Dragon+ and Contriever. We uncover major vulnerabilities, showing retrievers favor shorter documents, early positions, repeated entities, and literal matches, all while ignoring the answer{'}s presence! Notably, when multiple biases combine, models exhibit catastrophic performance degradation, selecting the answer-containing document in less than 10{\%} of cases over a synthetic biased document without the answer. Furthermore, we show that these biases have direct consequences for downstream applications like RAG, where retrieval-preferred documents can mislead LLMs, resulting in a 34{\%} performance drop compared to providing no documents at all. https://huggingface.co/datasets/mohsenfayyaz/ColDeR"
}