
JensUn unlearning and the LKF dataset

This repository contains the code associated with our paper "Unlearning That Lasts: Utility-Preserving, Robust, and Almost Irreversible Forgetting in LLMs".

Paper: https://arxiv.org/abs/2509.02820


Package requirements

The necessary Python packages can be installed with:

    conda env create -f environment.yml
    conda activate lkfjensun
    pip install flash-attn==2.7.4.post1

LKF dataset

All subsets of the Lesser Known Facts (LKF) dataset can be found in the following HF-hub collection (a loading sketch follows this list):

  • Forget subsets: Forget-Standard, Forget-train-paraphrases, Forget-eval-paraphrases
  • Retain subsets: Retain-Standard, Retain-Train-paraphrases, Retain-eval-paraphrases
  • A subset used for the Relearning experiments.
  • The prompts and paraphrase-generation scripts can be found in LKF_creation.
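The subsets can be loaded directly with the Hugging Face datasets library. A minimal sketch, assuming hypothetical dataset ids (replace them with the actual paths from the HF-hub collection above):

    from datasets import load_dataset

    # "nmndeep/LKF-forget" and "nmndeep/LKF-retain" are placeholder ids --
    # substitute the real dataset paths from the collection linked above.
    forget = load_dataset("nmndeep/LKF-forget", split="train")
    retain = load_dataset("nmndeep/LKF-retain", split="train")

    print(forget[0])  # inspect a single forget example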

Experiments


Fine-tuning for Unlearning

  • To unlearn with JensUn on LKF and evaluate on all tasks from our work:

        bash scripts/lkf_unlearn.sh

  • To unlearn with a method other than JensUn, pick one of the methods listed at the bottom of trainer/__init__.py and replace the first argument in runningargs in scripts/lkf_unlearn.sh.

  • To unlearn without paraphrases, pass unlearn/lkf/default.yaml as the second argument in runningargs in scripts/lkf_unlearn.sh.

  • Available unlearning methods (see trainer/__init__.py):

    • Preference optimization: DPO, NPO, SimNPO
    • Others: GradAscent, GradDiff, RMU, JensUn (multiple variants; a generic loss sketch follows this list)

  • The LLM can be changed in scripts/lkf_unlearn.sh (see here for the available models).
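For intuition only: the name JensUn suggests a Jensen-Shannon-divergence objective. The sketch below is not the repository's implementation; it shows a generic JS-divergence forget loss that drives the model's next-token distribution on forget samples toward a fixed target distribution (here uniform, purely as an assumption):

    import torch
    import torch.nn.functional as F

    def js_forget_loss(logits: torch.Tensor) -> torch.Tensor:
        """Jensen-Shannon divergence between the model's next-token
        distribution and a uniform target (illustrative choice only)."""
        p = F.softmax(logits, dim=-1)             # model distribution
        q = torch.full_like(p, 1.0 / p.size(-1))  # uniform target
        m = 0.5 * (p + q)                         # mixture distribution
        # JSD(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m)
        kl_pm = (p * (p.clamp_min(1e-12).log() - m.log())).sum(dim=-1)
        kl_qm = (q * (q.log() - m.log())).sum(dim=-1)
        return 0.5 * (kl_pm + kl_qm).mean()

Minimizing such a loss on the forget set flattens the model's predictions there; the actual JensUn variants and their target distributions are those listed in trainer/__init__.py.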


Unlearning Evaluations

  • Add your Gemini API KEY here for evaluating with the LLM judge (a minimal judge-call sketch is given at the end of this section).

  • Evaluate the pre-trained Llama model first (default: Llama-3.2-3B-Instruct):

    bash scripts/lkf_evaluation.sh

All evaluation results will be stored in a newly created evaluations folder.
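The judge is queried through the Gemini API. A rough sketch of a single judge call with the google-generativeai client (the model name, prompt, and GEMINI_API_KEY environment variable are assumptions; the repository's judge code may differ):

    import os
    import google.generativeai as genai

    # GEMINI_API_KEY is an assumed variable name; put the key wherever
    # the repository's evaluation code expects it.
    genai.configure(api_key=os.environ["GEMINI_API_KEY"])

    model = genai.GenerativeModel("gemini-1.5-flash")  # assumed judge model
    prompt = "Question: ...\nModel answer: ...\nIs the forgotten fact revealed? Answer yes/no."
    response = model.generate_content(prompt)
    print(response.text)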

Notes

- LLM-judge evaluations can include failed API calls. The code handles these, but such samples are excluded from the final evaluation, so results may vary slightly.
- Repetitiveness evaluations of the pre-trained model are required for all subsequent Quality (WinRate) evaluations.

Acknowledgements

This repository gratefully forks from

Citation

If you find this repository useful, please consider citing our paper:

@article{singh2025unlearning,
    title={Unlearning That Lasts: Utility-Preserving, Robust, and Almost Irreversible Forgetting in LLMs},
    author={Naman Deep Singh and Maximilian Müller and Francesco Croce and Matthias Hein},
    journal={arXiv preprint arXiv:2509.02820},
    year={2025}
}
