mislc

Misinformation with Legal Consequences (MisLC): A New Task Towards Harnessing Societal Harm of Misinformation

Accepted to the Findings of EMNLP 2024

Requirements:

install Python 3.10.x
run pip install -r requirements.txt
request the dataset/ directory (legal database and gold labels) by contacting me at chufei.luo@queensu.ca. TBD: release on Hugging Face Datasets

Generating the Pyserini retrieval index

unzip split_docs.zip to split_docs/ (contact chufei.luo@queensu.ca for access)
run the following command:

python -m pyserini.index.lucene \
  --collection JsonCollection \
  --input split_docs \
  --index index \
  --generator DefaultLuceneDocumentGenerator \
  --threads 2 \
  --storePositions --storeDocvectors --storeRaw
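Before indexing, it can help to confirm that the unzipped files match what Pyserini's JsonCollection expects: each document is a JSON object with an "id" and a "contents" field, stored as a single object, a JSON array, or one object per line (JSONL). The check below is a minimal sketch using only the standard library. The function names are hypothetical, and the assumption that split_docs/ holds .json or .jsonl files is ours, not something stated in this README.

```python
import json
from pathlib import Path


def iter_docs(path: Path):
    """Yield documents from a .json file (object or array) or a .jsonl file."""
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".jsonl":
        for line in text.splitlines():
            if line.strip():
                yield json.loads(line)
    else:
        data = json.loads(text)
        yield from (data if isinstance(data, list) else [data])


def find_invalid_docs(collection_dir: str) -> list[str]:
    """Describe every document that lacks an "id" or a "contents" field."""
    problems = []
    for path in sorted(Path(collection_dir).rglob("*")):
        # Assumption: only .json and .jsonl files belong to the collection.
        if path.suffix not in (".json", ".jsonl"):
            continue
        for i, doc in enumerate(iter_docs(path)):
            for field in ("id", "contents"):
                if not isinstance(doc, dict) or field not in doc:
                    problems.append(f"{path.name}[{i}]: missing '{field}'")
    return problems
```

Calling find_invalid_docs("split_docs") should return an empty list when every document has both fields.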

Important options

Each method has its own directory in the repository (flare, ralm, zero-shot). Run the gen_outputs.py script from the directory of the method you want to use.

retrieval - bm25 retrieves from the legal database, google uses web search, mix uses both

query_model_source (FLARE only) - openai for OpenAI models (you must also specify --query_model_name); the same applies when using the base model to generate queries

prompt_name - corresponds to the name of a .txt file in prompts/

constrained - include the system prompt to suppress refusals

oracle - use the annotator-specified source when available

If you have any questions about the other options in our scripts, please let me know!
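To make the relationships between these options concrete, here is a hypothetical argparse sketch. It is not the actual gen_outputs.py parser: which flags are required, the use of store_true for constrained and oracle, and the helper names are all assumptions for illustration. Only the option names, the three retrieval values, and the prompts/ convention come from the descriptions above.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the options described above."""
    parser = argparse.ArgumentParser(description="Illustrative subset of options")
    parser.add_argument("--retrieval", choices=["bm25", "google", "mix"],
                        help="bm25: legal database, google: web search, mix: both")
    parser.add_argument("--query_model_source",
                        help="FLARE only; 'openai' also needs --query_model_name")
    parser.add_argument("--query_model_name")
    parser.add_argument("--prompt_name", required=True,
                        help="name of a .txt file in prompts/, without extension")
    # Assumption: both options are boolean switches.
    parser.add_argument("--constrained", action="store_true",
                        help="include the system prompt that suppresses refusals")
    parser.add_argument("--oracle", action="store_true",
                        help="use the annotator-specified source when available")
    return parser


def parse_options(argv: list[str]) -> argparse.Namespace:
    """Parse argv and enforce the openai / query_model_name dependency."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.query_model_source == "openai" and not args.query_model_name:
        parser.error("--query_model_source openai requires --query_model_name")
    return args


def prompt_path(prompt_name: str, prompts_dir: str = "prompts") -> Path:
    """Map a prompt name to its .txt file in the prompts directory."""
    return Path(prompts_dir) / f"{prompt_name}.txt"
```

For example, parse_options(["--prompt_name", "basic_inst_retrieval", "--retrieval", "mix"]) yields a namespace with retrieval set to "mix" and both switches off.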

Example commands

Generate results using the FLARE retrieval method on the legal database only

python gen_outputs.py \
    --dataset_path ../dataset/annotations/dataset.csv \
    --model_name meta-llama/Llama-2-70b-chat-hf \
    --output_dir experiments/Llama-2-70b-chat-hf/query=gpt3.5 \
    --query_model_source openai \
    --query_model_name gpt-3.5-turbo \
    --model_parallelism \
    --retrieval bm25 \
    --web_doc_type snippet \
    --docs_path ../dataset/definitions/split_docs/ \
    --index_path ../dataset/definitions/index/ \
    --prompt_name basic_inst_retrieval \
    --max_new_tokens 128 \
    --max_length 4096 \
    --theta 0.5
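To repeat the run above for each retrieval setting, the command can be assembled programmatically. The sketch below only builds and prints the argument lists. The flag values are copied from the example, while the build_command helper and the retrieval=<mode> output directory naming are hypothetical choices made for illustration.

```python
import shlex

# Flag values copied from the example command above.
BASE_ARGS = {
    "--dataset_path": "../dataset/annotations/dataset.csv",
    "--model_name": "meta-llama/Llama-2-70b-chat-hf",
    "--query_model_source": "openai",
    "--query_model_name": "gpt-3.5-turbo",
    "--web_doc_type": "snippet",
    "--docs_path": "../dataset/definitions/split_docs/",
    "--index_path": "../dataset/definitions/index/",
    "--prompt_name": "basic_inst_retrieval",
    "--max_new_tokens": "128",
    "--max_length": "4096",
    "--theta": "0.5",
}

RETRIEVAL_MODES = ("bm25", "google", "mix")


def build_command(retrieval: str,
                  output_root: str = "experiments/Llama-2-70b-chat-hf") -> list[str]:
    """Return the argument list for one run with the given retrieval mode."""
    if retrieval not in RETRIEVAL_MODES:
        raise ValueError(f"unknown retrieval mode: {retrieval!r}")
    cmd = ["python", "gen_outputs.py"]
    for flag, value in BASE_ARGS.items():
        cmd += [flag, value]
    cmd.append("--model_parallelism")
    # Hypothetical naming scheme: one output directory per retrieval mode.
    cmd += ["--retrieval", retrieval,
            "--output_dir", f"{output_root}/retrieval={retrieval}"]
    return cmd


for mode in RETRIEVAL_MODES:
    print(shlex.join(build_command(mode)))
```

Each list can be passed to subprocess.run(cmd, check=True). The relative ../dataset paths suggest running from inside a method directory such as flare, though that is an inference from the example rather than something this README states.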
