FakeNews-Detection-NLP-RandomForest

Project Overview

The goal of this project is to identify Persian tweets that contain news and to classify each one as True or False using machine learning algorithms. We also analyze how these news tweets spread and propagate.

Background

First, we define what constitutes "news" in the context of our project. Any tweet that includes a claim is considered a news tweet.

Inspired by the methodology presented in the paper "The spread of true and false news online", we detect news tweets based on their interaction with other tweets:

Tweets that reply to other tweets and include links to news sources (such as IRNA, ISNA, BBC Persian, Eghtesad News, etc.) are likely responses to claim tweets.
We then identify the original tweets to which these responses relate and label those original tweets as news tweets.
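
The detection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tweet dictionaries, their field names (`in_reply_to_id`, `urls`) and the domain list are hypothetical and are not taken from the project code. Note that matching on `bbc.com` would accept any BBC link, not only BBC Persian.

```python
from urllib.parse import urlparse

# Hypothetical domain list for illustration; the project's real list may differ.
NEWS_DOMAINS = {"irna.ir", "isna.ir", "bbc.com", "eghtesadnews.com"}


def links_to_news(urls):
    """Return True if any URL points at a domain in NEWS_DOMAINS or a subdomain of one."""
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if any(host == d or host.endswith("." + d) for d in NEWS_DOMAINS):
            return True
    return False


def find_news_tweet_ids(tweets):
    """Collect IDs of original tweets that received a reply linking to a news source."""
    news_ids = set()
    for tweet in tweets:
        parent = tweet.get("in_reply_to_id")
        if parent is not None and links_to_news(tweet.get("urls", [])):
            news_ids.add(parent)
    return news_ids


# Toy input: tweet 2 replies to tweet 1 with a news link, tweet 4 replies to 3 without one.
tweets = [
    {"id": 1, "in_reply_to_id": None, "urls": []},
    {"id": 2, "in_reply_to_id": 1, "urls": ["https://www.irna.ir/news/123"]},
    {"id": 3, "in_reply_to_id": None, "urls": []},
    {"id": 4, "in_reply_to_id": 3, "urls": ["https://example.com/post"]},
]
print(find_news_tweet_ids(tweets))  # {1}
```

Matching on the parsed hostname rather than on a substring of the URL avoids false hits such as `notirna.ir`.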

The project also proposes an alternative detection approach that can be used in place of this one.

Methodology

Data Collection:
We crawled Persian tweets from 2014 to 2020 using a Twitter scraping tool. We specifically collected tweets containing links to reliable news websites and separated these news tweets from non-news tweets.
Data Processing:
We extracted original tweet metadata using the Tweepy library, including:
- Username and user ID
- Number of followers and followings
- Timestamp
- Tweet text and ID
- Number of retweets, likes, and replies
- Hashtags, links, symbols, mentioned users
- Polls and media presence
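
As a rough sketch of this step, the function below flattens one tweet payload into a flat record. It assumes the payload is a dictionary shaped like a Twitter API v1.1 status object (with Tweepy, such a dictionary is typically available as `status._json`); the output field names are hypothetical. Reply counts and polls are left out because this sketch does not assume they are present in that payload shape.

```python
def flatten_status(status):
    """Flatten one v1.1-style status dict into a flat metadata record."""
    user = status.get("user", {})
    entities = status.get("entities", {})
    return {
        "tweet_id": status.get("id"),
        # Extended-mode payloads carry "full_text"; fall back to "text" otherwise.
        "text": status.get("full_text") or status.get("text", ""),
        "created_at": status.get("created_at"),
        "user_id": user.get("id"),
        "username": user.get("screen_name"),
        "followers": user.get("followers_count"),
        "followings": user.get("friends_count"),
        "retweets": status.get("retweet_count"),
        "likes": status.get("favorite_count"),
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "links": [u["expanded_url"] for u in entities.get("urls", [])],
        "symbols": [s["text"] for s in entities.get("symbols", [])],
        "mentions": [m["screen_name"] for m in entities.get("user_mentions", [])],
        "has_media": bool(entities.get("media")),
    }


# Made-up payload for illustration only.
sample = {
    "id": 42,
    "full_text": "example claim #news",
    "created_at": "Wed Jan 01 10:00:00 +0000 2020",
    "user": {"id": 7, "screen_name": "someone", "followers_count": 120, "friends_count": 80},
    "retweet_count": 3,
    "favorite_count": 9,
    "entities": {
        "hashtags": [{"text": "news"}],
        "urls": [{"expanded_url": "https://www.isna.ir/news/1"}],
        "symbols": [],
        "user_mentions": [{"screen_name": "other"}],
    },
}
record = flatten_status(sample)
print(record["username"], record["hashtags"], record["has_media"])
```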
Labeling:
Using the replies that contain news links, we labeled the original tweets as news tweets and marked each claim as True or False.
Model Training:
We collected tweets from diverse categories (news, jokes, personal stories, etc.) and trained clustering and classification models to identify news tweets in new, unseen data.
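
A minimal sketch of a news versus non-news classifier with scikit-learn is shown below. The feature choice (character n-gram TF-IDF), the hyperparameters, and the toy English sentences are assumptions made for illustration; they are not the project's actual features, settings, or data, and real training would use the labeled Persian tweets.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy stand-in data; the real dataset consists of labeled Persian tweets.
texts = [
    "officials announce new fuel prices starting next week",
    "the ministry confirms schools will reopen on monday",
    "central bank reports inflation figures for this quarter",
    "my cat just knocked my coffee off the table again",
    "why did the chicken cross the road lol",
    "had the best weekend with my family at the beach",
]
labels = ["news", "news", "news", "non-news", "non-news", "non-news"]

# Character n-grams avoid relying on a language-specific tokenizer.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(texts, labels)

predictions = model.predict(["officials confirm the bridge will close tomorrow"])
print(predictions)
```

On real data the labeled set would be split into training and test portions (for example with `sklearn.model_selection.train_test_split`) before any accuracy is reported.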
Statistical Analysis:
We analyze tweet engagement metrics such as likes, replies, and retweets to study the importance and spread of news in the community.
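
One simple way to compare engagement across labels is a per-group median, sketched below. The record layout and the numbers are made up for illustration and are not project results; the median is one possible summary among many and is not necessarily the statistic used in the project.

```python
from statistics import median


def engagement_by_label(records, metrics=("likes", "replies", "retweets")):
    """Return the median of each engagement metric, grouped by truth label."""
    groups = {}
    for record in records:
        groups.setdefault(record["label"], []).append(record)
    return {
        label: {m: median(row[m] for row in rows) for m in metrics}
        for label, rows in groups.items()
    }


# Made-up records for illustration only.
records = [
    {"label": True, "likes": 10, "replies": 1, "retweets": 2},
    {"label": True, "likes": 30, "replies": 3, "retweets": 4},
    {"label": True, "likes": 20, "replies": 2, "retweets": 6},
    {"label": False, "likes": 100, "replies": 8, "retweets": 40},
    {"label": False, "likes": 50, "replies": 4, "retweets": 20},
]
summary = engagement_by_label(records)
print(summary[True]["likes"], summary[False]["likes"])  # 20 75.0
```

Medians are used here because engagement counts tend to be heavily skewed by a few viral tweets.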

Usage

Run the scraper to collect Persian tweets with news-related links.
Use the provided code to extract tweet metadata and label news tweets.
Train and test classification models on the dataset.
Analyze the spread patterns of true and false news using the engagement metrics.

Tools and Libraries

Python
Tweepy (for Twitter API access)
TwitterScraper (for historical tweet crawling)
Machine learning libraries (e.g., scikit-learn, TensorFlow)
Data processing libraries (e.g., pandas, numpy)

References

Vosoughi, S., Roy, D., & Aral, S. (2018). The spread of true and false news online. Science, 359(6380), 1146-1151.
