pandya-ae/thesis

thesis

Thesis work titled "News Authorship Identification using Recurrent Neural Networks"

Authorship identification has evolved from focusing on traditional texts like literature and historical documents to tackling the complexities of digital content, including online news and social media. This expansion is driven by the need to verify news sources and combat misinformation in today's digital landscape. News articles, characterized by their formal tone and standardized structure, present unique challenges: stylistic differences between authors are subtle, and editing can dilute an individual author's voice. Recurrent neural networks (RNNs) capture writing style and contextual relationships by learning from sequences of text, which makes them well suited to authorship identification.

In this research, two deep learning models, long short-term memory (LSTM) and gated recurrent unit (GRU), are trained on the ‘All the news’ dataset using GloVe word embeddings as input. Two machine learning models, support vector machine (SVM) and logistic regression, trained on TF-IDF features serve as baselines. All four models are trained at both sentence level and article level, and their performance is then evaluated and compared.
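To make the two training granularities concrete, here is a minimal sketch of how labeled examples could be built at each level. It is an illustration only: the function name `to_examples`, the toy articles, and the naive regex sentence splitter are assumptions, not the preprocessing used in the thesis.

```python
import re


def to_examples(articles, level="article"):
    """Turn (author, text) pairs into (text, author) training examples.

    Hypothetical helper: 'article' keeps one example per article,
    'sentence' emits one example per sentence, each inheriting the
    author label of the article it came from.
    """
    if level == "article":
        return [(text, author) for author, text in articles]
    if level == "sentence":
        examples = []
        for author, text in articles:
            # Naive splitter (an assumption): break after '.', '!' or '?'
            # when followed by whitespace.
            for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
                if sentence:
                    examples.append((sentence, author))
        return examples
    raise ValueError("level must be 'article' or 'sentence'")


# Toy data, invented for illustration.
articles = [
    ("author_a", "Markets rose on Monday. Analysts were surprised."),
    ("author_b", "The council voted late. No date was set. Talks resume soon."),
]

article_examples = to_examples(articles, "article")
sentence_examples = to_examples(articles, "sentence")
```

Sentence-level training yields many more examples, but each one carries far less text for a model to pick up stylistic cues from.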

The results show that at article-level training, the macro-averaged accuracies of the LSTM, GRU, SVM, and logistic regression models are 82%, 85%, 91%, and 89%, respectively. At sentence-level training, the macro-averaged accuracies of the same four models are 46%, 47%, 47%, and 48%, respectively. The machine learning models trained on TF-IDF features at article level therefore performed best, followed by the deep learning models with GloVe embeddings, also trained at article level. All models trained at sentence level performed far worse, with none exceeding 48%.
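This README does not define macro-averaged accuracy. One common reading, assumed here, is the unweighted mean of per-author accuracy, so that every author counts equally regardless of how many articles they wrote. The sketch below implements that reading; the function name and toy labels are hypothetical.

```python
from collections import defaultdict


def macro_averaged_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracy (assumed definition)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    # Fraction of each author's examples that were attributed correctly.
    per_class = {label: correct[label] / total[label] for label in total}
    return sum(per_class.values()) / len(per_class)


# Toy labels, invented for illustration: author "a" has 4 examples, "b" has 2.
y_true = ["a", "a", "a", "a", "b", "b"]
y_pred = ["a", "a", "a", "b", "b", "a"]

macro = macro_averaged_accuracy(y_true, y_pred)  # (3/4 + 1/2) / 2
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)  # 4/6
```

With imbalanced authors the two figures differ: the macro average gives the less prolific author the same weight as the more prolific one.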
