rxiddhi/Zipf-s-Law-project

📘 Project Summary: Zipf’s Law – Finding Hidden Patterns in Data

Team Name: Roomies❤️🌻
Team Members: Tanima Samanta, Koyna Arya, Aparajita K Singh, Riddhi Khera

In this project, we explored Zipf’s Law, a statistical regularity commonly found in natural language datasets. The law states that in a large collection of text, the frequency of a word is approximately inversely proportional to its rank in the frequency table. Our goal was to test this principle on real-world data and visualize the resulting patterns.
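As a short worked form of the statement above, the law can be written as a power law in the rank. The symbols here are our own notation, not taken from the project notebook: $f(r)$ is the frequency of the word at rank $r$, $C$ is a corpus-dependent constant, and $s$ is an exponent that is close to 1 in the classic formulation.

$$
f(r) \approx \frac{C}{r^{s}}, \qquad s \approx 1
\quad\Longrightarrow\quad
\log f(r) \approx \log C - s \,\log r
$$

Taking logarithms turns the relationship into a straight line with slope $-s$, which is why a log-log plot is the usual way to inspect it. As a purely hypothetical illustration with $s = 1$: if the top-ranked word occurred 300 times, the second would be expected around 150 times and the third around 100 times.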

🔍 Dataset Used: We analyzed the lyrics of songs by the band Coldplay. The dataset was compiled as a representative sample of Coldplay’s discography, giving us a varied text corpus for word frequency analysis.

🛠️ What We Did:

  • Preprocessed the text data by removing stopwords and punctuation and performing tokenization.
  • Calculated word frequencies and ranked words by their occurrence.
  • Visualized Zipfian patterns using rank-frequency and log-log plots.
  • Verified that the ranked word frequencies follow a Zipfian distribution.
  • Collaboratively coded in Python using Google Colab and visualized results using Matplotlib.
  • Documented and explained the study's theoretical foundation and practical findings.
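
The preprocessing, counting, and plotting steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the project notebook: the stopword list is a tiny hypothetical one, the tokenizer is a simple regular expression, and the function names `tokenize`, `rank_frequencies`, and `plot_rank_frequency` are invented for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch;
# the list actually used in the project is not given in this summary).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}


def tokenize(text):
    """Lowercase the text and return word tokens, dropping punctuation."""
    # Matches runs of letters, optionally with one internal apostrophe ("don't").
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def rank_frequencies(text, stopwords=STOPWORDS):
    """Return (rank, word, frequency) tuples, most frequent word first."""
    tokens = [t for t in tokenize(text) if t not in stopwords]
    counts = Counter(tokens)
    # most_common() sorts by descending count; ties keep first-seen order.
    return [
        (rank, word, freq)
        for rank, (word, freq) in enumerate(counts.most_common(), start=1)
    ]


def plot_rank_frequency(ranked):
    """Draw a log-log rank-frequency plot (requires Matplotlib)."""
    # Imported here so the counting code above works without Matplotlib.
    import matplotlib.pyplot as plt

    ranks = [rank for rank, _, _ in ranked]
    freqs = [freq for _, _, freq in ranked]
    plt.loglog(ranks, freqs, marker="o", linestyle="none")
    plt.xlabel("Rank")
    plt.ylabel("Frequency")
    plt.title("Rank-frequency plot (log-log)")
    plt.show()


# Placeholder text standing in for the lyrics corpus.
sample = "The sky, the stars and the sky: a sky of stars and light."
ranked = rank_frequencies(sample)
```

Calling `plot_rank_frequency(ranked)` on a full corpus would produce the log-log view; on a Zipfian corpus the points should fall close to a straight, downward-sloping line.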

🎯 Each team member contributed equally, focusing on research, coding, visualization, documentation, and presentation.

The final results confirmed that Coldplay's lyrics follow Zipf’s Law, demonstrating that even in artistic or musical text, natural language follows statistically predictable patterns.
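
The summary does not say how the fit was checked, so the following is only one common approach, offered as an assumption: fit a least-squares line to the points (log rank, log frequency) and read off the slope. A slope near -1 is consistent with the classic form of Zipf’s Law. The function name `zipf_slope` is invented for this sketch, and the input below is synthetic, not the project's data.

```python
import math


def zipf_slope(frequencies):
    """Slope of the least-squares line through (log rank, log frequency).

    `frequencies` must be positive counts sorted in descending order,
    so that the first entry has rank 1.
    """
    if len(frequencies) < 2:
        raise ValueError("need at least two frequencies to fit a line")
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic, perfectly Zipfian frequencies f(r) = 1000 / r, used only to
# show that the fit recovers a slope of -1 in the ideal case.
ideal = [1000 / rank for rank in range(1, 51)]
slope = zipf_slope(ideal)
```

A straight-line fit on log-log axes is a rough diagnostic rather than a rigorous test, since real word counts tend to deviate from the line at the lowest ranks and in the long tail of rare words.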
