Creating your own dataset is a fundamental Data Science skill. In this project we scrape multiple pages of the Internet Movie Database (IMDb) website, in a single script, to fetch metadata for IMDb's top 100 movies. We use the Beautiful Soup 4 and requests packages.
Incidentally, the requests library is also integral if you want to build a pure Python client for a web Application Programming Interface (API).
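A minimal sketch of the multi-page fetch, assuming IMDb's older advanced-search URL (`/search/title/` with `groups=top_100`), 50 results per page, and a `start` offset for paging. These parameters, the request headers, and the helper names `page_urls` and `fetch_pages` are assumptions for illustration and should be checked against the live site before use.

```python
import time
from urllib.parse import urlencode

import requests
from bs4 import BeautifulSoup

# ASSUMPTION: IMDb's advanced title search filtered to the "top_100" group,
# 50 results per page, paged with a 1-based "start" offset. IMDb changes its
# URLs and layout over time, so verify these values before relying on them.
BASE_URL = "https://www.imdb.com/search/title/"
PAGE_SIZE = 50
TOTAL = 100


def page_urls(total=TOTAL, page_size=PAGE_SIZE):
    """Build one URL per results page (start offsets 1, 51, ...)."""
    urls = []
    for start in range(1, total + 1, page_size):
        query = {"groups": "top_100", "sort": "user_rating,desc", "start": start}
        urls.append(f"{BASE_URL}?{urlencode(query)}")
    return urls


def fetch_pages(urls, delay=1.0):
    """Download each page with requests and parse it with Beautiful Soup."""
    # ASSUMPTION: a browser-like User-Agent and an English Accept-Language
    # header make the site return the standard English listing.
    headers = {"User-Agent": "Mozilla/5.0", "Accept-Language": "en-US,en;q=0.8"}
    soups = []
    with requests.Session() as session:  # reuse one connection for all pages
        for url in urls:
            response = session.get(url, headers=headers, timeout=10)
            response.raise_for_status()  # stop on HTTP errors such as 403 or 404
            soups.append(BeautifulSoup(response.text, "html.parser"))
            time.sleep(delay)  # pause between requests to avoid hammering the site
    return soups
```

Calling `fetch_pages(page_urls())` returns one parsed `BeautifulSoup` document per results page, ready for field extraction. Building the URLs separately from downloading them keeps the pagination logic easy to inspect without touching the network.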
The target here is metadata, which is the information that describes and explains data. It provides context such as the source, type, owner, and relationships to other datasets, so it helps you understand the relevance of a particular dataset and guides you on how to use it.
Most of my project models, which are themselves classes, contain inner Meta classes that further refine how the data is represented, down to the data fields that are exposed on the client side.
This is the targeted metadata:
- Movie Name
- Release Year
- Watch Time
- IMDb Rating
- Metascore
- Votes
- Gross Collection
- Description