This project fine-tunes a pre-trained DistilBERT model to perform sentiment analysis on movie reviews from the IMDB dataset. The IMDB dataset contains movie reviews labeled as either negative or positive. By training on this data, the model learns to accurately classify new reviews based on their sentiment.
- main_llm.py
  This script loads a subset of the IMDB dataset, tokenizes the text, and fine-tunes DistilBERT for binary sentiment classification. After training, the model and tokenizer are saved for later use.
- inference.py
  This script loads the saved model and tokenizer and performs sentiment inference on new text inputs.
- README.md
  This file provides an overview of the project, installation instructions, usage details, and other relevant information.
- Python Version:
  Use a compatible Python version (ideally between 3.7 and 3.9) for optimal compatibility.
- Environment Setup:
  Create and activate a virtual environment:
  python -m venv .venv
  source .venv/bin/activate
- Required Packages:
  Install the necessary libraries such as Hugging Face Transformers, Datasets, Evaluate, and PyTorch. Additional packages may be required for extended functionality:
  pip3 install -r requirements.txt
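The repository's requirements.txt is not reproduced here; a minimal file covering the libraries mentioned above might look like the following (package names only, versions deliberately left unpinned as an assumption):

```
transformers
datasets
evaluate
torch
```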
- Training:
  Run the main_llm.py script to fine-tune DistilBERT on a subset of the IMDB dataset (a minimal training sketch follows the Inference step below). This script will:
  - Load and shuffle the IMDB dataset.
  - Tokenize the reviews.
  - Fine-tune the model for binary sentiment classification.
  - Save the fine-tuned model and tokenizer to the ./saved_model directory.
  python3 main_llm.py
- Inference:
  After training, run the inference script to load the saved model and perform sentiment analysis on new movie reviews. The model outputs a sentiment prediction (either positive or negative) based on the input text (see the inference sketch below).
  python3 inference.py
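The training sketch below follows the steps listed under Training. It is an illustrative, minimal version rather than the exact contents of main_llm.py; the subset sizes, hyperparameters, and the ./results directory are assumptions.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Load and shuffle the IMDB dataset, keeping a small subset for a quick demo run.
dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

train_ds = dataset["train"].shuffle(seed=42).select(range(2000)).map(tokenize, batched=True)
eval_ds = dataset["test"].shuffle(seed=42).select(range(500)).map(tokenize, batched=True)

# Two labels: 0 = negative, 1 = positive.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

args = TrainingArguments(output_dir="./results", num_train_epochs=2,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds).train()

# Save the fine-tuned model and tokenizer for inference.py.
model.save_pretrained("./saved_model")
tokenizer.save_pretrained("./saved_model")
```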
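A corresponding inference sketch, assuming the model and tokenizer were saved to ./saved_model as above (the example review is hypothetical):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned model and tokenizer produced by the training script.
tokenizer = AutoTokenizer.from_pretrained("./saved_model")
model = AutoModelForSequenceClassification.from_pretrained("./saved_model")
model.eval()

review = "A beautifully shot film, but the script falls completely flat."
inputs = tokenizer(review, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Index 1 corresponds to positive sentiment, index 0 to negative.
print("positive" if logits.argmax(dim=-1).item() == 1 else "negative")
```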
- Supervised Learning:
  The project uses supervised training by providing both the movie review texts (features) and their corresponding sentiment labels (targets).
- IMDB Dataset:
  The dataset consists of 50,000 movie reviews (25,000 for training and 25,000 for testing), though a smaller subset is used in this example for demonstration purposes. An example from the dataset, where the label 0 denotes negative sentiment (see the snippet after this list for how to inspect one):
  { "text": "I rented I AM CURIOUS-YELLOW...But really, this film doesn't have much of a plot.", "label": 0 }
- Binary Sentiment Classification:
  The model is configured to classify reviews into two categories (negative and positive) by setting the classification head to output two logits (see the configuration sketch below).
- Fine-Tuning Process:
  The pre-trained DistilBERT model is adapted to recognize sentiment patterns in text by training on the labeled IMDB reviews.
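To inspect an example like the one shown above, a short snippet using the Hugging Face Datasets library (the shuffle seed is an arbitrary assumption):

```python
from datasets import load_dataset

imdb = load_dataset("imdb")  # 25,000 training and 25,000 test reviews
sample = imdb["train"].shuffle(seed=42)[0]
print(sample["text"][:120], sample["label"])  # label 0 = negative, 1 = positive
```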
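The two-logit classification head mentioned above comes down to a single argument when the model is created; a minimal sketch, assuming the distilbert-base-uncased checkpoint:

```python
from transformers import AutoModelForSequenceClassification

# num_labels=2 gives the classification head two output logits:
# index 0 for negative and index 1 for positive sentiment.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)
```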
Contributions are welcome! Feel free to explore the code, make improvements, or suggest new features. If you wish to contribute, please follow the repository’s contribution guidelines and open an issue or submit a pull request.
This project is open source; please refer to the repository for any applicable licensing details.
For any questions or issues, please open an issue in the repository or contact the project maintainer.