Objective: Build a web application that automatically extracts and identifies named entities from text using Natural Language Processing (NLP) and Machine Learning (ML) models. The system detects entities such as persons, organizations, locations, products, dates, and monetary values, and provides analytics, sentiment analysis, contextual categorization, and multi-document processing. Features: text highlighting, entity aggregation, and multi-document analysis. The backend uses spaCy and HuggingFace Transformers to perform well across news, business, academic, and social media text.
- People, organizations, countries, cities, dates, money, events, products, etc.
- Color-coded highlighting in extracted text
- Interactive sidebar with click-to-highlight feature
- Sentiment Analysis
- Contextual Categorization (e.g., grouping similar entities)
- Analytical Dashboard with charts & entity frequency
- Model Switching (spaCy small / Transformer model)
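The highlighting and entity-frequency features can be illustrated with a minimal, standard-library-only sketch. It assumes entities arrive as dictionaries with `start`, `end` and `label` keys. That shape, the colour palette, and the function names `highlight_html` and `entity_frequencies` are hypothetical and not taken from the project code.

```python
from collections import Counter
from html import escape

# Hypothetical colour palette keyed by entity label.
LABEL_COLORS = {"PERSON": "#fde68a", "ORG": "#bfdbfe", "GPE": "#bbf7d0"}
DEFAULT_COLOR = "#e5e7eb"


def highlight_html(text, entities):
    """Wrap each entity span in a colour-coded <mark> tag.

    `entities` is a list of dicts with `start`, `end` (character offsets)
    and `label`; spans are assumed not to overlap.
    """
    parts, cursor = [], 0
    for ent in sorted(entities, key=lambda e: e["start"]):
        label = ent["label"]
        color = LABEL_COLORS.get(label, DEFAULT_COLOR)
        # Escape the plain text before the entity, then the entity itself.
        parts.append(escape(text[cursor:ent["start"]]))
        span = escape(text[ent["start"]:ent["end"]])
        parts.append(f'<mark style="background:{color}" title="{label}">{span}</mark>')
        cursor = ent["end"]
    parts.append(escape(text[cursor:]))
    return "".join(parts)


def entity_frequencies(text, entities):
    """Count how often each (surface form, label) pair occurs."""
    return Counter((text[e["start"]:e["end"]], e["label"]) for e in entities)


sample = "Apple hired Tim Cook. Apple is based in Cupertino."
ents = [
    {"start": 0, "end": 5, "label": "ORG"},
    {"start": 12, "end": 20, "label": "PERSON"},
    {"start": 22, "end": 27, "label": "ORG"},
    {"start": 40, "end": 49, "label": "GPE"},
]
print(highlight_html(sample, ents))
print(entity_frequencies(sample, ents).most_common(1))
```

A frontend could render the returned markup directly, and the counter maps naturally onto a frequency chart.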
- Python 3.10+
- FastAPI
- spaCy NER / HuggingFace Transformers
- Uvicorn
- React 18
- Vite
- Tailwind CSS
- Axios
Install: Python, Node.js, npm, Git
git clone https://github.com/YOUR_USER_NAME/ner-ml-project.git
cd ner-ml-project
mkdir backend
cd backend
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt

Small Model:
python -m spacy download en_core_web_sm

Transformer Model:
python -m spacy download en_core_web_trf

uvicorn app.main:app --reload --port 8000

- Backend URL: http://localhost:8000
- Health check: http://localhost:8000/health
cd ..
mkdir frontend  # only if it does not already exist
cd frontend
npm install
npm run dev

- App URL: http://localhost:5173
- Entity types: PERSON, ORG, LOC, MISC
- 3,394 social media texts
- Noisy, real-world anomalies
- Useful for informal language
- Large, multi-genre corpus
- 18 entity categories
- Tokenization
- Feature representation
- Named Entity Recognition
- Post-processing
- JSON output
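The pipeline steps above can be sketched as a single function. With spaCy, calling the pipeline object handles tokenization, feature representation and entity recognition in one step, so the sketch concentrates on post-processing and JSON output. The function name `extract_entities`, the JSON shape, and the stand-in `fake_nlp` pipeline are assumptions made for illustration. The attribute names `text`, `label_`, `start_char` and `end_char` are the ones spaCy exposes on entity spans.

```python
import json
from types import SimpleNamespace


def extract_entities(nlp, text):
    """Run an NLP pipeline and return post-processed entities as JSON.

    `nlp` is any callable returning an object with an `ents` sequence whose
    items expose `text`, `label_`, `start_char` and `end_char`.
    """
    doc = nlp(text)  # tokenization, feature representation and NER happen here
    seen, entities = set(), []
    for ent in doc.ents:
        surface = ent.text.strip()
        key = (ent.start_char, ent.end_char, ent.label_)
        # Post-processing: drop empty spans and exact repeats.
        if not surface or key in seen:
            continue
        seen.add(key)
        entities.append({
            "text": surface,
            "label": ent.label_,
            "start": ent.start_char,
            "end": ent.end_char,
        })
    return json.dumps({"text": text, "entities": entities})


def fake_nlp(text):
    """Stand-in pipeline so the sketch runs without a model installed."""
    span = SimpleNamespace(text="Paris", label_="GPE", start_char=12, end_char=17)
    return SimpleNamespace(ents=[span, span])  # duplicate on purpose


payload = json.loads(extract_entities(fake_nlp, "She flew to Paris."))
print(payload["entities"])
```

In real use, `fake_nlp` would be replaced by a loaded pipeline such as the result of `spacy.load("en_core_web_sm")`.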
- Fast, lightweight
- Suitable for real-time UI
- BERT/RoBERTa-based
- Higher accuracy (~91% F1)
- Higher latency
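Switching between the two models could be handled by a small registry that loads each pipeline once and caches it, which matters because the Transformer model is slow to load. This is a sketch under assumptions: the keys `small` and `transformer`, the helper `get_model`, and the injectable `loader` parameter are hypothetical. Only `spacy.load` and the two package names come from the setup steps.

```python
# Hypothetical model keys mapped to the spaCy packages from the setup steps.
MODEL_PACKAGES = {
    "small": "en_core_web_sm",
    "transformer": "en_core_web_trf",
}
_loaded = {}  # cache of already-loaded pipelines


def _default_loader(package):
    import spacy  # imported lazily so this module loads without spaCy installed

    return spacy.load(package)


def get_model(name="small", loader=_default_loader):
    """Return the pipeline for `name`, loading it once and caching it."""
    if name not in MODEL_PACKAGES:
        raise ValueError(f"unknown model {name!r}; choose from {sorted(MODEL_PACKAGES)}")
    if name not in _loaded:
        _loaded[name] = loader(MODEL_PACKAGES[name])
    return _loaded[name]
```

A request handler could then call `get_model(requested_name)` and fall back to `small` when low latency matters more than accuracy.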
import spacy

# Load a custom-trained NER pipeline from a local directory
nlp = spacy.load("./custom_ner_model")

| Model | Precision | Recall | F1 Score | Latency |
|---|---|---|---|---|
| spaCy small | 89% | 90% | 89.5% | 120 ms |
| Transformer | 92% | 91% | 91.5% | 430 ms |
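The F1 column is consistent with the usual definition of F1 as the harmonic mean of precision and recall, which a few lines of Python can confirm. The helper name `f1_score` is chosen here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values taken from the table above.
for name, p, r in [("spaCy small", 0.89, 0.90), ("Transformer", 0.92, 0.91)]:
    print(f"{name}: F1 = {f1_score(p, r):.1%}")
# spaCy small: F1 = 89.5%
# Transformer: F1 = 91.5%
```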
- ~90% F1 Score
- <500 ms average latency
- Supports documents up to 10k words
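One way to respect the 10k-word limit is to split longer inputs into chunks before running the model. The sketch below assumes whitespace-delimited words and a hypothetical helper named `split_into_chunks`. It does not preserve original character offsets, so a real implementation would need to track them for highlighting.

```python
MAX_WORDS = 10_000  # document size limit stated above


def split_into_chunks(text, max_words=MAX_WORDS):
    """Split `text` into pieces of at most `max_words` whitespace-separated words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


chunks = split_into_chunks("word " * 25_000)
print(len(chunks), [len(c.split()) for c in chunks])
```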