Offline Yelp ranking stack with route-aware recall, structured XGBoost rerank, and reward-model rescue reranking.
Updated Apr 10, 2026 · Python
A polyglot persistence architecture (MongoDB & Neo4j) and machine learning pipeline to process and analyze the Yelp Open Dataset, featuring a resource-aware ETL pipeline built with Polars.
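Front-end repository for a Yelp (the US counterpart of Dianping) review data analysis and recommendation project: a WebApp that integrates big-data analytics and visualization with big-data application development.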
Hybrid learning-to-rank system processing 2.4M Yelp reviews. It features a custom NLP pipeline (SBERT + VADER), a neural ranking architecture, and MMR diversity re-ranking to address the cold-start problem.
A large-scale data analysis project built on Apache Hadoop and Apache Spark, analyzing 7M+ Yelp reviews, 150K businesses, and 2M users. Covers business intelligence, user behavior, rating patterns, and review trends using PySpark and Hive on a multi-node cluster. Visualized through Apache Zeppelin notebooks.
Restaurant Performance & Customer Sentiment Analysis with Tableau Dashboards Using the Yelp Dataset
Descriptive analysis and visualization of businesses, users, and reviews in the Yelp dataset.
DineSmart: Turning Unstructured Yelp Reviews into Strategic Insights. 📈 A comprehensive business analytics project that integrates sentiment analysis and machine learning to help customers discover dining experiences and help entrepreneurs identify market gaps. Built with Python, Gensim, and scikit-learn.
Data mining and analysis of Yelp reviews to discover cuisines, popular dishes, restaurant recommendations, and hygiene predictions.
In this NLP project, we classify Yelp reviews as 1-star or 5-star using simple methods. The project uses the Yelp Review Data Set from Kaggle, which includes a "stars" column with each review's rating and counts of user votes marking the review as "cool," "useful," or "funny."
Submission for the FairwAI Hospitality Intern Challenge. This project analyzes bias signals in Yelp hospitality reviews using open-source data, Python, and fairness-focused keyword detection.
End-to-end analysis of Yelp reviews combining SQL extraction, NLP sentiment modeling, and executive reporting.
[Archived] Classical NLP pipeline (2019-2020) predicting Yelp review quality using TF-IDF, FastText, LDA, and traditional ML. Pre-transformer era techniques preserved as a learning resource.
Group analytics project for a predictive analytics course. It uses the Yelp Open Dataset to predict restaurant success.
Interactive data visualization tool analyzing the Yelp Open Dataset using Python and Bokeh. Features include geospatial hexbin mapping, contour plots for operational strategy, and MVC-based dashboard architecture.
This project analyzes the Yelp dataset for the state of Arizona to extract insights about restaurant businesses and user behavior. Using Apache Spark and PySpark for distributed data processing, the project demonstrates how big data tools can be used to uncover patterns in customer reviews, business performance, and user engagement.
End-to-end Data Analytics Yelp Business Review Project
Config-driven ELT pipeline (Airflow + MinIO + PostgreSQL) for Yelp × Weather: JSON/CSV→Parquet, ODS/DWH, rejected lanes, and L1/DWH data-quality dashboards.
Fork & Friends is a data analysis and recommendation platform leveraging the Yelp dataset. It combines big data analytics with DeepSeek AI to provide intelligent friend and business recommendations.
Sentiment analysis of Yelp reviews using Apache Spark and machine learning models.