This project analyzes the Looker E-Commerce Dataset, a simulated retail e-commerce environment sourced from Kaggle. The dataset comprises interconnected tables that mirror real-world business functions such as orders, inventory, customers, and web events. The objective is to uncover trends in customer behavior, regional sales performance, and operational efficiency. The analysis, carried out in Python and Tableau, yields actionable insights for marketing, logistics, and customer experience optimization.
Key Findings:
- Top-performing countries: China and the USA generated the most revenue.
- Operational efficiency: Average processing and delivery times were consistent across distribution centers.
- Returns: Roughly 10% of all orders were returned.
- User behavior: Most customers preferred Chrome as a browser.
- Traffic source insights: Successful purchases were driven mainly by SEO and email campaigns.
- Revenue trends: Time series analysis showed clear month-to-month revenue fluctuations, which hint at seasonality.
Note: The dataset is artificially generated and does not represent actual customer behavior.
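The return-rate and revenue-by-country figures above can be reproduced with a few Pandas operations. The following is a minimal sketch on toy data, not the project's actual script: the column names (`order_id`, `user_id`, `status`, `country`, `sale_price`) and the `"Returned"` status label are assumptions about the dataset's schema, and the helper functions are hypothetical.

```python
import pandas as pd

# Toy stand-ins for the Orders, Users, and Order Items tables.
# Column names and status labels are assumptions, not confirmed schema.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "user_id": [10, 11, 10, 12, 13],
    "status": ["Complete", "Returned", "Shipped", "Complete", "Cancelled"],
})
users = pd.DataFrame({
    "user_id": [10, 11, 12, 13],
    "country": ["China", "United States", "China", "Brazil"],
})
order_items = pd.DataFrame({
    "order_id": [1, 1, 2, 3, 4, 5],
    "sale_price": [20.0, 5.0, 40.0, 15.0, 30.0, 10.0],
})


def return_rate(orders: pd.DataFrame) -> float:
    """Share of orders whose status is 'Returned'."""
    return float((orders["status"] == "Returned").mean())


def revenue_by_country(orders, users, order_items) -> pd.Series:
    """Sum item sale prices per customer country, largest first.

    This sketch counts every order regardless of status; a real
    analysis may want to exclude cancelled or returned orders.
    """
    merged = order_items.merge(orders, on="order_id").merge(users, on="user_id")
    return merged.groupby("country")["sale_price"].sum().sort_values(ascending=False)


rate = return_rate(orders)
revenue = revenue_by_country(orders, users, order_items)
print(f"Return rate: {rate:.0%}")
print(revenue)
```

On the real tables, the same two functions would be applied after loading each CSV with `pd.read_csv`.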
The analysis used the following tables from the open-source dataset:
| Table | Rows |
|---|---|
| Orders | 125,226 |
| Products | 29,120 |
| Users | 100,000 |
| Inventory Items | 490,705 |
| Order Items | 181,759 |
| Events | 2,431,963 |
| Distribution Centers | 10 |
Source: Looker E-commerce Dataset on Kaggle
- Focus marketing on returning customers with high order values.
- Launch loyalty incentives, such as cashback or credits for future purchases.
- Create personalized recommendations based on browsing and purchase history.
- Target top-performing regions (e.g., USA, China) with tailored ad campaigns.
- Increase investment in SEO and email-based promotions.
- Develop a predictive model for product success.
- Expand promotion of high-performing products.
- Customize user experience through AI-driven personalization.
- Begin collecting user feedback and reviews to enhance offerings.
01 Project Management
└── Looker Ecommerce - Project Brief
03 Scripts
└── Python scripts for analysis
04 Analysis/Visualizations
└── Python-generated charts and Tableau dashboards
05 Sent To Client
└── Summary reports and cleaned datasets
- Python: Pandas, NumPy, Matplotlib, Seaborn, Plotly, Scikit-learn, Statsmodels
- Tableau
- Jupyter Notebooks
- Data Preparation: Cleaned and merged datasets using Python (Pandas, NumPy).
- Exploratory Analysis: Univariate and multivariate statistics to understand user behavior and operations.
- Machine Learning:
  - Linear Regression for revenue prediction.
  - K-Means Clustering to segment user behavior.
- Time-Series Analysis: Conducted Dickey-Fuller tests and trend visualizations.
- Data Visualization: Created visual reports using Matplotlib, Seaborn, Plotly, and Tableau.
Disclaimer: This is a synthetic dataset and is not reflective of real-world customer behavior.