Program EDA

The document discusses Python's dataclass feature, introduced in version 3.7, which simplifies the creation of classes that primarily hold data by automatically generating methods like __init__ and __repr__. It also details the creation of a 3-dimensional data cube using a pivot table in pandas to analyze product sales over time across different regions, including steps for data structure definition, data generation, and analysis techniques. The provided code exemplifies how to implement these concepts using a sample dataset of products.

Uploaded by

aswinip22062006


Points to ponder:

- In Python, a dataclass is a class designed primarily to hold data
values. Introduced in Python 3.7, the dataclasses module provides a
decorator, @dataclass, that automatically generates common
boilerplate methods for classes, making them more concise and easier
to manage when they serve as data containers.
- Automatic Method Generation:
The @dataclass decorator automatically generates special methods such as
__init__, __repr__, and __eq__ (the "dunder", or double-underscore, methods).
This removes the need to write these methods by hand, reducing boilerplate
code and the potential for errors.
- Default Values:
You can assign default values to fields within a dataclass, which simplifies
object creation when certain fields have common initial states. Fields with
defaults must come after fields without defaults.
- Immutability (Frozen Dataclasses):
A dataclass can be made immutable by setting frozen=True in the decorator.
Assigning to a field after instantiation then raises
dataclasses.FrozenInstanceError, which is useful for creating read-only
data structures.
- Example:
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float
    z: float = 0.0  # Field with a default value

# Creating instances
p1 = Point(1.0, 2.0)
p2 = Point(3.0, 4.0, 5.0)

# Automatic __repr__
print(p1)  # Output: Point(x=1.0, y=2.0, z=0.0)
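
The equality and frozen behaviour described in the points above can be sketched as follows. This is a minimal illustration, not part of the original handout; the class name FrozenPoint is hypothetical and simply mirrors Point.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class FrozenPoint:  # hypothetical name, mirrors Point from the example
    x: float
    y: float
    z: float = 0.0  # field with a default value

a = FrozenPoint(1.0, 2.0)
b = FrozenPoint(1.0, 2.0, 0.0)

# The generated __eq__ compares field values, not object identity
same = (a == b)

# Assigning to a field of a frozen instance raises FrozenInstanceError
try:
    a.x = 9.0
    mutated = True
except FrozenInstanceError:
    mutated = False

print(same, mutated)  # True False
```

Because frozen=True is combined with the default eq=True, instances are also hashable, so they can be used as dictionary keys or set members.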
Pivot table:
A pivot table is a data summarization tool used to organize and analyze large amounts
of data in a spreadsheet or database. It lets you quickly summarize, sort,
reorganize, group, count, total, or average data, giving different perspectives on
the same data. In pandas, a pivot table behaves like a data cube and supports
analytical operations such as slicing and dicing.

Syntax:

DataFrame.pivot_table(values=None, index=None, columns=None,
                      aggfunc='mean', fill_value=None, margins=False, dropna=True)

Parameters:

● values: Column or columns to aggregate.

● index: Columns to use as the new row index.

● columns: Columns to use as the new column headers.

● aggfunc: Aggregation function such as mean, sum, or count. The default is mean.

● fill_value: Value used to replace missing entries in the resulting table.

● margins: Whether to add row and column totals. The default is False.

● dropna: Whether to drop columns whose entries are all NaN. The default is True.
This table is a common way to track product sales over time across different regions, helping
businesses analyze sales performance by product, country, and date.
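
As a rough illustration of the parameters listed above, the sketch below builds a small pivot table with aggfunc, fill_value, and margins set explicitly. The sales DataFrame is made-up sample data, not taken from the lab dataset; the (Book, UK) combination is deliberately absent so that fill_value has something to fill.

```python
import pandas as pd

# Hypothetical sales records; there is no row for Book in the UK
sales = pd.DataFrame({
    "Product": ["Laptop", "Laptop", "Book", "Laptop"],
    "Country": ["USA", "UK", "USA", "USA"],
    "Price":   [1000, 1100, 15, 1200],
})

table = sales.pivot_table(
    values="Price",     # column to aggregate
    index="Product",    # row labels
    columns="Country",  # column labels
    aggfunc="sum",      # override the default aggregation (mean)
    fill_value=0,       # missing (Product, Country) cells become 0
    margins=True,       # add an "All" row and column holding totals
)
print(table)
```

Here the two Laptop rows for the USA are summed into one cell, the empty (Book, UK) cell is filled with 0, and the "All" row and column carry the totals.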

Aim
To implement a data cube for a data warehouse on 3-dimensional data (product, date,
country) describing product sales over time across different regions, using Python.
Algorithm:

Dimensions:

1. Product (nested under Category)
2. Date
3. Country

Measure: Price

1. Import libraries and define data structures (use a dataclass to model each product record).
2. Define the Product entity, containing name (str), category (str) and price (float).
3. Instantiate dimension members:
- Create a list of Product objects (name, category, price).
- Create lists of all Date values and Country values.
4. Generate the fact table: one record for every (Product, Date, Country) combination.
5. Load the facts into a pandas DataFrame.
6. Create the 3‑D data cube (pivot table).
7. Explore and analyze (slice and dice, roll-up, extract).
8. Visualize or integrate downstream:
- Plot with matplotlib or feed to a BI tool.
- Store in a data‑warehouse table or OLAP engine for large‑scale analytics.
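
The "store in a data-warehouse table" option in step 8 could look roughly like the sketch below. It is only an assumption about how that step might be done: an in-memory SQLite database stands in for a real warehouse, and the table name sales_fact and the miniature facts DataFrame are hypothetical.

```python
import sqlite3
import pandas as pd

# Hypothetical miniature fact table with the same columns as the lab dataset
facts = pd.DataFrame({
    "Category": ["Electronics", "Electronics", "Clothing"],
    "Product":  ["Laptop", "Headphones", "Jeans"],
    "Date":     ["2023-05-01", "2023-05-01", "2023-05-01"],
    "Country":  ["USA", "USA", "UK"],
    "Price":    [1000.0, 100.0, 50.0],
})

# In-memory SQLite database, used here as a stand-in for a warehouse
conn = sqlite3.connect(":memory:")
facts.to_sql("sales_fact", conn, index=False)

# Roll-up by category, expressed in SQL instead of pandas
rollup = pd.read_sql_query(
    "SELECT Category, SUM(Price) AS TotalPrice "
    "FROM sales_fact GROUP BY Category ORDER BY Category",
    conn,
)
conn.close()
print(rollup)
```

Storing the long-format fact table rather than the pivoted cube keeps one row per fact, which is the shape SQL aggregation expects.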

Result: The cube is a compact 3‑dimensional structure that supports analytical queries
(slice, dice, roll‑up, drill‑down) using familiar pandas operations.
Code:
import pandas as pd
from dataclasses import dataclass

# Define the Product class
@dataclass
class Product:
    name: str
    category: str
    price: float

# Sample product data
products = [
    Product("Laptop", "Electronics", 1000),
    Product("T-shirt", "Clothing", 20),
    Product("Book", "Books", 15),
    Product("Headphones", "Electronics", 100),
    Product("Jeans", "Clothing", 50),
    Product("Smartphone", "Electronics", 800),
    Product("Sunglasses", "Accessories", 30),
    Product("Watch", "Accessories", 50),
    Product("Shoes", "Footwear", 80),
]

# Define dates and countries

dates = ["2023-05-01", "2023-05-02", "2023-05-03"]
countries = ["USA", "UK", "Germany"]

# Generate the full dataset: one row per (product, date, country) combination
data = []
for product in products:
    for date in dates:
        for country in countries:
            data.append({
                "Category": product.category,
                "Product": product.name,
                "Date": date,
                "Country": country,
                "Price": product.price
            })

# Convert to DataFrame
df = pd.DataFrame(data)

# Create a pivot table (data cube)
data_cube = pd.pivot_table(
    df,
    values="Price",
    index=["Category", "Product"],
    columns=["Date", "Country"],
    aggfunc="first"  # each (product, date, country) cell holds exactly one price
)

# Display the data cube
print("3D Data Cube (Product x Date x Country):\n")
print(data_cube)

# Slice operation
# Selecting a single value for one dimension, reducing the cube's dimensionality.
# Show Laptop prices across countries and dates
print(data_cube.loc[("Electronics", "Laptop")])

# Roll-up operation
# Definition: aggregating data up a hierarchy or grouping dimension.
# Roll-up by Product Category → sum of prices for each category.
print(data_cube.groupby(level=0).sum())

# Export (the MyDrive directory must already exist; a forward slash avoids
# the invalid "\c" escape sequence and works on every platform)
data_cube.to_csv("MyDrive/cube.csv")
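
The code above shows slice and roll-up on the row index. The remaining operations named in the algorithm (dice, and slicing or rolling up along the Date and Country dimensions) might be sketched as below. This is an illustrative sketch, not the author's code: it rebuilds a smaller cube from a subset of the products so that it runs on its own, and the variable names are hypothetical.

```python
import pandas as pd
from itertools import product

# Small hypothetical fact table (a subset of the lab's products)
prices = {("Electronics", "Laptop"): 1000,
          ("Electronics", "Headphones"): 100,
          ("Clothing", "Jeans"): 50}
dates = ["2023-05-01", "2023-05-02"]
countries = ["USA", "UK", "Germany"]

rows = [
    {"Category": cat, "Product": name, "Date": d, "Country": c, "Price": price}
    for ((cat, name), price), d, c in product(prices.items(), dates, countries)
]
cube = pd.DataFrame(rows).pivot_table(
    values="Price", index=["Category", "Product"],
    columns=["Date", "Country"], aggfunc="first",
)

# Slice on the Date dimension: fix Date to one value, keep Product x Country
slice_date = cube.xs("2023-05-01", axis=1, level="Date")

# Dice: restrict several dimensions to subsets of their members
idx = pd.IndexSlice
dice = cube.loc[idx["Electronics", :], idx[:, ["UK", "USA"]]]

# Roll-up along the column hierarchy: sum over countries, keeping Date
by_date = cube.T.groupby(level="Date").sum().T

print(slice_date)
print(dice)
print(by_date)
```

The roll-up transposes the cube before grouping because grouping directly along the column axis is deprecated in recent pandas releases. Drill-down is simply the reverse direction: moving from the category-level totals back to the product-level rows of the cube.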
