Data Analytics 02: Pivoting of Data. You Might Be Familiar With The Concept of Pivoting From BI Tools or Excel: Rotate

This document discusses pivoting and renaming aggregate data. It shows how to pivot a Titanic passenger data set to aggregate counts by gender and passenger class. Regular expressions are then used to automatically rename the pivoted columns in a more readable way. Several tasks are provided to further experiment with pivoting and renaming the data.

Uploaded by

Jhon Rey Balbastro

DATA ANALYTICS 02

PIVOTING AND RENAMING

AGGREGATE DATA AND PIVOT.
In this lecture, you will learn about another common data blending technique: the pivoting of
data. You might be familiar with the concept of pivoting from BI tools or Excel. It rotates the data
from a long table format (one attribute whose values are spread over many rows) into a wide table
format (those values turned into many attributes, with a single row per group). This transformation
is especially useful for aggregating information along two or more dimensions as a step in
preparing the data for machine learning. Most machine learning models expect the data in a wide
table format, so you will encounter this preprocessing step frequently before you start with the
actual modeling.
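As a rough illustration of this rotation, the minimal Python sketch below (standard library only) turns an already aggregated long table of (Sex, Passenger Class, count) rows into one wide row per gender. The counts and the helper name long_to_wide are hypothetical placeholders. They are not real Titanic figures, and long_to_wide is not a RapidMiner function.

```python
# Long format: one row per (Sex, Passenger Class) combination.
# The counts are hypothetical placeholders, not real Titanic data.
long_rows = [
    ("female", "First", 10),
    ("female", "Second", 7),
    ("female", "Third", 12),
    ("male", "First", 11),
    ("male", "Second", 9),
    ("male", "Third", 20),
]


def long_to_wide(rows):
    """Rotate (row_key, column_key, value) triples into one dict per row_key."""
    wide = {}
    for row_key, column_key, value in rows:
        # Each distinct column_key becomes its own attribute (column) in the wide row.
        wide.setdefault(row_key, {})[column_key] = value
    return wide


wide = long_to_wide(long_rows)
for sex, columns in wide.items():
    print(sex, columns)
```

Six long rows collapse into two wide rows, one per gender, each with three class columns.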
PIVOT THE DATA.
Let’s create a table that shows how many passengers were in each class, broken down by
gender.

1. Drag the Titanic data into the process.
2. Add the operator Pivot and connect it.
3. In its parameters, add Sex to the group by attributes.
4. Select Passenger Class as the column grouping attribute.
5. Add Passenger Class with the function count as a new entry to the aggregation attributes.

NOTE: The resulting data table has four columns and two rows. Each row represents one of the
values of the Sex column (the group by attribute). The three different values of the column
grouping attribute (Passenger Class) become three new columns, next to the Sex column. The values
in the table are the counts for each combination of a row group (the gender) and a column group
(the passenger class). For example, the table shows 144 female passengers in first class.
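The Pivot configuration can be approximated outside RapidMiner with a short Python sketch: group by one field, turn the values of a second field into columns, and count each combination. The passenger sample and the helper pivot_count are hypothetical. The column names imitate the count(Passenger Class)_First pattern implied by the regular expression explained in the renaming section, and RapidMiner's exact output (for example, how it fills combinations that never occur) may differ.

```python
from collections import Counter

# Hypothetical sample of passenger records: (Sex, Passenger Class).
passengers = [
    ("female", "First"), ("female", "First"), ("female", "Third"),
    ("male", "First"), ("male", "Second"), ("male", "Third"), ("male", "Third"),
]


def pivot_count(records, row_name="Sex", agg_name="Passenger Class"):
    """Group by the first field, turn the second field's values into columns,
    and count each (row group, column group) combination."""
    counts = Counter(records)  # number of records per (row group, column group) pair
    row_groups = sorted({r for r, _ in records})
    col_groups = sorted({c for _, c in records})
    # Column names imitate the assumed pattern count(<attribute>)_<value>.
    header = [row_name] + [f"count({agg_name})_{c}" for c in col_groups]
    # Counter returns 0 for combinations that never occur; this sketch keeps that 0.
    table = [[r] + [counts[(r, c)] for c in col_groups] for r in row_groups]
    return header, table


header, table = pivot_count(passengers)
print(header)
for row in table:
    print(row)
```

As in the RapidMiner result, the output has one row per gender and one column per passenger class, plus the Sex column.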
RENAMING ATTRIBUTES WITH REGULAR EXPRESSIONS.
The names of the new columns reflect how they were created (for example count(Passenger
Class)_First), but they are not always easy to read. You could use the operator Rename to rename
the three attributes manually to something nicer, like "Passenger Class First". Rename is the way
to go if you only have a few attributes to rename, but we will take a more advanced approach that
allows you to rename hundreds of attributes at a time.

1. Search for the operator Rename by Replacing, add it, and connect it to Pivot.
2. Connect its output to the result port on the right.
3. Copy count\((.*)\)_(.*) into the replace what parameter field. Make sure that you get all
the parentheses and backslashes right!
4. Copy $1 $2 into the replace by parameter.
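If you want to try the expression before running the process, Python's re module accepts the same pattern. Keep one difference in mind: Python writes group references in the replacement as \1 and \2, whereas the replace by parameter in RapidMiner uses $1 and $2. The column names below are assumed to follow the count(Passenger Class)_First pattern.

```python
import re

# Column names assumed to come out of the Pivot step.
columns = [
    "Sex",
    "count(Passenger Class)_First",
    "count(Passenger Class)_Second",
    "count(Passenger Class)_Third",
]

# Same pattern as the "replace what" parameter; literal brackets are escaped.
pattern = re.compile(r"count\((.*)\)_(.*)")

# Python refers to groups as \1 and \2 (the RapidMiner parameter uses $1 $2).
# Names that do not match, such as "Sex", are left unchanged.
renamed = [pattern.sub(r"\1 \2", name) for name in columns]
print(renamed)
```

Because one pattern handles every matching name, the same two parameters work whether Pivot creates three columns or three hundred.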

NOTE: You might already be familiar with regular expressions (this is the name for the strange-looking
parameters we have used for the renaming). They are a powerful tool and can be found in many
different places in RapidMiner. The expression you have used for replace what looks for something
between count( and )_ and then for something else after the underscore. Those two parts are
marked by the unescaped round brackets: each pair of round brackets defines a so-called capturing
group, which you can refer to in the definition of the replacement. Since round brackets have this
special meaning, the literal brackets in the column name itself must be escaped with a backslash,
as in \( and \). Finally, you can use the capturing groups in the replace by parameter with a dollar
sign followed by the number of the group. $1 stands for the content of the first group, which
always happens to be "Passenger Class", and $2 stands for the content of the second group, which is
one of the three classes "First", "Second", or "Third".
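The following sketch makes the capturing groups visible by matching a single (assumed) column name and printing each group. For contrast, it also matches a version of the pattern without backslashes. Every bracket pair then becomes a group, so the group numbers no longer line up with the replacement $1 $2 (written \1 \2 in Python).

```python
import re

name = "count(Passenger Class)_First"

# Escaped brackets \( and \) match the literal parentheses in the column name;
# the two unescaped (.*) pairs define capturing groups 1 and 2.
match = re.fullmatch(r"count\((.*)\)_(.*)", name)
print(match.group(1))          # content of the first capturing group
print(match.group(2))          # content of the second capturing group
print(match.expand(r"\1 \2"))  # the replacement built from both groups

# Without backslashes, every bracket pair is a group: group 1 wraps group 2,
# and the brackets in the name are swallowed by .* and end up inside the groups.
unescaped = re.fullmatch(r"count((.*))_(.*)", name)
print(unescaped.groups())
print(unescaped.expand(r"\1 \2"))
```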
RUN THE PROCESS

1. Run the process.

NOTE: Your data set should now have column names like Passenger Class First.
You have rotated an aggregated data set into wide table format. Pivot can sometimes be difficult
to configure. Just keep in mind that the group by attributes parameter defines the groups in the
rows, with one row per group, while the values of the column grouping attribute parameter define
the new columns.
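The rule of thumb above can be expressed as a small calculation. The hypothetical helper pivot_shape below predicts the size of a pivot result. It assumes one row per distinct combination of the group by attributes, and one column per group by attribute plus one per distinct value of the column grouping attribute. The records are an invented sample.

```python
# Hypothetical records: (Sex, Passenger Class).
records = [
    ("female", "First"), ("female", "Third"),
    ("male", "First"), ("male", "Second"), ("male", "Third"),
]


def pivot_shape(rows, group_by, column_grouping):
    """Return (number of rows, number of columns) of a pivot result.

    group_by: list of field indices used as group by attributes.
    column_grouping: index of the field whose values become new columns.
    """
    # One output row per distinct combination of the group by fields.
    row_groups = {tuple(r[i] for i in group_by) for r in rows}
    # One new column per distinct value of the column grouping field.
    new_columns = {r[column_grouping] for r in rows}
    return len(row_groups), len(group_by) + len(new_columns)


print(pivot_shape(records, group_by=[0], column_grouping=1))
```

Swapping the two arguments is a quick way to predict the outcome of the third task before you configure it in RapidMiner.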

TASKS:

• Can you change the process so that the columns are named "First Passenger Class", "Second
Passenger Class", and "Third Passenger Class"?
• Can you also change them to just "First Class", "Second Class", and "Third Class"?
• Change the Pivot so that the gender is transformed into new columns and the passenger class
defines three groups of data. How many columns and rows do you get now?
• Try to adapt the renaming so that it uses just the gender as column names after the new
pivoting.
• Now, remove the Rename by Replacing operator and remove the column grouping attribute from
Pivot. Set Sex and Passenger Class as group by attributes and use Passenger Class with count as
the aggregation attribute. Run the process and inspect the result. How does it differ from the
first result you obtained with Pivot?

Dr. Stephan Kupsch
