Ses3056 L04

Principles of data mining lecture

Uploaded by

Kamila Kargabaeva


L04: Principles of data mining, artificial intelligence, big data, and cloud computing (previously L05)

1
Warm-up
Please answer a few questions on the screen.

2
Lesson Intended Learning Outcomes
At the end of the lesson, you should be able to:
1. Explain the meaning of statistics and why it is important in science.
2. Provide a high-level explanation of the working principles of the artificial
neural network.
3. Explain why big data and cloud computing are important for machine
learning and data analysis in general.
4. Suggest example applications of AI in environmental science.

3
Classwork
Please follow the instructions and submit your answer to Moodle as an MS Word file.
Write a reflection of around 500 words in English on the following:
◦ What have you learned about the concepts of statistics and AI today?
◦ What have you learned about applications of AI in environmental science today?
◦ What are the concepts or techniques covered today that you find most interesting?
◦ What are the concepts or techniques covered today that you find most challenging?
Put your answers into an MS Word file and submit the file to Moodle.

4
Prelude: Review of statistics concepts
What is statistics and why do we need it?

5
What is statistics?
Statistics (統計): collect, summarize, and draw conclusions from data.

6
Example: How many ducks wear a red hat?

Ducks image: © 2014 Shelly ʕ•ᴥ•ʔ . Licensed under CC-BY.

Santa hat image from public domain

7
Step 1: Collecting data
[Figure: population (all 20 ducks)]
The most straightforward way is to survey all the ducks available and count how many ducks wear a red hat.
In this case, we are looking at the population of all 20 ducks.


8
Step 1: Collecting data
[Figure: population (all 20 ducks) and sample (5 ducks)]
But the previous approach is not practical when there is a huge number of ducks, or if we are too lazy to count all the ducks.
An alternative way is to select a small subset (sample) of the ducks at random, and count these sampled ducks instead.
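
Random sampling as described above can be sketched in a few lines of Python. This is only an illustration: encoding each duck as True/False and fixing the random seed are assumptions made here, not part of the slides.

```python
import random

# Hypothetical encoding of the slide's population: 20 ducks, 5 of which wear a red hat.
population = [True] * 5 + [False] * 15   # True = wears a red hat

rng = random.Random(0)                   # fixed seed (an assumption) so the run is repeatable
sample = rng.sample(population, 5)       # 5 ducks picked at random, without replacement

population_proportion = sum(population) / len(population)
sample_proportion = sum(sample) / len(sample)

print(f"Population: {sum(population)} of {len(population)} ducks wear a red hat ({population_proportion:.0%})")
print(f"Sample:     {sum(sample)} of {len(sample)} ducks wear a red hat ({sample_proportion:.0%})")
```

Changing the seed gives a different sample, and usually a different sample proportion, which is why a sample only approximates the population.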

9
Step 2: Summarizing data

Summary of data                | Population (all the ducks) | Sample (the randomly picked ducks)
Total number of ducks          | 20 (population size, N)    | 5 (sample size, n)
Number of ducks with a red hat | 5                          | 1
Proportion                     | 5/20 = 1/4 = 25%           | 1/5 = 20%

10
Step 3: Drawing conclusions about the data

Descriptive Statistics
Meaning: Using data gathered from a group to describe or draw conclusions about that same group only.
Example conclusion: “1/5 (20%) of the ducks in the sample wear a red hat.”

Inferential Statistics
Meaning: Using data gathered from a group to infer (guess) conclusions about the population from which the group was taken.
Example conclusion: “As 1/5 (20%) of the ducks in the sample wear a red hat, approximately 1/5 (20%) of the ducks in the population should be wearing a red hat.” (The correct answer is actually 1/4 (25%).)
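
To see why an inferred conclusion is only approximate, the sketch below lists every possible sample of 5 ducks from the population of 20 (5 with a red hat) and counts how many red hats each sample contains. The encoding of ducks as 1/0 is an assumption made for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

ducks = [1] * 5 + [0] * 15            # 1 = red hat; the slide's population of 20 ducks
n = 5                                  # sample size used on the slide

# Count how many red hats appear in every possible sample of 5 ducks.
counts = Counter(sum(s) for s in combinations(ducks, n))
total_samples = sum(counts.values())

for k in sorted(counts):
    share = counts[k] / total_samples
    print(f"{k}/{n} red hats in the sample: {counts[k]:5d} samples ({share:.1%})")

# Averaged over all possible samples, the sample proportion equals the population proportion.
mean_estimate = sum(Fraction(k, n) * c for k, c in counts.items()) / total_samples
print("Average estimate:", mean_estimate)
```

No single sample of 5 can give exactly 25%, because the estimate must be a multiple of 1/5. Only the average over all possible samples equals the population proportion of 1/4.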

11
Statistics in environmental science
In addition to counting (i.e., frequency), we often summarize scientific data in the following ways:
◦ Central tendency, e.g., mean, median, mode
  ◦ We want to estimate the center of the distribution of data.
  ◦ E.g., What is the average temperature at a certain location over the day?
◦ Variability, e.g., mean absolute deviation (MAD), variance, standard deviation (square root of the mean squared deviation)
  ◦ We want to know how dispersed the data are.
  ◦ E.g., How much does the temperature vary over the day?
◦ More advanced “statistics”, e.g., classification, regression
  ◦ E.g., Can we classify the data into different groups? Can we predict future data?
  ◦ We usually call this data mining because we try to discover (mine) complex patterns from the data.
  ◦ To do this, we use machine learning techniques (to be covered below).
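
The summaries of central tendency and variability listed above can be computed with Python's standard `statistics` module. The temperature readings below are made-up values used only for illustration.

```python
import statistics

# Hypothetical temperature readings (°C) at one location over a day.
temps = [21, 22, 22, 24, 26, 28, 27, 22]

# Central tendency: where is the center of the data?
mean = statistics.mean(temps)
median = statistics.median(temps)
mode = statistics.mode(temps)

# Variability: how spread out are the data around the mean?
mad = sum(abs(t - mean) for t in temps) / len(temps)   # mean absolute deviation
variance = statistics.pvariance(temps)                  # mean squared deviation
std_dev = statistics.pstdev(temps)                      # square root of the variance

print(f"mean={mean}, median={median}, mode={mode}")
print(f"MAD={mad}, variance={variance}, standard deviation={std_dev}")
```

The population versions (`pvariance`, `pstdev`) are used here because they match the slide's definition of the standard deviation as the square root of the mean squared deviation.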

14
Types of data analysis
[Figure: tree diagram]
Data analysis
  Statistics
    Descriptive statistics
      Frequency
      Proportions
      Central tendency: Mean, Median, ...
      Variability: Range, MAD, Standard Deviation, ...
    Inferential statistics
  Data mining (AI)
    Regression
    Classification
    ...

15
Activity (0)
Each student, please share one thing you have learned about statistics in this session.
Please post your answer on Padlet and show your real name in the post.

16
Part 1: Principles of data mining and artificial intelligence
What are data mining, artificial intelligence, machine learning, and artificial neural networks? What are their working principles?

17
AI: Science or Fiction?

18
How powerful is AI nowadays?
https://www.youtube.com/watch?v=tF4DML7FIWk

https://youtube.com/watch?v=SPbTKfu0zUY&si=jQo5zqdcP6guKpD8

INT4029 WEB INTELLIGENCE 19

The ABCD of modern technologies
◦ Artificial Intelligence
◦ Blockchain
◦ Cloud Computing
◦ Data (or Big Data)

20
How do machines learn?
Computers are not intelligent at all. They are simply exceptionally good at following instructions and processing data, so good that it looks like intelligence.

Training: Data (rules and examples; mostly big data these days) -> Computer program running on a fast computer (could be a cloud computer) -> Performs tasks like a human (“Intelligence”)
E.g., chess players, self-driving cars, chatbots, suggested keywords, “People who buy this also buy that”, auto-correction, machine translation, FaceID, …

22
How to teach the AI to distinguish between triangles and non-triangles?

Rule-based AI (e.g., Expert systems)
Training: Data (rules) -> Computer program running on a fast computer -> “Intelligence”
A high-level example:
IF it is a polygon with 3 sides
THEN it is a triangle
ELSE it is not a triangle
…

Example-based AI (e.g., Neural networks)
Training: Data (examples) -> Computer program running on a fast computer -> “Intelligence”

Remarks:
The “rule-based AI” at the top is the first generation of AI, in which the rules for making decisions are hard-coded into the program. This could be useful, but only for relatively simple problems in which those rules are well-defined and do not change.
The “example-based AI” is a more powerful way of machine learning. Specifically, the example presented here is called “supervised learning” because it relies on input/output pairs of data. These pairs provide supervision to the learning of the algorithm. There is “unsupervised learning” as well, in which there are inputs but no outputs in the data. We will talk about these in more detail in the practical sessions later.
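
The contrast between the two approaches can be sketched in Python. The rule-based function hard-codes the IF/THEN/ELSE rule from the slide. For the example-based side, a simple nearest-neighbour lookup stands in for a neural network (an assumption made to keep the sketch short), and the function names and training pairs are hypothetical.

```python
def rule_based_is_triangle(n_sides: int) -> bool:
    """Rule-based AI: the decision rule is hard-coded by the programmer."""
    if n_sides == 3:      # IF it is a polygon with 3 sides
        return True       # THEN it is a triangle
    return False          # ELSE it is not a triangle


def train(examples):
    """Example-based AI: 'training' here simply stores labelled input/output pairs."""
    return list(examples)


def predict(model, n_sides: int) -> bool:
    """Answer with the label of the most similar stored example (nearest neighbour)."""
    _, nearest_label = min(model, key=lambda pair: abs(pair[0] - n_sides))
    return nearest_label


# Hypothetical supervised training data: (number of sides, is it a triangle?)
examples = [(3, True), (4, False), (5, False), (6, False)]
model = train(examples)

for sides in (3, 4, 8):
    print(sides, rule_based_is_triangle(sides), predict(model, sides))
```

In the second approach no rule about "3 sides" is ever written down. The answer for an unseen input such as 8 sides comes entirely from the labelled examples.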

23
What happens inside the neural network in this case:
1. During training, the network is shown examples, each pairing an input (the shape) with the correct output (the answer).
2. The algorithm calculates the values of the weights of the connections to the hidden layer so that the input would generate correct outputs according to the examples. This is called “creating the model”.
3. After the training, we provide the input, and the algorithm uses the previously calculated weights to perform the mathematical transformations in the middle layer to generate the outputs (prediction / classification).

[Figure: input shape -> connections with different weights -> hidden layer (each node labelled “Some mathematics happen here”) -> output “Is it a triangle?”: Yes / No]
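
The three steps above can be sketched with the simplest possible network: a single artificial neuron with one weight and one bias, trained with the classic perceptron rule. This is a deliberate simplification with no hidden layer, and the training examples (number of sides paired with the correct answer) are hypothetical.

```python
# Step 1: hypothetical training examples: (number of sides, 1 if triangle else 0)
examples = [(3, 1), (4, 0), (5, 0), (6, 0), (7, 0), (8, 0)]

weight, bias = 0.0, 0.0   # the connection weights to be learned


def predict(n_sides: float) -> int:
    """Weighted sum of the input followed by a yes/no threshold."""
    return 1 if weight * n_sides + bias > 0 else 0


# Step 2: training. Adjust the weights whenever the output disagrees with an example.
for epoch in range(10_000):
    mistakes = 0
    for n_sides, target in examples:
        error = target - predict(n_sides)      # 0 when the output is already correct
        if error != 0:
            weight += error * n_sides
            bias += error
            mistakes += 1
    if mistakes == 0:                            # every example is reproduced: model created
        break

# Step 3: after training, reuse the learned weights on an input.
for n_sides in (3, 4, 6):
    print(n_sides, "sides ->", "Yes" if predict(n_sides) else "No")
```

A real neural network has many such neurons arranged in layers and uses a smoother training procedure, but the idea is the same: the weights are adjusted until the examples are reproduced, then reused for prediction.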

27
Activity (1): The Teachable Machine
Let’s create our own image recognition program here: https://teachablemachine.withgoogle.com.
1. Download the training set and testing set from Teachable Machine Training and Testing Data - Google Drive.
2. Upload the training image samples on the left.
3. Train the model.
4. Upload the testing images one by one and see how well the machine classifies the image.
Step-by-step instructions can also be found on Moodle.

29
Activity (1) (cont’d)
Explore the following different scenarios in the Teachable Machine and retrain the model to see the results.
◦ Use one image sample from the training set as test input.
◦ Use some wrong images in the training samples. Retrain.
◦ Reduce the number of photos in the training sets. Retrain.
◦ Use some completely irrelevant images for testing. Retrain.
◦ Use different numbers of training cycles (epochs). Retrain.

Describe what you find.

30
Part 2: Application examples of AI
What are some real-life examples of AI applications in environmental science?

31
Example usage (1): Prediction of indoor PM2.5 level
https://pubs.acs.org/doi/full/10.1021/acs.est.0c02549

32
Example usage (2): Rainfall prediction
https://create.arduino.cc/projecthub/kutluhan-aktar/iot-tensorflow-weather-station-predicts-rainfall-intensity-534efe?ref=search&ref_id=neural%20network&offset=5
33
Example usage (3): Garden monitor
https://create.arduino.cc/projecthub/james-yu/an-urban-garden-monitor-cc1c13?ref=search&ref_id=neural%20network&offset=18
34
Example usage (4): Water quality
https://create.arduino.cc/projecthub/clean-water-ai/clean-water-ai-e40806?ref=search&ref_id=neural%20network&offset=39

35
Activity (2)
Work in groups of 4-6 to suggest one example application of machine learning in environmental science that has not been mentioned in this lesson before.
Report the following on Padlet:
◦ What is the problem being solved?
◦ What type of data is the AI learning from?
◦ How does the AI solve this problem?
Send a representative to present your answers to the class.
