Ses3056 L04

Principles of data mining lecture
L04: Principles of data mining, artificial intelligence, big data, and cloud computing (previously L05)

Warm-up
Please answer a few questions on the screen.

Lesson Intended Learning Outcomes
At the end of the lesson, you should be able to:
1. Explain the meaning of statistics and why it is important in science.
2. Provide a high-level explanation of the working principles of the artificial
neural network.
3. Explain why big data and cloud computing are important for machine
learning and data analysis in general.
4. Suggest example applications of AI in environmental science.

Classwork
Please follow the instructions and submit your answer to Moodle as a MS Word file.
Write a reflection of around 500 words in English on the following:
◦ What have you learned about the concepts of statistics and AI today?
◦ What have you learned about applications of AI in environmental science today?
◦ What are the concepts or techniques covered today that you find most interesting?
◦ What are the concepts or techniques covered today that you find most challenging?
Put your answers into a MS Word file and submit the file to Moodle.

Prelude: Review of statistics concepts
What is statistics and why do we need it?

What is statistics?
Statistics (統計): the practice of collecting, summarizing, and drawing conclusions from data.

Example: How many ducks wear a red hat?
Ducks image: © 2014 Shelly ʕ•ᴥ•ʔ. Licensed under CC-BY.
Santa hat image from public domain.

Step 1: Collecting data
The most straightforward way is to survey all the ducks available and count how many ducks wear a red hat.
In this case, we are looking at the population of all 20 ducks.
[Figure: the population (all 20 ducks).]

Step 1: Collecting data
But the previous approach (surveying every duck) is not practical when there is a huge number of ducks, or if we are too lazy to count all the ducks.
An alternative way is to select a small subset (a sample) of the ducks at random, and count these sampled ducks instead.
[Figure: the population (all 20 ducks) and a sample (5 ducks).]

Step 2: Summarizing data

Summary of data                  Population                  Sample
                                 (all the ducks)             (the randomly picked ducks)
Total number of ducks            20 (population size, N)     5 (sample size, n)
Number of ducks with a red hat   5                           1
Proportion                       5/20 = 1/4 = 25%            1/5 = 20%

Step 3: Drawing conclusions about the data

Descriptive Statistics
Meaning: Using data gathered from a group to describe or draw conclusions about that same group only.
Example conclusion: “1/5 (20%) of the ducks in the sample wear a red hat.”

Inferential Statistics
Meaning: Using data gathered from a group to infer (guess) conclusions about the population from which the group was taken.
Example conclusion: “As 1/5 (20%) of the ducks in the sample wear a red hat, approximately 1/5 (20%) of the ducks in the population should be wearing a red hat.” (The correct answer is actually 1/4 (25%).)
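
The duck example can be reproduced in a few lines of Python. This is only a sketch: the list `population` and the helper `proportion_with_red_hat` are made-up names for illustration, and the sample will differ from run to run because it is picked at random.

```python
import random

# Population from the duck example: 20 ducks, 5 of which wear a red hat (True).
population = [True] * 5 + [False] * 15


def proportion_with_red_hat(ducks):
    """Return the fraction of ducks in the group that wear a red hat."""
    return sum(ducks) / len(ducks)


# Descriptive statistics: summarize the whole population directly.
population_proportion = proportion_with_red_hat(population)

# Inferential statistics: pick 5 ducks at random (without replacement)
# and use the sample proportion as a guess for the population proportion.
sample = random.sample(population, 5)
sample_proportion = proportion_with_red_hat(sample)

print(f"Population proportion: {population_proportion:.0%}")
print(f"Sample proportion (our estimate): {sample_proportion:.0%}")
```

Running the script several times shows that the sample proportion jumps around (0%, 20%, 40%, ...) while the population proportion stays at 25%, which is why an inferred conclusion is only approximate.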

Statistics in environmental science
In addition to counting (i.e., frequency), we often summarize scientific data in the following ways:
◦ Central tendency, e.g., mean, median, mode
  ◦ We want to estimate the center of the distribution of the data.
  ◦ E.g., What is the average temperature at a certain location over the day?
◦ Variability, e.g., mean absolute deviation (MAD), variance, standard deviation (the square root of the variance, i.e., of the mean squared deviation)
  ◦ We want to know how dispersed the data is.
  ◦ E.g., How much does the temperature vary over the day?
◦ More advanced “statistics”, e.g., classification, regression
  ◦ E.g., Can we classify the data into different groups? Can we predict future data?
  ◦ We usually call this data mining because we try to discover (mine) complex patterns from the data.
  ◦ To do this, we use machine learning techniques (to be covered below).
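
As a small worked example of central tendency and variability, the sketch below uses Python's standard `statistics` module. The temperature readings are hypothetical values invented for illustration, and the variance and standard deviation are computed in their population form (dividing by N).

```python
import statistics

# Hypothetical temperature readings (in °C) taken over one day.
temperatures = [22.0, 24.0, 24.0, 27.0, 28.0]

# Central tendency: where is the center of the data?
mean = statistics.mean(temperatures)
median = statistics.median(temperatures)
mode = statistics.mode(temperatures)

# Variability: how spread out is the data around the mean?
# Mean absolute deviation (MAD): average distance from the mean.
mad = sum(abs(t - mean) for t in temperatures) / len(temperatures)
# Population variance: average squared distance from the mean.
variance = statistics.pvariance(temperatures)
# Population standard deviation: square root of the variance.
std_dev = statistics.pstdev(temperatures)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"MAD={mad}, variance={variance:.2f}, std dev={std_dev:.2f}")
```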

Types of data analysis
[Diagram: data analysis branches into statistics (descriptive statistics and inferential statistics) and data mining (AI). Items shown include frequency, proportions, central tendency (mean, median, ...), variability (range, MAD, standard deviation, ...), regression, classification, ...]

Activity (0)
Each student, please share one thing you have learned about statistics in this session.
Please post your answer on Padlet and show your real name in the post.

Part 1: Principles of data mining and artificial intelligence
What are data mining, artificial intelligence, machine learning, and artificial neural networks? What are their working principles?

AI: Science or Fiction?

How powerful is AI nowadays?
https://www.youtube.com/watch?v=tF4DML7FIWk
https://youtube.com/watch?v=SPbTKfu0zUY&si=jQo5zqdcP6guKpD8


The ABCD of modern technologies
◦ Artificial Intelligence
◦ Blockchain
◦ Cloud Computing
◦ Data (or Big Data)

How do machines learn?
Computers are not intelligent at all. They are simply exceptionally good at following instructions and processing data, so good that it looks like intelligence.
Data (rules and examples; mostly big data these days) → [training] → Computer program running on a fast computer (could be a cloud computer) → Performs tasks like a human (“Intelligence”)
E.g., chess players, self-driving cars, chatbots, suggested keywords, “People who buy this also buy that”, auto-correction, machine translation, FaceID, …

How to teach the AI to distinguish between triangles and non-triangles?

Rule-based AI (e.g., Expert systems)
A high-level example:
IF it is a polygon with 3 sides
THEN it is a triangle
ELSE it is not a triangle
Data (rules) → [training] → Computer program running on a fast computer → “Intelligence”

Example-based AI (e.g., Neural networks)
Data (examples) → [training] → Computer program running on a fast computer → “Intelligence”

Remarks:
The “rule-based AI” is the first generation of AI, in which the rules for making decisions are hard-coded into the program. This can be useful, but only for relatively simple problems in which those rules are well-defined and do not change.
The “example-based AI” is a more powerful way of machine learning. Specifically, the example presented here is called “supervised learning” because it relies on input/output pairs of data. These pairs provide supervision to the learning of the algorithm. There is “unsupervised learning” as well, in which there are inputs but no outputs in the data. We will talk about these in more detail in the practical sessions later.
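
The contrast between the two approaches can be sketched in Python. The rule-based function hard-codes the IF/THEN/ELSE rule. For the example-based side, a perceptron (a single artificial neuron) is used here as a simple stand-in for a neural network; this choice, the function names, and the five training examples are all assumptions made for illustration, not part of the lesson material.

```python
def is_triangle_rule(num_sides):
    """Rule-based AI: the decision rule is written by a human."""
    return num_sides == 3


# Example-based AI: input/output pairs (number of sides, 1 = triangle).
examples = [(3, 1), (4, 0), (5, 0), (6, 0), (8, 0)]


def train_perceptron(examples, max_epochs=10000):
    """Learn a weight and a bias from labelled examples (supervised learning)."""
    weight, bias = 0.0, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for num_sides, label in examples:
            predicted = 1 if weight * num_sides + bias > 0 else 0
            error = label - predicted
            if error != 0:
                # Nudge the weight and bias towards the correct answer.
                weight += error * num_sides
                bias += error
                mistakes += 1
        if mistakes == 0:  # every example is classified correctly
            break
    return weight, bias


def is_triangle_learned(num_sides, weight, bias):
    """Use the learned weight and bias to classify a shape."""
    return weight * num_sides + bias > 0


weight, bias = train_perceptron(examples)
for sides in (3, 4, 7):
    print(sides, is_triangle_rule(sides), is_triangle_learned(sides, weight, bias))
```

Nobody told the perceptron the "3 sides" rule; it only saw examples. For the same reason, it is only reliable for shapes similar to those examples: it never saw a shape with fewer than 3 sides, so its answer for such inputs should not be trusted.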

What happens inside the neural network in this case:
1. During training, the algorithm learns from examples that pair an input (the shape) with an output (the answer).
2. The algorithm calculates the values of the weights of the connections to the hidden layer so that the inputs generate the correct outputs according to the examples. This is called “creating the model”.
3. After the training, we provide a new input, and the algorithm uses the previously calculated weights to perform the mathematical transformations in the middle layer to generate the outputs (prediction / classification).
[Diagram: the input shape is linked by connections with different weights to hidden-layer nodes, each labelled “Some mathematics happen here”, which lead to the outputs “Yes” and “No” for the question “Is it a triangle?”]
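
To make step 3 concrete, here is a minimal sketch of a prediction pass through a tiny network with one hidden layer. The input feature (number of sides), the step activation, and every weight value are assumptions chosen by hand for illustration; in a real network the weights would come out of the training in step 2 rather than being typed in.

```python
def step(value):
    """Activation function: the neuron fires (1) if its input is positive."""
    return 1 if value > 0 else 0


# Hypothetical weights, set by hand instead of being learned.
# Each hidden neuron has (weight on the input, bias).
HIDDEN_NEURONS = [
    (1.0, -2.5),  # fires when the shape has at least 3 sides
    (1.0, -3.5),  # fires when the shape has at least 4 sides
]
# Output neuron: one weight per hidden neuron, plus a bias.
OUTPUT_WEIGHTS = (1.0, -1.0)
OUTPUT_BIAS = -0.5


def predict_is_triangle(num_sides):
    """Run the input through the hidden layer and the output neuron."""
    # "Some mathematics happen here": weighted sum, then activation.
    hidden = [step(w * num_sides + b) for w, b in HIDDEN_NEURONS]
    total = sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden)) + OUTPUT_BIAS
    return "Yes" if step(total) == 1 else "No"


for sides in (2, 3, 4):
    print(f"{sides} sides -> Is it a triangle? {predict_is_triangle(sides)}")
```

The output neuron fires only when the first hidden neuron is on and the second is off, which happens exactly when the shape has 3 sides.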

Activity (1): The Teachable Machine
Let’s create our own image recognition program here: https://teachablemachine.withgoogle.com.
1. Download the training set and testing set from “Teachable Machine Training and Testing Data - Google Drive”.
2. Upload the training image samples on the left.
3. Train the model.
4. Upload the testing images one by one and see how well the machine classifies the image.
Step-by-step instructions can also be found on Moodle.

Activity (1) (cont’d)
Explore the following different scenarios in the Teachable Machine and retrain the model to see the results.
◦ Use one image sample from the training set as test input.
◦ Use some wrong images in the training samples. Retrain.
◦ Reduce the number of photos in the training sets. Retrain.
◦ Use some completely irrelevant images for testing. Retrain.
◦ Use different numbers of training cycles (epochs). Retrain.
Describe what you find.

Part 2: Application examples of AI
What are some real-life examples of AI applications in environmental science?

Example usage (1): Prediction of indoor PM2.5 level
https://pubs.acs.org/doi/full/10.1021/acs.est.0c02549

Example usage (2): Rainfall prediction
https://create.arduino.cc/projecthub/kutluhan-aktar/iot-tensorflow-weather-station-predicts-rainfall-intensity-534efe?ref=search&ref_id=neural%20network&offset=5

Example usage (3): Garden monitor
https://create.arduino.cc/projecthub/james-yu/an-urban-garden-monitor-cc1c13?ref=search&ref_id=neural%20network&offset=18

Example usage (4): Water quality
https://create.arduino.cc/projecthub/clean-water-ai/clean-water-ai-e40806?ref=search&ref_id=neural%20network&offset=39

Activity (2)
Work in groups of 4-6 to suggest one example application of machine learning in environmental science that has not been mentioned in this lesson before.
Report the following on Padlet:
◦ What is the problem being solved?
◦ What type of data is the AI learning from?
◦ How does the AI solve this problem?
Send a representative to present your answers to the class.
