
INTRODUCTION TO MACHINE LEARNING

Machine learning is a type of AI (Artificial Intelligence) where computers learn from data and get better at making predictions and decisions over time.

In machine learning, systems such as Artificial Neural Networks (ANNs) are trained to find patterns in large amounts of data, helping them make predictions based on new information. The better the algorithm, the more accurate the predictions.

Here are some real-life examples of machine learning:

1. Digital assistants that understand voice commands and play music or search the web.
2. Websites recommending products, movies, or songs based on your past choices.
3. Spam detectors that block unwanted emails.
4. Medical systems that help doctors diagnose conditions using images.
5. Self-driving cars that recognize their surroundings and make decisions while driving.

As more data becomes available, computers get more powerful, and scientists develop better algorithms, machine learning will play an even bigger role in our lives.
Step 1: Preparing the Training Data Set

The training data set is sample data used to teach the machine
learning model how to solve a problem. It can be either labeled (with
features and classifications) or unlabeled (the model has to find
patterns on its own).

Whether labeled or not, the training data should be:

• Randomized
• Balanced
• Unbiased

The data is split into two parts:

• Training subset: Used to train the model.
• Evaluation subset: Used to test and improve the model.
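
A minimal sketch of this split, assuming Python with scikit-learn and a small made-up labeled data set (the feature values and labels below are invented purely for illustration):

from sklearn.model_selection import train_test_split

# Features (inputs) and labels (correct outputs) for ten made-up samples
X = [[5.1], [4.9], [6.2], [5.8], [6.7], [5.0], [5.5], [6.1], [4.8], [6.4]]
y = [0, 0, 1, 1, 1, 0, 0, 1, 0, 1]

# shuffle=True randomizes the data; stratify=y keeps the two classes balanced
X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.3, shuffle=True, stratify=y, random_state=42
)

print(len(X_train), "training samples,", len(X_eval), "evaluation samples")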

Step 2: Choosing the Training Algorithm

The choice of algorithm depends on:

• Type of data (labeled or unlabeled)
• Amount of data
• The problem you want to solve

There are different machine learning algorithms available for both labeled and unlabeled data.
For labeled training data:

• Regression algorithms: Used to find relationships in data.
o Linear regression predicts a value based on one variable (e.g., predicting salary based on student records).
o Logistic regression is used for binary outcomes (yes/no).
o Support vector machines are good for complex classification problems.
• Decision trees: Use rules to make recommendations (e.g., recommending a stock to buy based on data).
• Instance-based algorithms: Like K-Nearest Neighbor (k-NN), classify data points by comparing them to nearby points.
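
A minimal sketch of two of these labeled-data algorithms, assuming Python with scikit-learn and tiny made-up data sets:

from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# Linear regression: predict a value from one variable (made-up numbers)
hours = [[1], [2], [3], [4], [5]]
scores = [52, 58, 65, 70, 78]
reg = LinearRegression().fit(hours, scores)
print(reg.predict([[6]]))          # predicted score for a new input of 6

# k-Nearest Neighbor: classify a point by comparing it to nearby points
points = [[1, 1], [1, 2], [8, 8], [9, 8]]
labels = ["A", "A", "B", "B"]
knn = KNeighborsClassifier(n_neighbors=3).fit(points, labels)
print(knn.predict([[8, 9]]))       # nearest neighbours are mostly "B"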

For unlabeled training data:

• Clustering: Groups similar data points without prior knowledge. Popular methods include K-means, Two-Step, and Kohonen clustering.
• Association algorithms: Extract "if-then" rules from data patterns, similar to data mining.
• Multilayer Feedforward Neural Networks: An ANN with multiple layers where data moves through layers to reach conclusions. Uses Backpropagation for learning. Deep neural networks have many hidden layers to refine results.
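
A minimal K-means sketch, assuming Python with scikit-learn and a few made-up, unlabeled points:

from sklearn.cluster import KMeans

# Unlabeled points; the algorithm groups them without any prior labels
points = [[1.0, 1.1], [1.2, 0.9], [0.8, 1.0], [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)           # cluster assigned to each point
print(kmeans.cluster_centers_)  # centre of each discovered group
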
Step 3: Training the System

Training an Artificial Neural Network (ANN) is done in multiple rounds (epochs). Each epoch involves:

1. Input data is fed into the system.
2. Data moves through layers of the network.
3. The output is compared with the target output, and errors are calculated.
4. Weights and biases are adjusted backward through the layers.
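
A minimal sketch of one such training loop for a single neuron, assuming Python with NumPy and a made-up data set (a full ANN repeats the same idea across many layers via Backpropagation):

import numpy as np

# Tiny labeled data set: two inputs per sample, one target output (logical AND)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 0, 0, 1], dtype=float)

rng = np.random.default_rng(0)
w = rng.normal(size=2)       # weights
b = 0.0                      # bias
lr = 1.0                     # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(5000):             # each full pass over the data is one epoch
    y = sigmoid(X @ w + b)            # steps 1-2: feed the inputs forward
    error = y - t                     # step 3: compare output with the target output
    w -= lr * (X.T @ error) / len(X)  # step 4: adjust weights backward (gradient step)
    b -= lr * error.mean()

print(np.round(sigmoid(X @ w + b), 2))  # outputs move close to the targets [0, 0, 0, 1]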

Step 4: Applying to Practical Data

Once trained, the ANN is used to solve real-world problems. Over time, it can continue learning and improve based on new data, like medical images or user browsing history, depending on the task.

Supervised Machine Learning

In supervised learning, the system learns from labeled data, where each input has a correct output. The system compares its actual output with the correct one and adjusts when there's a mismatch. While it's easier to use, preparing the training data is challenging, and there's a risk of overfitting, where the system becomes too specific to the training data and struggles with new data.

Unsupervised Machine Learning

Unsupervised learning finds patterns and relationships in large amounts of raw data without labeled examples. It groups similar data into clusters, discovering hidden patterns instead of making decisions or predictions.
Reinforcement Machine Learning

Reinforcement learning involves learning by trial and error without using sample training data. Successful actions are reinforced to find the best solution or policy.
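
These notes do not name a specific algorithm, but a standard way to implement this idea is Q-learning. A minimal sketch in Python with NumPy, using a made-up five-state "corridor" task (all values here are illustrative assumptions):

import numpy as np

# Tiny corridor world: states 0..4, start at state 0, goal at state 4.
# Actions: 0 = step left, 1 = step right. Reaching the goal earns reward 1.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))   # learned value of each action in each state
alpha, gamma = 0.5, 0.9               # learning rate and discount factor

rng = np.random.default_rng(0)
for episode in range(200):            # trial and error, with no sample training data
    s = 0
    while s != 4:
        a = int(rng.integers(n_actions))              # explore by acting randomly
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == 4 else 0.0
        # actions that lead toward the reward are reinforced by this update
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.argmax(Q[:4], axis=1))       # learned policy: step right (1) in states 0-3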

Neural Networks (ANNs)

Artificial Neural Networks (ANNs) are inspired by the human brain. They consist of many connected processing units called neurons. Each neuron processes input, sums it, and based on a rule, either sends an output or doesn't. When many neurons work together, they can perform complex tasks like classification and clustering. ANNs can learn, which makes them very useful in solving problems. Machine Learning (ML) and Deep Learning (DL) are based on ANN structures.
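
A minimal sketch of a single neuron (a weighted sum of inputs followed by a threshold rule), assuming Python with NumPy; the weights and bias are made up for illustration:

import numpy as np

def neuron(inputs, weights, bias):
    """Sum the weighted inputs and fire (output 1) only if the sum exceeds 0."""
    total = np.dot(inputs, weights) + bias
    return 1 if total > 0 else 0

weights = np.array([0.6, -0.4, 0.2])   # made-up connection strengths
bias = -0.1

print(neuron(np.array([1.0, 0.5, 1.0]), weights, bias))  # fires: 0.6 - 0.2 + 0.2 - 0.1 > 0
print(neuron(np.array([0.0, 1.0, 0.0]), weights, bias))  # does not fire: -0.4 - 0.1 < 0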

Deep Learning (DL)

Deep Learning (DL) uses complex neural networks with many layers to automatically learn features from large data sets, such as images, text, or sound. It mimics how humans learn from examples. DL models are trained with lots of labeled data and can learn without manual feature extraction. Unlike regular machine learning, DL keeps improving as more data is added. DL's rise is due to the availability of big data and powerful hardware like GPUs, which speed up training significantly.

Deep Learning Architecture

Different types of neural networks (ANNs) are used for various tasks.
For example, recurrent networks are good for language and speech,
while Convolutional Neural Networks (CNNs) are best for image
processing and classification.
In Deep Learning, the network has many layers of neurons. The first
is the input layer, followed by hidden layers, and the final output
layer. Ordinary machine learning uses 2-3 hidden layers, but deep
learning can have hundreds.
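
A minimal sketch of this layered structure, assuming Python with TensorFlow/Keras (the layer sizes are arbitrary and chosen only for illustration):

import tensorflow as tf
from tensorflow.keras import layers

# Input layer -> hidden layers -> output layer.
# A deep model simply stacks many more hidden layers than the three shown here.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),             # input layer: 20 features per sample
    layers.Dense(64, activation="relu"),     # hidden layer 1
    layers.Dense(64, activation="relu"),     # hidden layer 2
    layers.Dense(32, activation="relu"),     # hidden layer 3
    layers.Dense(10, activation="softmax"),  # output layer: scores for 10 classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()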

Convolutional Neural Networks (CNNs)

CNNs are specialized neural networks that excel at processing images, speech, and audio. They have three main types of layers:

1. Convolutional Layer: This is the first layer that extracts basic features like colors and edges.
2. Pooling Layer: This layer reduces the size of the data, helping the network focus on important features.
3. Fully-Connected (FC) Layer: This is the final layer that combines all the learned features to identify the target object.
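
A minimal sketch of these three layer types, assuming Python with TensorFlow/Keras and small 64x64 colour images (the sizes and filter counts are illustrative):

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),               # 64x64 colour images
    layers.Conv2D(16, (3, 3), activation="relu"),    # convolutional layer: extracts edges and colours
    layers.MaxPooling2D((2, 2)),                     # pooling layer: shrinks the data, keeps key features
    layers.Conv2D(32, (3, 3), activation="relu"),    # deeper convolution for more complex features
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),          # fully-connected layer: identifies the target object
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()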

Best Practices in Machine Learning (ML) and Deep Learning (DL)

1. Choosing Between ML and DL:
o Use Deep Learning (DL) when you have a large amount of data (thousands of images) and powerful GPUs for processing.
o If you lack these conditions, stick to Machine Learning (ML).
2. Common Uses of DL:
o DL is mainly used for object classification.
3. Methods to Work with DL:
o Training from Scratch: Requires a large set of labeled
data and can take days or weeks. Not usually
recommended unless necessary.
o Transfer Learning: Fine-tunes an existing model (like
AlexNet) with new data for a specific task (e.g.,
recognizing bicycles). It’s less intensive and needs less
data.
o Feature Extraction: Uses DL to extract features from
data, which are then input into an ML model, like Support
Vector Machines (SVM).
4. Efficiency:
o Combining GPUs with software tools (like MATLAB)
can drastically reduce training time from days to hours or
even minutes.
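
The notes mention fine-tuning AlexNet in MATLAB; as an illustrative Python sketch of the same transfer-learning idea, here is a version using MobileNetV2 (a pre-trained model that ships with Keras) and an assumed two-class task such as bicycle vs. not-bicycle:

import tensorflow as tf
from tensorflow.keras import layers

# Load a network already trained on ImageNet and reuse its learned features
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet", pooling="avg"
)
base.trainable = False                         # freeze the pre-trained layers

# Add a new output layer for the new, smaller task (e.g. bicycle vs. not bicycle)
model = tf.keras.Sequential([
    base,
    layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(new_images, new_labels, epochs=5)  # fine-tune with far less data than training from scratch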

A Black-box Approach to Regression Analysis

Regression Analysis is a statistical method used to find relationships between variables and to predict the value of one variable (dependent) based on another variable (independent).

Examples of Regression Analysis:

1. Advertisement Duration vs. Production Cost:
o Data: Duration of an ad film (in seconds) and its production cost (in lakhs).
o Example:

Duration (sec)   Cost (Lakh Rs)
10               8
25               22
30               25
60               47

2. Customers vs. Biryani Sales:
• Task: Predict the number of biryanis expected to be sold based on customer visits.
• Data:

No. of Customers   No. of Biryani Packets
517                215
410                189
630                230
285                122

Simple Regression

Simple Regression is a method that uses one independent variable (x) to predict another dependent variable (y). The formula for simple linear regression is:

y = β0 + β1x + ε

• x: Independent variable (predictor)
• y: Dependent variable (response)
• β0: Y-intercept (where the line crosses the y-axis)
• β1: Slope of the line (how much y changes for each unit change in x)
• ε: Random error (difference between the predicted and actual values)
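
A minimal sketch of fitting this line to the customers-vs-biryani data above, assuming Python with NumPy; β1 and β0 are computed with the usual least-squares formulas:

import numpy as np

# Data from the table above: customer visits (x) and biryani packets sold (y)
x = np.array([517, 410, 630, 285], dtype=float)
y = np.array([215, 189, 230, 122], dtype=float)

# Least-squares estimates of the slope (beta1) and intercept (beta0)
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
print(f"y = {beta0:.2f} + {beta1:.3f} x")

# Predict sales for a day with 550 expected customers
print(beta0 + beta1 * 550)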

Multiple Regression

Multiple Regression uses two or more independent variables to predict a dependent variable (y). It can be linear or nonlinear.

Linear Multiple Regression is expressed with the formula:

y = β0 + β1x1 + β2x2 + … + βkxk + ε

• y: Dependent variable (what you're trying to predict)
• x1, x2, … , xk: Independent variables (predictors)
• β0: Y-intercept (where the line crosses the y-axis)
• β1, β2, … , βk: Slopes for each independent variable (how much y changes for each predictor)
• ε: Random error (difference between predicted and actual values)
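
A minimal multiple-regression sketch, assuming Python with scikit-learn and two predictors. The durations and costs extend the advertisement table above, but the "number of actors" column and the fifth row are invented for illustration:

from sklearn.linear_model import LinearRegression

# Two predictors per row: [ad duration in seconds, number of actors]
X = [[10, 2], [25, 3], [30, 3], [60, 5], [45, 4]]
y = [8, 22, 25, 47, 36]          # production cost in lakh Rs

model = LinearRegression().fit(X, y)
print(model.intercept_)          # beta0
print(model.coef_)               # beta1, beta2 (one slope per predictor)
print(model.predict([[40, 4]]))  # predicted cost for a 40-second ad with 4 actors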

Popular Data Analytic Tools

There are many tools for data analysis. Here are some popular ones:

1. Excel: A widely used spreadsheet software for calculations and graphs. It's user-friendly and accessible for everyone.
2. R: An open-source programming language great for statistical
analysis and data visualization. It has a simple syntax and is
ideal for complex computations.
3. Python: A versatile and readable programming language with rich libraries for all kinds of data analytics tasks. It is one of the most widely used tools among data analysts.
4. Apache Spark: Designed for analyzing large, unstructured data.
It can efficiently process vast amounts of data and distribute
tasks across multiple computers.
5. Tableau: A user-friendly tool that allows easy manipulation of
large datasets using a drag-and-drop interface.

Other tools include MS Power BI, KNIME, SAS, and Jupyter Notebook.
