
A Report On

SUMMER INTERNSHIP – II (4350704)

At

BrainyBeam

SUBMITTED BY:

Vaja Prince Jiteshbhai

ENROLLMENT No. 229780307063

INDUSTRY MENTOR NAME:

Sagar Jasani
DURATION OF INTERNSHIP: 27/06/2024 to 07/08/2024 (Six Weeks)

SUBMITTED TO

COMPUTER ENGINEERING DEPARTMENT


Asiatic Institute of Science & Technology

Campus (Gondal)

1
2024 – 2025

COMPLETION CERTIFICATE

2
INDUSTRY LETTERHEAD

3
DECLARATION

I hereby declare that the Summer Internship Project Report titled

“Machine Learning”, carried out at BrainyBeam, is a result of my own work,
and my indebtedness to other works, publications, and references, if any,
has been duly acknowledged. If I am found guilty of copying from any other
report or published information presented as my original work, or of
exceeding the plagiarism limit, I understand that I shall be liable to
punishment by the university, which may include a ‘Fail’ in the examination
or any other punishment the university may decide.

Enrolment No Student Name


229780307063 Vaja Prince

Date: - 27/06/2024

4
GUJARAT TECHNOLOGICAL UNIVERSITY
Syllabus for Diploma in Computer Engineering, 5th Semester
Subject Name: Summer Internship – II (SIP) Subject Code: 4350704

This is to certify that the project work embodied in this report entitled Air Quality Index
was carried out by Vaja Prince (229780307063) of Asiatic Institute of Science &
Technology (978).

The report is approved / not approved.

Comments of External Examiner:

This report is for the partial fulfilment of the requirement of the award of the Diploma in
Computer Engineering offered by Gujarat Technological University.

(Examiner’s Sign)
Name of Examiner:
Institute Name:
Institute Code:

Date:

5
COMPANY PROFILE

Company Name: BrainyBeam.


Address: Office at 2nd Floor, Dhanlaxmi Chambers, Near Gujarat
Vidyapith, Ashram Road, Ahmedabad
Contact no: 9033237336
Email id: Contact@gmail.com
Website: https://brainybeaminfotech.com/internship.php

About: -

BrainyBeam Info-Tech is a mix of skills and technologies that
modify the world around us. The main motivation of our team is to
develop applications that impress people with their quality,
convenience, and functionality, making them essential for their
users. A fundamental characteristic of our work is meeting
stipulated deadlines while offering competitive prices to our
clients. We work and do our best to turn your dreams into apps!

We have worked across several domains covering Android apps, iOS apps,
Windows apps, and cross-platform apps. We have also worked on different
web development platforms. A few of our apps and websites are
mentioned below.

6
ACKNOWLEDGEMENT

First and foremost, I would like to thank Asiatic Institute of Science & Technology

Campus (Gondal) for giving me a golden opportunity to pursue my Diploma in Computer

Engineering. It was a great learning experience in my life.

I would also like to thank BrainyBeam for granting me permission to undergo my

Internship-II, carried out at “Office at 2nd Floor, Dhanlaxmi Chambers, Near Gujarat

Vidyapith, Ashram Road, Ahmedabad”. I would like to thank Mr. Sagar Jasani for his

continuous support, guidance, time, and effort in conducting my internship at the

company.

I would especially like to thank Ms. Janvi Ramani, Lecturer, for providing

continuous guidance and support during the whole internship and in preparing this report.

Without her support, this report would not have been possible.

Head of Department

Ms. Bansi Bhanvadiya

7
TABLE OF CONTENTS

NO TOPICS PAGE NO

1. Introduction 08

2. Abstract 10

3. Objectives 11

4. Tools/Technology 12

5. Introduction to Internship & Project 24

6. Internship Activity & Planning 30

7. Project Overview 40

8. Conclusion 54

8
Roles and Responsibilities During the Internship

Daily tasks and activities

DATE        ACTIVITIES
27/6/2024   Basic knowledge of NumPy
1/7/2024    Pandas, importing datasets, multidimensional arrays
8/7/2024    Normalization, K-nearest neighbours
13/7/2024   Linear regression math
17/7/2024   Matplotlib Pyplot
19/7/2024   Encoding, fit_transform
21/7/2024   Logistic regression, DataFrame
28/7/2024   Pyplot, Catplot, Pairplot, Heatmap
3/8/2024    Calculating AQI
7/8/2024    How to run the AQI prediction

9
1

INTRODUCTION

1.1 Introduction

During my 6-week internship at BrainyBeam, I engaged in a
comprehensive study of machine learning with Python, a language known for
its versatility, robustness, and cross-platform capabilities. The internship
aimed to provide a thorough understanding of Python's core principles and
advanced features, equipping me with the skills needed to develop efficient
and scalable software solutions.

Machine learning (ML) is a field of artificial intelligence (AI)
that focuses on developing algorithms and statistical models that enable
computers to perform tasks without explicit instructions. Instead, they learn from
data and make predictions or decisions based on it. This report gives a detailed
overview of machine learning, including key concepts, types, techniques, and
applications.

10
1.2 Importance of Internship Programs

1. Skill Development: Interns gain valuable skills that are directly relevant to their
career interests, enhancing their technical and soft skills.
2. Industry Exposure: Exposure to industry practices, tools, and methodologies helps
interns understand the professional environment and expectations.
3. Networking Opportunities: Internships provide opportunities to connect with industry
professionals, mentors, and peers, which can be beneficial for future career prospects.
4. Enhanced Employability: Practical experience gained during internships makes
candidates more attractive to potential employers, increasing their chances of
securing full-time positions.
5. Career Clarity: Internships help students explore different roles and industries, aiding
them in making informed career choices.

Practical Experience:

 Real-World Application: Internships allow students to apply theoretical knowledge


gained in the classroom to real-world problems.
 Skill Development: Interns gain practical skills and hands-on experience that are often
not covered in academic programs.

Innovation and Fresh Ideas:

 Diverse Input: Interns often bring new approaches and ideas that can inspire
innovation and improvements within the organization.

11
2

ABSTRACT

This internship report provides an overview of my experience and learning outcomes
during a 6-week internship at BrainyBeam. The primary focus of the internship was on
studying and applying the Python language, including both core and advanced Python
concepts. The report outlines the objectives, methodologies, and key projects undertaken
during the internship.

Throughout the internship, I was involved in several key activities, including data
preprocessing, feature engineering, model training, and evaluation. I utilized various machine
learning techniques and tools, such as Linear Regression and Random Forest with
Scikit-Learn, to tackle an Air Quality Index prediction problem.

One of my significant contributions was conducting an in-depth analysis of the existing
dataset to identify potential areas for improvement, improving the model's accuracy by 15%
through hyperparameter tuning and feature selection. The internship provided me with hands-
on experience in Python libraries and mathematical functions, applying machine
learning algorithms, managing large datasets, and interpreting model results.

The experience was invaluable in enhancing my understanding of practical machine learning
applications and the challenges associated with them. It also allowed me to develop both
technical and soft skills, including data visualization, machine learning algorithms,
data analysis, teamwork, and problem-solving.

12
3

OBJECTIVE

 SIP aims at widening the student's perspective by providing exposure to a real-life
organizational environment and its various functional activities.

 This will enable the students to explore an industry/organization, build a relationship with
a prospective employer, or simply hone their skills in a familiar field.

 SIP also provides invaluable knowledge and networking experience to the students.
During the internship, the student has the chance to put whatever he/she learned in the
three years of Diploma Engineering into practice while working on a business plan or
trying out a new industry, job function or organization.

 The organization, in turn, benefits from the objective and unbiased perspective the
student provides based on concepts and skills imbibed in the third year at the Diploma
Engineering institute. The summer interns also serve as unofficial spokespersons of the
organization and help in image building on campus.

 An additional benefit that organizations may derive is the unique opportunity to evaluate
the student from a long-term perspective. Thus the SIP can become a gateway for final
placement of the student.

 The student should ensure that the data and other information used in the study report is
obtained with the permission of the institution concerned. The students should also
behave ethically and honestly with the organization.

13
4

4.1 TOOLS AND TECHNOLOGY

 During the internship, we downloaded and utilized several key tools. Machine
learning (ML) tools and technologies are diverse and continually evolving;
here is a rundown of some of the most widely used. When working on machine
learning projects, the choice of editor can also significantly impact your
productivity and workflow, so popular editors and IDEs (Integrated
Development Environments) used in the field are covered as well.

Data Management and Preparation Tools

 Pandas:
o Overview: A Python library for data manipulation and analysis.
o Key Features: Data structures like Data Frame for handling structured data.
o Use Cases: Data cleaning, manipulation, and analysis.
 Apache Spark:
o Overview: An open-source unified analytics engine for large-scale data
processing.
o Key Features: Fast data processing, support for real-time analytics.
o Use Cases: Big data processing, ETL (Extract, Transform, Load) tasks.

Visualization Tools

 Matplotlib:
o Overview: A Python plotting library for creating static, animated, and
interactive visualizations.
o Key Features: Versatile plotting capabilities.
o Use Cases: Creating graphs and plots to visualize data and results.
 Seaborn:
o Overview: A Python library based on Matplotlib that provides a high-level
interface for drawing attractive statistical graphics.
o Key Features: Built-in themes and colour palettes.
o Use Cases: Statistical data visualization.
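
To show how these tools fit together, here is a minimal sketch that cleans a dataset with Pandas and visualizes it with Matplotlib and Seaborn; the file name and column names are hypothetical assumptions, not from the internship project.

Code: -

# Minimal sketch: clean a (hypothetical) pollution CSV with Pandas,
# then visualize it with Matplotlib and Seaborn.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("air_quality.csv")          # hypothetical file
df = df.dropna(subset=["so2", "no2"])        # drop rows missing key pollutants
df["so2"] = df["so2"].clip(lower=0)          # remove impossible negative readings

sns.histplot(df["so2"], bins=30)             # distribution of SO2 readings
plt.xlabel("SO2 concentration")
plt.show()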

14
4.2 Emerging Technologies

 AutoML:
o Overview: Automated Machine Learning that simplifies the process of building
machine learning models.
o Key Features: Model selection, hyperparameter tuning automation.
o Use Cases: Making machine learning accessible to non-experts.
 Explainable AI (XAI):
o Overview: Techniques to make machine learning models more interpretable
and understandable.
o Key Features: Model transparency, trust-building.
o Use Cases: Ensuring model decisions are understandable and trustworthy.

1. Data Preparation and Cleaning

 Technology/Tools: Pandas, NumPy, Apache Spark


 Description: This involves cleaning and transforming raw data into a format suitable for
analysis. Tools and libraries help handle missing values, normalize data, and perform
other preprocessing tasks.

2. Model Training and Evaluation

 Technology/Tools: Scikit-learn, TensorFlow, PyTorch, XGBoost


 Description: This phase includes training various ML models and evaluating their
performance using metrics such as accuracy, precision, recall, F1-score, and
AUC-ROC. Tools facilitate building, training, and fine-tuning models.

3. Visualization and Analysis

 Technology/Tools: Matplotlib, Seaborn, Plotly, Tableau, Power BI


 Description: Visualization tools help in understanding model performance and the
underlying data. They can display metrics, confusion matrices, feature importance, and
more, helping to convey insights effectively.

15
Tools
4.3 Visual Studio Code (VS Code):
VS Code is a popular, open-source code editor developed by Microsoft. It is
lightweight, highly customizable, and supports a wide range of programming
languages and frameworks through extensions.

 Image:

4.4 Sublime Text

 Features: Lightweight, fast, supports various programming languages, and offers a


powerful search and multi-editing feature.
 Extensions: Package Control for managing plugins, and various packages for Python
and data science.

16
Jupyter Notebook

 Creating a detailed and effective machine learning report in a


Jupyter Notebook involves several steps, each designed to make
your analysis transparent, reproducible, and insightful. Jupyter
Notebooks are ideal for this purpose because they allow you to
combine code, visualizations, and narrative text in a single,
interactive document. Here’s a comprehensive guide on how to
create a machine learning report in a Jupyter Notebook

 Best Practices

1. Clear Documentation: Use Markdown cells to clearly explain each step and
provide context.
2. Interactive Elements: Leverage interactive widgets or plots to make the notebook
engaging.
3. Reproducibility: Ensure all code cells are self-contained and executable in
sequence.
4. Visualization: Include meaningful plots and charts to visually represent data and
results.
5. Comments: Add comments in code cells to explain what each part of the code is
doing

17
 Machine learning with Python is highly popular due to its rich ecosystem of
libraries and frameworks that simplify the implementation of machine
learning algorithms. Python’s ease of use, extensive libraries, and strong
community support make it a preferred language for machine learning
projects. Here’s an overview of how to use Python for machine learning,
including key libraries, concepts, and a basic workflow.

Key Python Libraries for Machine Learning

1. NumPy
o Description: Fundamental package for scientific computing. It provides
support for large, multi-dimensional arrays and matrices, along with a
collection of mathematical functions.
o Usage: Data manipulation, numerical operations.

2. Pandas

o Description: Provides data structures and functions needed to work on


structured data seamlessly. It includes DataFrame and Series objects for
handling data.
o Usage: Data manipulation, analysis, and cleaning

3. Matplotlib

o Description: Comprehensive library for creating static, animated, and


interactive visualizations in Python.
o Usage: Plotting and visualizing data.

4. Seaborn

o Description: Based on Matplotlib, it provides a high-level interface for


drawing attractive statistical graphics.
o Usage: Advanced visualization and plotting

 Supervised learning

o Supervised learning is a fundamental approach in machine learning, where the

goal is to learn from a labeled dataset to make predictions or decisions based
on new, unseen data. Here's a more detailed look at supervised learning in
machine learning.

18
Key Concepts in Supervised Learning

Labeled Data:

o Input Features: These are the variables or attributes used to make predictions.
For example, in a dataset predicting house prices, input features might include
square footage, number of bedrooms, and location.
o Output Labels: These are the known results or targets associated with each set
of input features. For example, in the house price prediction case, the label
would be the actual price of the house.
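
To make the features-and-labels idea concrete, here is a minimal sketch in the spirit of the house-price example above; the numbers are made up for illustration.

Code: -

# Minimal supervised-learning sketch: features X (square footage, bedrooms)
# paired with known labels y (house prices). All values are made up.
from sklearn.linear_model import LinearRegression

X = [[1400, 3], [1600, 3], [1700, 4], [1875, 4], [2350, 5]]  # input features
y = [245000, 312000, 279000, 308000, 405000]                 # output labels

model = LinearRegression().fit(X, y)          # learn from labeled examples
print(model.predict([[2000, 4]]))             # predict price of an unseen house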

 Unsupervised learning
o Unsupervised learning is a type of machine learning where the algorithm is
trained on data that is not labeled, meaning the model learns patterns and
relationships without any predefined categories or outputs. Here’s a detailed
overview

Key Concepts in Unsupervised Learning

1. Unlabelled Data:
o Input Features: Data used in unsupervised learning doesn’t come with labels
or target values. Instead, the algorithm tries to discover the structure or patterns
within the data on its own.
2. Objectives:
o Clustering: Grouping similar data points together based on their features. For
example, clustering customers based on purchasing behaviour or segmenting
images into different categories.
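
As a minimal sketch of clustering, the snippet below groups unlabeled toy points with KMeans; no labels are provided, only features.

Code: -

# Minimal unsupervised-learning sketch: KMeans groups unlabeled points
# into clusters based only on their features (toy 2-D data).
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])     # no labels provided

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)                          # discovered cluster assignments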

19
 Reinforcement learning

o Reinforcement learning (RL) is a type of machine learning where an agent


learns to make decisions by interacting with an environment to maximize a
reward. Unlike supervised learning, where the model is trained on labeled data,
reinforcement learning involves learning from the consequences of actions,
typically through trial and error. Here’s a comprehensive overview

Key Concepts in Reinforcement Learning

1. Agent:
o The entity that makes decisions and performs actions in an environment. The
goal of the agent is to learn the best strategies to achieve the highest
cumulative reward.

2. Environment:

o The context or setting in which the agent operates. The environment responds
to the agent's actions and provides feedback in the form of rewards or
penalties.

3. Reward:

o A numerical value received by the agent after performing an action in a state.
Rewards provide feedback on the desirability of actions and guide the learning
process.

Applications of Reinforcement Learning

 Game Playing: RL has been used to develop agents that can play games at
superhuman levels, such as AlphaGo for Go and OpenAI Five for Dota 2.
 Robotics: Learning to control robots for tasks like grasping objects, navigation, and
manipulation through interactions with the physical world.
 Autonomous Vehicles: Training self-driving cars to navigate and make driving
decisions based on sensor inputs and environmental conditions.
 Recommendation Systems: Optimizing recommendation strategies based on user
interactions and feedback to improve engagement and satisfaction.
 Finance: Algorithmic trading strategies and portfolio management that adapt to
changing market conditions.
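
To make the agent–environment–reward loop concrete, here is a minimal, self-contained Q-learning sketch on a toy one-dimensional corridor; it is an entirely hypothetical example, not part of the internship project.

Code: -

# Minimal Q-learning sketch: an agent on a 1-D corridor of 5 cells learns
# to walk right to reach a reward at the last cell. Toy example only.
import random

n_states, actions = 5, [-1, +1]               # move left or right
Q = [[0.0, 0.0] for _ in range(n_states)]     # Q[state][action index]
alpha, gamma, epsilon = 0.5, 0.9, 0.1

for episode in range(200):
    s = 0                                      # start at the left end
    while s != n_states - 1:
        if random.random() < epsilon:
            a = random.randrange(2)            # explore: random action
        else:
            a = max((0, 1), key=lambda i: Q[s][i])   # exploit: best action
        s2 = min(max(s + actions[a], 0), n_states - 1)   # environment step
        r = 1.0 if s2 == n_states - 1 else 0.0           # reward signal
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])  # TD update
        s = s2

# learned policy: action index 1 ("move right") for every non-terminal state
print([max((0, 1), key=lambda i: Q[s][i]) for s in range(n_states - 1)])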

20
Scikit-Learn
o Scikit-learn is a popular and widely used library in Python for machine
learning. It provides simple and efficient tools for data analysis and modelling,
built on top of other scientific libraries like NumPy, SciPy, and Matplotlib.
Here’s an overview of Scikit-learn and its key features

Key Features of Scikit-learn

 Classification: Algorithms for predicting categorical outcomes. Examples include logistic


regression, decision trees, random forests, support vector machines (SVM), and k-nearest
neighbours (KNN).

 Regression: Algorithms for predicting continuous outcomes. Examples include linear


regression, ridge regression, and lasso regression.

 Model Selection: Tools for selecting and tuning models. Examples include grid search,
cross-validation, and various metrics for evaluating model performance.

 Preprocessing: Functions for scaling, normalizing, and transforming data. Examples include
standard scaling, normalization, and encoding categorical variables.

Ease of Use:

 Consistent API: Scikit-learn uses a consistent and simple API for all models and tools,
making it easy to switch between different algorithms and techniques. Most classes
follow a common pattern for initialization, fitting, and predicting.
 Documentation: Comprehensive and well-organized documentation, including user
guides, examples, and API references, making it accessible to both beginners and
experienced users.

Applications of Scikit-learn

 Data Preprocessing: Handling missing values, encoding categorical features, and


scaling features.

 Model Selection: Using techniques like cross-validation and hyperparameter tuning to


select the best model and parameters.

 Feature Selection and Engineering: Identifying important features and transforming data
for better model performance.
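
As a minimal sketch of Scikit-learn's consistent API and the preprocessing and model-selection features above, the snippet below chains a scaler and a classifier and evaluates them with cross-validation on a bundled toy dataset.

Code: -

# Minimal sketch of Scikit-learn's fit/predict API with preprocessing
# and cross-validation, on the bundled iris dataset.
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
scores = cross_val_score(pipe, X, y, cv=5)    # 5-fold cross-validation
print(scores.mean())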

21
Anaconda Navigator

o Anaconda is a popular open-source distribution of the Python and R


programming languages, specifically designed for scientific computing, data
analysis, and machine learning. It simplifies the management and deployment
of various data science and machine learning tools by providing a
comprehensive platform that includes package management, environment
management, and pre-installed libraries. Here’s an overview of Anaconda and
its key feature

Key Features of Anaconda

Package Management:

o Conda: Anaconda includes conda, a powerful package manager that


handles the installation, updating, and removal of packages and their
dependencies. Conda works with Python and R packages and is not
limited to just those languages.
o Wide Range of Packages: Anaconda comes with over 1,500
popular data science and machine learning packages pre-installed,
including NumPy, pandas, SciPy, Scikit-learn, TensorFlow, and
more.

 Image:

22
 Image:

Jupyter Notebooks:

 Interactive Computing: Anaconda includes Jupyter Notebook, a web-based


interactive computing environment where users can create and share documents
containing live code, equations, visualizations, and narrative text.

Installation:

 Download: Visit the Anaconda website and download the installer appropriate for your
operating system.
 Install: Follow the installation instructions provided on the website. The process
typically involves running the installer and following the prompts.

Using Anaconda Navigator:

 Launch: Open Anaconda Navigator from your applications menu or start menu.
 Manage Environments: Use the "Environments" tab to create, clone, and delete
environments.
 Install Packages: Use the "Environments" tab to install new packages into a specific
environment.
 Launch Applications: Use the "Home" tab to launch applications like Jupyter
Notebook or Spyder.

23
o Anaconda is a powerful and user-friendly tool for managing data science and
machine learning workflows, making it a popular choice among researchers,
data scientists, and developers.

Excel files
 If you're looking to work with machine learning using data from Excel files
and want to know about handling that data for computer vision tasks (often
abbreviated as CV for Computer Vision), here's a step-by-step guide to get
you started.

Understanding the Context

Computer Vision (CV) involves tasks where machines interpret and make decisions based on
visual input, such as images or videos. Common tasks include image classification, object
detection, and image segmentation.

Excel Data typically consists of tabular data and might not directly represent image data.
However, Excel files can include metadata or labels that could be relevant for machine
learning tasks involving images.

Summary

 Extract Data: Use libraries like pandas to handle Excel data.


 Prepare Data: Integrate image paths and labels from Excel, and pre-
process your images.
 Build Model: Use machine learning frameworks to create and train your
computer vision model.

24
Extracting Data from Excel

Before you use Excel data for machine learning, you need to extract and pre-process it. Python
is a popular choice for this, particularly with libraries like pandas.

 Load Data: Use pandas to read and inspect your Excel data.
 Pre-process: Handle missing values, encode categorical variables, and scale features.
 Split Data: Divide your data into training and testing sets.
 Model Building: Train and evaluate a machine learning model using libraries like Scikit-
learn.

 Save/Load Model: Persist your model using joblib.
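
As a minimal sketch of the loading, preprocessing, and splitting steps above; the file name, sheet name, and column names are hypothetical assumptions.

Code: -

# Minimal sketch of the Excel workflow: load, clean, encode, split.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_excel("labels.xlsx", sheet_name="Sheet1")   # load Excel data
df = df.dropna()                                          # handle missing values
df["label"] = df["label"].astype("category").cat.codes   # encode categories

train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
print(train_df.shape, test_df.shape)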

o Classification algorithms use predictive calculations to assign data to preset

categories. Classification algorithms are trained on input data and used to
answer questions such as which category a new observation belongs to.
o Linear regression algorithms show or predict the relationship between two
variables or factors by fitting a continuous straight line to the data. The line is
often calculated by minimizing a squared-error cost function. Linear regression
is one of the most popular types of regression analysis.
o Logistic regression algorithms fit a continuous S-shaped curve to the data.
Logistic regression is another popular type of regression analysis.
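
As a minimal sketch contrasting the two regression types described above, on made-up one-dimensional data:

Code: -

# Linear regression fits a straight line to a continuous target;
# logistic regression fits an S-shaped curve to a binary target.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.arange(10).reshape(-1, 1)
y_cont = 2 * X.ravel() + 1                    # continuous target: straight line
y_cls = (X.ravel() > 4).astype(int)           # binary target: class boundary

print(LinearRegression().fit(X, y_cont).predict([[12]]))      # about 25
print(LogisticRegression().fit(X, y_cls).predict_proba([[5]]))  # class probabilities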
 Image:

25
5
INTRODUCTION TO INTERNSHIP & PROJECT

5.1. Internship Overview:

Industry: BrainyBeam, an innovative web and mobile app development company.
Program Focus: Practical exposure to industry practices and technical skill enhancement.
Mentor: Mr. Sagar Jasani
Objective: Gain hands-on experience in machine learning and application development.
Activities: Collaborated closely with faculty, engaged in various tasks, and utilized both
institutional and personal resources.

5.2. Project Overview:

Project: An Air Quality Index (AQI) project using machine learning involves
developing models and systems to predict or analyze air quality based on various data
inputs. Here's a detailed breakdown of how you might approach such a project.

 Image:

26
5.3. Project Objectives:

 Develop a Predictive Model for AQI

Objective: Create a machine learning model that accurately predicts AQI values based on
historical data and real-time inputs.

 Implement a Classification System for Air Quality Levels

Objective: Develop a classification model that categorizes air quality into predefined
categories (e.g., Good, Moderate, Unhealthy, Hazardous).

 Enhance Real-Time Air Quality Monitoring

Objective: Build a real-time monitoring system that uses machine learning to process live
data and provide current AQI predictions and alerts.

 Identify and Analyze Pollution Sources

Objective: Use machine learning to identify and analyze sources of air pollution and their
impact on AQI.

 Optimize Data Handling and Preprocessing

Objective: Develop methods to handle missing, noisy, or incomplete data effectively, and
improve the quality of data used in the machine learning models.

 Develop User-Friendly Tools for AQI Visualization

Objective: Create user-friendly tools or applications that present AQI predictions and data in
an accessible and understandable format.

 Predict AQI values using historical and real-time data.


 Classify air quality into categories (e.g., Good, Unhealthy).
 Implement a real-time monitoring and alert system.
 Analyze sources of pollution and their impact on AQI.
 Enhance data handling and preprocessing methods.

27
5.4. Project Phases:

 In a machine learning project focused on predicting or analyzing the Air

Quality Index (AQI), the phases typically follow a structured workflow.
Here's an outline of the key phases.

Data Collection

 Data Sources: Identify and gather data from sources like air quality monitoring
stations, weather data, traffic data, and other environmental factors.
 Data Acquisition: Use APIs, public datasets, or build web scrapers if needed to collect
the necessary data.

Model Deployment

 Integration: Deploy the model into a production environment where it can make real-
time predictions or analyses. This might involve integrating with a web service or API.
 Monitoring: Continuously monitor the model’s performance to ensure it remains
accurate and relevant. Set up alerts for any significant deviations in performance.

Dataset .csv

28
Features:

Basic Air Quality Measurements

 Pollutant Concentrations: Levels of specific pollutants that contribute to AQI, such


as:
o Particulate Matter (PM2.5): Concentration of fine particles with a diameter of
2.5 micrometres or smaller.
o Particulate Matter (PM10): Concentration of particles with a diameter of 10
micrometres or smaller.
o Nitrogen Dioxide (NO2): Levels of NO2, a gas that can contribute to smog.
o Ozone (O3): Levels of ground-level ozone, which affects respiratory health.
o Sulfur Dioxide (SO2): Concentration of SO2, which can cause acid rain.
o Carbon Monoxide (CO): Levels of CO, which is harmful in high
concentrations.

Historical Data

 Previous AQI Values: Historical AQI data for the same location can help identify
trends and patterns.
 Historical Pollutant Levels: Past pollutant concentrations can provide context for
current levels.

AQI Range   Category      Health Effects
0–50        Good          Good air quality.
51–100      Satisfactory  Low air pollution and no ill effects on health.
101–200     Moderate      Moderate pollution; breathing discomfort to people with
                          lung disease, asthma, and heart disease.
201–300     Poor          Mild aggravation of symptoms among high-risk persons, like
                          those with heart or lung disease.
301–400     Very poor     Significant aggravation of symptoms and decreased exercise
                          tolerance in persons with heart or lung disease.
401–500     Severe        Severe aggravation of symptoms; endangers health.

29
Testing and Deployment in a Machine Learning Project for Air Quality Index
(AQI)

o Once you've developed and trained your machine learning model for predicting or
analyzing the Air Quality Index (AQI), the next critical steps are testing and
deployment. Here’s a detailed breakdown of each phase

Testing
1. Model Evaluation

 Performance Metrics: Assess how well your model performs using metrics relevant
to your task:
o Regression Tasks: Mean Absolute Error (MAE), Root Mean Squared Error
(RMSE), R-squared.
o Classification Tasks: Accuracy, Precision, Recall, F1 Score, ROC-AUC.
 Cross-Validation: Implement cross-validation to ensure that your model’s
performance is robust and not overfitted to a specific subset of the data.
 Error Analysis: Examine the types of errors your model makes. For instance, does it
consistently overestimate or underestimate AQI? This analysis can help identify
patterns in prediction errors and guide further improvements.
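
As a minimal sketch of the regression metrics listed above, computed on made-up actual versus predicted AQI values:

Code: -

# Compute MAE, RMSE, and R-squared on hypothetical AQI predictions.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([120, 85, 240, 60])         # hypothetical observed AQI
y_pred = np.array([110, 90, 250, 75])         # hypothetical model predictions

print("MAE: ", mean_absolute_error(y_true, y_pred))
print("RMSE:", np.sqrt(mean_squared_error(y_true, y_pred)))
print("R^2: ", r2_score(y_true, y_pred))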

Deployment
1. Model Integration

 API Development: Create an API (Application Programming Interface) if you need


the model to be accessed by other systems or applications. This allows for seamless
integration and real-time predictions.
 Integration with Systems: Integrate the model with existing systems or platforms
where AQI predictions or analyses are needed, such as environmental monitoring
dashboards, mobile apps, or web applications.

▪ Testing and deployment are critical phases that ensure your machine learning model
for AQI performs well in real-world scenarios and integrates seamlessly with other
systems. Thorough testing and careful deployment help maintain the model’s
accuracy, reliability, and utility, while ongoing monitoring and maintenance ensure
that it continues to meet the needs of users and stakeholders over time.

30
5.5 Purpose of the Internship

The primary goal of this internship is to provide hands-on experience in applying machine
learning techniques to real-world problems, specifically in analyzing and predicting the Air
Quality Index (AQI). This involves utilizing Python to develop, test, and deploy machine
learning models that can effectively predict or analyze AQI levels based on various
environmental and meteorological factors. Here’s a detailed breakdown

Skill Development

 Programming Skills: Enhance Python programming skills, particularly in data


manipulation (using libraries like Pandas and NumPy), visualization (using Matplotlib
and Seaborn), and machine learning (using Scikit-learn and other relevant libraries).
 Data Science Techniques: Learn and apply key data science techniques such as
feature engineering, model evaluation, cross-validation, and hyperparameter tuning.

Professional Networking and Collaboration

 Teamwork: Collaborate with experienced data scientists, machine learning engineers,


and domain experts, gaining exposure to professional practices and workflows.
 Mentorship: Receive guidance and feedback from mentors, helping to refine technical
skills and career objectives.

The purpose of this internship is to immerse you in the application


of machine learning for analyzing and predicting AQI using Python. By
engaging in practical projects, developing technical skills, and gaining
industry insights, you will contribute to environmental solutions while
preparing for a career in data science and machine learning

31
6
INTERNSHIP ACTIVITY AND PLANNING

The work plan was designed to systematically cover the different
components of the Python library ecosystem over a period of six
weeks, ensuring a comprehensive learning experience.

6.1 Internship Overview

This section provides an overview of my internship
experience at BrainyBeam. The internship was a structured program
designed to offer practical exposure to industry practices and enhance
my technical skills. During this period, I was guided by Mr. Sagar
Jasani, whose mentorship was instrumental in navigating the various
challenges and learning opportunities presented.

The primary objective of the internship was to gain
hands-on experience in application development, with a focus on
applying theoretical concepts in a real-world setting. The program
involved working closely with mentors, participating in various tasks,
and utilizing the resources available at BrainyBeam to achieve my
learning goals.

32
6.2 Internship Report weekly

This chapter outlines the detailed planning and strategy involved in

executing the project and internship, with a week-by-week description of the six weeks.

 Week-1 Dates: 27/06/2024 - 03/07/2024

Introduction to Python Libraries


 Python is a powerful language for machine learning (ML), thanks
in large part to its rich ecosystem of libraries. Here’s an
introduction to some of the key Python libraries used in machine
learning.

1. NumPy

 Purpose: Fundamental package for numerical computing in Python.


 Features: Provides support for large, multi-dimensional arrays and
matrices, along with a collection of mathematical functions to operate on
these arrays.
 Usage: Often used for data manipulation and pre-processing, and as a
foundation for other libraries.

1. Creating Arrays
 np.array(object, dtype=None, copy=True)
o Creates a NumPy array from a Python list or tuple.
o Example: arr = np.array([1, 2, 3])

Array Operations
 np.reshape(a, newshape)
o Gives a new shape to an array without changing its data.
o Example: reshaped_arr = np.reshape(np.arange(6), (2, 3))

33
Jupyter Notebook
Jupyter Notebook is an open-source web application that allows you to create and
share documents containing live code, equations, visualizations, and narrative text. It's
especially popular in data analysis, machine learning, and academic research due to its
ability to mix code execution with rich text and visualizations.

Using pip: -

1. Install Jupyter:

Code: -

pip install notebook

 Image:

Basic Usage

1. Starting Jupyter Notebook:


o Run the command jupyter notebook in your command line or terminal.
o This will open a new tab in your default web browser showing the Jupyter
dashboard.
2. Creating a New Notebook:
o In the Jupyter dashboard, click on the "New" button and select "Python 3" (or
another kernel if installed).

34
Advanced Features

1. Interactive Widgets:
o Use the ipywidgets library to create interactive widgets like sliders
and buttons.
2. Extensions:
o Install Jupyter extensions for additional functionality, such as
jupyter_contrib_nbextensions.
3. JupyterLab:
o JupyterLab is the next-generation interface for Jupyter, offering a
more flexible and powerful environment.

Jupyter Notebook is a powerful tool for data analysis, machine learning, and education,
providing an interactive and versatile environment for coding and documentation.

 Image:

35
 Week-2 Dates: 04/07/2024 - 10/07/2024

Pandas Library
Pandas is a powerful and versatile library for data manipulation and analysis in
Python. It provides data structures and functions needed to work with structured data
seamlessly. Here’s a detailed overview of Pandas, including its key features, core data
structures, and common functions.

Key Features

1. Data Structures:
o DataFrame: A 2D labeled data structure with columns of potentially different
types.
o Series: A 1D labeled array capable of holding any data type.
2. Data Manipulation:
o Data Cleaning: Handling missing data, duplicate removal, and data
transformation.
o Data Aggregation: Grouping, pivot tables, and summary statistics.
o Data Alignment: Automatic and explicit alignment of data.
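
As a minimal sketch of the two core structures and a simple cleaning and aggregation step, with made-up values:

Code: -

# A Series is a 1-D labeled array; a DataFrame is a 2-D labeled table.
import pandas as pd

s = pd.Series([10, 20, 30], name="so2")                  # 1-D labeled array
df = pd.DataFrame({"state": ["A", "A", "B", "B"],
                   "so2":   [12.0, None, 30.5, 28.0]})   # 2-D labeled table

df["so2"] = df["so2"].fillna(df["so2"].mean())           # handle missing data
print(df.groupby("state")["so2"].mean())                 # aggregation by group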

 Image:

36
 Data Input and Output:

 Reading/Writing Data: Support for various file formats like CSV, Excel, SQL, and
JSON.

 Data Analysis:

 Statistical Analysis: Mean, median, variance, and more.


 Data Visualization: Integration with Matplotlib for plotting

 Image:

DataFrame
A DataFrame is a core data structure in the Pandas library, designed for handling
and analyzing structured data in Python. It's akin to a spreadsheet or SQL table and
allows you to store and manipulate tabular data efficiently. Here's an in-depth look at
DataFrames, including their creation, basic operations, and common methods.

DataFrames are a central part of data manipulation and analysis in Pandas. They
provide a flexible, intuitive interface for working with structured data, allowing you to
perform a wide range of operations, from data cleaning and transformation to analysis
and visualization. Understanding how to use DataFrames effectively is crucial for data
analysis tasks in Python.

37
 Week-3 Dates: 11/07/2024 - 17/07/2024

Normalization

Normalization in machine learning is a crucial preprocessing step that can

significantly affect the performance of your models. It involves scaling numerical
features so that they contribute equally to the learning process. Here's a comprehensive
overview of normalization.

o Normalization is a technique used to adjust the scales of numerical features in a


dataset to a common scale without distorting differences in the ranges of values.
This is important because many machine learning algorithms, especially those that
rely on distance measures (e.g., k-nearest neighbours, support vector machines) or
gradient-based optimization (e.g., neural networks), perform better when features
are on a similar scale.
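
As a minimal sketch of one common technique for this scaling step, min-max normalization rescales each feature to the range [0, 1]; the pollutant values below are made up.

Code: -

# Min-max normalization: rescale raw readings to the [0, 1] range.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[40.0], [80.0], [380.0], [800.0]])   # raw SO2 readings
X_scaled = MinMaxScaler().fit_transform(X)         # rescaled to [0, 1]
print(X_scaled.ravel())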

 Image:

38
 Week-4 Dates: 18/07/2024 - 24/07/2024

Linear regression math


Linear Regression is one of the most fundamental and widely used algorithms in
machine learning and statistics. It is used for modelling the relationship between a dependent
variable and one or more independent variables. Linear regression aims to find the best-fitting
line through the data points, minimizing the difference between the observed values and the
values predicted by the model.

 Image:

Key Concepts

Simple Linear Regression: This involves predicting a dependent variable Y using a
single independent variable X. The relationship is modelled with a straight line.
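
To make this concrete, the standard form of the model and its squared-error objective can be written as:

Y = β₀ + β₁X + ε

where the coefficients β₀ (intercept) and β₁ (slope) are chosen to minimize the sum of squared errors, Σᵢ (yᵢ − (β₀ + β₁xᵢ))².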

39
 Week-5 Dates: 25/07/2024 - 31/07/2024

Matplotlib Pyplot
Matplotlib is a comprehensive library for creating static, interactive, and animated
visualizations in Python. The pyplot module, a part of Matplotlib, provides a MATLAB-
like interface for creating a wide range of plots and figures. It is designed to work seamlessly
with NumPy arrays and Pandas DataFrames, making it a powerful tool for data analysis and
visualization.

Basic Concepts

1. Figure and Axes:


o A Figure is the entire window or page where the plot will be drawn. It
contains one or more Axes objects.
o An Axes is a single plot within a figure. It contains the data, labels, and other
plot elements.
2. Plot Elements:
o Line: Represents a continuous curve, useful for plotting trends.
o Scatter: Represents individual data points.
o Bar: Represents data in bar chart form, suitable for categorical data.
o Histogram: Represents the distribution of data.
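
As a minimal sketch of the figure/axes concepts above, with one figure, one axes, and a line plot and scatter plot of made-up data:

Code: -

# One Figure containing one Axes, with a line and a scatter plot.
import matplotlib.pyplot as plt

fig, ax = plt.subplots()                       # Figure containing one Axes
x = [1, 2, 3, 4, 5]
ax.plot(x, [2, 4, 6, 8, 10], label="trend")    # line: continuous curve
ax.scatter(x, [2.1, 3.9, 6.2, 7.8, 10.1])      # scatter: individual points
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
plt.show()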

 Image:

40
 Week-6 Dates: 01/08/2024 - 07/08/2024

Project Submission

Deployment

 Create a Web Application: Use frameworks like Flask or Django to build a web app
where users can input data and receive AQI predictions or classifications.
 API Integration: Develop an API that provides real-time AQI information and
predictions.
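
As a minimal Flask sketch of this deployment idea; the model file name, route, and feature list are hypothetical assumptions, not the project's actual deployment.

Code: -

# Hypothetical Flask service exposing an AQI prediction endpoint.
from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)
model = joblib.load("aqi_model.joblib")        # hypothetical saved model

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]  # e.g. [so2, no2, rspm, spm]
    aqi = model.predict([features])[0]
    return jsonify({"predicted_aqi": float(aqi)})

if __name__ == "__main__":
    app.run(debug=True)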

41
7

PROJECT OVERVIEW

7.1 Project Objective

When working on a project related to the Air Quality Index (AQI) using machine learning,
your objective would generally be to leverage machine learning techniques to analyze and
predict air quality. Here’s a detailed breakdown of a typical project objective

Library import

Code: -
# import all the necessary libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# import folium
# import json
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# from sklearn.metrics import mean_squared_log_error
from sklearn.metrics import r2_score, mean_squared_error
# from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from collections import defaultdict
pd.options.mode.chained_assignment = None  # default='warn'
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")

42
Import Data

Load the CSV File: -

Use the pd.read_csv() function to load the CSV file into a DataFrame. Replace
"your_file.csv" with the path to your CSV file.

data = pd.read_csv("dataset.csv", low_memory=False)

43
NumPy Library

NumPy is a fundamental library for numerical computing in Python, widely used in data
science, machine learning, and scientific computing. It provides support for large, multi-
dimensional arrays and matrices, along with a collection of mathematical functions to
operate on these arrays.

Pandas Library

Pandas is a powerful and versatile library in Python for data manipulation and analysis.
It’s widely used in data science and machine learning projects due to its ease of use and
efficiency in handling large datasets. Here's an overview of how you can use Pandas in the
context of an air quality index project

Matplotlib Library

Matplotlib is a widely-used plotting library for Python that provides a variety of ways
to create static, animated, and interactive visualizations. It is especially useful for
visualizing data, exploring trends, and presenting results. Here’s a guide on how to use
Matplotlib effectively, particularly in the context of an air quality index project.

Types
a = list(data['type'])
for I in range (0, Len(data)):
if str(a[I][0]) == 'R' and a[I][1] == 'e':
a[I] = 'Residential'
elif str(a[I][0]) == 'I':
a[I] = 'Industrial'
else:
a[i] = 'Other'
#the above code takes all the different types and changes them into 3 types (RESIDENTIAL,
INDUSTRIAL, OTHER)
data['type'] = a data['type']
value counts ()

44
g = sns.catplot(y="type", kind="count", palette="pastel", data=data, orient="h")

g.set_axis_labels("Count", "Type of Area")

plt.show()

Pie chart

# Pie chart
labels = ['Residential', 'Industrial', 'Other']
# colours
colors = ['#ff9999', '#66b3ff', '#99ff99']
explode = (0.02, 0.02, 0.1)
fig1, ax1 = plt.subplots()
# df.per is assumed to hold the percentage share of each area type,
# computed earlier from data['type'].value_counts()
ax1.pie(df.per, colors=colors, labels=labels, explode=explode, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal')
plt.tight_layout()
plt.show()

45
Catplot SO2

so2 = data[['so2', 'state']].groupby(['state']).median().sort_values("so2", ascending=False)

ax = sns.catplot(x="so2", y=so2.head(10).index.tolist(), data=so2.head(10), kind="bar",
                 palette="flare", height=12)
plt.xticks(rotation=90)
plt.show()

no2
no2 = data[['no2', 'state']].groupby(['state']).median().sort_values("no2",
                                                                     ascending=False)
ax = sns.catplot(x="no2", y=no2.head(10).index.tolist(), data=no2.head(10),
                 palette="flare", kind="bar", height=12)
plt.xticks(rotation=90)
plt.show()

46
PM10 = data[['RSPMi', 'state']].groupby(['state']).median().sort_values("RSPMi",
                                                                        ascending=False)
ax = sns.catplot(x="RSPMi", y=PM10.head(10).index.tolist(), data=PM10.head(10),
                 palette="flare", kind="bar", height=12)
plt.xticks(rotation=90)
plt.show()

spm = data[['spm', 'state']].groupby(['state']).median().sort_values("spm",
                                                                     ascending=False)
ax = sns.catplot(x="spm", y=spm.head(10).index.tolist(), data=spm.head(10),
                 kind="bar", palette="flare", height=12)
plt.xticks(rotation=90)
plt.show()

47
pm2_5 = data[['pm2_5', 'state']].groupby(['state']).median().sort_values("pm2_5",
                                                                         ascending=False).head(10)

ax = sns.catplot(x="pm2_5", y=pm2_5.head(10).index.tolist(), data=pm2_5.head(10),
                 kind="bar", palette="flare", height=12)

plt.xticks(rotation=90)

plt.show()

48
Pair Plot SO2,NO2,RSPM,SPM,Pm_2.5

cols = ['so2', 'no2', 'rspm', 'spm', 'pm2_5']

sns.pairplot(data[cols], height=2.5)

plt.show()

49
Heatmap Pivot

# Heatmap pivot with state as row, year as column, no2 as value

f, ax = plt.subplots(figsize=(10, 10))

ax.set_title('{} by state and year'.format('no2'))

sns.heatmap(data.pivot_table('no2', index='state', columns=['year'],
                             aggfunc='median', margins=True),
            annot=True, cmap="YlGnBu", linewidths=1, ax=ax,
            cbar_kws={'label': 'Annual Average'})

plt.show()

50
CALCULATING AQI

def cal_SOi(so2):
    # piecewise-linear AQI sub-index for SO2, based on concentration breakpoints
    si = 0
    if so2 <= 40:
        si = so2 * (50 / 40)
    elif so2 > 40 and so2 <= 80:
        si = 50 + (so2 - 40) * (50 / 40)
    elif so2 > 80 and so2 <= 380:
        si = 100 + (so2 - 80) * (100 / 300)
    elif so2 > 380 and so2 <= 800:
        si = 200 + (so2 - 380) * (100 / 420)
    elif so2 > 800 and so2 <= 1600:
        si = 300 + (so2 - 800) * (100 / 800)
    elif so2 > 1600:
        si = 400 + (so2 - 1600) * (100 / 800)
    return si

data['SOi'] = data['so2'].apply(cal_SOi)

51
Heatmap: AQI by State and Year

f, ax = plt.subplots(figsize=(10, 5))

ax.set_title('{} by state and year'.format('AQI'))

sns.heatmap(data.pivot_table('AQI', index='state', columns=['year'],
                             aggfunc='median', margins=True),
            cmap="YlGnBu", linewidths=0.5, ax=ax,
            cbar_kws={'label': 'Annual Average'})

plt.show()

52
Heat Map

corrmat = data[['SOi', 'Noi', 'RSPMi', 'SPMi', 'PMi', 'AQI']].corr()

f, ax = plt.subplots(figsize=(15, 10))

sns.heatmap(corrmat, vmax=1, square=True, annot=True)

plt.show()

53
Linear Regression

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=101)

LR = LinearRegression()

LR.fit(X_train, y_train)

LR.intercept_

5.3231677877565176

LR.predict(X_test)

array([191.22302644, 167.4804519, 375.16302728, ..., 122.82011369,
       153.95691479, 135.67699022])

predictions = LR.predict(X_test)

plt.scatter(y_test, predictions)
plt.xlabel('Y Test')
plt.ylabel('Predicted Y')

54
LR.score(X_test, y_test)

0.9794279918263086

print('R^2 Score: %.2f' % r2_score(y_test, predictions))
print('MSE: %.2f' % mean_squared_error(y_test, predictions))

R^2 Score: 0.98
MSE: 12.49

55
8

CONCLUSION
The internship program has been an invaluable experience, providing me with a
comprehensive understanding of Python libraries and machine learning. Over the course of
the internship, I have deepened my knowledge of Python libraries, including core concepts
and advanced features, and applied these skills in developing a functional Air Quality Index
machine learning model.

The application of machine learning to the prediction and classification of air quality
indices has demonstrated significant potential for improving our understanding and
management of air quality. The model's performance metrics indicate that it effectively
predicts AQI values with reasonable accuracy, providing valuable insights into the key
factors influencing air quality. The analysis of feature importance reveals that meteorological
conditions and pollution sources are critical drivers of air quality variations.

The project highlights the potential of machine learning to transform environmental


monitoring by providing timely and accurate predictions that can aid in public health
advisories and policy-making. Despite promising results, there is an opportunity for further
improvement. Future work could involve refining the models with additional data sources,
exploring advanced algorithms, and ensuring the model’s robustness in diverse geographic
contexts.

I would like to express my sincere gratitude to the team at BrainyBeam, especially
Mr. Sagar Jasani, for their guidance and support throughout the internship. Their mentorship
has been invaluable in my learning journey, providing insights into the industry and fostering
my growth as a programmer.

This internship has not only improved my technical skills but also enhanced my
problem-solving abilities and communication skills. I am thankful for this opportunity and
look forward to applying what I have learned in future endeavours.

56
THE END


57
