Sub Bu Intership

This document is an internship report by Subramanya M M, submitted for the Bachelor of Computer Applications program at Bangalore University, detailing his experience as a Data Analyst intern at Disciples Corporate School. The report covers the internship's focus on AI-driven skill development, data analysis processes, and the importance of future skills in IT and ITeS sectors. It includes acknowledgments, an abstract, a company profile, and a table of contents outlining various technical topics explored during the internship.

Uploaded by

Pk gaming

BANGALORE UNIVERSITY (JNANABHARATHI)

A
Internship Report On
DATA ANALYST
Submitted in partial fulfillment for the award of
VI semester of
BACHELOR OF COMPUTER APPLICATIONS

Carried out by
SUBRAMANYA M M
Under the guidance of
Mr. PANDURANGAPPA H

Asst. Professor, Dept of BCA


DBIMSCA
DON BOSCO INSTITUTE OF MANAGEMENT STUDIES AND
COMPUTER APPLICATIONS
Kumbalagodu, Mysore Road, Bengaluru-560074
2024-25

BANGALORE UNIVERSITY

DON BOSCO INSTITUTE OF MANAGEMENT STUDIES AND


COMPUTER APPLICATIONS

CERTIFICATE OF COMPLETION OF INTERNSHIP - 2025

This is to certify that Subramanya M M, bearing U03CQ22S0023, has successfully completed
the INTERNSHIP under the supervision of Shiva Prasad K M from DISCIPLES CORPORATE
SCHOOL in partial fulfillment of the BCA VI semester course at DON BOSCO INSTITUTE OF
MANAGEMENT STUDIES AND COMPUTER APPLICATIONS, during the period of February
2025 to April 2025.

Head of the Department Principal

Prof. Pandurangappa H Dr. Tabreez Pasha

Signature of the Examiners:

1)

2)

Student Declaration

I, SUBRAMANYA M M, hereby declare that this report, entitled “DATA ANALYST”, is a
study conducted by me during the internship from 13.02.2025 to
31.03.2025 at Disciples Corporate School, in partial fulfillment of the BCA VI
semester 2024-25.

Signature

SUBRAMANYA M M
ACKNOWLEDGMENT

The success and final outcome of this internship required a lot of guidance and
assistance from many people. I am extremely fortunate to have had their support until the
completion of my internship work.
I would like to express my heartfelt gratitude to all those who have contributed to the
completion of the internship.

I hereby acknowledge, with regards and respect, the encouragement and supervision of
Shiva Prasad K M, Managing Director, Web Beta Services, Bengaluru.

I would like to express my thanks to Dr. Tabreez Pasha, Principal, DBIMSCA, Bangalore
University, for his support and encouragement.

I would like to express my thanks to Professor Pandurangappa H, HOD, Dept. of Computer
Applications, DBIMSCA, Bangalore University, especially for providing the platform
to develop skills in the presentation of the Synopsis and Project.

SUBRAMANYA M M
(U03CQ22S0023)
ABSTRACT

The rapid advancement of Artificial Intelligence (AI) is reshaping industries and
transforming the landscape of skill development and workforce training. As job roles
evolve and new opportunities emerge, there’s an increasing demand for AI-driven
solutions that can customize learning experiences, identify skill gaps, and deliver
actionable insights. This internship, focused on “AI for skilling,” explored the role of
an AI Data Scientist in developing flexible, scalable, and efficient skill-building
frameworks using AI technologies.

Throughout the internship, I engaged in various stages of the AI development process,
ranging from data collection and preprocessing to exploratory analysis, model training,
evaluation, and deployment. The project also involved integrating AI into educational
platforms by building models using real-world datasets to classify, predict, and suggest
personalized learning journeys. Additionally, I applied advanced machine learning
methods and created data visualizations to communicate insights, supporting informed
decision-making in curriculum design and learner engagement strategies.

This experience provided hands-on training with essential tools like Python, scikit-learn,
Pandas, TensorFlow, and cloud-based environments. It also emphasized the critical role
of ethical AI usage and data protection, especially in educational settings. Overall, the
internship enhanced my technical skills and deepened my understanding of how AI,
particularly through the lens of a Data Scientist, plays a key role in shaping modern,
adaptable learning ecosystems for a future-ready workforce.
COMPANY PROFILE

Disciples Corporate School


Disciples Corporate School is a professional training institution committed to providing high-quality
education and skill development in the fields of information technology, data analytics, and business
intelligence. The school aims to bridge the gap between academic knowledge and the practical demands
of the corporate world by offering industry-relevant programs designed to enhance both technical and
professional capabilities.

The institution focuses on a hands-on, application-oriented approach to learning. Its programs are
carefully structured to combine theoretical concepts with extensive lab sessions, workshops, and live
project experiences. Students are trained using the latest tools and technologies such as Python, SQL,
MS Excel, Power BI, and various data analysis frameworks, ensuring that they are industry-ready upon
completion of their courses.

In addition to technical training, Disciples Corporate School places strong emphasis on developing
essential soft skills such as teamwork, work management, professional communication, and ethical data
handling. The school regularly conducts seminars, problem-solving sessions, revision tests, and project
discussions to encourage active learning and improve students' confidence in applying their skills to
real-world business scenarios.

With a mission to nurture competent and responsible professionals, the institution continually updates its
curriculum in line with current market trends and industry expectations. Through its comprehensive
learning environment and experienced faculty, Disciples Corporate School strives to prepare students
for successful and impactful careers in the IT and data analytics industries.
TABLE OF CONTENTS

CHAPTER   CONTENT                                                 PAGE NO.

1.        Introduction to IT and ITeS                             01-04
2.        Introduction to AI and Big Data                         05-07
3.        Applications of AI and Big Data                         08-09
4.        Basics of statistics                                    10-20
5.        Data analysis using statistics and statistical tools    21-22
6.        MS Excel Lab Sessions                                   23-24
7.        Introduction to Power BI                                25-27
8.        Power BI lab and introduction to SQL                    28-30
9.        Python basics                                           31-44
10.       Conclusion and Bibliography                             45-46



Department of Computer Applications


BCA Internship Log Book

Sl.No.  Technical Topics

1       Introduction to IT and ITeS
2       Introduction to IT and ITeS and future skills
3       Future skills
4       Introduction to AI
5       Introduction to Big Data
6       Applications of AI and Big Data
7       Basics of statistics
8       Statistical equation
9       Solving problems in statistics
10      Project Discussion
11      Problem-solving session
12      Revision & Test
13      Problem-solving session
14      Applying statistics to data analysis
15      Applying statistics to data analysis
16      Data analysis using statistics and statistical tools
17      MS Excel Lab Sessions
18      Lab mock assessment
19      Introduction to Power BI
20      Power BI Lab Sessions
21      Revision & Test
22      Power BI lab and introduction to SQL
23      SQL Theory session
24      SQL Lab session
25      Python basics
26      Python Project Discussion
27      Python basics
28      Python Lab session
29      Data visualization with Python
30      DV Lab session
31      Revision & Test
32      NumPy and Pandas theory
33      NP Lab session
34      Project work
35      Python conclusion and basics of work management
36      Work management
37      Working with colleagues
38      Project Discussion
39      Data handling and privacy
40      Data handling and privacy
41      Upgrading skills
42      Upgrading skills
43      Lab session: Building and maintaining professional relations
44      Revision & Test

Internship
Co-Ordinator HOD PRINCIPAL
DATA ANALYST 2024-25

CHAPTER 01
Introduction to IT and ITeS

1.1 Introduction
In today’s fast-paced, technology-driven world, Information Technology (IT) and
Information Technology Enabled Services (ITeS) have become an essential part of
modern life. From the way we communicate, work, learn, and entertain ourselves, to
how businesses function and governments serve their citizens — technology influences
almost every aspect of our daily activities.

The rapid advancement in computing devices, software applications, and
telecommunication systems has led to the development of a global information society
where knowledge and data are valuable resources. IT and ITeS have played a pivotal
role in creating this environment by providing the tools, platforms, and services that
drive innovation and efficiency across industries.

This chapter introduces the concepts of IT and ITeS, their features, components,
importance, and their growing role in the modern digital economy.

1.2 Information Technology (IT)


Information Technology (IT) refers to the use of computer systems, software
applications, and telecommunications to store, retrieve, transmit, and manipulate data
or information. IT has transformed the way people and businesses operate by providing
efficient, secure, and scalable solutions for managing information. It plays a crucial role
in almost every sector, including education, healthcare, banking, business,
manufacturing, and government services.

Key areas in IT include:

● Software Development: The process of designing, coding, testing, and
maintaining applications and systems software.
● Networking: Connecting multiple computing devices for sharing information and
resources.
● Database Management: Organizing and managing large sets of structured
information through database management systems (DBMS).

DEPT OF BCA DBIMSCA 1



● Cloud Computing: Delivering on-demand computing services such as storage,
processing power, and applications over the internet.
● Cybersecurity: Protecting systems, networks, and programs from digital attacks
and unauthorized access.
● Web Development: Creating and maintaining websites and web applications for
various business and personal uses.

Information Technology helps organizations improve productivity, streamline operations,
and create new business opportunities by enabling real-time decision-making and
data-driven strategies.

1.3 Information Technology Enabled Services (ITeS)


Information Technology Enabled Services (ITeS), often referred to as Business Process
Outsourcing (BPO), refers to business services delivered to clients worldwide over IT
infrastructure. ITeS involves the outsourcing of business processes and services
through the internet, enabling companies to cut costs, increase efficiency, and focus on
core business functions.

Popular services under ITeS include:

● Customer Support: Call centers and helpdesk services providing technical and
customer assistance.
● Data Entry and Data Processing: Handling and managing large amounts of
information for clients.
● Medical Transcription: Converting voice-recorded reports dictated by
healthcare professionals into text format.
● Telemarketing: Using telecommunications to market products and services to
customers.
● Finance and Accounting Services: Outsourcing financial operations like
payroll, tax preparation, and bookkeeping.
● Human Resource Services: Managing employee records, payroll, recruitment,
and other HR-related services.


1.4 Importance of IT and ITeS in Today’s World


IT and ITeS industries have become the backbone of the modern digital economy. They
offer solutions to automate processes, enhance communication, store vast amounts of
data securely, and analyze information for better decision-making.

Benefits of IT and ITeS include:

● Cost Efficiency: Reducing operational expenses by outsourcing non-core
activities.
● Access to Global Talent: Connecting businesses with skilled professionals
worldwide.
● Business Scalability: Quickly scaling operations based on business needs.
● Enhanced Customer Service: Offering 24/7 support through call centers and
online services.
● Innovation and Competitive Advantage: Using technology to create new
products, services, and business models.

1.5 Introduction to Future Skills


As the world becomes increasingly digital, the demand for new technical and
professional skills continues to grow. The Information Technology (IT) and Information
Technology Enabled Services (ITeS) industries are rapidly evolving due to
advancements in emerging technologies such as Artificial Intelligence (AI), Big Data,
Cloud Computing, and Cybersecurity.

Future skills are the competencies and abilities required to adapt, work, and succeed in
the digital workplace of the future. These skills ensure that professionals remain
relevant and competitive in the job market while meeting the demands of a
technology-driven economy.

1.6 Importance of Future Skills in IT and ITeS


The IT and ITeS sectors are constantly changing, and businesses need employees who
can quickly learn new technologies, think critically, and manage complex systems.


Future skills help individuals to:

● Stay updated with new tools, programming languages, and software.
● Contribute to innovative projects and problem-solving activities.
● Enhance productivity and operational efficiency.
● Maintain data security and system integrity.
● Meet the challenges of automation and digital transformation.

1.7 Key Future Skills for IT and ITeS Professionals


Some of the most in-demand future skills in the IT and ITeS industries include:

● Artificial Intelligence (AI) and Machine Learning (ML): Understanding how to
build smart systems that can learn from data and make decisions.
● Data Analytics and Big Data Management: Analyzing large datasets to gain
insights, predict outcomes, and support business decisions.
● Cloud Computing: Managing cloud-based infrastructure and applications on platforms
such as AWS, Azure, and Google Cloud.
● Cybersecurity: Protecting systems and data from cyber threats by identifying
risks and implementing security measures.
● Blockchain Technology: Understanding decentralized systems for secure
transactions and data management.
● Internet of Things (IoT): Working with interconnected devices and sensors to
collect and exchange data.
● Robotic Process Automation (RPA): Automating repetitive business processes
using software robots.
● Web Development and App Development: Creating dynamic websites and
mobile applications for different platforms.
● Project Management and Agile Methodologies: Managing IT projects using
modern, flexible work approaches.
● Soft Skills: Communication, leadership, teamwork, time management, and
adaptability to support technical expertise.


CHAPTER 02
Introduction to AI and Big Data

2.1 Introduction to Artificial Intelligence (AI)


Artificial Intelligence (AI) is a transformative and rapidly evolving field of computer
science dedicated to creating machines and systems that can simulate human
intelligence. Unlike traditional computing, where systems follow explicit instructions, AI
empowers machines to learn from experience, adapt to new inputs, and perform tasks
that typically require human cognitive abilities. These tasks range from simple ones like
pattern recognition and basic problem-solving to more complex tasks involving
reasoning, language understanding, and decision-making.

At its core, AI seeks to replicate or even surpass human intellectual capabilities through
algorithms, which allow machines to perform actions like reasoning, problem-solving,
understanding natural language, recognizing images, interpreting speech, and making
decisions based on data. In simple terms, AI enables machines to do things that would
otherwise require human thought and intervention.

Types of AI
1. Narrow AI (Weak AI):

○ This type of AI is designed and trained for a specific task. It performs a
single task with high efficiency but cannot generalize beyond that.
Examples include voice assistants like Siri, recommendation systems,
and image recognition software.

2. General AI (Strong AI):

○ General AI refers to a machine that has the ability to understand, learn,
and apply intelligence across a wide variety of tasks at the level of a
human being. This type of AI is still theoretical and doesn't exist yet.

3. Superintelligent AI:


○ This is a theoretical future stage of AI where machines surpass human
intelligence in all aspects, including creativity, problem-solving, and
decision-making.

2.2 Introduction to Big Data
In the modern digital world, data is being generated at an unprecedented rate from a
wide range of sources such as social media platforms, mobile applications, sensors,
financial transactions, healthcare systems, and online services. The term Big Data
refers to extremely large and complex data sets that are difficult to capture, store,
manage, and analyze using traditional database management tools and techniques.

Big Data is not solely defined by its size, but also by the challenges it presents in terms
of its speed, variety, and complexity. It involves enormous volumes of structured,
semi-structured, and unstructured data that continuously flows in from various sources
in real time or near-real-time. To extract meaningful insights and make informed
decisions, organizations need advanced technologies and analytical methods capable
of handling such massive and diverse data sets.

Big Data has become an essential asset for businesses, governments, healthcare
providers, and research institutions. It plays a critical role in improving operational
efficiency, enhancing customer experiences, optimizing business processes, and
supporting evidence-based decision-making. Through proper analysis of Big Data,
organizations can uncover hidden patterns, predict future trends, and gain a
competitive advantage in the market.

For example, social media companies use Big Data analytics to understand user
preferences and behaviors, while healthcare providers analyze patient data to improve
diagnostics and treatment outcomes. Similarly, e-commerce platforms utilize Big Data
to personalize recommendations and detect fraudulent activities.

In essence, Big Data is transforming industries and driving innovation by enabling
organizations to harness the full potential of the vast and continuously growing data
available in the digital age.

2.2.1 Importance of Big Data


Big Data has become a vital asset for businesses, governments, healthcare providers,
and researchers alike. It enables organizations to:


● Understand customer behavior and preferences.


● Monitor and predict market trends.
● Detect fraud and enhance security.
● Improve operational efficiency.
● Innovate products and services based on real-time insights.
● Support decision-making with data-driven evidence.

2.2.2 The 7 Vs of Big Data


1. Volume refers to the enormous amount of data generated from various sources every
second. This includes data from social media platforms, sensors, mobile devices,
financial transactions, videos, images, and many other digital services. Traditional data
processing systems cannot handle such massive quantities of data, which range from
terabytes to petabytes and even exabytes.

2. Velocity is the speed at which data is produced, transferred, and processed. In the
digital age, data flows in continuously and must be handled in real time or near-real-time
to support instant decision-making and services.

3. Variety refers to the different forms and types of data collected from multiple sources.
Data can be structured (organized databases), semi-structured (XML, JSON files), or
unstructured (text, images, videos, audio). Managing this diverse range of data types is
a major challenge for Big Data systems.

4. Veracity indicates the accuracy, reliability, and trustworthiness of data. Since Big
Data comes from numerous sources, it may contain errors, inconsistencies, or
misleading information. Ensuring data quality and validity is essential for making
reliable business decisions.

5. Value is one of the most important characteristics of Big Data. Not all collected data is
useful; the true goal is to extract valuable and actionable insights from it. Big Data
analytics aims to discover meaningful patterns and information that can benefit
businesses, governments, and individuals.

6. Variability refers to the inconsistency and changing nature of data. Data can fluctuate
dramatically in terms of volume, type, and meaning over time. Systems need to
manage these variations effectively to deliver accurate results.

7. Visualization involves presenting Big Data insights in a clear, understandable, and
interactive way. Since Big Data is complex and massive, effective visualization through
dashboards, charts, graphs, and reports is essential for decision-makers to interpret
patterns and trends quickly.

CHAPTER 03
Applications of AI and Big Data

3.1 Applications of Artificial Intelligence (AI)


Artificial Intelligence (AI) is transforming industries by enabling machines to perform
tasks that traditionally require human intelligence. AI applications have become an
essential part of daily life and business operations, offering efficiency, precision, and
automation. The following are some of the major areas where AI is being applied:

3.1.1 Healthcare

● AI assists in diagnosing diseases through medical imaging and data analysis.
● Virtual health assistants and chatbots help manage patient care.
● AI-powered systems predict disease outbreaks and assist in drug discovery.

3.1.2 Finance

● AI is used in fraud detection, risk management, and automated trading.
● Chatbots provide customer service in banking and financial services.
● AI analyzes large financial data sets for investment and credit decisions.

3.1.3 E-Commerce

● Recommendation systems suggest products based on customer behavior.
● AI chatbots manage customer inquiries and support services.
● AI optimizes pricing, inventory control, and demand forecasting.

3.1.4 Transportation

● Autonomous vehicles use AI for navigation and traffic management.
● AI improves route optimization in ride-hailing services like Uber and Ola.
● Predictive maintenance systems monitor vehicle health and performance.


3.1.5 Education

● AI-based e-learning platforms provide personalized learning experiences.


● Intelligent tutoring systems adapt to individual students’ needs.
● AI automates administrative tasks like grading and scheduling.

3.2 Applications of Big Data


Big Data refers to the enormous volume of data generated from various sources at high
speed and in diverse formats. By analyzing this data, organizations can derive valuable
insights to improve operations and decision-making. Below are the key application
areas of Big Data:

3.2.1 Business and Marketing

● Analyzing customer behavior and preferences for targeted marketing.
● Monitoring sales performance and market trends in real time.
● Developing personalized marketing campaigns based on consumer data.

3.2.2 Healthcare

● Analyzing patient records and medical data for early disease detection.
● Predictive analytics for managing hospital resources and epidemics.
● Supporting medical research and clinical trials with large data sets.

3.2.3 Government and Public Services


● Big Data aids in public safety, traffic management, and smart city planning.
● Fraud detection systems in tax, benefits, and public services rely on Big Data.

3.2.4 Social Media and Entertainment


● Social media platforms analyze user-generated content for trends and opinions.
● Personalized recommendations on entertainment platforms like Netflix and
YouTube.
● Real-time tracking of viral content, news, and public sentiment.


3.2.5 Agriculture
● Big Data applications in precision agriculture monitor crop health and soil.
● Weather prediction models support farmers in making informed decisions.
● Analytics improve resource management and sharpen crop yield predictions.

CHAPTER 04
Basics of statistics

Statistics is a branch of mathematics that deals with the collection, analysis,
interpretation, and presentation of data. It is a powerful tool that can be used to gain
insights into the world around us, from understanding patterns in consumer behavior to
predicting the outcome of elections.

4.1 Key Concepts in Statistics

4.1.1 Data

Data refers to facts, figures, or information collected for analysis. It can be quantitative
(numerical) or qualitative (descriptive). In statistics, data serves as the foundation for
analysis and inference.

4.1.2 Population and Sample

● Population refers to the entire group of individuals or items that are of interest in
a study.

● Sample is a subset of the population that is selected for analysis. The sample
should be representative of the population to make valid inferences.

4.1.3 Variables

A variable is a characteristic or attribute that can be measured or observed. Variables
are typically classified as:

● Qualitative (Categorical) Variables: These represent categories or labels (e.g.,
gender, color, type of product).


● Quantitative (Numerical) Variables: These represent measurable quantities
and are often further divided into discrete (whole numbers) or continuous (any
value within a range) variables.

4.2 Types of Statistics


Statistics is broadly classified into two categories: Descriptive Statistics and
Inferential Statistics.

4.2.1 Descriptive Statistics

Descriptive statistics involves the summarization and presentation of data in a
manageable and understandable form. It includes methods for organizing, displaying,
and describing data through numbers, graphs, and tables.

● Measures of Central Tendency:
○ Mean: The average of all data points.
○ Median: The middle value in a dataset when arranged in ascending or
descending order.
○ Mode: The most frequently occurring value in a dataset.
● Measures of Dispersion:
○ Range: The difference between the maximum and minimum values.
○ Variance: The average of the squared differences between each data point
and the mean.
○ Standard Deviation: The square root of the variance, representing the
typical distance between a data point and the mean, in the same units as the data.
● Data Visualization:
○ Bar Graphs, Pie Charts, Histograms: Visual tools for presenting data in
a more intuitive manner.
○ Box Plots: Used to display the distribution of data and identify outliers.

4.2.2 Inferential Statistics

Inferential statistics involves making predictions or generalizations about a population
based on sample data. It allows us to draw conclusions about the population's
characteristics using probability theory and statistical tests.

● Hypothesis Testing: A method to test an assumption about a population
parameter.
● Confidence Intervals: A range of values used to estimate the true population
parameter, with a certain level of confidence.


● Regression and Correlation: Used to understand relationships between
variables and make predictions. Regression analyzes the relationship between
dependent and independent variables, while correlation measures the strength
and direction of a relationship between two variables.
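As an illustration of the confidence interval idea listed above, the following is a minimal sketch and not a method taken from the internship material. The sample values are hypothetical, and the interval uses a normal approximation (critical value of about 1.96 for 95%); for a sample this small, a t-distribution critical value would normally be preferred.

```python
import math
import statistics

# Hypothetical sample (not from the report): eight measurements of some quantity.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]

n = len(sample)
sample_mean = statistics.mean(sample)

# statistics.stdev uses the sample (n - 1) denominator.
standard_error = statistics.stdev(sample) / math.sqrt(n)

# Critical value for a two-sided 95% interval under a normal approximation.
z = statistics.NormalDist().inv_cdf(0.975)

lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```

The interval is centered on the sample mean, and its width shrinks as the sample size grows or the spread of the data decreases.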

4.3 Importance of Statistics


Statistics plays a vital role in decision-making, research, and problem-solving across
various fields:

● Business: Helps businesses in decision-making, market research, quality
control, and performance analysis.
● Healthcare: Used to analyze clinical data, evaluate treatments, and predict
health trends.
● Social Sciences: Essential in surveys, experimental studies, and demographic
analysis.
● Government: Plays a key role in census data collection, policy making, and
economic analysis.

4.4 Statistical Equations


Statistics involves various mathematical formulas used to calculate measures of central
tendency, dispersion, correlation, regression, and more. Here are some of the essential
statistical equations.

4.4.1 Measures of Central Tendency

● Mean (Average)

The mean is the sum of all data points divided by the number of data points:

$$\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$$

Example:
Given data: 4, 6, 8, 10, 12

$$\text{Mean} = \frac{4+6+8+10+12}{5} = \frac{40}{5} = 8$$

● Median
The median is the middle value of a data set when arranged in ascending or
descending order.

○ If the number of data points is odd, the median is the middle number.
○ If the number of data points is even, the median is the average of the two
middle numbers.

Example:
Given data: 4, 6, 8, 10, 12
Median = 8 (middle value)
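The mean and median examples above can be reproduced with Python's standard `statistics` module. This is a small sketch; the even-length list and the list used for the mode are hypothetical additions used only to show the other cases.

```python
import statistics

data = [4, 6, 8, 10, 12]

mean_value = statistics.mean(data)      # sum of values / number of values
median_odd = statistics.median(data)    # odd count: the middle value

# Hypothetical even-length data: the median is the average of the two middle values.
median_even = statistics.median([4, 6, 8, 10])

# Hypothetical data with a repeated value, to illustrate the mode.
mode_value = statistics.mode([2, 3, 3, 5, 7])

print(mean_value, median_odd, median_even, mode_value)
```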

4.4.2 Measures of Dispersion

● Range

The range is the difference between the maximum and minimum values in a
dataset.

Range = Maximum − Minimum

Example:
Given data: 4, 6, 8, 10, 12

Range = 12 − 4 = 8

● Variance (for population)

Variance measures how far the data points lie from the mean, on average, in squared
units. The formula for population variance is:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2$$

Where:

● X_i is each data point.
● μ is the population mean.
● N is the number of data points.

Example:
Given data: 4, 6, 8, 10, 12, Mean (μ) = 8

$$\sigma^2 = \frac{(4-8)^2+(6-8)^2+(8-8)^2+(10-8)^2+(12-8)^2}{5} = \frac{16+4+0+4+16}{5} = \frac{40}{5} = 8$$

● Standard Deviation (for population)

The standard deviation is the square root of the variance.

$$\sigma = \sqrt{\sigma^2}$$

Example:
Given variance = 8,

$$\sigma = \sqrt{8} \approx 2.83$$
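The dispersion measures for the example data 4, 6, 8, 10, 12 can be checked with the standard library. This sketch uses `statistics.pvariance` and `statistics.pstdev`, which divide by N (population formulas); the sample versions `statistics.variance` and `statistics.stdev` divide by N − 1 and would give different values.

```python
import statistics

data = [4, 6, 8, 10, 12]

data_range = max(data) - min(data)       # maximum - minimum
variance = statistics.pvariance(data)    # population variance (divides by N)
std_dev = statistics.pstdev(data)        # square root of the population variance

print(f"range = {data_range}, variance = {variance}, std dev = {std_dev:.2f}")
```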

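4.4.3 Correlation and Regression

● Pearson Correlation Coefficient (r)

The Pearson correlation measures the linear relationship between two variables:

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

Example:
Given two sets of data, X = [2, 4, 6, 8] and Y = [1, 2, 3, 4], we can plug the values into the
formula to find r. Here every Y value is exactly half of the matching X value, so r = 1.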
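To complete the example above, the following sketch implements the computational form of the Pearson formula, r = (nΣXY − ΣXΣY) / sqrt([nΣX² − (ΣX)²][nΣY² − (ΣY)²]). The function name `pearson_r` is a hypothetical helper chosen for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation via the sums-based (computational) formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when one variable is constant")
    return numerator / denominator

r = pearson_r([2, 4, 6, 8], [1, 2, 3, 4])
print(f"r = {r:.3f}")
```

Because Y is an exact linear function of X in this data, the points fall on a straight line with positive slope and the coefficient takes its maximum value.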

● Simple Linear Regression

The equation of a simple linear regression line is:

$$Y = \beta_0 + \beta_1 X$$

Where:

● Y is the dependent variable.
● X is the independent variable.
● β0 is the y-intercept.
● β1 is the slope of the line.
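The report does not state how β0 and β1 are estimated. A common choice, assumed here, is ordinary least squares, where the slope is Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² and the intercept is Ȳ − β1·X̄. The sketch below applies it to the X and Y values from the correlation example; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of Y = beta0 + beta1 * X (assumed method)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when all X values are equal")
    beta1 = sxy / sxx                   # slope
    beta0 = mean_y - beta1 * mean_x     # y-intercept
    return beta0, beta1

beta0, beta1 = fit_line([2, 4, 6, 8], [1, 2, 3, 4])
predicted = beta0 + beta1 * 10          # prediction for a new X value of 10
print(f"Y = {beta0:.2f} + {beta1:.2f} X, prediction at X=10: {predicted:.2f}")
```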

4.5 Solving Problems in Statistics


Let's apply some of these equations to solve problems.


Problem 1: Calculate the Mean and Standard Deviation

Given the data: 5, 8, 10, 12, 15, 18, 20

Step 1: Find the mean.

Mean = (5+8+10+12+15+18+20)/7 = 88/7 ≈ 12.57

Step 2: Find the standard deviation.

1. Calculate the squared differences from the mean (computed with the unrounded mean 88/7):

(5−12.57)² ≈ 57.33
(8−12.57)² ≈ 20.90
(10−12.57)² ≈ 6.61
(12−12.57)² ≈ 0.33
(15−12.57)² ≈ 5.90
(18−12.57)² ≈ 29.47
(20−12.57)² ≈ 55.18

2. Calculate the variance:
σ² = (57.33+20.90+6.61+0.33+5.90+29.47+55.18)/7 ≈ 175.71/7 ≈ 25.10

3. Find the standard deviation:
σ = sqrt(25.10) ≈ 5.01

Problem 2: Find the Pearson Correlation Coefficient

Given data:

X=[1,2,3,4,5]
Y=[2,4,5,4,5]

Step 1: Write down the formula

r = (n∑XY − ∑X∑Y) / sqrt([n∑X² − (∑X)²] × [n∑Y² − (∑Y)²])

Step 2: Make a table to find the required values

      X     Y     XY    X²    Y²
      1     2     2     1     4
      2     4     8     4     16
      3     5     15    9     25
      4     4     16    16    16
      5     5     25    25    25
Σ     15    20    66    55    86

Step 3: Plug values into the formula

Given:

● n = 5
● ∑X = 15
● ∑Y = 20
● ∑XY = 66
● ∑X² = 55
● ∑Y² = 86

r = (5×66 − 15×20) / sqrt([5×55 − (15)²] × [5×86 − (20)²])

Step 4: Simplify

Numerator:

5×66 = 330
15×20 = 300
330 − 300 = 30

Denominator:

5×55 = 275
(15)² = 225
275 − 225 = 50

And

5×86 = 430
(20)² = 400
430 − 400 = 30

Now multiply the two results and take the square root:

sqrt(50×30) = sqrt(1500) ≈ 38.73

Step 5: Final calculation

r = 30 / 38.73 ≈ 0.775

The Pearson Correlation Coefficient (r) is approximately 0.775
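
The hand calculation in Problem 2 can be reproduced with a short sketch that builds the same column sums (∑X, ∑Y, ∑XY, ∑X², ∑Y²) and then applies the raw-sums form of the Pearson formula. The function name pearson_r is illustrative, not taken from the report.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r computed from the raw sums used in the worked table."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
print(round(pearson_r(X, Y), 3))   # 0.775
```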

4.6 Project Discussion


Project discussions are essential in data analysis and statistics projects as they help in
clarifying the objectives, understanding the data sources, selecting appropriate
statistical methods, and deciding on the tools and software to be used. These sessions
provide a platform for brainstorming, refining ideas, and planning the workflow of data
analysis.

Key points of a project discussion:


● Define the problem statement clearly.
● Identify relevant data sources.
● Discuss data collection methods.
● Select suitable statistical techniques.
● Allocate tasks and set deadlines.

4.7 Problem Solving Session


Problem-solving sessions involve addressing challenges that arise during data
collection, analysis, or interpretation. In statistics and data analytics projects, this could
include issues like missing data, outliers, inappropriate data formats, or choosing the
right model.

Activities in a problem-solving session:

● Reviewing and cleaning data.


● Handling missing or inconsistent values.
● Identifying and removing outliers.
● Testing different statistical models.
● Evaluating results for accuracy and reliability.

4.8 Revision & Test


Revision and testing are crucial steps to reinforce the concepts learned and to evaluate
the understanding of statistical techniques and data analysis procedures. Regular
revision ensures that fundamental concepts like mean, median, standard deviation,
correlation, and regression remain clear.

Revision & Test topics may include:

● Measures of central tendency (Mean, Median, Mode).


● Measures of dispersion (Range, Variance, Standard Deviation).
● Correlation and Regression analysis.
● Hypothesis testing and interpretation.
● Data visualization techniques.
● Solving statistical problems and case studies.

4.9 Applying Statistics to Data Analysis


Applying statistics to data analysis involves using statistical formulas, tools, and
techniques to uncover patterns, trends, and insights from data. This process transforms
raw data into meaningful information for decision-making.

Steps involved:

1. Data Collection: Gathering data from reliable sources.


2. Data Cleaning: Removing or correcting inaccuracies.

3. Descriptive Analysis: Summarizing data using measures like mean, median,
and standard deviation.

4. Inferential Analysis: Making predictions and testing hypotheses.

5. Data Visualization: Representing data insights using graphs, charts, and
dashboards.

6. Decision Making: Drawing conclusions and suggesting solutions based on
statistical findings.

CHAPTER 05
Data Analysis Using Statistics and Statistical Tools

5.1 Introduction
Data analysis is a systematic process of inspecting, cleaning, transforming, and
modeling data to discover useful information, suggest conclusions, and support
decision-making. Statistical techniques play a crucial role in this process by offering
methods to summarize data, identify patterns, and test assumptions. In this chapter, we
explore how statistics and various statistical tools are applied in data analysis.

5.2 Role of Statistics in Data Analysis


Statistics provides the foundation for data analysis by:

● Summarizing large sets of data.


● Identifying trends, relationships, and patterns.
● Making predictions and forecasting future outcomes.
● Supporting decision-making through hypothesis testing.
● Validating models and analytical outcomes.

Key statistical measures used include:

● Mean, Median, Mode


● Standard Deviation and Variance
● Correlation and Regression
● Probability and Distribution Analysis
● Hypothesis Testing (t-test, chi-square, ANOVA)

5.3 Statistical Tools for Data Analysis


Several software tools and applications help perform statistical analysis effectively.
These tools simplify complex calculations, visualize data trends, and perform advanced
analytics.


Popular statistical tools include:

Tool                                   Purpose

SPSS (Statistical Package for the      Descriptive and inferential statistical analysis,
Social Sciences)                       hypothesis testing, correlation, regression.

R                                      Open-source software for advanced statistical
                                       modeling, visualization, and machine learning.

Python (with libraries like Pandas,    Data manipulation, statistical testing, data
NumPy, SciPy, StatsModels,             visualization, predictive modeling.
Matplotlib, Seaborn)

Minitab                                Specialized software for quality improvement
                                       and statistical analysis.

Power BI                               Business analytics service with data
                                       visualization and reporting capabilities.

Tableau                                Interactive data visualization and business
                                       analytics tool with built-in statistical functions
                                       like trend lines and forecasting.

CHAPTER 06
MS Excel Lab Sessions

6.1 MS Excel Lab Sessions


MS Excel is one of the most popular and beginner-friendly tools for performing data
analysis, statistical calculations, and data visualization. During the lab sessions,
students and trainees get hands-on practice with various features of Excel that are
essential for data handling and analysis.

Topics Covered in Excel Lab Sessions:

● Basic Excel operations: Data entry, formatting, and worksheet management.


● Use of formulas and functions: SUM(), AVERAGE(), IF(), VLOOKUP(),
HLOOKUP(), COUNT(), COUNTA(), ROUND(), etc.


● Data Sorting and Filtering.
● Working with Pivot Tables and Pivot Charts.
● Creating various types of charts: Column, Line, Pie, Bar, Scatter.
● Performing descriptive statistics: Mean, Median, Mode, Standard Deviation,
Variance.
● Applying conditional formatting for data insights.
● Using Data Validation for restricting entries.
● Introduction to What-If Analysis tools like Goal Seek and Scenario Manager.
● Simple regression analysis and trendline forecasting.
● Working with multiple sheets and data consolidation.

6.2 Lab Mock Assessment
Purpose of the Mock Assessment:
To evaluate the practical understanding of MS Excel functions, data analysis
techniques, and reporting skills acquired during the lab sessions. It provides a
simulated environment similar to actual working conditions where students apply their
knowledge to solve given data problems.

Assessment Structure:

Section                             Activity

Basic Operations                    Data entry, data cleaning, cell formatting, sheet
                                    organization.

Formula and Function Application    Use of statistical functions (AVERAGE, MEDIAN,
                                    MODE, COUNTIF, etc.).

Data Analysis                       Sorting, filtering, applying Pivot Tables, generating
                                    Pivot Charts.

Visualization                       Creating suitable charts and applying conditional
                                    formatting.

What-If Analysis                    Using Goal Seek or Scenario Manager for
                                    business decision-making scenarios.

Descriptive Statistics              Calculating mean, median, standard deviation
                                    using in-built tools or manually.

Regression and Forecasting          Performing a simple linear regression and adding
                                    trendlines.

Report Preparation                  Preparing a summary sheet with visualizations
                                    and interpretations.

CHAPTER 07
Introduction to Power BI

7.1 Introduction to Power BI


Power BI is a powerful business analytics tool developed by Microsoft that enables
users to visualize data, share insights across an organization, and convert raw data into
meaningful and interactive information. It provides a platform for creating interactive
reports, dashboards, and data models, making data-driven decision-making more
accessible and efficient for businesses and individuals.

Power BI is designed to handle large volumes of data from various sources such as
Excel files, databases, cloud services, and web APIs. It simplifies the process of data
integration, analysis, and reporting by providing a user-friendly, drag-and-drop interface
combined with powerful data transformation and modeling capabilities.

The main objective of Power BI is to bridge the gap between data and decision-making
by presenting complex datasets in a visually appealing and easy-to-understand manner
through charts, graphs, maps, and key performance indicators (KPIs). It supports
real-time data monitoring, predictive analytics, and collaboration features, making it a
comprehensive tool for modern business intelligence needs.

Key Features of Power BI:

● Data Connectivity: Connects to multiple data sources such as Excel, SQL
Server, Oracle, Google Analytics, SharePoint, and cloud services like Azure.
● Data Transformation: Cleans, filters, and shapes raw data using Power Query
Editor.
● Data Modeling: Creates relationships between different data tables and defines
custom calculations using Data Analysis Expressions (DAX).
● Interactive Visualizations: Provides a wide range of customizable charts, maps,
and dashboards.
● Real-time Analytics: Supports real-time data streaming and dashboard updates.

7.2 Power BI Lab Sessions


Power BI Lab Sessions provide hands-on practice for students and trainees to explore
business data, create interactive visualizations, and build insightful dashboards. The
sessions are designed to introduce participants to real-time business analytics and
reporting through practical exercises using Power BI Desktop and Power BI Service.

Topics Covered in Power BI Lab Sessions:

● Introduction to Power BI Interface and Components.


● Connecting Power BI to various data sources (Excel, CSV, SQL databases).
● Data loading and basic data transformation using Power Query Editor.
● Creating relationships between data tables.
● Performing calculations using DAX (Data Analysis Expressions).
● Designing interactive visualizations like bar charts, line graphs, pie charts, maps,
and cards.
● Building dynamic and visually appealing dashboards.
● Implementing filters, slicers, and drill-down functionalities.


● Real-time data analysis with live data connections.


● Publishing reports to Power BI Service and sharing dashboards online.
● Embedding Power BI reports into websites or applications.

Objective:
To enable participants to analyze and present business data in an interactive and
meaningful way using Power BI, equipping them with essential business intelligence
skills.

7.3 Revision Session


The Revision Session is conducted to reinforce key concepts and ensure students are
well-prepared for practical assessments. During the revision:

● Important features and shortcuts of Power BI are recapped.
● Sample business scenarios are discussed for data analysis.
● Quick exercises are done on creating charts, building dashboards, and applying
DAX formulas.
● Common doubts and problem areas are clarified.
● The steps for data cleaning, modeling, visualization, and report publishing are
revised.

Outcome:
Participants gain confidence and readiness for the lab assessment by revising crucial
concepts and hands-on techniques.

7.4 Lab Test / Mock Assessment


The Power BI Lab Test evaluates participants' ability to analyze a dataset, perform data
transformations, create visualizations, and build a comprehensive report.

Assessment Structure:

Section                          Activity

Data Connection & Loading        Import data from given sources like Excel or CSV files.

Data Transformation &            Apply Power Query functions for cleaning and
Cleaning                         structuring data.

Data Modeling                    Define relationships between tables and create
                                 calculated columns/measures using DAX.

Visualization Creation           Create charts, graphs, maps, and KPI indicators as per
                                 requirements.

Dashboard Development            Combine multiple visuals into an interactive dashboard.

Report Publishing                Publish the final report to Power BI Service or export as
                                 PDF.

Interpretation and Summary       Prepare a summary of insights obtained from the report.

CHAPTER 08
Power BI Lab and Introduction to SQL

8.1 Introduction to Power BI Lab


The Power BI Lab is designed to provide hands-on experience in business intelligence,
data visualization, and interactive dashboard creation. It introduces students to real-world
data analysis scenarios, teaching them how to transform raw data into actionable insights
using Microsoft Power BI.

Objectives:

● To familiarize students with Power BI Desktop and Power BI Service.
● To enable students to connect multiple data sources and perform data cleaning.
● To practice data modeling, relationship creation, and calculations using DAX.
● To build and publish interactive reports and dashboards.


8.2 Power BI Lab Activities


Lab Sessions Include:

● Installing and setting up Power BI Desktop.
● Loading and transforming data using Power Query Editor.
● Creating relationships between multiple data tables.
● Developing visualizations: bar charts, line graphs, pie charts, maps, cards, and KPI
indicators.
● Applying DAX formulas for custom calculations.
● Designing interactive dashboards with filters, slicers, and drill-down features.
● Publishing reports to Power BI Service.
● Real-time dashboard creation with streaming data.
● Exporting reports to PDF or embedding them into web applications.

8.3 Introduction to SQL


Structured Query Language (SQL) is a specialized programming language designed for
managing, manipulating, and retrieving data stored in relational databases. It serves as
the standard language for interacting with relational database management systems
(RDBMS) such as MySQL, Oracle, Microsoft SQL Server, PostgreSQL, and SQLite.

SQL plays a vital role in organizing data in a structured manner using tables, enabling
users to efficiently store, query, and analyze large volumes of data. It provides a simple
yet powerful syntax to perform various operations, including inserting new records,
updating existing data, deleting unwanted records, and retrieving specific information
based on conditions.

In today’s data-driven world, SQL has become an essential tool for data analysts, software
developers, data engineers, and business intelligence professionals. Its versatility allows it
to be applied across industries such as finance, healthcare, e-commerce, government, and
technology, wherever organized data management is required.

Key Characteristics of SQL:


● It is declarative, meaning users specify what they want to do with the data, and the
system determines how to execute the request.
● SQL supports data definition, data manipulation, data control, and transaction
management.
● It can handle complex queries involving multiple tables using operations such as
JOIN, GROUP BY, and aggregate functions.
● It ensures data consistency and integrity through features like primary keys, foreign
keys, constraints, and transactions.

8.4 Features of SQL


● Simple and easy-to-learn syntax.
● High performance for complex queries and data operations.
● Scalability for handling large datasets.
● Portability: the core syntax is standardized, so most queries carry over between
database systems (like MySQL, PostgreSQL, Oracle, SQL Server) with only
dialect-specific adjustments.
● Supports transactions to ensure data accuracy.

8.5 Common SQL Commands

Command     Purpose

CREATE      To create a new database, table, or other objects.

INSERT      To add new records into a table.

SELECT      To retrieve data from one or more tables.

UPDATE      To modify existing records in a table.

DELETE      To remove records from a table.

WHERE       To filter records based on specific conditions.

ORDER BY    To sort data in ascending or descending order.

GROUP BY    To group records by one or more columns.

JOIN        To combine rows from two or more tables based on a
            related column.
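
To show the commands in the table working together, here is a minimal sketch that runs them against an in-memory SQLite database through Python's built-in sqlite3 module. The tables, column names, and records are hypothetical and are not taken from the training material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # temporary in-memory database
cur = conn.cursor()

# CREATE: define two related tables (hypothetical schema).
cur.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE employees (
    id INTEGER PRIMARY KEY, name TEXT, salary REAL,
    dept_id INTEGER REFERENCES departments(id))""")

# INSERT: add records.
cur.executemany("INSERT INTO departments VALUES (?, ?)",
                [(1, "Analytics"), (2, "Sales")])
cur.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)",
                [(1, "Asha", 50000, 1), (2, "Ravi", 42000, 2),
                 (3, "Meena", 61000, 1), (4, "Kiran", 39000, 2)])

# UPDATE and DELETE, each filtered with WHERE.
cur.execute("UPDATE employees SET salary = 45000 WHERE name = 'Ravi'")
cur.execute("DELETE FROM employees WHERE name = 'Kiran'")

# SELECT with JOIN, GROUP BY and ORDER BY: average salary per department.
rows = cur.execute("""
    SELECT d.name, COUNT(*) AS staff, AVG(e.salary) AS avg_salary
    FROM employees AS e
    JOIN departments AS d ON e.dept_id = d.id
    GROUP BY d.name
    ORDER BY avg_salary DESC
""").fetchall()
print(rows)   # [('Analytics', 2, 55500.0), ('Sales', 1, 45000.0)]
conn.close()
```

SQLite is used only because it needs no server setup. The same statements should run on the other systems named in this chapter with little or no change.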


CHAPTER 09
Python Basics

9.1 Python Project Discussion


The Python Project Discussion session is designed to introduce students to the scope,
structure, and requirements of real-world programming projects using Python. In this
phase, students identify suitable project ideas, outline their objectives, define the
modules required, and plan the development process.

Objectives:

● To select and finalize project topics.


● To discuss project architecture and workflow.
● To identify the libraries and tools needed.
● To plan project timelines and task distribution.

Outcome:
Students will gain clarity on project expectations, objectives, and deliverables.

9.2 Python Basics


This session covers the foundational concepts of the Python programming language.
Python is a high-level, interpreted, and versatile language known for its readability and
simplicity, making it suitable for beginners and professionals alike.

Topics Covered:
● Introduction to Python and its features.
● Python installation and IDE setup.
● Data types: integers, floats, strings, booleans.
● Variables and operators.
● Conditional statements (if, elif, else).
● Loops (for, while).
● Functions and modules.
● Basic file handling.


9.2.1 Introduction to Python and Its Features


Python was created by Guido van Rossum and released in 1991. It is an open-source
language with a strong community and extensive libraries for various applications.

Key Features:

● Easy to Learn and Use: Simple syntax similar to English.


● Interpreted Language: Code is executed line by line, making debugging easier.
● Dynamically Typed: No need to declare variable data types.
● Object-Oriented: Supports object-oriented programming concepts.
● Large Standard Library: Pre-built modules for common tasks.
● Cross-Platform: Runs on Windows, macOS, and Linux.

9.2.2 Python Installation and IDE Setup


To write and run Python programs, users need to:

● Install Python Interpreter from the official website (https://www.python.org/).

● Use text editors like Notepad++, or integrated development environments (IDEs)
like IDLE, PyCharm, Visual Studio Code, or Jupyter Notebook.

9.2.3 Data Types


Python supports various data types for storing different kinds of values.

Basic Data Types:


● Integer (int): Whole numbers (e.g., 10, -3)

● Float (float): Decimal numbers (e.g., 10.5, -3.14)

● String (str): Sequence of characters (e.g., "Hello")


● Boolean (bool): True or False values

Collection Data Types:

● List: Ordered, mutable collection (e.g., [1, 2, 3])

● Tuple: Ordered, immutable collection (e.g., (1, 2, 3))

● Set: Unordered collection of unique items (e.g., {1, 2, 3})

● Dictionary: Collection of key-value pairs (e.g., {"name": "John", "age": 25})
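
The basic and collection types listed above can be tried out with a short sketch. The variable names are illustrative; the literal values mirror the examples given in the lists.

```python
# Basic data types
count = 10            # int
price = 10.5          # float
greeting = "Hello"    # str
is_valid = True       # bool

# Collection data types
numbers = [1, 2, 3]                    # list: ordered, mutable
point = (1, 2, 3)                      # tuple: ordered, immutable
unique = {1, 2, 2, 3}                  # set: the duplicate 2 is dropped
person = {"name": "John", "age": 25}   # dict: key-value pairs

numbers.append(4)      # lists can be changed in place
person["age"] = 26     # dictionary values can be updated by key

# Print each value together with the name of its type.
for value in (count, price, greeting, is_valid, numbers, point, unique, person):
    print(type(value).__name__, value)
```

Trying point[0] = 9 would raise a TypeError, which is what "immutable" means for tuples.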

9.2.4 Variables and Operators


● Variables: Used to store data values. No need to declare data type.

name = "Alice"

age = 21

● Operators: Used to perform operations on variables and values.

○ Arithmetic Operators: +, -, *, /, //, %, **

○ Comparison Operators: ==, !=, >, <, >=, <=

○ Logical Operators: and, or, not

○ Assignment Operators: =, +=, -=, *=, /=

○ Membership Operators: in, not in
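
A brief sketch of each operator group listed above, using two assumed values a = 7 and b = 2:

```python
a, b = 7, 2

# Arithmetic
print(a + b, a - b, a * b)   # 9 5 14
print(a / b)                 # 3.5  (true division)
print(a // b, a % b)         # 3 1  (floor division, remainder)
print(a ** b)                # 49   (exponent)

# Comparison and logical
print(a > b, a == b)         # True False
print(a > 5 and b > 5)       # False
print(a > 5 or b > 5)        # True
print(not a > 5)             # False

# Assignment
total = 10
total += 5                   # same as total = total + 5
total *= 2
print(total)                 # 30

# Membership
print(3 in [1, 2, 3], "z" not in "data")   # True True
```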


9.2.5 Conditional Statements


Conditional statements control program flow based on specific conditions.

Types:

● if statement

● if-else statement

● if-elif-else statement
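
The three forms listed above can be sketched as follows. The grade cut-offs and the age value are assumptions made for the example only.

```python
def grade(marks):
    """Return a letter grade for a score out of 100 (hypothetical cut-offs)."""
    if marks >= 75:
        return "A"
    elif marks >= 50:
        return "B"
    else:
        return "C"

age = 21
if age >= 18:                 # plain if statement
    print("Adult")

if age % 2 == 0:              # if-else statement
    print("Even")
else:
    print("Odd")

# if-elif-else statement, wrapped in the function above
print(grade(82), grade(60), grade(35))   # A B C
```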


9.2.6 Loops
Loops are used to execute a block of code repeatedly.

Types:

● for loop

● while loop

● Control Statements:
○ break: Exits the loop prematurely.
○ continue: Skips to the next iteration.

○ pass: Does nothing, acts as a placeholder.
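
A minimal sketch of both loop types and the three control statements described above. The numbers used are arbitrary.

```python
# for loop over a list
total = 0
for value in [4, 6, 8, 10, 12]:
    total += value
print(total)              # 40

# while loop with a counter
count = 0
while count < 3:
    count += 1
print(count)              # 3

# break, continue and pass
evens = []
for n in range(1, 20):
    if n > 10:
        break             # stop the loop once n exceeds 10
    if n % 2 == 1:
        continue          # skip odd numbers
    if n == 4:
        pass              # placeholder: nothing special happens for 4
    evens.append(n)
print(evens)              # [2, 4, 6, 8, 10]
```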


9.2.7 Functions and Modules


● Functions: Block of reusable code to perform a specific task.

● Modules: A file containing Python code (functions, variables, classes) that can be
imported into other programs.
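
As an illustration of both ideas, the sketch below defines two small functions and imports from two standard-library modules (math and statistics). The function names circle_area and describe are hypothetical.

```python
import math                      # import a whole module
from statistics import mean      # import one name from a module

def circle_area(radius):
    """Return the area of a circle with the given radius."""
    return math.pi * radius ** 2

def describe(values, label="data"):
    """Return a short summary string; label is an optional argument."""
    return f"{label}: n={len(values)}, mean={mean(values)}"

print(round(circle_area(2), 2))             # 12.57
print(describe([4, 6, 8, 10, 12], "marks"))
```

Saving these definitions in a file such as shapes.py (an assumed name) would make that file a module that other programs could load with import shapes.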

9.2.8 Basic File Handling


Python allows reading and writing files.

Opening a file:

Writing to a file:

Modes:


● "r": Read

● "w": Write (overwrite)

● "a": Append

● "r+": Read and Write
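
The opening and writing steps named above, together with three of the modes, can be sketched as follows. The file name notes.txt and the temporary folder are assumptions so that the example cleans up after itself.

```python
import os
import tempfile

# A throwaway file path (hypothetical name) inside a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

# "w": create the file (or overwrite it) and write text.
with open(path, "w") as f:
    f.write("first line\n")

# "a": append to the end without removing existing content.
with open(path, "a") as f:
    f.write("second line\n")

# "r": open for reading; the with-block closes the file automatically.
with open(path, "r") as f:
    lines = f.read().splitlines()

print(lines)   # ['first line', 'second line']
os.remove(path)
```

Using with is a design choice rather than a requirement: calling f.close() by hand also works, but the with-block closes the file even if an error occurs.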

9.3 Python Lab Session


Python Lab Sessions provide practical, hands-on experience to reinforce theoretical
knowledge. Through these sessions, students write, test, and debug Python programs.

Lab Activities:

● Writing basic Python programs.


● Creating functions for calculations and string manipulation.
● Working with lists, tuples, dictionaries, and sets.
● Implementing conditional statements and loops.
● File input/output operations.
● Simple problem-solving exercises.

Outcome:
Students develop confidence in coding simple programs and applying core concepts.

9.4 Data Visualization with Python


This section introduces data visualization techniques using Python libraries. Visualizing
data helps in understanding patterns, trends, and insights through graphical
representation.
Topics Covered:
● Importance of data visualization in analytics.
● Introduction to Matplotlib and Seaborn libraries.
● Creating bar charts, line graphs, histograms, scatter plots, and pie charts.


● Customizing charts with titles, labels, legends, and colors.
● Handling large datasets for visualization.
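
A small sketch of the chart creation and customization topics above, assuming Matplotlib is installed. The monthly sales figures are invented sample data, and the off-screen "Agg" backend is chosen only so the code runs without a display.

```python
import io
import matplotlib
matplotlib.use("Agg")            # draw off-screen so no display is needed
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]     # hypothetical sample data
sales = [120, 150, 90, 180]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Bar chart with title, axis labels, colour and legend.
ax1.bar(months, sales, color="steelblue", label="Sales")
ax1.set_title("Monthly Sales (bar)")
ax1.set_xlabel("Month")
ax1.set_ylabel("Units")
ax1.legend()

# Line graph of the same data with point markers.
ax2.plot(months, sales, marker="o", color="darkorange", label="Sales")
ax2.set_title("Monthly Sales (line)")
ax2.set_xlabel("Month")
ax2.legend()

fig.tight_layout()
buffer = io.BytesIO()
fig.savefig(buffer, format="png")   # pass a file name such as "sales.png" to save to disk
plt.close(fig)
```

In an interactive session or a Jupyter Notebook, the backend line can be dropped and plt.show() used instead of savefig.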

Outcome:
Students learn to represent data visually for easy analysis and reporting.

9.5 Data Visualization Lab Session


The DV Lab Session offers practical experience in creating various types of visualizations
using Python.

Lab Activities:

● Plotting graphs using Matplotlib.


● Generating advanced plots with Seaborn.
● Customizing charts with different styles and themes.
● Combining multiple plots for comparative analysis.

Outcome:
Students develop the ability to convert raw data into meaningful visuals.

9.6 Revision and Test


A Revision and Test session is conducted to review all key concepts covered in Python
basics, data visualization, and lab exercises. It helps students consolidate their learning
and assess their understanding through a practical assessment.

Activities:

● Recap of Python syntax and libraries.


● Quick coding exercises.
● Data visualization challenges.
● Problem-solving scenarios.


Outcome:
Students prepare for final evaluations with a thorough understanding of course material.

9.7 Introduction to NumPy


NumPy (Numerical Python) is an open-source Python library used for working with
numerical data. It provides support for large, multi-dimensional arrays and matrices along
with a collection of mathematical functions to operate on these arrays efficiently.

9.7.1 Features of NumPy


● Efficient Array Storage: Supports multi-dimensional arrays (ndarrays).
● Fast Operations: Performs element-wise calculations at high speed.
● Broad Mathematical Functions: Includes statistical, algebraic, and trigonometric
operations.
● Interoperability: Can integrate with other libraries like Pandas, Matplotlib, and
Scikit-learn.
● Memory Efficient: Consumes less memory compared to regular Python lists.

9.7.2 Common NumPy Operations


● Creating Arrays:

● Array Operations:

● Useful Functions:

○ np.zeros(): Creates an array of zeros.

○ np.ones(): Creates an array of ones.

○ np.arange(): Creates an array with a range of values.

○ np.mean(), np.median(), np.std(): Statistical calculations.
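
The array creation, array operations, and useful functions listed above can be sketched as follows, assuming NumPy is installed. The array values reuse the small dataset from the statistics chapter.

```python
import numpy as np

# Creating arrays
a = np.array([4, 6, 8, 10, 12])
zeros = np.zeros(3)              # three zeros
ones = np.ones((2, 2))           # 2x2 array of ones
steps = np.arange(0, 10, 2)      # values 0, 2, 4, 6, 8

# Array operations are element-wise
print(a + 1)                     # [ 5  7  9 11 13]
print(a * 2)                     # [ 8 12 16 20 24]
print(a + steps)                 # [ 4  8 12 16 20]

# Statistical functions
print(np.mean(a), np.median(a))     # 8.0 8.0
print(round(float(np.std(a)), 2))   # 2.83 (population standard deviation by default)
```

np.std divides by N unless ddof=1 is passed, which is why its result matches the population standard deviation worked out by hand earlier.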

9.8 Introduction to Pandas


Pandas is a powerful, open-source Python library used for data manipulation,
analysis, and preparation. It provides flexible and efficient data structures, such as
Series (1D) and DataFrame (2D), for handling and analyzing structured data.

Pandas is widely used in data science, machine learning, business analytics, and
financial modeling for its ability to manage and manipulate large datasets easily.

9.8.1 Features of Pandas


● Easy Handling of Missing Data: Includes functions for identifying and handling
NaN values.
● Data Alignment: Automatic and explicit data alignment.
● Powerful Data Aggregation and Grouping: Facilitates easy grouping and
summarizing of data.



● Reading and Writing Data: Can read/write data from CSV, Excel, SQL
databases, and more.
● Flexible Indexing: Supports label-based and position-based indexing.

9.8.2 Common Pandas Operations


● Creating a Series:

● Creating a DataFrame:

● Reading Data from CSV:

● Basic Data Analysis:


○ df.describe(): Statistical summary of data.

○ df.info(): Summary of DataFrame.

○ df.head(), df.tail(): View top or bottom rows.

○ df.isnull(): Detect missing values.

○ df.dropna(), df.fillna(): Handle missing data.

● Filtering and Selecting Data:

● Grouping Data:
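
The operations named in this subsection can be sketched in one short script, assuming Pandas is installed. All names, departments, salaries, and CSV contents are invented sample data, and an in-memory string stands in for a CSV file on disk.

```python
import io
import pandas as pd

# Creating a Series (1D) and a DataFrame (2D)
marks = pd.Series([78, 85, 62], index=["Asha", "Ravi", "Meena"])
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena", "Kiran"],
    "dept": ["Analytics", "Sales", "Analytics", "Sales"],
    "salary": [50000, 42000, 61000, None],   # None becomes NaN (missing)
})

# Reading data from CSV (a file path would normally go here)
csv_text = "city,sales\nMysuru,120\nBengaluru,300\n"
cities = pd.read_csv(io.StringIO(csv_text))

# Basic analysis and missing values
print(df.head(2))
print(df["salary"].isnull().sum())          # 1 missing value
filled = df.fillna({"salary": df["salary"].mean()})

# Filtering and selecting
high_paid = df[df["salary"] > 45000][["name", "salary"]]

# Grouping
avg_by_dept = df.groupby("dept")["salary"].mean()
print(avg_by_dept)
```

Filling the gap with the column mean is only one option. df.dropna() would instead remove the row with the missing salary.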

9.10 NumPy and Pandas Lab Session


In this session, students practiced the practical application of NumPy and Pandas for
handling, analyzing, and manipulating data efficiently. The lab included hands-on
exercises such as:

● Creating NumPy arrays and performing arithmetic operations.


● Computing mean, median, and standard deviation using NumPy.
● Loading datasets into Pandas DataFrames.
● Filtering, grouping, and summarizing data.
● Handling missing values with dropna() and fillna().
● Exporting processed data to CSV files.

9.11 Project Work


Students undertook Python-based projects involving real-world datasets. Projects
combined multiple skills learned throughout the course, such as:

● Data collection and preprocessing.


● Data visualization with Matplotlib and Seaborn.



● Using NumPy for numerical computations.
● Applying Pandas for structured data analysis.
● Presenting insights through visual dashboards.

9.12 Python Conclusion and Basics of Work Management


At the end of the Python module, students:

● Revised key concepts like data types, control structures, functions, file
handling, and libraries like NumPy and Pandas.
● Discussed best practices in coding, error handling, and project documentation.
● Were introduced to work management: task prioritization, scheduling,
goal-setting, and productivity tools used in IT industry environments.

9.13 Work Management


Work management involves planning, organizing, and executing tasks effectively to
meet deadlines and project goals.

Key Aspects:

● Time management and task delegation.
● Prioritizing high-impact work.
● Using tools like Trello, Jira, or Microsoft Planner for tracking tasks.
● Managing work-life balance and stress control.

These skills are crucial in IT project management and professional development.

9.14 Working with Colleagues


This topic emphasized:

● Professional communication and teamwork.


● Active listening and respecting diverse perspectives.
● Conflict resolution and maintaining a positive work culture.
● Collaboration tools like Slack, Zoom, and Microsoft Teams.

Students learned through role-playing exercises and simulated team projects.



9.15 Project Discussion
Students presented their projects, discussed challenges faced, and
received constructive feedback from peers and instructors. The discussion
included:

● Project idea explanation.
● Data collection methods.
● Data analysis process.
● Interpretation of results.
● Future improvements.

It encouraged teamwork, critical thinking, and presentation skills.

9.16 Data Handling and Privacy


A vital part of data analysis is ensuring responsible data handling.

Key Concepts Covered:

● Data privacy laws (like GDPR).


● Ethical use of personal data.
● Protecting sensitive data from unauthorized access.
● Techniques like anonymization and encryption.
● Data governance policies in IT projects.

Students also reviewed case studies of data breaches and their impact.

9.17 Upgrading Skills


In the IT industry, continuous learning is essential. Students were introduced to:

● The importance of upskilling and reskilling.


● Platforms for online learning: Coursera, Udemy, LinkedIn Learning.
● Following industry blogs, attending webinars, and joining tech communities.
● Setting personal development goals.


9.18 Lab Session: Building and Maintaining Professional Relations


This lab session focused on:

● Networking techniques in the IT sector.
● Using LinkedIn for professional connections.
● Email and workplace etiquette.
● Building trust and effective long-term professional relationships.
● Group discussions and mock interviews.

9.19 Revision & Test


In the final segment:

● All theoretical and practical topics were revised.


● Mock assessments were conducted covering Python, NumPy, Pandas, Power
BI, SQL, data privacy, and work management.
● Students received feedback and suggestions for improvement.


CHAPTER 10
Conclusion and Bibliography

10.1 Conclusion
This report provided a comprehensive overview of essential concepts, tools, and
practices in the field of data analysis and information technology. Through
structured modules, practical sessions, and project work, students developed hands-on
experience with modern tools and techniques widely used in the IT industry.

Key learnings from this program included:

● Fundamentals of Artificial Intelligence and Big Data, and their real-world
applications.

● The importance of statistics and its role in data-driven decision-making.

● Practical use of MS Excel, Power BI, Python, NumPy, Pandas, and SQL for
data handling, analysis, and visualization.

● Understanding of data privacy, work management, and professional
relationship building within an IT environment.

● Execution of end-to-end data analysis projects, allowing students to apply their
theoretical knowledge to real datasets, extract insights, and present their findings
effectively.

The course not only enhanced technical abilities but also encouraged soft skills such as
teamwork, time management, and communication, which are equally important in
professional settings.

This well-rounded training program prepared students to confidently handle data
analysis tasks, collaborate within teams, manage projects, and continuously upgrade
their skills to stay relevant in a rapidly evolving technology landscape.


10.2 Bibliography
The following resources and references were used throughout the preparation of this
report and the associated training program:

1. Python Documentation
https://docs.python.org/3/

2. NumPy Official Guide
https://numpy.org/doc/stable/

3. Pandas User Guide
https://pandas.pydata.org/docs/

4. Microsoft Excel Documentation
https://support.microsoft.com/excel

5. Power BI Learning Resources
https://learn.microsoft.com/en-us/power-bi/

6. SQL Tutorials - W3Schools
https://www.w3schools.com/sql/

7. Coursera - Data Analysis and Visualization Courses
https://www.coursera.org/

8. Khan Academy - Statistics and Probability
https://www.khanacademy.org/math/statistics-probability

9. Big Data Concepts - IBM Big Data & Analytics Hub
https://www.ibmbigdatahub.com/

10. Data Privacy and Security Guidelines
https://gdpr.eu/

11. Artificial Intelligence Overview - IBM Cloud Learn Hub
https://www.ibm.com/cloud/learn/what-is-artificial-intelligence
