PANIMALAR ENGINEERING COLLEGE
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
REG.NO:
DATE :
EX.NO : 01
Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and
Pandas packages.
AIM:
To install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas
packages.
Procedure:
1. Installation
NumPy, SciPy, Jupyter, Statsmodels, and Pandas can be easily installed using Python's package
manager, pip. Open a terminal or command prompt and type the following commands one by one:
● pip install numpy
● pip install scipy
● pip install jupyter
● pip install statsmodels
● pip install pandas
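Once the commands above have finished, the installation can be checked from Python. The following is a minimal sketch, not part of the original procedure: the helper name installed_versions is hypothetical, and it relies only on the standard-library importlib.metadata module (Python 3.8 or later).

```python
from importlib.metadata import PackageNotFoundError, version

# Package names as passed to pip in the installation step
PACKAGES = ["numpy", "scipy", "jupyter", "statsmodels", "pandas"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(PACKAGES)
for name, ver in versions.items():
    print(f"{name}: {ver if ver else 'not installed'}")
```

A package reported as "not installed" can be installed again with the matching pip command.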
Explore the Features:
● NumPy: NumPy is a fundamental package for scientific computing with Python. It
provides support for arrays, matrices, and high-level mathematical functions to operate on
these arrays.
● SciPy: SciPy is built on top of NumPy and provides additional functionality for scientific
computing. It includes modules for optimization, integration, interpolation, linear algebra,
and more.
● Jupyter: Jupyter is a web-based interactive computing platform that allows you to create
and share documents containing live code, equations, visualizations, and narrative text.
● Statsmodels: Statsmodels is a Python module that provides classes and functions for
estimating many different statistical models, as well as for conducting statistical tests and
exploring data.
23AD1413 – FOUNDATIONS OF DATA SCIENCE LABORATORY
● Pandas: Pandas is a powerful data analysis and manipulation library for Python. It
provides data structures like Series and DataFrame, which are ideal for working with
structured data.
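The SciPy modules named above (integration, optimization, linear algebra) can be tried with a few short calls. This is a minimal sketch with made-up inputs chosen only because their answers are easy to check by hand; it is not part of the original exercise.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# Integration: area under f(x) = x**2 between 0 and 1 (exact value is 1/3)
area, abs_error = integrate.quad(lambda x: x ** 2, 0, 1)
print("Integral of x^2 on [0, 1]:", area)

# Optimization: the minimum of g(x) = (x - 2)**2 lies at x = 2
result = optimize.minimize_scalar(lambda x: (x - 2) ** 2)
print("Minimizer of (x - 2)^2:", result.x)

# Linear algebra: solve 3x + y = 9 and x + 2y = 8 (solution x = 2, y = 3)
coefficients = np.array([[3.0, 1.0], [1.0, 2.0]])
constants = np.array([9.0, 8.0])
solution = linalg.solve(coefficients, constants)
print("Solution of the linear system:", solution)
```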
2. Launch Jupyter Notebook by running the command jupyter notebook in the terminal.
3. Explore the Features
● Create an array
● Perform element-wise operations
● Basic statistical functions
NUMPY
PROGRAM:
import numpy as np
print("===== 1. Array Creation =====")
arr1 = np.array([1, 2, 3, 4])
print("Array from list:", arr1)
arr2 = np.zeros((2, 3))
print("Array of zeros:\n", arr2)
arr3 = np.ones((3, 2))
print("Array of ones:\n", arr3)
arr4 = np.arange(0, 10, 2) # [0, 2, 4, 6, 8]
print("Array with range:", arr4)
arr5 = np.linspace(0, 1, 5) # [0. , 0.25, 0.5 , 0.75, 1.]
print("Array with linspace:", arr5)
print("\n===== 2. Array Operations =====")
arr6 = np.array([1, 2, 3, 4])
arr7 = np.array([5, 6, 7, 8])
sum_arr = arr6 + arr7
print("Array addition:", sum_arr)
prod_arr = arr6 * arr7
print("Array multiplication:", prod_arr)
exp_arr = arr6 ** 2
print("Array exponentiation:", exp_arr)
print("\n===== 3. Indexing and Slicing =====")
# Creating a 2D array
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Indexing
element = matrix[1, 2]
print("Element at [1, 2]:", element)
# Slicing: Extracting subarrays
sub_matrix = matrix[0:2, 1:3]
print("Sub-matrix:\n", sub_matrix)
# Boolean indexing
mask = matrix > 5
print("Elements greater than 5:\n", matrix[mask])
print("\n===== 4. Broadcasting =====")
arr8 = np.array([1, 2, 3])
arr9 = np.array([[10], [20], [30]]) # Shape (3, 1)
broadcasted_result = arr8 + arr9
print("Broadcasted result:\n", broadcasted_result)
print("\n===== 5. Linear Algebra =====")
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
# Matrix multiplication
matmul_result = np.dot(a, b)
print("Matrix multiplication result:\n", matmul_result)
# Determinant of a matrix
det_a = np.linalg.det(a)
print("Determinant of matrix a:", det_a)
# ===== 6. Statistical Operations =====
print("\n===== 6. Statistical Operations =====")
arr10 = np.array([1, 2, 3, 4, 5])
mean_val = np.mean(arr10)
print("Mean:", mean_val)
std_val = np.std(arr10)
print("Standard Deviation:", std_val)
median_val = np.median(arr10)
print("Median:", median_val)
OUTPUT:
===== 1. Array Creation =====
Array from list: [1 2 3 4]
Array of zeros:
[[0. 0. 0.]
[0. 0. 0.]]
Array of ones:
[[1. 1.]
[1. 1.]
[1. 1.]]
Array with range: [0 2 4 6 8]
Array with linspace: [0. 0.25 0.5 0.75 1. ]
===== 2. Array Operations =====
Array addition: [ 6 8 10 12]
Array multiplication: [ 5 12 21 32]
Array exponentiation: [ 1 4 9 16]
===== 3. Indexing and Slicing =====
Element at [1, 2]: 6
Sub-matrix:
[[2 3]
[5 6]]
Elements greater than 5:
[6 7 8 9]
===== 4. Broadcasting =====
Broadcasted result:
[[11 12 13]
[21 22 23]
[31 32 33]]
===== 5. Linear Algebra =====
Matrix multiplication result:
[[19 22]
[43 50]]
Determinant of matrix a: -2.0
===== 6. Statistical Operations =====
Mean: 3.0
Standard Deviation: 1.4142135623730951
Median: 3.0
PANDAS
import pandas as pd
import numpy as np
print("===== 1. Create DataFrame =====")
# Creating a DataFrame from a dictionary
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward'],
'Age': [24, 27, 22, 32, 29],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
'Salary': [70000, 80000, 120000, 90000, 100000]
}
df = pd.DataFrame(data)
print("DataFrame created from a dictionary:\n", df)
print("\n===== 2. DataFrame Operations =====")
age_column = df['Age']
print("Age column:\n", age_column)
row_2 = df.iloc[2]
print("\nRow 2:\n", row_2)
row_label = df.loc[1] # 1 is the index label for Bob
print("\nRow with label 1 :\n", row_label)
print("\n===== 3. Filtering and Conditions =====")
filtered_df = df[df['Age'] > 25]
print("Filtered DataFrame (Age > 25):\n", filtered_df)
# Filtering using multiple conditions (Age > 25 and Salary < 100000)
filtered_df_multi_cond = df[(df['Age'] > 25) & (df['Salary'] < 100000)]
print("\nFiltered DataFrame (Age > 25 and Salary < 100000):\n", filtered_df_multi_cond)
print("\n===== 4. Summary Statistics =====")
summary_stats = df.describe()
print("Summary statistics of numeric columns:\n", summary_stats)
mean_salary = df['Salary'].mean()
print("\nMean Salary:", mean_salary)
max_salary = df['Salary'].max()
print("\nMaximum Salary:", max_salary)
print("\n===== 5. Grouping Data =====")
# Group by 'City' and calculate the mean salary for each city
grouped_by_city = df.groupby('City')['Salary'].mean()
print("Average Salary grouped by City:\n", grouped_by_city)
print("\n===== 6. Sorting Data =====")
# Sorting the DataFrame by 'Salary' in descending order
sorted_by_salary = df.sort_values(by='Salary', ascending=False)
print("DataFrame sorted by Salary (descending):\n", sorted_by_salary)
# Sorting the DataFrame by 'Age' in ascending order
sorted_by_age = df.sort_values(by='Age', ascending=True)
print("\nDataFrame sorted by Age (ascending):\n", sorted_by_age)
print("\n===== 7. Adding and Removing Columns =====")
df['Experience'] = [2, 5, 1, 8, 4]
print("DataFrame with 'Experience' column added:\n", df)
df_dropped = df.drop(columns=['Experience'])
print("\nDataFrame after dropping 'Experience' column:\n", df_dropped)
print("\n===== 8. Merging DataFrames =====")
# Creating another DataFrame to merge with
data2 = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward'],
'Department': ['HR', 'IT', 'Finance', 'Marketing', 'Sales']
}
df2 = pd.DataFrame(data2)
# Merging the two DataFrames based on 'Name' column
merged_df = pd.merge(df, df2, on='Name')
print("Merged DataFrame:\n", merged_df)
print("\n===== 9. Handling Missing Data =====")
df_with_na = df.copy()
df_with_na.loc[1, 'Salary'] = np.nan # Introducing NaN for Bob's salary
print("DataFrame with missing data:\n", df_with_na)
# Fill missing data (for 'Salary' column, using the mean)
df_filled = df_with_na.fillna({'Salary': df['Salary'].mean()})
print("\nDataFrame after filling missing data:\n", df_filled)
# Dropping rows with missing data
df_dropped_na = df_with_na.dropna()
print("\nDataFrame after dropping rows with missing data:\n", df_dropped_na)
OUTPUT:
===== 1. Create DataFrame =====
DataFrame created from a dictionary:
Name Age City Salary
0 Alice 24 New York 70000
1 Bob 27 Los Angeles 80000
2 Charlie 22 Chicago 120000
3 David 32 Houston 90000
4 Edward 29 Phoenix 100000
===== 2. DataFrame Operations =====
Age column:
0 24
1 27
2 22
3 32
4 29
Name: Age, dtype: int64
Row 2:
Name Charlie
Age 22
City Chicago
Salary 120000
Name: 2, dtype: object
Row with label 1 :
Name Bob
Age 27
City Los Angeles
Salary 80000
Name: 1, dtype: object
===== 3. Filtering and Conditions =====
Filtered DataFrame (Age > 25):
Name Age City Salary
1 Bob 27 Los Angeles 80000
3 David 32 Houston 90000
4 Edward 29 Phoenix 100000
Filtered DataFrame (Age > 25 and Salary < 100000):
Name Age City Salary
1 Bob 27 Los Angeles 80000
===== 4. Summary Statistics =====
Summary statistics of numeric columns:
Age Salary
count 5.000000 5.0
mean 26.800000 94000.0
std 3.962323 19364.9
min 22.000000 70000.0
25% 24.000000 80000.0
50% 27.000000 90000.0
75% 29.000000 100000.0
max 32.000000 120000.0
Mean Salary: 94000.0
Maximum Salary: 120000
===== 5. Grouping Data =====
Average Salary grouped by City:
City
Chicag
RESULT:
Thus the program to explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas
packages is executed.
REG.NO:
DATE:
EX.NO : 02
PROGRAM TO REMOVE ROWS IN NUMPY ARRAY THAT
CONTAINS NON-NUMERIC VALUES
AIM:
The aim of the program is to remove rows from a NumPy array that contain non-numeric
values.
ALGORITHM:
● Import the NumPy library to work with arrays.
● Create a sample NumPy array.
● Check for non-numeric values: use a vectorized check (np.vectorize()) to test whether
each element represents a number (an integer or a float).
● Filter the rows containing only numeric values: use logical indexing to identify the
rows where all elements are numeric.
● Row validation: np.all(mask, axis=1) ensures that only rows with all numeric
values are retained.
● Remove any rows where at least one element is non-numeric.
● Print both the original array and the cleaned array (with non-numeric rows
removed).
PROGRAM:
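import numpy as np
def is_numeric(value):
    # True when the element can be parsed as a number
    try:
        float(value)
        return True
    except ValueError:
        return False
# Mixing integers and strings makes NumPy store every element as a string
data = np.array([[1, 2, 3], [4, 'x', 6], [7, 8, 9], ['a', 2, 3], [10, 11, 12]])
mask = np.vectorize(is_numeric)(data)
valid_rows = np.all(mask, axis=1)  # True for rows whose elements are all numeric
cleaned_data = data[valid_rows]
print("Original Array:")
print(data)
print("\nCleaned Array (rows with non-numeric values removed):")
print(cleaned_data)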
OUTPUT:
Original Array:
[['1' '2' '3']
['4' 'x' '6']
['7' '8' '9']
['a' '2' '3']
['10' '11' '12']]
Cleaned Array (rows with non-numeric values removed):
[['1' '2' '3']
['7' '8' '9']
['10' '11' '12']]
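The quoted values in this output show that NumPy converts a mixed list of integers and strings into an array of strings. An isinstance test against int and float only works if the elements keep their Python types. The following is a minimal sketch of that variant, assuming the array is created with dtype=object; the names data_obj, mask_obj and cleaned_obj are illustrative.

```python
import numpy as np

# dtype=object keeps every element as its original Python object
data_obj = np.array(
    [[1, 2, 3], [4, 'x', 6], [7, 8, 9], ['a', 2, 3], [10, 11, 12]],
    dtype=object,
)

# Element-wise type test: True for Python ints and floats, False for strings
is_number = np.vectorize(lambda value: isinstance(value, (int, float)))
mask_obj = is_number(data_obj)

# Keep only the rows in which every element passed the test
cleaned_obj = data_obj[np.all(mask_obj, axis=1)]
print(cleaned_obj)
```

With this variant the surviving rows hold integers rather than strings, at the cost of the slower object dtype.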
RESULT:
Thus the program to remove rows of a NumPy array that contain non-numeric
values is executed.
REG.NO:
DATE:
EX.NO : 03
CREATE AN EMPTY & A FULL NUMPY ARRAY
AIM:
The aim of the program is to create and initialize NumPy arrays using two different
functions empty() and full().
ALGORITHM:
● Install NumPy
● Import NumPy
● Creating an Empty Array using np.empty(shape)
● Creating a Full Array using np.full(shape, fill_value)
PROGRAM:
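import numpy as np
# np.empty() allocates memory without initializing it, so the values are arbitrary
empty_array = np.empty((3, 3))
print("Empty Array:")
print(empty_array)
# np.full() sets every element to the given fill value
full_array = np.full((3, 3), 7)
print("Full Array:")
print(full_array)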
OUTPUT:
Empty Array:
[[0.00000000e+000 1.77956813e-321 0.00000000e+000]
[6.93909653e-310 6.93909653e-310 0.00000000e+000]
[6.93909653e-310 6.93909653e-310 2.12199579e-314]]
Full Array:
[[7 7 7]
[7 7 7]
[7 7 7]]
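The values printed for the empty array are whatever happened to be in memory, so they differ from run to run. The following is a minimal sketch of how np.empty() compares with np.zeros(), and of how the fill value or the dtype argument sets the element type of np.full(); the variable names are illustrative.

```python
import numpy as np

# np.empty() only allocates memory; assign values before reading them
buffer = np.empty((2, 3))
buffer[:] = 0.0

# np.zeros() allocates and initializes in one step
zeros = np.zeros((2, 3))
print("Filled buffer equals zeros:", np.array_equal(buffer, zeros))

# The dtype of np.full() follows the fill value unless dtype is given
full_float = np.full((2, 2), 7.5)
full_int = np.full((2, 2), 7, dtype=np.int64)
print("dtype with a float fill value:", full_float.dtype)
print("dtype with dtype=np.int64:", full_int.dtype)

# np.full_like() copies the shape and dtype of an existing array
nines = np.full_like(zeros, 9)
print("full_like result:\n", nines)
```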
RESULT:
Thus the program to create and initialize NumPy arrays using two different functions empty()
and full() is executed.
REG.NO:
DATE:
ADDITIONAL PROGRAM : 01
CASE STUDY:
You are a data analyst working for a school district. The school wants to track the grades of
students in different subjects across multiple terms. You are tasked with performing basic
NumPy operations. You have grades for five students in three subjects for two terms.
TASK TO PERFORM:
1. Increase the grade of every student by 5 points in each subject.
2. Calculate the average grade for each student across all subjects.
3. Find the highest grade in each subject.
4. Extract the grades of student 3.
5. Reshape the grade array so that subjects become rows and students become columns.
6. Add 10 points to each student's Math grade and 5 points to the Science grade, with no
change in English.
AIM:
To analyze and track students' grades across different subjects and multiple terms using
NumPy. The analysis involves performing basic operations such as calculating the average,
maximum, and minimum grades for each subject and term.
ALGORITHM:
● Import Necessary Library:
Import NumPy to handle numerical computations efficiently.
● Define Grade Data:
Create a NumPy array to store students' grades.
The array should have dimensions corresponding to (students × subjects × terms).
● Perform Basic Operations:
Compute the average grade for each student in each term.
Find the highest and lowest grades in each subject across terms.
Calculate the overall average grade for each subject over both terms.
● Display the Results:
Print the processed data, including individual subject averages and term-wise
performance.
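The (students × subjects × terms) layout described in the algorithm can be sketched as a three-dimensional array. This is only an illustration: the term 1 values reuse the grade table of this exercise, the term 2 values are hypothetical (each grade 2 points higher), and the variable names are not from the original.

```python
import numpy as np

# Term 1 grades for 5 students in 3 subjects (Math, Science, English)
term1 = np.array([
    [75, 80, 85],
    [88, 76, 90],
    [92, 85, 87],
    [78, 88, 82],
    [85, 89, 84],
])
# Hypothetical term 2 grades, for illustration only
term2 = term1 + 2

# Stack the terms along a new last axis: shape (students, subjects, terms)
grades_3d = np.stack([term1, term2], axis=2)
print("Shape:", grades_3d.shape)

# Average grade for each student in each term (average over the subject axis)
student_term_avg = grades_3d.mean(axis=1)
print("Average per student and term:\n", student_term_avg)

# Highest and lowest grade in each subject across all students and both terms
subject_max = grades_3d.max(axis=(0, 2))
subject_min = grades_3d.min(axis=(0, 2))
print("Highest per subject:", subject_max)
print("Lowest per subject:", subject_min)

# Overall average for each subject over both terms
subject_avg = grades_3d.mean(axis=(0, 2))
print("Average per subject:", subject_avg)
```

Passing a tuple to axis reduces over several axes at once, which is what collapses students and terms into a single value per subject.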
PROGRAM:
import numpy as np
grades = np.array([
[75, 80, 85],
[88, 76, 90],
[92, 85, 87],
[78, 88, 82],
[85, 89, 84]
])
# 1. Increase all grades by 5 points
grades += 5
print("Grades after adding 5 points:")
print(grades)
# 2. Calculate the average grade for each student across all subjects
average_grades = np.mean(grades, axis=1)
print("\nAverage grade for each student:")
print(average_grades)
# 3. Find the highest grade in each subject
highest_grades = np.max(grades, axis=0)
print("\nHighest grade in each subject:")
print(highest_grades)
# 4. Extract grade of Student 3 (index 2 in 0-based indexing)
student_3_grades = grades[2]
print("\nGrades of Student 3:")
print(student_3_grades)
# 5. Reshape array so that subjects become rows and students become columns
reshaped_grades = grades.T
print("\nReshaped grades (Subjects as rows, Students as columns):")
print(reshaped_grades)
# 6. Add 10 points to Math, 5 to Science, and no change in English
grades[:, 0] += 10 # Math
grades[:, 1] += 5 # Science (English remains the same)
print("\nGrades after specific modifications:")
print(grades)
OUTPUT:
Grades after adding 5 points:
[[80 85 90]
[93 81 95]
[97 90 92]
[83 93 87]
[90 94 89]]
Average grade for each student:
[85. 89.66666667 93. 87.66666667 91. ]
Highest grade in each subject:
[97 94 95]
Grades of Student 3:
[97 90 92]
Reshaped grades (Subjects as rows, Students as columns):
[[80 93 97 83 90]
[85 81 90 93 94]
[90 95 92 87 89]]
Grades after specific modifications:
[[ 90 90 90]
[103 86 95]
[107 95 92]
[ 93 98 87]
[100 99 89]]
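Two of the Math grades above (103 and 107) exceed 100 after the adjustments. If grades are meant to stay within 0 to 100, which is an assumption the case study does not state, np.clip() can cap them. A minimal sketch using the final array from the output:

```python
import numpy as np

# Final grades as printed in the output above
adjusted = np.array([
    [90, 90, 90],
    [103, 86, 95],
    [107, 95, 92],
    [93, 98, 87],
    [100, 99, 89],
])

# Assumed valid range of 0 to 100; values outside it are set to the nearest bound
capped = np.clip(adjusted, 0, 100)
print("Grades capped at 100:")
print(capped)
```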
RESULT:
Thus the Student performance analysis is executed successfully.