Task 8: Use the predict () function to make predictions from that model on new data.
The
new dataset must have all of the columns from the training data, but they can be in a
different order with different values
Tools: RStudio, Python
Aim
To build a predictive model using machine learning, train it on a given dataset, and use the
predict() function to make predictions on a new dataset with the same feature columns but
different values.
Procedure
1. Load the necessary libraries in R or Python.
2. Load the training dataset and preprocess it if required.
3. Train a regression or classification model using the training dataset.
4. Load the new dataset, ensuring it has the same columns as the training dataset.
5. Use the predict() function to generate predictions for the new dataset.
6. Evaluate the predictions (optional).
Sample Dataset
We will use a dataset that contains work experience (years) as an independent variable
(X) and salary (USD) as a dependent variable (Y).
Training Data (train.csv)
Employee Experience (Years) Salary (USD)
1 1 3000
2 2 3500
3 3 4000
4 4 4500
5 5 5000
6 6 5500
New Data (new_data.csv)
Employee Experience (Years)
7 2.5
8 4.5
9 5.5
Python Implementation using Scikit-Learn
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
# Load the training dataset
train_data = pd.DataFrame({
'Experience': [1, 2, 3, 4, 5, 6],
'Salary': [3000, 3500, 4000, 4500, 5000, 5500]
})
# Split into X (features) and y (target)
X_train = train_data[['Experience']]
y_train = train_data['Salary']
# Train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Load the new dataset (without salary)
new_data = pd.DataFrame({'Experience': [2.5, 4.5, 5.5]})
# Make predictions
predictions = model.predict(new_data)
# Display predictions
new_data['Predicted Salary'] = predictions
print(new_data)
Output:
Experience Predicted Salary
0 2.5 3750.0
1 4.5 4750.0
2 5.5 5250.0
Results
The model was successfully trained using a linear regression approach.
• Using the predict() function, we estimated salaries for new employees based on
their experience.
• The predictions suggest that a linear increase in salary occurs with increasing years of
experience.