PRESIDENCY UNIVERSITY
School of Computer Science and Engineering
INTERNSHIP REPORT
PIP4004 - INTERNSHIP
Data Analytics
STUDENT DETAILS
Name: Hari Teja Reddy Kusam
Roll No: 20211CBC0011
Section: 8 - CBC -1
Batch No.: CBC_01
SUPERVISION & COORDINATORS
Program: CSE (Blockchain)
Head of Department (HoD): Dr. S. Pravintha Raja
Program Project Coordinator: Suma N G
School Project Coordinators:
- Mr. Md Ziaur Rahman
- Dr. Sampath A K
- Dr. Abdul Khadar A
Supervised by: Mr. Ramamurthy Ketha, Assistant Professor, School of Computer Science and
Engineering, Presidency University.
ABSTRACT
Data analytics is the process of examining, cleaning, transforming, and modeling data to
derive useful insights, support decision-making, and optimize business operations. It
involves various techniques, including data mining, statistical analysis, machine
learning, and data visualization.
The primary goal of data analytics is to uncover patterns, trends, and correlations within
large datasets, enabling organizations to make data-driven decisions. Businesses across
industries use data analytics for forecasting, customer behavior analysis, fraud
detection, operational efficiency, and performance monitoring.
There are four key types of data analytics:
1. Descriptive Analytics – Summarizes historical data to understand past trends.
2. Diagnostic Analytics – Identifies reasons behind past outcomes using data
patterns.
3. Predictive Analytics – Uses statistical models and machine learning to forecast
future trends.
4. Prescriptive Analytics – Recommends actionable strategies based on insights.
Tools such as SQL, Python, R, Power BI, Tableau, and Hadoop play a crucial role in
handling and visualizing data efficiently. As organizations increasingly rely on big data, data
analytics continues to drive innovation, competitiveness, and efficiency in various
domains, including healthcare, finance, marketing, and supply chain management.
CHAPTER - 1
INTRODUCTION
1.1 Overview of the Internship and Objectives
Data analytics plays a crucial role in modern decision-making across industries.
Organizations leverage data analytics to extract meaningful insights, optimize operations,
and drive business strategies. This internship provided hands-on experience in data
mining, data analysis, and data visualization, equipping me with essential skills for
handling real-world datasets.
The primary objective of this internship was to collect, process, analyze, and visualize
Training & Placement Officer (TPO) details from various colleges to aid in recruitment
and placement strategies. The tasks involved:
● Gathering structured and unstructured data from multiple sources.
● Cleaning and standardizing the collected data for accuracy and consistency.
● Performing SQL-based data analysis to identify trends and insights.
● Creating interactive dashboards in Power BI to present findings effectively.
Through this internship, I developed a strong understanding of how data-driven
decision-making impacts recruitment processes and gained hands-on experience with data
analytics tools such as SQL, Excel, and Power BI.
1.2 Importance of Data Analytics in Professional Training and Recruitment
The application of data analytics in the education and recruitment sectors is growing
significantly. Companies and institutions are increasingly relying on data-driven insights to
improve hiring strategies and optimize candidate selection.
Key benefits of using data analytics in recruitment include:
● Enhanced decision-making: Identifying key trends in placement and recruitment.
● Efficient data management: Storing and maintaining structured data for future
use.
● Improved candidate targeting: Identifying the right candidates based on data
trends.
● Performance tracking: Analyzing past recruitment trends to refine future
strategies.
By utilizing SQL for data extraction, Excel for processing, and Power BI for
visualization, organizations can streamline their recruitment processes and improve
decision-making. The insights gained from this internship can help colleges and
companies enhance their placement and hiring strategies using structured, data-driven
methodologies.
The sections above outline the internship objectives and the role of data analytics in
professional training and recruitment. The remaining sections of this chapter set out the
project scope and methodology, and later chapters examine the challenges and findings in
greater detail.
1.3 Scope of the Project
The scope of this project extends across various stages of data collection, processing,
analysis, and visualization, with a primary focus on leveraging data analytics for
recruitment and placement strategies. This internship project was designed to create a
structured dataset of Training & Placement Officer (TPO) details from different colleges
and analyze key trends to improve placement processes.
The key areas covered in this project include:
● Data Collection: Gathering contact details of TPOs, Vice Chancellors, and Heads of
Departments from various colleges.
● Data Cleaning: Removing duplicates, correcting inconsistencies, and standardizing
formats to maintain data integrity.
● Data Analysis: Using SQL queries to filter, manipulate, and derive insights from the
collected data.
● Data Visualization: Creating interactive dashboards and reports in Power BI to
represent trends and findings.
● Decision Making: Providing actionable insights that can help organizations
improve their campus recruitment strategies.
This project also demonstrates a real-world application of data analytics in educational
institutions and recruitment firms, and the resulting dataset and dashboards are intended as
a practical resource for placement coordinators and hiring teams. Because the project
combines statistical analysis, automation, and visualization techniques, its findings can
help streamline recruitment efforts and improve efficiency in data management.
1.4 Methodology Adopted
The methodology adopted for this project follows a structured and iterative data
analytics workflow, ensuring a systematic approach to data handling and
interpretation. The process was divided into the following key phases:
● Data Collection:
○ Extracting TPO details from various sources such as online directories,
college websites, and databases.
○ Utilizing web scraping techniques and manual collection methods to
compile relevant data.
● Data Cleaning & Preprocessing:
○ Identifying and removing duplicate entries.
○ Standardizing formats such as phone numbers, email addresses, and
institution names.
○ Handling missing or incorrect values to ensure data consistency and
reliability.
● Data Storage & Management:
○ Structuring and storing data using SQL databases and Excel spreadsheets.
○ Organizing information into well-defined tables for easy retrieval and
analysis.
● Data Analysis & Trend Identification:
○ Using SQL queries to filter and aggregate data.
○ Conducting statistical analysis to derive patterns, trends, and
correlations.
● Data Visualization & Reporting:
○ Creating interactive dashboards in Power BI to visually represent key
insights.
○ Generating charts, graphs, and tables to highlight placement trends and
recruitment efficiency.
○ Ensuring data-driven storytelling to enhance stakeholder understanding
and decision-making.
● Conclusion & Recommendations:
○ Summarizing key findings from the data analysis.
○ Providing actionable recommendations to improve campus recruitment
strategies.
○ Identifying potential areas for future enhancement in data analytics for
education and recruitment.
By following this methodical approach, this project demonstrates how data analytics
can transform raw information into meaningful insights that help organizations
improve their recruitment efficiency and decision-making processes.
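The cleaning and preprocessing phase described in this section can be illustrated with a minimal Python sketch. It is not the code used during the internship. The field names (`college`, `email`, `phone`), the helper functions, the sample records, and the assumption that a valid phone number has ten digits after stripping a country code are all hypothetical choices made for illustration.

```python
import re

def normalize_phone(raw):
    """Keep digits only and return the last 10; None if fewer than 10 remain.
    The 10-digit rule is an assumption made for this sketch."""
    digits = re.sub(r"\D", "", raw or "")
    return digits[-10:] if len(digits) >= 10 else None

def normalize_email(raw):
    """Trim and lower-case; return None when the value does not look like an address."""
    email = (raw or "").strip().lower()
    return email if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email) else None

def normalize_name(raw):
    """Collapse repeated whitespace and apply title case to an institution name."""
    return " ".join((raw or "").split()).title()

def clean_records(records):
    """Standardize each record, drop duplicates, and flag incomplete rows."""
    seen = set()
    cleaned = []
    for rec in records:
        row = {
            "college": normalize_name(rec.get("college")),
            "email": normalize_email(rec.get("email")),
            "phone": normalize_phone(rec.get("phone")),
        }
        key = (row["college"], row["email"], row["phone"])
        if key in seen:
            continue  # same contact once formats are standardized
        seen.add(key)
        # Flag rather than delete, so incomplete rows can be reviewed later.
        row["incomplete"] = row["email"] is None or row["phone"] is None
        cleaned.append(row)
    return cleaned

# Invented sample rows; the first two describe the same contact.
sample = [
    {"college": "  college a ", "email": "TPO@Example.edu ", "phone": "+91 98765-43210"},
    {"college": "College  A", "email": "tpo@example.edu", "phone": "9876543210"},
    {"college": "College B", "email": "not-an-email", "phone": "12345"},
]
cleaned = clean_records(sample)
for row in cleaned:
    print(row)
```

Standardizing before checking for duplicates matters here: the first two sample rows differ only in spacing, letter case, and phone formatting, so they collapse into one record only after normalization. Title case is a simplification and would need an exception list for abbreviated institution names.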
CHAPTER - 2
LITERATURE REVIEW
2.1 Data Mining
2.1.1 Definition and Importance
Data mining is the process of extracting, cleaning, and organizing large volumes of raw
data to identify patterns, relationships, and useful insights. It is widely used across
industries to improve decision-making, business intelligence, and strategic planning.
2.1.2 Techniques Used
Techniques commonly used to collect and prepare data for mining include:
● Web Scraping: Automated extraction of data from websites.
● Manual Data Collection: Gathering information from publicly available records.
● Database Extraction: Pulling structured data from relational databases using SQL
queries.
● Data Cleaning: Standardizing, removing duplicates, and handling missing values to
improve data accuracy.
In this project, data mining was used to collect and structure Training & Placement
Officer (TPO) details from multiple colleges. The process involved extracting, verifying,
and organizing the collected data for further analysis.
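As an illustration of the web scraping technique listed above, the following sketch extracts e-mail addresses from `mailto:` links using only Python's standard library. The HTML snippet, the addresses, and the `MailtoCollector` class are hypothetical; the report does not state which scraping tools or scripts were actually used, and a real scraper would also need to fetch pages and respect each site's terms of use.

```python
from html.parser import HTMLParser

class MailtoCollector(HTMLParser):
    """Collect e-mail addresses from <a href="mailto:..."> links."""

    def __init__(self):
        super().__init__()
        self.emails = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().startswith("mailto:"):
            # Drop the scheme and any "?subject=..." suffix.
            address = href[len("mailto:"):].split("?", 1)[0].strip()
            if address and address not in self.emails:
                self.emails.append(address)

def extract_emails(html):
    """Return the unique mailto addresses found in an HTML string, in page order."""
    parser = MailtoCollector()
    parser.feed(html)
    return parser.emails

# Invented page fragment standing in for a college placement-cell page.
page = """
<html><body>
  <h2>Training and Placement Cell</h2>
  <p>Contact: <a href="mailto:tpo@example.edu">Placement Officer</a></p>
  <p>Office: <a href="mailto:placements@example.edu?subject=Campus%20drive">Write to us</a></p>
  <p><a href="/about">About the college</a></p>
</body></html>
"""
print(extract_emails(page))
```

Parsing link attributes instead of searching the raw text with a regular expression avoids picking up addresses that appear inside scripts or comments, at the cost of missing addresses written as plain text.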
2.2 Data Analysis & Statistical Techniques
2.2.1 Role of Data Analysis in Decision-Making
Data analysis is crucial for transforming raw data into actionable insights. It involves the
application of statistical methods, querying techniques, and pattern recognition to
derive meaningful trends that can influence business or academic decisions.
2.2.2 Tools and Technologies Used in Data Analysis
● SQL (Structured Query Language): Used for querying and manipulating large
datasets.
● Python & Pandas: Used for statistical analysis and data transformation.
● Excel: Applied for organizing and structuring tabular data.
For this internship project, SQL was used extensively to filter and analyze TPO data.
Statistical techniques such as descriptive analysis, trend forecasting, and clustering
were also utilized to gain deeper insights.
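The kind of SQL filtering and aggregation described above can be sketched with Python's built-in `sqlite3` module. The table name `tpo_contacts`, its columns, and the four sample rows are assumptions made for this example; they are not the schema or data from the internship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tpo_contacts (
        id      INTEGER PRIMARY KEY,
        college TEXT NOT NULL,
        state   TEXT NOT NULL,
        email   TEXT,
        phone   TEXT
    )
    """
)
# Invented rows; None marks a value that could not be collected.
rows = [
    ("College A", "Karnataka", "tpo@a.example.edu", "9000000001"),
    ("College B", "Karnataka", None, "9000000002"),
    ("College C", "Tamil Nadu", "tpo@c.example.edu", None),
    ("College D", "Karnataka", "tpo@d.example.edu", "9000000004"),
]
conn.executemany(
    "INSERT INTO tpo_contacts (college, state, email, phone) VALUES (?, ?, ?, ?)",
    rows,
)

# Filter: colleges in one state whose TPO can be reached by e-mail.
reachable = conn.execute(
    "SELECT college FROM tpo_contacts "
    "WHERE state = ? AND email IS NOT NULL ORDER BY college",
    ("Karnataka",),
).fetchall()

# Aggregate: contacts per state and how many of them have an e-mail on record.
# COUNT(email) skips NULLs, while COUNT(*) counts every row.
per_state = conn.execute(
    """
    SELECT state, COUNT(*) AS contacts, COUNT(email) AS with_email
    FROM tpo_contacts
    GROUP BY state
    ORDER BY contacts DESC, state
    """
).fetchall()

print(reachable)   # [('College A',), ('College D',)]
print(per_state)   # [('Karnataka', 3, 2), ('Tamil Nadu', 1, 1)]
```

Passing the state as a `?` parameter rather than formatting it into the query string keeps the query safe if the value ever comes from user input.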
2.3 Data Visualization
2.3.1 Importance of Data Visualization
Data visualization helps in presenting complex datasets in a simplified, interactive
manner, making insights more understandable and accessible to stakeholders.
2.3.2 Tools for Data Visualization
● Power BI: A business intelligence tool used to create interactive dashboards and
reports.
● Tableau: Another widely used visualization tool for analyzing big data.
● Matplotlib & Seaborn (Python): Used for plotting graphs and data distribution
analysis.
For this project, Power BI was used to develop dashboards that visually represent key
trends in TPO recruitment data, including:
● Geographical distribution of TPO contacts.
● Analysis of missing or incomplete records.
● Trend identification for recruitment success rates.
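The analysis of missing or incomplete records can be prepared outside Power BI before it is charted. The sketch below computes a per-field completeness ratio that a dashboard could then display. The function names, field names, and sample records are hypothetical and only show one possible way to produce such a summary.

```python
def completeness(records, fields):
    """Return, per field, the share of records with a non-empty value (0.0 to 1.0)."""
    total = len(records)
    summary = {}
    for field in fields:
        filled = sum(1 for rec in records if (rec.get(field) or "").strip())
        summary[field] = filled / total if total else 0.0
    return summary

def incomplete_records(records, fields):
    """Return the records that are missing at least one of the given fields."""
    return [
        rec for rec in records
        if any(not (rec.get(f) or "").strip() for f in fields)
    ]

# Invented records; empty strings and None both count as missing.
contacts = [
    {"college": "College A", "email": "tpo@a.example.edu", "phone": "9000000001"},
    {"college": "College B", "email": "", "phone": "9000000002"},
    {"college": "College C", "email": "tpo@c.example.edu", "phone": None},
    {"college": "College D", "email": "tpo@d.example.edu"},
]
summary = completeness(contacts, ["email", "phone"])
to_review = incomplete_records(contacts, ["email", "phone"])

print(summary)                              # {'email': 0.75, 'phone': 0.5}
print([rec["college"] for rec in to_review])
```

Exporting `summary` as a small table gives the dashboard a ready-made measure, while `to_review` lists the records that still need manual follow-up.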
2.4 Research Gaps in Existing Methods
While data analytics has been widely adopted in recruitment and education, certain
challenges still exist in effectively managing and utilizing large-scale data:
1. Lack of Centralized Data Storage
○ Many institutions do not maintain centralized and structured databases
for Training & Placement data.
○ The absence of standardized formats makes data cleaning and integration
challenging.
2. Challenges in Manual Data Collection
○ Scattered and unstructured data requires extensive effort for compilation.
○ Data validation and accuracy checks consume significant time and
resources.
3. Limited Use of Visualization for Placement Analysis
○ Many institutions rely on traditional reporting formats instead of
dynamic dashboards.
○ There is a need for improved real-time analytics in the recruitment
process.
2.5 Summary of Literature Review
This chapter reviewed key aspects of data mining, analysis, and visualization relevant to
this internship project. The research highlighted existing challenges in TPO data
collection, accuracy, and utilization. The findings from this review have directly
influenced the methodology and approach adopted in this project, which will be
discussed in the next chapter.
3. Research Gaps of Existing Methods
This chapter expands on the gaps summarized in Section 2.4. It describes the limitations of
current methods for managing and using data in recruitment and placement strategies,
particularly TPO data, and explains how each gap affects report generation and data-driven
decision-making:
1. Lack of Centralized Data Storage
○ Description: Many educational institutions do not maintain a centralized,
structured database for storing TPO-related information. This results in
fragmented data spread across various sources, such as college websites,
directories, or manual records.
○ Impact on Report Generation: Without a unified repository, generating
comprehensive and accurate reports becomes time-consuming and prone to
errors. Analysts must gather data from multiple, often inconsistent sources,
leading to inefficiencies and incomplete insights.
2. Absence of Standardized Formats
○ Description: The lack of uniformity in data formats (e.g., phone numbers,
email addresses, institution names) complicates data integration and
cleaning processes.
○ Impact on Report Generation: Non-standardized data requires extensive
preprocessing to ensure consistency before it can be analyzed or visualized.
This increases the effort needed to produce reliable reports and delays the
reporting timeline.
3. Challenges in Manual Data Collection
○ Description: Much of the TPO data is scattered and unstructured, requiring
significant manual effort to compile. Additionally, validating the accuracy of
this data is resource-intensive.
○ Impact on Report Generation: Manual collection slows down the process
of creating datasets for analysis, while validation issues (e.g., outdated
contact details or duplicates) can compromise the quality of reports,
reducing their usefulness for decision-making.
4. Limited Use of Visualization for Placement Analysis
○ Description: Many institutions rely on traditional reporting methods (e.g.,
static tables or text-based summaries) rather than leveraging dynamic,
interactive visualization tools like dashboards.
○ Impact on Report Generation: The absence of advanced visualization
limits the ability to present complex data in an accessible and actionable
format. Stakeholders may struggle to interpret trends or insights quickly,
reducing the effectiveness of the reports.
5. Need for Real-Time Analytics
○ Description: Current methods often lack real-time data processing
capabilities, meaning reports may reflect outdated information rather than
current trends.
○ Impact on Report Generation: Without real-time updates, reports may fail
to provide timely insights, which is particularly critical in fast-paced
recruitment environments where placement strategies need to adapt quickly.
4. Proposed Methodology
The methodology follows a systematic approach to transform raw, unstructured data into
actionable insights. It is divided into distinct phases, each addressing a critical aspect of the
data analytics process. These phases are designed to overcome the research gaps identified
(e.g., lack of centralized storage, manual collection challenges, and limited visualization) and
ensure efficient report generation and decision-making.
1. Data Collection
○ Objective: Gather TPO details, including contact information of Training &
Placement Officers, Vice Chancellors, and Heads of Departments from
various colleges.
○ Methods:
■ Web Scraping: Automated extraction of data from college websites
and online directories using tools or scripts.
■ Manual Collection: Compiling data from publicly available records
or direct sources where automation was not feasible.
■ Database Extraction: Pulling relevant structured data from existing
relational databases, if available.
○ Purpose: To create a comprehensive dataset from diverse sources,
addressing the gap of scattered and unstructured data.
2. Data Cleaning & Preprocessing
○ Objective: Ensure data accuracy, consistency, and reliability for analysis.
○ Steps:
■ Duplicate Removal: Identifying and eliminating redundant entries
(e.g., repeated TPO contacts).
■ Format Standardization: Converting data into uniform formats
(e.g., consistent phone number styles, email structures, and
institution names).
■ Handling Missing/Incorrect Values: Filling gaps with verified data
where possible or flagging incomplete records for exclusion or
further investigation.
○ Purpose: To mitigate the absence of standardized formats and improve data
integrity, making it suitable for downstream analysis and reporting.
3. Data Storage & Management
○ Objective: Organize the cleaned data into a structured and accessible
format.
○ Approach:
■ SQL Databases: Storing data in relational tables with defined
schemas for efficient querying and retrieval.
■ Excel Spreadsheets: Maintaining supplementary datasets or
backups in tabular form for manual review or smaller-scale analysis.
○ Purpose: To establish a centralized storage system, addressing the research
gap of lacking centralized data repositories, and enabling scalable data
management.
4. Data Analysis & Trend Identification
○ Objective: Derive meaningful insights and patterns from the structured
dataset.
○ Techniques:
■ SQL Queries: Filtering, aggregating, and joining data to answer
specific questions (e.g., placement trends by region or institution).
■ Statistical Analysis: Applying descriptive statistics, trend
forecasting, or clustering to uncover correlations and key metrics
(e.g., recruitment success rates).
○ Purpose: To transform raw data into actionable insights, facilitating
data-driven decision-making for recruitment strategies.
5. Data Visualization & Reporting
○ Objective: Present findings in an interactive and understandable format for
stakeholders.
○ Tools and Outputs:
■ Power BI Dashboards: Creating dynamic visualizations such as
charts, graphs, and maps to highlight trends (e.g., geographical
distribution of TPOs, recruitment efficiency).
■ Reports: Generating summarized tables and narratives to support
the visual insights.
○ Approach:
■ Ensuring interactivity for stakeholders to explore data (e.g., filtering
by college or placement metrics).
■ Focusing on data-driven storytelling to make insights accessible and
actionable.
○ Purpose: To address the limited use of visualization in placement analysis
by providing modern, dynamic reporting tools that enhance stakeholder
understanding.
6. Conclusion & Recommendations
○ Objective: Summarize findings and provide actionable strategies based on
the analysis.
○ Steps:
■ Compiling key insights from the data analysis and visualization
phases.
■ Offering recommendations to improve campus recruitment
processes (e.g., targeting specific colleges with high placement
potential).
■ Identifying areas for future enhancement (e.g., integrating real-time
data feeds).
○ Purpose: To close the loop by translating insights into practical outcomes
and suggesting ways to address remaining gaps, such as the need for
real-time analytics.
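The storage and analysis phases above (relational tables with defined schemas, followed by join queries) can be sketched as follows using Python's built-in `sqlite3` module. The two-table layout, the column names, the role values, and the sample rows are assumptions for illustration only; the report does not specify the actual schema or database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript(
    """
    CREATE TABLE colleges (
        college_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL UNIQUE,
        state      TEXT NOT NULL
    );
    CREATE TABLE contacts (
        contact_id INTEGER PRIMARY KEY,
        college_id INTEGER NOT NULL REFERENCES colleges(college_id),
        role       TEXT NOT NULL CHECK (role IN ('TPO', 'Vice Chancellor', 'HoD')),
        email      TEXT NOT NULL,
        UNIQUE (college_id, role, email)
    );
    """
)

def add_contact(conn, college_id, role, email):
    """Insert a contact; return False if a constraint rejects the row (e.g. a duplicate)."""
    try:
        conn.execute(
            "INSERT INTO contacts (college_id, role, email) VALUES (?, ?, ?)",
            (college_id, role, email),
        )
        return True
    except sqlite3.IntegrityError:
        return False

# Invented sample data.
conn.execute("INSERT INTO colleges (name, state) VALUES (?, ?)", ("College A", "Karnataka"))
conn.execute("INSERT INTO colleges (name, state) VALUES (?, ?)", ("College B", "Tamil Nadu"))
add_contact(conn, 1, "TPO", "tpo@a.example.edu")
add_contact(conn, 1, "HoD", "hod@a.example.edu")
add_contact(conn, 2, "TPO", "tpo@b.example.edu")
duplicate_accepted = add_contact(conn, 1, "TPO", "tpo@a.example.edu")

# Join contacts to colleges to count TPO contacts per state.
tpo_by_state = conn.execute(
    """
    SELECT c.state, COUNT(*) AS tpo_contacts
    FROM contacts AS k
    JOIN colleges AS c ON c.college_id = k.college_id
    WHERE k.role = 'TPO'
    GROUP BY c.state
    ORDER BY c.state
    """
).fetchall()

print(duplicate_accepted)  # False
print(tpo_by_state)        # [('Karnataka', 1), ('Tamil Nadu', 1)]
```

Putting the uniqueness and role checks in the schema means duplicates and invalid roles are rejected at storage time, which complements the cleaning phase instead of relying on it alone.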
Key Features of the Proposed Methodology
● Iterative Workflow: The process is designed to allow refinement at each stage (e.g.,
revisiting data cleaning if new inconsistencies are found during analysis).
● Tool Integration: Combines SQL for data manipulation, Excel for
preprocessing/storage, and Power BI for visualization, leveraging the strengths of
each tool.
● Scalability: Structured storage and analysis methods ensure the approach can
handle larger datasets or additional data sources in the future.
● Focus on Actionability: Emphasis on generating insights and recommendations
that directly improve recruitment and placement strategies.
5. Objectives
- Extract, organize, and clean TPO details from different colleges.
- Perform SQL-based data analysis to identify trends.
- Develop Power BI dashboards for reporting.
6. System Design & Implementation
- Architecture and workflow of data collection and visualization.
- Implementation of SQL queries and Power BI dashboards.
7. Timeline for Execution of Project
- Month-wise breakdown of internship tasks.
8. Results & Discussions
- Findings from the data mining process.
- Insights from SQL analysis.
- Effectiveness of Power BI dashboards.
9. Conclusion
- Summary of key contributions.
- Future scope for improving data collection and analysis.
10. References
- Mode Analytics SQL Tutorial
- SQL Documentation
- Power BI Documentation
- Power BI Community Forum
- Google Data Analytics Professional Certificate