
UNIVERSITY OF ECONOMICS AND FINANCE

FACULTY OF INFORMATION TECHNOLOGY

PROJECT

TOPIC: DATA ANALYST JOB MARKET ANALYSIS

Course : Business Intelligence


Course code : EBU1134E
Class : A01E
Lecturer : Tran Thanh Cong
Group : 03
Members : Nguyen Thi Bich Ngoc – 205120647
Vo Nguyen Khanh Linh – 205120786
Vo Duy Tung – 205120646
Le Hoang Nam – 205120395
Vo Trung Thanh - 205120577

Ho Chi Minh City – 2024


TABLE OF CONTENTS

CHAPTER 1: INTRODUCTION
I. Overview about Google Colab
II. Overview about Kaggle
III. Overview about topic

CHAPTER 2: IDENTIFYING THE PROBLEM
I. Assessment of the current job market
II. The need of the problem
III. The goal of the project

CHAPTER 3: DATA DESCRIPTION
I. Data source
II. Data collection
III. Data preparation
1. Import Python library
2. Read data
3. Head of data
4. Tail of data
5. Info of data
6. Statistics summary

CHAPTER 4: DATA CLEANING
I. Check duplicate
II. Missing value calculation
III. Data Reduction
IV. Data Cleaning
1. Identifying and Displaying Attribute Columns with Missing Values
2. Handling missing values
3. Continue handling missing values
4. Renaming columns in data frame
5. Check Null
6. Continue to handle the remaining missing values
7. Check to see if there are any remaining missing values
8. Cleaned data table

CHAPTER 5: EXPLORATORY DATA ANALYSIS (EDA)
I. Goal of EDA
II. EDA Univariate Analysis
1. Top 10 Industries with most jobs
2. Top 10 companies with the most job postings
3. Top 10 most common job titles
4. Top 10 In-Demand Skills
5. Top 10 Employee Benefits Offered by Companies
6. Distribution of Work Types Offered by Jobs
7. Distribution of Company Size
III. Data Transformation
1. The number of applications for job positions
2. The number of views for job posting
IV. EDA Bivariate Analysis
1. Top 5 Most Required Skills by Salary Level
2. Average Salaries Offered by Top Employers
V. EDA Multivariate Analysis
1. Top 10 View and Applies Title for jobs
2. Number of Views and Applies for Top 10 Companies
3. Heatmap

CHAPTER 6: CONCLUSION AND RECOMMENDATION
I. CONCLUSION
II. RECOMMENDATION

ASSIGNMENT SHEET

REFERENCES

LIST OF FIGURES

Figure 1: Import Python into Google Colab (Source: Group 3)
Figure 2: Import data file into Google Colab (Source: Group 3)
Figure 3: Read data (Source: Group 3)
Figure 4: Head of data (Source: Group 3)
Figure 5: Tail of data (Source: Group 3)
Figure 6: Info of data (Source: Group 3)
Figure 7: statistics summary (Source: Group 3)
Figure 8: Check duplicate (Source: Group 3)
Figure 9: Missing Value Calculation 1 (Source: Group 3)
Figure 10: Missing Value Calculation 2 (Source: Group 3)
Figure 11: Data reduction 1 (Source: Group 3)
Figure 12: Data reduction 2 (Source: Group 3)
Figure 13: Data cleaning 1 (Source: Group 3)
Figure 14: Data cleaning 2 (Source: Group 3)
Figure 15: Data cleaning 3 (Source: Group 3)
Figure 16: Data cleaning 4 (Source: Group 3)
Figure 17: Data cleaning 5 (Source: Group 3)
Figure 18: Data cleaning 6 (Source: Group 3)
Figure 19: Data cleaning 7 (Source: Group 3)
Figure 20: Data cleaning 7 (Source: Group 3)
Figure 21: Top 10 Industries with Most job (Source: Group 3)
Figure 22: Top 10 Companies with Most job Postings (Source: Group 3)
Figure 23: Top 10 Most Common job Titles (Source: Group 3)
Figure 24: Top 10 In-Demand Skills (Source: Group 3)
Figure 25: Top 10 Employee Benefits Offered by Companies (Source: Group 3)
Figure 26: Distribution of Work Types Offered by jobs (Source: Group 3)
Figure 27: Distribution of Company Sizes (Source: Group 3)
Figure 28: The number of applications for job positions (Source: Group 3)
Figure 29: The number of views for job posting (Source: Group 3)
Figure 30: Top 5 Most Required Skills by Salary Level (Source: Group 3)
Figure 31: Average Salaries Offered by Top Employers (Source: Group 3)
Figure 32: Top 10 Views and Applies Title for jobs (Source: Group 3)
Figure 33: The Number of Views and Applies for Top 10 companies (Source: Group 3)
Figure 34: Heatmap (Source: Group 3)

LIST OF TABLES

Table 1: Assignment sheet (Source: Group 3)

BUSINESS INTELLIGENCE GROUP 3

PREFACE
The year 2023 marks an important stage in the global economy's recovery after the
fluctuations and challenges of the previous year. As stability returns, the job market has
become more vibrant and diverse. Changes in the economy, technology, and society have
strongly affected how employers find and select talent and how workers find jobs and
build careers.
Data has become an extremely valuable resource, and the ability to use it effectively is
one of the determining factors in an organization's success. To analyze job market trends
in 2023, we used Business Intelligence (BI), a powerful approach that changes the way we
perceive and manage information. Business Intelligence is not just about collecting data;
it is also about turning that data into valuable information, using technology and
analytical processes to build both an overview of and deeper insight into an
organization's operations. Drawing on data from many sources, such as social networks
and connected devices, BI helps us understand how an organization operates and uncover
trends and patterns. From there, strategic decisions can be made that suit the projects of
individuals, organizations, and businesses.
Our group, from class A01E, would like to sincerely thank Mr. Tran Thanh Cong, lecturer
of "Business Intelligence", for supporting us in implementing our ideas and helping us
complete this project step by step. We are deeply grateful to him for accompanying us
throughout the semester and for providing much useful knowledge related to the research
process of this project. Because the knowledge, qualifications, and practical experience
of all members are limited, and the time available to research the topic was short, this
report inevitably contains mistakes.
Therefore, we look forward to receiving your comments and suggestions so that we can
learn from this experience and improve in future projects. Once again, our group
sincerely thanks you.

CHAPTER 1: INTRODUCTION
I. Overview about Google Colab
Google Colab, short for Google Colaboratory, is a cloud-based platform from Google
that provides free access to a Jupyter notebook environment. It allows users to write
and execute Python code in a web browser, collaborate with others, and take
advantage of Google's hardware infrastructure without any setup on the local
machine. In this project we use Google Colab for the analysis because it makes our
work easier and more efficient.

II. Overview about Kaggle


Kaggle is an online platform for data science enthusiasts that offers data
competitions, datasets, and collaborative notebooks. It hosts competitions in which
users develop predictive models, provides datasets for analysis, and offers tools
such as notebooks for coding and sharing. Kaggle also has courses, discussion forums,
and job boards, making it a one-stop shop for data science learning, collaboration,
and career opportunities. In this project, we use Kaggle to find a dataset that suits
our topic and supports a clear analysis.

III. Overview about topic


Within the framework of the Business Intelligence thematic report, our team chose
the topic of Job Market Analysis.
The aim of this project is to provide useful information to job seekers, employers,
and recruiters, helping them make informed choices in a changing job market. The
Kaggle dataset is rich and offers many directions for exploration, including finding
the highest-paying job titles, companies, and locations; predicting salaries; and
examining how industries and companies differ in their internship offerings and
benefits.
We use the Python programming language to process and visualize the data.

CHAPTER 2: IDENTIFYING THE PROBLEM


I. Assessment of the current job market
In 2020 and 2021, the world faced a global crisis that shook the core of the job
market: the COVID-19 pandemic. The pandemic caused mass layoffs and economic
instability. With growing economic uncertainty leading many employers to pull back
on hiring plans, the job market appears to have cooled after the rapid pace of the
post-pandemic recovery.
The job market in 2023 presents many challenges and opportunities driven by a range
of factors. Technology plays a key role, as advances in artificial intelligence,
automation, and big data continue to change the labor landscape. Jobs that require
digital skills and the ability to adapt quickly are likely to increase, while some
traditional jobs may be replaced or reshaped by technology. This creates the
challenge of retraining and developing the skills of the existing workforce, while
also creating opportunities for those who are creative and flexible in using
technology. It is also necessary to pay attention to the risk of job loss and to
strengthen protection for affected workers. At the same time, the growth of new
industries such as information technology, digital health, and renewable energy
provides new opportunities for job creation and economic development, which in turn
poses challenges in resource allocation and workforce training to meet new market
demands.

II. The need of the problem


Analyzing employment trends is an indispensable part of building and managing a
career. It helps us understand the current labor market through the trends,
opportunities, and challenges in the profession we are interested in, and it provides
an overview of the recruitment situation, developments in job sectors, and the skills
employers require. Analyzing employment trends also helps shape careers more
effectively: by identifying careers that are growing and have future potential, we
can decide on a suitable and promising career path. At the same time, this analysis
helps identify the skills and knowledge needed to advance further in our careers.

III. The goal of the project


Studying LinkedIn job posting data from 2023 is an important step in understanding
and predicting future labor market trends. It provides an overview of skills demand,
the distribution of job opportunities across different sectors, and the factors that
influence pay gaps. By analyzing data from LinkedIn, we can uncover important trends
and provide detailed insights into the labor market. Specifically, this research
focuses on three main aspects:
• Skills demand: By analyzing job posting data from LinkedIn, it is possible to
identify the most in-demand skills in the 2023 labor market. This helps
individuals and organizations better understand which skills are needed to
succeed in their profession and to prepare for future careers.
• Distribution of job opportunities across different industries: Data from
LinkedIn provides information on the number and types of jobs posted in
different industries. By analyzing this data, we can identify rapidly growing
occupations, industries with high job demand, and potential career
opportunities.
• Factors that influence pay gaps: By comparing salaries in different
occupations and identifying the factors that influence pay gaps, we can better
understand the income distribution in the labor market. These factors can
include technical skills, job level, geographic location, and company type.

CHAPTER 3: DATA DESCRIPTION


I. Data source
Kaggle: provides a wide variety of datasets covering diverse topics.
Public dataset: the data comes from various sources and is freely available to
anyone, with no restrictions on how we use it in our projects. These datasets are
contributed by users and organizations and are often used in competitions to develop
predictive models and analytical solutions.
Specific usage: the data provided is usually relevant to a specific challenge and
can be used for that purpose. We can read, analyze, and build models using the
competition data. Kaggle also creates an environment in which participants
collaborate, share insights, and learn from each other's approaches.

II. Data collection


LinkedIn job posting 2023 data:
• We expect this data to help us analyze and understand companies, industries,
market trends, and job-seeking needs. After downloading it, we obtained
job_posting.csv, which we use in the rest of the analysis.
• The data includes job title, type of work, skills, experience, career, location,
company, date submitted, job description, salary, etc., which is helpful for both
employees and employers. It provides an overview of the job market and of the
skills and experience needed for different positions in 2023.
• Kaggle is a reputable and trustworthy platform, so we consider the
LinkedIn-related data reliable enough to analyze and interpret.

III. Data preparation


1. Import Python library
As the first step of the data analysis, we import the Python libraries needed in
Google Colab. We then import the data file we chose to start our analysis.

Figure 1: Import Python into Google Colab (Source: Group 3)

Figure 2: Import data file into Google Colab (Source: Group 3)

2. Read data
We use code to read the dataset into a data frame, which gives us a complete
picture of the data.
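
The reading step can be sketched as follows. This is a minimal illustration rather than the code shown in the figure: the inline CSV text and its columns are hypothetical stand-ins for the downloaded file, and in Colab the path of the uploaded file would be passed to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded CSV file (assumption, not the real data).
csv_text = """job_id,company_id,title,max_salary,min_salary
1,10,Data Analyst,90000,60000
2,11,BI Developer,,
3,10,Data Engineer,120000,95000
"""

# pd.read_csv accepts a file path or any file-like object.
job_postings = pd.read_csv(io.StringIO(csv_text))

# Empty fields are read as NaN, which later steps treat as missing values.
print(job_postings.shape)
print(job_postings)
```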

Figure 3: Read data (Source: Group 3)

3. Head of data
To better understand our data frame, we use code to display the first three rows.
These rows give an overview of the job postings, let us see all the data columns, and
help us understand the type of data in each, which can be helpful in later predictive
analysis.

Figure 4: Head of data (Source: Group 3)

4. Tail of data
Similarly, this image shows the last three rows of the dataset. They are the most
recent observations and help us better understand the dataset.
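
Viewing the first and last three rows might look like the sketch below. The small data frame is invented for illustration; only `head` and `tail` correspond to the steps described above.

```python
import pandas as pd

# Illustrative data frame with seven rows (hypothetical values).
df = pd.DataFrame({
    "job_id": range(1, 8),
    "title": [f"Job {i}" for i in range(1, 8)],
})

first_three = df.head(3)  # first three rows
last_three = df.tail(3)   # last three rows

print(first_three)
print(last_three)
```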

Figure 5: Tail of data (Source: Group 3)

5. Info of data
The info function shows the data type and the number of non-null observations for
each field in the data frame. To make the analysis easier, we divide the fields of a
job posting into two main groups.
• Job details: all the information about the job that people primarily look up:
o Job_description: all details about the job the employer is hiring for, such as
requirements, role and responsibilities, and required skills.
o Formatted_work_type: the working arrangement, such as full-time, part-time,
internship, contract, or temporary.
o Job_location: the place where people work, for example the company office.
o Remote_allowed: whether the job can be done remotely. If remote work is
allowed, the value shown is “-1”; if it is not allowed, the value shown is “1”.
o Job_application_type: the type of application the employer requires, such as
onsite apply, complex offsite apply, or simple offsite apply.
o Formatted_experience_level: the level of experience required by the job
description, such as internship, entry level, mid-senior level, associate, or
director.
o Skills_desc: a detailed description of the job skills required.
o Sponsored: whether the job position is sponsored by a company, where “0” is
not sponsored and “1” is sponsored.
o Job_benefits: the insurance and other benefits provided to employees in this
job at this company, such as 401K (retirement savings), medical insurance, ...
o Job_required_skill_code: the code that identifies each required job skill.
o Max_salary: the maximum salary the employer can offer.
o Min_salary: the minimum salary the employer can offer.
o Med_salary: the median salary the employer can offer.
o Pay_period: how often the salary is paid, such as monthly.
o Currency: the currency in which the salary is paid, which depends on the
country where the company is located.
o Compensation_type: the amount of money employees receive when a condition
arises that requires compensation.
• Company details: all the information about the company that employees may want
to know in advance:
o Industry_name: the industry name.
o Company_name: the company name.
o Company_description: details that give employees more information about the
company.
o Company_size: a measure of the company's size, calculated from the number of
employees, annual revenue, number of customers, or number of offices/branches.
o Company_state: the state where the company is located.
o Company_country: the country where the company is located.
o Company_city: the city where the company is located.
o Company_employee_count: the number of employees in the company.
o Company_follower_count: the number of followers of the company on LinkedIn.
o Company_speciality: the specific areas the company focuses on, such as cloud
computing, child music lessons, ...
o Company_industry: the company's broader industry, such as technology, health
care, ...
o Company_size_label: the company's brand name or brand image, which identifies
the company.
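
As a rough sketch of what the info step reports, the example below runs `DataFrame.info` on a tiny invented frame. The column names are taken from the list above, the values are hypothetical, and the `buf` argument is used only so the report can be captured as text.

```python
import io

import pandas as pd

# Tiny illustrative frame; values are hypothetical.
df = pd.DataFrame({
    "job_title": ["Data Analyst", "BI Developer", None],
    "max_salary": [90000.0, None, 120000.0],
    "remote_allowed": [1.0, None, None],
})

# info() lists each column with its dtype and non-null count.
buffer = io.StringIO()
df.info(buf=buffer)
print(buffer.getvalue())

# The same non-null counts can be obtained directly.
non_null = df.notna().sum()
print(non_null)
```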

Figure 6: Info of data (Source: Group 3)

6. Statistics summary
The summary table reports 3,757,940,104 for job_id and 101,174,062 for company_id.
These figures are ID values from the summary rather than counts of jobs or companies,
since both columns are identifiers.
The maximum salary (max_salary) ranges from $10 to $1,300,000, and the minimum
salary (min_salary) ranges from $7.25 to $800,000.
Each job posting receives around 49.2 views on average.
The mean job_id is 3,723,714,700, which likewise has no analytical meaning, while
the average posting receives only about 10 applications.
The average company employs around 22,129 people. The company follower count is
more than 836,919 followers.
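
A summary of this kind can be produced with `describe`. The sketch below uses invented numbers and, as one possible choice rather than the report's own code, drops the identifier column first so that the table contains only quantities whose mean and range are meaningful.

```python
import pandas as pd

# Hypothetical values; job_id is an identifier, not a measurement.
df = pd.DataFrame({
    "job_id": [101, 102, 103, 104],
    "views": [10, 40, 60, 90],
    "applies": [1, 5, 9, 25],
})

# Exclude identifier columns before summarizing (assumption: this is optional).
summary = df.drop(columns=["job_id"]).describe()
print(summary)
```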

Figure 7: statistics summary (Source: Group 3)

CHAPTER 4: DATA CLEANING


I. Check duplicate
The "Check for duplication" step in the image checks for duplicate records in the
job posting data. Specifically, it verifies whether any records share the same
job_id. The result is 0, meaning no duplicate records were found in the job posting
data.
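
A duplicate check along these lines could be written as below. The frame is illustrative, and both variants are shown because the text does not say whether whole rows or only job_id values were compared.

```python
import pandas as pd

# Illustrative data with unique job_id values.
df = pd.DataFrame({"job_id": [1, 2, 3], "title": ["A", "B", "C"]})

# Number of rows that repeat an earlier row in every column.
duplicate_rows = int(df.duplicated().sum())

# Number of rows whose job_id repeats an earlier job_id.
duplicate_ids = int(df["job_id"].duplicated().sum())

print(duplicate_rows, duplicate_ids)
```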

Figure 8: Check duplicate (Source: Group 3)

II. Missing value calculation


The "Missing Value Calculation" step in the image calculates the total number of
missing values for the variables in the job posting dataset. The numbers in the right
column indicate the quantity of missing values for each corresponding variable.

Figure 9: Missing Value Calculation 1 (Source: Group 3)

The "Missing Value Calculation" step in the image calculates the proportion of
total missing values for the variables in the job posting dataset. The numbers in the
right column indicate the ratio of missing values to the total number of values for
each corresponding variable.
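
The two calculations, the count and the ratio of missing values per column, can be sketched as follows. The data is invented; `isnull().mean()` is one common way to obtain the ratio and is an assumption about how the figure was produced.

```python
import pandas as pd

# Hypothetical frame with some missing cells.
df = pd.DataFrame({
    "job_id": [1, 2, 3, 4],
    "title": ["A", "B", None, "D"],
    "max_salary": [90000.0, None, None, 120000.0],
})

# Count of missing values per column.
missing_count = df.isnull().sum()

# Share of missing values per column (missing cells divided by number of rows).
missing_ratio = df.isnull().mean()

print(missing_count)
print(missing_ratio)
```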

Figure 10: Missing Value Calculation 2 (Source: Group 3)

III. Data Reduction


Step 1: Creating the Main Data Table from Other Data Tables

Figure 11: Data reduction 1 (Source: Group 3)

• Merge the two data tables (job_postings) and (benefits) into a single table
(merged_jobs) using “job_id” as the key.
• Merge the two data tables (companies) and (employee_counts) into a single
table (merged_companies) using “company_id” as the key.
• Merge the two tables (merged_jobs) and (merged_companies) into the main
data table (data) using “company_id” as the key.
• Print the main data table (data) after merging and examine the results
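
The merge steps listed above can be sketched with small stand-in tables. The table and key names follow the text; the rows and the `how="left"` join type are assumptions. When both sides share a non-key column name, pandas appends the suffixes `_x` and `_y`, which is presumably where the later column names description_x and description_y come from.

```python
import pandas as pd

# Small stand-in tables (hypothetical rows).
job_postings = pd.DataFrame({
    "job_id": [1, 2],
    "company_id": [10, 11],
    "title": ["Data Analyst", "BI Developer"],
    "description": ["Analyze data", "Build dashboards"],
})
benefits = pd.DataFrame({"job_id": [1], "type": ["401(k)"]})
companies = pd.DataFrame({
    "company_id": [10, 11],
    "name": ["Acme", "Globex"],
    "description": ["Retailer", "Consultancy"],
})
employee_counts = pd.DataFrame({
    "company_id": [10, 11],
    "employee_count": [500, 1200],
})

# Left joins keep every job posting even when no match exists (assumption).
merged_jobs = job_postings.merge(benefits, on="job_id", how="left")
merged_companies = companies.merge(employee_counts, on="company_id", how="left")
data = merged_jobs.merge(merged_companies, on="company_id", how="left")

print(data)
```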

Step 2: Remove unnecessary columns from the dataset for analysis

Figure 12: Data reduction 2 (Source: Group 3)

• Create a variable columns_to_drop listing the columns that are not valuable for
the analysis: pay_period, original_listed_time, expiry, closed_time,
listed_time, currency, compensation_type, job_posting_url, application_url,
inferred, zip_code, address, url, time_recorded.
• Use the drop(columns) command to remove these unnecessary columns from the
main dataset.
Purpose of Data Reduction: Streamline the dataset by eliminating data that is not
valuable for the analysis, which makes it easier to create accurate and visually
clear charts.
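
A minimal sketch of this reduction step is shown below. Only two of the listed columns are included, and the rows are hypothetical.

```python
import pandas as pd

# Hypothetical frame containing two of the columns named above.
data = pd.DataFrame({
    "job_id": [1, 2],
    "title": ["A", "B"],
    "currency": ["USD", "USD"],
    "zip_code": ["94105", "10001"],
})

# Columns considered unnecessary for the analysis.
columns_to_drop = ["currency", "zip_code"]

# drop returns a new frame without the listed columns.
data = data.drop(columns=columns_to_drop)
print(data.columns.tolist())
```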

IV. Data Cleaning


1. Identifying and Displaying Attribute Columns with Missing Values

Figure 13: Data cleaning 1 (Source: Group 3)

Create the missing_data variable and use the isnull().sum() command to count the
missing values in each column of the main data table (data).
Display each column name together with its number of missing cells, sorted in
descending order of the missing-value count.
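
This step might be written as in the sketch below. The variable name missing_data follows the text; the data and the filter that keeps only columns with at least one missing value are assumptions.

```python
import pandas as pd

# Hypothetical frame with different amounts of missing data per column.
data = pd.DataFrame({
    "job_id": [1, 2, 3],
    "skills_desc": [None, "SQL", None],
    "med_salary": [None, None, None],
})

# Count missing values per column, keep affected columns, sort descending.
missing_data = data.isnull().sum()
missing_data = missing_data[missing_data > 0].sort_values(ascending=False)

print(missing_data)
```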

2. Handling missing values

Figure 14: Data cleaning 2 (Source: Group 3)

Use the fillna() method to fill missing values in text columns with “Not Specified”.
For numerical columns, we fill in the number 0.
Fill the missing cells of the remote_allowed column with “Unknown” using the
fillna() command.
Create the variable remaining_missing and use the isnull().sum() command to count
the empty values that remain in each column of the main data table (data).
Display each column name together with its number of missing cells, sorted in
descending order of the missing-value count.
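
The filling rules above can be sketched as follows. Which columns count as text or numerical is not stated in the text, so the lists text_cols and numeric_cols are hypothetical, as are the values.

```python
import pandas as pd

# Hypothetical frame with one text, one numerical and the remote_allowed column.
data = pd.DataFrame({
    "skills_desc": ["SQL", None],
    "views": [12.0, None],
    "remote_allowed": [1.0, None],
})

# Cast to object first so the label "Unknown" can sit next to numeric flags.
data["remote_allowed"] = data["remote_allowed"].astype(object).fillna("Unknown")

text_cols = ["skills_desc"]  # assumed text columns
numeric_cols = ["views"]     # assumed numerical columns
data[text_cols] = data[text_cols].fillna("Not Specified")
data[numeric_cols] = data[numeric_cols].fillna(0)

# Recount what is still missing, largest first.
remaining_missing = data.isnull().sum().sort_values(ascending=False)
print(remaining_missing)
```

Filling numerical gaps with 0 keeps the columns complete but also lowers their averages, so any mean computed afterwards should be read with that in mind.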

3. Continue handling missing values

Figure 15: Data cleaning 3 (Source: Group 3)

Use the fillna() method to fill missing values in the description_x column with
"Not Specified".
Delete duplicate rows in the main data table using the drop_duplicates() command.
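
A short sketch of these two operations is given below. The rows are invented, and the final reset_index call is an optional addition that is not mentioned in the text.

```python
import pandas as pd

# Hypothetical frame: the last two rows are identical.
data = pd.DataFrame({
    "job_id": [1, 2, 2],
    "description_x": ["Analyze data", None, None],
})

# Fill missing job descriptions with a placeholder label.
data["description_x"] = data["description_x"].fillna("Not Specified")

# Remove rows that duplicate an earlier row in every column.
data = data.drop_duplicates().reset_index(drop=True)
print(data)
```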
4. Renaming columns in data frame

Figure 16: Data cleaning 4 (Source: Group 3)

Use the rename() command to rename the data columns in the data table to make
them clearer and easier to understand. We rename the columns:
• 'name' is changed to 'company_name'
• 'description_y' is changed to 'company_description'
• 'state' is changed to 'company_state'
• 'country' is changed to 'company_country'
• 'city' is changed to 'company_city'
• 'employee_count' is changed to 'company_employee_count'
• 'follower_count' is changed to 'company_follower_count'
• 'description_x' is changed to 'job_description'
• 'title' is changed to 'job_title'
• 'location' is changed to 'job_location'
• 'application_type' is changed to 'job_application_type'
• 'type' is changed to 'job_benefits'
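
The renaming can be expressed as a single mapping passed to rename, as sketched here on an empty frame that only carries four of the original column names.

```python
import pandas as pd

# Empty frame with some of the original column names (illustration only).
data = pd.DataFrame(columns=["name", "description_y", "title", "type"])

# Old name -> new name, following the list above.
data = data.rename(columns={
    "name": "company_name",
    "description_y": "company_description",
    "title": "job_title",
    "type": "job_benefits",
})
print(data.columns.tolist())
```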

5. Check Null

Figure 17: Data cleaning 5 (Source: Group 3)

Continue to check the columns in the main data table to see if there are any missing
values.
The checking results show that the columns company_id , max_salary , min_salary
and med_salary still have missing values.

6. Continue to handle the remaining missing values

Figure 18: Data cleaning 6 (Source: Group 3)

Use the fillna() command and the mode() function to replace null values in the
company_id column with the most frequently occurring value.
Use the fillna() command and the median() function to replace null values in the
max_salary, min_salary and med_salary columns with the median of each respective
column.
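
The mode and median imputation described above can be sketched as follows, using invented values.

```python
import pandas as pd

# Hypothetical frame with one missing value per column.
data = pd.DataFrame({
    "company_id": [10.0, 10.0, 11.0, None],
    "max_salary": [50000.0, 70000.0, 90000.0, None],
    "min_salary": [30000.0, 40000.0, 50000.0, None],
    "med_salary": [40000.0, 55000.0, 70000.0, None],
})

# mode() returns a Series of the most frequent values; take the first one.
data["company_id"] = data["company_id"].fillna(data["company_id"].mode()[0])

# Replace missing salaries with the median of the same column.
for col in ["max_salary", "min_salary", "med_salary"]:
    data[col] = data[col].fillna(data[col].median())

print(data)
```

One consequence worth keeping in mind: filling an identifier such as company_id with the mode assigns those postings to the most common company, which may inflate that company's totals in later per-company charts.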

7. Check to see if there are any remaining missing values

Figure 19: Data cleaning 7 (Source: Group 3)

Use the isnull().sum() command to check if there are any missing values remaining
in the data table.
The check results show that the final data table has been completely cleaned, with
no remaining missing values, and can be used for analysis.

8. Cleaned data table

Figure 20: Data cleaning 7 (Source: Group 3)

Purpose of Data Cleaning:


Cleaning data is an essential step in data processing. It improves the quality of the
data and ensures its accuracy, consistency, and completeness before it is used for
analysis. This includes:
• Identifying and removing inaccurate or unwanted data
• Standardizing data
• Identifying and handling missing values

CHAPTER 5: EXPLORATORY DATA ANALYSIS (EDA)


I. Goal of EDA
With the vast amount of data at hand, the potential for exploration within this dataset
is immense. This encompasses analyzing the highest-paying job titles, companies,
and locations; predicting salary and benefits using Natural Language Processing
(NLP); and meticulously examining variations among industries and companies
regarding internship opportunities and benefits. Future updates to the dataset will
allow for deeper exploration into time-based trends, such as company growth, the
prevalence of remote job opportunities, and changes in demand for specific job titles
over time. Therefore, we have opted to conduct an Exploratory Data Analysis
(EDA) based on the following questions:
• Which Fields Have the Most Job Openings?
• Which Types of Jobs Are in High Demand?
• What Skills Are Most in Demand?
• What Are the Average Salaries and Benefits for Recruitment Positions?
• What Is the Ratio of Views to Applications?

II. EDA Univariate Analysis


1. Top 10 Industries with most jobs

Figure 21: Top 10 Industries with Most job (Source: Group 3)

Based on Figure 21, sectors related to healthcare, technology, and human resources
had the most job postings, each exceeding 2,500 positions, with the highest above
3,500. They were followed by business and finance-related sectors, with around 2,000
positions each. In summary, technology-related industries appear throughout the
top 10, from the highest counts down to the mid-range. Based on this, candidates can
prepare the skills, experience, and knowledge required to develop a suitable job
application strategy.
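
A ranking like the one in Figure 21 can be produced with value_counts(). The sketch below
uses a hypothetical "industry" column and made-up rows, and keeps the top 2 instead of the
top 10 to stay short.

```python
import pandas as pd

# Hypothetical postings table: one row per job posting.
postings = pd.DataFrame({
    "industry": [
        "Health Care", "IT Services", "Health Care", "Staffing",
        "IT Services", "Health Care", "Finance",
    ],
})

# value_counts() counts postings per industry and sorts them in descending order.
top_industries = postings["industry"].value_counts().head(2)
print(top_industries)
```

The same pattern applies to company names, job titles, benefits, and work types.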

2. Top 10 companies with the most job postings

Figure 22: Top 10 Companies with Most job Postings (Source: Group 3)

Based on Figure 22, the column chart of the top 10 companies with the most job
postings shows that many companies were actively posting job openings on LinkedIn,
indicating a vibrant job market. Insight Global had the highest number of vacancies,
accounting for 20.4% of the total positions. With more than 1,750 job postings,
Insight Global had more than three times as many as Google (with more than 500 job
postings). Companies from various fields were present, ranging from human resources
consulting services such as Insight Global and Robert Half to large technology
companies like Google. This indicates that job opportunities were not concentrated
in one specific industry. Technical positions as well as other roles such as
marketing, project management, and finance were all in demand and contributed to the
expansion of the job market. This analysis provides an overview of the diversity and
dynamism of the job market at that time.
3. Top 10 most common job titles

Figure 23: Top 10 Most Common job Titles (Source: Group 3)

Based on Figure 23, the bar chart shows the top 10 most common job titles and how
often each one appears in the postings. The most common title was Retail Sales
Associate, with more than 250 postings on LinkedIn, and the least common of the ten
was Block Advisor - Remote Tax Professional, with nearly 150 postings. The other
titles ranged between 150 and 250 postings. Three of the top 10 job titles are
related to sales. This suggests that, in 2023, the market needed sales and customer
service staff to connect companies' products with customers. In addition, accountants
made up a notable share of the postings, especially candidates with strong experience
such as seniors or managers. The fact that the remote position Block Advisor - Remote
Tax Professional ranked lowest of the ten may indicate that companies still preferred
on-site work to working from home, although this had started to shift toward remote
work in response to economic conditions.
4. Top 10 In-Demand Skills

Figure 24: Top 10 In-Demand Skills (Source: Group 3)

Based on Figure 24, IT skills were in the highest demand, surpassing 14,000, while
other skills such as Sales, Management, Manufacturing, English, Fintech, Business
Development, and Accounting were also sought after, ranging fairly uniformly from
8,000 to just under 12,000. These figures indicate that the skills fall into two main
categories: one focusing on technical expertise and the other covering skills
necessary for roles in economics and management. This illustrates a clear separation
between job types. Candidates in the current job market should therefore identify the
specific skills required and the corresponding job positions in order to seize
opportunities effectively.
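
Skill counts of this kind usually require joining a job-to-skill table with a lookup table of
skill names. The sketch below assumes two hypothetical tables, job_skills and skills, keyed
by a "skill_abr" column; the rows are made up for illustration.

```python
import pandas as pd

# Hypothetical long-format table: one row per (job, skill) pair.
job_skills = pd.DataFrame({
    "job_id": [1, 1, 2, 3, 3, 4],
    "skill_abr": ["IT", "MGMT", "IT", "SALE", "IT", "MGMT"],
})

# Hypothetical lookup table from abbreviation to readable skill name.
skills = pd.DataFrame({
    "skill_abr": ["IT", "MGMT", "SALE"],
    "skill_name": ["Information Technology", "Management", "Sales"],
})

# Attach readable names, then count how many postings ask for each skill.
skill_counts = (
    job_skills.merge(skills, on="skill_abr", how="left")["skill_name"]
    .value_counts()
)
print(skill_counts)
```

A left join keeps every job-skill pair even if an abbreviation is missing from the lookup table.
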
5. Top 10 Employee Benefits Offered by Companies

Figure 25: Top 10 Employee Benefits Offered by Companies (Source: Group 3)

Figure 25 shows the top 10 employee benefits offered by companies. 401(k) was the
most offered benefit, with a count of more than 17,500, significantly higher than
the other benefits.
A 401(k) is a retirement savings plan in which employees can contribute a portion
of their salary to a retirement account. Companies often contribute a portion of the
money to this account as well. This shows that companies are interested in helping
employees save for retirement and ensure financial security for the future. In
addition, insurance benefits (including Medical, Vision, and Disability Insurance)
were also provided quite often (over 5,000 to nearly 7,000). This shows concern for
the health of workers and helps them feel secure. Other benefits such as Dental
Insurance, Tuition Assistance, transportation benefits, and maternity/paternity
benefits were also among those that companies made an effort to provide to their
employees.
From the bar chart, we can see that, in line with current trends, companies are
focusing on individual retirement savings plans like 401(k). Companies are also
particularly interested in providing health and financial benefits to their employees.
This can affect employee satisfaction and long-term performance.

6. Distribution of Work Types Offered by Jobs

Figure 26: Distribution of Work Types Offered by jobs (Source: Group 3)

Based on Figure 26, the bar chart shows the distribution of work types offered
by jobs. On LinkedIn, the work types include full-time, contract, part-time,
temporary, internship, volunteer and other. Full-time positions clearly accounted
for the largest proportion, with over 50,000 postings. Part-time and contract jobs
were far fewer, at about 10,000 postings. Internships were very scarce, with only
around 100 postings. This scarcity can leave more graduates without jobs, raise the
unemployment rate, and make it more difficult for them to enter the job market.

7. Distribution of Company Size

Figure 27: Distribution of Company Sizes (Source: Group 3)

Based on Figure 27, the bar chart shows the distribution of company sizes. Company
size can be defined by the number of employees, annual revenue, number of customers,
or number of offices/branches. On LinkedIn, the largest size category is the most
common, with more than 25,000 companies. The other categories account for around
5,000 to 13,000 companies each, and the smallest for nearly 3,000 companies. This
suggests that job seekers can have confidence when looking for a job on LinkedIn,
since many of the employers are large companies, and that the platform makes it
easier for them to find a job.

III. Data Transformation


1. The number of applications for job positions

Figure 28: The number of applications for job positions (Source: Group 3)

Based on Figure 28, the chart depicts the number of applications for job
positions. There was a high density of applications (shown in dark blue)
concentrated within the range of 0 to 2 applications per position.
This suggested intense competition for job openings, particularly those that did not
require significant experience or specialized skills. Additionally, there were a few
vacancies with a higher than average number of applications. The standard deviation
indicated high variability in the data, implying that the number of applications
fluctuated across different job positions. Furthermore, the distribution was slightly
skewed, with most of its mass on the left-hand side, meaning that many vacancies
received a below-average number of applications. Overall, this analysis provides
insights into the competitive nature of the job market, especially in relation to
the distribution of job applications across various positions.
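
The report does not spell out the transformation behind these charts. The sketch below
assumes a log(1 + x) transform, which is one common choice for count data such as
applications or views. The sample values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical application counts with a long right tail.
applies = pd.Series([0, 1, 1, 2, 3, 5, 8, 40, 250], name="applies")

# log1p computes log(1 + x), so postings with zero applications stay defined.
applies_log = np.log1p(applies)

# Summary statistics before and after the transformation.
raw_std, log_std = applies.std(), applies_log.std()
raw_skew, log_skew = applies.skew(), applies_log.skew()

print(f"std:  raw={raw_std:.2f}  log={log_std:.2f}")
print(f"skew: raw={raw_skew:.2f}  log={log_skew:.2f}")
```

Compressing the long tail this way makes the bulk of the distribution easier to see in a
density plot.
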
2. The number of views for job posting

Figure 29: The number of views for job posting (Source: Group 3)

Based on Figure 29, the density distribution chart depicts the number of job
posting views. The view counts were unevenly distributed, with the highest density
concentrated around the range of 2-4 views. A distinct "peak" was evident at very
low view counts (with "views_log" close to 0), followed by a decrease in density as
the view count increased. This pattern might suggest that some job postings lacked
quality or failed to capture the attention of job seekers effectively.
Furthermore, as the view count increased, the density tended to decrease, indicating
fewer job postings with high view counts. This trend suggests that only a select
few job postings, likely those carefully crafted in terms of content and
presentation, were able to attract significant viewership.
Overall, this chart can help recruitment managers better understand the past
performance of job postings and focus on improving the appeal of postings with fewer
views. By identifying patterns and trends in job posting views, recruitment
strategies can be adjusted to make job postings more attractive and more visible to
potential candidates.

IV. EDA Bivariate Analysis


1. Top 5 Most Required Skills by Salary Level

Figure 30: Top 5 Most Required Skills by Salary Level (Source: Group 3)

Based on Figure 30, the IT (Information Technology) and MNFC (Manufacturing) skill
groups exhibited relatively consistent distributions across the salary levels. This
suggests that these two skills were consistently in high demand across all salary
tiers. Conversely, the MGMT (Management), HCPR (Health Care Provider), and OTHR
(Other) skills showed structures similar to one another, particularly HCPR and OTHR,
indicating that at higher salary levels, the focus tended to be on technical skills
rather than soft skills and other related competencies. The same trend held at the
average salary level. At the low salary level, demand for IT skills stood out as
particularly high, while the other skills were evenly distributed without significant
discrepancies.
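
The report does not state how the salary levels were defined. One possible approach,
sketched below as an assumption, is to split the median salary into three equal-sized groups
with pd.qcut() and then cross-tabulate the levels against skills. The rows are hypothetical.

```python
import pandas as pd

# Hypothetical postings: one skill abbreviation and one median salary per row.
jobs = pd.DataFrame({
    "skill_abr": ["IT", "IT", "MNFC", "MGMT", "IT", "HCPR", "MGMT", "IT", "OTHR"],
    "med_salary": [40000, 52000, 45000, 70000, 85000, 60000, 120000, 150000, 95000],
})

# Assumption: three quantile-based bins (terciles) labelled low / average / high.
jobs["salary_level"] = pd.qcut(
    jobs["med_salary"], q=3, labels=["low", "average", "high"]
)

# Rows are salary levels, columns are skills, cells are posting counts.
level_by_skill = pd.crosstab(jobs["salary_level"], jobs["skill_abr"])
print(level_by_skill)
```

Taking the five largest column totals of such a table would give a "top 5 skills" view per
salary level.
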
2. Average Salaries Offered by Top Employers

Figure 31: Average Salaries Offered by Top Employers (Source: Group 3)

Based on Figure 31, the bar chart illustrates the average salary offered by each top
employer, arranged in descending order of average minimum salary.
The chart shows the range between the minimum and maximum salary for each
company. Imagen Technologies has the highest minimum salary among the listed
companies (500,000), while The Dedham Group has the highest maximum salary (nearly
1,000,000) and also a significant gap between its minimum and maximum salary
(Min salary = ½ Max salary).
The average salaries of the top employers are relatively high, and there is a large
difference between the minimum and maximum salaries of these employers. This
indicates that top employers are willing to offer high salaries to employees, and the
salary disparity creates strong competition in the job market. The chart also points
to an increasing demand for labor in sectors such as high-tech, engineering, and
medicine, which offer attractive remuneration.
Conclusion: The chart provides information on the average salaries of the top
employers, helping viewers easily compare the salaries of different employers and
make informed decisions when searching for jobs.
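
A per-employer salary summary of this kind can be sketched with groupby(). The company names
and salaries below are hypothetical. The column names follow the cleaned table described
earlier in the report.

```python
import pandas as pd

# Hypothetical postings with a salary range per posting.
jobs = pd.DataFrame({
    "company_name": ["A Corp", "A Corp", "B Ltd", "B Ltd", "C Inc"],
    "min_salary": [100000, 120000, 60000, 80000, 90000],
    "max_salary": [150000, 170000, 200000, 220000, 110000],
})

# Average minimum and maximum salary per employer, highest minimum first.
avg_salary = (
    jobs.groupby("company_name")[["min_salary", "max_salary"]]
    .mean()
    .sort_values("min_salary", ascending=False)
)

# Gap between the average maximum and average minimum salary.
avg_salary["salary_range"] = avg_salary["max_salary"] - avg_salary["min_salary"]
print(avg_salary)
```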

V. EDA Multivariate Analysis


1. Top 10 View and Applies Title for jobs

Figure 32: Top 10 Views and Applies Title for jobs (Source: Group 3)

Based on Figure 32, the chart displays the number of views and applications for
the top 10 job titles. Each job title is represented by a pair of columns in
blue (views) and orange (applications). "Customer Success Manager" stands out in
terms of views with nearly 6,000, whereas "Junior Software Engineer" has the
highest number of applications at around 1,700. "Executive Assistant", despite
having a relatively high view count, has almost no applications.
The chart shows that a high number of views does not necessarily translate into a
high number of applications: many job titles have high view counts but very few
applications. In particular, some high-level positions such as "Executive
Assistant", "Vice President", and "CEO/President" have a significant number of views
but very few applications.
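
The comparison of views and applications per job title, and the views-to-applications ratio
raised in the EDA questions, can be sketched as below. The titles and counts are
hypothetical.

```python
import pandas as pd

# Hypothetical postings with view and application counts.
jobs = pd.DataFrame({
    "job_title": ["Analyst", "Analyst", "Engineer", "Engineer", "Assistant"],
    "views": [100, 300, 200, 50, 500],
    "applies": [20, 60, 100, 25, 5],
})

# Total views and applications per job title.
by_title = jobs.groupby("job_title")[["views", "applies"]].sum()

# Views needed, on average, to obtain one application.
by_title["views_per_apply"] = by_title["views"] / by_title["applies"]

# Titles ranked by total views (top 2 here; the report uses the top 10).
top_by_views = by_title.sort_values("views", ascending=False).head(2)
print(top_by_views)
```

A high views_per_apply value flags titles that attract attention but few applicants.
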
2. Number of Views and Applies for Top 10 Companies

Figure 33: The Number of Views and Applies for Top 10 companies (Source: Group 3)

Based on Figure 33, this is a combined chart, with columns representing the
number of applications and a line representing the number of views for the top 10
companies. Insight Global stood out as the company with the most prominent results,
with nearly 17,500 views and over 3,500 applications. Google followed with
approximately 12,000 views, nearly 8 times the number of applications it received.
The companies with the lowest results in the top 10 were, in order, Aya Healthcare,
CareerStaff Unlimited, Fusion Medical Staffing, and Vivian Health.
Most of the companies with the highest views and applications in the top 10 were
recruitment agencies, while the companies with the lowest results on this list were
those related to providing healthcare and medical staff.
3. Heatmap

Figure 34: Heatmap (Source: Group 3)

Based on Figure 34, the heatmap displays the correlations between different
variables, including remote allowed, sponsored, max salary, med salary, min salary,
company size, company employee count, and company follower count. Remote Allowed
had a weak negative correlation with Sponsored (-0.086) and Company Size (-0.18),
and there was a strong positive correlation between Company Employee Count and
Company Follower Count (0.8). Furthermore, Max Salary, Med Salary, and Min Salary
had very low correlations with the other variables; only the correlation of each
variable with itself was high (1).
From these correlation values, the following observations can be made: remote jobs
were slightly less likely to be sponsored or to belong to larger companies. Larger
companies with more employees tended to have more followers on social media.
Additionally, salary levels did not depend strongly on whether the job was remote or
sponsored, on company size, or on company popularity.
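
The numbers in a heatmap like this come from a correlation matrix. The sketch below computes
one with DataFrame.corr(), which uses the Pearson coefficient by default. The values are
hypothetical and only a few of the variables named above are included.

```python
import pandas as pd

# Hypothetical numeric columns from the cleaned table.
df = pd.DataFrame({
    "remote_allowed": [1, 0, 0, 1, 0, 0],
    "company_size": [2, 7, 5, 1, 6, 4],
    "company_employee_count": [50, 9000, 1200, 10, 4000, 600],
    "company_follower_count": [400, 80000, 9000, 100, 30000, 5000],
})

# Pairwise Pearson correlations between all columns.
corr = df.corr()
print(corr.round(2))
```

A plotting library can then render this matrix as a heatmap, with each cell colored by its
correlation value.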

CHAPTER 6: CONCLUSION AND RECOMMENDATION


I. CONCLUSION
This Exploratory Data Analysis (EDA) provided valuable insights into the job
market landscape based on LinkedIn job posting data from 2023. Here’s a summary
of our key findings:
High Demand Skills and Industries: Sales, Management, and IT emerged as the most
in-demand skills. Industries like technology, healthcare, and manufacturing
exhibited a significant demand for skilled workers. Moreover, skills such as
Sales, Management, and Manufacturing are in high demand across various
industries. Soft skills like Communication and Teamwork are also highly valued.
Salary Trends: The analysis revealed a correlation between certain skills and salary
levels. Companies prioritize offering 401(k) retirement plans and health insurance
benefits. Average salaries vary depending on the company and skillset.
Job Market Competition: The high number of applications for some positions
signifies a competitive job market for both employers and job seekers. Insight
Global leads the pack with the most job postings, followed by companies like
Google and Robert Half. Retail Sales Associate is the most common job title,
followed by positions in customer service, accounting, and finance.
Work Type Distribution: The data suggests a growing trend of remote work
opportunities becoming more prevalent. Full-time positions dominate the market,
followed by part-time and contract jobs. Internship opportunities appear limited.

II. RECOMMENDATION
Enhancing Skill Development:
Given the increasing demand in the technology and manufacturing sectors,
individuals should focus on updating and improving their technical skills. Engaging
in online courses and training programs can significantly enhance their
competitiveness in the job market.
Optimizing Job Descriptions:
Employers should improve job descriptions and requirements to attract high-quality
candidates. Ensuring that job postings address crucial factors such as career
advancement opportunities, work environment, and company benefits is essential in
attracting top talent.
Creating Opportunities for Internships and Other Positions:
Despite potential limitations in internship and entry-level positions, companies
should still create opportunities for students and recent graduates to gain practical
work experience and develop skills. This can be achieved through internship
programs and project-based assignments.
Prioritizing Employee Development:
Companies need to invest in the development of their existing workforce by
providing training programs and advancement opportunities. This not only helps
retain top talent but also attracts new recruits with potential and capability.

ASSIGNMENT SHEET

No. | Name | Task | Accomplishment
1 | Vo Duy Tung (Leader) | Define the objectives and scope of the report; support and monitor work progress; unify results and edit reports | 100%
2 | Nguyen Thi Bich Ngoc | Collect data from Kaggle source, LinkedIn Job Postings 2023; data visualization; Exploratory Data Analysis (EDA); editing PowerPoint | 100%
3 | Vo Nguyen Khanh Linh | Exploratory Data Analysis (EDA); answer research questions; write conclusions and suggestions; editing report | 100%
4 | Le Hoang Nam | Univariate and bivariate analysis; prepare data for analysis | 100%
5 | Vo Trung Thanh | Multivariate analysis; prepare data for analysis | 100%
Table 1: Assignment sheet (Source: Group 3)

