Group 3
PROJECT
5. Info of data
1. Identifying and Displaying Attribute Columns with Missing Values
V. EDA Multivariate Analysis
3. Heatmap
I. CONCLUSION
REFERENCES
FIGURE
Figure 27: Distribution of Company Sizes (Source: Group 3)
Figure 28: The number of applications for job positions (Source: Group 3)
Figure 29: The number of views for job posting (Source: Group 3)
Figure 30: Top 5 Most Required Skills by Salary Level (Source: Group 3)
Figure 31: Average Salaries Offered by Top Employers (Source: Group 3)
Figure 32: Top 10 Views and Applies Title for jobs (Source: Group 3)
Figure 33: The Number of Views and Applies for Top 10 companies (Source: Group 3)
Figure 34: Heatmap (Source: Group 3)
TABLE
Table 1: Assignment sheet (Source: Group 3)
BUSINESS INTELLIGENCE GROUP 3
PREFACE
The year 2023 marks an important stage in the global economy's recovery after the
fluctuations and challenges of the previous year. With growing stability and hope for
the future, the job market has become more vibrant and diverse than ever. Trends and
changes in the economy, technology, and society have strongly affected the way
employers find and select talent, as well as the way workers find and build careers.
Data has become an extremely valuable resource, and the ability to leverage it effectively
has become one of the determining factors in the success of every organization. To
analyze the trends of the job market in 2023, we used Business Intelligence (BI), a
powerful tool that changes the way we perceive and manage information. Business
Intelligence is not just about collecting data; it is also about turning that data into valuable
information. This includes using technology and analytical processes to build an overview
of, and deeper insight into, the organization's operations. Drawing on numbers and data
from many different sources, such as social networks and connected devices, BI helps us
understand how the organization operates and discover trends and patterns. From there,
strategic decisions can be made that suit the projects of individuals, organizations, and
businesses.
Our group, studying in class code A01E, would like to sincerely thank Mr. Tran Thanh
Cong, lecturer of "Business Intelligence", for supporting us in implementing our ideas and
helping us complete this project step by step. We would like to express our deep gratitude
to Mr. Tran Thanh Cong for accompanying us throughout the semester and providing us
with much useful knowledge related to the research process of this project. Because the
knowledge, qualifications, and practical experience of all members are limited, and the
time available to research the topic was short, the report is bound to contain
mistakes.
Therefore, we look forward to receiving your comments and suggestions so that we can
learn from experience and acquire more knowledge to improve our work on future
topics. Once again, our group sincerely thanks you.
CHAPTER 1: INTRODUCTION
I. Overview about Google Colab
Google Colab, short for Google Colaboratory, is a cloud-based platform from Google
that provides free access to a Jupyter notebook environment. It allows
users to write and execute Python code in a web browser, collaborate with others,
and take advantage of Google's powerful hardware infrastructure without requiring
any setup on the user's local machine. In this project we use Google Colab for the
analysis because it makes our work easier and more efficient.
same time, this analysis also helps identify the necessary skills and knowledge to
advance further in our career.
2. Read data
To read the dataset, we used code to load it into a data frame,
which gives us a complete picture of the data.
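The loading step can be sketched as follows. This is only a minimal illustration: the inline CSV text and the name job_postings are assumptions standing in for the real file, which in Colab would be read by passing its path to pd.read_csv instead.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV file; in Colab one would pass the
# path of the uploaded file to pd.read_csv instead of this in-memory text.
sample_csv = io.StringIO(
    "job_id,company_id,title,max_salary,min_salary\n"
    "1,10,Retail Sales Associate,20.0,15.0\n"
    "2,11,Junior Software Engineer,90000.0,70000.0\n"
    "3,10,Executive Assistant,,\n"
)

# read_csv parses the text into a DataFrame with one row per job posting;
# empty fields become missing values (NaN).
job_postings = pd.read_csv(sample_csv)
print(job_postings.shape)
```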
3. Head of data
To better understand our data frame, we use code to display its first three rows. These
rows give an overview of the job postings. They let us see all the data columns and
understand the type of data in each, which is helpful in the analysis that
follows.
4. Tail of data
Similarly, this image shows the last three rows of the dataset. They are the most
recent observations and help us to better understand the dataset.
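Displaying the first and last three rows can be sketched with head() and tail(). The small data frame below is a hypothetical stand-in for the real job postings data.

```python
import pandas as pd

# Hypothetical miniature data frame standing in for the job postings data.
data = pd.DataFrame({
    "job_id": [1, 2, 3, 4, 5],
    "title": ["A", "B", "C", "D", "E"],
})

first_rows = data.head(3)  # first three rows
last_rows = data.tail(3)   # last three rows
print(first_rows)
print(last_rows)
```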
5. Info of data
The info function shows the data type and the number of non-null observations of
each field in the data frame. We divide the fields of the job postings into two main
groups to make them easier to analyze.
• Job details: all the information about the job that people primarily
look up:
o Job_description: all details about the job the employer is hiring for, such as
requirements, role and responsibilities, and skills required.
o Formatted_work_type: the working arrangement, such as full-time, part-time,
internship, contract, or temporary.
o Job_location: the place where people work, for example the company office.
o Remote_allowed: whether the job can be done remotely. If it is allowed, the
value shown is “-1”, and if it is not allowed, the value shown is “1”.
o Job_application_type: the type of application the employer requires, such as
onsite apply, complex offsite apply, or simple offsite apply.
o Formatted_experience_level: the level of experience required by the job
description, such as internship, entry level, mid-senior level, associate, or
director.
o Skills_desc: a detailed description of the job skills required.
o Sponsored: indicates job positions sponsored by certain companies, where
“0” is not sponsored and “1” is sponsored.
o Job_benefits: all the information about the insurance and benefits
provided to employees in this job at this company, such as 401(k)
(retirement savings) and medical insurance.
o Job_required_skill_code: the code that identifies each required job
skill.
o Max_salary: the maximum salary the employer can offer.
o Min_salary: the minimum salary the employer can offer.
o Med_salary: the median salary the employer can offer.
o Pay_period: how often the salary is paid, such as monthly.
o Currency: the currency in which the salary is paid, which depends on the
country where the company is located.
o Compensation_type: the amount of money employees receive under any
condition that calls for compensation.
• Company details: all the information about the company that helps
employees learn about it in advance:
o Industry_name: the industry name.
o Company_name: the company name.
o Company_description: details of the company that give employees
more information about it.
o Company_size: the size of the company, calculated from the number of employees,
annual revenue, number of customers, or number of offices/branches.
o Company_state: the state where the company is located.
o Company_country: the country where the company is located.
o Company_city: the city where the company is located.
o Company_employee_count: the number of employees in the company.
o Company_follower_count: the number of followers of the company on
LinkedIn.
o Company_speciality: the specific areas the company focuses on,
such as cloud computing or child music lessons.
o Company_industry: the company's broader industry,
such as technology or health care.
o Company_size_label: A company's brand name or brand image identifies
the company.
6. Statistics summary
The table shows that there are 3,757,940,104 job_id in the dataset while it has
101,174,062 company_id.
The maximum of the maximum salary is $1,300,000 and the minimum of the
maximum salary is $10. The maximum of the minimum salary is $800,000 and the
minimum of the minimum salary is $7.25.
The average number of views per job posting is around 49.2.
The average job_id is 3,723,714,700, while the average number of applications per
posting is only about 10.
The average company employs around 22,129 people. The company follower count is
more than 836,919.
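A statistics summary of this kind is typically produced with describe(). The sketch below uses a tiny hypothetical data frame, not the project's data. Note that describe() summarizes every numeric column, including identifier columns such as job_id, whose mean or maximum is a property of the ID values rather than a count of postings.

```python
import pandas as pd

# Hypothetical values chosen only to illustrate the shape of the output.
data = pd.DataFrame({
    "max_salary": [50000.0, 120000.0, 80000.0],
    "views": [10, 40, 100],
})

# describe() returns count, mean, std, min, quartiles and max per numeric column.
summary = data.describe()
print(summary.loc[["mean", "min", "max"]])
```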
The "Missing Value Calculation" step in the image calculates the proportion of
total missing values for the variables in the job posting dataset. The numbers in the
right column indicate the ratio of missing values to the total number of values for
each corresponding variable.
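One common way to compute such a ratio is to take the mean of the boolean missing-value mask, as in the sketch below. The sample columns and values are hypothetical, and this may differ from the exact code used in the figure.

```python
import pandas as pd

# Hypothetical sample with some missing cells.
data = pd.DataFrame({
    "max_salary": [50000.0, None, None, 80000.0],
    "title": ["A", "B", None, "D"],
    "job_id": [1, 2, 3, 4],
})

# isnull() marks missing cells as True; the column mean of that mask is the
# share of missing values per variable.
missing_ratio = data.isnull().mean().sort_values(ascending=False)
print(missing_ratio)
```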
• Merge the two data tables (job_postings) and (benefits) into a single table
(merged_jobs) using “job_id” as the key.
• Merge the two data tables (companies) and (employee_counts) into a single
table (merged_companies) using “company_id” as the key.
• Merge the two tables (merged_jobs) and (merged_companies) into the main
data table (data) using “company_id” as the key.
• Print the main data table (data) after merging and examine the results
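The merge steps above can be sketched as follows. The table contents are hypothetical, and the use of left joins (how="left") is an assumption, since the join type is not stated in the text.

```python
import pandas as pd

# Hypothetical miniature versions of the four source tables.
job_postings = pd.DataFrame({"job_id": [1, 2], "company_id": [10, 11], "title": ["A", "B"]})
benefits = pd.DataFrame({"job_id": [1], "type": ["401(k)"]})
companies = pd.DataFrame({"company_id": [10, 11], "name": ["X", "Y"]})
employee_counts = pd.DataFrame({"company_id": [10, 11], "employee_count": [100, 2000]})

# Assumed left joins: every job posting is kept even without a matching benefit.
merged_jobs = pd.merge(job_postings, benefits, on="job_id", how="left")
merged_companies = pd.merge(companies, employee_counts, on="company_id", how="left")
data = pd.merge(merged_jobs, merged_companies, on="company_id", how="left")
print(data)
```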
• Create a variable columns_to_drop containing the columns that are not valuable for
the analysis process: pay_period, original_listed_time, expiry, closed_time,
listed_time, currency, compensation_type, job_posting_url, application_url,
inferred, zip_code, address, url, time_recorded.
• Use the drop(columns) command to remove these unnecessary columns from the
main dataset.
Purpose of Data Reduction: Streamline the dataset by eliminating data that is not
valuable for the analysis process, facilitating the creation of accurate and visually
appealing charts for analysis.
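The reduction step can be sketched as below on a hypothetical data frame with a shortened column list. The errors="ignore" argument is an addition for illustration so that names absent from the table are skipped rather than raising an error.

```python
import pandas as pd

# Hypothetical data frame containing two of the columns to be dropped.
data = pd.DataFrame({
    "job_id": [1, 2],
    "title": ["A", "B"],
    "pay_period": ["MONTHLY", "YEARLY"],
    "currency": ["USD", "USD"],
})

# Shortened list for illustration; zip_code is not present in this sample.
columns_to_drop = ["pay_period", "currency", "zip_code"]

# errors="ignore" skips any listed column that does not exist.
data = data.drop(columns=columns_to_drop, errors="ignore")
print(data.columns.tolist())
```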
Create the missing_data variable and use the isnull().sum() command to identify and
count the missing values in each column of the main data table (data).
Display each column name with its number of missing cells, sorted in
descending order of the number of missing values.
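A minimal sketch of this check, using hypothetical sample data, could look like this. Filtering out columns with no missing values is an optional extra step added here for readability.

```python
import pandas as pd

# Hypothetical sample with missing cells in two columns.
data = pd.DataFrame({
    "skills_desc": [None, None, None],
    "max_salary": [1.0, None, 2.0],
    "job_id": [1, 2, 3],
})

# Count missing cells per column, keep only affected columns, sort descending.
missing_data = data.isnull().sum()
missing_data = missing_data[missing_data > 0].sort_values(ascending=False)
print(missing_data)
```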
Use the fillna() method to fill “Not Specified” into columns with missing values.
For numerical columns, we fill in the number 0.
Fill “Unknown” into the missing cells of the remote_allowed column using the
fillna() command.
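These three fills can be sketched as follows. The column lists text_cols and num_cols are hypothetical choices for illustration, because the text does not say which columns received which fill value. The remote_allowed column is converted to a generic object column first so that it can hold both numbers and the text “Unknown”.

```python
import pandas as pd

# Hypothetical sample; the real table has many more columns.
data = pd.DataFrame({
    "skills_desc": ["Python", None],
    "applies": [5.0, None],
    "remote_allowed": [1.0, None],
})

text_cols = ["skills_desc"]  # assumed text columns
num_cols = ["applies"]       # assumed numerical columns

data[text_cols] = data[text_cols].fillna("Not Specified")
data[num_cols] = data[num_cols].fillna(0)

# Cast to object so the column can mix numbers with the "Unknown" label.
data["remote_allowed"] = data["remote_allowed"].astype(object).fillna("Unknown")
print(data)
```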
Create the variable remaining_missing and use the isnull().sum() command to
identify and count the remaining empty values in each column of the main data table (data).
Display each column name with its number of missing cells, sorted in
descending order of the number of missing values.
Use the fillna() method to fill "Not Specified" into missing values in the
description_x column.
Delete duplicate rows in the main data table using the drop_duplicates() command.
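A minimal sketch of these two steps on hypothetical data is shown below. Resetting the index afterwards is an optional extra, not something the text describes.

```python
import pandas as pd

# Hypothetical sample with one duplicated row and one missing description.
data = pd.DataFrame({
    "job_id": [1, 1, 2],
    "description_x": ["Sales role", "Sales role", None],
})

data["description_x"] = data["description_x"].fillna("Not Specified")

# drop_duplicates() keeps the first occurrence of each fully identical row.
data = data.drop_duplicates().reset_index(drop=True)
print(data)
```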
4. Renaming columns in data frame
Use the rename() command to rename the data columns in the data table to make
them clearer and easier to understand. We rename the columns:
• 'name' is changed to 'company_name'
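The renaming can be sketched with a mapping from old to new names. Only the 'name' to 'company_name' pair comes from the text; the sample data frame is hypothetical.

```python
import pandas as pd

# Hypothetical sample containing the column to be renamed.
data = pd.DataFrame({"name": ["Insight Global"], "job_id": [1]})

# rename() takes a dict of {old_name: new_name}; other columns are unchanged.
data = data.rename(columns={"name": "company_name"})
print(data.columns.tolist())
```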
5. Check Null
Continue to check the columns in the main data table to see if there are any missing
values.
The results show that the columns company_id, max_salary, min_salary,
and med_salary still have missing values.
Use the fillna() command and the mode() function to replace null values in the
company_id column with the most frequently occurring value.
Use the fillna() command and the median() function to replace null values in the
max_salary, min_salary, and med_salary columns with the median value of each
respective column.
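The mode and median imputation can be sketched as follows, with hypothetical values. mode() returns a Series because several values can tie, so the first entry is taken here.

```python
import pandas as pd

# Hypothetical sample with one missing value in each column.
data = pd.DataFrame({
    "company_id": [10.0, 10.0, 11.0, None],
    "max_salary": [100.0, 300.0, None, 200.0],
    "min_salary": [50.0, None, 70.0, 90.0],
    "med_salary": [None, 150.0, 100.0, 200.0],
})

# Most frequent company_id; [0] picks the first value if there is a tie.
data["company_id"] = data["company_id"].fillna(data["company_id"].mode()[0])

# Each salary column is filled with its own median.
for col in ["max_salary", "min_salary", "med_salary"]:
    data[col] = data[col].fillna(data[col].median())

# Final check: no missing values should remain.
remaining = data.isnull().sum()
print(remaining)
```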
Use the isnull().sum() command to check if there are any missing values remaining
in the data table.
The check results show that the final data table has been completely cleaned, with
no remaining missing values, and can be used for analysis.
Based on Figure 21, sectors related to healthcare, technology, and human resources
had the most job positions, each exceeding 2,500, with the highest above 3,500.
They were followed by business and finance-related sectors, with around 2,000
positions each.
In summary, it is evident that the technology sector has the most job positions,
distributed from the highest to the average range. Based on this, candidates can
prepare the skills, experience, and knowledge required to
develop a suitable job application strategy.
Figure 22: Top 10 Companies with Most Job Postings (Source: Group 3)
Based on Figure 22, the column chart of the top 10 companies with the most
job postings, the number of companies posting job openings
on LinkedIn was gradually increasing, indicating a gradually livelier job
market. Insight Global emerged as the company with the highest number of
vacancies, accounting for 20.4% of the total positions. With more than 1,750 job
postings, Insight Global had roughly three times as many as Google (more than 500
job postings). The presence of companies from various fields, ranging from human
resources consulting services such as Insight Global and Robert Half to large
technology companies like Google, was notable. This indicated that job
opportunities were not concentrated in one specific industry. Technical
positions as well as other roles such as marketing, project management,
and finance were all in demand and contributed to the expansion of the job market.
This analysis, based on the bar graph, provided an overview of the diversity and
dynamism present in the job market at that time.
3. Top 10 most common job titles
Based on Figure 23, the bar chart showed the top 10 most common job titles
and how frequently each was posted. The most common job title
was Retail Sales Associate, with more than 250 postings on LinkedIn.
The least common was Block Advisor - Remote Tax Professional, with nearly 150
postings. The other titles ranged between 150 and 250 postings. Three of the top 10
job titles are related to sales. We can conclude that, in 2023, the market needed
many salespeople and customer service staff to connect products with
customers. In addition, accountants also accounted for a notable share of
job postings, especially people with strong experience such as seniors or
managers. The low frequency of Block Advisor - Remote Tax
Professional suggests that companies prefer employees to work on site
rather than at home, although this has started to shift toward remote work
depending on economic conditions.
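A ranking like this is commonly obtained by counting how often each title occurs. The sketch below assumes a title column and uses a few hypothetical titles, so the counts are illustrative only.

```python
import pandas as pd

# Hypothetical titles; the real column has far more values.
titles = pd.Series(
    ["Retail Sales Associate"] * 3 + ["Accountant"] * 2 + ["Driver"],
    name="title",
)

# value_counts() sorts by frequency in descending order; head(10) keeps the top 10.
top_titles = titles.value_counts().head(10)
print(top_titles)
```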
4. Top 10 In-Demand Skills
Based on Figure 24, IT skills were in the highest demand,
surpassing 14,000, while other skills such as Sales, Management, Manufacturing,
English, Fintech, Business Development, Accounting, and various others were also
sought after, ranging quite uniformly from 8,000 to just under 12,000. These
figures indicated that skills fell into two main categories: one
focusing on technical expertise and the other covering skills necessary for roles
in economics and management. This illustrated a clear trend in job separation, from
which candidates in the current job market should discern the specific skills
required and the corresponding job positions in order to seize opportunities
effectively.
5. Top 10 Employee Benefits Offered by Companies
Figure 25 shows the top 10 employee benefits offered by
companies. 401(k) was the most offered benefit, with a count of more than 17,500,
significantly higher than other benefits.
A 401(k) is a retirement savings plan in which employees can contribute a portion
of their salary to a retirement account. Companies often contribute a portion of the
money to this account. This shows that companies are interested in helping
employees save for retirement and ensure financial security for the future. In
addition, insurance (including types such as Medical, Vision, and Disability
Insurance) was also provided quite often (over 5,000 to nearly 7,000). This shows
concern for the health of workers, helping them feel secure in protecting their health.
Other benefits such as Dental Insurance, Tuition Assistance, transportation benefits,
and maternity/paternity benefits were also prioritized by companies in their efforts
to provide for their employees.
From the bar chart above, we can see that, in line with today's trend, companies
are focusing on individual retirement savings plans like 401(k). Companies are also
particularly interested in providing health and financial benefits to their employees.
This can affect employee satisfaction and long-term performance.
Based on Figure 26, the bar chart shows the distribution of work types offered
by jobs. On LinkedIn, the work types include full-time, contract, part-time,
temporary, internship, volunteer, and other. Clearly, demand for full-time
employees accounted for the largest proportion, with over 50,000 job
postings. Part-time and contract jobs were far fewer, with about
10,000 postings. The figure showed that demand for interns was critically low,
with only around 100 postings. This may increase the number of
graduates who do not have jobs and raise the unemployment rate,
making it more difficult for them to enter the job market.
Based on Figure 27, the bar chart showed the distribution of company sizes. We
define the company size based on the number of employees, annual revenue, number
of customers, or number of offices/branches. On LinkedIn, the largest companies
were the most numerous, with more than 25,000 companies. Other size groups
accounted for around 5,000 to 13,000 companies each. The smallest companies
numbered nearly 3,000. This suggests that job seekers can have confidence
when looking for a job on LinkedIn. Moreover, it gives them a better chance to
find a job.
Figure 28: The number of applications for job positions (Source: Group 3)
Based on Figure 28, the chart depicted the number of applications for job
positions. It was evident that there was a high density of applications (depicted by
dark blue color) concentrated within the range of 0 to 2 applications per position.
This suggested intense competition for job openings, particularly those that did not
require significant experience or specialized skills. Additionally, there were a few
vacancies with a higher than average number of applications. The standard deviation
indicated high variability in the data, implying fluctuations in the number of
applications across different job positions. Furthermore, the skewness of the data
indicated a slight leftward skew, suggesting that there were many vacancies with
below-average applications. Overall, this analysis provided insights into the
competitive nature of the job market, especially in relation to the distribution of job
applications across various positions.
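The spread and asymmetry mentioned above can be quantified with std() and skew(), as in the sketch below. The application counts are hypothetical, and the log1p transform is an assumption suggested by the "views_log" label used for the views chart. In pandas, a positive skew value indicates a long right tail and a negative value a long left tail.

```python
import numpy as np
import pandas as pd

# Hypothetical application counts: mostly small, with one large value.
applies = pd.Series([0, 1, 1, 2, 2, 3, 50], name="applies")

spread = applies.std()     # sample standard deviation
skewness = applies.skew()  # sample skewness; sign gives the tail direction

# log1p computes log(1 + x), which is defined at 0 and compresses large counts.
applies_log = np.log1p(applies)
print(spread, skewness, applies_log.skew())
```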
2. The number of views for job posting
Figure 29: The number of views for job posting (Source: Group 3)
Based on Figure 29, the density distribution chart depicted the number of job
posting views. It was observed that there was an uneven distribution of view counts,
with the highest density concentrated around the range of 2-4 views. A distinct
"peak" was evident around fewer views (with "views_log" close to 0), followed by
a decrease in density as the view count increased. This pattern might suggest that
some job postings lacked quality or failed to capture the attention of job seekers
effectively.
Furthermore, as the view count increased, the density tended to decrease, indicating
fewer job postings with high view counts. This trend suggested that only a select
few job postings, which were likely meticulously crafted in terms of content and
presentation, were able to attract significant viewership.
Overall, this chart could have helped recruitment managers better understand the
performance of job postings in the past and focus on improving the appeal of
postings with fewer views. By identifying patterns and trends in job posting views,
recruitment strategies could be adjusted to enhance the attractiveness of job postings
and increase their visibility to potential candidates.
Figure 30: Top 5 Most Required Skills by Salary Level (Source: Group 3)
Based on Figure 30, it was observed that the IT and MNFC skill groups exhibited
relatively consistent salary distributions across various levels. This suggested that
these two skills were consistently in high demand across all salary tiers. Conversely,
the MGMT, HCPR, and OTHR skills showed similar structures, particularly HCPR
and OTHR, indicating that at higher salary levels, the focus tended to be on technical
skills rather than soft skills and other related competencies. Similarly, this trend
persisted in average salary levels. Only at the low salary level was there a high
demand for IT skills, while other skills were evenly distributed without significant
discrepancies.
2. Average Salaries Offered by Top Employers
Based on Figure 31, the bar chart illustrates the average salary of each top
employer, arranged in descending order from the highest minimum average salary
to the lowest.
The chart shows the variation in the salary range between the minimum and
maximum salary for each different company. Imagen Technologies has the highest
minimum salary among the listed companies (500,000), while The Dedham Group
has the highest maximum salary (nearly 1,000,000) and also has a significant
difference between the minimum and maximum salary (Min salary = ½ Max salary).
The average salaries of the top employers are relatively high, and there is a large
difference between the minimum and maximum salaries of these employers. This
indicates that top employers are willing to offer high salaries to employees, and the
salary disparity creates high competition in the job market. Through the chart, we
can also observe an increasing demand for labor in high-tech, engineering, medical,
etc., sectors due to attractive remuneration.
Conclusion: The chart provides information on the average salary of the top
employers, helping viewers easily compare the salaries of different employers and
make informed decisions when searching for jobs.
Figure 32: Top 10 Views and Applies Title for jobs (Source: Group 3)
Based on Figure 32, the chart displays the number of views and applications for
the top 10 job positions. Each job position is represented by a pair of columns in
blue (views) and orange (applications). "Customer Success Manager" stands out in
terms of views with nearly 6000, whereas "Junior Software Engineer" has the
highest number of applications at around 1700. For "Executive Assistant", despite
having a relatively high view count, there are almost no applications.
From the chart, it can be observed that the number of views does not correlate with
the number of applications. Many job positions have high view counts but very few
applications. Additionally, it can be noted that some high-level management
Figure 33: The Number of Views and Applies for Top 10 companies (Source: Group 3)
Based on the figure, this was a combined chart, with columns representing the
number of applications and a line representing the number of views for the top 10
job positions at various companies. Insight Global stood out as the company with
the most prominent results, with nearly 17,500 views and over 3,500 applications.
Google followed closely behind with approximately 12,000 views, which was
nearly 8 times the number of people applying to this company. The companies with
the lowest performance in the Top 10 were, in order, Aya Healthcare, CareerStaff
Unlimited, Fusion Medical Staffing, and Vivian Health.
It was observed that most of the companies in the Top 10 with the highest views and
applications were job recruitment agencies, while the companies with the lowest
results on this list were those related to providing healthcare and medical staff.
3. Heatmap
Based on Figure 34, the heatmap displayed the relationship between different
variables, including remote allowed, sponsored, max salary, med salary, min salary,
company size, company employee count, and company follower count. It was
observed that Remote Allowed had a negative correlation with Sponsored (-0.086)
and Company Size (-0.18), and there was a strong positive correlation between
Company Employee Count and Company Follower Count (0.8). Furthermore, Max
Salary, Med Salary, and Min Salary had very low correlations with other variables
except themselves (1).
From the prominent correlation values above, the following observations and key
points could be made: Remote jobs were less likely to be sponsored or belong to
larger companies. Larger companies with more employees tended to have more
followers on social media. Additionally, the salary levels were not strongly
dependent on whether the job was remote, sponsored, company size, or company
popularity.
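The values behind a heatmap of this kind are pairwise correlation coefficients, which can be sketched with corr(). The numbers below are hypothetical and will not reproduce the coefficients in Figure 34. Drawing the matrix as a heatmap is usually done with a plotting library, for example by passing it to seaborn's heatmap function, which is an assumption about the tooling rather than something stated in the report.

```python
import pandas as pd

# Hypothetical values for three of the variables in the heatmap.
data = pd.DataFrame({
    "company_employee_count": [10, 200, 3000, 40000],
    "company_follower_count": [50, 900, 20000, 300000],
    "max_salary": [90000, 40000, 120000, 60000],
})

# corr() computes Pearson correlations between every pair of numeric columns;
# the diagonal is 1 because each variable correlates perfectly with itself.
corr = data.corr()
print(corr.round(2))
```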
II. RECOMMENDATION
Enhancing Skill Development:
Given the increasing demand in the technology and manufacturing sectors,
individuals should focus on updating and improving their technical skills. Engaging
in online courses and training programs can significantly enhance their
competitiveness in the job market.
ASSIGNMENT SHEET