4001MHR
Data Skills for Business
PART A
Introduction:
Data is a collection of unprocessed facts or opinions that are then processed into
useful information.
[Figure: Data → Process → Information]
So, for a company to run smoothly, it needs accurate data, because that data is
processed into the information used in decision making. Data accuracy is therefore of
the utmost importance. Accurate data is data without errors that can be used as an
authentic source of information for a company's important decisions.
Data accuracy is the most crucial aspect of data quality. It ensures that all of a
company's processes are based on reliable, authentic information, which supports
decision making in areas such as planning, budgeting and forecasting, as well as
understanding user needs and market dynamics. Incorrect or irrelevant data leads to
poor decision making. According to research by Gartner, poor data quality costs
businesses an average of $9.7 million to $14.2 million annually. Hence, now that many
companies use big data, which requires heavy processing, data accuracy is essential to
the health of a business.
Importance of Data Accuracy:
Data accuracy matters because nowadays both operational and strategic decisions are
based on different streams of data. Many companies also drive their marketing campaigns
with the customer data available to them. Facebook, one of the biggest social media
platforms, uses customer data to target users in its advertising campaigns. This heavy
use of data has increased demand for data scientists: in 2020, there were approximately
2.7 million open jobs in data science and related careers. Several reasons why data
accuracy is important for businesses are given below.
1- Better decision making:
High-quality data helps a business make better decisions on time, which leads to
higher profits and lower costs. It also gives the business a competitive advantage,
because it stays one step ahead of its rivals. A well-known example, usually told about
Walmart (although the details of the story are disputed), is known as "Beer and
Nappies". People may wonder what beer and nappies have to do with each other, but
according to the story, Walmart's data analysis and data-quality management revealed
that sales of nappies and beer were correlated on Friday nights. The explanation was
that working men were asked to pick up nappies on their way home from work, and on
Fridays they felt they deserved a six-pack of beer after a hectic week. Walmart moved
these items closer together and saw a large increase in sales of both.
2- Improved Marketing:
Accurate data makes marketing campaigns more efficient, because they can target the
specific people who have shown an interest in, or had some connection with, the
business in the past. This lowers marketing costs, as these customers are more likely to
become loyal, frequent purchasers. Facebook uses data in its marketing drives by
targeting people based on their preferences, gender and age, and on whether they have
recently searched for keywords related to an item, which leads to precise marketing.
According to Forbes, a 2017 report stated that most marketing data is only 10 to 20
percent accurate. Since then, business models have changed considerably, and so has
companies' approach to data-driven campaigns. Since the rise of big data, many
companies have invested millions of dollars in data warehouses, and management pays
special attention to this area. The result is more effective and cost-efficient
marketing campaigns.
3- Increased productivity and reduced costs:
Accurate data lets staff focus on productive work instead of fixing errors in the data,
which improves employee productivity and makes production more cost-efficient.
Conversely, inaccurate data may lead to decisions that ultimately cost the business a
large sum of money. It can also harm the brand and lead to poor business strategies. In
the short term, investing in an efficient data system may seem a big expense, but in
the long term it is a good investment.
4- Understanding Performance:
Data can help in assessing performance in almost any field. Nowadays it is used heavily
in sports, especially football. Many football teams employ professional data scientists
alongside their scouts to analyse data more accurately, and several websites now publish
detailed data about players' performance. Teams use this data to monitor and
continuously improve their performance.
5- Break the deadlock:
Businesses often face situations in which understanding the problem is difficult, such
as low sales despite heavy marketing. In such situations, breaking the data down helps
to identify the actual problem. If the available data is inaccurate at that point,
problem solving suffers, because management may not be able to identify the real
issue. Accurate data therefore helps to find the root cause of a failed strategy.
6- Understanding User needs:
Having accurate data about user needs is a blessing. A business's revenue comes from its
customers, so if it understands their needs and preferences, its revenue can grow
substantially. Accurate data provides relevant information about each user's choices,
which helps in planning the product, the distribution channels and the market strategy.
Improvement in Data Accuracy:
Before making a business decision, special attention should be given to data accuracy.
There are several ways by which data accuracy can be improved:
1- Gather data from right sources:
One of the best ways to ensure accuracy is to gather data from reliable sources. For
example, if a reference is available from both the BBC and a local newspaper, the BBC is
generally the more reliable source. Data gathered from websites requires special due
diligence, because there is a huge amount of it and much of it is inaccurate, as many
websites are run by non-professionals. Management should also carefully review the
company's internal and external sources of information.
2- Make data entry easier:
Often data is authentic at its source, but a single error during data entry can make it
inaccurate. A common cause is a heavy workload, which leads to wrong entries.
Management should therefore train employees in data entry and develop a data entry
manual. It should also hire more staff to reduce the workload on existing staff.
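As an illustration of catching entry errors at the moment they happen, the sketch below checks one entered row before it is accepted. The function name `validate_row`, the field layout and the rules (a known age group, numeric non-negative amounts, leisure spending not above income) are hypothetical assumptions chosen for this example; they are not part of any tool named in this report.

```python
# Hypothetical data-entry check: the rules below are illustrative assumptions.
VALID_AGE_GROUPS = {
    "15-19", "20-24", "25-29", "30-34", "35-39", "40-44",
    "45-49", "50-54", "55-59", "60-64", "65+",
}


def validate_row(age_group: str, income: str, leisure: str) -> list:
    """Return a list of problems found in one entered row (empty list = accepted)."""
    problems = []
    if age_group not in VALID_AGE_GROUPS:
        problems.append(f"unknown age group: {age_group!r}")
    try:
        income_value = float(income)
        leisure_value = float(leisure)
    except ValueError:
        # Text typed into a numeric field cannot be checked any further.
        problems.append("income and leisure spending must be numbers")
        return problems
    if income_value < 0 or leisure_value < 0:
        problems.append("amounts cannot be negative")
    if leisure_value > income_value:
        problems.append("leisure spending exceeds gross income")
    return problems


print(validate_row("45-49", "57350", "8600"))   # []
print(validate_row("45-94", "57350", "86000"))  # flags the mistyped age group and the extra zero
```

Returning a list of messages rather than stopping at the first problem lets the person entering the data fix every mistake in one pass.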
3- Reviewing Data:
Reviewing data is an efficient way of checking its correctness. Companies should
establish a committee or team that regularly reviews data sources and the authenticity
of the data collected. Because large amounts of irregular data are being refined, a team
should monitor all of these processes.
4- Using high quality data tools:
Solving data accuracy problems manually is not easy, and over time many software tools
have been developed that solve most of these problems quickly. The most difficult task
is finding software that meets all of a business's needs and is tailored to its
requirements. For example, as I have seen on Amazon, Jungle Scout is a tool that gathers
large amounts of accurate data with one click. Businesses nowadays also use tools such as
Cloudingo and IBM InfoSphere QualityStage. These tools help to gather, clean and analyse
data in ways that benefit the business.
5- Integration of Data:
A business typically uses several software packages to manage its data. If data is
transferred manually from one package to another, entry errors can occur and ultimately
lead to poor decision making. Businesses therefore now integrate their data, so that all
data files are updated with a single click. Integration is important because it
improves data accuracy.
Conclusion:
As the world evolves, the use of data has increased significantly, and numerous sources
of data are now available. Extra care is therefore required when collecting data.
Collected data is usually not ready for immediate use, so it must be processed into an
accurate data set. Companies with a high-quality data set have a competitive advantage
over their competitors. Nowadays companies use different software tools for collecting
and analysing data. These tools are expensive but provide numerous benefits for the
smooth running of a business. For example, they help in managing data on production,
marketing, new product launches, business expansion and the acquisition of a
subsidiary, all of which require accurate data to achieve their objectives. Companies
should also develop KPIs to hold staff accountable for data management.
Part B
1- Calculation
a- Mean
Some age groups have a higher average gross income while others have a lower one. The
mean is the average of the values: their sum divided by how many there are. In this
case, the mean average gross income is 48651, calculated in Excel with the formula
=AVERAGE(C6:C16). The mean of 48651 is closer to the highest value (57970) than to the
lowest (21130), which suggests that most age groups have a good average gross income.
The mean of average spending on leisure is 4531, a little below the midpoint between the
lowest (1300) and highest (8600) values. This suggests that, despite good average
incomes, spending on leisure is relatively modest (about 9% of the mean gross income).
Age     Average Gross Income ($)   Average Spending on Leisure ($)
15-19   21130                      1300
20-24   38250                      2700
25-29   47950                      2300
30-34   51380                      4750
35-39   53750                      3550
40-44   55650                      5600
45-49   57350                      8600
50-54   57970                      7740
55-59   53000                      6300
60-64   51230                      4600
65+     47500                      2400
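The Excel results can be cross-checked outside the workbook. A minimal Python sketch, assuming the eleven values from the table above are typed in by hand; it uses the standard-library `statistics` module and is only a check, not a description of how the original workbook was built.

```python
from statistics import mean

# Values copied from the table above ($), one per age group from 15-19 to 65+.
income = [21130, 38250, 47950, 51380, 53750, 55650, 57350, 57970, 53000, 51230, 47500]
leisure = [1300, 2700, 2300, 4750, 3550, 5600, 8600, 7740, 6300, 4600, 2400]

# The mean is the sum of the values divided by how many there are,
# which is what Excel's =AVERAGE() returns.
mean_income = mean(income)
mean_leisure = mean(leisure)

print(round(mean_income), round(mean_leisure))  # 48651 4531
```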
b- Median
The median is the middle number in a list of numbers sorted in ascending or descending
order, and it can describe a data set better than the mean when there are extreme
values. Here the median average gross income, calculated in Excel with
=MEDIAN(C6:C16), is 51380: five age groups have a lower average income and five have a
higher one. The median average spending on leisure is 4600, with five age groups
spending less on leisure and five spending more.
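A short sketch of the same calculation, assuming the table values are entered by hand. It shows that the median is simply the middle element once the eleven values are sorted.

```python
from statistics import median

income = [21130, 38250, 47950, 51380, 53750, 55650, 57350, 57970, 53000, 51230, 47500]
leisure = [1300, 2700, 2300, 4750, 3550, 5600, 8600, 7740, 6300, 4600, 2400]

# median() sorts a copy of the values and takes the middle one; with 11 age
# groups that is the 6th value, with 5 groups below it and 5 above it.
median_income = median(income)
median_leisure = median(leisure)
assert median_income == sorted(income)[len(income) // 2]

print(median_income, median_leisure)  # 51380 4600
```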
c- Maximum
The maximum is the highest value in a data set, and Excel's MAX function returns it. In
this case, the maximum average gross income, 57970, belongs to the 50-54 age group,
and the maximum average spending on leisure, 8600, belongs to the 45-49 age group. For
example, the maximum spending on leisure is calculated with =MAX(F6:F16).
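Excel's MAX returns only the value, while the discussion above also names the age group it belongs to. A small sketch, assuming the table is stored as an age-group-to-value mapping, finds both at once.

```python
# Age group -> value ($), copied from the data table.
income = {
    "15-19": 21130, "20-24": 38250, "25-29": 47950, "30-34": 51380,
    "35-39": 53750, "40-44": 55650, "45-49": 57350, "50-54": 57970,
    "55-59": 53000, "60-64": 51230, "65+": 47500,
}
leisure = {
    "15-19": 1300, "20-24": 2700, "25-29": 2300, "30-34": 4750,
    "35-39": 3550, "40-44": 5600, "45-49": 8600, "50-54": 7740,
    "55-59": 6300, "60-64": 4600, "65+": 2400,
}

# max() over the keys, ranked by their values, returns the age group
# whose value is largest.
top_income_group = max(income, key=income.get)
top_leisure_group = max(leisure, key=leisure.get)

print(top_income_group, income[top_income_group])    # 50-54 57970
print(top_leisure_group, leisure[top_leisure_group])  # 45-49 8600
```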
d- Minimum
After the maximum, the next measure is the smallest value, known as the minimum. Excel's
MIN function returns the smallest numeric value in the data provided, for example
=MIN(C6:C16) for average gross income. The minimum average gross income is 21130 and
the minimum average spending on leisure is 1300, both from the 15-19 age group.
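The same idea works for the minimum. This sketch, again assuming the table values are typed in as a mapping, confirms that both minimums come from the youngest age group.

```python
# Age group -> value ($), copied from the data table.
income = {
    "15-19": 21130, "20-24": 38250, "25-29": 47950, "30-34": 51380,
    "35-39": 53750, "40-44": 55650, "45-49": 57350, "50-54": 57970,
    "55-59": 53000, "60-64": 51230, "65+": 47500,
}
leisure = {
    "15-19": 1300, "20-24": 2700, "25-29": 2300, "30-34": 4750,
    "35-39": 3550, "40-44": 5600, "45-49": 8600, "50-54": 7740,
    "55-59": 6300, "60-64": 4600, "65+": 2400,
}

# min() with key=dict.get returns the age group holding the smallest value.
bottom_income_group = min(income, key=income.get)
bottom_leisure_group = min(leisure, key=leisure.get)

print(bottom_income_group, income[bottom_income_group])    # 15-19 21130
print(bottom_leisure_group, leisure[bottom_leisure_group])  # 15-19 1300
```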
e- Range
The range is the spread of the data from the lowest to the highest value, and it is
commonly used to measure variability. It is calculated by subtracting the lowest value
from the highest value, so in Excel it is the MAX result minus the MIN result (for
example =MAX(C6:C16)-MIN(C6:C16)). A large range indicates high variability and a small
range low variability. The range of average gross income is 36840 (57970 - 21130),
which is large and indicates high variability, and the range of average spending on
leisure is 7300 (8600 - 1300), which also indicates high variability.
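The corrected order of subtraction (highest minus lowest) can be checked with a two-line sketch, assuming the table values are entered by hand.

```python
income = [21130, 38250, 47950, 51380, 53750, 55650, 57350, 57970, 53000, 51230, 47500]
leisure = [1300, 2700, 2300, 4750, 3550, 5600, 8600, 7740, 6300, 4600, 2400]

# Range = highest value minus lowest value, i.e. =MAX(...)-MIN(...) in Excel.
income_range = max(income) - min(income)
leisure_range = max(leisure) - min(leisure)

print(income_range, leisure_range)  # 36840 7300
```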
f- Variance
Variance measures how far values vary from their mean. In Excel, the sample variance is
calculated with =VAR.S(C6:C16). A large variance indicates that the numbers in the data
set are far from the mean and from each other; a small variance indicates that they are
close to the mean. Variance is very important in the investment world because it
measures variability: the greater the variability, the riskier the investment. In this
data, the variance of average gross income is 113791709 and the variance of average
spending on leisure is 5545209. Both are very large numbers, partly because variance is
expressed in squared dollars, so its size is easier to judge through the standard
deviation; both nevertheless indicate that the data is spread out around the mean.
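To make the definition concrete, the sketch below computes the sample variance by hand (squared deviations from the mean, divided by n - 1) and compares it with the standard library. The helper name `sample_variance` is introduced here for illustration; the values are assumed to be typed in from the table.

```python
from statistics import mean, variance


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1 (as =VAR.S does)."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)


income = [21130, 38250, 47950, 51380, 53750, 55650, 57350, 57970, 53000, 51230, 47500]
leisure = [1300, 2700, 2300, 4750, 3550, 5600, 8600, 7740, 6300, 4600, 2400]

print(round(sample_variance(income)), round(sample_variance(leisure)))  # 113791709 5545209

# statistics.variance() computes the same sample variance.
print(round(variance(income)), round(variance(leisure)))  # 113791709 5545209
```

Dividing by n - 1 treats the eleven rows as a sample, which is what VAR.S does; VAR.P (and `statistics.pvariance`) would divide by n instead and give slightly smaller values.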
g- Standard deviation:
Standard deviation also describes how dispersed a data set is around its mean, and it is
calculated as the square root of the variance. If data points are far from the mean, the
data is more spread out and the standard deviation is higher. Standard deviation also
measures volatility and is especially used in the investment world to assess the risk
of investments. The standard deviation of average gross income is 10667, a high value
showing that the data varies considerably around the mean. For average spending on
leisure, the standard deviation is 2355, which is smaller in absolute terms but large
relative to the mean of 4531, so leisure spending also varies considerably. The formula
for standard deviation in Excel is =STDEV.S(F6:F16).
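A minimal sketch of the square-root relationship, assuming the table values are entered by hand. The last line adds a ratio that the original analysis does not use, the coefficient of variation (standard deviation divided by mean), as one way to compare the spread of two columns measured on very different scales.

```python
from math import sqrt
from statistics import mean, stdev, variance

income = [21130, 38250, 47950, 51380, 53750, 55650, 57350, 57970, 53000, 51230, 47500]
leisure = [1300, 2700, 2300, 4750, 3550, 5600, 8600, 7740, 6300, 4600, 2400]

# Standard deviation is the square root of the sample variance (=STDEV.S in Excel).
sd_income = sqrt(variance(income))
sd_leisure = sqrt(variance(leisure))
print(round(sd_income), round(sd_leisure))  # 10667 2355

# stdev() gives the same result directly.
assert abs(sd_income - stdev(income)) < 1e-6

# Extra ratio, not part of the original analysis: coefficient of variation.
print(round(sd_income / mean(income), 2), round(sd_leisure / mean(leisure), 2))  # 0.22 0.52
```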
Measure              Average Gross Income ($)   Average Spending on Leisure ($)
Mean                 48651                      4531
Median               51380                      4600
Maximum              57970                      8600
Minimum              21130                      1300
Range                36840                      7300
Variance             113791709                  5545209
Standard deviation   10667                      2355
2. Interpretation of Measurements
a- Strengths & Weaknesses
The CEO wants the marketing manager's opinion on the average gross income and average
spending on leisure of each age group. We are not given details of the company's
business. If it is a luxury business, high average spending on leisure matters; if it is
a bread-and-butter business selling essentials, spending on leisure is less relevant,
and the same holds if its products are inexpensive. The highest average gross income is
in the 50-54 age group. The key question is who the target customers for our product
are: if they are in that age group, we can design and spend our marketing campaign
around them. The highest average spending on leisure is in the 45-49 age group, so if
our product is aimed at that age group, this is a clear advantage, because these people
spend the most on leisure. The minimum average gross income and the minimum spending on
leisure are both in the 15-19 age group: these young people earn little, so they have
little capacity to spend on luxury goods. If our product is aimed at this age group, we
have only a slight chance of gaining market share, because it has the lowest income. The
mean gross income of 48651 is a good figure and opens opportunities to improve our
business and marketing campaign. The mean average spending on leisure of 4531 shows how
much people spend on leisure on average.
b- Visualization
The graph below shows each age group's average gross income as a column chart, with age
groups on the x-axis and average gross income on the y-axis. It makes the data easy to
understand at a glance: the 50-54 age group earns the most and the 15-19 age group
earns the least.
[Figure: Age and Average Gross Income. Column chart; x-axis: Age; y-axis: Gross Income ($); series: Average Gross Income]
The graph below shows each age group's average spending on leisure as a column chart,
with age groups on the x-axis and average spending on leisure on the y-axis. It shows at
first glance that the 45-49 age group has the highest leisure spending and the 15-19 age
group the lowest. This chart provides useful information for developing our marketing
strategies.
[Figure: Age & Average Spending on Leisure. Column chart; x-axis: Age; y-axis: Spending on Leisure ($); series: Average Spending on Leisure]
This line graph compares average gross income with average spending on leisure. For
example, the age group with an income of 57350 (45-49) spent 8600 on leisure, while the
age group with an income of 21130 (15-19) spent 1300. The graph therefore gives the
marketing manager a useful way of comparing the two measures.
[Figure: Average Gross Income VS Average spend on Leisure. Line chart; series: Average Gross Income, Average spend on Leisure]
The figure below plots the summary measures obtained for average gross income: the mean
is 48651, the maximum 57970, the median 51380, the minimum 21130, the standard
deviation 10667 and the range 36840.
[Figure: Measurements. Pie chart of the average gross income measures: Mean 48651 (21%), Median 51380 (23%), Maximum 57970 (26%), Minimum 21130 (9%), Range 36840 (16%), Standard deviation 10667 (5%)]
3. Correlation Coefficient
The correlation coefficient is a statistical measure of how strongly the movements of
two variables are related. Its value lies between -1.0 and 1.0; a computed value above
1.0 or below -1.0 means there is an error in the calculation. A correlation of 1.0
indicates a perfect positive relationship: the two variables move in the same
direction, and every increase in the first is matched by a proportional increase in the
second. A correlation of -1.0 indicates a perfect negative relationship: the variables
move in opposite directions, so an increase in one is matched by a decrease in the
other. A correlation of 0.0 indicates that there is no linear relationship between the
two variables' movements.
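For reference, the value Excel's CORREL function returns is Pearson's correlation coefficient. For n paired values x_i and y_i with means \bar{x} and \bar{y}, its standard textbook form is:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator measures how the two variables move together, and dividing by the product of their spreads keeps r between -1 and 1.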
The correlation coefficient shows whether two variables tend to increase together,
decrease together, or have no relationship. In the data set below, the correlation
coefficient between average gross income and average spending on leisure is 0.749336.
This is a fairly strong positive correlation, although not a perfect one: age groups
with higher average income tend to spend more on leisure. We calculated it in Excel with
the formula =CORREL(C6:C16,F6:F16).
Age     Average Gross Income ($)   Average Spending on Leisure ($)
15-19   21130                      1300
20-24   38250                      2700
25-29   47950                      2300
30-34   51380                      4750
35-39   53750                      3550
40-44   55650                      5600
45-49   57350                      8600
50-54   57970                      7740
55-59   53000                      6300
60-64   51230                      4600
65+     47500                      2400

Correlation coefficient: 0.749336
4. Trend (Best Fit) Line
A trend line, or line of best fit, is a line drawn through a scatter plot of data points
that best expresses the relationship between them, and it shows whether any correlation
exists. When the points lie close to the line, the correlation is strong; when they are
scattered far from it, the correlation is weak or absent. The graph below shows a fairly
strong positive relationship between average gross income and average spending on
leisure, making the relationship visible at a glance.
[Figure: Gross Income VS Spending on Leisure. Scatter plot with trend line; x-axis: Average Spending on leisure ($); y-axis: Average Gross Income ($)]
5- Findings & Recommendation
As we have seen, there is a fairly strong positive correlation between gross income and
spending on leisure. This suggests that if incomes rise, for example because the economy
is doing well, people will have more disposable income, leading to more spending and
more revenue for companies. Having looked at the different statistical tools, we
conclude that the company should launch a product for people in the 45-54 age group, as
they have the highest income and the highest spending on leisure.