SERIAL NO   TITLE
            Abstract
1           Introduction
            1.1 Cost reduction
            1.2 Faster, better decision making
            1.3 New products and services
2           Literature Survey
3           System Specification
            3.1 Hardware Specification
            3.2 Software Specification
4           Project Description
            4.1 Introduction
            4.2 The data mining process
            4.3 The pillars of artificial intelligence
            4.4 Distributed SQL processing
5           Result and Discussion
            5.1 Sample Data
            5.2 Sample Screen Shot
6           Conclusion and Feature Enhancement
            6.1 Conclusion
            6.2 Feature Enhancement
7           References
ABSTRACT
Recent technology innovations, many of which are based on the capture and analysis
of big data, are transforming the automotive industry at a pace deemed inconceivable just a
short time ago. At the heart of this transformation is the new role of the car itself, and the
increasingly sophisticated abilities that “intelligent cars” possess to communicate with
individuals, enterprises, and devices around them. Company leaders in the automotive industry
clearly recognize that by embracing the concept of big data, they can access a mass of
opportunities for differentiation, growth, and innovation that revolutionize the very core of
existing business models. In order to unlock this potential, the key challenge is to develop and
implement a big data strategy, which is tailored to the capture, analysis, and interpretation of
the ever increasing quantities of structured and unstructured data which will be received from
drivers, vehicles, and other devices. Only those companies which incorporate a big data
strategy in their transformation agendas will be able to reap the rewards offered by the zettabyte
revolution.
1.INTRODUCTION
Big data analytics helps organizations harness their data and use it to identify new
opportunities. That, in turn, leads to smarter business moves, more efficient operations, higher
profits and happier customers. In his report Big Data in Big Companies, IIA Director of
Research Tom Davenport interviewed more than 50 businesses to understand how they used big
data. He found they got value in the following ways:
Cost reduction: Big data technologies such as Hadoop and cloud-based analytics bring
significant cost advantages when it comes to storing large amounts of data – plus they can
identify more efficient ways of doing business.
Faster, better decision making: With the speed of Hadoop and in-memory analytics, combined
with the ability to analyze new sources of data, businesses are able to analyze information
immediately – and make decisions based on what they’ve learned.
New products and services: With the ability to gauge customer needs and satisfaction through
analytics comes the power to give customers what they want. Davenport points out that with big
data analytics, more companies are creating new products to meet customers’ needs.
The automotive industry continues to face a dynamic set of challenges. For those with the
right ambition it represents an exciting time with opportunities to differentiate and stand out from
the crowd. One area that has the opportunity to deliver significant competitive advantages is
analytics.
The concept of big data has been around for years; most organizations now understand
that if they capture all the data that streams into their businesses, they can apply analytics and
obtain significant value from it. But even in the 1950s, decades before anyone uttered the term
“big data,” businesses were using basic analytics (essentially numbers in a spreadsheet that were
manually examined) to uncover insights and trends.
The new benefits that big data analytics brings to the table, however, are speed and
efficiency. Whereas a few years ago a business would have gathered information, run analytics
and unearthed information that could be used for future decisions, today that business can
identify insights for immediate decisions. The ability to work faster – and stay agile – gives
organizations a competitive edge they didn’t have before.
2.LITERATURE SURVEY
We examine whether firms that emphasize decision making based on data and business
analytics (“data driven decision making” or DDD) show higher performance. Using detailed
survey data on the business practices and information technology investments of 179 large
publicly traded firms, we find that firms that adopt DDD have output and productivity that is 5-
6% higher than what would be expected given their other investments and information
technology usage. Furthermore, the relationship between DDD and performance also appears in
other performance measures such as asset utilization, return on equity and market value. Using
instrumental variables methods, we find evidence that the effect of DDD on productivity
does not appear to be due to reverse causality. Our results provide some of the first large-scale data
on the direct connection between data-driven decision making and firm performance.
How do firms make better decisions? In more and more companies, managerial decisions
rely less on a leader’s “gut instinct” and more on data-based analytics. At the same time, we
have been witnessing a data revolution; firms gather extremely detailed data from and propagate
knowledge to their consumers, suppliers, alliance partners, and competitors. Part of this trend is
due to the widespread diffusion of enterprise information technology such as Enterprise
Resource Planning (ERP), Supply Chain Management (SCM), and Customer Relationship
Management (CRM) systems (Aral et al. 2006; McAfee 2002), which capture and process vast
quantities of data as part of their regular operations.
Increasingly these systems are imbued with analytical capabilities, and these capabilities
are further extended by Business Intelligence (BI) systems that enable a broader array of data
analytic tools to be applied to operational data. Moreover, the opportunities for data collection
outside of operational systems have increased substantially. Mobile phones, vehicles, factory
automation systems, and other devices are routinely instrumented to generate streams of data on
their activities, making possible an emerging field of “reality mining” (Pentland and Pentland
2008). Manufacturers and retailers use RFID tags to track individual items as they pass through
the supply chain, and they use the data they provide optimize and reinvent their business
processes. Similarly, click stream data and keyword searches collected from websites generate a
plethora of data, making customer behavior and customer-firm interactions visible without
having to resort to costly or ad-hoc focus groups or customer behavior studies.
Leading-edge firms have moved from passively collecting data to actively conducting
customer experiments to develop and test new products. For instance, Capital One Financial
pioneered a strategy of “test and learn” in the credit card industry, where large numbers of
potential card offers were field-tested using randomized trials to determine customer acceptance
and customer profitability (Clemons and Thatcher 1998). While these trials were quite
expensive, they were driven by the insight that existing data can have limited relevance for
understanding customer behavior in products that do not yet exist; some of the successful trials
led to products such as “balance transfer cards,” which revolutionized the credit card
industry.
Online firms such as Amazon, eBay, and Google also rely heavily on field experiments as
part of a system of rapid innovation, utilizing the high visibility and high volume of online
customer interaction to validate and improve new product or pricing strategies. Increasingly, the
culture of experimentation has diffused to other information-intensive industries such as retail
financial services (Toronto-Dominion Bank, Wells Fargo, PNC), retail (Food Lion,
Sears, Famous Footwear), and services (CKE Restaurants, Subway) (see Davenport
2009). Information theory (e.g., Blackwell 1953) and the information-processing view of
organizations (e.g., Galbraith 1974) suggest that more precise and accurate information should
facilitate greater use of information in decision making and therefore lead to higher firm
performance. However, there is little independent, large-sample empirical evidence on the
value or performance implications of adopting these technologies. In this paper, we develop a
measure of the use of “data-driven decision making” (DDD) that captures business practices
surrounding the collection and analysis of external and internal data.
Business intelligence and analytics (BI&A) has emerged as an important area of study for
both practitioners and researchers, reflecting the magnitude and impact of data-related problems
to be solved in contemporary business organizations. This introduction to the MIS Quarterly
Special Issue on Business Intelligence Research first provides a framework that identifies the
evolution, applications, and emerging research areas of BI&A. BI&A 1.0, BI&A 2.0, and BI&A
3.0 are defined and described in terms of their key characteristics and capabilities. Current
research in BI&A is analyzed and challenges and opportunities associated with BI&A research
and education are identified. We also report a bibliometric study of critical BI&A publications,
researchers, and research topics based on more than a decade of related academic and industry
publications. Finally, the six articles that comprise this special issue are introduced and
characterized in terms of the proposed BI&A research framework.
Business intelligence and analytics (BI&A) and the related field of big data analytics
have become increasingly important in both the academic and the business communities over the
past two decades. Industry studies have highlighted this significant development. For example,
based on a survey of over 4,000 information technology (IT) professionals from 93 countries and
25 industries, the IBM Tech Trends Report (2011) identified business analytics as one of the four
major technology trends in the 2010s.
Hal Varian, Chief Economist at Google and emeritus professor at the University of
California, Berkeley, has commented on the emerging opportunities for IT professionals and students
in data analysis. The opportunities associated with data and analysis in different
organizations have helped generate significant interest in BI&A, which is often referred to as the
techniques, technologies, systems, practices, methodologies, and applications that analyze
critical business data to help an enterprise better understand its business and market and make
timely business decisions. In addition to the underlying data processing and analytical
technologies, BI&A includes business-centric practices and methodologies that can be applied to
various high-impact applications such as e-commerce, market intelligence, e-government,
healthcare, and security.
This introduction to the MIS Quarterly Special Issue on Business Intelligence Research
provides an overview of this exciting and high-impact field, highlighting its many challenges
and opportunities. Figure 1 shows the key sections of this paper, including BI&A evolution,
applications, and emerging analytics research opportunities. We then report on a bibliometric
study of critical BI&A publications, researchers, and research topics based on more than a
decade of related BI&A academic and industry publications. Education and program
development opportunities in BI&A are presented, followed by a summary of the six articles that
appear in this special issue using our research framework. The final section discusses the
analytics and visualization technologies in this field that offer new directions for BI&A research.
This paper provides a summary of how big data can be useful for small and medium
enterprises in the 21st century. Organisations can draw meaningful information from the analysis
of the large amounts of data that are created on a daily basis, which can help them make informed
decisions. Case studies in which SMEs have successfully utilised the potential of big data are
cited in the paper. The paper shows the potential of big data for mobile application development
enterprises, which fall under the SME bracket, and how they can utilise the benefits of big data,
by presenting applications-based analytical results. The conclusion presents a syllogism whereby
a mobile app development company has succeeded in using big data for generating revenue
streams, and therefore provides an example of how big data could be utilised by other firms.
The rapid growth of the Internet, smartphones, social media, wireless technologies, etc. in
the digital world has led to an explosion of data. It is predicted that the next technological
revolution will be based upon the science of ‘big data’, through which large amounts of data can
be processed, captured, stored, shared and analysed. According to estimates by analysts, these
large quantities of data will increase by 40 times in the upcoming decade (Intuit 2020 Report
2012). Previously, the ability to gather and capitalise upon large amounts of information was
limited to large enterprises, since they possessed a pool of statisticians who could obtain
meaningful information from the data.
However, due to the democratisation of data brought about by big data, small and medium enterprises can
utilise the benefits of data analysis, which will help them to obtain meaningful insights into
competition, markets, bottom lines and top-line results, and enhance their decision-making
abilities (Brown and Duguid 2002, p. 29, 39). The ease of access to large amounts of data has
made data a vital resource, along with labour and capital, in any industry. As predicted, the key
driver for growth in the 21st century for any organisation, big or small, is data that encompasses
many societal aspects, such as health care, business, entertainment, government, finance, etc.
(Erbes et al. 2012, pp. 66–72).
Gartner, an IT and research firm, has analysed and graphically presented the results of
over 2000 technologies, grouped into 98 categories, that outline and depict the life cycle of
technologies which have a high potential (Ross 2011). Technology is expected to stabilise during
the later stages of its life cycle, since by that time society and industry will accept it as
indispensable, and will not create undue hype. In this research, high expectations for the future
are placed on such technologies and the benefits that will enable organisations to make more informed
decisions, improve communities and open new avenues for business opportunities.
Whether under Industry 4.0 or the industrial Internet, today's industrial manufacturing
enterprises should make full use of information and communication technology to deal with the
arrival of smart, large-scale data, combining products, machinery and human resources, and
responding to rapidly changing patterns of product sales, so that manufacturing enterprises can
pursue process innovation and reform. This paper takes the automobile manufacturing industry
as an example: based on the analysis of large volumes of car sales data and using data mining
technology, a web crawler program written in Java is used for data collection. The aim is to give
the automobile manufacturing industry suggestions for automobile production that reduce the
inventory of automobile enterprises and the waste of resources.
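The report states that a Java web crawler was used to collect the car sales data. As an illustration only, the sketch below shows the same idea in Python; the listing URL, the CSS classes, and the output file name are hypothetical assumptions, not the project's actual crawler.

```python
# Minimal sketch of a sales-data crawler (hypothetical listing page and CSS classes).
# The project itself used a Java crawler; this Python version only illustrates the idea.
import csv
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://example.com/car-sales?page={}"   # hypothetical listing URL

def crawl(pages: int, out_path: str = "car_sales.csv") -> None:
    rows = []
    for page in range(1, pages + 1):
        html = requests.get(BASE_URL.format(page), timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        # Hypothetical markup: one <div class="listing"> per car with model and monthly sales.
        for item in soup.select("div.listing"):
            model = item.select_one(".model").get_text(strip=True)
            sales = item.select_one(".sales").get_text(strip=True)
            rows.append((model, sales))
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["model", "monthly_sales"])
        writer.writerows(rows)

if __name__ == "__main__":
    crawl(pages=3)
```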
The manufacturing industry is the pillar industry of the national economy; however, there is
still a big gap in China with respect to independent intellectual property rights in innovative
design, advanced manufacturing technology and equipment, and modern design and management, so
China is not yet a manufacturing power. At the same time, with the formation of world economic
integration, foreign technology, capital and products have flowed into China in large numbers, and
Chinese enterprises, especially in the manufacturing industry, face unprecedentedly fierce
competition. For the manufacturing industry to achieve new development, it must seize the
opportunity, which requires enterprises to adopt advanced management philosophies such as data mining.
The increasing focus on big data, and its potential to influence almost every
industry, gives it an often deterministic presence that presents it as an implement-and-reap
solution for enterprises. The project is inspired by the limited focus on the potential for
SMEs, rather than multinational enterprises, to harness big data. As noted, it is mostly larger
enterprises that have launched initiatives to complement their analytical proficiencies, but as
technologies mature, more companies adopt frameworks for handling data, and organizations learn
how to work within this new framework, SMEs might find it easier to reap some of the benefits.
Also, helped by cheaper and more easily accessible servers and data centers delivered
through cloud vendors, SMEs now face less of a constraint on upfront investment; rather, the
challenges present themselves as organizational and strategic in nature. The right technologies
still need to be chosen, but with well-supported and well-documented open source data systems
available, it has increasingly become a question of choosing correctly and choosing a scalable option.
While manufacturers have been generating highly distributed data from various systems,
devices and applications, a number of challenges in both data management and data analysis
require new approaches to support the big data era. The main challenge for industrial big data
analytics is real-time analysis and decision-making based on massive heterogeneous data sources in the
manufacturing space. This survey presents new concepts, methodologies, and application
scenarios of industrial big data analytics, which can provide dramatic improvements in solving
velocity and veracity problems.
Manufacturers have now entered the age of big data, and data sizes can range from a
few dozen terabytes to many petabytes in a single data set. For example, a GE plant
that produces a personal care product generates 5,000 data samples every 33 milliseconds,
resulting in a very large continuous data stream. Big data analytics will be a vital
foundation for manufacturing forecasting, machine fleet management, and proactive maintenance. Compared
to big data in general, industrial big data has the potential to create value in different sections of the
manufacturing business chain. For example, valuable information regarding the hidden
degradation or inefficiency patterns within machines or manufacturing processes can lead to
informed and effective maintenance decisions which can avoid costly failures and unplanned
downtime. However, the ability to perform analysis on the data is constrained by the increasingly
distributed nature of industrial data sets.
Highly distributed data sources bring about challenges in industrial data access,
integration, and sharing. Furthermore, massive data produced by different sources are often
defined using different representation methods and structural specifications. Bringing those data
together becomes a challenge because the data are not properly prepared for data integration and
management, and the technical infrastructures lack the appropriate information infrastructure
services to support big data analytics if it remains distributed. Recently, industrial big data
analytics has attracted extensive research interests from both academia and industry.
According to a report from the McKinsey Global Institute, the effective use of industrial big data
has the potential to transform economies and deliver a new wave of productivity
growth. Taking advantage of valuable industrial big data analytics will become a basic
competitive factor for today's enterprises and will create new competitors who are able to attract
employees with the critical skills in industrial big data.
GE has published a white paper about an industrial big data platform. It
illustrates the industrial big data requirements that must be addressed in order for industrial operators
to exploit the many efficiency opportunities in a cost-effective manner. The industrial big data
software platform brings these capabilities together in a single technology infrastructure that
opens up the full set of capabilities to service providers. Brian Corporation describes industrial big data
analytics as focusing on high-performance operational data management systems, cloud-based data
storage, and hybrid service platforms.
ABB proposes turning industrial big data into decisions, so that
enterprises have additional context and insight to enable better decision making. In 2015,
industrial big data analytics was proposed for manufacturing maintenance and service
innovation, covering automated data processing, health assessment and prognostics in the
industrial big data environment.
3.SYSTEM SPECIFICATION
Language : HTML
Operating System : Windows 8.1
Front End : PHP
4.PROJECT DESCRIPTION
4.1.Introduction
Data science and machine learning are now key technologies in our everyday lives, as we
can see in a multitude of applications, such as voice recognition in vehicles and on cell phones,
automatic facial and traffic sign recognition, as well as chess and, more recently, Go machine
algorithms which humans can no longer beat. The analysis of large data volumes based on
search, pattern recognition, and learning algorithms provides insights into the behavior of
processes, systems, nature, and ultimately people, opening the door to a world of fundamentally
new possibilities. In fact, the now already implementable idea of autonomous driving is virtually
a tangible reality for many drivers today with the help of lane keeping assistance and adaptive
cruise control systems in the vehicle.
The fact that this is just the tip of the iceberg, even in the automotive industry, becomes
readily apparent when one considers that, at the end of 2015, Toyota and Tesla's founder, Elon
Musk, each announced investments amounting to one billion US dollars in artificial intelligence
research and development almost at the same time. The trend towards connected, autonomous,
and artificially intelligent systems that continuously learn from data and are able to make optimal
decisions is advancing in ways that are simply revolutionary, not to mention fundamentally
important to many industries.
This includes the automotive industry, one of the key industries in Germany, in which
international competitiveness will be influenced by a new factor in the near future – namely the
new technical and service offerings that can be provided with the help of data science and
machine learning. This article provides an overview of the corresponding methods and some
current application examples in the automotive industry. It also outlines the potential
applications to be expected in this industry very soon. Accordingly, sections 2 and 3 begin by
addressing the sub domains of data mining (also referred to as “big data analytics”) and artificial
intelligence, briefly summarizing the corresponding processes, methods, and areas of application
and presenting them in context.
Section 4 then uses examples to describe current applications along the automotive value chain,
from development and production through logistics to the end customer. Based on such an example, section 5 describes the vision
for future applications using three examples: one in which vehicles play the role of autonomous
agents that interact with each other in cities, one that covers integrated production optimization,
and one that describes companies themselves as autonomous agents. Whether these visions will
become a reality in this or any other way cannot be said with certainty at present – however, we
can safely predict that the rapid rate of development in this area will lead to the creation of
completely new products, processes, and services, many of which we can only imagine today.
This is one of the conclusions drawn in section 6, together with an outlook regarding the
potential future effects of the rapid rate of development in this area.
Gartner uses the term “prescriptive analytics” to describe the highest level of ability to
make business decisions on the basis of data-based analyses. This is illustrated by the question
“what should I do?” and prescriptive analytics supplies the required decision-making support, if
a person is still involved, or automation if this is no longer the case. The levels below this, in
ascending order in terms of the use and usefulness of AI and data science, are defined as follows:
descriptive analytics (“what has happened?”), diagnostic analytics (“why did it happen?”), and
predictive analytics (“what will happen?”) (see Figure 1).
The last two levels are based on data science technologies, including data mining and
statistics, while descriptive analytics essentially uses traditional business intelligence concepts
(data warehouse, OLAP). In this article, we seek to replace the term “prescriptive analytics”
with the term “optimizing analytics.” The reason for this is that a technology can “prescribe”
many things, while, in terms of implementation within a company, the goal is always to make
something “better” with regard to target criteria or quality criteria. This optimization can be
supported by search algorithms, such as evolutionary algorithms in nonlinear cases and operations
research (OR) methods in – much rarer – linear cases. It can also be supported by application
experts who take the results from the data mining process and use them to draw conclusions
regarding process improvement.
One good example is the decision trees learned from data, which application experts
can understand, reconcile with their own expert knowledge, and then implement in an
appropriate manner. Here too, the application is used for optimizing purposes, admittedly with
an intermediate human step.
Within this context, another important aspect is the fact that multiple criteria required for the
relevant application often need to be optimized at the same time, meaning that multi-criteria
optimization methods – or, more generally, multi-criteria decision-making support methods –
are necessary. These methods can then be used in order to find the best possible compromises
between conflicting goals. The examples mentioned include the frequently occurring conflicts
between cost and quality, risk and profit, and, in a more technical example, between the weight
and passive occupant safety of a body.
Figure 4.1: The four levels of data analysis usage within a company
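As a small illustration of the multi-criteria idea discussed above, the sketch below filters a set of candidate designs down to their Pareto-optimal compromises between two conflicting goals; the criteria labels (cost and weight) and the numbers are invented for illustration.

```python
# Pareto filter for two minimization criteria (illustrative values only).
from typing import List, Tuple

def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the points not dominated by any other point (both criteria minimized)."""
    front = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)
        if not dominated:
            front.append(p)
    return front

# Hypothetical candidate designs: (cost, weight)
candidates = [(10.0, 8.0), (12.0, 5.0), (13.0, 8.5), (15.0, 4.0), (12.5, 6.0)]
print(pareto_front(candidates))   # the best available compromises between the two goals
```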
These four levels form a framework, within which it is possible to categorize data
analysis competence and potential benefits for a company in general. This framework is depicted
in Figure 1 and shows the four layers which build upon each other, together with the respective
technology category required for implementation. The traditional Cross-Industry Standard
Process for Data Mining (CRISP-DM) includes no optimization or decision-making support
whatsoever. Instead, based on the business understanding, data understanding, data preparation,
modeling, and evaluation sub-steps, CRISP proceeds directly to the deployment of results in
business processes. Here too, we propose an additional optimization step that in turn comprises
multi-criteria optimization and decision-making support. This approach is depicted
schematically in Figure 4.2.
It is important to note that the original CRISP model deals with a largely iterative
approach used by data scientists to analyze data manually, which is reflected in the iterations
between business understanding and data understanding as well as data preparation and
modeling. However, evaluating the modeling results with the relevant application experts in the
evaluation step can also result in having to start the process all over again from the business
understanding sub-step, making it necessary to go through all the sub-steps again partially or
completely (e.g., if additional data needs to be incorporated).
The manual, iterative procedure is also due to the fact that the basic idea behind this
approach – as up-to-date as it may be for the majority of applications – is now almost 20 years
old and certainly only partially compatible with a big data strategy.
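A compact way to read the extended process is as a pipeline whose final step optimizes over the trained model rather than deploying its raw predictions directly. The sketch below is purely schematic: the synthetic data, the random forest model, and the grid-search objective are our assumptions, not something prescribed by the text.

```python
# Schematic CRISP-DM pipeline with an added optimization step (assumed data and objective).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Data understanding / preparation: two process parameters and a measured quality value.
X = rng.uniform(0.0, 1.0, size=(500, 2))          # e.g. temperature and pressure (scaled)
y = (X[:, 0] - 0.3) ** 2 + (X[:, 1] - 0.7) ** 2    # synthetic "deviation from target"

# Modeling: learn a forecast model for the quality deviation.
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Evaluation + optimization: search the parameter space for the setting the model
# predicts to be best, instead of deploying the model's raw predictions directly.
grid = np.array([(a, b) for a in np.linspace(0, 1, 21) for b in np.linspace(0, 1, 21)])
predicted = model.predict(grid)
best = grid[np.argmin(predicted)]
print("Proposed process setting:", best)           # decision-making support for the expert
```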
The fact is that, in addition to the use of nonlinear modeling methods (in contrast to the
usual generalized linear models derived from statistical modeling) and knowledge extraction
from data, data mining rests on the fundamental idea that models can be derived from data with
the help of algorithms and that this modeling process can run automatically for the most part –
because the algorithm “does the work.” In applications where a large number of models need to
be created, for example for use in making forecasts (e.g., sales forecasts for individual vehicle
models and markets based on historical data), automatic modeling plays an important role.
The same applies to the use of online data mining, in which, for example, forecast
models (e.g., for forecasting product quality) are not only constantly used for a production
process, but also adapted (i.e., retrained) continuously whenever individual process aspects
change (e.g., when a new raw material batch is used).
This type of application requires the technical ability to automatically generate data, and
integrate and process it in such a way that data mining algorithms can be applied to it. In
addition, automatic modeling and automatic optimization are necessary in order to update
models and use them as a basis for generating optimal proposed actions in online applications.
These actions can then be communicated to the process expert as a suggestion or – especially in
the case of continuous production processes – be used directly to control the respective process.
If sensor systems are also integrated directly into the production process – to collect data
in real time – this results in a self-learning cyber-physical system that facilitates
implementation of the Industry 4.0 vision in the field of production engineering.
This approach is depicted schematically in Figure 3. Data from the system is acquired
with the help of sensors and integrated into the data management system. Using this as a basis,
forecast models for the system's relevant outputs (quality, deviation from target value, process
variance, etc.) are used continuously in order to forecast the system's output.
Other machine learning options can be used within this context in order, for example, to
predict maintenance results (predictive maintenance) or to identify anomalies in the process. The
corresponding models are monitored continuously and, if necessary, automatically retrained if
any process drift is observed. Finally, the multi-criteria optimization uses the models to
continuously compute optimal proposed actions for controlling the process. A cyber-physical
system is a system “in which information and software components are connected
to mechanical and electronic components and in which data is transferred and exchanged, and
monitoring and control tasks are carried out, in real time, using infrastructures such as the
Internet” (translation of the corresponding entry in the Gabler Wirtschaftslexikon, Springer). Industry
4.0 is defined therein as “a marketing term that is also used in science communication and refers
to a ‘future project’ of the German federal government.”
In order to differentiate it from “traditional” data mining, the term “big data” is
frequently defined now with three (sometimes even four or five) essential characteristics:
volume, velocity, and variety, which refer to the large volume of data, the speed at which data is
generated, and the heterogeneity of the data to be analyzed, which can no longer be categorized
into the conventional relational database schema. Veracity, i.e., the fact that large uncertainties
may also be hidden in the data (e.g., measurement inaccuracies), and finally value, i.e., the value
that the data and its analysis represents for a company's business processes, are often cited as
additional characteristics.
So it is not just the pure data volume that distinguishes previous data analytics methods
from big data, but also other technical factors that require the use of new methods – such as
Hadoop and MapReduce – with appropriately adapted data analysis algorithms in order to allow
the data to be saved and processed. In addition, so-called “in-memory databases” now also make
it possible to apply traditional learning and modeling algorithms in main memory to large data
volumes.
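For readers unfamiliar with the MapReduce idea mentioned above, the pattern can be shown in a few lines of plain Python. The real Hadoop framework distributes the same three phases across a cluster; this single-machine toy (with invented input records) only illustrates the programming model.

```python
# Map/shuffle/reduce on one machine, illustrating the programming model only.
from collections import defaultdict

records = ["sedan", "suv", "sedan", "truck", "suv", "sedan"]   # assumed input records

# Map: emit (key, 1) pairs.
mapped = [(r, 1) for r in records]

# Shuffle: group values by key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: aggregate each group independently (this is the part that parallelizes well).
counts = {key: sum(values) for key, values in groups.items()}
print(counts)   # {'sedan': 3, 'suv': 2, 'truck': 1}
```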
This means that if one were to establish a hierarchy of data analysis and modeling
methods and techniques, then, in very simplistic terms, statistics would be a subset of data
mining, which in turn would be a subset of big data. Not every application requires the use of
data mining or big data technologies. However, a clear trend can be observed, which indicates
that the necessities and possibilities involved in the use of data mining and big data are growing
at a very rapid pace as increasingly large data volumes are being collected and linked across all
processes and departments of a company. Nevertheless, conventional hardware architecture with
additional main memory is often more than sufficient for analyzing large data volumes in the
gigabyte range.
An early definition of artificial intelligence from the IEEE Neural Networks Council was
“the study of how to make computers do things at which, at the moment, people are better.”
Although this still applies, current research is also focused on improving the way that software
does things at which computers have always been better, such as analyzing large amounts of
data. Data is also the basis for developing artificially intelligent software systems not only to
collect information, but also to:
• Learn
• Behave adaptively
• Plan
• Make inferences
• Solve problems
• Think abstractly
At the most general level, Machine Learning (ML) algorithms can be subdivided into two
categories: supervised and unsupervised, depending on whether or not the respective algorithm
requires a target variable to be specified.
Apart from the input variables (predictors), supervised learning algorithms also require
the known target values (labels) for a problem. In order to train an ML model to identify traffic
signs using cameras, images of traffic signs – preferably with a variety of configurations – are
required as input variables. In this case, light conditions, angles, soiling, etc. are compiled as
noise or blurring in the data; nonetheless, it must be possible to recognize a traffic sign in rainy
conditions with the same accuracy as when the sun is shining. The labels, i.e., the correct
designations, for such data are normally assigned manually. This correct set of input variables and their correct
classification constitute a training data set. Although we only have one image per training data
set in this case, we still speak of multiple input variables, since ML algorithms find relevant
features in training data and learn how these features and the class assignment for the
classification task indicated in the example are associated.
Supervised learning is used primarily to predict numerical values (regression) and for
classification purposes (predicting the appropriate class), and the corresponding data is not
limited to a specific format – ML algorithms are more than capable of processing images, audio
files, videos, numerical data, and text. Classification examples include object recognition (traffic
signs, objects in front of a vehicle, etc.), face recognition, credit risk assessment, voice
recognition, and customer churn, to name but a few.
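A minimal supervised-learning example in Python, using scikit-learn and its bundled digits data set as a stand-in for labelled images such as traffic signs; the data set and model choice are our own assumptions for illustration.

```python
# Supervised classification: learn from labelled examples, then predict labels for new data.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_digits(return_X_y=True)                 # images as feature vectors + known labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```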
Unsupervised learning algorithms do not focus on individual target variables, but instead
have the goal of characterizing a data set in general. Unsupervised ML algorithms are often used
to group (cluster) data sets, i.e., to identify relationships between individual data points (that can
consist of any number of attributes) and group them into clusters. In certain cases, the output
from unsupervised ML algorithms can in turn be used as an input for supervised methods.
Examples of unsupervised learning include forming customer groups based on their buying
behavior or demographic data, or clustering time series in order to group millions of time series
from sensors into groups that were previously not obvious.
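And the unsupervised counterpart: grouping unlabelled data points into clusters without a target variable. The synthetic "customer" attributes below are purely illustrative.

```python
# Unsupervised learning: k-means clustering of unlabelled data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic customer groups described by two attributes (e.g. age, annual spend).
data = np.vstack([
    rng.normal(loc=[30, 2000], scale=[5, 300], size=(100, 2)),
    rng.normal(loc=[55, 6000], scale=[5, 500], size=(100, 2)),
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
print(kmeans.cluster_centers_)        # one centre per discovered group
```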
In other words, machine learning is the area of artificial intelligence (AI) that enables
computers to learn without being programmed explicitly. Machine learning focuses on
developing programs that grow and change by themselves as soon as new data is provided.
Accordingly, processes that can be represented in a flowchart are not suitable candidates for
machine learning – in contrast, everything that requires dynamic and changing solution strategies
and cannot be constrained to static rules is potentially suitable for solution with ML. ML is used,
for example, when no explicit rules can be written down in advance for a task.
Even though ML is used in certain data mining applications, and both look for patterns in
data, ML and data mining are not the same thing. Instead of extracting data that people can
understand, as is the case with data mining, ML methods are used by programs to improve their
own understanding of the data provided. Software that implements ML methods recognizes
patterns in data and can dynamically adjust the behavior based on them.
If, for example, a self-driving car (or the software that interprets the visual signal from
the corresponding camera) has been trained to initiate a braking maneuver if a pedestrian appears
in front of it, this must work with all pedestrians regardless of whether they are short, tall, fat, thin,
clothed, coming from the left, coming from the right, etc. In turn, the vehicle must not brake if
there is a stationary garbage bin on the side of the road.
The level of complexity in the real world is often greater than the level of complexity of
an ML model, which is why, in most cases, an attempt is made to subdivide problems into sub
problems and then apply ML models to these sub problems. The output from these models is
then integrated in order to permit complex tasks, such as autonomous vehicle operation, in
structured and unstructured environments.
4.5 Computer Vision
Computer vision (CV) is a very wide field of research that merges scientific theories from
various fields (as is often the case with AI), starting from biology, neuroscience, and psychology
and extending all the way to computer science, mathematics, and physics. First, it is important to
know how an image is produced physically. Before light hits sensors in a two-dimensional array,
it is refracted, absorbed, scattered, or reflected, and an image is produced by measuring the
intensity of the light beams through each element in the image (pixel). CV has three primary
focus areas, all of which overlap and influence each other. If, for example, the focus in an
application is on obstacle recognition in order to initiate an automated braking maneuver in the
event of a pedestrian appearing in front of the vehicle, the most important thing is to identify the
pedestrian as an obstacle. Interpreting the entire scene – e.g., understanding that the vehicle is
moving towards a family having a picnic in a field – is not necessary in this case.
Vision in biological organisms is regarded as an active process that includes controlling
the sensor and is tightly linked to successful performance of an action. Consequently, CV
systems are not passive either; they, too, must actively control what they sense and tie it to the
action to be performed.
Having said that, the goal of CV systems is not to understand scenes in images – first and
foremost, the systems must extract the relevant information for a specific task from the scene.
This means that they must identify a “region of interest” that will be used for processing.
Moreover, these systems must feature short response times, since it is probable that scenes will
change over time and that a heavily delayed action will not achieve the desired effect. Many
different methods have been proposed for object recognition purposes (“what” is located
“where” in a scene), including:
Object detectors, in which case a window moves over the image and a filter response is
determined for each position by comparing a template and the sub-image (window content), with
each new object parameterization requiring a separate scan. More sophisticated algorithms
simultaneously make calculations based on various scales and apply filters that have been
learned from a large number of images.
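The window-based detector described above can be illustrated with a tiny template-matching sketch on a NumPy "image". Real detectors use learned filters and multiple scales; this toy (with an invented 8x8 scene) only shows the sliding-window filter-response idea.

```python
# Sliding-window template matching (toy example of the object-detector idea).
import numpy as np

def best_match(image: np.ndarray, template: np.ndarray) -> tuple:
    """Slide the template over the image and return the position with the smallest difference."""
    th, tw = template.shape
    best_pos, best_score = None, float("inf")
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            window = image[r:r + th, c:c + tw]
            score = np.sum((window - template) ** 2)   # filter response: sum of squared differences
            if score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos

image = np.zeros((8, 8))
image[3:5, 4:6] = 1.0                  # a 2x2 "object" placed in the scene
template = np.ones((2, 2))
print(best_match(image, template))     # (3, 4)
```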
Alignment-based methods use parametric object models that are trained on data.
Algorithms search for parameters, such as scaling, translation, or rotation, that adapt a model
optimally to the corresponding features in the image, whereby an approximated solution can be
found by means of a reciprocal process, i.e., by features, such as contours, corners, or others,
“selecting” characteristic points in the image for parameter solutions that are compatible with the
found feature.
With object recognition, it is necessary to decide whether algorithms need to process 2-D
or 3-D representations of objects – 2-D representations are very frequently a good compromise
between accuracy and availability. Current research (deep learning) shows that even distances
between two points based on two 2-D images captured from different points can be accurately
determined as an input. In daylight conditions and with reasonably good visibility, this input can
be used in addition to data acquired with laser and radar equipment in order to increase accuracy
– moreover, a single camera is sufficient to generate the required data.
This allows a large variety of different basic shapes to be described with a small set of
parameters. If 3-D images are acquired using stereo cameras, statistical methods (such as
generating a stereo point cloud) are used instead of the aforementioned shape-based methods,
because the data quality achieved with stereo cameras is poorer than that achieved with laser
scans. Other research directions include tracking, contextual scene understanding, and
monitoring, although these aspects are currently of secondary importance to the automotive
industry.
Making inferences is the area of knowledge representation and reasoning (KRR) in which
data-based answers need to be found without human intervention or assistance, and for which
data is normally presented in a formal system with distinct and clear semantics. Since 1980, it
has been assumed that the data involved
is a mixture of simple and complex structures, with the former having a low degree of
computational complexity and forming the basis for research involving large databases.
The latter are presented in a language with more expressive power, which requires less
space for representation, and they correspond to generalizations and fine-grained information.
Mathematical logic is the formal basis for many applications in the real world, including
calculation theory, our legal system and corresponding arguments, and theoretical developments
and evidence in the field of research and development. The initial vision was to represent every
type of knowledge in the form of logic and use universal algorithms to make inferences from it,
but a number of challenges arose – for example, not all types of knowledge can be represented
simply.
Moreover, compiling the knowledge required for complex applications can become very
complex, and it is not easy to learn this type of knowledge in a logical, highly expressive
language. In addition, it is not easy to make inferences with the required highly expressive
language – in extreme cases, such scenarios cannot be implemented computationally, even if the
first two challenges are overcome. Currently, there are three ongoing debates on this subject,
with the first one focusing on the argument that logic is unable to represent many concepts, such
as space, analogy, shape, uncertainty, etc., and consequently cannot be included as an active part
in developing AI to a human level.
The counterargument states that logic is simply one of many tools. At present, the
combination of representative expressiveness, flexibility, and clarity cannot be achieved with any
other method or system. The second debate revolves around the argument that logic is too slow
for making inferences and will therefore never play a role in a productive system. The
counterargument here is that ways exist to approximate the inference process with logic, so
processing is drawing close to remaining within the required time limits, and progress is being
made with regard to logical inference.
Finally, the third debate revolves around the argument that it is extremely difficult, or
even impossible, to develop systems based on logical axioms into applications for the real world.
The counterarguments in this debate are primarily based on the research of individuals currently
researching techniques for learning logical axioms from natural-language texts.
In principle, a distinction is made between four different types of logic which are not
discussed any further in this article:
Propositional logic
First-order predicate logic
Modal logic
Non-monotonic logic
Several questions characterize a planning or decision-making domain:
• Is the domain dynamic to the extent that a sequence of decisions is required, or static in the sense that a single decision or multiple simultaneous decisions need to be made?
• Is the domain deterministic, non-deterministic, or stochastic?
• Is the objective to optimize benefits or to achieve a goal?
• Is the domain known to its full extent at all times, or is it only partially known?
In general, planning problems consist of an initial (known) situation, a defined goal, and
a set of permitted actions or transitions between steps. The result of a planning process is a
sequence or set of actions that, when executed correctly, change the executing entity from an
initial state to a state that meets the target conditions. Computationally speaking, planning is a
difficult problem, even if simple problem specification languages are used. Even when relatively
simple problems are involved, the search for a plan cannot run through all state-space
representations, as these are exponentially large in the number of states that define the domains.
Consequently, the aim is to develop efficient algorithms that represent sub-representations in
order to search through these with the hope of achieving the relevant goal.
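The description above (initial state, goal, permitted actions, search through a state space) maps directly onto a breadth-first search planner. The toy domain below, a counter that can be incremented or doubled, is our own illustrative assumption.

```python
# Breadth-first search planning: find a sequence of actions from an initial state to a goal.
from collections import deque

ACTIONS = {
    "increment": lambda s: s + 1,
    "double":    lambda s: s * 2,
}

def plan(initial: int, goal: int, limit: int = 100) -> list:
    """Return a list of action names transforming `initial` into `goal`, or [] if none is found."""
    frontier = deque([(initial, [])])
    visited = {initial}
    while frontier:
        state, actions = frontier.popleft()
        if state == goal:
            return actions
        for name, apply_action in ACTIONS.items():
            nxt = apply_action(state)
            if nxt not in visited and nxt <= limit:
                visited.add(nxt)
                frontier.append((nxt, actions + [name]))
    return []

print(plan(initial=1, goal=10))   # e.g. ['increment', 'double', 'increment', 'double']
```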
Current research is focused on developing new search methods and new representations
for actions and states, which will make planning easier. Particularly when one or more agents
acting against each other are taken into account, it is crucial to find a balance between learning
and decision-making – exploration for the sake of learning while decisions are being made can
lead to undesirable results.
Many problems in the real world are problems with dynamics of a stochastic nature.
One example of this is buying a vehicle with features that affect its value, of which we are
unaware. These dependencies influence the buying decision, so it is necessary to allow risks and
uncertainties to be considered. For all intents and purposes, stochastic domains are more
challenging when it comes to making decisions, but they are also more flexible than
deterministic domains with regard to approximations – in other words, simplifying practical
assumptions makes automated decision-making possible in practice. A great number of problem
formulations exist, which can be used to represent various aspects and decision-making
processes in stochastic domains, with the best-known being decision networks and Markov
decision processes.
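A Markov decision process can be solved by value iteration; the tiny two-state, two-action example below (transition probabilities, rewards, and discount factor are invented) shows the idea of repeatedly backing up state values until they stabilize.

```python
# Value iteration for a toy Markov decision process (invented transitions and rewards).
import numpy as np

# P[action][state] = list of (probability, next_state, reward)
P = {
    "wait": {0: [(1.0, 0, 0.0)],                 1: [(1.0, 1, 1.0)]},
    "move": {0: [(0.8, 1, 0.0), (0.2, 0, 0.0)],  1: [(0.9, 1, 1.0), (0.1, 0, 0.0)]},
}
GAMMA = 0.9
values = np.zeros(2)

for _ in range(100):                                  # iterate until values stabilize
    new_values = np.zeros_like(values)
    for s in range(2):
        q = {a: sum(p * (r + GAMMA * values[s2]) for p, s2, r in P[a][s]) for a in P}
        new_values[s] = max(q.values())               # act greedily with respect to current values
    values = new_values

print(values)    # long-run value of starting in each state under the optimal policy
```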
Natural language processing (NLP) covers a wide range of tasks, including:
• Part-of-speech tagging
• Automatic summarization
• Named-entity recognition
• Parsing
• Voice recognition
• Sentiment analysis
• Co-reference resolution
• Discourse analysis
• Machine translation
• Morphological segmentation
• Answers to questions
• Relationship extraction
• Sentence splitting
The core vision of AI says that a version of first-order predicate logic (“first-order
predicate calculus” or “FOPC”) supported by the necessary mechanisms for the respective
problem is sufficient for representing language and knowledge. This thesis says that logic can
and should supply the semantics underlying natural language.
Although attempts to use a form of logical semantics as the key to representing contents
have made progress in the field of AI and linguistics, they have had little success with regard to a
program that can translate English into formal logic.
To date, the field of psychology has also failed to provide proof that this type of
translation into logic corresponds to the way in which people store and manipulate “meaning.”
Consequently, the ability to translate a language into FOPC continues to be an elusive goal.
Without a doubt, there are NLP applications that need to establish logical inferences between
sentence representations, but if these are only one part of an application, it is not clear that they
have anything to do with the underlying meaning of the corresponding natural language (and
consequently with CL/NLP), since the original task for logical structures was inference. These
and other considerations have crystallized into three different positions:
• Position 1: Logical inferences are tightly linked to the meaning of sentences, because
knowing their meaning is equivalent to deriving inferences and logic is the best way to do this.
• Position 3: In general, the predicates of logic and formal systems only appear to be
different from human language, but their terms are in actuality the words as which they appear.
The introduction of statistical and AI methods into the field is the latest trend within this
context. The general strategy is to learn how language is processed – ideally in the way that
humans do this, although this is not a basic prerequisite. In terms of ML, this means learning
based on extremely large corpora that have been translated manually by humans. This often
means that it is necessary to learn (algorithmically) how annotations are assigned or how part-of-
speech categories (the classification of words and punctuation marks in a text into word types) or
semantic markers or primes are added to corpora, all based on corpora that have been prepared
by humans (and are therefore correct).
In the case of supervised learning, and with reference to ML, it is possible to learn
potential associations of part-of-speech tags with words that have been annotated by humans in
the text, so that the algorithms are also able to annotate new, previously unknown texts. This
works the same way for lightly supervised and unsupervised learning, such as when no
annotations have been made by humans and the only data presented is a text in a language with
texts with identical contents in other languages or when relevant clusters are found in thesaurus
data without there being a defined goal.
With regard to AI and language, information retrieval (IR) and information extraction
(IE) play a major role and correlate very strongly with each other. One of the main tasks of IR is
grouping texts based on their content, whereas IE extracts similarly factual elements from texts
or is used to be able to answer questions concerning text contents. These fields therefore
correlate very strongly with each other, since individual sentences (not only long texts) can also
be regarded as documents.
These methods are used, for example, in interactions between users and systems, such as
when a driver asks the on-board computer a question regarding the owner's manual during a
journey – once the language input has been converted into text, the question's semantic content is
used as the basis for finding the answer in the manual, and then for extracting the answer and
returning it to the driver.
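The manual-lookup scenario just described is, at its core, an information-retrieval step: score the manual's passages against the question and return the best match. The minimal TF-IDF sketch below uses invented example sentences as stand-ins for owner's-manual text.

```python
# TF-IDF retrieval: find the manual passage most similar to the driver's question.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

manual_passages = [                                   # invented stand-ins for owner's manual text
    "To change a flat tyre, use the jack stored under the boot floor.",
    "Tyre pressure should be checked monthly and before long journeys.",
    "The adaptive cruise control is activated with the steering wheel button.",
]
question = "How do I turn on adaptive cruise control?"

vectorizer = TfidfVectorizer()
passage_vectors = vectorizer.fit_transform(manual_passages)
question_vector = vectorizer.transform([question])

scores = cosine_similarity(question_vector, passage_vectors)[0]
print(manual_passages[scores.argmax()])               # passage returned to the driver
```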
In traditional AI, people focused primarily on individual, isolated software systems that
acted relatively inflexibly to predefined rules. However, new technologies and applications have
established a need for artificial entities that are more flexible, adaptive, and autonomous, and that
act as social units in multi-agent systems.
In traditional AI (see also the "physical symbol system hypothesis" that has been
embedded into so-called “deliberative” systems), an action theory that establishes how systems
make decisions and act is represented logically in individual systems that must execute
actions. Based on these rules, the system must prove a theorem – the prerequisite here being that
the system must receive a description of the world in which it currently finds itself, the desired
target state, and a set of actions, together with the prerequisites for executing these actions and a
list of the results for each action.
It turned out that the computational complexity involved rendered any system with time
limits useless even when dealing with simple problems, which had an enormous impact on
symbolic AI, resulting in the development of reactive architectures.
These architectures follow if-then rules that translate inputs directly into tasks. Such
systems are extremely simple, although they can solve very complex tasks. The problem is that
such systems learn procedures rather than declarative knowledge, i.e., they learn attributes that
cannot easily be generalized for similar situations.
Many attempts have been made to combine deliberative and reactive systems, but it
appears that it is necessary to focus either on impractical deliberative systems or on very loosely
developed reactive systems – focusing on both is not optimal.
• Autonomous behavior
“Autonomy” describes the ability of systems to make their own decisions and execute
tasks on behalf of the system designer. The goal is to allow systems to act autonomously in
scenarios where controlling them directly is difficult. Traditional software systems execute
methods after these methods have been called, i.e., they have no choice, whereas agents make
decisions based on their beliefs, desires, and intentions (BDI).
• Adaptive behavior
Since it is impossible to predict all the situations that agents will encounter, these agents
must be able to act flexibly. They must be able to learn from and about their environment and
adapt accordingly. This task is all the more difficult if not only nature is a source of uncertainty,
but the agent is also part of a multi-agent system. Only environments that are not static and self-
contained allow for an effective use of BDI agents – for example, reinforcement learning can be
used to compensate for a lack of knowledge of the world.
Within this context, agents are located in an environment that is described by a set of
possible states. Every time an agent executes an action, it is “rewarded” with a numerical value
that expresses how good or bad the action was. This results in a series of states, actions, and
rewards, and the agent is compelled to determine a course of action that entails maximization of
the reward.
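The reward-driven loop sketched in the preceding paragraph is what Q-learning formalizes: after each (state, action, reward, next state) step, the agent nudges its estimate of the action's value toward the observed outcome. The update rule below is standard; the two-action setup, learning rate, and discount factor are assumed for illustration.

```python
# One Q-learning update: move the value estimate of (state, action) toward reward + future value.
from collections import defaultdict

Q = defaultdict(float)        # value estimates for (state, action) pairs
ALPHA, GAMMA = 0.1, 0.9       # learning rate and discount factor (assumed)
ACTIONS = ["left", "right"]

def q_update(state, action, reward, next_state):
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Example step: in state 0 the agent moved right, received reward 1, and ended up in state 1.
q_update(state=0, action="right", reward=1.0, next_state=1)
print(Q[(0, "right")])        # 0.1
```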
• Social behavior
In an environment where various entities act, it is necessary for agents to recognize their
adversaries and form groups if this is required by a common goal. Agent-oriented systems are
used for personalizing user interfaces, as middleware, and in competitions such as the Robo Cup.
In a scenario where there are only self-driving cars on roads, the individual agent’s autonomy is
not the only indispensable component – car2car communications, i.e., the exchange of
information between vehicles and acting as a group on this basis, are just as important.
Coordination between the agents results in an optimized flow of traffic, rendering traffic jams and accidents
virtually impossible (see also section 5.1, “Vehicles as autonomous, adaptive, and social agents
& cities as super-agents”). In summary, this agent-oriented approach is accepted within the AI
community as the direction of the future.
Multi-agent behavior
Various approaches are being pursued for implementing multi-agent behavior,
with the primary difference being in the degree of control that designers have over
individual agents. A distinction is made here between distributed problem solving (DPS)
systems and multi-agent systems (MAS):
DPS systems allow the designer to control each individual agent in the domain, with the
solution to the task being distributed among multiple agents. In contrast, MAS systems have
multiple designers, each of whom can only influence their own agents with no access to the
design of any other agent. In this case, the design of the interaction protocols is extremely
important. In DPS systems, agents jointly attempt to achieve a goal or solve a problem, whereas,
in MAS systems, each agent is individually motivated and wants to achieve its own goal and
maximize its own benefit.
The goal of DPS research is to find collaboration strategies for problem-solving, while
minimizing the level of communication required for this purpose. Meanwhile, MAS research is
looking at coordinated interaction, i.e., how autonomous agents can be brought to find a common
basis for communication and undertake consistent actions.25 Ideally, a world in which only self-
driving cars use the road would be a DPS world. However, the current competition between
OEMs means that a MAS world will come into being first. In other words, communication and
negotiation between agents will take center stage (see also Nash equilibrium).
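The negotiation aspect and the Nash equilibrium mentioned above can be made concrete with a
toy two-agent "merge" game; the payoff values below are hypothetical and chosen only to show
how an equilibrium is checked.

from itertools import product

# Payoffs for two self-interested vehicles approaching a merge: each either goes or yields.
payoffs = {
    ("go", "go"): (-10, -10),     # both insist: near-collision, heavy delay for both
    ("go", "yield"): (3, 1),
    ("yield", "go"): (1, 3),
    ("yield", "yield"): (0, 0),
}

def is_nash(a, b):
    # A pair of actions is a Nash equilibrium if neither agent gains by deviating alone.
    pa, pb = payoffs[(a, b)]
    best_a = all(payoffs[(alt, b)][0] <= pa for alt in ("go", "yield"))
    best_b = all(payoffs[(a, alt)][1] <= pb for alt in ("go", "yield"))
    return best_a and best_b

print([cell for cell in product(("go", "yield"), repeat=2) if is_nash(*cell)])
# prints [('go', 'yield'), ('yield', 'go')] - one vehicle yields in each stable outcome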
Multi-agent learning
Multi-agent learning (MAL) has only relatively recently begun to receive a significant degree
of attention.26,27,28,29 The key problems in this area include determining which techniques
should be used and what exactly “multi-agent learning” means. Current ML approaches were
developed in order to train individual agents, whereas MAL focuses first and foremost on
distributed learning. “Distributed” does not necessarily mean that a neural network is used, in
which many identical operations run during training and can accordingly be parallelized, but
instead that:
• A problem is split into subproblems and individual agents learn these
subproblems in order to solve the main problem using their combined knowledge, or
• Many agents try to solve the same problem independently of each other by
competing with one another (both patterns are sketched below).
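Both patterns can be sketched in a few lines of Python. The data, the placeholder "learning" step
(a simple mean estimate), and the selection criterion for the competing agents are all hypothetical
and serve only to illustrate the two arrangements.

import random

data = [random.gauss(50, 10) for _ in range(1000)]   # a shared, hypothetical learning problem

# Pattern 1: decomposition - each agent learns one sub-problem, the results are combined.
def learn_subproblem(chunk):
    return sum(chunk) / len(chunk)                   # placeholder "learning" step

chunks = [data[i::4] for i in range(4)]              # split the problem among four agents
combined = sum(learn_subproblem(c) for c in chunks) / len(chunks)
print("combined estimate:", round(combined, 2))

# Pattern 2: competition - several agents attack the same problem independently.
def competing_agent(seed):
    rng = random.Random(seed)
    return learn_subproblem(rng.sample(data, 100))   # each agent works from its own sample

estimates = [competing_agent(seed) for seed in range(8)]
best = min(estimates, key=lambda e: abs(e - 50))     # hypothetical selection criterion
print("best competing estimate:", round(best, 2))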
4.9 Distributed SQL Processing
SQL is particularly useful in handling structured data, i.e., data incorporating relations among
entities and variables. It offers two main advantages over older read/write APIs such as ISAM
or VSAM. First, it introduced the concept of accessing many records with one single command;
and second, it eliminates the need to specify how to reach a record, e.g., with or without an index.
Originally based upon relational algebra and tuple relational calculus, SQL consists of many
types of statements, which may be informally classed as sublanguages, commonly: a data query
language (DQL), a data definition language (DDL), a data control language (DCL), and a data
manipulation language (DML). The scope of SQL includes data query, data manipulation (insert,
update, and delete), data definition (schema creation and modification), and data access control.
Although SQL is often described as, and to a great extent is, a declarative language (4GL), it also
includes procedural elements.
SQL was one of the first commercial languages for Edgar F. Codd's relational model. The
model was described in his influential 1970 paper, "A Relational Model of Data for Large Shared
Data Banks". Despite not entirely adhering to the relational model as described by Codd, it
became the most widely used database language.
SQL became a standard of the American National Standards Institute (ANSI) in 1986, and
of the International Organization for Standardization (ISO) in 1987. Since then, the standard has
been revised to include a larger set of features. Despite the existence of such standards, most
SQL code is not completely portable among different database systems without adjustments.
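The sublanguages can be demonstrated with a minimal sketch using Python's built-in sqlite3
module; the cars table and its columns are invented for the example, and DCL statements such as
GRANT are omitted because SQLite does not support them.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define the schema.
cur.execute("CREATE TABLE cars (model TEXT, cc INTEGER, price INTEGER)")

# DML: insert and update rows.
cur.executemany("INSERT INTO cars VALUES (?, ?, ?)",
                [("XUV300", 1197, 790000), ("Scorpio", 2179, 1000000)])
cur.execute("UPDATE cars SET price = price - 50000 WHERE model = 'Scorpio'")

# DQL: one declarative statement fetches many records at once, without
# specifying how the rows are physically reached.
for row in cur.execute("SELECT model, price FROM cars WHERE cc > 1000 ORDER BY price"):
    print(row)

conn.close()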
History
SQL was initially developed at IBM by Donald D. Chamberlin and Raymond F. Boyce after
learning about the relational model from Ted Codd in the early 1970s. This version, initially
called SEQUEL (Structured English Query Language), was designed to manipulate and retrieve
data stored in IBM's original quasi-relational database management system, System R, which a
group at IBM San Jose Research Laboratory had developed during the 1970s.
Chamberlin and Boyce's first attempt at a relational database language was SQUARE, but
it was difficult to use due to subscript notation. After moving to the San Jose Research
Laboratory in 1973, they began work on SEQUEL. The acronym SEQUEL was later changed to
SQL because "SEQUEL" was a trademark of the UK-based Hawker Siddeley Dynamics
Engineering Limited company.
Design
SQL deviates in several ways from its theoretical foundation, the relational model and its tuple
calculus. In that model, a table is a set of tuples, while in SQL, tables and query results are lists
of rows: the same row may occur multiple times, and the order of rows can be employed in
queries (e.g., in the LIMIT clause).
Critics argue that SQL should be replaced with a language that returns strictly to the
original foundation: see, for example, The Third Manifesto. However, no known proof exists that
such uniqueness cannot be added to SQL itself, or at least to a variation of SQL. In other words,
it is quite possible that SQL can be "fixed" or at least improved in this regard, so that the industry
would not have to switch to a completely different query language to obtain uniqueness. The
debate on this remains open.
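The list-of-rows behaviour described above can be observed directly. The sketch below, again
using sqlite3 and an invented colours table, shows duplicates being preserved unless DISTINCT
is requested, and row order being exploited with ORDER BY and LIMIT.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE colours (name TEXT)")
cur.executemany("INSERT INTO colours VALUES (?)", [("Red",), ("Red",), ("Blue",)])

print(cur.execute("SELECT name FROM colours").fetchall())           # duplicates are kept
print(cur.execute("SELECT DISTINCT name FROM colours").fetchall())  # set-like result
print(cur.execute("SELECT name FROM colours ORDER BY name LIMIT 1").fetchall())
conn.close()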
Interoperability and standardization
SQL implementations are incompatible between vendors and do not necessarily completely
follow standards. In particular, date and time syntax, string concatenation, NULLs, and
comparison case sensitivity vary from vendor to vendor. Particular exceptions are PostgreSQL
and Mimer SQL, which strive for standards compliance, though PostgreSQL does not adhere to
the standard in how the folding of unquoted names is done.
The folding of unquoted names to lower case in PostgreSQL is incompatible with the
standard, which says that unquoted names should be folded to upper case. Thus, according to the
standard, Foo should be equivalent to FOO, not foo.
Popular implementations of SQL commonly omit support for basic features of Standard
SQL, such as the DATE or TIME data types. The most obvious such examples, and incidentally
the most popular commercial and proprietary SQL DBMSs, are Oracle (whose DATE behaves as
DATETIME and lacks a TIME type) and MS SQL Server.
5.RESULT AND DISCUSSION
HONDA
HYUNDAI
MAHINDRA
S.No ModelName CC Kmpl Price FuelType Rank Color
1. Mahindra XUV300 1197 17 790000 Petrol,Diesel 3 Red
2. Mahindra XUV500 2179 13 127200 Petrol,Diesel 4 Red
3. Mahindra Scorpio 2179 11 100000 Diesel 5 White
4. Mahindra Thar 2498 16 672000 Diesel 4 Red
5. Mahindra Bolero 2523 15 769000 Diesel 5 Black
6. Mahindra Tuv300 1493 18 840000 Diesel 3 Red
7. Mahindra Marazzo 1497 17 999000 Diesel 4 Green
8. Mahindra KUV100NXT 1198 25 477000 Petrol,Diesel 4 Red
9. Mahindra Xylo 2489 14 942000 Diesel 5 White
10. Mahindra Tuv300plus 2179 18 978000 Diesel 3 Silver
MARUTI
S.No ModelName CC Kmpl Price FuelType Rank Color
4. Maruti Ertiga 1248 25 744000 Petrol,Diesel 3 Rose
5. Maruti Vitara Brezza 1248 24 767000 Diesel 4 Brown
6. Maruti Alto800 796 24 263000 Petrol,CNG 5 Green
7. Maruti Ciaz 1462 21 519000 Petrol,Diesel 4 Blue
8. Maruti Eeco 1196 15 337000 Petrol,CNG 4 Red
9. Maruti Alto k10 998 24 338000 Petrol,CNG 5 Black
10. Maruti Ignis 1197 20 479000 Petrol 5 Blue
TATA
S.No ModelName CC Kmpl Price FuelType Rank Color
5. Tata Zest 1193 17 564000 Petrol,Diesel 4 Blue
6. Tata TiagoNRG 1047 27 561000 Petrol,Diesel 5 Black
7. Tata Tiago JTP 1199 23 639000 Petrol 4 Red
8. Tata Tigor JTP 1199 20 749000 Petrol 5 White
9. Tata Bolt 1193 17 508000 Diesel,Petrol 3 Red
10. Tata Nexon 1497 21.5 653000 Diesel 4 Blue
TOYOTA
S.No ModelName CC Kmpl Price FuelType Rank Color
3. Toyota Platinum Etios 1496 16 690000 Diesel,Petrol 3 Red
4. Toyota Corolla Altis 1364 21 1645000 Diesel,Petrol 4 Black
5. Toyota Yaris 1496 17 929000 Petrol 5 Red
6. Toyota Camry 2487 19 3750000 Petrol 5 Black
7. Toyota Land Cruiser 4461 11 14700000 Diesel 5 White
8. Toyota Land Cruiser Prado 2982 11 9630000 Diesel 5 Black
9. Toyota Prius 1798 23 4509000 Petrol 4 Blue
10. Toyota Etios Liva 2982 15 778000 Petrol,Diesel 4 Red
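To connect the sample data above with the distributed SQL processing described in section 4.9,
the sketch below loads a handful of the rows shown above into an in-memory SQLite table and
ranks them with a plain SQL query; the column names roughly follow the table headers, with
Rank stored as rating to avoid the SQL keyword of the same name.

import sqlite3

# A few rows copied from the sample tables above.
rows = [
    ("Mahindra XUV300", 1197, 17, 790000, "Petrol,Diesel", 3, "Red"),
    ("Maruti Ertiga", 1248, 25, 744000, "Petrol,Diesel", 3, "Rose"),
    ("Tata Nexon", 1497, 21.5, 653000, "Diesel", 4, "Blue"),
    ("Toyota Yaris", 1496, 17, 929000, "Petrol", 5, "Red"),
]

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE cars
               (model TEXT, cc INTEGER, kmpl REAL, price INTEGER,
                fuel_type TEXT, rating INTEGER, color TEXT)""")
cur.executemany("INSERT INTO cars VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Example analysis: the most fuel-efficient models under a price threshold.
for model, kmpl, price in cur.execute(
        "SELECT model, kmpl, price FROM cars WHERE price < 800000 ORDER BY kmpl DESC"):
    print(f"{model}: {kmpl} km/l at Rs. {price}")
conn.close()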
6.CONCLUSION AND FUTURE ENHANCEMENT
6.1 Conclusion
There are a lot of different considerations for businesses wishing to take advantage of
data analytics, especially those with limited resources. A majority of SMEs have not yet
immersed themselves in the world of Big Data for various reasons, most of which stem from
not having the required knowledge or even the need for vast data collection. For SMEs which
are on the smallest end of the scale, using free web tools catered for less technical users means it
is not necessary for them to be specialised to begin with, as training staff in relevant courses
should be more affordable.
They also do not necessarily need to be proficient in IT and networking support as,
depending on the amount of Small Data being processed and stored, a single machine should be
sufficient. As there are many different types of data analysis to choose from, SMEs benefit much
more from Small Data as they can pick and choose which specific areas they are going to
analyse, as opposed to collecting a lot of information where not all of it is actionable or relevant
to their current needs/access to resources. However, if SMEs are trying to get the most reach
online, particularly in social media, they need to be prepared for the vast variety of data they will
be collecting, and be sufficiently trained or have the right software to analyze it.
This in itself can be a large restriction, but there are short and relatively cheap online
courses that are focused on specific areas which an SME may wish to invest in if they feel a
social media presence will have a great impact on their business.
Customer data integration, breaking down silos across the organization to create a single
integrated view of the customer, is the key to unlocking the richness and maturity of the dataset.
This will involve the aggregation of a range of internal and external sources including
CRM, dealer management systems, demographics, and sales and marketing databases, to name a
few. A critical next step in making customer data useful is to use it to create actionable and
meaningful customer segments that allow the development of a differentiated product offering and value
proposition for each segment at each stage of the lifecycle.
This could lead, for example, to the formulation of a new, innovative retail model such as
a pop-up store to drive awareness, and/or to more targeted campaigns of special service bundles
beyond the normal warranty to reduce customer leakage and aid retention.
REFERENCES
1. Bodkin, R. The big data Wild West: The good, the bad and the ugly. [Accessed
30.09.2013].
4. Brynjolfsson, E., Hitt, L., Kim, H. Strength in Numbers: How Does Data-Driven
Decision Making Affect Firm Performance? MIT Working Paper, April 2011.
5. Chen, H., Chiang, R. H., Storey, V. C. Business Intelligence and Analytics: From Big
Data to Big Impact. MIS Quarterly, 36 (4), 1165 – 1188.