Data Analytics
What is Analytics?
“Analytics is the extensive use of data, statistical and quantitative analysis,
explanatory and predictive models, and fact-based management to drive
decisions and actions.”
Analytics can be defined as “the analysis of data to draw hidden insights to
aid decision making”.
… and many more!
Frequently used terms
Data Analytics           Data Analysis            Big Data                 Data Types
Data Warehouse           Data Mining              Data Cleansing           Data Definition
Data Manipulation        Data Transformation      Data Wrangling           Databases
Data Sources             Data Forms               Raw and Processed Data   Data Collection
Statistics               Statistical Measures     Mathematics              Linear Algebra
Artificial Intelligence  Normalization            R / Python               Hadoop
Text Analytics           Algorithms               Predictions              Patterns
Supervised Learning      Unsupervised Learning    Clustering               etc.
                                                                  Definitions
Statistics is just about the numbers: quantifying the data. There are many tools for finding relevant
properties of the data, but this is pretty close to pure mathematics.
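To make "quantifying the data" concrete, here is a minimal Python sketch using the standard `statistics` module to compute a few common statistical measures. The visit counts are hypothetical, made up purely for illustration.

```python
import statistics

# Hypothetical sample: daily page visits over one week (illustrative numbers only)
visits = [120, 135, 128, 150, 142, 90, 98]

# Quantify the data with a few common statistical measures
summary = {
    "mean": statistics.mean(visits),
    "median": statistics.median(visits),
    "stdev": statistics.stdev(visits),  # sample standard deviation
    "range": max(visits) - min(visits),
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

Nothing here explains *why* the weekend numbers are lower; that is where the next step, data mining, comes in.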
Data Mining is about using statistics as well as other programming methods to find patterns hidden in the
data so that you can explain some phenomenon. Data Mining builds intuition about what is really happening
in some data; it still leans a little more towards math than programming, but uses both.
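A minimal sketch of one of the simplest pattern measures a data-mining step might compute: the Pearson correlation between two variables. The temperature and sales figures and the `pearson` helper are hypothetical. Real data mining goes well beyond a single coefficient, and a correlation does not by itself explain a phenomenon; it points at one worth explaining.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: daily temperature (°C) and ice-cream sales (units)
temperature = [18, 21, 24, 27, 30, 33]
sales = [110, 135, 160, 190, 205, 240]

r = pearson(temperature, sales)
print(f"correlation between temperature and sales: {r:.3f}")
```

A value close to +1 suggests a strong positive linear pattern; close to -1, a strong negative one; near 0, no linear pattern.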
Machine Learning uses Data Mining techniques and other learning algorithms to build models of what is
happening behind some data so that it can predict future outcomes. Math is the basis for many of the
algorithms, but this is more towards programming.
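A minimal sketch of "building a model from data to predict outcomes", assuming a nearest-centroid classifier, one of the simplest learning algorithms, chosen here only for brevity. The loan-applicant data and the `fit_centroids` and `predict` helpers are hypothetical.

```python
import math

def fit_centroids(points, labels):
    """Learn the model: the mean point (centroid) of each class."""
    sums, counts = {}, {}
    for (x, y), label in zip(points, labels):
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {label: (sx / counts[label], sy / counts[label])
            for label, (sx, sy) in sums.items()}

def predict(centroids, point):
    """Predict the class whose centroid is nearest to the new point."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))

# Hypothetical past applicants: (income, existing debt), both in thousands
points = [(20, 15), (25, 18), (22, 20), (60, 5), (75, 10), (68, 4)]
labels = ["default", "default", "default", "repaid", "repaid", "repaid"]

centroids = fit_centroids(points, labels)
print(predict(centroids, (30, 16)))  # → default
print(predict(centroids, (70, 8)))   # → repaid
```

The "model" is just the two centroids learned from past data; predicting a future outcome means comparing a new case against them.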
Artificial Intelligence uses models built by Machine Learning and other methods to reason about the world
and give rise to intelligent behavior, whether this is playing a game or driving a robot/car. Artificial
Intelligence has some goal to achieve: it predicts how actions will affect its model of the world and
chooses the actions that will best achieve that goal. It is very programming-based.
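A toy sketch of the goal-directed loop described above: predict how each action would change the modelled world, then choose the action that best achieves the goal. The one-dimensional world and the `world_model` and `choose_action` functions are hypothetical simplifications; in a real system the model would typically be learned rather than hand-written.

```python
# Hypothetical, hand-written world model: predicts the robot's next position
# on a number line after taking an action from the current position.
def world_model(position, action):
    moves = {"left": -1, "stay": 0, "right": +1}
    return position + moves[action]

def choose_action(position, goal, actions=("left", "stay", "right")):
    """Pick the action whose predicted outcome lands closest to the goal."""
    return min(actions, key=lambda a: abs(goal - world_model(position, a)))

# Simulate a few decision steps: predict, choose, act
position, goal = 0, 3
path = [position]
while position != goal:
    action = choose_action(position, goal)
    position = world_model(position, action)  # here the world behaves exactly as the model predicts
    path.append(position)
print(path)  # → [0, 1, 2, 3]
```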
In short:
    Statistics quantifies numbers
    Data Mining explains patterns
    Machine Learning predicts with models
    Artificial Intelligence behaves and reasons
Source: https://stats.stackexchange.com/questions/5026/what-is-the-difference-between-data-mining-statistics-machine-learning-and-ai
Types of Analytics
Category            Type of report, analytics or query    Focus
Analytics           Optimization                          What’s the best that can happen?
                    Prediction                            What will happen next?
                    Forecasting                           What if this trend continues?
                    Statistical Analysis                  Why is this happening?
Query and Reports   Alerts                                What actions are needed?
                    Drilldown reports                     Where is the problem?
                    Ad-hoc reports                        How many, how often?
                    Standard reports                      What happened?
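A minimal sketch contrasting three rows of the table on the same hypothetical monthly sales figures: a standard report (What happened?), a drilldown (Where is the problem?) and a forecast (What if this trend continues?). The forecast uses the naive "drift" method, extending the last value by the average month-over-month change; this is an assumed, deliberately simple choice, not the only forecasting technique.

```python
# Hypothetical monthly sales figures (illustrative only)
monthly_sales = {"Jan": 100, "Feb": 110, "Mar": 95, "Apr": 120, "May": 130, "Jun": 140}

# Standard report: "What happened?"
total = sum(monthly_sales.values())

# Drilldown: "Where is the problem?" -> month with the largest drop vs the previous month
months = list(monthly_sales)
changes = {m: monthly_sales[m] - monthly_sales[prev] for prev, m in zip(months, months[1:])}
worst_month = min(changes, key=changes.get)

# Forecasting: "What if this trend continues?" -> naive drift method:
# extend the last value by the average month-over-month change
values = list(monthly_sales.values())
avg_change = (values[-1] - values[0]) / (len(values) - 1)
next_month_forecast = values[-1] + avg_change

print(f"Total sales: {total}")                                  # → Total sales: 695
print(f"Largest drop: {worst_month} ({changes[worst_month]})")  # → Largest drop: Mar (-15)
print(f"Forecast for next month: {next_month_forecast:.1f}")    # → Forecast for next month: 148.0
```

The average month-over-month change equals (last - first) / (number of months - 1), because the intermediate differences cancel out.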
                                        Data Science
• Art of transforming hypotheses and data into actionable predictions
• For example, we can use models and data to predict:
       - who will win an election
       - which products will sell well together (Apriori / Market-Basket analysis)
       - who is likely to default on loans
       - which advertisements will be clicked on
       - etc.
• Tools and fields used (including, but not restricted to):
  Empirical Sciences    Statistics   Business Intelligence    Databases       Data Warehousing      Visualization
  Expert Systems        Analytics    Machine Learning         Big Data        Data Mining           Reporting
• Central goal of Data Science
Deploying effective decision-making models to a production environment
What distinguishes data science itself from the tools and techniques is the central goal of deploying
effective decision-making models to a production environment.
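The market-basket example listed above can be sketched by counting how often items appear together. This minimal Python version computes support and confidence for item pairs over hypothetical baskets. It is not the full Apriori algorithm (which grows and prunes candidate itemsets level by level), and the 0.4 minimum-support threshold is an assumption.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (illustrative only)
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
]

n = len(baskets)
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

min_support = 0.4  # assumed threshold: a pair must appear in at least 40% of baskets
rules = []
for (a, b), count in pair_counts.items():
    support = count / n
    if support < min_support:
        continue
    # Confidence of a -> b: of the baskets containing a, the share that also contain b
    rules.append((a, b, support, count / item_counts[a]))
    rules.append((b, a, support, count / item_counts[b]))

for lhs, rhs, support, confidence in sorted(rules):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

For example, every basket with butter also has bread here, so "butter -> bread" has confidence 1.00, while "bread -> butter" only has 0.67.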
                               Data Science
These systems share a lot of features:
•   Amazon’s product recommendation systems
•   Google’s advertisement valuation systems
•   LinkedIn’s contact recommendation system
•   Twitter’s trending topics
•   Walmart’s consumer demand projection systems
    Shared features:
    - Built on large datasets
    - Most of the systems are live (online)
    - Allowed to make mistakes
    - Not concerned with causes (they predict rather than explain)
                          Machine Learning
• The ability to write a mathematical function that reads an input and produces an output
• We provide the form of the function (the model); the machine does not pick its own function, it fits the one we give it to the data
• ML considerations
    - Training data (lots of it)
    - Model
    - Cost function (e.g. the sum of squared errors minimised by Ordinary Least Squares)
    - Optimisation (e.g. gradient descent)
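The four ML considerations can be seen together in a minimal sketch: hypothetical training data, a straight-line model chosen by us, a mean-squared-error cost (the least-squares criterion, up to a constant factor) and plain gradient descent as the optimiser. The learning rate and step count are assumed values that happen to converge for this tiny dataset.

```python
# Training data (hypothetical): points lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

def predict(w, b, x):
    """Model: a straight line whose form we chose; only w and b are learned."""
    return w * x + b

def cost(w, b):
    """Cost function: mean squared error over the training data."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(learning_rate=0.05, steps=2000):
    """Optimisation: repeatedly move w and b against the gradient of the cost."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [predict(w, b, x) - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

w, b = gradient_descent()
print(f"w={w:.3f}, b={b:.3f}, cost={cost(w, b):.6f}")
```

Each full step here uses the whole (tiny) dataset; with millions of rows that becomes expensive, which is what the IID argument below addresses.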
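Why is learning possible?
     - Generalisation is possible, but only to data similar to what the model was trained on
       e.g. if the dataset contains travel times between places A and B, the learned function would not
            generalise if we used it to predict the travel distance between A and C
     - The data is IID (independent and identically distributed)
       That is why (stochastic) gradient descent need not go through the entire dataset at each step:
       the samples come from the same distribution, so a random subset is representative of the whole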
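A minimal sketch of the IID argument, assuming mini-batch stochastic gradient descent: each update uses only a small random batch, yet the fit approaches the line that generated the data because every batch is drawn from the same distribution. The data, batch size, learning rate and step count are hypothetical choices.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical IID data: y = 3x - 2 plus small Gaussian noise, x uniform in [0, 1)
xs = [random.random() for _ in range(10_000)]
data = [(x, 3 * x - 2 + random.gauss(0, 0.1)) for x in xs]

def minibatch_sgd(data, batch_size=32, learning_rate=0.1, steps=3000):
    """Fit y = w*x + b using gradients computed on small random batches only."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        batch = random.sample(data, batch_size)  # a random subset stands in for the full dataset
        grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / batch_size
        grad_b = sum(2 * (w * x + b - y) for x, y in batch) / batch_size
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

w, b = minibatch_sgd(data)
print(f"w={w:.2f}, b={b:.2f}")
```

Each step touches 32 of the 10,000 rows; if the rows were not identically distributed (say, sorted by region), a random batch would no longer represent the whole and the estimates would drift.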
… Eventually, data will surpass oil and water in importance
             Thank You!