Evolution of Big Data in the USA
                    Outline
•   Birth: 1880 US census
•   Adolescence: Big Science
•   Modern Era: Big Business
•   Future Landscape
•   Conclusion
     The First Big Data Challenge
• 1880 census
• 50 million people
• Recorded age, sex,
  occupation, education
  level, and the number of insane
  people in each household
      The First Big Data Solution
• Hollerith Tabulating
  System
• Punched cards – 80
  variables
• Used for 1890 census
• 6 weeks instead of 7+
  years
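The scale of that improvement can be checked with a quick calculation. The sketch below converts both durations to weeks and takes their ratio. It assumes exactly 7 years for the hand count (the slide says "7+", so the real ratio is at least this large) and an average of 365.25 / 7 weeks per year; the function and constant names are illustrative only.

```python
# Rough speedup of Hollerith's 1890 tabulation over the hand-counted 1880 census.
# Assumption: "7+ years" is taken as exactly 7 years, so the result is a lower bound.
WEEKS_PER_YEAR = 365.25 / 7  # average number of weeks in a year, about 52.18


def speedup(old_years: float, new_weeks: float) -> float:
    """Return how many times shorter the new duration is than the old one."""
    old_weeks = old_years * WEEKS_PER_YEAR
    return old_weeks / new_weeks


if __name__ == "__main__":
    factor = speedup(old_years=7, new_weeks=6)
    print(f"At least about {factor:.0f}x faster")
```

Under this assumption, the 1890 count finished roughly 60 times faster than the 1880 one.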
          Space Program (1960s)
• Began in the late 1950s
• Still an active area for Big
  Data today
Adolescence: Big Science
                        Big Science
• The International
  Geophysical Year
   – An international scientific
     project
   – Lasted from July 1, 1957 to December
     31, 1958
• A synoptic collection of
  observational data on a
  global scale
• Implications
   – Big budgets, Big staffs, Big
     machines, Big laboratories
        Summary of Big Science
• Laid foundation for ambitious projects
  – International Biological Program
  – Long Term Ecological Research Network
• The International Biological Program ended in 1974
• Many participants viewed it as a failure
• Nevertheless, it was a success
  – Transformed the way data are processed
  – Achieved its original aims
  – Provided renewed legitimacy for synoptic data
    collection
       Lessons from Big Science
• Spawned new Big Data projects
  – Weather prediction
  – Physics research (supercollider data analytics)
  – Astronomy images (planet detection)
  – Medical research (drug interaction)
  – …
• Businesses latched onto its techniques,
  methodologies, and objectives
Modern Era: Big Business
     Big Science vs. Big Business
• In common
  – Need technologies to work with data
  – Use algorithms to mine data
• Big Science
  – Source: experiments and research conducted in
    controlled environments
  – Goals: to answer questions or prove theories
• Big Business
  – Source: transactional in nature, with little control
  – Goals: to discover new opportunities, measure
    efficiencies, and uncover relationships
                    Current Status
• IDC reports
   – 2.7 billion terabytes in 2012, up 48 percent from 2011
   – A projected 8 billion terabytes in 2015
• Sources
   – Structured corporate databases
   – Unstructured data from webpages, blogs, social networking
     messages, …
   – Countless digital sensors
• Business sectors
   – Retailers: Walmart, Kohl's
   – Logistics companies: UPS
   – Telecommunications: AT&T, T-Mobile
   – …
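The IDC figures can be related with simple compound-growth arithmetic. The sketch below backs out the 2011 volume implied by "up 48 percent" and the average annual growth rate needed to go from 2.7 to 8 billion terabytes over the three years from 2012 to 2015. It assumes smooth growth compounding once per year and treats the 2015 number as a projection; the constant and function names are illustrative only.

```python
# Compound-growth arithmetic on the IDC figures (units: billions of terabytes).
# Assumption: growth between 2012 and 2015 is smooth and compounds once per year.

VOLUME_2012 = 2.7            # reported volume for 2012
GROWTH_2011_TO_2012 = 0.48   # "up 48 percent from 2011"
VOLUME_2015 = 8.0            # projected volume for 2015


def previous_volume(current: float, growth: float) -> float:
    """Volume one period earlier, given the current volume and its growth rate."""
    return current / (1 + growth)


def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    v2011 = previous_volume(VOLUME_2012, GROWTH_2011_TO_2012)
    cagr = annual_growth_rate(VOLUME_2012, VOLUME_2015, years=3)
    print(f"Implied 2011 volume: {v2011:.2f} billion TB")
    print(f"Implied 2012-2015 growth: {cagr:.1%} per year")
```

Under the compound-growth assumption, the 2015 projection works out to about 44 percent per year, in line with the 48 percent jump reported from 2011 to 2012.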
    Understanding Big Data (1)
• An avalanche of available data, increasing
  exponentially
• Google CEO Eric Schmidt said
  “Every two days we create as much information as we
  did from the dawn of civilization up until 2003. That’s
  something like five exabytes of data.”
• Farnam Jahanian kicked off a May 1, 2012
  briefing, calling data
  “a transformative new currency for science,
  engineering, education, and commerce.”
     Understanding Big Data (2)
• Farnam Jahanian (NSF)
   “Big Data is characterized not only by the enormous volume
   of data but also by the diversity and heterogeneity of the
   data and the velocity of its generation.”
• Nuala O’Connor Kelly (GE)
   “it’s the volume and velocity and variety of data… to achieve
   new results for …”
• Nick Combs (EMC)
   “It’s needle in a haystack or connecting the dots.”
• Arvind Krishna (IBM) added a fourth V:
  – Veracity: data in doubt
  – Describes 'contradictory data' or noisy data
                     Implications
• Big Science?
   – Big budgets, Big staffs, Big machines, Big laboratories
• Farnam Jahanian (NSF)
   – To drive the creation of new IT products and services
   – To accelerate the pace of discovery in almost every science and
     engineering (SE) discipline
   – To solve the nation’s most pressing challenges
• Response: $200 million Big Data R&D initiative in 2012
   – Advances in foundational techniques and technologies
   – Cyberinfrastructure to manage, curate, and serve data to
     SE research and education communities
   – New approaches to education and workforce
     development
   – Nurturance of new types of collaborations
             Databases' View
• DB space, 2000-2010 vs. after
[Figure: database space with segments labeled OLTP / operational, BI / reporting, and scalable nonrelational ("NoSQL")]
                         Big Medicine
• Information
   – Stakeholders: patients, service
     providers, nurses, physicians,
     hospital administrators,
     government, insurance agencies
   – A mixture of structured and
     unstructured data
• Technologies
   – Dashboard technologies and
     analytics, business intelligence,
     clinical intelligence, revenue cycle
     management intelligence
• Other factors
   – Decision support, ease of
     information accessibility, quality of
     care, physician-patient relationship
        Changes in Algorithms
• Efficiency vs. Effectiveness
• Flexible learning algorithms to remove bias
• Big Data is at an evolutionary juncture where it can
  improve or replace human judgment
• Businesses see the value but are thwarted
  by the cost of storage, slow processing
  speeds, and the sheer flood of data
            Big Data at NASA
• NASA Open Government Plan ver. 2
  – Managing and processing
  – Storage
  – Archiving and Distribution
  – Analysis
  – Visualization
  – Commercial cloud computing services
• Strategy: push both top-down and bottom-up
                 Conclusions
•   The first challenge
•   The first solution
•   What is the adolescent age?
•   What is the modern era?
•   What are the characteristics of Big Data?
•   What is the future landscape?
•   What does NASA do?
[Figure: timeline from Census to Big Science to Big Business]