Tuesday, May 22, 12
Eric.kavanagh@bloorgroup.com
Twitter Tag: #briefr
Reveal the essential characteristics of enterprise software, good and bad
Provide a forum for detailed analysis of today's innovative technologies
Give vendors a chance to explain their product to savvy analysts
Allow audience members to pose serious questions... and get answers!
May: Analytics
June: Intelligence
July: Governance
August: Analytics
September: Integration
October: Database
Ultimately, analytics is about businesses making optimal decisions, although the range of technologies that inhabit this area is wide: statistical analysis, data mining, process mining, predictive analytics, predictive modeling, business process modeling and complex event processing. With the advent of big data, analytics has become big analytics, with organizations diving into large heaps of data that previously were not available or usable. A major challenge with this market trend is providing adequate performance for all BI and analytics workloads on the volumes of data that are now being assembled and that are continuously growing.
Robin Bloor is Chief Analyst at The Bloor Group.
Robin.Bloor@Bloorgroup.com
SAP Sybase has a history of database innovation and application, from the corporate RDBMS through to the mobile and embedded market. Sybase IQ has been deployed in many areas of application and is used in many complex predictive analytics deployments, where speed, data capacity, and versatility are critical. Recently it has been upgraded to be used in a symbiotic manner with Hadoop in order to provide a comprehensive capability as a BI and analytics engine for Big Data applications.
David Jonker works in the area of Data Management & Analytics for SAP and is Product Marketing Director for Sybase IQ. In the last 5 years, David has led product marketing teams for Sybase's Data Management & Analytics product lines, including Sybase IQ, Sybase ASE, SQL Anywhere, and Advantage Database Server. His career includes over 10 years in software engineering and product management. Before joining Sybase, David had consulting, product management and software development roles.
Courtney Claussen is a product manager at Sybase, Inc., focusing on Sybase's data warehousing and analytics products. She has enjoyed a 30-year career in software development, technical support and product marketing in the areas of computer-aided design, computer-aided software engineering, database management systems, middleware, and analytics.
Sybase IQ 15.4 Overview
Big data analytics & Hadoop
Sybase IQ
Widespread success
Manage and analyze statistical measures for the entire nation of Canada
Analyze ALL Federal tax returns in the US
Analyze complex models in more than 200 financial institutions worldwide
Stands out as the leading enterprise data warehouse among the largest banks, insurance agencies, and telecom operators worldwide
Store and analyze massive amounts of industry segment data in 30 of the largest information providers in the world, including Transunion, Nielsen and Axiom
2012 SAP AG. All rights reserved.
BIG DATA ANALYTICS ISSUES
Dealing with volume, variety, velocity, costs, skills
Volume: Managing and harnessing terabytes of data
Variety: Harmonizing silos of structured and unstructured data
Velocity: Keeping up with unpredictable data and query flows
Costs: Too expensive to acquire, operate, and expand
Skills: Lack of adequate skills for nonstandard platforms and APIs
Sybase IQ 15
A powerful big data analytics platform in the making
Release timeline: v15.0 (2009), v15.1 (2009), v15.2 (2010), v15.3 (2011), v15.4 (2011)
Big data analytics: Skills, Costs, Variety, Velocity, Volume
Capabilities shown: MapReduce API; PlexQ MPP Foundation; Text Search, Web 2.0 API; In-Database Analytics API; VLDB Platform Foundation
Sybase IQ 15.4
A comprehensive platform for big data analytics
Eco-System: Sybase Control Center, Sybase PowerDesigner, certified ISV tools
Ingest + Persist, Federation
App Services: Web 2.0, Java, C/C++, SQL
Unstructured Data (Hadoop, Content Mgmt)
Structured Data (DBMS)
Details: In-Database Analytics & Hadoop
In-database analytics in Sybase IQ
No compromise for complex analytics
Basic to advanced analytical functions available to SQL directly from the Sybase IQ engine
Data never leaves the database until results are materialized
Analytics code and models must be shareable yet must allow ad hoc analysis
Analytics code and models must be applicable to the latest data set
Standards-based access and concept extensibility are compulsory
Performance and scalability are a given
The average developer must be able to build in-database analytical models
[Diagram: a Sybase IQ process with built-in functions and external DLLs running against the database; logic and filtering are applied in the database]
Analytics simplified: Logic to Data = Fast + Efficient
In-database analytics in Sybase IQ
Custom functions APIs
Several different forms of C++ and Java UDF APIs are available for building custom in-database analytics, each valid at different locations within queries:
1. {Scalar} to {Scalar} functions, e.g. sin, cosine
2. {Scalar set} to {Scalar} functions, e.g. max, min
3. {Scalar set} to {Scalar set}, e.g. OLAP windows
4. {Scalar set} to {Tables}, e.g. join result sets
5. {Scalar set, Tables} to {Tables}, e.g. MapReduce
All variants are parallelizable, but (5) is also distributable across the PlexQ grid.
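The five function shapes listed here can be pictured as plain function signatures. The sketch below is written in Python purely for illustration; it is not the Sybase IQ C++ or Java UDF API, and every function name in it is hypothetical. It only shows what goes in and what comes out for each form.

```python
import math
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]  # a table row, modeled here as a plain dict (assumption)

# (1) scalar -> scalar: evaluated once per input value, like sin or cosine.
def scalar_to_scalar(x: float) -> float:
    return math.sin(x)

# (2) scalar set -> scalar: an aggregate over many values, like max or min.
def set_to_scalar(values: Iterable[float]) -> float:
    return max(values)

# (3) scalar set -> scalar set: one output per input, like an OLAP window.
#     Here: a trailing moving average over `width` values.
def set_to_set(values: List[float], width: int = 2) -> List[float]:
    out = []
    for i in range(len(values)):
        window = values[max(0, i - width + 1): i + 1]
        out.append(sum(window) / len(window))
    return out

# (4) scalar set -> table: scalar parameters in, rows out.
def scalars_to_table(start: int, stop: int) -> Iterator[Row]:
    for n in range(start, stop):
        yield {"n": n, "square": n * n}

# (5) scalar set + table -> table: rows in, rows out (the MapReduce-style form).
#     Here: group the input rows by `key` and sum the `value` column.
def table_to_table(rows: Iterable[Row], key: str, value: str) -> Iterator[Row]:
    totals: Dict[Any, float] = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row[value]
    for group, total in totals.items():
        yield {key: group, "total": total}
```

Only form (5) consumes a whole table, which fits the slide's note that it is the form that can be distributed across a grid: each node can run the same function on its own partition of the input rows.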
In-database analytics in Sybase IQ
Java custom functions
Feature 3
Java User Defined Function offers a new in-database analytics API
Characteristics
External algorithms written as Java functions, plugged into Sybase IQ
Java functions called via SQL: run in-database, much faster than client side
Java functions run protected/fault tolerant (in a separate process)
Supports scalar and table outputs
Supports all data types
Big Data Use Cases
Ideal for ISV or custom data mining libraries for healthcare, eCommerce, and the public sector
Apps include: ISV partner Zementis built a plug-in for PMML (Predictive Modeling Markup Language) models
  Validates PMML from SAS, R, ..
  Translates PMML to Java UDFs
  Java UDFs called from SQL
[Diagram: PMML, Zementis plug-in, Java UDF, Sybase IQ]
SYBASE IQ 15.4 DECONSTRUCTED
App services integrating Sybase IQ + Hadoop: at client side
Feature 6a
Client side federation: Join data from Sybase IQ and Hadoop at the client application level
Characteristics
Client tool capable of querying Sybase IQ and Hadoop
Currently certified client tool is Quest Toad for Cloud
Better performance when results from sources are pre-computed/pre-aggregated
Big Data Use Cases
Ideal for bringing together Big Data Analytics pre-computations from different domains
Example in telecommunications: Sybase IQ holds aggregated customer loyalty data and Hadoop holds aggregated network utilization data; Quest Toad for Cloud can bring in data from both sources, linking customer loyalty to network utilization or network faults (e.g. dropped calls)
[Diagram: Toad for Cloud Databases connecting to Sybase IQ and Hadoop Hive]
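Client-side federation comes down to joining two already-aggregated result sets inside the client. The sketch below is a minimal Python illustration of that idea, not how Toad for Cloud works internally. The row layouts, the `region` key, and the sample figures are all hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

def join_on_key(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Inner-join two lists of rows on a shared key (a simple hash join)."""
    index: Dict[Any, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            merged = dict(rrow)
            merged.update(lrow)  # columns from both sides end up in one row
            joined.append(merged)
    return joined

# Hypothetical pre-aggregated results, as each source might return them.
loyalty_from_iq = [
    {"region": "north", "churn_rate": 0.08},
    {"region": "south", "churn_rate": 0.03},
]
network_from_hive = [
    {"region": "north", "dropped_calls": 1200},
    {"region": "south", "dropped_calls": 150},
    {"region": "east", "dropped_calls": 400},
]

report = join_on_key(loyalty_from_iq, network_from_hive, "region")
```

Because both inputs are small summaries, the join is cheap. That matches the slide's point that this approach performs better when each source pre-aggregates before handing results to the client.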
SYBASE IQ 15.4 DECONSTRUCTED
App services integrating Sybase IQ + Hadoop: using ETL
Feature 6b
Load Hadoop data into Sybase IQ column store: Extract, transform, and load data from HDFS (Hadoop Distributed File System) into Sybase IQ schemas
Characteristics
Extract & load subsets of HDFS data into Sybase IQ column store
  Raw data from HDFS
  Results of Hadoop MR jobs
HDFS data stored in Sybase IQ is treated like other Sybase IQ data
  Gets ACID properties of a DBMS
  Can be indexed, joined, parallelized
  Can be queried in an ad hoc way
  Visible to BI and other client tools via Sybase IQ ANSI SQL API only
Currently, the Apache bulk data transfer utility Sqoop (built by Cloudera) is certified to provide this ETL capability
Big Data Use Cases
Ideal for combining subsets of HDFS unstructured data or summaries of HDFS data into Sybase IQ for mid- to long-term usage in business reports
Example in eCommerce: clickstream data from weblogs stored in HDFS, and the outputs of MR jobs on that data (to study browsing behavior), are ETL'd into Sybase IQ. The transactional sales data in Sybase IQ is joined with the clickstream data to understand and predict customer browsing-to-buying behavior
[Diagram: clickstream data in HDFS loaded via ETL (Sqoop) into Sybase IQ alongside sales data]
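The ETL pattern can be sketched end to end with a small stand-in. The example below uses Python's built-in sqlite3 module in place of Sybase IQ and a hard-coded list in place of Sqoop-transferred MapReduce output. The table names, column names, and values are all hypothetical. It shows only the idea that loaded data becomes ordinary relational data that can be indexed and joined with SQL.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the warehouse (not Sybase IQ).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
conn.execute(
    "CREATE TABLE clickstream_summary (customer_id INTEGER, page_views INTEGER)"
)

# Transactional sales data that already lives in the warehouse.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)", [(1, 120.0), (2, 35.5), (1, 20.0)]
)

# Hypothetical output of an MR job over weblogs: (customer_id, page_views).
# In the slide's setup a bulk transfer tool would move this out of HDFS.
mr_output = [(1, 42), (2, 7), (3, 15)]
conn.executemany("INSERT INTO clickstream_summary VALUES (?, ?)", mr_output)

# Once loaded, the data can be indexed like any other table.
conn.execute(
    "CREATE INDEX idx_clicks_customer ON clickstream_summary (customer_id)"
)

# Ad hoc SQL join of sales with the loaded clickstream summary.
rows = conn.execute(
    """
    SELECT s.customer_id, SUM(s.amount) AS total_spend, c.page_views
    FROM sales AS s
    JOIN clickstream_summary AS c ON c.customer_id = s.customer_id
    GROUP BY s.customer_id, c.page_views
    ORDER BY s.customer_id
    """
).fetchall()
```

Customer 3 appears in the clickstream summary but has no sales, so the inner join leaves that row out. A browsing-without-buying analysis would use an outer join instead.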
SYBASE IQ 15.4 DECONSTRUCTED
App services integrating Sybase IQ + Hadoop: using Data Federation
Feature 6c
Join HDFS data with Sybase IQ data on the fly: Fetch and join subsets of HDFS data on-demand using SQL queries from Sybase IQ (Data Federation technique)
Characteristics
Scan and fetch specified data subsets from HDFS via table UDF
  Can read and fetch HDFS data subsets
  Called as part of Sybase IQ SQL query
  Output joinable with Sybase IQ data
HDFS data not stored in Sybase IQ
  Fetched into Sybase IQ in-memory tables
  ACID properties not applicable
Visible to BI/other client tools via Sybase IQ ANSI SQL API
Big Data Use Cases
Ideal for combining subsets of HDFS data with Sybase IQ data for operational (transient) business reports
Example in retail: Point of sale (POS) detailed data is stored in HDFS. The Sybase IQ EDW fetches POS data for specific hot-selling SKUs from HDFS at fixed intervals and combines it with inventory data in Sybase IQ to predict and prevent inventory stockouts
[Diagram: POS data in HDFS reaching inventory data in Sybase IQ through a UDF bridge]
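The data federation pattern can be pictured as a table-valued function that scans an external source, returns only the requested subset, and lets the caller join it in memory without storing it. The Python sketch below is a loose analogy under those assumptions. It does not use the Sybase IQ table UDF API or read from HDFS, and the file contents, SKU codes, and function names are hypothetical.

```python
import csv
import io
from typing import Dict, Iterator, List, Set

# Hypothetical POS extract, standing in for a file stored in HDFS.
POS_FILE = "sku,store,units_sold\nA100,1,40\nB200,1,3\nA100,2,55\nC300,2,9\n"

def fetch_pos_rows(source: str, hot_skus: Set[str]) -> Iterator[Dict[str, object]]:
    """Table-function stand-in: scan the source, yield only the requested SKUs."""
    for row in csv.DictReader(io.StringIO(source)):
        if row["sku"] in hot_skus:
            yield {"sku": row["sku"], "units_sold": int(row["units_sold"])}

def stockout_risk(source: str, hot_skus: Set[str], inventory: Dict[str, int]) -> List[str]:
    """Join fetched POS rows with inventory in memory; nothing is persisted."""
    sold: Dict[str, int] = {}
    for row in fetch_pos_rows(source, hot_skus):
        sold[row["sku"]] = sold.get(row["sku"], 0) + row["units_sold"]
    # A SKU is at risk when units sold exceed units on hand.
    return sorted(sku for sku, units in sold.items() if units > inventory.get(sku, 0))

# Inventory data that would live in the warehouse.
inventory_on_hand = {"A100": 80, "B200": 500}
at_risk = stockout_risk(POS_FILE, {"A100", "B200"}, inventory_on_hand)
```

The fetched rows exist only for the duration of the call, which mirrors the slide's note that federated HDFS data is transient and not covered by the database's ACID guarantees.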
SYBASE IQ 15.4 DECONSTRUCTED
App services integrating Sybase IQ + Hadoop: using Query Federation
Feature 6d
Combine results of Hadoop MR jobs with Sybase IQ data on the fly: Initiate and join results of Hadoop MR jobs on-demand using SQL queries from Sybase IQ (Query Federation technique)
Characteristics
Trigger and fetch Hadoop MR job results via table UDF
  Can trigger Hadoop MR jobs
  Called as part of Sybase IQ SQL query
  Output joinable with Sybase IQ data
HDFS data not stored in Sybase IQ
  Fetched into Sybase IQ in-memory tables
  ACID properties not applicable
  Repeated use: put fetched data in tables
Visible to BI and other client tools via Sybase IQ ANSI SQL API
Big Data Use Cases
Ideal for combining results of Hadoop MR jobs with Sybase IQ data for operational (transient) business reports
Example in utilities: Smart meter and smart grid data can be combined for load monitoring and demand forecasting. Smart grid transmission quality data (multi-attribute time series data) stored in HDFS can be computed via Hadoop MR jobs triggered from Sybase IQ and combined with smart meter data stored in Sybase IQ to analyze demand and workload.
[Diagram: smart grid transmission data in HDFS reaching smart meter consumption data in Sybase IQ through a UDF bridge]
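Query federation differs from data federation in that the call triggers a computation on the Hadoop side and fetches only its result. The Python sketch below imitates that with a tiny in-process map, shuffle, and reduce. It is an analogy only: no Hadoop job is launched, no Sybase IQ API is used, and the feeder names, readings, and function names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

def run_map_reduce(records: Iterable[Any], mapper: Callable, reducer: Callable) -> Dict[Any, Any]:
    """In-process stand-in for an MR job: map, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical transmission quality readings that would sit in HDFS.
grid_readings: List[Tuple[str, float]] = [
    ("feeder-1", 0.98), ("feeder-1", 0.90), ("feeder-2", 0.99),
]

def quality_mapper(record):
    feeder, quality = record
    yield feeder, quality

def worst_quality_reducer(feeder, qualities):
    return min(qualities)

def worst_quality_by_feeder() -> Dict[str, float]:
    """Table-function stand-in: the job runs only when a query calls this."""
    return run_map_reduce(grid_readings, quality_mapper, worst_quality_reducer)

# Smart meter consumption data that would live in the warehouse.
meter_load = [{"feeder": "feeder-1", "kwh": 5200}, {"feeder": "feeder-2", "kwh": 3100}]

def load_report() -> List[Dict[str, Any]]:
    """Trigger the job, then join its result with the meter data in memory."""
    quality = worst_quality_by_feeder()
    return [
        dict(row, min_quality=quality[row["feeder"]])
        for row in meter_load
        if row["feeder"] in quality
    ]
```

Each call to `load_report()` reruns the job. For a report that is requested repeatedly, the slide's advice applies: put the fetched result in a table rather than recomputing it.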
SYBASE IQ 15.4
Unique, user community focused platform for big data analytics
Data Discovery (Data Scientists)
Application Modeling (Business Analysts)
Reports/Dashboards (BI Programmers)
Business Decisions (Business End Users)
Infrastructure Management (DBAs)
[Diagram labels: Full Mesh High Speed Interconnect; SAN Fabric]
Dynamic, elastic PlexQ MPP grid
Grow, shrink, provision on-demand
Heavy parallelization
Load, prepare, mine, report in a workflow
Privacy through isolation of resources
Collaboration through sharing of results/data via sharing of resources
Thank you
Courtney Claussen, Product Manager, Sybase IQ, courtney.claussen@sap.com
David Jonker, Product Marketing Director, Sybase IQ, david.jonker@sap.com
Most of the Big Data opportunity is, in the end, a Big Analytics opportunity.
There are two challenges in this:
  Managing the data and the data flow
  Providing acceptable performance for analytics applications
Hadoop and its associated technologies can be both a blessing and a curse.
Hadoop = Key-value store & Parallel processing framework
Some NoSQL databases are DHT-based, some are specialized DBMS
Column-store DBMS vary, but in general they are MPP RDBMS and NewSQL DBMS
Data volumes (includes complexity of data structure)
Concurrency (includes also workload variability)
Computation (is application dependent)
Data flow architecture is a factor
In many ways this is similar to the Data Warehouse data flow challenge, writ larger
Latency is about application service levels
This is probably still a three-stage process
This is, by the way, a simplification
Big Analytics is here to stay
In some analytical application areas speed is desirable, in others speed is critical. Warning: Workloads can be mixed
Analytic speed depends upon the database engine, but also data flow architecture
Business effectiveness depends upon integration with the business process
The prebuilt functions clearly make sense (for speed of processing). Are they intended to make some analytic tools unnecessary or simply to be called directly by such tools?
What does SAP see as the appropriate role(s) for Hadoop in most businesses?
As I understand it, Sybase IQ can fully replace Hadoop in some contexts. What are the situations where you think Hadoop AND Sybase IQ is appropriate?
I'm intrigued by the idea of JOINing data between Hadoop results and Sybase IQ, but I'm not sure of the role of such a capability. How is this different from using MR for data ingest?
As you can link up to Hadoop/Sybase IQ at the front or at the back end, which would you tend to use when?
You speak of broad and comprehensive capability, in combination with Hadoop. So which areas do you think are sweet spots? And which kinds of application and/or data collections do you think require different approaches?
Who have been the early adopters of this Hadoop/Sybase IQ capability and what kind of business problems are they trying to solve?
What do you see as SAP HANA's role in this? Are the same analytical capabilities being added to SAP HANA?