
Data model for Big Data: Illustration

This chapter covers

■ Apache Thrift
■ Implementing a graph schema using Apache Thrift
■ Limitations of serialization frameworks

In the last chapter you saw the principles of forming a data model—the value of
raw data, dealing with semantic normalization, and the critical importance of
immutability. You saw how a graph schema can satisfy all these properties and saw
what the graph schema looks like for SuperWebAnalytics.com.
This is the first of the illustration chapters, in which we demonstrate the concepts of the previous chapter using real-world tools. You can read just the theory chapters of the book and learn the whole Lambda Architecture, but the illustration chapters show you the nuances of mapping the theory to real code. In this chapter we'll implement the SuperWebAnalytics.com data model using Apache Thrift, a serialization framework. You'll see that even in a task as straightforward as writing a schema, there is friction between the idealized theory and what you can achieve in practice.


3.1 Why a serialization framework?


Many developers go down the path of writing their raw data in a schemaless format like JSON. This is appealing because of how easy it is to get started, but this approach quickly leads to problems. Whether due to bugs or misunderstandings between different developers, data corruption inevitably occurs. It's our experience that data corruption errors are some of the most time-consuming to debug.

Data corruption issues are hard to debug because you have very little context on how the corruption occurred. Typically you'll only notice there's a problem when there's an error downstream in the processing—long after the corrupt data was written. For example, you might get a null pointer exception due to a mandatory field being missing. You'll quickly realize that the problem is a missing field, but you'll have absolutely no information about how that data got there in the first place.

When you create an enforceable schema, you get errors at the time of writing the data—giving you full context as to how and why the data became invalid (like a stack trace). In addition, the error prevents the program from corrupting the master dataset by writing that data.

Serialization frameworks are an easy approach to making an enforceable schema. If you've ever used an object-oriented, statically typed language, using a serialization framework will be immediately familiar. Serialization frameworks generate code for whatever languages you wish to use for reading, writing, and validating objects that match your schema.

However, serialization frameworks are limited when it comes to achieving a fully rigorous schema. After discussing how to apply a serialization framework to the SuperWebAnalytics.com data model, we'll discuss these limitations and how to work around them.

3.2 Apache Thrift


Apache Thrift (http://thrift.apache.org/) is a tool that can be used to define statically typed, enforceable schemas. It provides an interface definition language to describe the schema in terms of generic data types, and this description can later be used to automatically generate the actual implementation in multiple programming languages.

OUR USE OF APACHE THRIFT Thrift was initially developed at Facebook for
building cross-language services. It can be used for many purposes, but we’ll
limit our discussion to its usage as a serialization framework.

Other serialization frameworks


There are other tools similar to Apache Thrift, such as Protocol Buffers and Avro. Remember, the purpose of this book is not to provide a survey of all possible tools for every situation, but to use an appropriate tool to illustrate the fundamental concepts. As a serialization framework, Thrift is practical, thoroughly tested, and widely used.


The workhorses of Thrift are the struct and union type definitions. They're composed of other fields, such as

■ Primitive data types (strings, integers, longs, and doubles)
■ Collections of other types (lists, maps, and sets)
■ Other structs and unions

In general, unions are useful for representing nodes, structs are natural representations of edges, and properties use a combination of both. This will become evident from the type definitions needed to represent the SuperWebAnalytics.com schema components.

3.2.1 Nodes
For our SuperWebAnalytics.com user nodes, an individual is identified either by a
user ID or a browser cookie, but not both. This pattern is common for nodes, and it
matches exactly with a union data type—a single value that may have any of several
representations.
In Thrift, unions are defined by listing all possible representations. The following
code defines the SuperWebAnalytics.com nodes using Thrift unions:
union PersonID {
  1: string cookie;
  2: i64 user_id;
}

union PageID {
  1: string url;
}

Note that unions can also be used for nodes with a single representation. Unions
allow the schema to evolve as the data evolves—we’ll discuss this further later in this
section.
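
To make this concrete, here is a minimal sketch of what these unions look like from Java, assuming you've run the Thrift compiler with its Java generator (generated union classes expose a static factory method per field; exact names can vary across Thrift versions):

PersonID viaCookie = PersonID.cookie("abc123");
PersonID viaUserId = PersonID.user_id(12345L);

// Exactly one field of a union is set at a time.
if (viaUserId.getSetField() == PersonID._Fields.USER_ID) {
    long id = viaUserId.getUser_id();
}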

3.2.2 Edges
Each edge can be represented as a struct containing two nodes. The name of an edge
struct indicates the relationship it represents, and the fields in the edge struct contain
the entities involved in the relationship.
The schema definition is very simple:
struct EquivEdge {
  1: required PersonID id1;
  2: required PersonID id2;
}

struct PageViewEdge {
  1: required PersonID person;
  2: required PageID page;
  3: required i64 nonce;
}


The fields of a Thrift struct can be denoted as required or optional. If a field is defined as required, then a value for that field must be provided, or else Thrift will give an error upon serialization or deserialization. Because each edge in a graph schema must have two nodes, they are required fields in this example.
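
You can see this enforcement directly from the generated code. The following is a hedged Java sketch (assuming the Java generator and Thrift's binary protocol): serializing an EquivEdge whose second node was never set fails at write time, before the bad record can reach the master dataset.

import org.apache.thrift.TException;
import org.apache.thrift.TSerializer;
import org.apache.thrift.protocol.TBinaryProtocol;

public class RequiredFieldDemo {
  public static void main(String[] args) throws TException {
    EquivEdge edge = new EquivEdge();
    edge.setId1(PersonID.cookie("abc123"));
    // id2 is required but never set

    try {
      new TSerializer(new TBinaryProtocol.Factory()).serialize(edge);
    } catch (TException e) {
      // Thrift rejects the write, with full context still available
      System.err.println("Invalid edge: " + e.getMessage());
    }
  }
}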

3.2.3 Properties
Last, let’s define the properties. A property contains a node and a value for the property.
The value can be one of many types, so it’s best represented using a union structure.
Let’s start by defining the schema for page properties. There’s only one property
for pages, so it’s really simple:
union PagePropertyValue {
  1: i32 page_views;
}

struct PageProperty {
  1: required PageID id;
  2: required PagePropertyValue property;
}

Next let’s define the properties for people. As you can see, the location property is
more complex and requires another struct to be defined:
struct Location {
  1: optional string city;
  2: optional string state;
  3: optional string country;
}

enum GenderType {
  MALE = 1,
  FEMALE = 2
}

union PersonPropertyValue {
  1: string full_name;
  2: GenderType gender;
  3: Location location;
}

struct PersonProperty {
  1: required PersonID id;
  2: required PersonPropertyValue property;
}

The location struct is interesting because the city, state, and country fields could have
been stored as separate pieces of data. In this case, they’re so closely related it makes
sense to put them all into one struct as optional fields. When consuming location
information, you’ll almost always want all of those fields.


3.2.4 Tying everything together into data objects


At this point, the edges and properties are defined as separate types. Ideally you'd want to store all of the data together to provide a single interface to access your information. Furthermore, it also makes your data easier to manage if it's stored in a single dataset. This is accomplished by wrapping every property and edge type into a DataUnit union—see the following code listing.

Listing 3.1 Completing the SuperWebAnalytics.com schema

union DataUnit {
  1: PersonProperty person_property;
  2: PageProperty page_property;
  3: EquivEdge equiv;
  4: PageViewEdge page_view;
}

struct Pedigree {
  1: required i32 true_as_of_secs;
}

struct Data {
  1: required Pedigree pedigree;
  2: required DataUnit dataunit;
}

Each DataUnit is paired with its metadata, which is kept in a Pedigree struct. The pedigree contains the timestamp for the information, but could also potentially contain debugging information or the source of the data. The final Data struct corresponds to a fact from the fact-based model.
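
To tie the pieces together, here is a hedged sketch of a full round trip through this schema using the generated Java code and Thrift's binary protocol. It records a single fact (a page view) and restores it from its serialized form:

import org.apache.thrift.TException;
import org.apache.thrift.TSerializer;
import org.apache.thrift.TDeserializer;
import org.apache.thrift.protocol.TBinaryProtocol;

public class DataRoundTrip {
  public static void main(String[] args) throws TException {
    // A single fact: person 123 viewed a page at the current time
    PageViewEdge view = new PageViewEdge()
        .setPerson(PersonID.user_id(123L))
        .setPage(PageID.url("http://mysite.com/"))
        .setNonce(System.nanoTime());

    Data fact = new Data()
        .setPedigree(new Pedigree()
            .setTrue_as_of_secs((int) (System.currentTimeMillis() / 1000)))
        .setDataunit(DataUnit.page_view(view));

    byte[] bytes = new TSerializer(new TBinaryProtocol.Factory()).serialize(fact);

    Data restored = new Data();
    new TDeserializer(new TBinaryProtocol.Factory()).deserialize(restored, bytes);
  }
}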

3.2.5 Evolving your schema


Thrift is designed so that schemas can evolve over time. This is a crucial property,
because as your business requirements change you’ll need to add new kinds of data,
and you’ll want to do so as effortlessly as possible.
The key to evolving Thrift schemas is the numeric identifiers associated with each
field. Those IDs are used to identify fields in their serialized form. When you want to
change the schema but still be backward compatible with existing data, you must obey
the following rules:
■ Fields may be renamed. This is because the serialized form of an object uses the
field IDs, not the names, to identify fields.
■ A field may be removed, but you must never reuse that field ID. When deserializing
existing data, Thrift will ignore all fields with field IDs not included in the
schema. If you were to reuse a previously removed field ID, Thrift would try to
deserialize that old data into the new field, which will lead to either invalid or
incorrect data.


■ Only optional fields can be added to existing structs. You can’t add required fields
because existing data won’t have those fields and thus won’t be deserializable.
(Note that this doesn’t apply to unions, because unions have no notion of
required and optional fields.)
As an example, should you want to change the SuperWebAnalytics.com schema to store a person's age and the links between web pages, you'd make the following changes to your Thrift definition file (the additions are the new age field and the LinkedEdge struct).

Listing 3.2 Extending the SuperWebAnalytics.com schema

union PersonPropertyValue {
  1: string full_name;
  2: GenderType gender;
  3: Location location;
  4: i16 age;
}

struct LinkedEdge {
  1: required PageID source;
  2: required PageID target;
}

union DataUnit {
  1: PersonProperty person_property;
  2: PageProperty page_property;
  3: EquivEdge equiv;
  4: PageViewEdge page_view;
  5: LinkedEdge page_link;
}

Notice that adding a new age property is done by adding it to the corresponding
union structure, and a new edge is incorporated by adding it into the DataUnit union.

3.3 Limitations of serialization frameworks


Serialization frameworks only check that all required fields are present and are of the expected type. They're unable to check richer properties like "Ages should be non-negative" or "true-as-of timestamps should not be in the future." Data not matching these properties would indicate a problem in your system, and you wouldn't want them written to your master dataset.

This may not seem like a limitation because serialization frameworks seem somewhat similar to how schemas work in relational databases. In fact, you may have found relational database schemas a pain to work with and worry that making schemas even stricter would be even more painful. But we urge you not to confuse the incidental complexities of working with relational database schemas with the value of schemas themselves. The difficulties of representing nested objects and doing schema migrations with relational databases are non-existent when applying serialization frameworks to represent immutable objects using graph schemas.


The right way to think about a schema is as a function that takes in a piece of data and returns whether it's valid or not. The schema language for Apache Thrift lets you represent a subset of these functions where only field existence and field types are checked. The ideal tool would let you implement any possible schema function.

Such an ideal tool—particularly one that is language neutral—doesn't exist, but there are two approaches you can take to work around these limitations with a serialization framework like Apache Thrift:
■ Wrap your generated code in additional code that checks the additional properties you care about, like ages being non-negative (see the sketch following this list). This approach works well as long as you're only reading/writing data from/to a single language—if you use multiple languages, you have to duplicate the logic in many languages.
■ Check the extra properties at the very beginning of your batch-processing workflow. This step would split your dataset into "valid data" and "invalid data" and send a notification if any invalid data was found. This approach makes it easier to implement the rest of your workflow, because anything getting past the validity check can be assumed to have the stricter properties you care about. But this approach doesn't prevent the invalid data from being written to the master dataset and doesn't help with determining the context in which the corruption happened.
Neither approach is ideal, but it's hard to see how you can do better if your organization reads/writes data in multiple languages. You have to decide whether you'd rather maintain the same logic in multiple languages or lose the context in which corruption was introduced. The only approach that would be perfect would be a serialization framework that is also a general-purpose programming language that translates itself into whatever languages it's targeting. Such a tool doesn't exist, though it's theoretically possible.

3.4 Summary
For the most part, implementing the enforceable graph schema for SuperWebAnalytics.com was straightforward. You saw the friction that appears when using a serialization framework for this purpose—namely, the inability to enforce every property you care about. The tooling will rarely capture your requirements perfectly, but it's important to know what would be possible with ideal tools. That way you're cognizant of the trade-offs you're making and can keep an eye out for better tools (or make your own). This will be a common theme as we go through the theory and illustration chapters.

In the next chapter you'll learn how to physically store a master dataset in the batch layer so that it can be processed easily and efficiently.



Features of Apache Thrift
By Randy Abernethy

In this article, excerpted from The Programmer's Guide to Apache Thrift, we'll discuss the key features of Apache Thrift.

There are several key benefits associated with using Apache Thrift to develop network services or perform cross-language serialization tasks.

■ Full SOA implementation - Apache Thrift supplies a complete SOA solution
■ Modularity - Apache Thrift supports plug-in serialization protocols and transports
■ Performance - Apache Thrift is fast and efficient
■ Reach - Apache Thrift supports a wide range of languages and platforms
■ Flexibility - Apache Thrift supports interface evolution

Let’s take a look at each of these features in turn.

Service Implementation
Services are modular application components that provide interfaces accessible over a
network. Service interfaces are described in Apache Thrift using Interface Definition Language
(IDL) (see Listing 1). The IDL can be compiled to generate stub code used to connect clients
and servers in a wide range of languages.

For example, imagine you have a C++ module in a GUI application that tracks and computes sailing team statistics for the America's Cup. As it happens, your company's web development team would like to use the sail stats module to enhance a client-facing web application, but the web site is written in PHP. To provide the sail stats features to the web dev team, the sail stats module can be deployed as a network service.

Figure 1 - Converting a module from a monolithic application (above dotted line) into a network service for a distributed application (below dotted line)

Microservices and Service-Oriented Architecture (SOA)

The microservices and SOA approaches to distributed application design break applications down into services, which are remotely accessible autonomous modules composed of a set of closely related functions. SOA-based systems generally provide their features over language-agnostic interfaces, allowing clients to be constructed in the most appropriate language and on the most appropriate platform, independent of the service implementation. SOA services are typically stateless and loosely coupled, communicating with clients through a formal interface contract. SOA services may be internal to an organization or support clients across business boundaries.

Encapsulating the SailStats module in a SOA-style service will make it easy for any part of the company's enterprise to access it. There are several common ways to build SOA services using web-oriented technologies. However, many of these would require the installation of web or application servers, possibly a material amount of additional coding, and the use of HTTP communication schemes and text-based data formats, which are broadly supported but not famous for being fast or compact.

Apache Thrift offers a compelling alternative. Using Apache Thrift IDL, we can define a service interface with the functions we want to expose. We can then use the Apache Thrift compiler to generate RPC code for our SailStats service in PHP and C++ (and most other commercially viable languages). The web team can now use code generated in their language of choice to call the functions offered by the SailStats service, exactly as if the functions were defined locally (see Figure 1).

Apache Thrift also supplies a complete library of RPC servers. This means that you can use
one of the powerful multithreaded servers provided by Apache Thrift to handle all of the
server RPC processing and concurrency matters. Apache Thrift RPC servers are not only fast
but they also have a much smaller footprint than most web application servers, making them
suitable for many embedded systems.
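
For example, here is a minimal Java sketch of hosting a service with one of the prebuilt multithreaded servers. SailStatsHandler is a hypothetical class implementing SailStats.Iface, the interface the Thrift compiler generates from the IDL in Listing 1 below; the port is a placeholder:

import org.apache.thrift.server.TServer;
import org.apache.thrift.server.TThreadPoolServer;
import org.apache.thrift.transport.TServerSocket;

public class SailStatsServer {
  public static void main(String[] args) throws Exception {
    SailStats.Processor<SailStats.Iface> processor =
        new SailStats.Processor<>(new SailStatsHandler());

    // One of the prebuilt multithreaded RPC servers
    TServerSocket socket = new TServerSocket(9090);
    TServer server = new TThreadPoolServer(
        new TThreadPoolServer.Args(socket).processor(processor));
    server.serve();
  }
}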

Listing 1

service SailStats {
  double GetSailorRating(1: string SailorName)
  double GetTeamRating(1: string TeamName)
  double GetBoatRating(1: i64 BoatSerialNumber)
  list<string> GetSailorsOnTeam(1: string TeamName)
  list<string> GetSailorsRatedBetween(1: double MinRating, 2: double MaxRating)
  string GetTeamCaptain(1: string TeamName)
}

In summary, to turn a code library or module into a high-performance RPC service with Apache Thrift, all we need to do is:

1. Define the service interface in IDL

2. Compile the IDL to generate client and server RPC stub code in the desired languages

3. On the client side, call the remote functions as if they were local using the client stubs

4. On the server side, connect the server stubs to the desired functionality

5. Choose one of the prebuilt Apache Thrift servers to host the service

In exchange for a fairly small amount of work, we can turn almost any set of existing functions into a high-performance Apache Thrift service, accessible from a broad range of client languages.
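
Step 3 is worth seeing in code. Here is a minimal Java client sketch against the generated SailStats.Client stub (the host, port, and sailor name are placeholders):

import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class SailStatsClient {
  public static void main(String[] args) throws Exception {
    TTransport transport = new TSocket("localhost", 9090);
    transport.open();

    SailStats.Client client = new SailStats.Client(new TBinaryProtocol(transport));
    // A remote call that reads exactly like a local one
    double rating = client.GetSailorRating("Jane Doe");
    System.out.println("Rating: " + rating);

    transport.close();
  }
}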

Modular Serialization
To make a function call from a client to a server, both client and server must agree on the
representation of data exchanged. The typical approach to solving this problem is to select an
interchange format and then to transform all data to be exchanged into this interchange
format. The process of transforming data to and from an interchange format is called
serialization.

The Apache Thrift framework provides a complete, modular, cross-language serialization layer which supports RPC and standalone serialization. Serialization frameworks make it easy to store data to disk for later retrieval by another application. For example, a service written in C that captures live earthquake data in a C struct could serialize this data to disk using Apache Thrift (see figure 2). The serialization process converts the C struct into a generic Apache Thrift serialized object. At a later time, a Ruby earthquake analysis application could use Apache Thrift to restore the serialized object. The serialization layer takes care of the various differences in data representation between the languages automatically.

Figure 2 - Apache Thrift serialization protocols enable different programming languages to share abstract data types

A distinctive feature of the Apache Thrift serialization framework is that it is not hard-wired to a single serialization protocol. The serialization layer provided by Apache Thrift is modular, making it possible to choose from an assortment of serialization protocols, or even to create custom serialization protocols. Out of the box, Apache Thrift supports an efficient binary serialization protocol, a compact protocol that reduces the size of serialized objects, and a JSON protocol which provides broad interoperability with JavaScript and the web. A ZLib layer can also be added to provide high-ratio compression in some languages.
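
A sketch of what this modularity looks like in Java: the object and the serialization call stay the same, and only the protocol factory changes. Quake is a hypothetical generated struct standing in for any Thrift type:

import org.apache.thrift.TException;
import org.apache.thrift.TSerializer;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TCompactProtocol;
import org.apache.thrift.protocol.TJSONProtocol;

public class ProtocolChoice {
  public static void main(String[] args) throws TException {
    Quake quake = new Quake().setMagnitude(6.1).setTimeSecs(1394069000L);

    // Same object, three wire formats: only the protocol factory changes
    byte[] binary  = new TSerializer(new TBinaryProtocol.Factory()).serialize(quake);
    byte[] compact = new TSerializer(new TCompactProtocol.Factory()).serialize(quake);
    byte[] json    = new TSerializer(new TJSONProtocol.Factory()).serialize(quake);

    System.out.printf("binary=%d compact=%d json=%d bytes%n",
        binary.length, compact.length, json.length);
  }
}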

Performance
Apache Thrift is a good fit in many distributed computing settings, but it excels in the area of high-performance backend services. The choice of prebuilt and custom serialization protocols allows the application designer to choose the most appropriate protocol for the needs of the application, balancing transmission size, speed, portability, and human readability.

Figure 3 - Apache Thrift balances performance with reach and flexibility: a spectrum running from custom protocols (extreme performance) through Apache Thrift (high performance, broad reach) to REST (extreme reach)

Apache Thrift supports compiled languages such as C, C++, Java, and C#, which generally have a performance edge over interpreted languages. This allows performance-critical services to be built in the appropriate language while still providing interoperability with highly productive front-end development languages.

Apache Thrift RPC servers are lightweight, performing only the task of hosting Apache
Thrift services. A selection of servers is available in various languages giving application
designers the flexibility to choose a concurrency model well suited to their application
requirements. These servers are easy to deploy and load balance as standalone processes or
within virtual machines or containers.

Apache Thrift covers a wide range of performance requirements in the spectrum between
custom communications development on one end and REST on the other (see figure 3). The
lightweight nature of Apache Thrift combined with a choice of efficient serialization protocols
allows Apache Thrift to meet demanding performance requirements while offering support for
an impressive breadth of languages and platforms.

Reach
The Apache Thrift framework supports a number of programming languages, operating systems, and hardware platforms in both serialization and service capacities. Companies that are growing and changing rapidly need solutions that give teams the flexibility to integrate with new languages and platforms rapidly and with low friction. Apache Thrift can be a significant business advantage in such settings. Figure 4 illustrates the broad scope of environments within which Apache Thrift is often found.

Figure 4 - Apache Thrift is an effective solution in embedded, enterprise, and web technology environments

The table below provides a list of the languages currently supported directly by Apache Thrift. Note that support for C# enables other .NET/CLR languages, such as F#, Visual Basic, and IronPython. By the same token, support for Java enables most JVM-based languages to interoperate with Apache Thrift, including Scala, Clojure, and Groovy. JavaScript support is provided for browser-based applications and Node.js. Other projects found on the web expand this list further.

Table 1 - Languages supported by Apache Thrift

C             C++           C#            D
Delphi        Erlang        Go            Haskell
Haxe          Java          JavaScript    Lua
Objective-C   OCaml         Perl          PHP
Python        Ruby          Smalltalk     TypeScript

Apache Thrift supports these languages on a range of platforms including Windows, iOS, OS X, Linux, Android, and many other Unix-like systems. Because Apache Thrift is compact and supports C/C++ and Java ME, it is often appropriate for embedded systems. Apache Thrift also supports HTTP[S], WebSocket, and an array of web-technology languages, including Perl, PHP, Python, Ruby, and JavaScript, making it viable in web-oriented environments. Few frameworks can supply the breadth of reach in languages and platforms offered by Apache Thrift.

Interface Evolution
Interface evolution is the process of changing the elements of an interface gradually over time. Modern IDL-based systems like Apache Thrift make it possible to evolve interfaces without breaking interoperability with modules built around older versions of the interface.

For example, consider the previously described earthquake application where a C language program writes a C language struct to disk each time a tremor is reported. Let's assume that the earthquake struct contains fields for the date, time, position, and magnitude. The interface evolution features of Apache Thrift allow new fields, say the earthquake's nearest city and state, to be added to the earthquake struct without breaking other applications reading the serialized data. The Ruby reporting program will continue to read old and new earthquake files, simply ignoring fields it does not recognize. Should the Ruby programmers require the new fields, they may add support for them at their leisure, using default values when old files without the new fields are read.
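
This tolerance is visible directly in generated code. Here is a hedged Java sketch (the Quake struct and its nearest_city field are hypothetical names following the earthquake example):

import org.apache.thrift.TException;
import org.apache.thrift.TDeserializer;
import org.apache.thrift.protocol.TBinaryProtocol;

public class EvolvedReader {
  // oldBytes were written before nearest_city was added to the schema
  public static String nearestCity(byte[] oldBytes) throws TException {
    Quake quake = new Quake();
    new TDeserializer(new TBinaryProtocol.Factory()).deserialize(quake, oldBytes);
    // The new optional field is simply unset in old records
    return quake.isSetNearest_city() ? quake.getNearest_city() : "unknown";
  }
}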

Early RPC systems like SunRPC, DCE RPC, CORBA, and MSRPC supplied little or no support for interface evolution. As platforms grow and requirements change, rigid interfaces can make it hard to extend and maintain RPC-based services. Modern RPC systems such as Apache Thrift provide a number of features which allow interfaces to evolve over time without breaking compatibility with existing systems. Functions can be extended with new parameters, old parameters can be removed, and default values can be supplied. Properly applied, these changes can be made without impacting peers using older versions of the interface.

Modern engineering sensibilities such as Microservices, Continuous Integration (CI), and Continuous Delivery (CD) require systems to support incremental improvements without impacting the rest of the platform. Systems that do not supply some form of interface evolution tend to "break the world" when changed. In such systems, changing an interface means that all of the clients and servers using that interface must be rewritten and/or recompiled, then redeployed in a big bang. Apache Thrift interface evolution features allow multiple interface versions to coexist, making incremental updates simple and natural.

