
Table of Contents: 1. Dremel

Dremel is an interactive SQL query engine for analyzing large amounts of protocol buffer data stored in various sources like log files and Bigtable. It is used internally at Google and also offered externally as BigQuery. Dremel provides different interfaces and client libraries and supports two SQL dialects: DremelSQL and GoogleSQL, with GoogleSQL being the recommended one. Borg is the cluster management system that Google uses to schedule and manage applications across its datacenters. A BorgMaster controls each Borg cell and schedules applications onto machines that meet requirements. Borg monitors health and restarts applications and machines in case of failures.

Uploaded by

Vijay Raj
Copyright
© All Rights Reserved

Table of Contents

1. Dremel
2. Borg

Dremel:

Dremel is an interactive, fast, SQL-based query engine for analyzing huge quantities of protocol
buffer data stored in log files, ColumnIO files, Bigtable, and many other sources. Internally, Dremel is
used to query all kinds of data at Google, including logs and financial data. Dremel is externalized as
part of Google's cloud offering under the name BigQuery.

Dremel offers a variety of interfaces, including the command-line shell, web UIs such as #plx UI,
Contour, and PowerDrill, and client libraries for C++, Java, Python, and R.

To create reports and dashboards for data analysis

To create scheduled pipelines that clean and transform data

Dremel provides two SQL dialects for writing queries: GoogleSQL and DremelSQL. DremelSQL
has been in use at Google since Dremel started. GoogleSQL is a new dialect. If you're new to
Dremel, use GoogleSQL.

GoogleSQL Features

Language is compliant with the SQL 2011 Standard

  SELECT DISTINCT
  Non-equality JOIN condition (aka non-theta joins)
  WITH clause
  Correlated subqueries
  EXISTS predicate
  UNION ALL is supported (a comma in the FROM clause no longer means
  UNION ALL; it means CROSS JOIN)

Exact and scalable distinct aggregates

  SUM(DISTINCT) and COUNT(DISTINCT)

Better query optimization

  Filter pushdown through JOIN and other operators
  Auto-sharding (no more JOIN EACH/ALL or GROUP EACH)

Improved handling of structured data

  New STRUCT data type as a container for ordered, typed fields
  New ARRAY data type to represent repeated data
  Protocol Buffers are first-class data types with the ability to operate on and
  return full proto messages.
  Non-leaf fields can be referenced directly in your SQL statements

Shared across query engines at Google, including F1 and Spanner
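
The comma semantics and the distinct aggregates listed above can be tried out in any engine that follows standard SQL. The sketch below uses Python's built-in sqlite3 module purely as a stand-in, since access to Dremel is not assumed; the tables a and b are made up for illustration. It shows a comma in the FROM clause behaving as a cross join, UNION ALL written explicitly, and COUNT(DISTINCT ...) combined with a WITH clause.

```python
import sqlite3

# SQLite is only a stand-in for a standard-SQL engine; this is not Dremel.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (2), (3), (3);
""")

# Comma in FROM: every row of a is paired with every row of b (CROSS JOIN).
cross_rows = conn.execute("SELECT a.x, b.x FROM a, b").fetchall()

# UNION ALL: all rows of a and all rows of b, duplicates kept.
union_rows = conn.execute(
    "SELECT x FROM a UNION ALL SELECT x FROM b"
).fetchall()

# Exact distinct aggregate over the combined rows, using a WITH clause.
distinct_count = conn.execute("""
    WITH combined AS (SELECT x FROM a UNION ALL SELECT x FROM b)
    SELECT COUNT(DISTINCT x) FROM combined
""").fetchone()[0]

print(len(cross_rows), len(union_rows), distinct_count)  # prints: 6 5 3
```

With 2 rows in a and 3 rows in b, the comma form returns 2 x 3 = 6 pairs, while UNION ALL returns 2 + 3 = 5 rows.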

DremelSQL Features

Allows column aliases which do not match the underlying record structure.

Uses a comma as a UNION operator for more terse queries over logs data.

Has some difficulties handling independently repeating fields.

Implicitly flattens results when using:

  ORDER BY
  GROUP BY
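
To make "flattens" concrete: flattening turns a record with a repeated field into one row per repeated value, copying the non-repeated fields onto each row. The sketch below illustrates that general idea in plain Python. The record layout and the flatten helper are hypothetical, and the code does not claim to reproduce DremelSQL's exact flattening rules.

```python
from collections import Counter


def flatten(records, repeated_field):
    """Yield one flat row per value of the repeated field (hypothetical helper)."""
    for record in records:
        for value in record[repeated_field]:
            row = {k: v for k, v in record.items() if k != repeated_field}
            row[repeated_field] = value
            yield row


# Made-up log records; "tags" plays the role of a repeated field.
records = [
    {"user": "alice", "tags": ["search", "ads"]},
    {"user": "bob", "tags": ["search"]},
]

flat_rows = list(flatten(records, "tags"))

# Grouping on the repeated field then counts flattened rows, not records.
rows_per_tag = Counter(row["tags"] for row in flat_rows)

print(flat_rows)
print(rows_per_tag)
```

Two input records become three flat rows here, which is why aggregates computed after flattening can differ from per-record counts.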

Advanced users may wish to access Dremel programmatically through various client libraries
instead of using the Dremel client.

C++

Python

Java

DremelR

Borg:
Borg is an architecture for scheduling and managing applications across all Google datacenters. A master
server (BorgMaster) and its replicas control each Borg "cell" (typically a cluster) in a datacenter.
Within a cell, the master schedules applications onto machines that have the appropriate resources
available. Borg also monitors application and machine health and restarts them in case of failure.

Borg is a key component of Google's cluster management system. It controls the distribution of
jobs within a machine cluster, assigning jobs to machines in a way that satisfies constraints and
requirements (e.g., memory requirements), and reassigning jobs to other machines as
necessary (e.g., when a machine fails). Borg was designed to perform such activities on a
massive scale.

Borg significantly reduces the amount of management overhead required to keep clusters up
and running. This is a huge issue when considering the scale at which Google operates. Also,
by establishing a common pool of machines controlled by a master scheduler, there are greater
opportunities for resource sharing between jobs, and resource utilization problems are easier to
identify.
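
The job assignment and reassignment described in this section can be sketched as a toy scheduler. This is a minimal illustration that assumes a first-fit placement policy with memory as the only constraint. The Job, Machine, and Cell classes are hypothetical and do not represent Borg's actual algorithm or API.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    memory_gb: int


@dataclass
class Machine:
    name: str
    memory_gb: int
    healthy: bool = True
    jobs: list = field(default_factory=list)

    def free_memory(self):
        return self.memory_gb - sum(job.memory_gb for job in self.jobs)


class Cell:
    """Toy stand-in for a master that schedules jobs within one cell."""

    def __init__(self, machines):
        self.machines = machines

    def schedule(self, job):
        # First fit (an assumption): take the first healthy machine
        # with enough free memory for the job.
        for machine in self.machines:
            if machine.healthy and machine.free_memory() >= job.memory_gb:
                machine.jobs.append(job)
                return machine.name
        return None  # no machine satisfies the constraint

    def handle_failure(self, machine_name):
        # Mark the machine unhealthy and reassign its jobs elsewhere.
        failed = next(m for m in self.machines if m.name == machine_name)
        failed.healthy = False
        displaced, failed.jobs = failed.jobs, []
        return {job.name: self.schedule(job) for job in displaced}


cell = Cell([Machine("m1", 8), Machine("m2", 16)])
first = cell.schedule(Job("web", 6))     # fits on m1
second = cell.schedule(Job("batch", 6))  # m1 has only 2 GB free, so m2
moved = cell.handle_failure("m1")        # "web" is reassigned to m2
print(first, second, moved)
```

Because all machines sit in one pool, the job displaced from the failed machine can use spare capacity on any other machine, which is the resource-sharing benefit mentioned above.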

A physical datacenter facility houses one or more clusters of machines sharing administrative
resources (a lock server namespace, security services, machine repair services, etc.). Each
such cluster typically also corresponds to one Borg cell. (They also often share the same
two-letter name.) A Borg cell contains a Borgmaster (and its replicas) and a large number of
slave machines, each of which runs a Borglet daemon process. One cell may contain anywhere
from a few hundred to tens of thousands of machines.

We shall manage Borg jobs and watch them from Sigma.

All Bigtable servers run on Borg.

Plx:
#plx is a complete big data analysis and visualization platform. You can search our data catalog for
relevant data, run interactive queries on billion-row datasets, visualize the results in configurable
dashboards, and create data pipelines to automatically import and process your data. The goal of
#plx is to unify and simplify access to your data, whether it be in Dremel, Tenzing, F1, or some other
system.

#plx surfaces four objects:

#plx Tables are the data sources

#plx Scripts analyze #plx Tables and can be used to create new tables

#plx Workflows schedule the automated execution of #plx Scripts and other processing
required to create #plx Tables

#plx Dashboards visualize #plx Tables and the results of #plx Scripts. You can access these
through the #plx site or the Dremel or Tenzing command-line tools or programming APIs.

You can also arrange to run your SQL scripts automatically on a schedule.
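
How the four objects relate can be sketched with a small stand-in. The example below uses Python's built-in sqlite3 module rather than #plx itself; the events table, the script text, and the text-bar "dashboard" are all made up for illustration and say nothing about how #plx is actually implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table: the data source (a made-up events table).
conn.executescript("""
    CREATE TABLE events (day TEXT, clicks INTEGER);
    INSERT INTO events VALUES ('mon', 3), ('mon', 4), ('tue', 5);
""")

# Script: a query that analyzes a table and creates a new one.
daily_clicks_script = """
    CREATE TABLE daily_clicks AS
    SELECT day, SUM(clicks) AS clicks FROM events GROUP BY day
"""

# Workflow: the ordered scripts to execute. A real workflow would run
# them on a schedule; here they run once, in order.
workflow = [daily_clicks_script]
for script in workflow:
    conn.execute(script)

# Dashboard: a visualization of the derived table, here just text bars.
rows = conn.execute(
    "SELECT day, clicks FROM daily_clicks ORDER BY day"
).fetchall()
for day, clicks in rows:
    print(f"{day} {'#' * clicks}")
```

The derived table daily_clicks is itself a table, so further scripts and dashboards could build on it in the same way.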
