
IDS Unit-1-Handwritten

Introduction to Data Science- Engineering 4th Semester

Uploaded by

prohithcsd216744
Copyright
© All Rights Reserved
UNIT-I
Introduction to Data Science

Contents
- Definition of data science
- Big data and data science hype, getting past the hype
- Datafication, current landscape of perspectives
- Statistical inference, populations and samples
- Statistical modeling, fitting a model
- Exploratory data analysis

Definition of data science
1. Data science is the area of study that extracts meaningful insights and knowledge from vast amounts of data using various scientific methods, algorithms and processes.
2. Data science is a multidisciplinary field that allows you to extract knowledge from structured or unstructured data.
3. Data science enables you to translate a business problem into a research project and then translate it back into a practical solution.
4. Data science refers to the set of theories and techniques from many fields and disciplines that are used to investigate and analyze a large amount of data to help decision makers in many industries such as science, engineering, e-commerce, economics, politics, finance, and education.

Data science process (lifecycle)
[Figure: data science lifecycle: Discovery, Data preparation, Model planning, Model building, Operationalize, Communicate results]
1. Discovery: the discovery step involves acquiring data from all the identified internal and external sources, which helps you answer the business question.
2. Preparation: data can have many inconsistencies such as missing values, blank columns, or an incorrect data format, which need to be cleaned.
3. Model planning: in this stage you need to determine the method and technique to draw the relation between input and output variables.
4. Model building: the actual model building process starts here. The data scientist distributes the datasets for training and testing.
5. Operationalize: you deliver the final baselined model with reports, code and technical documents in this stage.
6. Communicate results: in this stage, the key findings are communicated to all stakeholders.

Applications of data science
1. Internet search: Google search uses data science technology to return a specific result within a fraction of a second.
2. Recommendation systems: used to create recommendations, for example "suggested friends" on Facebook or "suggested videos" on YouTube.
3. Image and speech recognition: speech recognition systems like Siri, Google Assistant and Alexa run on data science techniques. Moreover, Facebook recognizes your friend when you upload a photo with them.
4. Gaming world: EA Sports, Sony and Nintendo are using data science technology. This enhances your gaming experience.
5. Online price comparison: PriceRunner, Junglee and Shopzilla work on data science mechanisms.

Why data science is important
1. To process large volumes of data: according to IDC, global data will grow to 175 zettabytes by 2025, and data science is needed to process such a large volume of data.
2. Data science enables companies to efficiently understand complex structured data from multiple sources and derive valuable insights to make smarter data-driven decisions.
3. Data science is widely used in various industry domains such as marketing, healthcare, finance, banking, policy work and more.

What is big data
1. Big data is a collection of data that is huge in volume, yet growing exponentially with time.
2. It is data of such large size and complexity that none of the traditional data management tools can store it or process it efficiently.

Examples
1. Social media: statistics show that 500+ terabytes of new data get ingested into the databases of the social media site Facebook every day. This data is mainly generated through photo and video uploads, message exchanges, posting comments, etc.
2. The New York Stock Exchange is an example of big data: it generates about one terabyte of new trade data per day.
3. A single jet engine can generate 10+ terabytes of data in 30 minutes of flight time. With many thousand flights per day, the generation of data reaches
up to many petabytes.

Types of big data
1. Structured data: any data that can be stored, accessed and processed in a fixed format is termed structured data. Nowadays we foresee issues when the size of such data grows to a huge extent; typical sizes are in the range of multiple zettabytes.

   Example: an Employee table (cells that cannot be read in the source are shown as …)

   Employee ID   Employee name      Gender   Department   Salary
   2365          Rajesh Kulkarni    Male     Finance      …
   …             Pratibha Joshi     Female   …            …
   …             Shushil Roy        Male     Admin        …
   …             Shubhojit Das      Male     Finance      …
   …             … Sane             Female   Finance      …

2. Unstructured data: any data with unknown form or structure is classified as unstructured data. In addition to its huge size, unstructured data poses multiple challenges in terms of processing it to derive value. A typical example of unstructured data is a heterogeneous data source containing a combination of simple text files, images, videos, etc. (for example, the output of a Google search).
3. Semi-structured data: semi-structured data can contain both forms of data. It looks structured in form, but it is not actually defined by, for example, a table definition in a relational DBMS. An example of semi-structured data is data represented in an XML file.

Characteristics of big data
1. Volume: refers to the amount of data that exists. If the volume of data is large enough, it can be considered big data.
2. Variety: refers to the heterogeneous sources and the nature of the data, both structured and unstructured.
3. Velocity: refers to the speed of generation of data. How fast the data is generated and processed to meet the demands determines the real potential in the data.
4. Variability: refers to the inconsistency which the data can show at times, thus hampering the process of handling and managing the data effectively.
5. Value: refers to the value that big data can provide; it relates directly to what organizations can do with the collected data.

Applications of big data
1. Banking and insurance sectors
2. Communications, media and entertainment
3. Healthcare providers
4. Education
5. Manufacturing and natural resources
6. Government
7. Retail and wholesale trade
8. Transportation
9. Energy and utilities

Limitations of big data
1. Size: big data sets can require considerable resources to store.
2. Formatting: significant editing and cleaning efforts may be required before the data can be analyzed.
3. Quality control: this can be difficult and often has to be done through …, often at more cost than for traditional data sets.
4. Analysis: the analytical approaches are still new and imperfect, although these will continue to improve over time.
5. Security and privacy concerns.
6. Data science is a blurry term: data science is a very general term and does not have a definite definition. While it has become a buzzword, it is very hard to write down the exact meaning of "data scientist".
7. Mastering data science is nearly impossible: being a mixture of many fields, data science stems from statistics, computer science and mathematics. It is far from possible to master each field and be an equal expert in all of them.
8. A large amount of domain knowledge is required: another disadvantage of data science is its dependency on domain knowledge. A person with a considerable background in statistics and computer science will find it difficult to solve a data science problem without the relevant background knowledge.
9. Arbitrary data may yield unexpected results: a data scientist analyzes the data and makes careful predictions in order to facilitate the decision-making process. Many times the data provided is arbitrary and does not yield the expected results.
10. Problem of data privacy: for many industries, data is their fuel. Data scientists help companies make data-driven decisions. However, the data utilized in the process may breach the privacy of customers.
Big data and data science hype
1. Given the hype around data science, the reality is that most companies still fail to use much of the data they collect and store during their business activities.

Why now
- Technology makes this possible: infrastructure for large-scale data processing, increased memory and bandwidth.

Datafication
1. Datafication is the process of taking all aspects of life and turning them into data. Datafication refers to transforming most aspects of a business into quantifiable data that can be tracked, monitored and analyzed.
2. It refers to the use of tools and processes to turn an organization into a data-driven enterprise.

Examples
1. Twitter datafies stray thoughts.
2. LinkedIn datafies professional networks.
3. Google's augmented-reality glasses datafy the gaze.

Current landscape of perspectives
- Data science is not merely statistics, hacking or mathematics. Data science is the civil engineering of data. It includes:
  - statistics (traditional mathematical analysis)
  - data munging (parsing, scraping and formatting data)
  - visualization (graphs, tools, etc.)
- It is a practical knowledge of tools and materials along with a theoretical understanding of what is possible.

The skill areas are:
1. Math and statistics knowledge: mathematics is the critical part of data science. Mathematics involves the study of quantity, structure, space and change. For a data scientist, a good knowledge of mathematics is essential. Statistics is one of the most important components of data science. Statistics is a way to collect and analyze numerical data in large amounts and find meaningful insights from it.
2. Substantive (domain) expertise: substantive knowledge is the knowledge specific to the area where data science is applied. For example, if you are applying data science to problems in a particular field, you should have substantive knowledge of that field.
3. Hacking skills: hacking skills refer to computer science skills.
To effectively control the data you need to have some programming skills: you need to be comfortable at the command line, be able to handle files of different formats, program algorithms that will modify the data, etc.
4. Machine learning: machine learning is the backbone of data science. Machine learning is all about training a machine so that it can act like a human brain. In data science, we use various machine learning algorithms, such as supervised learning, unsupervised learning and reinforcement learning algorithms, to solve problems. Machine learning algorithms that are broadly used in data science include regression, decision trees, clustering, principal component analysis, support vector machines, Naive Bayes, artificial neural networks and the Apriori algorithm.

Statistics
1. Statistics is a branch of mathematics that deals with the collection, analysis, interpretation and presentation of numerical data.
2. The main purpose of statistics is to draw an accurate conclusion about a greater population using a limited sample.

Types of statistics
1. Descriptive statistics: describes the data.
2. Inferential statistics: helps to make predictions from the data.

Statistical inference
3. Statistical inference means guessing, that is, making an inference about something.
4. Statistical inference is the discipline that concerns itself with the development of procedures, methods and theorems that allow us to extract meaning and information from data that has been generated by stochastic (random) processes.
5. The overall process of
   1. going from the activities or processes in the world to the data,
   2. manipulating the data, and then
   3. going from the data back to the world
   is the field of statistical inference.

Example:
1. Process: sending and receiving emails.
2. Data: the emails sent and received every day for the last 3 months.
3. Inference: find how many emails will be sent or received in the next n months.

Statistical inference: processes and data
1. Process: the activities or actions happening in and around the world are called processes. One should know about, describe, understand and make sense of these processes to understand the world better; understanding these processes is part of the solution to problems.
   Data: data represents the traces of real-world processes, and which traces we gather is decided by our data collection or sampling method.
2. Once we have all the data, to derive new ideas and to simplify the captured traces (data) into something more comprehensible, we should find mathematical models or functions of the data, known as statistical models or estimators.
3. Note that both the process and the data are random and uncertain in nature.

Example: a card is drawn from a shuffled pack of cards. This trial is repeated 400 times, and the suits drawn are given below.

   Suit                 Spades   Clubs   Hearts   Diamonds
   No. of times drawn   90       100     120      90

Question: when a card is drawn at random, what is the probability of getting a diamond card?
Solution:
   Total number of trials = 400
   Number of trials in which a diamond card is drawn = 90
   Therefore, $P(\text{diamond card}) = 90/400 = 0.225$

Populations and samples
1. Population refers to the entire group of individuals about whom you want to draw conclusions.
2. Sample refers to the subset of people (from the population) from which you will be collecting data. Data under investigation = part of the population.
3. In statistical inference, the term population denotes the set of objects or units, such as tweets, photographs or stars.
4. The set of characteristics that are measured or extracted for each of those objects is called the observations; N denotes the number of observations from the population.

Example:
- Population: the emails sent last year by employees.
- Observation: the sender's name, the list of recipients, date sent, text of the email, number of characters and sentences in the email, number of verbs in the email, and the length of time until the first reply.

- Sample refers to a subset of the units, of size n, from the population that are examined in order to draw conclusions and make inferences about the population.
- There are different ways that can be followed for selecting a subset of data, which are called sampling mechanisms.
- Note that some sampling mechanisms may introduce biases into the data and distort it. Once that happens, any conclusion you draw will simply be wrong and distorted.

Example: employee emails
- Sample 1: 1/10 of the employees chosen at random, and all their emails.
- Sample 2: 1/10 of all emails chosen at random.
- But if we counted how many email messages each person sent, and used that to estimate the underlying distribution of emails sent by all individuals, we might get different answers.

Populations vs samples

   Basis for comparison   Population                                   Sample
   Meaning                The collection of all elements possessing    A subgroup of the members of the
                          common characteristics that comprise the     population chosen for participation
                          universe                                     in the study
   Includes               Each and every unit of the group             Only a handful of units of the population
   Characteristic         Parameter                                    Statistic
   Data collection        Complete enumeration or census               Sample survey or sampling
   Focus on               Identifying the characteristics              Making inferences about the population

The big data world
1. The big data world is defined by the enormous amount of ever-expanding, diverse data being generated, collected and analyzed.
2. While large datasets allow us to reveal information about general trends, smaller samples contained within the large data set are still useful.
3. For example, consider how the concept of personalization works (personalized medicine). Here the large data set is used to create smaller, homogeneous data sets to make predictions within smaller groups.
4. In this context, one can apply the theory of populations and samples to derive useful results, but with far smaller data sets than the large data set originally considered.

Issues that need to be addressed
1. Sampling solves some engineering challenges
2. Hidden biases of big data
3. Sampling method
4. Underlying assumptions
5. Sampling distribution

Modeling
1. Modeling is describing a situation in reality mathematically, for the purpose of solving a problem or finding an answer to a question in that situation.
2. The modeling process is an iterative process that engages creativity and inventiveness, and in which mathematical, scientific and technical knowledge is applied to describe various situations.
3. The modeling process consists of the activities related to:
   - determining a strategy to set up the model
   - analyzing, to get to the bottom of the problem
   - identifying variables
   - setting up relations between variables
   - deploying mathematical and computational tools
4. An architect captures the attributes of buildings through blueprints and three-dimensional, scaled-down versions.
5. A molecular biologist captures protein structure with a three-dimensional visualization of the connections between amino acids.
6. Note that a model is an artificial construction where all extraneous detail has been removed or abstracted.

[Figure: activities around a model (physical theory, observation, reasoning, experiment, measurement, research data)]

- On the left-hand side are activities relating to research, such as …, that are used in the model and/or can be used to assess the modeling.
- On the right-hand side are creative activities that must lead to the development of a model, with theories and hypotheses to be tested.
How to build a model
The key steps involved in data science modeling are:

Step 1: Understanding the problem
The first step involved in data science modeling is understanding the problem. A data scientist listens for keywords and phrases when interviewing a line-of-business expert about a business challenge, and breaks the problem down into a procedural flow, which always involves a holistic understanding of the business challenge.

Step 2: Data extraction
Not just any data, but the unstructured data pieces you collect that are relevant to the business problem you are trying to address. Data extraction is done from various sources: online, surveys and existing databases.

Step 3: Data cleaning
Data cleaning is useful because you need to sanitize data while gathering it. The following are some of the most typical causes of data inconsistencies and errors:
1. Duplicate items gathered from a variety of databases
2. Errors in the precision of the input data
3. Variables with missing values across multiple databases

Step 4: Exploratory data analysis
Exploratory data analysis (EDA) is a robust technique for familiarising yourself with the data and extracting useful insights.

Step 5: Feature selection
Feature selection is the process of identifying and selecting the features that contribute the most to the prediction variable or output that you are interested in, either automatically or manually.

Step 6: Incorporating machine learning algorithms
This is one of the most crucial processes in data science modeling, as the ML algorithm aids in creating a usable data model.
1. Supervised learning
   - Linear regression
   - Random forest
   - Support vector machines
2. Unsupervised learning
   - KNN (listed here in the notes, although k-nearest neighbours is usually classed as a supervised method)
   - K-means
   - Hierarchical clustering
3. Reinforcement learning
   - Q-learning
   - State-Action-Reward-State-Action (SARSA)
   - Deep Q-Network

Step 7: Testing the models
The data model is applied to the test data to check whether it is accurate and has all the desirable features. You can further test your data model to identify any adjustments that might be required to enhance the performance and achieve the desired results.
Step 8: Deploying the model
The model which provides the best result based on the test findings is completed and deployed in the production environment once the desired result is achieved through proper testing as per the business needs.

Statistical modeling
1. A statistical model is a type of mathematical model that comprises the assumptions undertaken to describe the data generation process.
2. Type of mathematical model: a statistical model is non-deterministic, unlike other mathematical models where variables have specific values. Variables in a statistical model are stochastic, i.e. they have probability distributions.

How to build a statistical model
- While building a statistical model, the important step is to choose the best statistical model based on the requirements.
- Ask the following questions to identify your requirements:
  1. Do you want to address a specific question, or do you want to make forecasts from a set of variables?
  2. What is the number of explanatory (independent) and dependent variables available?
  3. What is the number of variables you need to include in the model?

Issues
The main issues involved in building a model are:
1. Understanding the process behind the problem
2. Assumptions about the problem
3. Simple vs complex
4. Mathematical expressions vs … methods

Probability distribution: variables
1. A variable is a quantity whose value changes.
2. A discrete variable is a variable whose value is obtained by counting. Example: the number of students present.
3. A continuous variable is a variable whose value is obtained by measuring. Example: the height of all students in a class.
4. A random variable is a variable whose value is a numerical outcome of a random phenomenon.
5. A probability distribution of a random variable X tells what the possible values of X are and how probabilities are assigned to them.
- A random variable can be discrete or continuous.

Probability distributions
1. Statistical models are non-deterministic models whose variables are stochastic (random) in nature, i.e. they have probability distributions. So probability distributions are the foundation of statistical models.
2. A probability distribution is a mathematical function that describes the probability of the different possible values of a variable. Probability distributions are often depicted using graphs or probability tables. Example: one coin flip.

Types
There are 3 types of probability:
1. Probability distribution of one random variable
2. Probability distribution of multiple random variables
   - joint probability
   - conditional probability
3. Probability of independence and exclusivity

Probability distribution of one random variable
1. Probability quantifies how likely a particular outcome is for a random variable, such as the flip of a coin, the roll of a die, or drawing a playing card from a deck.
2. For a random variable X, P(X) is a function that assigns a probability to all values of X:
   probability density of X = P(X)
3. Probability is calculated as the number of desired outcomes divided by the total number of possible outcomes:
   $P = \dfrac{\text{number of desired outcomes}}{\text{total number of possible outcomes}}$
- For example, the probability of a die rolling a 5 is calculated as one outcome of rolling a 5 (1) divided by the total number of discrete outcomes (6), i.e. 1/6, or about 0.1666, or about 16.666%.
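
The counting rule above can be sketched in a few lines of Python. This is only an illustration under the assumption that all outcomes are equally likely; the helper name `probability` is made up for this sketch and is not part of the notes.

```python
from fractions import Fraction

def probability(desired_outcomes, possible_outcomes):
    """Classical probability: desired outcomes / all equally likely outcomes."""
    matching = [o for o in possible_outcomes if o in desired_outcomes]
    return Fraction(len(matching), len(possible_outcomes))

die_faces = [1, 2, 3, 4, 5, 6]

# Probability of rolling a 5 with a fair six-sided die.
p_five = probability({5}, die_faces)
print(p_five, round(float(p_five), 4))

# The probabilities of all faces together form a probability distribution,
# so they must add up to exactly 1.
distribution = {face: probability({face}, die_faces) for face in die_faces}
total = sum(distribution.values())
print(total)
```

Using `Fraction` keeps the result exact (1/6) instead of a rounded decimal, which makes the "sums to 1" check on the whole distribution exact as well.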
Example
1. Let X be a random variable: the amount of time until the next bus arrives.
2. Let p(x) be its probability distribution, which takes positive real values. Let us assume that the time until the arrival of the next bus follows $p(x) = 2e^{-2x}$.
3. Then, if you want to calculate the probability (likelihood) of the next bus arriving between 12 and 13 minutes, it is
   $\int_{12}^{13} 2e^{-2x}\,dx$

Probability distribution of multiple random variables: joint probability
1. The probability of two (or more) events is called the joint probability. The joint probability of two or more random variables is referred to as the joint probability distribution.
2. For the random variables X and Y, P(X, Y) is a joint probability, and it can be written as
   $P(X \text{ and } Y) = P(X \mid Y)\,P(Y)$
3. The calculation of the joint probability is sometimes called the fundamental rule of probability, the product rule of probability, or the chain rule of probability.

Example: what is the joint probability of drawing a card that is a 10 and that is black?
   Event A: the probability of drawing a 10 = 4/52 = 0.0769
   Event B: the probability of drawing a black card = 26/52 = 0.50
   Thus the joint probability of events A and B is $(4/52) \times (26/52) = 0.0385 \approx 3.9\%$

Conditional probability
1. The probability of an event given the occurrence of another event is called the conditional probability.
2. The conditional probability of one variable given one or more other random variables is referred to as the conditional probability distribution.
3. The conditional probability of event A given event B is calculated as follows:
   $P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)}$
Note: this calculation assumes that the probability of event B is not zero.

Example: a student took two tests. The probability of passing both tests is 0.6. The probability of passing the first test is 0.8. What is the probability of passing the second test given that she has passed the first?
   $P(\text{second} \mid \text{first}) = \dfrac{P(\text{first and second})}{P(\text{first})} = \dfrac{0.6}{0.8} = 0.75$
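
The two worked examples above can be checked with a short Python sketch. It assumes a standard 52-card deck and takes the rank in the card example to be a 10 (an assumption where the notes are hard to read; any single rank gives the same numbers). The function name `conditional` is illustrative only.

```python
from fractions import Fraction
from itertools import product

def conditional(p_a_and_b, p_b):
    """P(A | B) = P(A and B) / P(B); only defined when P(B) is not zero."""
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return p_a_and_b / p_b

# Build a standard deck: 13 ranks x 4 suits, each suit with a colour.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suit_colour = {"spades": "black", "clubs": "black",
               "hearts": "red", "diamonds": "red"}
deck = list(product(ranks, suit_colour))  # pairs of (rank, suit)

p_ten = Fraction(sum(1 for r, _ in deck if r == "10"), len(deck))
p_black = Fraction(sum(1 for _, s in deck if suit_colour[s] == "black"), len(deck))

# Rank and colour are independent, so the product rule reduces to P(A) * P(B).
p_joint_product = p_ten * p_black

# Cross-check by counting the black tens directly.
p_joint_counted = Fraction(
    sum(1 for r, s in deck if r == "10" and suit_colour[s] == "black"),
    len(deck),
)

# Two-test example: P(both) = 0.6 and P(first) = 0.8.
p_second_given_first = conditional(Fraction(6, 10), Fraction(8, 10))

print(p_joint_product, p_joint_counted, p_second_given_first)
```

Counting the black tens directly gives the same value as the product, which is what independence of rank and colour predicts; for dependent events the general form $P(A \text{ and } B) = P(A \mid B)\,P(B)$ has to be used instead.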
Fitting a model
1. Fitting a model refers to adjusting the parameters in the model to improve its accuracy.
2. The process involves running an algorithm on data for which the target variable is known, to produce a mathematical model.
3. Then the model's outcomes are compared to the real observed values of the target variable to determine the accuracy.
4. The next step involves adjusting the algorithm's standard parameters in order to reduce the level of error and make the model more accurate.
5. This process is repeated several times until the model finds the optimal parameters to make predictions with substantial accuracy.

Overfitting and underfitting
1. When noise in the training data is picked up and learned as concepts by the model, the model overfits.
2. Overfitting negatively impacts the performance of the model on new data.
3. The model will perform well on the training set but very poorly on the test set. This negatively impacts the model's ability to generalize and make predictions for new data.

Underfitting
- It happens when the model can neither sufficiently model the training data nor generalize to new data.
4. An underfit model is not a suitable model; this will be obvious, as it will have poor performance on the training data.

The complete picture of the data science process can be depicted as shown below.
[Figure: data science process: raw data collected, data processed, clean dataset, exploratory data analysis, ML algorithms and statistical models, communication and visualizations, data product, real world]

- Inside the real world there is lots of raw data: logs, Olympics records, Enron employee emails, or recorded genetic material.
- We want to process this to make it clean for analysis, so we build and use pipelines of data munging: joining, scraping, wrangling, or whatever you want to call it. To do this we use tools such as Python, shell scripts, R, or SQL, or all of them.
- Once we have this clean dataset, we should be doing some kind of EDA. In the course of doing EDA, we may realize that it is not actually clean because of duplicates, missing values, absurd outliers, and data that was not actually logged or was incorrectly logged.
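
The kind of cleanliness checks mentioned above (duplicates, missing values, absurd outliers) might look like the following Python sketch. The records and the field names `id` and `age` are invented for illustration, and the 1.5 x IQR rule is just one common convention for flagging outliers, not something prescribed by the notes.

```python
import statistics

# Hypothetical records; None marks a missing value. All values are invented.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": 29},
    {"id": 2, "age": 29},    # exact duplicate of the previous row
    {"id": 3, "age": None},  # missing value
    {"id": 4, "age": 31},
    {"id": 5, "age": 27},
    {"id": 6, "age": 340},   # absurd outlier, probably a logging error
    {"id": 7, "age": 30},
    {"id": 8, "age": 33},
]

def find_duplicates(rows):
    """Return rows that repeat an earlier row exactly."""
    seen, duplicates = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
    return duplicates

def find_missing(rows, field):
    """Return rows where the given field has no value."""
    return [row for row in rows if row[field] is None]

def find_outliers(rows, field):
    """Flag values outside 1.5 * IQR of the quartiles (a common rule of thumb)."""
    values = [row[field] for row in rows if row[field] is not None]
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]

duplicates = find_duplicates(records)
missing = find_missing(records, "age")
outliers = find_outliers(records, "age")
print(len(duplicates), [row["id"] for row in missing], outliers)
```

In practice these checks usually feed back into the processing step: the dataset is cleaned again and the EDA is repeated.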
- Next, we design the model to use some algorithm such as k-nearest neighbours (k-NN), linear regression, Naive Bayes, or something else. The model we choose depends on the type of problem we are trying to solve.
- We can then interpret, visualize, report or communicate our results. This could take the form of reporting the results up to the business to make decisions.
- Alternatively, the goal may be to build or prototype a data product, e.g. a spam classifier, a search ranking algorithm or a recommendation system.

Exploratory data analysis
- Exploratory data analysis (EDA) is an approach to analyzing the data using visual techniques.
- It is used to discover trends and patterns, or to check assumptions, with the help of statistical summaries and graphical representations.
- It is a significant step to take before diving into statistical modeling or machine learning, to ensure the data is really what it is claimed to be and that there are no obvious errors.
- EDA should be part of data science projects in every organization.

Why exploratory data analysis?
- The primary goal of exploratory data analysis is to uncover the underlying structure. The structure of the various data sets determines the trends, patterns and relationships among them.
- A business cannot come to a final conclusion or draw assumptions from a huge quantity of data; rather, it requires taking an exhaustive look at the data set through an analytical lens. Therefore, performing exploratory data analysis allows data scientists to detect errors, debunk assumptions and, ultimately, select an appropriate predictive model.

Objectives
- The goal of EDA is to allow data scientists to get insight into a dataset and, at the same time, provide the specific outcomes that a data scientist would want to extract from the dataset.
It includes:
- a list of outliers
- estimates for parameters
- uncertainties for those estimates
- a list of all important factors
- conclusions or assumptions as to whether certain individual factors are statistically essential
- optimal settings
- a good predictive model

Tools
- The basic tools of EDA are plots, graphs and summary statistics.
- EDA as a method means systematically going through the data in the following ways:
  - plotting the distributions of all variables (box plots)
  - plotting time series of the data
  - transforming variables
  - looking at all pairwise relationships between variables using scatterplot matrices
  - generating summary statistics
  - computing the mean, minimum, maximum, upper and lower quartiles, and identifying outliers

Introduction to the R language
1. R is an open-source programming language and environment used for statistical analysis, data visualization and data science.
2. Being open source, R has a community that continuously works to improve the environment and helps members worldwide to improve and innovate.
3. It has over 10,000 different libraries and packages to enhance and add to its already significant capabilities.

History
1. R is an extension of the S programming language, which was created by John Chambers at Bell Laboratories in 1976. S was a premier tool for statistical research.
2. In 1992, Ross Ihaka and Robert Gentleman created R at the University of Auckland, New Zealand, as a tool that their students could learn and use easily.
3. Ihaka and Gentleman released the initial version in 1995, and a stable beta version was released in 2000.

Features
1. Open source: R is an open-source environment. It is cost-effective for projects of any size and is widely available.
2. Strong graphical capabilities: R has various libraries and packages available for plotting attractive and elegant graphs. These can also be used to create interactive graphs for data-driven storytelling.
3. Active community: R has a massive community that works actively to improve and add to its abilities.
   CRAN, the Comprehensive R Archive Network, has over 10,000 free extensions that can be used for everything from high-quality plots to creating interactive web apps.
4. R can perform complex mathematical and statistical operations on vectors, matrices, data frames, arrays and other data objects of varying sizes.
5. R is an interpreted language and does not need a compiler, which makes R code highly portable.
6. R is a comprehensive programming language that supports object-oriented as well as procedural programming, with generic and first-class functions.
7. R supports both a command-line interface and a graphical user interface, so users can program at the console level and can also work with scripts.
8. R supports a wide variety of packages to handle problems in areas such as the financial sector, healthcare, high-performance computing, distributed computing, statistics and many more.
9. Compatible with various other technologies: R can integrate with a number of different technologies and programming languages.

Disadvantages
1. R seems easy to learn at the beginning, but it is hard to master.
2. The command-based interface is highly inconvenient for statisticians and non-computing professionals to use.
3. R commands are not concerned with memory management, so R can consume a large amount of memory.
4. With such a large number of packages available, the quality of some packages can be poor.

R environment setup: install R on Windows
Step 1: Go to the CRAN project website.
Step 2: Click on the "Download R for Windows" link.
Step 3: Click on the "base" subdirectory link or the "install R for the first time" link.
Step 4: Click the "Download R 3.… for Windows" link and save the executable .exe file.
Step 5: Run the .exe file and follow the installation instructions.
   5a. Select the desired language and then click OK.
   5b. Read the license agreement and click Next.
   5c. Select the components you wish to install, then click Next.
   5d. Enter or browse to the folder path where you wish to install R, then confirm by clicking Next.
   5e. Select additional tasks, such as creating desktop shortcuts, then click Next.
   5f. Wait for the installation process to complete.
   5g. Click Finish to complete the installation.

R environment setup: install RStudio on Windows
Step 1: To begin, go to the RStudio download page and click on the download button for RStudio Desktop.
Step 2: Click on the link for the Windows version of RStudio and save the .exe file.
Step 3: Run the .exe file and follow the installation instructions.
   3a. Click Next on the welcome window.
   3b. Enter or browse to the path of the installation folder and click Next to proceed.
   3c. Select the folder for the Start menu shortcut, or click "Do not create shortcuts", and then click Next.
   3d. Wait for the installation process to complete.
   3e. Click Finish to end the installation.

R environment setup: install R on Linux (Ubuntu)
1. Add R to the Ubuntu repository list by typing the following command:
   $ sudo echo "deb http://cran.rstudio.com/bin/linux/ubuntu xenial/" | sudo tee -a /etc/apt/sources.list
   Here xenial refers to Ubuntu 16.04. If a different version of Ubuntu is installed on your machine, replace it with the respective version name from the CRAN website.
2. Add R to the Ubuntu keyring:
   $ gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9
   $ gpg -a --export E084DAB9 | sudo apt-key add -
3. Finally, install R base:
   $ sudo apt-get update
   $ sudo apt-get install r-base r-base-dev

Another way to install R is by using the yum command. The command is:
   $ yum install R
This command installs the core functions of the R programming language and also the standard packages required for working with R. After installing all the standard packages, you can install additional packages by launching the R console:
   $ R
This command starts the R prompt (>), and the necessary packages can be installed with the following command:
   > install.packages("…")

R environment setup: install RStudio on Linux
Installing and configuring RStudio on Linux: after installing r-base, the next step is to install RStudio. It can be installed with the following simple commands.
1. First, install the gdebi-core package, which is used to install the Debian version of RStudio:
   $ sudo apt-get install gdebi-core
2. Using the wget command, fetch the Debian version of RStudio:
   $ wget https://download1.rstudio.org/rstudio-1.0.136-amd64.deb
3. After fetching RStudio, the following command installs it:
   $ sudo gdebi rstudio-1.0.136-amd64.deb
4. After installing RStudio, remove the installation file to save disk space:
   $ rm rstudio-1.0.136-amd64.deb
