Big Data Unit 1

Big data analytics unit 1 book

Uploaded by nishal824804
Copyright © All Rights Reserved
SYLLABUS

UNIT I UNDERSTANDING BIG DATA
Introduction to big data - convergence of key trends - unstructured data - industry examples of big data - web analytics - big data applications - big data technologies - introduction to Hadoop - open source technologies - cloud and big data - mobile business intelligence - crowd sourcing analytics - inter and trans firewall analytics.

UNIT II NOSQL DATA MANAGEMENT
Introduction to NoSQL - aggregate data models - key-value and document data models - relationships - graph databases - schemaless databases - materialized views - distribution models - master-slave replication - consistency - Cassandra - Cassandra data model - Cassandra examples - Cassandra clients.

UNIT III MAP REDUCE APPLICATIONS
MapReduce workflows - unit tests with MRUnit - test data and local tests - anatomy of MapReduce job run - classic Map-reduce - YARN - failures in classic Map-reduce and YARN - job scheduling - shuffle and sort - task execution - MapReduce types - input formats - output formats.

UNIT IV BASICS OF HADOOP
Data format - analyzing data with Hadoop - scaling out - Hadoop streaming - Hadoop pipes - design of Hadoop distributed file system (HDFS) - HDFS concepts - Java interface - data flow - Hadoop I/O - data integrity - compression - serialization - Avro - file-based data structures - Cassandra - Hadoop integration.

UNIT V HADOOP RELATED TOOLS
HBase - data model and implementations - HBase clients - HBase examples - praxis. Pig - Grunt - Pig data model - Pig Latin - developing and testing Pig Latin scripts. Hive - data types and file formats - HiveQL data definition - HiveQL data manipulation - HiveQL queries.

CONTENTS

UNIT I UNDERSTANDING BIG DATA
1.1 INTRODUCTION TO BIG DATA
    1.1.1 Types of Big Data
    1.1.2 Characteristics of Big Data
    1.1.3 Advantages of Big Data
1.2 CONVERGENCE OF KEY TRENDS
1.3 UNSTRUCTURED DATA
    1.3.1 Structured vs. Unstructured Data
1.4 INDUSTRY EXAMPLES OF BIG DATA
1.5 WEB ANALYTICS
    1.5.1 Types of Web Analytics
    1.5.2 Process of Web Analytics
    1.5.3 Benefits of Web Analytics
1.6 BIG DATA APPLICATIONS
1.7 BIG DATA TECHNOLOGIES
1.8 INTRODUCTION TO HADOOP
    1.8.1 Hadoop Core Components
    1.8.2 Key Features and Benefits of Hadoop
1.9 OPEN-SOURCE TECHNOLOGIES
1.10 CLOUD AND BIG DATA
1.11 MOBILE BUSINESS INTELLIGENCE
    1.11.1 Need for Mobile BI
    1.11.2 Advantages of Mobile BI
1.12 CROWD SOURCING ANALYTICS
1.13 INTER AND TRANS FIREWALL ANALYTICS
    1.13.1 Inter Firewall Analytics
    1.13.2 Trans Firewall Analytics

UNIT I
UNDERSTANDING BIG DATA

1.1 INTRODUCTION TO BIG DATA
Big Data is a collection of data that is huge in volume, yet growing exponentially with time. It is data with so large a size and complexity that none of the traditional data management tools can store it or process it efficiently. Big data is also data, but with a huge size.

Big Data analytics is a process used to extract meaningful insights, such as hidden patterns, unknown correlations, market trends, and customer preferences.

Big Data analytics provides various advantages: it can be used for better decision-making and for preventing fraudulent activities, among other things.

1.1.1 Types of Big Data
There are three main types of big data:
• Structured data
• Semi-structured data
• Unstructured data

• Structured data: Structured data is highly organized and typically stored in a database. It can be easily analyzed using traditional data analysis tools and techniques, as it is formatted in a specific way. Examples of structured data include transactional data, customer data, financial data, and inventory data.
• Semi-structured data: Semi-structured data is a mixture of structured and unstructured data. It has a defined data model, but the data itself may not be fully organized.
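  Examples of semi-structured data include XML and JSON data, and log and sensor data.
• Unstructured data: Unstructured data is not organized in any particular way and does not have a defined data model. It can be difficult to analyze using traditional tools. Examples of unstructured data include social media data, images, videos, emails, and text data.

1.1.2 Characteristics of Big Data
Big data can be described by the following characteristics:
• Volume: The name Big Data itself is related to a size which is enormous. The size of data plays a crucial role in determining the value that can be obtained from it, and whether particular data can actually be considered Big Data depends on its volume.
• Variety: The next aspect of Big Data is its variety. Variety refers to heterogeneous sources and the nature of data, both structured and unstructured. Data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. is now considered in analysis, and this variety of unstructured data poses certain issues for storing, mining, and analyzing data.
• Velocity: Velocity refers to the speed at which data is generated and how fast it must be processed to meet demand.
• Variability: Variability refers to the inconsistency the data can show at times, which hampers the process of handling and managing the data effectively.

1.1.3 Advantages of Big Data
Big data has several advantages for businesses and organizations, including improved decision-making, enhanced customer insights, improved operational efficiency, new revenue opportunities, and competitive advantage.

1.2 CONVERGENCE OF KEY TRENDS
Several key trends have converged to drive the growth of big data:
• Internet of Things (IoT): As more devices become connected to the internet, the amount of data generated is expected to continue to increase.
• Cloud computing: The widespread adoption of cloud computing has made it easier and more cost-effective for organizations to store and process large amounts of data. Cloud-based big data platforms have become more accessible, allowing businesses to process and analyze large amounts of data without investing in expensive on-premises infrastructure.
• Machine learning and AI: The growth of machine learning and artificial intelligence (AI) has made it possible to extract insights from large and complex data sets. These technologies can help automate data analysis, identify patterns and trends, and make predictions based on data.
• Data privacy and security: The importance of data privacy and security has increased significantly in recent years, as data breaches and cyber-attacks have become more common. Big data solutions must ensure that sensitive data is properly protected and that security protocols are in place to prevent unauthorized access.
• Data governance: The growing importance of data governance has made it necessary for organizations to have policies and procedures in place for managing and using data.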
  This includes ensuring data quality, maintaining data accuracy and consistency, and complying with data privacy regulations.

These key trends have converged to make big data a critical component of modern business operations. As the amount of data continues to grow, organizations will need to adopt new technologies and processes to ensure that they can effectively manage and extract value from their data.

1.3 UNSTRUCTURED DATA
Unstructured data is data that does not have a well-defined data model or structure. It is typically text-based data, but it can also include multimedia data such as images, videos, and audio. Unstructured data is generated from a variety of sources such as social media, email, mobile devices, sensors, and web logs.

Some examples of unstructured data:
• Social media data: posts, images, videos, and other types of content from Facebook, Twitter, LinkedIn, and similar platforms.
• Emails: the content of emails sent and received.
• Web content
• Audio and video files: audio and video recordings such as phone calls, interviews, and surveillance footage.

One popular application is customer analytics. Retailers, manufacturers, and other companies analyze unstructured data to improve the customer experience and enable targeted marketing. Sentiment analysis can be done to better understand customers and identify attitudes about products, customer service, and corporate brands.

1.3.1 Structured vs. Unstructured Data
The main differences between structured and unstructured data include the type of analysis they can be used for, the schema used, the type of format, and the way they are stored. Traditional structured data, such as the transaction data in financial systems and other business applications, conforms to a rigid format to ensure consistency in processing and analyzing it. Sets of unstructured data, on the other hand, can be maintained in formats that aren't uniform.

Structured data is typically stored in a relational database (RDBMS) that provides access to data points that are related to one another via columns and tables. For example, customer information kept in a spreadsheet and categorized by phone numbers, addresses, or other criteria is considered structured data.
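The difference between the three types of big data shows up as soon as you try to process them. The Python below is a minimal illustrative sketch with made-up sample records (not taken from any real system): structured CSV rows are aggregated directly by column name, semi-structured JSON records are parsed while tolerating missing fields, and unstructured review text is given a deliberately naive word-list sentiment score. The word lists and function names are hypothetical; real sentiment analysis relies on natural language processing models.

```python
import csv
import io
import json
from collections import Counter

# Hypothetical sample data for each type of big data.
# Structured: every record has the same columns, like a table row.
STRUCTURED_CSV = "order_id,customer,amount\n1,alice,250\n2,bob,120\n3,alice,80\n"

# Semi-structured: JSON records carry keys, but not every record has the same fields.
SEMI_STRUCTURED = [
    '{"user": "alice", "action": "view", "page": "/home"}',
    '{"user": "bob", "action": "buy", "item": {"sku": "A1", "qty": 2}}',
    '{"user": "carol", "page": "/help"}',
]

# Unstructured: free text with no data model (e.g. customer reviews).
UNSTRUCTURED = [
    "Great product, fast delivery!",
    "The support was slow and the item arrived broken.",
]

# Tiny illustrative word lists (an assumption); real systems use NLP models.
POSITIVE = {"great", "fast", "good"}
NEGATIVE = {"slow", "broken", "bad"}


def total_by_customer(csv_text):
    """Structured: aggregate directly by column name, much as SQL would."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["customer"]] += int(row["amount"])
    return dict(totals)


def actions(json_lines):
    """Semi-structured: parse each record and tolerate missing fields."""
    return [json.loads(line).get("action", "unknown") for line in json_lines]


def sentiment(text):
    """Unstructured: tokenize free text before anything can be counted."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(total_by_customer(STRUCTURED_CSV))     # {'alice': 330, 'bob': 120}
print(actions(SEMI_STRUCTURED))              # ['view', 'buy', 'unknown']
print([sentiment(t) for t in UNSTRUCTURED])  # ['positive', 'negative']
```

The structured rows needed no interpretation at all, the JSON needed parsing plus a default for the missing `action` key, and the free text needed an extra processing step before it yielded anything countable.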
1.4 INDUSTRY EXAMPLES OF BIG DATA
There are many industries that are using big data to drive innovation and improve business operations.

Retail Industry:
• Retailers use big data to analyze customer behavior and purchasing patterns, which can be used to improve the customer experience and drive growth.
• One example of big data in the retail industry is Amazon. Amazon uses big data to personalize the shopping experience, for example by recommending products based on customers' browsing and purchasing behavior. This helps to increase customer engagement and loyalty, which in turn drives sales and revenue.
• Another example of big data in the retail industry is Walmart. Walmart uses big data to optimize its supply chain operations by analyzing data from suppliers, distributors, and its own stores.
• This allows Walmart to better forecast demand, optimize inventory levels, and reduce waste, resulting in cost savings and improved operational efficiency.

Healthcare Industry:
• The healthcare industry is one of the fastest-growing industries for big data, as it generates and manages vast amounts of data from various sources such as electronic health records (EHRs), medical imaging, and genomics.
• By analyzing this data, healthcare organizations can gain insights into patient health, disease diagnosis and treatment, and operational efficiency. This helps them improve patient outcomes and reduce costs.
• One example of big data in healthcare is IBM Watson Health. Watson Health analyzes vast amounts of patient data, including medical records, lab results, and imaging data, to provide clinicians with personalized treatment recommendations and insights into disease trends.
• Another example of big data in healthcare is Pfizer. Pfizer uses big data to accelerate drug discovery and development by analyzing vast amounts of data, including operational data.
• This allows Pfizer to identify new drug targets and develop more effective treatments, while also improving operational efficiency and reducing costs.

Finance Industry:
• The finance industry is another industry that generates and manages vast amounts of data, including financial transactions, market data, and customer data.
• By analyzing this data, financial institutions can gain insights into market trends and risk management, which can be used to inform business decisions and drive revenue growth.
• One example of big data in finance is Mastercard. Mastercard uses big data to identify fraud in real time by analyzing vast amounts of transaction data from its global network.
• This allows Mastercard to detect fraudulent activity and alert cardholders and merchants before any fraudulent transactions are processed, reducing financial losses and improving the customer experience.

1.5 WEB ANALYTICS
Web analytics is the measurement and analysis of website data to understand user behavior. It can be used to improve website design, user experience, and marketing. Web analytics data is typically collected using a web analytics tool such as Google Analytics, which tracks visitor activity on the website. The data collected can include metrics such as:
• Pageviews: The number of times a specific page on the website is viewed.
• Unique visitors: The number of unique individuals who visit the website over a given period of time.
• Bounce rate: The percentage of visitors who leave the website after viewing only one page.
• Session duration: The length of time that visitors spend on the website.
• Conversion rate: The percentage of visitors who complete a specific goal, such as filling out a contact form or making a purchase.

1.5.1 Types of Web Analytics
There are two main types of web analytics: on-site analytics and off-site analytics.

On-site analytics:
• On-site analytics tracks user behavior on a specific website. It collects data on website traffic, pageviews, bounce rates, conversion rates, and other metrics.
• On-site analytics tools include Google Analytics, Adobe Analytics, and Piwik.
• On-site analytics data can be used to identify user behavior patterns, popular pages, and areas for improvement on the website. It can be used to optimize website design, user experience, and marketing efforts.

Off-site analytics:
• Off-site analytics tracks traffic from external sources, such as search engines, social media, and referral sites. It collects data on the number of visitors, referral sources, and user behavior on the website.
• Off-site analytics tools include SEMrush, Ahrefs, and SimilarWeb.
• Off-site analytics data can be used to track the effectiveness of marketing efforts, identify popular referral sources, and optimize off-site marketing efforts.
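The metrics listed in 1.5 are simple counts and ratios over visit data, so a short sketch shows how an analytics tool might derive them. The Python below is a minimal illustration under stated assumptions: the `Session` record, its fields, and the sample sessions are hypothetical, and the rates are computed per session (one common convention), whereas the definitions above are phrased per visitor. Real tools such as Google Analytics apply their own, more detailed counting rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    """One visit to the website (hypothetical record layout)."""
    visitor_id: str      # identifies the visitor across sessions
    pages: List[str]     # pages viewed during the session, in order
    duration_s: float    # time spent on the site, in seconds
    converted: bool      # True if the visit completed the goal (e.g. a purchase)


def web_metrics(sessions):
    """Compute the 1.5 metrics; rates are per session (an assumption)."""
    if not sessions:
        raise ValueError("need at least one session")
    n = len(sessions)
    bounces = sum(1 for s in sessions if len(s.pages) == 1)  # single-page visits
    return {
        "pageviews": sum(len(s.pages) for s in sessions),
        "unique_visitors": len({s.visitor_id for s in sessions}),
        "bounce_rate_pct": 100.0 * bounces / n,
        "avg_session_duration_s": sum(s.duration_s for s in sessions) / n,
        "conversion_rate_pct": 100.0 * sum(s.converted for s in sessions) / n,
    }


sessions = [
    Session("v1", ["/home"], 30, False),
    Session("v2", ["/home", "/product", "/checkout"], 240, True),
    Session("v1", ["/blog"], 15, False),
    Session("v3", ["/home", "/contact"], 95, False),
]
print(web_metrics(sessions))
```

For these four sample sessions the result is 7 pageviews, 3 unique visitors, a 50% bounce rate (two single-page visits), an average session duration of 95 seconds, and a 25% conversion rate.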
1.5.2 Process of Web Analytics
The process of web analytics involves:
• Setting business goals: Defining the key metrics that will determine the success of your business and website.
• Collecting data: Gathering information, statistics, and data on website visitors using analytics tools.
• Processing data: Converting the raw data you've gathered into meaningful ratios, KPIs, and other information that tell a story.
• Reporting data: Displaying the processed data in an easy-to-read format.
• Developing an online strategy: Creating a plan to optimize the website experience to meet business goals.
• Experimenting: Doing A/B tests to determine the best way to optimize website performance.

1.5.3 Benefits of Web Analytics
• Understanding visitor behavior: Web analytics provides insights into how visitors are interacting with the website, which can help identify areas for improvement.
• Improving website design: By analyzing visitor behavior, website owners can optimize the website design to improve the user experience.
• Measuring marketing performance: Web analytics data can be used to track the effectiveness of online marketing campaigns and make adjustments to improve their performance.
• Increasing website traffic: By identifying popular pages and optimizing content, website owners can increase website traffic and engagement.

1.6 BIG DATA APPLICATIONS
Big data has numerous applications across various industries, including:
• Healthcare: Big data is used in healthcare to improve patient outcomes, reduce costs, and optimize treatment plans. Healthcare providers use big data to analyze patient data, predict disease outbreaks, and improve diagnostic accuracy.
• Transportation: Big data is used to track vehicle performance and optimize routing.
• Marketing: Marketers use big data to analyze customer behavior, improve the customer experience, and optimize marketing campaigns. Big data helps marketers identify target audiences, track customer behavior, and optimize marketing strategies.
1.7 BIG DATA TECHNOLOGIES
There are several big data technologies and tools that are commonly used to store, process, and analyze large and complex data sets. Some of the popular big data technologies include:
• Hadoop: Hadoop is an open-source framework used for distributed storage and processing of large data sets across clusters of commodity hardware. It is designed to handle large and complex data sets and can scale up or down as needed.
• Apache Beam: Apache Beam is an open-source unified programming model for both batch and streaming data processing. Beam pipelines can run on various big data processing engines, including Apache Spark, Apache Flink, and Google Cloud Dataflow.
• Elastic Stack: Elastic Stack (formerly known as ELK Stack) is open-source software used for search, analytics, and visualization of large data sets. It includes Elasticsearch, Logstash, and Kibana.
• Apache NiFi: Apache NiFi is an open-source data integration tool used for ingesting, processing, and distributing data across various systems. It is often used in data lakes and data hubs.

1.8 INTRODUCTION TO HADOOP
Hadoop is an open-source framework written in the Java language that adopts the MapReduce programming model for the distributed storage and distributed processing of huge volumes of data. Hadoop was derived from Google white papers such as Google MapReduce and the Google File System (GFS). It is designed to scale up from single servers to thousands of computers.

Hadoop can be viewed as a cluster of computers that consists of one master node and many worker nodes. The master node schedules the tasks, and the workers are responsible for performing the execution of the map and reduce tasks.

Hadoop can be deployed in three modes:
ree modes: used for debuyging in a single node environment, a single standalone instance, + Standalone mo Hadoop can be installed on a single node adoop cluster can be formed by connecting hardware, + Fully Distributed mod ‘multiple nodes of commodi d mode: This ingle node java system that runs the ent ‘The two versions of Hadoop: Hadoop 1.x and Hadoop 2.x, 1. Madoop 1.x: + Itsupports the MapReduce mode! only. ‘+ Non- MapReduce tools are not supported. less scalable than the Hadoop 2. x version since nodes per cluster. + Hadoop 1. x is responsible for data processing and lus management. erstanding Big bata ‘Understanding Big Date 8 2 Hadoop 2.x: Jt supports the as Spark, i In can scale up 10 cluster resource m data processi On cach of the nodes, resour of map and reduce slots avai ows running other Sremenorks on top of HDFS. System) using YARN API. Wiedoop Distibuted ‘The MRV2 is a next-generation MapReduce framework that runs wit a m that runs within Hadosp 1.9 te rie fa MepResuce (Rescurca Maegerart ad Data Processing Hors (Fie Storeze) 14 Hadoop 1.x version Hadsop 20 rs aes, | [oe g U YARN Resource Management and Dasa Possess) HOFS (File Storage) Figure 1.2 Hadoop 2.x version Big Data Analyticg Mn ee 1.8.1 Hadoop Core Components Understanding Big Data ioe 4 mist aap aula Pevformaed + Provides data security a Tiadoop Common ‘Common utilities i.e. java library and java files used by «Highly fault-tolerant - [fone machine goes down, the data from that machine other components such as HDFS, YARN, and goes to the next machine ‘MapReduce for running the Hadoop cluster. 2. Hadoop YARN HDFS- Storage layer | It allows the storage of a huge volume of data across ‘+ Hadoop YARN stands for Yet Another Resource Negotiator. Its the resource multiple nodes. Data is stored in the form of memory management unit of Hadoop and is available as a component of Hadoop blocks and is distributed across the cluster. version 2. 
Hadoop YARN- resource| It is responsible for job scheduling and resource * Hadoop YARN acts like an OS to Hadoop. It is a file system that i ‘management layer ‘management, built on top of HDFS, MapReduce- data Parallel Processing of huge datasets + Itis responsible for managing cluster resources to make sure you don"t processing layer overload one machine. z : : % Itperforms job scheduling to make sure that the jobs are scheduled in 1, Hadoop Distributed File System (DFS) ie taialice + HDFS is the file system of Hadoop. + Inthe second version of Hadoop called YARN, the two major features of the = Itis an open-source implementation of the distributed Google File System, Job Tracker have split into, (1) a global Resource Manager and (2) a per- . th huge datasets or files. HDFS splits the data into application Application Master block-sized chunks. + Cluster resource management and job scheduling are separated into two +The default block size of HDFS is 64 MB and it ean be extended up to 128 aca He + The main components of YARN architecture are resource manager, node + Users ean configure the block size as per the requirement. See ae Cones 3. MAPREDUCE + Storing small files in HDFS leads to a wastage of memory. + Data is stored in HDF in two forms, actual and metadata, + Theactual datais stored in DataNodes and metadata is stored in NameNode. + Itincludes the timestamp, file size, and location of blocks. Re} HDFS ensures data availabi Features of IDFS + Provides distributed storage + Can be jlemented on commodity hardware + MapReduce is the data processing layer of Hadoop. is a software fra vast amount of structured and unstructured data stored Distributed Filesystem (HSDF). Process the the Hadoop © Itprocesses huge by dividing the job (submitted ‘+ InHadoop, MapReduce works by breaking the processing into phases: Map and Redu 's of Hadoop 1.8.2 Key Features and Bene! 
• Scalability: Hadoop is designed to be highly scalable and can easily handle growing data sets by adding more nodes to the cluster.
• Flexibility: Hadoop can store and process a wide variety of data types.
• Integration: It can also integrate with a wide range of tools and technologies, such as ETL tools and BI platforms.
• Community support: Hadoop is an open-source platform that is supported by a large and active community of developers and users. This provides access to a wealth of resources, such as documentation, tutorials, and support forums.

1.9 OPEN-SOURCE TECHNOLOGIES
Open-source technologies refer to software or computer programs that have their source code available to the public, allowing anyone to access, modify, and distribute the code.

The open-source movement emerged in the late 1990s and has since become a significant force in software development. Open-source technologies offer flexibility and customization. They also provide opportunities for collaboration, as developers from around the world can contribute to their development and improvement.

In the context of big data, open-source technologies refer to software tools and platforms that are freely available and enable the processing, analysis, and storage of large volumes of data. These technologies have become an essential part of the big data ecosystem because they provide scalable and cost-effective solutions for managing large and complex data sets.

Some popular open-source big data technologies include:
• Hadoop: A distributed computing platform that allows for the storage and processing of large data sets across clusters of computers.
• Spark: A fast and general-purpose data processing engine that can handle both batch and real-time processing.
• Cassandra: A NoSQL database that is designed to handle large amounts of data across multiple servers.
• Elasticsearch: A distributed search and analytics engine that can quickly and easily search large amounts of data.
• Kafka: A distributed streaming platform that can handle real-time data streams.
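The MapReduce model behind Hadoop (1.8.1) can be simulated on a single machine to show what each phase does. The Python below is an illustrative sketch only: it runs the classic word-count job in memory, with plain functions standing in for the mapper, the framework's shuffle-and-sort step, and the reducer. It does not use any Hadoop API, and the function names and sample input are hypothetical. The `hdfs_block_count` helper assumes the 128 MB default block size of Hadoop 2.x and simply divides a file size into blocks; with the default input format, each block roughly corresponds to one map task.

```python
import math
from collections import defaultdict
from itertools import chain


def hdfs_block_count(file_size_mb, block_size_mb=128):
    """Blocks needed to store one file in HDFS (128 MB = Hadoop 2.x default)."""
    return math.ceil(file_size_mb / block_size_mb)


def map_phase(record):
    """Mapper: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle_and_sort(pairs):
    """Framework step: group values by key and hand keys to reducers in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reducer: combine all values for one key (here, sum the counts)."""
    return key, sum(values)


def word_count(records):
    """Run map over every record, then shuffle/sort, then reduce."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return [reduce_phase(k, vs) for k, vs in shuffle_and_sort(mapped)]


# Hypothetical input: each string stands in for one input record.
records = ["big data needs big clusters", "hadoop processes big data"]
print(word_count(records))
# [('big', 3), ('clusters', 1), ('data', 2), ('hadoop', 1), ('needs', 1), ('processes', 1)]
print(hdfs_block_count(500))       # 4 blocks of up to 128 MB
print(hdfs_block_count(500, 64))   # 8 blocks with the Hadoop 1.x default of 64 MB
```

On a real cluster the same mapper and reducer logic would run as many parallel tasks on different nodes (for example, written against the Java MapReduce API or run through Hadoop Streaming), and the framework, not the program, would perform the shuffle and sort.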
1.10 CLOUD AND BIG DATA
Cloud services are commonly offered in the following models:
• Infrastructure as a Service (IaaS)
• Platform as a Service (PaaS): Examples of PaaS are Windows Azure and Google App Engine (GAE).
• Applications or Software as a Service (SaaS): Examples include Salesforce.com, Dropbox, Google Drive, etc.

Cloud for Big Data
Below are some examples of how cloud applications are used for big data:
• IaaS in a public cloud: using a cloud provider's infrastructure for big data services.

Providers in the Big Data Cloud Market
Large software vendors, as well as many startups, either offer cloud services or are in the process of launching them. Here we have a list of major vendors of cloud computing. A few of the cloud providers are:

Infrastructure as a Service cloud computing companies
• Amazon is the leading cloud provider amongst all.
• Microsoft's cloud platform is called Azure.
• IBM's offerings include Smart Business Storage Cloud and Computing on Demand (CoD).
• AT&T provides Synaptic Storage and Synaptic Compute as a service.

Platform as a Service cloud computing companies
• Google's AppEngine is a development platform that is built upon Python and Java.
• Salesforce.com provides a development platform that is based upon Apex.
• Microsoft Azure provides a development platform based upon .NET.

Software as a Service companies
• In SaaS, Google provides space that includes Google Docs, Gmail, Google Calendar, and Picasa.
• IBM provides LotusLive, with messaging and calendaring capabilities.

Issues in Using Cloud Services
Some important cloud services issues are as listed:

Data Security
• Organizations must ensure that their agreement with the cloud provider protects the company's information before they take advantage of a cloud service.
• Data should be encrypted wherever possible.
• Exceptions must be clearly noted.

Service-Level Agreements
• SLAs should clearly define the terms and conditions between a service user and the provider to ensure proper performance.

Compliance
• Cloud services must be compatible with the compliance needs of the business. Some companies are also concerned about regulatory issues.
Vendor Lock-in
• Market observers say that around 50 percent of people worry that they will be tied to one provider of cloud storage.

Legal Issues
• The organization must ensure that the location of the physical resources of the cloud does not bring any legal issue.
• The cloud presents a number of legal challenges.

Cost
• Organizations should be aware of all the costs of using the cloud and use it in a controlled manner, as the cloud follows a pay-as-per-usage pricing method that determines the cost incurred by the company.

1.11 MOBILE BUSINESS INTELLIGENCE
When done properly, mobile BI is able to bring management closer to the business, whether they are in an airport departure lounge or almost anywhere else.

Mobile BI, driven by the success of mobile devices, was considered a big wave in BI and analytics a few years ago. Nowadays, there is a level of disillusion in the market, and users attach much less importance to this trend.

BI delivers relevant and trustworthy information to the right person at the right time. Mobile business intelligence is the transfer of business intelligence from the desktop to mobile devices such as the BlackBerry, iPad, and iPhone. The ability to access analytics and data on mobile devices or tablets rather than desktop computers is referred to as mobile business intelligence. In mobile BI, the business metric dashboard and key performance indicators (KPIs) are clearly displayed.

With the rising use of mobile devices, the technology that we all use to make our lives easier, including in business, has grown as well. Many businesses have benefited from mobile business intelligence. Essentially, this section is a guide for business owners and others to educate them on the benefits and pitfalls of mobile BI.

1.11.1 Need for Mobile BI
Mobile phones' data storage capacity has grown in tandem with their use. You are expected to make decisions and act quickly in this fast-paced environment.

The number of businesses receiving assistance from mobile BI in such a situation is growing by the day. To expand your business or boost your business productivity, mobile BI can help, and it works with both small and large businesses. Mobile BI can help you whether you are a salesperson or a CEO.

There is a high demand for mobile BI in order to reduce information time and use that time for quick decision-making. As a result, timely decision-making can boost customer satisfaction and improve an enterprise's reputation among its customers.
• It also aids in making quick decisions in the face of emerging risks.

1.11.2 Advantages of Mobile BI
Simple access
• Mobile BI is not restricted to a single mobile device or a certain place. You can view your data at any time and from any location.
• Having real-time visibility into a firm improves production and the daily efficiency of the business. Obtaining a company's perspective with a single click simplifies the process.

Competitive advantage
• Many firms are seeking better and more responsive methods to do business in order to stay ahead of the competition.
• Easy access to real-time data improves company opportunities and raises sales and capital.
• This also aids in making the necessary decisions as market conditions change.

Simple decision-making
• As previously stated, mobile BI provides access to real-time data at any time and from any location, so the information is available whenever it is needed.
• This assists consumers in obtaining what they require at the time.
• As a result, decisions are made quickly.

Increase productivity
• By extending BI to mobile, the organization's teams can access critical company data when they need it, which increases productivity.

1.12 CROWD SOURCING ANALYTICS
• Crowdsourcing is the practice of obtaining information, ideas, or services from a large group of people, usually sourced via the internet.
• Crowdsourcing involves obtaining information or resources from a wide swath of people.
• Crowdsourcing lets businesses farm out work to people anywhere in the country or around the world, without the normal overhead of in-house employees.
• The advantages of crowdsourcing include the ability to work with people who have skills that an in-house team may not have.
• Crowdsourcing taps into the shared thinking and ideas of a crowd of people and reduces the need for gatekeepers and intermediaries.
• Crowdsourcing can also be used as a method to raise capital for specific projects.
• While crowdsourcing usually seeks information or workers, crowdfunding solicits money to help support individuals or organizations.

Advantages
• Crowdsourcing brings fresh thinking and ideas; companies get brand-new ideas and insights.
• It is a way of solving time-intensive problems, which frees up time to focus on other work.
• Deeper engagement by communities, who resonate with and build loyalty to the product or solution, so the image of the company improves.
• Increased productivity.

Disadvantages
• Results can be easily skewed based on the crowd being sourced.
• Lack of confidentiality of an idea.
• Crowdsourced work may lack direction and fall short of its goal or purpose.

1.12.2 Types of Crowdsourcing
Wisdom of the crowd:
• It is the collective opinion of different individuals gathered in a group.
• This type is used for decision-making, since it allows one to find the best solution for problems.
• Many brands pay attention to the collective opinion of their customers in order to stand out.

Crowd voting:
• In this type of crowdsourcing, consumers vote to identify the best option. They can choose between products created by experts and products created by consumers, such as a new taste or package.

Crowdfunding:
• It is when people collect money or ask for investments for charities, projects, and similar causes. People do it voluntarily.
• Often, companies gather money to help individuals and families suffering from natural disasters, poverty, social problems, etc.

1.13 INTER AND TRANS FIREWALL ANALYTICS

1.13.1 Inter Firewall Analytics
• Inter Firewall Analytics is a type of security analytics that focuses on monitoring and analyzing traffic flowing between different zones or segments of a network that are separated by firewalls.
• The goal is to identify and prevent potential threats that may be hidden in the traffic.
• Firewalls are a common security measure used to control traffic flow between different segments of a network, such as between an internal network and the internet or between different departments within a company.
• However, firewalls alone cannot provide complete protection against all threats, especially those that may be hiding within the allowed traffic.
• Inter Firewall Analytics involves deploying specialized tools and techniques to monitor the traffic passing through the firewalls and analyze it for signs of potential threats.
• This can include detecting anomalies in the traffic patterns, identifying unusual or unauthorized access attempts, and flagging suspicious activity.

Some of the techniques used in Inter Firewall Analytics include:
• Packet capture and analysis: This involves capturing and analyzing network traffic to identify potential threats, such as malware or suspicious behavior.
• Anomaly detection: This involves analyzing the behavior of network traffic to identify deviations or patterns that may indicate a potential threat.
• Machine learning: This involves using machine learning algorithms to analyze network traffic and identify patterns that may be indicative of a threat before it can cause harm.

By monitoring and analyzing traffic at the network level, Inter Firewall Analytics can help organizations identify and respond to potential threats more quickly and effectively.

1.13.2 Trans Firewall Analytics
• Trans Firewall Analytics is a type of network security analytics that focuses on monitoring and analyzing network traffic passing through an organization's perimeter firewall(s).
• The main purpose of trans firewall analytics is to identify and prevent network threats and attacks such as malware, viruses, intrusion attempts, and other types of cyber threats that try to penetrate an organization's network.
• Trans Firewall Analytics involves monitoring and analyzing network traffic logs generated by the firewall.
• These logs contain information about the source and destination IP addresses, the protocols and ports used, the size of the packets, and other network traffic metadata.
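The firewall log fields just listed (source and destination addresses, protocol, port, packet size) are already enough for simple rule-based checks. The Python below is a hypothetical illustration, not the method of any particular product: it assumes a simplified space-separated log format invented for this example, and it flags a source address when it contacts many distinct ports (a common sign of a port scan) or is denied repeatedly. The thresholds are arbitrary assumptions; production tools combine rules like these with the anomaly detection and machine learning techniques described in 1.13.1.

```python
from collections import defaultdict

# Hypothetical, simplified log format invented for this sketch:
#   src_ip dst_ip protocol dst_port bytes action
SAMPLE_LOG = """\
10.0.0.5 192.168.1.10 TCP 443 5120 ALLOW
10.0.0.5 192.168.1.10 TCP 443 4096 ALLOW
203.0.113.7 192.168.1.10 TCP 22 60 DENY
203.0.113.7 192.168.1.10 TCP 23 60 DENY
203.0.113.7 192.168.1.10 TCP 80 60 DENY
203.0.113.7 192.168.1.10 TCP 3389 60 DENY
"""


def parse(line):
    """Split one log line into named fields."""
    src, dst, proto, port, size, action = line.split()
    return {"src": src, "dst": dst, "proto": proto,
            "port": int(port), "bytes": int(size), "action": action}


def flag_suspicious(lines, port_threshold=3, deny_threshold=3):
    """Return {source IP: [reasons]} for sources that break a simple rule."""
    ports = defaultdict(set)    # distinct destination ports per source
    denies = defaultdict(int)   # denied connections per source
    for line in lines:
        if not line.strip():
            continue
        rec = parse(line)
        ports[rec["src"]].add(rec["port"])
        if rec["action"] == "DENY":
            denies[rec["src"]] += 1

    alerts = {}
    for src, seen_ports in ports.items():
        reasons = []
        if len(seen_ports) >= port_threshold:
            reasons.append(f"{len(seen_ports)} distinct ports (possible port scan)")
        if denies[src] >= deny_threshold:
            reasons.append(f"{denies[src]} denied connections")
        if reasons:
            alerts[src] = reasons
    return alerts


print(flag_suspicious(SAMPLE_LOG.splitlines()))
```

With the sample log, only 203.0.113.7 is flagged: it touched four different ports and was denied four times, while 10.0.0.5 made two allowed connections to a single port.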
By analyzing these logs, security analysts can detect patterns of network traffic that indicate a potential threat or attack. Trans firewall analytics tools typically provide:
• Machine learning analysis: These tools use advanced algorithms and machine learning to analyze patterns of network traffic that may indicate a threat or attack.
• Anomaly detection: These tools can detect unusual network activity that may indicate a security breach.
• Alerting and reporting: These tools generate alerts and reports when a threat or attack is detected, providing security teams with the information they need to take action.

Comparison of Inter and Trans Firewall Analytics

Aspect | Inter Firewall Analytics | Trans Firewall Analytics
Metrics collected | IP addresses, ports, protocols |
Analysis techniques | Packet analysis, anomaly detection | Signature-based detection
Benefits | Enhanced network security, early detection of potential threats, improved incident response | Better understanding of web traffic, improved detection and prevention of web-based attacks
Challenges | Complexity of traffic analysis, difficulty in identifying attacks that span multiple flows | Overwhelming volume of traffic, limited visibility into …
Tools and technologies | Firewall logs, network traffic analysis tools, SIEM systems |

TWO MARKS QUESTIONS AND ANSWERS

1. What do you mean by big data?
• Big Data is a collection of data that is huge in volume, yet growing exponentially with time.
• It is data with so large a size and complexity that none of the traditional data management tools can store it or process it efficiently.

2. Name the types of big data.
There are three main types of big data: structured, semi-structured, and unstructured data.

3. List out the characteristics of big data.
Big data can be described by the following characteristics:
i. Volume
ii. Variety
iii. Velocity
iv. Variability

4. What are the advantages of big data?
Big data has several advantages for businesses and organizations, including:
i. Improved decision-making
ii. Enhanced customer insights
iii. Improved operational efficiency
iv. New revenue opportunities
v. Competitive advantage

5. What do you mean by unstructured data?
• Unstructured data is data that does not have a well-defined data model or structure.
• It is typically text-based, but it can also include multimedia data such as images, videos, and audio.
• Unstructured data is generated from a variety of sources such as social media, email, mobile devices, sensors, and web logs.
Some examples of unstructured data include:
• Social media data
• Emails
• Web content
• Audio and video files

6. Difference between structured and unstructured data.
Structured Data | Unstructured Data
Data that has a clearly defined schema and is easily searchable and organized | Data that has no clear structure or schema and is often difficult to search and organize
Relational databases, spreadsheets | Text documents, social media posts, images, videos
Typically stored in a tabular format | Can be stored in a variety of formats such as text, JSON, XML, binary, etc.
Can be processed using traditional data processing tools like SQL | Often requires specialized tools such as natural language processing, machine learning, and computer vision
Typically smaller in size and easier to manage | Can be extremely large and difficult to manage
Changes to structured data are often slow and predictable | Changes to unstructured data can be rapid and unpredictable
Structured data is usually uniform in format and easier to analyze | Unstructured data can be very diverse in format and harder to analyze
Structured data is valuable for traditional data analysis and reporting | Unstructured data is valuable for gaining insights into customer sentiment, social media trends, and other areas where traditional data may not provide enough context

7. Define web analytics.
• Web analytics is the process of analyzing and measuring website traffic and user behavior to improve the overall effectiveness of a website.
• It involves the measurement and analysis of website data to understand user behavior.
• Web analytics helps website owners make data-driven decisions to optimize their websites and improve the user experience.
• It involves various techniques, such as data collection, data analysis, and web metrics, to gain insights into website traffic and website performance.

8. What are the metrics collected in web analytics?
The data collected can include metrics such as:
• Pageviews
• Session duration
• Conversion rate

9. List out the types of web analytics.
There are two main types of web analytics: …

10. What are the benefits of web analytics?
• Improving website design
• Measuring marketing performance
• Increasing website traffic

11. List out some applications of big data.
Here are some applications of big data:
• Business analytics
• Healthcare

12. Name some big data technologies.
Here are some popular big data technologies:
• Hadoop
• Spark
• Hive
• HBase
• MongoDB
• ZooKeeper

13. What is Hadoop?
• Hadoop is an open-source, distributed processing framework that enables the storage and processing of large volumes of data on a cluster of commodity hardware.
• It provides a scalable and fault-tolerant platform for processing big data.
• Its core components include the Hadoop Distributed File System (HDFS) for storage and Yet Another Resource Negotiator (YARN) for resource management, with MapReduce as the processing layer.

14. Explain the core components of Hadoop.
Hadoop is an open-source framework intended to store and process big data in a distributed manner. Hadoop's essential components are:
1. HDFS (Hadoop Distributed File System): Hadoop's key storage system. The extensive data is stored on HDFS. It is mainly devised for storing massive datasets on commodity hardware.
2. Hadoop MapReduce: The layer of Hadoop responsible for data processing. Processing happens in two stages, Map and Reduce. In simple terms, Map is the stage where data blocks are read and made available to the executors (computers, nodes, or containers) for processing, and Reduce is the stage where all the processed data is collected and collated.
3. YARN: The framework Hadoop uses for resource management and job scheduling. It allows multiple data processing engines, for example for data science, real-time streaming, and batch processing, to run on the same cluster.

15. Explain the features of Hadoop.
Hadoop assists not only in storing data but also in processing big data. It is a reliable way to handle significant data hurdles. Some salient features of Hadoop are:
1. Distributed Processing: Hadoop supports distributed processing of data, which makes processing quicker. In Hadoop, the data is stored across the cluster in HDFS in a distributed manner and processed in parallel, with MapReduce responsible for the processing.
2. Open Source: Hadoop is free of cost because it is an open-source framework. Its source code can be modified according to the user's requirements.
3. Fault Tolerance: Hadoop is highly fault-tolerant. By default, it creates three replicas of each block on distinct nodes, and this number of replicas can be changed as required. So, if one node fails, the data can be retrieved from a different node. Node failures are detected and the data is restored automatically.
4. Scalability: Hadoop works with different kinds of hardware, and new nodes can be added to the cluster promptly.
5. Reliability: Data in Hadoop is stored on the cluster in a way that is independent of any single machine, so the stored data is not affected by individual machine breakdowns.

16. What do you mean by HDFS?
• Hadoop Distributed File System (HDFS) is a distributed file system that is designed to run on commodity hardware.
• It is a core component of the Hadoop framework and provides a distributed storage system for large data sets.
• HDFS is designed to handle very large files with streaming data access patterns and to provide high-throughput access to data.

17. What do you mean by YARN?
• YARN (Yet Another Resource Negotiator) is one of the core components of Hadoop, responsible for managing resources and scheduling tasks across a Hadoop cluster.
• It was introduced in Hadoop 2.x as an improvement over the earlier version of MapReduce, which suffered from scalability and flexibility issues.
• YARN splits the resource management and job scheduling/monitoring functions of classic MapReduce into separate daemons: a global ResourceManager (RM), a NodeManager (NM) on each worker node, and a per-application ApplicationMaster (AM).

18. List out the benefits of Hadoop.
Hadoop offers several benefits for big data processing, including:
• Scalability
• Fault tolerance
• Cost-effectiveness
• Processing speed
• Flexibility
• Data storage
• Integration
• Open source

19. Define open-source technology.
• Open-source technologies are software or computer programs whose source code is available to the public, allowing anyone to access, modify, and distribute it.
• This means that users can see and edit the code, which can result in greater collaboration and innovation in software development.
• The open-source movement started in the late 1990s and has since become a significant force in the tech industry.
• Many popular software tools and platforms, including Linux, MySQL, and WordPress, are open source.

20. How does cloud technology impact big data?
Cloud technology has a significant impact on big data in the following ways: …

21. What do you mean by cloud computing?
• Cloud computing is the delivery of computing services, including servers, storage, databases, networking, software, analytics, and intelligence, over the Internet ("the cloud") to offer faster innovation, flexible resources, and economies of scale.
• The services provided by cloud computing can be categorized into three main models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS).

22. List out the features of cloud computing.
• Scalability
• Elasticity
• Resource pooling
• Self-service
• Low costs
• Fault tolerance

23. What are the issues in using cloud services?
Some important cloud service issues are:
• Data security
• Performance

24. What is mobile business intelligence?
• Mobile Business Intelligence (Mobile BI) is software that enables the access and analysis of business data on mobile devices such as smartphones and tablets.
• Mobile BI leverages the power of cloud computing and mobile technology to make data-driven decision-making faster, more accurate, and more efficient.

25. Justify the need for Business Intelligence.
• The data storage capacity of mobile phones has grown considerably.
• You are expected to make decisions and act quickly in this fast-paced environment.
• The number of businesses receiving assistance in such a situation is growing by the day.
• To expand your business or boost your business productivity, mobile BI can help, and it works with both small and large businesses.

26. What are the advantages of business intelligence?
• Simple access
• Competitive advantage
• Increased productivity

27. Define crowdsourcing.
• Crowdsourcing is the collection of information, opinions, or work from a group of people, usually sourced via the Internet.
• Crowdsourcing allows companies to save time and money while tapping into people with different skills or thoughts from all over the world.

28. What are the types of crowdsourcing?
1. Wisdom of the crowd
2. Crowd creation
3. Crowd voting
4. Crowdfunding

29. What is inter firewall analytics?
• Inter Firewall Analytics is a type of security analytics that focuses on monitoring and analyzing traffic flowing between different segments of a network that are separated by firewalls.
• The goal is to identify and prevent potential threats that may be hiding within the traffic.

30. What do you mean by trans firewall analytics?
• Trans Firewall Analytics is a type of network security analytics that focuses on monitoring and analyzing network traffic passing through an organization's perimeter firewall(s).
• The main purpose of trans firewall analytics is to identify and prevent network threats and attacks such as malware, viruses, phishing attempts, and other types of cyber threats that try to penetrate an organization's network.
• Trans Firewall Analytics involves monitoring and analyzing network traffic logs generated by the firewall.

31. Difference between inter and trans firewall analytics.
Inter-Firewall | Trans-Firewall
Firewall analytics performed between two or more firewalls | Firewall analytics performed within a single firewall
Spans multiple firewalls or security domains | Limited to a single firewall or security domain
Captures and analyzes traffic between firewalls | Captures and analyzes traffic within a single firewall

… technology?
Discuss … cloud and big data.
… and its types in detail.
What is crowdsourcing? Discuss …
Compare and contrast inter and trans firewall analytics in detail.
Can you think of any big data application that impacts you?
