UNIT II - Emerging Technology

Platforms Used in the Delivery of Information Services

A Service Delivery Platform (SDP) is a platform that provides a structure for service delivery,
including controls for service sessions and protocols for service use. Quality of IT service
delivery is gauged by metrics included in a service-level agreement (SLA).
Examples of Information Technology Services and the Platforms Used for Their Delivery
 Cloud Services (Platform: Amazon Web Services (AWS), Google Cloud Platform (GCP),
Microsoft Azure).
Cloud services are infrastructure, platforms, or software that are hosted by third-party
providers and made available to users through the internet.
 Backup & Disaster Recovery (Platform: IBM Spectrum Protect).
Data backup is the process of replicating files to be stored at a designated location.
Disaster recovery is a system that helps restore those files following a catastrophe.
 Help Desk Support (Platform: Zendesk, HappyFox, Help Scout, SolarWinds Service
Desk, JIRA Service Management, Salesforce Service Cloud, SysAid, Vivantio, Zoho Desk,
Freshdesk).
A help desk is the individual, group, organizational function or external service that
an IT user calls to get help with a problem.
 Computer Training (Platform: TalentLMS, EdApp, Looop, Tovuti, BRIDGE,
Classe365, Auzmor Learn, Schoox, 360Learning).
Computer training is instruction provided for the purpose of enhancing an
individual's ability to use computers for learning and functioning.
 IT Consulting (Platform: Catalant, MeasureMatch, Zintro, TopTal, GuidePoint, Maven,
DeepBench).
IT consulting services are advisory services that help clients assess different technology
strategies and align them with their business or process strategies.

Delivery of information services


 Use of mobile devices in delivery of information services
 Use of web 2.0 in delivery of information services
 Use of Social Media in delivery of information services
Define Big Data
Big Data refers to the vast and complex sets of data that are generated at high velocity from a
variety of sources. These datasets are so large and diverse that traditional data processing
tools and techniques are inadequate to capture, store, manage, and analyze them effectively.
Big Data encompasses not only the volume of data but also its variety and the speed at which it is
generated and processed. In short, it is data that is huge in volume and growing exponentially over time.

Characteristics of Big Data (the five V's):

 Volume: The sheer amount of data generated is massive, often measured in terabytes,
petabytes, or even exabytes. Volume refers to how much data is created and stored, requiring
robust storage and processing capacities.

 Velocity: The speed at which data is generated and processed. Many applications, like social
media or IoT devices, produce data in real time, necessitating systems that can handle rapid data
inflows and deliver insights on the fly.

 Variety: Big Data comes from a variety of sources and exists in multiple formats:
structured, semi-structured, and unstructured. Examples include text, images, audio, and
sensor data, demanding flexible tools to manage different formats.

 Veracity: The reliability and accuracy of data. Big Data often has inconsistencies or
uncertainties due to its varied sources. Veracity emphasizes the need for cleaning and validating
data to ensure high-quality insights (see the sketch after this list).

 Value: The ultimate goal of Big Data is to extract valuable insights that drive decision-
making. It is critical to determine whether the data will contribute meaningful information to
support business and operational goals.
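To make the veracity point concrete, here is a minimal data-cleaning sketch in Python using pandas. The records, column names, and validation rules are illustrative assumptions only:

```python
import pandas as pd

# Hypothetical customer records showing typical quality problems:
# exact duplicates, missing values, and an implausible age.
raw = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 4],
    "age": [34, 34, None, 29, 210],
    "email": ["a@x.com", "a@x.com", "b@x.com", None, "d@x.com"],
})

clean = (
    raw.drop_duplicates()         # remove exact duplicate rows
       .dropna(subset=["email"])  # require a contact email
       .query("0 < age < 120")    # drop implausible ages (NaN also fails this test)
)
print(clean)
```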
Applications of Big Data

1. Healthcare
 Example: Analyzing patient data to predict disease outbreaks, personalize
treatment plans, and improve patient outcomes.
2. Finance
 Example: Detecting fraudulent transactions, assessing credit risks, and optimizing
investment strategies through real-time data analysis.
3. Retail
 Example: Enhancing customer experiences by personalizing recommendations,
managing inventory efficiently, and optimizing supply chains.
4. Manufacturing
 Example: Predictive maintenance of machinery, improving product quality, and
optimizing production processes through data analytics.
5. Transportation and Logistics
 Example: Optimizing routes, managing fleet operations, and improving delivery
times by analyzing traffic patterns and shipment data.
6. Energy
 Example: Monitoring energy consumption, predicting equipment failures, and
optimizing energy distribution using data from smart grids and sensors.
7. Government and Public Sector
 Example: Enhancing public services, improving urban planning, and ensuring
public safety through the analysis of large-scale data from various sources.

Challenges Associated with Big Data

1. Data Privacy and Security


 Protecting sensitive information from breaches and ensuring compliance with data
protection regulations.
2. Data Quality
 Ensuring the accuracy, completeness, and reliability of data to derive meaningful
insights.
3. Storage and Management
 Handling the immense volume of data requires scalable storage solutions and
efficient data management practices.
4. Processing Speed
 Analyzing data in real-time or near-real-time to provide timely insights poses
significant technical challenges.
5. Talent Gap
 There is a high demand for skilled professionals who can effectively analyze and
interpret Big Data.
6. Integration
 Combining data from disparate sources and ensuring interoperability among
different data systems can be complex.
Technologies Enabling Big Data

 Hadoop: An open-source framework that allows for the distributed processing of large
datasets across clusters of computers.
 Apache Spark: A fast and general-purpose cluster computing system for Big Data
processing.
 NoSQL Databases: Databases like MongoDB and Cassandra designed to handle
unstructured and semi-structured data.
 Machine Learning and AI: Advanced algorithms that can analyze Big Data to identify
patterns and make predictions.
 Cloud Computing: Platforms like Amazon Web Services (AWS), Google Cloud Platform
(GCP), and Microsoft Azure provide scalable storage and processing power for Big Data.
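As a small, concrete illustration of these technologies, the sketch below uses PySpark to count word frequencies across a text file in parallel. It assumes a local Spark installation and a hypothetical input file named input.txt:

```python
from pyspark.sql import SparkSession

# Start a local Spark session (in production this would run on a cluster).
spark = SparkSession.builder.appName("WordCount").getOrCreate()

# Read a text file into an RDD and count word occurrences in parallel.
lines = spark.sparkContext.textFile("input.txt")  # hypothetical input file
counts = (
    lines.flatMap(lambda line: line.split())   # split lines into words
         .map(lambda word: (word.lower(), 1))  # pair each word with a count of 1
         .reduceByKey(lambda a, b: a + b)      # sum the counts per word
)

for word, count in counts.take(10):
    print(word, count)

spark.stop()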

Explain sources of Big Data

Big Data comes from a wide range of sources, both traditional and emerging, contributing to the
vast volume, variety, and velocity of data. These sources are essential in generating the vast
datasets that organizations use for analysis to drive insights and decision-making. Here’s a
breakdown of the key sources of Big Data:

1. Social Media Data

 Description: Social media platforms generate huge amounts of data in the form of posts,
likes, comments, shares, and interactions.
 Examples: Facebook, Twitter, Instagram, LinkedIn, TikTok.
 Data Type: Unstructured and semi-structured (text, images, videos, hashtags, etc.).
 Use Cases: Analyzing customer sentiment, monitoring brand performance, tracking trends,
and targeted advertising.

2. Machine Data (IoT Devices and Sensors)

 Description: Internet of Things (IoT) devices and sensors embedded in machines, vehicles,
and other objects produce continuous streams of data.
 Examples: Smart thermostats, fitness trackers, connected cars, industrial equipment
sensors, smart home devices.
 Data Type: Semi-structured and unstructured (log files, sensor readings, GPS data).
 Use Cases: Predictive maintenance, smart city management, real-time monitoring, and
automation in manufacturing and logistics.

3. Transactional Data

 Description: Data generated from business transactions, such as sales, purchases,
payments, and other financial activities.
 Examples: Point of Sale (POS) systems, online transactions, e-commerce purchases, bank
transfers.
 Data Type: Structured (often stored in relational databases).
 Use Cases: Customer behavior analysis, fraud detection, financial reporting, and inventory
management.

4. Web and Clickstream Data

 Description: Data generated from users' interactions with websites and online services,
capturing every click, visit, or action on a webpage.
 Examples: Website traffic logs, e-commerce behavior, online advertising click-through
rates.
 Data Type: Semi-structured and unstructured (clickstream data, page views, time spent on
site).
 Use Cases: User experience optimization, personalized recommendations, digital
marketing, and customer journey analysis.

5. Email and Text Messages

 Description: Communication data from emails, SMS, and instant messaging platforms
provide large volumes of unstructured data.
 Examples: Gmail, Outlook, Slack, WhatsApp, SMS.
 Data Type: Unstructured (text, attachments, metadata).
 Use Cases: Customer support analysis, communication pattern studies, and spam filtering.

6. Media and Multimedia Data

 Description: Audio, video, and image content from streaming platforms, TV, radio, and
social media contribute to Big Data.
 Examples: YouTube videos, Spotify playlists, images on Instagram, podcasts.
 Data Type: Unstructured (video, audio, images).
 Use Cases: Video content recommendations, image recognition, sentiment analysis from
multimedia, and personalized media suggestions.

7. Mobile Data

 Description: Mobile devices generate large amounts of data from apps, calls, text
messages, and location tracking.
 Examples: App usage data, GPS data, call logs, mobile payments.
 Data Type: Semi-structured and unstructured (geolocation, app logs, user interactions).
 Use Cases: Mobile advertising, location-based services, app optimization, and
personalized mobile experiences.
8. Government and Public Data

 Description: Public records, government reports, surveys, and open data initiatives
contribute to Big Data, providing insights for policy-making and public services.
 Examples: Census data, weather data, traffic data, economic reports.
 Data Type: Structured and semi-structured (spreadsheets, databases, text files).
 Use Cases: Urban planning, healthcare analysis, public service improvement, and policy
formulation.

9. Healthcare Data

 Description: Data from healthcare systems, electronic health records (EHR), medical
imaging, and patient monitoring devices.
 Examples: Hospital records, EHRs, lab results, fitness wearables.
 Data Type: Structured, semi-structured, and unstructured (medical records, patient
histories, images like X-rays or MRIs).
 Use Cases: Personalized medicine, disease outbreak prediction, healthcare optimization,
and patient treatment tracking.

10. Scientific Research Data

 Description: Data generated from research activities, experiments, simulations, and
observations, often in fields like astronomy, biology, and physics.
 Examples: Genomic sequencing data, space telescope readings, climate data.
 Data Type: Structured and unstructured (datasets, simulations, research papers).
 Use Cases: Advancing scientific discoveries, drug development, climate modeling, and
environmental analysis.

11. Financial and Market Data

 Description: Data from financial markets, trading activities, and investment platforms,
providing insights into economic trends.
 Examples: Stock prices, forex trading data, economic indicators, cryptocurrency
transactions.
 Data Type: Structured (numeric and categorical data).
 Use Cases: Stock market analysis, risk management, portfolio optimization, and fraud
detection.

12. Telecommunications Data

 Description: Data from telecommunications services, including call records, data usage,
and internet activity logs.
 Examples: Call detail records (CDRs), internet traffic data, mobile network data.
 Data Type: Semi-structured (logs, timestamps, geolocation).
 Use Cases: Network optimization, customer behavior analysis, targeted promotions, and
service quality improvements.
13. Log and Machine Data

 Description: Logs generated by computer systems, servers, and software applications to
record events and operations.
 Examples: Server logs, application logs, security logs.
 Data Type: Semi-structured and structured (time-stamped logs).
 Use Cases: System monitoring, performance analysis, cybersecurity, and troubleshooting.

Types of Big Data

Big Data can be categorized based on the type and structure of the data, as well as how it is
generated and processed. Broadly, there are three main types of Big Data:

1. Structured Data

 Definition: Structured data refers to data that is highly organized and can easily be stored
in a predefined format, typically in rows and columns (as in databases or spreadsheets). It
is the most straightforward type of data to collect, store, and analyze using traditional
tools.
 Characteristics:
 Organized into fixed fields.
 Easily searchable and manageable using query languages like SQL.
 Examples:
 Customer details in a database (e.g., names, addresses, phone numbers).
 Financial transactions (e.g., sales records, payment details).
 Inventory data (e.g., product codes, quantities).
 Use Cases:
 Customer relationship management (CRM) systems.
 Enterprise Resource Planning (ERP) systems.
 Online transaction processing (OLTP).
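To show how readily structured data can be queried, here is a short, self-contained Python sketch using the standard sqlite3 module; the table and records are hypothetical:

```python
import sqlite3

# Structured data: fixed fields organized into rows and columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Amina", "Nairobi"), (2, "Brian", "Mombasa"), (3, "Chen", "Nairobi")],
)

# SQL makes structured data easily searchable.
for row in conn.execute("SELECT name FROM customers WHERE city = ?", ("Nairobi",)):
    print(row[0])
conn.close()
```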

2. Unstructured Data
 Definition: Unstructured data refers to data that lacks a predefined format or
organization. It cannot easily be stored in traditional databases because it is not organized
into rows and columns. Unstructured data is typically larger in volume and more
complex, requiring specialized tools for processing and analysis.
 Characteristics:
 Unorganized and not easily searchable.
 Includes various formats such as text, images, audio, and video.
 Requires advanced analytics techniques like natural language processing (NLP) or
image recognition for analysis.
 Examples:
 Social media posts (e.g., Facebook updates, tweets, Instagram captions).
 Emails and text messages.
 Multimedia files (e.g., images, videos, audio recordings).
 Documents (e.g., PDFs, Word files).
 Use Cases:
 Sentiment analysis from social media posts.
 Customer support and feedback analysis.
 Video and image recognition in security systems or marketing.

3. Semi-Structured Data
 Definition: Semi-structured data falls between structured and unstructured data. It does
not follow the strict organization of structured data, but it has some organizational
properties, such as tags or markers, that make it easier to categorize and process. Semi-
structured data often comes in formats like JSON or XML.
 Characteristics:
 Contains metadata or markers that provide some level of structure.
 Not as rigid as structured data but more organized than unstructured data.
 Examples:
 XML files and JSON data from APIs.
 Email with metadata (e.g., sender, recipient, timestamp).
 Log files generated by systems or devices.
 Sensor data with timestamps.
 Use Cases:
 Web data extraction from XML and JSON formats.
 Email filtering and classification.
 Log analysis for system monitoring and cybersecurity.
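Because semi-structured formats such as JSON carry their own tags and markers, a few lines of Python suffice to navigate them. The sensor record below is a made-up example:

```python
import json

# Semi-structured: keys give partial structure, but fields can vary per record.
payload = '''
{
  "sensor_id": "t-102",
  "timestamp": "2024-05-01T10:15:00Z",
  "readings": {"temperature": 22.4, "humidity": 41}
}
'''

record = json.loads(payload)
print(record["sensor_id"], record["readings"]["temperature"])
```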

Summary Table

| Type of Data | Description | Examples | Analysis Techniques |
| Structured Data | Highly organized, predefined format (rows and columns). | Relational databases, sales records, customer info. | SQL, basic analytics. |
| Unstructured Data | Unorganized, lacks a predefined format. | Social media posts, emails, videos, images. | NLP, machine learning, AI, image and audio recognition. |
| Semi-Structured Data | Partially organized, uses markers (e.g., tags) to categorize data. | XML, JSON, log files, sensor data. | Tools that process metadata, such as big data tools (Hadoop). |

Explain benefits/importance of Big Data


Big Data offers numerous benefits across various industries by enabling organizations to process
and analyze vast amounts of information to drive decisions, optimize operations, and uncover
insights. Here are the key benefits of Big Data:

1. Enhanced Decision-Making

 Benefit: Big Data analytics helps organizations make more informed, data-driven
decisions based on real-time and historical information.
 How: By analyzing large datasets, businesses can uncover patterns, trends, and correlations
that lead to actionable insights.
 Example: Retailers use customer purchasing history and trends to predict demand, adjust
inventory, and optimize pricing strategies.

2. Improved Operational Efficiency

 Benefit: Big Data allows companies to streamline their operations by identifying
inefficiencies and optimizing resources.
 How: Analyzing operational data such as equipment usage, supply chain logistics, and
employee productivity helps organizations identify areas for improvement.
 Example: Manufacturers use sensor data from machines to perform predictive
maintenance, reducing downtime and repair costs.

3. Personalized Customer Experiences

 Benefit: Big Data enables businesses to offer more personalized products and services by
analyzing customer preferences and behavior.
 How: Companies can use data from customer interactions (e.g., purchase history, browsing
behavior, feedback) to tailor marketing campaigns, product recommendations, and
customer service.
 Example: E-commerce platforms like Amazon and Netflix use Big Data to provide
personalized product or content recommendations based on user activity and preferences.

4. Innovation and Product Development

 Benefit: Big Data helps organizations identify new business opportunities, develop
innovative products, and improve existing services.
 How: By analyzing market trends, customer feedback, and competitor strategies,
companies can identify unmet needs or areas for innovation.
 Example: The automotive industry uses Big Data to develop smart, connected vehicles
with advanced features like autonomous driving and real-time traffic monitoring.

5. Cost Reduction
 Benefit: Big Data analytics helps reduce costs by optimizing operations, reducing waste,
and improving efficiency.
 How: By analyzing data, companies can minimize unnecessary spending, optimize
resource allocation, and improve supply chain management.
 Example: Logistics companies like UPS use Big Data to optimize delivery routes, reduce
fuel consumption, and cut operational costs.

6. Real-Time Monitoring and Insights

 Benefit: Big Data enables real-time monitoring of systems, processes, and customer
behavior, leading to faster response times and immediate action.
 How: With the ability to process and analyze data in real-time, organizations can identify
issues or opportunities and respond immediately.
 Example: Financial institutions use Big Data for real-time fraud detection, identifying
unusual transaction patterns and mitigating risks instantly.

7. Risk Management

 Benefit: Big Data helps organizations assess risks more accurately and proactively manage
them.
 How: Analyzing historical data, external factors, and predictive models enables businesses
to anticipate potential risks and develop strategies to mitigate them.
 Example: Insurance companies use Big Data to assess customer risk profiles, improve
underwriting processes, and prevent fraudulent claims.

8. Competitive Advantage

 Benefit: Organizations that harness Big Data gain a competitive edge by understanding
market trends, customer preferences, and operational efficiency better than their
competitors.
 How: Big Data provides insights into customer behavior, competitor performance, and
market conditions, enabling businesses to adjust strategies quickly and stay ahead.
 Example: Retailers like Walmart use Big Data to monitor and adjust pricing dynamically,
providing better deals and staying competitive in the market.

9. Enhanced Marketing Strategies

 Benefit: Big Data enables companies to refine their marketing strategies by targeting
specific audiences more effectively.
 How: By analyzing customer demographics, purchase history, and online behavior,
businesses can create targeted and personalized marketing campaigns.
 Example: Social media platforms use Big Data to serve personalized ads to users based on
their interests, browsing history, and interactions.

10. Better Healthcare Outcomes


 Benefit: In healthcare, Big Data enhances diagnosis, treatment, and patient care through
predictive analytics, precision medicine, and improved operational efficiency.
 How: Analyzing patient data (medical records, genomic data, wearable device data) helps
healthcare providers offer personalized treatments, detect diseases early, and improve care
quality.
 Example: Hospitals use Big Data analytics to predict disease outbreaks, track patient
outcomes, and optimize treatment protocols.

11. Regulatory Compliance and Reporting

 Benefit: Big Data helps organizations comply with regulations by providing accurate and
comprehensive reporting, reducing the risk of non-compliance.
 How: Companies can automate the process of gathering, analyzing, and reporting data to
meet regulatory standards.
 Example: Financial institutions use Big Data to ensure compliance with anti-money
laundering (AML) and Know Your Customer (KYC) regulations by monitoring and
analyzing transactional data.

12. Supply Chain Optimization

 Benefit: Big Data helps organizations manage and optimize their supply chains, leading to
better inventory management, reduced costs, and improved supplier relationships.
 How: Analyzing data from suppliers, logistics, and sales can improve demand forecasting,
reduce delays, and enhance overall supply chain performance.
 Example: Retailers like Amazon use Big Data to optimize their supply chains, ensuring
fast delivery times and minimizing inventory shortages.

Explain technical requirements of Big Data

The technical requirements for Big Data refer to the infrastructure, tools, and technologies
needed to store, process, manage, and analyze large and complex datasets. Meeting these
requirements ensures that organizations can handle the Volume, Velocity, Variety, and Veracity
of Big Data effectively. Below are the key technical requirements of Big Data:

1. Data Storage Infrastructure

Big Data requires specialized storage systems to accommodate the large volume and diverse
formats of data.

 Distributed File Systems: Big Data often exceeds the capacity of traditional storage
systems. Distributed file systems, like the Hadoop Distributed File System (HDFS), store
data across multiple nodes to manage massive datasets efficiently.
 Example: HDFS (Hadoop Distributed File System).
 Data Lakes: Centralized repositories that store structured and unstructured data in its raw
form, allowing for flexible analysis.
 Example: Google Cloud Storage.
 Cloud Storage: Provides scalable and cost-effective storage for Big Data, enabling
organizations to store large amounts of data without investing in physical infrastructure.
 Example: AWS S3, Google Cloud Storage, Microsoft Azure Blob Storage.

2. Data Processing Frameworks

Processing Big Data requires high-performance frameworks that can manage and analyze large-
scale datasets in parallel across distributed systems.

 Batch Processing: For processing large datasets in bulk. This is suitable for use cases
where real-time processing is not required.
 Example: Apache Hadoop (MapReduce framework).
 Stream Processing: For real-time data processing where insights are needed
immediately.
 Example: Apache Kafka, Apache Storm, Apache Flink.
 In-Memory Processing: For faster data processing by keeping data in memory (RAM)
rather than on disk.
 Example: Apache Spark, Apache Ignite.
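As an example of feeding a stream-processing pipeline, here is a minimal producer sketch using the kafka-python client. The broker address (localhost:9092) and topic name (events) are hypothetical:

```python
import json
from kafka import KafkaProducer

# Connect to a (hypothetical) local Kafka broker.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Stream a small batch of events; real producers run continuously.
for i in range(5):
    producer.send("events", {"event_id": i, "action": "click"})

producer.flush()  # ensure buffered messages are actually delivered
producer.close()
```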

3. Scalable Databases

Big Data requires databases capable of scaling horizontally to accommodate the growing data
volume and diverse data types.

 NoSQL Databases: These databases provide flexibility in handling unstructured and
semi-structured data, unlike traditional relational databases (see the pymongo sketch after this list).
 Example: MongoDB, Cassandra, Couchbase.
 Relational Databases: Some Big Data use cases still require structured data
management, so relational databases are used for specific applications.
 Example: MySQL, PostgreSQL, Oracle DB.
 NewSQL Databases: These provide the scalability of NoSQL databases with the
consistency and transactional support of traditional SQL databases.
 Example: Google Spanner, CockroachDB.
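The following brief sketch stores and queries semi-structured documents in MongoDB using the pymongo driver; it assumes a local MongoDB server, and the database, collection, and documents are made up:

```python
from pymongo import MongoClient

# Connect to a (hypothetical) local MongoDB instance.
client = MongoClient("mongodb://localhost:27017")
collection = client["shop"]["orders"]

# NoSQL: documents in one collection need not share a fixed schema.
collection.insert_one({"order_id": 1, "items": ["book", "pen"], "total": 12.5})
collection.insert_one({"order_id": 2, "customer": {"name": "Amina"}})

# Query by any field, even if some documents lack it.
for doc in collection.find({"total": {"$gt": 10}}):
    print(doc["order_id"])

client.close()
```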

4. Data Integration Tools

To extract, transform, and load (ETL) data from different sources into a central storage or
processing system, Big Data requires powerful data integration tools.

 ETL Tools: Extract, transform, and load tools that automate data extraction from various
sources, transformation into a consistent format, and loading into data warehouses or
lakes.
 Example: Apache NiFi, Talend, Informatica.
 Data Ingestion Tools: Tools to bring in data from multiple sources in real-time or
batches.
 Example: Apache Kafka, Apache Flume, Amazon Kinesis.
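In spirit, every ETL pipeline performs the same three steps: extract, transform, load. The minimal Python sketch below (pandas plus the standard sqlite3 module) walks through them with a hypothetical sales.csv file assumed to contain region and amount columns:

```python
import sqlite3
import pandas as pd

# Extract: read raw data from a source file (hypothetical path and columns).
raw = pd.read_csv("sales.csv")

# Transform: normalize a text column and drop incomplete rows.
raw["region"] = raw["region"].str.strip().str.upper()
clean = raw.dropna(subset=["amount"])

# Load: write the cleaned data into a warehouse table.
with sqlite3.connect("warehouse.db") as conn:
    clean.to_sql("sales", conn, if_exists="replace", index=False)
```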
5. Data Analytics and Visualization Tools

Big Data requires advanced analytics tools to derive insights and make data comprehensible for
decision-makers.

 Machine Learning & AI Tools: To perform advanced analytics such as predictive
modeling, clustering, classification, and natural language processing.
 Example: TensorFlow, Apache Mahout, H2O.ai.
 Visualization Tools: To present data insights in an easily understandable format (graphs,
charts, dashboards).
 Example: Tableau, Power BI, Google Data Studio, Apache Superset.
 Query Tools: To enable users to query Big Data efficiently.
 Example: Hive, Presto, Drill.

6. Data Security and Privacy

Big Data requires robust security measures to protect sensitive information and ensure data
privacy, especially when dealing with large, distributed systems.

 Encryption: Ensures that data is protected at rest (when stored) and in transit (when
moving between systems).
 Example: SSL/TLS for data in transit, AES encryption for data at rest.
 Access Control: Defines who can access the data, ensuring proper authentication and
authorization mechanisms.
 Example: Role-Based Access Control (RBAC), Kerberos, LDAP.
 Data Masking & Anonymization: Protects sensitive data, especially for compliance
with privacy regulations like GDPR or HIPAA.
 Example: Data masking tools, pseudonymization.
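As a small illustration of encrypting data at rest, here is a sketch using the Fernet recipe from the Python cryptography library (Fernet is symmetric encryption built on AES). In practice the key would live in a dedicated key-management service:

```python
from cryptography.fernet import Fernet

# Generate a symmetric key (in practice, store this in a key manager).
key = Fernet.generate_key()
fernet = Fernet(key)

# Encrypt sensitive data before writing it to storage ("at rest").
token = fernet.encrypt(b"customer: Amina, card: 4111-xxxx")

# Decrypt only when an authorized process needs the plaintext.
print(fernet.decrypt(token).decode())
```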

7. Scalability

The infrastructure for Big Data must be scalable to handle increasing data volume and velocity
without compromising performance.

 Horizontal Scalability: Scaling by adding more servers or nodes to handle the data load.
 Cloud Services: Cloud platforms like AWS, Google Cloud, and Microsoft Azure provide
on-demand scalability, allowing organizations to expand or reduce resources as needed.
 Load Balancing: Distributes workloads across multiple servers to ensure no single system
becomes overwhelmed.
 Example: AWS Elastic Load Balancer, Nginx.

8. Data Governance

Ensuring data quality, consistency, and compliance across a vast amount of data is critical.
 Metadata Management: Tools to manage and track metadata, ensuring that data can be
easily understood, accessed, and used correctly.
 Example: Apache Atlas, Talend Data Catalog.
 Data Quality Tools: Ensure the accuracy, completeness, and reliability of data.
 Example: Talend, Informatica, Trifacta.
 Compliance: Managing and enforcing data usage policies to adhere to industry
regulations (GDPR, CCPA, etc.).

9. Data Backup and Disaster Recovery

Big Data systems require comprehensive backup and recovery plans to prevent data loss and
ensure business continuity.

 Backup Systems: Ensures that data is regularly backed up, with the ability to restore it
quickly.
 Example: AWS Backup, Azure Backup, Google Cloud Backup and DR.
 Disaster Recovery: Planning and tools to recover data in case of system failure, ensuring
minimum downtime.
 Example: AWS Disaster Recovery, Google Cloud Disaster Recovery.

10. High-Performance Networking

With the high volume of data flowing in and out of systems, high-speed networks are required
for data transfer and communication between distributed systems.

 Bandwidth: Sufficient bandwidth to handle large-scale data transfers in real-time.


 Low Latency: Minimize delays in data transmission to ensure real-time processing and
analytics.

11. Real-Time Processing Capabilities

For scenarios where instant insights are required (such as financial markets or healthcare
monitoring), real-time data processing is a key requirement.

 Low-Latency Frameworks: Systems that can process data with low latency to ensure
immediate responses.
 Example: Apache Kafka, Apache Flink, Amazon Kinesis.
 Edge Computing: Performing data processing closer to where the data is generated (IoT
devices, sensors) to reduce latency and bandwidth usage.
 Example: AWS IoT, Microsoft Azure IoT Edge.

Summary of Big Data Technical Requirements

| Component | Technical Requirements | Examples |
| Storage | Distributed file systems, cloud storage, data lakes | HDFS, AWS S3, Google Cloud Storage |
| Processing Frameworks | Batch, stream, and in-memory processing frameworks | Hadoop, Apache Spark, Kafka |
| Databases | Scalable, NoSQL, NewSQL, distributed databases | MongoDB, Cassandra, Google Spanner |
| Data Integration | ETL, data ingestion, and real-time data integration tools | Apache NiFi, Kafka, Talend |
| Analytics and Visualization | Advanced analytics, machine learning, visualization tools | TensorFlow, Power BI, Tableau |
| Security | Encryption, access control, data anonymization | SSL/TLS, AES, Kerberos |
| Scalability | Cloud services, horizontal scalability, load balancing | AWS, Azure, Google Cloud |
| Data Governance | Metadata management, data quality tools, compliance | Apache Atlas, Talend Data Catalog |
| Backup & Recovery | Regular backups, disaster recovery | AWS Backup, Google Cloud Disaster Recovery |
| Networking | High-speed, low-latency networks | High bandwidth, edge computing |
| Real-Time Processing | Low-latency processing frameworks, edge computing | Kafka, Apache Flink, AWS IoT |

Explain Big Data infrastructures


Big Data infrastructure refers to the collection of hardware, software, networking, and
storage systems designed to manage, store, and process large volumes of data efficiently. This
infrastructure is essential to support the Volume, Velocity, Variety, and Veracity of Big Data.
Cloud Computing

Cloud computing is a technology model that allows individuals and organizations to access and
use computing resources—such as servers, storage, databases, networking, software, and
analytics—over the internet (the "cloud") instead of owning and managing physical infrastructure
on-premises. These resources are provided on-demand and can be scaled up or down based on the
user's needs, typically through a subscription-based or pay-as-you-go pricing model.

Key Characteristics of Cloud Computing:

1. On-Demand Self-Service: Users can access computing resources as needed, without
requiring direct interaction with the service provider.
2. Broad Network Access: Cloud services are accessible over the internet from a variety of
devices, such as computers, smartphones, and tablets.
3. Resource Pooling: The cloud provider's resources are pooled together to serve multiple
users, with resources dynamically allocated and reallocated based on demand.
4. Rapid Elasticity: Cloud services can be quickly scaled up or down to meet changing
demands.
5. Measured Service: Resource usage is monitored, controlled, and reported, providing
transparency for both the provider and the customer.

Types of Cloud Computing Services:

1. Infrastructure as a Service (IaaS): Provides virtualized computing resources like servers,
storage, and networking. Users manage the infrastructure, including operating systems and
applications.
 Example: Amazon Web Services (AWS), Microsoft Azure, Google Cloud
Platform (GCP).
2. Platform as a Service (PaaS): Offers a platform for developers to build, deploy, and
manage applications without worrying about the underlying infrastructure.
 Example: Google App Engine, Heroku, Microsoft Azure App Services.
3. Software as a Service (SaaS): Delivers software applications over the internet, eliminating
the need for users to install and maintain the software locally.
 Example: Google Workspace, Microsoft 365, Salesforce.
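To show what consuming cloud resources programmatically looks like, the sketch below uses AWS's boto3 SDK to upload and list objects in S3 storage. The bucket name and file are hypothetical, and boto3 is assumed to find credentials in the environment:

```python
import boto3

# Create an S3 client; boto3 picks up credentials from the environment
# or from the AWS configuration files.
s3 = boto3.client("s3")

# Upload a local file into a (hypothetical) bucket: storage on demand,
# with no physical infrastructure to provision or maintain.
s3.upload_file("report.pdf", "example-company-data", "reports/report.pdf")

# List what is stored under the same prefix.
response = s3.list_objects_v2(Bucket="example-company-data", Prefix="reports/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```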
Cloud Computing Deployment Model
A cloud deployment model describes how a cloud environment is arranged: who owns and
operates the infrastructure, who may access it, and at what scale.

The main deployment models are:

1. Public Cloud: In this model, cloud resources like servers and storage are owned and
operated by a third-party cloud service provider (e.g., AWS, Microsoft Azure, and
Google Cloud) and delivered over the internet. It is a cost-effective solution since the
infrastructure is shared across multiple organizations, but it offers less control over data
and infrastructure, primarily because the underlying hardware, network, and sometimes
even the software stack are owned, managed, and maintained by the cloud service
provider (CSP).
 Advantages: Low cost, no maintenance, high scalability, and flexibility.
 Disadvantages: Limited control, potential security concerns, and less
customization.
2. Private Cloud: This model provides a dedicated environment exclusively for a single
organization. Private clouds can be hosted on-premises or by a third-party provider,
and they offer greater control, security, and customization options, making them
suitable for organizations with strict data privacy and security requirements.
 Advantages: Enhanced security, greater control, and customization.
 Disadvantages: Higher cost, maintenance, and less scalability compared to public
clouds.
3. Hybrid Cloud: A hybrid cloud combines elements of both public and private clouds,
allowing data and applications to be shared between them. This model is beneficial for
organizations that need flexibility in handling data and workloads, as they can keep
sensitive data on the private cloud while using the public cloud for less critical workloads.
 Advantages: Flexibility, scalability, cost-effectiveness, and improved security for
sensitive data.
 Disadvantages: Complex management and integration, potential security risks if
not managed properly.
4. Community Cloud: In this model, multiple organizations with similar goals, security, and
compliance requirements share infrastructure and resources. Community clouds are
typically managed by a third-party provider or one of the participating organizations,
offering a mix of the cost-effectiveness of public clouds with increased security controls
for shared data.
 Advantages: Shared infrastructure costs, better collaboration, and security tailored
to a specific community’s needs.
 Disadvantages: Limited control and scalability compared to private clouds, and
potential data security concerns among participants.
These models allow organizations to choose a cloud approach based on their security needs,
budget, and scalability requirements. Hybrid and multi-cloud approaches have also become
popular, allowing organizations to maximize the benefits of each model.

Benefits of Cloud Computing:

 Cost Efficiency: Reduces the need for heavy upfront investments in hardware and
software.
 Scalability: Offers flexible scaling based on demand without needing to invest in
additional infrastructure.
 Disaster Recovery: Cloud providers offer data backup and disaster recovery solutions,
enhancing business continuity.
 Accessibility: Cloud resources can be accessed from anywhere with an internet connection,
promoting remote work and collaboration.
 Automatic Updates: Cloud providers handle software updates, security patches, and
infrastructure maintenance.

Cloud computing has transformed the way businesses and individuals access and use technology,
enabling greater agility, scalability, and cost efficiency in managing IT resources.

Explain virtualization in cloud computing

Virtualization is a core technology that underpins cloud computing, allowing for efficient
resource utilization and management. It involves creating virtual versions of physical resources,
enabling multiple operating systems or applications to run on a single physical machine.
Here’s a comprehensive explanation of virtualization in the context of cloud
computing:

1. Definition of Virtualization

Virtualization is the process of abstracting physical hardware resources to create virtual
instances, enabling users to run multiple operating systems (OS) and applications on a single
physical server. This abstraction allows for better resource allocation, isolation, and management.

2. How Virtualization Works

 Hypervisor: Virtualization relies on a software layer known as a hypervisor, which sits
between the hardware and the operating systems. The hypervisor allocates physical
resources (CPU, memory, storage, and network) to multiple virtual machines (VMs) and
manages their execution.
 Type 1 Hypervisor (Bare-Metal): Runs directly on the hardware and manages the
guest operating systems. It offers high performance and is typically used in
enterprise environments.
 Examples: VMware ESXi, Microsoft Hyper-V, Xen.
 Type 2 Hypervisor (Hosted): Runs on top of an existing operating system. While
easier to set up and manage, it may offer lower performance compared to Type 1
hypervisors.
 Examples: VMware Workstation, Oracle VirtualBox.
 Virtual Machines (VMs): Each VM operates as a separate computer with its own OS,
applications, and resources, even though they share the same physical hardware. This
isolation ensures that issues in one VM do not affect others.

3. Benefits of Virtualization in Cloud Computing

 Resource Optimization: Virtualization allows multiple VMs to share the same physical
resources, leading to better utilization of hardware. This reduces the need for additional
physical servers, lowering costs.
 Scalability: Virtualization enables rapid provisioning and de-provisioning of VMs,
allowing cloud services to scale up or down quickly based on demand.
 Isolation: Each VM operates in its own environment, ensuring that applications and
processes running on one VM do not interfere with others. This enhances security and
stability.
 Flexibility and Agility: Virtualization allows for the quick deployment of new services
and applications. Organizations can test new applications in isolated VMs before deploying
them in a production environment.
 Disaster Recovery and Backup: Virtualized environments can be easily backed up and
restored, facilitating disaster recovery processes. Snapshots can be taken to preserve the
state of a VM at a specific point in time.
 Cost Efficiency: By consolidating multiple VMs on fewer physical servers, organizations
can reduce hardware costs, energy consumption, and maintenance efforts.

4. Challenges of Virtualization

While virtualization offers numerous benefits, it also presents some challenges:

 Performance Overhead: Running multiple VMs on a single physical server may lead to
performance degradation if the hardware resources are not properly managed.
 Complex Management: Managing a virtualized environment can be complex, requiring
specialized tools and expertise to monitor and optimize performance.
 Security Risks: Although VMs are isolated, vulnerabilities in the hypervisor or
misconfigured settings can lead to security risks, making proper security measures
essential.

Types of Virtualization

Virtualization is a technology that allows multiple virtual instances of operating systems,
applications, and resources to run on a single physical machine. There are several types of
virtualization, each serving specific purposes and use cases. Here's an overview of the main
types of virtualization:

1. Server Virtualization

 Description: Server virtualization divides a physical server into multiple virtual servers
(virtual machines or VMs), each running its own operating system and applications.
 Purpose: Maximizes resource utilization, reduces hardware costs, and simplifies
management.
 Examples: VMware vSphere, Microsoft Hyper-V, Oracle VM.

2. Desktop Virtualization

 Description: Desktop virtualization allows users to run a virtual desktop environment on
a remote server instead of on a local machine. Users access their desktop and applications
via a client device.
 Purpose: Enables centralized management of desktops, enhances security, and supports
remote work.
 Examples: VMware Horizon, Citrix Virtual Apps and Desktops, Microsoft Azure Virtual
Desktop.

3. Application Virtualization
 Description: Application virtualization allows applications to run in isolated
environments without being installed directly on the operating system. This can be
achieved through streaming or encapsulation.
 Purpose: Simplifies application deployment and management, enhances compatibility,
and reduces conflicts between applications.
 Examples: Microsoft App-V, Citrix Application Streaming.

4. Storage Virtualization

 Description: Storage virtualization abstracts and combines multiple physical storage
devices into a single virtual storage unit. This simplifies storage management and allows
for easier allocation of storage resources.
 Purpose: Enhances storage efficiency, improves data accessibility, and simplifies backup
and recovery processes.
 Examples: VMware vSAN, IBM Spectrum Virtualize.

5. Network Virtualization

 Description: Network virtualization abstracts and combines network resources to create
virtual networks that operate independently of the underlying physical network
infrastructure.
 Purpose: Enhances flexibility, scalability, and management of network resources,
allowing for the creation of isolated networks for different applications or services.
 Examples: VMware NSX, Cisco ACI (Application Centric Infrastructure).

6. Data Virtualization

 Description: Data virtualization enables access to data from multiple sources without the
need for physical data movement or replication. It presents a unified view of data,
regardless of where it resides.
 Purpose: Simplifies data access, improves data integration, and enhances real-time
analytics.
 Examples: Denodo, IBM Cloud Pak for Data.

7. Hardware Virtualization

 Description: Hardware virtualization involves creating virtual versions of physical
hardware components, such as CPUs and GPUs. This allows multiple operating systems
to share the same hardware resources.
 Purpose: Enhances resource utilization and simplifies the deployment of multiple
operating systems on a single physical machine.
 Examples: Intel VT (Virtualization Technology), AMD-V.

8. OS-Level Virtualization (Containerization)


 Description: OS-level virtualization allows multiple isolated user-space instances
(containers) to run on a single operating system kernel. Unlike VMs, containers share the
same OS kernel but are isolated from each other.
 Purpose: Provides lightweight and efficient application deployment, rapid scaling, and
resource efficiency.
 Examples: Docker, Kubernetes, LXC (Linux Containers).
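As an illustration of containers in practice, here is a minimal sketch using the Docker SDK for Python (docker-py) to run a throwaway container. It assumes a local Docker daemon and the public alpine image:

```python
import docker

# Connect to the local Docker daemon (assumed to be running).
client = docker.from_env()

# Run a command inside an isolated container. The container shares the
# host's OS kernel but has its own filesystem and process space.
output = client.containers.run("alpine", "echo hello from a container", remove=True)
print(output.decode())
```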

9. Hybrid Virtualization

 Description: Combines elements of various virtualization types to create a flexible
environment. For example, it can mix server and application virtualization within the
same infrastructure.
 Purpose: Leverages the advantages of different virtualization technologies to optimize
performance and resource utilization.
 Examples: Cloud services that offer both VM-based and container-based environments.
