UNIT II - Emerging Technology
A Service Delivery Platform (SDP) provides the structure for delivering services, including controls
for service sessions and the protocols used to access them. The quality of IT service delivery is
gauged by metrics defined in a service-level agreement (SLA).
Examples of Information Technology Services and their Platforms used for service Delivery
Cloud Services (Platform: Amazon Web Services (AWS), Google Cloud Platform (GCP),
Microsoft Azure).
Cloud services are infrastructure, platforms, or software that are hosted by third-party
providers and made available to users through the internet.
Backup & Disaster Recovery (Platform: IBM Spectrum Protect).
Data backup is the process of replicating files to be stored at a designated location.
Disaster recovery is a system that helps restore those files following a catastrophe.
Help Desk Support (Platform: Zendesk, HappyFox, Help Scout, SolarWinds Service
Desk, JIRA Service Management, Salesforce Service Cloud, SysAid, Vivantio, Zoho Desk,
Freshdesk).
A help desk is the individual, group, organizational function or external service that
an IT user calls to get help with a problem.
Computer Training (Platform: TalentLMS, EdApp, Looop, Tovuti, BRIDGE,
Classe365, Auzmor Learn, Schoox, 360Learning).
Computer training is instruction provided to enhance an individual's ability to use
computers for learning and day-to-day functioning.
IT Consulting (Platform: Catalant, MeasureMatch, Zintro, TopTal, GuidePoint, Maven,
DeepBench).
IT consulting services are advisory services that help clients assess different technology
strategies and align them with their business or process strategies.
Big Data
Big Data refers to datasets so large, fast-growing, and varied that traditional data-processing tools
cannot capture, store, or analyze them efficiently. It is commonly characterized by five dimensions,
often called the 5 Vs:
Volume: The sheer amount of data generated is massive, often measured in terabytes,
petabytes, or even exabytes. Volume refers to how much data is created and stored, requiring
robust storage and processing capacities.
Velocity: The speed at which data is generated and processed. Many applications, like social
media or IoT devices, produce data in real-time, necessitating systems that can handle rapid data
inflows and deliver insights on-the-fly.
Variety: Big Data comes from a variety of sources and exists in multiple formats, such as
structured, semi-structured, and unstructured data. Examples include text, images, audio, and
sensor data, demanding flexible tools to manage different formats.
Veracity: The reliability and accuracy of data. Big Data often has inconsistencies or
uncertainties due to its varied sources. Veracity emphasizes the need for cleaning and validating
data to ensure high-quality insights (a short cleaning sketch follows this list).
Value: The ultimate goal of Big Data is to extract valuable insights that drive
decision-making. It's critical to determine whether the data will contribute meaningful information to
support business and operational goals.
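To make the Veracity point concrete, here is a minimal sketch (assuming the pandas library is installed; the records are made up) of the kind of cleaning and validation step that turns raw, inconsistent data into something analyzable:

```python
# A minimal sketch (assumes the pandas library is installed) of the kind of
# cleaning and validation that the Veracity dimension calls for.
import pandas as pd

# Hypothetical raw records with duplicates and missing values.
raw = pd.DataFrame({
    "customer_id": [101, 102, 102, 103, 104],
    "age":         [34, None, None, 29, 41],
    "purchase":    [250.0, 99.5, 99.5, None, 310.0],
})

clean = (
    raw.drop_duplicates()                      # remove repeated records
       .dropna(subset=["purchase"])            # discard rows with no purchase amount
       .assign(age=lambda df: df["age"].fillna(df["age"].median()))  # impute missing ages
)

print(clean)
```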
Applications of Big Data
1. Healthcare
Example: Analyzing patient data to predict disease outbreaks, personalize
treatment plans, and improve patient outcomes.
2. Finance
Example: Detecting fraudulent transactions, assessing credit risks, and optimizing
investment strategies through real-time data analysis.
3. Retail
Example: Enhancing customer experiences by personalizing recommendations,
managing inventory efficiently, and optimizing supply chains.
4. Manufacturing
Example: Predictive maintenance of machinery, improving product quality, and
optimizing production processes through data analytics.
5. Transportation and Logistics
Example: Optimizing routes, managing fleet operations, and improving delivery
times by analyzing traffic patterns and shipment data.
6. Energy
Example: Monitoring energy consumption, predicting equipment failures, and
optimizing energy distribution using data from smart grids and sensors.
7. Government and Public Sector
Example: Enhancing public services, improving urban planning, and ensuring
public safety through the analysis of large-scale data from various sources.
Big Data Technologies
Hadoop: An open-source framework that allows for the distributed processing of large
datasets across clusters of computers.
Apache Spark: A fast and general-purpose cluster computing system for Big Data
processing (a word-count sketch follows this list).
NoSQL Databases: Databases like MongoDB and Cassandra designed to handle
unstructured and semi-structured data.
Machine Learning and AI: Advanced algorithms that can analyze Big Data to identify
patterns and make predictions.
Cloud Computing: Platforms like Amazon Web Services (AWS), Google Cloud Platform
(GCP), and Microsoft Azure provide scalable storage and processing power for Big Data.
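As a flavor of how these frameworks are used, here is a minimal word-count sketch with Apache Spark, assuming PySpark is installed and "sample.txt" is a placeholder for any large text file:

```python
# A minimal sketch (assumes PySpark is installed and a local Spark runtime is
# available) of distributed processing with Apache Spark: a word count over a
# text file, the classic "hello world" of Big Data frameworks.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WordCount").getOrCreate()

# "sample.txt" is a placeholder path; any large text file would do.
lines = spark.read.text("sample.txt").rdd.map(lambda row: row[0])

counts = (
    lines.flatMap(lambda line: line.split())      # split lines into words
         .map(lambda word: (word, 1))             # pair each word with a count of 1
         .reduceByKey(lambda a, b: a + b)         # sum counts per word across partitions
)

for word, count in counts.take(10):
    print(word, count)

spark.stop()
```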
Sources of Big Data
Big Data comes from a wide range of sources, both traditional and emerging, contributing to the
volume, variety, and velocity of data. These sources generate the datasets that organizations
analyze to drive insights and decision-making. Here's a breakdown of the key sources of Big Data:
1. Social Media Data
Description: Social media platforms generate huge amounts of data in the form of posts,
likes, comments, shares, and interactions.
Examples: Facebook, Twitter, Instagram, LinkedIn, TikTok.
Data Type: Unstructured and semi-structured (text, images, videos, hashtags, etc.).
Use Cases: Analyzing customer sentiment, monitoring brand performance, tracking trends,
and targeted advertising.
2. IoT and Sensor Data
Description: Internet of Things (IoT) devices and sensors embedded in machines, vehicles,
and other objects produce continuous streams of data.
Examples: Smart thermostats, fitness trackers, connected cars, industrial equipment
sensors, smart home devices.
Data Type: Semi-structured and unstructured (log files, sensor readings, GPS data).
Use Cases: Predictive maintenance, smart city management, real-time monitoring, and
automation in manufacturing and logistics.
3. Transactional Data
Description: Data generated from users' interactions with websites and online services,
capturing every click, visit, or action on a webpage.
Examples: Website traffic logs, e-commerce behavior, online advertising click-through
rates.
Data Type: Semi-structured and unstructured (clickstream data, page views, time spent on
site).
Use Cases: User experience optimization, personalized recommendations, digital
marketing, and customer journey analysis.
Description: Communication data from emails, SMS, and instant messaging platforms
provide large volumes of unstructured data.
Examples: Gmail, Outlook, Slack, WhatsApp, SMS.
Data Type: Unstructured (text, attachments, metadata).
Use Cases: Customer support analysis, communication pattern studies, and spam filtering.
Description: Audio, video, and image content from streaming platforms, TV, radio, and
social media contribute to Big Data.
Examples: YouTube videos, Spotify playlists, images on Instagram, podcasts.
Data Type: Unstructured (video, audio, images).
Use Cases: Video content recommendations, image recognition, sentiment analysis from
multimedia, and personalized media suggestions.
7. Mobile Data
Description: Mobile devices generate large amounts of data from apps, calls, text
messages, and location tracking.
Examples: App usage data, GPS data, call logs, mobile payments.
Data Type: Semi-structured and unstructured (geolocation, app logs, user interactions).
Use Cases: Mobile advertising, location-based services, app optimization, and
personalized mobile experiences.
8. Government and Public Data
Description: Public records, government reports, surveys, and open data initiatives
contribute to Big Data, providing insights for policy-making and public services.
Examples: Census data, weather data, traffic data, economic reports.
Data Type: Structured and semi-structured (spreadsheets, databases, text files).
Use Cases: Urban planning, healthcare analysis, public service improvement, and policy
formulation.
9. Healthcare Data
Description: Data from healthcare systems, electronic health records (EHR), medical
imaging, and patient monitoring devices.
Examples: Hospital records, EHRs, lab results, fitness wearables.
Data Type: Structured, semi-structured, and unstructured (medical records, patient
histories, images like X-rays or MRIs).
Use Cases: Personalized medicine, disease outbreak prediction, healthcare optimization,
and patient treatment tracking.
Description: Data from financial markets, trading activities, and investment platforms,
providing insights into economic trends.
Examples: Stock prices, forex trading data, economic indicators, cryptocurrency
transactions.
Data Type: Structured (numeric and categorical data).
Use Cases: Stock market analysis, risk management, portfolio optimization, and fraud
detection.
Description: Data from telecommunications services, including call records, data usage,
and internet activity logs.
Examples: Call detail records (CDRs), internet traffic data, mobile network data.
Data Type: Semi-structured (logs, timestamps, geolocation).
Use Cases: Network optimization, customer behavior analysis, targeted promotions, and
service quality improvements.
13. Log and Machine Data
Description: Logs generated automatically by servers, applications, networks, and security
systems record events and system activity.
Examples: Web server logs, application error logs, firewall and security logs.
Data Type: Semi-structured (timestamped event records).
Use Cases: System monitoring, troubleshooting, anomaly detection, and cybersecurity analysis.
Types of Big Data
Big Data can be categorized based on the type and structure of the data, as well as how it is
generated and processed. Broadly, there are three main types of Big Data:
1. Structured Data
Definition: Structured data refers to data that is highly organized and can easily be stored
in a predefined format, typically in rows and columns (as in databases or spreadsheets). It
is the most straightforward type of data to collect, store, and analyze using traditional
tools.
Characteristics:
Organized into fixed fields.
Easily searchable and manageable using query languages like SQL (see the sketch after this subsection).
Examples:
Customer details in a database (e.g., names, addresses, phone numbers).
Financial transactions (e.g., sales records, payment details).
Inventory data (e.g., product codes, quantities).
Use Cases:
Customer relationship management (CRM) systems.
Enterprise Resource Planning (ERP) systems.
Online transaction processing (OLTP).
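A minimal sketch of structured data, using Python's built-in sqlite3 module (the table and rows are invented), showing how fixed columns make the data directly queryable with SQL:

```python
# A minimal sketch (using Python's built-in sqlite3 module) of structured data:
# fixed columns, and retrieval with a SQL query. The table and rows are made up.
import sqlite3

conn = sqlite3.connect(":memory:")          # throwaway in-memory database
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT, total_spent REAL)"
)
conn.executemany(
    "INSERT INTO customers (name, city, total_spent) VALUES (?, ?, ?)",
    [("Asha", "Chennai", 1200.0), ("Ravi", "Mumbai", 450.0), ("Meena", "Chennai", 980.0)],
)

# Structured data is easily searchable with SQL.
for row in conn.execute(
    "SELECT city, COUNT(*), SUM(total_spent) FROM customers GROUP BY city"
):
    print(row)

conn.close()
```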
2. Unstructured Data
Definition: Unstructured data refers to data that lacks a predefined format or
organization. It cannot easily be stored in traditional databases because it is not organized
into rows and columns. Unstructured data is typically larger in volume and more
complex, requiring specialized tools for processing and analysis.
Characteristics:
Unorganized and not easily searchable.
Includes various formats such as text, images, audio, and video.
Requires advanced analytics techniques like natural language processing (NLP) or
image recognition for analysis (a toy sketch follows this subsection).
Examples:
Social media posts (e.g., Facebook updates, tweets, Instagram captions).
Emails and text messages.
Multimedia files (e.g., images, videos, audio recordings).
Documents (e.g., PDFs, Word files).
Use Cases:
Sentiment analysis from social media posts.
Customer support and feedback analysis.
Video and image recognition in security systems or marketing.
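As a toy illustration of why unstructured text needs processing before it yields insight, the following sketch scores made-up social media posts with a naive keyword approach; production systems would use NLP libraries or trained models instead:

```python
# A toy illustration (made-up posts and word lists) of why unstructured text
# needs processing before it yields insight: a naive keyword-based sentiment
# score for social media posts. Real systems would use NLP libraries or
# machine learning models instead.
posts = [
    "Loved the new phone, the camera is amazing!",
    "Terrible battery life, very disappointed.",
    "Delivery was fast and the support team was helpful.",
]

positive = {"loved", "amazing", "fast", "helpful", "great"}
negative = {"terrible", "disappointed", "slow", "broken"}

for post in posts:
    words = {w.strip(".,!?").lower() for w in post.split()}
    score = len(words & positive) - len(words & negative)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    print(f"{label:>8}: {post}")
```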
3. Semi-Structured Data
Definition: Semi-structured data falls between structured and unstructured data. It does
not follow the strict organization of structured data, but it has some organizational
properties, such as tags or markers, that make it easier to categorize and process.
Semi-structured data often comes in formats like JSON or XML.
Characteristics:
Contains metadata or markers that provide some level of structure.
Not as rigid as structured data but more organized than unstructured data.
Examples:
XML files and JSON data from APIs (see the sketch after this subsection).
Email with metadata (e.g., sender, recipient, timestamp).
Log files generated by systems or devices.
Sensor data with timestamps.
Use Cases:
Web data extraction from XML and JSON formats.
Email filtering and classification.
Log analysis for system monitoring and cybersecurity.
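A minimal sketch of semi-structured data, using Python's standard json module on an invented API payload, showing how the tags (keys) give the record enough structure to navigate:

```python
# A minimal sketch (standard-library json module; the payload is made up) of
# semi-structured data: a JSON record whose tags give it partial structure.
import json

api_payload = """
{
  "user": {"id": 42, "name": "Priya"},
  "event": "purchase",
  "timestamp": "2024-03-15T10:24:00Z",
  "items": [
    {"sku": "A-101", "qty": 2},
    {"sku": "B-207", "qty": 1}
  ]
}
"""

record = json.loads(api_payload)

# The markers (keys) let us navigate the data even though it is not tabular.
print(record["user"]["name"], record["event"], record["timestamp"])
total_items = sum(item["qty"] for item in record["items"])
print("items purchased:", total_items)
```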
Summary Table
Type of Data | Description | Examples | Analysis Techniques
Structured Data | Highly organized, predefined format (rows and columns). | Relational databases, sales records, customer info. | SQL, basic analytics.
Unstructured Data | Unorganized, lacks a predefined format. | Social media posts, emails, videos, images. | NLP, machine learning, AI, image and audio recognition.
Semi-Structured Data | Partially organized, uses markers (e.g., tags) to categorize data. | XML, JSON, log files, sensor data. | Tools that process metadata, such as big data tools (Hadoop).
Benefits of Big Data
1. Enhanced Decision-Making
Benefit: Big Data analytics helps organizations make more informed, data-driven
decisions based on real-time and historical information.
How: By analyzing large datasets, businesses can uncover patterns, trends, and correlations
that lead to actionable insights.
Example: Retailers use customer purchasing history and trends to predict demand, adjust
inventory, and optimize pricing strategies.
Benefit: Big Data enables businesses to offer more personalized products and services by
analyzing customer preferences and behavior.
How: Companies can use data from customer interactions (e.g., purchase history, browsing
behavior, feedback) to tailor marketing campaigns, product recommendations, and
customer service.
Example: E-commerce platforms like Amazon and Netflix use Big Data to provide
personalized product or content recommendations based on user activity and preferences.
Benefit: Big Data helps organizations identify new business opportunities, develop
innovative products, and improve existing services.
How: By analyzing market trends, customer feedback, and competitor strategies,
companies can identify unmet needs or areas for innovation.
Example: The automotive industry uses Big Data to develop smart, connected vehicles
with advanced features like autonomous driving and real-time traffic monitoring.
5. Cost Reduction
Benefit: Big Data analytics helps reduce costs by optimizing operations, reducing waste,
and improving efficiency.
How: By analyzing data, companies can minimize unnecessary spending, optimize
resource allocation, and improve supply chain management.
Example: Logistics companies like UPS use Big Data to optimize delivery routes, reduce
fuel consumption, and cut operational costs.
6. Real-Time Monitoring and Response
Benefit: Big Data enables real-time monitoring of systems, processes, and customer
behavior, leading to faster response times and immediate action.
How: With the ability to process and analyze data in real-time, organizations can identify
issues or opportunities and respond immediately.
Example: Financial institutions use Big Data for real-time fraud detection, identifying
unusual transaction patterns and mitigating risks instantly.
7. Risk Management
Benefit: Big Data helps organizations assess risks more accurately and proactively manage
them.
How: Analyzing historical data, external factors, and predictive models enables businesses
to anticipate potential risks and develop strategies to mitigate them.
Example: Insurance companies use Big Data to assess customer risk profiles, improve
underwriting processes, and prevent fraudulent claims.
8. Competitive Advantage
Benefit: Organizations that harness Big Data gain a competitive edge by understanding
market trends, customer preferences, and operational efficiency better than their
competitors.
How: Big Data provides insights into customer behavior, competitor performance, and
market conditions, enabling businesses to adjust strategies quickly and stay ahead.
Example: Retailers like Walmart use Big Data to monitor and adjust pricing dynamically,
providing better deals and staying competitive in the market.
9. Targeted Marketing
Benefit: Big Data enables companies to refine their marketing strategies by targeting
specific audiences more effectively.
How: By analyzing customer demographics, purchase history, and online behavior,
businesses can create targeted and personalized marketing campaigns.
Example: Social media platforms use Big Data to serve personalized ads to users based on
their interests, browsing history, and interactions.
10. Regulatory Compliance and Reporting
Benefit: Big Data helps organizations comply with regulations by providing accurate and
comprehensive reporting, reducing the risk of non-compliance.
How: Companies can automate the process of gathering, analyzing, and reporting data to
meet regulatory standards.
Example: Financial institutions use Big Data to ensure compliance with anti-money
laundering (AML) and Know Your Customer (KYC) regulations by monitoring and
analyzing transactional data.
11. Supply Chain Optimization
Benefit: Big Data helps organizations manage and optimize their supply chains, leading to
better inventory management, reduced costs, and improved supplier relationships.
How: Analyzing data from suppliers, logistics, and sales can improve demand forecasting,
reduce delays, and enhance overall supply chain performance.
Example: Retailers like Amazon use Big Data to optimize their supply chains, ensuring
fast delivery times and minimizing inventory shortages.
Technical Requirements of Big Data
The technical requirements for Big Data refer to the infrastructure, tools, and technologies
needed to store, process, manage, and analyze large and complex datasets. Meeting these
requirements ensures that organizations can handle the Volume, Velocity, Variety, and Veracity
of Big Data effectively. Below are the key technical requirements of Big Data:
1. Data Storage
Big Data requires specialized storage systems to accommodate the large volume and diverse
formats of data.
Distributed File Systems: Big Data often exceeds the capacity of traditional storage
systems. Distributed file systems, like the Hadoop Distributed File System (HDFS), store
data across multiple nodes to manage massive datasets efficiently.
Example: HDFS (Hadoop Distributed File System).
Data Lakes: Centralized repositories that store structured and unstructured data in its raw
form, allowing for flexible analysis.
Example: Google Cloud Storage.
Cloud Storage: Provides scalable and cost-effective storage for Big Data, enabling
organizations to store large amounts of data without investing in physical infrastructure
(see the sketch after this list).
Example: AWS S3, Google Cloud Storage, Microsoft Azure Blob Storage.
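As an illustration of cloud object storage, here is a minimal sketch using the boto3 library (assuming it is installed and AWS credentials are configured; the bucket and file names are placeholders):

```python
# A minimal sketch (assumes the boto3 library is installed and AWS credentials
# are configured; the bucket and file names are placeholders) of loading data
# into cloud object storage such as Amazon S3.
import boto3

s3 = boto3.client("s3")

# Upload a local dataset to a bucket for later distributed processing.
s3.upload_file("daily_sales.csv", "my-data-lake-bucket", "raw/2024/daily_sales.csv")

# List what is stored under the raw/ prefix.
response = s3.list_objects_v2(Bucket="my-data-lake-bucket", Prefix="raw/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```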
2. Data Processing Frameworks
Processing Big Data requires high-performance frameworks that can manage and analyze
large-scale datasets in parallel across distributed systems.
Batch Processing: For processing large datasets in bulk. This is suitable for use cases
where real-time processing is not required.
Example: Apache Hadoop (MapReduce framework).
Stream Processing: For real-time data processing where insights are needed
immediately (a streaming sketch follows this list).
Example: Apache Kafka, Apache Storm, Apache Flink.
In-Memory Processing: For faster data processing by keeping data in memory (RAM)
rather than on disk.
Example: Apache Spark, Apache Ignite.
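To illustrate stream processing, here is a minimal sketch with the kafka-python package (assuming it is installed and a Kafka broker is running at localhost:9092; the topic and field names are placeholders):

```python
# A minimal sketch (assumes the kafka-python package is installed and a Kafka
# broker is running at localhost:9092; the topic name is a placeholder) of
# stream processing: one process publishes events, another consumes them as
# they arrive.
import json
from kafka import KafkaProducer, KafkaConsumer

# Producer side: publish sensor readings as they are generated.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("sensor-readings", {"sensor": "s1", "temp": 21.7})
producer.flush()

# Consumer side: process each event with low latency.
consumer = KafkaConsumer(
    "sensor-readings",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,          # stop iterating if no new messages arrive
)
for message in consumer:
    reading = json.loads(message.value)
    if reading["temp"] > 30:
        print("ALERT: high temperature from", reading["sensor"])
```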
3. Scalable Databases
Big Data requires databases capable of scaling horizontally to accommodate the growing data
volume and diverse data types.
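As a brief illustration, here is a minimal sketch of a NoSQL document database using the pymongo driver (assuming it is installed and a MongoDB server is running locally; the database and field names are invented):

```python
# A minimal sketch (assumes the pymongo driver is installed and a MongoDB
# server is running locally; database and field names are made up) of a
# horizontally scalable NoSQL document store.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
orders = client["shop"]["orders"]

# Documents need no fixed schema: each record can carry different fields.
orders.insert_one({"customer": "Asha", "items": [{"sku": "A-101", "qty": 2}], "status": "shipped"})
orders.insert_one({"customer": "Ravi", "items": [{"sku": "B-207", "qty": 1}], "coupon": "WELCOME10"})

# Query by field value, much like a WHERE clause in SQL.
for doc in orders.find({"status": "shipped"}):
    print(doc["customer"], doc["items"])
```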
4. Data Integration Tools
To extract, transform, and load (ETL) data from different sources into a central storage or
processing system, Big Data requires powerful data integration tools.
ETL Tools: Extract, transform, and load tools that automate data extraction from various
sources, transformation into a consistent format, and loading into data warehouses or
lakes (a small ETL sketch follows this list).
Example: Apache NiFi, Talend, Informatica.
Data Ingestion Tools: Tools to bring in data from multiple sources in real-time or
batches.
Example: Apache Kafka, Apache Flume, Amazon Kinesis.
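To make the ETL idea concrete, here is a tiny sketch using only Python's standard csv and sqlite3 modules (the file and table names are placeholders): extract raw rows, transform them, and load them into a queryable store.

```python
# A minimal sketch (standard-library csv and sqlite3 modules; file and table
# names are made up) of a tiny extract-transform-load (ETL) step: read raw
# CSV records, clean them, and load them into a queryable store.
import csv
import sqlite3

# Extract: read raw rows from a source file ("sales_raw.csv" is a placeholder).
with open("sales_raw.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Transform: normalize fields and drop rows with missing amounts.
cleaned = [
    (r["order_id"], r["region"].strip().upper(), float(r["amount"]))
    for r in rows
    if r.get("amount")
]

# Load: write the cleaned records into a warehouse-style table.
conn = sqlite3.connect("warehouse.db")
conn.execute("CREATE TABLE IF NOT EXISTS sales (order_id TEXT, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", cleaned)
conn.commit()
conn.close()
```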
5. Data Analytics and Visualization Tools
Big Data requires advanced analytics tools to derive insights and make data comprehensible for
decision-makers.
6. Data Security and Privacy
Big Data requires robust security measures to protect sensitive information and ensure data
privacy, especially when dealing with large, distributed systems.
Encryption: Ensures that data is protected at rest (when stored) and in transit (when
moving between systems); a short encryption sketch follows this list.
Example: SSL/TLS for data in transit, AES encryption for data at rest.
Access Control: Defines who can access the data, ensuring proper authentication and
authorization mechanisms.
Example: Role-Based Access Control (RBAC), Kerberos, LDAP.
Data Masking & Anonymization: Protects sensitive data, especially for compliance
with privacy regulations like GDPR or HIPAA.
Example: Data masking tools, pseudonymization.
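As an illustration of protecting data at rest, here is a minimal sketch using the third-party cryptography package (assuming it is installed); Fernet provides AES-based authenticated encryption:

```python
# A minimal sketch (assumes the third-party "cryptography" package is
# installed) of symmetric encryption for data at rest, using Fernet
# (AES-based authenticated encryption).
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # in practice the key lives in a key-management service
fernet = Fernet(key)

record = b"patient_id=123, diagnosis=..."
token = fernet.encrypt(record)     # ciphertext safe to store on disk
print(token)

restored = fernet.decrypt(token)   # only holders of the key can read it back
assert restored == record
```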
7. Scalability
The infrastructure for Big Data must be scalable to handle increasing data volume and velocity
without compromising performance.
Horizontal Scalability: Scaling by adding more servers or nodes to handle the data load.
Cloud Services: Cloud platforms like AWS, Google Cloud, and Microsoft Azure provide
on-demand scalability, allowing organizations to expand or reduce resources as needed.
Load Balancing: Distributes workloads across multiple servers to ensure no single system
becomes overwhelmed.
Example: AWS Elastic Load Balancer, Nginx.
8. Data Governance
Ensuring data quality, consistency, and compliance across a vast amount of data is critical.
Metadata Management: Tools to manage and track metadata, ensuring that data can be
easily understood, accessed, and used correctly.
Example: Apache Atlas, Talend Data Catalog.
Data Quality Tools: Ensure the accuracy, completeness, and reliability of data.
Example: Talend, Informatica, Trifacta.
Compliance: Managing and enforcing data usage policies to adhere to industry
regulations (GDPR, CCPA, etc.).
9. Backup and Disaster Recovery
Big Data systems require comprehensive backup and recovery plans to prevent data loss and
ensure business continuity.
Backup Systems: Ensures that data is regularly backed up, with the ability to restore it
quickly.
Example: AWS Backup, Azure Backup, Google Cloud Backup and DR.
Disaster Recovery: Planning and tools to recover data in case of system failure, ensuring
minimum downtime.
Example: AWS Disaster Recovery, Google Cloud Disaster Recovery.
10. High-Speed Networking
With the high volume of data flowing in and out of systems, high-speed networks are required
for data transfer and communication between distributed systems.
11. Real-Time Data Processing
For scenarios where instant insights are required (such as financial markets or healthcare
monitoring), real-time data processing is a key requirement.
Low-Latency Frameworks: Systems that can process data with low latency to ensure
immediate responses.
Example: Apache Kafka, Apache Flink, Amazon Kinesis.
Edge Computing: Performing data processing closer to where the data is generated (IoT
devices, sensors) to reduce latency and bandwidth usage.
Example: AWS IoT, Microsoft Azure IoT Edge.
Cloud Computing
Cloud computing is a technology model that allows individuals and organizations to access and
use computing resources—such as servers, storage, databases, networking, software, and
analytics—over the internet (the "cloud") instead of owning and managing physical infrastructure
on-premises. These resources are provided on-demand and can be scaled up or down based on the
user's needs, typically through a subscription-based or pay-as-you-go pricing model.
Cloud Deployment Models
1. Public Cloud: In this model, cloud resources like servers and storage are owned and
operated by a third-party cloud service provider (e.g., AWS, Microsoft Azure, and
Google Cloud) and delivered over the internet. It is a cost-effective solution since the
infrastructure is shared across multiple organizations, but it offers less control over data
and infrastructure, primarily because the underlying hardware, network, and sometimes even
the software stack are owned, managed, and maintained by the cloud service provider (CSP).
Advantages: Low cost, no maintenance, high scalability, and flexibility.
Disadvantages: Limited control, potential security concerns, and less
customization.
2. Private Cloud: This model provides a dedicated environment exclusively for a single
organization. Private clouds can be hosted on-premises or by a third-party provider,
and they offer greater control, security, and customization options, making them
suitable for organizations with strict data privacy and security requirements.
Advantages: Enhanced security, greater control, and customization.
Disadvantages: Higher cost, maintenance, and less scalability compared to public
clouds.
3. Hybrid Cloud: A hybrid cloud combines elements of both public and private clouds,
allowing data and applications to be shared between them. This model is beneficial for
organizations that need flexibility in handling data and workloads, as they can keep
sensitive data on the private cloud while using the public cloud for less critical workloads.
Advantages: Flexibility, scalability, cost-effectiveness, and improved security for
sensitive data.
Disadvantages: Complex management and integration, potential security risks if
not managed properly.
4. Community Cloud: In this model, multiple organizations with similar goals, security, and
compliance requirements share infrastructure and resources. Community clouds are
typically managed by a third-party provider or one of the participating organizations,
offering a mix of the cost-effectiveness of public clouds with increased security controls
for shared data.
Advantages: Shared infrastructure costs, better collaboration, and security tailored
to a specific community’s needs.
Disadvantages: Limited control and scalability compared to private clouds, and
potential data security concerns among participants.
These models allow organizations to choose a cloud approach based on their security needs,
budget, and scalability requirements. Hybrid and multi-cloud approaches have also become
popular, allowing organizations to maximize the benefits of each model.
Benefits of Cloud Computing
Cost Efficiency: Reduces the need for heavy upfront investments in hardware and
software.
Scalability: Offers flexible scaling based on demand without needing to invest in
additional infrastructure.
Disaster Recovery: Cloud providers offer data backup and disaster recovery solutions,
enhancing business continuity.
Accessibility: Cloud resources can be accessed from anywhere with an internet connection,
promoting remote work and collaboration.
Automatic Updates: Cloud providers handle software updates, security patches, and
infrastructure maintenance.
Cloud computing has transformed the way businesses and individuals access and use technology,
enabling greater agility, scalability, and cost efficiency in managing IT resources.
Virtualization in Cloud Computing
Virtualization is a core technology that underpins cloud computing, allowing for efficient
resource utilization and management. It involves creating virtual versions of physical resources,
enabling multiple operating systems or applications to run on a single physical machine.
Here’s a comprehensive explanation of virtualization in the context of cloud computing:
1. Definition of Virtualization
Virtualization is the creation of virtual (software-based) versions of physical resources such as
servers, storage, and networks. A software layer called a hypervisor runs on the physical hardware
and allows multiple virtual machines (VMs), each with its own operating system, to share that
hardware while remaining isolated from one another.
Benefits of Virtualization in Cloud Computing
Resource Optimization: Virtualization allows multiple VMs to share the same physical
resources, leading to better utilization of hardware. This reduces the need for additional
physical servers, lowering costs.
Scalability: Virtualization enables rapid provisioning and de-provisioning of VMs,
allowing cloud services to scale up or down quickly based on demand.
Isolation: Each VM operates in its own environment, ensuring that applications and
processes running on one VM do not interfere with others. This enhances security and
stability.
Flexibility and Agility: Virtualization allows for the quick deployment of new services
and applications. Organizations can test new applications in isolated VMs before deploying
them in a production environment.
Disaster Recovery and Backup: Virtualized environments can be easily backed up and
restored, facilitating disaster recovery processes. Snapshots can be taken to preserve the
state of a VM at a specific point in time.
Cost Efficiency: By consolidating multiple VMs on fewer physical servers, organizations
can reduce hardware costs, energy consumption, and maintenance efforts.
4. Challenges of Virtualization
Performance Overhead: Running multiple VMs on a single physical server may lead to
performance degradation if the hardware resources are not properly managed.
Complex Management: Managing a virtualized environment can be complex, requiring
specialized tools and expertise to monitor and optimize performance.
Security Risks: Although VMs are isolated, vulnerabilities in the hypervisor or
misconfigured settings can lead to security risks, making proper security measures
essential.
Types of Virtualization
1. Server Virtualization
Description: Server virtualization divides a physical server into multiple virtual servers
(virtual machines or VMs), each running its own operating system and applications.
Purpose: Maximizes resource utilization, reduces hardware costs, and simplifies
management.
Examples: VMware vSphere, Microsoft Hyper-V, Oracle VM.
2. Desktop Virtualization
Description: Desktop virtualization hosts desktop environments on a central server and delivers
them to users over the network, an approach commonly called Virtual Desktop Infrastructure (VDI).
Purpose: Centralizes desktop management, improves security, and lets users access their desktop
from any device.
Examples: VMware Horizon, Citrix Virtual Apps and Desktops, Microsoft Azure Virtual Desktop.
3. Application Virtualization
Description: Application virtualization allows applications to run in isolated
environments without being installed directly on the operating system. This can be
achieved through streaming or encapsulation.
Purpose: Simplifies application deployment and management, enhances compatibility,
and reduces conflicts between applications.
Examples: Microsoft App-V, Citrix Application Streaming.
4. Storage Virtualization
Description: Storage virtualization pools physical storage from multiple devices into a single
logical storage resource that can be managed centrally.
Purpose: Simplifies storage management, improves utilization, and eases backup and migration.
Examples: VMware vSAN, IBM Spectrum Virtualize.
5. Network Virtualization
Description: Network virtualization combines hardware and software network resources into
software-defined virtual networks that are independent of the underlying physical network.
Purpose: Enables flexible provisioning, network segmentation, and simpler management of
complex networks.
Examples: VMware NSX, Cisco ACI.
6. Data Virtualization
Description: Data virtualization enables access to data from multiple sources without the
need for physical data movement or replication. It presents a unified view of data,
regardless of where it resides.
Purpose: Simplifies data access, improves data integration, and enhances real-time
analytics.
Examples: Denodo, IBM Cloud Pak for Data.
7. Hardware Virtualization
Description: Hardware virtualization uses a hypervisor running directly on physical hardware to
create virtual machines, each with its own virtual CPU, memory, and devices.
Purpose: Allows multiple operating systems to run on one physical machine with strong isolation
between them.
Examples: VMware ESXi, Microsoft Hyper-V, KVM, Xen.
9. Hybrid Virtualization