1. Apache Hadoop
      Definition: An open-source framework that allows for the distributed processing of
       large datasets across clusters of computers using simple programming models. It's
       the foundation for many other big data technologies.
      Structure: Primarily composed of:
          o   HDFS (Hadoop Distributed File System): A distributed file system that stores
              data across multiple machines.
          o   YARN (Yet Another Resource Negotiator): Manages resources and schedules
              jobs on the cluster.
          o   MapReduce: A programming model for processing large datasets in a
              distributed and parallel manner.
      How Data is Stored and Retrieved:
          o   Storage: Data is broken into blocks (typically 128MB or 256MB) and
              distributed across nodes in the HDFS cluster. Each block is replicated (default
              3 times) for fault tolerance.
          o   Retrieval: When a MapReduce job runs, the processing logic (Map and
              Reduce tasks) is moved to the nodes where the data resides (data locality) to
              minimize network I/O.
      Simple Example: Imagine you have billions of lines of log data from a website, and
       you want to count how many times each IP address accessed your site.
          o   Storage: The log files are split and stored across many servers by HDFS.
          o   Retrieval/Processing (conceptual MapReduce):
                     Map: Each server reads its portion of the log file and outputs (IP
                      address, 1) for every access.
                     Shuffle & Sort: All (IP address, 1) pairs with the same IP address are
                      grouped and sent to the same reducer.
                     Reduce: Each reducer counts the '1's for its assigned IP addresses,
                      resulting in (IP address, total_count).
       Sample Query (Conceptual): Hadoop doesn't have a direct query language like SQL.
        Instead, you write MapReduce jobs (often in Java or Python) or use higher-level tools
        like Hive or Pig; a minimal Hadoop Streaming sketch in Python follows below.
           o   Implicit in the example above: count occurrences of each unique IP address.
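As a minimal sketch of the IP-count job above, assuming a hypothetical line-oriented log format where the IP address is the first whitespace-separated field, the two Hadoop Streaming scripts might look like this:
Python
# mapper.py -- emits (ip, 1) for every log line; assumes the IP address is
# the first whitespace-separated field (a hypothetical log format)
import sys

for line in sys.stdin:
    fields = line.split()
    if fields:
        print(f"{fields[0]}\t1")

# reducer.py -- Hadoop Streaming delivers mapper output sorted by key, so
# equal IPs arrive consecutively and can be summed in a single pass
import sys

current_ip, count = None, 0
for line in sys.stdin:
    ip, _, value = line.rstrip("\n").partition("\t")
    if ip != current_ip:
        if current_ip is not None:
            print(f"{current_ip}\t{count}")
        current_ip, count = ip, 0
    count += int(value)
if current_ip is not None:
    print(f"{current_ip}\t{count}")
These two scripts would be passed to the Hadoop Streaming jar via its -mapper and -reducer options; in practice, Hive or Pig expresses the same count in a single statement.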
      Kind of Data Stored: Can store structured, semi-structured, and unstructured data
       (e.g., log files, images, videos, social media data, sensor data).
      Characteristics: Scalable, fault-tolerant, cost-effective (uses commodity hardware),
       flexible (schema-on-read).
      Applications Used In: Large-scale data processing, data warehousing, log analysis,
       machine learning data preparation, fraud detection, risk management.
      Advantages:
          o   Handles massive volumes of data (petabytes to exabytes).
          o   Highly fault-tolerant due to data replication.
          o   Scales horizontally by adding more commodity hardware.
          o   Cost-effective compared to traditional data warehousing.
      Disadvantages:
          o   Batch processing orientation; not suitable for real-time processing.
          o   MapReduce can be complex to program directly.
          o   Higher latency for small data queries.
          o   Security and data governance can be challenging to implement
              comprehensively.
2. Apache Storm
      Definition: An open-source distributed real-time computation system for processing
       unbounded streams of data. It's often referred to as the "Hadoop of real-time."
      Structure:
          o   Nimbus: The master node that distributes code, assigns tasks, and monitors
              the cluster.
          o   Supervisors: Worker nodes that execute assigned tasks.
          o   Topologies: The logic of a real-time application, composed of:
                       Spouts: Data sources (e.g., Kafka, Kinesis).
                       Bolts: Processing units that perform operations (filtering, aggregation,
                        joining).
      How Data is Stored and Retrieved: Storm is primarily for processing data in motion,
       not for persistent storage. Data is ingested from sources (spouts) and flows through
       the topology (bolts) in real-time. Results are typically output to another system (e.g.,
       a database, messaging queue).
      Simple Example: Analyzing real-time Twitter feeds for trending hashtags.
          o   Spout: Connects to Twitter API and streams tweets.
          o   Bolt 1 (Parse Tweet): Extracts hashtags from each tweet.
          o   Bolt 2 (Count Hashtag): Increments a counter for each hashtag.
          o   Bolt 3 (Output Trending): Periodically outputs the top N trending hashtags.
       Sample Query (Conceptual): Storm doesn't use queries; you define a dataflow
        topology.
           o   The example above outlines the processing logic; a plain-Python sketch of the
               same dataflow follows below.
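Storm topologies are usually written in Java via TopologyBuilder; as a language-neutral illustration only (not the Storm API), here is a plain-Python sketch of the same spout-to-bolt dataflow, with an in-memory counter standing in for Bolts 2 and 3:
Python
# Plain-Python sketch of the hashtag topology's dataflow -- NOT the Storm
# API; a spout generator feeds tweets through bolt-like functions.
from collections import Counter

def tweet_spout():
    # in a real topology this would stream from the Twitter API
    yield "Learning #bigdata with #storm"
    yield "More #bigdata pipelines today"

def parse_tweet_bolt(tweet):
    # Bolt 1: extract hashtags from one tweet
    return [w.lower() for w in tweet.split() if w.startswith("#")]

counts = Counter()                    # Bolt 2: running hashtag counts
for tweet in tweet_spout():
    counts.update(parse_tweet_bolt(tweet))

print(counts.most_common(2))          # Bolt 3: top-N trending hashtags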
      Kind of Data Stored: Processes continuous streams of data, typically semi-structured
       or unstructured (e.g., sensor readings, clickstreams, financial transactions, social
       media updates).
      Characteristics: Real-time, fault-tolerant, scalable, low-latency.
      Applications Used In: Real-time analytics, continuous computation, distributed RPC,
       ETL.
      Advantages:
          o   Processes data in real-time with very low latency.
           o   Guaranteed data processing (at-least-once by default, or exactly-once via the
               Trident API).
          o   Highly scalable and fault-tolerant.
          o   Can integrate with various data sources and sinks.
      Disadvantages:
          o   Can be complex to set up and manage.
          o   Not designed for batch processing.
          o   Debugging distributed real-time systems can be challenging.
3. Apache Cassandra
      Definition: A free and open-source distributed NoSQL database management system
       designed to handle large amounts of data across many commodity servers, providing
       high availability with no single point of failure. It's a wide-column store.
      Structure: Peer-to-peer distributed architecture where all nodes are identical. Data is
       distributed across nodes using consistent hashing (ring structure).
      How Data is Stored and Retrieved:
           o   Storage: Data is partitioned across nodes using a partition key. Replication
               ensures data redundancy across multiple nodes. Writes are highly available
               and fast, as data is written to multiple replicas concurrently.
           o   Retrieval: Queries are directed to any node in the cluster (coordinator node),
               which then forwards the request to the nodes holding the relevant data. Data
               is retrieved from one or more replicas based on the configured consistency
               level.
      Simple Example: Storing user profile data for a large social media application.
           o   Table Schema (simplified): CREATE TABLE users (user_id UUID PRIMARY KEY,
               username text, email text, age int, city text);
           o   Storage: When a new user signs up, their user_id acts as the partition key,
               determining which node(s) store their data. The data is replicated based on
               the replication factor.
           o   Retrieval: To get a user's profile: SELECT * FROM users WHERE user_id =
               <user_uuid>; Cassandra quickly finds the node(s) holding that user_id and
               retrieves the data.
      Sample Query:
CQL
INSERT INTO users (user_id, username, email, age, city)
VALUES (uuid(), 'johndoe', 'john@example.com', 30, 'New York');

SELECT username, email FROM users
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
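The same statements can be issued from application code; a minimal sketch using the DataStax Python driver, assuming a local node and a hypothetical keyspace named social:
Python
# Minimal sketch with the DataStax Python driver (pip install cassandra-driver);
# the contact point and the 'social' keyspace are assumptions.
import uuid
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("social")

session.execute(
    "INSERT INTO users (user_id, username, email, age, city) "
    "VALUES (%s, %s, %s, %s, %s)",
    (uuid.uuid4(), "johndoe", "john@example.com", 30, "New York"),
)

row = session.execute(
    "SELECT username, email FROM users WHERE user_id = %s",
    (uuid.UUID("123e4567-e89b-12d3-a456-426614174000"),),
).one()
print(row)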
      Kind of Data Stored: Semi-structured data, often denormalized. Excellent for time-
       series data, sensor data, and applications requiring high write throughput and
       continuous availability.
      Characteristics: Highly scalable, high availability, eventually consistent (tunable
       consistency), high write throughput, no single point of failure.
      Applications Used In: Real-time recommendations, IoT data, social media
       applications, messaging systems, fraud detection, customer 360 views.
      Advantages:
          o   Linear scalability for both reads and writes.
          o   Always-on architecture with no single point of failure.
          o   Flexible schema design.
          o   Excellent for geographically distributed data.
      Disadvantages:
          o   Eventual consistency can be a challenge for applications requiring strong
              consistency.
          o   Joins and complex queries are not directly supported.
          o   Requires careful data modeling for efficient queries.
4. CouchDB (Apache CouchDB)
      Definition: An open-source NoSQL database that focuses on ease of use and a multi-
       master replication model. It stores data in JSON documents and provides a RESTful
       HTTP API for interaction.
      Structure: Document-oriented database. Data is stored as self-contained JSON
       documents. Replication is a core feature, allowing for master-master or master-slave
       setups.
      How Data is Stored and Retrieved:
          o   Storage: Each document has a unique ID and a revision ID. When a document
              is updated, a new revision is created. This allows for optimistic concurrency
              control (MVCC).
          o   Retrieval: Data is retrieved via HTTP requests to the document's URL or by
              using "views" (MapReduce functions written in JavaScript) to query and
              transform data.
      Simple Example: Storing blog posts.
          o   Storage: A blog post is a JSON document:
JSON
    "_id": "post_123",
    "title": "My First Blog Post",
    "author": "Alice",
    "content": "This is the content of my post.",
    "tags": ["blogging", "tutorial"]
           o   Retrieval:
                      Get a specific post: GET /mydb/post_123
                      Find all posts by "Alice" (using a view): You'd define a map function
                       that emits [doc.author, doc.title] and then query that view.
       Sample Query (using curl for HTTP API):
Bash
curl -X PUT http://localhost:5984/mydb/post_123 \
  -H "Content-Type: application/json" \
  -d '{ "title": "My First Blog Post", "author": "Alice", "content": "This is the content." }'

curl http://localhost:5984/mydb/post_123
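The same HTTP calls can be made from application code; a minimal sketch using the requests library, assuming the open local server and 'mydb' database from the curl example above:
Python
# Minimal sketch of CouchDB's HTTP API via requests; the local URL and
# the 'mydb' database are assumptions carried over from the curl example.
import requests

base = "http://localhost:5984/mydb"

# Create (PUT) the document under an explicit _id
doc = {"title": "My First Blog Post", "author": "Alice",
       "content": "This is the content."}
resp = requests.put(f"{base}/post_123", json=doc)
print(resp.json())   # e.g. {'ok': True, 'id': 'post_123', 'rev': '1-...'}

# Read it back; the response carries the current _rev (MVCC revision)
print(requests.get(f"{base}/post_123").json())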
         Kind of Data Stored: Semi-structured data in JSON format, including nested
          structures and binary attachments.
         Characteristics: Document-oriented, eventually consistent, master-master
          replication, offline-first capabilities, RESTful API.
         Applications Used In: Mobile applications (offline sync), web applications, content
          management systems, CRM.
         Advantages:
              o   Easy to set up and use with a simple RESTful API.
              o   Excellent for distributed and offline-first applications due to robust
                  replication.
              o   High availability through multi-master replication.
              o   Flexible schema.
      Disadvantages:
           o   Limited query capabilities compared to SQL databases.
            o   View indexes are built on first query and updated incrementally, so queries
                after heavy writes can be slow while the index catches up.
           o   Not ideal for highly relational data.
5. Apache Flink
      Definition: An open-source stream processing framework that can handle both
       bounded (batch) and unbounded (streaming) data sets with high throughput and low
       latency. It provides stateful computations.
      Structure: A Flink application consists of a dataflow graph, composed of sources,
       transformations, and sinks. It runs on a cluster with JobManagers (master) and
       TaskManagers (workers).
      How Data is Stored and Retrieved: Flink primarily processes data in motion. While it
       maintains state for computations (e.g., counts, sums over windows), this state is
       typically stored in memory or on local disk (RocksDB) and periodically checkpointed
       to a persistent storage (like HDFS or S3) for fault tolerance. It doesn't act as a primary
       data store.
      Simple Example: Detecting fraudulent credit card transactions in real-time.
           o   Source: Ingests credit card transactions as they occur.
           o   Transformation 1 (Windowing): Groups transactions for a user within a
               specific time window (e.g., 5 minutes).
           o   Transformation 2 (Fraud Logic): Checks if the sum of transactions in the
               window exceeds a threshold or if suspicious patterns are observed.
           o   Sink: Outputs suspicious transactions to an alert system.
      Sample Query (Conceptual - using Flink's Table API/SQL):
SQL
-- Assuming 'transactions' is a streaming table
SELECT userId, SUM(amount)
FROM transactions
GROUP BY TUMBLE(proctime, INTERVAL '5' MINUTE), userId
HAVING SUM(amount) > 1000;
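Roughly the same pipeline can be wired up in Python; a sketch using PyFlink's Table API, with the built-in datagen connector standing in for the real transaction source (the schema, rate, and threshold are illustrative assumptions):
Python
# Sketch of the windowed fraud query with PyFlink's Table API; the datagen
# source stands in for a real transaction stream (assumption).
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.execute_sql("""
    CREATE TABLE transactions (
        userId INT,
        amount DOUBLE,
        proctime AS PROCTIME()
    ) WITH (
        'connector' = 'datagen',
        'rows-per-second' = '10'
    )
""")

t_env.execute_sql("""
    SELECT userId, SUM(amount) AS total
    FROM transactions
    GROUP BY TUMBLE(proctime, INTERVAL '5' MINUTE), userId
    HAVING SUM(amount) > 1000
""").print()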
      Kind of Data Stored (processed): Primarily unbounded data streams (e.g., IoT sensor
       data, financial market data, web clickstreams, log data) and bounded batch data.
      Characteristics: Stateful stream processing, exactly-once processing guarantees, low
       latency, high throughput, fault-tolerant, supports various time semantics (event time,
       processing time).
      Applications Used In: Real-time analytics, event-driven applications, fraud detection,
       monitoring, ETL, machine learning.
      Advantages:
          o   True stream processing capabilities with stateful operations.
          o   Guaranteed exactly-once processing, even in case of failures.
          o   Handles both batch and stream processing with a unified API.
          o   High performance and low latency.
      Disadvantages:
          o   Can have a steep learning curve due to its advanced concepts (state
              management, time).
          o   Resource-intensive for very large state.
          o   Operational complexity in managing clusters.
6. Cloudera
      Definition: A company that provides an enterprise data platform built on open-
       source technologies like Hadoop, Spark, Hive, Impala, etc. It simplifies the
       deployment, management, and use of these complex big data ecosystems.
      Structure: Cloudera's platform (Cloudera Data Platform - CDP) integrates various
       open-source components, providing a unified platform for data engineering, data
       warehousing, machine learning, and operational databases. It offers management
       tools (Cloudera Manager) and security features (Cloudera SDX).
      How Data is Stored and Retrieved: Cloudera itself doesn't store data directly; it
       orchestrates and manages data stored in underlying systems like HDFS, S3, or other
       compatible storage. Retrieval depends on the specific component being used (e.g.,
       Hive for SQL queries on HDFS, Impala for interactive SQL).
      Simple Example: An organization wants to set up a data lake and perform various
       analytics.
          o   Cloudera's Role: Provides the software and tools to easily deploy HDFS for
              storage, Hive for data warehousing, Spark for data processing, and Hue for a
              web-based interface, all with integrated security and governance.
      Sample Query (depends on underlying tool, e.g., HiveQL via Cloudera Hue):
SQL
SELECT customer_id, SUM(order_total)
FROM sales_data
WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY customer_id
HAVING SUM(order_total) > 1000;
      Kind of Data Stored: Supports all kinds of data (structured, semi-structured,
       unstructured) as it leverages underlying technologies like HDFS.
      Characteristics: Enterprise-grade, unified platform, hybrid cloud support, strong
       security and governance, focuses on data lifecycle.
      Applications Used In: Building data lakes, enterprise data warehousing, advanced
       analytics, machine learning platforms, real-time dashboards.
      Advantages:
          o   Simplifies deployment and management of complex big data ecosystems.
          o   Provides enterprise-grade security, governance, and data lineage.
          o   Offers a comprehensive suite of tools for various data workloads.
          o   Supports hybrid and multi-cloud environments.
      Disadvantages:
          o   Can be expensive due to licensing and support costs.
          o   Requires significant hardware resources.
          o   Complexity can still be high for new users despite simplification.
7. Apache Hive
      Definition: A data warehouse software project built on top of Apache Hadoop for
       querying and managing large datasets residing in distributed storage. It provides a
       SQL-like language called HiveQL.
      Structure:
          o   Hive Metastore: Stores metadata (schema, location) of tables and partitions.
          o   Driver: Manages the lifecycle of a HiveQL query.
          o   Compiler: Parses HiveQL queries, performs semantic analysis, and generates
              a logical plan.
          o   Optimizer: Transforms the logical plan into a series of MapReduce or
              Tez/Spark jobs.
          o   Execution Engine: Executes the jobs on the Hadoop cluster.
      How Data is Stored and Retrieved:
          o   Storage: Data is stored in HDFS (or other compatible file systems like S3) in
              various formats (e.g., TextFile, ORC, Parquet). Hive itself does not store the
              data; it provides a schema and SQL interface over the data in HDFS.
          o   Retrieval: HiveQL queries are translated into MapReduce, Tez, or Spark jobs,
              which then read the data from HDFS, process it, and return the results.
      Simple Example: Analyzing website clickstream data stored in HDFS.
          o   Storage: Raw clickstream logs (e.g., CSV files) are put into HDFS.
           o   Table Creation: CREATE EXTERNAL TABLE clickstream (`timestamp` STRING,
               user_id INT, page_url STRING) ROW FORMAT DELIMITED FIELDS TERMINATED
               BY ',' STORED AS TEXTFILE LOCATION '/user/hadoop/clickstream/'; (the column
               name is backquoted because timestamp is a reserved word in HiveQL).
          o   Retrieval: SELECT page_url, COUNT(*) FROM clickstream GROUP BY page_url
              ORDER BY COUNT(*) DESC LIMIT 10; (Find top 10 most visited pages).
      Sample Query:
SQL
SELECT customer_state, COUNT(order_id)
FROM orders
WHERE order_date >= '2024-06-01'
GROUP BY customer_state;
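Such queries are typically run from Beeline, Hue, or application code; a minimal sketch using the PyHive client, where the host, port, and orders table are assumptions:
Python
# Minimal sketch of running HiveQL from Python with PyHive
# (pip install pyhive); host/port and the 'orders' table are assumptions.
from pyhive import hive

conn = hive.connect(host="hive-server.example.com", port=10000)
cursor = conn.cursor()
cursor.execute("""
    SELECT customer_state, COUNT(order_id)
    FROM orders
    WHERE order_date >= '2024-06-01'
    GROUP BY customer_state
""")
for state, n_orders in cursor.fetchall():
    print(state, n_orders)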
      Kind of Data Stored: Primarily structured and semi-structured data, often in large
       batches. It can work with unstructured data if a schema is imposed on it at read time.
      Characteristics: SQL-like interface, batch processing, schema-on-read, integrates with
       Hadoop, fault-tolerant.
      Applications Used In: Data warehousing, batch ETL, large-scale data analysis,
       reporting, business intelligence.
      Advantages:
          o   Enables SQL users to query big data in Hadoop without writing complex code.
          o   Scalable and fault-tolerant by leveraging Hadoop.
          o   Supports a wide range of data formats.
          o   Good for long-running batch queries.
      Disadvantages:
          o   High latency for interactive queries (though improved with Tez/LLAP).
          o   Not suitable for transactional workloads or real-time processing.
          o   Schema-on-read can lead to performance issues if not carefully designed.
8. MongoDB
      Definition: A popular open-source NoSQL document database. It stores data in
       flexible, JSON-like documents, which means fields can vary from document to
       document, and the data structure can be changed over time.
      Structure: Document-oriented. Data is organized into collections (similar to tables),
       which contain BSON (Binary JSON) documents. Supports sharding for horizontal
       scalability and replication for high availability.
      How Data is Stored and Retrieved:
          o   Storage: Documents are stored in collections. Each document has a unique
              _id field. MongoDB allocates data files and journals for durability. Sharding
              distributes data across multiple servers (shards) based on a shard key.
          o   Retrieval: Queries are executed against collections using a rich query
              language that supports various criteria, aggregation pipelines, and indexing.
              Data can be retrieved based on specific field values, ranges, or using regular
              expressions.
      Simple Example: Storing product catalog information for an e-commerce website.
          o   Storage: A product document:
JSON
{
    "_id": ObjectId("65e4e7e7e7e7e7e7e7e7e7e7"),
    "name": "Laptop Pro",
    "category": "Electronics",
    "price": 1200.00,
    "features": ["16GB RAM", "512GB SSD", "Intel i7"],
    "reviews": [
        {"user": "Alice", "rating": 5, "comment": "Great laptop!"},
        {"user": "Bob", "rating": 4, "comment": "Good performance."}
                 o   Retrieval: db.products.find({"category": "Electronics", "price": {"$gt": 1000}})
            Sample Query:
JavaScript
db.users.insertOne({
    "name": "Jane Doe",
    "email": "jane@example.com",
    "interests": ["reading", "hiking"]
});
db.users.find({"interests": "reading"}, {"name": 1, "email": 1});
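From application code, the same operations go through a driver; a minimal PyMongo sketch, including an aggregation pipeline over the embedded reviews from the catalog example (the shop database and products collection names are assumptions):
Python
# Minimal PyMongo sketch (pip install pymongo); the 'shop' database and
# 'products' collection mirror the catalog example above (assumptions).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
products = client["shop"]["products"]

# Same filter as the retrieval example: electronics over $1000
for doc in products.find({"category": "Electronics", "price": {"$gt": 1000}}):
    print(doc["name"], doc["price"])

# Aggregation pipeline: unwind embedded reviews, average rating per product
pipeline = [
    {"$unwind": "$reviews"},
    {"$group": {"_id": "$name", "avg_rating": {"$avg": "$reviews.rating"}}},
]
for row in products.aggregate(pipeline):
    print(row)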
            Kind of Data Stored: Semi-structured data in JSON/BSON format. Ideal for
             hierarchical data and data with evolving schemas.
            Characteristics: Document-oriented, schema-less, highly scalable (sharding), high
             performance, rich query language, high availability (replication).
            Applications Used In: Content management systems, e-commerce, mobile
             applications, real-time analytics, social networking.
            Advantages:
                 o   Flexible schema allows for rapid development and iteration.
           o   Scales horizontally with sharding.
           o   High performance for many read/write operations.
           o   Rich query language and aggregation framework.
           o   Easy to get started and use.
      Disadvantages:
           o   Joins are not natively supported (requires client-side joins or complex
               aggregation pipelines).
           o   Can consume significant memory.
            o   Multi-document ACID transactions arrived only in MongoDB 4.0; older
                versions guarantee atomicity at the single-document level only.
           o   Data redundancy can occur due to denormalization.
9. MySQL
      Definition: A widely used open-source relational database management system
       (RDBMS). It stores data in structured tables with predefined schemas and enforces
       ACID properties.
      Structure: Relational model, where data is organized into tables (relations) with rows
       (records) and columns (attributes). Relationships between tables are defined using
       primary and foreign keys. Uses storage engines like InnoDB (transactional) and
       MyISAM.
      How Data is Stored and Retrieved:
           o   Storage: Data is stored in tables that conform to a predefined schema. Each
               row represents a single record, and columns define the attributes and their
               data types. Data files are managed by the storage engine.
           o   Retrieval: SQL (Structured Query Language) is used to interact with the
               database. Queries specify which tables to access, what conditions to apply,
               and how to order or aggregate the results.
      Simple Example: Managing customer orders.
           o   Table Creation:
SQL
CREATE TABLE Customers (
 customer_id INT PRIMARY KEY,
 name VARCHAR(255),
 email VARCHAR(255)
);
CREATE TABLE Orders (
 order_id INT PRIMARY KEY,
 customer_id INT,
 order_date DATE,
 total_amount DECIMAL(10, 2),
 FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);
            o   Storage: INSERT INTO Customers (customer_id, name, email) VALUES (1,
                'Alice', 'alice@example.com');
            o   Retrieval: SELECT C.name, O.order_id, O.total_amount FROM Customers C
                JOIN Orders O ON C.customer_id = O.customer_id WHERE C.name = 'Alice';
        Sample Query:
SQL
INSERT INTO Products (product_id, name, price) VALUES (101, 'Smartphone', 799.99);
UPDATE Products SET price = 749.99 WHERE product_id = 101;
SELECT name, price FROM Products WHERE price < 500 ORDER BY name ASC;
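The ACID guarantees noted below are what make multi-statement work safe; a minimal sketch with the mysql-connector-python driver, wrapping two statements in one transaction (connection details and values are placeholders):
Python
# Minimal transaction sketch with mysql-connector-python
# (pip install mysql-connector-python); credentials are placeholders.
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="app", password="secret", database="shop"
)
cursor = conn.cursor()
try:
    # Both statements commit together or not at all (atomicity)
    cursor.execute(
        "INSERT INTO Orders (order_id, customer_id, order_date, total_amount) "
        "VALUES (%s, %s, %s, %s)",
        (5001, 1, "2024-07-01", 199.99),
    )
    cursor.execute(
        "UPDATE Products SET price = %s WHERE product_id = %s", (749.99, 101)
    )
    conn.commit()
except mysql.connector.Error:
    conn.rollback()        # undo both statements on any failure
    raise
finally:
    cursor.close()
    conn.close()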
        Kind of Data Stored: Primarily structured data with a fixed schema. Best for
         applications requiring strong consistency and transactional integrity.
        Characteristics: Relational, ACID compliant, mature, widely supported, good for
         complex joins and aggregations.
        Applications Used In: Web applications (LAMP stack), e-commerce, CRM, ERP
         systems, data warehousing (for smaller scale).
        Advantages:
            o   Strong data integrity with ACID properties.
             o   Well-established and widely supported with a large community.
             o   Excellent for complex queries and joins.
             o   Relatively easy to learn and use.
             o   High performance for many use cases.
      Disadvantages:
             o   Scalability challenges for extremely large datasets compared to NoSQL
                 databases.
             o   Less flexible schema compared to NoSQL.
             o   Can become a bottleneck for very high write throughput.
             o   Vertical scaling often means more expensive hardware.
10. Kaggle
      Definition: An online community and platform for data scientists and machine
       learning enthusiasts. It's not a data storage or processing tool in itself, but a platform
       that hosts data science competitions, provides datasets, and offers a collaborative
       environment for machine learning development.
      Structure: A web-based platform where users can:
             o   Find Datasets: Access a vast repository of public datasets.
             o   Participate in Competitions: Solve real-world data science problems with
                 prizes.
             o   Share Code (Notebooks): Run Python/R code directly in the browser and
                 share with the community.
             o   Discuss: Engage in forums and discussions.
      How Data is Stored and Retrieved:
             o   Storage: Kaggle hosts datasets (CSV, JSON, images, etc.) on its platform. Users
                 upload their datasets or use existing ones.
             o   Retrieval: Users download datasets to their local machines or access them
                 directly within Kaggle Kernels/Notebooks (cloud-based computational
                 environments) where the data is readily available for analysis.
      Simple Example: Predicting house prices.
            o   Kaggle's Role: Provides a dataset of house features and prices. Users can
                then:
                       Download the dataset.
                       Create a Kaggle Notebook.
                       Write Python/R code to build a machine learning model (e.g., linear
                        regression, random forest) to predict prices.
                       Submit their predictions to the competition leaderboard.
        Sample Query (Conceptual - within a Python/R notebook): Kaggle doesn't have a
         direct query language. Data manipulation is done using programming libraries like
         Pandas in Python.
Python
import pandas as pd
df = pd.read_csv('/kaggle/input/house-prices-advanced-regression-techniques/train.csv')
print(df.head())
print(df['SalePrice'].describe())
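From there, a first model is only a few lines; a minimal scikit-learn sketch continuing the snippet above (the two feature columns are illustrative picks from that dataset, not a recommendation):
Python
# Continuing the snippet above: a first-pass model with scikit-learn;
# GrLivArea and OverallQual are two numeric columns in train.csv.
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X = df[["GrLivArea", "OverallQual"]]
y = df["SalePrice"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))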
        Kind of Data Stored: Diverse datasets for data science and machine learning tasks,
         often tabular data (CSV), images, text files, time-series data.
        Characteristics: Community-driven, collaborative, competition-focused, learning
         platform, access to diverse datasets, cloud-based coding environment.
        Applications Used In: Machine learning model development, data exploration, skill
         development, benchmarking ML algorithms, crowdsourcing solutions to data
         problems.
        Advantages:
            o   Excellent for learning and practicing data science and machine learning.
            o   Access to a vast array of real-world datasets.
            o   Opportunities to collaborate and learn from a global community.
            o   Competitions provide motivation and a chance to win prizes.
            o   Cloud-based notebooks simplify environment setup.
        Disadvantages:
            o   Not a production-grade data management system.
            o   Focuses on individual model building rather than end-to-end data pipelines.
            o   Can be competitive, leading to a focus on leaderboard performance over
                practical insights.