PART A
1. What do you mean by unstructured data?
2. How does cloud technology impact big data?
3. List some advantages of Cassandra.
4. How does an SSTable differ from a relational table?
5. Describe how a MapReduce computation is executed.
6. How are partitions shuffled in MapReduce?
7. Explain the goals of HDFS.
8. Distinguish between Hadoop and big data.
9. Examine the need for Apache Pig.
10. Generalize the difference between Pig and Hive.
11. Differentiate between structured and unstructured data.
12. What are the data collection metrics in web analytics?
13. What are the types of NoSQL databases?
14. What is data replication in Cassandra?
15. Differentiate between Hadoop and MapReduce.
16. Explain the steps in the MapReduce algorithm.
17. How is a key-value pair formed?
18. List some applications of Hadoop.
19. Generalize the difference between Pig and Hive.
20. Examine the differences between HBase and Hive.
PART B
1. How has the convergence of key trends, such as data growth and technological
advancements, shaped the big data landscape?
2. What are the advantages and potential drawbacks of using cloud computing platforms for big
data storage and processing?
3. In what ways does open-source technology foster innovation and collaboration in the
development of big data solutions?
4. How do inter-firewall and trans-firewall analytics contribute to network security and data
protection in an increasingly interconnected world?
5. What challenges and advantages come with managing data in a schemaless NoSQL database,
and how can organizations effectively deal with schema evolution?
6. What are the key architectural features of Cassandra that make it a preferred choice for
applications requiring high availability and fault tolerance, and what are its limitations?
7. In what situations would you choose a graph database over other NoSQL databases, and
what unique capabilities do graph databases offer for data analysis?
8. Can you provide a detailed comparison of the consistency models used in NoSQL databases,
including strong consistency, eventual consistency, and the trade-offs associated with each?
9. In the context of MapReduce, why is it essential to perform local tests with test data before
deploying a job to a production cluster, and how can developers simulate cluster-like
conditions locally?
10. What is the role of the shuffling and sorting phase in MapReduce, and how does efficient
data shuffling impact the overall performance of MapReduce jobs?
11. How does MRUnit facilitate the testing of MapReduce applications, and what are some best
practices for writing effective unit tests for MapReduce code?
12. Can you provide insights into the execution of MapReduce tasks, including how parallelism is
achieved, how tasks communicate, and how task-level failures are handled?
13. Write a short note on the Hadoop ecosystem and HDFS architecture.
14. How does HDFS ensure data integrity in a Hadoop cluster?
15. What is metadata, and what information does it provide? Explain the role of the NameNode in
an HDFS cluster.
16. Describe the command-line interface for working with HDFS files, and give a brief note on
Hadoop-specific file system types and HDFS commands.
17. Describe HBase and HBase clients in detail.
18. (i) Describe the difference between Hive and MapReduce. (7)
(ii) How is Hive used? Describe in detail. (6)
19. Briefly explain the HBase architecture with a neat diagram.
20. Describe the Pig data model in detail with a neat diagram. (13)
PART C
1. Formulate an HBase table from the following data.
Data_file.txt contains the following data:
1. 1,India,Bihar,Champaran,2009,April,P1,1,5
2. 2,India, Bihar,Patna,2009,May,P1,2,10
3. 3,India, Bihar,Bhagalpur,2010,June,P2,3,15
4. 4,United States,California,Fresno,2009,April,P2,2,5
5. 5,United States
2. How would you order the uses of Hive? Explain in detail how Hive interacts with Hadoop.
3. Recommend a procedure to find the number of occurrences of a word in a document using Hive.