
Cassandra

The document is a comprehensive question bank covering various aspects of Cassandra and data concepts, including multiple choice, short answer, and long answer questions. It addresses topics such as data analytics, RDBMS limitations, ACID properties, data modeling, configuration, and maintenance of Cassandra. Each section is designed to test knowledge and understanding of Cassandra's functionalities and best practices.

Uploaded by

Mohammed Faiz K
Copyright
© All Rights Reserved

Comprehensive Question Bank – Cassandra and Data Concepts

Section A: Multiple Choice Questions (1 mark each)


1. Which of the following best defines Data Analytics?

a) Storing data only

b) Creating databases

c) Analyzing raw data to find trends and patterns

d) Designing websites

Answer: c) Analyzing raw data to find trends and patterns

2. What is the main difference between data and information?

a) No difference

b) Data is processed; information is raw

c) Data is raw; information is processed

d) Both are outputs

Answer: c) Data is raw; information is processed

3. Which organization originally developed Apache Cassandra?

a) Google

b) Facebook

c) Microsoft

d) Amazon

Answer: b) Facebook

4. What does RDBMS stand for?

a) Random Database Management System

b) Relational Data Mapping System

c) Relational Database Management System

d) Remote Database Management Setup

Answer: c) Relational Database Management System

5. What is a key limitation of RDBMS in handling modern data workloads?

a) Too scalable

b) Cannot handle unstructured data

c) No security

d) No schema

Answer: b) Cannot handle unstructured data

6. Which query language is used in Cassandra?

a) SQL

b) MongoQL

c) CQL

d) NoSQL

Answer: c) CQL

7. Which ACID property ensures that a transaction is all-or-nothing?

a) Atomicity

b) Consistency

c) Isolation

d) Durability

Answer: a) Atomicity

8. Which component of CAP theorem does Cassandra prioritize?

a) Consistency

b) Availability and Partition Tolerance

c) Performance

d) Security

Answer: b) Availability and Partition Tolerance

9. Which data model does Cassandra follow?

a) Tabular model

b) Hierarchical model

c) Key-value model

d) Wide-column model

Answer: d) Wide-column model

10. Which of the following is a valid data type in Cassandra?

a) String

b) List

c) Object

d) JSON

Answer: b) List

11. What does 'Durability' in ACID mean in the context of Cassandra?

a) Data is flexible

b) Data remains even after crashes

c) Data can be easily edited

d) Data is temporary

Answer: b) Data remains even after crashes

12. In CAP theorem, what does 'P' stand for?

a) Partition Tolerance

b) Performance

c) Processing Power

d) Persistence

Answer: a) Partition Tolerance

13. Which is NOT a valid CQL data type?

a) text

b) float

c) boolean

d) image

Answer: d) image

14. Why was Cassandra originally developed?

a) To handle banking data

b) To manage social network inboxes

c) To store spreadsheets

d) For educational purposes

Answer: b) To manage social network inboxes

15. What does CQL stand for?

a) Cassandra Question Language

b) Column Query Language

c) Cassandra Query Language

d) Core Query Language

Answer: c) Cassandra Query Language

Section B: Short Answer Questions (10 marks each)


1. Define Data and Information. Explain how they differ with examples.

2. What is RDBMS? List its limitations in the context of big data applications.

3. Explain the ACID properties with respect to Cassandra's behavior.

4. Describe the basic structure of Cassandra’s data model.

5. Write a CQL query to create a table with columns for ID, Name, and Email. Explain each
part of the query.

Section C: Long Answer Questions (15 marks each)


1. Discuss in detail the evolution and history of Apache Cassandra. Why was it developed
and how has it grown?

2. Compare Cassandra with traditional RDBMS in terms of scalability, performance, and
data structure.

3. Explain CAP theorem with examples. How does Cassandra address the trade-offs?

4. Design a data model in Cassandra for a library system. Include keyspaces, tables, and
relationships.

5. Discuss various data types supported in Cassandra and give examples of use cases for
each.

Question Bank – Data Modeling and Cassandra Design

Section A: Multiple Choice Questions (1 mark each)


1. What is a data model primarily used for?

a) Encrypting data

b) Designing software UI

c) Organizing and structuring data

d) Compressing files

Answer: c) Organizing and structuring data

2. Which of the following best describes a logical data model?

a) Physical implementation

b) High-level structure of entities and relationships

c) Binary format of data

d) Operating system mapping

Answer: b) High-level structure of entities and relationships

3. Which of the following is true about a physical data model?

a) Ignores hardware structure

b) Is abstract

c) Includes storage and indexing details

d) Focuses on business goals

Answer: c) Includes storage and indexing details

4. In Cassandra, tables are designed based on:

a) ER diagrams

b) Normalization rules

c) Application queries

d) Stored procedures

Answer: c) Application queries

5. Which model is independent of DBMS implementation?

a) Physical model

b) Logical model

c) Relational model

d) Distributed model

Answer: b) Logical model

6. Which of the following is a key difference between RDBMS and Cassandra?

a) Cassandra supports normalization

b) RDBMS is schema-less

c) Cassandra uses denormalized tables

d) RDBMS does not use SQL

Answer: c) Cassandra uses denormalized tables

7. What is the main focus during logical data modeling?

a) Data partitioning

b) Query performance

c) Entity relationships and attributes

d) Table file formats

Answer: c) Entity relationships and attributes

8. Which stage comes after defining the initial data model?

a) Data backup

b) Application coding

c) Evaluating and refining data model

d) Data deletion

Answer: c) Evaluating and refining data model

9. Which of these is a feature of materialized views in Cassandra?

a) Real-time analytics

b) Automatic view creation

c) Precomputed query results

d) File compression

Answer: c) Precomputed query results

10. In Cassandra, denormalization is preferred because:

a) It saves disk space

b) It reduces query performance

c) It avoids joins

d) It simplifies index creation

Answer: c) It avoids joins
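
Note on question 10: the idea can be sketched in plain Python, with dictionaries standing in for tables. This is not the Cassandra API, and every name below is hypothetical. The same order is written to two query-specific "tables", so each read is a single lookup by key and no join is needed. The cost is duplicated data and extra writes.

```python
from collections import defaultdict

# Two denormalized "tables", each keyed by the value one query filters on.
orders_by_customer = defaultdict(list)   # query: all orders for a customer
orders_by_product = defaultdict(list)    # query: all orders for a product


def record_order(order_id, customer, product):
    """Write the same order into both tables (duplicated on purpose)."""
    row = {"order_id": order_id, "customer": customer, "product": product}
    orders_by_customer[customer].append(row)
    orders_by_product[product].append(row)


record_order(1, "alice", "book")
record_order(2, "bob", "book")
record_order(3, "alice", "pen")

# Each query is one lookup by key; no join between tables is required.
alice_orders = [r["order_id"] for r in orders_by_customer["alice"]]
book_orders = [r["order_id"] for r in orders_by_product["book"]]
print(alice_orders, book_orders)
```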

11. Which model would specify primary key and partition key information?

a) Logical model

b) Conceptual model

c) Physical model

d) Hierarchical model

Answer: c) Physical model

12. Which component is essential when defining application queries for Cassandra?

a) Table normalization

b) Query joins

c) Access patterns

d) SQL views

Answer: c) Access patterns

13. What is typically refined during the evaluation of a data model?

a) Entity definitions

b) Application UI

c) Encryption algorithm

d) Logging system

Answer: a) Entity definitions

14. Materialized views in Cassandra are:

a) Automatically kept in sync with the base table

b) Normalized tables

c) Designed to replace base tables

d) Write-intensive structures

Answer: a) Automatically kept in sync with the base table

15. What is NOT a part of logical data modeling?

a) Attributes

b) Data types

c) Indexing strategy

d) Relationships

Answer: c) Indexing strategy

Section B: Short Answer Questions (10 marks each)


1. Explain the differences between a logical and a physical data model with examples.

2. Describe how Cassandra designs differ from RDBMS in terms of data modeling and
querying.

3. What steps are involved in evaluating and refining a data model in Cassandra?

4. Discuss the role and use of materialized views in Cassandra.

5. How do application queries influence data model design in Cassandra?

Section C: Long Answer Questions (15 marks each)


1. Design and explain a complete data model for an e-commerce system using Cassandra.
Include keyspaces, tables, and primary keys.

2. Compare and contrast RDBMS and Cassandra design philosophies. Include at least five
key differences.

3. Explain the entire process of building, evaluating, and refining a data model in Cassandra
from start to deployment.

4. Describe the limitations of using materialized views and best practices for their use in
real-world applications.

5. Discuss the importance of defining application access patterns before creating a physical
data model in Cassandra.

Cassandra Configuration – Question Bank

Section A: Multiple Choice Questions (1 mark each)


1. Which file is the main configuration file in Cassandra?

a) cassandra-env.sh
b) cassandra.yaml
c) logback.xml
d) jvm.options

Answer: b) cassandra.yaml

2. Where is the configuration folder typically located in a Docker installation of
Cassandra?

a) /usr/local/cassandra
b) /opt/cassandra/conf
c) /etc/cassandra
d) /var/lib/cassandra

Answer: c) /etc/cassandra

3. What is the purpose of the `seeds` property in cassandra.yaml?

a) Backup configuration
b) Defines the node IP
c) Node discovery during startup
d) Logging level setup

Answer: c) Node discovery during startup

4. Which port does Cassandra use for native client communication by default?

a) 7000
b) 22
c) 8080
d) 9042

Answer: d) 9042

5. Which file is used to set JVM options in Cassandra?

a) cassandra.yaml
b) logback.xml
c) cassandra-env.sh
d) metrics-reporter-config.yaml

Answer: c) cassandra-env.sh

6. What should you avoid when setting `listen_address`?

a) Setting seed nodes
b) Setting both listen_address and listen_interface
c) Using IPv6
d) Enabling garbage collection

Answer: b) Setting both `listen_address` and `listen_interface`
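
Note on question 6: a pre-deployment check for this rule might look like the sketch below. It assumes the settings from cassandra.yaml have already been loaded into a Python dictionary. The function name and the sample values are hypothetical and are not part of any Cassandra tooling.

```python
def check_listen_settings(config):
    """Return a list of problems found in a cassandra.yaml-like dict."""
    problems = []
    has_address = bool(config.get("listen_address"))
    has_interface = bool(config.get("listen_interface"))
    # The two properties are alternatives: set one of them, never both.
    if has_address and has_interface:
        problems.append("set listen_address or listen_interface, not both")
    return problems


# Hypothetical sample settings for illustration only.
good = {"cluster_name": "demo", "listen_address": "10.0.0.5"}
bad = {
    "cluster_name": "demo",
    "listen_address": "10.0.0.5",
    "listen_interface": "eth0",
}

print(check_listen_settings(good))
print(check_listen_settings(bad))
```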

7. Which directory is used to store Cassandra’s commit logs?

a) saved_caches_directory
b) data_file_directories
c) hints_directory
d) commitlog_directory

Answer: d) commitlog_directory

8. How can you change logging levels at runtime in Cassandra?

a) By editing cassandra.yaml
b) Using nodetool setlogginglevel
c) Restarting the server
d) Modifying JVM_OPTS

Answer: b) Using nodetool setlogginglevel

9. What is stored in the data_file_directories?

a) Log files
b) SSTables
c) Cache data
d) Metrics

Answer: b) SSTables

10. Which file controls logging configurations in Cassandra?

a) cassandra-env.sh
b) logback.xml
c) cassandra.yaml
d) cassandra-topology.properties

Answer: b) logback.xml

11. What is the recommended setup for data and commitlog storage?

a) Same disk for performance
b) External cloud storage
c) Place on separate disks
d) Use RAM disks

Answer: c) Place on separate disks

12. Which file contains configuration for archiving commit logs?

a) cassandra.yaml
b) commitlog_archiving.properties
c) cassandra-env.sh
d) metrics-reporter-config-sample.yaml

Answer: b) commitlog_archiving.properties

13. Which port is used for inter-node communication by default?

a) 7000
b) 22
c) 9042
d) 9092

Answer: a) 7000

14. Which file lets you define data center and rack information?

a) cassandra-env.sh
b) cassandra-topology.properties
c) jvm.options
d) cassandra.yaml

Answer: b) cassandra-topology.properties

15. What log level is used in system.log by default?

a) DEBUG
b) WARN
c) INFO
d) ERROR

Answer: c) INFO

Section B: Short Answer Questions (10 marks each)


1. Explain the role of the `cassandra.yaml` file in Cassandra configuration. Discuss at least
five critical properties.

2. Describe the importance of proper directory configuration in Cassandra, including the
purpose of `data_file_directories`, `commitlog_directory`, and `hints_directory`.

3. Discuss the use and significance of `cassandra-env.sh` in JVM tuning for Cassandra
performance optimization.

4. How is logging managed in Cassandra? Include the default behavior and runtime log level
changes.

5. Compare the configuration file locations used by different installation methods
(Docker, package install, tarball).

Section C: Long Answer Questions (15 marks each)


1. Design a configuration strategy for deploying Cassandra in a production multi-node
cluster. Include details such as cluster name, seed nodes, IP settings, directory separation,
and memory tuning.

2. Elaborate on the step-by-step process to modify Cassandra to handle large-scale
workloads, highlighting the importance of each configuration file involved.

3. Analyze the implications of incorrect configuration in cassandra.yaml. Provide examples
of misconfigurations and their effects on cluster behavior.

4. Discuss the logging architecture in Cassandra. How would you modify the logging level
and output location in a live environment without restarting the system?

5. Propose a backup and recovery plan using commit log archiving in Cassandra. Explain the
configuration, process, and benefits.

Cassandra – Maintenance Question Bank

Section A: Multiple Choice Questions (1 mark each)


1. Which command provides a summary of the Cassandra cluster status?

a) nodetool info

b) nodetool tpstats

c) nodetool status

d) nodetool restart

Answer: c) nodetool status

2. What does 'UN' indicate in the nodetool status output?

a) Unknown Node

b) Unreachable Node

c) Up and Normal

d) Updated Node

Answer: c) Up and Normal
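
Note on question 2: the two-letter code at the start of each row in the nodetool status output combines a status letter (Up or Down) with a state letter (Normal, Leaving, Joining, Moving). A small Python sketch of that decoding follows. The helper name is hypothetical, and the function only interprets the code. It does not call nodetool.

```python
# First letter: node status. Second letter: node state.
STATUS = {"U": "Up", "D": "Down"}
STATE = {"N": "Normal", "L": "Leaving", "J": "Joining", "M": "Moving"}


def decode_status_code(code):
    """Translate a two-letter nodetool status code such as 'UN'."""
    if len(code) != 2 or code[0] not in STATUS or code[1] not in STATE:
        raise ValueError(f"unrecognized status code: {code!r}")
    return f"{STATUS[code[0]]} and {STATE[code[1]]}"


for code in ("UN", "DN", "UJ"):
    print(code, "=", decode_status_code(code))
```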

3. Which nodetool command provides memory usage information?

a) nodetool tpstats

b) nodetool flush

c) nodetool status

d) nodetool info

Answer: d) nodetool info

4. What is displayed under the 'Load' column in nodetool status?

a) CPU load

b) Memory usage

c) Data the node is handling

d) Disk temperature

Answer: c) Data the node is handling

5. Which of the following shows thread pool activity in Cassandra?

a) nodetool info

b) nodetool restart

c) nodetool tpstats

d) nodetool status

Answer: c) nodetool tpstats

6. Which nodetool command is best for identifying memory issues?

a) nodetool tpstats

b) nodetool info

c) nodetool version

d) nodetool drain

Answer: b) nodetool info

7. What kind of tasks are managed by thread pools in Cassandra?

a) Backups

b) Reads, writes, repairs

c) Configuration updates

d) Token assignment

Answer: b) Reads, writes, repairs

8. What does the 'State' column in nodetool status indicate?

a) Disk status

b) Operating system state

c) Node operation state

d) Replication factor

Answer: c) Node operation state

9. What does 'Pending tasks' in nodetool tpstats refer to?

a) Tasks that failed

b) Tasks waiting for processing

c) Tasks completed

d) Disk I/O operations

Answer: b) Tasks waiting for processing

10. Which command shows gossip activity status?

a) nodetool status

b) nodetool info

c) nodetool compact

d) nodetool cleanup

Answer: b) nodetool info

11. Which of these is NOT shown by nodetool info?

a) Uptime

b) Cache info

c) Active threads

d) Disk load

Answer: c) Active threads

12. What component is essential for Cassandra node communication?

a) Heap space

b) Gossip

c) Memtable

d) SSTables

Answer: b) Gossip

13. What command would you use to analyze performance bottlenecks in thread
management?

a) nodetool info

b) nodetool status

c) nodetool tpstats

d) nodetool listnodes

Answer: c) nodetool tpstats

14. What is represented by 'Tokens' in nodetool status?

a) Security credentials

b) Memory partitions

c) Data partitioning units

d) Repair checkpoints

Answer: c) Data partitioning units
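
Note on question 14: a toy token ring in Python shows how tokens act as data partitioning units. This is a simplified sketch, not Cassandra's implementation. The ring size, the node names, and the use of MD5 are assumptions made for illustration (Cassandra's default partitioner is Murmur3-based). Each node owns the range of tokens up to and including its own token, and keys hashing past the highest token wrap around to the first node.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # assumed toy ring size, far smaller than a real ring


def token_for(key):
    """Hash a partition key to a token (MD5 stands in for Murmur3 here)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % RING_SIZE


class Ring:
    def __init__(self, node_tokens):
        # node_tokens maps a node name to the list of tokens it holds.
        self._entries = sorted(
            (token, node)
            for node, tokens in node_tokens.items()
            for token in tokens
        )
        self._tokens = [token for token, _ in self._entries]

    def owner_of_token(self, token):
        """Return the node holding the first token >= the given token."""
        i = bisect.bisect_left(self._tokens, token) % len(self._tokens)
        return self._entries[i][1]

    def owner(self, key):
        return self.owner_of_token(token_for(key))


ring = Ring({"n1": [2 ** 30], "n2": [2 ** 31], "n3": [3 * 2 ** 30]})
print(ring.owner("user:42"))
```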

15. What does nodetool info show about heap memory?

a) Compression ratio

b) Used and total heap memory

c) Node rack

d) SSTable count

Answer: b) Used and total heap memory

Section B: Short Answer Questions (10 marks each)


1. Explain the output and significance of the 'nodetool status' command in Cassandra.

2. Describe how the 'nodetool info' command helps administrators monitor Cassandra
nodes.

3. List and explain the different thread pool metrics available through 'nodetool tpstats'.

4. Compare 'nodetool status' and 'nodetool info' in terms of the information they provide.

5. Why is monitoring thread pool activity important in Cassandra? How does it affect
performance?

Section C: Long Answer Questions (15 marks each)


1. Demonstrate the usage of nodetool for Cassandra cluster maintenance. Include scenarios
for each major command discussed.

2. Analyze a scenario where nodetool tpstats reveals thread pool congestion. What
corrective actions would you recommend?

3. Explain the role of Gossip in Cassandra. How can nodetool info help verify its status and
issues?

4. Create a health monitoring plan using nodetool for a 5-node Cassandra cluster. Include
the frequency and metrics to check.

5. Describe how you would use nodetool commands during a high-latency issue to identify
and fix the root cause.

Cassandra – Reading and Writing Data Question Bank

Section A: Multiple Choice Questions (1 mark each)


1. What happens when data is written to Cassandra?

a) Only stored in memory

b) Written to disk only

c) Logged in commit log and stored in memtable

d) Stored in SSTables immediately

Answer: c) Logged in commit log and stored in memtable

2. What is the purpose of the commit log in Cassandra?

a) Temporary storage

b) Caching data

c) Ensuring durability

d) Reading data

Answer: c) Ensuring durability

3. What is a memtable?

a) A file on disk

b) A cache

c) A memory-based data structure for temporary writes

d) A type of compression

Answer: c) A memory-based data structure for temporary writes

4. What triggers a memtable flush?

a) Disk full

b) Memory limit reached

c) Cache timeout

d) Bloom filter activation

Answer: b) Memory limit reached

5. Where is flushed memtable data written?

a) Memlog

b) Commitlog

c) SSTables

d) Row cache

Answer: c) SSTables

6. Which file holds the actual row data in an SSTable?

a) Index.db

b) Data.db

c) Filter.db

d) Statistics.db

Answer: b) Data.db

7. What does a Bloom filter do during read operations?

a) Compress data

b) Locate partition key

c) Eliminate irrelevant SSTables

d) Store entire rows

Answer: c) Eliminate irrelevant SSTables
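
Note on question 7: the sketch below is a minimal Bloom filter in Python, with one filter per SSTable. The filter size, the number of hashes, the use of SHA-256, and the SSTable names are all assumptions made for illustration and do not reflect Cassandra's on-disk format. A filter can answer "definitely not here" or "possibly here", so a read can skip every SSTable whose filter rules the key out.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, key):
        # Derive several bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for position in self._positions(key):
            self.bits[position] = True

    def might_contain(self, key):
        # False means the key is certainly absent; True means "maybe".
        return all(self.bits[position] for position in self._positions(key))


# Hypothetical SSTables and the partition keys each one holds.
sstable_keys = {"sstable-1": ["alice", "bob"], "sstable-2": ["carol"]}
filters = {}
for name, keys in sstable_keys.items():
    bloom = BloomFilter()
    for key in keys:
        bloom.add(key)
    filters[name] = bloom


def candidate_sstables(key):
    """Return only the SSTables that might hold the key."""
    return [name for name, bloom in filters.items() if bloom.might_contain(key)]


print(candidate_sstables("alice"))
```

False positives are possible, so a "maybe" still needs a lookup in that SSTable. False negatives are not possible, which is what makes skipping safe.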

8. Which component provides the exact disk location of data during a read?

a) Row Cache

b) Bloom Filter

c) Partition Index

d) Memtable

Answer: c) Partition Index

9. What is the purpose of Row Cache in Cassandra?

a) Compress data

b) Store metadata

c) Store hot rows in memory

d) Prevent flush

Answer: c) Store hot rows in memory

10. Which tool can be used to manually flush memtables?

a) cassandratool

b) nodetool flush

c) cassandradump

d) nodetool restart

Answer: b) nodetool flush

11. What happens after a memtable is flushed to SSTables?

a) SSTables are deleted

b) Commit log is archived

c) Commit log entries are cleaned up

d) Memtable becomes SSTable

Answer: c) Commit log entries are cleaned up
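
Note on questions 1 to 5 and 11: the write path described in these answers can be sketched as a toy Python class. It is a simplified model under stated assumptions, not Cassandra's implementation. The memory limit is counted in keys rather than bytes, "disk" is an in-memory list, and the class and attribute names are hypothetical.

```python
class ToyNode:
    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit  # assumed limit, counted in keys
        self.commit_log = []   # append-only record of writes, for durability
        self.memtable = {}     # in-memory structure holding recent writes
        self.sstables = []     # immutable, sorted snapshots ("on disk")

    def write(self, key, value):
        # 1. Log the write, 2. apply it to the memtable.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        # 3. Flush once the memtable reaches its limit.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if not self.memtable:
            return
        # Flushed data is written out sorted by key and never modified.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # The flushed writes are now durable, so their log entries can go.
        self.commit_log.clear()

    def read(self, key):
        # Check memory first, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            for stored_key, value in sstable:
                if stored_key == key:
                    return value
        return None


node = ToyNode(memtable_limit=3)
node.write("a", "1")
node.write("b", "2")
node.write("c", "3")      # third key reaches the limit and triggers a flush
node.write("a", "new")    # newer value lives in the memtable
print(node.read("a"), node.read("b"), len(node.sstables))
```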

12. What is stored in the Filter.db file?

a) Compression info

b) Bloom Filter

c) Index info

d) Summary

Answer: b) Bloom Filter

13. What is the purpose of CompressionInfo.db?

a) Check for errors

b) Store bloom filters

c) Help read compressed data

d) Manage SSTable names

Answer: c) Help read compressed data

14. What does the partition summary provide?

a) Exact row data

b) Memory compression

c) Key range hints for index

d) Complete disk map

Answer: c) Key range hints for index
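
Note on questions 8 and 14: the relationship between the partition summary and the partition index can be sketched in Python. The summary samples the index so that a lookup only has to scan a small slice of it. The sampling interval, the keys, and the offsets below are made up for illustration and do not reflect the real file formats.

```python
import bisect

# Toy partition index: sorted (partition key, offset in Data.db) pairs.
index = [("k%03d" % i, i * 100) for i in range(20)]

# Toy partition summary: every 4th index entry, with its index position.
SAMPLE_EVERY = 4  # assumed sampling interval
summary = [(index[i][0], i) for i in range(0, len(index), SAMPLE_EVERY)]
summary_keys = [key for key, _ in summary]


def find_offset(key):
    """Use the summary to pick an index slice, then scan only that slice."""
    pos = bisect.bisect_right(summary_keys, key) - 1
    if pos < 0:
        return None  # key sorts before the first indexed key
    start = summary[pos][1]
    for indexed_key, offset in index[start:start + SAMPLE_EVERY]:
        if indexed_key == key:
            return offset
    return None


print(find_offset("k007"))
```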

15. Which file lists all components in an SSTable?

a) TOC.txt

b) Summary.db

c) Digest.db

d) CRC.db

Answer: a) TOC.txt

Section B: Short Answer Questions (10 marks each)


1. Explain the role and purpose of the commit log and memtable in Cassandra's write path.

2. Describe the flushing process in Cassandra. When does it happen, and what are the
consequences?

3. List and describe the main components of an SSTable and their purposes.

4. Discuss the sequence of steps Cassandra follows when reading data, starting from
memory and ending at disk.

5. What is the importance of Bloom Filters and Row Cache in optimizing read performance
in Cassandra?

Section C: Long Answer Questions (15 marks each)


1. Draw and explain the complete write path in Cassandra, including commit log, memtable,
and flushing to SSTables.

2. Illustrate and discuss how Cassandra reads data with optimizations like Bloom Filters,
Row Cache, and Compression Maps.

3. Analyze the role of SSTables in Cassandra. What makes them efficient for reads and how
are they structured?

4. Explain Cassandra's directory structure for storing SSTables and how it supports
performance tuning.

5. Design a scenario that causes a flush in Cassandra and describe each system component's
behavior in that process.
