Comprehensive Question Bank – Cassandra and Data Concepts
Section A: Multiple Choice Questions (1 mark each)
1. Which of the following best defines Data Analytics?
a) Storing data only
b) Creating databases
c) Analyzing raw data to find trends and patterns
d) Designing websites
Answer: c) Analyzing raw data to find trends and patterns
2. What is the main difference between data and information?
a) No difference
b) Data is processed; information is raw
c) Data is raw; information is processed
d) Both are outputs
Answer: c) Data is raw; information is processed
3. Which organization originally developed Apache Cassandra?
a) Google
b) Facebook
c) Microsoft
d) Amazon
Answer: b) Facebook
4. What does RDBMS stand for?
a) Random Database Management System
b) Relational Data Mapping System
c) Relational Database Management System
d) Remote Database Management Setup
Answer: c) Relational Database Management System
5. What is a key limitation of RDBMS in handling modern data workloads?
a) Too scalable
b) Poorly suited to unstructured data
c) No security
d) No schema
Answer: b) Poorly suited to unstructured data
6. Which query language is used in Cassandra?
a) SQL
b) MongoQL
c) CQL
d) NoSQL
Answer: c) CQL
7. Which ACID property ensures that a transaction is all-or-nothing?
a) Atomicity
b) Consistency
c) Isolation
d) Durability
Answer: a) Atomicity
8. Which components of the CAP theorem does Cassandra prioritize?
a) Consistency
b) Availability and Partition Tolerance
c) Performance
d) Security
Answer: b) Availability and Partition Tolerance
9. Which data model does Cassandra follow?
a) Tabular model
b) Hierarchical model
c) Key-value model
d) Wide-column model
Answer: d) Wide-column model
10. Which of the following is a valid data type in Cassandra?
a) String
b) List
c) Object
d) JSON
Answer: b) List
11. What does 'Durability' in ACID mean in the context of Cassandra?
a) Data is flexible
b) Data remains even after crashes
c) Data can be easily edited
d) Data is temporary
Answer: b) Data remains even after crashes
12. In the CAP theorem, what does 'P' stand for?
a) Partition Tolerance
b) Performance
c) Processing Power
d) Persistence
Answer: a) Partition Tolerance
13. Which is NOT a valid CQL data type?
a) text
b) float
c) boolean
d) image
Answer: d) image
14. Why was Cassandra originally developed?
a) To handle banking data
b) To manage social network inboxes
c) To store spreadsheets
d) For educational purposes
Answer: b) To manage social network inboxes
15. What does CQL stand for?
a) Cassandra Question Language
b) Column Query Language
c) Cassandra Query Language
d) Core Query Language
Answer: c) Cassandra Query Language
Section B: Short Answer Questions (10 marks each)
1. Define Data and Information. Explain how they differ with examples.
2. What is RDBMS? List its limitations in the context of big data applications.
3. Explain the ACID properties with respect to Cassandra's behavior.
4. Describe the basic structure of Cassandra’s data model.
5. Write a CQL query to create a table with columns for ID, Name, and Email. Explain each
part of the query.
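For question 5, a minimal sketch of one possible answer is shown below. The keyspace name `demo`, the table name `users`, the column types, and the helper `column_names` are illustrative assumptions and not part of the question. The statement is held in a Python string so that it could be passed to a client driver, which is not shown here.

```python
# One possible CREATE TABLE statement for question 5.
# The names demo.users and the chosen column types are illustrative assumptions.
CREATE_USERS_TABLE = """
CREATE TABLE IF NOT EXISTS demo.users (
    id    uuid,          -- unique identifier for each row
    name  text,          -- user's display name
    email text,          -- user's email address
    PRIMARY KEY (id)     -- id is the partition key that locates the row
);
"""


def column_names(statement: str) -> list[str]:
    """Return the column names declared in the statement, in order."""
    names = []
    for line in statement.splitlines():
        line = line.strip()
        # Skip blank lines, the CREATE line, the PRIMARY KEY clause,
        # and the closing parenthesis.
        if not line or line.startswith(("CREATE", "PRIMARY", ")")):
            continue
        names.append(line.split()[0])
    return names


print(column_names(CREATE_USERS_TABLE))
```

An answer would then explain each part in turn: the keyspace and table name, each column with its CQL type, and the PRIMARY KEY clause that determines how rows are located.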
Section C: Long Answer Questions (15 marks each)
1. Discuss in detail the evolution and history of Apache Cassandra. Why was it developed
and how has it grown?
2. Compare Cassandra with traditional RDBMS in terms of scalability, performance, and
data structure.
3. Explain the CAP theorem with examples. How does Cassandra address the trade-offs?
4. Design a data model in Cassandra for a library system. Include keyspaces, tables, and
relationships.
5. Discuss various data types supported in Cassandra and give examples of use cases for
each.
Question Bank – Data Modeling and Cassandra Design
Section A: Multiple Choice Questions (1 mark each)
1. What is a data model primarily used for?
a) Encrypting data
b) Designing software UI
c) Organizing and structuring data
d) Compressing files
Answer: c) Organizing and structuring data
2. Which of the following best describes a logical data model?
a) Physical implementation
b) High-level structure of entities and relationships
c) Binary format of data
d) Operating system mapping
Answer: b) High-level structure of entities and relationships
3. Which of the following is true about a physical data model?
a) Ignores hardware structure
b) Is abstract
c) Includes storage and indexing details
d) Focuses on business goals
Answer: c) Includes storage and indexing details
4. In Cassandra, tables are designed based on:
a) ER diagrams
b) Normalization rules
c) Application queries
d) Stored procedures
Answer: c) Application queries
5. Which model is independent of DBMS implementation?
a) Physical model
b) Logical model
c) Relational model
d) Distributed model
Answer: b) Logical model
6. Which of the following is a key difference between RDBMS and Cassandra?
a) Cassandra supports normalization
b) RDBMS is schema-less
c) Cassandra uses denormalized tables
d) RDBMS does not use SQL
Answer: c) Cassandra uses denormalized tables
7. What is the main focus during logical data modeling?
a) Data partitioning
b) Query performance
c) Entity relationships and attributes
d) Table file formats
Answer: c) Entity relationships and attributes
8. Which stage comes after defining the initial data model?
a) Data backup
b) Application coding
c) Evaluating and refining the data model
d) Data deletion
Answer: c) Evaluating and refining the data model
9. Which of these is a feature of materialized views in Cassandra?
a) Real-time analytics
b) Automatic view creation
c) Precomputed query results
d) File compression
Answer: c) Precomputed query results
10. In Cassandra, denormalization is preferred because:
a) It saves disk space
b) It reduces query performance
c) It avoids joins
d) It simplifies index creation
Answer: c) It avoids joins
11. Which model would specify primary key and partition key information?
a) Logical model
b) Conceptual model
c) Physical model
d) Hierarchical model
Answer: c) Physical model
12. Which component is essential when defining application queries for Cassandra?
a) Table normalization
b) Query joins
c) Access patterns
d) SQL views
Answer: c) Access patterns
13. What is typically refined during the evaluation of a data model?
a) Entity definitions
b) Application UI
c) Encryption algorithm
d) Logging system
Answer: a) Entity definitions
14. Materialized views in Cassandra are:
a) Automatically updated when the base table changes
b) Normalized tables
c) Designed to replace base tables
d) Write-intensive structures
Answer: a) Automatically updated when the base table changes
15. What is NOT a part of logical data modeling?
a) Attributes
b) Data types
c) Indexing strategy
d) Relationships
Answer: c) Indexing strategy
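The query-first, denormalized approach that questions 4, 6, 10, and 12 refer to can be illustrated with a small in-memory sketch. The table names `orders_by_customer` and `orders_by_product`, the function `record_order`, and the sample orders are hypothetical; plain Python dictionaries stand in for Cassandra tables.

```python
from collections import defaultdict

# Each "table" is keyed by the value its query filters on (the partition key).
orders_by_customer = defaultdict(list)  # answers: orders placed by a customer
orders_by_product = defaultdict(list)   # answers: orders containing a product


def record_order(order_id, customer, product):
    """Write the same order into both tables (denormalization)."""
    row = {"order_id": order_id, "customer": customer, "product": product}
    orders_by_customer[customer].append(row)
    orders_by_product[product].append(row)


record_order(1, "asha", "keyboard")
record_order(2, "asha", "mouse")
record_order(3, "ravi", "keyboard")

# Each query reads one partition from one table, so no join is needed.
print([r["order_id"] for r in orders_by_customer["asha"]])
print([r["order_id"] for r in orders_by_product["keyboard"]])
```

The trade-off is visible in `record_order`: every write is duplicated so that each read can be served from a single table.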
Section B: Short Answer Questions (10 marks each)
1. Explain the differences between a logical and a physical data model with examples.
2. Describe how Cassandra designs differ from RDBMS in terms of data modeling and
querying.
3. What steps are involved in evaluating and refining a data model in Cassandra?
4. Discuss the role and use of materialized views in Cassandra.
5. How do application queries influence data model design in Cassandra?
Section C: Long Answer Questions (15 marks each)
1. Design and explain a complete data model for an e-commerce system using Cassandra.
Include keyspaces, tables, and primary keys.
2. Compare and contrast RDBMS and Cassandra design philosophies. Include at least five
key differences.
3. Explain the entire process of building, evaluating, and refining a data model in Cassandra
from start to deployment.
4. Describe the limitations of using materialized views and best practices for their use in
real-world applications.
5. Discuss the importance of defining application access patterns before creating a physical
data model in Cassandra.
Cassandra Configuration – Question Bank
Section A: Multiple Choice Questions (1 mark each)
1. Which file is the main configuration file in Cassandra?
a) cassandra-env.sh
b) cassandra.yaml
c) logback.xml
d) jvm.options
Answer: b) cassandra.yaml
2. Where is the configuration folder typically located in a Docker installation of
Cassandra?
a) /usr/local/cassandra
b) /opt/cassandra/conf
c) /etc/cassandra
d) /var/lib/cassandra
Answer: c) /etc/cassandra
3. What is the purpose of the `seeds` property in cassandra.yaml?
a) Backup configuration
b) Defines the node IP
c) Node discovery during startup
d) Logging level setup
Answer: c) Node discovery during startup
4. Which port does Cassandra use for native client communication by default?
a) 7000
b) 22
c) 8080
d) 9042
Answer: d) 9042
5. Which file is used to set JVM options in Cassandra?
a) cassandra.yaml
b) logback.xml
c) cassandra-env.sh
d) metrics-reporter-config.yaml
Answer: c) cassandra-env.sh
6. What should you avoid when setting `listen_address`?
a) Setting seed nodes
b) Setting both `listen_address` and `listen_interface`
c) Using IPv6
d) Enabling garbage collection
Answer: b) Setting both `listen_address` and `listen_interface`
7. Which directory is used to store Cassandra’s commit logs?
a) saved_caches_directory
b) data_file_directories
c) hints_directory
d) commitlog_directory
Answer: d) commitlog_directory
8. How can you change logging levels at runtime in Cassandra?
a) By editing cassandra.yaml
b) Using nodetool setlogginglevel
c) Restarting the server
d) Modifying JVM_OPTS
Answer: b) Using nodetool setlogginglevel
9. What is stored in the data_file_directories?
a) Log files
b) SSTables
c) Cache data
d) Metrics
Answer: b) SSTables
10. Which file controls logging configurations in Cassandra?
a) cassandra-env.sh
b) logback.xml
c) cassandra.yaml
d) cassandra-topology.properties
Answer: b) logback.xml
11. What is the recommended setup for data and commitlog storage?
a) Same disk for performance
b) External cloud storage
c) Place on separate disks
d) Use RAM disks
Answer: c) Place on separate disks
12. Which file contains configuration for archiving commit logs?
a) cassandra.yaml
b) commitlog_archiving.properties
c) cassandra-env.sh
d) metrics-reporter-config-sample.yaml
Answer: b) commitlog_archiving.properties
13. Which port is used for inter-node communication by default?
a) 7000
b) 22
c) 9042
d) 9092
Answer: a) 7000
14. Which file lets you define data center and rack information?
a) cassandra-env.sh
b) cassandra-topology.properties
c) jvm.options
d) cassandra.yaml
Answer: b) cassandra-topology.properties
15. What log level is used in system.log by default?
a) DEBUG
b) WARN
c) INFO
d) ERROR
Answer: c) INFO
Section B: Short Answer Questions (10 marks each)
1. Explain the role of the `cassandra.yaml` file in Cassandra configuration. Discuss at least
five critical properties.
2. Describe the importance of proper directory configuration in Cassandra, including the
purpose of `data_file_directories`, `commitlog_directory`, and `hints_directory`.
3. Discuss the use and significance of `cassandra-env.sh` in JVM tuning for Cassandra
performance optimization.
4. How is logging managed in Cassandra? Include the default behavior and runtime log level
changes.
5. Compare the locations of configuration files across installation methods (Docker,
package install, tarball).
Section C: Long Answer Questions (15 marks each)
1. Design a configuration strategy for deploying Cassandra in a production multi-node
cluster. Include details such as cluster name, seed nodes, IP settings, directory separation,
and memory tuning.
2. Elaborate on the step-by-step process of tuning Cassandra to handle large-scale
workloads, highlighting the importance of each configuration file involved.
3. Analyze the implications of incorrect configuration in cassandra.yaml. Provide examples
of misconfigurations and their effects on cluster behavior.
4. Discuss the logging architecture in Cassandra. How would you modify the logging level
and output location in a live environment without restarting the system?
5. Propose a backup and recovery plan using commit log archiving in Cassandra. Explain the
configuration, process, and benefits.
Cassandra – Maintenance Question Bank
Section A: Multiple Choice Questions (1 mark each)
1. Which command provides a summary of the Cassandra cluster status?
a) nodetool info
b) nodetool tpstats
c) nodetool status
d) nodetool restart
Answer: c) nodetool status
2. What does 'UN' indicate in the nodetool status output?
a) Unknown Node
b) Unreachable Node
c) Up and Normal
d) Updated Node
Answer: c) Up and Normal
3. Which nodetool command provides memory usage information?
a) nodetool tpstats
b) nodetool flush
c) nodetool status
d) nodetool info
Answer: d) nodetool info
4. What is displayed under the 'Load' column in nodetool status?
a) CPU load
b) Memory usage
c) Amount of data stored on the node
d) Disk temperature
Answer: c) Amount of data stored on the node
5. Which of the following shows thread pool activity in Cassandra?
a) nodetool info
b) nodetool restart
c) nodetool tpstats
d) nodetool status
Answer: c) nodetool tpstats
6. Which nodetool command is best for identifying memory issues?
a) nodetool tpstats
b) nodetool info
c) nodetool version
d) nodetool drain
Answer: b) nodetool info
7. What kind of tasks are managed by thread pools in Cassandra?
a) Backups
b) Reads, writes, repairs
c) Configuration updates
d) Token assignment
Answer: b) Reads, writes, repairs
8. What does the 'State' column in nodetool status indicate?
a) Disk status
b) Operating system state
c) Node operation state
d) Replication factor
Answer: c) Node operation state
9. What does 'Pending tasks' in nodetool tpstats refer to?
a) Tasks that failed
b) Tasks waiting for processing
c) Tasks completed
d) Disk I/O operations
Answer: b) Tasks waiting for processing
10. Which command shows gossip activity status?
a) nodetool status
b) nodetool info
c) nodetool compact
d) nodetool cleanup
Answer: b) nodetool info
11. Which of these is NOT shown by nodetool info?
a) Uptime
b) Cache info
c) Active threads
d) Disk load
Answer: c) Active threads
12. What component is essential for Cassandra node communication?
a) Heap space
b) Gossip
c) Memtable
d) SSTables
Answer: b) Gossip
13. What command would you use to analyze performance bottlenecks in thread
management?
a) nodetool info
b) nodetool status
c) nodetool tpstats
d) nodetool listnodes
Answer: c) nodetool tpstats
14. What is represented by 'Tokens' in nodetool status?
a) Security credentials
b) Memory partitions
c) Data partitioning units
d) Repair checkpoints
Answer: c) Data partitioning units
15. What does nodetool info show about heap memory?
a) Compression ratio
b) Used and total heap memory
c) Node rack
d) SSTable count
Answer: b) Used and total heap memory
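The two-letter codes in the first column of `nodetool status`, such as the 'UN' in question 2, combine a status letter with a state letter. A minimal sketch of decoding them is shown below. The function name `decode_status` is hypothetical, and the letter meanings follow the legend that `nodetool status` prints above its table (Status=Up/Down, State=Normal/Leaving/Joining/Moving).

```python
# First letter: whether the node is reachable.
STATUS = {"U": "Up", "D": "Down"}
# Second letter: what the node is doing within the cluster.
STATE = {"N": "Normal", "L": "Leaving", "J": "Joining", "M": "Moving"}


def decode_status(code: str) -> str:
    """Expand a two-letter nodetool status code into words."""
    if len(code) != 2 or code[0] not in STATUS or code[1] not in STATE:
        raise ValueError(f"unrecognized status code: {code!r}")
    return f"{STATUS[code[0]]} and {STATE[code[1]]}"


print(decode_status("UN"))
print(decode_status("DN"))
```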
Section B: Short Answer Questions (10 marks each)
1. Explain the output and significance of the 'nodetool status' command in Cassandra.
2. Describe how the 'nodetool info' command helps administrators monitor Cassandra
nodes.
3. List and explain the different thread pool metrics available through 'nodetool tpstats'.
4. Compare 'nodetool status' and 'nodetool info' in terms of the information they provide.
5. Why is monitoring thread pool activity important in Cassandra? How does it affect
performance?
Section C: Long Answer Questions (15 marks each)
1. Demonstrate the usage of nodetool for Cassandra cluster maintenance. Include scenarios
for each major command discussed.
2. Analyze a scenario where nodetool tpstats reveals thread pool congestion. What
corrective actions would you recommend?
3. Explain the role of Gossip in Cassandra. How can nodetool info help verify its status and
diagnose issues?
4. Create a health monitoring plan using nodetool for a 5-node Cassandra cluster. Include
the frequency and metrics to check.
5. Describe how you would use nodetool commands during a high-latency issue to identify
and fix the root cause.
Cassandra – Reading and Writing Data Question Bank
Section A: Multiple Choice Questions (1 mark each)
1. What happens when data is written to Cassandra?
a) Only stored in memory
b) Written to disk only
c) Logged in commit log and stored in memtable
d) Stored in SSTables immediately
Answer: c) Logged in commit log and stored in memtable
2. What is the purpose of the commit log in Cassandra?
a) Temporary storage
b) Caching data
c) Ensuring durability
d) Reading data
Answer: c) Ensuring durability
3. What is a memtable?
a) A file on disk
b) A cache
c) A memory-based data structure for temporary writes
d) A type of compression
Answer: c) A memory-based data structure for temporary writes
4. What triggers a memtable flush?
a) Disk full
b) Memory limit reached
c) Cache timeout
d) Bloom filter activation
Answer: b) Memory limit reached
5. Where is flushed memtable data written?
a) Memlog
b) Commitlog
c) SSTables
d) Row cache
Answer: c) SSTables
6. Which file holds the actual row data in an SSTable?
a) Index.db
b) Data.db
c) Filter.db
d) Statistics.db
Answer: b) Data.db
7. What does a Bloom Filter do during read operations?
a) Compress data
b) Locate partition key
c) Eliminate irrelevant SSTables
d) Store entire rows
Answer: c) Eliminate irrelevant SSTables
8. Which component provides the exact disk location of data during a read?
a) Row Cache
b) Bloom Filter
c) Partition Index
d) Memtable
Answer: c) Partition Index
9. What is the purpose of Row Cache in Cassandra?
a) Compress data
b) Store metadata
c) Store hot rows in memory
d) Prevent flush
Answer: c) Store hot rows in memory
10. Which tool can be used to manually flush memtables?
a) cassandratool
b) nodetool flush
c) cassandradump
d) nodetool restart
Answer: b) nodetool flush
11. What happens after a memtable is flushed to SSTables?
a) SSTables are deleted
b) Commit log is archived
c) Commit log entries are cleaned up
d) Memtable becomes SSTable
Answer: c) Commit log entries are cleaned up
12. What is stored in the Filter.db file?
a) Compression info
b) Bloom Filter
c) Index info
d) Summary
Answer: b) Bloom Filter
13. What is the purpose of CompressionInfo.db?
a) Check for errors
b) Store bloom filters
c) Help read compressed data
d) Manage SSTable names
Answer: c) Help read compressed data
14. What does the partition summary provide?
a) Exact row data
b) Memory compression
c) Key range hints for index
d) Complete disk map
Answer: c) Key range hints for index
15. Which file lists all components in an SSTable?
a) TOC.txt
b) Summary.db
c) Digest.db
d) CRC.db
Answer: a) TOC.txt
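The write path covered by questions 1 to 5 and 11 in this section can be sketched as a toy model. This is a simplified illustration, not Cassandra's actual implementation: the class `Node`, its method names, and a flush threshold counted in rows are all assumptions made for the example.

```python
class Node:
    """Toy model of a single node's write path (not Cassandra's real code)."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # assumed limit, counted in rows
        self.commit_log = []  # append-only record kept for durability
        self.memtable = {}    # in-memory writes, keyed by partition key
        self.sstables = []    # flushed tables, oldest first, never modified

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log the write first
        self.memtable[key] = value            # 2. then apply it in memory
        if len(self.memtable) >= self.flush_threshold:
            self.flush()                      # 3. flush when the limit is hit

    def flush(self):
        if not self.memtable:
            return
        # Write the memtable out, sorted by key, as one new SSTable.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed writes no longer need the log

    def read(self, key):
        if key in self.memtable:                 # newest data lives in memory
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # then newest SSTable first
            if key in sstable:
                return sstable[key]
        return None


node = Node(flush_threshold=3)
for k, v in [("a", 1), ("b", 2), ("c", 3), ("d", 4)]:
    node.write(k, v)

print(len(node.sstables), node.memtable, node.commit_log)
```

After the third write the memtable reaches the assumed limit and is flushed, so the fourth write is the only entry left in both the memtable and the commit log.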
Section B: Short Answer Questions (10 marks each)
1. Explain the role and purpose of the commit log and memtable in Cassandra's write path.
2. Describe the flushing process in Cassandra. When does it happen, and what are the
consequences?
3. List and describe the main components of an SSTable and their purposes.
4. Discuss the sequence of steps Cassandra follows when reading data, starting from
memory and ending at disk.
5. What is the importance of Bloom Filters and Row Cache in optimizing read performance
in Cassandra?
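For question 5, the way a Bloom Filter lets a read skip irrelevant SSTables can be sketched as follows. This is a minimal illustration under stated assumptions: the class `BloomFilter`, the sizes chosen, the sample keys, and the use of SHA-256 as the hash function are all stand-ins for the example and do not reflect Cassandra's internal implementation.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, key):
        # Derive several bit positions per key by salting the hash input.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


# A dictionary stands in for one SSTable's data on disk.
sstable = {"user:1": "Asha", "user:2": "Ravi"}
sstable_filter = BloomFilter()
for key in sstable:
    sstable_filter.add(key)


def read(key):
    """Consult the filter first and touch the SSTable only if needed."""
    if not sstable_filter.might_contain(key):
        return None  # skip this SSTable without reading it
    return sstable.get(key)


print(read("user:1"))
```

A key that was added always passes the filter, while a key that was never added is usually rejected before the SSTable is read; a rare false positive only costs one unnecessary lookup.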
Section C: Long Answer Questions (15 marks each)
1. Draw and explain the complete write path in Cassandra, including commit log, memtable,
and flushing to SSTables.
2. Illustrate and discuss how Cassandra reads data with optimizations like Bloom Filters,
Row Cache, and Compression Maps.
3. Analyze the role of SSTables in Cassandra. What makes them efficient for reads and how
are they structured?
4. Explain Cassandra's directory structure for storing SSTables and how it supports
performance tuning.
5. Design a scenario that causes a flush in Cassandra and describe each system component's
behavior in that process.