CQL 3.3
CQL 3.3.2 & later
Documentation
June 10, 2016
Contents
Using CQL...................................................................................................................... 10
Starting the CQL shell (cqlsh)...........................................................................................................10
Starting cqlsh on Linux and Mac OS X................................................................................. 10
Starting cqlsh on Windows.....................................................................................................11
Creating and updating a keyspace................................................................................................... 11
Example of creating a keyspace............................................................................................ 12
Updating the replication factor............................................................................................... 12
Creating a table.................................................................................................................................13
Creating a table...................................................................................................................... 13
Using the keyspace qualifier.................................................................................................. 15
Simple Primary Key................................................................................................................ 15
Composite Partition Key......................................................................................................... 16
Compound Primary Key......................................................................................................... 17
Creating a counter table.........................................................................................................19
Create table with COMPACT STORAGE...............................................................................20
Table schema collision fix...................................................................................................... 20
Creating a materialized view.............................................................................................................21
Creating advanced data types in tables........................................................................................... 23
Creating collections................................................................................................................ 23
Creating a table with a tuple.................................................................................................. 25
Creating a user-defined type (UDT).......................................................................................25
Creating functions..............................................................................................................................26
Creating user-defined function (UDF).................................................................................... 26
Creating User-Defined Aggregate Function (UDA)................................................................ 27
Inserting and updating data.............................................................................................................. 28
Inserting simple data into a table...........................................................................................28
Inserting and updating data into a set................................................................................... 28
Inserting and updating data into a list....................................................................................29
Inserting and updating data into a map................................................................................. 30
Inserting tuple data into a table............................................................................................. 31
Inserting or updating data into a user-defined type (UDT).....................................................32
Inserting JSON data into a table............................................................................................33
Using lightweight transactions................................................................................................ 34
Expiring data with Time-To-Live.............................................................................................34
Expiring data with TTL example.............................................................................................35
Inserting data using COPY and a CSV file............................................................................35
Batching data insertion and updates................................................................................................ 35
Using and misusing batches.................................................................................................. 35
Use of BATCH statement.......................................................................................................36
Misuse of BATCH statement..................................................................................................38
Using unlogged batches......................................................................................................... 39
Querying tables................................................................................................................................. 39
CQL reference................................................................................................................ 81
Introduction........................................................................................................................................ 81
CQL lexical structure......................................................................................................................... 81
Uppercase and lowercase...................................................................................................... 81
Escaping characters............................................................................................................... 82
Valid literals............................................................................................................................ 83
Exponential notation............................................................................................................... 84
CQL code comments..............................................................................................................84
CQL Keywords........................................................................................................................84
CQL data types................................................................................................................................. 88
Blob type.................................................................................................................................90
Collection type........................................................................................................................ 91
Counter type........................................................................................................................... 91
UUID and timeuuid types....................................................................................................... 92
uuid and Timeuuid functions.................................................................................................. 92
Timestamp type...................................................................................................................... 93
Tuple type............................................................................................................................... 94
User-defined type................................................................................................................... 95
CQL keyspace and table properties................................................................................................. 95
Table properties......................................................................................................................96
Compaction subproperties...................................................................................................... 99
Compression subproperties.................................................................................................. 102
Functions......................................................................................................................................... 103
CQL limits........................................................................................................................................ 103
cqlsh commands..............................................................................................................................103
cqlsh...................................................................................................................................... 104
The cqlshrc file..................................................................................................................... 106
CAPTURE............................................................................................................................. 110
CLEAR.................................................................................................................................. 110
CONSISTENCY.................................................................................................................... 111
COPY.................................................................................................................................... 112
DESCRIBE............................................................................................................................ 117
EXPAND............................................................................................................................... 119
EXIT...................................................................................................................................... 120
LOGIN................................................................................................................................... 120
PAGING................................................................................................................................ 121
SERIAL CONSISTENCY...................................................................................................... 121
SHOW................................................................................................................................... 122
SOURCE............................................................................................................................... 123
TRACING.............................................................................................................................. 124
CQL commands...............................................................................................................................128
ALTER KEYSPACE.............................................................................................................. 128
ALTER MATERIALIZED VIEW.............................................................................................129
ALTER ROLE....................................................................................................................... 130
ALTER TABLE......................................................................................................................131
ALTER TYPE........................................................................................................................135
ALTER USER....................................................................................................................... 136
BATCH.................................................................................................................................. 137
CREATE AGGREGATE........................................................................................................139
CREATE CUSTOM INDEX (SASI).......................................................................................140
CREATE INDEX................................................................................................................... 143
CREATE FUNCTION............................................................................................................146
CREATE KEYSPACE........................................................................................................... 146
CREATE MATERIALIZED VIEW..........................................................................................150
CREATE TABLE................................................................................................................... 151
CREATE TRIGGER.............................................................................................................. 156
CREATE TYPE..................................................................................................................... 157
CREATE ROLE.................................................................................................................... 158
CREATE USER.................................................................................................................... 159
DELETE................................................................................................................................ 160
DESCRIBE............................................................................................................................ 164
DROP AGGREGATE............................................................................................................166
DROP FUNCTION................................................................................................................ 166
DROP INDEX....................................................................................................................... 167
DROP KEYSPACE............................................................................................................... 167
DROP MATERIALIZED VIEW..............................................................................................168
DROP ROLE.........................................................................................................................169
DROP TABLE....................................................................................................................... 169
DROP TRIGGER.................................................................................................................. 170
DROP TYPE......................................................................................................................... 170
DROP USER........................................................................................................................ 171
GRANT..................................................................................................................................172
INSERT................................................................................................................................. 173
LIST PERMISSIONS............................................................................................................ 177
LIST ROLES......................................................................................................................... 178
LIST USERS.........................................................................................................................179
REVOKE............................................................................................................................... 181
SELECT................................................................................................................................ 182
TRUNCATE...........................................................................................................................193
UPDATE................................................................................................................................194
USE....................................................................................................................................... 199
Introduction to Cassandra Query Language
Cassandra Query Language (CQL) is a query language for the Cassandra database.
The Cassandra Query Language (CQL) is the primary language for communicating with the Cassandra
database. The most basic way to interact with Cassandra is using the CQL shell, cqlsh. Using cqlsh, you
can create keyspaces and tables, insert and query tables, plus much more. If you prefer a graphical tool,
you can use DataStax DevCenter. For production, DataStax supplies a number of drivers so that CQL
statements can be passed from client to cluster and back. Other administrative tasks can be accomplished
using OpsCenter.
Important: This document assumes you are familiar with the Cassandra 2.2 documentation, Cassandra
3.0 documentation or Cassandra 3.x documentation.
Native protocol: The native protocol has been updated to version 4, with implications for CQL use in the DataStax drivers.
CQL data modeling
Note: DataStax Academy provides a course in Cassandra data modeling. This course presents
techniques using the Chebotko method for translating a real-world domain model into a running Cassandra
schema.
Data modeling is a process that involves identifying the entities (items to be stored) and the relationships
between entities. To create your data model, identify the patterns used to access data and the types of
queries to be performed. These two ideas inform the organization and structure of the data, and the design
and creation of the database's tables. Indexing the data can either improve or degrade query performance, so understanding indexing is an important step in the data modeling process.
Data modeling in Cassandra uses a query-driven approach, in which specific queries are the key to
organizing the data. Queries are the result of selecting data from a table; schema is the definition of how
data in the table is arranged. Cassandra's database design is based on the requirement for fast reads and
writes, so the better the schema design, the faster data is written and retrieved.
In contrast, relational databases normalize data based on the tables and relationships designed, and the queries to be made are written afterward. Data modeling in relational databases is table-driven, and any relationships between tables are expressed as table joins in queries.
Cassandra's data model is a partitioned row store with tunable consistency. Tunable consistency means
for any given read or write operation, the client application decides how consistent the requested data
must be. Rows are organized into tables; the first component of a table's primary key is the partition key;
within a partition, rows are clustered by the remaining columns of the key. Other columns can be indexed
separately from the primary key. Because Cassandra is a distributed database, efficiency is gained for
reads and writes when data is grouped together on nodes by partition. The fewer partitions that must be
queried to get an answer to a question, the faster the response. Tuning the consistency level is another
factor in latency, but is not part of the data modeling process.
Cassandra data modeling focuses on the queries. Throughout this topic, the example of Pro Cycling
statistics demonstrates how to model the Cassandra table schema for specific queries. The conceptual
model for this data model shows the entities and relationships.
The entities and their relationships are considered during table design. Queries are best designed to
access a single table, so all entities involved in a relationship that a query encompasses must be in the
table. Some tables will involve a single entity and its attributes, like the first example shown below. Others
will involve more than one entity and its attributes, such as the second example. Including all data in a
single Cassandra table contrasts with a relational database approach, where the data would be stored
in two or more tables and foreign keys would be used to relate the data between the tables. Because
Cassandra uses this single table-single query approach, queries can perform faster.
One basic query (Q1) for Pro Cycling statistics is a list of cyclists, including each cyclist's id, firstname, and lastname. A UUID column, id, uniquely identifies each cyclist in the table. For a simple query that lists all cyclists, create a table that includes all the identified columns and uses id as the partition key (K). The diagram below shows a portion of the logical model for the Pro Cycling data model.
Figure: Query 1: Find a cyclist's name with a specified id
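A table serving Q1 might be defined as follows. The keyspace name cycling is an assumption for illustration; K marks the partition key, matching the notation in the figure:

```sql
CREATE TABLE cycling.cyclist_name (
  id UUID PRIMARY KEY,  -- K: partition key
  firstname text,
  lastname text
);
```

With this schema, Q1 is a full-table scan (SELECT id, firstname, lastname FROM cycling.cyclist_name), and a single cyclist is fetched efficiently by id.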
A related query (Q2) searches for all cyclists by a particular race category. For Cassandra, this query is
more efficient if a table is created that groups all cyclists by category. Some of the same columns are
required (id, lastname), but now the primary key of the table includes category as the partition key (K),
and groups within the partition by the id (C). This choice ensures that unique records for each cyclist are
created.
Figure: Query 2: Find cyclists given a specified category
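A sketch of a table serving Q2, again assuming the cycling keyspace; K marks the partition key and C the clustering column:

```sql
CREATE TABLE cycling.cyclist_category (
  category text,   -- K: partition key groups cyclists by category
  id UUID,         -- C: clustering column keeps each cyclist's row unique
  lastname text,
  PRIMARY KEY (category, id)
);
```

All cyclists in a category now live in one partition, so SELECT * FROM cycling.cyclist_category WHERE category = ? reads a single partition.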
These are two simple queries; more examples will be shown to illustrate data modeling using CQL.
Notice that the main principle in designing the table is not the relationship of the table to other tables, as it
is in relational database modeling. Data in Cassandra is often arranged as one query per table, and data
is repeated amongst many tables, a process known as denormalization. Relational databases instead
normalize data, removing as much duplication as possible. The relationship of the entities is important,
because the order in which data is stored in Cassandra can greatly affect the ease and speed of data
retrieval. The schema design captures much of the relationship between entities by including related
attributes in the same table. Client-side joins in application code are used only when the table schema cannot capture the complexity of the relationships.
You've created a conceptual model of the entities and their relationships. From the conceptual model,
you've used the expected queries to create table schema. The last step in data modeling involves completing an analysis of the logical design to discover modifications that might be needed. These modifications can
arise from understanding partition size limitations, cost of data consistency, and performance costs due to
a number of design choices still to be made.
For efficient operation, partitions must be sized within certain limits. Two measures of partition size are the
number of values in a partition and the partition size on disk. The maximum number of rows per partition is
not theoretically limited, although practical limits can be found with experimentation. Sizing the disk space
is more complex, and involves the number of rows and the number of columns, primary key columns and
static columns in each table. Each application will have different efficiency parameters, but a good rule of
thumb is to keep the maximum number of values below 100,000 items and the disk size under 100MB.
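The rule of thumb above can be made concrete. One commonly used estimate (an assumption here, not stated in this document) counts the values in a partition as Nv = Nr * (Nc - Npk - Ns) + Ns, where Nr is the number of rows in the partition, Nc the number of columns in the table, Npk the number of primary key columns, and Ns the number of static columns:

```python
def partition_values(rows, columns, pk_columns, static_columns):
    """Estimate the number of values stored in one partition using the
    rule-of-thumb formula Nv = Nr * (Nc - Npk - Ns) + Ns."""
    return rows * (columns - pk_columns - static_columns) + static_columns

# A partition holding 10,000 rows of a table with 8 columns,
# 3 of them primary key columns and none static:
values = partition_values(10_000, 8, 3, 0)
print(values)  # 50000, comfortably under the 100,000-value guideline
```

If the estimate approaches the 100,000-value or 100MB guideline, revisit the partition key so that data is split across more partitions.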
Data redundancy must be considered as well. Two redundancies that are a consequence of Cassandra's distributed design are duplicate data in tables and multiple partition replicas.
Data is generally duplicated in multiple tables, which adds write latency and requires more disk space. Consider storing a cyclist's name and id in more than one table, along with other items like race categories, finished races, and cyclist statistics. Storing the name and id in multiple tables
results in linear duplication, with two values stored in each table. Table design must take into account the
possibility of higher order duplication, such as unlimited keywords stored in a large number of rows. A case
of n keywords stored in m rows is not a good table design. You should rethink the table schema for better
design, still keeping the query foremost.
Cassandra replicates partition data based on the replication factor, using more disk space. Replication is a
necessary aspect of distributed databases and sizing disk storage correctly is important.
Application-side joins can be a performance killer. In general, you should analyze your queries that require
joins and consider pre-computing and storing the join results in an additional table. In Cassandra, the
goal is to use one table per query for performant behavior. Lightweight transactions (LWT) can also affect
performance. Consider whether the queries using LWT are strictly necessary, and remove the requirement where it is not needed.
Using CQL
CQL provides an API to Cassandra that is simpler than the Thrift API.
CQL provides an API to Cassandra that is simpler than the Thrift API. The Thrift API and legacy versions
of CQL expose the internal storage structure of Cassandra. CQL adds an abstraction layer that hides
implementation details of this structure and provides native syntaxes for collections and other common
encodings.
Accessing CQL
Common ways to access CQL are:
• Start cqlsh, the Python-based command-line client, on the command line of a Cassandra node.
• Use DataStax DevCenter, a graphical user interface.
• For developing applications, you can use one of the official DataStax C#, Java, or Python open-source
drivers.
• Use the set_cql_version Thrift method for programmatic access.
This document presents examples using cqlsh.
Starting cqlsh on Linux and Mac OS X
This procedure briefly describes how to start cqlsh on Linux and Mac OS X. The cqlsh command is covered in detail later.
Procedure
1. Navigate to the Cassandra installation directory.
2. Start cqlsh, for example on Mac OS X.
$ bin/cqlsh
If you use security features, provide a user name and password.
3. Print the help menu for cqlsh.
$ bin/cqlsh --help
4. Optionally, specify the IP address and port to start cqlsh on a different node.
$ bin/cqlsh 1.2.3.4 9042
Note: On Mac OS X, if cqlsh command history does not work, install the readline library:
$ easy_install readline
Starting cqlsh on Windows
This procedure describes how to start cqlsh on Windows. The cqlsh command is covered in detail later.
Procedure
You can start cqlsh in two ways:
• From the Start menu:
a) Navigate to Start > Programs > DataStax Distribution of Apache Cassandra.
b) If using Cassandra 3.0+, click DataStax Distribution of Apache Cassandra > Cassandra CQL
Shell
c) If using Cassandra 2.2, click DataStax Community Edition > Cassandra CQL Shell.
The cqlsh prompt appears: cqlsh>
• From the Command Prompt:
a) Open the Command Prompt.
b) Navigate to Cassandra bin directory:
Cassandra 3.0+:
C:\> cd C:"Program Files\DataStax-DDC\apache-cassandra\bin"
Cassandra 2.2:
C:\> cd C:"Program Files\DataStax Community\apache-cassandra\bin"
c) Enter cqlsh.
The cqlsh prompt appears: cqlsh>
To get the help menu for cqlsh, enter python cqlsh --help from the bin directory.
Creating and updating a keyspace
Creating a keyspace is the CQL counterpart to creating an SQL database, but a little different. The
Cassandra keyspace is a namespace that defines how data is replicated on nodes. Typically, a cluster has
one keyspace per application. Replication is controlled on a per-keyspace basis, so data that has different
replication requirements typically resides in different keyspaces. Keyspaces are not designed to be used
as a significant map layer within the data model. Keyspaces are designed to control data replication for a
set of tables.
When you create a keyspace, specify a strategy class for replicating keyspaces. Using the
SimpleStrategy class is fine for evaluating Cassandra. For production use or for use with mixed
workloads, use the NetworkTopologyStrategy class.
To use NetworkTopologyStrategy for evaluation purposes with, for example, a single-node cluster, use the default data center name. To use NetworkTopologyStrategy in production, you need to change the default snitch, SimpleSnitch, to a network-aware snitch, define one or more data center names in the snitch properties file, and use those data center name(s) to define the keyspace; see
Snitch. For example, if the cluster uses the PropertyFileSnitch, create the keyspace using the user-defined data center and rack names in the cassandra-topology.properties file. If the cluster
uses the Ec2Snitch, create the keyspace using EC2 data center and rack names. If the cluster uses the
GoogleCloudSnitch, create the keyspace using GoogleCloud data center and rack names.
If you fail to change the default snitch and use NetworkTopologyStrategy, Cassandra will fail to complete any write request, such as inserting data into a table, and will log an error message.
Note: You cannot insert data into a table in a keyspace that uses NetworkTopologyStrategy unless you define the data center names in the snitch properties file or you use a single data center named datacenter1.
Example of creating a keyspace
To query Cassandra, create and use a keyspace. Choose an arbitrary data center name and register the
name in the properties file of the snitch. Alternatively, in a cluster in a single data center, use the default
data center name, for example, datacenter1, and skip registering the name in the properties file.
Procedure
1. Determine the default data center name, if using NetworkTopologyStrategy, using nodetool status.
$ bin/nodetool status
The output is:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 127.0.0.1 41.62 KB 256 100.0% 75dcca8f... rack1
2. Create a keyspace.
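The CREATE KEYSPACE statement might look like this. The keyspace name cycling and the replication factor are assumptions for illustration; the data center name must match the one reported by nodetool status:

```sql
CREATE KEYSPACE IF NOT EXISTS cycling
  WITH REPLICATION = {
    'class' : 'NetworkTopologyStrategy',
    'datacenter1' : 1
  };

-- Make it the working keyspace for the session:
USE cycling;
```

For evaluation with SimpleStrategy instead, replace the map with { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }.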
Updating the replication factor
Increasing the replication factor increases the total number of copies of keyspace data stored in a
Cassandra cluster. If you are using security features, it is particularly important to increase the replication
factor of the system_auth keyspace from the default (1) because you will not be able to log into the
cluster if the node with the lone replica goes down. It is recommended to set the replication factor for the
system_auth keyspace equal to the number of nodes in each data center.
Procedure
1. Update a keyspace in the cluster and change its replication strategy options.
Or if using SimpleStrategy:
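A sketch of both variants; the keyspace name system_auth follows the recommendation above, and the replication numbers are assumptions for a three-node data center:

```sql
ALTER KEYSPACE system_auth
  WITH REPLICATION = {
    'class' : 'NetworkTopologyStrategy',
    'datacenter1' : 3
  };

-- Or, if using SimpleStrategy:
ALTER KEYSPACE system_auth
  WITH REPLICATION = {
    'class' : 'SimpleStrategy',
    'replication_factor' : 3
  };
```

After altering the replication strategy, run nodetool repair on each affected node so existing data is replicated to the new replicas.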
Creating a table
How to create tables to store data.
Creating a table
How to create CQL tables.
In CQL, data is stored in tables containing rows of columns, similar to SQL definitions.
Note: The concepts of rows and columns in the internal implementation of Cassandra are not the same. For more information, see A thrift to CQL3 upgrade guide or CQL3 for Cassandra experts.
Tables can be created, dropped, and altered at runtime without blocking updates and queries. To create a
table, you must define a primary key and other data columns. Add the optional WITH clause and keyword
arguments to configure table properties (caching, compaction, etc.). See the Setting a table property page
for details.
Primary Key
A primary key identifies the location and order of data storage. The primary key is defined at table creation
time and cannot be altered. If the primary key must be changed, a new table schema is created and the
data is written to the new table. Cassandra is a partitioned row store, and a component of the primary key,
the partition key, identifies which node will hold a particular table row. See ALTER TABLE for details on
altering a table after creation.
At the minimum, the primary key must consist of a partition key. Composite partition keys can split a data
set so that related data is stored on separate partitions. A compound primary key includes clustering
columns which order the data on a partition.
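A sketch combining both ideas, with a composite partition key and a clustering column; all table and column names are invented for illustration:

```sql
CREATE TABLE cycling.rank_by_year_and_name (
  race_year int,
  race_name text,
  rank int,
  cyclist_name text,
  PRIMARY KEY ((race_year, race_name), rank)
);
```

Here (race_year, race_name) is the composite partition key, so each year/race combination is stored as its own partition, and rank is the clustering column that orders rows within each partition.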
The definition of a table's primary key is critical in Cassandra. Carefully model how data in a table will
be inserted and retrieved before choosing which columns will define the primary key. The size of the
partitions, the order of the data within partitions, the distribution of the partitions amongst the nodes of the
cluster - all of these considerations determine selection of the best primary key for a table.
Table characteristics
Valid table names are strings of alphanumeric characters and underscores that begin with a letter. To
specify a table name:
• To specify the keyspace in which the table will be created, put the keyspace name followed by a period
before the table name: keyspace_name.table_name. This allows the creation of a new table in a
keyspace that is different from the one set for the current session (by the USE command, for example).
• To create a table in the current keyspace, just use the new table name.
Column characteristics
Columns are an essential component of a CQL table. Several column types exist to afford flexibility to table
schema design. Each column in a table is assigned a data type during table creation. Column types, other
than collection-type columns, are specified as a parenthesized, comma-separated list of column name and
type pairs. The following example illustrates three data types, UUID, text and timestamp:
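A sketch of such a definition (the table name is illustrative):

```sql
CREATE TABLE cycling.cyclist_alt_stats (
  id UUID PRIMARY KEY,
  lastname text,
  birthday timestamp
);
```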
Collection column types supported are map, set, and list. A collection column is declared using the
collection type, followed by another type, such as int or text, in angle brackets. Like other columns,
collection columns are specified as a parenthesized, comma-separated list of column name and type pairs.
The following example illustrates each collection type, but is not designed for an actual query:
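A sketch showing all three collection types in one table (the table name whimsey and column names are illustrative):

```sql
CREATE TABLE cycling.whimsey (
  id UUID PRIMARY KEY,
  lastname text,
  cyclist_teams set<text>,
  events list<text>,
  teams map<int, text>
);
```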
Collection types cannot currently be nested. Collections can include a frozen data type. For examples and
usage, see Collection type on page 91
The tuple data type holds fixed-length sets of typed positional fields. Use a tuple as an
alternative to a user-defined type. A tuple can accommodate many fields (32768), far more than can be
prudently used; typically, a tuple holds only 2 to 5 fields. To create a tuple in a table, use angle
brackets and a comma delimiter to declare the tuple
component types. Tuples can be nested. The following example illustrates a tuple type composed of a text
field and a nested tuple of two float fields:
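A sketch of such a declaration, using a hypothetical table and column name:

```sql
CREATE TABLE cycling.nation_rank (
  nation text PRIMARY KEY,
  location tuple<text, tuple<float, float>>
);
```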
Using the keyspace qualifier
Sometimes issuing a USE statement to select a keyspace is inconvenient; for example, connection pooling
requires managing multiple keyspaces. To simplify working with multiple keyspaces, use the keyspace qualifier
instead of the USE statement. You can specify the keyspace with the keyspace qualifier in these statements:
• ALTER TABLE
• CREATE TABLE
• DELETE
• INSERT
• SELECT
• TRUNCATE
• UPDATE
Procedure
To specify a table when you are not in the keyspace that contains the table, use the name of the
keyspace followed by a period, then the table name. For example, cycling.race_winners.
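For example, assuming a race_winners table exists in the cycling keyspace:

```sql
SELECT * FROM cycling.race_winners;
```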
Simple Primary Key
For a table with a simple primary key, Cassandra uses one column name as the partition key. The primary
key consists of only the partition key in this case. Data stored with a simple primary key is fast to insert
and retrieve if the column has many distinct values to distribute the partitions across many nodes.
Often, your first venture into using Cassandra involves tables with simple primary keys. Keep in mind
that only the primary key can be specified when retrieving data from the table (unless you use secondary
indexes). If an application needs a simple lookup table using a single unique identifier, then a simple
primary key is the right choice. The table shown uses id as the primary key.
Use a simple primary key to create a single column that you can use to query and return results. This
example creates a cyclist_name table storing an ID number and a cyclist's first and last names in columns.
The table uses a UUID as a primary key. This table can be queried to discover the name of a cyclist given
their ID number.
A simple primary key table can be created in three different ways, as shown.
Procedure
• Create the table cyclist_name in the cycling keyspace, making id the primary key. Insert the PRIMARY
KEY keywords after the column name in the CREATE TABLE definition. Before creating the table, set
the keyspace with a USE statement.
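A sketch of this first approach:

```sql
USE cycling;

CREATE TABLE cyclist_name (
  id UUID PRIMARY KEY,
  lastname text,
  firstname text
);
```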
Composite Partition Key
For a table with a composite partition key, Cassandra uses multiple columns as the partition key. These
columns form logical sets inside a partition to facilitate retrieval. In contrast to a simple partition key,
a composite partition key uses two or more columns to identify where data will reside. Composite
partition keys are used when the data to be stored is too large to reside in a single partition. Using more than
one column for the partition key breaks the data into chunks, or buckets. The data is still grouped, but
in smaller chunks. This method can be effective if a Cassandra cluster experiences hotspotting, or
congestion in writing data to one node repeatedly because a single partition receives heavy writes. Cassandra is often
used for time series data, and hotspotting can be a real issue. Breaking incoming data into buckets by
year:month:day:hour, using four columns to route to a partition, can decrease hotspots.
Data is retrieved using the partition key. Keep in mind that to retrieve data from the table, values for all
columns defined in the partition key must be supplied, if secondary indexes are not used. The table
shown uses race_year and race_name in the primary key as a composite partition key. To retrieve data,
both parameters must be supplied.
Cassandra stores an entire row of data on a node by partition key. If you have too much data in a partition
and want to spread the data over multiple nodes, use a composite partition key.
Use a composite partition key in your primary key to create a set of columns that you can use to
distribute data across multiple partitions and to query and return sorted results. This example creates
a rank_by_year_and_name table storing the ranking and name of cyclists who competed in races. The
table uses race_year and race_name as the columns defining the composite partition key of the primary
key. The query discovers the ranking of cyclists who competed in races by supplying year and race name
values.
A composite partition key table can be created in two different ways, as shown.
Procedure
• Create the table rank_by_year_and_name in the cycling keyspace. Use race_year and race_name
for the composite partition key. The table definition shown has an additional column rank used in the
primary key. Before creating the table, set the keyspace with a USE statement. This example identifies
the primary key at the end of the table definition. Note the double parentheses around the first two
columns defined in the PRIMARY KEY.
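A sketch of this approach:

```sql
USE cycling;

CREATE TABLE rank_by_year_and_name (
  race_year int,
  race_name text,
  cyclist_name text,
  rank int,
  PRIMARY KEY ((race_year, race_name), rank)
);
```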
Compound Primary Key
For a table with a compound primary key, Cassandra uses a partition key that is either simple or
composite. In addition, one or more clustering columns are defined. Clustering is a storage engine process
that sorts data within each partition based on the definition of the clustering columns. Normally, rows are
sorted in ascending order of the clustering column values. Generally, a deliberate grouping of data will
benefit reads and writes more than this simplistic default.
Remember that data is distributed throughout the Cassandra cluster. An application can experience
high latency while retrieving data from a large partition if the entire partition must be read to gather a
small amount of data. On a physical node, when rows for a partition key are stored in order based on the
clustering columns, retrieval of rows is very efficient. Grouping data in tables using clustering columns is
the equivalent of JOINs in a relational database, but is much more performant because only one table is
accessed. This table uses category as the partition key and points as the clustering column. Notice that for
each category, the points are ordered in descending order.
Cassandra stores an entire row of data on a node by partition key and can order the data for retrieval
with clustering columns. Retrieving data from a partition is more versatile with clustering columns. For the
example shown, a query could retrieve all point values greater than 200 for the One-day-races. If you have
more complex needs for querying, use a compound primary key.
Use a compound primary key to create multiple columns that you can use to query and return sorted
results. If our pro cycling example was designed in a relational database, you would create a cyclists table
with a foreign key to the races. In Cassandra, you denormalize the data because joins are not performant
in a distributed system. Later, other schemas are shown that improve Cassandra performance. Collections
and indexes are two data modeling methods. This example creates a cyclist_category table storing a
cyclist's last name, ID, and points for each type of race category. The table uses category for the partition
key and points for a single clustering column. This table can be queried to retrieve a list of cyclists and their
points in a category, sorted by points.
A compound primary key table can be created in two different ways, as shown.
Procedure
• To create a table having a compound primary key, use two or more columns as the primary key. This
example uses an additional clause WITH CLUSTERING ORDER BY to order the points in descending
order. Ascending order is more efficient to store, but descending queries are faster due to the nature of
the storage engine.
Note: The combination of the category and points uniquely identifies a row in the cyclist_category
table. More than one row with the same category can exist as long as the rows contain different
points values.
• The keyspace name can be used to identify the keyspace in the CREATE TABLE statement instead of
the USE statement.
CREATE TABLE cycling.cyclist_category (
  category text,
  points int,
  id UUID,
  lastname text,
  PRIMARY KEY (category, points)
) WITH CLUSTERING ORDER BY (points DESC);
Creating a counter table
Tracking counts in a distributed database presents an interesting challenge. In Cassandra, at any given
moment, the counter value may be stored in the Memtable, commit log, and/or one or more SSTables.
Replication between nodes can cause consistency issues in certain edge cases. Cassandra counters were
redesigned in Cassandra 2.1 to alleviate some of these difficulties. Read "What's New in Cassandra 2.1:
Better Implementation of Counters" to discover the improvements made to counters.
Because counters are implemented differently from other columns, counter columns can only be created
in dedicated tables. A counter column must have the counter data type. This data type cannot
be assigned to a column that serves as the primary key or partition key. To implement a counter column,
create a table that only includes:
• The primary key (can be one or more columns)
• The counter column
Many counter-related settings can be set in the cassandra.yaml file.
A counter column cannot be indexed or deleted. To load data into a counter column, or to increase or
decrease the value of the counter, use the UPDATE command. Cassandra rejects USING TIMESTAMP or
USING TTL when updating a counter column.
To create a table having one or more counter columns, use this:
• Use CREATE TABLE to define the counter and non-counter columns. Use all non-counter columns as
part of the PRIMARY KEY definition.
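A minimal sketch of such a table, matching the popular_count table used in the procedure below:

```sql
CREATE TABLE cycling.popular_count (
  id UUID PRIMARY KEY,
  popularity counter
);
```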
Using a counter
A counter is a special column for storing a number that is changed in increments.
To load data into a counter column, or to increase or decrease the value of the counter, use the UPDATE
command. Cassandra rejects USING TIMESTAMP or USING TTL in the command to update a counter
column.
Procedure
• Create a table for the counter column.
• Load data into the counter column, using the UPDATE command to increment the value.
UPDATE cycling.popular_count
SET popularity = popularity + 1
WHERE id = 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47;
• Take a look at the counter value and note that popularity has a value of 1.
Create table with COMPACT STORAGE
When you create a table using compound primary keys, for every piece of data stored, the column name
needs to be stored along with it. Instead of each non-primary key column being stored such that each
column corresponds to one column on disk, an entire row is stored in a single column on disk. If you need
to conserve disk space, use the WITH COMPACT STORAGE directive, which stores data in the legacy
(Thrift) storage engine format.
Using the compact storage directive prevents you from defining more than one column that is not part of
a compound primary key. A compact table using a primary key that is not compound can have multiple
columns that are not part of the primary key.
A compact table that uses a compound primary key must define at least one clustering column. Columns
cannot be added nor removed after creation of a compact table. Unless you specify WITH COMPACT
STORAGE, CQL creates a table with non-compact storage.
Collections and static columns cannot be used with COMPACT STORAGE tables.
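A sketch of a compact table with a compound primary key and a single non-key column (table and column names are illustrative):

```sql
CREATE TABLE sblocks (
  block_id uuid,
  subblock_id uuid,
  data blob,
  PRIMARY KEY (block_id, subblock_id)
) WITH COMPACT STORAGE;
```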
Dynamic schema creation or updates can cause schema collision resulting in errors.
Procedure
1. Run a rolling restart on all nodes to ensure schema matches. Run nodetool describecluster on
all nodes. Check that there is only one schema version.
2. On each node, check the data directory, looking for two directories for the table in question. If there is
only one directory, go on to the next node. If there are two or more directories, the old table directory
before update and a new table directory for after the update, continue.
3. Identify which cf_id (column family ID) is the newest table ID in system.schema_columnfamilies.
The column family ID is the fifth column in the results.
4. Move the data from the older table to the newer table's directory and remove the old directory. Repeat
this step as necessary.
5. Run nodetool refresh.
In Cassandra 3.0 and later, a materialized view is a table that is built from another table's data with a new
primary key specified. In Cassandra, queries are optimized by primary key definition and often there is a
table per query. If a new query is desired, a new table is created. Previously, these additional tables were
created and updated manually in the client application. Materialized views update and delete values when
the original table is updated and deleted.
Whereas secondary indexes are suited for low-cardinality data, materialized views are suited for high-
cardinality data. Secondary indexes on high-cardinality data require all nodes in a cluster to be queried,
causing high read latency. With materialized views, the data is arranged serially based on the new primary
key in a new table. Materialized views will cause hotspots if low-cardinality data is inserted.
To create a materialized view, certain requirements must be met.
• The columns of the original table's primary key must be part of the materialized view's primary key.
• Only one new column may be added to the materialized view's primary key.
The following table is the original, or base, table for the materialized views that will be built.
CREATE TABLE cyclist_mv (cid UUID PRIMARY KEY, name text, age int, birthday
date, country text);
This table holds values for the name, age, birthday, and country affiliation of several cyclists.
A materialized view can be created from cyclist_mv that uses age as the new partition key.
The CREATE MATERIALIZED VIEW statement has several features. The AS SELECT phrase identifies
the columns that will be copied from the base table to the materialized view. The FROM phrase identifies
the base table from which the data will be copied. The WHERE clause must include all primary key columns
with the IS NOT NULL phrase so that only rows with data for all the primary key's columns will be copied.
Finally, as with any table, the materialized view identifies the primary key columns. Since cyclist_mv
used cid as its primary key, cid must be present in the materialized view's primary key.
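A sketch of such a view (the view name cyclist_by_age is illustrative):

```sql
CREATE MATERIALIZED VIEW cyclist_by_age AS
  SELECT age, birthday, name, country
  FROM cyclist_mv
  WHERE age IS NOT NULL AND cid IS NOT NULL
  PRIMARY KEY (age, cid);
```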
In the new materialized view, data can be selected based on the cyclist's age. Because the data is now
partitioned based on the age, the data is easily and quickly retrieved.
Similarly, a materialized view can be created that keys the information to cyclists' birthdays or country of
origin.
Now, queries based on the country of origin or the birthday can use the new materialized views created.
If another INSERT is executed on the cyclist_mv table, both the base table and all three of the
materialized views created in the above examples will update. When data is deleted from the base table,
the data is also deleted from any affected materialized views.
Materialized views allow fast lookup of the data using the normal Cassandra read path. However,
materialized views do not have the same write performance that normal table writes have, because an
additional read-before-write must be done to update the materialized views. Also, a data consistency check
on each replica must be completed before the update is done. A write that includes updating materialized
views will incur latency. In addition, when data is deleted, performance may suffer. Two or more contiguous
rows in the base table may be tagged with one tombstone, but these rows may not be contiguous in the
materialized views – leading to the creation of several tombstones in each one.
A materialized view is built asynchronously after data is inserted into the base table. This can be the cause
of delays in updating the data in a materialized view. Data cannot be written directly to materialized views,
only to base tables. A read repair is done to a materialized view only after an update to the base table.
Additional information on how materialized views work can be found in New in Cassandra 3.0: Materialized
Views and Cassandra Summit 2015 talk on Materialized Views.
Data can be inserted into tables using more advanced types of data.
Creating collections
Collection types provide a way to group and store data together in a table column.
Cassandra provides collection types as a way to group and store data together in a column. For example,
in a relational database a grouping such as a user's multiple email addresses is related with a many-to-one
joined relationship between a user table and an email table. Cassandra avoids joins between two tables
by storing the user's email addresses in a collection column in the user table. Each collection specifies the
data type of the data held.
A collection is appropriate if the data for collection storage is limited. If the data has unbounded growth
potential, like messages sent or sensor events registered every second, do not use collections. Instead,
use a table with a compound primary key where data is stored in the clustering columns.
CQL contains these collection types:
• set
• list
• map
Observe the following limitations of collections:
• Never insert more than 2 billion items in a collection, as only that number can be queried.
• The maximum number of keys for a map collection is 65,535.
• The maximum size of an item in a list or a map collection is 2GB.
• The maximum size of an item in a set collection is 65,535 bytes.
• Keep collections small to prevent delays during querying.
Collections cannot be "sliced"; Cassandra reads a collection in its entirety, impacting performance.
Thus, collections should be much smaller than the maximum limits listed. The collection is not paged
internally.
• Lists can incur a read-before-write operation for some insertions. Sets are preferred over lists whenever
possible.
Note: The limits specified for collections are for non-frozen collections.
You can expire each element of a collection by setting an individual time-to-live (TTL) property.
A set consists of a group of elements with unique values. Duplicate values are not stored distinctly.
The values of a set are stored unordered, but will return the elements in sorted order when queried. Use
the set data type to store data that has a many-to-one relationship with another column. For example, in
the example below, a set called teams stores all the teams that a cyclist has been a member of during their
career.
Procedure
• Define teams in a table cyclist_career_teams. Each team listed in the set will have a text data type.
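A sketch of this definition:

```sql
CREATE TABLE cycling.cyclist_career_teams (
  id UUID PRIMARY KEY,
  lastname text,
  teams set<text>
);
```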
A list has a form much like a set, in that a list groups and stores values. Unlike a set, the values stored in
a list do not need to be unique and can be duplicated. In addition, a list stores the elements in a particular
order and may be inserted or retrieved according to an index value.
Use the list data type to store data that has a possible many-to-many relationship with another column. For
example, in the example below, a list called events stores all the race events on an upcoming calendar.
Each month/year pairing might have several events occurring, and the races are stored in a list. The list
can be ordered so that the races appear in the order that they will take place, rather than alphabetical
order.
Procedure
• Define events in a table upcoming_calendar. Each event listed in the list will have a text data type.
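A sketch of this definition:

```sql
CREATE TABLE cycling.upcoming_calendar (
  year int,
  month int,
  events list<text>,
  PRIMARY KEY (year, month)
);
```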
A map relates one item to another with a key-value pair. For each key, only one value may exist, and
duplicates cannot be stored. Both the key and the value are designated with a data type.
Using the map type, you can store timestamp-related information in user profiles. Each element of the
map is internally stored as one Cassandra column that you can modify, replace, delete, and query. Each
element can have an individual time-to-live and expire when the TTL ends.
Procedure
• Define teams in a table cyclist_teams. Each team listed in the map will have an integer data type for
the year a cyclist belonged to the team and a text data type for the team name. The map collection is
specified with a map column name and the pair of data types enclosed in angle brackets.
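A sketch of this definition:

```sql
CREATE TABLE cycling.cyclist_teams (
  id UUID PRIMARY KEY,
  lastname text,
  firstname text,
  teams map<int, text>
);
```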
Tuples are a data type that allow two or more values to be stored together in a column. A user-defined type
can be used, but for simple groupings, a tuple is a good choice.
Procedure
• Create a table cycling.route using a tuple to store each waypoint location name, latitude, and longitude.
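A sketch of this definition, nesting the latitude/longitude pair inside the outer tuple:

```sql
CREATE TABLE cycling.route (
  race_id int,
  race_name text,
  point_id int,
  lat_long tuple<text, tuple<float, float>>,
  PRIMARY KEY (race_id, point_id)
);
```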
In Cassandra 2.1 and later, user-defined types (UDTs) can attach multiple data fields, each named and
typed, to a single column. The fields used to create a UDT may be any valid data type, including collections
and other existing UDTs. Once created, UDTs may be used to define a column in a table.
Procedure
• Use the cycling keyspace.
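A sketch of creating a UDT in that keyspace, matching the basic_info type used in a later example:

```sql
USE cycling;

CREATE TYPE basic_info (
  birthday timestamp,
  nationality text,
  weight text,
  height text
);
```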
Creating functions
How to create functions.
In Cassandra 3.0, users can create user-defined functions (UDFs) and user-defined aggregate functions
(UDAs). Functions are used to manipulate stored data in queries. Retrieving results using standard
aggregate functions is also available for queries.
Cassandra 2.2 and later allows users to define functions that can be applied to data stored in a table as
part of a query result. The function must be created prior to its use in a SELECT statement. The function
is performed on each row of the table. To use user-defined functions written in Java or JavaScript in
Cassandra 2.2, or in JavaScript in Cassandra 3.0, enable_user_defined_functions must be set to true
in the cassandra.yaml file; this setting is not required for Java in Cassandra 3.0. User-
defined functions are defined within a keyspace; if no keyspace is specified, the current keyspace is used.
User-defined functions are executed in a sandbox in Cassandra 3.0 and later. In Cassandra 2.2, there is
no security manager to prevent execution of malicious code; see the cassandra.yaml file for more details.
By default, Cassandra 2.2 and later supports defining functions in Java and JavaScript. Other scripting
languages, such as Python, Ruby, and Scala, can be added by adding a JAR to the classpath. Install
the JAR file into $CASSANDRA_HOME/lib/jsr223/[language]/[jar-name].jar, where language is
'jruby', 'jython', or 'scala'.
Procedure
• Create a function, specifying the data type of the returned value, the language, and the actual code of
the function to be performed. The following function, fLog(), computes the logarithmic value of each
input. It wraps a built-in Java function and can be used to generate linear plots of non-linear data. For this
example, it presents a simple math function to show the capabilities of user-defined functions.
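A sketch of such a function definition, assuming the cycling keyspace:

```sql
CREATE OR REPLACE FUNCTION cycling.fLog (input double)
  CALLED ON NULL INPUT
  RETURNS double
  LANGUAGE java
  AS 'return Double.valueOf(Math.log(input.doubleValue()));';
```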
Note:
• CALLED ON NULL INPUT ensures the function will always be executed.
• RETURNS NULL ON NULL INPUT ensures the function will always return NULL if any of the input
arguments is NULL.
• RETURNS defines the data type of the value returned by the function.
• A function can be replaced with a different function if OR REPLACE is used as shown in the example
above. Optionally, the IF NOT EXISTS keywords can be used to create the function only if another
function with the same signature does not exist in the keyspace. OR REPLACE and IF NOT EXISTS
cannot be used in the same command.
Cassandra 2.2 and later allows users to define aggregate functions that can be applied to data stored
in a table as part of a query result. The aggregate function must be created prior to its use in a SELECT
statement and the query must only include the aggregate function itself, but no columns. The state function
is called once for each row, and the value returned by the state function becomes the new state. After
all rows are processed, the optional final function is executed with the last state value as its argument.
Aggregation is performed by the coordinator.
The example shown computes the team average for race time for all the cyclists stored in the table. The
race time is computed in seconds.
Procedure
• Create a state function, as a user-defined function (UDF), if needed. This function adds all the race
times together and counts the number of entries.
• Create the aggregate function using these two functions, and add an STYPE to define the data type
for the function. Different STYPEs will distinguish one function from another with the same name. An
aggregate can be replaced with a different aggregate if OR REPLACE is used as shown in the examples
above. Optionally, the IF NOT EXISTS keywords can be used to create the aggregate only if another
aggregate with the same signature does not exist in the keyspace. OR REPLACE and IF NOT EXISTS
cannot be used in the same command.
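A sketch of the state function, final function, and aggregate described above (function names and the tuple-based state type are illustrative):

```sql
-- State function: accumulates a (count, sum) pair for each row
CREATE OR REPLACE FUNCTION cycling.avgState (state tuple<int, bigint>, val int)
  CALLED ON NULL INPUT
  RETURNS tuple<int, bigint>
  LANGUAGE java
  AS 'if (val != null) {
        state.setInt(0, state.getInt(0) + 1);
        state.setLong(1, state.getLong(1) + val.intValue());
      }
      return state;';

-- Final function: divides the sum by the count to produce the average
CREATE OR REPLACE FUNCTION cycling.avgFinal (state tuple<int, bigint>)
  CALLED ON NULL INPUT
  RETURNS double
  LANGUAGE java
  AS 'if (state.getInt(0) == 0) return null;
      return Double.valueOf((double) state.getLong(1) / state.getInt(0));';

-- Aggregate tying the two functions together with an initial state of (0, 0)
CREATE OR REPLACE AGGREGATE cycling.average (int)
  SFUNC avgState
  STYPE tuple<int, bigint>
  FINALFUNC avgFinal
  INITCOND (0, 0);
```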
What to do next
For more information on user-defined aggregates, see Cassandra Aggregates - min, max, avg, group by
and A few more Cassandra aggregates.
Data can be inserted into tables using the INSERT command. With Cassandra 2.2 and later, JSON data
can be inserted.
In a production database, inserting columns and column values programmatically is more practical than
using cqlsh, but often, testing queries using this SQL-like shell is very convenient.
Insertion, update, and deletion operations on rows sharing the same partition key for a table are performed
atomically and in isolation.
Procedure
• To insert simple data into the table cycling.cyclist_name, use the INSERT command. This example
inserts a single record into the table.
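For example:

```sql
INSERT INTO cycling.cyclist_name (id, lastname, firstname)
VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a803e2, 'VOS', 'Marianne');
```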
If a table specifies a set to hold data, then either INSERT or UPDATE is used to enter data.
Procedure
• Insert data into the set, enclosing values in curly brackets.
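A sketch, assuming the cyclist_career_teams table shown earlier (team names are illustrative):

```sql
INSERT INTO cycling.cyclist_career_teams (id, lastname, teams)
VALUES (
  5b6962dd-3f90-4c93-8f61-eabfa4a803e2,
  'VOS',
  { 'Rabobank-Liv Woman Cycling Team', 'Rabobank-Liv Giant' }
);
```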
If a table specifies a list to hold data, then either INSERT or UPDATE is used to enter data.
Procedure
• Insert data into the list, enclosing values in square brackets.
These update operations are implemented internally without any read-before-write. Appending and
prepending a new element to the list writes only the new element.
• Add an element at a particular position using the list index position in square brackets.
To add an element at a particular position, Cassandra reads the entire list, and then rewrites the part
of the list that needs to be shifted to the new index positions. Consequently, adding an element at a
particular position results in greater latency than appending or prefixing an element to a list.
• To remove an element from a list, use the DELETE command and the list index position in square
brackets. For example, remove the event just placed in the list in the last step.
The method of removing elements using an indexed position from a list requires an internal read. In
addition, the client-side application could only discover the indexed position by reading the whole list
and finding the values to remove, adding additional latency to the operation. If another thread or client
prepends elements to the list before the operation is done, incorrect data will be removed.
• Remove all elements having a particular value using the UPDATE command, the subtraction operator
(-), and the list value in square brackets.
Using the UPDATE command as shown in this example is recommended over the last example because
it is safer and faster.
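The steps above can be sketched against the upcoming_calendar table (event names are illustrative):

```sql
-- Insert a list, enclosing values in square brackets
INSERT INTO cycling.upcoming_calendar (year, month, events)
VALUES (2015, 6, ['Criterium du Dauphine', 'Tour de Suisse']);

-- Append an element; writes only the new element
UPDATE cycling.upcoming_calendar
SET events = events + ['Tour de France'] WHERE year = 2015 AND month = 6;

-- Set the element at index 2; requires an internal read
UPDATE cycling.upcoming_calendar
SET events[2] = 'Tour de France Stage 10' WHERE year = 2015 AND month = 6;

-- Remove the element at index 2; also requires an internal read
DELETE events[2] FROM cycling.upcoming_calendar
WHERE year = 2015 AND month = 6;

-- Remove all elements having a particular value; safer and faster
UPDATE cycling.upcoming_calendar
SET events = events - ['Tour de Suisse'] WHERE year = 2015 AND month = 6;
```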
If a table specifies a map to hold data, then either INSERT or UPDATE is used to enter data.
Procedure
• Set or replace map data, using the INSERT or UPDATE command, and enclosing the integer and text
values in a map collection with curly brackets, separated by a colon.
Note: Using INSERT in this manner will replace the entire map.
• Use the UPDATE command to insert values into the map. Append an element to the map by enclosing
the key-value pair in curly brackets and using the addition (+) operator.
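A sketch of both operations, assuming the cyclist_teams table shown earlier:

```sql
-- Set or replace the entire map
INSERT INTO cycling.cyclist_teams (id, lastname, firstname, teams)
VALUES (
  5b6962dd-3f90-4c93-8f61-eabfa4a803e2, 'VOS', 'Marianne',
  { 2015 : 'Rabobank-Liv Woman Cycling Team', 2014 : 'Rabobank-Liv Giant' }
);

-- Append an element to the map with the addition (+) operator
UPDATE cycling.cyclist_teams
SET teams = teams + { 2009 : 'DSB Bank - Nederland bloeit' }
WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;
```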
Tuples are used to group small amounts of data together that are then stored in a column.
Procedure
• Insert data into the table cycling.route which has tuple data. The tuple is enclosed in parentheses. This
tuple has a tuple nested inside; nested parentheses are required for the inner tuple, then the outer
tuple.
• Insert data into the table popular which has tuple data. The tuple called cinfo stores the country name,
cyclist name, and points total.
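Sketches of both inserts; the popular table is assumed to have an int primary key rank and a cinfo column of type tuple<text, text, int>:

```sql
-- Nested tuple: the inner latitude/longitude pair needs its own parentheses
INSERT INTO cycling.route (race_id, race_name, point_id, lat_long)
VALUES (500, '47th Tour du Pays de Vaud', 2, ('Champagne', (46.833, 6.65)));

-- Simple tuple holding country name, cyclist name, and points total
INSERT INTO cycling.popular (rank, cinfo)
VALUES (4, ('Italy', 'Fabio Aru', 163));
```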
If a table specifies a user-defined type (UDT) to hold data, then either INSERT or UPDATE is used to enter
data.
Procedure
Inserting data into a UDT
• Set or replace user-defined type data, using the INSERT or UPDATE command, and enclosing the user-
defined type with curly brackets, separating each key-value pair in the user-defined type by a colon.
Note: Note the inclusion of null values for UDT elements that have no value. A value, whether null or
otherwise, must be included for each element of the UDT.
• Data can be inserted into a UDT that is nested in another column type. For example, a list of races,
where the race name, date, and time are defined in a UDT has elements enclosed in curly brackets that
are in turn enclosed in square brackets.
Note: The UDT nested in the list is frozen, so the entire list will be read when querying the table.
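A sketch of inserting UDT data; the table name cyclist_stats is assumed, with a basics column of the basic_info type:

```sql
-- Null must be supplied for UDT fields that have no value
INSERT INTO cycling.cyclist_stats (id, lastname, basics)
VALUES (
  220844bf-4860-49d6-9a4b-6b5d3a79cbfb,
  'TIRALONGO',
  { birthday : '1985-07-03', nationality : 'Italy', weight : null, height : null }
);
```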
Updating individual field data in a UDT
• In Cassandra 3.6 and later, user-defined types that include only non-collection fields can have
individual field values updated. Update an individual field in user-defined type data using the UPDATE
command. The desired key-value pair is defined in the command. In order to update, the UDT must be
defined in the CREATE TABLE command as an unfrozen data type.
The UDT basic_info is defined in the table as the column basics. This example shows an inserted row,
followed by an update that only changes the value of birthday inside the UDT basics.
 id                                   | basics                                                                                                 | lastname
--------------------------------------+--------------------------------------------------------------------------------------------------------+-----------
 220844bf-4860-49d6-9a4b-6b5d3a79cbfb | {birthday: '2000-12-12 08:00:00.000000+0000', nationality: 'Italy', weight: '63 kg', height: '1.78 m'} | TIRALONGO
The resulting change is evident, as are the unchanged values for nationality, weight, and height.
Note: UDTs with collection fields must be frozen in table creation, and individual field values cannot be
updated.
In a production database, inserting columns and column values programmatically is more practical than
using cqlsh, but often, testing queries using this SQL-like shell is very convenient. With Cassandra 2.2 and
later, JSON data can be inserted. All values are inserted as strings if they are not numbers, but
are stored using the column data type. For example, the id below is inserted as a string, but is stored as a
UUID. For more information, see What's New in Cassandra 2.2: JSON Support.
Procedure
• To insert JSON data, add JSON to the INSERT command. Note the absence of the keyword VALUES
and of the list of columns that is present in other INSERT commands.
• A null value will be entered if a defined column, such as lastname, is not supplied in the JSON-formatted
insert.
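A sketch of a JSON insert, assuming a cyclist_category table with the columns shown in the CSV example later in this section; the id is supplied as a JSON string but stored as a UUID, and lastname, omitted here, is set to null:

```sql
INSERT INTO cycling.cyclist_category JSON
  '{ "category": "GC",
     "point": 780,
     "id": "829aa84a-4bba-411f-a4fb-38167a987cda" }';
```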
INSERT and UPDATE statements using the IF clause support lightweight transactions, also known as
Compare and Set (CAS). A common use for lightweight transactions is an insertion operation that must
be unique, such as a cyclist's identification. Lightweight transactions should not be used casually, because the
latency of operations increases fourfold due to the round-trips necessary between the CAS
coordinators.
Cassandra 2.1.1 and later support non-equal conditions for lightweight transactions. You can use the <, <=, >,
>=, != and IN operators in lightweight transaction conditions.
It is important to note that when using IF NOT EXISTS on an INSERT, the timestamp is designated by the
lightweight transaction, and USING TIMESTAMP is prohibited.
Procedure
• Insert a new cyclist with their id.
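For example, a unique insertion of a cyclist can be guarded with IF NOT EXISTS; the table and values here are illustrative:

```sql
INSERT INTO cycling.cyclist_name (id, lastname, firstname)
VALUES (c4b65263-fe58-4846-83e8-f0e1c13d518f, 'RATTO', 'Rossella')
IF NOT EXISTS;
```

If a row with this id already exists, the insert is not applied and the existing row is returned with [applied] = False.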
Data in a column, other than a counter column, can have an optional expiration period called TTL (time to
live). The client request specifies a TTL value, defined in seconds, for the data. After the requested amount
of time has expired, TTL data is no longer live and will not be included in results. TTL data is marked with
a tombstone on the next read on the read path and exists for a maximum of gc_grace_seconds. After
data is marked with a tombstone, the data is automatically removed during the normal compaction and
repair processes.
Use CQL to set the TTL for data.
If you want to change the TTL of expiring data, the data must be re-inserted with a new TTL. In Cassandra,
the insertion of data is actually an insertion or update operation, depending on whether or not a previous
version of the data exists.
TTL data has a precision of one second, as calculated on the server. Therefore, a very small TTL is not
very useful. Moreover, the clocks on the servers must be synchronized; otherwise reduced precision will be
observed because the expiration time is computed on the primary host that receives the initial insertion but
is then interpreted by other hosts on the cluster.
Expiring data has an additional overhead of 8 bytes in memory and on disk (to record the TTL and
expiration time) compared to standard data.
Both the INSERT and UPDATE commands support setting a time for data in a column to expire. The
expiration time (TTL) is set using CQL.
Procedure
• Use the INSERT command to set a calendar listing in the calendar table to expire in 86400 seconds, or
one day.
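A sketch of such an insert, assuming the calendar table columns used elsewhere in this section (the inserted values are illustrative):

```sql
INSERT INTO cycling.calendar (race_id, race_name, race_start_date, race_end_date)
VALUES (200, 'placeholder race', '2015-05-27', '2015-05-27')
USING TTL 86400;  -- data expires after one day

-- Check the remaining time to live for a column
SELECT TTL(race_name) FROM cycling.calendar WHERE race_id = 200;
```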
In a production database, inserting columns and column values programmatically is more practical than
using cqlsh, but often, testing queries using this SQL-like shell is very convenient. A comma-delimited
file, or CSV file, is useful if several records need inserting. While not strictly an INSERT command, it is a
common method for inserting data.
Procedure
1. Locate your CSV file and check options to use.
category|point|id|lastname
GC|1269|2003|TIRALONGO
One-day-races|367|2003|TIRALONGO
GC|1324|2004|KRUIJSWIJK
2. To insert the data, use the COPY command with the CSV data.
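Given the pipe-delimited file above, a COPY invocation might look like this; the file path and table name are assumptions:

```sql
COPY cycling.cyclist_category (category, point, id, lastname)
FROM 'cyclist_category.csv'
WITH DELIMITER = '|' AND HEADER = true;
```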
Batching is used to insert or update data in tables. Understanding the use of batching, if used, is crucial to
performance.
Batches are often mistakenly used in an attempt to optimize performance. Improved performance is not a
reason to use batches. Batches place a burden on the coordinator for both logged and unlogged batches.
Batches are best used when a small number of tables must synchronize inserted or updated data. The
number of partitions involved in a batch operation, and thus the potential for multi-node access, can increase
the amount of time the operation takes to complete. Logged batch statements incur additional
performance latency: for logged batches, the coordinator sends a batch log to two other nodes, so that if
the coordinator fails, the batch will be retried by those nodes.
For unlogged batches, the coordinator manages all insert/update operations, causing that single node to
do more work. If the partition keys for the operations are stored on more than one node, extra network
hops occur.
Note: Unlogged batches are deprecated in Cassandra 2.2 and later.
Batched statements can save network round-trips between the client and the server, and possibly between
the server coordinator and the replicas. However, consider carefully before implementing batch operations,
and decide if they are truly necessary. For information about the fastest way to load data, see "Cassandra:
Batch loading without the Batch keyword."
Batch operations can be either beneficial or detrimental. Look at the examples below to see a good use of
BATCH.
Procedure
• An example of a good batch that is logged. Note in the table definition for cyclist_expenses, the balance
column is STATIC.
As explained in the BATCH statement reference, in Cassandra 2.0.6+ you can batch conditional
updates. This example shows batching conditional updates combined with using static columns.
Note: It would be reasonable to expect that an UPDATE to the balance could be included in this
BATCH statement:
However, it is important to understand that all the statements processed in a BATCH statement
timestamp the records with the same value. The operations may not perform in the order listed in the
BATCH statement. The UPDATE might be processed BEFORE the first INSERT that sets the balance
value to zero, allowing the conditional to be met.
An acknowledgement of a batch statement is returned if the batch operation is successful.
• The balance can be adjusted separately with an UPDATE statement. Now the balance will reflect that
breakfast was unpaid.
• The table cyclist_expenses stores records about each purchase by a cyclist and includes the running
balance of all the cyclist's purchases. Because the balance is static, all purchase records for a cyclist
have the same running balance. This BATCH statement inserts expenses for two more meals and changes
the balance to reflect that breakfast and dinner were unpaid.
• Finally, the cyclist pays off all outstanding bills and the balance of the account goes to zero.
Because the column is static, you can provide only the partition key when updating the data. To update
a non-static column, you would also have to provide a clustering key. Using batched conditional
updates, you can maintain a running balance. If the balance were stored in a separate table,
maintaining a running balance would not be possible because a batch having conditional updates
cannot span multiple partitions.
Misused, BATCH statements can cause many problems in a distributed database like Cassandra. Batch
operations that involve multiple nodes are a definite anti-pattern. Keep in mind which partition data will
be written to when grouping INSERT and UPDATE statements in a BATCH statement. Writing to several
partitions might require interaction with several nodes in the cluster, causing a great deal of latency for the
write operation.
Procedure
• This example shows an anti-pattern since the BATCH statement will write to several different partitions,
given the partition key id.
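A sketch of the anti-pattern; each INSERT below uses a different id, so each write lands in a different partition and potentially on a different node (the ids and names are illustrative):

```sql
-- Anti-pattern: one batch spanning several partitions
BEGIN BATCH
  INSERT INTO cycling.cyclist_name (id, lastname, firstname)
    VALUES (6d5f1663-89c0-45fc-8cfd-60a373b01622, 'HOSKINS', 'Melissa');
  INSERT INTO cycling.cyclist_name (id, lastname, firstname)
    VALUES (38ab64b6-26cc-4de9-ab28-c257cf011659, 'FERNANDES', 'Marcia');
  INSERT INTO cycling.cyclist_name (id, lastname, firstname)
    VALUES (9011d3be-d35c-4a8d-83f7-a3c543789ee7, 'NIEWIADOMA', 'Katarzyna');
APPLY BATCH;
```

Issuing the three inserts individually would avoid both the coordinator overhead and the batch log.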
Unlogged BATCH statements require management by the coordinator. The best case for using
unlogged BATCH statements is when all the inserts take place on a single node.
Procedure
• An unlogged batch that writes to a single partition resolves to only one write internally, regardless of
the number of writes, and is an acceptable use of batch. In this example, the partition key includes both
date and time.
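A sketch of an acceptable unlogged batch, using a hypothetical table whose composite partition key includes both date and time; both inserts share the same partition key values, so the batch resolves to a single internal write:

```sql
-- Hypothetical table partitioned by race day and start time
CREATE TABLE cycling.race_starts (
  race_date date,
  race_time time,
  cyclist_name text,
  PRIMARY KEY ((race_date, race_time), cyclist_name)
);

BEGIN UNLOGGED BATCH
  INSERT INTO cycling.race_starts (race_date, race_time, cyclist_name)
    VALUES ('2015-05-09', '06:00:00', 'Vera ADRIAN');
  INSERT INTO cycling.race_starts (race_date, race_time, cyclist_name)
    VALUES ('2015-05-09', '06:00:00', 'Melissa HOSKINS');
APPLY BATCH;
```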
Querying tables
How to query data from tables.
Data can be queried from tables using the SELECT command. With Cassandra 3.0, many new options are
available, such as retrieving JSON data, using standard aggregate functions, and manipulating retrieved
data with user-defined functions (UDFs) and user-defined aggregate functions (UDAs).
Querying tables to select data is the reason data is stored in databases. Similar to SQL, CQL can SELECT
data using simple or complex qualifiers. At its simplest, a query selects all data in a table. At its most
complex, a query delineates which data to retrieve and display and even calculate new values based on
user-defined functions. For SASI indexing, see queries in Using SASI.
Now, to get the top two racers in every race year and race name, use the following command with PER
PARTITION LIMIT 2.
Output:
Procedure
• Use a simple SELECT query to display all data from a table.
• The example below illustrates how to create a query that uses category as a filter.
Note that Cassandra rejects this query if category is not a partition key or clustering column, because
the query would require a sequential retrieval across the entire cyclist_category table. In a distributed database like
Cassandra, this is a crucial concept to grasp; scanning all data across all nodes is prohibitively slow
and thus blocked from execution. The use of partition key and clustering columns in a WHERE clause
must result in the selection of a contiguous set of rows.
Queries can filter using secondary indexes, discussed in the Indexing Tables section. A query based on
lastname can result in satisfactory results if the lastname column is indexed.
• In Cassandra 3.6 and later, clustering columns can be defined in WHERE clauses if ALLOW FILTERING
is also used even if a secondary index is not created. The table definition is given and then the SELECT
command. Note that race_start_date is a clustering column that has no secondary index.
• You can also pick the columns to display instead of choosing all data.
• For a large table, limit the number of rows retrieved using LIMIT. The default limit is 10,000 rows. To
sample data, pick a smaller number. To retrieve more than 10,000 rows, set LIMIT to a larger value.
• You can fine-tune the display order using the ORDER BY clause. The partition key must be defined in
the WHERE clause and the ORDER BY clause defines the clustering column to use for ordering.
• Tuples are retrieved in their entirety. This example uses AS to change the header of the tuple name.
• In Cassandra 3.6 and later, the PER PARTITION LIMIT option sets the maximum number of rows that
the query returns from each partition. This is interesting because it allows a query to select a "Top 3"
selection if the partitions are separated correctly. Create a table that will sort data into more than one
partition and insert some data:
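A sketch consistent with the description, using the rank_by_year_and_name table referenced later in this document; the composite partition key (race_year, race_name) separates each race into its own partition:

```sql
CREATE TABLE cycling.rank_by_year_and_name (
  race_year int,
  race_name text,
  rank int,
  cyclist_name text,
  PRIMARY KEY ((race_year, race_name), rank)
);

-- Top two ranked cyclists from every partition (race year + race name)
SELECT * FROM cycling.rank_by_year_and_name PER PARTITION LIMIT 2;
```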
Collections do not differ from other columns in retrieval. To query for a subset of the collection, a
secondary index for the collection must be created.
Procedure
• Retrieve teams for a particular cyclist id from the set.
To query a table containing a collection, Cassandra retrieves the collection in its entirety. Keep
collections small enough to be manageable because collections are stored in memory. Alternatively,
construct a data model to replace a collection if it must accommodate large amounts of data.
Cassandra returns results in an order based on the type of the elements in the collection. For example,
a set of text elements is returned in alphabetical order. If you want elements of the collection returned in
insertion order, use a list.
• Retrieve events stored in a list from the upcoming calendar for a particular year and month.
Note: The order is not alphabetical, but rather in the order of insertion.
• Retrieve teams for a particular cyclist id from the map.
The order of the map output depends on the key type of the map. In this case, the key is an integer
type.
The SELECT command can be used to retrieve data from a table in JSON format. For more information,
see What's New in Cassandra 2.2: JSON Support.
Procedure
Specify that the result should use the JSON format with the keyword JSON.
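For example, assuming the cyclist_name table used earlier, each row is returned as a single JSON document in a [json] column:

```sql
SELECT JSON id, lastname, firstname FROM cycling.cyclist_name;
```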
The IN keyword can define a set of clustering columns to fetch together, supporting a "multi-get" of CQL
rows. A single clustering column can be defined if all preceding columns are defined for either equality or
group inclusion. Alternatively, several clustering columns may be defined to collect several rows, as long
as all preceding columns are queried for equality or group inclusion. The defined clustering columns can
also be queried for inequality.
Note that using both IN and ORDER BY will require turning off paging with the PAGING OFF command in
cqlsh.
Procedure
• Turn off paging.
• Retrieve rows using multiple clustering columns. This example searches the partition key race_ids for
several races, but the partition key can also be composed as an equality for one value.
cqlsh> SELECT * FROM cycling.calendar WHERE race_id IN (100, 101, 102) AND
(race_start_date, race_end_date) >= ('2015-05-09','2015-05-24');
Queries can scan a partition to retrieve a segment of stored data. The segment must be sequentially
stored, so clustering columns can be used to define the slice of data selected.
Procedure
• Create a table race_times to hold the race times of various cyclists for various races.
• Scan the race times in the table to find a particular segment of data using a conditional operator.
In Cassandra 2.2, the standard aggregate functions of min, max, avg, sum, and count are built-in
functions.
Procedure
• A table cyclist_points records the race points for cyclists.
• Calculate the standard aggregation function sum to find the sum of race points for a particular cyclist.
The value of the aggregate will be returned.
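A sketch of the aggregation, assuming cyclist_points has an id partition key and a race_points column (the id value is illustrative):

```sql
SELECT sum(race_points)
FROM cycling.cyclist_points
WHERE id = e3b19ec4-774a-4d1c-9e5a-decec1e30aac;
```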
• Another standard aggregate function is count. A table country_flag records the country of each cyclist.
• Calculate the standard aggregation function count to find the number of cyclists from Belgium. The
value of the aggregate will be returned.
The SELECT command can be used to retrieve data from a table while applying a user-defined function
(UDF) to it.
Procedure
Use the user-defined function (UDF) fLog() created previously to retrieve data from a table
cycling.cyclist_points.
Referring back to the user-defined aggregate average(), retrieve the average of the column
cyclist_time_sec from a table.
Procedure
1. List all the data in the table.
The system keyspace includes a number of tables that contain details about your Cassandra database
objects and cluster configuration.
Cassandra 2.2 populates these tables and others in the system keyspace.
Cassandra 3.0 populates these tables and others in the system keyspace.
An alternative to the cqlsh describe_* functions or using DevCenter to discover keyspace, table, and
column information is querying the system.schema_* tables directly.
Procedure
• Query the defined keyspaces using the SELECT statement.
Cluster information
Querying system tables to get cluster topology information.
You can query system tables to get cluster topology information. Display the IP address of peer nodes,
data center and rack names, token values, and other information. "The Data Dictionary" article describes
querying system tables in detail.
Procedure
After setting up a cluster, query the peers and local tables.
Output from querying the peers table looks something like this:
(2 rows)
Currently, the system tables are the only method of displaying information about user-defined functions,
aggregates, and user types.
Procedure
• Show all user-defined functions in the system.schema_functions table.
Indexing tables
How to query data from tables using indexes.
Indexing
An index provides a means to access data in Cassandra using attributes other than the partition key. The
benefit is fast, efficient lookup of data matching a given condition. The index indexes column values in a
separate, hidden table from the one that contains the values being indexed. Cassandra has a number of
techniques for guarding against the undesirable scenario where data might be incorrectly retrieved during a
query involving indexes on the basis of stale values in the index.
Indexes can be used for collections, collection columns, static columns, and any other columns except
counter columns.
In Cassandra 3.4 and later, SSTable Attached Secondary Indexes (SASI) have been introduced.
Cassandra's built-in indexes are best on a table having many rows that contain the indexed value. The
more unique values that exist in a particular column, the more overhead you will have, on average, to
query and maintain the index. For example, suppose you had a races table with a billion entries for cyclists
in hundreds of races and wanted to look up rank by the cyclist. Many cyclists' ranks will share the same
column value for race year. The race_year column is a good candidate for an index.
In Cassandra 3.4 and later, a new implementation of secondary indexes, SSTable Attached Secondary
Indexes (SASI), has greatly improved the performance of secondary indexes. If secondary indexes are
required, SASI should be considered.
Using CQL, you can create an index on a column after defining a table. In Cassandra 2.1 and later, you
can index a collection column. In Cassandra 3.4 and later, static columns can be indexed. Secondary
indexes are used to query a table using a column that is not normally queryable.
Secondary indexes are tricky to use and can impact performance greatly. The index table is stored on each
node in a cluster, so a query involving a secondary index can rapidly become a performance nightmare
if multiple nodes are accessed. A general rule of thumb is to index a column with low cardinality, that is,
a column with few distinct values. Before creating an index, be aware of when and when not to create an index.
In Cassandra 3.4 and later, a new implementation of secondary indexes, SSTable Attached Secondary
Indexes (SASI), has greatly improved the performance of secondary indexes and should be used, if
possible.
Procedure
• The table rank_by_year_and_name can yield the rank of cyclists for races.
• A logical query to try is a listing of the rankings for a particular year. Because the table has a composite
partition key, this query will fail if only the first column is used in the conditional operator.
• An index is created for the race year, and the query will succeed. An index name is optional and
must be unique within a keyspace. If you do not provide a name, Cassandra will assign a name like
race_year_idx.
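A sketch of the index creation and the now-succeeding query; the index name ryear is an assumption:

```sql
CREATE INDEX ryear ON cycling.rank_by_year_and_name (race_year);

-- Query on the first column of the composite partition key now succeeds
SELECT * FROM cycling.rank_by_year_and_name WHERE race_year = 2015;
```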
• A clustering column can also be used to create an index. An index is created on rank, and used in a
query.
Indexes can be created on multiple columns and used in queries. The general rule about cardinality applies
to all columns indexed. In a real-world situation, certain columns might not be good choices, depending on
their cardinality.
Procedure
• The table cycling.alt_stats can yield the statistics about cyclists.
• The indexes have been created on appropriate low cardinality columns, but the query still fails. Why?
The answer lies with the partition key, which has not been defined. When you attempt a potentially
expensive query, such as searching a range of rows, Cassandra requires the ALLOW FILTERING
directive. The error is not due to multiple indexes, but the lack of a partition key definition in the query.
Indexing a collection
How to index collections and query the database to find a collection containing a particular value.
Collections can be indexed and queried to find a collection containing a particular value. Sets and lists are
indexed slightly differently from maps, given the key-value nature of maps.
Sets and lists can index all values found by indexing the collection column. Maps can index a map key,
map value, or map entry using the methods shown below. Multiple indexes can be created on the same
map column in a table, so that map keys, values, or entries can be queried. In addition, frozen collections
can be indexed using FULL to index the full content of a frozen collection.
Note: All the cautions about using secondary indexes apply to indexing collections.
Procedure
• For set and list collections, create an index on the column name. Create an index on a set to find all the
cyclists that have been on a particular team.
• For map collections, create an index on the map key, map value, or map entry. Create an index on a
map key to find all cyclist/team combinations for a particular year.
• Create an index on the map entries and find cyclists who are the same age. An index using ENTRIES is
only valid for maps.
• Using the same index, find cyclists from the same country.
• Create an index on the map values and find cyclists who have a particular value found in the specified
map. An index using VALUES is only valid for maps.
• Create an index on the full content of a FROZEN map. The table in this example stores the number of
Pro wins, Grand Tour races, and Classic races that a cyclist has competed in. The SELECT statement
finds any cyclist who has 39 Pro race wins, 7 Grand Tour starts, and 14 Classic starts.
In Cassandra 3.4 and later, SSTable Attached Secondary Indexes (SASI) improve on the existing
secondary index implementation with superior performance for queries that previously required the
use of ALLOW FILTERING. SASI is significantly less resource intensive, using less memory, disk, and
CPU. It enables querying with prefix and contains matching on strings, similar to the SQL patterns
LIKE 'foo%' or LIKE '%foo%', as shown in SELECT. It also supports SPARSE indexing to improve
performance of querying large, dense number ranges such as time series data.
SASI takes advantage of Cassandra's write-once immutable ordered data model to build indexes when
data is flushed from the memtable to disk. The SASI index data structures are built in memory as the
SSTable is written and flushed to disk as sequential writes before the SSTable writing completes. One
index file is written for each indexed column.
SASI supports all queries already supported by CQL, and supports the LIKE operator using PREFIX,
CONTAINS, and SPARSE. If ALLOW FILTERING is used, SASI also supports queries with multiple
predicates using AND. With SASI, the performance pitfalls of using filtering are not realized because the
filtering is not performed even if ALLOW FILTERING is used.
SASI is implemented using memory mapped B+ trees, an efficient data structure for indexes. B+ trees
allow range queries to perform quickly. SASI generates an index for each SSTable. Some key features that
arise from this design are:
• SASI can reference offsets in the data file, skipping the Bloom filter and partition indexes to go directly
to where data is stored.
• When SSTables are compacted, new indexes are generated automatically.
Currently, SASI does not support collections. Regular secondary indexes can be built for collections. Static
columns are supported in Cassandra 3.6 and later.
In Cassandra 3.4 and later, a new implementation of secondary indexes, SSTable Attached Secondary
Indexes (SASI), has greatly improved the performance of secondary indexes and should be used, if
possible.
Using CQL, SSTable Attached Secondary Indexes (SASI) can be created on a non-collection column
defined in a table. Secondary indexes are used to query a table using a column that is not normally
queryable, such as a non-primary-key column. SASI implements three types of indexes: PREFIX,
CONTAINS, and SPARSE.
Procedure
PREFIX index
• Create an index fn_prefix for the table cyclist_name on the column firstname. PREFIX is the
default mode, so it does not need to be specified.
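A sketch of the index creation; SASI indexes are created with CREATE CUSTOM INDEX and the SASIIndex class:

```sql
-- PREFIX is the default mode, so no OPTIONS clause is needed
CREATE CUSTOM INDEX fn_prefix ON cycling.cyclist_name (firstname)
USING 'org.apache.cassandra.index.sasi.SASIIndex';
```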
• Queries can find exact matches for values in firstname. Note that indexing is used for this query, as
the primary key id is not specified.
• Queries can find matches for values in firstname based on partial matches. The use of LIKE
specifies that the match is looking for a word that starts with the letter "M". The % after the letter "M"
matches any characters, so the query can return a matching value. Note that indexing is used for this query, as the
primary key id is not specified.
• Many queries will fail to find matches based on a partial string. All of the following queries fail.
The first four queries fail because of case sensitivity: "MARIANNE" is all uppercase, whereas the stored
value is not. The next three use a lowercase "m". The placement of the % is critical; because the index
specifies the PREFIX mode, only a trailing % yields results when coupled with LIKE. The queries with
equalities fail unless the exact match is designated.
CONTAINS index
• Create an index fn_suffix for the table cyclist_name on the column firstname. CONTAINS is
the specified mode, so pattern matching finds partial patterns anywhere in the value, not just in the prefix.
This query returns the same results as a query using PREFIX indexing that does an exact match, using
a slightly modified query.
• Queries can find matches for values in firstname based on partial matches. The use of LIKE
specifies that the match is looking for a word that contains the letter "M". The % before and after the
letter "M" matches any characters, so the query can return a matching value. Note that indexing is used for this
query, as the primary key id is not specified.
Again, the same results are returned as for the PREFIX indexing, using a slightly modified query.
• The CONTAINS indexing has a more versatile matching algorithm than PREFIX. Look at the examples
below to see what results from variations of the last search.
Each query matches the pattern, either the final characters of the column value as in %arianne or the
characters bracketed by % such as %arian%.
• With CONTAINS indexing, even inequality pattern matching is possible. Note again the use of the
ALLOW FILTERING phrase, which is required but causes no latency in the query response.
The only row matching the conditions returns the same value as the last query.
• As with PREFIX indexing, many queries will fail to find matches based on a partial string. All of
the following queries fail.
The first query fails due to the absence of the ALLOW FILTERING phrase. The next two queries fail
because of case sensitivity. "MariAnne" has one uppercase letter, whereas the stored value does not.
The last three fail due to placement of the %.
• Either the PREFIX index or the CONTAINS index can be created with case sensitivity by adding an
analyzer class and case_sensitive option.
-- Statement head reconstructed; the index and table names follow the earlier
-- CONTAINS example and are illustrative
CREATE CUSTOM INDEX fn_suffix ON cycling.cyclist_name (firstname)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
'mode': 'CONTAINS',
'analyzer_class':
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
'case_sensitive': 'false'
};
The analyzer_class used here is the non-tokenizing analyzer that does not perform analysis on the
text in the specified column. The option case_sensitive is set to false to make the indexing case
insensitive.
• With the addition of the analyzer class and option, the following query now also works, using a
lowercase "m".
• If queries are narrowed with an indexed column value, non-indexed columns can be specified.
Compound queries can also be created with multiple indexed columns. This example alters the table to
add a column age that is not indexed before performing the query.
SPARSE index
• The SPARSE index is meant to improve performance of querying large, dense number ranges like
timestamps for data inserted every millisecond. If the data is numeric, millions of column values with
a small number of partition keys characterize the data, and range queries will be performed against the
index, then SPARSE is the best choice. For numeric data that does not meet these criteria, PREFIX is the
best choice.
Use SPARSE indexing for data that is sparse (every term/column value has less than 5 matching keys).
Indexing the created_at field in time series data (where there are typically few matching rows/events
per created_at timestamp) is a good use case. SPARSE indexing is primarily an optimization for
range queries, especially large ranges that span large timespans.
• To illustrate the use of the SPARSE index, create a table and insert some time series data:
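A sketch of the table and SPARSE index; the table and column names are assumptions consistent with the query described next:

```sql
-- Hypothetical time series table: few rows per created_at value
CREATE TABLE cycling.comments (
  id uuid,
  created_at timestamp,
  comment text,
  PRIMARY KEY (id, created_at)
);

CREATE CUSTOM INDEX comments_created_at_idx ON cycling.comments (created_at)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = { 'mode': 'SPARSE' };
```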
• Find all the comments made before the timestamp 2013-01-01 00:05:01.500.
This query returns all the results where created_at is found to be less than the timestamp supplied.
The inequalities >=, > and <= are all valid operators.
SPARSE indexing is used only for numeric data, so LIKE queries do not apply.
Using analyzers
• Analyzers can be specified that will analyze the text in the specified column. The
NonTokenizingAnalyzer is used for cases where the text is not analyzed, but case normalization
or sensitivity is required. The StandardAnalyzer is used for analysis that involves stemming, case
normalization, case sensitivity, skipping common words like "and" and "the", and localization of the
language used to complete the analysis. Altering the table again to add a lengthier text column provides
a window into the analysis.
• This query will search for the presence of a designated string, using the analyzed text to return a result.
This query returns all the results where ride is found either as an exact word or as a stem for another
word - rides in this case.
An advantage of indexes is the operational ease of populating and maintaining the index. Indexes are built
in the background automatically, without blocking reads or writes. Client-maintained tables as indexes
must be created manually; for example, if the artists column had been indexed by creating a table such as
songs_by_artist, your client application would have to populate the table with data from the songs table.
To perform a hot rebuild of an index, use the nodetool rebuild_index command.
Altering a table
How to alter a table to add or delete columns or change table properties.
The ALTER TABLE command can be used to add new columns to a table and to alter the column type of
an existing column.
Procedure
• Add an age column of type int to the table cycling.cyclist_alt_stats.
This creates the column metadata, adds the column to the table schema, and sets the value to
NULL for all rows.
• Add a column favorite_color of varchar, and then change the data type of the same column to text.
Note: There are limitations on altering the data type of a column. The two data types, the original and
the one changing to, must be compatible.
The ALTER TABLE command can be used to add new collection columns to a table and to alter the
column type of an existing column.
Procedure
• Alter the table cycling.upcoming_calendar to add a collection map description that can store a
description for each race listed.
Using ALTER TABLE, you can change the data type of a column after it is defined or added to a table.
Procedure
Change the favorite_color column to store as text instead of varchar by changing the data type
of the column.
Only newly inserted values will be created with the new type. However, the data type before must be
compatible with the new data type specified.
Using ALTER TABLE, you can change the table properties of a table.
Procedure
Alter a table to change the caching properties.
In Cassandra 3.0 and later, a materialized view has table properties like base tables. The ALTER
MATERIALIZED VIEW command can be used to alter the view's properties, specifying the properties using
a WITH clause. Materialized views do not perform repair, so properties regarding repair are invalid.
Procedure
• Alter a materialized view to change the caching properties.
The ALTER TYPE command can be used to add new columns to a user-defined type and to alter the data
type of an existing column in a user-defined type.
Procedure
• Add a middlename column of type text to the user-defined type cycling.fullname.
This creates the column metadata and adds the column to the type schema. To verify, display the table
system.schema_usertypes.
• A column can be renamed in either ALTER TABLE or ALTER TYPE. In ALTER TABLE, only primary key
columns may be renamed.
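These steps can be sketched as follows; the RENAME example assumes the fullname type has a lastname field:

```cql
-- Add a middlename field to the user-defined type
ALTER TYPE cycling.fullname ADD middlename text;
-- Rename a field of the user-defined type
ALTER TYPE cycling.fullname RENAME lastname TO surname;
```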
To remove data, you can set column values for automatic removal using the TTL (time-to-expire) table
attribute. You can also drop a table or keyspace, and delete keyspace column metadata.
Procedure
• Drop the test keyspace.
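For example:

```cql
-- Remove the keyspace and all of its tables and data
DROP KEYSPACE IF EXISTS test;
```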
CQL provides the DELETE command to delete a column or row. Deleted values are removed completely
by the first compaction following deletion.
Procedure
1. Delete the value of the column lastname from the table cyclist_name.
2. Delete the entire row for a particular race from the table calendar.
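The two steps above might look like this sketch; the id value and the calendar key columns are assumptions:

```cql
-- Delete a single column value from one row
DELETE lastname FROM cycling.cyclist_name
  WHERE id = e7ae5cf3-d358-4d99-b900-85902fda9bb0;
-- Delete an entire row
DELETE FROM cycling.calendar WHERE race_id = 200;
```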
Procedure
Drop the fLog() function. The conditional option IF EXISTS can be included.
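For example:

```cql
-- Remove the user-defined function if it exists
DROP FUNCTION IF EXISTS fLog;
```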
Securing a table
How to secure a table.
Data in CQL tables is secured using either user or role-based security commands.
cassandra.yaml settings must be changed in order to use authentication and authorization. Change the
following settings:
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer
Database users
How to create and work with users.
Procedure
• Create a user with a password. IF NOT EXISTS is included to ensure a previous user definition is not
overwritten.
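A sketch of this step; the user name and password are hypothetical:

```cql
-- Create a non-superuser login account with a password
CREATE USER IF NOT EXISTS sandy WITH PASSWORD 'Ride2Win' NOSUPERUSER;
```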
Database roles
How to create and work with roles.
Roles-based access control is available in Cassandra 2.2 and later. Roles enable authorization
management on a larger scale than security per user can provide. A role is created and may be granted to
other roles. Hierarchical sets of permissions can be created. For more information, see Role Based Access
Control in Cassandra.
Procedure
• Create a role with a password. IF NOT EXISTS is included to ensure a previous role definition is not
overwritten.
• Alter a role to change options. A role with SUPERUSER status can alter the SUPERUSER status of
another role, but not the role currently held. PASSWORD, LOGIN, and SUPERUSER can be modified with
ALTER ROLE. To modify properties of a role, the user must have ALTER permission.
Note: NORECURSIVE is an option to discover all roles directly granted to a user. Without
NORECURSIVE, transitively acquired roles are also listed.
• Revoke a role that was previously granted to a user or a role. Any permission that derives from the role
is revoked.
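The role steps above can be sketched as follows; team_manager appears later in this section, while coach and the passwords are hypothetical names:

```cql
-- Create a login-enabled role with a password
CREATE ROLE IF NOT EXISTS team_manager
  WITH PASSWORD = 'RideToWin' AND LOGIN = true;
-- Change an option on an existing role
ALTER ROLE team_manager WITH PASSWORD = 'RideForGold';
-- List only the roles directly granted to coach
LIST ROLES OF coach NORECURSIVE;
-- Revoke a previously granted role
REVOKE team_manager FROM coach;
```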
Database Permissions
How to set role permissions.
Authentication and authorization should be set based on roles, rather than users. In Cassandra 2.2 and
later, authentication and authorization are based on roles; user commands are included only for backwards
compatibility.
Roles may be granted to other roles to create hierarchical permissions structures; in these hierarchies,
permissions and SUPERUSER status are inherited, but the LOGIN privilege is not.
Permissions can be granted at any level of the database hierarchy and flow downwards. Keyspaces and
tables are hierarchical as follows: ALL KEYSPACES > KEYSPACE > TABLE. Functions are hierarchical in
the following manner: ALL FUNCTIONS > KEYSPACE > FUNCTION. ROLES can also be hierarchical and
encompass other ROLES. Permissions can be granted on:
• CREATE - keyspace, table, function, role, index
• ALTER - keyspace, table, function, role
Procedure
• The first line grants anyone with the team_manager role the ability to INSERT, UPDATE, DELETE,
and TRUNCATE any table in the keyspace cycling. The second line grants anyone with the sys_admin
role the ability to view all roles in the database.
• Grant permission to drop all functions, including aggregate in the current keyspace.
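The grants described above might look like the following sketch; sys_admin and ops_role are role names used for illustration:

```cql
-- INSERT, UPDATE, DELETE, and TRUNCATE on any table in the keyspace
GRANT MODIFY ON KEYSPACE cycling TO team_manager;
-- View all roles in the database
GRANT DESCRIBE ON ALL ROLES TO sys_admin;
-- Drop any function, including aggregates, in a keyspace
GRANT DROP ON ALL FUNCTIONS IN KEYSPACE cycling TO ops_role;
```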
What to do next
See the Apache Cassandra CQL document for more details.
In a distributed system such as Cassandra, the most recent value of data is not necessarily on every node
all the time. The client application configures the consistency level per request to manage response time
versus data accuracy. By tracing activity on a five-node cluster, this tutorial shows the difference between
these consistency levels and the number of replicas that participate to satisfy a request:
• ONE
Returns data from the nearest replica.
• QUORUM
Set up five nodes on your local computer, trace reads at different consistency levels, and then compare
the results. This example uses ccm, a tool for running multiple nodes of Cassandra on a local computer.
Procedure
1. Get the ccm library of scripts from GitHub.
You will use this library in subsequent steps to perform the following actions:
• Download Apache Cassandra source code.
• Create and launch an Apache Cassandra cluster on a single computer.
Refer to the ccm README for prerequisites.
2. Optional: For Mac computers, set up loopback aliases. On all other platforms, skip this step.
$ ccm populate -n 5
$ ccm start
6. Check that the cluster is up:
After performing the setup steps, run and trace queries that read data at different consistency levels. The
tracing output shows that using three replicas on a five-node cluster, a consistency level of ONE processes
responses from one of three replicas, QUORUM from two of three replicas, and ALL from three of three
replicas.
Procedure
1. On the cqlsh command line, create a keyspace that specifies using three replicas for data distribution in
the cluster.
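For example, a keyspace of the shape used in the traces below (the keyspace name cycling_alt appears in the trace output; the exact statement is a sketch):

```cql
-- Three replicas for every partition in this keyspace
CREATE KEYSPACE cycling_alt
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
```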
3. Turn on tracing and use the CONSISTENCY command to check that the consistency level is ONE, the
default.
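The cqlsh commands for this step might look like the following; CONSISTENCY with no argument reports the current level:

```cql
TRACING ON;
CONSISTENCY;
```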
id | col1 | col2
----+------+------
0 | 0 | 0
(1 rows)
 activity                                                                     | timestamp                  | source    | source_elapsed
------------------------------------------------------------------------------+----------------------------+-----------+----------------
 Execute CQL3 query                                                           | 2016-04-12 16:50:55.461000 | 127.0.0.1 |              0
 Parsing SELECT * FROM cycling_alt.tester WHERE id = 0; [SharedPool-Worker-1] | 2016-04-12 16:50:55.462000 | 127.0.0.1 |            276
 Preparing statement [SharedPool-Worker-1]                                    | 2016-04-12 16:50:55.462000 | 127.0.0.1 |            509
 Executing single-partition query on tester [SharedPool-Worker-2]             | 2016-04-12 16:50:55.463000 | 127.0.0.1 |           1019
The tracing results list all the actions taken to complete the SELECT statement.
5. Change the consistency level to QUORUM to trace what happens during a read with a QUORUM
consistency level.
id | col1 | col2
----+------+------
0 | 0 | 0
(1 rows)
 activity                                                                     | timestamp                  | source    | source_elapsed
------------------------------------------------------------------------------+----------------------------+-----------+----------------
 Execute CQL3 query                                                           | 2016-04-12 16:50:42.831000 | 127.0.0.1 |              0
 Parsing SELECT * FROM cycling_alt.tester WHERE id = 0; [SharedPool-Worker-1] | 2016-04-12 16:50:42.831000 | 127.0.0.1 |            259
 Preparing statement [SharedPool-Worker-1]                                    | 2016-04-12 16:50:42.831000 | 127.0.0.1 |            557
 Executing single-partition query on tester [SharedPool-Worker-3]             | 2016-04-12 16:50:42.832000 | 127.0.0.1 |           1076
 Acquiring sstable references [SharedPool-Worker-3]                           | 2016-04-12 16:50:42.832000 | 127.0.0.1 |           1182
 Merging memtable contents [SharedPool-Worker-3]                              | 2016-04-12 16:50:42.832000 | 127.0.0.1 |           1268
 Read 1 live and 0 tombstone cells [SharedPool-Worker-3]                      | 2016-04-12 16:50:42.832000 | 127.0.0.1 |           1632
 Request complete                                                             | 2016-04-12 16:50:42.832887 | 127.0.0.1 |           1887
6. Change the consistency level to ALL and run the SELECT statement again.
id | col1 | col2
----+------+------
0 | 0 | 0
(1 rows)
 activity                                                                     | timestamp                  | source    | source_elapsed
------------------------------------------------------------------------------+----------------------------+-----------+----------------
 Execute CQL3 query                                                           | 2016-04-12 16:51:06.427000 | 127.0.0.1 |              0
 Parsing SELECT * FROM cycling_alt.tester WHERE id = 0; [SharedPool-Worker-1] | 2016-04-12 16:51:06.427000 | 127.0.0.1 |            324
 Preparing statement [SharedPool-Worker-1]                                    | 2016-04-12 16:51:06.427000 | 127.0.0.1 |            524
 Read-repair DC_LOCAL [SharedPool-Worker-1]                                   | 2016-04-12 16:51:06.428000 | 127.0.0.1 |           1016
 Executing single-partition query on tester [SharedPool-Worker-3]             | 2016-04-12 16:51:06.428000 | 127.0.0.1 |           1793
 Acquiring sstable references [SharedPool-Worker-3]                           | 2016-04-12 16:51:06.429000 | 127.0.0.1 |           1886
 Merging memtable contents [SharedPool-Worker-3]                              | 2016-04-12 16:51:06.429000 | 127.0.0.1 |           1951
 Read 1 live and 0 tombstone cells [SharedPool-Worker-3]                      | 2016-04-12 16:51:06.429000 | 127.0.0.1 |           2176
 Request complete                                                             | 2016-04-12 16:51:06.429391 | 127.0.0.1 |           2391
Changing the consistency level can affect read performance. The tracing output shows that as you change
the consistency level from ONE to QUORUM to ALL, performance degrades from 1714 to 1887 to 2391
microseconds, respectively. If you follow the steps in this tutorial, you are not guaranteed to see the
same trend because querying a one-row table is a degenerate case, used for example purposes. The
difference between QUORUM and ALL is slight in this case, so depending on conditions in the cluster,
performance using ALL might be faster than QUORUM.
Under the following conditions, performance using ALL is worse than QUORUM:
• The data consists of thousands of rows or more.
• One node is slower than others.
• A particularly slow node was not selected to be part of the quorum.
The ByteOrdered partitioner arranges tokens in the same order as key values, but the RandomPartitioner
and Murmur3Partitioner distribute tokens in a completely unordered manner. When using the
RandomPartitioner or Murmur3Partitioner, Cassandra orders rows by the hash of their partition key;
within a single partition, rows are ordered by their clustering key. Across partitions, the order of rows is
not meaningful, because it is determined by the generated hashes.
To order the rows for display when using RandomPartitioner or Murmur3Partitioner, the token function may
be used. However, ordering with the TOKEN function does not always provide the expected results. Use the
TOKEN function to express a conditional relation on a partition key column. In this case, the query returns
rows based on the token of the partition key rather than on the value.
The TOKEN function can also be used to select a range of partitions for a ByteOrderedPartitioner. Using the
TOKEN function with ByteOrderedPartitioner will generally yield expected results.
The type of the arguments to the TOKEN function depends on the type of the columns used as the
argument of the function. The return type depends on the partitioner in use:
• Murmur3Partitioner, bigint
• RandomPartitioner, varint
• ByteOrderedPartitioner, blob
Procedure
• Select data based on a range of tokens of a particular column value.
• The results will not always match expectations, because the TOKEN function compares tokens rather
than values. Underneath, the query uses token-based comparisons and does not convert the year value
to a token (token(year) > token('2015-05-26'), not year > '2015-05-26').
• Tokens and partition keys can be mixed in conditional statements. The results will not always be
straightforward, but they are not unexpected if you understand what the TOKEN function does.
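A sketch of a token-range query; the table name last_3_days is an assumption, while year and '2015-05-26' come from the discussion above:

```cql
-- Rows are selected by comparing token(year), not the year value itself
SELECT * FROM cycling.last_3_days
  WHERE TOKEN(year) > TOKEN('2015-05-26');
```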
To set the TTL for data, use the USING TTL keywords. The TTL function may be used to retrieve the TTL
information.
The USING TTL keywords can be used to insert data into a table for a specific duration of time. To
determine the current time-to-live for a record, use the TTL function.
Procedure
• Insert data into the table cycling.calendar and use the USING TTL clause to set the expiration period to
86400 seconds.
If you repeat this step after some time, the time-to-live value will decrease.
• The time-to-live value can also be updated with the USING TTL keywords in an UPDATE command.
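The TTL steps above can be sketched as follows, assuming a simplified calendar table keyed only by race_id; the column names are illustrative:

```cql
-- Insert a row that expires after one day (86400 seconds)
INSERT INTO cycling.calendar (race_id, race_name)
  VALUES (200, 'placeholder') USING TTL 86400;
-- Check the remaining time-to-live of the column
SELECT TTL(race_name) FROM cycling.calendar WHERE race_id = 200;
-- Reset the time-to-live with an UPDATE
UPDATE cycling.calendar USING TTL 300
  SET race_name = 'placeholder' WHERE race_id = 200;
```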
A table contains a timestamp representing the date/time that a write occurred to a column. Using the
WRITETIME function in a SELECT statement returns the date/time that the column was written to the
database. The output of the function is microseconds except in the case of Cassandra 2.1 counter
columns. Counter column writetime is milliseconds. This procedure continues the example from the
previous procedure and calls the WRITETIME function to retrieve the date/time of the writes to the
columns.
Procedure
• Retrieve the date/time that the value Paolo was written to the firstname column in the table
cyclist_points. Use the WRITETIME function in a SELECT statement, followed by the name of a column
in parentheses:
Note: The writetime output in microseconds converts to Wed, 24 Jun 2015 01:12:05 GMT.
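The query might look like the following sketch; the id value is a hypothetical partition key:

```cql
-- Retrieve the write timestamp, in microseconds, of a column value
SELECT WRITETIME(firstname) FROM cycling.cyclist_points
  WHERE id = e3b19ec4-774a-4d1c-9e5a-decec1e30aac;
```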
Legacy tables
How to work with legacy tables.
Legacy tables must be handled differently from currently built CQL tables.
Internally, CQL does not change the row and column mapping from the Thrift API mapping. CQL and
Thrift use the same storage engine. CQL supports the same query-driven, denormalized data modeling
principles as Thrift. Existing applications do not have to be upgraded to CQL. The CQL abstraction layer
makes CQL easier to use for new applications. For an in-depth comparison of Thrift and CQL, see "A Thrift
to CQL Upgrade Guide" and CQL for Cassandra experts.
Compact storage stores an entire row in a single column on disk, instead of storing each non-primary-key
column in its own column on disk. Using compact storage prevents you from adding new columns that are
not part of the PRIMARY KEY.
Using CQL, you can query a legacy table. A legacy table managed in CQL includes an implicit WITH
COMPACT STORAGE directive. When you use CQL to query legacy tables with no column names defined
for data within a partition, Cassandra generates the names (column1 and value1) for the data. Using the
RENAME clause, you can change the default column name to a more meaningful name.
CQL supports dynamic tables created in the Thrift API, CLI, and earlier CQL versions. For example, a
dynamic table is represented and queried like this:
CQL reference
CQL reference topics.
Introduction
About using CQL.
All of the commands included in the CQL language are available on the cqlsh command line. There is
also a group of commands that are available on the command line but are not supported by the CQL
language. These commands are called cqlsh commands; you can run them from the command line only,
while CQL commands can be run in a number of ways.
This reference covers CQL and cqlsh based on the CQL specification 3.3.
CQL input consists of statements. Like SQL, statements change data, look up data, store data, or change
the way data is stored. Statements end in a semicolon (;).
For example, the following is valid CQL syntax:
UPDATE MyTable
SET SomeColumn = 'SomeValue'
WHERE columnName = B70DE1D0-9908-4AE3-BE34-5573E5B09F14;
This is a single CQL statement split across three lines. A statement can be written on one line or usefully
split across several lines, and always ends with a semicolon.
Identifiers created using CQL are case-insensitive unless enclosed in double quotation marks. If you enter
names for these objects using any uppercase letters, Cassandra stores the names in lowercase. You can
force the case by using double quotation marks. For example:
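For example, the following sketch illustrates the rule; the table and column names are hypothetical:

```cql
-- Foo is stored as foo: unquoted identifiers are lowercased
CREATE TABLE test (Foo int PRIMARY KEY);
-- "Bar" is stored as Bar: double quotation marks preserve case
CREATE TABLE test2 ("Bar" int PRIMARY KEY);
```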
The following table shows partial queries that work and do not work to return results from the test table:
SELECT "foo" FROM ... works because internally, Cassandra stores foo in lowercase. To include a double
quotation mark inside a quoted identifier, escape it with another double quotation mark.
Case sensitivity rules in earlier versions of CQL apply when handling legacy tables.
CQL keywords are case-insensitive. For example, the keywords SELECT and select are equivalent. This
document shows keywords in uppercase.
Keyspace and table names created using CQL can contain only alphanumeric and underscore characters,
even when double-quoted, because these identifiers are used in Cassandra filenames. Other identifiers,
such as column names, user-defined data type names and field names, user-defined function names, and
user-defined aggregate names, can contain only alphanumeric and underscore characters unless enclosed
in double quotation marks; if double quotation marks are used, any characters are valid. If you enter a
name that violates these rules, Cassandra issues an invalid syntax message and fails to create the object.
Escaping characters
Using single and double quotation marks in CQL.
Column names that contain characters that CQL cannot parse need to be enclosed in double quotation
marks in CQL.
Dates, IP addresses, and strings need to be enclosed in single quotation marks. To use a single quotation
mark itself in a string literal, escape it using a single quotation mark.
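For example, a sketch of escaping a single quotation mark inside a string literal; the table and values are illustrative:

```cql
-- Two single quotation marks produce one in the stored string: O'Brien
INSERT INTO cycling.cyclist_name (id, lastname)
  VALUES (6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47, 'O''Brien');
```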
An alternative is to use dollar-quoted strings. Dollar-quoted string constants can be used to create
functions, insert data, and select data when complex quoting is needed. Use double dollar signs to enclose
the desired string.
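For example, a sketch using the same illustrative table as above:

```cql
-- Dollar quoting avoids escaping the embedded quotation marks
INSERT INTO cycling.cyclist_name (id, lastname)
  VALUES (6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47, $$O'Brien's "nickname"$$);
```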
Valid literals
Values and definitions of valid literals.
'-'?[0-9]+('.'[0-9]*)?([eE][+-]?[0-9]+)?
Exponential notation
Cassandra supports exponential notation.
Cassandra supports exponential notation. This example shows exponential notation in the output from a
cqlsh command.
id | value_double | value_float
-------+--------------+-------------
test1 | -2.6034e+38 | -2.6034e+38
You can use the following notation to include comments in CQL code:
• Double hyphen
-- Single-line comment
• Double forward slash
//Single-line comment
• Forward slash asterisk
/* Multi-line comment */
CQL Keywords
Table of keywords and whether or not the words are reserved.
This table lists keywords and whether or not the words are reserved. A reserved keyword cannot be used
as an identifier unless you enclose the word in double quotation marks. Non-reserved keywords have a
specific meaning in certain context but can be used as an identifier outside this context.
Table: Keywords
Keyword Reserved
ADD yes
AGGREGATE yes
ALL no
ALLOW yes
ALTER yes
AND yes
ANY yes
APPLY yes
AS no
ASC yes
ASCII no
AUTHORIZE yes
BATCH yes
BEGIN yes
BIGINT no
BLOB no
BOOLEAN no
BY yes
CLUSTERING no
COLUMNFAMILY yes
COMPACT no
CONSISTENCY no
COUNT no
COUNTER no
CREATE yes
CUSTOM no
DECIMAL no
DELETE yes
DESC yes
DISTINCT no
DOUBLE no
DROP yes
EACH_QUORUM yes
ENTRIES yes
EXISTS no
FILTERING no
FLOAT no
FROM yes
FROZEN no
FULL yes
GRANT yes
IF yes
IN yes
INDEX yes
INET yes
INFINITY yes
INSERT yes
INT no
INTO yes
KEY no
KEYSPACE yes
KEYSPACES yes
LEVEL no
LIMIT yes
LIST no
LOCAL_ONE yes
LOCAL_QUORUM yes
MAP no
MATERIALIZED yes
MODIFY yes
NAN yes
NORECURSIVE yes
NOSUPERUSER no
NOT yes
OF yes
ON yes
ONE yes
ORDER yes
PARTITION yes
PASSWORD yes
PER yes
PERMISSION no
PERMISSIONS no
PRIMARY yes
QUORUM yes
RENAME yes
REVOKE yes
SCHEMA yes
SELECT yes
SET yes
STATIC no
STORAGE no
SUPERUSER no
TABLE yes
TEXT no
TIME yes
TIMESTAMP no
TIMEUUID no
THREE yes
TO yes
TOKEN yes
TRUNCATE yes
TTL no
TUPLE no
TWO yes
TYPE no
UNLOGGED yes
UPDATE yes
USE yes
USER no
USERS no
USING yes
UUID no
VALUES no
VARCHAR no
VARINT no
VIEW yes
WHERE yes
WITH yes
WRITETIME no
CQL defines built-in data types for columns. The counter type is unique.
In addition to the CQL types listed in this table, you can use a string containing the name of a Java class
(a sub-class of AbstractType loadable by Cassandra) as a CQL type. The class name should either be fully
qualified or relative to the org.apache.cassandra.db.marshal package.
Enclose ASCII text, timestamp, and inet values in single quotation marks. Enclose names of a keyspace,
table, or column in double quotation marks.
Java types
The Java types, from which most CQL types are derived, are obvious to Java programmers. The derivation
of the following types, however, might not be obvious:
Clustering columns have even stricter requirements, because clustering columns determine the order
in which data is written to disk. The following table shows the allowed alterations for data types used in
clustering columns:
Blob type
Cassandra blob data type represents a constant hexadecimal number.
The Cassandra blob data type represents a constant hexadecimal number defined as 0[xX](hex)+ where
hex is a hexadecimal character, such as [0-9a-fA-F]. For example, 0xcafe. The maximum theoretical size
for a blob is 2GB. The practical limit on blob size, however, is less than 1 MB, ideally even smaller. A blob
type is suitable for storing a small image or short string.
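The output below can be produced with statements of this shape; the users table definition is a sketch inferred from the result set:

```cql
-- A blob column holding a small binary value
CREATE TABLE users (user_name text PRIMARY KEY, bio blob);
INSERT INTO users (user_name, bio) VALUES ('fred', 0x0000000000000003);
SELECT * FROM users;
```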
user_name | bio
-----------+--------------------
fred | 0x0000000000000003
user_name | bio | id
-----------+--------------------+----
fred | 0x0000000000000003 | 3
Collection type
A collection column is declared using the collection type, followed by another type.
A collection column is declared using the collection type, followed by another type, such as int or text, in
angle brackets. For example, you can create a table having a list of textual elements, a list of integers, or a
list of some other element types.
list<text>
list<int>
Collection types cannot be nested, but frozen collection types can be nested inside frozen or non-frozen
collections. For example, you may define a list within a list, provided the inner list is frozen:
list<frozen <list<int>>>
For example:
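An illustrative table definition with a frozen nested list; the table and column names are hypothetical:

```cql
-- The inner list must be frozen to nest it inside another collection
CREATE TABLE cycling.race_stats (
  id int PRIMARY KEY,
  finishes list<frozen<list<int>>>
);
```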
Counter type
A counter column value is a 64-bit signed integer.
A counter column value is a 64-bit signed integer. You cannot set the value of a counter, which supports
two operations: increment and decrement.
Use counter types as described in the "Using a counter" section. Do not assign this type to a column
that serves as the primary key or partition key. Also, do not use the counter type in a table that contains
anything other than counter types and the primary key. To generate sequential numbers for surrogate
keys, use the timeuuid type instead of the counter type. You cannot create an index on a counter column
or set data in a counter column to expire using the Time-To-Live (TTL) property.
The UUID (universally unique id) comparator type is used to avoid collisions in column names.
Alternatively, you can use the timeuuid.
Timeuuid types can be entered as integers for CQL input. A value of the timeuuid type is a Version 1
UUID, which includes the time of its generation; values are sorted by timestamp, making them ideal
for use in applications requiring conflict-free timestamps. For example, you can use this type to identify
a column (such as a blog entry) by its timestamp and allow multiple clients to write to the same partition
key simultaneously. Collisions that would potentially overwrite data that was not intended to be overwritten
cannot occur.
A valid timeuuid conforms to the timeuuid format shown in valid literals.
Cassandra 2.0.7 and later includes the uuid() function. This function takes no parameters and generates a
random Type 4 UUID suitable for use in INSERT or SET statements.
Several timeuuid functions are designed for use with the timeuuid type:
• dateOf()
Used in a SELECT clause, this function extracts the timestamp of a timeuuid column in a result set. This
function returns the extracted timestamp as a date. Use unixTimestampOf() to get a raw timestamp.
• now()
In the coordinator node, generates a new unique timeuuid in milliseconds when the statement is
executed. The timestamp portion of the timeuuid conforms to the UTC (Universal Time) standard. This
method is useful for inserting values. The value returned by now() is guaranteed to be unique.
• minTimeuuid() and maxTimeuuid()
Returns a UUID-like result given a conditional time component as an argument. For example:
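A query of this shape, assuming a table myTable with a timeuuid column t, matches the description that follows:

```cql
SELECT * FROM myTable
  WHERE t > maxTimeuuid('2013-01-01 00:05+0000')
    AND t < minTimeuuid('2013-02-02 10:00+0000');
```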
The min/maxTimeuuid example selects all rows where the timeuuid column, t, is strictly
later than 2013-01-01 00:05+0000 but strictly earlier than 2013-02-02 10:00+0000. The t >=
maxTimeuuid('2013-01-01 00:05+0000') does not select a timeuuid generated exactly at 2013-01-01
00:05+0000 and is essentially equivalent to t > maxTimeuuid('2013-01-01 00:05+0000').
The values returned by minTimeuuid and maxTimeuuid functions are not true UUIDs in that the
values do not conform to the Time-Based UUID generation process specified by the RFC 4122. The
results of these functions are deterministic, unlike the now() function.
• unixTimestampOf()
Used in a SELECT clause, this function extracts the timestamp in milliseconds of a timeuuid column in
a result set. Returns the value as a raw, 64-bit integer timestamp.
Cassandra 2.2 and later support some additional timeuuid and timestamp functions to manipulate dates.
The functions can be used in INSERT, UPDATE, and SELECT statements.
• toDate(timeuuid)
Converts timeuuid to date in YYYY-MM-DD format.
• toTimestamp(timeuuid)
Converts timeuuid to timestamp format.
• toUnixTimestamp(timeuuid)
Converts timeuuid to a Unix (epoch) timestamp.
Timestamp type
A timestamp type can be entered as an integer for CQL input, or as a string literal in ISO 8601 formats.
Values for the timestamp type are encoded as 64-bit signed integers representing a number of
milliseconds since the standard base time known as the epoch: January 1 1970 at 00:00:00 GMT. A
timestamp type can be entered as an integer for CQL input, or as a string literal in any of the following ISO
8601 formats:
yyyy-mm-dd HH:mm
yyyy-mm-dd HH:mm:ss
yyyy-mm-dd HH:mmZ
yyyy-mm-dd HH:mm:ssZ
yyyy-mm-dd'T'HH:mm
yyyy-mm-dd'T'HH:mmZ
yyyy-mm-dd'T'HH:mm:ss
yyyy-mm-dd'T'HH:mm:ssZ
yyyy-mm-dd'T'HH:mm:ss.ffffffZ
yyyy-mm-dd
yyyy-mm-ddZ
where Z is the RFC-822 4-digit time zone, expressing the time zone's difference from UTC. For example,
for the date and time of Feb 3, 2011, at 04:05:00 AM, GMT:
2011-02-03 04:05+0000
2011-02-03 04:05:00+0000
2011-02-03T04:05+0000
2011-02-03T04:05:00+0000
In Cassandra 3.4 and later, timestamps are displayed in cqlsh in sub-second precision by default, as
shown below. Applications reading a timestamp may use the sub-second portion of the timestamp,
because Cassandra has stored millisecond-precision timestamps in all versions.
%Y-%m-%d %H:%M:%S.%f%z
If no time zone is specified, the time zone of the Cassandra coordinator node handling the write request is
used. For accuracy, DataStax recommends specifying the time zone rather than relying on the time zone
configured on the Cassandra nodes.
If you only want to capture date values, you can also omit the time of day. For example:
2011-02-03
2011-02-03+0000
In this case, the time of day defaults to 00:00:00 in the specified or default time zone.
Timestamp output appears in the following format by default in cqlsh:
yyyy-mm-dd HH:mm:ssZ
You can change the format by setting the time_format property in the [ui] section of the cqlshrc file.
[ui]
time_format = %Y-%m-%d %H:%M
Tuple type
You can use a tuple as a convenient alternative to a user-defined type.
The tuple data type holds fixed-length sets of typed positional fields. Use a tuple as an alternative to a
user-defined type. A tuple can accommodate up to 32768 fields, far more than can be used prudently;
typically, create a tuple with only a few fields.
In the table creation statement, use angle brackets and a comma delimiter to declare the tuple component
types. Surround tuple values in parentheses to insert the values into a table, as shown in this example.
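Statements of this shape produce the output below; the table name collect_things is an illustrative assumption:

```cql
-- Declare the tuple component types in angle brackets
CREATE TABLE collect_things (k int PRIMARY KEY, v tuple<int, text, float>);
-- Surround the tuple value with parentheses
INSERT INTO collect_things (k, v) VALUES (0, (3, 'bar', 2.1));
SELECT * FROM collect_things;
```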
k | v
---+-----------------
0 | (3, 'bar', 2.1)
CREATE TABLE nested (k int PRIMARY KEY, t tuple <int, tuple<text, double>>);
User-defined type
A user-defined type facilitates handling multiple fields of related information in a table.
A user-defined type facilitates handling multiple fields of related information in a table. Applications that
once required multiple tables can be simplified to use fewer tables by using a user-defined type to
represent the related fields of information instead of storing the information in a separate table. The
address type example demonstrates how to use a user-defined type.
You can create, alter, and drop a user-defined type using these commands:
• CREATE TYPE
• ALTER TYPE
• DROP TYPE
The cqlsh utility includes these commands for describing a user-defined type or listing all user-defined
types:
• DESCRIBE TYPE
• DESCRIBE TYPES
The scope of a user-defined type is the keyspace in which you define it. Use dot notation to access a type
from a keyspace outside its scope: the keyspace name, followed by a period, followed by the name of the
type. An example is test.myType, where test is the keyspace name and myType is the type name.
Cassandra accesses the type in the specified keyspace without changing the current keyspace. If you do
not specify a keyspace, Cassandra accesses the type within the current keyspace.
The CQL WITH clause specifies keyspace and table properties in these CQL commands:
• ALTER KEYSPACE
• ALTER TABLE
• CREATE KEYSPACE
• CREATE TABLE
Table properties
A list of CQL table properties and their syntax.
CQL supports Cassandra table properties, such as comments and compaction options, listed in the
following table.
In CQL commands, such as CREATE TABLE, you format properties in either the name-value pair or
collection map format. The name-value pair property syntax is:
The collection map format, used by compaction and compression properties, is:
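The two formats can be sketched as follows; the table definitions are illustrative:

```cql
-- Name-value pair format
CREATE TABLE cycling.sample1 (id int PRIMARY KEY)
  WITH comment = 'a sample table';
-- Collection map format, used by compaction and compression
CREATE TABLE cycling.sample2 (id int PRIMARY KEY)
  WITH compaction = {'class': 'LeveledCompactionStrategy'};
```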
Property     Default                        Description
comment      N/A                            A human readable comment describing the table. See comments below.
compaction   SizeTieredCompactionStrategy   Sets the compaction strategy for the table. See compaction below.
compression  LZ4Compressor                  The compression algorithm to use. Valid values are LZ4Compressor,
                                            SnappyCompressor, and DeflateCompressor. See compression below.
Bloom filter
The Bloom filter property is the desired false-positive probability for SSTable Bloom filters. When data is
requested, the Bloom filter checks if the row exists before doing disk I/O. Bloom filter property value ranges
from 0 to 1.0. Lower Bloom filter property probabilities result in larger Bloom filters that use more memory.
The effects of the minimum and maximum values are:
• 0: Enables the unmodified, effectively the largest possible, Bloom filter.
• 1.0: Disables the Bloom filter.
The recommended setting is 0.1. A higher value yields diminishing returns.
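For example, the recommended setting can be applied with a statement of this shape; the table name is illustrative:

```cql
-- Set the desired false-positive probability for SSTable Bloom filters
ALTER TABLE cycling.cyclist_name WITH bloom_filter_fp_chance = 0.1;
```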
caching
Caching optimizes the use of cache memory without manual tuning. You set table properties to
configure caching when you create or alter the table. Cassandra weighs the cached data by size and
access frequency. After configuring the caching table property, configure the global caching properties
in the cassandra.yaml file. For information about global caching properties, see Cassandra 3.0
documentation.
Configure the cache by creating a property map of values for the caching property. Options are:
• keys: ALL or NONE
• rows_per_partition: number of CQL rows (N), NONE, or ALL
For rows_per_partition, only the first N rows in a partition can be cached, as determined by the clustering
order.
For example:
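An illustrative property map for the caching option; the table definition is a sketch:

```cql
-- Cache all partition keys and the first 10 rows of each partition
CREATE TABLE cycling.cyclist_teams (
  id uuid PRIMARY KEY,
  teams map<int, text>
) WITH caching = {'keys': 'ALL', 'rows_per_partition': '10'};
```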
comments
Comments can be used to document CQL statements in your application code. Single line comments can
begin with a double dash (--) or a double slash (//) and extend to the end of the line. Multi-line comments
can be enclosed in /* and */ characters.
compaction
The compaction property defines the compaction strategy class to use. Choose the compaction strategy
that best fits your data and environment:
Note: For more guidance, see the When to Use Leveled Compaction and Leveled Compaction in Apache
Cassandra blog posts.
• SizeTieredCompactionStrategy (STCS): The default compaction strategy. This strategy triggers
a minor compaction when there are a number of similar sized SSTables on disk as configured by the
table subproperty, min_threshold. A minor compaction does not involve all the tables in a keyspace.
Also see STCS compaction subproperties.
• DateTieredCompactionStrategy (DTCS): Available in Cassandra 2.0.11 and 2.1.1 and later.
This strategy is particularly useful for time series data. DateTieredCompactionStrategy stores data
written within a certain period of time in the same SSTable. For example, Cassandra can store your last
hour of data in one SSTable time window, and the next 4 hours of data in another time window, and so
on. Compactions are triggered when the min_threshold (4 by default) for SSTables in those windows
is reached. The most common queries for time series workloads retrieve the last hour/day/month of
data. Cassandra can limit SSTables returned to those having the relevant data. Also, Cassandra can
store data that has been set to expire using TTL in an SSTable with other data scheduled to expire at
approximately the same time. Cassandra can then drop the SSTable without doing any compaction.
Also see DTCS compaction subproperties.
• LeveledCompactionStrategy (LCS): The leveled compaction strategy creates SSTables of a
fixed, relatively small size (160 MB by default) that are grouped into levels. Within each level, SSTables
are guaranteed to be non-overlapping. Each level (L0, L1, L2 and so on) is 10 times as large as the
previous. Disk I/O is more uniform and predictable on higher than on lower levels as SSTables are
continuously being compacted into progressively larger levels. At each level, row keys are merged
into non-overlapping SSTables. This can improve performance for reads, because Cassandra can
determine which SSTables in each level to check for the existence of row key data. This compaction
strategy is modeled after Google's leveldb implementation. Also see LCS compaction subproperties.
Hybrid (leveled and size-tiered) compaction improvements to the leveled compaction strategy reduce the
performance overhead on read operations when compaction cannot keep pace with write-heavy workload.
When using the LCS, if Cassandra cannot keep pace with the workload, the compaction strategy switches
to STCS until Cassandra catches up. For this reason, it is a best practice to configure the max_threshold
subproperty for a table to use when the switch occurs.
You can specify a custom strategy. Use the full class name as a string constant.
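As a sketch, a hypothetical table could be switched to leveled compaction like this (the table name is illustrative):

```
ALTER TABLE users WITH
  compaction = { 'class' : 'LeveledCompactionStrategy' };
```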
compression
To configure compression, choose the LZ4Compressor, SnappyCompressor, or DeflateCompressor
property to use in creating or altering a table. Use an empty string ('') to disable compression, as shown in
the example of how to use subproperties. Choosing the right compressor depends on your requirements
for space savings over read performance. LZ4 is fastest to decompress, followed by Snappy, then
by Deflate. Compression effectiveness is inversely correlated with decompression speed. The extra
compression from Deflate or Snappy is not enough to make up for the decreased performance for general-
purpose workloads, but for archival data they may be worth considering. Developers can also implement
custom compression classes using the org.apache.cassandra.io.compress.ICompressor
interface. Specify the full class name enclosed in single quotation marks. Also use the compression
subproperties.
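For example, a hedged sketch of enabling LZ4 compression on a hypothetical table:

```
CREATE TABLE users (
  userid text PRIMARY KEY,
  first_name text,
  last_name text
) WITH compression = { 'sstable_compression' : 'LZ4Compressor' };
```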
speculative retry
Sends a redundant read request to another replica to minimize read latency. The extra read request is not
sent to meet the consistency level, but to optimize tail latency, such as the 99th percentile latency.
Choose one of the following values:
• ALWAYS: Send a new read request to all replicas that have not yet received the read request.
• Xpercentile: Send a new read request to a replica that has not yet received the read request, if the
coordinator has not received a response within the number of milliseconds corresponding to the
configured percentile of the table's typical read latency.
• Yms: Send a new read request to all replicas that have not yet received the read request after the
specified number of milliseconds.
• NONE: Do not send a new read request to any replicas.
Using the speculative retry property, you can configure rapid read protection in Cassandra 2.0.2 and later.
Use this property to retry a request after some milliseconds have passed or after a percentile of the typical
read latency has been reached, which is tracked per table.
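For example (the table name is illustrative):

```
-- retry when the coordinator has waited longer than this table's
-- 99th percentile read latency
ALTER TABLE users WITH speculative_retry = '99percentile';

-- or retry after a fixed wait of 10 milliseconds
ALTER TABLE users WITH speculative_retry = '10ms';
```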
Related information
Cassandra 2.0 cassandra.yaml
Cassandra 2.1 cassandra.yaml
Cassandra 2.2 cassandra.yaml
Cassandra 3.0 cassandra.yaml
Compaction subproperties
Constructing a map of the compaction property and its subproperties.
Disabling background compaction can prove harmful: disk space is never reclaimed, and zombies (deleted
data that reappears because tombstones are never purged) are possible. In most cases, although compaction
uses I/O, it is better to leave it enabled.
Compression subproperties
Configuring compression for a table.
Using CQL, you can configure compression for a table by constructing a map of the compression property
and the following subproperties:
sstable_compression
The compression algorithm to use. Valid values are LZ4Compressor, SnappyCompressor, and
DeflateCompressor. Use an empty string ('') to disable compression.
Choosing the right compressor depends on your requirements for space savings over read performance.
LZ4 is fastest to decompress, followed by Snappy, then by Deflate. Compression effectiveness is inversely
correlated with decompression speed. The extra compression from Deflate or Snappy is not enough
to make up for the decreased performance for general-purpose workloads, but for archival data they
may be worth considering. Developers can also implement custom compression classes using the
org.apache.cassandra.io.compress.ICompressor interface. Specify the full class name as a
string constant enclosed in single quotation marks.
Functions
CQL supports functions for transforming one or more column values into a new value.
CQL supports several functions that transform one or more column values into a new value. In addition,
users can define functions and aggregates. The native Cassandra functions are:
• Blob conversion functions
• UUID and Timeuuid functions
• Token function
• WRITETIME function
• TTL function
• Standard aggregate functions such as MIN(), MAX(), SUM(), and AVG().
CQL limits
Upper CQL limits.
cqlsh commands
The cqlsh commands.
cqlsh
Start the CQL interactive terminal.
Synopsis
cqlsh [options] [host [port]]
Description
The Cassandra installation includes the cqlsh utility, a Python-based command-line client for executing
Cassandra Query Language (CQL) commands. The cqlsh command is used on the Linux or Windows
command line to start the cqlsh utility. On Windows, prefix the command with python if the PATH
environment variable does not point to the Python installation.
You can use cqlsh to execute CQL commands interactively. cqlsh supports tab completion. You can also
execute cqlsh commands, such as TRACE.
Requirements
The cqlsh utility uses the native protocol and the DataStax Python driver. The default cqlsh connection
port is 9042.
For more information about configuration, see the Cassandra 3.0 cassandra.yaml or Cassandra 2.2
cassandra.yaml file.
Options
Table: Options
bin/cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.3.0 | CQL spec 3.4.0 | Native protocol v4]
Use HELP for help.
cqlsh> USE testcql;
cqlsh:testcql>
At the cqlsh prompt, type CQL commands. Use a semicolon to terminate a command. A new line does not
terminate a command, so commands can be spread over several lines for clarity.
If a command is sent and executed successfully, results are sent to standard output.
The lexical structure of commands, covered earlier in this reference, includes how upper- and lower-case
literals are treated in commands, when to use quotation marks in strings, and how to enter exponential
notation.
The cqlshrc file can pass default configuration information to cqlsh. Use the cqlshrc file to configure
SSL encryption instead of overriding the SSL_CERTFILE environment variable repeatedly. To help
you create this file, Cassandra includes a cqlshrc.sample file. Copy this file to the hidden .cassandra
directory in your user home folder and rename it to cqlshrc.
cqlshrc options
Configure the cqlshrc file using these options:
• [authentication] options
• [ui] options
• [cql] option
• [connection] options
• [csv] options
• [tracing] options
• [ssl] options
• [certfiles] options
• Common COPY TO and COPY FROM options
• COPY TO options
• COPY FROM options
• COPY options
[authentication] options
Note: Cassandra internal authentication must be configured before users can use the authentication
options. For more information, see Using cqlsh with SSL encryption for your version:
• Apache Cassandra 3.0
• DataStax Distribution of Apache Cassandra 3.x
• Apache Cassandra 3.0 for Windows
[csv] option
field_size_limit
Configures the cqlsh field size. Set to a particular field size, such as field_size_limit = 1000000000.
[tracing] option
max_trace_wait
The maximum number of seconds to wait for a trace to complete.
[ssl] options
certfile
The path to the Cassandra certificate. See Using cqlsh with SSL encryption in the Cassandra documentation
(links above).
validate
Optional. Default: true.
userkey
Must be provided when require_client_auth=true in cassandra.yaml.
usercert
Must be provided when require_client_auth=true in cassandra.yaml.
[certfiles] options
Overrides default certfiles in [ssl] section.
Common COPY TO and COPY FROM options
Also see the Common COPY options for TO and FROM table.
nullval
The string placeholder for null values.
header
For COPY TO, controls whether the first line in the CSV output file contains the column names.
For COPY FROM, specifies whether the first line in the CSV file contains column names.
decimalsep
Set a separator for decimal values.
thousandssep
Set a separator for thousands digit groups. Default: empty string.
boolstyle
Set a representation for boolean values for True and False. The values are case insensitive. Example:
yes,no or 1,0.
numprocesses
Set the number of worker processes. Maximum value is 16.
maxattempts
Set the maximum number of attempts for errors.
reportfrequency
Set the frequency with which status is displayed, in seconds.
ratefile
Specify a file for printing output statistics.
COPY TO options
Also see the COPY TO table.
maxrequests
Set the maximum number of requests each worker process can work on in parallel.
pagesize
Set the page size for fetching results.
pagetimeout
Set the page timeout for fetching results.
begintoken
Set the minimum token string for exporting data.
endtoken
Set the maximum token string for exporting data.
maxoutputsize
Set the maximum size of the output file, measured in number of lines. When set, the output file is split into
segments when the value is exceeded. "-1" sets no maximum.
encoding
The encoding used for characters. The default is UTF8.
COPY FROM options
Also see the COPY FROM table.
ingestrate
Set an approximate ingest rate in rows per second. Must be set to a value greater than the chunk size.
maxrows
Set the maximum number of rows. "-1" sets no maximum.
skiprows
The number of rows to skip.
skipcols
Set a comma-separated list of column names to skip.
maxparseerrors
Set the maximum global number of parsing errors. "-1" sets no maximum.
maxinserterrors
Set the maximum global number of insert errors. "-1" sets no maximum.
errfile
Set a file to store all rows that are not imported. If no value is set, the information is stored in
import_<ks>_<table>.err, where <ks> is the keyspace and <table> is the table name.
maxbatchsize
Set the maximum size of an import batch.
minbatchsize
Set the minimum size of an import batch.
chunksize
Set the size of chunks passed to worker processes.
COPY options
Also see the COPY table.
chunksize
Set the size of chunks passed to worker processes.
ingestrate
Set an approximate ingest rate in rows per second. Must be set to a value greater than the chunk size.
pagetimeout
Set the page timeout for fetching results.
CAPTURE
Captures command output and appends it to a file.
Synopsis
CAPTURE ('<file>' | OFF )
Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
A semicolon that terminates CQL statements is not included in the synopsis.
Description
To start capturing the output of a query, specify the path of the file relative to the current directory. Enclose
the file name in single quotation marks. The shorthand notation in this example is supported for referring to
$HOME.
Examples
CAPTURE '~/mydir/myfile.txt'
Output is not shown on the console while it is captured. Only query result output is captured. Errors and
output from cqlsh-only commands still appear.
To stop capturing output and return to normal display of output, use CAPTURE OFF.
To determine the current capture state, use CAPTURE with no arguments.
CLEAR
Clears cqlsh console screen.
Synopsis
CLEAR | CLS
Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
A semicolon that terminates CQL statements is not included in the synopsis.
Description
Clears the cqlsh console screen. A CTRL+L will also clear the cqlsh console screen.
CONSISTENCY
Shows the current consistency level, or given a level, sets it.
Synopsis
CONSISTENCY level
Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
Description
Providing an argument to the CONSISTENCY command overrides the default consistency level of ONE, and
configures the consistency level for future requests. Valid values are: ANY, ONE, TWO, THREE, QUORUM,
ALL, LOCAL_QUORUM, EACH_QUORUM, SERIAL and LOCAL_SERIAL.
Running the command without an argument shows the current consistency level.
Examples
Use CONSISTENCY without arguments to discover the current consistency level.
cqlsh> CONSISTENCY
If you haven't changed the default, the CONSISTENCY command with no arguments reports a consistency level of ONE.
Use CONSISTENCY with an argument to set the consistency level to a new value.
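For example, to require a quorum of replicas for subsequent reads and writes:

```
cqlsh> CONSISTENCY QUORUM
```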
The consistency level can be set to SERIAL or LOCAL_SERIAL, but only SELECT queries are supported.
While the consistency level is set to either of these options, INSERT and UPDATE commands will fail. See
SERIAL CONSISTENCY to set the serial consistency level.
Note: Consistency level refers to the consistency level for all non-lightweight transaction reads and writes.
Serial consistency level refers to the consistency level of lightweight transaction reads and writes where IF
EXISTS and IF NOT EXISTS are used.
COPY
Imports and exports CSV (comma-separated values) data to and from Cassandra.
Synopsis
COPY table_name ( column, ...)
FROM ( 'file_name1', 'file_name2', ... | STDIN )
WITH option = 'value' AND ...

COPY table_name ( column, ...)
TO ( 'file_name' | STDOUT )
WITH option = 'value' AND ...
Note: COPY uses an argument of one or more comma-separated file names or Python glob expressions.
Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
Description
Using the COPY options in a WITH clause, you can change the CSV format. These tables describe the
available options:
The ENCODING option is available only for the COPY TO command. This table shows that, by default,
Cassandra expects the CSV data to consist of fields separated by commas (,), records separated by
line separators (a newline, \r\n), and field values enclosed in double-quotation marks (""). Also, to avoid
ambiguity, escape a literal double-quotation mark using a backslash inside a string enclosed in double-
quotation marks ("\""). By default, Cassandra does not expect the CSV file to have a header record on the
first line that consists of the column names. COPY TO includes the header in the output if HEADER=TRUE.
COPY FROM ignores the first line if HEADER=TRUE.
COPY FROM is intended for importing small datasets (a few million rows or less) into Cassandra. For
importing larger datasets, use the Cassandra bulk loader.
After inserting data into the table, you can copy the data to a CSV file in another order by specifying the
column names in parentheses after the table name:
COPY airplanes
(name, mach, year, manufacturer)
TO 'temp.csv'
USE test;
TRUNCATE airplanes;
Output is:
Copying collections
Cassandra supports round-trip copying of collections to and from CSV files. To perform this example,
download the sample code now.
1. Unzip the downloaded file named cql_collections.zip.
2. Copy/paste all the CQL commands from the cql_collections.txt file to the cqlsh command line.
3. Take a look at the contents of the songs table. The table contains a map of venues, a list of reviews,
and a set of tags.
 id          | album | artist | data | reviews             | tags               | title | venue
-------------+-------+--------+------+---------------------+--------------------+-------+------------------------------------------------------------------------
 7db1a490... |  null |   null | null | ['hot dance music'] |           {'rock'} |  null | {'2013-09-22...': 'The Fillmore', '2013-10-01...': 'The Apple Barrel'}
 a3e64f8f... |  null |   null | null |                null |  {'1973', 'blues'} |  null | null
 8a172618... |  null |   null | null |                null | {'2007', 'covers'} |  null | null
4. Copy the music.songs table to a CSV file named songs-20140603.csv.
cqlsh> exit;
$ cat songs-20140603.csv
7db1a490...,,,,['hot dance music'],{'rock'},,"{'2013-09-22...': 'The
Fillmore', '2013-10-01....': 'The Apple Barrel'}"
a3e64f8f...,,,,,"{'1973', 'blues'}",,
8a172618...,,,,,"{'2007', 'covers'}",,
6. Start cqlsh again, and create a table definition that matches the data in the songs-20140603.csv file.
DESCRIBE
Provides information about the connected Cassandra cluster, or about the data objects stored in the cluster.
Synopsis
DESCRIBE FULL ( CLUSTER | SCHEMA )
| KEYSPACES
| ( KEYSPACE keyspace_name )
| TABLES
| ( TABLE table_name )
| TYPES
| ( TYPE user_defined_type )
| FUNCTIONS
| ( FUNCTION user_defined_function )
| AGGREGATES
| ( AGGREGATE user_defined_aggregate )
| INDEX
| ( INDEX index_name )
| MATERIALIZED VIEW
| ( MATERIALIZED VIEW view_name )
Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
Description
The DESCRIBE or DESC command outputs information about the connected Cassandra cluster, or about
the objects stored in the cluster. To query the system tables directly, use SELECT.
In Linux the keyspace and table name arguments are case-sensitive and need to match the upper or
lowercase names stored internally. In Windows, the keyspace and table name arguments are not case-
sensitive. Use the DESCRIBE commands to list objects by their internal names. Use DESCRIBE FULL
SCHEMA if you need the schema of system_* keyspaces.
DESCRIBE functions in the following ways:
EXPAND
Formats the output of a query vertically.
Synopsis
EXPAND ( ON | OFF )
Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
Description
This command lists the contents of each row of a table vertically, providing a more convenient way to read
long rows of data than the default horizontal format. You scroll down to see more of the row instead of
scrolling to the right. Each column name appears on a separate line in column one and the values appear
in column two.
cqlsh:my_ks> EXPAND ON
Now printing expanded output
cqlsh:my_ks> SELECT * FROM users;
@ Row 1
------------+----------------------------------------------
userid | samwise
emails | {'samwise@gmail.com', 's@gamgee.com'}
first_name | Samwise
last_name | Gamgee
todo | {'2013-09-22 12:01:00-0700': 'plant trees'}
top_scores | null
@ Row 2
------------+----------------------------------------------
userid | frodo
emails | {'baggins@gmail.com', 'f@baggins.com'}
first_name | Frodo
last_name | Baggins
todo | {'2012-10-02 12:00:00-0700': 'throw my precious into mount
doom'}
top_scores | null
(2 rows)
EXIT
Terminates cqlsh.
Synopsis
EXIT | QUIT
Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
LOGIN
Use LOGIN to switch from one user to another within cqlsh.
Synopsis
cqlsh> LOGIN user_name password
Synopsis Legend
In the synopsis section of each statement, formatting has the following meaning:
• Uppercase means literal
• Lowercase means not literal
Description
Use this command to change login information without requiring cqlsh to restart. Login using a specified
username. If the password is specified, it will be used. Otherwise, you will be prompted to enter the
password.
Examples
Login as the user cutie with the password patootie.
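Based on the synopsis above, the command is:

```
cqlsh> LOGIN cutie patootie
```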
PAGING
Enables or disables query paging.
Synopsis
PAGING ( ON | OFF )
Synopsis Legend
In the synopsis section of each statement, formatting has the following meaning:
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
A semicolon that terminates CQL statements is not included in the synopsis.
Description
In Cassandra 2.1.1 and later, you can use query paging in cqlsh. Paging provides the output of queries in
100-line chunks followed by the more prompt. To get the next chunk, press the space bar. Turning paging on
enables query paging for all further queries. Using no ON or OFF argument with the command shows the
current query paging status.
SERIAL CONSISTENCY
Sets the current serial consistency level.
Sets the current serial consistency level that will be used for lightweight transactions. The serial
consistency level is different from the consistency level for non-lightweight transaction commands.
Synopsis
SERIAL CONSISTENCY level
Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
Description
Providing an argument to the SERIAL CONSISTENCY command sets the serial consistency level. Valid
values are: SERIAL and LOCAL_SERIAL.
Examples
Use SERIAL CONSISTENCY without an argument to see the serial consistency level.
[APPLIED] | foo
------+----
FALSE | foo
SHOW
Shows the Cassandra version, host, or tracing information for the current cqlsh client session.
Synopsis
SHOW VERSION
| HOST
| SESSION tracing_session_id
Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
Description
A SHOW command displays this information about the current cqlsh client session:
• The version and build number of the connected Cassandra instance, as well as the CQL specification
version for cqlsh and the native protocol version used by the connected Cassandra instance.
• The host information of the Cassandra node that the cqlsh session is currently connected to.
• Tracing information for the current cqlsh session.
The SHOW SESSION command retrieves tracing session information, which is available for 24 hours.
After that time, the tracing information time-to-live expires.
These examples show how to use the commands.
 activity                                                                                                                           | timestamp    | source    | source_elapsed
------------------------------------------------------------------------------------------------------------------------------------+--------------+-----------+----------------
 execute_cql3_query                                                                                                                 | 12:19:52,372 | 127.0.0.1 |              0
 Parsing CREATE TABLE emp (\n empID int,\n deptID int,\n first_name varchar,\n last_name varchar,\n PRIMARY KEY (empID, deptID)\n); | 12:19:52,372 | 127.0.0.1 |            153
SOURCE
Executes a file containing CQL statements.
Synopsis
SOURCE 'file'
Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
Description
To execute the contents of a file, specify the path of the file relative to the current directory. Enclose the
file name in single quotation marks. The shorthand notation in this example is supported for referring to
$HOME:
Examples
SOURCE '~/mydir/myfile.txt'
The output for each statement, if there is any, appears in turn, including any error messages. Errors do not
abort execution of the file.
Alternatively, use the --file option to execute a file while starting CQL.
TRACING
Enables or disables request tracing.
Synopsis
TRACING ( ON | OFF )
Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
Description
To turn tracing read/write requests on or off, use the TRACING command. After turning on tracing,
database activity creates output that may help you understand Cassandra internal operations and
troubleshoot performance problems. For example, using the tracing tutorial you can see how different
consistency levels affect operations. Some tracing messages refer to internals of the database that can be
provided to those troubleshooting Cassandra issues.
For 24 hours, Cassandra saves the tracing information in sessions and events tables in the
system_traces keyspace, which you query when using probabilistic tracing. For information about
probabilistic tracing, see Cassandra 3.0 documentation or Cassandra 2.2 documentation.
The source_elapsed column stores the elapsed time in microseconds before the event occurred on the
source node.
To keep tracing information, copy the data in the sessions and events tables to another location. Alternatively,
use the tracing session id to retrieve session information using SHOW SESSION. Tracing session
information expires after one day.
TRACING ON
Cassandra provides a description of each step it takes to satisfy the request, the names of nodes that are
affected, the time for each step, and the total time for the request.
Output is:
The tracing output of this read request looks something like this (a few rows have been truncated to fit on
this page):
CQL commands
CQL commands.
ALTER KEYSPACE
Change property values of a keyspace.
Synopsis
ALTER ( KEYSPACE | SCHEMA ) keyspace_name
WITH REPLICATION = map
| ( WITH DURABLE_WRITES = ( true | false ))
AND ( DURABLE_WRITES = ( true | false))
Synopsis legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
A semicolon that terminates CQL statements is not included in the synopsis.
Description
ALTER KEYSPACE changes the map that defines the replica placement strategy and/or the
DURABLE_WRITES value. You can also use the alias ALTER SCHEMA. Use these properties and values
to construct the map. To set the replica placement strategy, construct a map of properties and values, as
shown in the table of map properties on the CREATE KEYSPACE reference page. CQL property map keys
must be lower case. For example, class and replication_factor are correct.
You cannot change the name of the keyspace.
Example
Change the definition of the mykeyspace keyspace to use NetworkTopologyStrategy in a single data center.
Use the default data center name in Cassandra and a replication factor of 3.
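A sketch of such a statement, assuming the default data center name datacenter1:

```
ALTER KEYSPACE mykeyspace WITH REPLICATION = {
  'class' : 'NetworkTopologyStrategy',
  'datacenter1' : 3
};
```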
ALTER MATERIALIZED VIEW
Change the properties of a materialized view.
Synopsis
ALTER MATERIALIZED VIEW keyspace_name.view_name ( WITH property AND
property ... )
property is a CQL table property and value, such as speculative_retry = '10ms'. Enclose a string
property in single quotation marks.
Synopsis legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
A semicolon that terminates CQL statements is not included in the synopsis.
Description
ALTER MATERIALIZED VIEW manipulates the view metadata. You can change view properties. No
results are returned.
You can qualify view names by keyspace. For example, to alter the cyclist_by_age in the cycling
keyspace:
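A hedged sketch (the property value is illustrative):

```
ALTER MATERIALIZED VIEW cycling.cyclist_by_age
  WITH comment = 'Cyclists grouped by age';
```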
Changing caching
In Cassandra 2.1 and later, you create and change the caching options using a property map. For example,
change the keys option from ALL, the default, to NONE, and change the rows_per_partition option to 15.
Note: In Cassandra 3.0 and later, cold_reads_to_omit is removed for SizeTieredCompactionStrategy.
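Such a change can be sketched as follows, using the cycling.cyclist_by_age view mentioned above:

```
ALTER MATERIALIZED VIEW cycling.cyclist_by_age
  WITH caching = { 'keys' : 'NONE', 'rows_per_partition' : 15 };
```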
ALTER ROLE
Alter a role.
Synopsis
ALTER ROLE role_name
WITH PASSWORD = 'password'
[ [ AND | WITH ] LOGIN = ( true | false ) ]
Synopsis legend
Markup Indicates
BOLD UPPERCASE Literal String
Italics String to be replaced with a specific value for this
operation
non-boldface Parameter with its own set of elements
Square brackets [] Optional element or set of elements
Pipe/vertical line ( | ) between elements Set of options, only one can be used
Ellipsis ( … ) Preceding value can be repeated
Semi-colon (;) Terminates the command
Description
The role used to alter roles must have ALTER permission, either directly or on ALL ROLES. To alter a
superuser role, a superuser role must be used. To prevent disabling all superusers, superusers cannot
change their own superuser status. Ordinary roles can change only their own password.
Enclose the role name in single quotation marks if it contains non-alphanumeric characters. Enclose the
password in single quotation marks.
Examples
ALTER ROLE coach WITH PASSWORD 'bestTeam';
ALTER TABLE
Modify the column metadata of a table.
Synopsis
ALTER TABLE keyspace_name.table_name instruction
instruction is:
cql_type is compatible with the original type and is a CQL type, other than a collection or counter.
Exceptions: ADD supports a collection type and also, if the table is a counter, a counter type.
property is a CQL table property and value, such as speculative_retry = '10ms'. Enclose a string property in
single quotation marks.
Synopsis legend
• Uppercase means literal
Description
ALTER TABLE manipulates the table metadata. You can change the data storage type of columns, add
new columns, drop existing columns, and change table properties. No results are returned. You can also
use the alias ALTER COLUMNFAMILY.
First, specify the name of the table to be changed after the ALTER TABLE keywords, followed by the type
of change: ALTER, ADD, DROP, RENAME, or WITH. Next, provide the rest of the needed information, as
explained in the following sections.
You can qualify table names by keyspace. For example, to alter the addamsFamily table in the monsters
keyspace:
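For example, a sketch of such a statement (the gravesite column and its type are illustrative):

```sql
-- Qualify the table name with its keyspace; the session keyspace is unchanged.
ALTER TABLE monsters.addamsFamily ADD gravesite varchar;
```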
Changing the type of a column
The column must already exist in current rows. The bytes stored in values for that column remain
unchanged, and if existing data cannot be deserialized according to the new type, your CQL driver or
interface might report errors.
These changes to a column type are not allowed:
• Changing the type of a clustering column.
• Changing columns on which an index is defined.
Altering the type of a column after inserting data can confuse CQL drivers/tools if the new type is
incompatible with the data.
Adding a column
To add a column, other than a column of a collection type, to a table, use ALTER TABLE and the ADD
keyword in the following way:
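For example (the table and column names here are illustrative):

```sql
-- Add a regular column; existing rows get a null value for it.
ALTER TABLE users ADD coupon_code varchar;
```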
Dropping a column
To drop a column from the table, use ALTER TABLE and the DROP keyword. Dropping a column removes
the column from the table.
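For example (the names are illustrative):

```sql
-- Drop a regular, non-primary-key column.
ALTER TABLE users DROP coupon_code;
```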
ALTER DROP removes the column from the table definition, removes data pertaining to that column,
and eventually reclaims the space formerly used by the column. The column is unavailable for querying
immediately after being dropped. The actual data removal occurs during compaction; data is not included
in SSTables in the future. To force the removal of dropped columns before compaction occurs, use the
nodetool upgradesstables command followed by an ALTER TABLE statement, which updates the table
metadata to register the drop.
After re-adding a dropped column, a query does not return values written before the column was last
dropped. Do not re-add a dropped column to a table that uses client-supplied timestamps, because such
timestamps are not Cassandra-generated write times.
You cannot drop columns from tables defined with the COMPACT STORAGE option.
Renaming a column
The main purpose of the RENAME clause is to change the names of CQL-generated primary key and
column names that are missing from a legacy table. Primary key columns can be renamed. You cannot
rename an indexed column or a static column (static columns are supported in Cassandra 2.0.6 and later).
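For example, a sketch (the table and column names are illustrative):

```sql
-- Only primary key columns can be renamed.
ALTER TABLE users RENAME userid TO user_id;
```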
Modifying table properties
To change the table storage properties established when the table was created, use ALTER TABLE and
the WITH keyword. Enclose a text property value in single quotation marks. You cannot modify properties
of a table having compact storage.
Change the values of the caching property. For example, change the keys option from ALL, the default, to
NONE and change the rows_per_partition to 15.
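A sketch of such a statement (the table name is illustrative):

```sql
-- Cassandra 2.1 and later: caching options are set with a property map.
ALTER TABLE users WITH caching = { 'keys': 'NONE', 'rows_per_partition': 15 };
```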
Note: In Cassandra 3.0 and later, cold_reads_to_omit is removed for SizeTieredCompactionStrategy.
Changing caching
In Cassandra 2.1, you create and change the caching options using a property map.
In Cassandra 2.0.x, you alter the caching options using the WITH directive.
ALTER TYPE
Modify a user-defined type. Cassandra 2.1 and later.
Synopsis
ALTER TYPE type_name instruction
instruction is:
ALTER field_name TYPE new_type
| ADD field_name cql_type
| RENAME field_name TO new_field_name ( AND field_name TO new_field_name ... )
Synopsis legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
A semicolon that terminates CQL statements is not included in the synopsis.
Description
ALTER TYPE can change a user-defined type in the following ways:
• Change the type of an existing field.
• Append a new field to an existing type.
• Rename a field defined by the type.
First, after the ALTER TYPE keywords, specify the name of the user-defined type to be changed, followed
by the type of change: ALTER, ADD, or RENAME. Next, provide the rest of the needed information, as
explained in the following sections.
Clustering column data types are very restricted in the options for alteration, because clustering column
data is used to order rows in tables. Indexed columns cannot be altered.
To add a collection map field called point_release in this example that represents the release date and
decimal designator, use this syntax:
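A sketch of the statement (the type name version and the map's key/value types are assumptions):

```sql
-- Append a map field to an existing user-defined type.
ALTER TYPE version ADD point_release map<timestamp, decimal>;
```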
ALTER USER
Alter existing user options.
Synopsis
ALTER USER user_name
WITH PASSWORD 'password'
[ SUPERUSER | NOSUPERUSER ]
Synopsis legend
Markup: Indicates
BOLD UPPERCASE: Literal string
Italics: String to be replaced with a specific value for this operation
non-boldface: Parameter with its own set of elements
Square brackets ([]): Optional element or set of elements
Pipe/vertical line (|) between elements: Set of options; only one can be used
Ellipsis (…): Preceding value can be repeated
Semi-colon (;): Terminates the command
Description
Superusers can change a user's password or superuser status. To prevent disabling all superusers,
superusers cannot change their own superuser status. Ordinary users can change only their own
password. Enclose the user name in single quotation marks if it contains non-alphanumeric characters.
Enclose the password in single quotation marks. See CREATE ROLE for more information about
SUPERUSER and NOSUPERUSER.
Examples
Alter a user's password:
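For example (the user name and password are illustrative):

```sql
ALTER USER moss WITH PASSWORD 'bestReceiver';
```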
BATCH
Write multiple DML statements.
Synopsis
Cassandra 2.1 and later. Note that the UNLOGGED option is deprecated from Cassandra 3.0.
Cassandra 2.0.x
dml_statement is:
• INSERT
• UPDATE
• DELETE
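In general form, a batch wraps the DML statements between BEGIN BATCH and APPLY BATCH (a sketch; UNLOGGED and COUNTER are mutually exclusive, and UNLOGGED is deprecated from Cassandra 3.0):

```sql
BEGIN ( UNLOGGED | COUNTER ) BATCH
  ( USING TIMESTAMP epoch_microseconds )
  dml_statement;
  dml_statement;
  ...
APPLY BATCH;
```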
Synopsis legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
A semicolon that terminates CQL statements is not included in the synopsis.
Description
A BATCH statement combines multiple data modification language (DML) statements (INSERT, UPDATE,
DELETE) into a single logical operation, and sets a client-supplied timestamp for all columns written by the
statements in the batch. Batching multiple statements can save network exchanges between the client/
server and server coordinator/replicas. Cassandra spreads requests across nearby nodes as much as
possible to optimize performance, but batch statements can produce an uneven workload, accessing many
nodes or a single node repeatedly.
Using batches to optimize performance is generally not successful, as described in the Using and misusing
batches topic. Using batches to synchronize data to multiple tables, however, is a legitimate operation. For
information about the fastest way to load data, see "Cassandra: Batch loading without the Batch keyword."
Batches are logged by default. In the context of a Cassandra batch operation, logged batches means that
if any of the batch succeeds, all of it will. To achieve this, Cassandra first writes the serialized batch to the
batchlog system table that consumes the serialized batch as blob data. When the rows in the batch have
been successfully written and persisted (or hinted) the batchlog data is removed. There is a performance
penalty associated with the batchlog, as it is written to two other nodes. If you do not want to incur this
penalty, prevent Cassandra from writing to the batchlog system by using the UNLOGGED keyword.
Note: UNLOGGED BATCH is deprecated starting in Cassandra 3.0.
Although a logged batch guarantees that if any part of the batch succeeds, all of it will, no other
transactional enforcement is done at the batch level. For example, there is no batch isolation. Clients are
able to read the first updated rows from the batch, while other rows are still being updated on the server.
However, transactional row updates within a partition key are isolated: clients cannot read a partial update.
Statement order does not matter within a batch; Cassandra applies the same timestamp to all rows. Use
client-supplied timestamps to achieve a particular order.
Using a timestamp
BATCH supports setting a client-supplied timestamp, an integer, in the USING clause with one exception:
if a DML statement in the batch contains a compare-and-set (CAS) statement, such as the following
statement, do not attempt to use a timestamp:
cqlsh> INSERT INTO users (id, lastname) VALUES (999, 'Sparrow') IF NOT
EXISTS
The timestamp applies to all statements in the batch. If not specified, the current time of the insertion (in
microseconds) is used. The individual DML statements inside a BATCH can specify a timestamp if one is
not specified in the USING clause.
For example, specify a timestamp in an INSERT statement.
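A sketch, assuming a purchases table keyed by a user column (the table, column names, and values are illustrative; the WRITETIME function returns the timestamp of a value):

```sql
BEGIN BATCH USING TIMESTAMP 19998889022757000
  INSERT INTO purchases (user, balance) VALUES ('user1', -8);
APPLY BATCH;

-- Retrieve the write time alongside the value:
SELECT balance, WRITETIME(balance) AS writetime_balance
FROM purchases WHERE user = 'user1';
```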
balance | writetime_balance
---------+-------------------
-8 | 19998889022757000
A continuation of this example shows how to use a static column with conditional updates in batch.
CREATE AGGREGATE
Create user-defined aggregate.
Synopsis
CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS] <keyspace>.aggregate-name
( <arg-name> <arg-type> )
SFUNC <state-function-name>
STYPE <type>
FINALFUNC <final-function-name>
INITCOND ( <value>,<value> )
Synopsis Legend
In the synopsis section of each statement, formatting has the following meaning:
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
A semicolon that terminates CQL statements is not included in the synopsis.
Description
A user-defined aggregate function is created from existing user-defined functions. It returns a single value:
the aggregate computed over all rows returned by a query.
Examples
Create an aggregate that calculates a user-defined average.
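A sketch of such an aggregate; the function names (avgState, avgFinal) and the tuple state layout are assumptions:

```sql
-- State function: accumulates a (count, sum) tuple per row.
CREATE OR REPLACE FUNCTION avgState ( state tuple<int,bigint>, val int )
  CALLED ON NULL INPUT RETURNS tuple<int,bigint> LANGUAGE java AS
  'if (val != null) { state.setInt(0, state.getInt(0) + 1); state.setLong(1, state.getLong(1) + val.intValue()); } return state;';

-- Final function: divides the sum by the count.
CREATE OR REPLACE FUNCTION avgFinal ( state tuple<int,bigint> )
  CALLED ON NULL INPUT RETURNS double LANGUAGE java AS
  'if (state.getInt(0) == 0) return null; return Double.valueOf((double) state.getLong(1) / state.getInt(0));';

-- The aggregate ties the two functions together.
CREATE OR REPLACE AGGREGATE average ( int )
  SFUNC avgState STYPE tuple<int,bigint> FINALFUNC avgFinal INITCOND (0, 0);
```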
CREATE CUSTOM INDEX (SASI)
Define a SASI index on a single column of a table.
In Cassandra 3.4 and later, a new type of index is available: the SSTable Attached Secondary Index
(SASI). SASI indexing and querying improve on the existing secondary index implementation, offering
superior performance for queries that previously required the use of ALLOW FILTERING. SASI uses
significantly fewer memory, disk, and CPU resources. It enables querying with PREFIX and CONTAINS on
strings, similar to the SQL implementation of LIKE = "foo*" or LIKE = "*foo*".
A SASI index can be created on a single column of a table. For more information about SASI, see Using
SASI.
Synopsis
CREATE CUSTOM INDEX IF NOT EXISTS index_name
ON keyspace_name.table_name ( column_name )
USING 'org.apache.cassandra.index.sasi.SASIIndex' (WITH OPTIONS = map)
index_name is an identifier, enclosed or not enclosed in double quotation marks, excluding reserved
words.
WITH OPTIONS identifies the characteristics of the SASI index.
KEYWORD: USE
mode: PREFIX, CONTAINS, or SPARSE. Default: PREFIX.
analyzed: true or false. Indicates whether the literal column is analyzed using the specified analyzer.
analyzer_class: Specifies an analyzer class. Two classes are available:
org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer and
org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer.
is_literal: Designates a column as literal.
max_compaction_flush_memory_in_mb
Synopsis legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
A semicolon that terminates CQL statements is not included in the synopsis.
Description
CREATE CUSTOM INDEX creates a new SASI index on the specified table for column specified after
the ON keyword. You can optionally define a name for the index before ON. Add the column name in
parentheses after ON. It is not necessary for the column to exist on any current rows when the index is
created, but the column and its data type must be defined when the table is created, or added afterward by
altering the table.
You can specify the name of the table's keyspace before the table name, with period between them. If
you do, Cassandra creates the index on a table in the specified keyspace, without changing the session's
current keyspace; otherwise, Cassandra creates the index on the table within the current keyspace.
If you try to create a custom index that already exists, the command returns an error message. Exception:
if you use the IF NOT EXISTS option, the statement is a no-op if the index already exists.
The SASI mode PREFIX is the default, and does not need to be specified.
The SASI mode SPARSE must be specified. This mode is used for dense number columns that store
timestamps or millisecond sensor readings.
Using the non-tokenizing analyzer is a method to specify case sensitivity or character case normalization
without analyzing the specified column.
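Two sketches (the table, column, and index names are illustrative; the case_sensitive option is an assumed NonTokenizingAnalyzer option):

```sql
-- PREFIX is the default mode and need not be specified.
CREATE CUSTOM INDEX fn_prefix ON cyclist_name (firstname)
USING 'org.apache.cassandra.index.sasi.SASIIndex';

-- Non-tokenizing analyzer: case-insensitive matching without analyzing the column.
CREATE CUSTOM INDEX fn_contains ON cyclist_name (firstname)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
  'mode': 'CONTAINS',
  'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
  'case_sensitive': 'false' };
```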
CREATE INDEX
Define a new index on a single column of a table.
Define a new index on a single column of a table. To create indexes using SASI, see CREATE CUSTOM
INDEX.
Synopsis
CREATE CUSTOM INDEX IF NOT EXISTS index_name
ON keyspace_name.table_name ( KEYS ( column_name ) )
(USING class_name) (WITH OPTIONS = map)
Restrictions: Using class_name is allowed only if CUSTOM is used and class_name is a string literal
containing a java class name.
index_name is an identifier, enclosed or not enclosed in double quotation marks, excluding reserved
words.
map is a map collection, a JSON-style array of literals:
{ literal : literal, literal : literal ... }
Synopsis legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
A semicolon that terminates CQL statements is not included in the synopsis.
Description
CREATE INDEX creates a new index on the given table for the named column. Attempting to create an
already existing index will return an error unless the IF NOT EXISTS option is used. If you use the option,
the statement will be a no-op if the index already exists. Optionally, specify a name for the index itself
before the ON keyword. Enclose a single column name in parentheses. It is not necessary for the column
to exist on any current rows. The column and its data type must be specified when the table is created, or
added afterward by altering the table.
You can use dot notation to specify a keyspace for the table: keyspace name followed by a period followed
by the name of the table. Cassandra creates the index on the table in the specified keyspace, but does not
change the current keyspace; if you do not use a keyspace name, Cassandra creates the index for the
table within the current keyspace.
If data already exists for the column, Cassandra indexes the data during the execution of this statement.
After the index is created, Cassandra indexes new data for the column automatically when new data is
inserted.
Cassandra supports creating an index on most columns, excluding counter columns but including a
clustering column of a compound primary key or on the partition (primary) key itself. Cassandra 2.1 and
later supports creating an index on a collection or the key of a collection map. Cassandra rejects an
attempt to create an index on both the collection key and value. In Cassandra 3.4 and later, static columns
can be indexed.
Indexing can impact performance greatly. Before creating an index, be aware of when and when not to
create an index.
Cassandra supports creating a custom index, which is primarily for internal use, and options that apply to
the custom index. For example:
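A sketch; 'path.to.the.IndexClass' stands in for a real index implementation class:

```sql
CREATE CUSTOM INDEX ON users (email) USING 'path.to.the.IndexClass';
```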
In Cassandra 3.4 or later, a new custom SASI index has been added that has many advantages.
If the collection is a map, Cassandra can create an index on map values. Assume the users table contains
this map data from the example of a todo map:
The map key, the timestamp, is located to the left of the colon, and the map value, 'die', is located to the
right of the colon. Indexes can be created on both map keys and map entries.
To query the table, you can use CONTAINS KEY in WHERE clauses.
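A sketch, assuming the todo column is a map keyed by timestamp (the index name is illustrative):

```sql
-- Index the keys of the todo map.
CREATE INDEX todo_dates ON users (KEYS(todo));

-- Query for rows whose map contains a given key.
SELECT * FROM users WHERE todo CONTAINS KEY '2014-10-2 12:10';
```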
To index the full content of a frozen collection, use the FULL keyword and the collection name in nested
parentheses. For example, index the frozen list rnumbers.
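A sketch (the table and index names are illustrative; rnumbers must be a frozen list):

```sql
CREATE INDEX rnumbers_idx ON users (FULL(rnumbers));
```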
CREATE FUNCTION
Create user-defined function.
Synopsis
CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS] <keyspace>.function-name
( <arg-name> <arg-type> )
(CALLED | RETURNS NULL) ON NULL INPUT
RETURNS <type>
LANGUAGE <language>
AS <body>
Synopsis Legend
In the synopsis section of each statement, formatting has the following meaning:
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
A semicolon that terminates CQL statements is not included in the synopsis.
Description
Create or replace a user-defined function (UDF) using Java or JavaScript. By
default, UDFs include Java generic methods where possible.
Examples
Compute logarithm of an input value. CALLED ON NULL INPUT ensures that the function will always be
executed.
Compute logarithm of an input value. Return NULL if the input argument is NULL.
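Two sketches (the function names fLog and fLog2 are illustrative):

```sql
-- CALLED ON NULL INPUT: the body runs even when the argument is null.
CREATE OR REPLACE FUNCTION fLog ( input double )
  CALLED ON NULL INPUT RETURNS double LANGUAGE java AS
  'return Double.valueOf(Math.log(input.doubleValue()));';

-- RETURNS NULL ON NULL INPUT: a null argument short-circuits to a null result.
CREATE OR REPLACE FUNCTION fLog2 ( input double )
  RETURNS NULL ON NULL INPUT RETURNS double LANGUAGE java AS
  'return Double.valueOf(Math.log(input.doubleValue()));';
```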
CREATE KEYSPACE
Define a new keyspace and its replica placement strategy.
Synopsis
CREATE ( KEYSPACE | SCHEMA ) IF NOT EXISTS keyspace_name
WITH REPLICATION = map
AND DURABLE_WRITES = ( true | false )
Synopsis legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
A semicolon that terminates CQL statements is not included in the synopsis.
Description
CREATE KEYSPACE creates a top-level namespace and sets the keyspace name, replica placement
strategy class, replication factor, and DURABLE_WRITES options for the keyspace. For information about
the replica placement strategy, see Cassandra 2.2 replica placement strategy or Cassandra 2.1 replica
placement strategy.
When you configure NetworkTopologyStrategy as the replication strategy, you set up one or more virtual
data centers. Alternatively, you use the default data center. Use the same names for data centers as
those used by the snitch. For information about the snitch, see Cassandra 2.2 snitch documentation or
Cassandra 2.1 snitch documentation.
You assign different nodes, depending on the type of workload, to separate data centers. For example,
assign Hadoop nodes to one data center and Cassandra real-time nodes to another. Segregating
workloads ensures that only one type of workload is active per data center. The segregation prevents
incompatibility problems between workloads, such as different batch requirements that affect performance.
A map of properties and values defines the two different types of keyspaces:
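The two keyspace types differ only in their replication map, sketched here (the replica counts and data center names are illustrative):

```sql
{ 'class': 'SimpleStrategy', 'replication_factor': 3 }
{ 'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 2 }
```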
CQL property map keys must be lower case. For example, class and replication_factor are correct.
Keyspace names are 48 or fewer alpha-numeric characters and underscores, the first of which is an alpha
character. Keyspace names are case-insensitive. To make a name case-sensitive, enclose it in double
quotation marks.
You can use the alias CREATE SCHEMA instead of CREATE KEYSPACE. Attempting to create an
already existing keyspace will return an error unless the IF NOT EXISTS option is used. If the option is
used, the statement will be a no-op if the keyspace already exists.
Using SimpleStrategy is fine for evaluating Cassandra. For production use or for use with mixed workloads,
use NetworkTopologyStrategy.
$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address    Load      Tokens  Owns    Host ID                               Rack
UN 127.0.0.1  46.59 KB  256     100.0%  dd867d15-6536-4922-b574-e22e75e46432  rack1
Cassandra uses datacenter1 as the default data center name. Create a keyspace named NTSkeyspace on
a single node cluster, for example:
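A sketch of the statement:

```sql
-- datacenter1 is the default data center name on a single-node cluster.
CREATE KEYSPACE NTSkeyspace
  WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', 'datacenter1': 1 };
```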
To use NetworkTopologyStrategy with data centers in a production environment, you need to change the
default snitch, SimpleSnitch, to a network-aware snitch, define one or more data center names in the snitch
properties file, and use those data center name(s) to define the keyspace; otherwise, Cassandra will fail to
find a node to complete a write request, such as inserting data into a table.
After configuring Cassandra to use a network-aware snitch, such as the PropertyFileSnitch, you define
data center and rack names in the cassandra-topology.properties file.
Construct the CREATE KEYSPACE statement using NetworkTopologyStrategy for the class value in the
map. Set one or more key-value pairs consisting of the data center name and number of replicas per data
center, separated by a colon and enclosed in curly brackets. For example:
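A sketch (the keyspace name is illustrative):

```sql
CREATE KEYSPACE "Excalibur"
  WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 2 };
```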
This example sets three replicas for a data center named dc1 and two replicas for a data center named
dc2. The data center name you use depends on the cluster-configured snitch you are using. There is a
correlation between the data center name defined in the map and the data center name as recognized
by the snitch you are using. The nodetool status command prints out data center names and rack
locations of your nodes if you are not sure what they are.
Setting DURABLE_WRITES
You can set the DURABLE_WRITES option after the map specification of the CREATE KEYSPACE
command. When set to false, data written to the keyspace bypasses the commit log. Be careful using this
option because you risk losing data. Do not set this attribute on a keyspace using the SimpleStrategy.
system | True | org.apache.cassandra.locator.LocalStrategy | {}
system_traces | True | org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"1"}

(5 rows)
Cassandra converted the excelsior keyspace to lowercase because quotation marks were not used to
create the keyspace, and retained the initial capital letter for Excalibur because quotation marks were
used.
Related information
Cassandra 2.2 replication strategy
Cassandra 2.1 replication strategy
Cassandra 2.2 snitch configuration
Cassandra 2.1 snitch configuration
Cassandra 2.2 property file snitch
Cassandra 2.1 property file snitch
CREATE MATERIALIZED VIEW
Define a new materialized view. Cassandra 3.0 and later.
Synopsis
CREATE MATERIALIZED VIEW [IF NOT EXISTS] keyspace_name.view_name AS SELECT
column1, column2, ... FROM
keyspace_name.table_name WHERE column1 IS NOT NULL AND column2 IS NOT
NULL ... PRIMARY KEY(column1, column2, ...)
Synopsis legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
A semicolon that terminates CQL statements is not included in the synopsis.
Description
In Cassandra 3.0 and later, CREATE MATERIALIZED VIEW will create a new materialized view from a
specified base table.
To create a materialized view, certain requirements must be met.
• The columns of the original table's primary key must be part of the materialized view's primary key.
• Only one new column may be added to the materialized view's primary key.
Examples
A materialized view is created from a table. The fields used in the primary key must be constrained to non-
NULL values in the WHERE phrase.
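A sketch, assuming a base table cycling.cyclist_mv with primary key cid and a regular column age (names are illustrative):

```sql
-- age becomes the new leading primary key column of the view.
CREATE MATERIALIZED VIEW cycling.cyclist_by_age AS
  SELECT age, name, cid FROM cycling.cyclist_mv
  WHERE age IS NOT NULL AND cid IS NOT NULL
  PRIMARY KEY (age, cid);
```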
CREATE TABLE
Define a new table.
Synopsis
CREATE TABLE IF NOT EXISTS <keyspace_name>.table_name
( column_definition(, column_definition, ...))
WITH property (AND property ...) | option (AND option ...)
column_definition is:
column_name cql_type (STATIC) (PRIMARY KEY)
Note: Frozen collections can be used for primary key columns. Non-frozen collections cannot be used for
primary key columns.
Restrictions:
• There should always be exactly one primary key definition.
• cql_type of the primary key must be a CQL data type or a user-defined type.
• cql_type of a collection uses this syntax:
LIST<cql_type>
| SET<cql_type>
| MAP<cql_type, cql_type>
The primary key definition uses one of these forms:
column_name
| ( column_name1, column_name2, column_name3 ...)
| ((column_name4, column_name5), column_name6, column_name7 ...)
option is:
COMPACT STORAGE
| ID <id>
| CLUSTERING ORDER BY ( clustering_column ASC | DESC ... )
property is a CQL table property, enclosed in single quotation marks in the case of strings, or one of these
directives:
Synopsis legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
A semicolon that terminates CQL statements is not included in the synopsis.
Description
The CREATE TABLE command creates a new table under the current keyspace.
The IF NOT EXISTS keywords may be used in creating a table. Attempting to create an existing table
returns an error unless the IF NOT EXISTS option is used. If the option is used, the statement is a no-op if
the table already exists.
A static column can store the same data in multiple clustered rows of a partition, and then retrieve that data
with a single SELECT statement.
You can add a counter column to a counter table.
Defining a column
You assign a type to columns during table creation. Column types, other than collection-type columns, are
specified as a parenthesized, comma-separated list of column name and type pairs.
This example shows how to create a table that includes collection columns and a tuple.
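A sketch of such a table (all names and types are illustrative):

```sql
CREATE TABLE cycling.route (
  race_id int,
  race_name text,
  point_id int,
  lat_long tuple<text, tuple<float, float>>,  -- a tuple column
  teams set<text>,                            -- a collection column
  PRIMARY KEY (race_id, point_id)
);
```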
See "Creating a user-defined type" for information on creating UDTs. This example shows the use of a
UDT in a table.
In Cassandra 3.6 and later, UDTs can be created unfrozen if only non-collection fields are used in the
user-defined type creation. If the table is created with an unfrozen UDT, then individual field values can be
updated and deleted.
The only schema information that must be defined for a table is the primary key and its associated data
type. Unlike earlier versions, CQL does not require a column in the table that is not part of the primary key.
A primary key can have any number (1 or more) of component columns.
If the primary key consists of only one column, you can use the keywords, PRIMARY KEY, after the
column definition:
Alternatively, you can declare the primary key consisting of only one column in the same way as you
declare a compound primary key. Do not use a counter column for a key.
Using the optional WITH clause and keyword arguments, you can configure caching, compaction, and
a number of other operations that Cassandra performs on a new table. You can use the WITH clause
to specify the properties of tables listed in CQL table properties, including caching, table comments,
compression, and compaction. Format the property as either a string or a map. Enclose a string property in
single quotation marks. For example, to embed a comment in a table, you format the comment as a string
property:
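A sketch (the table definition is illustrative):

```sql
CREATE TABLE MonkeyTypes (
  block_id uuid PRIMARY KEY,
  species text
) WITH comment = 'Important biological records';
```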
To specify using compact storage or clustering order use the WITH clause.
To configure caching in Cassandra 2.1, you also use a property map.
// Cassandra 2.1
To configure caching in Cassandra 2.0.x, you do not use a property map. Simply set the caching property
to a value:
A compound primary key consists of more than one column and treats the first column declared in a
definition as the partition key. To create a compound primary key, use the keywords, PRIMARY KEY,
followed by the comma-separated list of column names enclosed in parentheses.
A composite partition key is a partition key consisting of multiple columns. You use an extra set of
parentheses to enclose columns that make up the composite partition key. The columns within the primary
key definition but outside the nested parentheses are clustering columns. These columns form logical sets
inside a partition to facilitate retrieval.
For example, the composite partition key consists of block_id and breed. The clustering columns, color and
short_hair, determine the clustering order of the data. Generally, Cassandra will store columns having the
same block_id but a different breed on different nodes, and columns having the same block_id and breed
on the same node.
You can order query results to make use of the on-disk sorting of columns. You can order results in
ascending or descending order. The ascending order will be more efficient than descending. If you need
results in descending order, you can specify a clustering order to store columns on disk in the reverse
order of the default. Descending queries will then be faster than ascending ones.
The following example shows a table definition that changes the clustering order to descending by insertion
time.
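A sketch of such a definition (the table and column names are illustrative):

```sql
CREATE TABLE timeseries (
  event_type text,
  insertion_time timestamp,
  event blob,
  PRIMARY KEY (event_type, insertion_time)
) WITH CLUSTERING ORDER BY (insertion_time DESC);
```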
In a table that uses clustering columns, non-clustering columns can be declared static in the table
definition. Static columns are only static within a given partition.
CREATE TABLE t (
k text,
s text STATIC,
i int,
PRIMARY KEY (k, i)
);
INSERT INTO t (k, s, i) VALUES ('k', 'I''m shared', 0);
INSERT INTO t (k, s, i) VALUES ('k', 'I''m still shared', 1);
SELECT * FROM t;
Output is:
k | s                  | i
---+--------------------+---
k | "I'm still shared" | 0
k | "I'm still shared" | 1
Restrictions
• A table that does not define any clustering columns cannot have a static column. The table having no
clustering columns has a one-row partition in which every column is inherently static.
• A table defined with the COMPACT STORAGE directive cannot have a static column.
• A column designated to be the partition key cannot be static.
You can batch conditional updates to a static column.
In Cassandra 2.0.9 and later, you can use the DISTINCT keyword to select static columns. In this case,
Cassandra retrieves only the beginning (static column) of the partition.
CREATE TRIGGER
Registers a trigger on a table.
Synopsis
CREATE TRIGGER IF NOT EXISTS trigger_name ON table_name
USING 'java_class'
Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
A semicolon that terminates CQL statements is not included in the synopsis.
Description
The implementation of triggers includes the capability to register a trigger on a table using the familiar
CREATE TRIGGER syntax. The Trigger API is semi-private and subject to change.
In Cassandra 2.1 and later, you need to enclose trigger names that use uppercase characters in
double quotation marks. The logic comprising the trigger can be written in any Java (JVM) language
and exists outside the database. The Java class in this example that implements the trigger is named
org.apache.cassandra.triggers.InvertedIndex and is defined in an Apache repository. The trigger defined
on a table fires before a requested DML statement occurs, to ensure the atomicity of the transaction.
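A sketch of the statement (the trigger and table names are illustrative; InvertedIndex is the example trigger class from the Apache repository):

```sql
CREATE TRIGGER myTrigger ON users
USING 'org.apache.cassandra.triggers.InvertedIndex';
```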
Place the custom trigger code (JAR) in the triggers directory on every node. The custom JAR loads at
startup. The location of triggers directory depends on the installation:
• Cassandra 2.0.x tarball: install_location/lib/triggers
• Cassandra 2.1.x tarball: install_location/conf/triggers
• Datastax Enterprise 4.5 and later: Installer-No Services and tarball: install_location/
resources/cassandra/conf/triggers
• Datastax Enterprise 4.5 and later: Installer-Services and packages: /etc/dse/cassandra/
triggers
Cassandra 2.1.1 and later supports lightweight transactions for creating a trigger. Attempting to create
an existing trigger returns an error unless the IF NOT EXISTS option is used. If the option is used, the
statement is a no-op if the trigger already exists.
CREATE TYPE
Create a user-defined type in Cassandra 2.1 and later.
Synopsis
CREATE TYPE IF NOT EXISTS keyspace.type_name
( field_name type, field_name type, ...)
The type_name is a type identifier other than the reserved type names. The field_name is an arbitrary
identifier for the field. The type is a CQL collection or non-collection type other than a counter type.
Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
A semicolon that terminates CQL statements is not included in the synopsis.
Description
A user-defined type is one or more typed fields. A user-defined type facilitates handling multiple fields of
related information, such as personal information: birthday, nationality, weight, and height. Attempting to
create an already existing type will return an error unless the IF NOT EXISTS option is used. If the option is
used, the statement will be a no-op if the type already exists.
Cassandra creates the type in the specified keyspace, but does not change the current keyspace. If you
do not specify a keyspace, Cassandra creates the type within the current keyspace. To specify a keyspace
for the UDT other than the current keyspace, use dot notation: the keyspace name followed by a period,
followed by the name of the type.
CQL reference
Example
This example creates a user-defined type cycling.basic_info that consists of personal data about an
individual cyclist.
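A sketch of such a type; the exact field list is an assumption based on the description above:

CREATE TYPE cycling.basic_info (
  birthday timestamp,
  nationality text,
  weight text,
  height text
);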
After defining the UDT, you can create a table using the UDT. CQL collection columns and other columns
support the use of user-defined types, as shown in Using CQL examples.
CREATE ROLE
Create a new role.
Synopsis
CREATE ROLE [ IF NOT EXISTS ] role_name
[ WITH PASSWORD = 'password' ]
[ [ AND | WITH ] LOGIN = true | false ]
[ [ AND | WITH ] SUPERUSER = true | false ]
[ [ AND | WITH ] OPTIONS = map_literal ]
Synopsis legend
Markup Indicates
BOLD UPPERCASE Literal String
Italics String to be replaced with a specific value for this
operation
non-boldface Parameter with its own set of elements
Square brackets [] Optional element or set of elements
Pipe/vertical line ( | ) between elements Set of options, only one can be used
Ellipsis ( … ) Preceding value can be repeated
Semi-colon (;) Terminates the command
Description
CREATE ROLE defines a new database role. By default, roles do not have superuser status or login
privileges. Use SUPERUSER or LOGIN to assign these privileges to a role. A superuser or a user with
CREATE permissions can issue CREATE ROLE requests. Use NOSUPERUSER to revoke superuser status
from a role. The user cassandra exists as a default superuser; after creating a non-default superuser role,
alter the cassandra user to a non-superuser.
Role accounts are used for logging in under internal authentication and authorization. If internal
authentication is not enabled, or login is not authorized, WITH PASSWORD is not necessary.
Enclose the role name in single quotation marks if it contains non-alphanumeric characters. If a role exists,
it cannot be recreated. To change the superuser status or password, use ALTER ROLE.
CREATE ROLE coach WITH PASSWORD = 'All4One2day!' AND LOGIN = true AND
SUPERUSER = true;
If internal authentication has not been set up, you do not need the WITH PASSWORD clause:
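For example, a sketch of creating a login-only role with no password:

CREATE ROLE coach WITH LOGIN = true;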
CREATE USER
Create a new user.
Synopsis
CREATE USER [ IF NOT EXISTS ] user_name
WITH PASSWORD 'password'
[ SUPERUSER | NOSUPERUSER ]
Synopsis legend
Markup Indicates
BOLD UPPERCASE Literal String
Italics String to be replaced with a specific value for this
operation
non-boldface Parameter with its own set of elements
Square brackets [] Optional element or set of elements
Pipe/vertical line ( | ) between elements Set of options, only one can be used
Ellipsis ( … ) Preceding value can be repeated
Semi-colon (;) Terminates the command
Description
CREATE USER defines a new database user account. By default, user accounts do not have superuser
status. Only a superuser can issue CREATE USER requests. See CREATE ROLE for more information
about SUPERUSER and NOSUPERUSER.
User accounts are required for logging in under internal authentication and authorization.
Enclose the user name in single quotation marks if it contains non-alphanumeric characters. You cannot
recreate an existing user. To change the superuser status or password, use ALTER USER.
If internal authentication has not been set up, WITH PASSWORD is not required.
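A sketch of creating a superuser account; the user name and password are hypothetical:

CREATE USER akers WITH PASSWORD 'Niner27' SUPERUSER;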
DELETE
Removes entire rows or one or more columns from one or more rows.
Synopsis
DELETE [ column_name [ , column_name ] [ … ] | column_name [ term ] ]
FROM [ keyspace_name. ] table_name
[ USING TIMESTAMP timestamp_value ]
WHERE row_specification
[ IF [ EXISTS | condition [ AND condition ] [ … ] ] ]
Synopsis legend
Markup Indicates
BOLD UPPERCASE Literal String
Italics String to be replaced with a specific value for this
operation
non-boldface Parameter with its own set of elements
Square brackets [] Optional element or set of elements
Pipe/vertical line ( | ) between elements Set of options, only one can be used
Ellipsis ( … ) Preceding value can be repeated
Semi-colon (;) Terminates the command
Description
DELETE removes one or more columns from one or more rows in a table, or removes entire rows.
Cassandra applies deletions within the same partition atomically and in isolation.
When a column is deleted, it is not removed from disk immediately. The deleted column is marked with
a tombstone and then removed after the configured grace period has expired. The optional timestamp
defines the new tombstone record.
Parameters:
column_name
The name of the column to be deleted, or a list of column names
term
If column_name refers to a collection (a list or map), the parameter in parentheses indicates the term in the
collection to be deleted. For each collection type, use term to specify:
list: the index of the element in the list (the first item in the list has index 0, and so on)
map: the key of the element to be deleted
keyspace_name
The name of the keyspace containing the table from which data will be deleted. Not needed if the keyspace
has been set for the session with the USE command.
table_name
The name of the table from which data will be deleted
timestamp_value
If a TIMESTAMP is specified, the command only deletes elements older than the timestamp_value (which
match the WHERE and optional IF conditions).
row_specification
The WHERE clause must identify the row or rows to be deleted by primary key:
• To specify one row, use primary_key_name = primary_key_value. If the primary key is a
combination of elements, follow this with AND primary_key_name = primary_key_value ....
In Cassandra 2.2, the WHERE clause must specify a value for every component of the primary key. In
Cassandra 3.0 and later, range deletion may be done using inequality operators for clustering columns
in the primary key.
When no columns are listed after DELETE, the command deletes the entire row or rows specified in the
WHERE clause.
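A minimal sketch of deleting a whole row by primary key; the table and key value are hypothetical:

DELETE FROM cycling.cyclist_name
WHERE id = e7ae5cf3-d358-4d99-b900-85902fda9bb0;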
Without IF EXISTS, the command proceeds with no standard output. If IF EXISTS returns true (if a row
with this primary key does exist), standard output displays a table like the following:
If no such row exists, however, the condition returns FALSE and the command fails. In this case, standard
output looks like:
Use IF condition to apply tests to one or more column values in the selected row:
If all the conditions return TRUE, standard output is the same as if IF EXISTS returned true (see above).
If any of the conditions fails, standard output displays False in the [applied] column and also displays
information about the condition that failed:
Conditional deletions incur a non-negligible performance cost and should be used sparingly.
To delete more than one row, use the keyword IN and supply a list of values in parentheses, separated by
commas:
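For example, a sketch of a multi-row delete, assuming empID is the partition key of a hypothetical emp table:

DELETE FROM emp WHERE empID IN (105, 107, 104);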
In Cassandra 2.0 and later, CQL supports an empty list of values in the IN clause, useful in Java Driver
applications.
To delete an element from a list, specify the column_name followed by the list index position in square
brackets:
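For example, deleting the first element of a hypothetical sponsors list column:

DELETE sponsors[0] FROM cycling.cyclist_history
WHERE id = e7ae5cf3-d358-4d99-b900-85902fda9bb0;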
DESCRIBE
Provides information about the connected Cassandra cluster, or about the data objects stored in the cluster.
Synopsis
DESCRIBE FULL ( CLUSTER | SCHEMA )
| KEYSPACES
| ( KEYSPACE keyspace_name )
| TABLES
| ( TABLE table_name )
| TYPES
| ( TYPE user_defined_type )
| FUNCTIONS
| ( FUNCTION user_defined_function )
| AGGREGATES
| ( AGGREGATE user_defined_aggregate )
| INDEX
| ( INDEX index_name )
| MATERIALIZED VIEW
| ( MATERIALIZED VIEW view_name )
Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
Description
The DESCRIBE or DESC command outputs information about the connected Cassandra cluster, or about
the objects stored in the cluster. To query the system tables directly, use SELECT.
In Linux the keyspace and table name arguments are case-sensitive and need to match the upper or
lowercase names stored internally. In Windows, the keyspace and table name arguments are not
case-sensitive. Use the DESCRIBE commands to list objects by their internal names. Use DESCRIBE FULL
SCHEMA if you need the schema of system_* keyspaces.
DESCRIBE functions in the following ways:
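For example, two common forms; the table name is hypothetical:

DESCRIBE KEYSPACES;
DESCRIBE TABLE cycling.cyclist_name;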
DROP AGGREGATE
Drop a user-defined aggregate.
Synopsis
DROP AGGREGATE IF EXISTS keyspace_name.aggregate_name
Synopsis Legend
In the synopsis section of each statement, formatting has the following meaning:
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
A semicolon that terminates CQL statements is not included in the synopsis.
Description
Drop a user-defined aggregate.
Examples
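A minimal sketch; the aggregate name is hypothetical:

DROP AGGREGATE IF EXISTS cycling.average;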
DROP FUNCTION
Drop user-defined function (UDF).
Synopsis
DROP FUNCTION IF EXISTS keyspace_name.function_name
Synopsis Legend
In the synopsis section of each statement, formatting has the following meaning:
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
A semicolon that terminates CQL statements is not included in the synopsis.
Description
Drop a user-defined function from a keyspace.
Examples
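A minimal sketch; the function name is hypothetical:

DROP FUNCTION IF EXISTS cycling.fLog;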
DROP INDEX
Drop the named index.
Synopsis
DROP INDEX IF EXISTS keyspace.index_name
Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
A semicolon that terminates CQL statements is not included in the synopsis.
Description
A DROP INDEX statement removes an existing index. If the index was not given a name during creation,
the index name is <table_name>_<column_name>_idx. If the index does not exist, the statement returns
an error, unless IF EXISTS is used, in which case the operation is a no-op. You can use dot notation
to specify a keyspace for the index you want to drop: the keyspace name followed by a period, followed by
the name of the index. Cassandra drops the index in the specified keyspace, but does not change the current
keyspace; otherwise, if you do not use a keyspace name, Cassandra drops the index for the table within
the current keyspace.
Example
DROP INDEX user_state;
DROP KEYSPACE
Remove the keyspace.
Synopsis
DROP ( KEYSPACE | SCHEMA ) IF EXISTS keyspace_name
Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
A semicolon that terminates CQL statements is not included in the synopsis.
Description
A DROP KEYSPACE statement results in the immediate, irreversible removal of a keyspace, including all
tables and data contained in the keyspace. You can also use the alias DROP SCHEMA. If the keyspace
does not exist, the statement returns an error unless IF EXISTS is used, in which case the operation is
a no-op.
Cassandra takes a snapshot of the keyspace before dropping it. In Cassandra 2.0.4 and earlier, the user
was responsible for removing the snapshot manually.
Example
DROP KEYSPACE MyTwitterClone;
DROP MATERIALIZED VIEW
Remove the named materialized view.
Synopsis
DROP MATERIALIZED VIEW IF EXISTS keyspace_name.view_name
Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
A semicolon that terminates CQL statements is not included in the synopsis.
Description
A DROP MATERIALIZED VIEW statement results in the immediate, irreversible removal of a materialized
view, including all data contained in the materialized view.
Example
cqlsh> DROP MATERIALIZED VIEW cycling.cyclist_by_age;
DROP ROLE
Remove a role.
Synopsis
DROP ROLE [ IF EXISTS ] role_name
Synopsis legend
Markup Indicates
BOLD UPPERCASE Literal String
Italics String to be replaced with a specific value for this
operation
non-boldface Parameter with its own set of elements
Square brackets [] Optional element or set of elements
Pipe/vertical line ( | ) between elements Set of options, only one can be used
Ellipsis ( … ) Preceding value can be repeated
Semi-colon (;) Terminates the command
Description
DROP ROLE removes an existing role. In Cassandra 2.2 and later, you can test that the role exists.
Attempting to drop a role that does not exist results in an invalid query condition unless the IF EXISTS
option is used. If the option is used, the statement will be a no-op if the role does not exist.
The role used to drop roles must have DROP permission, either directly or on ALL ROLES. To drop a
superuser role, a superuser role must be used. A role cannot drop the role it is currently logged in as.
Enclose the role name in single quotation marks only if it contains non-alphanumeric characters.
Examples
DROP ROLE IF EXISTS coach;
DROP TABLE
Remove the named table.
Synopsis
DROP TABLE IF EXISTS keyspace_name.table_name
Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
Description
A DROP TABLE statement results in the immediate, irreversible removal of a table, including all data
contained in the table. You can also use the alias DROP COLUMNFAMILY.
Example
DROP TABLE worldSeriesAttendees;
DROP TRIGGER
Removes registration of a trigger.
Synopsis
DROP TRIGGER IF EXISTS trigger_name ON table_name
Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
A semicolon that terminates CQL statements is not included in the synopsis.
Description
The DROP TRIGGER statement removes the registration of a trigger created using CREATE TRIGGER.
Cassandra 2.1.1 and later supports the IF EXISTS syntax for dropping a trigger. Cassandra checks for the
existence of the trigger before dropping it.
The Trigger API is semi-private and subject to change.
DROP TYPE
Drop a user-defined type. Cassandra 2.1 and later.
Synopsis
DROP TYPE IF EXISTS type_name
Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
Description
This statement immediately and irreversibly removes a type. Attempting to drop a type that does not exist
returns an error unless the IF EXISTS option is used. If the option is used, the statement is a no-op if the
type does not exist. Dropping a user-defined type that is in use by a table or another type is not allowed.
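A minimal sketch, reusing the cycling.basic_info type from the CREATE TYPE example:

DROP TYPE IF EXISTS cycling.basic_info;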
DROP USER
Remove a user.
Notice: DROP USER is supported for backwards compatibility. Authentication and authorization for
Cassandra 2.2 and later are based on ROLES, and DROP ROLE should be used.
Synopsis
DROP USER [ IF EXISTS ] user_name
Synopsis legend
Markup Indicates
BOLD UPPERCASE Literal String
Italics String to be replaced with a specific value for this
operation
non-boldface Parameter with its own set of elements
Square brackets [] Optional element or set of elements
Pipe/vertical line ( | ) between elements Set of options, only one can be used
Ellipsis ( … ) Preceding value can be repeated
Semi-colon (;) Terminates the command
Description
DROP USER removes an existing user. Attempting to drop a user that does not exist results in an invalid
query condition unless the IF EXISTS option is used. If the option is used, the statement will be a no-op
if the user does not exist. A user must have appropriate permission to issue a DROP USER statement.
Users cannot drop themselves.
Enclose the user name in single quotation marks only if it contains non-alphanumeric characters.
Examples
Drop a user if the user exists:
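A sketch; the user name is hypothetical:

DROP USER IF EXISTS boone;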
Drop a user:
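A sketch; the user name is hypothetical:

DROP USER boone;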
GRANT
Provide access to database objects.
Synopsis
GRANT [ ALL |
CREATE | ALTER | DROP | SELECT | MODIFY | AUTHORIZE | DESCRIBE | EXECUTE ]
ON [ ALL KEYSPACES | KEYSPACE keyspace_name |
TABLE table_name |
ALL ROLES | ROLE role_name |
ALL FUNCTIONS [ IN KEYSPACE keyspace_name ] | FUNCTION function_name ]
TO role_name
Synopsis legend
Markup Indicates
BOLD UPPERCASE Literal String
Italics String to be replaced with a specific value for this
operation
non-boldface Parameter with its own set of elements
Square brackets [] Optional element or set of elements
Pipe/vertical line ( | ) between elements Set of options, only one can be used
Ellipsis ( … ) Preceding value can be repeated
Semi-colon (;) Terminates the command
Description
Permissions to access all keyspaces, a named keyspace, or a table can be granted to a role. Enclose the
role name in single quotation marks if it contains non-alphanumeric characters.
This table lists the permissions needed to use CQL statements:
To perform SELECT queries on a table, SELECT permission on the table, on its parent keyspace, or on
ALL KEYSPACES is required. To perform CREATE TABLE, CREATE permission on its parent keyspace
or ALL KEYSPACES is required. A role with SUPERUSER status or AUTHORIZE permission on a resource
(or one of its parents in the hierarchy) must also have the permission in question to GRANT or REVOKE
that permission for a role. GRANT, REVOKE and LIST permissions check for the existence of the table and
keyspace before execution. GRANT and REVOKE check that the user exists.
Examples
Give the role coach permission to perform SELECT queries on all tables in all keyspaces:
Give the role manager permission to perform INSERT, UPDATE, DELETE and TRUNCATE queries on all
tables in the field keyspace.
Give the role coach permission to perform ALTER KEYSPACE queries on the cycling keyspace, and also
ALTER TABLE, CREATE INDEX and DROP INDEX queries on all tables in cycling keyspace:
Give the role coach permission to run all types of queries on cycling.name table.
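These statements can be sketched as follows; the exact permission choices are assumptions based on the descriptions above (MODIFY covers INSERT, UPDATE, DELETE and TRUNCATE; ALTER on a keyspace covers ALTER KEYSPACE, ALTER TABLE, CREATE INDEX and DROP INDEX on its tables):

GRANT SELECT ON ALL KEYSPACES TO coach;
GRANT MODIFY ON KEYSPACE field TO manager;
GRANT ALTER ON KEYSPACE cycling TO coach;
GRANT ALL PERMISSIONS ON cycling.name TO coach;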
INSERT
Add or update a row.
Synopsis
INSERT INTO [ keyspace_name. ] table_name
( column_name [ , column_name ] [ … ] )
VALUES ( column_value [ , column_value ] [ … ] ) [ IF NOT EXISTS ]
[ USING [ TTL time_value ] | [ TIMESTAMP timestamp_value ] ]
Synopsis Legend
This markup: Indicates:
BOLD UPPERCASE String to be typed literally
Italics String to be replaced with a specific value for this
operation
non-boldface Component with its own set of elements
Square brackets ( [ ] )enclosing an element Optional element or set of elements
Pipe symbol ( | ) between elements Set of options, only one can be used
Ellipsis ( … ) The preceding element can be repeated
The semicolon ( ; ) that must terminate each CQL command is not included in the synopsis.
Description
INSERT writes a row to a Cassandra table atomically and in isolation. The command must specify the row
by primary key. If the table already contains a row with the same primary key, the command writes (or
overwrites) its values to the existing row. (Exception: if the command includes IF NOT EXISTS, and the
row already exists, the command fails.)
INSERT does not return any result unless it includes IF NOT EXISTS. The command requires a value
for each component of the primary key, but not for any of the row's other columns. Any column which is
defined for the row but not populated with a value takes up no space on disk.
Internally, INSERT and UPDATE perform the same operation. A key difference: UPDATE supports counters,
INSERT does not.
Parameters
keyspace_name
The name of the keyspace containing the table into which data will be inserted. Not needed if the keyspace
has been set for the session with the USE command.
table_name
The name of the table into which data will be inserted
column_name
A list of one or more columns into which data will be inserted
column_value
For each column listed in the INSERT INTO list, the VALUES list must contain a corresponding value. The
values must occur in the same order as the column names.
A column_value can be one of:
• a literal
• a set: { literal [ , ... ] }
• a list: [ literal [ , ... ] ]
time_value
The value for TTL is a number of seconds. Column values in a command marked with TTL are automatically
marked as deleted (with a tombstone) after the specified amount of time has expired. The TTL applies to
the marked column values, not the column itself. Any subsequent update of the column resets the TTL to
the TTL specified in the update. By default, values never expire. You cannot set a time_value for data in
a counter column.
timestamp_value
If the optional TIMESTAMP parameter is used, the inserted column is marked with its value – a timestamp
in microseconds. If a TIMESTAMP value is not set, Cassandra uses the time (in microseconds) that the
write occurred to the column.
Note: IF NOT EXISTS and USING TIMESTAMP cannot be used in the same INSERT.
Insert a map named teams that lists two recent team memberships for the user VOS.
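A sketch of such an insert, assuming a hypothetical users table with a map column teams keyed by year:

INSERT INTO users (id, teams)
VALUES ('VOS', { 2014 : 'Team A', 2015 : 'Team B' });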
Without IF NOT EXISTS, the command proceeds with no standard output. If IF NOT EXISTS returns true
(if there is no row with this primary key), standard output displays a table like the following:
If, however, the row does already exist, the command fails, and standard out displays a table with the value
false in the [applied] column, and the values that were not inserted, as in the following example:
Note: Using IF NOT EXISTS incurs a performance hit associated with using Paxos internally. For
information about Paxos, see Cassandra 3.0 documentation.
Related information
Cassandra 2.1 tunable consistency
Cassandra 2.0 tunable consistency
The "Example of a music service" section described the playlists table. This example shows how to insert
data into that table.
Procedure
Use the INSERT command to insert UUIDs for the compound primary keys, title, artist, and album data
of the playlists table.
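A sketch of one such insert; the UUIDs and song details are hypothetical:

INSERT INTO playlists (id, song_order, song_id, title, artist, album)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204, 1,
        a3e64f8f-bd44-4f28-b8d9-6938726e34d4,
        'La Grange', 'ZZ Top', 'Tres Hombres');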
LIST PERMISSIONS
List permissions granted to a role.
Synopsis
LIST [ ALL PERMISSIONS |
[ CREATE | ALTER | DROP | SELECT | MODIFY | AUTHORIZE | DESCRIBE | EXECUTE ]
PERMISSION ]
[ ON [ ALL KEYSPACES | KEYSPACE keyspace_name |
TABLE table_name |
ALL ROLES | ROLE role_name |
ALL FUNCTIONS [ IN KEYSPACE keyspace_name ] | FUNCTION function_name ] ]
[ OF role_name [ NORECURSIVE ] ]
Synopsis legend
Markup Indicates
BOLD UPPERCASE Literal String
Italics String to be replaced with a specific value for this
operation
non-boldface Parameter with its own set of elements
Square brackets [] Optional element or set of elements
Pipe/vertical line ( | ) between elements Set of options, only one can be used
Ellipsis ( … ) Preceding value can be repeated
Semi-colon (;) Terminates the command
Description
Permissions checks are recursive. If you omit the NORECURSIVE specifier, permissions on the requested
resource and its parents in the hierarchy are shown. The LIST command requires DESCRIBE permission.
• Omitting the resource name (ALL KEYSPACES, keyspace, or table), lists permissions on all tables and
all keyspaces.
• Omitting the role name lists permissions of all roles. You need to be a superuser to list permissions of
all roles; if you are not, you must add OF <my_rolename>.
• Enclose the role name in single quotation marks only if it contains non-alphanumeric characters.
After creating roles and granting the permissions in the GRANT examples, you can list permissions that
roles have on resources and their parents.
Example
List all permissions given to coach:
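A sketch of that query:

LIST ALL PERMISSIONS OF coach;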
LIST ROLES
List existing roles.
Synopsis
LIST ROLES [ OF 'role_name' ] [ NORECURSIVE ]
Synopsis legend
Markup Indicates
BOLD UPPERCASE Literal String
Italics String to be replaced with a specific value for this
operation
non-boldface Parameter with its own set of elements
Square brackets [] Optional element or set of elements
Pipe/vertical line ( | ) between elements Set of options, only one can be used
Ellipsis ( … ) Preceding value can be repeated
Semi-colon (;) Terminates the command
Description
The LIST ROLES command displays roles, their superuser status, and their login privileges. A user must have
DESCRIBE permission to generate a list.
Example
Show all the roles that exist. Sufficient privileges are required to show this information.
LIST ROLES;
Show the roles for a particular role. Sufficient privileges are required to show this information.
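For example, using the coach role from the earlier examples:

LIST ROLES OF coach;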
LIST USERS
List existing users and their superuser status.
Synopsis
LIST USERS
Synopsis legend
Markup Indicates
BOLD UPPERCASE Literal String
Italics String to be replaced with a specific value for this
operation
non-boldface Parameter with its own set of elements
Square brackets [] Optional element or set of elements
Pipe/vertical line ( | ) between elements Set of options, only one can be used
Ellipsis ( … ) Preceding value can be repeated
Semi-colon (;) Terminates the command
Description
Assuming you use internal authentication, created users with the command CREATE USER, and have not
yet changed the default user, the following example shows the output of LIST USERS.
Example
List the current users:
LIST USERS;
Output is:
name | super
-----------+-------
cassandra | True
boone | False
akers | True
spillman | False
LOGIN
Use LOGIN to switch from one user to another within cqlsh.
Synopsis
cqlsh> LOGIN user_name [ password ]
Synopsis Legend
In the synopsis section of each statement, formatting has the following meaning:
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
Description
Use this command to change login information without restarting cqlsh. Log in using the specified
user name. If the password is specified, it is used. Otherwise, you are prompted to enter the
password.
Examples
Login as the user cutie with the password patootie.
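The command is:

LOGIN cutie patootie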
REVOKE
Deny access to database objects.
Synopsis
REVOKE [ ALL |
CREATE | ALTER | DROP | SELECT | MODIFY | AUTHORIZE | DESCRIBE | EXECUTE ]
ON [ ALL KEYSPACES | KEYSPACE keyspace_name |
TABLE table_name |
ALL ROLES | ROLE role_name |
ALL FUNCTIONS [ IN KEYSPACE keyspace_name ] | FUNCTION function_name ]
FROM role_name
Synopsis legend
Markup Indicates
BOLD UPPERCASE Literal String
Italics String to be replaced with a specific value for this
operation
non-boldface Parameter with its own set of elements
Square brackets [] Optional element or set of elements
Pipe/vertical line ( | ) between elements Set of options, only one can be used
Ellipsis ( … ) Preceding value can be repeated
Semi-colon (;) Terminates the command
Description
Permissions to access all keyspaces, a named keyspace, or a table can be revoked from a role. Enclose
the role name in single quotation marks if it contains non-alphanumeric characters.
The table in GRANT lists the permissions needed to use CQL statements:
Example
REVOKE SELECT ON cycling.name FROM manager;
The role manager can no longer perform SELECT queries on the cycling.name table. Exceptions: Because
of inheritance, the user can perform SELECT queries on cycling.name if one of these conditions is met:
• The user is a superuser.
• The user has SELECT on ALL KEYSPACES permissions.
• The user has SELECT on the cycling keyspace.
The role coach can no longer perform GRANT, ALTER or REVOKE commands on all roles.
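A hedged sketch of a statement with that effect; whether AUTHORIZE alone covers these commands is an assumption:

REVOKE AUTHORIZE ON ALL ROLES FROM coach;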
SELECT
Retrieve data from a Cassandra table. For retrieving data using a SSTable Attached Secondary Index, see
Using SASI.
Synopsis
SELECT select_expression
FROM [ keyspace_name. ] table_name
[ WHERE relation [ AND relation ] [ … ] ]
[ ORDER BY clustering_column [ ASC | DESC ] [ … ] ]
[ LIMIT n ]
[ ALLOW FILTERING ]
Synopsis legend
Markup Indicates
BOLD UPPERCASE Literal String
Italics String to be replaced with a specific value for this
operation
non-boldface Parameter with its own set of elements
Square brackets [] Optional element or set of elements
Pipe/vertical line ( | ) between elements Set of options, only one can be used
Ellipsis ( … ) Preceding value can be repeated
Semi-colon (;) Terminates the command
Description
A SELECT statement returns one or more rows from a Cassandra table. The WHERE clause introduces
matching conditions that specify the rows to be returned. ORDER BY tells Cassandra how to sort these
rows.
Parameters
select_expression
Specifies the columns (or computed values) to be returned.
Options:
• *
operator
The logical symbol that specifies the relationship between the two sides of the relation. Cassandra supports
the following operators:
= | < | > | <= | >= | CONTAINS | CONTAINS KEY
term
• a constant: string, number, uuid, boolean, hex
• a function
• a collection:
• set
term-tuple
Specifying columns
The columns referenced in the SELECT clause must exist in the target table.
Columns in big data applications duplicate values. Use the DISTINCT keyword to return only distinct
(different) values of partition keys.
You can program Cassandra to perform calculations on returned values by specifying functions that
operate on the columns being returned. For details, see Retrieving aggregate values.
SELECT event_id,
dateOf(created_at) AS creation_date,
blobAsText(content) AS content
FROM timeline;
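The LIMIT clause caps the number of returned rows; a sketch with a hypothetical users table:

SELECT * FROM users LIMIT 50000;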
Even if the query matched 105,291 rows, Cassandra would only return the first 50,000.
The cqlsh shell has a default row limit of 10,000. The Cassandra server and native protocol do not limit
the number of returned rows, but they apply a timeout to prevent malformed queries from causing system
instability.
Now, to get the top two racers in every race year and race name, use the following command with PER
PARTITION LIMIT 2.
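A sketch of such a query, assuming a table partitioned by race year and race name with rank as a clustering column:

SELECT rank, cyclist_name FROM cycling.rank_by_year_and_name
PER PARTITION LIMIT 2;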
The id column is included in the partition key, so the WHERE clause can use the id column:
Restriction: a relation that references the partition key can only use an equality operator (= or IN). For
more details about the IN operator, see Examples below.
Filtering on a clustering column
You can use a relation on a clustering column only if it is preceded by relations that reference all the
elements of the partition key.
Example:
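A hedged sketch of a query producing output like the following, assuming race_points is a clustering column of a hypothetical cycling.cyclist_points table:

SELECT sum(race_points) FROM cycling.cyclist_points
WHERE id = e3b19ec4-774a-4d1c-9e5a-decec1e30aac AND race_points > 7;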
Output:
system.sum(race_points)
-------------------------
195
(1 rows)
In Cassandra 3.6 and later, it is possible to filter on a non-indexed clustering column only if ALLOW
FILTERING is included. The table definition is included in this example to show that race_start_date is
a clustering column without a secondary index.
Example:
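A hedged sketch of such a query; the table name and date are hypothetical:

SELECT * FROM cycling.calendar
WHERE race_start_date = '2015-06-13' ALLOW FILTERING;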
It is possible to combine the partition key and a clustering column in a single relation. For details, see
Comparing clustering columns.
Filtering on indexed columns
A WHERE clause in a SELECT on an indexed table must include at least one equality relation to the
indexed column. For details, see Indexing a column.
Using the IN relation operator
Use IN, an equals condition operator, to specify multiple possible values for a column. For example, select
two columns, first_name and last_name, from three rows having employee ids (primary key) 105, 107,
or 104:
SELECT first_name, last_name FROM emp WHERE empID IN (105, 107, 104);
CREATE TABLE parts (part_type text, part_name text, part_num int, part_year
text, serial_num text, PRIMARY KEY ((part_type, part_name), part_num,
part_year));
When using IN, you can omit the equality test for clustering columns other than the last. This usage
may require ALLOW FILTERING, so its performance can be unpredictable. For example:
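Using the parts table defined above, a sketch that omits the equality test on part_num and applies IN to the last clustering column, part_year (the literal values are illustrative):

```sql
-- Skips the part_num clustering column; may need ALLOW FILTERING
SELECT * FROM parts
WHERE part_type = 'alloy' AND part_name = 'hubcap'
AND part_year IN ('2010', '2015') ALLOW FILTERING;
```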
CQL supports an empty list of values in the IN clause, useful in Java Driver applications when passing
empty arrays as arguments for the IN clause.
When not to use IN
Under most conditions, using IN in relations on the partition key is not recommended. To process a list of
values, the SELECT may have to query many nodes, which degrades performance. For example, in a single
local datacenter cluster with 30 nodes, a replication factor of 3, and a consistency level of LOCAL_QUORUM,
a single partition key query goes out to two nodes. If the SELECT uses the IN condition, the operation can
involve more nodes, up to 20, depending on where the keys fall in the token range.
Using IN for clustering columns is safer. See Cassandra Query Patterns: Not using the “in” query for
multiple partitions for additional logic about using IN.
Filtering on collections
You can query a table containing a collection to retrieve the collection in its entirety. You can also index
the collection column, and then use the CONTAINS condition in the WHERE clause to filter the data for a
particular value in the collection, or CONTAINS KEY to filter by key. This example features a collection of tags
in the playlists table. After indexing the tags, you can filter on 'blues' in the tags set.
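A sketch of the index and the CONTAINS filter, assuming a playlists table with a set column named tags:

```sql
-- Index the set values, then filter for a particular value
CREATE INDEX ON playlists (tags);
SELECT album, tags FROM playlists WHERE tags CONTAINS 'blues';
```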
After indexing the music venue map, you can filter on map values, such as 'The Fillmore':
After indexing the collection keys in the venues map, you can filter on map keys.
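A sketch of both map filters, assuming a map column named venue in the playlists table (the key value is illustrative):

```sql
-- Index the map values, then filter with CONTAINS
CREATE INDEX ON playlists (venue);
SELECT artist, venue FROM playlists WHERE venue CONTAINS 'The Fillmore';

-- Index the map keys, then filter with CONTAINS KEY
CREATE INDEX ON playlists (KEYS(venue));
SELECT artist, venue FROM playlists
WHERE venue CONTAINS KEY '2013-09-22 22:00:00-0700';
```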
This query finds all cyclists who are 23 years old based on their entry in the blist map of the table
birthday_list.
The following SELECT finds any cyclist who has 39 Pro wins, 7 Grand Tour starts, and 14 Classic starts in
a frozen list.
Range relations
The TOKEN function may be used for range queries on the partition key.
Cassandra supports greater-than and less-than comparisons, but for a given partition key, the conditions
on the clustering column are restricted to the filters that allow Cassandra to select a contiguous set of rows.
For example:
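A sketch of the stewards query described below, assuming a ruling_stewards table in which king and reign_start are clustering columns (the literal values are illustrative):

```sql
-- Equality on king, range on reign_start
SELECT * FROM ruling_stewards
WHERE king = 'none'
AND reign_start >= 2450
AND reign_start < 2500 ALLOW FILTERING;
```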
This query constructs a filter that selects data about stewards whose reign started by 2450 and ended
before 2500. If king were not a component of the primary key, you would need to create an index on king
to use this query:
The output:
(2 rows)
To allow Cassandra to select a contiguous set of rows, the WHERE clause must apply an equality
condition to the king component of the primary key. The ALLOW FILTERING clause is also required.
ALLOW FILTERING provides the capability to query the clustering columns using any condition when the
query is not part of a production application.
CAUTION:
Only use ALLOW FILTERING for development! When you attempt a potentially expensive query, such as
searching a range of rows, this prompt appears:
To run the query, use the ALLOW FILTERING clause, and restrict the output to n rows using the LIMIT n
clause. This is recommended to reduce memory used. For example:
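A sketch combining ALLOW FILTERING with LIMIT, using the same assumed ruling_stewards table:

```sql
-- Restrict the output to 10 rows to reduce the memory used
SELECT * FROM ruling_stewards
WHERE king = 'none'
AND reign_start >= 2450
AND reign_start < 2500
LIMIT 10 ALLOW FILTERING;
```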
Using LIMIT does not prevent all problems caused by ALLOW FILTERING. In this example, if no entries
match the condition on king, the SELECT scans the entire table, no matter what the LIMIT is.
Comparing clustering columns
The partition key and clustering columns can be grouped and compared to values for scanning a partition.
For example:
The syntax used in the WHERE clause compares records of steward_name and king as a tuple against
the ('Boromir', 'Brego') tuple.
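A sketch of the tuple comparison, assuming steward_name and king are clustering columns of the ruling_stewards table:

```sql
-- Compare grouped clustering columns against a tuple of values
SELECT * FROM ruling_stewards
WHERE (steward_name, king) = ('Boromir', 'Brego');
```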
Using compound primary keys and sorting results
ORDER BY clauses can work on a single column only. That column must be the second column in a
compound PRIMARY KEY. This also applies to tables with more than two column components in the
primary key. Ordering can be done in ascending or descending order using the ASC or DESC keywords
(default is ascending).
In the ORDER BY clause, refer to a column using the actual name, not an alias.
For example, set up the playlists table (which uses a compound primary key), insert the example data, and
use this query to get information about a particular playlist, ordered by song_order. You do not need to
include the ORDER BY column in the select expression.
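A sketch of the query, assuming the playlists table keyed by id with clustering column song_order (the UUID is illustrative):

```sql
-- Sort the selected partition by song_order, descending
SELECT * FROM playlists
WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204
ORDER BY song_order DESC;
```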
Output:
Or, create an index on playlist artists, and use this query to get titles of Fu Manchu songs on the playlist:
Output:
After inserting data, SELECT using the TOKEN function to find the data using the partition key.
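A sketch of a TOKEN-based range query, assuming the cycling.cyclist_name table keyed by id (the UUID is illustrative):

```sql
-- Range scan over the token values of the partition key
SELECT * FROM cycling.cyclist_name
WHERE TOKEN(id) > TOKEN(fb372533-eb95-4bb4-8685-6ef61e994caa);
```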
For example: count the number of last names in the cyclist_name table:
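A minimal sketch, assuming the cycling.cyclist_name table used elsewhere in this reference:

```sql
SELECT COUNT(lastname) FROM cycling.cyclist_name;
```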
Count the number of rows and get the max value for points in the users table:
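A sketch, assuming a users table with a numeric points column:

```sql
SELECT COUNT(*), MAX(points) FROM users;
```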
Output:
MIN returns the minimum value. If the query includes a WHERE clause, MIN returns the smallest value from
the rows that satisfy the WHERE condition.
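A sketch, using the same assumed users table with a numeric points column:

```sql
-- Smallest points value across the table
SELECT MIN(points) FROM users;
```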
Output:
Note: If the column referenced by MAX or MIN has an ascii or text datatype, these functions return the
last or first item in an alphabetic sort of the column values. If the specified column has datatype date or
timestamp, these functions return the most recent or least recent times/dates.
Note: If any of the rows returned has a null value in the column referenced for MIN aggregation,
Cassandra ignores that row when finding the minimum rather than treating the null as a zero value.
Getting the sum or average of a column of numbers
Use the SUM or AVG aggregate function to get the sum or average of a column of numbers:
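A sketch, assuming a cycling.cyclist_points table with a numeric race_points column:

```sql
SELECT SUM(race_points) FROM cycling.cyclist_points;
SELECT AVG(race_points) FROM cycling.cyclist_points;
```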
Note: The sum and avg functions do not work with text, uuid or date fields.
Note: If any of the rows returned has a null value for the column referenced for AVG aggregation,
Cassandra includes that row in the row count, but uses a zero value to calculate the average.
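The output below came from a WRITETIME query; a sketch of the kind of statement that produces it, assuming a users table with first_name and last_name columns:

```sql
-- Returns the timestamp (in microseconds) at which the value was written
SELECT WRITETIME (first_name) FROM users
WHERE last_name = 'Jones';
```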
-----------------------
1353010594789000
The WRITETIME output in microseconds converts to November 15, 2012 at 12:16:34 GMT-8.
After inserting data with a TTL, you can use a SELECT statement to check its current value:
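A sketch of such a check, assuming a cycling.calendar table with a race_name column and a race_id key (the value is illustrative):

```sql
-- Remaining time to live, in seconds, for the race_name value
SELECT TTL(race_name) FROM cycling.calendar
WHERE race_id = 200;
```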
Output:
ttl(race_name)
----------------
276
(1 rows)
TRUNCATE
Remove all data from a table.
Synopsis
TRUNCATE [ TABLE ] [ keyspace_name. ] table_name
Synopsis legend
Markup Indicates
BOLD UPPERCASE: Literal string
Italics: String to be replaced with a specific value for this operation
non-boldface: Parameter with its own set of elements
Square brackets ( [ ] ): Optional element or set of elements
Pipe/vertical line ( | ) between elements: Set of options, only one can be used
Ellipsis ( … ): Preceding value can be repeated
Semicolon ( ; ): Terminates the command
Description
The TRUNCATE command removes all data from the specified table, immediately and irreversibly.
Examples
Make sure all nodes are up and receiving connections. Use TRUNCATE or TRUNCATE TABLE, followed by
the table name:
TRUNCATE user_activity;
Note: TRUNCATE sends a JMX command to all nodes, telling them to delete SSTables that hold the data
from the specified table. If any of these nodes is down or doesn't respond, the command fails and outputs
a message like the following:
UPDATE
Update columns in a row.
Synopsis
UPDATE [ keyspace_name. ] table_name
[ USING TTL time_value | TIMESTAMP timestamp_value ]
SET assignment [, assignment ] [ … ]
WHERE row_specification
[ IF EXISTS | IF condition [ AND condition ] [ … ] ]
Synopsis Legend
This markup: Indicates:
BOLD UPPERCASE: String to be typed literally
Italics: String to be replaced with a specific value for this operation
non-boldface: Component with its own set of elements
Square brackets ( [ ] ) enclosing an element: Optional element or set of elements
Pipe symbol ( | ) between elements: Set of options, only one can be used
Ellipsis ( … ): The preceding element can be repeated
Description
UPDATE writes one or more column values to a row in a Cassandra table. If the specified row does not
exist, the command creates it. All UPDATEs within the same partition key are applied atomically and in
isolation.
The USING clause can add a timestamp or a 'time to live' value to the row. You cannot specify these
options on counter columns.
Assign new values to the row's columns in the SET clause. UPDATE cannot update the values of a
row's primary key fields. To update a counter column value in a counter table, specify the increment or
decrement to the counter column.
Note: Unlike the INSERT command, the UPDATE command supports counters. Otherwise, the UPDATE
and INSERT operations are identical internally.
The WHERE clause specifies the row or rows to be updated. To specify a row, the WHERE clause must
specify a value for each column of its primary key. You can use IN to specify more than one primary key
value, but only for the last column of the primary key.
The IF EXISTS or IF options make the UPDATE into a lightweight transaction:
UPDATE cycling.cyclist_name
SET comments = 'Rides hard, gets along with others, a real winner'
WHERE id = fb372533-eb95-4bb4-8685-6ef61e994caa IF EXISTS;
Use the IF keyword followed by a condition to be met for the update to succeed. Using an IF condition
incurs a performance hit associated with using Paxos internally to support linearizable consistency.
UPDATE does not return any result unless it includes IF EXISTS.
Parameters
keyspace_name
The name of the keyspace containing the table to be updated. Not needed if the keyspace has been set
for the session with the USE command.
table_name
The name of the table to be updated
time_value
The value for TTL is a number of seconds. Column values in a command marked with TTL are automatically
marked as deleted (with a tombstone) after the specified amount of time. The TTL applies to the marked
column values, not the column itself. Any subsequent update of the column resets the value to the TTL
specified in the update. By default, values never expire. You cannot set a time_value for data in a counter
column.
timestamp_value
If TIMESTAMP is used, the inserted column is marked with its value – a timestamp in microseconds. If a
TIMESTAMP value is not set, Cassandra uses the time (in microseconds) that the update occurred to the
column.
assignment
Assigns a value to an existing element. Can be one of:
column_name = column_value
set_or_list_item = set_or_list_item ( + | - ) set_or_list_value
map_name = map_name ( + | - ) { map_key : map_value, ... }
column_name [ term ] = value
counter_column_name = counter_column_name ( + | - ) integer
where:
• a set value is { literal, literal, ... }
• a list value is [ literal, literal, ... ]
• a term is a list index position or map key in square brackets: [ list_index_position | key_value ]
row_specification
The WHERE clause must identify the row or rows to be updated by primary key:
• To specify one row, use primary_key_name = primary_key_value. If the primary key is a
combination of elements, follow this with AND primary_key_name = primary_key_value ....
The WHERE clause must specify a value for every component of the primary key.
• To specify more than one row, use primary_key_name IN ( primary_key_value,
primary_key_value … ) As in the previous option, each primary_key_name and
primary_key_value must match the composition of the primary key.
Note: To update a static column, only the partition key must be specified.
IF EXISTS / IF condition
An IF clause can limit the command's action to rows that match the WHERE clause. There are two options for IF:
• Use IF EXISTS to make the UPDATE fail if there are no rows that match the WHERE conditions.
• Use IF to specify one or more conditions that must test true for the values in the specified row or rows.
If an IF is used, the command returns a result to standard output. See Conditionally updating columns for
examples.
UPDATE users
SET state = 'TX'
WHERE user_uuid
IN (88b8fd18-b1ed-4e96-bf79-4280797cba80,
06a8913c-c0d6-477c-937d-6c1b69a95d43,
bc108776-7cb5-477f-917d-869c12dfffa8);
CQL supports an empty list of values in the IN clause, useful in Java Driver applications when passing
empty arrays as arguments for the IN clause.
Update several columns in a single row:
UPDATE cycling.cyclist_teams
SET firstname = 'Marianne',
lastname = 'VOS'
WHERE id = 88b8fd18-b1ed-4e96-bf79-4280797cba80;
To use a lightweight transaction on a counter column to ensure accuracy, put one or more counter updates
in the batch statement.
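A sketch of a counter batch, assuming a cycling.popular_count counter table keyed by id (the UUID is illustrative):

```sql
BEGIN COUNTER BATCH
UPDATE cycling.popular_count SET popularity = popularity + 1
WHERE id = 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47;
APPLY BATCH;
```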
UPDATE cycling.cyclist_career_teams
SET teams = teams - {'DSB Bank - Nederland bloeit'} WHERE
id=5b6962dd-3f90-4c93-8f61-eabfa4a803e2;
UPDATE cycling.cyclist_career_teams
SET teams = {} WHERE id=5b6962dd-3f90-4c93-8f61-eabfa4a803e2;
UPDATE cycling.upcoming_calendar
SET description = description + {'Criterium du Dauphine' : 'Easy race'}
WHERE year = 2015;
To update or set a specific element - for example, update a list named events to add a new race to the
calendar:
UPDATE cycling.upcoming_calendar
SET events[2] = 'Vuelta Ciclista a Venezuela' WHERE year = 2016 AND month =
06;
In Cassandra 2.1.1 and later, you can update the map by adding one or more elements separated by
commas:
UPDATE cycling.upcoming_calendar
SET description = description + {'Criterium du Dauphine' : 'Easy race',
'Tour du Suisse' : 'Hard uphill race'}
WHERE year = 2015 AND month = 6;
To replace the entire list:
UPDATE cycling.upcoming_calendar
SET events = ['Criterium du Dauphine','Tour de Suisse']
WHERE year = 2015 AND month = 6;
To prepend an element to the list, enclose it in square brackets and use the addition (+) operator:
UPDATE cycling.upcoming_calendar
SET events = [ 'Tour de France' ] + events WHERE year=2015 AND month=06;
To append an element to the list, switch the order of the new element data and the list name:
UPDATE cycling.upcoming_calendar
SET events = events + [ 'Tour de France' ] WHERE year=2017 AND month=05;
To add an element at a particular position, use the list index position in square brackets:
UPDATE cycling.upcoming_calendar
SET events[4] = 'Tour de France' WHERE year=2016 AND month=07;
To remove all elements having a particular value, use the subtraction operator (-) and put the list value in
square brackets:
UPDATE cycling.upcoming_calendar
SET events = events - ['Criterium du Dauphine'] WHERE year=2016 AND
month=07;
To update data in a collection column of a user-defined type, enclose components of the type in
parentheses within the curly brackets, as shown in "Using a user-defined type."
Without IF EXISTS, the command proceeds with no standard output. If IF EXISTS returns true (if a row
with this primary key does exist), standard output displays a table like the following:
If no such row exists, however, the condition returns FALSE and the command fails. In this case, standard
output looks like:
Use IF condition to apply tests to one or more column values in the selected row:
If all the conditions return TRUE, standard output is the same as if IF EXISTS returned true (see above).
If any of the conditions fails, standard output displays False in the [applied] column and also displays
information about the condition that failed:
Conditional updates are examples of "lightweight transactions." They incur a non-negligible performance
cost and should be used sparingly.
USE
Connect the client session to a keyspace.
Synopsis
USE keyspace_name
Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope
A semicolon that terminates CQL statements is not included in the synopsis.
Description
A USE statement identifies the keyspace that contains the tables to query for the current client session. All
subsequent operations on tables and indexes are in the context of the named keyspace, unless otherwise
specified or until the client connection is terminated or another USE statement is issued.
To use a case-sensitive keyspace, enclose the keyspace name in double quotation marks.
Example
USE PortfolioDemo;
USE "Excalibur";