0% found this document useful (0 votes)
48 views11 pages

Columnstore Indexes

Columnstore indexes store and organize database table data in a column-based format rather than the traditional row-based format. This columnar organization provides significantly higher data compression of up to 10x and faster analytic query performance of up to 10x compared to row-based storage. Columnstore indexes are particularly suitable for large data warehousing and analytics workloads involving scans of entire tables or large ranges of data.

Uploaded by

raghunathan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
48 views11 pages

Columnstore Indexes

Columnstore indexes store and organize database table data in a column-based format rather than the traditional row-based format. This columnar organization provides significantly higher data compression of up to 10x and faster analytic query performance of up to 10x compared to row-based storage. Columnstore indexes are particularly suitable for large data warehousing and analytics workloads involving scans of entire tables or large ranges of data.

Uploaded by

raghunathan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 11

Columnstore indexes: Overview

Columnstore indexes are the standard for storing and querying large data warehousing fact tables.
This index uses column-based data storage and query processing to achieve gains up to 10 times the
query performance in your data warehouse over traditional row-oriented storage. You can also
achieve gains up to 10 times the data compression over the uncompressed data size. Beginning with
SQL Server 2016 (13.x) SP1, columnstore indexes enable operational analytics: the ability to run
performant real-time analytics on a transactional workload.

What is a columnstore index?

A columnstore index is a technology for storing, retrieving, and managing data by using a columnar
data format, called a columnstore.

Key terms and concepts

The following key terms and concepts are associated with columnstore indexes.

Columnstore

A columnstore is data that's logically organized as a table with rows and columns, and physically
stored in a column-wise data format.

Rowstore

A rowstore is data that's logically organized as a table with rows and columns, and physically stored
in a row-wise data format. This format is the traditional way to store relational table data. In SQL
Server, rowstore refers to a table where the underlying data storage format is a heap, a clustered
index, or a memory-optimized table.

Note

In discussions about columnstore indexes, the terms rowstore and columnstore are used to
emphasize the format for the data storage.

Rowgroup

A rowgroup is a group of rows that are compressed into columnstore format at the same time. A
rowgroup usually contains the maximum number of rows per rowgroup, which is 1,048,576 rows.

For high performance and high compression rates, the columnstore index slices the table into
rowgroups, and then compresses each rowgroup in a column-wise manner. The number of rows in
the rowgroup must be large enough to improve compression rates, and small enough to benefit from
in-memory operations.
A rowgroup from where all data has been deleted transitions from COMPRESSED into TOMBSTONE
state, and is later removed by a background process named the tuple-mover. For more information
about rowgroup statuses, see sys.dm_db_column_store_row_group_physical_stats (Transact-SQL).

Tip

Having too many small rowgroups decreases the columnstore index quality. Until SQL Server 2017
(14.x), a reorganize operation is required to merge smaller COMPRESSED rowgroups, following an
internal threshold policy that determines how to remove deleted rows and combine the compressed
rowgroups.
Starting with SQL Server 2019 (15.x), a background merge task also works to merge COMPRESSED
rowgroups from where a large number of rows has been deleted.
After merging smaller rowgroups, the index quality should be improved.

Note

Starting with SQL Server 2019 (15.x), Azure SQL Database, Azure SQL Managed Instance, and
dedicated SQL pools in Azure Synapse Analytics, the tuple-mover is helped by a background merge
task that automatically compresses smaller OPEN delta rowgroups that have existed for some time
as determined by an internal threshold, or merges COMPRESSED rowgroups from where a large
number of rows has been deleted. This improves the columnstore index quality over time.

Column segment

A column segment is a column of data from within the rowgroup.

 Each rowgroup contains one column segment for every column in the table.
 Each column segment is compressed together and stored on physical media.
 There is metadata with each segment to allow for fast elimination of segments without reading
them.

Clustered columnstore index


A clustered columnstore index is the physical storage for the entire table.

To reduce fragmentation of the column segments and improve performance, the columnstore index
might store some data temporarily into a clustered index called a deltastore and a B-tree list of IDs for
deleted rows. The deltastore operations are handled behind the scenes. To return the correct query
results, the clustered columnstore index combines query results from both the columnstore and the
deltastore.

Note

SQL Server documentation uses the term B-tree generally in reference to indexes. In rowstore
indexes, SQL Server implements a B+ tree. This does not apply to columnstore indexes or in-memory
data stores. For more information, see the SQL Server and Azure SQL index architecture and
design guide.

Delta rowgroup

A delta rowgroup is a clustered B-tree index that's used only with columnstore indexes. It improves
columnstore compression and performance by storing rows until the number of rows reaches a
threshold (1,048,576 rows) and are then moved into the columnstore.

When a delta rowgroup reaches the maximum number of rows, it transitions from an OPEN to
CLOSED state. A background process named the tuple-mover checks for closed row groups. If the
process finds a closed rowgroup, it compresses the delta rowgroup and stores it into the columnstore
as a COMPRESSED rowgroup.

When a delta rowgroup has been compressed, the existing delta rowgroup transitions into
TOMBSTONE state to be removed later by the tuple-mover when there is no reference to it.

For more information about rowgroup statuses,


see sys.dm_db_column_store_row_group_physical_stats (Transact-SQL).

Note
Starting with SQL Server 2019 (15.x), the tuple-mover is helped by a background merge task that
automatically compresses smaller OPEN delta rowgroups that have existed for some time as
determined by an internal threshold, or merges COMPRESSED rowgroups from where a large
number of rows has been deleted. This improves the columnstore index quality over time.

Deltastore

A columnstore index can have more than one delta rowgroup. All of the delta rowgroups are
collectively called the deltastore.

During a large bulk load, most of the rows go directly to the columnstore without passing through the
deltastore. Some rows at the end of the bulk load might be too few in number to meet the minimum
size of a rowgroup, which is 102,400 rows. As a result, the final rows go to the deltastore instead of
the columnstore. For small bulk loads with less than 102,400 rows, all of the rows go directly to the
deltastore.

Nonclustered columnstore index

A nonclustered columnstore index and a clustered columnstore index function the same. The
difference is that a nonclustered index is a secondary index that's created on a rowstore table, but a
clustered columnstore index is the primary storage for the entire table.

The nonclustered index contains a copy of part or all of the rows and columns in the underlying table.
The index is defined as one or more columns of the table and has an optional condition that filters the
rows.

A nonclustered columnstore index enables real-time operational analytics where the OLTP workload
uses the underlying clustered index while analytics run concurrently on the columnstore index. For
more information, see Get started with columnstore for real-time operational analytics.

Batch mode execution

Batch mode execution is a query processing method that's used to process multiple rows together.
Batch mode execution is closely integrated with, and optimized around, the columnstore storage
format. Batch mode execution is sometimes known as vector-based or vectorized execution. Queries
on columnstore indexes use batch mode execution, which improves query performance typically by
two to four times. For more information, see the Query processing architecture guide.

Why should I use a columnstore index?

A columnstore index can provide a very high level of data compression, typically by 10 times, to
significantly reduce your data warehouse storage cost. For analytics, a columnstore index offers an
order of magnitude better performance than a B-tree index. Columnstore indexes are the preferred
data storage format for data warehousing and analytics workloads. Starting with SQL Server 2016
(13.x), you can use columnstore indexes for real-time analytics on your operational workload.

Reasons why columnstore indexes are so fast:


 Columns store values from the same domain and commonly have similar values, which result in
high compression rates. I/O bottlenecks in your system are minimized or eliminated, and
memory footprint is reduced significantly.
 High compression rates improve query performance by using a smaller in-memory footprint. In
turn, query performance can improve because SQL Server can perform more query and data
operations in memory.
 Batch execution improves query performance, typically by two to four times, by processing
multiple rows together.
 Queries often select only a few columns from a table, which reduces total I/O from the physical
media.

When should I use a columnstore index?

Recommended use cases:

 Use a clustered columnstore index to store fact tables and large dimension tables for data
warehousing workloads. This method improves query performance and data compression by up
to 10 times. For more information, see Columnstore indexes for data warehousing.
 Use a nonclustered columnstore index to perform analysis in real time on an OLTP workload.
For more information, see Get started with columnstore for real-time operational analytics.
 For more usage scenarios for columnstore indexes, see Choose the best columnstore index for
your needs.

How do I choose between a rowstore index and a columnstore index?

Rowstore indexes perform best on queries that seek into the data, when searching for a particular
value, or for queries on a small range of values. Use rowstore indexes with transactional workloads
because they tend to require mostly table seeks instead of table scans.

Columnstore indexes give high performance gains for analytic queries that scan large amounts of
data, especially on large tables. Use columnstore indexes on data warehousing and analytics
workloads, especially on fact tables, because they tend to require full table scans rather than table
seeks.

Beginning with SQL Server 2022 (16.x), ordered clustered columnstore indexes improve performance
for queries based on ordered column predicates. Ordered columnstore indexes can improve row-
group elimination, which can deliver performance improvements by skipping row groups altogether.
For more information, see Performance tuning with ordered clustered columnstore index.

Can I combine rowstore and columnstore on the same table?

Yes. Beginning with SQL Server 2016 (13.x), you can create an updatable nonclustered columnstore
index on a rowstore table. The columnstore index stores a copy of the selected columns, so you need
extra space for this data, but the selected data is compressed on average 10 times. You can run
analytics on the columnstore index and transactions on the rowstore index at the same time. The
columnstore is updated when data changes in the rowstore table, so both indexes work against the
same data.

Beginning with SQL Server 2016 (13.x), you can have one or more nonclustered rowstore indexes on
a columnstore index and perform efficient table seeks on the underlying columnstore. Other options
become available too. For example, you can enforce a primary key constraint by using a UNIQUE
constraint on the rowstore table. Because a non-unique value fails to insert into the rowstore table,
SQL Server can't insert the value into the columnstore.

Metadata

All of the columns in a columnstore index are stored in the metadata as included columns. The
columnstore index doesn't have key columns.

sys.indexes (Transact-SQL)

sys.index_columns (Transact-SQL)

sys.partitions (Transact-SQL)

sys.internal_partitions (Transact-SQL)

sys.column_store_segments (Transact-SQL)

sys.column_store_dictionaries (Transact-SQL)

sys.column_store_row_groups (Transact-SQL)

sys.dm_db_column_store_row_group_operational_stats (Transact-SQL)

sys.dm_db_column_store_row_group_physical_stats (Transact-SQL)

sys.dm_column_store_object_pool (Transact-SQL)

sys.dm_db_column_store_row_group_operational_stats (Transact-SQL)

sys.dm_db_index_operational_stats (Transact-SQL)

sys.dm_db_index_physical_stats (Transact-SQL)

Related tasks

All relational tables, unless you specify them as a clustered columnstore index, use rowstore as the
underlying data format. CREATE TABLE creates a rowstore table unless you specify the WITH
CLUSTERED COLUMNSTORE INDEX option.

When you create a table with the CREATE TABLE statement, you can create the table as a
columnstore by specifying the WITH CLUSTERED COLUMNSTORE INDEX option. If you already
have a rowstore table and want to convert it to a columnstore, you can use the CREATE
COLUMNSTORE INDEX statement.

Task Reference topics Notes


Create a table as CREATE TABLE Beginning with SQL Server 2016 (13.x), you
a columnstore. (Transact-SQL) can create the table as a clustered
columnstore index. You don't have to first
Task Reference topics Notes
create a rowstore table and then convert it
to columnstore.
Create a memory CREATE TABLE Beginning with SQL Server 2016 (13.x), you
table with a (Transact-SQL) can create a memory-optimized table with a
columnstore columnstore index. The columnstore index
index. can also be added after the table is created
by using the ALTER TABLE ADD
INDEX syntax.
Convert a CREATE Convert an existing heap or B-tree to a
rowstore table to a COLUMNSTORE INDEX columnstore. Examples show how to handle
columnstore. (Transact-SQL) existing indexes and also the name of the
index when performing this conversion.
Convert a CREATE CLUSTERED Usually this conversion isn't necessary, but
columnstore table INDEX (Transact- there can be times when you need to
to a rowstore. SQL) or Convert a convert. Examples show how to convert a
columnstore table back columnstore to a heap or clustered index.
to a rowstore heap
Create a CREATE A rowstore table can have one columnstore
columnstore index COLUMNSTORE INDEX index. Beginning with SQL Server 2016
on a rowstore (Transact-SQL) (13.x), the columnstore index can have a
table. filtered condition. Examples show the basic
syntax.
Create performant Get started with Describes how to create complementary
indexes for columnstore for real-time columnstore and B-tree indexes, so that
operational operational analytics OLTP queries use B-tree indexes and
analytics. analytics queries use columnstore indexes.
Create performant Columnstore indexes for Describes how to use B-tree indexes on
columnstore data warehousing columnstore tables to create performant
indexes for data data warehousing queries.
warehousing.
Use a B-tree index Columnstore indexes for Shows how to combine B-tree and
to enforce a data warehousing columnstore indexes to enforce primary key
primary key constraints on the columnstore index.
constraint on a
columnstore
index.
Drop a DROP INDEX (Transact- Dropping a columnstore index uses the
columnstore SQL) standard DROP INDEX syntax that B-tree
index. indexes use. Dropping a clustered
columnstore index converts the columnstore
table to a heap.
Delete a row from DELETE (Transact-SQL) Use DELETE (Transact-SQL) to delete a
a columnstore row.
index.
columnstore row: SQL Server marks the
row as logically deleted, but doesn't reclaim
the physical storage for the row until the
index is rebuilt.
Task Reference topics Notes
deltastore row: SQL Server logically and
physically deletes the row.
Update a row in UPDATE (Transact- Use UPDATE (Transact-SQL) to update a
the columnstore SQL) row.
index.
columnstore row: SQL Server marks the
row as logically deleted and then inserts the
updated row into the deltastore.

deltastore row: SQL Server updates the


row in the deltastore.
Load data into a Columnstore indexes
columnstore data loading
index.
Force all rows in ALTER INDEX ALTER INDEX with the REBUILD option
the deltastore to (Transact- forces all rows to go into the columnstore.
go into the SQL) ... REBUILD
columnstore.
Reorganize and Rebuild
Indexes
Defragment a ALTER INDEX ALTER INDEX ...
columnstore (Transact-SQL) REORGANIZE defragments columnstore
index. indexes online.
Merge tables with MERGE (Transact-SQL)
columnstore
indexes.

Columnstore indexes - Data Warehouse

Columnstore indexes, in conjunction with partitioning, are essential for building a SQL Server data
warehouse.

Key features for data warehousing

SQL Server 2016 (13.x) introduced these features for columnstore performance enhancements:

 Always On supports querying a columnstore index on a readable secondary replica.


 Multiple Active Result Sets (MARS) supports columnstore indexes.
 A new dynamic management view sys.dm_db_column_store_row_group_physical_stats
(Transact-SQL) provides performance troubleshooting information at the row group level.
 Single-threaded queries on columnstore indexes can run in batch mode. Previously, only multi-
threaded queries could run in batch mode.
 The SORT operator runs in batch mode.
 Multiple DISTINCT operation runs in batch mode.
 Window Aggregates now runs in batch mode for database compatibility level 130 and higher.
 Aggregate Pushdown for efficient processing of aggregates. This is supported on all database
compatibility levels.
 String predicate pushdown for efficient processing of string predicates. This is supported on all
database compatibility levels.
 Snapshot isolation for database compatibility level 130 and higher.
 Ordered cluster columnstore indexes are available in SQL Server 2022 (16.x). For more
information, see CREATE COLUMNSTORE INDEX and Performance tuning with ordered
clustered columnstore index.

Improve performance by combining nonclustered and columnstore indexes

Starting with SQL Server 2016 (13.x), you can define nonclustered indexes on a clustered
columnstore index.

Example: Improve efficiency of table seeks with a nonclustered index

To improve efficiency of table seeks in a data warehouse, you can create a nonclustered index
designed to run queries that perform best with table seeks. For example, queries that look for
matching values or return a small range of values will perform better against a B-tree index rather
than a columnstore index. They don't require a full table scan through the columnstore index and will
return the correct result faster by doing a binary search through a B-tree index.

SQLCopy
--BASIC EXAMPLE: Create a nonclustered index on a columnstore table.

--Create the table


CREATE TABLE t_account (
AccountKey int NOT NULL,
AccountDescription nvarchar (50),
AccountType nvarchar(50),
UnitSold int
);
GO

--Store the table as a columnstore.


CREATE CLUSTERED COLUMNSTORE INDEX taccount_cci ON t_account;
GO

--Add a nonclustered index.


CREATE UNIQUE INDEX taccount_nc1 ON t_account (AccountKey);

Example: Use a nonclustered index to enforce a primary key constraint on a columnstore


table

By design, a columnstore table does not allow a clustered primary key constraint. Now you can use a
nonclustered index on a columnstore table to enforce a primary key constraint. A primary key is
equivalent to a UNIQUE constraint on a non-NULL column, and SQL Server implements a UNIQUE
constraint as a nonclustered index. Combining these facts, the following example defines a UNIQUE
constraint on the non-NULL column accountkey. The result is a nonclustered index that enforces a
primary key constraint as a UNIQUE constraint on a non-NULL column.

Next, the table is converted to a clustered columnstore index. During the conversion, the
nonclustered index persists. The result is a clustered columnstore index with a nonclustered index
that enforces a primary key constraint. Since any update or insert on the columnstore table will also
affect the nonclustered index, all operations that violate the unique constraint and the non-NULL will
cause the entire operation to fail.

The result is a columnstore index with a nonclustered index that enforces a primary key constraint on
both indexes.

SQLCopy
--EXAMPLE: Enforce a primary key constraint on a columnstore table.

--Create a rowstore table with a unique constraint.


--The unique constraint is implemented as a nonclustered index.
CREATE TABLE t_account (
AccountKey int NOT NULL,
AccountDescription nvarchar (50),
AccountType nvarchar(50),
UnitSold int,

CONSTRAINT uniq_account UNIQUE (AccountKey)


);

--Store the table as a columnstore.


--The unique constraint is preserved as a nonclustered index on the columnstore table.
CREATE CLUSTERED COLUMNSTORE INDEX t_account_cci ON t_account

--By using the previous two steps, every row in the table meets the UNIQUE constraint
--on a non-NULL column.
--This has the same end-result as having a primary key constraint
--All updates and inserts must meet the unique constraint on the nonclustered index or they will fail.

--If desired, add a foreign key constraint on AccountKey.

ALTER TABLE [dbo].[t_account]


WITH CHECK ADD FOREIGN KEY([AccountKey]) REFERENCES my_dimension(Accountkey);

Improve performance by enabling row-level and row-group-level locking

To complement the nonclustered index on a columnstore index feature, SQL Server 2016 (13.x)
offers granular locking capability for select, update, and delete operations. Queries can run with row-
level locking on index seeks against a nonclustered index and rowgroup-level locking on full table
scans against the columnstore index. Use this to achieve higher read/write concurrency by using row-
level and rowgroup-level locking appropriately.

SQLCopy
--Granular locking example
--Store table t_account as a columnstore table.
CREATE CLUSTERED COLUMNSTORE INDEX taccount_cci ON t_account
GO

--Add a nonclustered index for use with this example


CREATE UNIQUE INDEX taccount_nc1 ON t_account (AccountKey);
GO
--Look at locking with access through the nonclustered index
SET TRANSACTION ISOLATION LEVEL repeatable read;
GO

BEGIN TRAN
-- The query plan chooses a seek operation on the nonclustered index
-- and takes the row lock
SELECT * FROM t_account WHERE AccountKey = 100;
COMMIT TRAN

You might also like