CREATE INDEX
Synopsis
Use the CREATE INDEX
statement to create an index on the specified columns of the specified table. Indexes are primarily used to improve query performance.
Syntax
create_index ::= CREATE [ UNIQUE ] INDEX
[ CONCURRENTLY | NONCONCURRENTLY ]
[ [ IF NOT EXISTS ] name ] ON [ ONLY ] table_name
[ USING access_method_name ] ( index_elem [ , ... ] )
[ INCLUDE ( column_name [ , ... ] ) ]
[ TABLESPACE tablespace_name ]
[ SPLIT { INTO int_literal TABLETS
| AT VALUES ( split_row [ , ... ] ) } ]
[ WHERE boolean_expression ]
index_elem ::= { column_name | ( expression ) }
[ operator_class_name ] [ HASH | ASC | DESC ]
[ NULLS { FIRST | LAST } ]
Semantics
When an index is created on a populated table, YugabyteDB automatically backfills the existing data into the index. In most cases, this uses an online schema migration. The following table explains some differences between creating an index online and not online.
Condition | Online | Not online |
---|---|---|
Safe to do other DMLs during CREATE INDEX ? |
yes | no |
Keeps other transactions alive during CREATE INDEX ? |
mostly | no |
Parallelizes index loading? | yes | no |
CREATE INDEX CONCURRENTLY
is supported, though online index backfill is enabled by default. Some restrictions apply (see CONCURRENTLY).
To disable online schema migration for YSQL CREATE INDEX
, set the flag ysql_disable_index_backfill=true
on all nodes and both YB-Master and YB-TServer.
To disable online schema migration for one CREATE INDEX
, use CREATE INDEX NONCONCURRENTLY
.
Note
For details on how online index backfill works, refer to Online Index Backfill.Regarding colocation, indexes follow their table. If the table is colocated, its index is also colocated; if the table is not colocated, its index is also not colocated.
Partitioned Indexes
Creating an index on a partitioned table automatically creates a corresponding index for every partition in the default tablespace. It's also possible to create an index on each partition individually, which you should do in the following cases:
- Parallel writes are expected while creating the index, because concurrent builds for indexes on partitioned tables aren't supported. In this case, it's better to use concurrent builds to create indexes on each partition individually.
- Row-level geo-partitioning is being used. In this case, create the index separately on each partition to customize the tablespace in which each index is created.
CREATE INDEX CONCURRENTLY
is not supported for partitioned tables (see CONCURRENTLY).
UNIQUE
Enforce that duplicate values in a table are not allowed.
CONCURRENTLY
Enable online schema migration (see Semantics for details), with some restrictions:
- When creating an index on a temporary table, online schema migration is disabled.
CREATE INDEX CONCURRENTLY
is not supported for partitioned tables.CREATE INDEX CONCURRENTLY
is not supported inside a transaction block.
NONCONCURRENTLY
Disable online schema migration (see Semantics for details).
ONLY
Indicates not to recurse creating indexes on partitions, if the table is partitioned. The default is to recurse.
access_method_name
The name of the index access method. By default, lsm
is used for YugabyteDB tables and btree
is used otherwise (for example, temporary tables).
GIN indexes can be created in YugabyteDB by using the ybgin
access method.
INCLUDE clause
Specify a list of columns which will be included in the index as non-key columns.
TABLESPACE clause
Specify the name of the tablespace that describes the placement configuration for this index. By default, indexes are placed in the pg_default
tablespace, which spreads the tablets of the index evenly across the cluster.
WHERE clause
A partial index is an index that is built on a subset of a table and includes only rows that satisfy the condition specified in the WHERE
clause.
It can be used to exclude NULL or common values from the index, or include just the rows of interest.
This speeds up any writes to the table because rows containing the common column values don't need to be indexed.
It also reduces the size of the index, thereby improving the speed for read queries that use the index.
name
Specify the name of the index to be created.
table_name
Specify the name of the table to be indexed.
index_elem
column_name
Specify the name of a column of the table.
expression
Specify one or more columns of the table and must be surrounded by parentheses.
HASH
- Use hash of the column. This is the default option for the first column and is used to shard the index table.ASC
— Sort in ascending order. This is the default option for second and subsequent columns of the index.DESC
— Sort in descending order.NULLS FIRST
- Specifies that nulls sort before non-nulls. This is the default when DESC is specified.NULLS LAST
- Specifies that nulls sort after non-nulls. This is the default when DESC is not specified.
SPLIT INTO
For hash-sharded indexes, you can use the SPLIT INTO
clause to specify the number of tablets to be created for the index. The hash range is then evenly split across those tablets.
Presplitting indexes, using SPLIT INTO
, distributes index workloads on a production cluster. For example, if you have 3 servers, splitting the index into 30 tablets can provide higher write throughput on the index. For an example, see Create an index specifying the number of tablets.
Note
By default, YugabyteDB presplits an index intoysql_num_shards_per_tserver * num_of_tserver
tablets. The SPLIT INTO
clause can be used to override that setting on a per-index basis.
SPLIT AT VALUES
For range-sharded indexes, you can use the SPLIT AT VALUES
clause to set split points to presplit range-sharded indexes.
Example
CREATE TABLE tbl(
a INT,
b INT,
PRIMARY KEY(a ASC, b DESC);
);
CREATE INDEX idx1 ON tbl(b ASC, a DESC) SPLIT AT VALUES((100), (200), (200, 5));
In the example above, there are three split points, so four tablets will be created for the index:
- tablet 1:
b=<lowest>, a=<lowest>
tob=100, a=<lowest>
- tablet 2:
b=100, a=<lowest>
tob=200, a=<lowest>
- tablet 3:
b=200, a=<lowest>
tob=200, a=5
- tablet 4:
b=200, a=5
tob=<highest>, a=<highest>
Note
By default, YugabyteDB creates a range sharded index as a single tablet. TheSPLIT AT
clause can be used to override that setting on a per-index basis.
Examples
Unique index with HASH column ordering
Create a unique index with hash ordered columns.
yugabyte=# CREATE TABLE products(id int PRIMARY KEY,
name text,
code text);
yugabyte=# CREATE UNIQUE INDEX ON products(code);
yugabyte=# \d products
Table "public.products"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | not null |
name | text | | |
code | text | | |
Indexes:
"products_pkey" PRIMARY KEY, lsm (id HASH)
"products_code_idx" UNIQUE, lsm (code HASH)
ASC ordered index
Create an index with ascending ordered key.
yugabyte=# CREATE INDEX products_name ON products(name ASC);
yugabyte=# \d products_name
Index "public.products_name"
Column | Type | Key? | Definition
--------+------+------+------------
name | text | yes | name
lsm, for table "public.products
INCLUDE columns
Create an index with ascending ordered key and include other columns as non-key columns
yugabyte=# CREATE INDEX products_name_code ON products(name) INCLUDE (code);
yugabyte=# \d products_name_code;
Index "public.products_name_code"
Column | Type | Key? | Definition
--------+------+------+------------
name | text | yes | name
code | text | no | code
lsm, for table "public.products"
Create an index specifying the number of tablets
To specify the number of tablets for an index, you can use the CREATE INDEX
statement with the SPLIT INTO clause.
CREATE TABLE employees (id int PRIMARY KEY, first_name TEXT, last_name TEXT) SPLIT INTO 10 TABLETS;
CREATE INDEX ON employees(first_name, last_name) SPLIT INTO 10 TABLETS;
Partial indexes
Consider an application maintaining shipments information. It has a shipments
table with a column for delivery_status
. If the application needs to access in-flight shipments frequently, then it can use a partial index to exclude rows whose shipment status is delivered
.
yugabyte=# create table shipments(id int, delivery_status text, address text, delivery_date date);
yugabyte=# create index shipment_delivery on shipments(delivery_status, address, delivery_date) where delivery_status != 'delivered';
Expression indexes
An index column need not be just a column of the underlying table, but can be a function, or scalar expression computed from one or more columns of the table. You can also obtain fast access to tables based on the results of computations.
A basic example is indexing unique emails in a users table similar to the following:
CREATE TABLE users(id BIGSERIAL PRIMARY KEY, email TEXT NOT NULL);
CREATE UNIQUE INDEX users_email_idx ON users(lower(email));
Creating a unique index prevents inserting duplicate email addresses using a different case.
Note that index expressions are only evaluated at index time, so to use the index for a specific query the expression must match exactly.
SELECT * FROM users WHERE lower(email)='user@example.com'; # will use the index created above
SELECT * FROM users WHERE email='user@example.com'; # will NOT use the index
Expression indexes are often used to index jsonb columns.
Troubleshooting
If the following troubleshooting tips don't resolve your issue, ask for help in our community Slack or file a GitHub issue.
Invalid index
If online CREATE INDEX
fails, an invalid index may be left behind. These indexes are not usable in queries and cause internal operations, so they should be dropped.
For example, the following commands can create an invalid index:
yugabyte=# CREATE TABLE uniqueerror (i int);
CREATE TABLE
yugabyte=# INSERT INTO uniqueerror VALUES (1), (1);
INSERT 0 2
yugabyte=# CREATE UNIQUE INDEX ON uniqueerror (i);
ERROR: ERROR: duplicate key value violates unique constraint "uniqueerror_i_idx"
yugabyte=# \d uniqueerror
Table "public.uniqueerror"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
i | integer | | |
Indexes:
"uniqueerror_i_idx" UNIQUE, lsm (i HASH) INVALID
Drop the invalid index as follows:
yugabyte=# DROP INDEX uniqueerror_i_idx;
DROP INDEX
Common errors and solutions
-
ERROR: duplicate key value violates unique constraint "uniqueerror_i_idx"
Reason: When creating a unique index, a unique constraint violation was found.
Fix: Resolve the conflicting row(s).
-
ERROR: Backfilling indexes { timeoutmaster_i_idx } for tablet 42e3857759f54733a47e3bb817636f60 from key '' in state kFailed
Reason: Server-side backfill timeout is repeatedly hit.
Fixes
Do any or all of the following:
- Increase the YB-Master flag ysql_index_backfill_rpc_timeout_ms from 60000 (one minute) to 300000 (five minutes).
- Increase the YB-TServer flag backfill_index_timeout_grace_margin_ms from -1 (the system automatically calculates the value to be approximately 1 second) to 60000 (one minute).
- Decrease the YB-TServer flag backfill_index_write_batch_size from 128 to 32.
-
ERROR: BackfillIndex RPC (request call id 123) to 127.0.0.1:9100 timed out after 86400.000s
Reason: Client-side backfill timeout is hit.
Fixes
The master leader may have changed during backfill. This is currently not supported. Retry creating the index, and keep an eye on the master leader.
Try increasing parallelism. Index backfill happens in parallel across each tablet of the table. A one-tablet table in an RF-3 setup would not take advantage of the parallelism. (One-tablet tables are default for range-partitioned tables and colocated tables.) On the other hand, no matter how much parallelism there is, a one-tablet index would be a bottleneck for index backfill writes. Partitioning could be improved with tablet splitting.
In case the backfill really needs more time, increase YB-TServer flag
backfill_index_client_rpc_timeout_ms
to as long as you expect the backfill to take (for example, one week).
To prioritize keeping other transactions alive during the index backfill, set each of the following to be longer than the longest transaction anticipated:
- YB-Master flag
index_backfill_wait_for_old_txns_ms
- YSQL parameter
yb_index_state_flags_update_delay
To speed up index creation by a few seconds when you know there will be no online writes, set the YSQL parameter yb_index_state_flags_update_delay
to zero.