Secondary indexes
Using indexes enhances database performance by enabling the database server to find rows faster. You can create, drop, and list indexes, as well as use indexes on expressions.
Indexes are created in the following format:
CREATE INDEX idx_name ON table_name
((columns), columns) INCLUDE (columns)
-- [SHARDING] [CLUSTERING] [COVERING]
The columns that are specified in the CREATE INDEX statement are of three kinds:
- Sharding - Columns that determine how the index data is distributed.
- Clustering - Optional columns that determine how index rows that match the same sharding key are ordered.
- Covering - Optional columns that are stored in the index to avoid a trip to the table.
Setup
The examples run on any YugabyteDB universe.
Set up a local cluster
If a local universe is currently running, first destroy it.
Start a local one-node universe with an RF of 1
by first creating a single node, as follows:
./bin/yugabyted start \
--advertise_address=127.0.0.1 \
--base_dir=${HOME}/var/node1 \
--cloud_location=aws.us-east-2.us-east-2a
After starting the yugabyted processes on all the nodes, configure the data placement constraint of the universe, as follows:
./bin/yugabyted configure data_placement --base_dir=${HOME}/var/node1 --fault_tolerance=zone
This command can be executed on any node where you already started YugabyteDB.
To check the status of a running multi-node universe, run the following command:
./bin/yugabyted status --base_dir=${HOME}/var/node1
Setup
To set up a universe, refer to Set up a YugabyteDB Anywhere universe.Setup
To set up a cluster, refer to Set up a YugabyteDB Aeon cluster.Create a census table as follows:
CREATE TABLE census(
id int,
name varchar(255),
age int,
zipcode int,
employed boolean,
PRIMARY KEY(id ASC)
);
Add some data to the table as follows.
INSERT INTO public.census ( id,name,age,zipcode,employed ) VALUES
(1,'Zachary',55,94085,True), (2,'James',56,94085,False), (3,'Kimberly',50,94084,False),
(4,'Edward',56,94085,True), (5,'Barry',56,94084,False), (6,'Tyler',45,94084,False),
(7,'Nancy',47,94085,False), (8,'Sarah',52,94084,True), (9,'Nancy',59,94084,False),
(10,'Diane',51,94083,False), (11,'Ashley',42,94083,False), (12,'Jacqueline',58,94085,False),
(13,'Benjamin',49,94084,False), (14,'James',48,94083,False), (15,'Ann',43,94083,False),
(16,'Aimee',47,94085,True), (17,'Michael',49,94085,False), (18,'Rebecca',40,94085,False),
(19,'Kevin',45,94085,True), (20,'James',45,94084,False), (21,'Sandra',60,94085,False),
(22,'Kathleen',40,94085,True), (23,'William',42,94084,False), (24,'James',42,94083,False),
(25,'Tyler',50,94085,False), (26,'James',49,94085,True), (27,'Kathleen',55,94083,True),
(28,'Zachary',55,94083,True), (29,'Rebecca',41,94085,True), (30,'Jacqueline',49,94085,False),
(31,'Diane',48,94083,False), (32,'Sarah',53,94085,True), (33,'Rebecca',55,94083,True),
(34,'William',47,94085,False), (35,'William',60,94085,True), (36,'Sarah',53,94085,False),
(37,'Ashley',47,94084,True), (38,'Ashley',54,94084,False), (39,'Benjamin',42,94083,False),
(40,'Tyler',47,94085,True), (41,'Michael',42,94084,False), (42,'Diane',50,94084,False),
(43,'Nancy',51,94085,False), (44,'Rebecca',56,94085,False), (45,'Tyler',41,94085,True);
Simple index
You can create indexes using CREATE INDEX to speed up lookups on a singe column.
For example to create an index on zipcode, you can run:
CREATE INDEX idx_zip on census(zipcode);
Multi-column index
You can create indexes on multiple columns, enabling queries with conditions on all the columns in the index to perform faster. For example, to speed up a query that looks up data based on name and zipcode:
SELECT * FROM census WHERE name = 'Kevin' AND zipcode = 94085;
Create an index on name and zipcode as follows:
CREATE INDEX idx_name_zip on census(name, zipcode);
List indexes
YSQL inherits all the functionality of the PostgreSQL pg_indexes view that allows you to retrieve a list of all indexes in the database as well as detailed information about every index.
SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'census';
Remove indexes
You can remove one or more existing indexes using the DROP INDEX statement. For example, to drop the index on zipcode, you can run:
DROP INDEX idx_zip;
Understanding performance
You can use the EXPLAIN ANALYZE statement to check if a query uses an index and determine the query plan before execution. For example, consider the following query:
EXPLAIN (ANALYZE, DIST, COSTS OFF) SELECT * FROM census WHERE name = 'Kevin' AND zipcode = 94085;
Without the corresponding index, the query plan would be:
QUERY PLAN
---------------------------------------------------------------------------------------
Seq Scan on public.census (actual time=2.153..2.159 rows=1 loops=1)
Output: id, name, age, zipcode, employed
Remote Filter: (((census.name)::text = 'Kevin'::text) AND (census.zipcode = 94085))
Storage Table Read Requests: 1
Storage Table Read Execution Time: 1.888 ms
Storage Table Rows Scanned: 45
All 45 rows in the table were scanned to retrieve one row. Now add an index for this query.
CREATE INDEX idx_name_zip on census(name, zipcode);
The query plan after the index is added would be something like:
QUERY PLAN
-----------------------------------------------------------------------------------
Index Scan using idx_name_zip on census (actual time=5.821..5.825 rows=1 loops=1)
Index Cond: (((name)::text = 'Kevin'::text) AND (zipcode = 94085))
Storage Table Read Requests: 1
Storage Table Read Execution Time: 0.720 ms
Storage Table Rows Scanned: 1
Storage Index Read Requests: 1
Storage Index Read Execution Time: 2.324 ms
Storage Index Rows Scanned: 1
Now only one row is scanned to retrieve one row.
Learn more
- Designing secondary indexes
- Benefits of Index-only scan
- Blog on Pushdown #3: Filtering using index predicates discusses the performance boost of distributed SQL queries using indexes.
- How To Design Distributed Indexes for Optimal Query Performance