A covering index is an index that includes all the columns required by a query, including columns that would typically not be a part of an index. This is done by using the INCLUDE keyword to list the columns you want to include.

A covering index is an efficient way to perform index-only scans, where you don't need to scan the table, just the index, to satisfy the query.

Setup

The examples run on any YugabyteDB universe.

Set up a local cluster

If a local universe is currently running, first destroy it.

Start a local one-node universe with an RF of 1 by first creating a single node, as follows:

./bin/yugabyted start \
                --advertise_address=127.0.0.1 \
                --base_dir=${HOME}/var/node1 \
                --cloud_location=aws.us-east-2.us-east-2a

After starting the yugabyted processes on all the nodes, configure the data placement constraint of the universe, as follows:

./bin/yugabyted configure data_placement --base_dir=${HOME}/var/node1 --fault_tolerance=zone

This command can be executed on any node where you already started YugabyteDB.

To check the status of a running multi-node universe, run the following command:

./bin/yugabyted status --base_dir=${HOME}/var/node1

Setup

To set up a universe, refer to Set up a YugabyteDB Anywhere universe.

Setup

To set up a cluster, refer to Set up a YugabyteDB Aeon cluster.

Index-Only scan

The following exercise demonstrates how to perform an index-only scan on an expression (functional) index, and further optimize the query performance using a covering index.

  1. Create and insert some rows into a table demo with two columns id and username.

    CREATE TABLE IF NOT EXISTS demo (id bigint, username text);
    
    INSERT INTO demo SELECT n,'Number'||to_hex(n) from generate_series(1,1000) n;
    
  2. Run a select query to fetch a row with a particular username.

    SELECT * FROM demo WHERE username='Number42';
    
     id | username
    ----+----------
     66 | Number42
     (1 row)
    
  3. Run another select query to show how a sequential scan runs before creating an index.

    EXPLAIN ANALYZE SELECT * FROM demo WHERE upper(username)='NUMBER42';
    
                                                      QUERY PLAN
    ------------------------------------------------------------------------------------------------------
     Seq Scan on demo  (cost=0.00..105.00 rows=1000 width=40) (actual time=15.279..15.880 rows=1 loops=1)
       Filter: (upper(username) = 'NUMBER42'::text)
       Rows Removed by Filter: 999
     Planning Time: 0.075 ms
     Execution Time: 15.968 ms
     Peak Memory Usage: 0 kB
    (6 rows)
    
  4. Optimize the SELECT query by creating an expression index as follows:

    CREATE INDEX demo_upper ON demo( (upper(username)) );
    
    EXPLAIN ANALYZE SELECT upper(username) FROM demo WHERE upper(username)='NUMBER42';
    
                                                           QUERY PLAN
    -------------------------------------------------------------------------------------------------------------------
     Index Scan using demo_upper on demo  (cost=0.00..5.28 rows=10 width=32) (actual time=1.939..1.942 rows=1 loops=1)
       Index Cond: (upper(username) = 'NUMBER42'::text)
     Planning Time: 7.289 ms
     Execution Time: 2.052 ms
     Peak Memory Usage: 8 kB
    (5 rows)
    

    Using an expression index enables faster access to the rows requested in the query. The problem is that the query planner just takes the expression, sees that there's an index on it, and knows that you'll select the username column and apply a function to it. It then thinks it needs the username column without realizing it already has the value with the function applied. In this case, an index-only scan covering the column to the index can optimize the query performance.

  5. Create a covering index by specifying the username column in the INCLUDE clause.

    For simplicity, the username column is used with the INCLUDE keyword to create the covering index. Generally, a covering index allows you to perform an index-only scan if the query select list matches the columns that are included in the index and the additional columns added using the INCLUDE keyword.

    Ideally, specify columns that are updated frequently in the INCLUDE clause. For other cases, it is probably faster to index all the key columns.

    CREATE INDEX demo_upper_covering ON demo( (upper(username))) INCLUDE (username);
    
    EXPLAIN ANALYZE SELECT upper(username) FROM demo WHERE upper(username)='NUMBER42';
    
                                                                   QUERY PLAN
    ---------------------------------------------------------------------------------------------------------------------
     Index Only Scan using demo_upper_covering on demo  (cost=0.00..5.18 rows=10 width=32) (actual time=1.650..1.653 rows=1     loops=1)
       Index Cond: ((upper(username)) = 'NUMBER42'::text)
       Heap Fetches: 0
     Planning Time: 5.258 ms
     Execution Time: 1.736 ms
     Peak Memory Usage: 8 kB
    (6 rows)
    

Learn more