Create indexes and track the progress
YugabyteDB supports online index backfill, that allows you to build indexes on tables that already have data, without affecting other concurrent writes. YugabyteDB also supports the CREATE INDEX NONCONCURRENTLY statement to disable online index backfill.
Tracking index creation
The current state of an index backfill can be viewed by executing the pg_stat_progress_create_index
view to report the progress of the CREATE INDEX command execution. The view contains one row for each backend connection that is currently running a CREATE INDEX command, and the row entry is cleared after the completion of the command execution.
The pg_stat_progress_create_index
view can provide the following details:
- Number of rows processed during an index backfill.
- The current phase of the command is either
initializing
orbackfilling
. - Index progress report for all the different configurations of an index or index build such as non-concurrent index builds, GIN indexes, partial indexes, and include indexes.
Columns such as lockers_total
, lockers_done
, current_locker_pid
, blocks_total
, and blocks_done
in the pg_stat_progress_create_index
view do not apply to YugabyteDB and always have null values.
Setup
The examples run on any YugabyteDB universe.
Set up a local cluster
If a local universe is currently running, first destroy it.
Start a local one-node universe with an RF of 1
by first creating a single node, as follows:
./bin/yugabyted start \
--advertise_address=127.0.0.1 \
--base_dir=${HOME}/var/node1 \
--cloud_location=aws.us-east-2.us-east-2a
After starting the yugabyted processes on all the nodes, configure the data placement constraint of the universe, as follows:
./bin/yugabyted configure data_placement --base_dir=${HOME}/var/node1 --fault_tolerance=zone
This command can be executed on any node where you already started YugabyteDB.
To check the status of a running multi-node universe, run the following command:
./bin/yugabyted status --base_dir=${HOME}/var/node1
Setup
To set up a universe, refer to Set up a YugabyteDB Anywhere universe.Setup
To set up a cluster, refer to Set up a YugabyteDB Aeon cluster.-
From your local YugabyteDB installation directory, create an index on an existing table as follows:
CREATE TABLE test(id int);
-
Populate the table with some data.
INSERT INTO test(id) SELECT n FROM generate_series(1,1000000) AS n;
-
On a separate parallel YSQL connection on the same node, select from the view to see the progress of the command in a repeated fashion with
\watch 1
as follows:SELECT tbl.relname as tblname, idx.relname as indexname, command, phase, tuples_total, tuples_done FROM pg_stat_progress_create_index INNER JOIN pg_class as tbl on tbl.oid = relid INNER JOIN pg_class as idx on idx.oid = index_relid where tbl.relname = 'test'; \watch 1
Initially, you will see an empty output like this.
tblname | indexname | command | phase | tuples_total | tuples_done ---------+-----------+---------+-------+--------------+-------------
-
Now create an index on the
id
column.CREATE INDEX idx_id ON test(id);
-
On the other terminal, you will see the progress on the index creation first with the first phase,
initializing
as:tblname | indexname | command | phase | tuples_total | tuples_done ---------+-----------+---------------------------+--------------+--------------+------------- test | idx_id | CREATE INDEX CONCURRENTLY | initializing | 0 | 0
And then you will see the index backfilling happen as:
tblname | indexname | command | phase | tuples_total | tuples_done ---------+-----------+---------------------------+-------------+--------------+------------- test | idx_id | CREATE INDEX CONCURRENTLY | backfilling | 0 | 0
You will see the
tuples_done
count increasing as the backfilling progresses. When the backfilling is done, you will see thetuples_done
count to be updated correctly.tblname | indexname | command | phase | tuples_total | tuples_done ---------+-----------+---------------------------+-------------+--------------+------------- test | idx_id | CREATE INDEX CONCURRENTLY | backfilling | 0 | 1000000
Memory usage
Backfilling consumes some amount of memory. The memory consumption is directly proportional to the data size per-row, the number of write operations batched together, and the number of parallel backfills. You can view the approximate memory usage by executing the following SQL statement:
SELECT
indexname, tablet, now()-backend_start started, pg_size_pretty(allocated_mem_bytes) mem, pg_size_pretty(rss_mem_bytes) rss
FROM (
SELECT
allocated_mem_bytes, rss_mem_bytes, backend_start, query_start, query
,regexp_replace(query,'(BACKFILL INDEX) ([0-9]*) .* PARTITION (.*);','\2')::oid AS indexoid
,regexp_replace(query,'(BACKFILL INDEX) ([0-9]*) .* PARTITION (.*);','\3')::text AS tablet
FROM pg_stat_activity
WHERE query LIKE 'BACKFILL INDEX %'
) pg_stat_activity
LEFT OUTER JOIN ( SELECT
oid AS indexoid, relname AS indexname FROM pg_class
) pg_class
USING(indexoid);
This should give you an output similar to the following when an index is being backfilled.
indexname | tablet | started | mem | rss
-----------+---------+-----------------+-------+-------
idx_id | x'aaaa' | 00:00:04.895382 | 27 MB | 25 MB
Limitations
-
In YugabyteDB, the
pg_stat_progress_create_index
view is local; it only has entries for CREATE INDEX commands issued by local YSQL clients. -
In PostgreSQL,
tuples_done
andtuples_total
refer to the tuples of the index. However, in YugabyteDB, these fields refer to the tuples of the indexed table. This discrepancy is only observed for partial indexes, where the reported progress is less than the actual progress.tuples_total
is an estimate that is retrieved frompg_class.reltuples
. -
In YugabyteDB,
tuples_done
andtuples_total
are not displayed (set to null) for temporary indexes.