Primary keys
The Primary Key constraint is a means to uniquely identify a specific row in a table via one or more columns. To define a primary key, you create a constraint that is, functionally, a unique index applied to the table columns.
When choosing and designing the primary key of a table, it's important to get the following characteristics right:
-
Uniqueness.
The primary key is a column or a set of columns that act as the unique identifier for the rows of the table. This is essential to identify a row uniquely across the different nodes in the cluster.
-
Data distribution.
In YugabyteDB, data is distributed based on the primary key. In hash sharding, the data is distributed based on the hash of the primary key. In range sharding, it is based on the actual value of the primary key.
-
Data ordering.
The table data is internally ordered based on the primary key of the table. In hash sharding, the data is ordered based on the hash of the Primary key. In range sharding, it is ordered on the actual value of the primary key.
Definition of the primary key
Primary keys can be defined when the table is defined using the PRIMARY KEY (columns)
clause. They can also be added after table creation using the ALTER TABLE statement, but this is not recommended as adding a primary key after data has been loaded could be an expensive operation to re-order and re-distribute the data.
In hash sharding the primary key definition has the following format:
PRIMARY KEY ((columns), columns)
-- [SHARDING] [CLUSTERING]
The first set of columns, typically referred to as sharding columns, is used for the distribution of the rows. The second set of columns, referred to as Clustering columns, defines the ordering of rows with the same sharding values.
In range sharding, the primary key has the following format:
PRIMARY KEY (columns)
-- [CLUSTERING]
The order of the keys matters a lot in range sharding. The data is distributed and ordered based on the first column, and for rows with the same first column, the rows are ordered on the second column, and so on.
Setup
The examples run on any YugabyteDB universe.
Set up a local cluster
If a local universe is currently running, first destroy it.
Start a local one-node universe with an RF of 1
by first creating a single node, as follows:
./bin/yugabyted start \
--advertise_address=127.0.0.1 \
--base_dir=${HOME}/var/node1 \
--cloud_location=aws.us-east-2.us-east-2a
After starting the yugabyted processes on all the nodes, configure the data placement constraint of the universe, as follows:
./bin/yugabyted configure data_placement --base_dir=${HOME}/var/node1 --fault_tolerance=zone
This command can be executed on any node where you already started YugabyteDB.
To check the status of a running multi-node universe, run the following command:
./bin/yugabyted status --base_dir=${HOME}/var/node1
Setup
To set up a universe, refer to Set up a YugabyteDB Anywhere universe.Setup
To set up a cluster, refer to Set up a YugabyteDB Aeon cluster.Create a census table as follows:
CREATE TABLE census(
id int,
name varchar(255),
age int,
zipcode int,
employed boolean,
PRIMARY KEY(id ASC)
);
Add some data to the table as follows.
INSERT INTO public.census ( id,name,age,zipcode,employed ) VALUES
(1,'Zachary',55,94085,True), (2,'James',56,94085,False), (3,'Kimberly',50,94084,False),
(4,'Edward',56,94085,True), (5,'Barry',56,94084,False), (6,'Tyler',45,94084,False),
(7,'Nancy',47,94085,False), (8,'Sarah',52,94084,True), (9,'Nancy',59,94084,False),
(10,'Diane',51,94083,False), (11,'Ashley',42,94083,False), (12,'Jacqueline',58,94085,False),
(13,'Benjamin',49,94084,False), (14,'James',48,94083,False), (15,'Ann',43,94083,False),
(16,'Aimee',47,94085,True), (17,'Michael',49,94085,False), (18,'Rebecca',40,94085,False),
(19,'Kevin',45,94085,True), (20,'James',45,94084,False), (21,'Sandra',60,94085,False),
(22,'Kathleen',40,94085,True), (23,'William',42,94084,False), (24,'James',42,94083,False),
(25,'Tyler',50,94085,False), (26,'James',49,94085,True), (27,'Kathleen',55,94083,True),
(28,'Zachary',55,94083,True), (29,'Rebecca',41,94085,True), (30,'Jacqueline',49,94085,False),
(31,'Diane',48,94083,False), (32,'Sarah',53,94085,True), (33,'Rebecca',55,94083,True),
(34,'William',47,94085,False), (35,'William',60,94085,True), (36,'Sarah',53,94085,False),
(37,'Ashley',47,94084,True), (38,'Ashley',54,94084,False), (39,'Benjamin',42,94083,False),
(40,'Tyler',47,94085,True), (41,'Michael',42,94084,False), (42,'Diane',50,94084,False),
(43,'Nancy',51,94085,False), (44,'Rebecca',56,94085,False), (45,'Tyler',41,94085,True);
Single column
Most commonly, the primary key is added to the table when the table is created, as demonstrated by the following syntax:
CREATE TABLE census(
...
id int,
PRIMARY KEY(id ASC)
...
);
It can also be specified along the definition of the column like so:
CREATE TABLE census(
...
id int PRIMARY KEY,
...
);
Multi-column key
Multiple columns can be grouped to be defined as the primary key as follows:
CREATE TABLE census(
id int,
name varchar(255),
...
PRIMARY KEY(id, name ASC)
);
Adding via a CONSTRAINT
You can define a primary key using the constraint clause as follows:
CONSTRAINT constraint_name PRIMARY KEY(column1, column2, ...);
CONSTRAINT census_pkey PRIMARY KEY(id);
Using ALTER TABLE
Although you should define the primary key along with the table definition, when needed you can use the ALTER TABLE statement to create a primary key on a table after the table is created or defined:
ALTER TABLE census ADD PRIMARY KEY (id);