Explore Yugabyte SQL
After creating a local cluster, you can start exploring YugabyteDB's PostgreSQL-compatible, fully-relational Yugabyte SQL API.
Set up the sample database
The examples in this tutorial use two tables, representing departments and employees. To start, use ysqlsh to create a database and schema, and insert data into the tables.
Open the YSQL shell
Using the YugabyteDB SQL shell, ysqlsh, you can connect to your cluster and interact with it using distributed SQL. ysqlsh is installed with YugabyteDB and is located in the bin directory of the YugabyteDB home directory.
To open the YSQL shell, run ysqlsh.
$ ./bin/ysqlsh
ysqlsh (11.2-YB-{{<yb-version version="preview">}}-b0)
Type "help" for help.
yugabyte=#
To open the YSQL shell, run ysqlsh.
$ ./bin/ysqlsh
ysqlsh (11.2-YB-{{<yb-version version="preview">}}-b0)
Type "help" for help.
yugabyte=#
To open the YSQL shell, run ysqlsh.
$ docker exec -it yugabyte /home/yugabyte/bin/ysqlsh --echo-queries
ysqlsh (11.2-YB-{{<yb-version version="preview">}}-b0)
Type "help" for help.
yugabyte=#
To open the YSQL shell (ysqlsh), run the following.
$ kubectl --namespace yb-demo exec -it yb-tserver-0 -- sh -c "cd /home/yugabyte && ysqlsh -h yb-tserver-0 --echo-queries"
ysqlsh (11.2-YB-{{<yb-version version="preview">}}-b0)
Type "help" for help.
yugabyte=#
Create a database
To create a database (yb_demo
), do the following:
-
Enter the following
CREATE DATABASE
command:yugabyte=# CREATE DATABASE yb_demo;
-
Connect to the new database using the ysqlsh
\c
meta command:yugabyte=# \c yb_demo;
Create the schema
The database for this tutorial includes two tables: dept
for Departments, and emp
for Employees.
Create the database schema by running the following commands.
CREATE TABLE IF NOT EXISTS public.dept (
deptno integer NOT NULL,
dname text,
loc text,
description text,
CONSTRAINT pk_dept PRIMARY KEY (deptno asc)
);
CREATE TABLE IF NOT EXISTS emp (
empno integer generated by default as identity (start with 10000) NOT NULL,
ename text NOT NULL,
job text,
mgr integer,
hiredate date,
sal integer,
comm integer,
deptno integer NOT NULL,
email text,
other_info jsonb,
CONSTRAINT pk_emp PRIMARY KEY (empno hash),
CONSTRAINT emp_email_uk UNIQUE (email),
CONSTRAINT fk_deptno FOREIGN KEY (deptno) REFERENCES dept(deptno),
CONSTRAINT fk_mgr FOREIGN KEY (mgr) REFERENCES emp(empno),
CONSTRAINT emp_email_check CHECK ((email ~ '^[a-zA-Z0-9.!#$%&''*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$'::text))
);
The emp
table references the dept
table through a foreign key constraint. The emp
table also references itself through a foreign key constraint to ensure that an employee's manager is in turn an employee themselves.
The emp
table uses constraints to ensure integrity of data, such as uniqueness and validity of email addresses.
Load data
Insert rows into the tables using multi-value inserts to reduce client-server round trips.
-
Load data into the
dept
table by running the following command.INSERT INTO dept (deptno, dname, loc, description) values (10, 'ACCOUNTING', 'NEW YORK','preparation of financial statements, maintenance of general ledger, payment of bills, preparation of customer bills, payroll, and more.'), (20, 'RESEARCH', 'DALLAS','responsible for preparing the substance of a research report or security recommendation.'), (30, 'SALES', 'CHICAGO','division of a business that is responsible for selling products or services'), (40, 'OPERATIONS', 'BOSTON','administration of business practices to create the highest level of efficiency possible within an organization');
-
Load data into the
emp
table by running the following command.INSERT INTO emp (empno, ename, job, mgr, hiredate, sal, comm, deptno, email, other_info) values (7369, 'SMITH', 'CLERK', 7902, '1980-12-17', 800, NULL, 20,'SMITH@acme.com', '{"skills":["accounting"]}'), (7499, 'ALLEN', 'SALESMAN', 7698, '1981-02-20', 1600, 300, 30,'ALLEN@acme.com', null), (7521, 'WARD', 'SALESMAN', 7698, '1981-02-22', 1250, 500, 30,'WARD@compuserve.com', null), (7566, 'JONES', 'MANAGER', 7839, '1981-04-02', 2975, NULL, 20,'JONES@gmail.com', null), (7654, 'MARTIN', 'SALESMAN', 7698, '1981-09-28', 1250, 1400, 30,'MARTIN@acme.com', null), (7698, 'BLAKE', 'MANAGER', 7839, '1981-05-01', 2850, NULL, 30,'BLAKE@hotmail.com', null), (7782, 'CLARK', 'MANAGER', 7839, '1981-06-09', 2450, NULL, 10,'CLARK@acme.com', '{"skills":["C","C++","SQL"]}'), (7788, 'SCOTT', 'ANALYST', 7566, '1982-12-09', 3000, NULL, 20,'SCOTT@acme.com', '{"cat":"tiger"}'), (7839, 'KING', 'PRESIDENT', NULL, '1981-11-17', 5000, NULL, 10,'KING@aol.com', null), (7844, 'TURNER', 'SALESMAN', 7698, '1981-09-08', 1500, 0, 30,'TURNER@acme.com', null), (7876, 'ADAMS', 'CLERK', 7788, '1983-01-12', 1100, NULL, 20,'ADAMS@acme.org', null), (7900, 'JAMES', 'CLERK', 7698, '1981-12-03', 950, NULL, 30,'JAMES@acme.org', null), (7902, 'FORD', 'ANALYST', 7566, '1981-12-03', 3000, NULL, 20,'FORD@acme.com', '{"skills":["SQL","CQL"]}'), (7934, 'MILLER', 'CLERK', 7782, '1982-01-23', 1300, NULL, 10,'MILLER@acme.com', null);
You now have sample data and are ready to begin exploring YSQL in YugabyteDB.
Explore YugabyteDB
To display the schema of the emp
table, enter the following shell meta-command:
yb_demo=# \d emp
Table "public.emp"
Column | Type | Collation | Nullable | Default
------------+---------+-----------+----------+----------------------------------
empno | integer | | not null | generated by default as identity
ename | text | | not null |
job | text | | |
mgr | integer | | |
hiredate | date | | |
sal | integer | | |
comm | integer | | |
deptno | integer | | not null |
email | text | | |
other_info | jsonb | | |
Indexes:
"pk_emp" PRIMARY KEY, lsm (empno HASH)
"emp_email_uk" UNIQUE CONSTRAINT, lsm (email HASH)
Check constraints:
"emp_email_check" CHECK (email ~ '^[a-zA-Z0-9.!#$%&''*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$'::text)
Foreign-key constraints:
"fk_deptno" FOREIGN KEY (deptno) REFERENCES dept(deptno)
"fk_mgr" FOREIGN KEY (mgr) REFERENCES emp(empno)
Referenced by:
TABLE "emp" CONSTRAINT "fk_mgr" FOREIGN KEY (mgr) REFERENCES emp(empno)
SQL updates
The UPDATE statement can compute a new value and return it without the need to do another query. Using the RETURNING
clause returns the new values in the same call.
The following adds 100 to the salaries of all employees who are not managers and shows the new value:
UPDATE emp SET sal=sal+100
WHERE job != 'MANAGER'
RETURNING ename,sal AS new_salary;
ename | new_salary
--------+------------
SMITH | 900
ADAMS | 1200
WARD | 1350
KING | 5100
FORD | 3100
MARTIN | 1350
JAMES | 1050
ALLEN | 1700
MILLER | 1400
SCOTT | 3100
TURNER | 1600
(11 rows)
Join
A self-join is a regular join where the table is joined with itself. The following statement matches employees with their manager and filters those that are earning more than their manager.
SELECT
employee.ename,
employee.sal,
manager.ename AS "manager ename",
manager.sal AS "manager sal"
FROM
emp employee
JOIN emp manager ON
employee.mgr = manager.empno
WHERE
manager.sal<employee.sal
ORDER BY employee.sal;
ename | sal | manager ename | manager sal
-------+------+---------------+-------------
FORD | 3100 | JONES | 2975
SCOTT | 3100 | JONES | 2975
(2 rows)
Prepared statements
Use a prepared statement with typed input to prevent SQL injection. A prepared statement declares parameterized SQL.
-
Prepare the statement
employee_salary
with a parameterized query. The following prepared statement accepts the input of an employee number as an integer only and displays the name and salary:prepare employee_salary(int) AS SELECT ename,sal FROM emp WHERE empno=$1;
PREPARE
-
Use
EXECUTE
to execute a prepared statement. The following executes the prepared statement for the employee ID 7900:EXECUTE employee_salary(7900);
ename | sal -------+------ JAMES | 1050 (1 row)
-
Execute the same prepared statement with another value:
EXECUTE employee_salary(7902);
ename | sal -------+------ FORD | 3100 (1 row)
-
A prepared statement stays in the session until it is de-allocated. The following frees the memory used by this statement:
DEALLOCATE employee_salary;
DEALLOCATE
Indexes
Use indexes to query table values more efficiently.
-
Create a table with randomly generated rows. You can use the
generate_series()
function to generate rows. The following usesgenerate_series()
to create a table with 42 rows and a random value from 1 to 10:CREATE TABLE demo AS SELECT generate_series(1,42) num, round(10*random()) val;
SELECT 42
-
Create the index
demo_val
on thedemo
table. The following statement creates an index onval
(hashed for distribution) andnum
in ascending order:CREATE INDEX demo_val ON demo(val,num);
CREATE INDEX
-
Use
ANALYZE
to gather optimizer statistics on the table. The query planner chooses the best access path when provided with statistics about the data stored in the table:analyze demo;
ANALYZE
-
Query the Top 3 numbers for a specific value:
SELECT * FROM demo WHERE val=5 ORDER BY num FETCH FIRST 3 ROWS only;
num | val -----+----- 11 | 5 35 | 5 (2 rows)
-
Verify that index is leading to faster query execution using
EXPLAIN ANALYZE
. When defining an index for a specific access pattern, verify that the index is used. The following shows that an Index Only Scan was used, without the need for an additional Sort operation:EXPLAIN ANALYZE SELECT * FROM demo WHERE val=5 ORDER BY num FETCH FIRST 3 ROWS only;
QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- -- Limit (cost=0.00..3.19 rows=3 width=12) (actual time=1.757..1.765 rows=3 loops=1) -> Index Only Scan using demo_val on demo (cost=0.00..4.47 rows=4 width=12) (actual time=1.754..1.758 rows=3 loops=1) Index Cond: (val = '5'::double precision) Heap Fetches: 0 Planning Time: 0.214 ms Execution Time: 1.860 ms Peak Memory Usage: 8 kB (7 rows)
-
Clean up the table for this exercise:
DROP TABLE IF EXISTS demo;
DROP TABLE
Recursive queries
The following example uses a recursive common table expression (CTE) to show the manager hierarchy. The emp_manager
CTE is built using the WITH RECURSIVE
clause to follow the hierarchy under JONES, down to the last level. The first subquery in the recursive clause starts at JONES. The second lists the employees who have JONES as a manager. They are declared with a UNION ALL
and are executed recursively to get the other levels. The main query is then run on the CTE.
WITH RECURSIVE emp_manager AS (
SELECT empno, ename, ename AS PATH
FROM emp WHERE ename='JONES'
UNION ALL
SELECT emp.empno, emp.ename, emp_manager.path || ' manages ' || emp.ename
FROM emp JOIN emp_manager ON emp.mgr = emp_manager.empno
)
SELECT * FROM emp_manager;
empno | ename | path
-------+-------+-----------------------------------
7566 | JONES | JONES
7788 | SCOTT | JONES manages SCOTT
7902 | FORD | JONES manages FORD
7876 | ADAMS | JONES manages SCOTT manages ADAMS
7369 | SMITH | JONES manages FORD manages SMITH
(5 rows)
Window functions
Use analytic window functions to compare the hiring time interval by department.
The following SQL statement uses WINDOW
to define groups of employees by department, ordered by hiring date. The LAG window function is used to access the previous row to compare the hiring date interval between two employees. FORMAT
builds text from column values, and COALESCE
handles the first hire for which there is no previous row in the group. Without these window functions, this query would need to read the table twice.
SELECT
dname,ename,job,
coalesce (
'hired '||to_char(hiredate -
lag(hiredate) over (per_dept_hiredate),'999')||' days after '||
lag(ename) over (per_dept_hiredate),
format('(1st hire in %L)',dname)
) AS "last hire in dept"
FROM emp JOIN dept USING(deptno)
WINDOW per_dept_hiredate
AS (PARTITION BY dname ORDER BY hiredate)
ORDER BY dname,hiredate;
dname | ename | job | last hire in dept
------------+--------+-----------+------------------------------
ACCOUNTING | CLARK | MANAGER | (1st hire in 'ACCOUNTING')
ACCOUNTING | KING | PRESIDENT | hired 161 days after CLARK
ACCOUNTING | MILLER | CLERK | hired 67 days after KING
RESEARCH | SMITH | CLERK | (1st hire in 'RESEARCH')
RESEARCH | JONES | MANAGER | hired 106 days after SMITH
RESEARCH | FORD | ANALYST | hired 245 days after JONES
RESEARCH | SCOTT | ANALYST | hired 371 days after FORD
RESEARCH | ADAMS | CLERK | hired 34 days after SCOTT
SALES | ALLEN | SALESMAN | (1st hire in 'SALES')
SALES | WARD | SALESMAN | hired 2 days after ALLEN
SALES | BLAKE | MANAGER | hired 68 days after WARD
SALES | TURNER | SALESMAN | hired 130 days after BLAKE
SALES | MARTIN | SALESMAN | hired 20 days after TURNER
SALES | JAMES | CLERK | hired 66 days after MARTIN
(14 rows)
REGEXP matching
Use regular expressions in an array to do pattern matching. REGEXP performs a pattern match of a string expression. The following lists employees with an e-mail ending in '.org' or a domain starting with 'gmail.':
SELECT * FROM emp
WHERE email ~ any ( ARRAY[ '@.*\.org$' , '@gmail\.' ] );
empno | ename | job | mgr | hiredate | sal | comm | deptno | email | other_info
-------+-------+---------+------+------------+------+------+--------+-----------------+------------
7876 | ADAMS | CLERK | 7788 | 1983-01-12 | 1100 | | 20 | ADAMS@acme.org |
7566 | JONES | MANAGER | 7839 | 1981-04-02 | 2975 | | 20 | JONES@gmail.com |
7900 | JAMES | CLERK | 7698 | 1981-12-03 | 950 | | 30 | JAMES@acme.org |
(3 rows)
Arithmetic date intervals
Using arithmetic on date intervals, you can find employees with overlapping evaluation periods.
The interval data type allows you to store and manipulate a period of time in years, months, days, and so forth. The following example compares overlapping evaluation periods. The WITH
clause defines the evaluation period length depending on the job:
WITH emp_evaluation_period AS (
SELECT ename,deptno,hiredate,
hiredate + CASE WHEN job IN ('MANAGER','PRESIDENT')
THEN interval '3 month' ELSE interval '4 weeks'
END evaluation_end FROM emp
)
SELECT * FROM emp_evaluation_period e1
JOIN emp_evaluation_period e2
ON (e1.ename>e2.ename) AND (e1.deptno=e2.deptno)
WHERE (e1.hiredate,e1.evaluation_end)
overlaps (e2.hiredate,e2.evaluation_end);
ename | deptno | hiredate | evaluation_end | ename | deptno | hiredate | evaluation_end
--------+--------+------------+---------------------+--------+--------+------------+---------------------
MILLER | 10 | 1982-01-23 | 1982-02-20 00:00:00 | KING | 10 | 1981-11-17 | 1982-02-17 00:00:00
TURNER | 30 | 1981-09-08 | 1981-10-06 00:00:00 | MARTIN | 30 | 1981-09-28 | 1981-10-26 00:00:00
WARD | 30 | 1981-02-22 | 1981-03-22 00:00:00 | ALLEN | 30 | 1981-02-20 | 1981-03-20 00:00:00
(3 rows)
Cross table pivots
Use a cross table to show the sum of salary per job, by department. The shell \crosstabview meta-command displays rows as columns. The following statement sums the salaries across jobs and departments and displays them as a cross table:
SELECT job, dname, sum(sal)
FROM emp JOIN dept USING(deptno)
GROUP BY dname, job
\crosstabview
job | ACCOUNTING | SALES | RESEARCH
-----------+------------+-------+----------
PRESIDENT | 5000 | |
CLERK | 1300 | 950 | 1900
SALESMAN | | 5600 |
MANAGER | 2450 | 2850 | 2975
ANALYST | | | 6000
(5 rows)
ntile function
To send the e-mails to all employees in different batches, split them into three groups using the ntile() window function. Then format them using the format()
function, and aggregate them in a comma-separated list using the string_agg()
function:
WITH groups AS (
SELECT ntile(3) OVER (ORDER BY empno) group_num
,*
FROM emp
)
SELECT string_agg(format('<%s> %s',ename,email),', ')
FROM groups GROUP BY group_num;
string_agg
-------------------------------------------------------------------------------------------------------------------------------
<ADAMS> ADAMS@acme.org, <JAMES> JAMES@acme.org, <FORD> FORD@acme.com, <MILLER> MILLER@acme.com
<BLAKE> BLAKE@hotmail.com, <CLARK> CLARK@acme.com, <SCOTT> SCOTT@acme.com, <KING> KING@aol.com, <TURNER> TURNER@acme.com
<SMITH> SMITH@acme.com, <ALLEN> ALLEN@acme.com, <WARD> WARD@compuserve.com, <JONES> JONES@gmail.com, <MARTIN> MARTIN@acme.com
(3 rows)
GIN index on documents
The employee skills are stored in a semi-structured JSON document. You can query them using the @>
, ?
, ?&
, and ?|
operators. For best performance, index them using a GIN index. GIN indexes provide quick access to elements inside a JSON document.
(GIN indexes are only available in YugabyteDB v2.11.0 or later. If you are using an earlier version, skip this scenario.)
-
Create the GIN index on the JSON document.
CREATE INDEX emp_skills ON emp USING gin((other_info->'skills'));
This creates an index on the
skills
attributes in theother_info
JSON column. -
Query on the JSON attribute list. SQL queries can navigate into the JSON document using
->
and check if an array contains a value using@>
. The following searches the employees with the "SQL" skill.SELECT * FROM emp WHERE other_info->'skills' @> '"SQL"';
-
Explain the plan to verify that the index is used.
explain SELECT * FROM emp WHERE other_info->'skills' @> '"SQL"';
Thanks to the GIN index, this search doesn't need to read all documents.
Text search
SQL queries can search in text using the to_tsvector()
text search function to extract a list of words that can be compared. This exercise finds all department descriptions with the words 'responsible' and 'services' in it using a GIN index.
(GIN indexes are only available in YugabyteDB v2.11.0 or later. If you are using an earlier version, skip this scenario.)
-
Create a text search index on the description column. The following creates an index for the simple-grammar vector of words extracted from the department description:
CREATE INDEX dept_description_text_search ON dept USING gin (( to_tsvector('simple',description) ));
-
Query on description for matching words. The following compares the simple-grammar vector of words extracted from the department description with a word search pattern to find the departments that contain "responsible" and "service" in their description.
SELECT * FROM dept WHERE to_tsvector('simple',description) @@ to_tsquery('simple','responsible & services');
-
Explain the plan to verify that the index is used.
explain SELECT * FROM dept WHERE to_tsvector('simple',description) @@ to_tsquery('simple','responsible & services');
Thanks to the GIN index, this search doesn't need to read all rows and text.
Stored procedures
A stored procedure encapsulates procedural logic into an atomic operation. Use stored procedures to encapsulate transactions with error handling. The following example creates a procedure in PL/pgSQL, named "commission_transfer", that transfers a commission "amount" from empno1
to empno2
.
-
Create the procedure for the commission transfer between employees. The procedure has two SQL operations: decrease from
empno1
and add toempno2
. It also adds error checking to raise a custom exception ifempno1
doesn't have sufficient funds to transfer.CREATE OR REPLACE PROCEDURE commission_transfer(empno1 int, empno2 int, amount int) AS $$ begin update emp set comm=comm-commission_transfer.amount where empno=commission_transfer.empno1 and comm>commission_transfer.amount; if not found then raise exception 'Cannot transfer % from %',amount,empno1; end if; update emp set comm=comm+commission_transfer.amount where emp.empno=commission_transfer.empno2; if not found then raise exception 'Cannot transfer from %',empno2; end if; end; $$ language plpgsql;
CREATE PROCEDURE
-
Call the procedure with employee IDs and the amount to be transferred. The following CALL statement calls the stored procedure, with values for all parameters, transferring 100 from employee 7521 to employee 7654.
CALL commission_transfer(7521,7654,100);
CALL
-
List all employees who have received commission to verify the transfer:
SELECT * FROM emp WHERE comm IS NOT NULL;
empno | ename | job | mgr | hiredate | sal | comm | deptno | email | other_info -------+--------+----------+------+------------+------+------+--------+---------------------+------------ 7521 | WARD | SALESMAN | 7698 | 1981-02-22 | 1350 | 400 | 30 | WARD@compuserve.com | 7654 | MARTIN | SALESMAN | 7698 | 1981-09-28 | 1350 | 1500 | 30 | MARTIN@acme.com | 7499 | ALLEN | SALESMAN | 7698 | 1981-02-20 | 1700 | 300 | 30 | ALLEN@acme.com | 7844 | TURNER | SALESMAN | 7698 | 1981-09-08 | 1600 | 0 | 30 | TURNER@acme.com | (4 rows)
-
Call the procedure with an amount that is not available. The following attempts to transfer 1000000, which is more than what 7521 has available:
CALL commission_transfer(7521,7654,999999);
ERROR: Cannot transfer 999999 from 7521 CONTEXT: PL/pgSQL function commission_transfer(integer,integer,integer) line 5 at RAISE
This raises the "Cannot transfer" error defined in the procedure, and automatically reverts all intermediate changes to return to a consistent state.
Triggers
Use triggers to automatically update data. This example uses a trigger to record the last time each row is updated automatically.
-
Add a column to store the last update time.
ALTER TABLE dept ADD last_update timestamptz;
ALTER TABLE
-
Add a function to set the last update time. The following query uses the
transaction_timestamp()
function, which returns the current date and time at the start of the current transaction. A stored procedure declares some procedural logic that returns a value. In this case,dept_last_update()
returns the "new" state for a trigger after setting the "last_update" to the current time.CREATE OR REPLACE FUNCTION dept_last_update() RETURNS trigger AS $$ begin new.last_update:=transaction_timestamp(); return new; end; $$ language plpgsql;
CREATE FUNCTION
-
Use a trigger to call the procedure automatically. The following statement creates a trigger to execute the procedure on each row update of the
dept
table.CREATE trigger dept_last_update BEFORE UPDATE ON dept for each row EXECUTE PROCEDURE dept_last_update();
CREATE TRIGGER
-
Display the current state of the table. To verify the automatic logging of the last update time, the following displays the current state of departments before any update.
SELECT deptno,dname,loc,last_update FROM dept;
deptno | dname | loc | last_update --------+------------+----------+------------- 10 | ACCOUNTING | NEW YORK | 20 | RESEARCH | DALLAS | 30 | SALES | CHICAGO | 40 | OPERATIONS | BOSTON | (4 rows)
-
Update multiple rows in a single transaction. You can declare multiple updates in a single atomic transaction using
BEGIN TRANSACTION
andCOMMIT
. The following updates the location of departments 30 and 40 with a 3 second interval.BEGIN TRANSACTION; UPDATE dept SET loc='SUNNYVALE' WHERE deptno=30; SELECT pg_sleep(3); UPDATE dept SET loc='SUNNYVALE' WHERE deptno=40; COMMIT;
BEGIN UPDATE 1 pg_sleep ---------- (1 row) UPDATE 1 COMMIT
-
Display the new state of the table.
SELECT deptno,dname,loc,last_update FROM dept;
deptno | dname | loc | last_update --------+------------+-----------+------------------------------- 10 | ACCOUNTING | NEW YORK | 20 | RESEARCH | DALLAS | 30 | SALES | SUNNYVALE | 2022-01-11 22:15:34.831474+00 40 | OPERATIONS | SUNNYVALE | 2022-01-11 22:15:34.831474+00 (4 rows)
In addition to the changed location, the last update timestamp has been automatically set. Although the updates were done at 3 second intervals, they show the same update time because they were run in the same atomic transaction.
Materialized views
To get fast on-demand reports, create a materialized view to store pre-joined and pre-aggregated data.
(Materialized views are only available in YugabyteDB v2.11.2 or later. If you are using an earlier version, skip this scenario.)
-
Create the materialized view. This view stores the total salary per department, the number of employees, and the list of jobs in the department:
CREATE MATERIALIZED VIEW report_sal_per_dept AS SELECT deptno,dname, sum(sal) sal_per_dept, count(*) num_of_employees, string_agg(distinct job,', ') distinct_jobs FROM dept join emp using(deptno) GROUP BY deptno,dname ORDER BY deptno;
SELECT 3
-
Create an index on the view. This allows fast queries on a range of total salary:
CREATE INDEX report_sal_per_dept_sal ON report_sal_per_dept(sal_per_dept desc);
CREATE INDEX
-
You can schedule a daily refresh to recompute the view in the background. Use the REFRESH MATERIALIZED VIEW command to refresh the view:
REFRESH MATERIALIZED VIEW report_sal_per_dept;
REFRESH MATERIALIZED VIEW
-
Query the materialized view to show the data is consistent as of the last refresh. This lists the departments with a total salary lower than 10000:
SELECT * FROM report_sal_per_dept WHERE sal_per_dept<=10000 ORDER BY sal_per_dept;
deptno | dname | sal_per_dept | num_of_employees | distinct_jobs --------+------------+--------------+------------------+--------------------------- 10 | ACCOUNTING | 8750 | 3 | CLERK, MANAGER, PRESIDENT 30 | SALES | 9400 | 6 | CLERK, MANAGER, SALESMAN (2 rows)
-
The execution plan shows that no additional join or group by is needed when querying the materialized view:
EXPLAIN ANALYZE SELECT * FROM report_sal_per_dept WHERE sal_per_dept<=10000 ORDER BY sal_per_dept;
QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------- Index Scan Backward using report_sal_per_dept_sal on report_sal_per_dept (cost=0.00..5.33 rows=10 width=84) (actual time=1.814..1.821 rows=2 loops=1) Index Cond: (sal_per_dept <= 10000) Planning Time: 0.143 ms Execution Time: 1.917 ms Peak Memory Usage: 8 kB (4 rows)