postgresql-hll extension
The postgresql-hll extension adds a new data type hll, which is a HyperLogLog data structure. HyperLogLog is a fixed-size, set-like structure used for distinct value counting with tunable precision. For example, in 1280 bytes hll can estimate the count of tens of billions of distinct values with only a few percent error.
First, enable the extension:
CREATE EXTENSION "hll";
To run the helloworld example from the postgresql-hll repository, connect using ysqlsh and run the following:
CREATE TABLE helloworld (id integer, set hll);
CREATE TABLE
Insert an empty HLL as follows:
INSERT INTO helloworld(id, set) VALUES (1, hll_empty());
INSERT 0 1
Add a hashed integer to the HLL as follows:
UPDATE helloworld SET set = hll_add(set, hll_hash_integer(12345)) WHERE id = 1;
UPDATE 1
Add a hashed string to the HLL as follows:
UPDATE helloworld SET set = hll_add(set, hll_hash_text('hello world')) WHERE id = 1;
UPDATE 1
Get the cardinality of the HLL as follows:
SELECT hll_cardinality(set) FROM helloworld WHERE id = 1;
hll_cardinality
-----------------
2
(1 row)
For a more advanced example, see the Data Warehouse Use Case.