CDC using PostgreSQL replication protocol EARLY ACCESS
Overview
YugabyteDB CDC captures changes made to data in the database and streams those changes to external processes, applications, or other databases. CDC lets you track and propagate changes in a YugabyteDB database to downstream consumers based on its Write-Ahead Log (WAL). It captures row-level changes resulting from INSERT, UPDATE, and DELETE operations in the configured database and publishes them for consumption by downstream applications.
Highlights
Resilience
YugabyteDB CDC with PostgreSQL Logical Replication provides resilience as follows:
- Following a failure of the application, server, or network, replication can continue from any of the available server nodes.
- Replication continues from the transaction immediately after the transaction that was last acknowledged by the application. No transactions are missed by the application.
Security
Because YugabyteDB uses the PostgreSQL Logical Replication model, the following applies:
- The CDC user persona is a PostgreSQL replication client.
- A standard replication connection is used for consumption, so all the server-side configurations for authentication, authorization, SSL modes, and connection load balancing apply automatically.
Guarantees
CDC in YugabyteDB provides the following guarantees.
GUARANTEE | DESCRIPTION |
---|---|
Per-slot ordered delivery guarantee | Changes from transactions from all the tables that are part of the replication slot's publication are received in the order they were committed. This also implies ordered delivery across all the tablets that are part of the publication's table list. |
At least once delivery | Changes from transactions are streamed at least once. Changes from transactions may be streamed again in case of restart after failure. For example, this can happen in the case of a Kafka Connect node failure. If the Kafka Connect node pushes the records to Kafka and crashes before committing the offset, it will again get the same set of records upon restart. |
No gaps in change stream | Receiving changes that are part of a transaction with commit time t implies that you have already received changes from all transactions with commit time lower than t. Thus, receiving any change for a row with commit timestamp t implies that you have received all older changes for that row. |
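Because delivery is at least once, a consumer may see the same change again after a restart. The following is a minimal sketch of one way a downstream application could tolerate such replays. It assumes each change carries a per-slot, monotonically increasing position; the `Change` and `IdempotentSink` types and the `seq` field are hypothetical and are not part of YugabyteDB or the connector.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Change:
    """One row-level change as a consumer might see it (hypothetical shape)."""
    seq: int    # stand-in for a per-slot, monotonically increasing position
    key: str    # primary key of the changed row
    value: str  # new value carried by the change


@dataclass
class IdempotentSink:
    """Applies each change once, even if the stream replays it."""
    rows: dict = field(default_factory=dict)
    last_applied: int = -1  # highest position already applied

    def apply(self, change: Change) -> bool:
        # Ordered delivery with no gaps means anything at or below
        # last_applied was already processed, so it is a replay.
        if change.seq <= self.last_applied:
            return False
        self.rows[change.key] = change.value
        self.last_applied = change.seq
        return True


# First run delivers two changes; after a simulated restart the stream
# replays the second change before continuing with a new one.
first_run = [Change(1, "a", "x"), Change(2, "b", "y")]
after_restart = [Change(2, "b", "y"), Change(3, "a", "z")]

sink = IdempotentSink()
applied = [sink.apply(c) for c in first_run + after_restart]
print(applied)
print(sink.rows)
```

In a real deployment the position would be persisted together with the applied data, so that the check survives a consumer restart.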
Key concepts
The YugabyteDB logical replication feature makes use of PostgreSQL concepts like replication slot, publication, replica identity, and so on. Understanding these key concepts is crucial for setting up and managing a logical replication environment effectively.
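To make these concepts concrete, the sketch below builds the standard PostgreSQL statements that set a replica identity, create a publication, and create a logical replication slot. The table, publication, and slot names are hypothetical, and the `pgoutput` plugin name is an assumption; check which output plugins and replica identity modes your YugabyteDB version supports before running the statements.

```python
import re
from typing import List

# Conservative check so that only plain (optionally schema-qualified) names are accepted.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def setup_statements(table: str, publication: str, slot: str,
                     plugin: str = "pgoutput") -> List[str]:
    """Return SQL for replica identity, publication, and slot, in that order."""
    for name in (table, publication, slot, plugin):
        if not _IDENT.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    return [
        # Replica identity controls how much of the old row a change carries.
        f"ALTER TABLE {table} REPLICA IDENTITY FULL;",
        # A publication names the set of tables whose changes are streamed.
        f"CREATE PUBLICATION {publication} FOR TABLE {table};",
        # A replication slot tracks how far one consumer has read.
        f"SELECT * FROM pg_create_logical_replication_slot('{slot}', '{plugin}');",
    ]


# Hypothetical names, used for illustration only.
statements = setup_statements("public.orders", "orders_pub", "orders_slot")
for stmt in statements:
    print(stmt)
```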
Getting started
Get started with YugabyteDB logical replication using the YugabyteDB Connector.
Monitoring
You can monitor the activities and status of the deployed connectors using the HTTP endpoints provided by YugabyteDB.
YugabyteDB Connector
To capture changes in YugabyteDB and stream them to an external system, you need a connector that can read the changes and stream them out. For this, you can use the YugabyteDB Connector, which is based on the Debezium platform. The connector is deployed as a set of Kafka Connect-compatible connectors, so you first need to define a YugabyteDB connector configuration and then start the connector by adding it to Kafka Connect.
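As a rough illustration of what such a configuration might look like, the sketch below assembles a payload in the shape the Kafka Connect REST API accepts when a connector is added (a `name` plus a `config` map). The property names follow the Debezium PostgreSQL connector conventions. The connector class, plugin name, host, credentials, and object names are assumptions for illustration; confirm them against the connector documentation for your version.

```python
import json
from typing import Dict, List


def connector_payload(name: str, host: str, port: int, user: str, password: str,
                      dbname: str, slot: str, publication: str,
                      tables: List[str]) -> Dict[str, object]:
    """Build a Kafka Connect registration payload for a hypothetical setup."""
    config = {
        # Assumption: class name of the YugabyteDB Connector; verify before use.
        "connector.class": "io.debezium.connector.postgresql.YugabyteDBConnector",
        "database.hostname": host,
        "database.port": str(port),
        "database.user": user,
        "database.password": password,
        "database.dbname": dbname,
        "topic.prefix": name,
        "plugin.name": "pgoutput",          # assumption: output plugin in use
        "slot.name": slot,                  # replication slot to consume from
        "publication.name": publication,    # publication listing the tables
        "table.include.list": ",".join(tables),
    }
    return {"name": name, "config": config}


# Hypothetical values; 5433 is the usual YSQL port.
payload = connector_payload(
    name="orders_cdc", host="127.0.0.1", port=5433,
    user="yugabyte", password="change-me", dbname="yugabyte",
    slot="orders_slot", publication="orders_pub",
    tables=["public.orders"],
)
body = json.dumps(payload, indent=2)
print(body)
```

The resulting JSON would typically be sent as the body of a `POST` request to the `/connectors` endpoint of a Kafka Connect worker.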
Limitations
- LSN comparisons across slots.
  In YugabyteDB, the LSN does not represent the byte offset of a WAL record. Hence, arithmetic on LSNs, and any other use of the LSN that relies on this assumption, will not work. Also, comparing LSN values from messages coming from different replication slots is currently not supported.
- The following functions are currently unsupported:
  - pg_current_wal_lsn
  - pg_wal_lsn_diff
  - IDENTIFY_SYSTEM
  - txid_current
  - pg_stat_replication

  Additionally, the functions that pull changes, rather than having the server stream them, are also unsupported. They are described in Replication Functions in the PostgreSQL documentation.
- Restriction on DDLs.
  DDL operations should not be performed from the time the replication slot is created until snapshot consumption of the last table has started.
- The table you want to stream changes from must have a primary key.
- CDC is not supported on a target table for xCluster replication (issue 11829).
- Currently, CDC doesn't support schema evolution for changes that require table rewrites (for example, ALTER TYPE), or DROP TABLE and TRUNCATE TABLE operations.
- YCQL tables aren't currently supported (issue 11320).
- Support for point-in-time recovery (PITR) is tracked in issue 10938.
- Support for transaction savepoints is tracked in issue 10936.
- Support for enabling CDC on Read Replicas is tracked in issue 11116.
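The LSN limitation in the list above suggests treating an LSN as an opaque, slot-scoped position in client code. The sketch below is one hypothetical way to enforce that: it allows ordering checks only between LSNs from the same slot and offers no subtraction at all. The `SlotLsn` type is an illustration, not part of any YugabyteDB API, and it assumes that ordering within a single slot is meaningful, which the limitation implies but does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotLsn:
    """An LSN tagged with the replication slot it came from."""
    slot: str
    text: str  # LSN in PostgreSQL's textual "hi/lo" hexadecimal form

    def _ordinal(self) -> int:
        # Used only for ordering; the number is not a byte offset,
        # so no difference or distance is ever computed from it.
        hi, lo = self.text.split("/")
        return (int(hi, 16) << 32) | int(lo, 16)

    def is_after(self, other: "SlotLsn") -> bool:
        if self.slot != other.slot:
            raise ValueError("LSNs from different replication slots are not comparable")
        return self._ordinal() > other._ordinal()


# Hypothetical slot names and LSN values.
a = SlotLsn("orders_slot", "0/16B3748")
b = SlotLsn("orders_slot", "0/16B3750")
c = SlotLsn("audit_slot", "0/16B3750")

same_slot_result = b.is_after(a)
try:
    c.is_after(a)
    cross_slot_error = None
except ValueError as exc:
    cross_slot_error = str(exc)

print(same_slot_result)
print(cross_slot_error)
```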