Transaction retries in YSQL

YugabyteDB returns different error codes for the various scenarios that go wrong during transaction processing. Applications need to be designed to handle these scenarios correctly to be highly available so that users aren't impacted. Although most errors are common across multiple isolation levels, some errors are very specific to certain transaction isolation levels.

The examples in the following sections illustrate failure scenarios and techniques you can use to handle these failures in your applications.

Prerequisites

Follow the setup instructions to start a single local YugabyteDB instance, and create a table as follows:

Create the table as follows:

CREATE TABLE txndemo (
  k int,
  V int,
  PRIMARY KEY(k)
);

Add some data to the table as follows:

INSERT INTO txndemo VALUES (1,10),(2,10),(3,10),(4,10),(5,10);

Automatic retries

YugabyteDB retries failed transactions automatically on the server side whenever possible without client intervention as per the concurrency control policies. This is the case even for single statements, which are implicitly considered transactions.

In some scenarios, a server-side retry is not suitable. For example, the retry limit has been reached or the transaction is not in a valid state. In these cases, it is the client's responsibility to retry the transaction at the application layer.

Client-side retry

Most transaction errors that happen due to conflicts and deadlocks can be restarted by the client.

Scenarios where retries should be avoided

Retries should only be performed when they are deemed safe, and should be avoided in the following situations:

For unfamiliar errors, such as internal errors, to prevent potential data corruption.
For commit or auto-commit statements, as it's unclear whether the failure occurred before or after the commit.

The following scenarios describe the causes for failures, and the required methods to be handled by the applications.

Execute the transaction in a try..catch block in a loop. When a re-tryable failure happens, issue ROLLBACK and then retry the transaction. To avoid overloading the server and ending up in an indefinite loop, wait for some time between retries and limit the number of retries. The following illustrates a typical client-side retry implementation:

max_attempts = 10   # max number of retries
sleep_time = 0.002  # 2 ms - base sleep time
backoff = 2         # exponential multiplier

attempt = 0
while attempt < max_attempts:
    attempt += 1
    try :
        cursor = cxn.cursor()
        cursor.execute("BEGIN");

        # Execute transaction statments here

        cursor.execute("COMMIT");
        break
    except psycopg2.errors.SerializationFailure as e:
        cursor.execute("ROLLBACK")
        if attempt < max_attempts:
            time.sleep(sleep_time)
            sleep_time *= backoff

If the COMMIT is successful, the program exits the loop. attempt < max_attempts limits the number of retries to max_attempts, and the amount of time the code waits before the next retry also increases with sleep_time *= backoff. Choose values as appropriate for your application.

40001 - SerializationFailure

SerializationFailure errors happen when multiple transactions are updating the same set of keys (conflict). During a conflict, certain transactions are retried.

ERROR:  could not serialize access due to concurrent update (...)

The correct way to handle this error is with a retry loop with exponential backoff, as described in Client-side retry. When the UPDATE or COMMIT fails because of SerializationFailure, the code retries after waiting for sleep_time seconds, up to max_attempts.

Read Committed

In read committed isolation level, as the server retries internally, the client does not need to worry about handling SerializationFailure. Only transactions operating in repeated read and serializable levels need to handle serialization failures.

Another way to handle these failures is would be to rollback to a checkpoint before the failed statement and proceed further as described in Savepoints.

40001 - Deadlock detected

This error occurs when two or more transactions wait on each other to form a deadlock cycle. One or more of the transactions in the cycle are aborted and they fail with the following error:

ERROR:  deadlock detected (...)

Retries to handle this error are similar to serialization errors (40001).

Savepoints

Savepoints are named checkpoints that can be used to rollback just a few statements, and then proceed with the transaction, rather than aborting the entire transaction when there is an error.

Consider the following example that inserts a row [k=1, v=30]:

connstr = 'postgresql://yugabyte@localhost:5433/yugabyte'
cxn = psycopg2.connect(connstr)
cursor = cxn.cursor()
try:
    cursor.execute("BEGIN")

    # ... Execute other statements

    cursor.execute("SAVEPOINT before_insert")
    try:
        # insert a row
        cursor.execute("INSERT INTO txndemo VALUES (1,30)")
    except psycopg2.errors.UniqueViolation as e:
        print(e)
        # k=1 already exists in our table
        cursor.execute("ROLLBACK TO SAVEPOINT before_insert")
        cursor.execute("UPDATE txndemo SET v=30 WHERE k=1;")

    # ... Execute other statements
    cursor.execute("COMMIT")
except Exception as e:
  print(e)
  cursor.execute("ROLLBACK")

If the row [k=1] already exists in the table, the INSERT would result in a UniqueViolation exception. Technically, the transaction would be in an error state and further statements would result in a 25P02: In failed SQL transaction error. You have to catch the exception and rollback. But instead of rolling back the entire transaction, you can rollback to the previously declared savepoint before_insert, and update the value of the row with k=1. Then you can continue with other statements in the transaction.

Non-retriable errors

Although most transactions can be retried in most error scenarios, there are cases where retrying a transaction will not resolve an issue. For example, errors can occur when statements are issued out of place. These statements have to be fixed in code to continue further.

25001 - Specify transaction isolation level

Transaction level isolation should be specified before the first statement of the transaction is executed. If not the following error occurs:

BEGIN;

BEGIN
Time: 0.797 ms

UPDATE txndemo SET v=20 WHERE k=1;

UPDATE 1
Time: 10.416 ms

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

ERROR:  25001: SET TRANSACTION ISOLATION LEVEL must be called before any query
Time: 3.808 ms

25006 - Modify a row in a read-only transaction

This error occurs when a row is modified after specifying a transaction to be READ ONLY as follows:

BEGIN READ ONLY;

BEGIN
Time: 1.095 ms

UPDATE txndemo SET v=20 WHERE k=1;

ERROR:  25006: cannot execute UPDATE in a read-only transaction
Time: 4.417 ms

25P02 - InFailedSqlTransaction

This error occurs when a statement is issued after there's already an error in a transaction. The error message would be similar to the following:

ERROR:  25P02: current transaction is aborted, commands ignored until end of transaction block

Consider the following scenario:

BEGIN;

BEGIN
Time: 0.393 ms

INVALID TXN STATEMENT;

ERROR:  42601: syntax error at or near "INVALID"
Time: 2.523 ms

SELECT * from txndemo where k=1;

ERROR:  25P02: current transaction is aborted, commands ignored until end of transaction block
Time: 17.074 ms

The only valid statements at this point would be ROLLBACK or COMMIT.

42XXX - Syntax errors

Error codes starting with 42 typically relate to issues with SQL syntax, invalid references, or access permissions. Retrying these errors will likely have no effect unless the respective issue is fixed.

For the list of errors in the class 42 category, see Syntax Error or Access Rule Violation

Learn more

Transaction isolation levels - Various isolation levels supported by YugabyteDB.
Concurrency control - Policies to handle conflicts between transactions.
Transaction priorities - Priority buckets for transactions.
Transaction options - Options supported by transactions.
Designing a Retry Mechanism for Resilient Spring Boot Applications