Tags: MironAtHome/yugabyte-db
[PLAT-17905]: Prevent non-restart upgrades when universe nodes are in-transit.
Summary: Add a check to prevent non-restart upgrades when universe nodes are in-transit.
Test Plan: unit test
Reviewers: anijhawan
Reviewed By: anijhawan
Subscribers: yugaware
Differential Revision: https://phorge.dev.yugabyte.com/D45598
[BACKPORT 2.20][PLAT-17905]: Prevent non-restart upgrades when universe nodes are in-transit.
Summary: Add a check to prevent non-restart upgrades when universe nodes are in-transit.
Original diff/commit: 8c752d8 / D45598
Test Plan: unit test
Reviewers: anijhawan, dkumar
Reviewed By: dkumar
Subscribers: yugaware
Differential Revision: https://phorge.dev.yugabyte.com/D45628
[BACKPORT 2025.1.0][PLAT-18178]: Fix K8s software upgrade rollback after catalog upgrade failure
Summary:
**Problem**: During a Kubernetes software upgrade rollback involving PostgreSQL major version upgrades (e.g., 2024.2 → 2025.1), a critical issue occurs if the catalog upgrade phase fails during the software upgrade. In this scenario, only master nodes are upgraded to the target version (2025.1) while all tservers remain on the previous version (2024.2). When attempting to update PostgreSQL compatibility flags during the rollback process, we face a dilemma:
- We cannot set the compatibility flag to version 2024.2, because rolling back the catalog upgrade requires the masters to be on 2025.1
- We cannot set the compatibility flag to version 2025.1, because the catalog upgrade was never completed
This creates a deadlock situation where the gflags upgrade task cannot proceed safely.
**Root Cause**: The issue is specific to Kubernetes universes because both gflags upgrades and software upgrades are performed through Helm upgrades, which require specifying the software version during gflags operations. This differs from VM deployments, where controlled flag upgrades can be performed in mixed software mode.
**Solution**: Implemented a tracking mechanism to record whether all tservers were successfully upgraded during the software upgrade. This information is captured in prevYBSoftwareConfig and used during rollback to decide whether the PostgreSQL compatibility flags should be updated:
- If all tservers were upgraded, we can safely perform the gflags upgrade during rollback
- If not all tservers were upgraded, we skip the gflags update, since the flags are already set from the failed software upgrade task
Original diff/commit: f5fdf87 / D45616
Test Plan: Tested manually by making a K8s software upgrade fail during catalog upgrade and rolling back the software version successfully. Verified that the rollback also works when the software upgrade has completed.
Reviewers: anijhawan, hsunder, anabaria
Reviewed By: anabaria
Subscribers: yash.priyam, yugaware
Differential Revision: https://phorge.dev.yugabyte.com/D45622
[yugabyte#28084] YSQL: [pg15 upgrade] Use RPC bind address on master
Summary: Use the RPC bind address on the master just as we do on the tserver. This is needed for Kubernetes-like deployments where the node_name and bind addresses differ and the cert name is not the hostname.
Jira: DB-17714
Test Plan: jenkins: urgent
Reviewers: fizaa
Reviewed By: fizaa
Subscribers: ybase
Differential Revision: https://phorge.dev.yugabyte.com/D45608
[BACKPORT 2025.1][yugabyte#26912] YSQL: Fix flaky test PgDDLConcurrencyTest.IndexCreation
Summary: The test PgDDLConcurrencyTest.IndexCreation is flaky. It runs concurrent CREATE INDEX statements to trigger race conditions that can cause some of the statements to fail. The test verifies that when CREATE INDEX aborts, the PG backend's DDL state is properly reset. The test has a set of expected errors that are suppressed; it fails if an unexpected error is encountered, or if the test itself times out after 10 minutes. When the test fails, the following error is found:
```
Bad status: Network error (yb/yql/pgwrapper/libpq_utils.cc:457): Execute of 'CREATE INDEX IF NOT EXISTS t0_v ON t0(v)' failed: 7, message: ERROR: timed out waiting for postgres backends to catch up DETAIL: 2 backends on database 13515 are still behind catalog version 15. HINT: Run the following query on all tservers to find the lagging backends: SELECT * FROM pg_stat_activity WHERE backend_type != 'walsender' AND backend_type != 'yb-conn-mgr walsender' AND catalog_version < 15 AND datid = 13515; (pgsql error XX000) (aux msg ERROR: timed out waiting for postgres backends to catch up
```
I found two reasons for the flakiness:
(1) The test can fail because `WaitForYsqlBackendsCatalogVersion` times out. Because we do not officially support concurrent DDLs, it can happen that a PG backend cannot update its local catalog version because it is calling `WaitForYsqlBackendsCatalogVersion`. If another PG backend gets stuck for the same reason, we can have a deadlock-like situation until `WaitForYsqlBackendsCatalogVersion` times out. By default --ysql_yb_wait_for_backends_catalog_version_timeout=300000ms (5 min). It only takes two `WaitForYsqlBackendsCatalogVersion` timeouts before the test itself times out.
(2) Even when `WaitForYsqlBackendsCatalogVersion` times out, the test checks the returned error to see if it should be suppressed, and PG will usually append the following message to the error message:
```
[ts-1] 2025-07-24 11:58:42.372 GMT [56059] CONTEXT: Catalog Version Mismatch: A DDL occurred while processing this query. Try again.
```
The test has
```
Status SuppressAllowedErrors(const Status& s) {
  if (HasTransactionError(s) || IsRetryable(s)) {
    return Status::OK();
  }
  return s;
}

bool IsRetryable(const Status& status) {
  static const auto kExpectedErrors = {
      "Try again",
      "Catalog Version Mismatch",
      "Restart read required",
      "schema version mismatch for table"
  };
  return HasSubstring(status.message(), kExpectedErrors);
}
```
which means that on seeing "Try again" in the error message, the test continues its execution and does not fail, except by timing out as described in (1). However, PG only appends the "Try again" message in the common case; in other, uncommon situations (e.g., when `need_global_cache_refresh` is false), PG does not append it. When that happens, the error is not suppressed and the test fails with just the error shown above. To fix the test failure, I made two changes:
(1) changed two gflags to smaller values:
--wait_for_ysql_backends_catalog_version_client_master_rpc_timeout_ms from 20s to 2s
--ysql_yb_wait_for_backends_catalog_version_timeout from 300s to 30s
(2) if the error contains "waiting for postgres backends to catch up", suppress it and let the test continue to execute.
Jira: DB-16337
Original commit: 88880a4 / D45593
Test Plan: ./yb_build.sh release --cxx-test pgwrapper_pg_ddl_concurrency-test --gtest_filter PgDDLConcurrencyTest.IndexCreation -n 200
Backport-through: 2025.1
The test seems stable in 2024.2, 2024.1 and 2.20. Probably some code changes have happened that caused the flakiness.
For example, some PG error handling code may have changed so that earlier we always had "Try again" in the error text and the error was suppressed.
Reviewers: jason, sanketh
Reviewed By: jason
Subscribers: yql
Differential Revision: https://phorge.dev.yugabyte.com/D45604
[BACKPORT 2025.1][yugabyte#27267] Backup, Tests: Test dump_role_checks flag against YBC
Summary: Original commit: 083be7c / D44990
The tests were implemented for yb_backup.py in the commit: 06bb2e5 / D41975
This diff enables the tests against the YBC backup process. All 5 tests are implemented via `TestYbBackup::doTestBackupRestoreRoles`.
Test Plan:
YB_TEST_YB_CONTROLLER=1 ./yb_build.sh --java-test org.yb.pgsql.TestYbBackup#testBackupRolesWithDumpRoleChecks
YB_TEST_YB_CONTROLLER=1 ./yb_build.sh --java-test org.yb.pgsql.TestYbBackup#testBackupRolesWithoutDumpRoleChecks
YB_TEST_YB_CONTROLLER=1 ./yb_build.sh --java-test org.yb.pgsql.TestYbBackup#testBackupRestoreRoles
YB_TEST_YB_CONTROLLER=1 ./yb_build.sh --java-test org.yb.pgsql.TestYbBackup#testBackupRolesWithoutUseRoles
YB_TEST_YB_CONTROLLER=1 ./yb_build.sh --java-test org.yb.pgsql.TestYbBackup#testBackupRolesWithoutRestoreRoles
NOTE: YB_TEST_YB_CONTROLLER=1 by default
Reviewers: mihnea, sanketh
Reviewed By: sanketh
Subscribers: yql, dshubin, vkumar
Differential Revision: https://phorge.dev.yugabyte.com/D45550
[BACKPORT 2025.1][yugabyte#27705, yugabyte#26393, yugabyte#28039] Docdb: Handle out-of-order messages for table locks.
Summary: Original commit: 7912ca0 / D44898
Also included is a follow-up fix: cb345c2 / D45508
If there are out-of-order messages, caused by duplicate/retried acquire-lock messages, that end up being processed after the corresponding release has been processed, we need a mechanism to clean up the locks that may be taken, to prevent lock leakage. Each Acquire request comes in with an `ignore_after_hybrid_time` that specifies the time after which it may be ignored. The main change here is to track the Acquire operations' `ignore_after_hybrid_time`s and ensure that the locks get cleaned up twice:
- once at the end of the transaction; and
- again after we are past the corresponding max of the `ignore_after_hybrid_time`s.
This information is tracked in the ts_local_lock_manager and is used to schedule the duplicate release.
Jira: DB-17306, DB-15754, DB-17662
Test Plan: Added a test to cause out-of-order/duplicated acquire messages between a) LocalTServer -> Master; and b) Master -> Dest TServer.
yb_build.sh fastdebug --cxx-test pg_object_locks-test --test_args --vmodule=tablet_service=0,libpq_utils=2,object_lock_manager=3,object_lock_info_manager=3,ts_local*=3,pg_client_session=2,pg_txn_manager=2,ysql_ddl_*=3,transaction_participant=0,async_rpc_task*=3 --gtest_filter *TestOutOfOrderMessageHandling*/*
Note that the PgPerform call from Pg-backend -> Local TServer does not contribute to out-of-order lock acquires, as it does not retry upon failure(s).
Reviewers: bkolagani, rthallam, zdrudi
Reviewed By: bkolagani
Subscribers: yql
Differential Revision: https://phorge.dev.yugabyte.com/D45594
[BACKPORT 2024.2][yugabyte#27267] Backup, Tests: Test dump_role_checks flag against YBC
Summary: Original commit: 083be7c / D44990
The tests were implemented for yb_backup.py in the commit: 06bb2e5 / D41975
This diff enables the tests against the YBC backup process. All 5 tests are implemented via `TestYbBackup::doTestBackupRestoreRoles`.
Jira: DB-16748
Test Plan:
YB_TEST_YB_CONTROLLER=1 ./yb_build.sh --java-test org.yb.pgsql.TestYbBackup#testBackupRolesWithDumpRoleChecks
YB_TEST_YB_CONTROLLER=1 ./yb_build.sh --java-test org.yb.pgsql.TestYbBackup#testBackupRolesWithoutDumpRoleChecks
YB_TEST_YB_CONTROLLER=1 ./yb_build.sh --java-test org.yb.pgsql.TestYbBackup#testBackupRestoreRoles
YB_TEST_YB_CONTROLLER=1 ./yb_build.sh --java-test org.yb.pgsql.TestYbBackup#testBackupRolesWithoutUseRoles
YB_TEST_YB_CONTROLLER=1 ./yb_build.sh --java-test org.yb.pgsql.TestYbBackup#testBackupRolesWithoutRestoreRoles
NOTE: YB_TEST_YB_CONTROLLER=1 by default
Reviewers: mihnea, sanketh
Reviewed By: sanketh
Subscribers: yql, dshubin, vkumar
Differential Revision: https://phorge.dev.yugabyte.com/D45551
[BACKPORT 2024.2][yugabyte#27996, yugabyte#28032] Docdb: Fix log spew in transaction.cc
Summary: Original commit: 3aa0773 / D45447
transaction.cc is meant to log the transaction's trace whenever the time taken by the transaction is `> FLAGS_txn_slow_op_threshold_ms`. However, after the refactor in a0e6d55, the log line does not special-case the fact that a flag value of 0 should disable printing the log line. This causes log spew, as `FLAGS_txn_slow_op_threshold_ms` defaults to 0. We should only consider printing the trace if this flag is non-zero.
Also includes a4dc4d3 / D45498
Jira: DB-17617, DB-17653
Test Plan: yb_build.sh fastdebug --cxx-test pg_mini-test --gtest_filter PgMiniTestTracing/PgMiniTestTracing.Tracing/*
Reviewers: rthallam, bkolagani
Reviewed By: bkolagani
Subscribers: yql, ybase
Differential Revision: https://phorge.dev.yugabyte.com/D45571