Pgmoneta Dev Guide
Developer Guide
pgmoneta
Contents
1 Introduction
1.1 Features
1.2 Platforms
2 Installation
2.1 Fedora
2.2 RHEL 9 / RockyLinux 9
2.3 Compiling the source
2.3.1 RHEL / RockyLinux
2.3.2 FreeBSD
2.3.3 Build
2.4 Compiling the documentation
2.4.1 Build
2.5 Extension installation
2.5.1 Install pgmoneta_ext
2.5.2 Verify success
2.5.3 Granting SUPERUSER Privileges
3 C programming
3.1 Debugging
4 Git guide
4.1 Basic steps
4.1.1 Start by forking the repository
4.2 Clone your repository locally
4.2.1 Add upstream
4.2.2 Do a work branch
4.2.3 Make the changes
4.2.4 Multiple commits
4.2.5 Rebase
4.2.6 Force push
4.2.7 Format source code
4.2.8 Repeat
4.2.9 Undo
5 Architecture
5.1 Overview
6 Encryption
6.1 Overview
6.2 Encryption Configuration
6.3 Encryption / Decryption CLI Commands
6.3.1 decrypt
6.3.2 encrypt
6.4 Benchmark
7 RPM
7.1 Requirements
7.2 Setup RPM development
7.3 Create source package
7.4 Create RPM package
8 Test
8.1 Container Environment
8.1.1 Docker
8.1.2 Podman
8.2 Test suite
9 WAL Reader
9.1 Overview
9.2 pgmoneta-walinfo
9.3 High-Level API Overview
9.3.1 Struct walfile
10 Troubleshooting
10.1 Could not get version for server
11 Acknowledgement
11.1 Authors
11.2 Committers
11.3 Contributing
12 License
12.1 libart
1 Introduction
Ideally, you would not need to do backups and disaster recovery, but that isn’t how the real world
works. Things that can go wrong include:
• Data corruption
• System failure
• Human error
• Natural disaster
When something like this happens, it is up to the database administrator to get the database system
back online, and to the correct recovery point. Two measures describe the goal:
• Recovery Point Objective (RPO): Maximum targeted period in which data might be lost from an
IT service due to a major incident
• Recovery Time Objective (RTO): The targeted duration of time and a service level within which a
business process must be restored after a disaster (or disruption) in order to avoid unacceptable
consequences associated with a break in business continuity
You would like to have both of these as close to zero as possible, since RPO of 0 means that you won’t
lose data, and RTO of 0 means that your system recovers at once. However, that is easier said than
done.
pgmoneta is focused on having features that will allow database systems to get as close to these goals
as possible such that high availability of 99.99% or more can be implemented, and monitored through
standard tools.
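To put the 99.99% figure in perspective, the arithmetic below converts an availability percentage into the downtime it allows per year. It is only a worked calculation; the function name is made up for this illustration and is not part of pgmoneta.

```c
#include <assert.h>

/* Minutes in a non-leap year: 365 days * 24 hours * 60 minutes. */
#define MINUTES_PER_YEAR (365.0 * 24.0 * 60.0)

/* Maximum downtime per year, in minutes, for a given availability
 * percentage such as 99.99. Hypothetical helper for illustration. */
double
allowed_downtime_minutes(double availability_percent)
{
   assert(availability_percent >= 0.0 && availability_percent <= 100.0);
   return MINUTES_PER_YEAR * (100.0 - availability_percent) / 100.0;
}
```

For 99.99% this gives a budget of roughly 52.6 minutes per year, which has to cover detection, restore and WAL replay together.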
1.1 Features
• Full backup
• Restore
• Compression (gzip, zstd, lz4, bzip2)
• AES encryption support
• Symlink support
• WAL shipping support
• Hot standby
• Prometheus support
• Remote management
• Offline mode
• Transport Layer Security (TLS) v1.2+ support
• Daemon mode
• User vault
1.2 Platforms
• Fedora 38+
• RHEL 9
• RockyLinux 9
• FreeBSD
• OpenBSD
2 Installation
2.1 Fedora
You need to add the PostgreSQL YUM repository, for example for Fedora 40
Additional information
• PostgreSQL YUM
• Linux downloads
x86_64
aarch64
2.3 Compiling the source
We recommend using Fedora to test and run pgmoneta, but other Linux systems, FreeBSD and macOS
are also supported.
pgmoneta requires
• clang
• cmake
• make
• libev
• OpenSSL
• zlib
• zstd
• lz4
• bzip2
• systemd
• rst2man
• libssh
• libcurl
• libarchive
dnf install git gcc clang clang-analyzer cmake make libev libev-devel \
openssl openssl-devel \
systemd systemd-devel zlib zlib-devel \
libzstd libzstd-devel \
lz4 lz4-devel libssh libssh-devel \
libcurl libcurl-devel \
python3-docutils libatomic \
bzip2 bzip2-devel \
libarchive libarchive-devel
2.3.1 RHEL / RockyLinux
On RHEL / Rocky, some additional repositories need to be enabled or installed before you install the
required packages.
Otherwise, if you have a Red Hat corporate account (you need to specify the company/organization
name in your account), you can register using
Then use the dnf command for pgmoneta to install the required packages.
2.3.2 FreeBSD
git gcc cmake libev openssl libssh zlib-ng zstd liblz4 bzip2 curl \
py39-docutils libarchive
2.3.3 Build
2.3.3.1 Release build
The following commands will install pgmoneta in the /usr/local hierarchy.
2.3.3.2 Debug build
The following commands will create a DEBUG version of pgmoneta.
2.4 Compiling the documentation
Building the documentation requires
• pandoc
• texlive
You will need the Eisvogel template as well which you can install through
wget https://github.com/Wandmalfarbe/pandoc-latex-template/releases/download/2.4.2/Eisvogel-2.4.2.tar.gz
tar -xzf Eisvogel-2.4.2.tar.gz
mkdir -p $HOME/.local/share/pandoc/templates
mv eisvogel.latex $HOME/.local/share/pandoc/templates
2.4.0.1 Generate API guide
This process is optional. If you choose not to generate the API HTML files, you can opt out of
downloading these dependencies, and the process will automatically skip the generation.
Download dependencies
2.4.1 Build
These packages will be detected during cmake and built as part of the main build.
2.5 Extension installation
When you configure the extra parameter in the server section of pgmoneta.conf, the server side
must have the pgmoneta_ext extension installed for it to work.
The following instructions can help you easily install pgmoneta_ext. If you encounter any problems,
please refer to the more detailed instructions in the DEVELOPERS documentation.
After you have successfully installed pgmoneta, the following commands will help you install
pgmoneta_ext:
You need to add the pgmoneta_ext library for PostgreSQL in postgresql.conf as well:
shared_preload_libraries = 'pgmoneta_ext'
psql
\c testdb
pgmoneta_ext_version
----------------------
0.1.0
(1 row)
Some functions in pgmoneta_ext require SUPERUSER privileges. To enable these, grant the repl
role superuser privileges using the command below. Please proceed with caution: granting superuser
privileges bypasses all permission checks, allowing unrestricted access to the database, which can
pose security risks. We are committed to enhancing privilege security in future updates.
To revoke superuser privileges from the repl role, use the following command:
3 C programming
pgmoneta is developed using the C programming language so it is a good idea to have some knowledge
about the language before you begin to make changes.
• C in a Nutshell
• 21st Century C
3.1 Debugging
In order to debug problems in your code you can use gdb, or add extra logging using the
pgmoneta_log_XYZ() API.
4 Git guide
This is done by
4.2.1 Add upstream
Do
cd pgmoneta
git remote add upstream https://github.com/pgmoneta/pgmoneta.git
Use
[#xyz] Description
as the commit message, where [#xyz] is the issue number for the work and Description is a
short description of the issue on the first line.
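4.2.4 Multiple commits
If you have multiple commits on your branch, squash them with an interactive rebase. In the rebase
todo list, use p (pick) for the first commit, then s (squash) for the rest.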
4.2.5 Rebase
Always rebase
4.2.6 Force push
When you are done with your changes, force push your branch.
4.2.7 Format source code
Use
./uncrustify.sh
4.2.8 Repeat
Based on feedback, keep making changes, squashing, rebasing and force pushing.
4.2.9 Undo
Normally you can reset to an earlier commit using git reset --hard <commit hash>.
But if you accidentally squashed two or more commits and want to undo that, you need to know
where to reset to, and the commits seem to have been lost after you rebased.
They are not actually lost: using git reflog, you can find every commit the HEAD pointer has
ever pointed to. Find the commit you want to reset to, and run git reset --hard <commit hash>.
5 Architecture
5.1 Overview
pgmoneta uses a process model (fork()), where each process handles one Write-Ahead Log (WAL)
receiver to PostgreSQL.
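The process model can be sketched as below: the parent forks one child per configured server and later collects their exit statuses. This is a minimal illustration, not pgmoneta's code; demo_wal_worker and demo_spawn_workers are hypothetical names, and the worker body is a placeholder for the real WAL streaming.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define DEMO_MAX_SERVERS 16   /* hypothetical limit for this sketch */

/* Hypothetical worker: stands in for a per-server WAL receiver. */
static int
demo_wal_worker(int server)
{
   /* A real worker would stream WAL from PostgreSQL here. */
   return server;   /* becomes the child's exit status */
}

/* Fork one child per server, wait for all of them, and return how many
 * children exited with the expected status. */
int
demo_spawn_workers(int number_of_servers)
{
   pid_t pids[DEMO_MAX_SERVERS];
   int ok = 0;

   assert(number_of_servers >= 0 && number_of_servers <= DEMO_MAX_SERVERS);

   for (int i = 0; i < number_of_servers; i++)
   {
      pids[i] = fork();
      if (pids[i] == 0)
      {
         /* Child: run the worker and leave without returning to the loop. */
         _exit(demo_wal_worker(i));
      }
   }

   for (int i = 0; i < number_of_servers; i++)
   {
      int status = 0;

      if (pids[i] > 0 && waitpid(pids[i], &status, 0) == pids[i] &&
          WIFEXITED(status) && WEXITSTATUS(status) == i)
      {
         ok++;
      }
   }

   return ok;
}
```

One process per receiver keeps a failure in one server's WAL stream from taking down the others.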
A memory segment (shmem.h) is shared among all processes. It contains the pgmoneta state,
including the configuration and the list of servers.
The configuration of pgmoneta (struct configuration) and the configuration of the servers
(struct server) are initialized in this shared memory segment. These structs are all defined in
pgmoneta.h.
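A shared segment that survives fork() can be sketched with an anonymous shared mapping. The struct and function names below are hypothetical and far smaller than the real struct configuration; the use of mmap() with MAP_SHARED | MAP_ANONYMOUS is an assumption made for this illustration, so see shmem.h for what pgmoneta actually does.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical, much-reduced stand-in for the shared pgmoneta state. */
struct demo_state
{
   int number_of_servers;
   int backups_taken;
};

/* Map an anonymous segment that stays shared across fork().
 * The kernel zero-fills the mapping. Returns NULL on failure. */
struct demo_state*
demo_create_shared_state(void)
{
   void* p = mmap(NULL, sizeof(struct demo_state), PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);

   return p == MAP_FAILED ? NULL : (struct demo_state*)p;
}

/* Let a child update the segment, then return the value the parent sees. */
int
demo_child_update(struct demo_state* state)
{
   pid_t pid;

   assert(state != NULL);

   pid = fork();
   if (pid == 0)
   {
      state->backups_taken += 1;   /* write happens in the child */
      _exit(0);
   }
   if (pid > 0)
   {
      waitpid(pid, NULL, 0);
   }

   return state->backups_taken;    /* read happens in the parent */
}
```

With an ordinary malloc() block the child's write would land in its own copy; the shared mapping is what makes it visible to the parent.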
All communication is abstracted using the struct message data type defined in message.h.
Reading and writing messages is handled in the message.h (message.c) files.
5.4 Memory
Each process uses a fixed memory block for its network communication, which is allocated upon
startup of the process.
That way we don’t have to allocate memory for each network message, and more importantly free it
after end of use.
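The idea of one block per process can be sketched as follows. All names and the 8192 byte size are hypothetical; the real struct message layout is in message.h and is not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_BUFFER_SIZE 8192   /* hypothetical block size */

/* Hypothetical message view over a block that is allocated only once. */
struct demo_message
{
   size_t length;   /* bytes currently in use */
   char* data;      /* points at the process-wide block */
};

static char* demo_block = NULL;

/* Allocate the block once, at process start-up. Returns 0 on success. */
int
demo_memory_init(void)
{
   demo_block = malloc(DEMO_BUFFER_SIZE);
   return demo_block != NULL ? 0 : 1;
}

/* Reuse the same block for every message: no malloc/free per message. */
struct demo_message
demo_message_fill(const char* payload, size_t length)
{
   struct demo_message msg;

   assert(demo_block != NULL && length <= DEMO_BUFFER_SIZE);

   memcpy(demo_block, payload, length);
   msg.length = length;
   msg.data = demo_block;

   return msg;
}

/* Release the block when the process shuts down. */
void
demo_memory_destroy(void)
{
   free(demo_block);
   demo_block = NULL;
}
```

The trade-off is that a message is only valid until the next one is read into the block, so anything that must outlive it has to be copied out.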
5.5 Management
pgmoneta has a management interface which defines the administrator abilities that can be performed
when it is running. This includes, for example, taking a backup. The pgmoneta-cli program is used
for these operations (cli.c).
The management interface is defined in management.h. The management interface uses its own
protocol which uses JSON as its foundation.
5.5.1 Write
5.5.2 Read
The remote management functionality uses the same protocol as the standard management method.
However, before the management packet is sent, the client has to authenticate with SCRAM-SHA-256,
using the same message format that PostgreSQL uses, e.g. StartupMessage, AuthenticationSASL,
AuthenticationSASLContinue, AuthenticationSASLFinal and AuthenticationOk. The SSLRequest message
is supported.
libev is used to handle network interactions, which are “activated” upon an EV_READ event.
Each process has its own event loop, such that the process only gets notified when data related only to
that process is ready. The main loop handles the system wide “services” such as idle timeout checks
and so on.
5.7 Signals
The main process of pgmoneta supports the following signals as a mechanism for shutting down:
SIGTERM, SIGINT and SIGALRM. SIGABRT is used to request a core dump (abort()).
It should not be necessary to use SIGKILL for pgmoneta. Please consider using SIGABRT instead,
and share the core dump and debug logs with the pgmoneta community.
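The shutdown signals can be illustrated with plain POSIX sigaction(). This is a generic sketch under that assumption, with hypothetical demo_ names; it does not claim to mirror how pgmoneta registers its handlers. SIGABRT is deliberately left at its default action so that it still produces a core dump.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Set by the handler; a main loop would poll it and shut down cleanly. */
static volatile sig_atomic_t demo_shutdown_requested = 0;

static void
demo_shutdown_handler(int signo)
{
   (void)signo;
   demo_shutdown_requested = 1;   /* only async-signal-safe work here */
}

/* Install the same handler for SIGTERM, SIGINT and SIGALRM.
 * Returns 0 on success, 1 on failure. */
int
demo_install_handlers(void)
{
   const int signals[] = {SIGTERM, SIGINT, SIGALRM};
   struct sigaction sa;

   sa.sa_handler = demo_shutdown_handler;
   sa.sa_flags = 0;
   sigemptyset(&sa.sa_mask);

   for (size_t i = 0; i < sizeof(signals) / sizeof(signals[0]); i++)
   {
      if (sigaction(signals[i], &sa, NULL) != 0)
      {
         return 1;
      }
   }

   return 0;
}

/* Non-zero once one of the shutdown signals has been delivered. */
int
demo_shutdown_pending(void)
{
   return demo_shutdown_requested != 0;
}
```

SIGKILL cannot be caught at all, which is why it leaves no chance for cleanup or a core dump.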
5.8 Reload
Some configuration settings require a full restart of pgmoneta in order to take effect. These
are
• hugepage
• libev
• log_path
• log_type
• unix_socket_dir
• pidfile
5.9 Prometheus
pgmoneta has support for Prometheus when the metrics port is specified.
The metrics endpoint supports Transfer-Encoding: chunked to account for a large amount of
data.
5.10 Logging
5.11 Protocol
6 Encryption
6.1 Overview
AES Cipher block chaining (CBC) mode and AES Counter (CTR) mode are supported in pgmoneta. The
default setup is no encryption.
CBC is the most commonly used mode and is considered safe. Its main drawback is that encryption is
sequential (decryption can be parallelized).
Along with CBC, CTR mode is one of two block cipher modes recommended by Niels Ferguson and
Bruce Schneier. Both encryption and decryption are parallelizable.
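Why CBC encryption is sequential while CBC decryption and CTR are not can be seen from the data dependencies in the loops below. The block function here is a toy, invertible 64-bit mix and is explicitly NOT AES and not secure; all names are hypothetical and only the chaining structure is the point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy keyed, invertible block function. NOT AES, NOT secure. */
static uint64_t
toy_encrypt_block(uint64_t x, uint64_t k)
{
   x ^= k;
   x = (x << 13) | (x >> 51);   /* rotate left by 13 */
   return x + k;
}

/* Exact inverse of toy_encrypt_block. */
static uint64_t
toy_decrypt_block(uint64_t y, uint64_t k)
{
   y -= k;
   y = (y >> 13) | (y << 51);   /* rotate right by 13 */
   return y ^ k;
}

/* CBC encryption: block i needs ciphertext block i-1, which only exists
 * after the previous iteration, so the loop is inherently sequential. */
void
cbc_encrypt(const uint64_t* in, uint64_t* out, size_t n, uint64_t k, uint64_t iv)
{
   uint64_t prev = iv;

   for (size_t i = 0; i < n; i++)
   {
      prev = toy_encrypt_block(in[i] ^ prev, k);
      out[i] = prev;
   }
}

/* CBC decryption: block i needs ciphertext blocks i and i-1, all known up
 * front, so the iterations are independent. in and out must not overlap. */
void
cbc_decrypt(const uint64_t* in, uint64_t* out, size_t n, uint64_t k, uint64_t iv)
{
   assert(in != out);

   for (size_t i = 0; i < n; i++)
   {
      uint64_t prev = (i == 0) ? iv : in[i - 1];
      out[i] = toy_decrypt_block(in[i], k) ^ prev;
   }
}

/* CTR: the keystream for block i depends only on nonce + i, so every
 * iteration is independent. The same routine encrypts and decrypts. */
void
ctr_crypt(const uint64_t* in, uint64_t* out, size_t n, uint64_t k, uint64_t nonce)
{
   for (size_t i = 0; i < n; i++)
   {
      out[i] = in[i] ^ toy_encrypt_block(nonce + i, k);
   }
}
```

Only the cbc_encrypt loop carries a value from one iteration to the next; the other two loops could be split across threads without changing the result.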
The longer the key, the safer the encryption. However, the 192 bit and 256 bit key lengths add roughly
20% and 40% extra workload compared to 128 bit, in line with AES using 12 and 14 rounds instead of 10.
aes | aes-256 | aes-256-cbc: AES CBC (Cipher Block Chaining) mode with 256 bit key length
aes-192 | aes-192-cbc: AES CBC mode with 192 bit key length
aes-128 | aes-128-cbc: AES CBC mode with 128 bit key length
aes-256-ctr: AES CTR (Counter) mode with 256 bit key length
6.3.1 decrypt
Decrypts the file in place and removes the encrypted file after successful decryption.
Command
6.3.2 encrypt
Encrypts the file in place and removes the unencrypted file after successful encryption.
Command
6.4 Benchmark
Test decrypt
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
CPU family: 6
Model: 158
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
Stepping: 10
BogoMIPS: 5183.98
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm
constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq vmx ssse3 fma
cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand
hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb
stibp tpr_shadow vnmi ept vpid ept_ad fsgsbase bmi1 avx2 smep bmi2 erms
invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves
flush_l1d arch_capabilities
Virtualization features:
Virtualization: VT-x
Hypervisor vendor: Microsoft
Virtualization type: full
Caches (sum of all):
L1d: 192 KiB (6 instances)
L1i: 192 KiB (6 instances)
L2: 1.5 MiB (6 instances)
L3: 12 MiB (1 instance)
Vulnerabilities:
Itlb multihit: KVM: Mitigation: VMX disabled
L1tf: Mitigation; PTE Inversion; VMX conditional cache
flushes, SMT vulnerable
Mds: Vulnerable: Clear CPU buffers attempted, no
microcode; SMT Host state unknown
Meltdown: Mitigation; PTI
Spec store bypass: Mitigation; Speculative Store Bypass disabled via
prctl and seccomp
Spectre v1: Mitigation; usercopy/swapgs barriers and __user
pointer sanitization
Spectre v2: Mitigation; Full generic retpoline, IBPB
conditional, IBRS_FW, STIBP conditional, RSB filling
Srbds: Unknown: Dependent on hypervisor status
Tsx async abort: Not affected
7 RPM
7.1 Requirements
dnf install gcc rpm-build rpm-devel rpmlint make python bash coreutils \
    diffutils patch rpmdevtools chrpath
7.2 Setup RPM development
rpmdev-setuptree
7.3 Create source package
7.4 Create RPM package
cp pgmoneta-$VERSION.tar.gz ~/rpmbuild/SOURCES
QA_RPATHS=0x0001 rpmbuild -bb pgmoneta.spec
8 Test
8.1.1 Docker
dnf update
docker --version
If you see the Docker version, then you have successfully installed Docker on Fedora.
8.1.2 Podman
podman --version
If you see the Podman version, then you have successfully installed Podman on Fedora.
The podman-docker.noarch package simplifies the use of Podman for users accustomed to
Docker.
8.2 Test suite
You can simply use CTest to test all PostgreSQL versions from 13 to 16. It will automatically run
testsuite.sh to test pgmoneta and pgmoneta_ext for each version. The script will automatically
create the Docker container, run it, and then use the check framework to test their functions
inside it. After that, it will automatically clean up everything for you.
chmod +x testsuite.sh
After you follow the DEVELOPERS.md to install pgmoneta, go to the directory /pgmoneta/build
and run the test.
make test
testsuite.sh accepts three variables. The first one is dir, which specifies the /test directory
location, with a default value of ./. The second one is dockerfile, with a default value of
Dockerfile.rocky8. The third one is the PostgreSQL version, with a default value of 13.
9 WAL Reader
9.1 Overview
This document provides an overview of the wal_reader tool, with a focus on the parse_wal_file
function, which serves as the main entry point for parsing Write-Ahead Log (WAL) files. Currently, the
function only parses the given WAL file and prints the description of each record. In the future, it will
be integrated with other parts of the code.
9.2 pgmoneta-walinfo
pgmoneta-walinfo is a command line utility designed to read and display information about
PostgreSQL Write-Ahead Log (WAL) files. The tool provides output in either raw or JSON format,
making it easy to analyze WAL files for debugging, auditing, or general information purposes.
In addition to standard WAL files, pgmoneta-walinfo also supports encrypted (aes) and
compressed WAL files in the following formats: zstd, gz, lz4, and bz2.
9.2.0.1 Usage
pgmoneta-walinfo
Command line utility to read and display Write-Ahead Log (WAL) files
Usage:
pgmoneta-walinfo <file>
Options:
-c, --config CONFIG_FILE Set the path to the pgmoneta.conf file
-o, --output FILE Output file
-F, --format Output format (raw, json)
-L, --logfile FILE Set the log file
-q, --quiet No output only result
--color Use colors (on, off)
-v, --verbose Output result
-V, --version Display version information
-?, --help Display help
9.2.0.2 Raw Output Format
In raw format, the default, the output is structured as follows:
Resource Manager | Start LSN | End LSN | rec len | tot len | xid | description (data and backup)
• Resource Manager: The name of the resource manager handling the log record.
• Start LSN: The start Log Sequence Number (LSN).
• Red: Header information (resource manager, record length, transaction ID, etc.).
• Green: Description of the WAL record.
• Blue: Backup block references or additional data.
This format makes it easy to visually distinguish different parts of the WAL file for quick analysis.
9.3 High-Level API Overview
The following section provides a high-level overview of how users can interact with the functions
and structures defined in the walfile.h file. These APIs allow you to read, write, and manage
Write-Ahead Log (WAL) files.
9.3.1 Struct walfile
The walfile struct represents the core structure used for interacting with WAL files in PostgreSQL. A
WAL file stores a log of changes to the database and is used for crash recovery, replication, and other
purposes. Each WAL file consists of pages (each 8192 bytes by default), containing records that capture
database changes.
9.3.1.1 Fields:
• magic_number: Identifies the PostgreSQL version that created the WAL file. You can find more
info on supported magic numbers here.
• long_phd: A pointer to the extended header (long header) found on the first page of the WAL file.
This header contains additional metadata.
• page_headers: A deque of headers representing each page in the WAL file, excluding the first
page.
• records: A deque of decoded WAL records. Each record represents a change made to the database
and contains both metadata and the actual data to be applied during recovery or replication.
The walfile.h file provides three key functions for interacting with WAL files:
pgmoneta_read_walfile, pgmoneta_write_walfile, and pgmoneta_destroy_walfile. These functions
allow users to read from, write to, and destroy WAL file objects, respectively.
9.3.2.1 pgmoneta_read_walfile
int pgmoneta_read_walfile(int server, char* path, struct walfile** wf);
9.3.2.1.1 Description: This function reads a WAL file from a specified path and populates a
walfile structure with its contents, including the file’s headers and records.
9.3.2.1.2 Parameters:
9.3.2.1.3 Return:
9.3.2.2 pgmoneta_write_walfile
int pgmoneta_write_walfile(struct walfile* wf, int server, char* path);
9.3.2.2.1 Description: This function writes the contents of a walfile structure back to disk, saving
it as a WAL file at the specified path.
9.3.2.2.2 Parameters:
9.3.2.2.3 Return:
9.3.2.3 pgmoneta_destroy_walfile
void pgmoneta_destroy_walfile(struct walfile* wf);
9.3.2.3.1 Description: This function frees the memory allocated for a walfile structure, including
its headers and records.
9.3.2.3.2 Parameters:
9.4.1 parse_wal_file
This function is responsible for reading and parsing a PostgreSQL Write-Ahead Log (WAL) file.
9.4.1.1 Parameters
• path: The file path to the WAL file that needs to be parsed.
• server_info: A pointer to a server structure containing information about the server.
9.4.1.2 Description
The parse_wal_file function opens the WAL file specified by the path parameter in binary mode and
reads the WAL records. It processes these records, handling various cases such as records that cross
page boundaries, while ensuring correct memory management throughout the process.
parse_wal_file("/path/to/wal/file", &my_server);
The image illustrates the structure of a WAL (Write-Ahead Logging) file in PostgreSQL, focusing on how
XLOG records are organized within WAL segments.
Source: https://www.interdb.jp/pg/pgsql09/03.html
A WAL segment, by default, is a 16 MB file, divided into pages of 8192 bytes (8 KB) each. The first page
contains a header defined by the XLogLongPageHeaderData structure, while all subsequent pages
have headers described by the XLogPageHeaderData structure. XLOG records are written sequentially
in each page, starting at the beginning and moving downward.
The figure highlights how the WAL ensures data consistency by sequentially writing XLOG records in
pages, structured within larger 16 MB WAL segments.
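The default sizes above imply simple arithmetic, sketched here with hypothetical helper names. The page-crossing check is a simplification: it only compares a record's length with the room left in its page and ignores the page header that a continued record meets on the next page.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_WAL_SEGMENT_SIZE (16u * 1024u * 1024u)   /* default: 16 MB  */
#define DEMO_WAL_PAGE_SIZE    8192u                   /* default: 8 KB   */

/* Number of pages in one segment. */
uint32_t
demo_pages_per_segment(void)
{
   return DEMO_WAL_SEGMENT_SIZE / DEMO_WAL_PAGE_SIZE;
}

/* 0-based index of the page that contains a byte offset in the segment. */
uint32_t
demo_page_of_offset(uint32_t offset)
{
   assert(offset < DEMO_WAL_SEGMENT_SIZE);
   return offset / DEMO_WAL_PAGE_SIZE;
}

/* Non-zero if a record of `length` bytes starting at `offset` does not fit
 * in the remainder of its page and therefore continues on the next one. */
int
demo_crosses_page(uint32_t offset, uint32_t length)
{
   uint32_t room = DEMO_WAL_PAGE_SIZE - (offset % DEMO_WAL_PAGE_SIZE);

   return length > room;
}
```

With the defaults, one segment holds 2048 pages, and only the first of them carries the long header.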
In the context of the WAL reader, resource managers (rm) are responsible for handling different types
of records found within a WAL file. Each record in the WAL file is associated with a specific resource
manager, which determines how that record is processed.
Each resource manager is defined in the rm_[name].h header file and implemented in the corre-
sponding rm_[name].c source file.
In the rmgr.h header file, the resource managers are declared as an enum, with each resource
manager having a unique identifier.
Each resource manager implements the rm_desc function, which provides a description of the record
type associated with that resource manager. In the future, they will be extended to implement the
rm_redo function to apply the changes to another server.
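The enum-plus-rm_desc arrangement described above can be sketched as a lookup table indexed by resource manager id. The identifiers, the table, and the rm_desc signature below are hypothetical and only show the dispatch idea; the real declarations are in rmgr.h and the rm_[name].h files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of resource manager identifiers. */
enum demo_rmgr_id
{
   DEMO_RM_XLOG = 0,
   DEMO_RM_HEAP,
   DEMO_RM_COUNT
};

/* One table entry per resource manager. */
struct demo_rmgr
{
   const char* name;
   /* Writes a description of the record into buf and returns buf. */
   char* (*rm_desc)(char* buf, size_t size, const void* record);
};

static char*
demo_xlog_desc(char* buf, size_t size, const void* record)
{
   (void)record;   /* a real implementation would decode the record */
   snprintf(buf, size, "XLOG record");
   return buf;
}

static char*
demo_heap_desc(char* buf, size_t size, const void* record)
{
   (void)record;
   snprintf(buf, size, "Heap record");
   return buf;
}

/* Indexed by the enum, so the id in a record header selects the entry. */
static const struct demo_rmgr demo_rmgr_table[DEMO_RM_COUNT] = {
   [DEMO_RM_XLOG] = {"XLOG", demo_xlog_desc},
   [DEMO_RM_HEAP] = {"Heap", demo_heap_desc},
};

/* Describe a record by dispatching to its resource manager. */
char*
demo_describe(enum demo_rmgr_id id, char* buf, size_t size, const void* record)
{
   assert((int)id >= 0 && (int)id < DEMO_RM_COUNT);
   return demo_rmgr_table[id].rm_desc(buf, size, record);
}
```

A later rm_redo callback would fit the same table as one more function pointer per entry.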
The WAL structure has evolved across PostgreSQL versions 13 to 17, requiring different handling for
each version. To accommodate these differences, we have implemented a wrapper-based approach,
such as the factory pattern, to handle varying WAL structures.
Below are the commit hashes for the officially supported magic values in each PostgreSQL version:
struct xl_end_of_recovery_v16 {
timestamp_tz end_time;
timeline_id this_timeline_id;
timeline_id prev_timeline_id;
};
struct xl_end_of_recovery_v17 {
timestamp_tz end_time;
timeline_id this_timeline_id;
timeline_id prev_timeline_id;
int wal_level;
};
struct xl_end_of_recovery {
int pg_version;
union {
struct xl_end_of_recovery_v16 v16;
struct xl_end_of_recovery_v17 v17;
} data;
void (*parse)(struct xl_end_of_recovery* wrapper, const void* rec);
char* (*format)(struct xl_end_of_recovery* wrapper, char* buf);
};
return wrapper;
}
;
}
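A factory for the wrapper above might look like the following sketch. It repeats the structs so that it compiles on its own, leaves out the format callback for brevity, and uses hypothetical demo_ function names. The two typedefs are assumptions that mirror PostgreSQL's 64-bit TimestampTz and 32-bit TimeLineID, and the version cut-off is taken from the struct names, not from the pgmoneta sources.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef int64_t timestamp_tz;   /* assumption: mirrors TimestampTz */
typedef uint32_t timeline_id;   /* assumption: mirrors TimeLineID  */

struct xl_end_of_recovery_v16
{
   timestamp_tz end_time;
   timeline_id this_timeline_id;
   timeline_id prev_timeline_id;
};

struct xl_end_of_recovery_v17
{
   timestamp_tz end_time;
   timeline_id this_timeline_id;
   timeline_id prev_timeline_id;
   int wal_level;
};

struct xl_end_of_recovery
{
   int pg_version;
   union
   {
      struct xl_end_of_recovery_v16 v16;
      struct xl_end_of_recovery_v17 v17;
   } data;
   void (*parse)(struct xl_end_of_recovery* wrapper, const void* rec);
};

/* Version-specific parsers: copy the raw record into the matching member. */
static void
demo_parse_v16(struct xl_end_of_recovery* wrapper, const void* rec)
{
   assert(wrapper != NULL && rec != NULL);
   memcpy(&wrapper->data.v16, rec, sizeof(struct xl_end_of_recovery_v16));
}

static void
demo_parse_v17(struct xl_end_of_recovery* wrapper, const void* rec)
{
   assert(wrapper != NULL && rec != NULL);
   memcpy(&wrapper->data.v17, rec, sizeof(struct xl_end_of_recovery_v17));
}

/* Factory: choose the parser once, at creation time. The caller frees the
 * returned wrapper. Returns NULL if the allocation fails. */
struct xl_end_of_recovery*
demo_create_xl_end_of_recovery(int pg_version)
{
   struct xl_end_of_recovery* wrapper = calloc(1, sizeof(*wrapper));

   if (wrapper == NULL)
   {
      return NULL;
   }

   wrapper->pg_version = pg_version;
   wrapper->parse = (pg_version >= 17) ? demo_parse_v17 : demo_parse_v16;

   return wrapper;
}
```

Callers then invoke wrapper->parse(wrapper, rec) without checking the version themselves, which keeps the version test in one place.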
This section lists the changes in the WAL format between different versions of PostgreSQL.
9.6.1 xl_clog_truncate
17:
struct xl_clog_truncate
{
   int64 pageno;              /**< The page number of the clog to truncate */
   transaction_id oldestXact; /**< The oldest transaction ID to retain */
   oid oldestXactDb;          /**< The database ID of the oldest transaction */
};
16:
struct xl_clog_truncate
{
   int pageno;                /**< The page number of the clog to truncate */
   transaction_id oldestXact; /**< The oldest transaction ID to retain */
   oid oldestXactDb;          /**< The database ID of the oldest transaction */
};
9.6.2 xl_commit_ts_truncate
17:
16:
9.6.3 xl_heap_prune
17:
/*
 * If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
 * unaligned
 */
} xl_heap_prune;
#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
16:
9.6.4 xlhp_freeze_plan
Removed xl_heap_freeze_page
17:
9.6.5 spgxlogState
17:
16:
bool isBuild;
} spgxlogState;
9.6.6 xl_end_of_recovery
16:
16 → 15
9.6.7 ginxlogSplit
15:
RelFileNode node;
BlockNumber rrlink; /* right link, or root's blocknumber if root split */
BlockNumber leftChildBlkno; /* valid on a non-leaf split */
BlockNumber rightChildBlkno;
uint16 flags; /* see below */
} ginxlogSplit;
9.6.8 gistxlogDelete
16:
15:
/*
* In payload of blk 0 : todelete OffsetNumbers
*/
} gistxlogDelete;
#define SizeOfGistxlogDelete (offsetof(gistxlogDelete, ntodelete) + sizeof(uint16))
9.6.9 gistxlogPageReuse
16:
BlockNumber block;
FullTransactionId snapshotConflictHorizon;
bool isCatalogRel; /* to handle recovery conflict during logical
                    * decoding on standby */
} gistxlogPageReuse;
#define SizeOfGistxlogPageReuse (offsetof(gistxlogPageReuse, isCatalogRel) + sizeof(bool))
15:
9.6.10 xl_hash_vacuum_one_page
16:
15:
9.6.11 xl_heap_prune
16:
15:
9.6.12 xl_heap_freeze_plan
16:
15:
uint16 t_infomask2;
uint16 t_infomask;
uint8 frzflags;
} xl_heap_freeze_tuple;
9.6.13 xl_heap_freeze_page
16:
/*
* In payload of blk 0 : FREEZE PLANS and OFFSET NUMBER ARRAY
*/
} xl_heap_freeze_page;
15:
9.6.14 xl_btree_reuse_page
16:
15:
{
RelFileNode node;
BlockNumber block;
FullTransactionId latestRemovedFullXid;
} xl_btree_reuse_page;
9.6.15 xl_btree_delete
16:
/*----
* In payload of blk 0 :
* - DELETED TARGET OFFSET NUMBERS
* - UPDATED TARGET OFFSET NUMBERS
* - UPDATED TUPLES METADATA (xl_btree_update) ARRAY
*----
*/
} xl_btree_delete;
15:
9.6.16 spgxlogVacuumRedirect
16:
15:
15 → 14
9.6.17 xl_xact_prepare
15:
14:
9.6.18 xl_xact_parsed_commit
15:
int nsubxacts;
TransactionId *subxacts;
int nrels;
RelFileNode *xnodes;
int nstats;
xl_xact_stats_item *stats;
int nmsgs;
SharedInvalidationMessage *msgs;
XLogRecPtr origin_lsn;
TimestampTz origin_timestamp;
} xl_xact_parsed_commit;
14:
int nsubxacts;
TransactionId *subxacts;
int nrels;
RelFileNode *xnodes;
int nmsgs;
SharedInvalidationMessage *msgs;
XLogRecPtr origin_lsn;
TimestampTz origin_timestamp;
} xl_xact_parsed_commit;
9.6.19 xl_xact_parsed_abort
15:
int nsubxacts;
TransactionId *subxacts;
int nrels;
RelFileNode *xnodes;
int nstats;
xl_xact_stats_item *stats;
XLogRecPtr origin_lsn;
TimestampTz origin_timestamp;
} xl_xact_parsed_abort;
14:
int nsubxacts;
TransactionId *subxacts;
int nrels;
RelFileNode *xnodes;
XLogRecPtr origin_lsn;
TimestampTz origin_timestamp;
} xl_xact_parsed_abort;
15:
#define BKPIMAGE_COMPRESSED(info) \
((info & (BKPIMAGE_COMPRESS_PGLZ | BKPIMAGE_COMPRESS_LZ4 | \
BKPIMAGE_COMPRESS_ZSTD)) != 0)
14:
14 → 13
9.6.21 xl_heap_prune
14:
13:
9.6.22 xl_heap_vacuum
14:
13:
9.6.23 xl_btree_metadata
14:
13:
9.6.24 xl_btree_reuse_page
14:
13:
9.6.25 xl_btree_delete
14:
13:
9.6.26 xl_btree_unlink_page
14:
/*
 * Information needed to recreate a half-dead leaf page with correct
 * topparent link. The fields are only used when deletion operation's
 * target page is an internal page. REDO routine creates half-dead page
 * from scratch to keep things simple (this is the same convenient
 * approach used for the target page itself).
 */
BlockNumber leafleftsib;
BlockNumber leafrightsib;
BlockNumber leaftopparent; /* next child down in the subtree */
13:
/*
* Information needed to recreate the leaf page, when target is an
* internal page.
*/
BlockNumber leafleftsib;
BlockNumber leafrightsib;
BlockNumber topparent; /* next child down in the branch */
For more details on the internal workings and additional helper functions used in parse_wal_file,
refer to the source code in wal_reader.c.
10 Troubleshooting
10.1 Could not get version for server
If you get this FATAL during startup, check your PostgreSQL logins
psql postgres
and
Setting log_level to DEBUG5 in pgmoneta.conf could provide more information about the
error.
11 Acknowledgement
11.1 Authors
11.2 Committers
11.3 Contributing
• Ask a question
• Raise an issue
• Feature request
• Code submission
Consider giving the project a star on GitHub if you find it useful. And, feel free to follow the project on
Twitter as well.
12 License
BSD-3-Clause
12.1 libart
Our adaptive radix tree (ART) implementation is based on The Adaptive Radix Tree: ARTful Indexing for
Main-Memory Databases and libart which has a 3-BSD license as