Beyond pg_dump: A Simple Guide to PostgreSQL Point-in-Time Recovery

You have backups. You run pg_dump every night with a cron job. You feel safe. But are you?

A nightly dump protects you from a server catching fire. It does not protect you from yourself. If you run a bad migration at 10 AM and your last backup was from midnight, you have lost ten hours of data. For many applications, that is fatal.

The solution is not more frequent dumps. The solution is to think about backups differently. Instead of taking snapshots, you should be recording a continuous history of changes. PostgreSQL has a built-in mechanism for this. It is called Point-in-Time Recovery, or PITR.

The Idea Behind PITR

Every change you make to your database is first written to a journal. This journal is called the Write-Ahead Log, or WAL. It is a sequence of files containing a record of every transaction.

PITR works by combining two things: a full physical backup of your database from some point in the past, and a continuous archive of all the WAL files generated since that backup.

To recover, you start with the physical backup. Then you tell PostgreSQL to replay the WAL files from your archive, stopping at the exact moment before your disaster. It is like having a time machine for your data.

Step 1: Enable Archiving

First you need to tell your primary database to start archiving its WAL files. You do this by changing a few lines in your postgresql.conf file.

Find this file. It is usually in /etc/postgresql/14/main/postgresql.conf or similar. You need to set three parameters.

# postgresql.conf

wal_level = replica
archive_mode = on
archive_command = 'cp %p /mnt/backups/wal_archive/%f'

wal_level = replica tells PostgreSQL to write enough information to the WAL to support replication and archiving.

archive_mode = on turns the feature on.

archive_command is the important part. This is a shell command that PostgreSQL runs every time a new WAL file is ready to be archived. The %p is a placeholder for the path to the WAL file and %f is the filename. PostgreSQL considers the file safely archived only when this command exits with status zero.

Here I am using a simple cp command to copy the files to a backup disk mounted at /mnt/backups/wal_archive. In a real setup you might use rsync to send them to a separate backup server.
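One refinement worth knowing about: the PostgreSQL documentation recommends guarding the copy with a test so that an already-archived file is never silently overwritten. The sketch below simulates that pattern with a temporary directory and a stand-in segment name (both hypothetical), so you can run it without a database:

```shell
#!/bin/sh
# Sketch of the overwrite-guard pattern recommended in the PostgreSQL docs,
# simulated with a temp directory and a fake WAL segment so it runs standalone.

ARCHIVE=$(mktemp -d)                 # stands in for /mnt/backups/wal_archive
WAL=$(mktemp)                        # stands in for %p, the path to the segment
SEGNAME=000000010000000000000001     # stands in for %f, the segment file name
echo "wal data" > "$WAL"

# The real setting would be:
# archive_command = 'test ! -f /mnt/backups/wal_archive/%f && cp %p /mnt/backups/wal_archive/%f'
if test ! -f "$ARCHIVE/$SEGNAME" && cp "$WAL" "$ARCHIVE/$SEGNAME"; then
  echo "first attempt: archived"
fi

# A second attempt on the same name fails instead of clobbering the archived copy.
if ! { test ! -f "$ARCHIVE/$SEGNAME" && cp "$WAL" "$ARCHIVE/$SEGNAME"; }; then
  echo "second attempt: refused"
fi
```

If the destination file already exists, the command exits non-zero, PostgreSQL keeps the segment and retries, and the problem surfaces in your logs instead of quietly overwriting the archive.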

Make sure the destination directory exists and that the postgres user has permission to write to it.

sudo mkdir -p /mnt/backups/wal_archive
sudo chown postgres:postgres /mnt/backups/wal_archive

You must restart PostgreSQL for these changes to take effect.

sudo systemctl restart postgresql
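Before trusting the setup, it is worth confirming that archiving actually works. One way, assuming the configuration above and a running server, is to force a segment switch and then check the pg_stat_archiver view and the archive directory:

```shell
# Force PostgreSQL to close and archive the current WAL segment
sudo -u postgres psql -c "SELECT pg_switch_wal();"

# archived_count should be growing; last_failed_wal should be empty
sudo -u postgres psql -c "SELECT archived_count, last_archived_wal, last_failed_wal FROM pg_stat_archiver;"

# The archived segment files should be appearing here
ls /mnt/backups/wal_archive
```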

Step 2: Take a Base Backup

Now that archiving is running, you need to create the starting point for your recovery. This is not a pg_dump. A logical dump will not work here. You need a physical copy of the database files, and the easiest way to get one is with pg_basebackup.

This tool connects to your server and makes a binary copy of the entire data directory.

# Run this as the postgres user
sudo -u postgres pg_basebackup -D /mnt/backups/base -P -v --wal-method=stream

Let’s break that down. -D sets the directory the backup is written to. -P prints a progress report and -v makes the output verbose. --wal-method=stream opens a second connection to stream the WAL generated while the backup runs, so the backup is complete and consistent on its own.

After this command finishes you will have a complete snapshot of your database in /mnt/backups/base, and your archive_command will keep copying new WAL files to /mnt/backups/wal_archive. Your PITR setup is now active.

Step 3: Simulate a Disaster

Let’s see it work. Connect to your database and create some test data.

CREATE TABLE secrets (id serial primary key, content text not null);
INSERT INTO secrets (content) VALUES ('the first secret');
-- Note the current time. This is your recovery point.
SELECT now();
-- 2023-10-27 10:30:00.123456+00
INSERT INTO secrets (content) VALUES ('the second secret');

Imagine you run this query a minute later.

DROP TABLE secrets;

The table is gone. Your application is broken. A nightly pg_dump would not save the data you inserted this morning.

The first thing to do is stop the database to prevent further damage.

sudo systemctl stop postgresql

Step 4: Recovering to a Point in Time

Now we will perform the recovery on a new clean data directory. In a real scenario this might be on a new server. For this test we will just use a new directory.

First prepare the new data directory. Let’s assume it is /var/lib/postgresql/14/recovered.

Stop the old server completely if you have not already. Then copy the base backup into your new data directory.

cp -a /mnt/backups/base/* /var/lib/postgresql/14/recovered/

Now you need to tell PostgreSQL how to recover. Create a file named recovery.signal in the new data directory. This is just an empty file that tells PostgreSQL to start in recovery mode.

touch /var/lib/postgresql/14/recovered/recovery.signal

Then you add recovery settings to postgresql.conf in the new directory.

# /var/lib/postgresql/14/recovered/postgresql.conf

restore_command = 'cp /mnt/backups/wal_archive/%f %p'
recovery_target_time = '2023-10-27 10:30:00.123456+00'
recovery_target_action = 'promote'

The restore_command tells PostgreSQL how to fetch WAL files from your archive. It is the inverse of your archive_command.

The recovery_target_time is where the magic happens. You are telling PostgreSQL to replay all transactions up to that exact timestamp, so use the time you noted just before the bad query. The recovery_target_action setting tells the server what to do once the target is reached. The default is 'pause', which leaves the server waiting in recovery until you resume or promote it by hand; 'promote' opens it for normal read-write use right away.
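If you did not note a timestamp before the disaster, all is not lost. The pg_waldump tool that ships with PostgreSQL can print the transaction records in your archived segments, commit timestamps included, so you can pin down the last good moment. A sketch (the segment name here is hypothetical; list your archive directory to find the recent ones):

```shell
# Show only transaction records (COMMIT/ABORT, with their timestamps)
# from one archived WAL segment
sudo -u postgres pg_waldump -r Transaction \
    /mnt/backups/wal_archive/000000010000000000000005
```

Pick the timestamp of the last commit you want to keep and use it as your recovery_target_time.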

Now fix the permissions (the directory must be owned by postgres and must not be readable by other users) and start PostgreSQL using this new data directory. The way you do this depends on your setup, but it often involves editing the systemd service file or using pg_ctl.
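As one example, on a Debian-style layout like the paths used earlier, a direct pg_ctl invocation might look like this (the binary path is an assumption; adjust it to your installation):

```shell
# The data directory must belong to postgres and be closed to other users
sudo chown -R postgres:postgres /var/lib/postgresql/14/recovered
sudo chmod 700 /var/lib/postgresql/14/recovered

# Start PostgreSQL directly on the recovered data directory
sudo -u postgres /usr/lib/postgresql/14/bin/pg_ctl \
    -D /var/lib/postgresql/14/recovered \
    -l /tmp/recovery.log start
```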

When PostgreSQL starts, it will see the recovery.signal file and enter recovery mode. It will use the restore_command to fetch WAL files from your archive and replay them one by one until it reaches your target time. Then it will stop replaying and open as a normal, active database.

Check the logs. You will see it replaying the WALs. Once it is ready you can connect.

SELECT * FROM secrets;

You will see 'the first secret'. You will not see 'the second secret', because it was inserted after your recovery time. And most importantly, the DROP TABLE command was never replayed. You have successfully traveled back in time.

This seems like a lot of steps, but the idea is simple: take one physical backup, then continuously archive the journal of changes. This moves you from having nightly backups to having a true continuous backup. It is the difference between a safety net and a time machine.

Now that you see how it works, what’s the one query you’re most afraid of running by accident? Thinking about it is the first step to protecting yourself.

— Rishi Banerjee
September 2025