Skip to main content

Command Palette

Search for a command to run...

The Easiest Way to Clone a PostgreSQL Database

Published
1 min read
J

On multiple occasions in my career I've been tasked with creating production data samples and setting up environments to use this data. Seeing a pattern here, and having spent a lot of time working on perfecting this, I built a business out of this (https://www.redactics.com) and later found my co-founders to take the original concept, generalize it, and extend the original concept into full-fledged data privacy product with its initial focus around bringing safe production data into test, development (local and cloud-based), and demo environments.

If you use a managed database service like AWS RDS or Google Cloud SQL, it is super easy to create a snapshot and create a new instance based on this snapshot, but there is an even easier way to clone a database. At Redactics our use case was building automated workflows to test your database migrations against a copy of your production database (and, incidentally, this is free for use in case this is of interest to you), but there are likely dozens of additional use cases. For example, you may want to A/B test against a particular database or query optimization.

This approach leverages PostgreSQL's support for Linux pipes, and the fact that pg_dump output can be directed into pg_restore as input. This approach means that what would normally be two steps can be combined into one, and you don't need to set aside a bunch of disk space for this task.

pg_dump -h [DB_hostname] -U [username] -W -d [source_database] -v -Fc | pg_restore -h [DB_hostname] -U [username] -W -v -d [destination_database]

Since you likely need for this to be run without interaction, i.e. entering the database password, you might need to create a .pgpass file or leverage environment variables for storing this info. This will allow you to remove the -h, -U, and -W arguments.