The Easiest Way to Clone a PostgreSQL Database

PublishedDecember 6, 2022

•1 min read

On multiple occasions in my career I've been tasked with creating production data samples and setting up environments to use this data. Seeing a pattern here, and having spent a lot of time working on perfecting this, I built a business out of this (https://www.redactics.com) and later found my co-founders to take the original concept, generalize it, and extend the original concept into full-fledged data privacy product with its initial focus around bringing safe production data into test, development (local and cloud-based), and demo environments.

If you use a managed database service like AWS RDS or Google Cloud SQL, it is super easy to create a snapshot and create a new instance based on this snapshot, but there is an even easier way to clone a database. At Redactics our use case was building automated workflows to test your database migrations against a copy of your production database (and, incidentally, this is free for use in case this is of interest to you), but there are likely dozens of additional use cases. For example, you may want to A/B test against a particular database or query optimization.

This approach leverages PostgreSQL's support for Linux pipes, and the fact that pg_dump output can be directed into pg_restore as input. This approach means that what would normally be two steps can be combined into one, and you don't need to set aside a bunch of disk space for this task.

pg_dump -h [DB_hostname] -U [username] -W -d [source_database] -v -Fc | pg_restore -h [DB_hostname] -U [username] -W -v -d [destination_database]

Since you likely need for this to be run without interaction, i.e. entering the database password, you might need to create a .pgpass file or leverage environment variables for storing this info. This will allow you to remove the -h, -U, and -W arguments.

#postgresql #databases #devops

Comments

Join the discussion

No comments yet. Be the first to comment.

More from this blog

How To Get the Most Out of Airflow's Dynamic Task Mapping

Dynamic task mapping (DTM) is a major feature that adds a lot of flexibility to how you build your DAGs. However, it does not provide infinite flexibility and break you free of being beholden to Airflow's patterns. For example, your task mappings are...

Dec 5, 20223 min read

How To Automate Database Migration Testing/Dry-runs in Your CI/CD Pipelines

The most logical starting place is to stand up a new database cluster based on a snapshot, especially if you are using a managed database service like AWS RDS where doing so is easy. However, this still requires some manual steps or some sort of scri...

Nov 28, 20227 min read

How To Poll an Airflow Job (i.e. DAG Run)

Ever wanted to actually know when an Airflow DAG Run has completed? Perhaps your use case involves this completed work being some sort of workflow dependency, or perhaps it is used in a CI/CD pipeline. I'm sure there are a myriad of possible scenario...

Nov 22, 20221 min read

What Every Developer Should Know About Data Security/Privacy

There are studies, including this one, that show that the vast majority of data breaches are due to human error. Some of this human error is the result of being vulnerable to manipulation, social engineering, etc. Other forms are due to bad practices...

Nov 18, 20227 min read

Redactics: Supercharge your Data Architecture

13 posts

Command Palette

Comments

More from this blog