Mohamed Wadie Nsiri
on 18 April 2023
PostgreSQL high availability made charmingly easy
In a previous blog, we talked about patterns to run a database in a highly available manner. In this blog, we present our open source recipe for PostgreSQL high availability.
Benefits of a highly available PostgreSQL
A good recipe should always start by exposing the benefits of the concerned meal. Therefore, we will start this section with a brief introduction to the benefits of using PostgreSQL and why high availability is, often, a mandate.
Benefits of consuming the elephant’s services
PostgreSQL has been successfully used for more than three decades in all IT sectors. Its maturity and its vibrant community consistently make it a first choice database among developers, as the latest survey from StackOverflow shows. PostgreSQL is also known for its excellent support of the SQL standard and for its versatility with more than 1000 extensions available.
Now that we have an overview of PostgreSQL, we will discuss why running only one instance of PostgreSQL is generally a bad idea.
Why one is not enough
Reports like ManageForce’s or Phenom’s estimated the cost of an outage to be around $500,000 per hour, on average. A database outage often cascades to its consuming applications and causes a degradation on the dependent services. Therefore, a database outage is one of the worst scenarios for any company.
Unfortunately, there is more to outages than impact on revenue. According to IDC’s 2021 Worldwide State of Data Protection and DR Survey, an outage might also lead to loss of productivity, data loss and reputation damage.
This is why running a highly available PostgreSQL deployment is beneficial for your company. In the next section, we will discuss the components we choose – at Canonical – to automate the deployment of H.A. PostgreSQL clusters.
The open-source ingredients
The following diagram provides an overview on the components we use:
The following table lists the versions we will use:
Component | Purpose/Functions | Version (major) |
PostgreSQL | Database server. | 14 |
PgBouncer | Connection pooling and failover. | 1 |
Patroni | Availability and replication manager. | 3 |
pgBackRest | Backup and restore. | 2 |
Charmed operator | Software operator for PostgreSQL to help us automate tasks like deployment and scaling. | 14/stable |
Juju | Operator framework allowing our PostgreSQL operator to run on top of many clouds including K8s, OpenStack, AWS, Azure and GCP. | 3.4 |
LXD | Manager of virtual machines and containers. | 5.21 |
Let’s cover how to install these components.
Our recipe for PostgreSQL high availability
Creating a playground
We will start by creating a virtual machine to serve as a sandbox. In this blog, we will use LXD to create our VM. So let’s start by installing LXD:
sudo snap install lxd --channel=5.21/stable
sudo lxd init --auto
Let’s now create the VM and connect to it:
sudo lxc launch ubuntu:24.04 pg --vm -c limits.cpu=4 -c limits.memory=12GiB
sudo lxc exec pg bash
su -l ubuntu
Preparing the environment
Now that we have a playground, the first step is to install Juju, our operator framework:
sudo snap install juju --channel=3.4/stable mkdir -p /home/ubuntu/.local/share
Next, we will again install LXD but within the created VM. This time it will help us emulate a local cloud by spawning containers as needed by Juju.
sudo snap install lxd --channel=5.21/stable sudo lxd init --auto
We will also disable IPV6 as follows:
sudo lxc network set lxdbr0 ipv6.address none
sudo iptables -P FORWARD ACCEPT
The next step is to bootstrap Juju by performing the following commands:
juju bootstrap Clouds aws aws-china aws-gov azure azure-china equinix google localhost oracle Select a cloud [localhost]: localhost Enter a name for the Controller [localhost-localhost]: demo-postgres
We will start by creating a model (you can think namespace) to host our PostgreSQL instances.
juju add-model mymodel
Deploying our first PostgreSQL instance
Finally, we can start deploying a single PostgreSQL instance:
juju deploy postgresql db1 --channel 14/stable --config profile=testing
Please note that the first time you run the above commands, it might take Juju several minutes to download the required charm, the software operator, and its dependencies.
In order to check the status of the deployment, you can use the following command:
juju status # You can type ‘juju status --watch 1s’ to continuously monitor the output
For more details around what is happening in the background you can type:
juju debug-log --tail --replay
After a few minutes, (your mileage may vary), you should get an output similar to the following after typing juju status:
Model Controller Cloud/Region Version SLA Timestamp mymodel demo-postgres localhost/localhost 3.4.5 unsupported 08:07:27Z App Version Status Scale Charm Channel Rev Exposed Message db1 14.11 active 1 postgresql 14/stable 429 no Unit Workload Agent Machine Public address Ports Message db1/0* active idle 0 10.234.206.17 5432/tcp Primary Machine State Address Inst id Base AZ Message 0 started 10.234.206.17 juju-92c34d-0 [email protected] Running
And our first elephant is ready !
Scaling the number of instances
As we said earlier, running a single instance is not a good idea. So we will now explore another neat feature of our charms, on-demand scaling. Adding replicas to PostgreSQL is as simple as running the following command:
juju add-unit -n 2 db1
After some minutes, running juju status should yield an output similar to the following:
Model Controller Cloud/Region Version SLA Timestamp mymodel demo-postgres localhost/localhost 3.4.5 unsupported 08:13:19Z App Version Status Scale Charm Channel Rev Exposed Message db1 14.11 active 3 postgresql 14/stable 429 no Unit Workload Agent Machine Public address Ports Message db1/0* active idle 0 10.234.206.17 5432/tcp Primary db1/1 active idle 1 10.234.206.62 5432/tcp db1/2 active idle 2 10.234.206.141 5432/tcp Machine State Address Inst id Base AZ Message 0 started 10.234.206.17 juju-92c34d-0 [email protected] Running 1 started 10.234.206.62 juju-92c34d-1 [email protected] Running 2 started 10.234.206.141 juju-92c34d-2 [email protected] Running
You can also inspect the containers created by LXD by taping the command:
lxc list
The result should like the following:
+---------------+---------+-----------------------+------+-----------+------+
| NAME | STATE | IPV4 | IPV6 | TYPE | S... |
+---------------+---------+-----------------------+------+-----------+------+
| juju-92c34d-0 | RUNNING | 10.234.206.17 (eth0) | | CONTAINER | 0 |
+---------------+---------+-----------------------+------+-----------+------+
| juju-92c34d-1 | RUNNING | 10.234.206.62 (eth0) | | CONTAINER | 0 |
+---------------+---------+-----------------------+------+-----------+------+
| juju-92c34d-2 | RUNNING | 10.234.206.141 (eth0) | | CONTAINER | 0 |
+---------------+---------+-----------------------+------+-----------+------+
| juju-a8b5cf-0 | RUNNING | 10.234.206.198 (eth0) | | CONTAINER | 0 |
+---------------+---------+-----------------------+------+-----------+------+
Deploying our database proxy
Let’s deploy the PgBouncer component next, using the following command:
juju deploy pgbouncer pgb1 --channel 1/stable
After a few minutes, running juju status should show a new application:
Model Controller Cloud/Region Version SLA Timestamp mymodel demo-postgres localhost/localhost 3.4.5 unsupported 08:27:13Z App Version Status Scale Charm Channel Rev Exposed Message db1 14.11 active 3 postgresql 14/stable 429 no pgb1 unknown 0 pgbouncer 1/stable 278 no Unit Workload Agent Machine Public address Ports Message db1/0* active idle 0 10.234.206.17 5432/tcp Primary db1/1 active idle 1 10.234.206.62 5432/tcp db1/2 active idle 2 10.234.206.141 5432/tcp Machine State Address Inst id Base AZ Message 0 started 10.234.206.17 juju-92c34d-0 [email protected] Running 1 started 10.234.206.62 juju-92c34d-1 [email protected] Running 2 started 10.234.206.141 juju-92c34d-2 [email protected] Running
As you might have noticed, PgBouncer shows unknown as the Status. You should not worry, as it is expected. PgBouncer is actually a subordinate charm that is deployed inside the same system container. Therefore, PgBouncer will only be invoked when it is used.
Deploying a test application
In order to emulate an application that uses PostgreSQL, we will use the Data Integrator charm:
juju deploy data-integrator di1 --config database-name=testdb
After some minutes, running juju status should yield an output similar to the following:
Model Controller Cloud/Region Version SLA Timestamp mymodel demo-postgres localhost/localhost 3.4.5 unsupported 09:08:49Z App Version Status Scale Charm Channel Rev Exposed . db1 14.11 active 3 postgresql 14/stable 429 no di1 blocked 1 data-integrator latest/stable 27 no . pgb1 unknown 0 pgbouncer 1/stable 278 no Unit Workload Agent Machine Public address Ports Message db1/0* active idle 0 10.234.206.17 5432/tcp Primary db1/1 active idle 1 10.234.206.62 5432/tcp db1/2 active idle 2 10.234.206.141 5432/tcp di1/0* blocked idle 3 10.234.206.184 Please relate... Machine State Address Inst id Base AZ Message 0 started 10.234.206.17 juju-92c34d-0 [email protected] Running 1 started 10.234.206.62 juju-92c34d-1 [email protected] Running 2 started 10.234.206.141 juju-92c34d-2 [email protected] Running 3 started 10.234.206.184 juju-92c34d-3 [email protected] Running
Wiring everything together
Juju provides a powerful integration abstraction (a.k.a. relation) that allows to establish a communication link between two workloads (e.g. PgBouncer and PostgreSQL server).
Therefore, relating PgBouncer to PostgreSQL is as simple as running the following command:
juju relate db1 pgb1
And relating the Data Integrator to PgBouncer is as simple as the previous operation:
juju relate di1 pgb1
After some minutes, running juju status should yield an output similar to the following:
Model Controller Cloud/Region Version SLA Timestamp mymodel demo-postgres localhost/localhost 3.4.5 unsupported 09:15:02Z App Version Status Scale Charm Channel Rev Exposed db1 14.11 active 3 postgresql 14/stable 429 no di1 active 1 data-integrator latest/stable 27 no pgb1 1.21.0 active 1 pgbouncer 1/stable 278 no Unit Workload Agent Machine Public address Ports Message db1/0* active idle 0 10.234.206.17 5432/tcp Primary db1/1 active idle 1 10.234.206.62 5432/tcp db1/2 active idle 2 10.234.206.141 5432/tcp di1/0* active idle 3 10.234.206.184 pgb1/0* active idle 10.234.206.184 6432/tcp Machine State Address Inst id Base AZ Message 0 started 10.234.206.17 juju-92c34d-0 [email protected] Running 1 started 10.234.206.62 juju-92c34d-1 [email protected] Running 2 started 10.234.206.141 juju-92c34d-2 [email protected] Running 3 started 10.234.206.184 juju-92c34d-3 [email protected] Running
And voila ! The communication links are now established between:
- PgBouncer and PostgreSQL
- Data Integrator and PgBouncer
Now that we have the full stack ready, let’s play with it.
Time for tasting
Talking to the elephant
We will continue our journey by communicating with the PostgreSQL server. The following command will create a user and display its credentials:
juju run di1/leader get-credentials
The latter will , only one time, display a username and password in the following format:
ok: "True" postgresql: data: '{"database": "testdb", "external-node-connectivity": "true", "requested-secrets": "[\"username\", \"password\", \"tls\", \"tls-ca\", \"uris\"]"}' database: testdb endpoints: 10.234.206.184:6432 password: <password> subordinated: "true" username: <username> version: "14.11"
We can now connect to the deployed PostgreSQL by using the following commands:
juju ssh db1/0 psql --host=<ip address of db1/0* that displayed in juju status output> --username=<previously provided user name> --password testdb
After typing the previously provided password we get a prompt where we can run queries against our PostgreSQL deployment. For example, we can issue the following query:
test-database=> SELECT VERSION();
You can exit the prompt by typing \q and you can exit the system container by typing exit.
Messing, kindly, with the elephant
High availability is not only about deploying several replicas of PostgreSQL. It is also about providing automatic failover when a problem affects one of the replicas. Let’s check together what our charmed operator will do for us when we simulate a failure in one of the PostgreSQL units.
We recommend, from now on, to execute the upcoming instructions in a new terminal/tab so that you can follow the behaviour of our charmed operator.
We will simulate a first issue in the PostgreSQL primary by doing the following:
# Connect to the system container where the primary elephant is hosted juju ssh db1/0 # The following will display all running processes ps -edf # The following will terminate all postgresql related processes sudo kill -9 <pid of the parent postgres process> ps -edf
Patroni should restart all the postgreSQL processes shortly after they are killed. As a consumer of PostgreSQL, you will not notice any issue.
We will now try to abruptly stop the system container where the primary PostgreSQL is running. This can emulate a crash of a server or a sudden network isolation of the primary.
exit Juju status # Note the ip address of the primary PostgreSQL # Then note the Inst id corresponding to the primary (in the Machine section of juju status’ output) lxc stop --force --timeout 0 <Inst id> # The above will abruptly stop the Primary’s host lxc list
By checking juju status, you can see that an automatic failover happened and that our cluster self healed after only a few seconds!
A juju status should present an output similar to the following:
Model Controller Cloud/Region Version SLA Timestamp mymodel demo-postgres localhost/localhost 3.4.5 unsupported 09:47:52Z App Version Status Scale Charm Channel Rev Exposed db1 14.11 active 2/3 postgresql 14/stable 429 no di1 active 1 data-integrator latest/stable 27 no pgb1 1.21.0 active 1 pgbouncer 1/stable 278 no Unit Workload Agent Machine Public address Ports Message db1/0 unknown lost 0 10.234.206.17 5432/tcp agent lost... db1/1 active idle 1 10.234.206.62 5432/tcp Primary db1/2* active idle 2 10.234.206.141 5432/tcp di1/0* active idle 3 10.234.206.184 pgb1/0* active idle 10.234.206.184 6432/tcp Machine State Address Inst id Base AZ Message 0 down 10.234.206.17 juju-92c34d-0 [email protected] Running 1 started 10.234.206.62 juju-92c34d-1 [email protected] Running 2 started 10.234.206.141 juju-92c34d-2 [email protected] Running 3 started 10.234.206.184 juju-92c34d-3 [email protected] Running
Contact us for more
Canonical can help you secure and automate the deployment, maintenance and upgrades of your PostgreSQL databases across private and public clouds. Let us know more about your use cases and needs by contacting our experts.