How to perform a minor upgrade
Example: OpenSearch X.Y → OpenSearch X.Y+1
This guide explains how to perform a minor upgrade of the OpenSearch cluster deployed with the Charmed OpenSearch operator. A minor upgrade is an upgrade from one minor version to another, for example, from OpenSearch 2.14 to OpenSearch 2.15.
This guide walks you through the upgrade: pre-upgrade checks, preparing the application for the in-place upgrade, initiating and resuming the upgrade, and checking the cluster’s health.
Summary
- Pre-upgrade checks
- Upgrade the OpenSearch cluster
- Prepare the application for the in-place upgrade
- Initiate the upgrade
- Resume upgrade
- Rollback (optional)
- Scale-back (optional)
- Check the cluster health
Pre-upgrade checks
Before upgrading your OpenSearch cluster, ensure that you have completed the following steps:
- Back up your data: Before upgrading, back up your data to prevent data loss in case of failure. For more information, see How to create a backup.
- Make sure not to perform any extraordinary operations: Avoid performing any concurrent operations on the cluster during the upgrade process. This can lead to an inconsistent state of the cluster. This includes:
- Adding or removing units
- Creating or destroying relations
- Changes in workload configuration
- Upgrading other connected/related/integrated applications simultaneously
- Backup / restore of snapshots
Upgrade the OpenSearch cluster
To upgrade your OpenSearch cluster, follow these steps:
- Collect all necessary pre-upgrade information. It will be required for the rollback (if requested). Do NOT skip this step.
- (optional) Scale-up: The new sacrificial unit will be the first to be updated, and will simplify the rollback procedure if the upgrade fails.
- Prepare the “Charmed OpenSearch” Juju application for the in-place upgrade. See the step description below for all the technical details the charm executes.
- Upgrade: Once the upgrade starts, only one unit of the application is upgraded. In case of failure, roll back with juju refresh.
- Resume upgrade: The upgrade can be resumed if the upgrade of the first unit is successful. The remaining units in the application are upgraded sequentially, from the highest to the lowest unit number.
- (optional) Consider rolling back in case of disaster. Please inform us and include us in your troubleshooting, so that we can trace the source of the issue and prevent it in the future.
- (optional) Scale back: Remove the units created in the optional scale-up step (if any) once they are no longer needed.
- Post-upgrade check: Ensure all units are in the proper state and the cluster is healthy.
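The ordering described in the steps above can be sketched in a few lines of Python. This is only an illustration of the "highest unit number first" rule; the helper names unit_number and upgrade_order are hypothetical and not part of the charm or of Juju.

```python
import re
from typing import List


def unit_number(unit: str) -> int:
    """Return the ordinal of a Juju unit name such as 'opensearch/3'."""
    match = re.fullmatch(r"[a-z0-9-]+/(\d+)", unit)
    if match is None:
        raise ValueError(f"not a unit name: {unit!r}")
    return int(match.group(1))


def upgrade_order(units: List[str]) -> List[str]:
    """Order units as the guide describes: highest unit number first."""
    # juju status marks the leader with a trailing '*'; strip it before sorting.
    cleaned = [u.rstrip("*") for u in units]
    return sorted(cleaned, key=unit_number, reverse=True)


units = ["opensearch/0", "opensearch/1", "opensearch/2*", "opensearch/3"]
order = upgrade_order(units)
print("upgraded first, right after juju refresh:", order[0])
print("upgraded after resume-upgrade, in order:", order[1:])
```

With an extra unit added by the optional scale-up, that new unit has the highest number and is therefore the one upgraded first.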
Collect all necessary pre-upgrade information
The first step is to record the revision of the running application, as a safety measure for a rollback action. To accomplish this, run the juju status command and look for the deployed Charmed OpenSearch revision in the command output, e.g.:
Model  Controller   Cloud/Region         Version  SLA          Timestamp
dev    development  localhost/localhost  3.5.3    unsupported  10:16:46Z
App                       Version  Status  Scale  Charm                     Channel        Rev  Exposed  Message
opensearch                         active      3  opensearch                2/edge         144  no
self-signed-certificates           active      1  self-signed-certificates  latest/stable  155  no
Unit                         Workload  Agent  Machine  Public address  Ports     Message
opensearch/0                 active    idle   0        10.214.176.180  9200/tcp
opensearch/1                 active    idle   1        10.214.176.220  9200/tcp
opensearch/2*                active    idle   2        10.214.176.175  9200/tcp
self-signed-certificates/0*  active    idle   3        10.214.176.31
Machine  State    Address         Inst id        Base          AZ  Message
0        started  10.214.176.180  juju-0c35d2-0  ubuntu@22.04      Running
1        started  10.214.176.220  juju-0c35d2-1  ubuntu@22.04      Running
2        started  10.214.176.175  juju-0c35d2-2  ubuntu@22.04      Running
3        started  10.214.176.31   juju-0c35d2-3  ubuntu@22.04      Running
For this example, the current revision is 144 for OpenSearch.
Make sure to store the revision number in case of rollback. If the deployment is of a local charm, save a copy of the current .charm file.
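If you prefer to record the revision programmatically, the following Python sketch reads it from the machine-readable status. It assumes the status was obtained with juju status --format json and that the revision is stored under the "charm-rev" key of each application; check this against the output of your own Juju version. The function name charm_revision and the inline sample are illustrative only.

```python
import json


def charm_revision(status: dict, app: str) -> int:
    """Return the deployed charm revision of `app` from parsed status JSON."""
    # Assumption: the revision lives at applications.<app>."charm-rev".
    return int(status["applications"][app]["charm-rev"])


# Trimmed-down sample shaped like the status shown above (revision 144).
sample = json.loads(
    '{"applications": {"opensearch": '
    '{"charm": "opensearch", "charm-channel": "2/edge", "charm-rev": 144}}}'
)

revision = charm_revision(sample, "opensearch")
print(f"record this revision for a possible rollback: {revision}")
```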
Scale-up (optional)
Scaling the application up by one unit before upgrading is optional, but recommended.
The new unit will be the first one to be updated, and it will assert that the upgrade is possible. In case of failure, having the extra unit will ease the rollback procedure without disrupting service. For more details, see How to perform a minor rollback.
juju add-unit opensearch
Wait for the new unit to be up and ready.
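One way to decide that the new unit is "up and ready" is to check that every unit reports active/idle. The Python sketch below does this on parsed juju status --format json output. The key names "workload-status", "juju-status" and "current" are assumptions about that JSON layout, and unsettled_units is a hypothetical helper.

```python
from typing import List


def unsettled_units(status: dict, app: str) -> List[str]:
    """Return the units of `app` that are not yet in the active/idle state."""
    units = status["applications"][app].get("units", {})
    pending = []
    for name, info in units.items():
        # Assumption: workload and agent states are nested under these keys.
        workload = info.get("workload-status", {}).get("current")
        agent = info.get("juju-status", {}).get("current")
        if (workload, agent) != ("active", "idle"):
            pending.append(name)
    return sorted(pending)


# Hand-written sample: the freshly added unit is still starting.
sample = {
    "applications": {
        "opensearch": {
            "units": {
                "opensearch/2": {
                    "workload-status": {"current": "active"},
                    "juju-status": {"current": "idle"},
                },
                "opensearch/3": {
                    "workload-status": {"current": "waiting"},
                    "juju-status": {"current": "executing"},
                },
            }
        }
    }
}

print("still settling:", unsettled_units(sample, "opensearch"))
```

An empty list means all units have settled and the next step can start.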
Prepare the application for the in-place upgrade
- IMPORTANT: Create a backup of your cluster
Refer to How to create a backup.
- Perform the pre-upgrade-check action
After the application has settled, it’s necessary to run the pre-upgrade-check action against the leader unit:
juju run opensearch/leader pre-upgrade-check
The output should be the following:
Running operation 1 with 1 task
- task 2 on unit-opensearch-2
Waiting for task 2...
result: Charm is ready for upgrade
The action checks the health of OpenSearch and determines whether the charm is ready to start the upgrade procedure.
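When scripting the upgrade, the action output can be checked before continuing. The sketch below is a minimal Python example that extracts the result line from text shaped like the output shown above. The helper action_result is hypothetical, and the exact output layout may differ between Juju versions.

```python
from typing import Optional

READY_MESSAGE = "Charm is ready for upgrade"


def action_result(output: str) -> Optional[str]:
    """Return the text after 'result:' in the action output, if present."""
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "result":
            return value.strip()
    return None


sample_output = """Running operation 1 with 1 task
  - task 2 on unit-opensearch-2
Waiting for task 2...
result: Charm is ready for upgrade
"""

if action_result(sample_output) == READY_MESSAGE:
    print("pre-upgrade-check passed, the upgrade can be initiated")
else:
    print("pre-upgrade-check did not pass, do not refresh yet")
```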
Initiate the upgrade
Caution: Charmed OpenSearch supports performance profiles, and its RAM consumption depends on the profile chosen:
- production: consumes 50% of the available RAM, up to 32G
- staging: consumes 25% of the available RAM, up to 32G
- testing: consumes 1G of RAM
If your charm is running a revision prior to 185, the testing profile is your default value. Make sure it is set at upgrade time; afterwards, feel free to switch to another profile that is more suitable for your use case.
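The profile rules above amount to simple arithmetic, sketched here in Python. The function name profile_memory_gib is hypothetical, and the sketch assumes that "G" means GiB and that the percentages apply to the total RAM of the machine.

```python
def profile_memory_gib(profile: str, total_ram_gib: float) -> float:
    """Estimate the RAM used by OpenSearch under a given performance profile."""
    cap_gib = 32.0  # upper bound stated for production and staging
    if profile == "production":
        return min(total_ram_gib * 0.50, cap_gib)
    if profile == "staging":
        return min(total_ram_gib * 0.25, cap_gib)
    if profile == "testing":
        return 1.0
    raise ValueError(f"unknown profile: {profile!r}")


for profile in ("production", "staging", "testing"):
    print(profile, profile_memory_gib(profile, total_ram_gib=16.0))
```

On a machine with 128G of RAM, for example, the production profile would hit the 32G cap rather than use 64G.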
Use the juju refresh command to trigger the charm upgrade process. You have control over what upgrade you want to apply:
- You can upgrade the charm to the latest revision available in the charm store for a specific channel, in this case, the edge channel:
# If your charm is running a revision prior to 185, then set the profile explicitly:
juju refresh opensearch --channel 2/edge --config profile="testing"
# Otherwise, just refresh:
juju refresh opensearch --channel 2/edge
- You can also upgrade the charm to a specific revision:
juju refresh opensearch --revision 145
- Or you can upgrade the charm using a local charm file:
juju refresh opensearch --path /path/to/your/charm/file.charm
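For automation, the choice between these variants can be captured in a small command builder. The following Python sketch only assembles the argument list and does not call Juju. The function refresh_command is hypothetical, and it applies the explicit testing profile only to channel refreshes, since that is the only case shown above.

```python
from typing import List, Optional

PROFILE_DEFAULT_CHANGE_REV = 185  # revision threshold quoted in the caution


def refresh_command(
    app: str,
    current_rev: int,
    *,
    channel: Optional[str] = None,
    revision: Optional[int] = None,
    path: Optional[str] = None,
) -> List[str]:
    """Build the juju refresh argument list for exactly one upgrade target."""
    targets = [t for t in (channel, revision, path) if t is not None]
    if len(targets) != 1:
        raise ValueError("choose exactly one of channel, revision or path")

    cmd = ["juju", "refresh", app]
    if channel is not None:
        cmd += ["--channel", channel]
        # Older revisions default to the testing profile: set it explicitly.
        if current_rev < PROFILE_DEFAULT_CHANGE_REV:
            cmd += ["--config", "profile=testing"]
    elif revision is not None:
        cmd += ["--revision", str(revision)]
    else:
        cmd += ["--path", str(path)]
    return cmd


print(" ".join(refresh_command("opensearch", 144, channel="2/edge")))
print(" ".join(refresh_command("opensearch", 144, revision=145)))
```

The returned list can be passed to subprocess.run as is, which avoids shell quoting issues.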
The OpenSearch upgrade will execute only on the highest ordinal unit. For the running example, the juju status output will look as follows:
Model  Controller   Cloud/Region         Version  SLA          Timestamp
dev    development  localhost/localhost  3.5.3    unsupported  10:29:07Z
App                       Version  Status   Scale  Charm                     Channel        Rev  Exposed  Message
opensearch                         blocked      4  opensearch                2/edge         145  no       Upgrading. Verify highest unit is healthy & run `resume-upgrade` action. To rollback, `juju refresh` to last revision
self-signed-certificates           active       1  self-signed-certificates  latest/stable  155  no
Unit                         Workload  Agent  Machine  Public address  Ports     Message
opensearch/0                 active    idle   0        10.214.176.180  9200/tcp  OpenSearch 2.15.0 running; Snap rev 56 (outdated); Charmed operator 1+e686854
opensearch/1                 active    idle   1        10.214.176.220  9200/tcp  OpenSearch 2.15.0 running; Snap rev 56 (outdated); Charmed operator 1+e686854
opensearch/2*                active    idle   2        10.214.176.175  9200/tcp  OpenSearch 2.15.0 running; Snap rev 56 (outdated); Charmed operator 1+e686854
opensearch/3                 active    idle   4        10.214.176.7    9200/tcp  OpenSearch 2.16.0 running; Snap rev 57; Charmed operator 1+e686854
self-signed-certificates/0*  active    idle   3        10.214.176.31
Machine  State    Address         Inst id        Base          AZ  Message
0        started  10.214.176.180  juju-0c35d2-0  ubuntu@22.04      Running
1        started  10.214.176.220  juju-0c35d2-1  ubuntu@22.04      Running
2        started  10.214.176.175  juju-0c35d2-2  ubuntu@22.04      Running
3        started  10.214.176.31   juju-0c35d2-3  ubuntu@22.04      Running
4        started  10.214.176.7    juju-0c35d2-4  ubuntu@22.04      Running
The unit should recover shortly after, but the time can vary depending on the amount of data written to the cluster while the unit was not part of it. Large installations can take considerably longer, so please be patient.
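The unit messages in the status output carry the OpenSearch version, the snap revision and an "(outdated)" marker. The Python sketch below parses them to tell upgraded units from pending ones. The regular expression is derived only from the sample messages in this guide, so treat the message format as an assumption; UnitVersion and parse_unit_message are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class UnitVersion(NamedTuple):
    opensearch: str
    snap_rev: int
    outdated: bool


# Pattern based on the sample messages shown in this guide.
MESSAGE_PATTERN = re.compile(
    r"OpenSearch (?P<version>[\d.]+) running; "
    r"Snap rev (?P<rev>\d+)(?P<outdated> \(outdated\))?"
)


def parse_unit_message(message: str) -> Optional[UnitVersion]:
    """Parse a unit status message; return None if it has another shape."""
    match = MESSAGE_PATTERN.search(message)
    if match is None:
        return None
    return UnitVersion(
        opensearch=match["version"],
        snap_rev=int(match["rev"]),
        outdated=match["outdated"] is not None,
    )


messages = {
    "opensearch/2": "OpenSearch 2.15.0 running; Snap rev 56 (outdated); Charmed operator 1+e686854",
    "opensearch/3": "OpenSearch 2.16.0 running; Snap rev 57; Charmed operator 1+e686854",
}
for unit, message in messages.items():
    print(unit, parse_unit_message(message))
```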
Resume upgrade
After the first unit is upgraded, the charm will set the unit upgrade state as completed. If deemed necessary, you can further assert the success of the upgrade. If the unit is healthy within the cluster, the next step is to resume the upgrade process by running:
juju run opensearch/leader resume-upgrade
The resume-upgrade action will roll out the OpenSearch upgrade for the remaining units in the application. The action will be executed sequentially from the highest unit number to the lowest.
After every unit is upgraded, its status will be set to active/idle and its message will indicate the new version of OpenSearch running on the unit. The juju status output will look as follows:
Model  Controller   Cloud/Region         Version  SLA          Timestamp
dev    development  localhost/localhost  3.5.3    unsupported  10:39:06Z
App                       Version  Status       Scale  Charm                     Channel        Rev  Exposed  Message
opensearch                         maintenance      4  opensearch                2/edge         145  no       Upgrading. To rollback, `juju refresh` to the previous revision
self-signed-certificates           active           1  self-signed-certificates  latest/stable  155  no
Unit                         Workload  Agent      Machine  Public address  Ports     Message
opensearch/0                 active    idle       0        10.214.176.180  9200/tcp  OpenSearch 2.15.0 running; Snap rev 56 (outdated); Charmed operator 1+e686854
opensearch/1                 waiting   executing  1        10.214.176.220  9200/tcp  Waiting for OpenSearch to start...
opensearch/2*                active    idle       2        10.214.176.175  9200/tcp  OpenSearch 2.16.0 running; Snap rev 57; Charmed operator 1+e686854
opensearch/3                 active    idle       4        10.214.176.7    9200/tcp  OpenSearch 2.16.0 running; Snap rev 57; Charmed operator 1+e686854
self-signed-certificates/0*  active    idle       3        10.214.176.31
Machine  State    Address         Inst id        Base          AZ  Message
0        started  10.214.176.180  juju-0c35d2-0  ubuntu@22.04      Running
1        started  10.214.176.220  juju-0c35d2-1  ubuntu@22.04      Running
2        started  10.214.176.175  juju-0c35d2-2  ubuntu@22.04      Running
3        started  10.214.176.31   juju-0c35d2-3  ubuntu@22.04      Running
4        started  10.214.176.7    juju-0c35d2-4  ubuntu@22.04      Running
Once all units are upgraded, the application status will be set to active and the messages indicating the OpenSearch version running on each unit will disappear:
Model  Controller   Cloud/Region         Version  SLA          Timestamp
dev    development  localhost/localhost  3.5.3    unsupported  10:43:41Z
App                       Version  Status  Scale  Charm                     Channel        Rev  Exposed  Message
opensearch                         active      4  opensearch                2/edge         145  no
self-signed-certificates           active      1  self-signed-certificates  latest/stable  155  no
Unit                         Workload  Agent  Machine  Public address  Ports     Message
opensearch/0                 active    idle   0        10.214.176.180  9200/tcp
opensearch/1                 active    idle   1        10.214.176.220  9200/tcp
opensearch/2*                active    idle   2        10.214.176.175  9200/tcp
opensearch/3                 active    idle   4        10.214.176.7    9200/tcp
self-signed-certificates/0*  active    idle   3        10.214.176.31
Machine  State    Address         Inst id        Base          AZ  Message
0        started  10.214.176.180  juju-0c35d2-0  ubuntu@22.04      Running
1        started  10.214.176.220  juju-0c35d2-1  ubuntu@22.04      Running
2        started  10.214.176.175  juju-0c35d2-2  ubuntu@22.04      Running
3        started  10.214.176.31   juju-0c35d2-3  ubuntu@22.04      Running
4        started  10.214.176.7    juju-0c35d2-4  ubuntu@22.04      Running
Notice the Rev column in the juju status output. The revision number should reflect the new revision of the application.
Rollback (optional)
In case of a failed upgrade, you can roll back to the previous revision. To do so, follow the guide How to perform a minor rollback.
Scale-back (optional)
If you scaled up the application in the optional scale-up step, you can now scale it back down to the original number of units:
juju remove-unit opensearch/<highest unit number>
Check the cluster health
First, check that the units have settled in the active/idle state in juju status, with the newer revision number:
Model  Controller   Cloud/Region         Version  SLA          Timestamp
dev    development  localhost/localhost  3.5.3    unsupported  10:45:39Z
App                       Version  Status  Scale  Charm                     Channel        Rev  Exposed  Message
opensearch                         active      3  opensearch                2/edge         145  no
self-signed-certificates           active      1  self-signed-certificates  latest/stable  155  no
Unit                         Workload  Agent  Machine  Public address  Ports     Message
opensearch/0                 active    idle   0        10.214.176.180  9200/tcp
opensearch/1                 active    idle   1        10.214.176.220  9200/tcp
opensearch/2*                active    idle   2        10.214.176.175  9200/tcp
self-signed-certificates/0*  active    idle   3        10.214.176.31
Machine  State    Address         Inst id        Base          AZ  Message
0        started  10.214.176.180  juju-0c35d2-0  ubuntu@22.04      Running
1        started  10.214.176.220  juju-0c35d2-1  ubuntu@22.04      Running
2        started  10.214.176.175  juju-0c35d2-2  ubuntu@22.04      Running
3        started  10.214.176.31   juju-0c35d2-3  ubuntu@22.04      Running
Check the cluster is healthy. OpenSearch’s upstream documentation suggests the following check:
GET "/_cluster/health?pretty"
The response should look similar to the following example:
{
  "cluster_name" : "opensearch-wvmy",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "discovered_master" : true,
  "discovered_cluster_manager" : true,
  "active_primary_shards" : 5,
  "active_shards" : 15,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
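The health response can also be evaluated by a script. The Python sketch below takes the parsed JSON and lists anything that looks unhealthy. The checks chosen here (green status, expected node count, no shards relocating, initializing or unassigned) are one reasonable reading of the example response, not an official acceptance criterion, and health_problems is a hypothetical helper.

```python
import json
from typing import List


def health_problems(health: dict, expected_nodes: int) -> List[str]:
    """Return human-readable findings; an empty list means healthy."""
    problems = []
    if health.get("status") != "green":
        problems.append(f"status is {health.get('status')!r}, expected 'green'")
    if health.get("timed_out"):
        problems.append("health request timed out")
    if health.get("number_of_nodes") != expected_nodes:
        problems.append(
            f"{health.get('number_of_nodes')} nodes present, expected {expected_nodes}"
        )
    for key in ("relocating_shards", "initializing_shards", "unassigned_shards"):
        if health.get(key, 0) != 0:
            problems.append(f"{key} = {health[key]}")
    return problems


# Subset of the example response shown above.
response = json.loads(
    '{"status": "green", "timed_out": false, "number_of_nodes": 3,'
    ' "relocating_shards": 0, "initializing_shards": 0, "unassigned_shards": 0}'
)
print(health_problems(response, expected_nodes=3) or "cluster is healthy")
```

Pass the number of OpenSearch units left after the optional scale-back as expected_nodes.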