Greenplum Database Failure Recovery test

Source: Internet
Author: User

This article describes the master failure and recovery test for gpdb and the segment failure recovery test.

Environment Introduction:
GPDB version: 5.5.02 binary version
Operating system version: CentOS Linux 7.0
Master segment: Hostname:mfsmaster
Stadnby segemnt: hostname:server227
Segment node 1: hostname:server227
Segment node 2: hostname:server17
Segment node 3: hostname:server11
Run one primary segment and one mirror segment on each segment node respectively

First, view the original state

select * from gp_segment_configuration;

$ gpstate-f20180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[info]:-starting gpstate with args:-f20180320 : 13:50:38:021814 gpstate:mfsmaster:gpadmin-[info]:-local greenplum Version: ' Postgres (greenplum Database) 5.5.0 Build Commit:67afa18296aa238d53a2dfcc724da60ed2f944f0 ' 20180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[info]:- Master Greenplum Version: ' PostgreSQL 8.3.23 (greenplum Database 5.5.0 Build Commit  : 67afa18296aa238d53a2dfcc724da60ed2f944f0) on X86_64-pc-linux-gnu, compiled by GCC gcc (gcc) 6.2.0, 64-bit compiled on Feb 2018 15:23:55 ' 20180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[info]:-O btaining Segment details from master ... 20180320:13:50:38:021814 Gpstate:mfsmaster:gpadmin-[info]:-standby Master details20180320:13:50:38:021814 gpstate:   Mfsmaster:gpadmin-[info]:-----------------------20180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[info]:- Standby address = server22720180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[info]:-Standby data Directory =/home/gpadmin/master/gpseg-120180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[info]:-Standby port = 543220180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[info]:-Standby PID = 2227920180320:13:50:38:02181 4 gpstate:mfsmaster:gpadmin-[info]:-Standby status = Standby host passive20180320:13:50:38:021814 Gpstate:mfs Master:gpadmin-[info]:--------------------------------------------------------------20180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[info]:--pg_stat_replication20180320:13:50:38:021814 Gpstate:mfsmaster:gpadmin-[INFO] :--------------------------------------------------------------20180320:13:50:38:021814 Gpstate:mfsmaster: Gpadmin-[info]:--wal Sender state:streaming20180320:13:50:38:021814 Gpstate:mfsmaster:gpadmin-[info]:--sync State: sync20180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[info]:--sent location:0/cf2c47020180320:13:50:38:021814 Gpstate:mfsmaster:gpadmin-[info]:--flush location:0/cf2c47020180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[info]:--replay location:0/cf2c47020180320:13:50:38:021814 Gpstate:mfsmaster: Gpadmin-[info]:--------------------------------------------------------------

Second, master master-slave switch
1, simulate the current main library outage, where the killall gpadmin user directly under all processes to simulate

2, on the Master standby node (227 server) to perform the switch command, Increase 227 to master

$ gpactivatestandby-d master/gpseg-1/20180320:13:53:20:030558 Gpactivatestandby:server227:gpadmin-[info]:-------- ----------------------------------------------20180320:13:53:20:030558 gpactivatestandby:server227:gpadmin-[ Info]:-standby Data Directory =/home/gpadmin/master/gpseg-120180320:13:53:20:030558 gpactivatestandby:server227: Gpadmin-[info]:-standby Port = 543220180320:13:53:20:030558 Gpactivatestandby:server227:gpadmin-[info]:-stand  By running = yes20180320:13:53:20:030558 Gpactivatestandby:server227:gpadmin-[info]:-force Standby activation = no20180320:13:53:20:030558 Gpactivatestandby:server227:gpadmin-[info]:----------------------------------------- -------------does want to continue with standby master activation? yy| Nn (default=n):> y20180320:13:53:26:030558 gpactivatestandby:server227:gpadmin-[info]:-found standby postmaster process20180320:13:53:26:030558 gpactivatestandby:server227:gpadmin-[info]:-updating Transaction Files Filespace Flat FIles ... 20180320:13:53:26:030558 gpactivatestandby:server227:gpadmin-[info]:-updating Temporary files Filespace flat files ... 20180320:13:53:26:030558 Gpactivatestandby:server227:gpadmin-[info]:-P romoting standby ... 20180320:13:53:26:030558 gpactivatestandby:server227:gpadmin-[debug]:-waiting for connection ... 20180320:13:53:27:030558 Gpactivatestandby:server227:gpadmin-[info]:-standby Master is promoted20180320 : 13:53:27:030558 gpactivatestandby:server227:gpadmin-[info]:-reading Current configuration ... 20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[debug]:-connecting to Dbname= ' Postgres ' 20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[info]:-writing The Gp_dbid file-/home/gpadmin/master /gpseg-1/gp_dbid ... 20180320:13:53:27:030558 Gpactivatestandby:server227:gpadmin-[info]:-but found an already existing file.20180320 : 13:53:27:030558 Gpactivatestandby:server227:gpadmin-[info]:-hence removed that existing file.20180320 : 13:53:27:030558 gpactivatestandby:server227:gPadmin-[info]:-creating a new file ... 20180320:13:53:27:030558 Gpactivatestandby:server227:gpadmin-[info]:-wrote dbid:1 to the file.20180320 : 13:53:27:030558 Gpactivatestandby:server227:gpadmin-[info]:-now marking it as read only ... 20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[info]:-verifying the file ... 20180320:13:53:27:030558 Gpactivatestandby:server227:gpadmin-[info]:------------------------------------------- -----------20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[info]:-the activation of the standby master Has completed successfully.20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[info]:-server227 are now the New primary master.20180320:13:53:27:030558 Gpactivatestandby:server227:gpadmin-[info]:-you would need to update your User access mechanism to reflect20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[info]:-the change of Master hostname.20180320:13:53:27:030558 Gpactivatestandby:server227:gpadmin-[info]:-D o not re-start tHe failed master while the Fail-over master is20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[info]:-o Perational, this could result in database corruption!20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[ Info]:-master_data_directory is now/home/gpadmin/master/gpseg-1 if20180320:13:53:27:030558 gpactivatestandby:  Server227:gpadmin-[info]:-this has changed as a result of the standby master activation, remember20180320:13:53:27:030558 Gpactivatestandby:server227:gpadmin-[info]:-to change the This on any startup scripts etc, that is May configured20180320:13: 53:27:030558 Gpactivatestandby:server227:gpadmin-[info]:-to Set this value.20180320:13:53:27:030558 Gpactivatestandby:server227:gpadmin-[info]:-master_port is now 5432, if the has changed, you20180320:13:53:27:030558 Gpactivatestandby:server227:gpadmin-[info]:-may need to make additional configuration changes to allow access20180320 : 13:53:27:030558 gpactivatestandby:server227:gpadmin-[info]:-to the Greenplum instance.20180320:13:53:27:030558 Gpactivatestandby:server227:gpadmin-[info]:-refer to the Administrator guide for Instructions on what to re-activate20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[info]:-the master to Its previous state once it becomes available.20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[info]:- Query Planner Statistics must be updated on all databases20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[ info]:-following Standby Master activation.20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[info]:- When convenient, run ANALYZE against all user databases.20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[ INFO]:------------------------------------------------------

3, test the upgrade of the main library is normal

$ psql -d postgres -c ‘ANALYZE‘postgres=# select * from gp_segment_configuration;

4. You may need to configure the pg_hba.conf file synchronously to connect remotely via the client.

Here we have completed the failover of the master node.

Third, add a new master standby
1. When executing the GPSTART-A command on 225 server, the error "Error:standby active, this node no more can act as master" is started when the GPDB database is launched. When standby is promoted to master, the original master server recovers from the failure and needs to be joined with the standby role.

2, the data on the original master server 225 for backup

$ cd master/$ lsgpseg-1$ mv gpseg-1/ backup-gpseg-1

3. Gpinitstandby add 225 to standby on current master server 227

$ gpinitstandby -s mfsmaster$ gpstate -f

Iv. primary segment and mirror segment switching
1. First, let's brush up on the current database environment
Master segment: hostname:server227
Stadnby segemnt: Hostname:mfsmaster
Segment node 1: hostname:server227
Segment node 2: hostname:server17
Segment node 3: hostname:server11
Run one primary segment and one mirror segment on each segment node respectively

2, then we use the same way to the 227 server gpadmin users of all to kill

$ killall -u gpadmin

3. Perform the Switch Master command on the 225 server

$ gpactivatestandby -d master/gpseg-1/

4, after completing the switch using the Client tool connection to view the segment status, you can see the server227 on the 227 server
Both the primary and mirror nodes have been shut down.

5, here for the sake of viewing, we use the Greenplum-cc-web tool to view the cluster status

$ gpcmdr --start hbjy

You need to restore the pg_hba.conf file back, because all the segment on the 227 have been down, the execution gpstop-u command will have an error

In the Segment Status page, you can see that the status of the current segment is abnormal. Server11 There are two groups of primary segment, which is dangerous, if unfortunately Server11 also down, the entire cluster state becomes unusable.

6, server227 as master standby rejoin the cluster

$ cd master/$ mv gpseg-1/ backupgpseg-1$ gpinitstandby -s server227

7. Restart the cluster on master

$ gpstop -M immediate$ gpstart -a

8. Restore the cluster on master

Although all segment have been started, there are still two groups of primary segment on the SERVER11.

9. Restore the Segment node to the original state on master

$ gprecoverseg -r

Reference Documentation:

Greenplum Database Failure Recovery test

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.