GreenPlum database fault recovery test

Source: Internet
Author: User

GreenPlum database fault recovery test

This article describes gpdb master fault and recovery testing and segment fault recovery testing.

Environment Introduction:
Gpdb version: Binary version 5.5.0
OS: CentOS linux 7.0
Master segment: 192.168.1.225/24 hostname: mfsmaster
Stadnby segemnt: 192.168.1.227/24 hostname: server227
Segment Node 1: 192.168.1.227/24 hostname: server227
Segment Node 2: 192.168.1.17/24 hostname: server17
Segment Node 3: 192.168.1.11/24 hostname: server11
One primary segment and one mirror segment are run on each segment node.

1. view the original status

Select * from gp_segment_configuration;

$ Gpstate-f
20180320: 13: 50: 38: 021814 gpstate: mfsmaster: gpadmin-[INFO]:-Starting gpstate with args:-f
20180320: 13: 50: 38: 021814 gpstate: mfsmaster: gpadmin-[INFO]:-local Greenplum Version'
20180320: 13: 50: 38: 021814 gpstate: mfsmaster: gpadmin-[INFO]:-master Greenplum Version: 'postgresql 8.3.23 (Greenplum Database 5.5.0 build commit: commit) on x86_64-pc-linux-gnu, compiled by GCC gcc (GCC) 6.2.0, 64-bit compiled on Feb 17 2018 15:23:55'
20180320: 13: 50: 38: 021814 gpstate: mfsmaster: gpadmin-[INFO]:-Obtaining Segment details from master...
20180320: 13: 50: 38: 021814 gpstate: mfsmaster: gpadmin-[INFO]:-Standby master details
20180320: 13: 50: 38: 021814 gpstate: mfsmaster: gpadmin-[INFO]: ---------------------
20180320: 13: 50: 38: 021814 gpstate: mfsmaster: gpadmin-[INFO]:-Standby address = server227
20180320: 13: 50: 38: 021814 gpstate: mfsmaster: gpadmin-[INFO]:-Standby data directory =/home/gpadmin/master/gpseg-1
20180320: 13: 50: 38: 021814 gpstate: mfsmaster: gpadmin-[INFO]:-Standby port = 5432
20180320: 13: 50: 38: 021814 gpstate: mfsmaster: gpadmin-[INFO]:-Standby PID = 22279
20180320: 13: 50: 38: 021814 gpstate: mfsmaster: gpadmin-[INFO]:-Standby status = Standby host passive
20180320: 13: 50: 38: 021814 gpstate: mfsmaster: gpadmin-[INFO]: --------------------------------------------------------------
20180320: 13: 50: 38: 021814 gpstate: mfsmaster: gpadmin-[INFO]: -- pg_stat_replication
20180320: 13: 50: 38: 021814 gpstate: mfsmaster: gpadmin-[INFO]: --------------------------------------------------------------
20180320: 13: 50: 38: 021814 gpstate: mfsmaster: gpadmin-[INFO]: -- WAL Sender State: streaming
20180320: 13: 50: 38: 021814 gpstate: mfsmaster: gpadmin-[INFO]: -- Sync state: sync
20180320: 13: 50: 38: 021814 gpstate: mfsmaster: gpadmin-[INFO]: -- Sent Location: 0/CF2C470
20180320: 13: 50: 38: 021814 gpstate: mfsmaster: gpadmin-[INFO]: -- Flush Location: 0/CF2C470
20180320: 13: 50: 38: 021814 gpstate: mfsmaster: gpadmin-[INFO]: -- Replay Location: 0/CF2C470
20180320: 13: 50: 38: 021814 gpstate: mfsmaster: gpadmin-[INFO]: --------------------------------------------------------------

Ii. master-slave switchover
1. simulate that the current master database is down. Here, all processes under the killall gpadmin user are directly used to simulate

2. Execute the switch command on the master standby node (on the 227 server) to increase 227 to the master node

$ Gpactivatestandby-d master/gpseg-1/
20180320: 13: 53: 20: 030558 gpactivatestandby: server227: gpadmin-[INFO]: ------------------------------------------------------
20180320: 13: 53: 20: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-Standby data directory =/home/gpadmin/master/gpseg-1
20180320: 13: 53: 20: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-Standby port = 5432
20180320: 13: 53: 20: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-Standby running = yes
20180320: 13: 53: 20: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-Force standby activation = no
20180320: 13: 53: 20: 030558 gpactivatestandby: server227: gpadmin-[INFO]: ------------------------------------------------------
Do you want to continue with standby master activation? Yy | Nn (default = N ):
> Y
20180320: 13: 53: 26: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-found standby postmaster process
20180320: 13: 53: 26: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-Updating transaction files filespace flat files...
20180320: 13: 53: 26: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-Updating temporary files filespace flat files...
20180320: 13: 53: 26: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-Promoting standby...
20180320: 13: 53: 26: 030558 gpactivatestandby: server227: gpadmin-[DEBUG]:-Waiting for connection...
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-Standby master is promoted
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-Reading current configuration...
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[DEBUG]:-Connecting to dbname ='s S'
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-Writing the gp_dbid file-/home/gpadmin/master/gpseg-1/gp_dbid...
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-But found an already existing file.
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-Hence removed that existing file.
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-Creating a new file...
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-Wrote dbid: 1 to the file.
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-Now marking it as read only...
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-Verifying the file...
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]: ------------------------------------------------------
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-The activation of the standby master has completed successfully.
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-server227 is now the new primary master.
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-You will need to update your user access mechanisms to reflect
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-the change of master hostname.
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-Do not re-start the failed master while the fail-over master is
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-operational, this cocould result in database failed uption!
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-MASTER_DATA_DIRECTORY is now/home/gpadmin/master/gpseg-1 if
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-this has changed as a result of the standby master activation, remember
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-to change this in any startup scripts etc, that may be configured
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-to set this value.
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-MASTER_PORT is now 5432, if this has changed, you
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-may need to make additional configuration changes to allow access
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-to the Greenplum instance.
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-Refer to the Administrator Guide for instructions on how to re-activate
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-the master to its previous state once it becomes available.
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-Query planner statistics must be updated on all databases
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-following standby master activation.
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]:-When convenient, run ANALYZE against all user databases.
20180320: 13: 53: 27: 030558 gpactivatestandby: server227: gpadmin-[INFO]: ------------------------------------------------------

3. test whether the upgraded master database is normal.

$ Psql-d postgres-c 'analyze'
Postgres = # select * from gp_segment_configuration;

4. You may need to configure the pg_cmd.conf file synchronously to remotely connect to the instance through the client.

Here we have completed the Failover of the master node.

3. Add a new master standby
1. When the gpstart-a command is executed on the 225 server to start the gpdb database, an error is returned, "error: Standby active, this node no more can act as master ". When standby is upgraded to master, the original master server is recovered from the fault and must be added as a standby role.

2. Back up data on 225 of the original master server

$ Cd master/
$ Ls
Gpseg-1
$ Mv gpseg-1/backup-gpseg-1

3. Add gpinitstandby 227 as standby on the current master server 225.

$ Gpinitstandby-s mfsmaster
$ Gpstate-f

4. Switch between primary segment and mirror segment
1. First, let's look at the current database environment.
Master segment: 192.168.1.227/24 hostname: server227
Stadnby segemnt: 192.168.1.225/24 hostname: mfsmaster
Segment Node 1: 192.168.1.227/24 hostname: server227
Segment Node 2: 192.168.1.17/24 hostname: server17
Segment Node 3: 192.168.1.11/24 hostname: server11
One primary segment and one mirror segment are run on each segment node.

2. Then we use the same method to kill all the gpadmin users on the 227 server.

$ Killall-u gpadmin

3. Execute the master switch command on the 225 Server

$ Gpactivatestandby-d master/gpseg-1/

4. After the switchover is complete, use the client tool to connect to view the segment status. You can see server227 on server 227.
The primary and mirror nodes are down.

5. Here we use greenplum-cc-web tool to view the cluster status.

$ Gpcmcm -- start hbjy

You need to restore the pg_cmd.conf file back. Because all the segments on 227 are down, an error is reported when you run the gpstop-u command.

On the segment status page, you can see that the current segment status is abnormal. There are two groups of primary segment on server11, which is very dangerous. If server11 is down, the status of the entire cluster becomes unavailable.

6. Use server227 as the master standby to re-join the cluster.

$ Cd master/
$ Mv gpseg-1/backupgpseg-1
$ Gpinitstandby-s server227

7. Restart the cluster on the master node.

$ Gpstop-M immediate
$ Gpstart-

8. Restore the cluster on the master node.

$ Gprecoverseg

Although all segments have been started, there are still two groups of primary segment on server11.

9. Restore the segment Node Distribution to the original state on the master node.

$ Gprecoverseg-r

Reference: greenplum

This article permanently updates link: https://www.bkjia.com/Linux/2018-03/151477.htm

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.