OceanBase: several common problems and troubleshooting ideas (database)

Source: Internet
Author: User
Tags: cpu usage, iptables



1. One observer goes down, database exception

OceanBase uses a multi-replica cluster model: servers sit in different zones, and the zones back each other up. Traffic is distributed across them through SLB. So what happens to the business when one of the servers goes down?

Reproducing the fault: with 50 users concurrently running a query transaction, kill an observer (first the observer where the business runs, then one where it does not).

Log on to the OB server where the business is running and execute:

ps -ef | grep observer; kill -9 <PID>

Expected impact: TPS first drops to 0, returns to normal within 1 minute, and some transactions fail.

Monitoring found: TPS first dropped to 0 and returned to normal after 40 seconds; 238 transactions failed.

Log on to an observer where the business is not running and execute:

ps -ef | grep observer; kill -9 <PID>

Expected impact: No impact on the system.

Monitoring found: transaction TPS and response time did not change significantly.

Recovery method: check the observer processes, find the one that was killed, and start the observer as the admin user. Execute the following commands:

su - admin; cd /home/admin/oceanbase && bin/observer
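The recovery step above can be sketched as a small helper that checks whether a named process is alive before deciding to restart it. The restart command in the comment assumes the default /home/admin/oceanbase install directory and is not from the original article:

```shell
#!/bin/sh
# Sketch: report whether a process is running; restart it if not.
# The commented-out start command assumes the default OB install path.
check_process() {
    if pgrep -x "$1" > /dev/null 2>&1; then
        echo "$1 is running"
    else
        echo "$1 is down"
        # su - admin -c 'cd /home/admin/oceanbase && bin/observer'
    fi
}

check_process observer
```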


2. Server-level CPU exhaustion, database exception

Reproducing the fault: saturate the CPU of the server hosting the observer where the business runs.

Log on to that server and run the following script:

#!/bin/bash
# filename: killcpu.sh
# Usage: ./killcpu.sh <N>, where <N> is the number of CPU cores to saturate.
endless_loop()
{
    echo -ne "i=0;
    while true
    do
        i=i+100;
        i=100
    done" | /bin/bash &
}

if [ $# != 1 ]; then
    echo "USAGE: $0 <CPUs>"
    exit 1
fi

for i in $(seq $1)
do
    endless_loop
    pid_array[$i]=$!
done

# Print the kill commands for the busy-loop shells started above,
# so they can be stopped after the drill.
for i in "${pid_array[@]}"; do
    echo 'kill ' $i ';'
done

Expected impact: TPS drops and server CPU utilization is high.

Monitoring found: TPS dropped from 700 to 550; OB server CPU utilization reached 90% and remained steady.

Recovery method: use Linux commands to find the processes currently using the most CPU, then decide whether they are important or can be killed:

ps aux | sort -k3nr | head -n 10    # column 3 of ps aux is %CPU

3. Server disk full, database anomaly drill

Reproducing the fault: fill the disk of an observer server.

Log on to the observer server and execute:

dd if=/dev/zero of=/home/admin/oceanbase/log/1.log bs=100k count=1600000

Expected impact: TPS drops to 0 and transactions keep failing.

Monitoring found: TPS dropped from 700 to 0; after the disk filled up, transactions kept failing (3014 failures in total) and performance fluctuated.

Recovery method: the impact here comes mainly from two places. If the machine runs only OB and OB-related components, a full disk means either data files or log files. If it is data files, there is nothing to do but add resources; if it is log files, locate the corresponding directory and delete the redundant log files.
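To locate what is actually filling the disk, a sketch like the following lists the largest entries under a directory. The default path is an assumption taken from the dd command above:

```shell
#!/bin/sh
# Sketch: show the 10 largest files/dirs under a directory (sizes in KB).
# The default path assumes the OB log directory used in the dd example.
biggest() {
    du -ak "${1:-/home/admin/oceanbase/log}" 2>/dev/null | sort -rn | head -10
}

biggest /var/log
```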

4. A large amount of bad SQL, database exception

Reproducing the fault: while normal business runs, concurrently launch a bad-SQL workload.

Expected impact: normal business TPS drops and some transactions may fail.

Monitoring found: after launching a batch of 1000 concurrent database operations, transaction TPS immediately dropped from 700 to 150, and the batch queries timed out and failed.

Recovery method: bad SQL is a common database problem. Use show processlist; to see the SQL currently executing, find the long-running statements, and optimize them. Then use OceanBase's own views gv$sql and gv$sql_audit to see what SQL the database has executed and to locate slow SQL for optimization.
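As a sketch, a query along these lines pulls the slowest recent statements out of gv$sql_audit. The column names are taken from the OceanBase system views and may differ between versions, so check your release's documentation:

```sql
-- Sketch: top 10 slowest requests recorded in gv$sql_audit.
-- elapsed_time is in microseconds; column names may vary by OB version.
SELECT sql_id,
       elapsed_time,
       query_sql
FROM   gv$sql_audit
ORDER  BY elapsed_time DESC
LIMIT  10;
```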

5. Business volume suddenly increases and the observer is scaled up, database anomaly

Reproducing the fault: business suddenly adds 50 users. Afterwards, adjust the tenant's CPU: ALTER RESOURCE POOL xxx_pool UNIT = 'c12_unit';
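The scale-up in this step can be sketched as follows. Pool and unit-config names are placeholders, and the exact CREATE RESOURCE UNIT option list is an assumption that depends on the OceanBase version:

```sql
-- Sketch: create a larger unit config, then point the tenant's pool at it.
-- Names and option values are illustrative, not from the original article.
CREATE RESOURCE UNIT c12_unit
    MAX_CPU 12, MIN_CPU 12,
    MAX_MEMORY '24G', MIN_MEMORY '24G';

ALTER RESOURCE POOL xxx_pool UNIT = 'c12_unit';
```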

Expected impact: the extra concurrency raises TPS and response time; scaling resources causes brief jitter, after which TPS rises and RT falls.

Monitoring found: with the added users, TPS rose from 700 to 800 and transaction response time from 0.070 s to 0.095 s. Scaling caused brief jitter, after which TPS recovered to 800 and RT to 0.095 s.

Recovery method: none needed.

6. Simulating a database server network failure

Reproducing the fault: block the observer's ports 2882 and 2881.

Execute the commands:

iptables -A INPUT -i bond0 -p tcp --dport 2882 -j DROP
iptables -A INPUT -i bond0 -p tcp --dport 2881 -j DROP
iptables -A OUTPUT -p tcp --dport 2882 -j DROP
iptables -A OUTPUT -p tcp --dport 2881 -j DROP

Expected impact: TPS drops first, then returns to normal; transactions report errors.

Monitoring found: TPS first fell by half and returned to normal after 1 minute; transaction errors continued for 4 minutes.

Recovery method: delete the four DROP rules added above (run iptables -L --line-numbers first to confirm the rule positions):

iptables -D INPUT 1
iptables -D INPUT 1
iptables -D OUTPUT 1
iptables -D OUTPUT 1


