First, the background
Today the company messenger filled up with all kinds of CPU and disk I/O overload alarms. Unluckily I had just moved and had not set up broadband yet, so I could only pick up my most trusted companion and head to the office. I got there at 1:30. With the Shenyang wind blowing I was completely out of sorts; fortunately colleagues in Dalian were already working on it.
Second, the process
1. Check the database: telnet to the host, check performance with glances, review the alarms and the slow query log, then log in to the database and look at the threads. None of these steps can be skipped.
2. Halfway through, a call came in from Beijing: Baixin Bank, 669, based in Beijing, asking me to go over. Completely baffled, I told him I was in Shenyang. After hanging up I felt rather down.
3. There were hundreds of user connections doing binlog synchronization, which is very abnormal, so I asked colleagues to stop the related services. That still left more than 30 concurrent inserts writing nonstop. The real problem was that the writes could not get through: I/O wait was extremely high and even typing a single command lagged badly. Then the client suddenly dropped out on its own. The log showed that the database had restarted automatically and was in automatic recovery ~ After waiting a long time it still had not come up. I lost patience, a plain kill did not work either, so I went straight to kill -9 (do not imitate this in production; you need to be patient).
4. For a number of reasons the business services could not be shut down, so I had to change the database port in the configuration file /etc/my.cnf. With nobody disturbing it, the database came up and could recover at its own pace. Nobody else could log in either, haha: without knowing the port they simply could not connect.
5. There are two large tables of dozens of GB each, with no partitions and with slow queries against them. The problem tables are log tables, and their data can be purged; normally we would create such tables as partitioned tables. I do not know why these were not partitioned, and nobody gave a reason. I started writing a script. The idea: rename the tables, create two new partitioned tables, then change the database port back so that the application can connect again. Finally, pour the old data back into the two newly created partitioned tables.
6. Why was there such a huge number of concurrent writes today? I still have to keep observing. I have looked after this system a little poorly lately and owe some self-criticism. It had been a long time since anything went wrong.
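The rename-and-backfill idea from step 5 can be sketched as a small Python helper that only builds the SQL statements in order; it does not connect to any database. The table name app_log, the column created_at and the single pmax partition are hypothetical placeholders. Using CREATE TABLE ... LIKE followed by ALTER TABLE ... PARTITION BY is one assumed way to get an empty partitioned copy, not necessarily what the original script did. The bak_ prefix follows the table naming mentioned later in this post.

```python
def swap_plan(table, partition_clause):
    """Return the SQL statements, in order, for swapping in a partitioned table.

    Only builds strings; nothing is executed against a database.
    """
    backup = f"bak_{table}"  # hypothetical naming, mirrors the bak_ tables in the post
    return [
        # 1. Move the big unpartitioned table out of the way.
        f"RENAME TABLE `{table}` TO `{backup}`",
        # 2. Create an empty table with the same columns and indexes.
        f"CREATE TABLE `{table}` LIKE `{backup}`",
        # 3. Partition the copy while it is still empty.
        f"ALTER TABLE `{table}` {partition_clause}",
        # (Here the port is switched back and the application reconnects.)
        # 4. Pour the old rows back into the partitioned table.
        f"INSERT INTO `{table}` SELECT * FROM `{backup}`",
    ]


# Hypothetical table and partition clause, for illustration only.
clause = (
    "PARTITION BY RANGE (UNIX_TIMESTAMP(created_at)) "
    "(PARTITION pmax VALUES LESS THAN (MAXVALUE))"
)
for stmt in swap_plan("app_log", clause):
    print(stmt + ";")
```

One caveat for this sketch: MySQL requires every unique key, including the primary key, to contain the partitioning column, so a table keyed only by a UUID would need its key extended before statement 3 can succeed. For tables of dozens of GB, the final INSERT ... SELECT would probably be split into smaller ranges rather than run in one go.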
Third, the problems encountered
1. The partition column is of type timestamp. After struggling with it for half a day, I found on the official website that the UNIX_TIMESTAMP function has to be used, otherwise the statement fails with an error. The partitioned tables I created before were mostly on datetime columns, so this is worth writing down.
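2. Why could the data be poured back without losing any? Because our primary key is a UUID: rows written by the application after the switch get fresh UUIDs, so they cannot collide with the old rows. With any other kind of primary key we would have had to check for conflicts first.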
3. As follow-up, I dropped the tables whose names begin with bak_, cleaned up the partitions, wrote the problem down, and got ready to go home.
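The UNIX_TIMESTAMP point from item 1 can be illustrated with a short Python sketch that generates a monthly RANGE partition clause for a timestamp column. The column name created_at, the monthly granularity, the pYYYYMM partition names and the catch-all pmax partition are all assumptions made for the example; the post does not say how the real tables were partitioned.

```python
def next_month(year, month):
    """Return (year, month) of the month following the given one."""
    return (year + 1, 1) if month == 12 else (year, month + 1)


def partition_clause(column, year, month, count):
    """Build a PARTITION BY RANGE clause with `count` monthly partitions.

    A TIMESTAMP column must be wrapped in UNIX_TIMESTAMP(); each partition
    holds rows strictly before the first day of the following month.
    """
    lines = []
    for _ in range(count):
        name = f"p{year:04d}{month:02d}"
        year, month = next_month(year, month)
        bound = f"UNIX_TIMESTAMP('{year:04d}-{month:02d}-01 00:00:00')"
        lines.append(f"  PARTITION {name} VALUES LESS THAN ({bound})")
    # Catch-all partition so later rows are never rejected.
    lines.append("  PARTITION pmax VALUES LESS THAN (MAXVALUE)")
    body = ",\n".join(lines)
    return f"PARTITION BY RANGE (UNIX_TIMESTAMP({column})) (\n{body}\n)"


# Hypothetical column and start month, for illustration only.
print(partition_clause("created_at", 2017, 11, 3))
```

The generated clause can be appended to a CREATE TABLE statement. The date strings inside UNIX_TIMESTAMP() are interpreted in the session time zone, so the partition boundaries follow whatever time zone the session that creates the table is using.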
No matter what you do, you must have a professional spirit. Even when it is not your own company's database, when it is time to step in, you step in, haha ~ Everyone is welcome to learn together and exchange ideas about Oracle, MySQL and HBase databases.
This article is from the "ROIDBA" blog; please keep this source attribution: http://roidba.blog.51cto.com/12318731/1922726
Not even this Shenyang gale could stop me from going to the office to deal with production database problems ~