[O&M success _ 04] A summary of ten lessons from O&M work

Source: Internet
Author: User
Faults are a constant pain point for DBAs and O&M personnel, but the principles for avoiding them are much the same everywhere.
They are listed below.

(I) Every change must be reversible and tested in an identical environment first

As the saying goes, every wound makes you more mature, which is a true portrait of O&M personnel.
In a sense, O&M is a discipline of experience, learned by trial and error.

Anything you have never done before is always a risk.

So preserve the current state before you start, and give every change a way back.
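
As a minimal sketch of "giving a change a way back", the functions below snapshot a file before it is edited and restore it if the change goes wrong. The names snapshot_file and rollback_file and the .bak.<timestamp> suffix are assumptions made for illustration; they are not from the original article.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: keep a timestamped copy of a file before changing it.

# snapshot_file FILE
# Copies FILE to FILE.bak.<timestamp> and prints the snapshot path.
snapshot_file() {
    local file="$1" snap
    snap="$file.bak.$(date +%Y%m%d%H%M%S)"
    cp -p -- "$file" "$snap" && echo "$snap"
}

# rollback_file FILE SNAPSHOT
# Restores FILE from a snapshot taken earlier.
rollback_file() {
    local file="$1" snap="$2"
    cp -p -- "$snap" "$file"
}
```

Typical use: snap="$(snapshot_file /path/to/app.conf)", make the change, and if the result is wrong run rollback_file /path/to/app.conf "$snap". The path here is only a placeholder.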


(II) Be careful with destructive operations

What counts as a destructive operation?
For example:
For Oracle: truncate table table_name, delete from table_name, drop table table_name
These statements are quick and easy to execute, but remember: even when the data can be recovered, the cost is very high!
For Linux: rm -r deletes the specified directory and everything in its subdirectories.
Most people who have lived through such a fault give rm an alias:
alias rm='rm -i'
Likewise, cp and mv accept the same option:
alias cp='cp -i'
alias mv='mv -i'



(III) Set an informative command prompt

Before performing any operation, check where you are: primary or standby database? Current directory? Which schema? Which session? What time?
For example:

For ORACLE:

idle> set sqlprompt 'RAC-node1-primary@10g>>'
RAC-node1-primary@10g>>

Of course, you can also set it in glogin.sql.

For Linux, the bash prompt can be set through PS1 to show the current directory, the login user name, the host name, and so on.
For details on PS1, see the PROMPTING section of man bash.
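
A minimal sketch of such a prompt for bash follows. The DB_ROLE variable and its value are assumptions made for illustration (bash does not know a host's database role; you would set it per machine, for example in ~/.bashrc). The escapes \u, \h, \w, \t and \$ are standard bash prompt escapes for user, host, working directory, time, and the root/non-root marker.

```shell
# Hypothetical role label for this host; set it by hand on each machine.
DB_ROLE="primary"

# Prompt shows: [user@host role working-directory HH:MM:SS]$
# \$ prints '#' for root and '$' for everyone else.
PS1='[\u@\h '"$DB_ROLE"' \w \t]\$ '
```

With this in place, every command line shows who you are, which machine and role you are on, where you are, and when the command was typed.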


(IV) Back up, and verify that the backup is valid

Nobody is a sage, so everyone makes mistakes; and sooner or later a machine will go down, planned or unplanned.
What can you do? Back up!
Backup is a deep subject and can be classified along several dimensions: cold and hot; real-time and non-real-time; physical and logical.

Online OLTP services and their databases require a real-time hot standby.
Is that enough?
Suppose a developer runs a delete with no conditions and removes all the data by mistake: the real-time standby applies the same delete.
Therefore, in addition to the real-time standby, you need non-real-time backups from which the database can be restored after a logical error.

Once the backup exists, are you done?
No! The validity of the backup must be verified.
There are always some backups that turn out to be unusable, so a backup alone does not guarantee 100% recovery.
A simple verification is to find an empty database and restore the backup into it.
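
The same restore-and-check idea can be sketched at file level. This is only an illustration under stated assumptions: the backup is a gzip-compressed tar archive created with tar -czf ARCHIVE -C SOURCE_DIR . and the source directory has not changed since the backup was taken. The function name verify_backup is hypothetical. For a database, the equivalent step is the restore into an empty instance described above.

```shell
#!/usr/bin/env bash
# verify_backup ARCHIVE SOURCE_DIR
# Restores ARCHIVE into a scratch directory and compares it with SOURCE_DIR.
# Returns 0 if the restored copy matches, non-zero otherwise.
verify_backup() {
    local archive="$1" source_dir="$2" scratch status=0

    scratch="$(mktemp -d)" || return 2

    # Step 1: the backup must actually restore.
    if ! tar -xzf "$archive" -C "$scratch"; then
        rm -rf -- "$scratch"
        return 1
    fi

    # Step 2: the restored data must match what was backed up.
    diff -r "$source_dir" "$scratch" >/dev/null || status=$?

    rm -rf -- "$scratch"
    return "$status"
}
```

A backup job could call verify_backup right after creating the archive and raise an alert on any non-zero return.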


(V) Always treat the production environment with respect

Accountants receive professional ethics training before they start their careers.
Similarly, respect for production should be the first quality O&M personnel acquire on entering the industry.
For example:
In Oracle, you can run RDA to check the health of the database.
In Linux, check whether password aging is configured and whether the host is isolated from the Internet.


(VI) Handovers and vacations are when faults are most likely, so make changes with caution

When you take over someone else's work, you must confirm the change plan. Not knowing something does not mean you are incompetent, so ask.
Before you take a vacation, hand over everything you can, and preferably prepare a document stating what to do in which circumstances and whom to contact.
When you take over work while someone else is on holiday, take your time: confirm every operation detail with the original O&M person.


(VII) Set up alerts to get error information promptly; set up performance monitoring to understand trends

The tools O&M personnel rely on to survive are alerting and monitoring.

Alerts tell you promptly what exceptions have occurred in the system, so that you can follow up in time and nip faults in the bud.

Monitoring shows you the historical performance of the system, so that you can learn from the past, anticipate what is coming, and optimize early.

Alerting and monitoring are close siblings that belong together.
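
As a minimal sketch of an alert check, assuming disk usage is the metric being watched: the function names disk_usage_percent and check_disk and the 90% threshold in the usage note are illustrative choices, not part of the original article. A real setup would send the alert by mail or to a monitoring system instead of printing it.

```shell
#!/usr/bin/env bash
# disk_usage_percent PATH
# Prints the used percentage (without '%') of the filesystem holding PATH.
# df -P guarantees one line per filesystem, so the value is on line 2, column 5.
disk_usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# check_disk PATH THRESHOLD
# Returns 0 if usage is below THRESHOLD percent, 1 if the alert fires,
# 2 if the usage could not be determined.
check_disk() {
    local path="$1" threshold="$2" used
    used="$(disk_usage_percent "$path")"
    [ -n "$used" ] || return 2

    if [ "$used" -ge "$threshold" ]; then
        echo "ALERT: $path is ${used}% full (threshold ${threshold}%)"
        return 1
    fi
    echo "OK: $path is ${used}% full"
}
```

Run from cron, for example as check_disk /data 90, this covers the alerting half; appending the value printed by disk_usage_percent to a log with a timestamp gives the history that monitoring needs.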


(VIII) Be cautious with automatic failover

For example, take Oracle's HA solution, Data Guard.
The primary database commits an order, and then an automatic failover happens before the order has been synchronized to the standby database.
The seller loses a sales order, then the customer, and the company loses as well.


(IX) Be careful, be paranoid: check, then check again

Consider someone who works like this:

① Before making a change, he sends an email one or two weeks in advance and also notifies the people concerned by phone
② He writes the script on a test machine and calls everyone together to review the operation steps and the script
③ Once testing is complete, he copies the script to the production environment
④ He logs on to the target machine and opens and closes the script, over and over, to check it
⑤ He confirms with the people concerned that the operation, sequence, timing, possible impact, and rollback are all ready
⑥ Before execution he logs out of the machine, logs back on, and opens and closes the script yet again
⑦ Finally he runs the script in the background, logs on in another window, and keeps watching the process with ps and the result output

Throughout, his posture is upright, his breathing quick but even, and his gaze solemn. The operator is not tired, but everyone watching him is.
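
The last step, running the script in the background and watching it from a second window, might look like the sketch below. The function name start_change, the variable CHANGE_PID, and the file names in the usage note are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# start_change SCRIPT LOGFILE
# Runs SCRIPT in the background, immune to hangups, with all output
# (stdout and stderr) captured in LOGFILE. The process ID is stored in
# the global variable CHANGE_PID so it can be inspected later.
start_change() {
    local script="$1" log="$2"
    nohup bash "$script" >"$log" 2>&1 &
    CHANGE_PID=$!
}

# is_running PID
# Returns 0 while the process still exists (signal 0 only tests for existence).
is_running() {
    kill -0 "$1" 2>/dev/null
}
```

After start_change ./change.sh change.log, a second window can follow progress with ps -p followed by the process ID, and tail -f change.log.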


(X) Simple is beautiful

This echoes the philosophy of GNU/Linux.
We are always faced with temptations:
new system architectures, new and smarter commands and tools, the latest hardware platform, more comprehensive HA software...
By all means install them and try them out. But before using them in a production environment, think twice!

If built-in system commands can do the job, there is no need to consider other software that must be specially downloaded and installed.
If a script can do the job by itself, there is no need to look for feature-rich software to do it.
The command-line interface that Linux provides is simpler and more convenient than complicated graphical interfaces.
......


Finally, I wish you smooth operation and maintenance work, with more good days than bad. %>_<%
