Network 24-hour on-duty FAQ

Source: Internet
Author: User

  1. Zabbix Alarm

    1. When Zabbix raises an alarm for a machine room link (packet loss, increased delay, a host going down, and similar problems), log in to the corresponding equipment and run MTR in both directions.

    2. Post the resulting two-way MTR output directly to the fault group of the machine rooms on both sides and @-mention the relevant technical staff; this improves response speed.

    3. The MTR output shows the node IPs where packet loss occurs, delay increases, or the path is unreachable. Look up where each of these IPs belongs. If the problem IPs in both directions of the MTR are in the same city, focus the follow-up on the machine room in that city.

    4. If the fault is attributed to that city and nobody in the group responds, promptly phone the machine room's 24-hour on-duty staff and tell them which IP node is failing.
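
As an illustration of items 1 to 3, the following Python sketch picks the first problem hop out of each direction's MTR output and checks whether both fall in the same city. It assumes the text layout produced by `mtr --report`. The sample output, the IP addresses (documentation ranges), the thresholds, and the `CITY_BY_IP` lookup table are all made up for the example and are not part of the original procedure.

```python
import re
from typing import Dict, List, NamedTuple, Optional


class Hop(NamedTuple):
    index: int
    host: str
    loss_pct: float
    avg_ms: float


# Matches hop lines such as "  2.|-- 198.51.100.7   30.0%   10   41.2  40.8 ..."
# Columns after the host are assumed to be Loss%, Snt, Last, Avg.
HOP_RE = re.compile(
    r"^\s*(\d+)\.\|--\s+(\S+)\s+([\d.]+)%?\s+\d+\s+[\d.]+\s+([\d.]+)"
)


def parse_mtr_report(text: str) -> List[Hop]:
    """Extract hop number, host, loss percentage and average latency."""
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if m:
            hops.append(
                Hop(int(m.group(1)), m.group(2), float(m.group(3)), float(m.group(4)))
            )
    return hops


def first_problem_hop(
    hops: List[Hop], loss_pct: float = 5.0, jump_ms: float = 50.0
) -> Optional[Hop]:
    """Return the first hop with loss, or a latency jump over the previous hop.

    Both thresholds are illustrative assumptions, not values from the text.
    """
    prev_avg = None
    for hop in hops:
        jumped = prev_avg is not None and hop.avg_ms - prev_avg >= jump_ms
        if hop.loss_pct >= loss_pct or jumped:
            return hop
        prev_avg = hop.avg_ms
    return None


def same_city(a: Optional[Hop], b: Optional[Hop], city_by_ip: Dict[str, str]) -> Optional[str]:
    """Return the city if both problem hops resolve to the same one."""
    if a is None or b is None:
        return None
    city_a, city_b = city_by_ip.get(a.host), city_by_ip.get(b.host)
    return city_a if city_a is not None and city_a == city_b else None


# Made-up sample reports for the two directions.
FORWARD = """\
HOST: room-a-host                 Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 192.0.2.1                  0.0%    10    0.3   0.3   0.2   0.5   0.1
  2.|-- 198.51.100.7              30.0%    10   41.2  40.8  39.9  42.0   0.6
  3.|-- 203.0.113.20              30.0%    10   42.0  41.5  40.1  43.3   0.9
"""
REVERSE = """\
HOST: room-b-host                 Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 203.0.113.1                0.0%    10    0.4   0.4   0.3   0.6   0.1
  2.|-- 198.51.100.9              25.0%    10   40.1  40.3  39.5  41.8   0.7
  3.|-- 192.0.2.10                25.0%    10   41.0  41.2  40.0  42.9   0.8
"""
# Hypothetical IP-to-city table; in practice this comes from an IP lookup.
CITY_BY_IP = {"198.51.100.7": "city-x", "198.51.100.9": "city-x"}

fwd = first_problem_hop(parse_mtr_report(FORWARD))
rev = first_problem_hop(parse_mtr_report(REVERSE))
print(fwd, rev, same_city(fwd, rev, CITY_BY_IP))
```

When `same_city` returns a city, that is the machine room to follow up with, as described in item 3.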

  2. Fault point not on backbone link

    1. When MTR from the server to the terminal shows packet loss starting at the first hop (the first hop is the switch), ping the switch IP from the server to check whether packets are really being dropped. The switch IP is the server's default gateway address.

    2. If pinging the switch drops packets, a faulty fiber module may be the cause; call a network group member promptly.

    3. If the second hop of the MTR shows serious packet loss, the preliminary judgment is a problem with machine room equipment (including agents); contact the machine room staff directly.
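
The decision steps in this section can be sketched in Python as follows. The sketch assumes a ping summary line containing the usual "N% packet loss" figure. The function names and the wording of the returned actions are hypothetical, and the branch for "first hop lossy but the switch pings cleanly" is an assumption, since the text does not say what to do in that case.

```python
import re


def ping_loss_pct(summary: str) -> float:
    """Pull the packet-loss percentage out of a ping summary line."""
    m = re.search(r"([\d.]+)% packet loss", summary)
    if m is None:
        raise ValueError("no packet-loss figure found in ping output")
    return float(m.group(1))


def triage(first_lossy_hop: int, gateway_ping_loss: float) -> str:
    """Map the first lossy MTR hop to the action described in this section."""
    if first_lossy_hop == 1:
        if gateway_ping_loss > 0:
            # Item 2: loss confirmed on the switch, fiber module suspected.
            return "call network group (possible fiber module fault)"
        # Assumption: not covered by the text; keep testing before escalating.
        return "switch ping is clean; repeat MTR and ping before escalating"
    if first_lossy_hop == 2:
        # Item 3: machine room equipment suspected.
        return "contact machine room staff (machine room equipment suspected)"
    return "outside this section; follow the two-way MTR procedure"


# Made-up summary line in the common ping output style.
sample = "10 packets transmitted, 7 received, 30% packet loss, time 9012ms"
print(triage(1, ping_loss_pct(sample)))
```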

  3. Ensure business is not affected

    1. If the machine room, when contacted, says the fault cannot be repaired in time, switch the business traffic away from it.

    2. Contact the network group if you encounter a situation that cannot be handled in a timely manner.

    3. When a fault is more than one person can handle, contact the network group to deal with the network failure.

  4. Failure recovery

    1. If the fault is continuous or intermittent, or is caused by physical factors, do not switch traffic back.

    2. Once recovery has been confirmed and MTR, ping, and wget all return normal values, traffic can be switched back and normal use resumed. If necessary, adjust how much traffic is switched back by adjusting the polling ratio.
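
A minimal Python sketch of the switch-back rule in this section. It treats the polling adjustment in item 2 as round-robin weights per machine room, which is an interpretation rather than something the text states. The function names, room names, and weights are hypothetical.

```python
from typing import Dict


def safe_to_switch_back(
    mtr_ok: bool, ping_ok: bool, wget_ok: bool, item1_applies: bool
) -> bool:
    """Allow switch-back only when all three checks pass.

    item1_applies is True when the fault falls under item 1 of this
    section, in which case traffic must not be switched back.
    """
    return mtr_ok and ping_ok and wget_ok and not item1_applies


def traffic_share(weights: Dict[str, int]) -> Dict[str, float]:
    """Fraction of traffic each machine room receives for given weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {room: weight / total for room, weight in weights.items()}


# Bring the recovered room back with a small share first, then raise it.
if safe_to_switch_back(mtr_ok=True, ping_ok=True, wget_ok=True, item1_applies=False):
    cautious = traffic_share({"room-a": 1, "room-b": 9})
    restored = traffic_share({"room-a": 5, "room-b": 5})
    print(cautious["room-a"], restored["room-a"])
```

Starting with a low weight limits the impact if the fault returns after traffic is switched back.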

  5. Recording

    1. Record the fault start time from the Zabbix alarm; the time at which testing shows the fault has cleared is the fault recovery time.

    2. If several machine rooms report faults toward the same machine room, the fault is most likely in the latter, so recording only that machine room's fault is enough.

    3. Record the name of the person on duty and send the record by email.
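
One possible shape for such a record, sketched in Python. The field names, the `common_target` helper, and the sample values are hypothetical; the text does not prescribe a record format.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class FaultRecord:
    machine_room: str
    alarm_time: datetime     # taken from the Zabbix alarm
    recovery_time: datetime  # time at which tests showed the fault cleared
    on_duty: str

    @property
    def duration(self) -> timedelta:
        return self.recovery_time - self.alarm_time

    def summary(self) -> str:
        minutes = int(self.duration.total_seconds() // 60)
        return (
            f"{self.machine_room}: {self.alarm_time:%Y-%m-%d %H:%M} to "
            f"{self.recovery_time:%H:%M} ({minutes} min), on duty: {self.on_duty}"
        )


def common_target(alarms: List[Tuple[str, str]]) -> Optional[str]:
    """Return the machine room that two or more other rooms report faults toward.

    Each alarm is a (source_room, target_room) pair. Per item 2, only the
    shared target needs to be recorded.
    """
    sources = defaultdict(set)
    for source, target in alarms:
        sources[target].add(source)
    for target, reporters in sources.items():
        if len(reporters) >= 2:
            return target
    return None


# Made-up example values.
room = common_target([("room-a", "room-c"), ("room-b", "room-c")])
record = FaultRecord(
    machine_room=room,
    alarm_time=datetime(2024, 1, 1, 3, 0),
    recovery_time=datetime(2024, 1, 1, 3, 25),
    on_duty="on-duty engineer",
)
print(record.summary())
```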

