Blue Growth Notes - Chasing DBA (13): Coordinating with hardware vendors, six stories: "servers, storage, and switches" as I saw them


This is an original work from the "Deep Blue blog". You are welcome to repost it, but please credit the source when you do. Otherwise, legal liability for copyright infringement will be pursued.

Deep Blue blog: http://blog.csdn.net/huangyanlong/article/details/43989939

 

 

[Overview]

These are records of my personal growth on the Oracle road. Writing under the name "Blue", I share the feelings, the perspective, and the technical changes that came with that growth. Sensitive information has been replaced with English placeholders, and no confidential company information is disclosed. The content is for technical sharing only.

The inspiration comes from self-reflection and note-taking. I would be glad if it offers some help to, or strikes a chord with, friends who are just starting out with databases.

If you find technical details that are wrong, please leave a comment or email me (hyldba@163.com). Thank you very much.

[Preface]

This is an accumulation of personal records. Having entered the blue ocean of Oracle, I cannot stop partway; I have to keep exploring. Here I share Blue's growth history with database friends.

I don't know when I began to be obsessed with blue: with its breadth, its depth, and its familiarity.

Nor can I say exactly when, gazing at the dazzling red of Oracle that lit up a light before my eyes, the unknown and confusing ground under my feet began to show some of the fullness of life and the rewards of youth.

Step by step, I walk the path of chasing the DBA dream.

 

 

___________________________________________________________________

When you face knowledge you do not understand, seize the opportunity: "ask more questions and learn more".

-- Deep Blue

___________________________________________________________________

 

First story: look more - a damaged server disk

Symptom: the server cannot start normally.

Before going on, I would like to say this: if you want to work with databases or in IT, do not assume that low-probability events will not happen. Low-probability events do occur.

Here is a small incident caused by one damaged disk in a server. After the operating system had been installed, the server would start and then hang on the progress bar screen. At first I had no idea what was going on. Only when I saw an engineer from the system vendor press "ESC" did the fog clear: the boot messages showed the system reporting a dev device (a disk) and being unable to continue. I suddenly remembered that the hard drive indicator had lit up with a fault the day before, and had then gone back to green (I don't know why). I concluded that the hard disk itself was still in an uncertain state (I had ruled out a loose connection and suspected bad sectors or a damaged controller).

So the hardware supplier contacted the server manufacturer (an Inspur factory engineer). After the problem was described, the customer arranged for the server provider to inspect the machine on site the next day. The Inspur engineer replaced the faulty disk and the system started normally again. Because the array was RAID 5, it could tolerate the loss of one disk, so the operating system and the data on the original disks were neither damaged nor lost. The final diagnosis was that the system could not start because this disk failed its self-check. Since the check never passed, the system kept checking the disk again and again, which is what made it appear to hang.

The last thing I want to mention is that this server was brand new, brought online together with the project. From the system integrator installing the operating system to our on-site hardware check, no more than a week had passed. With no business load at all, a disk broke down for no apparent reason. Still, when we run into this kind of thing we should not find it strange. Any electronic product can fail unpredictably, and that is exactly why disaster recovery solutions exist. It also made me realize that a hardware manufacturer's warranty period is not useless: the manufacturer may well anticipate, when the product leaves the factory, that some units will fail. This is beyond the control of the people on site. But thinking about it the other way round, does a warranty expose the product's quality defects? I am reminded of the broken-screen insurance for the Smartisan ("Hammer") phone. Was that protection plan created because the engineers foresaw a high probability of broken screens (I am puzzled here)? I also miss Nokia: even without such a guarantee, I never worried about damage, because the quality was good. Perhaps, though, this is simply the path that science and technology must follow as they develop. Exploration has to go through this, and progress comes at a price.

That brings me to the simple operation I want to record: when a Linux system is booting and showing the progress bar, pressing ESC switches to the detailed boot messages, so you can see what the system is doing during startup. For example, this is the progress bar screen:

Press "Esc" to see the startup process, as follows:

To be honest, I did not know this before.

I felt a little ashamed to be learning such a small trick only now, and I take it as a lesson. I don't know whether you already knew it. I didn't before, but I do now.

Second story: ask more - what is multipath?

When a host (a PC server or a minicomputer) is connected through Fibre Channel switches to Fibre Channel storage, several links exist between the host and the storage. If one disk is mapped from the storage to the host, the host may then see it as several disks, one per link, and the operator cannot tell which disk to work on. To avoid this confusion, multipath software aggregates the multiple paths into a single device. This simplifies disk management and allows load balancing across the links. If one link is disconnected, the other links remain available, so disk access is not affected.

 

Some multipath software:

(1) IBM host and IBM storage: RDAC, MPIO, SDDPCM

(2) Hitachi storage: HDLM

(3) EMC storage: PowerPath
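
To make the aggregation idea concrete, here is a minimal Python sketch. It assumes the host reports each path as a (device name, LUN identifier) pair; the device names and identifiers are made up for illustration, and the function names are hypothetical. Real multipath software such as the products listed above works at the driver level, so treat this only as a picture of the grouping and failover logic.

```python
from collections import defaultdict


def group_paths(paths):
    """Group host-visible disk devices by the LUN identifier they lead to."""
    groups = defaultdict(list)
    for device, lun_id in paths:
        groups[lun_id].append(device)
    return dict(groups)


def pick_path(devices, failed):
    """Return the first device whose link has not failed, or None."""
    for device in devices:
        if device not in failed:
            return device
    return None


# Hypothetical example: two LUNs, each reachable over two links,
# so the host sees four disks instead of two.
paths = [("sdb", "lun-A"), ("sdc", "lun-B"), ("sdd", "lun-A"), ("sde", "lun-B")]
groups = group_paths(paths)
print(groups)                               # two logical disks, two paths each
print(pick_path(groups["lun-A"], {"sdb"}))  # link for sdb is down, so sdd is used
```

The operator then works with one logical disk per LUN, and access continues as long as at least one path in the group is still alive.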

Third story: learn more - what is a LUN in storage?

(1) A first look at the LUN

LUN is a term from the SCSI protocol: it is a finer-grained address number beneath the SCSI ID. Under each SCSI ID (Target ID), multiple LUN IDs can be addressed. A large disk array can produce hundreds or even thousands of virtual disks, and assigning one SCSI ID to each virtual disk is nowhere near enough, because a SCSI bus allows at most 16 devices to attach (the current 32-bit SCSI standard allows at most 32). It is impossible to place more than 16 physical devices on one bus. The LUN is the secondary addressing ID that solves this. The disk array can virtualize multiple LUN addresses under one SCSI ID, with each LUN address corresponding to one virtual disk. In this way, a large number of virtual disks can be presented on a single bus to meet the requirement.

Later, virtual disks generated at the hardware level came to be referred to collectively as "LUNs", whether or not the environment is SCSI, even though the LUN was originally a concept from the SCSI system. Virtual disks generated by software are collectively referred to as "volumes", for example the virtual disks produced by volume management software and software RAID.

-- From Zhang Donggua, "Big Talk Storage 2"
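
The two-level addressing described above can be illustrated with a small Python sketch. The figure of 16 targets per bus comes from the quoted text; the number of LUNs per target used here (8) is only an illustrative assumption, and the function names are hypothetical.

```python
def addressable_disks(targets_per_bus, luns_per_target):
    """Number of virtual disks addressable with (target ID, LUN) pairs."""
    return targets_per_bus * luns_per_target


def enumerate_addresses(targets_per_bus, luns_per_target):
    """List every (target ID, LUN) address on one bus."""
    return [
        (target, lun)
        for target in range(targets_per_bus)
        for lun in range(luns_per_target)
    ]


# With target IDs alone, a 16-device bus addresses at most 16 disks.
# Adding an assumed 8 LUNs under each target ID multiplies that.
print(addressable_disks(16, 1))          # 16
print(addressable_disks(16, 8))          # 128
print(enumerate_addresses(2, 2))         # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The point is only that the second-level ID multiplies the address space without adding physical devices to the bus.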

(2) SAN

The following figure shows a SAN storage architecture (taken from an e-book):

SAN (Storage Area Network): about the storage area network

A "network" is not only Ethernet or a TCP/IP network; there are also SCSI networks, PCI networks, and USB networks. A RAID controller is equivalent to a router, that is, a protocol converter.

When the disks are placed outside the host, another independent network forms between the storage device and the host: the Storage Area Network (SAN).

(From Zhang Donggua, "Big Talk Storage 2".)

 

(3) Storage plan

In the end, the Huawei S5700S series was purchased, owing to a mistake in the early procurement plan (we had not yet been involved at that stage). Although we were somewhat disappointed with its performance, we still did the work we were supposed to do.

Partitioning the storage covered RAID deployment, storage initialization, and dividing the RAID groups into LUNs (we proposed the partitioning plan and the hardware vendor implemented it), followed by completing the mapping (installing multipath software on the server operating systems).

The storage plan for the 12-disk and 6-disk arrays is as follows:

Storage name      RAID solution              Physical capacity  Available capacity  LUN division
Database storage  RAID 5 + 1 hot spare disk  12*3 TB            10*3 TB             (3*20G) + (55*500G)
Backup storage    RAID 5 + 1 hot spare disk  6*3 TB             4*3 TB              2*5119G
Database server   RAID 5                     6*1 TB             5*1 TB              4 T + 1 T
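
The available capacities in the plan above follow from how RAID 5 works: one disk's worth of capacity holds parity, and a hot spare holds no data until it is needed. A minimal Python sketch of that arithmetic, assuming each array is a single RAID 5 group (the function names are hypothetical):

```python
def raid5_usable_disks(total_disks, hot_spares=0):
    """Disks' worth of usable capacity in a single RAID 5 group."""
    members = total_disks - hot_spares
    if members < 3:
        raise ValueError("RAID 5 needs at least 3 member disks")
    return members - 1  # one disk's worth of space is consumed by parity


def raid5_usable_tb(total_disks, disk_tb, hot_spares=0):
    """Usable capacity in TB for disks of equal size."""
    return raid5_usable_disks(total_disks, hot_spares) * disk_tb


print(raid5_usable_disks(12, hot_spares=1))  # 10 -> database storage, 10*3 TB
print(raid5_usable_disks(6, hot_spares=1))   # 4  -> backup storage, 4*3 TB
print(raid5_usable_disks(6))                 # 5  -> database server, 5*1 TB
print(raid5_usable_tb(12, 3, hot_spares=1))  # 30
```

The three results match the "Available capacity" figures in the plan.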

Here is a photo of the physical equipment:

Fourth story: a low-level mistake after a power failure

The cables were a mess; they had never been tidied up, and then we experienced a sudden power failure. It was an unexpected outage that brought the servers down. Recovery is actually very easy: you only need to start the machines in the data center again. But this time I made a low-level mistake. Because the network department had once cut off our network before, I subconsciously assumed this was another "man-made accident" and only glanced at the status of the database servers. Their indicators showed they were running normally. You can see the cables at the rear of the cabinet, for example:

This is where I went wrong. Out of carelessness, I went straight to the network department, found the network engineers, and told them the servers were fine but the network was down. The network engineer did not hide the fact that there had been a power outage the day before, and said he did not know whether it was related. So we checked the port and cable used to access the LAN, but found no problem. Then we checked the VLAN settings of our switch, on the theory that the power failure might have wiped the configuration. By this point we were already speculating in the wrong direction. Here is why it was wrong:

1. A power failure does not wipe a router's configuration (I don't know whether the network engineer was confused or really believed it. I almost believed it myself, ah ~~, and I admit I was being foolish. As if the configuration were kept only in memory ~~ (⊙o⊙).)

2. After a power failure, first check whether the servers have started, instead of going straight to the network department (here I slipped up once more, eh ~~, again). The cause of this low-level mistake is actually very simple. I did check the servers, but only the database servers, and those two servers are set to start automatically when power returns. Because I did not look at the other servers, I made this low-level mistake.

Fifth story: the depressing PDU

Securing the power supply is a simple task, yet it dragged on. Part of that was down to the people coordinating it, but what made it depressing is that everyone treated it as none of their business, so it was never handled properly. That is the depressing part.

The department that should have been responsible, namely the data center management, did not take responsibility, and neither the hardware service provider, the system integrator, nor we as software developers cover this part of the work. The customer had a headache over it too, and the cables were never laid properly (they just stayed like that, a mess ~~). Because the cables were not tidied, they piled up against the back of the servers in the cabinet, creating a safety risk for normal operation.

Fortunately, after a long time, the customer finally solved the problem and our hardware was finally stable.

About the PDU: a PDU (Power Distribution Unit) is also known as a "cabinet power strip". See the following two figures:

Sixth story: broaden your horizons - face to face with the legendary "Oracle all-in-one machine"

I had always thought of the all-in-one machine as something mysterious and high-end. Then, before I knew it, this equipment worth tens of millions of yuan was standing right in front of me.

I remember a brief introduction to Exadata (the Oracle all-in-one machine) in an Oracle open class. Here I make a simple record of it, organized around four problems that the all-in-one machine sets out to solve.

(1) How to solve the performance problems caused by traditional database deployment?

Traditional database deployment:

Storage layer: 1. I/O bottlenecks as the data volume grows; 2. unevenly distributed data;

Network layer: insufficient bandwidth

Server layer: too much data to process

Solution: reduce load, increase bandwidth, and improve parallelism

1. Widen the channel

2. Reduce the amount of data transferred to the database server

3. Increase parallel processing in the system

 

Changhua:

Horizontally scalable storage for parallel processing

Storage layer: intelligent, with preprocessing

Preprocessing work is offloaded from the database server to the storage layer, and only the processed results are returned to the database server.
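
The effect of moving preprocessing to the storage layer can be pictured with a small Python sketch. This is not how Exadata is implemented; it only compares how many rows cross the network when the filter runs on the database server versus on the storage side. The row layout, the predicate, and the function names are all made up for illustration.

```python
def scan_traditional(rows, predicate):
    """Storage ships every row; the database server does the filtering."""
    transferred = list(rows)  # all rows cross the network
    result = [row for row in transferred if predicate(row)]
    return result, len(transferred)


def scan_offloaded(rows, predicate):
    """Storage filters first; only matching rows cross the network."""
    transferred = [row for row in rows if predicate(row)]
    return transferred, len(transferred)


# Hypothetical table of 1000 rows; the query wants only the largest amounts.
rows = [{"id": i, "amount": i * 10} for i in range(1000)]
wanted = lambda row: row["amount"] >= 9900

result_a, sent_a = scan_traditional(rows, wanted)
result_b, sent_b = scan_offloaded(rows, wanted)
print(sent_a, sent_b)        # 1000 rows transferred versus 10
print(result_a == result_b)  # True: same answer, far less data moved
```

Both scans return the same rows; the difference is how much data the database server has to receive and process, which is the "reduce the amount of data transferred" point from the solution list.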

 

(2) How to solve the problem that resources are isolated and cannot be shared?

Traditional databases:

Resources allocated to different hosts are not shared. Resources are unevenly allocated: some are insufficient while others are in surplus. The production environment changes dynamically, and a static allocation cannot adapt to meet the load targets.

Design principles: resource sharing and resource control

Solution:

For example: 1. Turn the storage server resources into a storage pool, where one pool serves one or more databases. A server can then use the capability of all the disks to meet its I/O throughput needs;

2. Control I/O and process it according to business priority.
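
As an illustration of processing I/O by business priority, here is a minimal Python sketch built on a priority queue. It is not Exadata's actual resource manager; the class name, the priority numbering (lower number means more important), and the request labels are assumptions made for this example.

```python
import heapq
import itertools


class PriorityIOQueue:
    """Serve pending I/O requests in order of business priority."""

    def __init__(self):
        self._heap = []
        # The counter breaks ties so equal-priority requests stay first-in, first-out.
        self._counter = itertools.count()

    def submit(self, priority, request):
        """Queue a request; a lower priority number is served earlier."""
        heapq.heappush(self._heap, (priority, next(self._counter), request))

    def next_request(self):
        """Return the most important pending request, or None if idle."""
        if not self._heap:
            return None
        _, _, request = heapq.heappop(self._heap)
        return request


queue = PriorityIOQueue()
queue.submit(2, "report scan")
queue.submit(0, "order commit")
queue.submit(1, "batch load")
queue.submit(0, "payment commit")

served = []
while (request := queue.next_request()) is not None:
    served.append(request)
print(served)  # ['order commit', 'payment commit', 'batch load', 'report scan']
```

With a shared storage pool, some rule like this is what keeps a low-priority report from starving the critical transactions that use the same disks.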

 

(3) How to balance the configuration of a complex database system?

Traditional databases:

Large storage systems, such as EMC storage;

Partitioning;

Expensive equipment;

Technical maintenance and management;

A very high total cost;

Balanced performance with the all-in-one machine:

Use Exadata to balance and optimize the configuration

Host, storage, database, and application are tuned together

Exadata end-to-end optimization: hardware and software come pre-configured

Easy to bring online, without separate design, tuning, maintenance, and setup work

 

(4) How to solve the complexity of system maintenance and expansion?

All-in-one machine:

Simplified deployment: it eliminates complex deployment work;

Deployment is completed within the same day, with top performance from the start.

The above introduces the all-in-one machine only at the level of theory and overview; specific technical details are not covered. If you are really interested in Exadata technology, you may wish to read the official manuals or reference documents, which I believe will be helpful.

I have only just started to understand the all-in-one machine, so this is just a brief introduction. After all, I am still a rookie, O(∩_∩)O ~

 

 

Series links:

Blue Growth Notes - Chasing DBA (1): Traveling on the road to Shandong

Blue Growth Notes - Chasing DBA (2): Install! Install! Long-lost memories awaken my new understanding of the DBA

Blue Growth Notes - Chasing DBA (3): Importing and exporting data on antiques becomes a problem

Blue Growth Notes - Chasing DBA (4): Recalling the sorrows of youth, and exploring Oracle installation again (10g and 11g on Linux)

Blue Growth Notes - Chasing DBA (5): Not talking about technology or business: annoying application systems

Blue Growth Notes - Chasing DBA (6): Doing things and being a person: small technology, great character

Blue Growth Notes - Chasing DBA (7): Basic commands, the cornerstone

Blue Growth Notes - Chasing DBA (8): Revisiting SP reports and recalling Oracle's STATSPACK experiment

Blue Growth Notes - Chasing DBA (9): Chasing DBA, new plans, new departure

Blue Growth Notes - Chasing DBA (10): Flying knife defense, familiarity rather than mastery: playing with the middleware WebSphere

Blue Growth Notes - Chasing DBA (11): It's easy to go home and wake up.

Blue Growth Notes - Chasing DBA (12): Seven days, seven gains with SQL
