In 2005, the data center at Pacific Northwest National Laboratory (PNNL) reached a critical point.
Unplanned power outages occurred almost once a week, each taking the data center down for a few hours. The organization kept buying rack servers, and their number kept growing, because demand for computing resources was exploding and rack servers were cheap at the time, says Ralph Wescott, the data center services manager. As a result, by 2005 the machine room had reached the limit of its capacity.
Wescott said: "The organization bought servers and handed them straight to me to install, but the room had no space left, and there was not enough power or cooling capacity. If I had installed one more server, I'm afraid the machine room would have gone down."
Wescott and PNNL embarked on a plan to renovate the data center without breaking the budget. Each quarter for the next three years, the data center team spent a weekend shutting down the machine room, retiring a batch of old servers, discarding the tangled cables under the floor, and replacing them with more efficient and powerful servers connected by more streamlined overhead cabling. The new configuration freed up space under the floor for more efficient cooling.
The result? PNNL used to run 500 applications on 500 servers; now 150 servers host 800 applications.
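To put those figures in perspective, the consolidation can be expressed as applications per server and as a percentage reduction in server count. The following is a minimal arithmetic sketch in Python; the function name `consolidation_summary` is hypothetical and only the four input numbers come from the article.

```python
def consolidation_summary(apps_before, servers_before, apps_after, servers_after):
    """Return application density before/after and the fractional cut in servers."""
    return {
        "density_before": apps_before / servers_before,
        "density_after": apps_after / servers_after,
        "server_reduction": 1 - servers_after / servers_before,
    }


# Figures quoted in the article: 500 apps on 500 servers, then 800 apps on 150 servers.
summary = consolidation_summary(500, 500, 800, 150)

print(f"Apps per server before: {summary['density_before']:.2f}")
print(f"Apps per server after:  {summary['density_after']:.2f}")
print(f"Fewer physical servers: {summary['server_reduction']:.0%}")
```

On these numbers the average server went from hosting one application to hosting a little over five, while the physical server count fell by 70%.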
Joseph Pucciarelli, an analyst at IDC, said that in a tight economy such IT projects have to be tackled on a tight budget. "This is a very common situation. The company will give you just enough money, only as much as it thinks will more or less solve the problem."
PNNL's crisis offers five lessons:
1. Plan proactively; don't just react. The first problem Wescott had to solve was the data center team's habit of responding to minor problems after they occurred, rather than studying the system and planning a durable infrastructure. In addition to 500 servers, the data center had 33,000 cables for power, networking, or security systems.
"It was up to us to determine the shape and capacity of the data center," he said.
The team projected that, on the trajectory at the time, the data center would end up with 3,000 applications, each running on its own server, within 10 years. Today 81% of the data center's applications are virtualized (with 17% server virtualization), and Wescott is aiming for 90% application virtualization.
Pucciarelli said that companies wanting to increase capacity should focus on three areas: reducing the number of physical servers by running applications on virtual systems, which lowers power requirements; using more efficient cooling systems; and improving power distribution.
"It's the classic three-part approach to upgrading a data center."
Pucciarelli has seen many companies replace about 50 servers with two or three larger-capacity systems and use virtualization to run their applications.
2. Measure energy consumption in order to manage it. Wescott advises managers to find ways to monitor the state of their data centers, although they usually lack the right tools. Before the renovation, PNNL had no way to measure the data center's energy use; the first sign of trouble was the power cutting out.
"If there were too many amps on a power supply, all I could do was feel the breaker by hand. If it was hot, there was a problem. So you really do need a monitoring tool," Wescott said.
PNNL now places sensors at low, middle, and high positions on every fourth cabinet to build a 3D heat map of the room. This lets Wescott adjust the cooling strategy based on data: raising the overall temperature while cooling only the spots that need it.
"This saves a lot of money, and it also reduces wear and tear on my air conditioners," he said. Wescott added that cooling energy efficiency is now an estimated 40% higher than before.
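The heat-map idea described above can be illustrated with a small Python sketch. It is not PNNL's actual tooling: the readings, the cabinet numbering, the 27.0 °C threshold, and the function names `build_heat_map` and `hot_spots` are all assumptions made up for illustration. It groups readings by cabinet and sensor height, then lists the positions that exceed the threshold, which are the spots where targeted cooling would be directed.

```python
from collections import defaultdict

# Sensor heights described in the article: low, middle, and high on a cabinet.
HEIGHTS = ("low", "mid", "high")


def build_heat_map(readings):
    """Group (cabinet, height, temp_c) readings into {cabinet: {height: temp_c}}."""
    heat_map = defaultdict(dict)
    for cabinet, height, temp_c in readings:
        if height not in HEIGHTS:
            raise ValueError(f"unknown sensor height: {height!r}")
        heat_map[cabinet][height] = temp_c
    return dict(heat_map)


def hot_spots(heat_map, threshold_c):
    """Return sorted (cabinet, height) pairs whose temperature exceeds threshold_c."""
    return sorted(
        (cabinet, height)
        for cabinet, levels in heat_map.items()
        for height, temp_c in levels.items()
        if temp_c > threshold_c
    )


# Hypothetical readings in degrees Celsius for every fourth cabinet (1, 5, 9).
readings = [
    (1, "low", 20.5), (1, "mid", 22.0), (1, "high", 24.5),
    (5, "low", 21.0), (5, "mid", 25.0), (5, "high", 29.5),
    (9, "low", 22.5), (9, "mid", 27.5), (9, "high", 31.0),
]

heat_map = build_heat_map(readings)
needs_cooling = hot_spots(heat_map, threshold_c=27.0)  # threshold is an assumption
print(needs_cooling)
```

Keeping the threshold as a parameter reflects the policy in the text: the overall setpoint can be raised, and only positions above it receive extra cooling.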
3. Take small steps. Wescott's first challenge was to fundamentally reconfigure the data center without interrupting normal operations. Management asked for small steps to reduce the risk of outages, but left it to his team to work out how.
"I gave management two proposals," he said. "One was that we shut down the data center for seven days, straighten everything out, and start over."
In the end they chose the second, incremental proposal, and the team decided to start by replacing a single row of servers. Over three days on the first weekend, a team of 30 people spent 14 hours replacing a row of server racks in the data center and testing the new configuration. Wescott found that the data center's reliability and stability improved immediately.
Had management refused and given up on planned shutdowns, the data center would probably have suffered a sudden failure. Wescott said: "You can't repair the hull while the ship is sailing, but the ship will sink if you don't fix it."
So the answer is obvious.
4. Accept short-term costs for long-term benefits. Management cannot afford to give up long-term benefits to avoid a temporary cost.
To reduce the cooling system's energy requirements, Wescott's team evaluated a waterside economizer (which uses water and the outside temperature to cool the server racks). They found that such a system would save more power in the long run, but that the waterside economizer would push cooling costs 10% over budget. Wescott had to negotiate with suppliers to bring the cost back within budget.
5. Find what you don't know about. While refurbishing a data center, administrators need to find out where energy is being spent for little or no return. A common problem is the presence of rogue servers and ghost servers.
A ghost server is a server that was provisioned but is never used. It still consumes energy but does no work for the data center. A rogue server is one that someone has set up privately in an office, outside the control of data center staff.
Such servers waste the energy budget, Wescott said. "Air conditioning that was supposed to be turned off at night stayed on all night because of rogue servers."
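The two definitions above suggest simple checks, sketched here in Python under stated assumptions. The article does not say how PNNL found these machines; the host names, the utilization figures, the 2% CPU threshold, and the functions `find_ghost_servers` and `find_rogue_servers` are all hypothetical. A ghost server is approximated as an inventoried host with near-zero average CPU use, and a rogue server as a host seen on the network that is missing from the inventory.

```python
def find_ghost_servers(avg_cpu_by_host, cpu_threshold=0.02):
    """Return inventoried hosts whose average CPU utilization is below the threshold."""
    return sorted(
        host for host, avg_cpu in avg_cpu_by_host.items() if avg_cpu < cpu_threshold
    )


def find_rogue_servers(hosts_seen_on_network, inventory):
    """Return hosts observed on the network that are absent from the inventory."""
    return sorted(set(hosts_seen_on_network) - set(inventory))


# Hypothetical inventory: host name -> average CPU utilization (0.0 to 1.0).
avg_cpu_by_host = {
    "app-01": 0.45,
    "app-02": 0.01,   # provisioned but idle
    "db-01": 0.60,
    "test-07": 0.00,  # provisioned but idle
}

# Hypothetical list of hosts discovered by a network scan.
hosts_seen_on_network = ["app-01", "app-02", "db-01", "test-07", "office-pc-server"]

ghosts = find_ghost_servers(avg_cpu_by_host)
rogues = find_rogue_servers(hosts_seen_on_network, avg_cpu_by_host)
print("Ghost candidates:", ghosts)
print("Rogue candidates:", rogues)
```

Both lists are only candidates: a low-utilization host may be a standby system, so each result would still need a human check before anything is switched off.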
After the overhaul, the data center's energy efficiency improved greatly. Since Wescott began the renovation, the data center has had only one unplanned power outage, caused by extremely hot weather and a cooling system failure. Wescott knows that his work is not finished.
(Source: TT Blog; editor: Xu Jinyang)