"Editor's note" the author of Raymie Stata is the founder and CEO of Hadoop, the Altiscale, and the former CTO of Yahoo, helping Yahoo to complete its open source strategy and participate in the launch of the Apache Hadoop project. The expansion and operation of Hadoop is a very complex process that hides potential crises in its specific implementation, Raymie 7 crisis signals and corresponding solutions based on experience to help users avoid disasters in advance.
The following is the translation:
Scaling Hadoop is a very complex process; here are seven common problems and their solutions.
Every Hadoop implementation carries a potential crisis, including some very tricky operational problems. Problems of this kind can sink a Hadoop project before it ever reaches production; if they surface in production, the result is a "success disaster" (in practice, more likely a pure disaster).
Scaling and implementing Hadoop is complex, but if you truly understand the root of each problem, you can avoid the disaster. The following crisis signals are summarized from experience.
Crisis Signal 1: Failure to reach production
Moving from proof of concept to production is a critical step in any big-data workflow. Scaling Hadoop is challenging: larger workloads often fail to finish on time, and test environments rarely cover real-world conditions. A common example is test data: proofs of concept often use unrealistically small or single datasets.
Run scale and stress tests before going to production. Applications that pass such tests are demonstrably scalable and fault tolerant, and the test results can feed into your own capacity planning model.
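A capacity planning model can start very simply. The sketch below is illustrative only (the function name, overhead factor, and numbers are assumptions, not from the article): it projects the raw HDFS storage a workload needs from ingest rate, retention, and replication.

```python
# Minimal capacity-model sketch (illustrative assumptions, not a Hadoop API):
# project raw HDFS storage needed from daily ingest, replication, and retention.

def hdfs_capacity_tb(daily_ingest_tb, retention_days,
                     replication=3, overhead=1.25):
    """Raw disk needed: ingest * retention * HDFS replication factor,
    plus headroom for intermediate and temp data (overhead factor)."""
    return daily_ingest_tb * retention_days * replication * overhead

# e.g. 2 TB/day kept for 13 months (~395 days) at 3x replication:
print(hdfs_capacity_tb(2, 395))  # -> 2962.5 raw TB
```

Feeding measured numbers from stress tests into a model like this, rather than guessing, is what makes the later capacity discussions concrete.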
Crisis Signal 2: Scaling begins
When the first application reaches production, SLAs are still easy to meet. But as the Hadoop cluster's load grows, job runtimes become unpredictable. This first scaling problem is easily overlooked, and over time the situation worsens until it becomes a crisis.
Don't wait for the crisis to act. Before capacity becomes a bottleneck, expand the cluster or optimize the jobs. Keep adjusting the capacity model against what you expected, paying particular attention to worst-case performance so the model stays realistic.
Crisis Signal 3: You start telling customers you can't keep all their data
Another sign of crisis is being forced to reduce data retention. At first you wanted to keep 13 months of data for year-over-year analysis, but space constraints push you to shrink the retention window, which in effect gives up part of Hadoop's big-data analysis capability.
Shrinking retention does not solve the underlying problem. Avoiding it requires early action: re-examine the capacity model, find where it went wrong, and adjust it so you can better track the source of the problem.
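To see why retention shrinks, a back-of-the-envelope calculation (assumed numbers, not from the article) relates cluster size, ingest rate, and replication to the retention the cluster can actually support:

```python
# Illustrative sketch (assumed numbers): how much retention a fixed
# cluster actually supports, given daily ingest and HDFS replication.

def max_retention_days(cluster_raw_tb, daily_ingest_tb,
                       replication=3, usable_fraction=0.8):
    """Days of data the cluster can hold before space runs out.
    usable_fraction reserves headroom for temp data and failures."""
    usable = cluster_raw_tb * usable_fraction
    return usable / (daily_ingest_tb * replication)

# A 1,000 TB (raw) cluster ingesting 1 TB/day at 3x replication:
print(int(max_retention_days(1000, 1)))  # -> 266 days, well short of 13 months
```

If the model predicted 13 months but the arithmetic above says 266 days, the model's inputs (ingest growth, replication, headroom) are where to look for the failure.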
Crisis Signal 4: Data scientists are squeezed out
An overloaded Hadoop cluster stifles innovation: data scientists no longer have enough compute to run large jobs, nor enough space to store sizable intermediate results.
Capacity planning is often neglected, and the needs of data scientists doubly so. That neglect, combined with underestimating production workloads, means data scientists are frequently marginalized. Make sure your capacity requirements include the data scientists' needs, and act early when capacity problems appear.
Crisis Signal 5: Data scientists are left to Stack Overflow
In the early days of a Hadoop rollout, the operations team works closely with the data scientists. As the rollout succeeds, the operations team's maintenance burden grows, and the scientists end up solving Hadoop problems themselves, usually by searching Stack Overflow for workarounds.
As Hadoop expands and mission-critical jobs multiply, the maintenance workload keeps rising. If you want data scientists to stay focused on data, you need to grow the operations team accordingly.
Crisis Signal 6: Servers are running hot
When provisioning server power, we often assume machines will not run at full load, but a large Hadoop job can keep servers fully loaded for hours, seriously stressing your power grid (cooling faces the same problem). Make sure your Hadoop cluster can run at full power for extended periods.
Crisis Signal 7: Spending out of control
In Hadoop deployments built on IaaS, the first "success disaster" is runaway spending. You suddenly find this month's bill is three times last month's, and far over budget.
Capacity planning is an essential step in an IaaS-based Hadoop implementation, and it manages not only capacity but also cost. But good capacity planning is just the beginning: if you want to scale an IaaS-based Hadoop deployment, it's worth investing in a cost-tracking and optimization system like the ones Netflix has built.
Scaling Hadoop smoothly
Hadoop projects generally underestimate the work required to keep a cluster running stably, and the miscalculation is understandable: for traditional enterprise applications, the initial implementation costs far more than subsequent maintenance and support, and it is easy to assume Hadoop follows the same pattern. In fact, maintaining Hadoop is hard and demands substantial operations effort.
High-quality capacity planning is essential, and even a good capacity model must be updated promptly so it doesn't drift from reality. Don't let innovation become an afterthought: give data scientists enough support. Expansion is not the only remedy; managing usage matters just as much. Let users (and business owners) optimize their jobs, since even small optimizations can cut existing costs.
Original link: Seven signs your hair is on fire: the challenges of scaling Hadoop (Translator: Chingling; Editor: Zhonghao)
Free Subscription "CSDN cloud Computing (left) and csdn large data (right)" micro-letter public number, real-time grasp of first-hand cloud news, to understand the latest big data progress!
CSDN publishes related cloud computing information, such as virtualization, Docker, OpenStack, Cloudstack, and data centers, sharing Hadoop, Spark, Nosql/newsql, HBase, Impala, memory calculations, stream computing, Machine learning and intelligent algorithms and other related large data views, providing cloud computing and large data technology, platform, practice and industry information services.