Seven Danger Signals When Scaling Hadoop


Raymie Stata is co-founder and CEO of Altiscale, a Hadoop-as-a-service company. As a former CTO of Yahoo, he helped Yahoo carry out its open source strategy and was involved in the launch of the Apache Hadoop project. Scaling and operating Hadoop are complex processes that hide potential crises. Drawing on his experience, Raymie lists seven crisis signals and corresponding solutions to help users avoid disasters in advance.

The following is the translation:

Scaling Hadoop is a very complex process. Here are seven common problems and their solutions.

Every Hadoop implementation carries potential crises, including some very tricky operational issues. Problems of this kind can cause Hadoop to be abandoned before it ever reaches production. If they surface after it is in production, the result is a "success disaster" (or, more likely, a plain catastrophe).

Scaling and implementing Hadoop is very complex. However, if you know where the root causes of the problems lie, you can avoid a "disaster." Here are some crisis signals drawn from experience.

Crisis signal 1: Cannot get into production

Moving from proof of concept to production use is an important step in big data workflows. Scaling Hadoop workloads is full of challenges: larger workloads often cannot finish in time, and the test environment cannot fully cover the real operating environment. Test data is a common problem, because proofs of concept often use unrealistically small data sets or only a single data set.

Run scale and stress tests before going into production. Applications that pass such tests have shown that they scale and tolerate faults, and the test results help you build a capacity planning model.
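One way to turn scale-test results into a first capacity estimate is to fit a simple curve to measured runtimes and extrapolate to production data volumes. The following is a minimal sketch, assuming runtime follows a power law in input size; the function names and the sample measurements are hypothetical and not part of the original article.

```python
import math

def fit_power_law(sizes_gb, runtimes_min):
    """Least-squares fit of runtime = a * size**b, done in log-log space."""
    xs = [math.log(s) for s in sizes_gb]
    ys = [math.log(t) for t in runtimes_min]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # scaling exponent
    a = math.exp(mean_y - b * mean_x)  # runtime at size 1
    return a, b

def projected_runtime(a, b, size_gb):
    """Extrapolate the fitted curve to a larger input size."""
    return a * size_gb ** b

# Hypothetical scale-test measurements: (input size in GB, runtime in minutes).
scale_tests = [(10, 4.0), (100, 45.0), (1000, 520.0)]
a, b = fit_power_law([s for s, _ in scale_tests], [t for _, t in scale_tests])
print(f"scaling exponent: {b:.2f}")
print(f"projected runtime at 5000 GB: {projected_runtime(a, b, 5000):.0f} min")
```

An exponent noticeably above 1 suggests the job scales worse than linearly, which is worth knowing before production data arrives. An extrapolation like this is only a starting point and should be checked against a test at realistic scale.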

Crisis signal 2: Delays begin to appear

When the first application goes into production, meeting SLAs is usually easy. But as the load on the Hadoop cluster grows, runtimes become unpredictable. The first delay is easily overlooked, yet the problem gets worse over time and eventually leads to a crisis.

Do not wait for a crisis before taking action. Before capacity comes under strain, expand capacity or optimize jobs as appropriate. Adjust the capacity model, paying particular attention to performance under worst-case conditions, so that it reflects real behavior more closely.
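To catch the first delays before they become a crisis, it can help to track the trend in job runtimes against the SLA. The sketch below is one possible approach, assuming a straight-line trend through daily runtimes; the function names and sample numbers are hypothetical.

```python
def linear_trend(values):
    """Return (slope, intercept) of a least-squares line through values at x = 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def days_until_breach(daily_runtimes_min, sla_min):
    """Estimate days from the last observation until the trend line reaches the SLA.

    Returns 0.0 if the fitted runtime already meets or exceeds the SLA,
    and None if runtimes are flat or improving.
    """
    slope, intercept = linear_trend(daily_runtimes_min)
    fitted_now = intercept + slope * (len(daily_runtimes_min) - 1)
    if fitted_now >= sla_min:
        return 0.0
    if slope <= 0:
        return None
    return (sla_min - fitted_now) / slope

# Hypothetical nightly job runtimes (minutes) and a 120-minute SLA.
runtimes = [62, 64, 63, 67, 70, 69, 74, 78]
print(days_until_breach(runtimes, sla_min=120))
```

A real capacity model would also look at the worst runs rather than only the trend, since SLAs are broken by bad days, not average ones.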

Crisis signal 3: You start telling customers that you cannot keep all the data

Another symptom of a crisis is a reduction in data retention requirements. At first you wanted to keep 13 months of data for year-over-year analysis, but because of space constraints you start shortening the retention period, which amounts to giving up some of the benefits of big data analytics on Hadoop.

Reducing data retention does not solve the problem. To avoid it, act early: re-examine the capacity model to find out why it failed, then adjust the model so that it tracks the root causes more closely.
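A storage capacity model for retention can start from simple arithmetic: daily ingest times retention days times the replication factor, plus room for temporary data and free-space headroom. The sketch below assumes HDFS's default replication factor of 3; the temporary-space overhead, the utilization ceiling, and all sample figures are illustrative assumptions.

```python
def required_raw_storage_tb(daily_ingest_tb, retention_days, replication=3,
                            temp_overhead=0.25, max_utilization=0.75):
    """Raw disk needed to retain the data.

    replication: HDFS block replication factor (3 is the HDFS default).
    temp_overhead: assumed extra fraction for intermediate job output.
    max_utilization: assumed ceiling on how full the disks should get.
    """
    logical_tb = daily_ingest_tb * retention_days
    replicated_tb = logical_tb * replication
    with_temp_tb = replicated_tb * (1 + temp_overhead)
    return with_temp_tb / max_utilization

def max_retention_days(raw_capacity_tb, daily_ingest_tb, replication=3,
                       temp_overhead=0.25, max_utilization=0.75):
    """Whole days of data that fit in the given raw capacity."""
    per_day_tb = required_raw_storage_tb(daily_ingest_tb, 1, replication,
                                         temp_overhead, max_utilization)
    return int(raw_capacity_tb // per_day_tb)

# Hypothetical cluster: 0.5 TB ingested per day, roughly 13 months of retention.
print(f"{required_raw_storage_tb(0.5, 396):.0f} TB raw needed for ~13 months")
print(f"{max_retention_days(600, 0.5)} days fit in 600 TB raw")
```

Comparing the model's prediction with actual disk usage each month shows where the model went wrong, for example ingest growing faster than assumed.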

Crisis signal 4: Data scientists are sidelined

An overloaded Hadoop cluster stifles innovation: data scientists are left without the resources to run large jobs and without the space to store large volumes of computed results.

Capacity planning often overlooks data scientists. When planning falls short even for the production workload, data scientists are the ones who get marginalized. Make sure your requirements include the needs of data scientists, and act early on capacity issues.

Crisis signal 5: Data scientists turn to Stack Overflow to solve problems

In the early days of a Hadoop implementation, the operations team and the data scientists work closely together. As the implementation succeeds, the maintenance pressure on the operations team grows, and data scientists end up having to solve Hadoop problems themselves, often by searching Stack Overflow for a fix.

As Hadoop scales and mission-critical applications are added, the maintenance workload increases. If you want data scientists to stay focused on data science, you need to resize your operations team accordingly.

Crisis signal 6: Servers are running hot

When provisioning power for servers, we often assume that they will not run at full load. But large Hadoop jobs can keep servers at full load for hours, posing a serious threat to your power infrastructure (cooling has similar issues). So make sure your Hadoop cluster can run at full power for long periods.
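A quick sanity check of the power budget is to compare the cluster's draw at sustained full load with the usable capacity of the circuit that feeds it. This is a rough sketch only; the 80% derating for continuous load and all wattage figures are assumptions for illustration, and a real check should use measured draw and your facility's own limits.

```python
def power_headroom_watts(node_count, peak_watts_per_node,
                         circuit_capacity_watts, derating=0.8):
    """Usable circuit capacity minus cluster draw with every node at peak.

    derating: assumed fraction of rated capacity usable for continuous load.
    A negative result means the circuit cannot sustain full load.
    """
    usable_watts = circuit_capacity_watts * derating
    return usable_watts - node_count * peak_watts_per_node

def max_nodes_at_full_load(peak_watts_per_node, circuit_capacity_watts,
                           derating=0.8):
    """Largest node count the circuit can sustain with all nodes at peak."""
    usable_watts = circuit_capacity_watts * derating
    return int(usable_watts // peak_watts_per_node)

# Hypothetical rack: 20 nodes at 450 W peak on a circuit rated for 10 kW.
print(power_headroom_watts(20, 450, 10_000))
print(max_nodes_at_full_load(450, 10_000))
```

The same comparison can be repeated for cooling by replacing electrical capacity with the heat load the cooling system can remove.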

Crisis signal 7: Costs spiral out of control

In IaaS-based Hadoop deployments, the number one "success disaster" is runaway spending. You suddenly find that this month's bill is three times last month's, far exceeding the budget.

Capacity planning is an important part of any IaaS-based Hadoop implementation, not only for managing capacity but also for managing costs. But good capacity planning is just the beginning. If you want to scale an IaaS-based Hadoop implementation, it is best to invest heavily, as Netflix has, in systems that track and optimize costs.
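Cost tracking can begin with something as small as attributing spend to teams and projecting the month-end bill from spend to date. The following is a minimal sketch under simple assumptions: a flat, hypothetical price per node-hour and a linear projection. It is not a description of any particular company's tooling.

```python
from collections import defaultdict

def cost_by_team(usage_records, price_per_node_hour):
    """Sum cost per team from (team, node_hours) records."""
    totals = defaultdict(float)
    for team, node_hours in usage_records:
        totals[team] += node_hours * price_per_node_hour
    return dict(totals)

def projected_month_end(spend_to_date, day_of_month, days_in_month):
    """Linear projection of the month-end bill from spend so far."""
    return spend_to_date / day_of_month * days_in_month

def over_budget(spend_to_date, day_of_month, days_in_month, budget):
    """True if the projected month-end bill exceeds the budget."""
    return projected_month_end(spend_to_date, day_of_month, days_in_month) > budget

# Hypothetical usage: (team, node-hours) at an assumed $0.40 per node-hour.
records = [("etl", 12_000), ("data-science", 7_500), ("etl", 3_000)]
costs = cost_by_team(records, price_per_node_hour=0.40)
print(costs)
print(over_budget(sum(costs.values()), day_of_month=10, days_in_month=30,
                  budget=20_000))
```

Running a check like this daily, rather than waiting for the invoice, gives early warning when a bill is heading toward a multiple of last month's.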

Scaling Hadoop smoothly

Hadoop plans often underestimate the amount of work required to keep a Hadoop cluster running smoothly, and the miscalculation is understandable. For traditional enterprise applications, the cost of initial implementation is orders of magnitude higher than the cost of subsequent maintenance and support, and people often mistakenly believe that Hadoop follows the same pattern. In fact, Hadoop is very difficult to maintain and requires a great deal of ongoing operational work.

Good capacity planning is essential. A good capacity model also needs to be updated regularly so that it does not drift away from real-world conditions. Do not let innovation become an afterthought; give data scientists enough support. Adding capacity is not the only solution: managing usage matters too, and letting users (and business owners) optimize their jobs, even a little, can reduce existing costs.
