For many companies, adoption of the Hadoop framework is just beginning, and examples of best practices have only recently emerged.
Piyush Bhargava, chief data architect at Cisco Systems, says the main dilemmas the company faced in adopting Hadoop were choosing a Hadoop distribution and integrating Hadoop and MapReduce with existing systems. He suggested that companies weigh feasibility carefully before putting Hadoop into production.
Bhargava's Hadoop work is part of Cisco's overall information program; with Hadoop, the company can support a wider variety of use cases, and managers can extract greater value from the data.
Hadoop Best Practices
Bhargava and his team have begun building an enterprise-class Hadoop platform. Their first task was to offload workload from the data warehouse. Some Hadoop use cases, such as integrating offline and online customer information, are already in production. Although Cisco's Hadoop footprint is still small, it is expected to grow exponentially over the next two years, which requires developers to consolidate scattered Hadoop resources into a central resource pool.
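To make that kind of offload concrete, here is a minimal sketch of a reduce-side join in Java MapReduce that merges offline and online customer records on a shared customer ID. The comma-separated record layout and the input paths are hypothetical illustrations, not Cisco's actual schema.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CustomerJoin {

    // Tags each record "online" or "offline" based on which input
    // directory it came from, keyed by the leading customer ID field.
    public static class TagMapper extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String source = ((FileSplit) context.getInputSplit())
                    .getPath().toString().contains("online") ? "online" : "offline";
            String[] fields = value.toString().split(",", 2); // id,rest-of-record
            if (fields.length == 2) {
                context.write(new Text(fields[0]), new Text(source + "|" + fields[1]));
            }
        }
    }

    // Merges all tagged records for one customer ID into a single line.
    public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            StringBuilder merged = new StringBuilder();
            for (Text v : values) {
                if (merged.length() > 0) merged.append(';');
                merged.append(v);
            }
            context.write(key, new Text(merged.toString()));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "customer join");
        job.setJarByClass(CustomerJoin.class);
        job.setMapperClass(TagMapper.class);
        job.setReducerClass(JoinReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0])); // offline records
        FileInputFormat.addInputPath(job, new Path(args[1])); // online records
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```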
Bhargava believes that today's Hadoop, like ERP in the 1990s, will eventually become a core analysis tool for the enterprise, so now is the time to integrate it into the organization.
Through hard work, Cisco's workload management has been successful. Bhargava said that Hadoop management must focus on the entire cluster, not just individual jobs. To manage Hadoop, traditional data warehouses, and other systems together, Cisco has set up a data management schedule.
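One common way to manage workload at the cluster level rather than per job is to route jobs into centrally administered scheduler queues. The sketch below assumes a hypothetical YARN Capacity Scheduler queue named warehouse-offload defined by cluster administrators; the article does not describe Cisco's actual configuration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class QueuedJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Route this job to a centrally managed queue so cluster-wide
        // capacity limits, not individual jobs, govern resource use.
        conf.set("mapreduce.job.queuename", "warehouse-offload");
        Job job = Job.getInstance(conf, "nightly warehouse offload");
        // ... set mapper, reducer, and paths as usual, then submit:
        // job.waitForCompletion(true);
    }
}
```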
In addition to workload management, cloud computing and team building are key to implementing Hadoop best practices.
Like any other initiative, Hadoop requires proper team building. Because Hadoop demands so much hands-on work, reminiscent of the mainframe era, the team matters even more.
"My database team is in need of people with programming minds, and COBOL (general business language) programmers from MapReduce are very popular," said Scott Russom, director of software engineering at Solutionary, the managing security services provider. ”
At the same time, cloud computing is one way to implement Hadoop. The Climate Corporation, a US company, has deployed a framework that integrates private and public clouds with Hadoop. Andrew Mutz, its director of engineering, said that by first deploying the Hadoop cluster in-house, the company was able to test its climate models quickly, draw conclusions sooner, and learn how to scale safely. After that, Hadoop could move to the cloud.
"This combination of internal deployment and cloud computing works very well," he said. We work directly from data sources to avoid delays. ”
For Cisco's Bhargava, best-practice Hadoop management comes from good planning. "You often attend meetings and see all kinds of gorgeous products, but in the end, you need to be grounded. You need to consider scalability, and at the outset of planning, consider how it will grow in the future."
Cisco is using MapR's Hadoop distribution because it puts greater emphasis on Hadoop management; MapR is one of the vendors whose distribution has shipped features earlier than the Apache Foundation's own Hadoop releases.
Play around with the Hadoop tools
Forrester analyst Mike Gualtieri believes that supporting technologies, such as security, scalability, and high availability, need to evolve along with Hadoop.
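As one concrete example of the security hardening Gualtieri has in mind, a Hadoop client can authenticate via Kerberos instead of relying on an open cluster. This is a generic sketch using Hadoop's UserGroupInformation API; the principal name and keytab path are hypothetical placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class SecureClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Require Kerberos authentication for this client.
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        // Log in from a keytab so batch jobs can authenticate unattended.
        UserGroupInformation.loginUserFromKeytab(
                "etl-service@EXAMPLE.COM", "/etc/security/keytabs/etl.keytab");
        System.out.println("Logged in as: "
                + UserGroupInformation.getCurrentUser().getUserName());
    }
}
```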
Hadoop, he cautions, is still in its infancy. A recent Forrester survey, Gualtieri said, showed that only 16% of respondents were using Hadoop; many of the rest were just watching. For now, Hadoop is mostly a "very cool tool" that only a few pioneers are actually using.
Geoffrey Moore, author of Crossing the Chasm, has noted that the Hadoop software ecosystem includes many tools, among them Hive, Accumulo, Giraph, Cassandra, and Spark; if you cannot work fluently with these tools, you cannot count yourself among the pioneers.
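To illustrate one of those ecosystem tools in use, here is a minimal sketch that queries Hive over JDBC through HiveServer2. The host, credentials, and customers table are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuery {
    public static void main(String[] args) throws Exception {
        // Register the HiveServer2 JDBC driver (hive-jdbc on the classpath).
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hive.internal:10000/default", "etl", "");
             Statement stmt = conn.createStatement();
             // Count customer records by acquisition channel.
             ResultSet rs = stmt.executeQuery(
                     "SELECT channel, COUNT(*) FROM customers GROUP BY channel")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}
```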
In Moore's view, Hadoop today is where industry leaders are placing their bets. Its momentum is unstoppable, and large-scale use is within reach.