Software engineers who build large data architectures know that business analytics now spans SQL databases, NoSQL databases, unstructured data, document-oriented storage, and large-scale processing, and that one technology touches all of it. If you guessed Hadoop, you guessed correctly. Hadoop is a common denominator among giants such as Amazon, Yahoo, AOL, Netflix, eBay, Microsoft, Google, Twitter, and Facebook, and IBM is even at the forefront, promoting Hadoop for enterprise analytics. This open-source framework is everywhere, and the role it has earned in only five years on this stage is genuinely surprising.
The future of Hadoop
To understand what has happened over the past few years, we spoke with Chuck Lam, author of Hadoop in Action. Lam says Hadoop hasn't stopped to rest: "The whole ecosystem is really evolving and changing a lot. There is now even an official 1.0 release. More importantly, MapReduce's basic programming model has been substantially revised." In general, these changes are moving in a positive direction. They have made the framework easier to deploy in the enterprise and have addressed a series of concerns, first among them the security issues that matter most to risk-averse companies.
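For readers unfamiliar with the MapReduce programming model the article refers to, the following is a minimal, self-contained sketch of its three phases (map, shuffle, reduce) using the classic word-count example. This is an illustration in plain Python, not Hadoop code; a real Hadoop job would be written against the Java MapReduce API and run across a cluster.

```python
# Toy simulation of the MapReduce model: map emits key/value pairs,
# the framework shuffles (groups) them by key, and reduce aggregates
# each group. In Hadoop, each phase runs in parallel across nodes.

from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Group intermediate values by key, as the framework does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return (key, sum(values))

def word_count(documents):
    pairs = [p for doc in documents for p in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(word_count(["hadoop scales", "hadoop stores data"]))
# {'hadoop': 2, 'scales': 1, 'stores': 1, 'data': 1}
```

Because the map and reduce functions are pure and per-key, the framework is free to distribute them across as many machines as the data requires, which is what gives the model its scalability.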
The benefits keep growing, starting with a high degree of scalability. Distributed computing in this framework means you can add more and more data without changing how it is added. There is no need to change formats, rework how jobs are written, or decide which application handles which work; you simply add more nodes as the workload grows. Nor do you have to be picky about the type of data you store or where it comes from: schemaless is the name of this game. The framework's parallel computing capability also makes storage on commodity servers more efficient, which means the enterprise can keep and use more data. And it does not matter which node fails: even when hardware goes down, no data is lost and performance degrades gracefully.
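The fault-tolerance claim above rests on replication: each block of data is stored on several nodes, so losing any single node loses nothing. The toy sketch below illustrates that idea only; the replication factor of 3 mirrors HDFS's default, but the round-robin placement and node names are simplifications invented for this example.

```python
# Toy illustration of HDFS-style block replication (not Hadoop code).
# Each block is placed on `replication` distinct nodes, so any single
# node can fail without making any block unreadable.

import itertools

def place_blocks(blocks, nodes, replication=3):
    """Assign each block to `replication` nodes, round-robin."""
    placement = {}
    cycle = itertools.cycle(nodes)
    for block in blocks:
        placement[block] = [next(cycle) for _ in range(replication)]
    return placement

def surviving_blocks(placement, failed_node):
    """Blocks still readable after one node fails."""
    return [b for b, replicas in placement.items()
            if any(n != failed_node for n in replicas)]

nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(["blk0", "blk1", "blk2"], nodes)

# Every block survives the loss of any single node:
assert all(len(surviving_blocks(placement, n)) == 3 for n in nodes)
```

In a real cluster the NameNode also re-replicates the lost copies onto healthy nodes in the background, restoring the replication factor without operator intervention.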
The technologies powering Hadoop
Hadoop has also become more flexible, allowing businesses to do more and handle more data types. This power stems from Hadoop's many companion projects, including languages like Pig, and the following scalable solutions:
1. Hive (data warehousing)
2. Mahout (machine learning and data mining)
3. HBase (structured storage for large tables)
4. Cassandra (multi-master distributed database)
Of course, solutions of this type are not automatically a win. Lam says the main trap lies in the assumptions people make; in other words, the fault is not in the system but in ourselves. "New technology is not a panacea for every problem. Even with something as seemingly simple as NoSQL, you have to understand the problem you're trying to solve at a deeper level." This could mean examining your algorithm carefully rather than just throwing your workload at MapReduce and expecting Hadoop to scale automatically, he said. Your data-usage patterns can affect your scaling model, especially when usage is uneven; in that case, linear scaling may no longer hold. Again, this is not a problem with Hadoop itself. Lam believes the tools companies have in hand are mature enough; the point is to ensure that IT administrators are familiar with those tools, and that software architects using Hadoop know how to apply the technology effectively.
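Lam's warning about uneven usage can be made concrete with a data-skew sketch. In MapReduce, records are hash-partitioned across reducers by key; if one "hot" key dominates, one reducer receives most of the work and adding reducers no longer speeds the job up. The hot-key ratio and reducer count below are invented for illustration.

```python
# Hypothetical illustration of data skew defeating linear scaling:
# hash partitioning spreads *keys* evenly, but not *records* when
# one key is far more frequent than the rest.

from collections import Counter

def partition(key, num_reducers):
    """Hash-partition keys across reducers, as MapReduce does."""
    return hash(key) % num_reducers

# 90% of records share one hot key; the rest use 100 distinct keys.
records = ["hot_key"] * 900 + [f"key_{i}" for i in range(100)]

load = Counter(partition(key, 10) for key in records)
busiest = max(load.values())

# A perfectly even split over 10 reducers would be 100 records each,
# but the reducer that owns "hot_key" gets at least 900.
assert busiest >= 900
```

This is why Lam suggests looking at the algorithm itself: mitigations such as pre-aggregating hot keys in a combiner, or salting keys to split them across partitions, have to be designed in, not assumed.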