Today I attended 3 keynotes and 8 of the 42 sessions, and talked technology with a lot of vendors. It really was a day of information overload.
Hadoop is now 7 years old, and this year has brought many changes:
1. Hadoop is recognized as the de facto industry-standard open source software for big data, providing massive data processing capacity in a distributed environment (Gartner). Almost all major vendors are building development tools, open source software, commercial tools, and technical services around Hadoop. This year, large IT companies such as EMC, Microsoft, Intel, Teradata, and Cisco have significantly increased their investment in Hadoop, and Teradata has even publicly shown an all-in-one appliance. Among the vendors I saw this time, Sqrrl, WANdisco, GridGain, InMobi and others have all launched open source or commercial software.
2. The Hadoop ecosystem is rich and varied, but the core remains firmly in the hands of Cloudera and Hortonworks, and there is basically no chance of that being shaken. This year Hortonworks's pitch is "100% open source", and Cloudera has reason to be anxious: who told it not to open the source code of Cloudera Enterprise Manager? When Hortonworks introduced Ambari, at least 5 Cloudera engineers were in the room listening carefully, and one young man was taking notes on his iPad the whole time. The competition is plain to see. My personal guess is that Cloudera will open source Enterprise Manager sooner or later. Ambari currently has 20+ committers and 50+ contributors; the second number may be somewhat inflated, but the first is solid. Updates are coming fast too: the current 1.25 version is a clear improvement over the earlier 1.0x versions. Other vendors survive by building plug-ins, for example WANdisco, VMware, Mellanox, and GridGain, and these are plug-ins that do not modify the core. These vendors are not in a position to touch the core. Sustained investment may have some effect, as in VMware's case, but the first-tier Hadoop vendors will never let go.
3. The move to Hadoop 2.0 is basically unstoppable. When Hortonworks's VP Arun introduced Tez, he showed a lot of interesting slides with a single theme: MapReduce is yesterday's news, and YARN will be the parallel computing infrastructure of the future. I have not used YARN myself, but Hortonworks has built a lot of tools around it, especially Tez, which can improve the execution time of query plans; Pig and Hive will be rewritten to run on top of it. Although Hortonworks has not produced an Impala of its own, it is encircling Impala from a lower level of the stack. The positioning and rivalry between the two big players has never stopped.
4. SQL on Hadoop is an important technical trend. At last year's Hadoop World, the MPP vendors were still bragging about how awesome they were. But after Google published Dremel and PowerDrill, EMC built HAWQ, and Cloudera released Impala, all the MPP vendors began to rethink their technical direction. I talked with a ParAccel technician (my impression is that she was pre-sales), and she pulled out a card saying ParAccel is 100x faster than Hive and 10 years ahead of Impala. I think that claim will soon stop holding. For one thing, Hive optimization has not stopped: Hortonworks has put out Tez and Stinger (with Facebook). Although MPP has been ahead of Hadoop for many years, by the 80:20 principle, if SQL on Hadoop implements only the 20% of features that users actually need, then the gap is at most 2 years, and within 2 years SQL on Hadoop will surpass MPP in some areas. The way out for MPP companies is to learn from HAWQ. Columnar storage is also new; the main recent formats are ORC (a Microsoft and Hortonworks collaboration) and Parquet (a Twitter and Cloudera collaboration). Notice the two giants squaring off again? Notice them lining up allies to do it? These techniques show great advantages in testing.
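To give a rough feel for why columnar formats tend to do well on analytic queries, here is a toy sketch of my own. It is not how ORC or Parquet are actually implemented; it only counts how many field values a scan has to touch when summing one column. The table, the column names (`user_id`, `country`, `bytes_sent`), and the sizes are all made up for illustration.

```python
# Toy model only: counts how many field values a scan has to touch.
# The table, column names, and sizes are hypothetical.

rows = [
    {"user_id": i, "country": "US" if i % 2 else "JP", "bytes_sent": i * 10}
    for i in range(1000)
]


def row_store_sum(records, column):
    """Row layout: the whole record is loaded to reach a single column."""
    touched = 0
    total = 0
    for record in records:
        touched += len(record)  # every field of the record is read
        total += record[column]
    return total, touched


def to_columns(records):
    """Pivot the records into one list per column (the columnar layout)."""
    return {name: [r[name] for r in records] for name in records[0]}


def column_store_sum(columns, column):
    """Column layout: only the requested column is read."""
    values = columns[column]
    return sum(values), len(values)


columns = to_columns(rows)
row_total, row_touched = row_store_sum(rows, "bytes_sent")
col_total, col_touched = column_store_sum(columns, "bytes_sent")

print(row_total == col_total)    # same answer either way
print(row_touched, col_touched)  # values read by each layout
```

With three columns, the row layout reads three times as many values to answer the same query. Real formats add compression and encoding on top of this, which the sketch does not attempt to model.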
5. IT vendors and open source organizations are cooperating widely. This is not just a nominal relationship between IT vendors and open source; they are genuinely working closely together. I am not very clear on the internal details of these partnerships, but there are basically two modes: cross-integration of products and software (including management system integration), and joint development and promotion. Technically, this requires the software to have a good architecture and provide open interfaces. On this point Ambari's design matches what I required for HT; I could not get it done, while Ambari has already gone through several versions.
6. Technically, combining big data with the cloud is also an option (note: an option, not the trend). OpenStack-related topics were added this year, and some integrators and vendors also proposed scenarios for running Hadoop in the cloud. This does not suit everyone, but some users can benefit. Netflix is a classic example: their instances are on AWS, and evidently their Hadoop runs on virtual machines. I chatted with a guy from Netflix (Japanese), who said they have about 2000 virtual instances, based on EMR, and have developed a management system called Genie.
Another round of information overload starts in 4 hours. Here is a photo of me cooling off in the hotel's small courtyard. See the little squirrel? It is less than 5 meters away from me. I really have to praise the environment here in the US!