Hadoop is an open-source distributed computing platform for big data analysis, created at Yahoo by Doug Cutting, chairman of the Apache Software Foundation. Several major announcements about Hadoop were made recently at the fifth annual Hadoop Summit in Santa Clara, California.
First, Cutting revealed that Hadoop will be officially spun out of Yahoo and managed by Hortonworks, a new venture-capital-backed company named after the elephant character in Dr. Seuss's "Horton Hears a Who!" The name suits the new company, since Hadoop is itself named after a toy elephant that belonged to Cutting's son.
Second, in five years Hadoop has gone from a science project to a mainstream commercial technology.
Apache Hadoop is a Java-based open-source software framework for running distributed, data-intensive applications. It lets applications scale safely to thousands of nodes and petabytes of data. More and more businesses are discovering that they need to analyze their stored data to make better business decisions. There are many Hadoop-based distributed systems on the market, some of which are mentioned later in this article. The article also looks back at five years of development of this open-source software.
1. Yahoo founded Hortonworks to guide the Hadoop community
On June 29, venture capitalists funded the creation of Hortonworks, an independent private company, to guide the Hadoop community and promote open-source products. Yahoo, once its parent company, is now one of its clients.
2. Hadoop is no longer a science project
In only five years, Yahoo has turned Hadoop from founder Doug Cutting's science project into a world-class platform. Yahoo has contributed more than 70% of the code, helping make Hadoop a leading big data platform in the IT industry.
3. Hadoop is a key part of IBM Watson
Hadoop's analysis and data discovery capabilities are an important reason why IBM Watson was able to defeat two human champions on the quiz show "Jeopardy!"
4. The largest deployment: 200 PB of data per day
In the largest deployments in the technology world, such as Yahoo's, Hadoop analyzes more than 200 PB of data every day, which makes Yahoo's services more personal and brings the company closer to its users and customers. Hadoop works with every part of Yahoo's IT systems, including search, advertising, user experience, and fraud detection.
5. A system with the power to handle big data
Yahoo's Hadoop system includes more than 42,000 servers and a cluster of 4,000 machines, and it can handle more than 5 million jobs a month. Some 14 million new files enter the Hadoop system every day, a load it handles with ease.
6. Hadoop may sell services around the platform
The Hadoop software is available for free as an open-source project, and a range of advanced services will be launched for businesses that require higher levels of service.
7. Fighting spam and customizing personal pages
Hadoop helps protect 289 million Yahoo mailboxes from spam. In addition, Hadoop plays a key role in delivering 13 million custom-tailored personal web pages.
8. Not just for handling network traffic
Hadoop has evolved beyond handling network traffic and scientific research (for example, at CERN's particle collider). It is now also used in search engines, advertising optimization, machine learning, content enhancement, and content delivery. It can load 10 TB of data into the research cluster every day.
9. New Hadoop companies are growing rapidly
MapR, Zettaset, Cloudera, HStreaming, Hadapt, DataStax, and Datameer are new Hadoop-related companies that have attracted investment and are known for bringing the latest technology to a variety of markets.
10. Hadoop still needs to improve
Senior executives at Yahoo and Hortonworks have acknowledged that Hadoop still needs time to become easier to use, and that the user interface in particular needs improvement. The two companies' teams believe they will solve the problem within a few months.