Hadoop grows to lead open source cloud computing

Recent investment in cloud computing by the major players has been very active, ranging from cloud platform management and massive data analysis to a variety of emerging consumer-facing cloud platforms and services. Large-scale data processing (big data) technology, with Hadoop as its flagship, is turning "business is king" into "data is king." The prosperity of the Hadoop community is plain to see: more and more companies at home and abroad are contributing to Hadoop development or open-sourcing the software they run in production.





In the Internet field there has long been a saying: if the number two cannot beat the number one, then open-source the things the number one depends on for survival. Yahoo!, locked in fierce competition with Google, recruited Doug Cutting (the founder of Hadoop) to build open source counterparts to the distributed file system and MapReduce technologies that Google's success rested on, and Hadoop's childhood began. By around 2008, Hadoop had matured. From its inception to the present, Hadoop has accumulated at least seven years of development, and it is no longer just Yahoo!'s house specialty: its long list of users includes Facebook, LinkedIn, Amazon, EMC, eBay, Twitter, IBM, Microsoft, Apple, HP... Domestic users include Taobao, Baidu, and others.











Not only that: the latest news shows that even Microsoft, the software giant, has recently opened its arms to Hadoop. At the SQL PASS 2011 Summit in Seattle on October 12, Microsoft announced that it would collaborate with Hortonworks, a company spun off from Yahoo!, to bring Apache Hadoop to Windows Server and the Windows Azure platform. As Microsoft's strategic partner, Hortonworks will use its expertise in this area to help integrate Hadoop as deeply as possible into Microsoft's products.





Microsoft said it expects to launch a Windows Azure preview of Hadoop by the end of this year, while the Hadoop-based Windows Server offering will arrive in 2012. The Hadoop-based Windows Server will also handle workloads jointly with Microsoft's existing BI tools. Microsoft officials also confirmed that SQL Server "Denali" will officially be named SQL Server 2012. At the same time, Microsoft will increase its investment in the JavaScript language, using JavaScript to implement high-performance map/reduce jobs. Microsoft says it is committed to working closely with the Hadoop community and actively contributing to Apache Software Foundation projects.











Ted Kummert, senior vice president of Microsoft's Business Platform division, said in a statement that the move would help Microsoft's customers better manage their big data. More and more companies are looking for ways to collect and analyze unstructured data to gain insight into their business. So far, however, traditional relational databases have been designed primarily for structured data, and their inherent characteristics limit their scalability. The momentum behind Hadoop, an open source framework for big data, is therefore increasingly appealing to IT executives: Hadoop is well suited to unstructured data such as the content of e-mail messages, blogs, clickstream data, audio, and video.





Of course, the other giants are not to be outdone and have made their own moves. Oracle recently launched a big data appliance built on Hadoop, Oracle's own NoSQL database, and a distributed data analysis system based on the open source language R. Just a few days ago, IBM announced it would buy Platform Computing, a privately held system software company, a move that helps IBM serve its customers better, letting them manage and analyze large-scale data in a more suitable way while reducing cost and system complexity.





Mentioning Oracle's latest move on Hadoop, one cannot help but talk about the R language. As an open source language for statistical data analysis, R has been quietly expanding its influence in the enterprise, and new extensions now allow the R language engine to run on Hadoop clusters.





R is a language and environment used mainly for statistical analysis and graphics. It was originally developed by Ross Ihaka and Robert Gentleman of the University of Auckland in New Zealand (hence the name R) and is now maintained by the R Development Core Team. R is a GNU project rooted in the S language and can be regarded as an implementation of S: code written in S can usually run unmodified in the R environment. R's syntax, in turn, draws on Scheme.











Now statisticians can use the R language to analyze unstructured data stored in the Hadoop Distributed File System. R can also run against HBase, a column-oriented distributed data store modeled largely on Google's BigTable, which essentially amounts to using Hadoop to hold structured data. HBase is a subproject of the Apache Software Foundation's Hadoop project.
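To give a concrete feel for the kind of structured access HBase offers, here is a minimal sketch using the classic (pre-1.0) HBase Java client API rather than R; the "metrics" table, the "d" column family, and the row key are hypothetical and assumed to have been created beforehand (for example in the HBase shell).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientSketch {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath to locate the cluster.
        Configuration conf = HBaseConfiguration.create();

        // A "metrics" table with a "d" column family is assumed to exist.
        HTable table = new HTable(conf, "metrics");

        // Store one cell: row key -> column family:qualifier -> value.
        Put put = new Put(Bytes.toBytes("user42#2011-10-12"));
        put.add(Bytes.toBytes("d"), Bytes.toBytes("clicks"), Bytes.toBytes("17"));
        table.put(put);

        // Read the cell back.
        Get get = new Get(Bytes.toBytes("user42#2011-10-12"));
        Result result = table.get(get);
        String clicks = Bytes.toString(
                result.getValue(Bytes.toBytes("d"), Bytes.toBytes("clicks")));
        System.out.println("clicks = " + clicks);

        table.close();
    }
}
```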





Revolution Analytics provides commercial extensions and support for the open source R language, enabling statisticians and data scientists to extract meaningful information from large volumes of data in a short time. David Champagne, chief technology officer at Revolution Analytics, says the R engine can be deployed on every node of a Hadoop cluster. Instead of writing the map and reduce algorithms in Java, analysts can push R algorithms out to the worker nodes where R is deployed; the R code runs inside Hadoop's map tasks, and the statistical analysis proceeds in parallel over data stored in HDFS.





In addition, the Server edition of the open source operating system Ubuntu 11.10 now supports Juju (formerly code-named Ensemble), which provides automated deployment for more than 30 kinds of cloud applications, including MySQL, Tomcat 6, and Hadoop, helping enterprises accelerate large-scale deployment of cloud applications.





Previously, to deploy Hadoop, IT staff had to install Java, install the Hadoop packages on top of it, and then configure the cluster relationships between servers by hand. With the Juju tooling in 11.10, a few commands at the command line are enough to install Java and Hadoop automatically and to wire the servers into a Hadoop cluster, so an enterprise can stand up a Hadoop deployment quickly. When staff later need to expand the cluster, another Juju command brings the new servers into the Hadoop system.











Twitter also recently open-sourced Storm, billed as the real-time counterpart to Hadoop. Storm is a distributed, fault-tolerant real-time computation system, hosted on GitHub under the Eclipse Public License 1.0. It was originally developed by BackType, which is now part of Twitter. The latest version on GitHub is Storm 0.5.2, written largely in Clojure.





Storm provides a set of general primitives for distributed real-time computation. It can be used for "stream processing", handling messages and updating databases in real time, as an alternative to managing your own queues and worker clusters. It can also be used for "continuous computation", running standing queries over data streams and delivering the results to users as a stream, and for "distributed RPC", running expensive operations in parallel. Storm's lead engineer, Nathan Marz, put it this way: Storm makes it easy to write and scale complex real-time computations on a cluster of machines; Storm is to real-time processing what Hadoop is to batch processing. Storm guarantees that every message will be processed, and it is fast: a small cluster can handle millions of messages per second. Better still, topologies can be developed in any programming language.
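As a rough illustration of these primitives, here is a minimal page-view counting topology sketched against Storm's Java API. It is based on releases somewhat later than 0.5.2, so package and base-class names may differ in that version, and the spout, bolt, and stream names are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class ClickCountTopology {

    // Hypothetical spout that emits one page-view event per call.
    public static class PageViewSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private final String[] pages = {"/home", "/cart", "/search"};
        private int i = 0;

        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        public void nextTuple() {
            Utils.sleep(100);                       // throttle the demo stream
            collector.emit(new Values(pages[i++ % pages.length]));
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("page"));
        }
    }

    // Bolt that keeps a running count per page and emits the updated count.
    public static class CountBolt extends BaseBasicBolt {
        private final Map<String, Long> counts = new HashMap<String, Long>();

        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String page = tuple.getStringByField("page");
            long n = counts.containsKey(page) ? counts.get(page) + 1 : 1;
            counts.put(page, n);
            collector.emit(new Values(page, n));
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("page", "count"));
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("views", new PageViewSpout(), 1);
        // fieldsGrouping routes all tuples for the same page to the same bolt task.
        builder.setBolt("counts", new CountBolt(), 2).fieldsGrouping("views", new Fields("page"));

        Config conf = new Config();
        LocalCluster cluster = new LocalCluster();   // in-process cluster for testing
        cluster.submitTopology("click-counts", conf, builder.createTopology());
        Thread.sleep(10000);
        cluster.shutdown();
    }
}
```

The fieldsGrouping call is what makes the running count coherent: all events for the same page are routed to the same bolt task, so each task owns its own slice of the counts.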





We will now walk through Hadoop's seven-year history (2004-2011). Looking at its development in detail, it is easy to see that Hadoop grew from a single open source Apache Foundation project into a powerful ecosystem through a cycle of use, contribution, and refinement as more and more users joined in. Since 2009, with the rise of cloud computing and big data, Hadoop, as the best available solution for massive data analysis, has become the focus of many IT vendors, giving rise to numerous commercial Hadoop distributions and Hadoop-supporting products, both software and hardware.





So what does Hadoop offer for massive data processing that makes so many giants pursue it? In fact, Hadoop has hard-to-replace advantages in scalability, robustness, computational performance, and cost, and it has effectively become the mainstream big data analysis platform of today's Internet enterprises.




"Massive" here refers to data volumes at which traditional databases and BI products either fail outright or become cost-prohibitive. There are excellent enterprise-class products that handle data at this scale, but given hardware and software costs, most Internet companies currently store their data in Hadoop's HDFS distributed file system and analyze it with MapReduce. This series will mainly introduce a multidimensional data analysis platform built on MapReduce on Hadoop.
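As a minimal illustration of the storage side, the sketch below writes a small file into HDFS and reads it back through Hadoop's Java FileSystem API. The cluster address is taken from the standard configuration files on the classpath, and the path and record format are purely hypothetical.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRoundTrip {
    public static void main(String[] args) throws Exception {
        // fs.defaultFS (e.g. hdfs://namenode:8020) comes from core-site.xml.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path path = new Path("/data/raw/clicks-2011-10-12.log");

        // Write a small file into HDFS (overwrite if it already exists).
        FSDataOutputStream out = fs.create(path, true);
        out.writeBytes("user42\t/home\t1\n");
        out.close();

        // Read it back line by line.
        BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(path)));
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
        reader.close();
        fs.close();
    }
}
```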








To use Hadoop for multidimensional analysis, the first task is to solve the problem noted above, that dimensions are hard to change. Because Hadoop does not force a schema onto the data, the collected records can carry a large amount of redundant information, and many redundant dimensions can be folded into the fact table, so the angle of analysis can be changed flexibly across those redundant dimensions. Second, thanks to the powerful parallel processing capability of Hadoop MapReduce, the overhead does not grow significantly no matter how many dimensions an OLAP analysis adds. In other words, Hadoop can support a huge cube containing as many dimensions as you can imagine or expect, and each multidimensional analysis can span hundreds of dimensions without significantly hurting performance, as the sketch below suggests. More details will follow later in this series...
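The following is a minimal sketch, not the platform described in this series, of what such a group-by-dimensions aggregation looks like as a Hadoop MapReduce job written against the newer org.apache.hadoop.mapreduce API. The tab-separated input layout, the choice of dimension columns, and the class names are all assumptions made for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DimensionSum {

    // Input lines are assumed to be tab-separated:
    //   country \t device \t page \t ... \t measure
    // The mapper builds a composite key from whichever dimension columns the
    // analysis needs and emits the numeric measure in the last column as the value.
    public static class DimMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] cols = line.toString().split("\t");
            if (cols.length < 4) return;                  // skip malformed rows
            String key = cols[0] + "|" + cols[1];         // e.g. country|device
            long measure = Long.parseLong(cols[cols.length - 1]);
            ctx.write(new Text(key), new LongWritable(measure));
        }
    }

    // The reducer sums the measure for each dimension combination.
    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        protected void reduce(Text key, Iterable<LongWritable> values, Context ctx)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) sum += v.get();
            ctx.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "dimension-sum");
        job.setJarByClass(DimensionSum.class);
        job.setMapperClass(DimMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Changing the analysis angle then amounts to changing which columns the mapper concatenates into the key, which is exactly the flexibility the redundant dimensions provide.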





Hadoop is an excellent example of open source innovation. As James Urquhart of Cisco puts it: "Hadoop can arguably be called the first milestone success in enterprise software for an open source project that does not rest on any existing patents. Even if it is not the only such milestone, it is rare to see the fruits of success scaled so quickly and to such an extent."





It is precisely because of this that we are pleased to see that academic enthusiasm for Hadoop research has not waned: several MapReduce/Hadoop-related papers appeared at this year's VLDB, which shows that Hadoop still has plenty of room to improve. On the other side, some commercial software is moving toward Hadoop technology and becoming compatible with the Hadoop software stack. At the same time, a number of companies at home and abroad now provide Hadoop consulting and services, and the business value of Hadoop's large-scale data processing technology is gradually drawing the industry's attention.





Domestic development, research, and exploration of Hadoop has also shown great enthusiasm. The Hadoop in China 2011 conference (HiC2011), hosted by the Institute of Computing Technology of the Chinese Academy of Sciences, will be held at the Beijing Convention Center from December 2 to 3 and will be the annual technical event for Hadoop in the Chinese community. The conference will bring together international and domestic companies that have successfully applied Hadoop and cloud computing technology and will present the international research community's academic view of cloud computing and DISC (Data Intensive Super Computing), examining the current state and trends of cloud computing technology and the Hadoop open source ecosystem from the dual perspectives of applied technology and scientific research.











After three years of nurturing, a Chinese Hadoop open source volunteer community has taken shape, with the Hadoop in China convention as its window. The convention originated in a Hadoop developer technology salon held at the Chinese Academy of Sciences in 2008 to promote Hadoop technology and its applications, to understand the practical demands Internet applications place on cloud computing technology, to carry forward the open source spirit, and to build an exchange platform for Hadoop enthusiasts.





The conference program is rich and full of highlights. The organizers have invited Doug Cutting, founder of open source projects such as Lucene, Nutch, and Hadoop, chairman of the Apache Software Foundation and chair of the Apache Hadoop project; Professor Miron Livny of the University of Wisconsin-Madison, founder of Condor; and academics and senior developers from Google, Facebook, and other well-known Internet and IT companies at home and abroad to give talks and technical exchanges, with some of the experts visiting China for the first time.











About five or six years ago, Apache Hadoop was just a prototype system of 20 nodes. Since then, Yahoo! has invested in the Apache Hadoop project, forming a team and focusing on it over the intervening years. After those years of effort the situation looks good: the media is interested in Hadoop, and thousands of companies and departments have adopted it widely. Now that Yahoo! has spun off Hortonworks to promote Hadoop in its own right, it is time to push the technology forward and to build new features and capabilities around Hadoop. So what kind of future awaits Hadoop? Let's wait and see!




The rise of the elephant: a record of Hadoop's seven years of development





In the Internet field there has long been a saying: if the number two cannot beat the number one, then open-source the things the number one depends on for survival. Yahoo!, which was in fierce competition with Google, recruited Doug Cutting (the founder of Hadoop) to open-source counterparts to the distributed file system and MapReduce technologies that were Google's lifeblood, and Hadoop's childhood began. By around 2008, Hadoop had matured.





Hortonworks CEO: Half of the world's data will run on Hadoop





Before being appointed CEO of Hortonworks, Eric Baldeschwieler was responsible for the Apache Hadoop project at Yahoo!, growing its 20-node prototype into a 42,000-node service. Baldeschwieler was once the technology lead for the web services engine at Inktomi, which Yahoo! acquired in 2003. In the interview, he said that half of the world's data will be processed on Hadoop within the next five years; the following is the interview with him.





The R language brings great changes to data analysis on Hadoop clusters





As an open source language for statistical data analysis, R is quietly expanding its influence in the enterprise, and new extensions allow the R language engine to run on Hadoop clusters. R is a language and environment used mainly for statistical analysis and graphics, originally developed by Ross Ihaka and Robert Gentleman of the University of Auckland in New Zealand.