The Father of Hadoop: Doug Cutting

Source: Internet
Author: User

Perhaps all of us have used his work indirectly in daily life. He is the initiator of Lucene, Nutch, Hadoop and other projects. It was he who turned esoteric search technology into products that serve the general public, and it was he who created Hadoop, now at its zenith in cloud computing and big data. He is a fire thief of sorts. He is Doug Cutting.

Starting as an Intern

In 1985, Cutting graduated from Stanford University in the United States. He had not set out to join the IT industry: during his first two years at college he took regular courses such as physics and geography. Under the pressure of tuition fees, he came to realize that he needed to learn skills that were more practical and more interesting. That would help him pay off his loans and also prepare him for his future. Because Stanford sits in Silicon Valley, the "mecca" of the IT industry, learning software was a natural choice for a young person there.

Cutting's first job was an internship at Xerox. At the time, the Xerox laser scanner ran three different operating systems, one of which had no screen saver, so Cutting set about writing one for it. Because the program worked at a low level of the system, colleagues could add their own themes to it. The job gave Cutting a sense of satisfaction, and the screen saver was his earliest "platform"-style work.

Xerox arguably had a decisive influence on Cutting's later work on search technology. Apart from a brief period working in Scotland, he spent most of his early career at Xerox, and that time greatly deepened his knowledge of search techniques. He spent four years there in research and development, reading many papers and publishing many of his own. In Cutting's own words: "Xerox was my graduate school."

Although Xerox gave Cutting a great deal of technical knowledge, he felt that the research he was doing existed only on paper and that nobody had tested whether the theories were practical. So he decided to take the leap and make search technology available to more people. At the end of 1997, Cutting began spending two days a week at home trying to turn this idea into reality in Java, and soon afterwards Lucene was born. As the first open-source library to provide full-text search, Lucene's significance needs no elaboration.

The birth of Hadoop

After that, Cutting continued to deepen his commitment to open source, building on Lucene. In 2004, Cutting and Mike Cafarella, who also came from a programming background, decided to develop an open-source search engine that could replace the mainstream search products of the day, and named it Nutch. Before that, Cutting's company Architext, whose main product was the Excite search engine, had failed to withstand the bursting of the dot-com bubble, and Cutting was working as a freelancer. He therefore wanted his project to run the many algorithms needed to process web pages at low cost. Luckily, Google published research describing two software platforms it had developed to support its own search engine. One is GFS (Google File System), which stores the huge amounts of data generated by different devices. The other is MapReduce, which runs on top of GFS and handles the distributed processing of large-scale data. Based on these two platforms, Cutting's most high-profile work, Hadoop, was born. Referring to the "help" Google gave them, Cutting said: "At first we envisioned running this project on four or five computers, but in practice there were a lot of tedious steps that had to be done manually. Google's platforms allow these steps to be automated, which laid a good foundation for our overall framework."
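To make the MapReduce idea described above more concrete, here is a minimal single-process sketch in Python. It is not Hadoop's or Google's actual API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_word_count`) and the word-count task are assumptions chosen for illustration, and a real system would spread these phases across many machines on top of a distributed file system.

```python
from collections import defaultdict


def map_phase(documents, mapper):
    """Apply the mapper to every document and yield its (key, value) pairs."""
    for doc_id, text in documents.items():
        yield from mapper(doc_id, text)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer once per key to the list of grouped values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(doc_id, text):
    """Hypothetical mapper: emit (word, 1) for each word in the document."""
    for word in text.lower().split():
        yield word, 1


def word_count_reducer(word, counts):
    """Hypothetical reducer: sum the counts emitted for one word."""
    return sum(counts)


def run_word_count(documents):
    """Run map, shuffle and reduce in sequence inside a single process."""
    pairs = map_phase(documents, word_count_mapper)
    return reduce_phase(shuffle(pairs), word_count_reducer)


if __name__ == "__main__":
    pages = {"page1": "open source search", "page2": "open source storage"}
    print(run_word_count(pages))
```

The point of the split is that the programmer writes only the mapper and the reducer, while grouping and distribution are left to the framework. That is the kind of manual step Cutting describes as being automated.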

Speaking of Google, Cutting was also a witness to its growth, and there is a little-known story here. Back when Cutting was at Architext, two young men visited the company to pitch their search technology. Their demo indexed only a few million pages, and the Excite engineers considered the technology child's play, looked down on it, and sent them away. That was not the end of the story. The two young men took the painful lesson to heart and decided to start their own business. They founded a search company of their own and named it Google. They were Larry Page and Sergey Brin. In Cutting's view, Google's success owed a great deal to the design of its inverted index and to its confidence in its own technology.
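The index structure credited above is commonly known as an inverted index: a mapping from each term to the set of documents that contain it. The following Python sketch illustrates the general idea only. It is not Google's or Lucene's implementation, and the function names `build_inverted_index` and `search` are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Map each lowercased term to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in text.lower().split():
            index[term].add(page_id)
    return index


def search(index, query):
    """Return the page ids that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from a copy of the first term's postings, then intersect the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    pages = {
        "p1": "open source search engine",
        "p2": "open source file system",
        "p3": "search technology",
    }
    index = build_inverted_index(pages)
    print(sorted(search(index, "open source")))
```

Looking up a term costs roughly one dictionary access regardless of how many pages exist, which is why this layout suits full-text search better than scanning every page for every query.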

Let "open source" influence the world

Mindful of the cost in time, four years after leaving Architext, Cutting decided to end his freelance career and find a reliable company where he could further improve Hadoop's performance. He interviewed with several companies, including IBM, but IBM seemed more interested in his earlier project Lucene and was noncommittal about Hadoop. At this point Cutting accepted an invitation from Raymie Stata, director of the Yahoo! search project, and formally joined Yahoo! in 2006. At Yahoo!, a team of 100 people helped him refine the Hadoop project, and the development effort paid off. Soon afterwards, Yahoo! announced that the architecture of its search business would be migrated to Hadoop. Two years later, Yahoo! launched its first Hadoop-based application, "WebMap", an algorithm for calculating the link relationships between web pages. "In the same hardware environment, the Hadoop-based WebMap is 33 times faster than the previous system," said Eric Baldeschwieler, Cutting's boss at the time and later CEO of Hortonworks.

Although Hadoop's performance was stunning, not every company was in a position to use it at the time, even as user demand kept growing. Some large organizations, such as banks, telecoms companies and big retailers, focus on their products and do not want to invest heavily in engineering or consulting services. They need a platform that helps them solve their problems, and that is why Cutting joined Cloudera. To some extent, Cloudera is a platform for those who need both consulting and technology. Most of its customers come from traditional industries and want to use Hadoop to process large-scale data that previously could only be thrown away. Today, in addition to these traditional industries, Yahoo!, Facebook, eBay, LinkedIn and other companies are using Hadoop. In Cutting's words, their team is "expanding invisibly."

Currently, Cutting's goal is to make Hadoop the Red Hat of the cloud computing world. "I never thought that Hadoop could play a role beyond search engines. The attention it gets today exceeds anything I had imagined." Speaking of success, Cutting attributes his own mainly to two things. The first is enthusiasm for his work: he began doing infrastructure-type work in college, contributed Lisp code to Emacs, and loves the feeling of his programs being used by millions of people. The second is not setting goals that are too large, and moving forward steadily, one step at a time.
