Are Baidu's and Google's search technologies on the same order of magnitude?


Copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please cite the source. Author: Kenny Chao Link: http://www.zhihu.com/question/22447908/answer/21435705 Source: Zhihu

Many of the answers approach this from the user's perspective; I will add two points on the technical side.
    1. Search engines need to manage what they crawl. As the index grows, it becomes harder and harder to keep storage and queries fast and to keep the contents of tens of thousands of servers consistent. Between roughly 2003 and 2006, Google published three papers describing GFS, BigTable, and MapReduce, the three technologies it used to address these problems. Because Google did not publish the implementation details, Yahoo led the creation of Hadoop, an open source project started around 2006, to build a large-scale storage and computing system based on Google's three papers. But as of 2008, Hadoop still trailed Google's published figures by several times on some key metrics. Baidu once had a team, led by a PhD student of Wang Xuan, that tried to build its own system based on the Google papers (the Pyramid project), but the project proved too difficult and Baidu eventually turned to Hadoop. Today Amazon, Facebook, Yahoo, and Baidu all use Hadoop on a massive scale, while Google has been moving since 2010 to a new troika: Caffeine, Pregel, and Dremel. In terms of search technology alone, Google is not merely ahead of Baidu; it is ahead of the whole world.
    2. In 2009-2012, Google unveiled Spanner, the world's first globally distributed database system. It connects data centers across the globe and uses atomic clocks and GPS to overcome geographic distance and provide a globally consistent, real-time database. Before Google, many people thought such a system could not be built, but Google built it [1].

Beyond search, Google is also a world leader in deep learning and robotics, especially the latter. Baidu does have a deep learning institute, but in these two areas it is practically a blank compared with Google.

In fact, comparing Baidu with Google is quite unfair: search is just one department at Google, whereas at Baidu it is the whole company. Google's rivals are Apple, Amazon, Facebook, and Microsoft; Baidu's rivals are 360 and Sogou. Take search away from Google and it still has Chrome, Android, and YouTube; take search away from Baidu and nothing is left.

----------------------------------------------------------------------------------------------------

Author: Budding Link: http://www.zhihu.com/question/22447908/answer/21532527 Source: Zhihu

2014.06.23: I do not know why this answer suddenly drew so many upvotes and comments; I am adding some material to thank readers. On the question of traditional characters: I am Cantonese and grew up watching Hong Kong TV, so simplified and traditional characters make no difference to my reading, and to me they are much the same. The only reason I wrote this answer in traditional characters is that my input method happened to be in traditional mode ... To keep the text consistent, the additions are in traditional characters too; please bear with me.

I am a former Baidu employee and a current Google employee; at neither company have I worked on search-related projects.

The one-sentence answer: in the basic technologies underlying search, Baidu still has a large gap with Google, but whether that gap is still an order of magnitude today is in doubt.

Let me start with an unrelated field. The Soviet MiG-25 [1] interceptor of the 1960s was the world's first fighter to reach "double three" (Mach 3 and a 30,000 m ceiling). The Western world was astonished by these freakish performance figures and inferred that Soviet aviation technology had completely surpassed the West. Only when Belenko defected to the West in a MiG-25 did they finally get their hands on a real aircraft, and they found that its technology was not so advanced after all: the freakish performance had been brute-forced out of ordinary technology, the plane was so clumsy that it earned the nickname "straight-line fighter", and the engines were so poor that actually flying at Mach 3 meant scrapping them after landing. Soviet aviation technology was not as miraculous as they had imagined.

In 2009 I was at Baidu, and when I compared Google's publicly available technical material with Baidu's internal systems, the first thing I thought of was the MiG-25. Like that fighter, Baidu at the time matched or even beat Google on quality metrics for Chinese search results. Baidu's engineers are very smart and very hard-working, and on some points they did very fine, very good work, but in the basic technologies underlying search Baidu was still thoroughly behind. Much of Baidu's gain in search quality was dragged up by large numbers of people hand-making a great many fine-grained strategy adjustments.

Flying "double three" on ordinary technology makes the MiG-25 a remarkable engineering achievement in its own right. No later fighter, whether the Soviet Su-27, the American F-15, or even the fourth-generation F-22, has reached "double three", yet these later fighters are without doubt far ahead of the MiG-25 in technical level and overall performance; that is what can fairly be called an order-of-magnitude difference. An order-of-magnitude difference in technology cannot be judged by one particular metric or an isolated case (a MiG-25 did once shoot down an F/A-18), nor by comparing the merits of individual technical points alone; it usually depends on the level of the underlying basic technology.

In 2009, I can say with certainty, Baidu's search-related basic technology was an order of magnitude behind Google's. As far as I know, Baidu has made rapid progress in basic technology in the years since, but of course Google has been progressing rapidly too. Whether an order-of-magnitude gap remains today, I am not sure.

Here are a few important and publicly available basic technologies:

• Large-scale cluster construction and management. Google's situation is described in [2], The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition. Google has the world's largest computer cluster, and its machine count may exceed every other company's by an order of magnitude. It also has a complete suite of automated management software that lets engineers request and use these hardware resources (roughly, think of an internal Amazon EC2). As far as I know, in how conveniently an ordinary engineer can use hardware resources and in how much of them that engineer can use, Baidu still falls far short.
• Large-scale computing and storage. I will not repeat the story of Google's three classic papers on GFS, MapReduce, and BigTable. In recent years Google's research and development in these areas has not stalled; if anything it is accelerating. Baidu is of course working hard to catch up: it not only uses Hadoop but has made many improvements and extensions on top of it and contributed them back to the Hadoop open source community. Baidu also has considerable experience in SSD storage, for example the flash storage work described in a recent ASPLOS paper, SDF: Software-Defined Flash for Web-Scale Internet Storage Systems.
• Machine learning and artificial intelligence. This is the much-hyped deep learning, Google Brain, and so on. In the relatively new area of deep learning, Baidu is catching up quickly and following closely.

The level of cluster management technology determines how much hardware you can own and use effectively; large-scale computing and storage determine what you can do at scale on that hardware; and finally, the search engine itself is a large-scale machine learning system.

Beyond pure technology, I want to mention something that strongly affects technical progress and where, at least in 2009, the gap between Baidu and Google was huge: the level of tooling available to ordinary engineers. What I like most about Google is that I can easily get a large amount of computing resources and run large-scale data analyses I could not have imagined before. To validate an idea, I can run an analysis over a full day of search logs and get results in just a few minutes (see [3]), then adjust and run the next analysis. Without that basic software and freely available hardware, I might have to wait a whole day for results, or settle for analyzing a small sample. With my own knowledge and skill held constant, Google's systems greatly improved my productivity and let me do things I could not have imagined before.
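To make "an analysis over a full day of search logs" a little more concrete, here is a toy, single-process sketch of the map/reduce style that such jobs are usually written in. It is only an illustration: the log format (`timestamp<TAB>query`), the function names, and the sample data are all assumptions, and a real system would run the map and reduce phases in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit (query, 1) for one log line of the assumed form 'timestamp<TAB>query'."""
    _, query = record.split("\t", 1)
    yield query.strip().lower(), 1


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts emitted for one query."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list of records."""
    pairs = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


# Hypothetical sample of a search log.
LOG = [
    "2009-06-01T08:00:01\tweather beijing",
    "2009-06-01T08:00:02\tMiG-25",
    "2009-06-01T08:00:05\tWeather Beijing",
]

query_counts = run_job(LOG)
print(query_counts)
```

The point of the style is that `map_phase` and `reduce_phase` contain all the analysis logic, so the same two functions could in principle be handed to a distributed framework without change.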

As a technical person, I think bashing or boosting a particular company is meaningless. Technical matters are very direct, and which company you work for should not affect your basic judgment. While I was still at Baidu I often thought that the MiG-25 story is a very good warning: people easily grow complacent over achievements like "double three" and turn a blind eye to the real gap in basic technology, and without progress that outlook is quite dangerous. Fortunately, as far as I know, Baidu has not been such a disappointment.

2014.06.23: Adding a practical example to illustrate how differently the two companies do things under different technical conditions.

A friend in the comments mentioned Baidu's word segmentation technology, which really is a concentrated expression of the slogan "Baidu understands Chinese better". When Baidu does segmentation, it probably goes like this: start with a good human-edited dictionary, run it over some web pages, and analyze the bad cases, where perhaps the text was segmented too finely or Chinese personal names were not split out. Then try adding rules based on Chinese grammar, or adding vocabulary, to fix those cases, and repeat until the results are satisfactory. Once it is live, whenever a new bad case is found, study it and add more rules. Of course there are also automated processes for discovering and confirming new words, such as "difficult to break".

Google's approach to word segmentation is to treat it as a probability problem: if certain characters often appear together in Chinese web pages, they are probably a word. Look at which words tend to follow which, and grammatical structure falls out as well. (See Wu Jun's "The Beauty of Mathematics" for a specific model.) The idea is to throw all the crawled Chinese web pages at MapReduce and compute the parameters. Evaluating the quality of a segmenter is also simple: plug the new model into the web retrieval model and run an experiment to see whether retrieval quality improves. This method works well. It turns basic Chinese word segmentation into a simple problem with little suspense, and it needs essentially no Chinese language experts (so the question of who "understands Chinese better" never arises). The same idea underlies Google Translate. The basic method is very simple and there is no secret to it, but you need that much web data, and you also need a large cluster, a distributed computing framework, reusable models ...
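As a rough illustration of "segmentation as a probability problem", here is a minimal sketch of a unigram segmenter: given word counts, it uses dynamic programming to pick the split whose words have the highest joint probability. This is not Google's actual model. The counts in `COUNTS` are made up (in practice they would come from a large batch job over crawled pages), and the penalty for unseen characters is an arbitrary assumption.

```python
import math

# Hypothetical word counts, standing in for statistics computed over a web corpus.
COUNTS = {"研究": 50, "研究生": 20, "生命": 40, "命": 5, "生": 8, "的": 200, "起源": 30}
TOTAL = sum(COUNTS.values())
MAX_WORD_LEN = max(len(w) for w in COUNTS)
# Assumed penalty for a single character never seen as a word.
UNKNOWN_LOGP = math.log(0.5 / TOTAL)


def word_logp(word):
    """Log-probability of a word under the unigram model."""
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    # Unknown single characters are allowed at a penalty; unknown longer strings are not.
    return UNKNOWN_LOGP if len(word) == 1 else float("-inf")


def segment(text):
    """Return the most probable segmentation of text, found by dynamic programming."""
    n = len(text)
    best = [float("-inf")] * (n + 1)  # best[i]: best log-probability of text[:i]
    back = [0] * (n + 1)              # back[i]: start index of the last word in text[:i]
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            score = best[start] + word_logp(text[start:end])
            if score > best[end]:
                best[end] = score
                back[end] = start
    words = []
    end = n
    while end > 0:
        words.append(text[back[end]:end])
        end = back[end]
    return words[::-1]


print(segment("研究生命的起源"))
print(segment("研究生的起源"))
```

With these toy counts the first sentence is split as 研究 / 生命 / 的 / 起源 rather than 研究生 / 命 / 的 / 起源, because the first split has the higher probability. No grammar rule is involved; changing the counts changes the answer.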

I think that when technology is limited, manually fine-tuning results is a reasonable product philosophy, but that philosophy interacts with the course of technical development. For the 1,000 hot queries at the head of the long tail, manual editing can produce very good results, and it is almost impossible to improve a general machine model in the short term to the point where it matches them. In that situation manual adjustment tends to be encouraged while technical improvement of the general model may not get enough attention. True, even at China's labor costs you can never hand-tune all search results, but isn't it good enough to get the head of the long tail right? Google's mainstream technical thinking is a bone-deep distrust of manual adjustment: whatever it does, it must build an automatic, general, scalable model. At the start, this approach may fall well short of hard-working, down-to-earth editors on those 1,000 hot queries, but by accumulating data and adjusting the model, over time the overall quality of results improves markedly; I have watched this happen to Google's search quality since 2009. This way of thinking is not necessarily right for a specific product. Not everyone has Google's resources to spend time on general technology. But on this road of "technology crushes everything" (wrong?), Google really is moving faster and faster.

----------------------------------------------------------------------------------------------------

Well, since everyone is unanimously looking down on Baidu and cheering for Google, let me step in and set the record straight a little on Baidu's behalf.
To state my position: I acknowledge that Google is very strong in many respects; here I only want to point out some areas where Baidu also does well, to offer you some new angles that I hope will be thought-provoking.

1, In the early history of search, Robin Li's technical innovation was ahead of Google's.
Founded in 1994, InfoSeek launched a search engine service that soon made it the most popular search technology provider on the market. As the technology leader, its product was set as the default search engine in the Netscape browser. Bear in mind that in the American market of that era, Netscape held more than 90% of the browser market. So in Netscape's glory years, search engine = InfoSeek.
--InfoSeek's CTO was William Zhang, who earned a PhD in computer science from the University of California, was the first to achieve a breakthrough in "sub-linear text matching algorithms", and later joined Baidu as chief scientist in 2006.
--InfoSeek's core research engineer was Robin Li, whose pioneering "hyperlink analysis" technology is one of the foundational inventions of modern search engines. It was the first to address how to combine ranking by page quality with ranking by relevance, and it was granted a US patent.
In the early history of search engines, Robin Li and William Zhang were without doubt the technology leaders. At the World Internet Conference in Brisbane in 1998, Robin Li was on the podium presenting the technology, while Google's two founders were still students listening from the audience.
InfoSeek declined not because of its technology but because of its business model: it was merely a technology provider hidden behind the Netscape browser. When Netscape lost to Microsoft's IE, InfoSeek inevitably ran up heavy losses, and after being sold to Disney it failed to adapt to the bureaucratic management style of a traditional company, which hastened its demise.
That same year, Google launched its own search engine, adopted targeted advertising as its business model, solved the problem of sustained growth, and survived to the end of the free-for-all in search. PageRank, the page-rating mechanism Google is so proud of today, was not granted a US patent until 2001, five years after Robin Li's 1996 hyperlink analysis patent.
Note: my point above is that Robin Li's technical level is no lower than that of Google's two founders. If you underestimate Baidu, that is your own ignorance.

2, From the very beginning, Baidu took a different direction of development from Google.
By 2000 Google had already established its dominance of the search industry, and that was the year Robin Li had only just returned to China, at an absolute disadvantage in both capital and talent.
If Baidu had tried to compete with Google on search precision, it would have been a moth flying into a flame, and Robin Li was certainly not going to commit that kind of folly. From the start, Baidu chose to encircle the cities from the countryside and attack from the flank.
In early web search, Baidu's principle was that usable was good enough. So a search on Baidu returned almost nothing but site homepages, while Google returned inner pages.
Baidu's real focus was on providing services that Google could not. In 2002 Baidu pioneered MP3 music search, and in 2003 it launched image search, Tieba, news, and search leaderboards. It was these diversified vertical services that let Baidu come from behind in the Chinese market. If you know Baidu's history, you know that in its early traffic, search and download of pirated MP3 music once accounted for 40-50% of users. Tieba was later a great success as well: during the 2004 Super Girl craze, huge numbers of music fans poured into Tieba to cheer for their idols, and the product brought Baidu more than 20% of its traffic.
MP3, images, and Tieba were Baidu's three most important early services; their combined traffic contribution even exceeded that of web search.

3, Baidu only truly got going in search technology in 2009. That year Baidu launched Box Computing and began to surpass Google in one-stop search for everyday needs.
As an example:

What is the standard for judging whether Baidu's and Google's search technologies are on the same order of magnitude?

Whether two technologies are on the same order of magnitude is not decided by a pile of arcane technical jargon. It is decided by how well each meets the industry's actual needs, which change as the industry develops.
To give an example,
consider the generations of fighter aircraft:
First generation: jet engines
Second generation: high speed, Mach 2
Third generation: agility and maneuverability at low and medium altitude
Fourth generation: stealth

The second generation outperformed the first, flying higher and faster, but by the era of the third generation, what good is it even if you can reach 30,000 meters? That era was about low-altitude dogfighting performance.
By the fourth-generation era, stealth is decisive; however maneuverable you are, you cannot find the enemy.
Today, intelligent UAV technology is the decisive technology, and the rest matters little.

Technology does not develop linearly; it is changes of concept that bring the more revolutionary progress.
Which fights better: a stealth aircraft with mediocre performance, or a third-generation aircraft without stealth? What, you think stealth is not technology and only engine performance counts? Don't be brain-dead, okay?

Back to search engine technology: its development is about nothing more than letting people better find the information they need. What counts is the accuracy of search results, not the number of pages indexed or the number of patents held.
First-generation search engines: represented by Yahoo, with results presented as a directory.
Second-generation search engines: represented by Google and Baidu, with results ranked by web-page citation (link) analysis; differences in how the weights are set do not amount to a generational difference.
Third-generation search engines: I do not know what the dividing standard is. GPS and atomic clock technology? Does it make any revolutionary difference in actual use? I certainly cannot feel one.

Google is very strong at technological innovation. Android, driverless cars, and Google Glass all contain a great deal of innovation, but in search engine technology I feel there has been no obvious progress over the last 5 years.
By contrast, over the 5 years from 2008 to 2013, my experience is that Baidu's progress has been very obvious.

In my view, the most important trend in search engine development today is intelligent semantic understanding, not what the top answer describes as "using atomic clocks and GPS to break the geographic gap and achieve a globally consistent, real-time database".
Indexing more pages or returning results 0.001 seconds faster are trivial differences, like the difference between the iPhone 4 and the iPhone 4S: they may push the limits of new technology, but for ordinary users' actual experience they make little difference.
Many people see no advanced technology in Samsung's large-screen smartphones, yet they better satisfy users' experience and needs, so Samsung's market share keeps growing.

The Box Computing technology Baidu developed is, in my view, better aligned with where the search engine industry is heading.
It lets search better understand your question and give you a more accurate answer:
--No redirection; the answer is presented directly in front of you. What? You think that makes Baidu violate the fair and impartial spirit of search? Use your brain: search exists to serve the user, and whoever gets the user the most correct answer in the shortest time offers the best search experience. When voice search matures in the future and you say "tell me which of Faye Wong's songs are popular", Baidu will directly give you a list of songs ranked by popularity for you to pick and play. Google will first tell you "I have **, **, ** here, with service provided by music companies, please choose"; you follow the link to the music company, and the music company says "please log in as a member first", and after you log in ... Which service is better? Judge for yourself.
--Through Zhidao, Tieba, Baike, and integrated third-party sites, it meets your search needs from many angles at once instead of offering only a single page. If you want to go somewhere, it tells you how to get there by car, train, or plane, how long each takes, what the tickets cost, and where to buy them directly, rather than just giving you a cold third-party link. If you enter a celebrity's name, it shows you related photos, films, the latest news, fan communities, and other celebrities with social ties to them. All of this is very intuitive, unlike Google, which just hands you a Wikipedia entry.
Search should understand not only questions that "equal" a known one but also "equivalent" and similar questions. Isn't that what intelligence means: being understood even when I get the grammar wrong?
On this front Google may have some applications in English, but Baidu goes deeper and does it more completely. From this angle, the two are not merely on the same order of magnitude; Baidu is even ahead of Google in places.

Many people think Baidu's direct answers are merely the result of manual intervention with no technical content. I find that very regrettable.
That may have been true in the early years, but things should have improved a great deal by now.
Let me give you an example.
Take the earlier question about Nicholas Tse's height: why can Baidu give me the answer when Google cannot?
Baidu is surely not so bored as to hand-optimize the answer to a question that detailed. Of course not. These answers are based on what users themselves have written; Baidu simply integrates that knowledge organically and, through program design, cleverly presents it to you.
Yes, Google's search technology is very advanced, but all of its answers are built on external links; it has no knowledge base of its own. It is like a person whose reasoning and knowledge are disconnected: they can give you an answer, but the answer is bound to be stiff and never fully integrated.
Baidu is different. It has its own knowledge bases, Zhidao, Tieba, and Baike, and the knowledge in these three communities can be integrated with its reasoning, so that the highest-voted, most popular answers are presented to you in the most intuitive way.
............
It is precisely on the basis of this organic integration that
Baidu can reason logically: from A infer B, from B infer C, from C infer D, and then present D as the answer to your question.
Google cannot do this; it can only give you B and C, and you have to work out the conclusion yourself.
This is why, when I ask for the height of the father of Cecilia Cheung's son, Baidu can tell me.
Isn't so-called deep learning just the organic integration of a human knowledge base? Google's reasoning and its knowledge base are separate, so its progress here is bound to be slower than Baidu's; by the same principle, one person's brain directing their own hands and feet is bound to be more efficient than two people cooperating.
So in the future competition over search technology, I think Baidu's route is the correct one.

A friend asked: why can you get an answer when you ask about Nicholas Tse's height, but not when I ask about Chen Guanxi or Mao Zedong?
Keep in mind that the answers to questions like these depend entirely on what users have written in the communities. Cecilia Cheung's Baike entry mentions that her son is Xie Xuan; Xie Xuan's entry mentions that his father is Nicholas Tse; and Nicholas Tse's entry mentions that his height is 174. Only then can you search out this answer.
If any one of these links is missing, you will not get the answer.
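The chain described above can be sketched as a multi-hop lookup over a tiny knowledge base. This is only a toy illustration of the idea, not a description of how Baidu actually implements it; the data structure, the relation names, and the `follow` helper are all assumptions made for the example.

```python
# Toy knowledge base: entity -> {relation: value}. The entries mirror the example
# in the text; the structure and relation names are illustrative assumptions.
KB = {
    "Cecilia Cheung": {"son": "Xie Xuan"},
    "Xie Xuan": {"father": "Nicholas Tse"},
    "Nicholas Tse": {"height": 174},
}


def follow(kb, start, relations):
    """Follow a chain of relations from start; return None if any link is missing."""
    current = start
    for relation in relations:
        entry = kb.get(current)
        if entry is None or relation not in entry:
            return None  # a missing link anywhere breaks the whole chain
        current = entry[relation]
    return current


# "The height of the father of Cecilia Cheung's son"
answer = follow(KB, "Cecilia Cheung", ["son", "father", "height"])
print(answer)

# No entry was written for this person, so there is nothing to return.
missing = follow(KB, "Chen Guanxi", ["height"])
print(missing)
```

The second call shows why coverage depends entirely on what has been written: the lookup logic is identical, but with no entry to start from the chain cannot be followed.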
Whether the answer is correct depends on the knowledge base's own self-improvement and error correction. Suppose you ask "Is Baidu a big SB?" and the highest-voted answer on Baidu Zhidao tells you that Baidu is a big SB. Does that show the answer was manually planted by Baidu? Obviously not; it only shows that there are too many SB users.
It is the same with answers to any question: when the question is first raised, the highest-voted answer may be wrong, but that does not matter. Gradually a new, correct answer gets voted to the top, and the wrong one gets collapsed or corrected.
The human knowledge base is constantly improving and growing richer, and Baidu's algorithms are constantly improving too. These technologies are still at an early stage, so please view them with an eye to their development.

PS: To those who say I am biased toward Baidu:
Under this question, aren't there already enough answers beating the drum for Google and loudly singing its praises? Can't you tolerate one dissenting voice? Fine, I will shout along too: "Google forever, may it unify the realm!" Satisfied?!

To those who say I do not understand Google's keyword search techniques:
How could I not? But if a search engine understands only when you hit exactly the right keywords, and stops understanding as soon as you change a word or a grammatical construction, isn't that level of language understanding rather poor? Are you sure you are not the one bashing Google?!

Finally, as an investor who has been observing and studying the Internet industry for more than 15 years, I can tell you plainly:
The technical direction of the search engine industry is bound to be "natural-language human-machine dialogue + intelligent logical reasoning". Don't tell me that Baidu hands you answers while Google makes you think. People are lazy by nature; they want products to adapt to their needs, not to adapt themselves to a product's requirements.
In this respect, the gap between Baidu's level of technical development and Google's is narrowing.
Bear in mind that technical research and development comes from investing money and gathering talent. After 2008 Baidu gradually built up some capital and began to increase its R&D investment in search technology, while Google shifted its R&D focus to other areas. This is the main reason the two companies' levels of search technology are converging.
