The Different Stages of Search Architects

Source: Internet
Author: User

From: http://blog.csdn.net/soso_blog/archive/2010/07/01/5706555.aspx

Search technology is being applied more and more widely, and more and more people are becoming familiar with it. I have recently met many people who work in search. Judging by how well they understand search, they fall into several distinct stages.

The first stage is to use an open-source standalone search system (most commonly Lucene), add a broker and a cache on top of it, and build an application on top of that. People at this stage generally have some understanding of how Lucene works and of basic search principles, and many have even read Lucene's source code. This is also the most convenient and most common way for people to get started with search engines. However, such a system is generally suitable only for modest data volumes (on the order of tens of millions), and its concurrent performance tops out in the millions. The advantage is a short development cycle, and people with these skills are easy to find on the market. Combined with a few good ideas and simple data mining methods (classification, clustering, collaborative filtering, user behavior analysis, and so on), a prototype system can be built quickly, which is enough to meet the technical needs of small companies in the early start-up phase.
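The broker-plus-cache layering described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `make_backend` is a hypothetical stand-in for one standalone search node (it is not Lucene and does not use Lucene's API), and the scoring is a naive term count.

```python
from collections import OrderedDict
import heapq


class LRUCache:
    """Tiny least-recently-used cache for query results."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used


class Broker:
    """Fans a query out to every backend and merges the top-k hits."""

    def __init__(self, backends, cache):
        self.backends = backends  # callables: (query, k) -> [(score, doc_id), ...]
        self.cache = cache

    def search(self, query, k=3):
        key = (query, k)
        cached = self.cache.get(key)
        if cached is not None:
            return cached  # cache hit: no backend is contacted
        hits = []
        for backend in self.backends:
            hits.extend(backend(query, k))
        top = heapq.nlargest(k, hits)  # highest score first
        self.cache.put(key, top)
        return top


def make_backend(docs):
    """Hypothetical stand-in for one standalone search node."""
    calls = {"count": 0}

    def backend(query, k):
        calls["count"] += 1
        scored = [(text.lower().split().count(query.lower()), doc_id)
                  for doc_id, text in docs.items()]
        return heapq.nlargest(k, [hit for hit in scored if hit[0] > 0])

    backend.calls = calls  # exposed so the cache effect can be observed
    return backend


# Usage: two nodes behind one broker; the second identical query is served from cache.
shard_a = make_backend({"a1": "search engine basics", "a2": "cooking tips"})
shard_b = make_backend({"b1": "search search ranking", "b2": "travel notes"})
broker = Broker([shard_a, shard_b], LRUCache(capacity=100))
first = broker.search("search")
second = broker.search("search")
```

In a real deployment the backends would be remote nodes queried in parallel, and the cache would need an expiry policy so that stale results are not served after the index changes.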

The second stage is self-developed vertical (domain-specific) search. At this stage the data volume usually reaches hundreds of millions. With an open-source system, the cost-effectiveness no longer meets requirements and the number of servers needed exceeds what can be sustained, so teams develop their own search system, most often a memory-based one. People at this stage have a deep understanding of search engine principles and can develop simple search applications on their own. Many of them have grown out of the first stage, and they understand every part of search, including word segmentation, index creation, index updates, application construction, the broker system, the cache system, and simple ranking policies. For most search systems, people at this level can complete the system design and development.
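As a rough illustration of the components listed above (word segmentation, index creation, index updates, and a simple ranking policy), here is a minimal in-memory inverted index. It is a sketch under stated assumptions, not how any particular production system works: `tokenize` is a naive placeholder (Chinese text would need a real word segmenter), and ranking by summed term frequency is just one simple policy.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Naive word segmentation: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InMemoryIndex:
    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_terms = {}                # doc_id -> Counter of terms, kept for updates

    def add(self, doc_id, text):
        """Index a document; re-adding an existing doc_id replaces it."""
        self.remove(doc_id)
        counts = Counter(tokenize(text))
        self.doc_terms[doc_id] = counts
        for term, tf in counts.items():
            self.postings[term][doc_id] = tf

    def remove(self, doc_id):
        """Drop a document's postings; unknown doc_ids are ignored."""
        for term in self.doc_terms.pop(doc_id, {}):
            del self.postings[term][doc_id]
            if not self.postings[term]:
                del self.postings[term]

    def search(self, query, k=10):
        """Rank by summed term frequency; ties are broken by doc_id."""
        scores = Counter()
        for term in tokenize(query):
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += tf
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return [doc_id for doc_id, _ in ranked[:k]]


# Usage: build the index, query it, then update one document in place.
index = InMemoryIndex()
index.add(1, "Search engine index")
index.add(2, "Index update and index creation")
before_update = index.search("index")
index.add(1, "Cache system")
after_update = index.search("index")
```

Keeping a per-document term list makes updates cheap to express, at the cost of extra memory; that trade-off is one of the design decisions a self-developed in-memory system has to make.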

In the first two stages, the data usually comes from targeted crawling, with content parsed and extracted using templates. The requirements on service stability are not especially high, and index updates cannot yet be applied seamlessly.

The third stage is web search (general search). There are fewer people at this stage, and fewer still who understand web search as a whole. They are concentrated mainly in large search companies such as Baidu, Google, and Sogou.

People who understand general search are concentrated in the large search companies for several reasons. First, few other companies have the resources to do web search, so it is hard to gain that kind of work experience elsewhere; without having faced such a challenge, it is hard to imagine how difficult it is. Second, even a company that does build a web search engine struggles to get feedback from users, and without user feedback data a web search engine is incomplete. Third, web search teams are usually very large, at Baidu, Google, and similar companies alike, so most people work on only a small part of web search and have little understanding of, or reflection on, web search as a whole.

People at this stage can tackle the challenges that general search faces: how to return results to users as quickly as possible with limited resources (performance metrics), how to pick up trending content on the Internet promptly and show it to users (freshness metrics), how to include as many useful pages from the Internet as possible (coverage metrics), and how to put the results users care about most at the top (relevance metrics). There are also many metrics related to usability and presentation. These are the most important metrics for evaluating general search, and each one poses a major challenge. People at this stage generally have their own solutions for some of these metrics.
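To make the metrics above a little more concrete, here is a minimal sketch of three simple proxies: a nearest-rank latency percentile for performance, the share of known useful URLs that are indexed for coverage, and precision at k for relevance. The function names and the choice of proxies are illustrative assumptions; the original text does not say how these metrics are actually computed at any search company.

```python
def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer from 1 to 100."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct/100 * n
    return ordered[max(rank, 1) - 1]


def coverage(indexed_urls, useful_urls):
    """Fraction of the known useful URLs that are present in the index."""
    useful = set(useful_urls)
    if not useful:
        return 0.0
    return len(useful & set(indexed_urls)) / len(useful)


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are judged relevant."""
    top = ranked_ids[:k]
    if not top:
        return 0.0
    return sum(1 for doc_id in top if doc_id in relevant_ids) / len(top)


# Usage with made-up numbers (illustrative only, not measurements).
latencies_ms = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
p90 = percentile(latencies_ms, 90)
cov = coverage(["u1", "u2", "u3"], ["u2", "u3", "u4", "u5"])
p_at_2 = precision_at_k(["a", "b", "c", "d"], {"a", "c"}, 2)
```

Freshness could be measured in a similar spirit, for example as the delay between a page being published and it becoming searchable, but that requires timestamps that this sketch does not model.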

The fourth stage is people with strong design and architecture capabilities for web search systems. They have a deep understanding of search system performance and their own solutions for it, as well as cross-data-center solutions, the application of underlying storage operations in search, highly available and flexible support for relevance experiments, efficient and flexible data mining platforms, interfaces, and solutions, and highly scalable systems with flexible service capabilities. Such people are rare on the market, and everyone competes for them.

 

These are some observations from more than a decade of working in search. Discussion is welcome.
