call them search engines. There are two main categories:
1. Meta search engines. Such search engines generally do not have their own web robots or databases; their search results are displayed on the same interface in a unified format by calling, controlling, and optimizing
use a database LIKE query, it will definitely consume a lot of resources. I would like to ask how mature systems achieve this. How do I publish different results to different subscribers?
Please give a detailed description of the problem.
Where does this news come from? UGC? Crawling?
How many news items are there? Millions? Tens of millions?
Take crawled information and news at a scale of tens of millions of records as an example. The work can be roughly divided into the following parts:
1. After crawling, a classification module tags each news item, and the item is then stored.
2. For fast access, keep an in-memory data structure that maps each tag to an array of unique news IDs; you can use Redis or write a standalone service.
3. A table maintains users and the topics they subscribe to.
4. When the user sends a request to fetch messages, go to
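A minimal sketch of steps 1 to 3, plus one plausible completion of step 4 (which is cut off above). Assumptions: a plain in-memory dict stands in for Redis, the names SubscriptionFeed, store, subscribe, and fetch are hypothetical, tags are assumed to come from an upstream classifier, and news IDs are assumed to increase over time so that sorting by ID returns the newest items first.

```python
from collections import defaultdict


class SubscriptionFeed:
    """Hypothetical in-memory stand-in for the tag -> news-ID index.

    A dict of lists replaces Redis here (an assumption) so the sketch runs
    without external services.
    """

    def __init__(self):
        self.news = {}                         # news_id -> stored article text
        self.tag_index = defaultdict(list)     # step 2: tag -> array of news IDs
        self.subscriptions = defaultdict(set)  # step 3: user -> subscribed topics

    def store(self, news_id, text, tags):
        # Step 1: `tags` is assumed to be produced by the classification module.
        self.news[news_id] = text
        for tag in tags:
            self.tag_index[tag].append(news_id)

    def subscribe(self, user, topic):
        # Step 3: record one row of the user/topic subscription table.
        self.subscriptions[user].add(topic)

    def fetch(self, user, limit=10):
        # Step 4 (assumed completion): merge the ID arrays of all subscribed
        # topics, drop duplicates, and return the largest (newest) IDs first.
        ids = set()
        for topic in self.subscriptions[user]:
            ids.update(self.tag_index.get(topic, []))
        return sorted(ids, reverse=True)[:limit]
```

Only IDs are returned; the caller would then load article bodies from storage, which keeps the per-request work proportional to the user's topics rather than to a LIKE scan over the whole news table.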
is increased by 100%, that is, doubled.
If a web page is updated 5 times in a row, the configured crawl interval is shortened to 1/2 of the original.
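The adaptive recrawl rule can be sketched as follows. Only the halving after 5 consecutive updates comes from the text; the starting interval, the minimum and maximum bounds, and the assumption that a fetch finding no change doubles the interval (the condition behind the "doubled" rule is cut off) are illustrative choices.

```python
class RecrawlSchedule:
    """Adaptive recrawl interval for a single URL (illustrative sketch)."""

    def __init__(self, interval_hours=24.0, min_hours=1.0, max_hours=24.0 * 30):
        self.interval = interval_hours
        self.min_hours = min_hours  # hypothetical lower bound
        self.max_hours = max_hours  # hypothetical upper bound
        self.consecutive_updates = 0

    def record_fetch(self, changed):
        """Update the schedule after a fetch and return the new interval."""
        if changed:
            self.consecutive_updates += 1
            if self.consecutive_updates == 5:
                # Updated 5 times in a row: shorten the interval to 1/2.
                self.interval = max(self.min_hours, self.interval / 2)
                self.consecutive_updates = 0
        else:
            # Assumed rule: the page did not change, so double the interval.
            self.consecutive_updates = 0
            self.interval = min(self.max_hours, self.interval * 2)
        return self.interval
```

The bounds keep a very active page from being polled continuously and a dormant page from being forgotten entirely.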
Note that efficiency is one of the keys to winning.
4. "How deep should the crawl go?"
It depends. If you are a heavyweight with tens of thousands of servers running web crawlers, I advise you to skip this.
If, like me, you have only one server for web crawling, then you should know the following statistics:
Page depth : Number of pages : Page importance
0 : 1 : 10
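On a single server, one way to cap the crawl is a breadth-first traversal that records each page's depth and stops expanding links at a chosen maximum, so shallow pages (such as the depth-0 home page above) are fetched first. A sketch, where get_links is a hypothetical callback standing in for fetching and parsing a page, not a real library API.

```python
from collections import deque


def crawl_bfs(start_url, get_links, max_depth):
    """Breadth-first crawl that never goes deeper than max_depth.

    get_links(url) -> list of outgoing URLs (hypothetical callback).
    Returns {url: depth} for every page visited.
    """
    depths = {start_url: 0}  # depth 0 is the start (home) page
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if depths[url] == max_depth:
            continue  # page is fetched, but its links are not followed
        for link in get_links(url):
            if link not in depths:  # skip pages already seen
                depths[link] = depths[url] + 1
                queue.append(link)
    return depths
```

Usage with a small in-memory link graph: `crawl_bfs("/", lambda u: graph.get(u, []), 2)` visits the home page and everything up to two clicks away.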
time to reach deeply nested HTML files.
4. Collection policy
Some web pages can be collected through user submission. For example, some commercial websites apply to search engines for indexing; the collector can then directly collect the pages of the submitted website and add them to the search engine's index database
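User submission can be folded into the crawler's URL queue so that submitted sites are collected before links found by ordinary discovery. A sketch; the class name CrawlFrontier and the two-level priority scheme are assumptions, since the text only says submitted sites can be collected directly.

```python
import heapq
import itertools


class CrawlFrontier:
    """URL queue in which user-submitted sites jump ahead of discovered links."""

    SUBMITTED, DISCOVERED = 0, 1  # lower value is popped first

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # FIFO tie-break within a priority
        self._seen = set()

    def add(self, url, submitted=False):
        # A URL already queued is ignored (it is not re-prioritised).
        if url in self._seen:
            return
        self._seen.add(url)
        priority = self.SUBMITTED if submitted else self.DISCOVERED
        heapq.heappush(self._heap, (priority, next(self._counter), url))

    def pop(self):
        """Return the next URL to collect, or None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```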
When formally studying SEO, you also need to learn how search engines work. After all, SEO targets how search engines operate; once you understand their working principles, when you run into problems you can
Getting listed in a directory index is basically a one-time deal: you do not need to submit your page repeatedly to the same directory category, and directory indexes do not allow it. Search engines are different. Because a large number of new web pages join the competition every day, your leading position is easily taken by latecomers. In addition, the ranking rules of
Search Engine
Author: Sand Rain
Editor's note: This is an excellent programming tutorial. It not only analyzes the principles of search engines in detail, but also presents the author's own ideas for building a search engine in PHP. The whole article is written in plain language, and I believe both experts and beginners can draw a lot of inspiration from it.
Here is a summary of what I learned and experienced while studying and developing a search engine. The article covers the outline and details of modules such as the spider, word segmentation, indexing, and querying. I hope it offers a little help to search engine beginners; for those who can
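The segmentation, indexing, and query modules named above can be illustrated with a toy pipeline. This is not the author's PHP code: it swaps in forward maximum matching, a classic dictionary-based word segmentation method, plus a plain inverted index answering AND queries. The dictionary and the space-free English test strings (standing in for unsegmented Chinese text) are assumptions.

```python
from collections import defaultdict


def fmm_segment(text, dictionary, max_len):
    """Forward maximum matching: at each position take the longest
    dictionary word (up to max_len chars), else a single character."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def build_index(docs, dictionary, max_len):
    """Inverted index: token -> set of IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in fmm_segment(text, dictionary, max_len):
            index[token].add(doc_id)
    return index


def search(query, index, dictionary, max_len):
    """AND query: IDs of documents containing every query token."""
    tokens = fmm_segment(query, dictionary, max_len)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result
```

Because lookups go through the index instead of scanning every document, query cost depends on the posting lists touched, not on the size of the collection.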
engines and want to block most collectors. What the collector will do: build a module that simulates the user's login and form-submission behavior.
6. Use a scripting language for pagination (hidden paging). Analysis: as said before, search engine crawlers do not analyze the hidden pages of every kind of website, which affects the
In the documentary "Google and the World Brain," K.K says that in Google's early start-up days he asked Larry Page why they were building another search engine when well-performing ones already existed. "Rather than developing a new search engine, we are going to do artificial intelligence," Larry Page explained.
Crystallization of Technology and the Humanities
-- Search Engine Technology
■Recreation
Faced with a vast ocean of information, people are often at a loss. Internet search engines arrived like a boat, carrying us freely across that ocean.
Directory
A directory is a manually edited collection of search results. Most directories rely on manual submission rather than spiders. (See SEO and search engines.)
Keywords, key terms, and key phrases (Keyword, Keyterm, and Keyphrase)
Keywords, key terms, and key phrases are the words for which a website ranks on the search
After building a site, the first problem a webmaster encounters is getting search engines to include it. In my experience, the first choice for getting a site included is to submit it to the search engines
business card or phone number, so that she takes a liking to you and has the chance to get to know you better.
2. b) Improve your site's content, then actively submit your site information to search engines so that they crawl your site and have the chance to include it.
Comment on a: compared with the method of being introduced by a friend
Vertical search engines are a new search service model proposed to address the shortcomings of general search engines, such as information overload, imprecise queries, and insufficient depth. They provide valuable information and related services for a specific domain, a specific group of people, or a specific need. Compared with general