In web-page analysis for crawling, the complexity of a crawler program is largely determined by the target website's degree of specialization, that is, how professionally and consistently the site is built.
The impact shows up mainly in two aspects:
(1) Impact on determining the crawl workflow
The crawl workflow is determined by analyzing the website's map. On a site with a low degree of specialization, the category and list pages are connected by plain links, so the desired links are easy to obtain when the workflow is implemented. On a highly specialized site, the lists under a category are usually reached through a search, and the matching records are paginated by JavaScript (including Ajax). To implement the workflow, the programmer has to analyze what certain JavaScript functions do in order to reach the list URLs, for example, how the search form is submitted and how page turning is handled; a sketch of this follows.
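Below is a minimal sketch of the "search form plus paginated results" case. Everything site-specific here is an assumption: the endpoint, parameter names, and JSON shape are made up for illustration and would in practice be recovered by reading the site's JavaScript (the functions wired to the search button and the pager).

```python
# Sketch: replay a search form and walk its Ajax-paginated results.
# BASE, SEARCH_ENDPOINT, and all parameter/field names are hypothetical.
import requests

BASE = "https://example.com"            # hypothetical target site
SEARCH_ENDPOINT = f"{BASE}/api/search"  # found by inspecting the Ajax calls

def iter_list_urls(keyword: str):
    """Yield detail-page URLs by replaying the search and page turns."""
    page = 1
    while True:
        # Do what the site's JavaScript search function does: an Ajax GET
        # with the keyword and a page number (names are assumptions).
        resp = requests.get(
            SEARCH_ENDPOINT,
            params={"q": keyword, "page": page},
            timeout=10,
        )
        resp.raise_for_status()
        data = resp.json()

        for item in data.get("results", []):
            yield BASE + item["url"]    # assumed JSON field

        # The pager usually exposes a total page count or a "has more" flag.
        if page >= data.get("total_pages", page):
            break
        page += 1

for url in iter_list_urls("laptop"):
    print(url)
```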
(2) Impact on extracting detailed information
Extracting the detailed information is based almost entirely on the structure of the detail page.
On highly specialized websites, real effort has gone into preparing the detail pages: as a rule, specific information is displayed at specific locations on the page, so the target data can be retrieved reliably during parsing, as the sketch below shows.
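A sketch of extraction from a well-templated detail page follows. The CSS selectors are assumptions standing in for whatever the real template uses; the point is that on a specialized site one fixed selector per field covers every page.

```python
# Sketch: parse a templated detail page with one fixed selector per field.
# The selectors and field names are hypothetical.
import requests
from bs4 import BeautifulSoup

FIELDS = {
    "title": "h1.product-title",
    "price": "span.price",
    "description": "div.description",
}

def parse_detail(url: str) -> dict:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    record = {}
    for field, selector in FIELDS.items():
        # Each field sits at a fixed, predictable location on every page.
        node = soup.select_one(selector)
        record[field] = node.get_text(strip=True) if node else None
    return record
```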
On websites with a lower degree of specialization, most detail-page content is edited directly in an HTML editor, so the detail pages of different categories within the same channel vary greatly, which makes the extraction program considerably more complex; see the sketch after this paragraph.
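The low-specialization case tends to force a fallback structure like the sketch below: several candidate patterns per field, tried in order, with each newly discovered page variant adding yet another entry. All selector names here are made up for illustration.

```python
# Sketch: parse hand-edited detail pages by trying candidate selectors
# per field and keeping the first match. All selectors are hypothetical.
from bs4 import BeautifulSoup

CANDIDATES = {
    "title": ["h1.title", "div.head > b", "td.name"],
    "price": ["span.price", "font[color=red]", "td.cost"],
}

def parse_messy_detail(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    record = {}
    for field, selectors in CANDIDATES.items():
        record[field] = None
        for selector in selectors:
            # Pages differ from one another, so walk the known variants
            # until one of them matches this particular page.
            node = soup.select_one(selector)
            if node:
                record[field] = node.get_text(strip=True)
                break
    return record
```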
My personal preference: I would rather analyze a site with a higher degree of specialization, because the workflow can always be determined and implemented to solve the problem. If the detail pages come in too many structural variants, extracting the information simply costs too much!