Design and construction of a distributed JS parsing system
Tong of Beijing Jiaotong University
This paper covers two research directions. The first is extracting and parsing the JavaScript embedded in Web pages effectively. The second is combining this with Hadoop distributed computing: existing task scheduling algorithms are analyzed against the actual conditions of this system, and a task scheduling algorithm is designed for the system's distributed computing environment, so that JavaScript parsing tasks are scheduled reasonably and the JavaScript fragments contained in a page are parsed efficiently. For the first module, the JavaScript grammar rules and the forms JavaScript takes in Web pages are studied, an extraction process and algorithm are designed, and a JavaScript parsing environment is constructed with reference to the JavaScript parsing engines used in browsers. For the second, existing MapReduce task scheduling algorithms are studied, analyzed, and compared, and, taking into account the specific features of the JavaScript parsing task and the distributed cluster environment, the MapReduce task scheduling algorithm best suited to this system is explored. The JavaScript parsing tasks are then scheduled accordingly, a computer cluster is built, and the distributed JavaScript parsing system is constructed. Finally, the distributed JS parsing system is tested, its performance and parsing accuracy are validated, and its shortcomings are summarized. The system implemented in this paper can parse the large volume of JavaScript in Web pages quickly and efficiently. The experimental results show that it extracts and parses the text and hyperlinks contained in JavaScript in Web pages efficiently and accurately. The research and implementation in this paper can therefore provide more efficient and reliable technical support for search engines, public opinion analysis, and data collection.
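To make the extraction step concrete, the following is a minimal sketch of collecting JavaScript in the forms it commonly takes in a Web page: inline script bodies, external script references, event-handler attributes, and `javascript:` hyperlinks. It is not the paper's algorithm, which the abstract does not spell out. The class name `ScriptExtractor`, the function `extract_javascript`, and the four result lists are hypothetical names chosen for illustration, and the sketch uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class ScriptExtractor(HTMLParser):
    """Hypothetical helper: collects JavaScript found in an HTML page."""

    def __init__(self):
        super().__init__()
        self.inline_scripts = []   # bodies of <script>...</script>
        self.external_srcs = []    # values of <script src="...">
        self.event_handlers = []   # (tag, attribute, code) for on* attributes
        self.js_urls = []          # code taken from href="javascript:..."
        self._in_script = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        for name, value in attrs:
            if value is None:
                continue
            if tag == "script" and name == "src":
                self.external_srcs.append(value)
            elif name.startswith("on"):
                self.event_handlers.append((tag, name, value))
            elif name == "href" and value.strip().lower().startswith("javascript:"):
                self.js_urls.append(value.strip()[len("javascript:"):])
        if tag == "script":
            self._in_script = True
            self._buffer = []

    def handle_data(self, data):
        # Script text may arrive in several chunks, so buffer it.
        if self._in_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            code = "".join(self._buffer).strip()
            if code:
                self.inline_scripts.append(code)
            self._in_script = False


def extract_javascript(html):
    """Parse one page and return the extractor holding the collected fragments."""
    parser = ScriptExtractor()
    parser.feed(html)
    parser.close()
    return parser
```

In a distributed setting, a function like `extract_javascript` could serve as the per-page work done inside a map task, with the collected fragments passed on to the parsing engine. That division of work is an assumption made here for illustration, not a detail stated in the abstract.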