Because pictures interpret events vividly and communicate information conveniently, this work automatically extracts data from a large number of data-intensive news Web pages and organizes it into a complete picture-set structure for presentation to the user. Dynamic pages are extracted and parsed using page templates, and the parsed content is converted into the corresponding data structures. Data from different websites are then weighted and ranked against the corresponding criteria using cosine correlation. Given the volume of news data and the number of users, the system is built on the Hadoop distributed platform to achieve high scalability. This article describes the design and implementation of the system in detail and reports the results of its operation in the Baidu Information picture column.
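The abstract does not say which vectors are compared or how the per-site criteria are applied, so the following is only a minimal sketch of cosine-correlation ranking under stated assumptions. It assumes each news item is represented as a bag-of-words term-frequency vector, that items are scored against a reference text (for example, a topic description), and that each website carries a hypothetical weight multiplied into the score. The names `tokenize`, `rank_items`, and `site_weights` are illustrative, not taken from the described system.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple


def tokenize(text: str) -> List[str]:
    # Hypothetical tokenizer: lowercase plus whitespace split. Real Chinese
    # news text would need a proper word segmenter instead.
    return text.lower().split()


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as unrelated
    return dot / (norm_a * norm_b)


def rank_items(
    reference: str,
    items: Dict[str, Tuple[str, str]],
    site_weights: Optional[Dict[str, float]] = None,
) -> List[Tuple[str, float]]:
    """Score each (site, text) item against the reference text and sort.

    Assumed scoring rule: cosine similarity multiplied by a per-site weight
    (default 1.0). Results are sorted by descending score, then by item id
    so that ties are ordered deterministically.
    """
    weights = site_weights or {}
    ref_vec = Counter(tokenize(reference))
    scored = []
    for item_id, (site, text) in items.items():
        sim = cosine_similarity(ref_vec, Counter(tokenize(text)))
        scored.append((item_id, weights.get(site, 1.0) * sim))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

For instance, against the reference "flood rescue city", an item "city flood rescue teams" shares three terms and scores 3 / (√3 · 2) = √3/2 ≈ 0.866, while an unrelated item such as "stock market news" scores 0. Because each item is scored independently of the others, this kind of computation would also partition naturally across a Hadoop-style map phase, though the abstract does not describe how the authors distribute it.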