Information-Type News-Set Graph System
Jiangholiang Zhouchun
Because pictures are highly expressive in conveying events and easy to share, we study how to automatically extract data from large numbers of data-intensive news web pages and organize it into a complete picture-set structure to present to users. Dynamic pages are extracted and parsed using page templates, and the parsed content is converted into the corresponding data structures. Data from different websites are then weighted and ranked by cosine correlation according to the corresponding criteria. To handle the large volume of news data and the large number of users, the system is built on the Hadoop distributed platform, which gives it high scalability. This article describes the design and implementation of the system in detail and reports the results of its operation, whose output is published in the picture column of Baidu Information.
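The abstract does not specify what a page template looks like. As a minimal sketch, assuming each site's template is a set of regular expressions that map field names to fragments of that site's HTML layout, template-driven extraction into a data structure could look like the following. The template contents, the `GalleryItem` type, and the `extract` function are all hypothetical, and the sketch works on HTML that has already been fetched; it does not render dynamic (script-generated) content.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GalleryItem:
    """Hypothetical target structure for one extracted news picture set."""
    title: str
    images: List[str] = field(default_factory=list)
    captions: List[str] = field(default_factory=list)


# Hypothetical per-site templates: each maps a field name to a regex whose
# first group captures that field's value in the site's page layout.
TEMPLATES: Dict[str, Dict[str, str]] = {
    "example-news": {
        "title": r'<h1 class="title">(.*?)</h1>',
        "image": r'<img class="photo" src="(.*?)"',
        "caption": r'<p class="caption">(.*?)</p>',
    },
}


def extract(site: str, html: str) -> GalleryItem:
    """Apply the site's template to raw HTML and build a GalleryItem."""
    tpl = TEMPLATES[site]
    title_match = re.search(tpl["title"], html, re.S)
    return GalleryItem(
        # First title match only; empty string if the template finds none.
        title=title_match.group(1).strip() if title_match else "",
        # All image URLs, in page order.
        images=re.findall(tpl["image"], html),
        # All captions, in page order, with surrounding whitespace removed.
        captions=[c.strip() for c in re.findall(tpl["caption"], html, re.S)],
    )
```

A production extractor would more likely use an HTML parser than regular expressions, which break easily when a site changes its markup; regexes are used here only to keep the sketch short and dependency-free.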
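The abstract states only that data from different sites are weighted and ranked by cosine correlation. One plausible reading, sketched here under stated assumptions, is to represent each item's text as a term-frequency vector, score it by its cosine similarity to a topic vector, cos(a, b) = (a · b) / (‖a‖ ‖b‖), scale that score by a per-site weight, and sort. The bag-of-words representation, the `site_weight` table, and all function names are hypothetical, not taken from the paper.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def term_vector(text: str) -> Counter:
    """Bag-of-words term-frequency vector (an assumed representation)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    # Counter returns 0 for missing terms, so absent words add nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_items(
    topic: str,
    items: List[Tuple[str, str]],
    site_weight: Dict[str, float],
) -> List[Tuple[float, str, str]]:
    """Score each (site, text) item by cosine similarity to the topic,
    multiplied by a hypothetical per-site weight (default 1.0), and
    return (score, site, text) tuples sorted from best to worst."""
    topic_vec = term_vector(topic)
    scored = []
    for site, text in items:
        score = cosine(topic_vec, term_vector(text)) * site_weight.get(site, 1.0)
        scored.append((score, site, text))
    # Stable sort: items with equal scores keep their input order.
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored
```

For example, with the topic "city flood", items from `site-a` ("flood rescue in city"), `site-b` ("stock market news"), and `site-c` ("city flood photos"), and weights `{"site-a": 1.0, "site-c": 0.8}`, the ranking is site-a (about 0.707), then site-c (about 0.653), then site-b (0.0). The per-site weight is where source-specific criteria such as reliability or picture quality could enter; the paper's actual criteria are not given in the abstract.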