Original: http://www.google.com.hk/ggblog/googlechinablog/2006/07/12_4010.html
Google News gathers and categorizes news automatically. Classifying news means grouping similar articles into the same class. A computer cannot actually read the news; it can only compute, and do so quickly. So we need to design an algorithm that computes the similarity of any two news articles, and to do that we first need a way to describe an article with a set of numbers.
For every content word in a news article, we can compute its term frequency/inverse document frequency value (TF-IDF). It is not hard to see that the content words closely related to the article's topic occur frequently, so their TF-IDF values are large. We then arrange these TF-IDF values according to the words' positions in the vocabulary. Suppose, for example, the vocabulary contains 64,000 words:
Word number   Word
------------------
1             o
2             .
3             fools
4             aunt
...
789           clothing
...
64000         affectation
Suppose that in one particular news article the TF-IDF values of these 64,000 words are:

Word number   TF-IDF value
==========================
1             0
2             0.0034
3             0
4             0.00052
5             0
...
789           0.034
...
64000         0.075
If a word from the vocabulary does not appear in the article, its corresponding value is zero. These 64,000 numbers then form a 64,000-dimensional vector. We use this vector to represent the article and call it the article's feature vector. If the feature vectors of two news articles are similar, the corresponding articles are similar in content and should be grouped into the same class, and vice versa.
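The construction above can be sketched in a few lines of Python. This is only an illustration, not Google's actual implementation: the function and variable names (`tf_idf_vector`, `vocab`, `doc_freq`) are my own, and I assume a plain TF x IDF weighting with IDF = log(N / document frequency), which the article does not spell out.

```python
import math
from collections import Counter

def tf_idf_vector(doc_words, vocab, doc_freq, n_docs):
    """Map an article (a list of words) to a vector indexed by vocabulary position.

    vocab:    word -> position in the vocabulary (0 .. V-1)
    doc_freq: word -> number of corpus documents containing the word
    n_docs:   total number of documents in the corpus
    Words from the vocabulary that are absent from the article keep value 0,
    exactly as in the 64,000-entry example above.
    """
    counts = Counter(w for w in doc_words if w in vocab)
    total = sum(counts.values()) or 1          # guard against empty articles
    vec = [0.0] * len(vocab)
    for word, c in counts.items():
        tf = c / total                          # term frequency within the article
        idf = math.log(n_docs / doc_freq.get(word, 1))
        vec[vocab[word]] = tf * idf
    return vec
```

With a tiny three-word vocabulary, a word that appears often but in few documents gets the largest weight, and a vocabulary word missing from the article stays at zero.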
Anyone who has studied vector algebra knows that a vector is really a directed segment in multidimensional space. If two vectors point in the same direction, that is, the angle between them is close to zero, then the two vectors are similar. To judge whether two vectors point the same way, we use the law of cosines to compute the angle between them.
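The angle computation reduces to the standard cosine formula, cos(theta) = (a . b) / (|a| |b|): a value near 1 means the angle is near zero and the two articles are similar. A minimal sketch (the zero-vector guard is my own addition for articles with no vocabulary words):

```python
import math

def cosine_similarity(a, b):
    """cos(theta) between two equal-length feature vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))      # a . b
    na = math.sqrt(sum(x * x for x in a))       # |a|
    nb = math.sqrt(sum(y * y for y in b))       # |b|
    if na == 0.0 or nb == 0.0:
        return 0.0                              # an all-zero vector matches nothing
    return dot / (na * nb)
```

Parallel vectors give a similarity of 1 (angle 0), orthogonal vectors give 0 (angle 90 degrees), regardless of the vectors' lengths.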