TF-IDF: A Feature-Weighting Technique for Information Retrieval and Data Mining

Source: Internet
Author: User

TF-IDF is the product TF × IDF, where TF is term frequency and IDF is inverse document frequency. TF measures how often term t appears in document d. The idea behind IDF is that the fewer documents contain t (that is, the smaller n, the number of documents containing t), the larger its IDF, and the better t distinguishes between categories.

The main idea of TF-IDF is that if a word or phrase appears frequently in one document (high TF) but rarely in other documents, it distinguishes well between categories and is therefore well suited for classification.
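As a hypothetical worked example (all counts are invented for illustration, and log base 10 is chosen only so the numbers come out round), assume TF is a term's count divided by the document length and IDF is log(|D| / n). Take a collection of |D| = 1000 documents and a 100-token document d. The term "tokenizer" occurs 5 times in d and in n = 10 documents overall; "the" occurs 8 times in d and in all 1000 documents:

$$\mathrm{tf}(\text{tokenizer}, d) \cdot \mathrm{idf}(\text{tokenizer}) = \frac{5}{100} \cdot \log_{10}\frac{1000}{10} = 0.05 \times 2 = 0.1$$

$$\mathrm{tf}(\text{the}, d) \cdot \mathrm{idf}(\text{the}) = \frac{8}{100} \cdot \log_{10}\frac{1000}{1000} = 0.08 \times 0 = 0$$

Although "the" is more frequent in d, it appears in every document, so its weight vanishes; the rarer term is the better discriminator.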

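TF is the number of times a given term appears in a document. It is usually normalized by the document's length so that longer documents are not favored:

$$\mathrm{tf}(t, d) = \frac{f_{t,d}}{\sum_{k} f_{k,d}}$$

where f_{t,d} is the number of times t occurs in d and the sum runs over all terms k in d.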
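A minimal Python sketch of the normalized term frequency, assuming documents have already been split into lists of tokens (tokenization is outside the scope of this article). The function name `term_frequency` is illustrative, not taken from any library:

```python
def term_frequency(term: str, doc_tokens: list[str]) -> float:
    """Count of `term` in one tokenized document, divided by the document length."""
    if not doc_tokens:
        return 0.0  # empty document: define TF as 0 to avoid dividing by zero
    return doc_tokens.count(term) / len(doc_tokens)


doc = "the cat sat on the mat".split()
print(term_frequency("the", doc))  # 2 occurrences out of 6 tokens
print(term_frequency("dog", doc))  # term absent, so 0.0
```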

IDF captures the idea that the more documents a term appears in, the smaller its weight should be. Inverse document frequency (IDF) measures a term's general importance across the whole collection. The IDF of a term t is obtained by dividing the total number of documents |D| by n, the number of documents that contain t, and taking the logarithm of the quotient:

$$\mathrm{idf}(t) = \log \frac{|D|}{n}$$
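A matching sketch of IDF over a toy corpus of tokenized documents (the corpus and the name `inverse_document_frequency` are hypothetical). It uses the natural logarithm; the base only rescales every IDF value by the same constant. Returning 0.0 for a term found in no document is an assumption of this sketch, made to avoid division by zero:

```python
import math


def inverse_document_frequency(term: str, corpus: list[list[str]]) -> float:
    """idf(t) = log(|D| / n), where n is the number of documents containing t."""
    n = sum(1 for doc in corpus if term in doc)
    if n == 0:
        # Assumption of this sketch: unseen terms get IDF 0 instead of dividing by zero.
        # Smoothed variants such as log(|D| / (1 + n)) are another common fix.
        return 0.0
    return math.log(len(corpus) / n)


corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]
print(inverse_document_frequency("the", corpus))   # in all 3 documents: log(1) = 0.0
print(inverse_document_frequency("cat", corpus))   # in 2 of 3: log(1.5)
print(inverse_document_frequency("bird", corpus))  # in 1 of 3: log(3), the largest
```

A term that occurs everywhere, like "the", gets zero weight, while the rarest term gets the highest.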

Calculating the final relevance score

With IDF weighting, the relevance formula changes from a simple sum of term frequencies to a weighted sum, namely TF1*IDF1 + TF2*IDF2 + ... + TFN*IDFN, where N is the number of query terms.
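The weighted sum can be sketched end to end as follows. Everything here is illustrative: a three-document toy corpus, whitespace tokenization, natural-log IDF, and the hypothetical helpers `tf`, `idf`, and `relevance`. The plain TF sum is computed alongside for comparison:

```python
import math


def tf(term, doc):
    # Relative term frequency: count of `term` divided by document length.
    return doc.count(term) / len(doc) if doc else 0.0


def idf(term, corpus):
    # log(|D| / n), n = number of documents containing `term`.
    # Returns 0.0 for terms absent from the corpus (an assumption of this sketch).
    n = sum(1 for d in corpus if term in d)
    return math.log(len(corpus) / n) if n else 0.0


def relevance(query_terms, doc, corpus):
    # Weighted sum TF1*IDF1 + ... + TFN*IDFN over the N query terms.
    return sum(tf(t, doc) * idf(t, corpus) for t in query_terms)


corpus = [
    "the cat saw the dog on the mat".split(),
    "the dog chased the cat".split(),
    "a small bird sang a song for the children today".split(),
]
query = ["the", "bird"]

plain = [sum(tf(t, d) for t in query) for d in corpus]  # simple TF sum
scores = [relevance(query, d, corpus) for d in corpus]  # TF-IDF weighted sum
best = max(range(len(corpus)), key=lambda i: scores[i])

print("plain TF sums:", plain)
print("TF-IDF scores:", scores)
print("best match:", " ".join(corpus[best]))
```

Because "the" occurs in every document, its IDF is 0 and it contributes nothing to the TF-IDF score, so the third document, the only one containing "bird", ranks first. The plain TF sum ranks that same document last (0.2 against 0.375 and 0.4) simply because the other two repeat "the" more often.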
