Lucene08 - Lucene relevance ranking

Source: Internet
Author: User

Directory

    • 1 What is the correlation degree
    • 2 Relevance Ratings
    • 3 Degree of correlation setting
      • 3.1 Requirements
      • 3.2 Ad Settings ...

1 What is the correlation degree

Concept: Correlation (relevance) refers to the degree of association between two things. In Lucene, it means the relevance between the search keywords and the search results. For example, when searching for books whose bookName field contains "java", the relevance of each result is determined by how many times and where "java" appears in bookName.

2 Relevance Ratings

Lucene scores the relevance between the query keywords and each indexed document. The higher the score, the higher the document ranks.

    • Lucene scoring method: Lucene computes the score in real time from the query keywords when the user searches, in two steps:
      1. Calculate the weight of each word (term).
      2. Calculate the document's relevance score from the term weights.
    • What is the weight of a term?

      As described in the indexing section, the smallest unit of an index is the term (a word in the index dictionary). A search also looks up terms in the indexed field and then finds documents through those terms. The importance of a term to a document is called the weight of the term.

    • Two factors affect the weight of a term:

      1. Term Frequency (TF):

        How many times the term appears in a given document. The larger the TF, the more important the term.

        The more often a term appears in a document, the more important it is to that document. For example, if the word "Lucene" appears many times in a document, the document is probably about Lucene.

      2. Document Frequency (DF):

        How many documents contain the term. The larger the DF, the less important the term.

        For example, the word "this" appears many times in English documents. Does that make it important? No: the more documents contain a term, the more common the term is, and the less it helps to tell documents apart.
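
The two steps above can be sketched in plain Java without any Lucene dependency. This is only an illustration under stated assumptions: the class name TermWeightSketch, the sample documents, and the exact formula (square root of TF multiplied by a logarithmic inverse of DF, in the style of classic TF-IDF scoring) are not taken from the article, and Lucene's real scorer includes further factors.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative TF/DF arithmetic; a simplified sketch, not Lucene's actual scorer. */
final class TermWeightSketch {

    /** TF: how many times the term occurs in one tokenized document. */
    static int termFrequency(List<String> docTokens, String term) {
        int count = 0;
        for (String token : docTokens) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    /** DF: how many documents contain the term at least once. */
    static int documentFrequency(List<List<String>> docs, String term) {
        int count = 0;
        for (List<String> doc : docs) {
            if (doc.contains(term)) {
                count++;
            }
        }
        return count;
    }

    /** Step 1: weight of a term in a document; grows with TF, shrinks with DF. */
    static double weight(List<List<String>> docs, List<String> doc, String term) {
        double tf = Math.sqrt(termFrequency(doc, term));
        // Assumed damping: 1 + ln(numDocs / (DF + 1)), as in classic TF-IDF.
        double idf = 1.0 + Math.log((double) docs.size() / (documentFrequency(docs, term) + 1));
        return tf * idf;
    }

    /** Step 2: document score as the sum of the weights of the query terms. */
    static double score(List<List<String>> docs, List<String> doc, List<String> queryTerms) {
        double total = 0.0;
        for (String term : queryTerms) {
            total += weight(docs, doc, term);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical, already tokenized documents.
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("lucene", "in", "action", "lucene"),
                Arrays.asList("java", "in", "practice"),
                Arrays.asList("lucene", "java", "in", "depth"));
        for (List<String> doc : docs) {
            System.out.println(doc + " -> " + score(docs, doc, Arrays.asList("lucene")));
        }
    }
}
```

In this sample, the first document ranks highest for the query "lucene" because the term occurs twice in it, while "in" receives a lower weight than "lucene" in the same document because every document contains it.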

3 Degree of correlation setting

Lucene lets you change the ranking of search results by setting a weight (boost) value, which influences the relevance score. The default boost is 1.
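
As a rough illustration of why a boost changes the ranking, the sketch below multiplies a base score by a boost and sorts by the result. It uses plain Java only; the class BoostRankingSketch, the second book title, and all score values are hypothetical. In Lucene itself the boost is combined with other scoring factors, so the real effect is not an exact multiplication.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified model of boosting: final score = base score * boost. */
final class BoostRankingSketch {

    static final class Hit {
        final String bookName;
        final double baseScore; // hypothetical relevance before boosting
        final double boost;     // 1.0 means "not boosted"

        Hit(String bookName, double baseScore, double boost) {
            this.bookName = bookName;
            this.baseScore = baseScore;
            this.boost = boost;
        }

        double finalScore() {
            return baseScore * boost;
        }
    }

    /** Returns a copy of the hits sorted by final score, highest first. */
    static List<Hit> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> Double.compare(b.finalScore(), a.finalScore()));
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("Java Programming Basics", 0.9, 1.0),
                new Hit("Lucene Java Essence Edition", 0.4, 100.0));
        for (Hit hit : rank(hits)) {
            System.out.println(hit.bookName + " -> " + hit.finalScore());
        }
    }
}
```

With a boost of 1 the second book would rank below the first; with a boost of 100 it moves to the top.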

3.1 Requirements

A publisher pays for advertising: after receiving the money, rank the book "Lucene Java Essence Edition" first.

3.2 Ad Settings ...
import java.io.File;
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.FloatField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Test;
import org.wltea.analyzer.lucene.IKAnalyzer;

/**
 * Relevance ranking: update the index, changing the boost of the book
 * whose ranking needs to change.
 */
@Test
public void updateIndexBoost() throws IOException {
    // 1. Create the analyzer, used for tokenization
    Analyzer analyzer = new IKAnalyzer();
    // 2. Create the index configuration object (IndexWriterConfig)
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_4_10_4, analyzer);
    // 3. Create the directory object (Directory), which specifies the index location
    Directory directory = FSDirectory.open(new File("/users/healchow/documents/index"));
    // 4. Create the index operation object (IndexWriter)
    IndexWriter writer = new IndexWriter(directory, iwc);
    // 5. Create the document object (Document)
    Document doc = new Document();
    // 5  Lucene Java Essence Edition  5.jpg
    doc.add(new StringField("bookId", "5", Store.YES));
    TextField nameField = new TextField("bookName", "Lucene Java Essence Edition", Store.YES);
    // Set the boost to 100. The default is 1
    nameField.setBoost(100f);
    doc.add(nameField);
    doc.add(new FloatField("bookPrice", 80f, Store.YES));
    doc.add(new StoredField("bookPic", "5.jpg"));
    // 6. Create the update condition object (Term)
    Term term = new Term("bookId", "5");
    // 7. Use the IndexWriter to perform the update
    writer.updateDocument(term, doc);
    // 8. Release resources
    writer.close();
}

// Alternatively, set the boost when the index is first created:
// Advertising: after receiving the money, rank "Lucene Java Essence Edition" first
// 5  Lucene Java Essence Edition  80  5.jpg
TextField bookNameField = new TextField("bookName", book.getBookname(), Store.YES);
if (book.getId() == 5) {
    // Set the boost to 100. The default is 1
    bookNameField.setBoost(100f);
}
document.add(bookNameField);

Copyright Notice

Author: Ma_shoufeng (Ma Ching)

Source: Blog Park Ma Ching's Blog

Your support is a great encouragement to the blogger. Thank you for reading.

The copyright of this article belongs to the blogger. Reprinting is welcome, but unless the blogger agrees otherwise, this statement must be retained and a link to the original article must be placed in a prominent position on the page; otherwise the blogger reserves the right to pursue legal liability.
