Contents
- 1 What is the correlation degree
- 2 Relevance Ratings
- 3 Degree of correlation setting
- 3.1 Requirements
- 3.2 Ad Settings ...
1 What is the correlation degree
Concept: Relevance (the "correlation degree") is the strength of the association between two things. In Lucene, it means how closely a search result matches the search keywords. For example, when searching the bookName field for books containing "Java", the relevance of each result is determined by how many times and where "Java" appears in bookName.
2 Relevance Ratings
Lucene scores how relevant each indexed document is to the query keywords; the higher the score, the higher the document ranks.
- Lucene scoring method: Lucene computes the score in real time, when the user searches, from the search keywords. It does so in two steps:
  - Calculate the weight of each word (term).
  - Calculate the document's relevance score from the term weights.
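The two steps above can be sketched in plain Java. This is a simplified model, not Lucene's implementation: it assumes the term weights from step 1 are already known and that a document's score is just the sum of the weights of the query terms it contains (Lucene's real formula applies further normalization factors). The class name, document ids, and weight values are all hypothetical.

```java
import java.util.*;

public class TwoStepScoringSketch {

    // Step 2: sum the weights of the query terms found in one document.
    // termWeights is the result of step 1 (term -> weight) for that document.
    static double score(Map<String, Double> termWeights, List<String> queryTerms) {
        double total = 0.0;
        for (String term : queryTerms) {
            total += termWeights.getOrDefault(term, 0.0);
        }
        return total;
    }

    // Rank document ids by score, highest first.
    static List<String> rank(Map<String, Map<String, Double>> docs, List<String> queryTerms) {
        List<String> ids = new ArrayList<>(docs.keySet());
        ids.sort(Comparator.comparingDouble(
                (String id) -> score(docs.get(id), queryTerms)).reversed());
        return ids;
    }

    public static void main(String[] args) {
        // Hypothetical step-1 output: per-document term weights.
        Map<String, Double> book1 = new HashMap<>();
        book1.put("java", 0.5);
        book1.put("lucene", 2.0);
        Map<String, Double> book2 = new HashMap<>();
        book2.put("java", 1.5);
        Map<String, Double> book3 = new HashMap<>();
        book3.put("spring", 1.0);

        Map<String, Map<String, Double>> docs = new HashMap<>();
        docs.put("book1", book1);
        docs.put("book2", book2);
        docs.put("book3", book3);

        // book1 scores 2.5, book2 scores 1.5, book3 scores 0.0
        System.out.println(rank(docs, Arrays.asList("java", "lucene")));
    }
}
```

Sorting in descending score order is what puts the most relevant document first in the result list.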
What is the weight of a term?
As described in the indexing section, the smallest unit of an index is the term (a word in the index dictionary). A search also looks up terms in the indexed fields and then finds documents through those terms. The importance of a term to a document is called the weight of the term.
Two factors affect the term weight:
Term Frequency (TF):
How many times the term appears in a given document. The larger the TF, the more important the term.
The more often a term appears in a document, the more important it is to that document. For example, if the word "Lucene" appears many times in a document, the document is probably about Lucene.
Document Frequency (DF):
How many documents contain the term. The larger the DF, the less important the term.
For example, the word "this" appears very often in English documents. Does that make it important? No. The more documents contain a term, the more common the term is, and the less useful it is for telling documents apart.
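As a rough illustration of how TF and DF pull in opposite directions, the sketch below combines a square-root TF with a logarithmic inverse document frequency (IDF). The formulas are modeled on the shape used by Lucene's classic TF-IDF similarity, but the class, the method names, and the numbers are assumptions made for this example, and the real score includes further factors.

```java
public class TermWeightSketch {

    // TF component: grows with the number of occurrences, but sub-linearly.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // IDF component: shrinks as more documents contain the term.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Simplified term weight: TF multiplied by IDF.
    static double weight(int freq, int docFreq, int numDocs) {
        return tf(freq) * idf(docFreq, numDocs);
    }

    public static void main(String[] args) {
        int numDocs = 1000; // hypothetical index size

        // "lucene": 4 occurrences in this document, present in only 9 documents
        double rare = weight(4, 9, numDocs);
        // "this": also 4 occurrences, but present in 999 documents
        double common = weight(4, 999, numDocs);

        System.out.println("lucene weight: " + rare);
        System.out.println("this weight:   " + common);
    }
}
```

Both terms have the same TF here, so the difference in weight comes entirely from DF: the rare term "lucene" ends up with the larger weight.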
3 Degree of correlation setting
Lucene lets you change the ranking of search results by setting a weight (boost) value, which raises or lowers the relevance score of the matching terms.
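To see why a large boost changes the ranking, here is a minimal model in plain Java. It assumes the boost acts as a simple multiplier on a document's unboosted score, which is a simplification of what Lucene does internally. The class name, the base scores, and the first book title are hypothetical.

```java
import java.util.*;

public class BoostSketch {

    // Assumption: the boost multiplies the unboosted score.
    static double boosted(double baseScore, float boost) {
        return baseScore * boost;
    }

    // Return the title with the highest boosted score.
    static String topHit(Map<String, Double> baseScores, Map<String, Float> boosts) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : baseScores.entrySet()) {
            float boost = boosts.getOrDefault(e.getKey(), 1f); // the default boost is 1
            double s = boosted(e.getValue(), boost);
            if (s > bestScore) {
                bestScore = s;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Double> baseScores = new HashMap<>();
        baseScores.put("Java Programming Basics", 0.9);     // hypothetical book
        baseScores.put("Lucene Java Essence Edition", 0.4);

        // Without any boost, the higher base score wins.
        System.out.println(topHit(baseScores, new HashMap<>()));

        // With a boost of 100, 0.4 * 100 = 40 beats 0.9 * 1.
        Map<String, Float> boosts = new HashMap<>();
        boosts.put("Lucene Java Essence Edition", 100f);
        System.out.println(topHit(baseScores, boosts));
    }
}
```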
3.1 Requirements
A publisher pays for advertising: once the payment is received, the book "Lucene Java Essence Edition" should be ranked first in the search results.
3.2 Ad Settings ...
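    import java.io.File;
    import java.io.IOException;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field.Store;
    import org.apache.lucene.document.FloatField;
    import org.apache.lucene.document.StoredField;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;
    import org.junit.Test;
    // IK Analyzer; the package name is assumed from its usual distribution
    import org.wltea.analyzer.lucene.IKAnalyzer;

    /**
     * Relevance ranking: change the ranking by updating the index with a new
     * boost (weight) for the book whose position should change.
     */
    @Test
    public void updateIndexBoost() throws IOException {
        // 1. Create the analyzer used for tokenizing
        Analyzer analyzer = new IKAnalyzer();
        // 2. Create the index configuration object (IndexWriterConfig)
        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_4_10_4, analyzer);
        // 3. Create the index directory object (Directory) and specify the index location
        Directory directory = FSDirectory.open(new File("/users/healchow/documents/index"));
        // 4. Create the index operation object (IndexWriter)
        IndexWriter writer = new IndexWriter(directory, iwc);

        // 5. Create the document object (Document)
        Document doc = new Document();
        // Book data: 5  Lucene Java Essence Edition  80  5.jpg
        doc.add(new StringField("bookId", "5", Store.YES));
        TextField nameField = new TextField("bookName", "Lucene Java Essence Edition", Store.YES);
        // Set the boost to 100; the default is 1
        nameField.setBoost(100f);
        doc.add(nameField);
        doc.add(new FloatField("bookPrice", 80f, Store.YES));
        doc.add(new StoredField("bookPic", "5.jpg"));

        // 6. Create the update condition (Term)
        Term term = new Term("bookId", "5");
        // 7. Perform the update with the IndexWriter
        writer.updateDocument(term, doc);
        // 8. Release resources
        writer.close();
    }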
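    // Alternatively, set the boost when the index is first created:
    // Advertising: after the payment is received, rank "Lucene Java Essence Edition" first
    // Book data: 5  Lucene Java Essence Edition  80  5.jpg
    TextField bookNameField = new TextField("bookName", book.getBookname(), Store.YES);
    if (book.getId() == 5) {
        // Set the boost to 100; the default is 1
        bookNameField.setBoost(100f);
    }
    document.add(bookNameField);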
Copyright Notice
Author: Ma_shoufeng (Ma Ching)
Source: Blog Park Ma Ching's Blog
The copyright of this article belongs to the blogger. Reprinting is welcome, but unless the blogger agrees otherwise, this statement must be retained and a link to the original article must be placed in a prominent position on the page; otherwise the blogger reserves the right to pursue legal liability.
Lucene08: Lucene relevance ranking