You can use the searcher.explain(Query query, int doc) method to see exactly how a document's score is composed.
In Lucene, the score is calculated as tf * idf * boost * lengthNorm.
tf: the square root of the number of times the query term appears in the document.
idf: the inverse document frequency. For a given term it is the same for every document, so it plays no deciding role in how the matching documents are ranked against each other.
boost: a boost factor set through the setBoost method. It can be set on both a Field and a Document, and both values take effect together.
lengthNorm: determined by the length of the field being searched. The longer the field, the lower the score.
So what we can control programmatically is the boost value.
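The factors above can be reproduced with a few lines of plain Java. This is a minimal sketch, not Lucene code: the class name ScoreFactors and its method names are made up for illustration, and the formulas assume the default similarity of the Lucene 2.x line (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), lengthNorm = 1 / sqrt(numTerms)). It ignores the precision lost when Lucene stores the norm in the index, so lengthNorm here will not exactly match the fieldNorm that explain() prints.

```java
// Hypothetical helper class, for illustration only (not part of Lucene).
class ScoreFactors {
    // tf: square root of the raw term frequency
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    // idf: 1 + ln(numDocs / (docFreq + 1)), assumed default similarity
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // lengthNorm: 1 / sqrt(number of terms in the field), before index-time rounding
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Product of the four factors for a single-term query
    static float score(int freq, int docFreq, int numDocs, int numTerms, float boost) {
        return tf(freq) * idf(docFreq, numDocs) * boost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        System.out.println("tf(2)         = " + tf(2));
        System.out.println("idf(3, 3)     = " + idf(3, 3));
        System.out.println("lengthNorm(2) = " + lengthNorm(2));
        System.out.println("score         = " + score(2, 3, 3, 2, 1.0f));
    }
}
```

Keeping each factor in its own method makes it easy to compare the values one by one against the lines that explain() prints.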
Another question: why does the highest score never exceed 1.0 after a query?
Because when the highest calculated score is greater than 1.0, Lucene uses it as the denominator: every document's score is divided by that maximum to produce the final score.
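The normalization step can be sketched in plain Java as follows. This is an illustration of the idea, not Lucene's own code: the class ScoreNormalizer and its normalize method are hypothetical names, and the assumption is that scores are rescaled only when the maximum is greater than 1.0. The sample values are raw scores taken from the explain() output later in this article.

```java
// Hypothetical helper class, for illustration only (not part of Lucene).
class ScoreNormalizer {
    // Divide every score by the maximum, but only when the maximum exceeds 1.0.
    static float[] normalize(float[] rawScores) {
        float max = 0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        // Assumption: no rescaling when the best score is already <= 1.0
        float scale = (max > 1.0f) ? 1.0f / max : 1.0f;
        float[] normalized = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            normalized[i] = rawScores[i] * scale;
        }
        return normalized;
    }

    public static void main(String[] args) {
        float[] raw = {1.7807949f, 0.629606f, 0.35615897f};
        for (float s : normalize(raw)) {
            System.out.println(s);
        }
    }
}
```

When no raw score is above 1.0, the scale factor stays at 1.0 and the scores are returned unchanged.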
The code and its output are shown below:
Java code

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;

    public class ScoreSortTest {
        public final static String INDEX_STORE_PATH = "index";

        public static void main(String[] args) throws Exception {
            // Create a fresh index (create = true) in the "index" directory.
            IndexWriter writer = new IndexWriter(INDEX_STORE_PATH,
                    new StandardAnalyzer(), true);
            writer.setUseCompoundFile(false);

            Document doc1 = new Document();
            Document doc2 = new Document();
            Document doc3 = new Document();
            // "bc bc": the output below reports termFreq = 2 for this document.
            Field f1 = new Field("bookname", "bc bc", Field.Store.YES, Field.Index.TOKENIZED);
            Field f2 = new Field("bookname", "AB bc", Field.Store.YES, Field.Index.TOKENIZED);
            Field f3 = new Field("bookname", "AB bc cd", Field.Store.YES,
                    Field.Index.TOKENIZED);
            doc1.add(f1);
            doc2.add(f2);
            doc3.add(f3);
            writer.addDocument(doc1);
            writer.addDocument(doc2);
            writer.addDocument(doc3);
            writer.close();

            // Search the "bookname" field for the single term "bc".
            IndexSearcher searcher = new IndexSearcher(INDEX_STORE_PATH);
            TermQuery q = new TermQuery(new Term("bookname", "bc"));
            q.setBoost(2f);
            Hits hits = searcher.search(q);
            for (int i = 0; i < hits.length(); i++) {
                Document doc = hits.doc(i);
                System.out.print(doc.get("bookname") + "\t\t");
                System.out.println(hits.score(i));
                // Print the breakdown of this document's score.
                System.out.println(searcher.explain(q, hits.id(i)));
            }
            searcher.close();
        }
    }
Running result:
bc bc    0.629606
0.629606 = (MATCH) fieldWeight(bookname:bc in 0), product:
  1.4142135 = tf(termFreq(bookname:bc)=2)
  0.71231794 = idf(docFreq=3, numDocs=3)
  0.625 = fieldNorm(field=bookname, doc=0)
AB bc    0.4451987
0.4451987 = (MATCH) fieldWeight(bookname:bc in 1), product:
  1.0 = tf(termFreq(bookname:bc)=1)
  0.71231794 = idf(docFreq=3, numDocs=3)
  0.625 = fieldNorm(field=bookname, doc=1)
AB bc cd    0.35615897
0.35615897 = (MATCH) fieldWeight(bookname:bc in 2), product:
  1.0 = tf(termFreq(bookname:bc)=1)
  0.71231794 = idf(docFreq=3, numDocs=3)
  0.5 = fieldNorm(field=bookname, doc=2)
We can see from the results:
bc appears twice in the first document, so its tf is the square root of 2, i.e. 1.4142135. It appears once in each of the other two documents, so their tf is 1.0.
All three documents have the same idf value, 0.71231794.
By default the boost is 1.0, so fieldNorm is simply the lengthNorm value. The first two documents have the same field length, so both get 0.625, while the last one gets 0.5 because its field is longer.
Now set a boost on the f2 field with f2.setBoost(2.0f);
The output for that document changes to:
AB bc    0.8903974
0.8903974 = (MATCH) fieldWeight(bookname:bc in 1), product:
  1.0 = tf(termFreq(bookname:bc)=1)
  0.71231794 = idf(docFreq=3, numDocs=3)
  1.25 = fieldNorm(field=bookname, doc=1)
The fieldNorm value has gone from 0.625 to 1.25, that is, it has been multiplied by the field boost of 2.0.
Next, also set a boost on the second document with doc2.setBoost(2.0f);
The output for that document changes to:
AB bc    1.0
1.7807949 = (MATCH) fieldWeight(bookname:bc in 1), product:
  1.0 = tf(termFreq(bookname:bc)=1)
  0.71231794 = idf(docFreq=3, numDocs=3)
  2.5 = fieldNorm(field=bookname, doc=1)
fieldNorm has been multiplied by 2 once more, so the boosts set on the Document and on the Field are multiplied together.
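Why the norms come out as 0.625, 1.25 and 2.5 rather than roughly 0.707, 1.414 and 2.828 can be sketched as follows. The assumption here, which the text above does not state, is that Lucene multiplies the document boost, the field boost and lengthNorm at index time and stores the product in a single byte with a 3-bit mantissa, which rounds the value down. The class NormSketch and its methods are hypothetical names, and the bit manipulation is written from memory of Lucene's small-float encoding, so check it against the version you use.

```java
// Hypothetical helper class, for illustration only (not part of Lucene).
class NormSketch {
    // Pack a float into one byte: 3 mantissa bits, 5 exponent bits (assumed layout).
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small < ((63 - 15) << 3)) {
            // zero or negative maps to 0, underflow maps to the smallest positive value
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ((63 - 15) << 3) + 0x100) {
            return -1; // overflow: largest representable value
        }
        return (byte) (small - ((63 - 15) << 3));
    }

    // Expand the byte back into a float; the dropped mantissa bits stay zero.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Assumed index-time norm: docBoost * fieldBoost * 1/sqrt(numTerms), then rounded.
    static float fieldNorm(float docBoost, float fieldBoost, int numTerms) {
        float raw = docBoost * fieldBoost * (float) (1.0 / Math.sqrt(numTerms));
        return decode(encode(raw));
    }

    public static void main(String[] args) {
        System.out.println(fieldNorm(1f, 1f, 2)); // two terms, no boost
        System.out.println(fieldNorm(1f, 1f, 3)); // three terms, no boost
        System.out.println(fieldNorm(1f, 2f, 2)); // field boost 2.0
        System.out.println(fieldNorm(2f, 2f, 2)); // field boost 2.0 and document boost 2.0
    }
}
```

Under these assumptions the four calls in main correspond to the fieldNorm values 0.625, 0.5, 1.25 and 2.5 seen in the explain() output.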
Because the raw score of this document is now 1.7807949, which exceeds 1.0, the scores of the other two documents are divided by this value and become:
bc bc    0.35355335
AB bc cd    0.19999999