score(q,d) = coord(q,d) · queryNorm(q) · ∑ ( tf(t in d) · idf(t)^2 · t.getBoost() · norm(t,d) )   (∑: t in q)
d: document
t: term
q: query
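To show how the factors combine, here is a minimal sketch of the formula above. It is not Lucene's own code: the class name ScoreSketch, the method signature, and the use of parallel arrays (one entry per matched query term) are assumptions made for illustration.

```java
// Hypothetical sketch of the scoring formula; not Lucene's implementation.
class ScoreSketch {
    // tf, idf, boost and norm hold one value per matched query term t.
    static float score(float coord, float queryNorm,
                       float[] tf, float[] idf, float[] boost, float[] norm) {
        float sum = 0f;
        for (int i = 0; i < tf.length; i++) {
            // tf(t in d) * idf(t)^2 * t.getBoost() * norm(t,d)
            sum += tf[i] * idf[i] * idf[i] * boost[i] * norm[i];
        }
        return coord * queryNorm * sum;
    }

    public static void main(String[] args) {
        // One matched term: tf = 2, idf = 3, boost = 1, norm = 0.25.
        float s = score(0.5f, 0.2f,
                new float[] {2f}, new float[] {3f},
                new float[] {1f}, new float[] {0.25f});
        System.out.println(s);
    }
}
```

With these inputs the sum is 2 · 9 · 1 · 0.25 = 4.5, so the score is roughly 0.5 · 0.2 · 4.5 = 0.45.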
coord(q,d):
public float coord(int overlap, int maxOverlap)
Implemented as overlap / maxOverlap.
overlap - the number of query terms matched in the document
maxOverlap - the total number of terms in the query
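A minimal sketch of the coordination factor as described above. The class name CoordSketch is an assumption for illustration; the cast to float is there so the division is not an integer division.

```java
// Hypothetical sketch of the coordination factor; not Lucene's own source.
class CoordSketch {
    // overlap: number of query terms matched in the document
    // maxOverlap: total number of terms in the query
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        // A document that matches 2 of the 4 query terms.
        System.out.println(coord(2, 4)); // prints 0.5
    }
}
```

A document that matches every query term gets a factor of 1.0, so documents matching more of the query are favored.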
queryNorm(q):
public float queryNorm(float sumOfSquaredWeights)
Implemented as 1 / sqrt(sumOfSquaredWeights).
sumOfSquaredWeights - the sum of the squares of the query term weights
sumOfSquaredWeights = q.getBoost()^2 · ∑ ( idf(t) · t.getBoost() )^2   (∑: t in q)
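The two definitions above can be sketched as follows. The class name QueryNormSketch and the array-based helper sumOfSquaredWeights are assumptions for illustration; only the 1 / sqrt(...) step mirrors the method signature quoted above.

```java
// Hypothetical sketch of the query normalization factor; not Lucene's own source.
class QueryNormSketch {
    // queryBoost^2 * sum over query terms of (idf(t) * termBoost(t))^2
    static float sumOfSquaredWeights(float queryBoost, float[] idfs, float[] termBoosts) {
        float sum = 0f;
        for (int i = 0; i < idfs.length; i++) {
            float weight = idfs[i] * termBoosts[i];
            sum += weight * weight;
        }
        return queryBoost * queryBoost * sum;
    }

    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    public static void main(String[] args) {
        // Two terms with idf 3 and 4, all boosts 1.0: 3^2 + 4^2 = 25.
        float ssw = sumOfSquaredWeights(1f, new float[] {3f, 4f}, new float[] {1f, 1f});
        System.out.println(queryNorm(ssw)); // 1 / sqrt(25)
    }
}
```

Because queryNorm is the same for every document scored against a given query, it does not change the ranking within one query; it only makes scores from different queries more comparable.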
idf(t):
public float idf(long docFreq, long numDocs)
Implemented as log(numDocs / (docFreq + 1)) + 1.
docFreq - the number of documents which contain the term
numDocs - the total number of documents in the collection (all the indices)
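A minimal sketch of the inverse document frequency as given above, assuming the natural logarithm (Math.log). The class name IdfSketch is an assumption for illustration.

```java
// Hypothetical sketch of the idf factor; not Lucene's own source.
class IdfSketch {
    // docFreq: number of documents containing the term
    // numDocs: total number of documents in the collection
    static float idf(long docFreq, long numDocs) {
        // The cast to double avoids integer division; the +1 in the
        // denominator avoids dividing by zero when docFreq is 0.
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        System.out.println(idf(9, 10));  // log(10 / 10) + 1
        System.out.println(idf(0, 10));  // rarer term, larger value
    }
}
```

Rarer terms (smaller docFreq) produce larger values, so they contribute more to the score.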
tf(t in d):
public float tf(float freq)
Implemented as sqrt(freq).
freq - the frequency of a term within a document
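A minimal sketch of the term frequency factor described above. The class name TfSketch is an assumption for illustration.

```java
// Hypothetical sketch of the tf factor; not Lucene's own source.
class TfSketch {
    // freq: how many times the term occurs in the document
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        System.out.println(tf(1f));
        System.out.println(tf(4f)); // four occurrences count only twice as much as one
    }
}
```

Taking the square root dampens the effect of repetition: a term that appears four times contributes twice as much as a term that appears once, not four times as much.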
boost(t.field in d):
public void setBoost(float b)
Sets the boost for this query clause to b.
public float getBoost()
The boost is 1.0 by default.
norm(t,d):
norm(t,d) = lengthNorm · ∏ f.boost()   (∏: field f in d named as t)
If the document has multiple fields with the same name, all their boosts are multiplied together.
lengthNorm = 1.0 / Math.sqrt(numTerms)
lengthNorm - computed when the document is added to the index, according to the number of tokens of this field in the document, so that shorter fields contribute more to the score. lengthNorm is computed by the Similarity class in effect at indexing.
Shorter fields (fewer tokens) get a bigger boost from this factor.
numTerms is the number of terms within a field.
numTerms is FieldInvertState.getLength() if setDiscountOverlaps(boolean) is false; otherwise it is FieldInvertState.getLength() - FieldInvertState.getNumOverlap().
FieldInvertState.getLength(): gets the total number of terms in this field.
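The norm computation above can be sketched as follows. This is an illustration only: the class name LengthNormSketch and the plain int and float[] parameters are assumptions that stand in for the values Lucene reads from FieldInvertState and from the field boosts at indexing time.

```java
// Hypothetical sketch of norm(t,d); not Lucene's own source.
class LengthNormSketch {
    // length: total number of terms in the field (FieldInvertState.getLength())
    // numOverlap: overlapping terms (FieldInvertState.getNumOverlap())
    static int numTerms(int length, int numOverlap, boolean discountOverlaps) {
        return discountOverlaps ? length - numOverlap : length;
    }

    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // fieldBoosts: boosts of all fields in the document that share this name
    static float norm(int numTerms, float[] fieldBoosts) {
        float product = 1f;
        for (float boost : fieldBoosts) {
            product *= boost;
        }
        return lengthNorm(numTerms) * product;
    }

    public static void main(String[] args) {
        // 20 tokens, 4 of them overlaps, overlaps discounted: 16 terms.
        int terms = numTerms(20, 4, true);
        System.out.println(lengthNorm(terms));
        System.out.println(norm(terms, new float[] {2f, 1.5f}));
    }
}
```

With 16 terms the length norm is 1 / sqrt(16) = 0.25, and two same-named fields boosted by 2.0 and 1.5 give a norm of 0.25 · 3.0 = 0.75.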