Reading Notes on the Solr Source Code

Solr source code
1 org.apache.solr.common: basic classes and objects
2 org.apache.solr.common.params
(1) AnalysisParams, plus the parameter containers MapSolrParams, ModifiableSolrParams (backed by a LinkedHashMap), RequiredSolrParams, and SolrQuery.
(2) CommonParams
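A minimal sketch of the idea behind ModifiableSolrParams: a LinkedHashMap<String, String[]> that keeps parameters in insertion order and lets one name carry several values (as fq or facet.field do). The class SimpleParams is hypothetical; only the method names set/add/get/getParams are borrowed from ModifiableSolrParams, and the real class does considerably more.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified multi-valued parameter container (not Solr's class).
// LinkedHashMap keeps parameters in the order they were first added.
class SimpleParams {
    private final Map<String, String[]> vals = new LinkedHashMap<>();

    // Replace all values of a parameter.
    SimpleParams set(String name, String... values) {
        vals.put(name, values.clone());
        return this;
    }

    // Append values to a (possibly new) multi-valued parameter, e.g. fq.
    SimpleParams add(String name, String... values) {
        String[] old = vals.get(name);
        if (old == null) {
            vals.put(name, values.clone());
        } else {
            String[] merged = Arrays.copyOf(old, old.length + values.length);
            System.arraycopy(values, 0, merged, old.length, values.length);
            vals.put(name, merged);
        }
        return this;
    }

    // First value, or null if the parameter is absent.
    String get(String name) {
        String[] v = vals.get(name);
        return (v == null || v.length == 0) ? null : v[0];
    }

    // All values, or null if the parameter is absent.
    String[] getParams(String name) {
        return vals.get(name);
    }

    // Render as name=value pairs (no URL encoding; illustration only).
    String toQueryString() {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, String[]> e : vals.entrySet()) {
            for (String v : e.getValue()) {
                parts.add(e.getKey() + "=" + v);
            }
        }
        return String.join("&", parts);
    }
}
```

For example, new SimpleParams().set("q", "solr").add("fq", "a").add("fq", "b") renders as q=solr&fq=a&fq=b.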
3 org.apache.solr.analysis
(1) TokenizerFactory (BaseTokenizerFactory) implementations split text by n-grams (NGramTokenizerFactory), regular expressions, labels, keywords,
characters, Russian letters, trie structure (TrieTokenizerFactory), and whitespace.
(2) BaseCharFilterFactory
(3) BaseTokenFilterFactory implementations filter by language, stop words, payloads, phonetics (DoubleMetaphoneFilterFactory),
hyphenated words (HyphenatedWordsFilterFactory), wildcards, and synonyms.
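To make the tokenizer/filter split concrete, here is a toy analysis chain in plain Java: a whitespace tokenizer followed by filters applied in order, which is the shape a field type's TokenizerFactory plus its list of TokenFilterFactory entries produces. MiniAnalyzer and its methods are hypothetical; Strings stand in for Lucene tokens, so offsets, positions and payloads are ignored, and the n-gram step is written as a filter for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical toy analyzer: one tokenizer, then filters run in order.
class MiniAnalyzer {
    private final List<UnaryOperator<List<String>>> filters = new ArrayList<>();

    // Whitespace tokenizer: split on runs of whitespace, dropping empty tokens.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Filters run in the order added; each consumes the previous stage's output.
    MiniAnalyzer addFilter(UnaryOperator<List<String>> filter) {
        filters.add(filter);
        return this;
    }

    List<String> analyze(String text) {
        List<String> tokens = tokenize(text);
        for (UnaryOperator<List<String>> f : filters) {
            tokens = f.apply(tokens);
        }
        return tokens;
    }

    static UnaryOperator<List<String>> lowerCase() {
        return tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) out.add(t.toLowerCase(Locale.ROOT));
            return out;
        };
    }

    static UnaryOperator<List<String>> stopWords(Set<String> stop) {
        return tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) if (!stop.contains(t)) out.add(t);
            return out;
        };
    }

    // Every character n-gram of each token, lengths minGram..maxGram.
    static UnaryOperator<List<String>> nGrams(int minGram, int maxGram) {
        return tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens)
                for (int n = minGram; n <= maxGram; n++)
                    for (int i = 0; i + n <= t.length(); i++)
                        out.add(t.substring(i, i + n));
            return out;
        };
    }
}
```

Filter order matters: the stop-word filter here only matches "The" because lowerCase runs before it.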
4 org.apache.solr.core
(1) AbstractSolrEventListener; its subclasses include QuerySenderListener, which is where newSearcher warming happens.
(2) Core initialization: CoreContainer holds n CoreDescriptors.
SolrCore holds the search-related logic:
Set/get: responseHeader, plugins, booleanQueryMaxClauseCount, getSearcher
Initialization/registration: deletion policy, listeners, searcher, IndexReader (writer), index, and highlighter
(3) DirectoryFactory, IndexReaderFactory
(4) Index commit retention policy: IndexDeletionPolicyWrapper
(5) JmxMonitoredMap extends ConcurrentHashMap (a highly concurrent HashMap that uses multiple locks)
(6) RequestHandlers reads solrconfig.xml and registers the corresponding handlers.
This includes LazyRequestHandlerWrapper, which lets a lazy-loaded request handler be initialized only when it is called for the first time.
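The lazy-loading idea can be sketched as a wrapper that builds the real handler on its first request. Handler and LazyHandler are hypothetical names, not Solr's SolrRequestHandler or LazyRequestHandlerWrapper; the sketch uses double-checked locking on a volatile field so that concurrent first requests construct the handler only once.

```java
import java.util.function.Supplier;

// Hypothetical minimal handler interface (not Solr's SolrRequestHandler).
interface Handler {
    String handle(String request);
}

// Defers building the wrapped handler until the first request arrives.
class LazyHandler implements Handler {
    private final Supplier<Handler> factory;
    private volatile Handler delegate; // null until first use

    LazyHandler(Supplier<Handler> factory) {
        this.factory = factory;
    }

    // Double-checked locking: only one thread runs the factory.
    Handler getWrappedHandler() {
        Handler h = delegate;
        if (h == null) {
            synchronized (this) {
                h = delegate;
                if (h == null) {
                    h = factory.get();
                    delegate = h;
                }
            }
        }
        return h;
    }

    boolean isLoaded() {
        return delegate != null;
    }

    @Override
    public String handle(String request) {
        return getWrappedHandler().handle(request);
    }
}
```

The cost of an expensive handler is thus paid only if some request actually uses it.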
(7) SolrResourceLoader includes a ClassLoader and getLines.

5 org.apache.solr.handler and org.apache.solr.handler.admin
(1) SnapShooter
(2) ContentStreamLoader loads XML and CSV data for update, delete, and read operations; there is also an update handler that uses the javabin format.
(3) RequestHandlerBase
SearchHandler (@ dismax, relevance sorting;
@ adds function parameters and processes parameters as components, e.g. highlight, facet, MLT, query, stats, and debug;
@ a shard is just a string like 'localhost:8080/complex/';
@ ShardResponse)
AnalysisRequestHandlerBase (processes the request XML and returns a NamedList;
@ analyzeTokenStream analyzes the given TokenStream, collecting the tokens it produces;
@ convertTokensToNamedLists;
@ AnalysisContext class)
AnalysisRequestHandler (@ processContent tokenizes documents;
@ readDoc)
(4) CoreAdminHandler: handleRequestBody is the entry point for loading, renaming, and deleting cores and for reporting each core's status.
(5) MoreLikeThisHandler (@ getMoreLikeThis calls MoreLikeThis.like, a Lucene method, with a document and adds the most similar documents to the response.)
Given a document, it first gets the TermFreqVector (TF) of each field and adds the terms to termFrequencies. It then traverses all terms in termFrequencies, takes each term's TF and its largest DF among all specified fields, computes the IDF from that DF and the number of documents currently in the index, computes score = TF * IDF, and pushes the term into a PriorityQueue.
Up to maxQueryTerms terms are taken from the queue in descending score order to build a BooleanQuery, and that query is run to retrieve the N highest-scoring documents.
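The term-selection step above can be sketched in plain Java. This is an illustration under assumptions: MltTermPicker is hypothetical, each term's TF and largest DF are passed in precomputed, and IDF uses the classic Lucene DefaultSimilarity formula 1 + ln(numDocs / (docFreq + 1)). Lucene's MoreLikeThis also applies cutoffs (minimum TF/DF, word length, stop words) that are omitted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch of MoreLikeThis-style "interesting term" selection.
class MltTermPicker {
    // Classic Lucene idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    static final class ScoredTerm {
        final String term;
        final double score;

        ScoredTerm(String term, double score) {
            this.term = term;
            this.score = score;
        }
    }

    // termFreqs: term -> tf in the source document
    // docFreqs:  term -> largest df among the chosen fields
    static List<String> pickTerms(Map<String, Integer> termFreqs, Map<String, Integer> docFreqs,
                                  int numDocs, int maxQueryTerms) {
        // Highest score comes out of the queue first.
        PriorityQueue<ScoredTerm> pq =
            new PriorityQueue<>((a, b) -> Double.compare(b.score, a.score));
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            int df = docFreqs.getOrDefault(e.getKey(), 0);
            if (df == 0) continue; // term not in the index: nothing to search for
            pq.add(new ScoredTerm(e.getKey(), e.getValue() * idf(df, numDocs)));
        }
        List<String> picked = new ArrayList<>();
        while (!pq.isEmpty() && picked.size() < maxQueryTerms) {
            picked.add(pq.poll().term);
        }
        return picked;
    }
}
```

The picked terms would then become the SHOULD clauses of the BooleanQuery. Note that a very common word can still rank highly if its TF is large, which is why the real class offers stop-word and maximum-DF options.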
(6) PluginInfoHandler retrieves and presents the query handlers, update handlers, caches, and highlighting components loaded in each core.
(7) ReplicationHandler provides the API that slave servers use to copy data from the master. It handles replication validation, preparation before replication (e.g. whether to commit or optimize), and follow-up actions after replication.
(@ Adler32 checksums validate the copied data;
@ getReplicationDetails shows statistics and progress information;
@ FileStream class, which sends file data with checksums)
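The checksum idea can be sketched with java.util.zip.Adler32: the sender splits a file into packets and attaches each packet's Adler-32 value, and the receiver recomputes it before accepting the data. ChecksummedPackets and its packet layout are hypothetical, not the actual ReplicationHandler wire format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.Adler32;

// Hypothetical sketch of checksummed file transfer (not Solr's wire format).
class ChecksummedPackets {
    static final class Packet {
        final byte[] data;
        final long checksum;

        Packet(byte[] data, long checksum) {
            this.data = data;
            this.checksum = checksum;
        }
    }

    static long adler32(byte[] data) {
        Adler32 a = new Adler32();
        a.update(data, 0, data.length);
        return a.getValue();
    }

    // Sender side: cut the file into packets, each with its own checksum.
    static List<Packet> split(byte[] file, int packetSize) {
        List<Packet> packets = new ArrayList<>();
        for (int off = 0; off < file.length; off += packetSize) {
            byte[] chunk = Arrays.copyOfRange(file, off, Math.min(file.length, off + packetSize));
            packets.add(new Packet(chunk, adler32(chunk)));
        }
        return packets;
    }

    // Receiver side: verify every packet before reassembling the file.
    static byte[] receive(List<Packet> packets) {
        int total = 0;
        for (Packet p : packets) total += p.data.length;
        byte[] out = new byte[total];
        int pos = 0;
        for (Packet p : packets) {
            if (adler32(p.data) != p.checksum) {
                throw new IllegalStateException("checksum mismatch at byte " + pos);
            }
            System.arraycopy(p.data, 0, out, pos, p.data.length);
            pos += p.data.length;
        }
        return out;
    }
}
```

Adler-32 is cheap to compute but weaker than a CRC, which suits detecting transfer corruption rather than tampering.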
(8) ShowFileRequestHandler serves files from the conf directory over the web; files configured as hidden cannot be accessed.
(9) SpellCheckerRequestHandler splits the query string on spaces and uses the extendedResults, cmd (e.g. rebuild), accuracy, suggestionCount, restrictToField, and onlyMorePopular parameters of the SolrQueryRequest.
For each resulting word it can optionally add extra information, such as word frequency and suggested words.
(10) SystemInfoHandler reports system information about the core, the JVM, and Lucene.
(11) ThreadDumpHandler reports thread statistics: current, peak, and daemon thread counts.
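The three counters can be read from the JVM's ThreadMXBean, as sketched below. ThreadStats is a hypothetical class, and the sketch makes no claim about how ThreadDumpHandler formats its response.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical snapshot of JVM thread counters plus a one-line-per-thread dump.
class ThreadStats {
    final int current;
    final int peak;
    final int daemon;

    private ThreadStats(int current, int peak, int daemon) {
        this.current = current;
        this.peak = peak;
        this.daemon = daemon;
    }

    static ThreadStats snapshot() {
        ThreadMXBean tmx = ManagementFactory.getThreadMXBean();
        return new ThreadStats(tmx.getThreadCount(), tmx.getPeakThreadCount(),
                               tmx.getDaemonThreadCount());
    }

    // One "name: STATE" line per live thread (no stack traces or lock info).
    static List<String> dump() {
        List<String> lines = new ArrayList<>();
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            lines.add(info.getThreadName() + ": " + info.getThreadState());
        }
        return lines;
    }

    @Override
    public String toString() {
        return "current=" + current + ", peak=" + peak + ", daemon=" + daemon;
    }
}
```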
(12) AdminHandlers registers all the admin handlers (LukeRequestHandler, SystemInfoHandler, PluginInfoHandler, ThreadDumpHandler, ShowFileRequestHandler).

6 org.apache.solr.handler.component
(1) SearchComponent is the base class. Subclasses also handle merging documents from multiple shards: process and distributedProcess are used for standalone and distributed search respectively.
(2) DebugComponent (adds debugging information to a request)
(3) FacetComponent (@ countFacets; class DistribFieldFacet is used for each facet.field, adding results from this shard; includes the refine operation;
@ facet_fields or facet_queries)
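The shard-merging part of distributed faceting can be sketched as summing each term's count across shards and keeping the top terms. FacetMerge is hypothetical, and the sketch deliberately omits refinement: a shard that returned only its own top terms may not have reported a term that ends up in the global top list, which is why the refine step asks shards for exact counts of such candidates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: merge per-shard facet counts and keep the top `limit`.
class FacetMerge {
    static List<Map.Entry<String, Long>> merge(List<Map<String, Long>> shardCounts, int limit) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> shard : shardCounts) {
            for (Map.Entry<String, Long> e : shard.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(total.entrySet());
        // Highest count first; ties broken by term for deterministic output.
        sorted.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        int n = Math.max(0, Math.min(limit, sorted.size()));
        return new ArrayList<>(sorted.subList(0, n));
    }
}
```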
(4) HighlightComponent (@ usePhraseHighlighter highlights only exact phrase matches; @ highlightMultiTerm highlights multi-term matches such as wildcard, fuzzy, prefix, and range queries, and takes effect only together with usePhraseHighlighter=true)
(5) QueryComponent: the query class, which processes URL parameters and obtains the query result set.
(6) QueryElevationComponent: the elevation class. elevate.xml configures, per query, the IDs of documents to show first and of documents to exclude.
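What an elevate.xml entry does to one query's results can be sketched as a reordering: configured IDs go first in the configured order, excluded IDs are dropped, and the remaining hits keep their relevance order. Elevation.apply is hypothetical and works on plain ID lists; it assumes elevated documents are shown even if the ranked list did not contain them.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of elevation/exclusion applied to one result list.
class Elevation {
    static List<String> apply(List<String> rankedIds, List<String> elevated, Set<String> excluded) {
        List<String> out = new ArrayList<>();
        Set<String> seen = new HashSet<>(); // avoid listing a document twice
        // Pinned documents first, in configured order.
        for (String id : elevated) {
            if (!excluded.contains(id) && seen.add(id)) out.add(id);
        }
        // Then the normal hits in relevance order.
        for (String id : rankedIds) {
            if (!excluded.contains(id) && seen.add(id)) out.add(id);
        }
        return out;
    }
}
```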
(7) SpellCheckComponent: spell checking. Its inform method runs at Tomcat startup and loads the spellcheck dictionaries and converters, all of which have default values. (Purpose: "did you mean" suggestions for the keyword.)
(8) StatsComponent computes field statistics, optionally broken down by facet, based on the field type.
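For numeric fields, the statistics Solr documents for StatsComponent include min, max, sum, count, missing, sumOfSquares, mean, and stddev. The hypothetical FieldStats accumulator below computes them in one pass; keeping sum and sumOfSquares rather than the mean is what turns merging into simple addition, which suits combining per-shard or per-facet-value results.

```java
// Hypothetical one-pass accumulator for numeric field statistics.
class FieldStats {
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    double sum;
    double sumOfSquares;
    long count;
    long missing;

    // A null value means the document has no value for the field.
    void accumulate(Double value) {
        if (value == null) {
            missing++;
            return;
        }
        double v = value;
        min = Math.min(min, v);
        max = Math.max(max, v);
        sum += v;
        sumOfSquares += v * v;
        count++;
    }

    double mean() {
        return count == 0 ? Double.NaN : sum / count;
    }

    // Sample standard deviation derived from sum and sumOfSquares.
    double stddev() {
        if (count < 2) return 0.0;
        return Math.sqrt((sumOfSquares - sum * sum / count) / (count - 1));
    }

    // Combining two partial results (e.g. two shards) is plain addition.
    void merge(FieldStats other) {
        min = Math.min(min, other.min);
        max = Math.max(max, other.max);
        sum += other.sum;
        sumOfSquares += other.sumOfSquares;
        count += other.count;
        missing += other.missing;
    }
}
```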
(9) TermsComponent implements autocomplete suggestions. It returns TermEnum information: rb.req.getSearcher().getReader().terms() obtains the corresponding TermEnum, which carries term frequencies and similar data.
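Because a term dictionary is sorted, all terms sharing a prefix form one contiguous range: seek to the prefix, then walk forward until a term no longer matches, as iterating a TermEnum would. PrefixSuggester is hypothetical and uses a TreeMap<String, Integer> of term to document frequency in place of an index; its minDocFreq and limit arguments play the role that TermsComponent's terms.mincount and terms.limit parameters play, and results come back in index (lexicographic) order rather than sorted by frequency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical prefix suggester over a sorted term -> docFreq dictionary.
class PrefixSuggester {
    private final TreeMap<String, Integer> docFreqs = new TreeMap<>();

    void addTerm(String term, int docFreq) {
        docFreqs.put(term, docFreq);
    }

    List<String> suggest(String prefix, int minDocFreq, int limit) {
        List<String> out = new ArrayList<>();
        // tailMap "seeks" to the first term >= prefix, like positioning a TermEnum.
        for (Map.Entry<String, Integer> e : docFreqs.tailMap(prefix, true).entrySet()) {
            // Stop once enough terms are collected or we leave the prefix range.
            if (out.size() >= limit || !e.getKey().startsWith(prefix)) break;
            if (e.getValue() >= minDocFreq) out.add(e.getKey());
        }
        return out;
    }
}
```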
(10) TermVectorComponent returns term vectors for the specified documents, including tv, tf, offsets, positions, df, tf-idf, etc.; see TVMapper.

7 org.apache.solr.highlight
DefaultSolrHighlighter: getPhraseHighlighter, getSpanQueryScorer, getFormatter, getFragmenter
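Simple usage of the Lucene highlighter (Lucene 5+ API; assumes a String variable text holding the content to highlight):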
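import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.highlight.*; // Highlighter, QueryScorer
TermQuery query = new TermQuery(new Term("field", "textfragment"));
QueryScorer scorer = new QueryScorer(query); // scores each text fragment by the query terms it contains
Highlighter highlighter = new Highlighter(scorer); // default formatter wraps matches in <B>...</B>
TokenStream tokenStream = new SimpleAnalyzer().tokenStream("field", text); // tokens carry the start/end offsets used to locate matches in text
System.out.println(highlighter.getBestFragment(tokenStream, text)); // default SimpleFragmenter cuts text into ~100-char fragments; prints the best-scoring one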
