Using the latest Solr 4.10 to quickly build a vertical search site, such as a group-buying site

Source: Internet
Author: User
Tags: solr, xpath

Description: Solr is a mature, well-built open source project. It is not just a tool for building an index: it can power an entire vertical site, such as a group-buying site, and lets you build the pages quickly. Solr also has a very good admin console for inspecting the server, importing data, rebuilding indexes, and replicating data between master and slave. This article shows how Solr can be used to set up a group-buying site quickly; the code has been put on GitHub as a demo for everyone to learn from. The code is still being refined.
1. Environment setup (Solr 4.10)

Download the latest Solr 4.10.0 from the official Apache website: http://lucene.apache.org/solr/

The group-buying data comes from an interface listed on the hao123 website. It is public data that anyone can access.

API description: http://www.hao123.com/redian/api.htm. Data download: http://www.meituan.com/api/deals/hao123. The file is about 1.27 GB in total, because it is uncompressed XML. (Download it over plain HTTP; download-manager tools may be blocked.)


Convert the project to Maven. Since we are using Solr rather than developing it, take the .war package under the example directory and unpack it into the Maven project.



2. Configure Solr. The downloaded data is in XML format:

<urlset>
  <url>
    <loc>http://bj.meituan.com/deal/6330826.html?source=hao123</loc>
    <data>
      <display>
        <website>Meituan</website>
        <siteurl>http://bj.meituan.com</siteurl>
        <city>Beijing</city>
        <sort>Entertainment</sort>
        <title>[Valid at 2 locations] CGV Star International Studios: 1 single movie ticket, 2D/3D</title>
        <image>http://p1.meituan.net/275.168/deal/201301/04/173144_2860489.jpg</image>
        <startTime>1373040000</startTime>
        <endTime>1377943200</endTime>
        <value>100</value>
        <price>30</price>
        <rebate>3折</rebate>
        <bought>13593</bought>
        <spend_start_time>1373040000</spend_start_time>
        <spend_close_time>1377964799</spend_close_time>
        <longitude>116.490591</longitude>
        <latitude>39.970472</latitude>
        <collections>0</collections>
        <type>2</type>
        <soldout>no</soldout>
      </display>
    </data>
  </url>
  ....
</urlset>

The rebate is a Chinese string such as 3折, meaning 30% of the original price (value 100, price 30), and startTime and endTime are Unix timestamps in seconds.
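Because the feed is a single uncompressed file of about 1.27 GB, it is best read as a stream rather than loaded into memory, for example when checking the data before importing it. The following is a minimal sketch using the JDK's StAX API (javax.xml.stream), assuming the element layout shown above. The class name DealScanner, its callback signature, and the set of captured fields are hypothetical choices for illustration, not part of the demo project.

```java
import java.io.Reader;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper that streams <url> records out of the deal feed one at a time. */
public class DealScanner {

    // Leaf elements to capture (an assumption: the fields the schema cares about).
    private static final Set<String> LEAVES = new HashSet<>(Arrays.asList(
            "loc", "city", "sort", "title", "image", "value", "price",
            "rebate", "bought", "startTime", "endTime"));

    /** Calls onDeal once per /urlset/url record and returns the number of records seen. */
    public static int scan(Reader in, Consumer<Map<String, String>> onDeal) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        Map<String, String> current = null; // only one record is held in memory at a time
        int count = 0;
        try {
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = r.getLocalName();
                    if ("url".equals(name)) {
                        current = new HashMap<>();
                    } else if (current != null && LEAVES.contains(name)) {
                        // getElementText() reads the text and advances to the matching end tag.
                        current.put(name, r.getElementText().trim());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "url".equals(r.getLocalName()) && current != null) {
                    onDeal.accept(current);
                    current = null;
                    count++;
                }
            }
        } finally {
            r.close();
        }
        return count;
    }
}
```

For the real file, pass a UTF-8 reader over meituan_hao123.xml and handle each map in the callback (for example, count deals per city) without collecting them all into a list.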

Project directory structure:

The project root contains a solr folder holding the Solr configuration files; its subfolder tuan is the configuration folder for the group-buying data.
The project is built with Maven, which makes it easier to extend.
The Solr data configuration is written to match the structure of the Meituan data.

Reference:

http://forchenyun.iteye.com/blog/650372


Configure schema.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
    <field name="title" type="text_general" indexed="true" stored="true"/>
    <field name="image" type="string" indexed="false" stored="true"/>
    <field name="value" type="double" indexed="false" stored="true"/>
    <field name="price" type="double" indexed="true" stored="true"/>
    <field name="rebate" type="double" indexed="true" stored="true"/>
    <field name="bought" type="long" indexed="true" stored="true"/>
    <field name="city" type="string" indexed="true" stored="true"/>
    <field name="sort" type="string" indexed="true" stored="true"/>
    <field name="loc" type="string" indexed="true" stored="true"/>
    <field name="startTime" type="date" indexed="true" stored="true"/>
    <field name="endTime" type="date" indexed="true" stored="true"/>
    <!-- catchall field, containing all other searchable text fields (implemented via copyField further on in this schema) -->
    <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
    <field name="_version_" type="long" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <copyField source="title" dest="text"/>
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
    <!-- boolean type: "true" or "false" -->
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
    <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/>
    <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0"/>
    <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>
    <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
    <fieldType name="binary" class="solr.BinaryField"/>
    <fieldType name="pint" class="solr.IntField"/>
    <fieldType name="plong" class="solr.LongField"/>
    <fieldType name="pfloat" class="solr.FloatField"/>
    <fieldType name="pdouble" class="solr.DoubleField"/>
    <fieldType name="pdate" class="solr.DateField" sortMissingLast="true"/>
    <fieldType name="random" class="solr.RandomSortField" indexed="true"/>
    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <!-- CJK bigram (see text_ja for a Japanese configuration using morphological analysis) -->
    <fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- normalize width before bigram, as e.g. half-width dakuten combine -->
        <filter class="solr.CJKWidthFilterFactory"/>
        <!-- for any non-CJK -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.CJKBigramFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
</schema>


Configure solrconfig.xml:
<!-- add XML data import -->
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">xml-data-config.xml</str>
  </lst>
</requestHandler>

<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>

    <!-- VelocityResponseWriter settings -->
    <str name="wt">velocity</str>
    <str name="v.properties">velocity.properties</str>
    <str name="v.contentType">text/html;charset=UTF-8</str>
    <str name="v.template">browse</str>
    <str name="v.layout">layout</str>
    <str name="title">Group-buying website demo</str>

    <!-- Query settings -->
    <str name="defType">edismax</str>
    <str name="df">text</str>
    <str name="mm">100%</str>
    <str name="q.alt">*:*</str>
    <str name="rows">32</str>
    <str name="fl">*,score</str>

    <!-- Faceting defaults -->
    <str name="facet">on</str>
    <str name="facet.field">city</str>
    <str name="facet.field">sort</str>
    <str name="facet.range">price</str>
    <int name="f.price.facet.range.start">100</int>
    <int name="f.price.facet.range.end">1500</int>
    <int name="f.price.facet.range.gap">200</int>

    <!-- Highlighting defaults -->
    <str name="hl">on</str>
    <str name="hl.fl">title</str>
    <str name="hl.encoder">html</str>
    <str name="hl.simple.pre"><![CDATA[<font color='red'>]]></str>
    <str name="hl.simple.post"><![CDATA[</font>]]></str>
    <str name="f.title.hl.fragsize">0</str>
    <str name="f.title.hl.alternateField">title</str>

    <!-- Spell checking defaults -->
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.alternativeTermCount">2</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <str name="spellcheck.maxCollations">3</str>
  </lst>

  <!-- append spellchecking to our list of components -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
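With facet.range.start=100, facet.range.end=1500 and facet.range.gap=200, the price facet produces seven buckets, [100, 300) through [1300, 1500). By default (facet.range.include=lower) each bucket includes its lower bound and excludes its upper bound, and prices outside the range, such as the 30 in the sample record, are not counted in any bucket unless facet.range.other is set. The sketch below reproduces that arithmetic in plain Java so the boundaries can be checked; PriceBuckets is a hypothetical helper, not part of Solr, and it assumes end - start is an exact multiple of gap, as it is here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper mirroring how a Solr range facet splits prices into buckets. */
public class PriceBuckets {

    /** Lower bounds of the buckets [lower, lower + gap) from start up to end. */
    public static List<Integer> lowerBounds(int start, int end, int gap) {
        List<Integer> bounds = new ArrayList<>();
        for (int lower = start; lower < end; lower += gap) {
            bounds.add(lower);
        }
        return bounds;
    }

    /**
     * Lower bound of the bucket containing price, or -1 when the price falls outside
     * [start, end). Assumes (end - start) is a multiple of gap and include=lower.
     */
    public static int bucketFor(double price, int start, int end, int gap) {
        if (price < start || price >= end) {
            return -1; // Solr would report these only via facet.range.other (before/after)
        }
        return start + (int) Math.floor((price - start) / gap) * gap;
    }
}
```

For example, lowerBounds(100, 1500, 200) gives the seven lower bounds 100, 300, ..., 1300, and a deal priced at 300 lands in the [300, 500) bucket rather than [100, 300).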


Configure xml-data-config.xml (the file referenced by the /dataimport handler):
<dataConfig>
  <script><![CDATA[
    function replaceLocAddId(row) {
      // Extract the deal id from the loc URL, e.g. .../deal/6330826.html?... -> 6330826
      var loc_1 = row.get('loc').split('/deal/');
      var loc_2 = loc_1[1].split('.html');
      var id = loc_2[0];
      row.put('id', id);
      // Format time.
      var sdf = new java.text.SimpleDateFormat('yyyy-MM-dd HH:mm:ss');
      // Start time.
      row.put('startTime', com.demo.tuan.DateUtils.format(row.get('startTime')));
      // End time.
      row.put('endTime', com.demo.tuan.DateUtils.format(row.get('endTime')));
      // Remove the Chinese discount character (折) so rebate parses as a number.
      row.put('rebate', row.get('rebate').replace('折', ''));
      return row;
    }
  ]]></script>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="tuan" pk="loc"
            url="/data/workspace.freewebsys/solr4_demo/doc/meituan_hao123.xml"
            processor="XPathEntityProcessor"
            forEach="/urlset/url"
            transformer="script:replaceLocAddId,DateFormatTransformer">
      <field column="loc" xpath="/urlset/url/loc" commonField="true"/>
      <field column="city" xpath="/urlset/url/data/display/city" commonField="true"/>
      <field column="sort" xpath="/urlset/url/data/display/sort" commonField="true"/>
      <field column="title" xpath="/urlset/url/data/display/title" commonField="true"/>
      <field column="image" xpath="/urlset/url/data/display/image" commonField="true"/>
      <field column="value" xpath="/urlset/url/data/display/value" commonField="true"/>
      <field column="price" xpath="/urlset/url/data/display/price" commonField="true"/>
      <field column="rebate" xpath="/urlset/url/data/display/rebate" commonField="true"/>
      <field column="bought" xpath="/urlset/url/data/display/bought" commonField="true"/>
      <field column="startTime" xpath="/urlset/url/data/display/startTime" dateTimeFormat="yyyy-MM-dd HH:mm:ss" commonField="true"/>
      <field column="endTime" xpath="/urlset/url/data/display/endTime" dateTimeFormat="yyyy-MM-dd HH:mm:ss" commonField="true"/>
    </entity>
  </document>
</dataConfig>
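The script transformer's logic can be checked outside Solr with a plain Java mirror such as the sketch below. The real com.demo.tuan.DateUtils lives in the GitHub project and may differ; here it is assumed to turn Unix seconds into the yyyy-MM-dd HH:mm:ss string that DateFormatTransformer then parses, and the time zone is passed in explicitly because the source does not say which one is used. The class TuanRowTransform and its method names are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

/** Hypothetical Java mirror of the replaceLocAddId script, for testing the logic outside Solr. */
public class TuanRowTransform {

    // Same pattern as dateTimeFormat in xml-data-config.xml.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** http://bj.meituan.com/deal/6330826.html?source=hao123 -> "6330826" (assumes a well-formed loc). */
    public static String idFromLoc(String loc) {
        String afterDeal = loc.split("/deal/", 2)[1];
        return afterDeal.substring(0, afterDeal.indexOf(".html"));
    }

    /** Assumed DateUtils.format behaviour: Unix seconds -> "yyyy-MM-dd HH:mm:ss" in the given zone. */
    public static String formatEpochSeconds(String seconds, ZoneId zone) {
        Instant instant = Instant.ofEpochSecond(Long.parseLong(seconds.trim()));
        return FORMAT.format(instant.atZone(zone));
    }

    /** Strips the discount character so "3" + U+6298 becomes "3", which the double field accepts. */
    public static String stripRebate(String rebate) {
        // U+6298 is the Chinese "zhe" (discount) character used in the feed.
        return rebate.replace("\u6298", "").trim();
    }
}
```

With the sample record, idFromLoc returns 6330826, and startTime 1373040000 formats as 2013-07-06 00:00:00 in Asia/Shanghai (2013-07-05 16:00:00 in UTC), which is why the zone choice matters.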

Once everything is configured, start importing the data from the admin console. It is very convenient to use, and the interface looks good.
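Besides the admin console, the same import can be started over HTTP by sending a command (full-import, delta-import, status, and so on) to the /dataimport handler configured in solrconfig.xml. A minimal sketch; the base URL http://localhost:8983/solr/tuan and the class name DataImportTrigger are assumptions, so adjust them to wherever the tuan core is actually deployed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Scanner;

/** Hypothetical helper that sends DataImportHandler commands over HTTP. */
public class DataImportTrigger {

    /** Builds e.g. http://localhost:8983/solr/tuan/dataimport?command=full-import&wt=json */
    public static String buildUrl(String coreBaseUrl, String command) {
        String base = coreBaseUrl.endsWith("/")
                ? coreBaseUrl.substring(0, coreBaseUrl.length() - 1)
                : coreBaseUrl;
        return base + "/dataimport?command=" + command + "&wt=json";
    }

    /** Sends the request and returns the raw JSON response body. */
    public static String send(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        try (InputStream in = conn.getInputStream();
             Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return s.hasNext() ? s.next() : "";
        } finally {
            conn.disconnect();
        }
    }
}
```

A full-import request returns while the import keeps running in the background, so call send(buildUrl(base, "status")) afterwards to see when it has finished.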


The data is imported successfully.
3. Customize the page style

Modify the original pages to apply the style of another group-buying site. Information such as images can be displayed directly from the data. For online deployment, you can have Nginx proxy /tuan/browse directly; for better performance, put a Varnish layer in front to cache pages.
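Because /browse is an ordinary request handler, the page can also be driven by URL parameters: a facet link just adds an fq filter, and adding wt=json overrides the velocity default so the same query returns JSON instead of HTML. A small sketch that builds such a URL with java.net.URLEncoder; the base URL, the class name TuanQuery, and the filter values are hypothetical. In the real data the city and category values are Chinese strings, and since city and sort are string fields, the filter value must match exactly.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical builder for query URLs against the /browse handler. */
public class TuanQuery {
    private final String baseUrl;
    private final StringBuilder params = new StringBuilder();

    public TuanQuery(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    /** Appends one name=value pair; repeatable, e.g. several fq filters. */
    public TuanQuery param(String name, String value) {
        params.append(params.length() == 0 ? '?' : '&')
              .append(encode(name)).append('=').append(encode(value));
        return this;
    }

    public String toUrl() {
        return baseUrl + params;
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8"); // form encoding: space -> '+', ':' -> %3A
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

For example, new TuanQuery("http://localhost:8983/solr/tuan/browse").param("q", "CGV movie").param("fq", "city:Beijing").param("wt", "json").toUrl() produces a URL ending in ?q=CGV+movie&fq=city%3ABeijing&wt=json.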


Solr has many features built in, such as faceting by category, range faceting, data recommendations, and related-item recommendations. It is very powerful, and the more you study it, the more you get out of it.

4. Source code
The code and configuration have been uploaded to GitHub:
https://github.com/freewebsys/solr4_demo

It includes all of the code and configuration files, and can be run locally with Jetty directly.

